Leveraging Scalable Data Fusion to Enhance Customer Base Predictions

Research Talk for May 1, 2020


Daniel McCarthy, Emory University


May 1, 2020


UW Bothell main campus
Building: UWBB
Room: 240


When modeling the acquisition and retention behavior of a firm’s customer base, some analysts rely on aggregated data provided by the firm, while others rely on granular panel data provided by third parties, such as credit card panels. Using both data sources simultaneously could allow for better predictions and richer customer base insights. However, existing approaches to aggregate-disaggregate data fusion are difficult to use in this context for several reasons: the panel may not be representative of the customer base as a whole, both data sources may suffer from missingness, and the target population may be very large because it represents all potential customers of a company. We propose an aggregate-disaggregate data fusion method that is computationally scalable to massive populations and allows for censored and/or truncated, non-representative panel data. We apply our method to data from Spotify, a music streaming service. By incorporating credit card panel data through our data fusion method, we obtain better predictions and richer insights than prior work that used only aggregate data. In particular, we predict future aggregated metrics more accurately and separate customers' initial behavior from their repeat behavior, providing deeper insight into the acquisition and retention dynamics that drive growth.

