Quantifying business impact of previous investment...

Christian_Berg · ‎01-04-2017

Missed the previous parts of this series? See Become your organization’s strategic advisor by using Machine Learning and Power BI

Estimating the actual impact of an investment is an essential part of management. An investment, in this context, can be any change in resource allocation but is most often a project or marketing initiative. By quantifying the major benefits realized we obtain an objective measure of the outcome. This is useful when discussing lessons learnt and often invaluable input the next time a similar investment is considered. From a business intelligence perspective too, there is great value in comparing the actual results with the initial projection/forecast to understand limitations of existing models, and to prioritize opportunities for improvement. In addition to the aforementioned, rigorous analytics promotes a learning culture with end to end accountability. It is an essential tool for continuous improvement.

Benefit realization analysis typically happens after all effects of the investment have been documented. If the investment for instance was a mailed coupon campaign these coupons probably have an expiration date. Assuming the coupons are valid for at most a couple of months, and not years, it would make sense to perform this post mortem soon after their expiration.

Imagine that we made an investment some time ago. That investment may or may not have yielded some return already and there may or may not be an expectation of future benefits. In this post, I will just cover the estimation of benefits that have already happened.

There are many approaches to quantifying actual benefit realization or return on investment to date. The common principle is comparing an estimate of what would have happened without the investment, with the actual outcome. The actual outcome is typically well documented so the challenge is modelling what the world would have looked like had we not made the investment when we did. The complexity of this task ranges from trivial to insurmountable. The easiest scenarios are when the status quo (no investment or change) would have yielded no return, i.e. new market entry, without which we would not have had any revenue from that area. The other extreme is if an investment coincided with an unprecedented event that probably had a large impact. That is difficult because we don’t have data for unprecedented events. We then also need to normalize for the other event which may require stochastic models or other qualitative input.

The approach that we will use here will be to estimate what would have happened by automatically identifying good proxies for the business that we invested in. Proxies are typically other geographic areas, product categories, market indexes or competitors. The best proxy reacts to externalities in a similar way to the business that we invested in but is insulated from the impact of the investment. For instance, a company making a large ad hoc investment in billboard advertising in California may find that the algorithm identifies Florida as a good proxy market. Florida may have a slightly earlier season and be a smaller market for our organization but if a shift of a couple of weeks give the two markets high correlation the algorithm will correctly show Florida as a good proxy.

The only question then that remains for the analyst is to reason about cross influence. It is important that your proxy market is not impacted by the investment since that will bias the estimate of what would have been. If the marketing investment in California increased sales in Florida too, then this analysis would underestimate the return of the investment since it would assume Florida was unaffected, thereby overestimating what would have been, had the investment been absent. For example, assume Florida’s season starts 6 weeks before California’s and it generates half the revenue. We invest in a marketing campaign it in California that increases sales by 10%. If that same investment increased sales in Florida by 2% (maybe because a large portion of the Florida population visited California during the campaign), then the algorithm would think the return was only 8% in California, when in fact it was 10% more revenue in California, and 2% more in Florida. In this case we would be better of estimating the impact in both Florida and California using potentially a worse proxy from a correlation perspective that was uninfluenced by the campaign.

The dataset that I’ve used in this example contains 6 products sold by the same company but all with a different target audience. August 1 2016 the company invested $20,000 in a social media campaign for the product with internal id 1. It is now December 2016 and the Chief Marketing Officer have asked us to help her assess the result of the investment.

Since the company has 5 other products with different audiences we assume that the marketing of product 1 had little impact on them. We can therefore try using the other products as proxy and see what that suggests about the impact of the social media marketing.

Below I have plotted the actual (turquoise) and expected (red with gray field) revenue over a three year period. The turquoise line is the observed value (actual revenue for product 1). The red line is an estimate of what the algorithm expected the revenue for product 1 to be, using the results of the other products that it found to be good proxies. If the model is successful in finding good proxies for product 1 (in this case by using the other products) we should see a relatively narrow shaded area around the red line and a small gap between the red and turquoise line. If the investment had a significant impact the two lines will diverge from the dotted line. After the dotted line the red line represents the model’s prediction for what would have happened, had we not made the investment. The shaded area around the expected value represents the 95% confidence interval. If the observed value is above the shaded that means that the positive impact was statistically significant (inversely, if it is below that would be a statistically significant negative impact).

Using the report

For this report to work in Power BI Desktop you need the packages CausalImpact, dtw and MarketMatching installed. If you have not previously used these packages on your computer, install them by going to your R console in R Studio or other R GUI and copy/paste.

install.packages("devtools")
 library(devtools)
 install.packages("bsts")
 install.packages("BoomSpikeSlab")
 install.packages("dtw")
 install.packages("chron")
 devtools::install_github("google/CausalImpact")
 install_github("klarsen1/MarketMatching", build_vignettes=TRUE)

If you run into problems please see the links at the end for additional instructions.

The report has four pages. The first one uses the CausalImpact package to calculate the impact:

The output consists of three charts, the first one showing both the estimated line (dotted with confidence interval shaded) and the actual results (solid line). The second chart shows the difference between the actual and estimated with the 95% confidence interval. And the third chart shows the cumulative difference. I added the three red dotted lines to show the upper, average and lower estimate. By reading the value on the y-axis that corresponds with the upper edge of the confidence interval (shaded area) we get the maximum impact the investment had (approximately $48k). Similarly, the average impact is estimated to $37k and the minimum is $26k. In this example I had hardcoded that we were estimating the impact for product 1 from September 1.

On the second page, “Dynamic selection”, I have instead used the MarketMatching package. Also, instead of hardcoding which product we invested in and when, I’ve made both of those inputs dynamic using slicers. Another difference is that I’ve limited the number of products to use as proxies to two, with the line: matches=2, # request 2 matches. As you can see below the result is higher with less uncertainty ($52k +/-12k).

The last two pages demonstrate alternative visualizations and the difference between the different algorithms.

Links and downloads

For a comprehensive explanation of the methodologies:

Making Causal Impact Analysis Easy

Reasoning about Optimal Stable Matchings under Partial Information

https://google.github.io/CausalImpact/CausalImpact.html#installing-the-package

* Third-party programs. This software enables you to obtain software applications from other sources. Those applications are offered and distributed by third parties under their own license terms. Microsoft is not developing, distributing or licensing those applications to you, but instead, as a convenience, enables you to use this software to obtain those applications directly from the application providers.

By using the software, you acknowledge and agree that you are obtaining the applications directly from the third-party providers and under separate license terms, and that it is your responsibility to locate, understand and comply with those license terms. Microsoft grants you no license rights for third-party software or applications that is obtained using this software.

Quantifying business impact of previous investments

How to provide sample data in the Power BI Forum

Re: Power BI Tip: How To Sort A Column With Duplic...

Enhanced Data Profiling in Power Query: GUI and Ta...

Creating a “custom” or “hybrid” matrix in PowerBI

Supercharge Your Visuals: Easy Conditional Formatt...