Turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

- Microsoft Power BI Community
- Galleries
- Quick Measures Gallery
- Re: Correlation coefficient

06-18-2017 17:07 PM - last edited 04-12-2018 23:44 PM

Daniil

Member

Correlation coefficient

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

06-18-2017
05:07 PM

The quick measure calculates the Pearson correlation coefficient between two measures within the category.

Correlation coefficient

Calculate the Pearson correlation coefficient between two measures within the category

Name: Category

Tooltip: The category in which you want to calculate the correlation coefficient

Type: Categorical field

Name: Measure X

Tooltip: The first measure in a correlation pair

Type: Numerical field / measure

Name: Measure Y

Tooltip: The second measure in a correlation pair

Type: Numerical field / measure

Correlation Coefficient := VAR Correlation_Table = FILTER ( ADDCOLUMNS ( VALUES ( {Category} ), "Value_X", CALCULATE ( {Measure X} ), "Value_Y", CALCULATE ( {Measure Y} ) ), AND ( NOT ( ISBLANK ( [Value_X] ) ), NOT ( ISBLANK ( [Value_Y] ) ) ) ) VAR Count_Items = COUNTROWS ( Correlation_Table ) VAR Sum_X = SUMX ( Correlation_Table, [Value_X] ) VAR Sum_X2 = SUMX ( Correlation_Table, [Value_X] ^ 2 ) VAR Sum_Y = SUMX ( Correlation_Table, [Value_Y] ) VAR Sum_Y2 = SUMX ( Correlation_Table, [Value_Y] ^ 2 ) VAR Sum_XY = SUMX ( Correlation_Table, [Value_X] * [Value_Y] ) VAR Pearson_Numerator = Count_Items * Sum_XY - Sum_X * Sum_Y VAR Pearson_Denominator_X = Count_Items * Sum_X2 - Sum_X ^ 2 VAR Pearson_Denominator_Y = Count_Items * Sum_Y2 - Sum_Y ^ 2 VAR Pearson_Denominator = SQRT ( Pearson_Denominator_X * Pearson_Denominator_Y ) RETURN DIVIDE ( Pearson_Numerator, Pearson_Denominator )

eyJrIjoiMGQ5YzJiYTItZWFiMy00MGI2LTg1NzktYjMwYTU1YjA2N2M3IiwidCI6ImQzMmNkYzNmLTY1NTUtNGNhYy1iYjFhLTg2OWZiMTE0MzRlNSJ9

acanepa

Member

Re: Correlation coefficient

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

11-15-2017
09:59 AM

Hello @Daniil,

Thanks for this, this a great solution for correlation calculations on DAX.

I have tried the calculation with different measures and I will add the following improvement. In the following line, you could get a negative number.

VAR Pearson_Denominator = SQRT ( Pearson_Denominator_X * Pearson_Denominator_Y )

I suggest this slight modification to run in all scenarios.

VAR Pearson_Denominator = SQRT(ABS( Pearson_Denominator_X * Pearson_Denominator_Y ))

Regards,

numbus

Frequent Visitor

Re: Correlation coefficient

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

11-26-2017
11:01 AM

Hi acanepa

Good programming practice would suggest avoiding a division by zero, however you need to think like a statistician - which can often be counter intuitive!

You actually want the Pearson Coefficient to "fail" when you divide by zero.

See this post for more info https://stackoverflow.com/questions/38548343/pearson-correlation-fails-for-perfectly-correlated-sets

From a DAX point of view the divide function will tolerate a division by zero.

Regards

Graeme

ZanderFick

New Member

Re: Correlation coefficient

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

01-24-2018
10:33 PM

Very handy addition.

Are there, however, plans to add a measure/some other output feature that will also report on the uncertainty of the Correlation Coefficient calculated for a given series pair (i.e. implementing Fisher's z-transformation and evaluating the confidence interval at difference levels that the user chooses, or just a standard set of levels like 80%, 90 % and 95%)

The risk is that people could state (and frequently do state) correlation coefficients for insufficiently sized samples and derive insights that are actually attributable to noise etc.

Thanks for the awesome work!

Highlighted
##

PowerBI_77

Frequent Visitor

Re: Correlation coefficient

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

01-22-2019
03:14 AM

Thanks for the formula Daniil. I'm trying to use it to calculate the correlation coefficient on some data I have but I have hit an issue - I only have **two** columns of data, one showing the month name and one showing the volume (a measure).

I.e.

**Month Volume**

Jan-07 1,000

Feb-07 1,613

Mar-07 1,128

etc.

I don't have two measures as per your data nor do I have a column showing a 'category'. How would I be able to carry out the calculation for my data please? Do I need to create a new measure column based on the month name, coverting it into a numerical value? (just a guess). Then I would classify the month column as the 'category'?

Hope this is clear (still classing myself as a Power BI newbie!)

Regards,

Lee

Daniil

Member

Re: Correlation coefficient

- Mark as New
- Bookmark
- Subscribe
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

01-22-2019
02:39 PM

@PowerBI_77 please have a look at the second example in my Simple Linear Regression blog -- you can employ a similar technique here.

Also, in case people still read this -- @acanepa said in a private message there "is not an error of the calculation you created but rather an error on my end to feed the formula with wrong numbers", so please ignore his comment about negative numbers.