topic Re: Correlation coefficient in Quick Measures Gallery
https://community.powerbi.com/t5/Quick-Measures-Gallery/Correlation-coefficient/m-p/982093#M431
<P>Hello everyone,</P>
<P>A little bit stuck on this one... Can someone guide me?I have a data set with the following fields...</P>
<P> </P>
<OL>
<LI>Year</LI>
<LI>Venue</LI>
<LI>Event Length(in days)</LI>
<LI>Average Event revenue per day</LI>
</OL>
<P> </P>
<P>I want to see if there's a correlation between even length and event revenue per day. I thought I could use the year as the category, but I think this might be wrong. Using the Quick Measure, what Category, and 2 measures should I use to establish the hypotheses that "the longer the event, the more/less menoy per day is made"</P>
<P> </P>
<P>Thanks!</P>Thu, 19 Mar 2020 13:04:36 GMTpchapple2020-03-19T13:04:36ZCorrelation coefficient
https://community.powerbi.com/t5/Quick-Measures-Gallery/Correlation-coefficient/m-p/196274#M21
<P>The quick measure calculates the Pearson correlation coefficient <SPAN>between two measures within the category.</SPAN></P>
<P> </P>
<H4 id="toc-hId--1323657085">NAME:</H4>
<P>Correlation coefficient</P>
<P> </P>
<H4 id="toc-hId-419153250">DESCRIPTION:</H4>
<P>Calculate the Pearson correlation coefficient between two measures within the category</P>
<P> </P>
<H4 id="toc-hId--2133003711">PARAMETERS:</H4>
<P>Name: Category</P>
<P>Tooltip: The category in which you want to calculate the correlation coefficient</P>
<P>Type: Categorical field</P>
<H4 id="toc-hId--2133003711"> </H4>
<P>Name: Measure X</P>
<P>Tooltip: The first measure in a correlation pair</P>
<P>Type: Numerical field / measure</P>
<H4 id="toc-hId--2133003711"> </H4>
<P>Name: Measure Y</P>
<P>Tooltip: The second measure in a correlation pair</P>
<P>Type: Numerical field / measure</P>
<H4 id="toc-hId--390193376"> </H4>
<H4 id="toc-hId-1352616959">DAX:</H4>
<PRE>Correlation Coefficient :=
VAR Correlation_Table =
FILTER (
ADDCOLUMNS (
VALUES ( {Category} ),
"Value_X", CALCULATE ( {Measure X} ),
"Value_Y", CALCULATE ( {Measure Y} )
),
AND (
NOT ( ISBLANK ( [Value_X] ) ),
NOT ( ISBLANK ( [Value_Y] ) )
)
)
VAR Count_Items =
COUNTROWS ( Correlation_Table )
VAR Sum_X =
SUMX ( Correlation_Table, [Value_X] )
VAR Sum_X2 =
SUMX ( Correlation_Table, [Value_X] ^ 2 )
VAR Sum_Y =
SUMX ( Correlation_Table, [Value_Y] )
VAR Sum_Y2 =
SUMX ( Correlation_Table, [Value_Y] ^ 2 )
VAR Sum_XY =
SUMX ( Correlation_Table, [Value_X] * [Value_Y] )
VAR Pearson_Numerator =
Count_Items * Sum_XY - Sum_X * Sum_Y
VAR Pearson_Denominator_X =
Count_Items * Sum_X2 - Sum_X ^ 2
VAR Pearson_Denominator_Y =
Count_Items * Sum_Y2 - Sum_Y ^ 2
VAR Pearson_Denominator =
SQRT ( Pearson_Denominator_X * Pearson_Denominator_Y )
RETURN
DIVIDE ( Pearson_Numerator, Pearson_Denominator )</PRE>
<P> </P>
<P> </P>
<P><SPAN class="reportid hidden">eyJrIjoiMGQ5YzJiYTItZWFiMy00MGI2LTg1NzktYjMwYTU1YjA2N2M3IiwidCI6ImQzMmNkYzNmLTY1NTUtNGNhYy1iYjFhLTg2OWZiMTE0MzRlNSJ9</SPAN></P>Fri, 13 Apr 2018 06:44:48 GMThttps://community.powerbi.com/t5/Quick-Measures-Gallery/Correlation-coefficient/m-p/196274#M21Daniil2018-04-13T06:44:48ZRe: Correlation coefficient
https://community.powerbi.com/t5/Quick-Measures-Gallery/Correlation-coefficient/m-p/304080#M68
<P>Hello <LI-USER uid="15597"></LI-USER>,</P>
<P> </P>
<P> </P>
<P>Thanks for this, this a great solution for correlation calculations on DAX.</P>
<P> </P>
<P>I have tried the calculation with different measures and I will add the following improvement. In the following line, you could get a negative number.</P>
<P> </P>
<PRE>VAR Pearson_Denominator =
SQRT ( Pearson_Denominator_X * Pearson_Denominator_Y )</PRE>
<P>I suggest this slight modification to run in all scenarios.</P>
<P> </P>
<PRE>VAR Pearson_Denominator =
SQRT(ABS( Pearson_Denominator_X * Pearson_Denominator_Y ))</PRE>
<P> </P>
<P>Regards,</P>Wed, 15 Nov 2017 17:59:32 GMThttps://community.powerbi.com/t5/Quick-Measures-Gallery/Correlation-coefficient/m-p/304080#M68acanepa2017-11-15T17:59:32ZRe: Correlation coefficient
https://community.powerbi.com/t5/Quick-Measures-Gallery/Correlation-coefficient/m-p/311026#M69
<P style="margin: 0in; font-family: Calibri; font-size: 11.0pt;">Hi acanepa</P>
<P style="margin: 0in; font-family: Calibri; font-size: 11.0pt;"> </P>
<P style="margin: 0in; font-family: Calibri; font-size: 11.0pt;">Good programming practice would suggest avoiding a division by zero, however you need to think like a statistician - which can often be counter intuitive!</P>
<P style="margin: 0in; font-family: Calibri; font-size: 11.0pt;"> </P>
<P style="margin: 0in; font-family: Calibri; font-size: 11.0pt;">You actually want the Pearson Coefficient to "fail" when you divide by zero.</P>
<P style="margin: 0in; font-family: Calibri; font-size: 11.0pt;">See this post for more info <A href="https://stackoverflow.com/questions/38548343/pearson-correlation-fails-for-perfectly-correlated-sets" target="_blank">https://stackoverflow.com/questions/38548343/pearson-correlation-fails-for-perfectly-correlated-sets</A></P>
<P style="margin: 0in; font-family: Calibri; font-size: 11.0pt;"> </P>
<P style="margin: 0in; font-family: Calibri; font-size: 11.0pt;">From a DAX point of view the divide function will tolerate a division by zero.</P>
<P style="margin: 0in; font-family: Calibri; font-size: 11.0pt;"> </P>
<P style="margin: 0in; font-family: Calibri; font-size: 11.0pt;">Regards</P>
<P style="margin: 0in; font-family: Calibri; font-size: 11.0pt;">Graeme</P>Sun, 26 Nov 2017 19:01:27 GMThttps://community.powerbi.com/t5/Quick-Measures-Gallery/Correlation-coefficient/m-p/311026#M69numbus2017-11-26T19:01:27ZRe: Correlation coefficient
https://community.powerbi.com/t5/Quick-Measures-Gallery/Correlation-coefficient/m-p/345034#M93
<P>Very handy addition.<BR /><BR />Are there, however, plans to add a measure/some other output feature that will also report on the uncertainty of the Correlation Coefficient calculated for a given series pair (i.e. implementing Fisher's z-transformation and evaluating the confidence interval at difference levels that the user chooses, or just a standard set of levels like 80%, 90 % and 95%)<BR /><BR />The risk is that people could state (and frequently do state) correlation coefficients for insufficiently sized samples and derive insights that are actually attributable to noise etc.<BR /><BR />Thanks for the awesome work! <img id="smileyvery-happy" class="emoticon emoticon-smileyvery-happy" src="https://community.powerbi.com/i/smilies/16x16_smiley-very-happy.png" alt="Smiley Very Happy" title="Smiley Very Happy" /></P>Thu, 25 Jan 2018 06:33:32 GMThttps://community.powerbi.com/t5/Quick-Measures-Gallery/Correlation-coefficient/m-p/345034#M93ZanderFick2018-01-25T06:33:32ZRe: Correlation coefficient
https://community.powerbi.com/t5/Quick-Measures-Gallery/Correlation-coefficient/m-p/606094#M284
<P>Thanks for the formula Daniil. I'm trying to use it to calculate the correlation coefficient on some data I have but I have hit an issue - I only have <STRONG>two</STRONG> columns of data, one showing the month name and one showing the volume (a measure). </P>
<P>I.e.</P>
<P><STRONG>Month Volume</STRONG></P>
<P>Jan-07 1,000</P>
<P>Feb-07 1,613</P>
<P>Mar-07 1,128</P>
<P>etc.</P>
<P> </P>
<P>I don't have two measures as per your data nor do I have a column showing a 'category'. How would I be able to carry out the calculation for my data please? Do I need to create a new measure column based on the month name, coverting it into a numerical value? (just a guess). Then I would classify the month column as the 'category'? </P>
<P> </P>
<P>Hope this is clear (still classing myself as a Power BI newbie!) </P>
<P>Regards,</P>
<P>Lee</P>Tue, 22 Jan 2019 11:14:56 GMThttps://community.powerbi.com/t5/Quick-Measures-Gallery/Correlation-coefficient/m-p/606094#M284PowerBI_772019-01-22T11:14:56ZRe: Correlation coefficient
https://community.powerbi.com/t5/Quick-Measures-Gallery/Correlation-coefficient/m-p/606584#M285
<P><LI-USER uid="103337"></LI-USER> please have a look at the second example in my <A href="https://xxlbi.com/blog/simple-linear-regression-in-dax/" target="_self">Simple Linear Regression</A> blog -- you can employ a similar technique here.</P>
<P> </P>
<P>Also, in case people still read this -- <LI-USER uid="4119"></LI-USER> said in a private message there "is not an error of the calculation you created but rather an error on my end to feed the formula with wrong numbers", so please ignore his comment about negative numbers.</P>Tue, 22 Jan 2019 22:39:42 GMThttps://community.powerbi.com/t5/Quick-Measures-Gallery/Correlation-coefficient/m-p/606584#M285Daniil2019-01-22T22:39:42ZRe: Correlation coefficient
https://community.powerbi.com/t5/Quick-Measures-Gallery/Correlation-coefficient/m-p/982093#M431
<P>Hello everyone,</P>
<P>A little bit stuck on this one... Can someone guide me?I have a data set with the following fields...</P>
<P> </P>
<OL>
<LI>Year</LI>
<LI>Venue</LI>
<LI>Event Length(in days)</LI>
<LI>Average Event revenue per day</LI>
</OL>
<P> </P>
<P>I want to see if there's a correlation between even length and event revenue per day. I thought I could use the year as the category, but I think this might be wrong. Using the Quick Measure, what Category, and 2 measures should I use to establish the hypotheses that "the longer the event, the more/less menoy per day is made"</P>
<P> </P>
<P>Thanks!</P>Thu, 19 Mar 2020 13:04:36 GMThttps://community.powerbi.com/t5/Quick-Measures-Gallery/Correlation-coefficient/m-p/982093#M431pchapple2020-03-19T13:04:36ZRe: Correlation coefficient
https://community.powerbi.com/t5/Quick-Measures-Gallery/Correlation-coefficient/m-p/1604186#M659
<P>This is wonderfully useful thanks <LI-USER uid="15597"></LI-USER> <BR />I've been trying to modify your DAX to use in a scenario where the variables to be correlated are pivoted and the categories are contained in an attribute column (example below). Sadly I'm not having much luck, does anyone have any suggestions on how to generate a correlation matrix with this data structure?<BR /><BR /></P>
<TABLE style="border-collapse: collapse; width: 144pt;" border="0" width="192" cellspacing="0" cellpadding="0">
<TBODY>
<TR style="height: 14.5pt;">
<TD width="64" height="19" style="height: 14.5pt; width: 48pt;">Item</TD>
<TD width="64" style="width: 48pt;">Attribute</TD>
<TD width="64" style="width: 48pt;">Value</TD>
</TR>
<TR style="height: 14.5pt;">
<TD height="19" style="height: 14.5pt;">A</TD>
<TD>Value X</TD>
<TD align="right">2</TD>
</TR>
<TR style="height: 14.5pt;">
<TD height="19" style="height: 14.5pt;">B</TD>
<TD>Value X</TD>
<TD align="right">3</TD>
</TR>
<TR style="height: 14.5pt;">
<TD height="19" style="height: 14.5pt;">C</TD>
<TD>Value X</TD>
<TD align="right">5</TD>
</TR>
<TR style="height: 14.5pt;">
<TD height="19" style="height: 14.5pt;">D</TD>
<TD>Value X</TD>
<TD align="right">7</TD>
</TR>
<TR style="height: 14.5pt;">
<TD height="19" style="height: 14.5pt;">A</TD>
<TD>Value Y</TD>
<TD align="right">0</TD>
</TR>
<TR style="height: 14.5pt;">
<TD height="19" style="height: 14.5pt;">B</TD>
<TD>Value Y</TD>
<TD align="right">1</TD>
</TR>
<TR style="height: 14.5pt;">
<TD height="19" style="height: 14.5pt;">C</TD>
<TD>Value Y</TD>
<TD align="right">1</TD>
</TR>
<TR style="height: 14.5pt;">
<TD height="19" style="height: 14.5pt;">D</TD>
<TD>Value Y</TD>
<TD align="right">2</TD>
</TR>
</TBODY>
</TABLE>Sun, 17 Jan 2021 22:32:28 GMThttps://community.powerbi.com/t5/Quick-Measures-Gallery/Correlation-coefficient/m-p/1604186#M659systemnova2021-01-17T22:32:28ZRe: Correlation coefficient
https://community.powerbi.com/t5/Quick-Measures-Gallery/Correlation-coefficient/m-p/2008108#M736
<P>Statistics only work with numbers. You need to add numerical key columns for your text columns before you can calculate the correlation.</P>Tue, 10 Aug 2021 11:41:48 GMThttps://community.powerbi.com/t5/Quick-Measures-Gallery/Correlation-coefficient/m-p/2008108#M736lbendlin2021-08-10T11:41:48Z