cancel
Showing results for 
Search instead for 
Did you mean: 
Reply
Highlighted
Super User
Super User

Serious Issue with New Correlation Coefficient Quick Measure

OK, in my opinion, there is a serious deficiency in the new Correlation Coefficient Quick Measure. The issue is that it does not factor in the possibility that there are more rows in, say Y than in X in the source data. The original technique I posted here:

 

https://community.powerbi.com/t5/Community-Blog/Correlation-Seasonality-and-Forecasting-with-Power-B...

 

and @Daniil's version of that here:

https://community.powerbi.com/t5/Quick-Measures-Gallery/Correlation-coefficient/m-p/196274

 

does not exhibit these issues because they both either use measures that honor relationships or create a table that honors those relationships. In this way, unrelated rows are automatically weeded out and not factored into the correlation.

 

This can be easily seen by using the data from my original article on the subject. There are more wage rows than forecast rows. So, there is an unmatched Wages row. The SUM of Y given by the new correlation coefficient gives 242 for this data when it should, in fact, be 222. Big issue because this causes the correlation calculation to fail.

 

Overall, seems like a sloppy implementation. I would refactor it to do exactly what @Daniil did and create a table that contains the category and the the two measures that you want so that this drops out any unrelated rows.

 

 

 

 


Did I answer your question? Mark my post as a solution!

Proud to be a Datanaut!


3 REPLIES 3
Super User
Super User

Re: Serious Issue with New Correlation Coefficient Quick Measure

I would suggest perhaps this:

 

{X MEASURE} and {Y MEASURE} correlation for {CATEGORY} = 
VAR __CORRELATION_TABLE = FILTER(ADDCOLUMNS(VALUES({CATEGORY}),"__X",CALCULATE({X MEASURE})),"__Y",CALCULATE({Y MEASURE}))), AND (
            NOT ( ISBLANK ( [__X] ) ),
            NOT ( ISBLANK ( [__Y] ) )
        ))
VAR __COUNT =
	COUNTX(
		KEEPFILTERS(__CORRELATION_TABLE),
		[__X]
	)
VAR __SUM_X =
	SUMX(
		KEEPFILTERS(__CORRELATION_TABLE),
		[__X]
	)
VAR __SUM_Y = SUMX(KEEPFILTERS(__CORRELATION_TABLE), [__Y])
VAR __SUM_XY =
	SUMX(
		KEEPFILTERS(__CORRELATION_TABLE),
		[__X] * [__Y] * 1.
	)
VAR __SUM_X2 =
	SUMX(
		KEEPFILTERS(__CORRELATION_TABLE),
		[__X] ^ 2
	)
VAR __SUM_Y2 =
	SUMX(
		KEEPFILTERS(__CORRELATION_TABLE),
		[__Y] ^ 2
	)
RETURN
	DIVIDE(
		__COUNT * __SUM_XY - __SUM_X * __SUM_Y * 1.,
		SQRT(
			(__COUNT * __SUM_X2 - __SUM_X ^ 2)
				* (__COUNT * __SUM_Y2 - __SUM_Y ^ 2)
		)
	)

Did I answer your question? Mark my post as a solution!

Proud to be a Datanaut!


RT Frequent Visitor
Frequent Visitor

Re: Serious Issue with New Correlation Coefficient Quick Measure

So is it safe to say that the quick measure correlation coefficient can't handle null/blank values but can handle using measures, and Daniil's version can handle nulls/blanks, but requires a table[column] and can't handle a measure?  Certainly the DAX is different between the two.  I have survey data that has blanks which I need to handle, but was also hoping to be able to use measures so that I could create some disconnected slicers tables to allow people to choose two survey metrics to then see if their correlation coefficient. 

Super User
Super User

Re: Serious Issue with New Correlation Coefficient Quick Measure

I guess I would need clarification on what you mean by handling or not handling slicers. In their raw form, I'm not sure that either will necessarily handle measures but I'd need to better understand what you mean by that. I would think that the original formula would be able to handle measures since it is essentially creating a temp table. But, again, that depends on how you are using measures. If you need to handle null values, the stock Quick Measure as coded will not work, you will get incorrect results.


Did I answer your question? Mark my post as a solution!

Proud to be a Datanaut!