Serious Issue with New Correlation Coefficient Qui...

Greg_Deckler · ‎01-13-2018

OK, in my opinion, there is a serious deficiency in the new Correlation Coefficient Quick Measure. The issue is that it does not factor in the possibility that there are more rows in, say Y than in X in the source data. The original technique I posted here:

https://community.powerbi.com/t5/Community-Blog/Correlation-Seasonality-and-Forecasting-with-Power-B...

and @Daniil's version of that here:

https://community.powerbi.com/t5/Quick-Measures-Gallery/Correlation-coefficient/m-p/196274

does not exhibit these issues because they both either use measures that honor relationships or create a table that honors those relationships. In this way, unrelated rows are automatically weeded out and not factored into the correlation.

This can be easily seen by using the data from my original article on the subject. There are more wage rows than forecast rows. So, there is an unmatched Wages row. The SUM of Y given by the new correlation coefficient gives 242 for this data when it should, in fact, be 222. Big issue because this causes the correlation calculation to fail.

Overall, seems like a sloppy implementation. I would refactor it to do exactly what @Daniil did and create a table that contains the category and the the two measures that you want so that this drops out any unrelated rows.

@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!: The Definitive Guide to Power Query (M)

DAX is easy, CALCULATE makes DAX hard...

RT · ‎01-23-2018

So is it safe to say that the quick measure correlation coefficient can't handle null/blank values but can handle using measures, and Daniil's version can handle nulls/blanks, but requires a table[column] and can't handle a measure? Certainly the DAX is different between the two. I have survey data that has blanks which I need to handle, but was also hoping to be able to use measures so that I could create some disconnected slicers tables to allow people to choose two survey metrics to then see if their correlation coefficient.

Greg_Deckler · ‎01-23-2018

I guess I would need clarification on what you mean by handling or not handling slicers. In their raw form, I'm not sure that either will necessarily handle measures but I'd need to better understand what you mean by that. I would think that the original formula would be able to handle measures since it is essentially creating a temp table. But, again, that depends on how you are using measures. If you need to handle null values, the stock Quick Measure as coded will not work, you will get incorrect results.

@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!: The Definitive Guide to Power Query (M)

DAX is easy, CALCULATE makes DAX hard...

Greg_Deckler · ‎01-13-2018

I would suggest perhaps this:

{X MEASURE} and {Y MEASURE} correlation for {CATEGORY} = 
VAR __CORRELATION_TABLE = FILTER(ADDCOLUMNS(VALUES({CATEGORY}),"__X",CALCULATE({X MEASURE})),"__Y",CALCULATE({Y MEASURE}))), AND (
            NOT ( ISBLANK ( [__X] ) ),
            NOT ( ISBLANK ( [__Y] ) )
        ))
VAR __COUNT =
	COUNTX(
		KEEPFILTERS(__CORRELATION_TABLE),
		[__X]
	)
VAR __SUM_X =
	SUMX(
		KEEPFILTERS(__CORRELATION_TABLE),
		[__X]
	)
VAR __SUM_Y = SUMX(KEEPFILTERS(__CORRELATION_TABLE), [__Y])
VAR __SUM_XY =
	SUMX(
		KEEPFILTERS(__CORRELATION_TABLE),
		[__X] * [__Y] * 1.
	)
VAR __SUM_X2 =
	SUMX(
		KEEPFILTERS(__CORRELATION_TABLE),
		[__X] ^ 2
	)
VAR __SUM_Y2 =
	SUMX(
		KEEPFILTERS(__CORRELATION_TABLE),
		[__Y] ^ 2
	)
RETURN
	DIVIDE(
		__COUNT * __SUM_XY - __SUM_X * __SUM_Y * 1.,
		SQRT(
			(__COUNT * __SUM_X2 - __SUM_X ^ 2)
				* (__COUNT * __SUM_Y2 - __SUM_Y ^ 2)
		)
	)

@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!: The Definitive Guide to Power Query (M)

DAX is easy, CALCULATE makes DAX hard...

Serious Issue with New Correlation Coefficient Quick Measure

Helpful resources

Microsoft Fabric Learn Together

Power BI Monthly Update - April 2024

Fabric Community Update - April 2024

How to Get Your Question Answered Quickly