Solved: Summarize and Distinct Count

12Bowers12 · ‎05-10-2019

Good afternoon, everyone,

In the following code, I created a new Data LossByClaimantCoverage based on data LossFile. In the new Data LossByClaimantCoverage, I summarized the dollar [AMOUNT] and rename as Loss Reserve.

Then I created a Measure 1 for counting based on [Loss Reserve]>0.

Then I created another Measure 2 based on the original data LossFile but got a much higher number.

Appreciate your help in advance.

Dennis

LossByClaimantCoverage =

SUMMARIZE ( LossFile,

LossFile[PREFIX],

LossFile[ClaimantCov],

"Loss Reserve", SUM ( LossFile[AMOUNT]) )

Measure 1 Loss Reserve Claim Count =

CALCULATE ( DISTINCTCOUNT ( LossByClaimantCoverage[ClaimantCov] ), LossByClaimantCoverage[Loss Reserve] > 0 )

Measure 2 Loss Reserve Claim Count =

CALCULATE ( DISTINCTCOUNT ( LossFile[ClaimantCov] ), FILTER(LossFile, SUM(LossFile[AMOUNT]) > 0))

Anonymous · ‎05-11-2019

@12Bowers12 , let's look at your measures in detail. I'll also show you how to optimize the calculated table.

SUMMARIZE() is not goot dor adding columns. It's just....slow. Refer to this article by www.sqlbi.com for more info:

https://www.sqlbi.com/articles/introducing-summarizecolumns/

LossByClaimantCoverage =
ADDCOLUMNS (
    SUMMARIZE (
        LossFile,
        LossFile[PREFIX],
        LossFile[ClaimantCov]
    ),
    "Loss Reserve", CALCULATE ( SUM ( LossFile[AMOUNT] ) )
)

This will provide a table of unique combinations of Lossfile[PREFIX] and LossFile[ClaimantCov], and then return the TOTAL amount for each combination.

[Measure1] is counting the distinct number of ClaimantCov where the TOTAL amount is > 0. This is NOT taking [PREFIX] into account. If the same [ClaimantCov] value is used for two different [PREFIX] values and the Total is > 0 for both of them, this measure will still only return 1.

[Measure2] is counting the distinct values of [ClaimantCov] for the original table, but adding a new filter:

"only keep the rows where SUM(Amount) > 0"

Here's the issue.

FILTER() introduces a row context. it's iterating over every row in the table, and performing the calculation SUM(LossFile[AMOUNT]). Since there's no filter context anymore, it's summing ALL of the values in the table, and then checking to see if that amount is > 0. If it is (and I'm guessing it will be), it keeps the row.

Once FILTER() has been applied, then it counts the distinct values. I'm guessing your FILTER() parameter is just returning the entire table (albeit, after a costly calculation).

If you want Measure2 to work, and without having to create a calculated table (which will consume additional memory in your model), try this measure:

Measure2_Updated =
VAR TempTable =
ADDCOLUMNS (
	VALUES(LossFile[ClaimantCov]),
    "Loss Reserve", CALCULATE ( SUM ( LossFile[AMOUNT] )
)
VAR Filtered_Table = 
	FILTER(
		TempTable,
		[Loss Reserve] > 0
	)
VAR Result =
	COUNTROWS(Filtered_Table)
RETURN
Result

View solution in original post

Anonymous · ‎05-11-2019

@12Bowers12 , let's look at your measures in detail. I'll also show you how to optimize the calculated table.

SUMMARIZE() is not goot dor adding columns. It's just....slow. Refer to this article by www.sqlbi.com for more info:

https://www.sqlbi.com/articles/introducing-summarizecolumns/

LossByClaimantCoverage =
ADDCOLUMNS (
    SUMMARIZE (
        LossFile,
        LossFile[PREFIX],
        LossFile[ClaimantCov]
    ),
    "Loss Reserve", CALCULATE ( SUM ( LossFile[AMOUNT] ) )
)

This will provide a table of unique combinations of Lossfile[PREFIX] and LossFile[ClaimantCov], and then return the TOTAL amount for each combination.

[Measure1] is counting the distinct number of ClaimantCov where the TOTAL amount is > 0. This is NOT taking [PREFIX] into account. If the same [ClaimantCov] value is used for two different [PREFIX] values and the Total is > 0 for both of them, this measure will still only return 1.

[Measure2] is counting the distinct values of [ClaimantCov] for the original table, but adding a new filter:

"only keep the rows where SUM(Amount) > 0"

Here's the issue.

FILTER() introduces a row context. it's iterating over every row in the table, and performing the calculation SUM(LossFile[AMOUNT]). Since there's no filter context anymore, it's summing ALL of the values in the table, and then checking to see if that amount is > 0. If it is (and I'm guessing it will be), it keeps the row.

Once FILTER() has been applied, then it counts the distinct values. I'm guessing your FILTER() parameter is just returning the entire table (albeit, after a costly calculation).

If you want Measure2 to work, and without having to create a calculated table (which will consume additional memory in your model), try this measure:

Measure2_Updated =
VAR TempTable =
ADDCOLUMNS (
	VALUES(LossFile[ClaimantCov]),
    "Loss Reserve", CALCULATE ( SUM ( LossFile[AMOUNT] )
)
VAR Filtered_Table = 
	FILTER(
		TempTable,
		[Loss Reserve] > 0
	)
VAR Result =
	COUNTROWS(Filtered_Table)
RETURN
Result

Summarize and Distinct Count

Helpful resources

Microsoft Fabric Learn Together

Power BI Monthly Update - April 2024

Fabric Community Update - April 2024

How to Get Your Question Answered Quickly