Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Grow your Fabric skills and prepare for the DP-600 certification exam by completing the latest Microsoft Fabric challenge.

Reply
KHorseman
Community Champion
Community Champion

Measure to Count by Last Stage

I am trying to write a pair of measures but I'm stuck. I'm working with a table that gives the history of clients through a series of processing stages. Each row represents a stage movement event, giving the stage number, the client, and the date when they were moved to that stage. There's other data as well but that's not relevant here.

 

The two measures I want are:

 

1) [LastStage] = A distinct count of clients per stage, where that stage was the most recent one for that person.

 

2) [HighestStage] = Same thing, but where the stage was the highest numbered stage the client was ever in regardless of date (they can be moved back to earlier stages sometimes).

 

So here's a simplified example data set:

 

 

Client		Stage	Stage Date
Mike Jech	1	12/10/2015
Lou Tsegousi	1	12/10/2015
Jim Matt	3	12/11/2015
Mike Jech	2	12/12/2015
Adam Bahm	5	12/12/2015
Lou Tsegousi	2	12/13/2015
Jim Matt	4	12/14/2015
Mike Jech	3	12/15/2015
Adam Bahm	6	12/16/2015

 

What I'd like to be able to do is to use the list of stages as rows in a matrix or as the x axis on a column chart or something like that, and at each stage it would give the count of clients who have that as their last stage. Or highest.

 

I tried 

[HighestStage] = CALCULATE(
	DISTINCTCOUNT(StageHistory[Client]),
	FILTER(
		StageHistory,
		MAX(StageHistory[Stage])
	)
)

but since I'm using the stage as the row it's already in the filter context. The result was identical to DISTINCTCOUNT(StageHistory[Client]) except that the row for stage 0 was blank. I know I could use ALL to clear the filter context but then what? Or maybe this is the wrong method entirely. Anyone have any ideas?





Did I answer your question? Mark my post as a solution!

Proud to be a Super User!




11 REPLIES 11
Greg_Deckler
Super User
Super User

HighestStage = MAXX(FILTER(stages,[Client]=[Client]),[Stage])

Create as a Measure.

 

Create another measure:

 

MyCount1 = COUNTX(FILTER(stages,[Stage]=[Highest]),[Stage])

Put Stage and this measure in a column chart. Make sure [Stage] is set to "Do not summarize".

 

@KHorseman

 

Sorry, hit post before I was ready!!


Follow on LinkedIn
@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!:
The Definitive Guide to Power Query (M)

DAX is easy, CALCULATE makes DAX hard...

The second measure gives an error message. "A table of multiple values was supplied where a single value was expected." I suspect it's because of the MAXX being passed into the filter but I'm not sure.





Did I answer your question? Mark my post as a solution!

Proud to be a Super User!




Not sure, here is my setup:

 

stages1.pngstages2.png

 

MyCount formula is as described above.

 

And, as I look closer, I think it is wrong. Grrr...


Follow on LinkedIn
@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!:
The Definitive Guide to Power Query (M)

DAX is easy, CALCULATE makes DAX hard...

You got me thinking in the right direction though. I think I have it now. Two measures:

 

 

HighestStage = CALCULATE(
	MAX(StageHistory[Stage]),
	ALLEXCEPT(StageHistory, StageHistory[Client])
)

HighestCount = CALCULATE(
	DISTINCTCOUNT(StateHistory[Client]),
	FILTER(
		StageHistory,
		[HighestStage] = StageHistory[Stage]
	)
)

 

 

HighestStage is just a base measure that never gets used anywhere. If you put it on the table with stages as rows you just get the highest stage repeated on every row. I suppose if I ever made a table with a client on each row it would be useful there. Anyway, HighestCount is what goes on the table, chart, whatever. Here's the result of both measures:

 

 

Stage	HighestStage	HighestCount
1	7		64
2	7		2
3	7		8
4	7
5	7		21
6	7		3
7	7		4

 

Those counts under HighestCount are correct according to my manual hand-count of the data. 64 people never made it past stage 1, everybody who reached stage 4 has moved onto a higher stage so that blank is correct.

 

So there's Highest. Now I need to figure out Latest. I think it'll be the same pattern though. I can use MAX on the date.

 





Did I answer your question? Mark my post as a solution!

Proud to be a Super User!




@KHorseman - Nice, I knew I was close, but I had to run out the door to pick up my son from school. Nice job. Learning from @konstantinos, you should be able to get rid of the temporary measure, I think, like this:

 

HighestCount = 

VAR HighestStage = CALCULATE(
	MAX(StageHistory[Stage]),
	ALLEXCEPT(StageHistory, StageHistory[Client])
)

RETURN CALCULATE(
	DISTINCTCOUNT(StateHistory[Client]),
	FILTER(
		StageHistory,
		HighestStage = StageHistory[Stage]
	)
)

Follow on LinkedIn
@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!:
The Definitive Guide to Power Query (M)

DAX is easy, CALCULATE makes DAX hard...

I thought you were right for a minute, but it didn't work after all. HighestStage is calculated per Client while HighestCount is calculated on those clients per stage, so it results in blank lines until the last stage, then it calculates correctly for just that last stage. I believe it does this for the same reason why the measure version of HighestStage returns all 7s on that table.

 

Effectively HighestCount is behaving like an iterator, like a weird COUNTAX. It's performing HighestStage measure on each client at each stage, then returning the client count based on the filter context that iteration returns. When you run it as a variable, it only ever gets the max stage for the whole table because the variable declaration gets calculated in the table's context, and the table has stages for rows.

 

All that being said, I'll bet there's a different way to do it with a variable. I wonder what would happen if I did a SUMMARIZE in the VAR statement...





Did I answer your question? Mark my post as a solution!

Proud to be a Super User!




That's interesting, I'm going to have to play with this some more after my son goes to bed! Seems like a simple thing but is infuriatingly vexing.

Follow on LinkedIn
@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!:
The Definitive Guide to Power Query (M)

DAX is easy, CALCULATE makes DAX hard...

None of this needs to be in measure logic based on the requirements as described.

 

Two calculated columns:

 

// DAX calculated columns

LastStageFlag = 
FactStageTransition[Date]
	= CALCULATE(
		MAX( FactStageTransition[Date] )
		,ALLEXCEPT( FactStageTransition, FactStageTransition[ClientKey] )
	)


MaxStageFlag = 
FactStageTransition[Stage]
	= CALCULATE(
		MAX( FactStageTransition[Stage] )
		,ALLEXCEPT( FactStageTransition, FactStageTransition[ClientKey] )
	)

These indicate whether the row in the fact table represents the most recent stage for a given client, or the greatest stage for a given client.

 

Three measures:

// DAX measures
Clients = 
DISTINCTCOUNT( FactStageTransition[ClientKey] )

LastStageCount = 
CALCULATE(
	[Clients]
	,FactStageTransition[LastStageFlag]
)


MaxStageCount = 
CALCULATE(
	[Clients]
	,FactStageTransition[MaxStageFlag]
)

These use the flags we defined above. Use CALCULATE() to filter on those flags, and count the clients for that subset of the table.

 

Easy peasy.

You could also calculate those flags in PQ, but this is an area where DAX shines over M (or I just don't know the right M yet to make that as simple and performant).

Heh, I always feel like I'm cheating when I do things in columns instead of measures. Yep, that method works. Thank you.





Did I answer your question? Mark my post as a solution!

Proud to be a Super User!




Columns: Anything that user filters / interaction with reports cannot change.

 

Measures: Anything that user interaction can change.

 

 

More specifically, my usual argument is that if it's known at refresh-time and unchanging in the face of filters / interaction, it should be a part of ETL, not done in the model. This sort of column logic is actually much easier to do in DAX than in Power Query, and I don't know what your ultimate source is. I'd always recommend pushing changes / transformations / new columns as far up the source chain as possible, but if this has to be done in DAX, it's not so bad.

Hmm. That just returns the max stage value for each row. And the rows are themselves stage values. So I get:

 

Stage	HighestStage
1		1
2		2
3		3
4		4
5		5
6		6
7		7

What I'm looking for is for HighestStage to give me a count of clients per stage, but only count the clients on the stage row where that was the highest stage they reached. So a client who made it to stage 5 would not be counted on rows 1-4, but only on 5, etc.





Did I answer your question? Mark my post as a solution!

Proud to be a Super User!




Helpful resources

Announcements
RTI Forums Carousel3

New forum boards available in Real-Time Intelligence.

Ask questions in Eventhouse and KQL, Eventstream, and Reflex.

MayPowerBICarousel1

Power BI Monthly Update - May 2024

Check out the May 2024 Power BI update to learn about new features.