I am interested in replacing an iterative xls solution and I would be grateful for some advice please.
Requirement is to:
To calculate the average 'initial score' - this is the average "earliest score" for Players that have more than 1 score in a given date range, the date range would be selected prior to the calculation. In the example below for the date range of 1/01/2018 - 1/06/2018 we would exclude records for Players E & F because they each have only 1 score for this date range.
To calculate the average 'proceeding score' - this is the average 'non-earliest score' for Players that have a count of more than 1 score for a given date range, the date range would be selected prior to the calculation. In the example for the date range of 1/01/2018 - we would exclude E & F because they each have only 1 score.
I am wondering what the best approach would be to address this.
Would something like the following be a suitable approach?
Create a calculation for -
1/ a total count of records per customer;
2/ the earliest date per player;
3/ a matching attribute i.e. if "2/ earliest date per player" is equal to the Date then return 'earliest' else 'not earliest'.
and then somehow -
5/ Exclude / filter out players with a "1/ total count of records" equal to only 1 from proceeding calculations;
6/ Calculate the average earliest score where "4/ matching attribute" equal to 'earliest';
7/ Calculate the average proceeding score where "4/ matching attribute" equal to 'not earliest'.
Example of the data & result is below:
DATE | PLAYER | SCORE |
1/01/2018 | A | 2 |
1/02/2018 | A | 3 |
1/04/2018 | A | 2 |
1/06/2018 | A | 4 |
1/01/2018 | B | 2 |
1/02/2018 | B | 3 |
1/04/2018 | B | 1 |
1/01/2018 | C | 3 |
1/02/2018 | C | 2 |
1/04/2018 | C | 4 |
1/05/2018 | D | 2 |
1/06/2018 | D | 4 |
1/05/2018 | E | 2 |
1/06/2018 | F | 4 |
DESIRED AGGREGATED RESULTS | ||
AVG_INITIAL_SCORE | AVG_PROCEEDING_SCORE | |
2.25 | 2.875 |
Solved! Go to Solution.
Hi @mleepin
The logic you've described sounds exactly right - the question is how to implement this with DAX in Power BI. In this case we need to filter both players and dates per player as you've described.
Here is how I would do it (pbix link).
AVG_INITIAL_SCORE = VAR PlayersToInclude = FILTER ( VALUES ( Scores[PLAYER] ), CALCULATE ( COUNTROWS ( Scores ) ) > 1 ) RETURN CALCULATE ( AVERAGE ( Scores[SCORE] ), GENERATE ( PlayersToInclude, FIRSTDATE ( Scores[DATE] ) ) )
AVG_PROCEEDING_SCORE = VAR PlayersToInclude = FILTER ( VALUES ( Scores[PLAYER] ), CALCULATE ( COUNTROWS ( Scores ) ) > 1 ) RETURN CALCULATE ( AVERAGE ( Scores[SCORE] ), GENERATE ( PlayersToInclude, EXCEPT ( CALCULATETABLE ( VALUES ( Scores[DATE] ) ), FIRSTDATE ( Scores[DATE] ) ) ) )
EDIT: After some thought, the AVG_PROCEEDING_SCORE measure can be written a little more briefly:
AVG_PROCEEDING_SCORE v2 = CALCULATE ( AVERAGE ( Scores[SCORE] ), GENERATE ( VALUES ( Scores[PLAYER] ), EXCEPT ( CALCULATETABLE ( VALUES ( Scores[DATE] ) ), FIRSTDATE ( Scores[DATE] ) ) ) )
The initial FILTER produces a list of Players with more than 1 row.
The GENERATE function used as a filter argument within CALCULATE gives us either combinations of players and their first dates, or combinations of players and all dates but their first date.
There may be other approaches but this is how I would do it.
Regards,
Owen
Proud to be a Datanaut!
Hi again @mleepin
Yes, I'd forgotten that FIRSTDATE and LASTDATE don't always like duplicate dates.
We can fix this by using FIRSTNONBLANK and LASTNONBLANK instead.
Within your pbix file, I created these modifiedmeasures:
AVG_INITIAL_SCORE FIX = VAR PlayersToInclude = FILTER ( VALUES ( Sheet3[Player]), CALCULATE ( COUNTROWS ( Sheet3) ) > 1 ) RETURN CALCULATE ( AVERAGE ( Sheet3[Score]), GENERATE ( PlayersToInclude, FIRSTNONBLANK ( Sheet3[Date], 0 ) ) )
AVG_PROCEEDING_SCORE FIX = VAR PlayersToInclude = FILTER ( VALUES ( Sheet3[Player]), CALCULATE ( COUNTROWS ( Sheet3) ) > 1 ) RETURN CALCULATE ( AVERAGE ( Sheet3[Score] ), GENERATE ( PlayersToInclude, EXCEPT ( CALCULATETABLE ( VALUES ( Sheet3[Date] ) ), FIRSTNONBLANK ( Sheet3[Date], 0 ) ) ) )
These gives the same results as your measures at a total level.
One thing with having MIN_DATE_PER_PLAYER as a calculated column is that the min date won't respond to filter context.
Anyhow, hopefully that's useful and will leave it to you to compare the measures
Best regards,
Owen
Proud to be a Datanaut!
Thank You - following a test these calcs. that you have confirmed both work as expected.
Regards
ML.
Hi @mleepin
Apologies I missed your follow-up query.
I take it you've solved it already with that measure?
Regards,
Owen
Proud to be a Datanaut!
Hi @mleepin
The logic you've described sounds exactly right - the question is how to implement this with DAX in Power BI. In this case we need to filter both players and dates per player as you've described.
Here is how I would do it (pbix link).
AVG_INITIAL_SCORE = VAR PlayersToInclude = FILTER ( VALUES ( Scores[PLAYER] ), CALCULATE ( COUNTROWS ( Scores ) ) > 1 ) RETURN CALCULATE ( AVERAGE ( Scores[SCORE] ), GENERATE ( PlayersToInclude, FIRSTDATE ( Scores[DATE] ) ) )
AVG_PROCEEDING_SCORE = VAR PlayersToInclude = FILTER ( VALUES ( Scores[PLAYER] ), CALCULATE ( COUNTROWS ( Scores ) ) > 1 ) RETURN CALCULATE ( AVERAGE ( Scores[SCORE] ), GENERATE ( PlayersToInclude, EXCEPT ( CALCULATETABLE ( VALUES ( Scores[DATE] ) ), FIRSTDATE ( Scores[DATE] ) ) ) )
EDIT: After some thought, the AVG_PROCEEDING_SCORE measure can be written a little more briefly:
AVG_PROCEEDING_SCORE v2 = CALCULATE ( AVERAGE ( Scores[SCORE] ), GENERATE ( VALUES ( Scores[PLAYER] ), EXCEPT ( CALCULATETABLE ( VALUES ( Scores[DATE] ) ), FIRSTDATE ( Scores[DATE] ) ) ) )
The initial FILTER produces a list of Players with more than 1 row.
The GENERATE function used as a filter argument within CALCULATE gives us either combinations of players and their first dates, or combinations of players and all dates but their first date.
There may be other approaches but this is how I would do it.
Regards,
Owen
Proud to be a Datanaut!
Owen,
Thank you for this it is great.
Regards,
ML
HI Owen,
I have another question regarding this solution that you kindly presented a little while ago.
I had tested this further with a different range of dates i.e. where there is a different minimum date per player. I have found that the use of FIRSTDATE with GENERATE seems to restrict or filter the result to where a player has a minimum date that is equal to the minimum from the whole result set, rather than the minimum per player ( or from the window).
Would there be a way around this?
Hi @mleepin
From what I can tell, the measures should be behaving correctly even when players have different minimum dates, taking into account any date filters applied as well.
Could you post some sample data or a PBIX containing unexpected results, ideally with a visual showing the context where you are using the measures? There could be something I didn't consider when the measures are used in a certain context.
In particular FIRSTDATE ( Scores[DATE] ) is internally converted to FIRSTDATE ( CALCULATETABLE ( VALUES ( Scores[DATE] ) ), so context transition should give the first date per Player being iterated over inside GENERATE.
(an interesting article on this here).
How I tested the measures:
In your original sample dataset, there were a few different minimum dates among the players (dates shown as d/mm/yyyy):
To test that the AVERAGE_INITIAL_SCORE measure was working correctly, I created this DAX query in DAX Studio that includes the same Player/Date filter present in the measure.
You could equally create a calculated table in Power BI, leaving out the EVALUATE:
-- Test Average Initial Score EVALUATE VAR PlayersToInclude = FILTER ( VALUES ( Scores[PLAYER] ), CALCULATE ( COUNTROWS ( Scores ) ) > 1 ) VAR PlayerDateFilter = GENERATE ( PlayersToInclude, FIRSTDATE ( Scores[DATE] ) ) VAR FilteredTable = CALCULATETABLE( Scores, PlayerDateFilter ) RETURN FilteredTable
This returns the following table (with different min dates per Player), whose scores average to 2.25 which is what the AVERAGE_INITIAL_SCORE measure returned:
Date | Player | Score |
1/01/2018 | A | 2 |
1/01/2018 | B | 2 |
1/01/2018 | C | 3 |
5/01/2018 | D | 2 |
Similarly I used this DAX query to test AVERAGE_PROCEEDING_SCORE:
-- Test Average Proceeding Score EVALUATE VAR PlayerDateFilter = GENERATE ( VALUES ( Scores[PLAYER] ), EXCEPT ( CALCULATETABLE ( VALUES ( Scores[DATE] ) ), FIRSTDATE ( Scores[DATE] ) ) ) VAR FilteredTable = CALCULATETABLE ( Scores, PlayerDateFilter ) RETURN FilteredTable
This returns the following table, whose scores average to 2.875 which is what the AVERAGE_PROCEEDING_SCORE measure returned:
Date | Player | Score |
2/01/2018 | A | 3 |
4/01/2018 | A | 2 |
6/01/2018 | A | 4 |
2/01/2018 | B | 3 |
4/01/2018 | B | 1 |
2/01/2018 | C | 2 |
4/01/2018 | C | 4 |
6/01/2018 | D | 4 |
Best regards,
Owen
Proud to be a Datanaut!
Hello Owen,
Thank you again much appreciated and thank you for the link as well which I have been looking over. I did encoutner an error when using the variable based calcs, with a different data set. The error was - Mdx script model (9,38), a date column containing duplicate dates was specifiedin a call to function FIRSTDATE.
I did attempt another longer winded approach with more calcs - this is perhaps not best practice but this ended up providing desired results:
---Maximum date per player (column) ---
MAX_DATE_PER_PLAYER =
MAXX(
FILTER ( 'Sheet3', EARLIER ( 'Sheet3'[Player]) = 'Sheet3'[Player]),
Sheet3[Date]
)
---Minimum date per player (column)---
MIN_DATE_PER_PLAYER =
MINX(
FILTER ( 'Sheet3', EARLIER ( 'Sheet3'[Player]) = 'Sheet3'[Player]),
Sheet3[Date]
)
---Count of player occurrence per player (column)---
COUNT_PLAYERS_PER_PLAYER = COUNTX(FILTER(Sheet3, EARLIER(Sheet3[Player]) = Sheet3[Player]), Sheet3[Player])
---Average initial / earliest score for players that have >1 score (calc)---
AVG_INITIAL =
AVERAGEX(
FILTER ( 'Sheet3', Sheet3[Date] = Sheet3[MIN_DATE_PER_PLAYER] && Sheet3[COUNT_PLAYERS_PER_PLAYER] > 1),
Sheet3[Score]
)
---Average proceeding score for players that have >1 score (Calc)---
AVG_PROCEEDING =
AVERAGEX(
FILTER ( 'Sheet3', Sheet3[Date] <> Sheet3[MIN_DATE_PER_PLAYER] && Sheet3[COUNT_PLAYERS_PER_PLAYER] > 1),
Sheet3[Score]
)
A pbix is uploaded as well for reference, this demonstrates the above and the error I had mentioned, this link is here -
https://1drv.ms/u/s!ApNjWROrbni7j3YEWV24gzT5aLGr
Regards,
ML.
Hi again @mleepin
Yes, I'd forgotten that FIRSTDATE and LASTDATE don't always like duplicate dates.
We can fix this by using FIRSTNONBLANK and LASTNONBLANK instead.
Within your pbix file, I created these modifiedmeasures:
AVG_INITIAL_SCORE FIX = VAR PlayersToInclude = FILTER ( VALUES ( Sheet3[Player]), CALCULATE ( COUNTROWS ( Sheet3) ) > 1 ) RETURN CALCULATE ( AVERAGE ( Sheet3[Score]), GENERATE ( PlayersToInclude, FIRSTNONBLANK ( Sheet3[Date], 0 ) ) )
AVG_PROCEEDING_SCORE FIX = VAR PlayersToInclude = FILTER ( VALUES ( Sheet3[Player]), CALCULATE ( COUNTROWS ( Sheet3) ) > 1 ) RETURN CALCULATE ( AVERAGE ( Sheet3[Score] ), GENERATE ( PlayersToInclude, EXCEPT ( CALCULATETABLE ( VALUES ( Sheet3[Date] ) ), FIRSTNONBLANK ( Sheet3[Date], 0 ) ) ) )
These gives the same results as your measures at a total level.
One thing with having MIN_DATE_PER_PLAYER as a calculated column is that the min date won't respond to filter context.
Anyhow, hopefully that's useful and will leave it to you to compare the measures
Best regards,
Owen
Proud to be a Datanaut!
Thank You - following a test these calcs. that you have confirmed both work as expected.
Regards
ML.
Hi Owen,
More questions have arisen any advice would be most appreciated please. Where I have used the calc. column like:
---MAX_DATE_PER_PLAYER---
MAXX(
FILTER ( 'Sheet3', EARLIER ( 'Sheet3'[Player]) = 'Sheet3'[Player]),
Sheet3[Date]
)
Is it possible to create something similar but as a measure that would be responsive to the filtering of a column like Sheet3[Date] that would be controlled by a user from a slicer prior to the calculation being evaulated? The purpose would be to display the min and max dates per client (client being the window or subset) but for a selected date range rather than the full dataset.
Regards,
ML.
Following on from previous - a measure that is similar to the calc column:
MAXX(
FILTER ( 'Sheet3', EARLIER ( 'Sheet3'[Player]) = 'Sheet3'[Player]),
Sheet3[Date]
)
Measure:
CALCULATE(
MAX(Sheet3[Date]),
FILTER(
ALLSELECTED(Sheet3),
Sheet3[customer] = MAX(Sheet3[customer])
)
)
User | Count |
---|---|
101 | |
65 | |
56 | |
46 | |
46 |