Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
igaca
Helper III
Helper III

Efficiently Comparing Single Fact Table Subsets to one Another Using DAX

 

Often times I run into a scenario where I am working with multiple schedules data sets and need to draw comparisons between them to review progress, over time.

 

To facilitate this, I typically will place all schedule files in a folder, query the folder and merge all into a single fact table, where the highest level of granularity are the individual schedule files (each a unique text strings and/or associated status date).  Each one of these files is comprised of task descriptions representing different scope that collectively make up a Project, each labeled with an "Activity ID" / a unique identifier (for each file at least) that can be used as the primary basis for comparing Activity attributes (other column data) from one schedule file to another.

 

Here's a sample excerpt from a data dump into excel, to illustrate better:

 

Fact Table Subset Comparison 01.pngFact Table Subset Comparison 02.png

 

I have taken different approaches in the past to compare data from one schedule file to another including creation of a copy fact table, using DAX to create it in memory, making a comparison copy as a variable, using two disconnected schedule file slicer tables for filter context generation etc., and all with varying amounts of success.

 

I wanted to pose the question to the broader community to see what approach some of you -particularly more advanced users- would take in such a scenario? 

 

If that question is too broad for some, let me ask a more specific one.  For example:

 

How would you -most efficiently in terms of model responsiveness- go about deriving "Activity Start Slippage", that is the difference in calendar days between what an earlier schedule says for when a particular activity is scheduled to start and that which a later schedule indicates is the actual start date?

 

Any thoughts, please share....

 

Thanks!

 

Igor

 

 

 

1 ACCEPTED SOLUTION
v-sihou-msft
Employee
Employee

@igaca

 

In my opinion, I prefer the approach that merge all files into one fact table. So your table will have file as highest grain, then multiple activities group on file level. There might be data for same activity in multiple files. 

 

In this scenario, you can create a measure to get the max date of all dates before current schedule start date and group on Activity level. Then you can have current StartDate minus LastStartDate measure to get the Slippage. 

 

Slippage =
1
    * (
        CALCULATE ( MAX ( Table[StartDate] ), ALLEXCEPT ( Table, Table[ActivityId] ) )
            - CALCULATE (
                MAX ( Table[StartDate] ),
                FILTER ( Table, Table[StartDate] < MAX ( Table[StartDate] ) ),
                ALLEXCEPT ( Table, Table[ActivityId] )
            )
    )

Regards,

 

 

View solution in original post

5 REPLIES 5
v-sihou-msft
Employee
Employee

@igaca

 

In my opinion, I prefer the approach that merge all files into one fact table. So your table will have file as highest grain, then multiple activities group on file level. There might be data for same activity in multiple files. 

 

In this scenario, you can create a measure to get the max date of all dates before current schedule start date and group on Activity level. Then you can have current StartDate minus LastStartDate measure to get the Slippage. 

 

Slippage =
1
    * (
        CALCULATE ( MAX ( Table[StartDate] ), ALLEXCEPT ( Table, Table[ActivityId] ) )
            - CALCULATE (
                MAX ( Table[StartDate] ),
                FILTER ( Table, Table[StartDate] < MAX ( Table[StartDate] ) ),
                ALLEXCEPT ( Table, Table[ActivityId] )
            )
    )

Regards,

 

 

Simon:

 

Thanks for your input.  I ended up developing an alternate approach largely relying on the LOOKUPVALUE function but have taken and tested your suggestion with good results.

 

Thanks again!

 

Igor

Hi Simon:

 

Very interesting approach...let me make sure I understand it correctly:

 

First part:   CALCULATE(MAX('ScheduleData'[Start]), ALLEXCEPT('ScheduleData','ScheduleData'[Activity ID]))

 

In short, ALLEXCEPT is used to preclude anything from consideration but the current-filter-context Activity ID, for which a max Start Date is returned (among various schedule starts for that Activ ID). 

 

Second part:    CALCULATE(MAX('ScheduleData'[Start]),FILTER('ScheduleData','ScheduleData'[Start] < MAX('ScheduleData'[Start])), ALLEXCEPT('ScheduleData','ScheduleData'[Activity ID]))

 

This expression is the LastStartDate measure, one that returns the Start Date before the current Start Date.

 

In the integrated expression, we are simply taking the difference between the two and the "1" multiplier is meant for conversion from a date value.

 

Let me know if I am understanding things correctly.  Thanks!

 

 

I like the idea but it's not quite what I'm after.  Looks like it's hard to get around it without using EARLIER because File name has to be taken into account as well, not just the Start Date.  Let's take a closer look:

 

Measures as follows, derived from your approach:

 

Earlier Schedule Start = CALCULATE(MAX('ScheduleData'[Start]), ALLEXCEPT('ScheduleData','ScheduleData'[Activity ID]))

Later Schedule Start = CALCULATE(MAX('ScheduleData'[Start]),FILTER('ScheduleData','ScheduleData'[Start] < MAX('ScheduleData'[Start])), ALLEXCEPT('ScheduleData','ScheduleData'[Activity ID]))

Earlier Schedule Finish = CALCULATE(MAX('ScheduleData'[Finish]), ALLEXCEPT('ScheduleData','ScheduleData'[Activity ID]))

Later Schedule Finish = CALCULATE(MAX('ScheduleData'[Finish]),FILTER('ScheduleData','ScheduleData'[Finish] < MAX('ScheduleData'[Finish])), ALLEXCEPT('ScheduleData','ScheduleData'[Activity ID]))

Start Date Var = (CALCULATE(MAX('ScheduleData'[Start]), ALLEXCEPT('ScheduleData','ScheduleData'[Activity ID]))-CALCULATE(MAX('ScheduleData'[Start]),FILTER('ScheduleData','ScheduleData'[Start] < MAX('ScheduleData'[Start])), ALLEXCEPT('ScheduleData','ScheduleData'[Activity ID])))*1

Finish Data Var = (CALCULATE(MAX('ScheduleData'[Finish]), ALLEXCEPT('ScheduleData','ScheduleData'[Activity ID]))-CALCULATE(MAX('ScheduleData'[Finish]),FILTER('ScheduleData','ScheduleData'[Finish] < MAX('ScheduleData'[Finish])), ALLEXCEPT('ScheduleData','ScheduleData'[Activity ID])))*1

 

 

Results:

 

Test with Activity ID A14170 filtered for:

 

Measure works but is not what I'm after.  It works over a single column table of results (Start Date) but the desire is to compute a variance for the previous file  (not overall)Measure works but is not what I'm after. It works over a single column table of results (Start Date) but the desire is to compute a variance for the previous file (not overall)See notesSee notesExcel data for Activ A14170Excel data for Activ A14170

What I'm really after if the following:  below excerpt is one of a matrix visual with the schedule Data Dates (unique to each file name) across the top as column headers, Activity IDs for rows and the "Start" date column on the values field.  This view shows both the change in the start dates over time, as well as when an activity is present in a particular schedule file or not.  Now rather than showing the dates, I'd like to be able to show "Start Variance to the Previous Schedule" (for instance)

 

Measured Test - Desired Outcome.png

 

 

 

 

 

What I've previously done to have the ability to compare one subset of the same fact table to another (schedule file and associated data to another schedule file) is to use DAX to generate two virtual tables (one for one file name selection, another for the other file selection) and feed those to some function to derive a desired result.  Here's an example of comparing two schedule files along the lines of shared activity count:

 

Shared Activities =

COUNTROWS(
INTERSECT(
SUMMARIZE(FILTER('ScheduleData','ScheduleData'[Data Date]='Target Select'[TargetSelMeasure]),'ScheduleData'[Activity ID]),
SUMMARIZE(FILTER('ScheduleData','ScheduleData'[Data Date]='Update Select'[UpdateSelMeasure]),'ScheduleData'[Activity ID])
)
)

 

'Target Select' and 'Update Select' are disconnected tables containing unique values of schedule file names and associated status updated dates.  [TargetSelMeasure] and [UpdateSelMeasure] are simply returning the MAX of scalar value that corresponds to a slicer selection.

 

This allows for various comparisons between two files.  What I would like to be able to do is to make comparisons amongst all files, in a table format, where various measures are returning variances of X to the previous schedule file. 

 

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.