Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Earn a 50% discount on the DP-600 certification exam by completing the Fabric 30 Days to Learn It challenge.

Reply
cplesner
Helper III
Helper III

Calculating point in time age of persons and filtering them on age

Hi there

 

--Edited everything into this for readability--

 

I have a SCD2 history table of people which include their Birthdate, Adresses over time etc. So to determine the amount of people that we have in the population at any given point in time i can use the PIT calculation like

 

 

 

NumberofPersons_Simple :=
VAR MinDate =
    MIN ( 'Period'[Date] )
VAR Calc =
    CALCULATE (
        COUNTROWS ( 'Person' ),
        'Person'[DW_ValidFrom] <= MinDate,
        'Person'[DW_ValidTo] >= MinDate,
        REMOVEFILTERS ( 'Period' )
    )
RETURN
    Calcl

 

 

 

 

But now I want to be able to filter out those people who at over a certain age at that point in time. For that i'll use the "Age" table and since there is not active relationship between those tables i'll have to lookup up the calculated age and then filter.

 

Below is my current calculation that solves that - however it's quiet slow once the dataset becomes larger and performance somehow also depends on wich visual that are in use.

 

 

 

Number of People :=
VAR MinDate =
    MIN ( 'Period'[Date] )
VAR Calc =
    CALCULATE (
        COUNTROWS ( 'Person' ),
        'Person'[DW_ValidFrom] <= MinDate,
        'Person'[DW_ValidTo] >= MinDate,
        FILTER (
            VALUES ( 'Person' ),
            VAR AgeCalculated =
                IF (
                    'Person'[BirthDate] <= MinDate,
                    TRUNC ( YEARFRAC ( 'Person'[BirthDate], MinDate ) )
                )
            RETURN
                CONTAINS ( VALUES ( 'Age'[Age] ), 'Age'[Age], AgeCalculated )
        )
    )
RETURN
    Calc

 

 

 

 

I should somehow be able to first narrow down the table to only those rows within the valid SCD2 timeperiod, then Summarize by year and then filter out/relate those years to the Age table.

 

The performance depends on the visual, so if I for example add two visuals on the canvas - there is a factor 1:5 in calculation time. The one to the left has the Month in the columns - which makes it slow, while the other visual to the right doesn't have the attribute on the columns, but is filtered on the page instead

 

cplesner_0-1677834593596.png

When I look at the server timings - fast one to the right has the following:

 

Note: 

"Borger Oplysninger" is "People"

"Alder" is "Age"

"Foedselsdato" is "BirthDate"

 

cplesner_1-1677834593891.png

 

Here it seems that the filtering is as intended. Rows are reduced to be within the ValidFrom/To range and it super fast over those approx 8 million filtered rows.

 

When the Month attribute is dragged into the column in the matrix (the one to the left), the Server timings gives the following.

cplesner_0-1677835346473.png

 

So now it's suddently doing an "additional" select Birthdate, DW_ValidFrom, DWValidTo before returning the same result. So to me it seems that it's somehow "re-treating" the Month filter an additional time, but handling that column filter different than the filter on Month thats already selected in the Filter pane...

With Maaned År on columns in the Matrix visual the code is

// DAX Query
DEFINE
  VAR __DS0FilterTable = 
    TREATAS({"sep 2019"}, 'Periode'[Måned År])

  VAR __DS0Core = 
    SUMMARIZECOLUMNS(
      ROLLUPADDISSUBTOTAL('Alder'[Alder], "IsGrandTotalRowTotal"),
      'Periode'[Måned År ],
      'Periode'[MaanedID],
      __DS0FilterTable,
      "AntalBorgere", '# Measures'[AntalBorgere],
      "AntalBorgere_FormatString", IGNORE('# Measures'[_AntalBorgere FormatString])
    )

  VAR __DS0PrimaryWindowed = 
    TOPN(
      102,
      SUMMARIZE(__DS0Core, 'Alder'[Alder], [IsGrandTotalRowTotal]),
      [IsGrandTotalRowTotal],
      0,
      'Alder'[Alder],
      1
    )

  VAR __DS0SecondaryBase = 
    SUMMARIZE(__DS0Core, 'Periode'[Måned År ], 'Periode'[MaanedID])

  VAR __DS0Secondary = 
    TOPN(102, __DS0SecondaryBase, 'Periode'[MaanedID], 1, 'Periode'[Måned År ], 1)

  VAR __DS0BodyLimited = 
    NATURALLEFTOUTERJOIN(
      __DS0PrimaryWindowed,
      SUBSTITUTEWITHINDEX(
        __DS0Core,
        "ColumnIndex",
        __DS0Secondary,
        'Periode'[MaanedID],
        ASC,
        'Periode'[Måned År ],
        ASC
      )
    )

EVALUATE
  __DS0Secondary

ORDER BY
  'Periode'[MaanedID], 'Periode'[Måned År ]

EVALUATE
  __DS0BodyLimited

ORDER BY
  [IsGrandTotalRowTotal] DESC, 'Alder'[Alder], [ColumnIndex]

 

If I remove the "Maaned År" from the Column visual i get this simpler query

// DAX Query
DEFINE
  VAR __DS0FilterTable = 
    TREATAS({"aug 2023"}, 'Periode'[Måned År ])

  VAR __DS0Core = 
    SUMMARIZECOLUMNS(
      ROLLUPADDISSUBTOTAL('Alder'[Alder], "IsGrandTotalRowTotal"),
      __DS0FilterTable,
      "VAntalBorgere", '# Measures'[AntalBorgere],
      "v_AntalBorgere", IGNORE('# Measures'[_AntalBorgere FormatString])
    )

  VAR __DS0PrimaryWindowed = 
    TOPN(502, __DS0Core, [IsGrandTotalRowTotal], 0, 'Alder'[Alder], 1)

EVALUATE
  __DS0PrimaryWindowed

ORDER BY
  [IsGrandTotalRowTotal] DESC, 'Alder'[Alder]

 

11 REPLIES 11
cplesner
Helper III
Helper III

The 'Person' Table i a SCD Type 2 with the DW_ValidFrom, DW_ValidTo Date columns. So there is no direct active relationship to those two columns in the 'People' table. Each Person Age a the selected point in time depends on the seletion that the user makes, so there is also not an active relationship from the 'Age' table to the 'People' table

 

 

Number of People :=
VAR MinDate =
    MIN ( 'Period'[Date] )
VAR Calc =
    CALCULATE (
        COUNTROWS ( 'Person' ),
        'Person'[DW_ValidFrom] <= MinDate,
        'Person'[DW_ValidTo] >= MinDate,
        FILTER (
            VALUES ( 'Person' ),
            VAR AgeCalculated =
                IF (
                    'Person'[BirthDate] <= MinDate,
                    TRUNC ( YEARFRAC ( 'Person'[BirthDate], MinDate ) )
                )
            RETURN
                CONTAINS ( VALUES ( 'Age'[Age] ), 'Age'[Age], AgeCalculated )
        )
    )
RETURN
    Calc

 

 

I add two visuals on the canvas. The one to the left has the Month in the columns - which makes it slow, while the other visual to the right doesn't have the attribute on the columns, but is filtered on the page instead

 

screenshot.PNG

When i look at the server timings - fast one to the right has the following:

 

cplesner_0-1676632293723.png

"Borger Oplysninger" is "People"

"Alder" is "Age"

"Foedselsdato" is "BirthDate"

 

Here it seems that the filtering is as intended. Rows are reduced to be within the ValidFrom/To range and it super fast over those approx 8 million filtered rows.

 

When the Month attribute is dragged into the column in the matrix (the one to the left), the Server timings gives the following.

 

cplesner_1-1676632653678.png

So now it's suddently doing an "additional" select Birthdate, DW_ValidFrom, DWValidTo before returning the same result. So to me it seems that it's somehow "re-treating" the Month filter an additional time, but handling that column filter different than the filter on Month thats already selected in the Filter pane...

What are the queries generated by the 2 tables? Also, what is the performance like if you use the code I posted? 

With Maaned År on columns in the Matrix visual the code is

// DAX Query
DEFINE
  VAR __DS0FilterTable = 
    TREATAS({"sep 2019"}, 'Periode'[Måned År])

  VAR __DS0Core = 
    SUMMARIZECOLUMNS(
      ROLLUPADDISSUBTOTAL('Alder'[Alder], "IsGrandTotalRowTotal"),
      'Periode'[Måned År ],
      'Periode'[MaanedID],
      __DS0FilterTable,
      "AntalBorgere", '# Measures'[AntalBorgere],
      "AntalBorgere_FormatString", IGNORE('# Measures'[_AntalBorgere FormatString])
    )

  VAR __DS0PrimaryWindowed = 
    TOPN(
      102,
      SUMMARIZE(__DS0Core, 'Alder'[Alder], [IsGrandTotalRowTotal]),
      [IsGrandTotalRowTotal],
      0,
      'Alder'[Alder],
      1
    )

  VAR __DS0SecondaryBase = 
    SUMMARIZE(__DS0Core, 'Periode'[Måned År ], 'Periode'[MaanedID])

  VAR __DS0Secondary = 
    TOPN(102, __DS0SecondaryBase, 'Periode'[MaanedID], 1, 'Periode'[Måned År ], 1)

  VAR __DS0BodyLimited = 
    NATURALLEFTOUTERJOIN(
      __DS0PrimaryWindowed,
      SUBSTITUTEWITHINDEX(
        __DS0Core,
        "ColumnIndex",
        __DS0Secondary,
        'Periode'[MaanedID],
        ASC,
        'Periode'[Måned År ],
        ASC
      )
    )

EVALUATE
  __DS0Secondary

ORDER BY
  'Periode'[MaanedID], 'Periode'[Måned År ]

EVALUATE
  __DS0BodyLimited

ORDER BY
  [IsGrandTotalRowTotal] DESC, 'Alder'[Alder], [ColumnIndex]

 

If I remove the "Maaned År" from the Column visual i get this simpler query


// DAX Query
DEFINE
  VAR __DS0FilterTable = 
    TREATAS({"aug 2023"}, 'Periode'[Måned År ])

  VAR __DS0Core = 
    SUMMARIZECOLUMNS(
      ROLLUPADDISSUBTOTAL('Alder'[Alder], "IsGrandTotalRowTotal"),
      __DS0FilterTable,
      "VAntalBorgere", '# Measures'[AntalBorgere],
      "v_AntalBorgere", IGNORE('# Measures'[_AntalBorgere FormatString])
    )

  VAR __DS0PrimaryWindowed = 
    TOPN(502, __DS0Core, [IsGrandTotalRowTotal], 0, 'Alder'[Alder], 1)

EVALUATE
  __DS0PrimaryWindowed

ORDER BY
  [IsGrandTotalRowTotal] DESC, 'Alder'[Alder]

That doesn't make much sense to me. As far as I can see, __DS0Core contains all the values you want to see in the final table, but it is then splitting it into 2 tables, independently grouped by year and by age, and then joining them back together again. No idea why it would do it like that.

I agree - it's weird that it doing that calculation as it alread has the result.

 

I'm thinking that these two below end up somehow been completely independent queries in the left visual while the engine thinks smarter in the left - although same results and filters.

cplesner_0-1676648792249.png

 

johnt75
Super User
Super User

Rather than calculating each person's age you could convert the age into a date and then use that as a direct filter, e.g.

Number of People :=
VAR MinDate =
    MIN ( 'Period'[Date] )
VAR MaxBirthDate =
    DATEADD ( { MinDate }, MIN ( 'Age'[Age] ), YEAR )
VAR Calc =
    CALCULATE (
        COUNTROWS ( 'Person' ),
        'Person'[DW_ValidFrom] <= MinDate,
        'Person'[DW_ValidTo] >= MinDate,
        'Person'[BirthDate] > MaxBirthDate
    )
RETURN
    Calc

The Calculation gives the following error:
Function 'DATEADD' cannot be used with columns added by ADDCOLUMNS or SUMMARIZE functions

The end user can choose to see only those that are at Age 5,10 or 23 in the 'Age'[Age] filter.

 

So the filtering should filter out the rows from 'People' where the calculated Age at the selected Min ('Periode'[Date]) is not 5,10 or 23. But first it need to find the initial valid rows from SCD2 'People' table for the calculation by assuring that the Min ('Periode'[Date]) is between the DW_ValidFrom  and DW_ValidTo and that the initial

 

     CALCULATE (
            COUNTROWS ( 'Borger Oplysninger' ),
            'Borger Oplysninger'[DW_ValidFrom] <= MindsteDato,
            'Borger Oplysninger'[DW_ValidTo] >= MindsteDato)

OK, try

Number of People :=
VAR MinDate =
    MIN ( 'Period'[Date] )
VAR MinAge =
    MIN ( 'Age'[Age] )
VAR MaxBirthDate =
    DATE ( YEAR ( MinDate ) - MinAge, MONTH ( MinDate ), DAY ( MinDate ) )
VAR Calc =
    CALCULATE (
        COUNTROWS ( 'Person' ),
        'Person'[DW_ValidFrom] <= MinDate,
        'Person'[DW_ValidTo] >= MinDate,
        'Person'[BirthDate] > MaxBirthDate
    )
RETURN
    Calc

Hmm

 

It doesn't seem to be solving the issue.

The end user can choose any Age that they want to filter/filter out. I have tried to meassure performance in DaxStudio and the problem with performance comes when i add "Month" attribute to the Visual.
Below are same Matrix with the same "Number of People" Measure. Only difference is that the left visual has the Month on the Column showing data for "Jan 2019" since i've selected "Jan 2019" in the filter pane. 

Difference here is the time to render the visual. The Visual to the right shows same figures since the filter pane applies the "Jan 2019" to both visuals - but the Visual to the right is about 5-6 times faster in rendering the same numbers.

 

How can this be?

 

cplesner_0-1676582986932.png

 

 

 

can you post a picture of the relationships between tables? and also say which table the month column in the visual, and in the filter pane, is coming from.

Helpful resources

Announcements
RTI Forums Carousel3

New forum boards available in Real-Time Intelligence.

Ask questions in Eventhouse and KQL, Eventstream, and Reflex.

MayPowerBICarousel

Fabric Monthly Update - May 2024

Check out the May 2024 Fabric update to learn about new features.

LearnSurvey

Fabric certifications survey

Certification feedback opportunity for the community.