Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Earn a 50% discount on the DP-600 certification exam by completing the Fabric 30 Days to Learn It challenge.

Reply
ITManuel
Responsive Resident
Responsive Resident

Running total poor performance with large date table

Hi community,

 

I have a performance issue with the following simple running total measure:

 

 

BLCost = 
VAR _MaxDate = MAX ( TasksBLTP[TimeByDay] )
VAR _Result = 
CALCULATE (
    SUM ( TasksBLTP[TaskBaselineCost] ),
    ALL ( 'Date' ),
    'Date'[Date] <= _MaxDate
)
RETURN
    _Result

 

 

 

This is my data model:

Screenshot 2024-04-28 112201.jpg

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

The TasksBLTP table has in total around 200k rows, in the filter context I'm running the measure currently the table has only 73 rows!

 

The Date table currently contains dates from 1990 to 2030, so in total 14.975 rows. I understand that this might is quite large for a date table, but due to insights required on other tables i currently need such a wide date table. 

 

When I'm running the [BLCost] measure, in a line chart with 'Date'[Date] on the the X-Axis and [BLCost] on the Y-Axis, the query takes around 10sec. to complete. 

This to me seems extremely slow considering that there are only 73 rows in TasksBLTP  for the current context.

ST1.png

 

 

 

 

 

 

After testing a bit, I discovered that the large Date table is responsible for the poor performance.

The current selection of 73 rows on TasksBLTP , would only require dates from 2024. If I limit the Date table to contain only dates for 2024, the query is actually very fast:

 

ST2.png

 

 

 

 

 

 

The same query with 10 years in the date table, takes around 700ms. 

The time requried to run the query seems to grow exponantially with the number of rows in the Date table.

 

How can I solve the issue? Should I use a separate Date table for the tables in the model for which I only need limited number of years?

 

I tried also to limit the the calculation for a certain time period, but the "ALL ( 'Date')" seems to be the driver for the poor performance.

 

Thanks in advance and Best regards

1 ACCEPTED SOLUTION

@ITManuel This is the alternate running total method: Better Running Total - Microsoft Fabric Community

You also might be interested in this post where it is a similar situation with a large date dimension that wreaks utter havoc on CALCULATE. CALCULATE Challenge - Round 1 - Microsoft Fabric Community


@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!:
The Definitive Guide to Power Query (M)

DAX is easy, CALCULATE makes DAX hard...

View solution in original post

8 REPLIES 8
ITManuel
Responsive Resident
Responsive Resident

Hi @Greg_Deckler  & @lbendlin ,

 

thank you, I'm surprised and impressed about this content in relation "CALCUHATE"  🤣😂🤣.

I was not aware that there is an alternative to CALCULATE for running totals and so many other things. 

 

I tested CALCULATE vs. SUMX for running totals against different sizes of Date and Fact table. SUMX is faster with a very large Date table. However it becomes slower than the solution with CALCULATE with smaller sized Data tables depending on the size of the fact table.

 

I tested the following:

 

 

RT CALC = 
CALCULATE (
    SUM ( TasksBLTP[TaskBaselineCost] ),
    ALL ( 'Date' ),
    'Date'[Date] <= MAX ( TasksBLTP[TimeByDay] )
)

 

 

 

RT SUMX = 
VAR _MaxDate =
    MAX ( TasksBLTP[TimeByDay] )
VAR _T1 =
    FILTER ( ALLSELECTED ( TasksBLTP ), TasksBLTP[TimeByDay] <= _MaxDate )
VAR _Result =
    SUMX ( _T1, TasksBLTP[TaskBaselineCost] )
RETURN
    _Result

 

 

Date table with 41 Years: SUMX is faster 1,8x to 46x faster depending on rows in Fact table

RT 1.jpg

 

 

 

 

 

 

 

 

 

 

 

Date table with 11 Years: CALCULATE or SUMX is faster depending on rows in Fact table

RT 2.jpg

 

 

 

 

 

 

 

 

 

 

 

Date table with 4 Years: CALCULATE or SUMX is faster depending on rows in Fact table

RT 3.jpg

 

 

 

 

 

 

 

 

 

 

 

The pbix file can be downloaded under https://we.tl/t-3GUEmfy0fE

 

Since even SUMX takes 10+sec. for most selections in my real model, I'm thinking of implementing a separate Date table only for the TaskBLTP which only contains the number of rows required for the current selection.

 

Any other ideas?

 

Best regards

Keep measuring and then use whatever is appropriate in your scenario.  Follow the best practices around how to filter, when to filter etc.  The key aim is cardinality reduction without impacting the final result. Cartesian turtles all the way down.

Ashish_Mathur
Super User
Super User

Hi,

Is this any better?

BLCost = CALCULATE(SUM(TasksBLTP[TaskBaselineCost]),datesbetween('Date'[Date],MINX(ALL('Date'),'Date'[Date]),MAX('Date'[Date]))

Regards,
Ashish Mathur
http://www.ashishmathur.com
https://www.linkedin.com/in/excelenthusiasts/

Hi @Ashish_Mathur ,

 

I tried the following: 

VAR _MinDate =
    MIN ( TasksBLTP[TimeByDay] )
VAR _MaxDate =
    MAX ( TasksBLTP[TimeByDay] )
VAR _Result =
    CALCULATE (
        SUM ( TasksBLTP[TaskBaselineCost] ),
        DATESBETWEEN ( 'Date'[Date], MINX ( ALL ( 'Date' ), 'Date'[Date] ), _MaxDate )
    )
RETURN
    _Result

 

But this version results in "Visual has exceeded the available resources". 

lbendlin
Super User
Super User

instead of ALL(Date) consider using a variable definition like CALENDAR(dt"2024-01-01",TODAY())

 

 

BLCost = 
VAR Dates = CALENDAR(dt"2024-01-01",MAX ( TasksBLTP[TimeByDay] ))
RETURN 
CALCULATE (
    SUM ( TasksBLTP[TaskBaselineCost] ),
    'Date'[Date] IN Dates
)

 

Or follow Greg's advice and use an iterator function instead (like SUMX)

 

 

ITManuel
Responsive Resident
Responsive Resident

Hi @lbendlin ,

 

I tried with 

 

VAR _Dates = CALENDAR ( MIN ( TasksBLTP[TimeByDay] ), MAX ( TasksBLTP[TimeByDay] ) )
VAR _BLCostNew =
    CALCULATE (
        SUM ( TasksBLTP[TaskBaselineCost] ),
        'Date'[Date] IN _Dates,
        'Date'[Date] <= MAX ( TasksBLTP[TimeByDay] ) 
    )
RETURN
 _BLCostNew 

 

 

...but this is not returning a running total anymore.

 

Could you please guide me to Greg's SUMX Solution?

 

 

@ITManuel This is the alternate running total method: Better Running Total - Microsoft Fabric Community

You also might be interested in this post where it is a similar situation with a large date dimension that wreaks utter havoc on CALCULATE. CALCULATE Challenge - Round 1 - Microsoft Fabric Community


@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!:
The Definitive Guide to Power Query (M)

DAX is easy, CALCULATE makes DAX hard...

@Greg_Deckler one for you.

Helpful resources

Announcements
LearnSurvey

Fabric certifications survey

Certification feedback opportunity for the community.

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.