Solved: How to improve a slow DAX query using ALL()

dimitrishuk · ‎07-18-2018

Hi,

I'm in the process of building a tabular data model for our accounts receivables that is up to date up to the last day. The table is around 1m rows and around 50MB.

A single DAX query with only 1 filter takes 1118ms to execute (uncached). There is a single relationship to a date table:

'Date Dimension'[Date] 1:* 'Accounts Receivable Ledger'[Transaction Date]

An example bucket measure is here:

Overdue 1000+:=

VAR

MaxDate = MAX('Date Dimension'[Date])

RETURN

CALCULATE(

SUM('Accounts Receivable Ledger'[Amount - Gross]),

'Accounts Receivable Ledger'[Draft- Due Date] <= (MaxDate-1) ),

'Accounts Receivable Ledger'[Invoice Due Date] <=(MaxDate-1000)

),

FILTER(

ALL('Date Dimension'[Date]), 'Date Dimension'[Date]<=(MaxDate)

)

I have been trying to optimize these queries and have shaved a fair bit off duration by refactoring using DAX studio, but I've hit a wall. Also worth noting that if I remove the date table relationship query time is about 50% than with, but I want to be able to use time intelligence functions.

The biggest culprit is the FE (formula engine) with 907ms and second is a large SE (Storage Engine) with 188ms.

Is is there a better way to write this formula to improve overall query time with or without a warm cache? The issue gets exponentially worse when I start to add in Dimension rows/filters/slicers and the model is useless for end users.

Below I provide the slow SE query as provided by DaxStudio.

SET DC_KIND="AUTO";
SELECT
'Date Dimension'[Date], 'Accounts Receivable Ledger'[Draft Due Date], 'Accounts Receivable Ledger'[Invoice Due Date],
SUM ( 'Accounts Receivable Ledger'[Amount Gross] )
FROM 'Accounts Receivable Ledger'
LEFT OUTER JOIN 'Date Dimension' ON 'Accounts Receivable Ledger'[Transaction Date]='Date Dimension'[Date]
WHERE
'Date Dimension'[Date] IN ( 38347.000000, 38766.000000, 42069.000000, 42488.000000, 40171.000000, 40590.000000, 37854.000000, 38273.000000, 41576.000000, 41995.000000..[6774 total values, not all displayed] ) VAND
'Accounts Receivable Ledger'[Draft Due Date] IN ( 38766.000000, 42069.000000, 42488.000000, 40171.000000, 40590.000000, 37854.000000, 38273.000000, 41576.000000, 41995.000000, 39678.000000..[6055 total values, not all displayed] ) VAND
'Accounts Receivable Ledger'[Invoice Due Date] IN ( 37982.000000, 38401.000000, 41704.000000, 42123.000000, 39806.000000, 40225.000000, 37489.000000, 37908.000000, 41211.000000, 41630.000000..[6118 total values, not all displayed] ) ;

'Estimated size ( volume, marshalling bytes ) : 234975, 5639400'

dimitrishuk · ‎07-24-2018

Hey guys, I managed to solve my slow query time.

What I did was define a MaxDate variable, with MaxDate being the date from my Date Table.

Then I ran the date filter once on a calculated table using SUMMARIZECOLUMNS. I removed the relationship between the date table and accounts receivable ledger. See below:

= VAR

MaxDate= MAX('Date Dimension'[Date])

RETURN

SUMMARIZECOLUMNS('Accounts Receivable Ledger'[Company],'Accounts Receivable Ledger'[Business Unit],'Accounts Receivable Ledger'[Address Number],'Accounts Receivable Ledger'[Document Unique],'Accounts Receivable Ledger'[Invoice Due Date],'Accounts Receivable Ledger'[Draft- Due Date],FILTER('Accounts Receivable Ledger','Accounts Receivable Ledger'[Transaction Date]<=MaxDate),"Open Amount",SUM('Accounts Receivable Ledger'[Amount - Gross]))

By filtering on SUMMARIZECOLUMNS and moving my measures to the new table I saved repeated IN comparisons in each measure and also evened out the spread between Formula and Storage Engines.

By grouping I also reduced the cardinality which helped alot.

The big problem with SUMMARIZECOLUMNS is it doesnt work well with calculated columns so I also had to change my table views at the SQL level. Still, performance is much better now.

View solution in original post

dimitrishuk · ‎07-24-2018

Hey guys, I managed to solve my slow query time.

What I did was define a MaxDate variable, with MaxDate being the date from my Date Table.

Then I ran the date filter once on a calculated table using SUMMARIZECOLUMNS. I removed the relationship between the date table and accounts receivable ledger. See below:

= VAR

MaxDate= MAX('Date Dimension'[Date])

RETURN

SUMMARIZECOLUMNS('Accounts Receivable Ledger'[Company],'Accounts Receivable Ledger'[Business Unit],'Accounts Receivable Ledger'[Address Number],'Accounts Receivable Ledger'[Document Unique],'Accounts Receivable Ledger'[Invoice Due Date],'Accounts Receivable Ledger'[Draft- Due Date],FILTER('Accounts Receivable Ledger','Accounts Receivable Ledger'[Transaction Date]<=MaxDate),"Open Amount",SUM('Accounts Receivable Ledger'[Amount - Gross]))

By filtering on SUMMARIZECOLUMNS and moving my measures to the new table I saved repeated IN comparisons in each measure and also evened out the spread between Formula and Storage Engines.

By grouping I also reduced the cardinality which helped alot.

The big problem with SUMMARIZECOLUMNS is it doesnt work well with calculated columns so I also had to change my table views at the SQL level. Still, performance is much better now.

OwenAuger · ‎07-20-2018

@dimitrishuk

This is just an idea: in some cases, I have found DATESBETWEEN(...) performs better than a boolean inequality on a Date column, since the boolean version translates to FILTER ( ALL (...),... )

For example, you could try this:

Overdue 1000+ :=
VAR MaxDate =
    MAX ( 'Date Dimension'[Date] )
RETURN
    CALCULATE (
        CALCULATE (
            CALCULATE (
                SUM ( 'Accounts Receivable Ledger'[Amount - Gross] ),
                DATESBETWEEN (
                    'Accounts Receivable Ledger'[Draft- Due Date],
                    BLANK (),
                    MaxDate - 1
                )
            ),
            DATESBETWEEN (
                'Accounts Receivable Ledger'[Invoice Due Date],
                BLANK (),
                MaxDate - 1000
            )
        ),
        DATESBETWEEN ( 'Date Dimension'[Date], BLANK (), MaxDate )
    )

Also, if you found performance better without the relationship with 'Date Dimension', could you use an outer CALCULATE with CROSSFILTER(..., None ) to disable the relationship for the purpose of this measure? It's not something I've ever tried so would be interested in whether it makes any difference.

On side question, is it necessary for the CALCULATE functions to be nested, effectively applying their filters in sequence? It may make no difference, but you could try a single CALCULATE with the three date filters applied simultaneously?

Regards,

Owen

Owen Auger
Did I answer your question? Mark my post as a solution!
Blog
Twitter
LinkedIn

v-huizhn-msft · ‎07-19-2018

Hi @dimitrishuk, After reviewing your formula posted, I can not figure out a higher efficient way to achieve it until now. I thinks the main cause is your table is huge, so it needs time to calculate? Best Regards, Angelia

How to improve a slow DAX query using ALL()

Helpful resources

Microsoft Fabric Learn Together

Power BI Monthly Update - April 2024

Fabric Community Update - April 2024

How to Get Your Question Answered Quickly