Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
dimitrishuk
Helper I
Helper I

How to improve a slow DAX query using ALL()

Hi,

 

I'm in the process of building a tabular data model for our accounts receivables that is up to date up to the last day. The table is around 1m rows and around 50MB. 

 

A single DAX query with only 1 filter takes 1118ms to execute (uncached). There is a single relationship to a date table:

 

'Date Dimension'[Date] 1:* 'Accounts Receivable Ledger'[Transaction Date]

 

An example bucket measure is here:

 

Overdue 1000+:=

 

 VAR

    MaxDate = MAX('Date Dimension'[Date])

RETURN

 

CALCULATE(

    CALCULATE(

        CALCULATE(

                SUM('Accounts Receivable Ledger'[Amount - Gross]),

                'Accounts Receivable Ledger'[Draft- Due Date] <= (MaxDate-1) ),

                'Accounts Receivable Ledger'[Invoice Due Date] <=(MaxDate-1000)

        ),

        FILTER(

        ALL('Date Dimension'[Date]), 'Date Dimension'[Date]<=(MaxDate)

    )

 

I have been trying to optimize these queries and have shaved a fair bit off duration by refactoring using DAX studio, but I've hit a wall. Also worth noting that if I remove the date table relationship query time is about 50% than with, but I want to be able to use time intelligence functions.

 

The biggest culprit is the FE (formula engine) with 907ms and second is a large SE (Storage Engine) with 188ms.

 

 

Is is there a better way to write this formula to improve overall query time with or without a warm cache? The issue gets exponentially worse when I start to add in Dimension rows/filters/slicers and the model is useless for end users.

 

Below I provide the slow SE query as provided by DaxStudio.

 

SET DC_KIND="AUTO";
SELECT
'Date Dimension'[Date], 'Accounts Receivable Ledger'[Draft Due Date], 'Accounts Receivable Ledger'[Invoice Due Date],
SUM ( 'Accounts Receivable Ledger'[Amount Gross] )
FROM 'Accounts Receivable Ledger'
LEFT OUTER JOIN 'Date Dimension' ON 'Accounts Receivable Ledger'[Transaction Date]='Date Dimension'[Date]
WHERE
'Date Dimension'[Date] IN ( 38347.000000, 38766.000000, 42069.000000, 42488.000000, 40171.000000, 40590.000000, 37854.000000, 38273.000000, 41576.000000, 41995.000000..[6774 total values, not all displayed] ) VAND
'Accounts Receivable Ledger'[Draft Due Date] IN ( 38766.000000, 42069.000000, 42488.000000, 40171.000000, 40590.000000, 37854.000000, 38273.000000, 41576.000000, 41995.000000, 39678.000000..[6055 total values, not all displayed] ) VAND
'Accounts Receivable Ledger'[Invoice Due Date] IN ( 37982.000000, 38401.000000, 41704.000000, 42123.000000, 39806.000000, 40225.000000, 37489.000000, 37908.000000, 41211.000000, 41630.000000..[6118 total values, not all displayed] ) ;


'Estimated size ( volume, marshalling bytes ) : 234975, 5639400'

 

 

 

 

1 ACCEPTED SOLUTION
dimitrishuk
Helper I
Helper I

Hey guys, I managed to solve my slow query time.

 

What I did was define a MaxDate variable, with MaxDate being the date from my Date Table.

 

Then I ran the date filter once on a calculated table using SUMMARIZECOLUMNS. I removed the relationship between the date table and accounts receivable ledger. See below:

 

= VAR

     MaxDate= MAX('Date Dimension'[Date])

RETURN

 

SUMMARIZECOLUMNS('Accounts Receivable Ledger'[Company],'Accounts Receivable Ledger'[Business Unit],'Accounts Receivable Ledger'[Address Number],'Accounts Receivable Ledger'[Document Unique],'Accounts Receivable Ledger'[Invoice Due Date],'Accounts Receivable Ledger'[Draft- Due Date],FILTER('Accounts Receivable Ledger','Accounts Receivable Ledger'[Transaction Date]<=MaxDate),"Open Amount",SUM('Accounts Receivable Ledger'[Amount - Gross]))

 

By filtering on SUMMARIZECOLUMNS and moving my measures to the new table I saved repeated IN comparisons in each measure and also evened out the spread between Formula and Storage Engines. 

 

By grouping I also reduced the cardinality which helped alot.

 

The big problem with SUMMARIZECOLUMNS is it doesnt work well with calculated columns so I also had to change my table views at the SQL level. Still, performance is much better now.

View solution in original post

3 REPLIES 3
dimitrishuk
Helper I
Helper I

Hey guys, I managed to solve my slow query time.

 

What I did was define a MaxDate variable, with MaxDate being the date from my Date Table.

 

Then I ran the date filter once on a calculated table using SUMMARIZECOLUMNS. I removed the relationship between the date table and accounts receivable ledger. See below:

 

= VAR

     MaxDate= MAX('Date Dimension'[Date])

RETURN

 

SUMMARIZECOLUMNS('Accounts Receivable Ledger'[Company],'Accounts Receivable Ledger'[Business Unit],'Accounts Receivable Ledger'[Address Number],'Accounts Receivable Ledger'[Document Unique],'Accounts Receivable Ledger'[Invoice Due Date],'Accounts Receivable Ledger'[Draft- Due Date],FILTER('Accounts Receivable Ledger','Accounts Receivable Ledger'[Transaction Date]<=MaxDate),"Open Amount",SUM('Accounts Receivable Ledger'[Amount - Gross]))

 

By filtering on SUMMARIZECOLUMNS and moving my measures to the new table I saved repeated IN comparisons in each measure and also evened out the spread between Formula and Storage Engines. 

 

By grouping I also reduced the cardinality which helped alot.

 

The big problem with SUMMARIZECOLUMNS is it doesnt work well with calculated columns so I also had to change my table views at the SQL level. Still, performance is much better now.

OwenAuger
Super User
Super User

@dimitrishuk

This is just an idea: in some cases, I have found DATESBETWEEN(...) performs better than a boolean inequality on a Date column, since the boolean version translates to FILTER ( ALL (...),... )

 

For example, you could try this:

Overdue 1000+ :=
VAR MaxDate =
    MAX ( 'Date Dimension'[Date] )
RETURN
    CALCULATE (
        CALCULATE (
            CALCULATE (
                SUM ( 'Accounts Receivable Ledger'[Amount - Gross] ),
                DATESBETWEEN (
                    'Accounts Receivable Ledger'[Draft- Due Date],
                    BLANK (),
                    MaxDate - 1
                )
            ),
            DATESBETWEEN (
                'Accounts Receivable Ledger'[Invoice Due Date],
                BLANK (),
                MaxDate - 1000
            )
        ),
        DATESBETWEEN ( 'Date Dimension'[Date], BLANK (), MaxDate )
    )

Also, if you found performance better without the relationship with 'Date Dimension', could you use an outer CALCULATE with CROSSFILTER(..., None ) to disable the relationship for the purpose of this measure? It's not something I've ever tried so would be interested in whether it makes any difference.

 

On side question, is it necessary for the CALCULATE functions to be nested, effectively applying their filters in sequence? It may make no difference, but you could try a single CALCULATE with the three date filters applied simultaneously?

 

Regards,

Owen


Owen Auger
Did I answer your question? Mark my post as a solution!
Blog
Twitter
LinkedIn
v-huizhn-msft
Employee
Employee

Hi @dimitrishuk, After reviewing your formula posted, I can not figure out a higher efficient way to achieve it until now. I thinks the main cause is your table is huge, so it needs time to calculate? Best Regards, Angelia

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.