Solved: Memory issues with filter

anagnostes · ‎01-05-2017

Hi,

I have a MODEL where I have a Customers table and an Orders Table. I want to add a new mesure in the Customers table where i want to compute the average buying cyle. So i add this DAX mesure:

    BUYING_CYCLE = 
             AVERAGEX(ORDERS; 
                     DATEDIFF( CALCULATE ( 
                         MAX ( 'ORDERS'[ORDER_DATE] ) ;
                             FILTER(ORDERS;  EARLIER('ORDERS'[ORDER_DATE]) >= 'ORDERS'[ORDER_DATE] )
                     ); 'ORDERS'[ORDER_DATE]
                     ; DAY )
             )

This works fine but it's not correct. I should add a second filter like this:

FILTER(ORDERS; EARLIER('ORDERS'[ORDER_DATE]) >= 'ORDERS'[ORDER_DATE] ) && EARLIER('ORDERS'[ID]) <> 'ORDERS'[ID] )

But if I add this second expression the query is really slow and the memory gets to 98 plus %. There aren't a lot of records in the Orders table, like 25000.

Can anybody tell me how can i understand what is happening here?

Thank you.

technolog · ‎08-28-2023

The performance issue you're experiencing arises from the use of the EARLIER function, especially within a FILTER function that processes row-by-row. When you add a second filter condition using EARLIER, the complexity increases, causing the measure to consume more memory and time.

The EARLIER function is often misunderstood and is a common source of inefficiencies in DAX. When used within a row context, it returns the value of a specified column for the "earlier" row context (outer row context). This can lead to nested iterations and exponential growth in the amount of computation.

Here's a breakdown of your problem:

DATEDIFF calculates the difference between the maximum order date and the current row's order date.
The FILTER function then iterates through all rows to get the maximum order date for orders that occurred before or on the current row's order date.
Adding the second condition makes this calculation more complex, as it now has to compare the ID column for each row as well.
To optimize your DAX calculation:

Use Variables: Variables can reduce the number of times a calculation is computed, especially inside a row context.
Revise Logic: Instead of comparing every row with every other row, try to partition the data or use other DAX functions that might be more efficient.
Here's a revised version using variables:

BUYING_CYCLE = 
VAR CurrentOrderDate = 'ORDERS'[ORDER_DATE]
VAR CurrentOrderID = 'ORDERS'[ID]
VAR PreviousOrderDate = 
    CALCULATE(
        MAX('ORDERS'[ORDER_DATE]),
        FILTER(
            ALL('ORDERS'),
            'ORDERS'[ORDER_DATE] < CurrentOrderDate && 'ORDERS'[ID] <> CurrentOrderID
        )
    )
RETURN
    IF(
        ISBLANK(PreviousOrderDate),
        BLANK(),
        DATEDIFF(PreviousOrderDate, CurrentOrderDate, DAY)
    )

____________
Please join the Power BI UX/UI User Group if you need help with dashboard design and usability

Join to Data Governance User Group

Join to DENEB and Power BI Enthusiasts User Group

Join to Data Fabric Best Practices User Group

Subscribe to my medium blog

View solution in original post

technolog · ‎08-28-2023

The performance issue you're experiencing arises from the use of the EARLIER function, especially within a FILTER function that processes row-by-row. When you add a second filter condition using EARLIER, the complexity increases, causing the measure to consume more memory and time.

The EARLIER function is often misunderstood and is a common source of inefficiencies in DAX. When used within a row context, it returns the value of a specified column for the "earlier" row context (outer row context). This can lead to nested iterations and exponential growth in the amount of computation.

Here's a breakdown of your problem:

DATEDIFF calculates the difference between the maximum order date and the current row's order date.
The FILTER function then iterates through all rows to get the maximum order date for orders that occurred before or on the current row's order date.
Adding the second condition makes this calculation more complex, as it now has to compare the ID column for each row as well.
To optimize your DAX calculation:

Use Variables: Variables can reduce the number of times a calculation is computed, especially inside a row context.
Revise Logic: Instead of comparing every row with every other row, try to partition the data or use other DAX functions that might be more efficient.
Here's a revised version using variables:

BUYING_CYCLE = 
VAR CurrentOrderDate = 'ORDERS'[ORDER_DATE]
VAR CurrentOrderID = 'ORDERS'[ID]
VAR PreviousOrderDate = 
    CALCULATE(
        MAX('ORDERS'[ORDER_DATE]),
        FILTER(
            ALL('ORDERS'),
            'ORDERS'[ORDER_DATE] < CurrentOrderDate && 'ORDERS'[ID] <> CurrentOrderID
        )
    )
RETURN
    IF(
        ISBLANK(PreviousOrderDate),
        BLANK(),
        DATEDIFF(PreviousOrderDate, CurrentOrderDate, DAY)
    )

____________
Please join the Power BI UX/UI User Group if you need help with dashboard design and usability

Join to Data Governance User Group

Join to DENEB and Power BI Enthusiasts User Group

Join to Data Fabric Best Practices User Group

Subscribe to my medium blog

majdkaid22 · ‎01-12-2017

@anagnostes I had a similar issue and my table was around the same size of rows.

The problem is with how many columns your table has. You need to try to shape it down and minimize the number of columns or increase the RAM on the pc running the PBI desktop. I upgraded to 16GB Ram and that helped very well.

Memory issues with filter

Helpful resources

Microsoft Fabric Learn Together

Power BI Monthly Update - April 2024

Fabric Community Update - April 2024