Solved: Re: Surprising difference in performance between D...

kevhav · ‎01-06-2020

I created a measure like this...

Version 1 =
    CALCULATE(
        DISTINCTCOUNT('MyTable'[ProductId]),
        FILTER(
            'MyTable',
            'MyTable'[Quantity] > 0
        )
    )

...but it was super-slow.

I changed it to this...

Version 2a =
    CALCULATE(
        DISTINCTCOUNT('MyTable'[ProductId]),
        'MyTable'[Quantity] > 0
    )

...and then it was fast!

I understand that Version 2a is "syntax sugar" for the following:

Version 2b =
    CALCULATE(
        DISTINCTCOUNT('MyTable'[ProductId]),
        FILTER(
            ALL('MyTable'[Quantity]),
            'MyTable'[Quantity] > 0
        )
    )

I found and read these articles:

https://exceleratorbi.com.au/simple-filters-and-syntax-sugar-in-dax/

https://powerpivotpro.com/2010/04/quick-tip-dont-over-use-filter/

However, I still don't really understand why Version 1 is super-slow, and Version 2 is fast.

Is it because in Version 2, "the ALL() function returns a table that contains all the distinct values in a column" ... which allows ... "the engine to iterate over the compressed version of the column?" (But if so...it's true that in Version 1 my first argument to FILTER is the entire table. But my second argument is an operator on a single column. I would've thought that even with Version 1 of my measure, the engine should be "smart enough" to take advantage of column store+compression in the same way. But no? Why not?)

Is it because Version 1 "creates a brand-new table in memory," whereas Version 2 does not? Why? With Version 2, does it treat my FILTER argument "much the same as if that filter came from the pivot," which does not require iterating over every row of 'MyTable'...whereas Version 1 does require iterating over every row of 'MyTable'? Why?

What is happening "under the covers" with Version 1 that is different from Version 2? Can someone please explain it in a way that I can better understand? Thank you!

kevhav · ‎01-08-2020

Hi @irobba, okay, that makes more sense!

You said "only use the FILTER function if you need it" and "FILTER would be most useful when" ... however, I'd like to think of my "Version 1" and "Version 2" as both using the FILTER function. Right? Because Version 2b includes FILTER, and it is equivalent to Version 2a.

So what is the fundamental difference between my Version 1 and my Version 2b?

I did some more searching and I found this article:

https://exceleratorbi.com.au/the-filter-function-in-dax-part-2/

That's the explanation I needed all along! If I interpret that correctly, the main difference is simply the size of the table over which FILTER is iterating.

If the first argument to FILTER is 'MyTable', then it is forced to iterate over each row of 'MyTable'.

If the first argument to FILTER is ALL('MyTable'[Quantity]), then it still iterates over a table. But it is a much smaller table: it is "all the unique values of 'MyTable'[Quantity], after removing all the filters affecting the visual." So 'MyTable' has millions of rows; but 'MyTable'[Quantity] only has a few thousand unique values.

I read further into @MattAllington's article about lineage/virtual tables. It explains how "virtual tables" are created; how virtual tables have a relationship back to the original tables from which they were created; and how filters can propagate from those virtual tables back to the original tables.

So using my Version 2b example:

The first argument to FILTER is ALL('MyTable'[Quantity]), which returns a table consisting of the unique values of [Quantity] (a few thousand rows). This become s a virtual table.
This "virtual table can be considered to have a virtual relationship back to the 'MySales'[Quantity] column from where it was born"
FILTER(ALL('MyTable',[Quantity]), 'MyTable'[Quantity] > 0) then iterates over the virtual table, and filters the virtual table to include only values greater than zero. (Iterating over this virtual table is relatively fast, because the virtual table only has a few thousand rows, as opposed to millions of rows.)
Because of the "virtual relationship," the filter on the virtual table propagates to 'MyTable', which filters 'MyTable' (similar to any other filter coming from the visual's filter context).

It was also interesting to learn that instead of ALL('MyTable'[Quantity]), you could also use VALUES('MyTable'[Quantity]), if you want the virtual table to retain any filters coming from the visual.

View solution in original post

irobba · ‎01-08-2020

Hi there kevhav,

You've almost answered your own question with the articles you've linked but in general, the simpler the better and only use the FILTER function if you need it - which would typically be because you need to filter your table results outside of the natural context in which you are using your measure.

So as you say, FILTER is creating a new table of results, it's doing this before then applying whatever other filters are passed on to it from the context in which you use the measure so, if you were to use this in a matrix for instance, it would need to create that table for each row (and column potentially) in that matrix.

With version 2a, its much simpler, you are simply adding your additional condition (quantity > 0) to the existing inherited conditions, so just like adding an and clause to the conditions that are already there rather than breaking in to 2 separate stages of filtering.

FILTER would be most useful when your condition needs to compare against a value that would be outside the "normal" context. So in this case where you are counting products, lets say you wanted to filter to only products that cost more than average, taking the average across the whole data set. You would use filter here because you want to filter the products above average first ignoring the current row context, then count the remaining products taking in to account the current row context.

In actual fact with this example, if you tried to write something like the following it would tell you cannot use AVERAGE in this way, and you would need to use the FILTER function to achieve this.

CALCULATE(
        DISTINCTCOUNT('MyTable'[ProductId]),
        'MyTable'[Price] > AVERAGE('MyTable'[Price])
    )

I hope that makes a little sense!

kevhav · ‎01-08-2020

Hi @irobba, okay, that makes more sense!

You said "only use the FILTER function if you need it" and "FILTER would be most useful when" ... however, I'd like to think of my "Version 1" and "Version 2" as both using the FILTER function. Right? Because Version 2b includes FILTER, and it is equivalent to Version 2a.

So what is the fundamental difference between my Version 1 and my Version 2b?

I did some more searching and I found this article:

https://exceleratorbi.com.au/the-filter-function-in-dax-part-2/

That's the explanation I needed all along! If I interpret that correctly, the main difference is simply the size of the table over which FILTER is iterating.

If the first argument to FILTER is 'MyTable', then it is forced to iterate over each row of 'MyTable'.

If the first argument to FILTER is ALL('MyTable'[Quantity]), then it still iterates over a table. But it is a much smaller table: it is "all the unique values of 'MyTable'[Quantity], after removing all the filters affecting the visual." So 'MyTable' has millions of rows; but 'MyTable'[Quantity] only has a few thousand unique values.

I read further into @MattAllington's article about lineage/virtual tables. It explains how "virtual tables" are created; how virtual tables have a relationship back to the original tables from which they were created; and how filters can propagate from those virtual tables back to the original tables.

So using my Version 2b example:

The first argument to FILTER is ALL('MyTable'[Quantity]), which returns a table consisting of the unique values of [Quantity] (a few thousand rows). This become s a virtual table.
This "virtual table can be considered to have a virtual relationship back to the 'MySales'[Quantity] column from where it was born"
FILTER(ALL('MyTable',[Quantity]), 'MyTable'[Quantity] > 0) then iterates over the virtual table, and filters the virtual table to include only values greater than zero. (Iterating over this virtual table is relatively fast, because the virtual table only has a few thousand rows, as opposed to millions of rows.)
Because of the "virtual relationship," the filter on the virtual table propagates to 'MyTable', which filters 'MyTable' (similar to any other filter coming from the visual's filter context).

It was also interesting to learn that instead of ALL('MyTable'[Quantity]), you could also use VALUES('MyTable'[Quantity]), if you want the virtual table to retain any filters coming from the visual.

kevhav · ‎01-07-2020

...So that I can understand how to better architect my model & measures, going forward. Any insight from the community would be appreciated. Thanks!

v-yuta-msft · ‎01-07-2020

@kevhav ,

The underlying dax code is different between the two kinds of expressions. You can use Performance Analyzer to view the underlying code. Please refer to doc below:

https://docs.microsoft.com/en-us/power-bi/desktop-performance-analyzer

Community Support Team _ Jimmy Tao

If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

kevhav · ‎01-08-2020

Hi @v-yuta-msft, thanks for your reply. I tried using "Performance Analyzer" and copying the DAX query using my "Version 1" and "Version 2" measures. I looked at them, and actually they were both exactly the same. Except for my measure definition. Each query used the measure in exactly the same way, and I couldn't tell how the "underlying DAX code is different."

Surprising difference in performance between DAX measures - why?!

Helpful resources

Microsoft Fabric Learn Together

Power BI Monthly Update - April 2024

Fabric Community Update - April 2024

How to Get Your Question Answered Quickly