cancel
Showing results for 
Search instead for 
Did you mean: 
Reply
Anonymous
Not applicable

Optimize SUMX / SUMMARIZE Measure

Hi everyone, 

I'm writting here to get some help to optimize the calculation of one of my measure that takes too much time to compute (10 seconds everytime you click on something / expand a row)

 

Let me explain the report : 

Model : 

  • I've managed to get a Star Schema with a Fact Table based with two base measure : Turnover and CountArticle
  • The date granularity is Quarter - Year : it is already aggregated.
  • I have many dimensions but in this example, I only use 4 : 
    • Date
    • Seller Store  : Sells an article
    • Article : Description of an article
    • Customer : Buys an article. It has a dedicated SalesMan and SalesTeam

Model Size : 

About 2 000 000 records in Fact Table.

Dimension are way smaller. (Except Customer : I've loaded the entire dimension : Maybe not optimized)

 

Subject :

  • Sales on article that are sold by my company. 

Slicers :

  • Year
  • Quarter
  • IsLogistic (feature of an article : used to exclude / include logistic products) 

 

Visual 

  • Matrix Visual
    •  Hierarchy
      • Sales Team
      • Sales Man
      • Customer
      • Seller Store
      • Article
    • Values
      • SumTurnover = SUM ( Fact[Turnover] )
      • SumTurnoverLastYear =

        CALCULATE (

            SUM ( Fact[Turnover] ),

            FILTER (

                ALL ( 'Date' ),

                'Date'[Year]

                    MAX ( 'Date'[Year] ) - 1

                    && 'Date'[Quarter] = MAX ( 'Date'[Quarter] )

            )

        )
      • SumCountArticle= SUM ( Fact[CountArticle] )
      • SumCountArticleLastYear =

        CALCULATE (

            SUM ( Fact[CountArticle] ),

            FILTER (

                ALL ( 'Date' ),

                'Date'[Year]

                    MAX ( 'Date'[Year] ) - 1

                    && 'Date'[Quarter] = MAX ( 'Date'[Quarter] )

            )

        )
      •  PriceImpact =
        IF (
            SumTurnover = 0
                || SumTurnoverLastYear = 0
                || SumCountTest = 0
                || SumCountTestLastYear = 0
                || ISBLANK ( SumTurnover )
                || ISBLANK ( SumTurnoverLastYear )
                || ISBLANK ( SumCountTest )
                || ISBLANK ( SumCountTestLastYear ),
            BLANK (),
            (
                DIVIDE ( SumTurnover, SumCountTest )
                    DIVIDE ( SumTurnoverLastYear, SumCountTestLastYear )
            ) * SumCountTestLastYear
        )

And this is where it became truly annoying. I want the Price Impact to be calculated at the Article level, then sum up to get the value at the higher level of the hierarchy.

 

HierarchyValues
Seller StoreArticleTurnoverTurnoverLastYearCountTestCountTestLastYearPrice Impact
SS1A50060023150
SS1B20030033-100
SS1C30020044100
SS2A60060021-300
SS2B200200330

 

Rollup to Seller Store Level (With +/- buttons ) : I want the Price Impact of the article level to be summed up not recalculated

 

HierarchyValuesOK (SUM)KO (Recalculated)
Seller StoreTurnoverTurnoverLastYearCountTestCountTestLastYearPrice ImpactPrice Impact2
SS110001100910150166,6666667
SS280080054-30012000

 

I hope it is clear. 

 

To get this result and because I have (as described before) several level in the hierarchy : I've used the following measure in my Matrix with SUMX and SUM. 

Even if I still do not fully understand under the hood, it is working as expected and giving me the right result. 

SummedPriceImpact =
SUMX (
    SUMMARIZE (
        'Fact',
        'Customer'[Sales Team],
        'Customer'[Sales Man],
        'Customer'[Customer],
        'Store'[Store],
        'Article'[Article]
    ),
    [PriceImpact]
)

 

BUT It takes quite a long time to compute. And it is not acceptable for my customers.

 

I have some thoughts on my measures : 

  • Usage of var instead of repeating measure
  • Optimizing the If Statement
  • Optimizing the final measure with Summarize.

 

I know that Summarize is deprecated and should be replaced by SummarizeColumn but I haven't been able to make the latest function work in my report (because of filter context)

 

It would be great to work together to find a suitable solution for my end users to reduce the calculation time. 
(I haven't been able to share a .pbix : I'm on vacation without access to Power BI (PC at work , MAC at home.. ) )

 

Thank you in advance for your help. 🙂

Romain

1 ACCEPTED SOLUTION
Kucrapok
Helper I
Helper I

Hello, I'm not sure if you have figured out why it was slow, but I think it's probably 
your if statement in your SUMX.

SummedPriceImpact =
SUMX (
    SUMMARIZE (
        'Fact',
        'Customer'[Sales Team],
        'Customer'[Sales Man],
        'Customer'[Customer],
        'Store'[Store],
        'Article'[Article]
    ),
    [PriceImpact]
)

I see 2 things :
1) [PriceImpact] triggers Context transition for each iteration which is bad with a Big Fact table
2) You should always avoid using IF statement inside an Iterator like SUMX. The use of IF statement will require a call to the Formula Engine for each iteration and the iteration is done by the Vertipaq engine.

One solution to speed up the process would be to precalculated your [PriceImpac] measure inside a Calculated column in your Article Dimension. It will be stored in memory.

Maybe I did not understand your problem very well and this won't work for you.

Also 2M rows is nothing for a Tabular model. You should be able to have a speed < 1sec like the other person said earlier.

View solution in original post

9 REPLIES 9
florinbarnea
Frequent Visitor

Hi, Have you found a solution to this problem? How is your performance? I have a very similar measure on a 1.4m rows fact table and if I plot it against months it takes 10-15 seconds for the visual to load. 

Kucrapok
Helper I
Helper I

Hello, I'm not sure if you have figured out why it was slow, but I think it's probably 
your if statement in your SUMX.

SummedPriceImpact =
SUMX (
    SUMMARIZE (
        'Fact',
        'Customer'[Sales Team],
        'Customer'[Sales Man],
        'Customer'[Customer],
        'Store'[Store],
        'Article'[Article]
    ),
    [PriceImpact]
)

I see 2 things :
1) [PriceImpact] triggers Context transition for each iteration which is bad with a Big Fact table
2) You should always avoid using IF statement inside an Iterator like SUMX. The use of IF statement will require a call to the Formula Engine for each iteration and the iteration is done by the Vertipaq engine.

One solution to speed up the process would be to precalculated your [PriceImpac] measure inside a Calculated column in your Article Dimension. It will be stored in memory.

Maybe I did not understand your problem very well and this won't work for you.

Also 2M rows is nothing for a Tabular model. You should be able to have a speed < 1sec like the other person said earlier.

Anonymous
Not applicable


@Anonymous wrote:

Hi everyone, 

I'm writting here to get some help to optimize the calculation of one of my measure that takes too much time to compute (10 seconds everytime you click on something / expand a row)

 

Let me explain the report : 

Model : 

  • I've managed to get a Star Schema with a Fact Table based with two base measure : Turnover and CountArticle
  • The date granularity is Quarter - Year : it is already aggregated.
  • I have many dimensions but in this example, I only use 4 : 
    • Date
    • Seller Store  : Sells an article
    • Article : Description of an article
    • Customer : Buys an article. It has a dedicated SalesMan and SalesTeam

Model Size : 

About 2 000 000 records in Fact Table.

Dimension are way smaller. (Except Customer : I've loaded the entire dimension : Maybe not optimized)

 

Subject :

  • Sales on article that are sold by my company. 

Slicers :

  • Year
  • Quarter
  • IsLogistic (feature of an article : used to exclude / include logistic products) 

 

Visual 

  • Matrix Visual
    •  Hierarchy
      • Sales Team
      • Sales Man
      • Customer
      • Seller Store
      • Article
    • Values
      • SumTurnover = SUM ( Fact[Turnover] )
      • SumTurnoverLastYear =
        var __year = selectedvalue( 'Date'[Year] )
        var __quarter = selectedvalue( 'Date'[Quarter] )
        var __result =
        CALCULATE (
            [SumTurnover],
            'Date'[Year] = __year - 1,
            'Date'[Quarter] = __quarter,
        ALL( 'Date' )
        )
        return __result
      • SumCountArticle= SUM ( Fact[CountArticle] )
      • SumCountArticleLastYear =
        var __year = selectedvalue( 'Date'[Year] )
        var __quarter = selectedvalue( 'Date'[Quarter] )
        var __result =
        CALCULATE (
            [SumCountArticle],
            'Date'[Year] = __year - 1,
        'Date'[Quarter] = __quarter,
        ALL( 'Date' )
        )
        return __result
      • PriceImpact =
        sumx(
        Article,
        var __sumTurnover = [SumTurnover]
        var __sumCountTest = [SumCountTest]
        var __sumTurnoverLY = [SumTurnoverLastYear]
        var __sumCountTestLY = [SumCountTestLastYear]
        var __result = (
        DIVIDE ( __sumTurnover, __sumCountTest )
             DIVIDE ( __sumTurnoverLY, __sumCountTestLY )
            ) * __sumCountTestLY
        var __shouldCalc =
             _sumTurnover <> 0
                && _sumTurnoverLY <> 0
                && _sumCountTest] <> 0
                && _sumCountTestLY] <> 0
        return
        IF( __shouldCalc, __result )
        )

Try it and see what you get...

 

Best

D

Anonymous
Not applicable

You can also try to optimize the iteration in SUMX by doing this...

var __articles =
    calculatetable(
        Article,
        Fact
    )
return
sumx(
    __articles,
    ... // rest as in my prev post
)

Best

D

Anonymous
Not applicable

Hi Everyone, 

 

I'll try all of your tips. 

Unfortunately I think that SummarizeColumn cannot be used in Measure (as it is written in The Definitive Guide To DAX) 

 

I will investigate the others parts of the Measure.

 

Thanks

 

UlfBergqvist
Responsive Resident
Responsive Resident

Have you tried writing like this?

SUMMARIZECOLUMNS (
        'Customer'[Sales Team],
        'Customer'[Sales Man],
        'Customer'[Customer],
        'Store'[Store],
        'Article'[Article],
'Fact'
    )


Using variables instead of repeating could actually help quite a lot. But it could sometimes be optimized in the DAX engine and then it doesn't help at all. In earlier versions of Analysis Services I have seen huge savings by not repeating measures.

 

One thing I noticed is that even FILTERs can be stored in variables, like this:

 

    Var LastYearFilter = FILTER (
        ALL ( 'Date' ),
        'Date'[Year]
            MAX ( 'Date'[Year] ) - 1
            && 'Date'[Quarter] = MAX ( 'Date'[Quarter] )
    )

 

Then you can use that in your both measures for last year (if you move them into a single measure). But I don't know if it has any noticable impact on performance...

 

IF statements can sometimes be really bad for performance. But I don't know in this case how to optimize it.

 

I have done similar calculations myself a few times and since it's a complex problem it will always take some time to calculate. But 10 seconds sounds a lot for 2 million rows.

Anonymous
Not applicable

summarizecolumns CAN'T BE USED IN MEASURES. It's a function for querying only.

Best
D
Anonymous
Not applicable

I think 10 sec is fine as your fact table have 2 million records.

Measure calculation is also based on how many columns along with the measure your are displaying in visual.

If you use single column and measure it will give you result in 1 2 sec.

Thanks
Pravin
Anonymous
Not applicable

@Anonymous,

 

10 sec for 2 million rows? Mate, are you joking?.... I've got 100s of millions in my models and I get speed under 1 sec for similar simple measures. Of course, this is dependent on how powerful box you have but for 2 million rows in a fact table I'd really expect the speed of simple measures to be under 1 sec.

 

Best
D

Helpful resources

Announcements
Carousel_PBI_Wave1

2023 Release Wave 1 Plans

Power BI release plans for 2023 release wave 1 describes all new features releasing from April 2023 through September 2023.

Power BI Summit Carousel 2

Global Power BI Training

Make sure you register today for the Power BI Summit 2023. Don't miss all of the great sessions and speakers!

Thank you 2022 Review

2022 Monthly Feature Releases

We had a great 2022 with a ton of feature releases to help you drive a data culture.

Top Solution Authors
Top Kudoed Authors