cancel
Showing results for 
Search instead for 
Did you mean: 
Reply
ryan-gao
Helper III
Helper III

Is there sth wrong with summarize function?

I Reproduce the scenario in http://www.sqlbi.com/articles/all-the-secrets-of-summarize/, but  the answer is different.

QQ截图20160923103339.jpg

anyone can explain it?

 

change the first product from bike to shirt, the answer is different:

QQ截图20160923153224.jpg 

 

 

1 ACCEPTED SOLUTION

Well, the Power Pivot engine in Excel 2016 is not the same as the one used by Excel 2013 (when Marco Russo wrote his article).

 

I have made a few tests, and here are my conclusions: this is a bug.

 

The long version is the behaviour no longer works as Marco Russo explained in his article. There is still an implicit filter but only on the first column in the table (at the time of creation).

 

If you create a new table with columns listed in this order: Quantity, Product, Color, Amount, like this:

 

QuantityProductColorAmount
1ShirtRed100
2ShirtRed200
2ShirtGreen200
3BikeGreen300
3ShirtGreen300
3BikeBlue300
4ShirtBlue400

 

Your query should now return the following:

Color allcolorsales

Red     500

Green 1300

Blue    1300

 

Note that, according to my observations, what is relevant is the columns order when the table was created. Re-ordering the columns afterwards does not seem to change the results of the query. This is, in my opinion, a bug.

 

View solution in original post

7 REPLIES 7
LaurentCouartou
Solution Supplier
Solution Supplier

When Marco Russo wrote his article, the value returned (500) was:

Total of Amount for all rows with
Product = "Bike" and Quantity = 1 and Amount = 100
or
Product = "Shirt" and Quantity = 2 and Amount = 200

1200 is the total of Amount rows for all rows where Product = "Shirt".

The implicit filter seems to ignore the columns Quantity and Price.

 

When Marco Russo wrote his article, the value returned (500) was:

Total of Amount for all rows with
Product = "Bike" and Quantity = 1 and Amount = 100
or
Product = "Shirt" and Quantity = 2 and Amount = 200

1200 is the total of Amount rows for all rows where Product = "Shirt".

The implicit filter seems to ignore the columns Quantity and Price.

 

Do you use Excel 2016?

yes,Laurent, I did it in excel2016 and also Power BI, the answer is equal.

Does  different version of the tool make  different answer?

Well, the Power Pivot engine in Excel 2016 is not the same as the one used by Excel 2013 (when Marco Russo wrote his article).

 

I have made a few tests, and here are my conclusions: this is a bug.

 

The long version is the behaviour no longer works as Marco Russo explained in his article. There is still an implicit filter but only on the first column in the table (at the time of creation).

 

If you create a new table with columns listed in this order: Quantity, Product, Color, Amount, like this:

 

QuantityProductColorAmount
1ShirtRed100
2ShirtRed200
2ShirtGreen200
3BikeGreen300
3ShirtGreen300
3BikeBlue300
4ShirtBlue400

 

Your query should now return the following:

Color allcolorsales

Red     500

Green 1300

Blue    1300

 

Note that, according to my observations, what is relevant is the columns order when the table was created. Re-ordering the columns afterwards does not seem to change the results of the query. This is, in my opinion, a bug.

 

Hi Laurent,

I wish it was a bug, unfortunately it is only extremely hard to understand, starting by myself! It took me a while to find the correct answer 🙂

 

The behavior is indeed different in the 2016 version but, since I wrote that article, I learned a bit more about DAX, and we've been able to understand the topic of arbitrarily shaped sets. This lead me to define the problem in an easier way.

 

In reality, the problem of SUMMARIZE is nothing but the creation of arbitrarily shaped sets that are destroyed as a consequence of context transition. In fact, it is worth remembering that SUMMARIZE is the only function, AFAIK, that generates both a row context and a filter context. The row context is seldom used, still it is there. Thus, as soon as you call CALCULATE, you force context transition, and this is the source of the error (or... let us call it a "feature").

 

Why is this relevant? You can see the effect with a simpler table. The following one is a variation of the one used in the post. Please note that I removed the numeric columns and - in the last line - I put a Bike instead of a Shirt (this is VERY important, otherwise the problem does not show up; the previous dataset was too simple). Please note that there is NO Blue Shirt.

 

f1.png

 

On this simpler model, you can run a variation of the original query:

 

EVALUATE
SUMMARIZE (
    Sales,
    Sales[Color],
    "Sales", COUNTROWS ( Sales ),
    "AllColorSales", 
    CALCULATE (
        COUNTROWS ( Sales ),  
        ALL ( Sales[Color] )
     ),
    "AllSales", 
    CALCULATE (
        COUNTROWS ( Sales ),  
        ALL ( Sales )
    )
)

 

I replaced SUM with COUNTROWS, because now I have a simpler model. The result, as you might guess, is wrong:

 

f2.png

The last line, showing 4, is wrong. Why that? Because when you are on the BLUE row the filter context contains ( BLUE, BIKE ) and it is transformed into (BIKE), which contains 4 rows. When you are on Green, on the other hand, the filter context contains ( GREEN, [ Bike || Shirt ] ) and, after context transition, it becomes [ Bike || Shirt ], showing the correct result of 7. Strictly speaking, (BLUE, BIKE) is not an arbitrarily shaped set, it is so because it is coming as a selection of SUMMARIZE that creates the relation (BLUE, BIKE). In fact, nowhere you selected BIKE, it went into the filter context because of the filter generated by SUMMARIZE.

 

Your theory of a bug is not correct. In fact, if you replace Color with Product in the previous query, you will obtain - still - an incorrect result, this time for the Shirt row. Here is the code: 

 

EVALUATE
SUMMARIZE (
    Sales,
    Sales[Product],
    "Sales", COUNTROWS ( Sales ),
    "AllProductSales", 
    CALCULATE (
        COUNTROWS ( Sales ),  
        ALL ( Sales[Product] )
     ),
    "AllSales", 
    CALCULATE (
        COUNTROWS ( Sales ),  
        ALL ( Sales )
    )
)

All this is not to say you were wrong, of course. It took me a while to find an easy dataset that shows the behavior of SUMMARIZE and, in doing that, I learned a bit more. In fact, in the previous post, I was using a much more complex dataset, mainly because I was not fully understanding it. I am extremely grateful that you pointed out the problems with the old blog post about SUMMARIZE, because you forced me to come back to a topic I heartily hate (SUMMARIZE) and understand it better.

 

My hint - anyway - is always the same: avoid SUMMARIZE to compute anything. Use it to perform grouping by, but avoid adding columns in SUMMARIZE. It generates arbitrarily shaped sets during its partitioning of the table; and arbitrarily shaped sets are soooo hard to grasp when you use them with a CALCULATE that overrides some of their columns. I always go crazy trying to understand what is happening under the cover.

 

I guess, now, I will need to update that article... but, first, I should understand better what changed between 2016 and 2014, this requires some more time. I just wanted to write this short answer before digging more into these details.

Alberto Ferrari - SQLBI

Hi Alberto.

First, let me apologize for attributing the article to Marco.

The behavior you describe is indeed a feature. What I was referring to as a bug, is the fact that the result of the query changes depending on the order of the columns in the table, when the table was created. This buggy behavior cannot be seen in the simplified example you just provided, because it only has 2 columns.

To clarify my point: Only the leftmost column will be added to the shaped set. Furthermore, which column is the leftmost one is set when the table is added to the model. Reordering columns will not modify the results afterwards.

Here are the steps to reproduce the bug, in Power BI Desktop

 

Create a new PBI file

Add a new query and call it Sales

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WcsrMTlXSUQpKTQGSTkqxOtFKwRmZRSU4xNyLUlPz4KJQzTBBR8JKnXJKU+EqMcRiAQ==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Product = _t, Color = _t, Model = _t]),
    #"Type modifié" = Table.TransformColumnTypes(Source,{{"Product", type text}, {"Color", type text}, {"Model", type text}})
in
    #"Type modifié"

Add a new query and call it Sales2

let
    Source = Sales,
    #"Colonnes triées" = Table.ReorderColumns(Source,{"Model", "Product", "Color"})
in #"Colonnes triées"

The second query only reorders the columns from the first one.

Now, back to the model, we should have two tables : Sales and Sales2.

 

Let us add a two calculated tables: TestSales and TestSales2.

TestSales = SUMMARIZE ( Sales,
    Sales[Color],
    "Sales", COUNTROWS( Sales ),
    "AllColorSales", 
    CALCULATE (
        COUNTROWS ( Sales ), 
        ALL ( Sales[Color] )
     ),
    "AllSales", 
    CALCULATE (
        COUNTROWS ( Sales ),  
        ALL ( Sales )
    )
)

TestSales2 = SUMMARIZE ( Sales2, Sales2[Color], "Sales2", COUNTROWS( Sales2 ), "AllColorSales2", CALCULATE ( COUNTROWS ( Sales2 ), ALL ( Sales2[Color] ) ), "AllSales2", CALCULATE ( COUNTROWS ( Sales2 ), ALL ( Sales2 ) ) )

 

Both calculated tables should return the same result. (left: TestSales, right: TestSales2)

 

TestSales.pngTestSales2.png

 

 

 

 

 

 

jk

 

 

 

 

 

 

 

 

 

 

 

 

Thanks  for you detailed description Laurent,I Totally agree with you.

Hope @marcorusso can see this

Helpful resources

Announcements
November 2022 Update

Check it Out!

Click here to read more about the November 2022 updates!

Microsoft 365 Conference â__ December 6-8, 2022

Microsoft 365 Conference - 06-08 December

Join us in Las Vegas to experience community, incredible learning opportunities, and connections that will help grow skills, know-how, and more.

Top Solution Authors