Hi,
I have a query like this that computes an intermediate metric:
Success :=
COUNTROWS (
    FILTER (
        SUMMARIZE (
            Table,
            Table[id],
            "TotalRows", SUM ( Table[nRows] ),
            "SuccessfulRows", SUM ( Table[Successes] )
        ),
        [TotalRows] = [SuccessfulRows]
    )
)
Essentially, for a given id, I want to mark it as successful if the number of rows for that id equals the number of successful rows for that id.
Since I'm using SUMMARIZE, the query scans the entire table for each id to compute the metric. My table has more than 18M rows, and this single computation takes more than 30 seconds to execute.
Is there a way to optimize this? Thanks for the help.
Hi @raghun,
Maybe you can consider adding two columns to your table to store the summarized results, then write a formula that compares those fields.
This should speed up the measure calculation, but it will also increase the memory usage of the data model.
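For example, a sketch of this idea (assuming the table and column names from your original measure; `TotalRowsForId` and `SuccessfulRowsForId` are hypothetical names):

```dax
-- Two calculated columns that precompute the per-id totals once, at refresh time.
-- ALLEXCEPT keeps only the Table[id] filter after the context transition,
-- so each row stores the totals for its own id.
TotalRowsForId =
CALCULATE ( SUM ( Table[nRows] ), ALLEXCEPT ( Table, Table[id] ) )

SuccessfulRowsForId =
CALCULATE ( SUM ( Table[Successes] ), ALLEXCEPT ( Table, Table[id] ) )

-- The measure then only has to compare the two stored columns,
-- instead of re-aggregating 18M rows at query time.
Success :=
CALCULATE (
    DISTINCTCOUNT ( Table[id] ),
    FILTER ( Table, Table[TotalRowsForId] = Table[SuccessfulRowsForId] )
)
```

The trade-off is exactly as described above: faster queries at the cost of two extra stored columns.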
By the way, you can also try using the GROUPBY function instead of SUMMARIZE:
Nested grouping using GROUPBY vs SUMMARIZE
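A sketch of the same measure rewritten with GROUPBY (assuming your original table and column names; note that GROUPBY requires its aggregations to iterate CURRENTGROUP()):

```dax
Success :=
COUNTROWS (
    FILTER (
        GROUPBY (
            Table,
            Table[id],
            "TotalRows", SUMX ( CURRENTGROUP (), Table[nRows] ),
            "SuccessfulRows", SUMX ( CURRENTGROUP (), Table[Successes] )
        ),
        [TotalRows] = [SuccessfulRows]
    )
)
```

Whether this is faster depends on your model, so it is worth comparing both versions in DAX Studio.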
Regards,
Xiaoxin Sheng
@raghun -
Maybe you could use DAX Studio to help identify your bottlenecks: https://daxstudio.org/documentation/videos/.
Proud to be a Super User!
I already did that; this is what I see, but I couldn't figure out where to optimize.
Line Records Physical Query Plan
1 AddColumns: IterPhyOp LogOp=AddColumns IterCols(0)(''[SuccessfulCalls])
2 SingletonTable: IterPhyOp LogOp=AddColumns
3 1 SpoolLookup: LookupPhyOp LogOp=CountRows Integer #Records=1 #KeyCols=0 #ValueCols=1 DominantValue=BLANK
4 1 AggregationSpool<Count>: SpoolPhyOp #Records=1
5 Filter: IterPhyOp LogOp=Filter IterCols(0, 1, 2)('Table'[Id], ''[TotalRows], ''[SuccessfulRows])
6 Extend_Lookup: IterPhyOp LogOp=EqualTo IterCols(1, 2)(''[TotalRows], ''[SuccessfulRows])
7 AddColumns: IterPhyOp LogOp=AddColumns IterCols(0, 1, 2)('Table'[Id], ''[TotalRows], ''[SuccessfulRows])
8 21183022 Spool_Iterator<SpoolIterator>: IterPhyOp LogOp=GroupBy_Vertipaq IterCols(0)('Table'[Id]) #Records=21183022 #KeyCols=569 #ValueCols=0
9 21183022 ProjectionSpool<ProjectFusion<>>: SpoolPhyOp #Records=21183022
10 Cache: IterPhyOp #FieldCols=1 #ValueCols=0
11 21183022 SpoolLookup: LookupPhyOp LogOp=Sum_Vertipaq LookupCols(0)('Table'[Id]) Integer #Records=21183022 #KeyCols=569 #ValueCols=2 DominantValue=BLANK
12 21183022 ProjectionSpool<ProjectFusion<Copy, Copy>>: SpoolPhyOp #Records=21183022
13 Cache: IterPhyOp #FieldCols=1 #ValueCols=2
14 21183022 SpoolLookup: LookupPhyOp LogOp=Sum_Vertipaq LookupCols(0)('Table'[Id]) Integer #Records=21183022 #KeyCols=569 #ValueCols=2 DominantValue=BLANK
15 21183022 ProjectionSpool<ProjectFusion<Copy, Copy>>: SpoolPhyOp #Records=21183022
16 Cache: IterPhyOp #FieldCols=1 #ValueCols=2
17 EqualTo: LookupPhyOp LogOp=EqualTo LookupCols(1, 2)(''[TotalRows], ''[SuccessfulRows]) Boolean
18 ColValue<''[TotalRows]>: LookupPhyOp LogOp=ColValue<''[TotalRows]>''[TotalRows] LookupCols(1)(''[TotalRows]) Integer
19 ColValue<''[SuccessfulRows]>: LookupPhyOp LogOp=ColValue<''[SuccessfulRows]>''[SuccessfulRows] LookupCols(2)(''[SuccessfulRows]) Integer
@raghun -
To be honest, I don't know much about improving performance, but a resource I rely on is the sqlbi.com team.
https://www.sqlbi.com/articles/best-practices-using-summarize-and-addcolumns/ presents some best practices. Have you tried to apply those techniques in your project?
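Following that article's main recommendation (group with SUMMARIZE but compute the aggregations in ADDCOLUMNS rather than inside SUMMARIZE itself), the original measure could be sketched like this, assuming the table and column names from the first post:

```dax
Success :=
COUNTROWS (
    FILTER (
        ADDCOLUMNS (
            -- SUMMARIZE is used only to produce the distinct list of ids
            SUMMARIZE ( Table, Table[id] ),
            -- CALCULATE triggers context transition, so each sum is
            -- evaluated in the filter context of the current id
            "TotalRows", CALCULATE ( SUM ( Table[nRows] ) ),
            "SuccessfulRows", CALCULATE ( SUM ( Table[Successes] ) )
        ),
        [TotalRows] = [SuccessfulRows]
    )
)
```

This avoids the extension-column behavior of SUMMARIZE that the article warns about, and often lets the storage engine resolve the per-id sums in a single scan.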