cancel
Showing results for 
Search instead for 
Did you mean: 
Reply
jambrose
Frequent Visitor

Improving Summarize Efficiency

I'm trying to pull information from a large (~20 mil records, 48 colmuns) table, and I have a working method, but it's on the verge of what my machine can handle, and so I'm worried it won't scale up as I potentially need to expand the data.  What I currently use is:

 

Summary Table = SUMMARIZE(Big_Table, Big_Table[ID], "First Value",
     MINX(TOPN(1, Big Table, [Sort1], ASC, [Sort2], ASC, [Sort3], ASC), [Value]) )

 

So essentially, from records with the same ID, I need to pull a single value based on the first record after sorting by three different columns.  There are at most around 6000 records under one ID.  The machine can do one of these operations at a time, but if I try to do 2 (for example, pulling both first and last value) in the same Summarize, I get out of memory errors.  The machine has 16gb RAM, and I've checked to confirm it's actually capping out.  I am new to the system, so bear with me if this is a dumb question, but why is this a taxing operation in the first place?  I could understand a long processing time but not the huge memory load compared to the ~3gb size of the file.  Second question is - is there a more efficient way to accomplish what I'm trying to do?  I may need to do this on an even larger set of records, so any improvements that scale up are welcome.  The three sorting columns are a date, an alphanumeric, and an integer, and I would prefer to avoid adding another column unless absolutely necessary.

7 REPLIES 7
Super User IV
Super User IV

Re: Improving Summarize Efficiency

Generally any time you see a MINX it throws up warning flags around performance but in your case, you are only returning a 1 row table to it so it only has to iterate over 1 row. So, my guess is that it is all of the sorts you have going on. I wonder if it might be better for you to pull this in via another query and do what you are doing in DAX in M instead.


---------------------------------------

Not the Power BI thought police...

I have NEW book! 
DAX Cookbook from Packt
Over 120 DAX Recipes!
Did I answer your question? Mark my post as a solution!

Proud to be a Datanaut!

jambrose
Frequent Visitor

Re: Improving Summarize Efficiency

I'd be open to trying it that way, but I haven't so far been able to find a way to do the same process (pull a top value based on sorting 3 other columns) on the query end of things.  Any ideas?

dedelman_clng Memorable Member
Memorable Member

Re: Improving Summarize Efficiency

(CAVEAT: This is just an observation - I don't have much experience with SUMMARIZE and TOPN)

 

Have you tried doing a FILTER on BigTable inside the TOPN function?  It looks like the TOPN is looking over the whole table, not just the part of the table that has the ID you are looking for.  And if it's anything like SQL, the TOPN function has to do all kinds of distincts and interations in the execution plan, which is probably what is chewing up memory.

 

So something like this:

 

Summary Table =
SUMMARIZE (
    Big_Table,
    Big_Table[ID],
    "First Value", MINX (
        TOPN (
            1,
            FILTER ( Big_Table, Big_Table[ID] = EARLIER ( Big_Table[ID] ) ),            [Sort1], ASC,
            [Sort2], ASC,
            [Sort3], ASC
        ),
        [Value]
    )
)

If you get an error that there is no "EARLIER" to be referenced, try wrapping the MINX function in CALCULATE.

 

 

Hope this helps.

David

jambrose
Frequent Visitor

Re: Improving Summarize Efficiency

I gave this a shot, and it didn't appear to make a significant difference in the memory load or execution time.  My understanding is that having it nested inside the SUMMARIZE implicitly filters in some similar manner anyway.  Or at least, the results I get (from either version) find the same value as desired - the top value out of only the rows with the given ID.  Thanks for taking a stab though!

Microsoft v-huizhn-msft
Microsoft

Re: Improving Summarize Efficiency

Hi @jambrose,

The momery consumption is mainly by your large amount of resource table. While I guess memory load is not only caused by Power BI operation but also other threads in your local computer based on my understanding. Please check your computer's tasks and memory.

After search, there are some article about high performance Power BI and DAX, maybe it's helpful.

Optimizing DAX expressions involving multiple measures.
Performance Tip for Power BI; Enable Load Sucks Memory Up.
Power BI Performance Tips and Techniques

Best Regards,
Angelia

jambrose
Frequent Visitor

Re: Improving Summarize Efficiency

Well, I gave the query side a shot, and was able to accomplish the same task with the following code:

= Table.Group(#"Big_Table", {"ID"}, 
{{"First Value",
each Table.First(Table.Sort(_,
{{"Sort1", Order.Ascending}, {"Sort2", Order.Ascending}, {"Sort3", Order.Ascending}})
)[Value] }})

However, this still runs into memory load errors and fails to run on the full data set.  Not sure what else to try here.  Unless anyone else has ideas, I think I'll try moving to R for the processing steps and then import back to Power BI to generate reports.

 

Microsoft v-huizhn-msft
Microsoft

Re: Improving Summarize Efficiency

Hi @jambrose,

You use the SSMS connect your Power BI model. Then profilie Power BI Desktop Model using SQL Server Profiler using the following steps. More details, please review this article.

Thanks,
Angelia

Helpful resources

Announcements
New Ranks Launched March 24th!

New Ranks Launched March 24th!

The time has come: We are finally able to share more details on the brand-new ranks coming to the Power BI Community!

‘Better Together’ Contest Finalists Announced!

‘Better Together’ Contest Finalists Announced!

Congrats to the finalists of our ‘Better Together’-themed T-shirt design contest! Click for the top entries.

Arun 'Triple A' Event Video, Q&A, and Slides

Arun 'Triple A' Event Video, Q&A, and Slides

Missed the Arun 'Triple A' event or want to revisit it? We've got you covered! Check out the video, Q&A, and slides now.

Community Summit North America

Community Summit North America

Innovate, Collaborate, Grow. The top training and networking event across the globe for Microsoft Business Applications

Top Solution Authors
Top Kudoed Authors