Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Earn the coveted Fabric Analytics Engineer certification. 100% off your exam for a limited time only!

Reply
FireFighter1017
Advocate II
Advocate II

Merge then aggregate takes a lot longer than merge, expand then group by

Hi,

 

I've been struggling with the merge tables and aggregate feature.

It takes a lot of time and memory.  Sometimes, it takes a file that is only a few KB and when aggregating, it turns into a GB storage.

While when I merge, expand, then group by, I know the results will be exactly what I expect and it takes seconds to execute both steps.

Am I missing somthing or is it normal for Power Query to be so ressource hungry when it comes to merge and aggregate?

 

Thanks

 

5 REPLIES 5
watkinnc
Super User
Super User

Hey there. The main man Chris Webb talks about this. Let’s say you have your smaller table, and you nested join it to your bigger table. After the merge, but BEFORE expanding the table column, select the column in the current table with unique values and right click-select “Remove Duplicates”. Then expand your table column with your aggregations. This stops all of the unnecessary table scans. Try it out!—Nate

I’m usually answering from my phone, which means the results are visualized only in my mind. You’ll need to use my answer to know that it works—but it will work!!

Hi watkinnc

 

Thanks for your reply.

I have read Chris Webb's blog and tried everything he says in there and I couldn't get it to perform as expected.

I'm dealing with one-to-many relations, where the lhs table has only unique values in the key field.

 

Have I said already that this is unacceptable in 2020 to have such impediments in a market leading product which can be surpassed by Excel's Pivot tables?  I bet even a VbA macro would perform better!  😋

 

So to make it clear for everyone, I'm not looking for a workaround, I'm hoping to get a fix from Microsoft. 

I would even help with the testing if I could only find the time to rebuild the dataset that was showing these awfull performance issues.

 

But thanks for passing by! 😎

 

dax
Community Support
Community Support

Hi @FireFighter1017 , 

If you want to improve performance, you could try @edhans 's suggestions to use table.buffer, you also could refer to speedperformance-aspects  and improving-the-performance-of-aggregation-after-a-merge-in-power-bi-and-excel-power-query-gettransfor...  for details.

In addition, if you also could try to create relationship and use measures to see whether you could achieve your goal without merging.

Best Regards,
Zoe Zhi

If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

 

edhans
Super User
Super User

It depends on your data. I've merged and aggregated millions of records quickly, and I've see it take forever. Depending on what you are doing, sometimes wrapping a step before the aggregation with Table.Buffer() will help. Again, depends on the data and what other transformations your queries are doing. 



Did I answer your question? Mark my post as a solution!
Did my answers help arrive at a solution? Give it a kudos by clicking the Thumbs Up!

DAX is for Analysis. Power Query is for Data Modeling


Proud to be a Super User!

MCSA: BI Reporting

PowerBi is supposed to be a self-service tool.

R or Python can do a better job, performance wise, than PowerBI in terms of data management.

 

I'm sorry EdHans, but I have tried your solution in multiple occasions and sometimes it works, and sometimes PowerBI si just not able to deal with his own complexity.

 

It is a possible workaround.  Users shouldn't have to use undocumented functions like Table.Buffer().

I don't know what it implies, I don't know what are the consequences of using this function.  If Microsoft finds it acceptable to use this function before aggregating merged tables, then they should implement it as implicit.

 

As for adding primary keys to tables, (DAX reply) it did not improve performance for me.  As for removing duplicates, I had already made the join field unique so there was no improvement doing that either.

 

Helpful resources

Announcements
April AMA free

Microsoft Fabric AMA Livestream

Join us Tuesday, April 09, 9:00 – 10:00 AM PST for a live, expert-led Q&A session on all things Microsoft Fabric!

March Fabric Community Update

Fabric Community Update - March 2024

Find out what's new and trending in the Fabric Community.

Top Solution Authors
Top Kudoed Authors