Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Earn a 50% discount on the DP-600 certification exam by completing the Fabric 30 Days to Learn It challenge.

Reply
Shapiran
Helper I
Helper I

Merge queries is very slow

I'm trying to merge couple of CSV files.

CSV files size are 450MB and 4.6GB containing 570K rows and 14M rows respectivaly.

 

I'm trying to left outer join the big CSV with the smaller using 3 columns - 2 numbers and 1 text.

When trying to save the changes after the join the laptop hung in "Apply query changes", I waited over 2 hours, memory keep increaseing but wasn't able to save the changes.

 

Pbix file size prior to the join was 1.1GB.

I using a 32GB memory laptop.

 

I found a workaround:

Importing the CSV files to RDBMS, running the join via sql and exporting CSV file from the RDBMS is much faster (up to 20 minutes).

 

I don't want to add another data source and would like to resolve this within the scope of Power BI.

 

Can you please advice what's the best practice to merge large datasets?

 

Thanks

8 REPLIES 8
aswin_k
Frequent Visitor

Hello Shapiran, It is always better to avoid loading data into your Power BI Desktop. If possible, go for Direct Query Method to handle data(There are some limitations in functions). 
The following can be very helpful for you,
https://docs.microsoft.com/en-us/power-bi/power-bi-reports-performance
Regards,
Aswin

Thanks Aswin,

 

Once the data is in Power BI I get great performance, better than working with RDBMS Direct Query.

That's why I'm seeking for solution only for the merge (join).

On top of that, the join will pass the 1M rows limitation of Direct Query.

 

Is there is a better solution than simply merge the tables (model change in Power BI, creating another column...)?

 

Thanks for the performance link, will be very helpful.

 

Thanks,

Nir

Do you actually need to merge them? I ask because sometimes physical merging may not be needed. You may be able to import both tables into Power Query, then once loaded, create a Left Outer relationship between them in the Relationship view. What's the driving reason for the merge?

 

This also assumes you do not have null values in the key fields. 

 

Thanks!

--Treb 

 

Data Platform MVP

https://marqueeinsights.com/tag/power-bi/ for my Power BI blog posts

Thanks Treb,

There is a bussiness requirement forcing the join of both CSV files.

Is creating the join via relationship view faster than using merge queires option?

Thanks,

Nir

 

Yes it should be as everything is being joined in memory. I very rarely combine data using a merge. Mostly I'm doing appends of multiple files into a singular dataset.

 

Once the relationship is made between the two, you can use columns from both tables easily. It'll show up as two separate datasets in the field well. Is this a one to one or one to many relationship?

 

Thanks!

--Treb

 

Hi Treb,

 

This is one to many relationship.

 

Thanks,

Nir

Then yes, I'd try importing both and using the relationship between the two to give access to everything. 

 

Lastly, ensure your key fields have a value, otherwise you'll get a many to many relationship.

 

Hope this helps!

--Treb

 

 

Thanks Treb

Helpful resources

Announcements
PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.