Find everything you need to get certified on Fabric—skills challenges, live sessions, exam prep, role guidance, and more.
Get startedGrow your Fabric skills and prepare for the DP-600 certification exam by completing the latest Microsoft Fabric challenge.
I'm trying to merge couple of CSV files.
CSV files size are 450MB and 4.6GB containing 570K rows and 14M rows respectivaly.
I'm trying to left outer join the big CSV with the smaller using 3 columns - 2 numbers and 1 text.
When trying to save the changes after the join the laptop hung in "Apply query changes", I waited over 2 hours, memory keep increaseing but wasn't able to save the changes.
Pbix file size prior to the join was 1.1GB.
I using a 32GB memory laptop.
I found a workaround:
Importing the CSV files to RDBMS, running the join via sql and exporting CSV file from the RDBMS is much faster (up to 20 minutes).
I don't want to add another data source and would like to resolve this within the scope of Power BI.
Can you please advice what's the best practice to merge large datasets?
Thanks
Hello Shapiran, It is always better to avoid loading data into your Power BI Desktop. If possible, go for Direct Query Method to handle data(There are some limitations in functions).
The following can be very helpful for you,
https://docs.microsoft.com/en-us/power-bi/power-bi-reports-performance
Regards,
Aswin
Thanks Aswin,
Once the data is in Power BI I get great performance, better than working with RDBMS Direct Query.
That's why I'm seeking for solution only for the merge (join).
On top of that, the join will pass the 1M rows limitation of Direct Query.
Is there is a better solution than simply merge the tables (model change in Power BI, creating another column...)?
Thanks for the performance link, will be very helpful.
Thanks,
Nir
Do you actually need to merge them? I ask because sometimes physical merging may not be needed. You may be able to import both tables into Power Query, then once loaded, create a Left Outer relationship between them in the Relationship view. What's the driving reason for the merge?
This also assumes you do not have null values in the key fields.
Thanks!
--Treb
Data Platform MVP
https://marqueeinsights.com/tag/power-bi/ for my Power BI blog posts
Thanks Treb,
There is a bussiness requirement forcing the join of both CSV files.
Is creating the join via relationship view faster than using merge queires option?
Thanks,
Nir
Yes it should be as everything is being joined in memory. I very rarely combine data using a merge. Mostly I'm doing appends of multiple files into a singular dataset.
Once the relationship is made between the two, you can use columns from both tables easily. It'll show up as two separate datasets in the field well. Is this a one to one or one to many relationship?
Thanks!
--Treb
Hi Treb,
This is one to many relationship.
Thanks,
Nir
Then yes, I'd try importing both and using the relationship between the two to give access to everything.
Lastly, ensure your key fields have a value, otherwise you'll get a many to many relationship.
Hope this helps!
--Treb
Thanks Treb
User | Count |
---|---|
89 | |
85 | |
66 | |
62 | |
57 |
User | Count |
---|---|
138 | |
112 | |
94 | |
82 | |
71 |