Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
Anonymous
Not applicable

Merging Sharepoint file with Database EXTREMELY SLOW - reads CRORES of Rows with Close and Apply

Hello,

For required calculations, I am merging a query 6 times with 3 other queries (each query merged twice with different keys) and adding conditional columns after every two merges.

In Power Query Editor it all works pretty nicely. I can see the correct results. But after I click "Close & Apply", I can see it just keeps on running for over 1.5 Hour and loading about whopping 50+ CRORES of Rows and still continuing!! I eventually had to "cancel" the load to fine tune my query merge steps. Tried Table.Buffer but that did not change the behaviour at all. I can see the "View Native Query" is grayed out right from the first step so probably no folding.

 

Questions:

1. Why is it attempting to load more than 10,000 times the number of records in source table (50+ crores vs 50k)? Is it internally loading up the entire table of the right side Query for each row of the left side Query during the merge operation?

2. How can I fix this problem so that my model loads normally as it should (2-3 mins should be okay)

 

Please see steps below: 

  1. Query1 - Append: Combining four other queries each sourcing separate excel file from Sharepoint folder, total 50000 records
  2. Query2 - Merge key1 & Expand 2 columns: Table from Amazon Aurora PostgreSQL database, 70000 records
  3. Query2 - Merge key2 & Expand 2 columns: Table from Amazon Aurora PostgreSQL database, 70000 records
  4. Add conditional column
  5. Query3 - Merge key1 & Expand 2 columns: Sharepoint excel file, 50000 records
  6. Query3 - Merge key2 & Expand 2 columns: Sharepoint excel file, 50000 records
  7. Add conditional column
  8. Query4 - Merge key1 & Expand 2 columns: Sharepoint csv file, 25000 records
  9. Query4 - Merge key2 & Expand 2 columns: Sharepoint csv file, 25000 record
  10. Add conditional columns
  11. Remove other columns

Any guidance would be deeply appreciated, thanks.

Regards, Nirmit

1 ACCEPTED SOLUTION
Anonymous
Not applicable

Hi @v-rongtiep-msft 

Thanks for response. My connection mode is all "Import". I have actually solved this problem. Explaining it below for everyone's benefit if at all required.

 

The issue was due to data quality and not due to merging different source types. The culprit were "null" values. After detailed investigation, I found out that my both the left and right tables join columns/keys contained "null" values (not defined as primary keys). It may happen data-wise. But when you merge such datasets, each "null" record in Left table will basically fetch all the matching "null" records from Right table (which is how left outer join works anyway), latter may run into thousands or more. This results into cartesian product of resulting number of rows and as a result, inflation of output data. Of course processing will take time depending upon data volume. In my case there were hundreds and thousands of null records in Left and Right table respectively, hence the slowness and huge records count. After I handled null (and duplicates) from both tables, it started working very normally, loading up in 2-3 minutes. Hope it helps.

 

-Nirmit

View solution in original post

2 REPLIES 2
Anonymous
Not applicable

Hi @v-rongtiep-msft 

Thanks for response. My connection mode is all "Import". I have actually solved this problem. Explaining it below for everyone's benefit if at all required.

 

The issue was due to data quality and not due to merging different source types. The culprit were "null" values. After detailed investigation, I found out that my both the left and right tables join columns/keys contained "null" values (not defined as primary keys). It may happen data-wise. But when you merge such datasets, each "null" record in Left table will basically fetch all the matching "null" records from Right table (which is how left outer join works anyway), latter may run into thousands or more. This results into cartesian product of resulting number of rows and as a result, inflation of output data. Of course processing will take time depending upon data volume. In my case there were hundreds and thousands of null records in Left and Right table respectively, hence the slowness and huge records count. After I handled null (and duplicates) from both tables, it started working very normally, loading up in 2-3 minutes. Hope it helps.

 

-Nirmit

v-rongtiep-msft
Community Support
Community Support

Hi @Anonymous ,

What is the connection mode? Please try to use composite models in Power BI Desktop.

Please refer to the following document to see if it helps you.

https://docs.microsoft.com/power-bi/transform-model/desktop-composite-models

 

Best Regards

Community Support Team _ Polly

 

If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.