cancel
Showing results for 
Search instead for 
Did you mean: 
Reply
nirmit27
Helper I
Helper I

Merging Sharepoint file with Database EXTREMELY SLOW - reads CRORES of Rows with Close and Apply

Hello,

For required calculations, I am merging a query 6 times with 3 other queries (each query merged twice with different keys) and adding conditional columns after every two merges.

In Power Query Editor it all works pretty nicely. I can see the correct results. But after I click "Close & Apply", I can see it just keeps on running for over 1.5 Hour and loading about whopping 50+ CRORES of Rows and still continuing!! I eventually had to "cancel" the load to fine tune my query merge steps. Tried Table.Buffer but that did not change the behaviour at all. I can see the "View Native Query" is grayed out right from the first step so probably no folding.

 

Questions:

1. Why is it attempting to load more than 10,000 times the number of records in source table (50+ crores vs 50k)? Is it internally loading up the entire table of the right side Query for each row of the left side Query during the merge operation?

2. How can I fix this problem so that my model loads normally as it should (2-3 mins should be okay)

 

Please see steps below: 

  1. Query1 - Append: Combining four other queries each sourcing separate excel file from Sharepoint folder, total 50000 records
  2. Query2 - Merge key1 & Expand 2 columns: Table from Amazon Aurora PostgreSQL database, 70000 records
  3. Query2 - Merge key2 & Expand 2 columns: Table from Amazon Aurora PostgreSQL database, 70000 records
  4. Add conditional column
  5. Query3 - Merge key1 & Expand 2 columns: Sharepoint excel file, 50000 records
  6. Query3 - Merge key2 & Expand 2 columns: Sharepoint excel file, 50000 records
  7. Add conditional column
  8. Query4 - Merge key1 & Expand 2 columns: Sharepoint csv file, 25000 records
  9. Query4 - Merge key2 & Expand 2 columns: Sharepoint csv file, 25000 record
  10. Add conditional columns
  11. Remove other columns

Any guidance would be deeply appreciated, thanks.

Regards, Nirmit

1 ACCEPTED SOLUTION
nirmit27
Helper I
Helper I

Hi @v-polly-msft 

Thanks for response. My connection mode is all "Import". I have actually solved this problem. Explaining it below for everyone's benefit if at all required.

 

The issue was due to data quality and not due to merging different source types. The culprit were "null" values. After detailed investigation, I found out that my both the left and right tables join columns/keys contained "null" values (not defined as primary keys). It may happen data-wise. But when you merge such datasets, each "null" record in Left table will basically fetch all the matching "null" records from Right table (which is how left outer join works anyway), latter may run into thousands or more. This results into cartesian product of resulting number of rows and as a result, inflation of output data. Of course processing will take time depending upon data volume. In my case there were hundreds and thousands of null records in Left and Right table respectively, hence the slowness and huge records count. After I handled null (and duplicates) from both tables, it started working very normally, loading up in 2-3 minutes. Hope it helps.

 

-Nirmit

View solution in original post

2 REPLIES 2
nirmit27
Helper I
Helper I

Hi @v-polly-msft 

Thanks for response. My connection mode is all "Import". I have actually solved this problem. Explaining it below for everyone's benefit if at all required.

 

The issue was due to data quality and not due to merging different source types. The culprit were "null" values. After detailed investigation, I found out that my both the left and right tables join columns/keys contained "null" values (not defined as primary keys). It may happen data-wise. But when you merge such datasets, each "null" record in Left table will basically fetch all the matching "null" records from Right table (which is how left outer join works anyway), latter may run into thousands or more. This results into cartesian product of resulting number of rows and as a result, inflation of output data. Of course processing will take time depending upon data volume. In my case there were hundreds and thousands of null records in Left and Right table respectively, hence the slowness and huge records count. After I handled null (and duplicates) from both tables, it started working very normally, loading up in 2-3 minutes. Hope it helps.

 

-Nirmit

View solution in original post

v-polly-msft
Community Support
Community Support

Hi @nirmit27 ,

What is the connection mode? Please try to use composite models in Power BI Desktop.

Please refer to the following document to see if it helps you.

https://docs.microsoft.com/power-bi/transform-model/desktop-composite-models

 

Best Regards

Community Support Team _ Polly

 

If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

Helpful resources

Announcements
2022 Release Wave 1 760x460.png

2022 Release Wave 1 Plan

Power Platform release plan for the 2022 release wave 1 describes all new features releasing from April 2022 through September 2022.

Power BI December 2021 Update_carousel 768x460.jpg

Check it Out!

Click here to read more about the December 2021 Updates!

Jan 2022 Dev Camp 768x460 copy.png

Power BI Dev Camp- January 27th, 2022

Mark your calendars and join us for our next Power BI Dev Camp!

Top Solution Authors