jb257
Frequent Visitor

Handling huge amounts of data

Dear Community,

 

I would like to know how you would load the following data into Power Query and finally into Power BI. I am provided with eight different exports that cannot be changed. The exports share the same column names but contain different values, and the values have to be summed up after they have been loaded. Below is an overview of the data.

 

xlsx files     no. of columns     no. of rows
file 1         600                35 000
...            ...                ...
file 8         600                35 000

 

Unfortunately, the 15-minute time resolution must be retained, so the number of rows cannot be reduced. The same applies to the number of columns. The Excel files are stored on SharePoint and are only updated once a year.
The transformation of the files is not a problem in itself; the problem is the unacceptable waiting times when performing transformations such as group by.
My current solution is a Python script that takes over the transformations from Power Query.
But is there also a solution that I can implement using only Power Query and Power BI? Maybe by customizing the import mode of the Excel files or something similar?

 

I look forward to your thoughts

 

Best regards

 

jb

1 ACCEPTED SOLUTION
lbendlin
Super User

I ran into something similar recently. The lessons I learned were:

- Don't try to be cute and use List.Generate or something like that. Create eight (yes, eight) separate Power Query queries, one for each export. This will yield maximum possible performance (don't forget to set parallelism to 8 or more).

- Use Binary.Buffer when fetching the files from SharePoint. That tells the SharePoint server that you are impatient and not willing to wait for the chunking. (See the M sketch below.)

- Do as few transforms as possible. Maybe change column types, but that's it.

- Load all queries into Power BI and do a UNION calculated table there. (A DAX sketch follows the M example.)
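Not part of the original reply, but here is a minimal M sketch of what one of the eight per-file queries could look like, assuming the exports sit in a SharePoint document library and the first worksheet carries the headers; the site URL, file name, sheet name, and column name are placeholders:

```m
let
    // Hypothetical site URL – replace with your own SharePoint site
    Files    = SharePoint.Files("https://contoso.sharepoint.com/sites/reporting", [ApiVersion = 15]),
    // One query per export, so just pick the file by name (no List.Generate)
    Raw      = Files{[Name = "file 1.xlsx"]}[Content],
    // Binary.Buffer downloads the whole file in one go instead of waiting for chunked streaming
    Buffered = Binary.Buffer(Raw),
    Workbook = Excel.Workbook(Buffered, true),
    Data     = Workbook{[Item = "Sheet1", Kind = "Sheet"]}[Data],
    // Keep transforms to a minimum – only set the column types that matter
    Typed    = Table.TransformColumnTypes(Data, {{"Timestamp", type datetime}})
in
    Typed
```

Duplicating this query eight times (changing only the file name) keeps the evaluations independent, so they can load in parallel.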

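Likewise a hedged DAX sketch of the calculated table that stacks the eight loaded tables; the table and column names and the example measure are assumptions, and UNION matches columns by position, so all eight queries must return the same column layout:

```dax
-- Calculated table that appends the eight identically shaped query results
Combined =
UNION ( 'file 1', 'file 2', 'file 3', 'file 4', 'file 5', 'file 6', 'file 7', 'file 8' )

-- The summing can then be done with ordinary measures, for example:
Total Value = SUM ( Combined[Value001] )
```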

3 REPLIES
Wow, those are some interesting and new findings. Generally you would prefer to keep the data model in Power BI small and clear, but apparently the Power BI engine (measures and calculated tables) handles this much more effectively. Thank you very much!

It is important to note that this is a corner case. Usually you want code that is more concise and easier to maintain, even if it is a little slower. But in your scenario, where brute force is required, you have to compromise on code maintainability.
