cancel
Showing results for 
Search instead for 
Did you mean: 
Reply
aseagull
Helper II
Helper II

Why does Power BI "Load" much more data (volume) than what's in my source file?

I'm trying to pin down some performance issues in Power Query. One of the quirks I've noticed is that even though my largest file is only 60k records and 10 columns (mostly text in a csv), and sits at under 5MB on disk, when either PQ or PBI is referencing it, the system reports loading MORE THAN 500 MB of data from the file. I see this whether it's in the lower right corner of PQ, or when PBI is loading the model, or even in PQ when I open a filter and click "Load More".

 

Fortunately, I see in Dax Studio / Vertipaq that the loaded data is not that large in memory.

 

Is this something to worry about, or to ignore? 

 

Thanks,

Amon

1 ACCEPTED SOLUTION

Hey @aseagull ,

 

I wouldn't call it a mistake, it's the price you have to pay for sourcing files, and not a relational database.

 

On the one hand you can easily load and transform (implementing your business logic) your data, on the other hand sourcing files comes with the price that this datasource type does not support query folding.

 

Chris Webb has written a lot about optimizing query performance using files as a data source. Consider the following link as a start: Chris Webb's BI Blog: Comparing The Performance Of CSV And Excel Data Sources In Power Query Chris W...

 

You might also considering using the M function Table.Buffer Table.Buffer - PowerQuery M | Microsoft Docs This can reduce the number of re-reading the base file.

 

Next to that, you also have to consider where your files are located, either on a file server or inside a SharePoint library. A file server requires an on-premises gateway if you want to automate the process. If the files are located inside a SharePoint online library a gateway is not necessary.

 

Hopefully, this provides some new ideas for tackling your challenge.

 

Regards,

Tom 



Did I answer your question? Mark my post as a solution, this will help others!

Proud to be a Super User!
I accept Kudos 😉
Hamburg, Germany

View solution in original post

5 REPLIES 5
TomMartens
Super User II
Super User II

Perfect!



Did I answer your question? Mark my post as a solution, this will help others!

Proud to be a Super User!
I accept Kudos 😉
Hamburg, Germany
HotChilli
Super User II
Super User II

It will depend on the number of times a source file (query) is duplicated or referenced and also the number of query transformations.  The same file can be loaded multiple times.  It happens all again on a refresh (unless incremental refresh is used)

Is it a problem? Probably not unless it's taking too long or crashing the PC.

---

Once the data is compressed and saved in a pbix , it is sometimes amazing how small the file can be.

------

When visualisations are created with measures is when the powerbi engine starts to work and use memory / processor on the PC

Wow, so if query A loads a file, then query B references A, it will load the file *twice*? That behavior would explain it, for sure. I've got a query that's 8-deep from the source query, plus it feeds from other queries that go back to that source query. I think I've counted 24 distinct paths from my "last query" back to the source file, and I'm not done with my logic yet.

 

So is my mistake thinking that I should use PQ to implement all this business logic? I'm working off a simplistic reporting tool at the moment, which doesn't give me much space to do the logic. 

 

Thanks!
Amon

Hey @aseagull ,

 

I wouldn't call it a mistake, it's the price you have to pay for sourcing files, and not a relational database.

 

On the one hand you can easily load and transform (implementing your business logic) your data, on the other hand sourcing files comes with the price that this datasource type does not support query folding.

 

Chris Webb has written a lot about optimizing query performance using files as a data source. Consider the following link as a start: Chris Webb's BI Blog: Comparing The Performance Of CSV And Excel Data Sources In Power Query Chris W...

 

You might also considering using the M function Table.Buffer Table.Buffer - PowerQuery M | Microsoft Docs This can reduce the number of re-reading the base file.

 

Next to that, you also have to consider where your files are located, either on a file server or inside a SharePoint library. A file server requires an on-premises gateway if you want to automate the process. If the files are located inside a SharePoint online library a gateway is not necessary.

 

Hopefully, this provides some new ideas for tackling your challenge.

 

Regards,

Tom 



Did I answer your question? Mark my post as a solution, this will help others!

Proud to be a Super User!
I accept Kudos 😉
Hamburg, Germany

View solution in original post

Tom, indeed Table.Buffer() was it!!! Takes a 1.5h refresh down to 2m. So glad to learn about it!

Helpful resources

Announcements
PBI_User Group Leader_768x460.jpg

Manage your user group events

Check out the News & Announcements to learn more.

Get Ready for Power BI Dev Camp

Microsoft named a Leader in The Forrester Wave

Microsoft received the highest score of any vendor in both the strategy and current offering categories.

Get Ready for Power BI Dev Camp

Power BI Dev Camp - September 30th, 2021

Mark your calendars and join us for our next Power BI Dev Camp!

PowerPlatform 768x460.png

Microsoft Learn

Check out our new Discover Your Career Path blog post series and get all the details.

Top Solution Authors