Solved: Why does Power BI "Load" much more data (volume) t...

aseagull · ‎06-10-2021

I'm trying to pin down some performance issues in Power Query. One of the quirks I've noticed is that even though my largest file is only 60k records and 10 columns (mostly text in a csv), and sits at under 5MB on disk, when either PQ or PBI is referencing it, the system reports loading MORE THAN 500 MB of data from the file. I see this whether it's in the lower right corner of PQ, or when PBI is loading the model, or even in PQ when I open a filter and click "Load More".

Fortunately, I see in Dax Studio / Vertipaq that the loaded data is not that large in memory.

Is this something to worry about, or to ignore?

Thanks,

Amon

TomMartens · ‎06-11-2021

Hey @aseagull ,

I wouldn't call it a mistake, it's the price you have to pay for sourcing files, and not a relational database.

On the one hand you can easily load and transform (implementing your business logic) your data, on the other hand sourcing files comes with the price that this datasource type does not support query folding.

Chris Webb has written a lot about optimizing query performance using files as a data source. Consider the following link as a start: Chris Webb's BI Blog: Comparing The Performance Of CSV And Excel Data Sources In Power Query Chris W...

You might also considering using the M function Table.Buffer Table.Buffer - PowerQuery M | Microsoft Docs This can reduce the number of re-reading the base file.

Next to that, you also have to consider where your files are located, either on a file server or inside a SharePoint library. A file server requires an on-premises gateway if you want to automate the process. If the files are located inside a SharePoint online library a gateway is not necessary.

Hopefully, this provides some new ideas for tackling your challenge.

Regards,

Tom

Did I answer your question? Mark my post as a solution, this will help others!

Proud to be a Super User!
I accept Kudos 😉
Hamburg, Germany

View solution in original post

TomMartens · ‎06-11-2021

Perfect!

Did I answer your question? Mark my post as a solution, this will help others!

Proud to be a Super User!
I accept Kudos 😉
Hamburg, Germany

HotChilli · ‎06-10-2021

It will depend on the number of times a source file (query) is duplicated or referenced and also the number of query transformations. The same file can be loaded multiple times. It happens all again on a refresh (unless incremental refresh is used)

Is it a problem? Probably not unless it's taking too long or crashing the PC.

---

Once the data is compressed and saved in a pbix , it is sometimes amazing how small the file can be.

------

When visualisations are created with measures is when the powerbi engine starts to work and use memory / processor on the PC

aseagull · ‎06-11-2021

Wow, so if query A loads a file, then query B references A, it will load the file *twice*? That behavior would explain it, for sure. I've got a query that's 8-deep from the source query, plus it feeds from other queries that go back to that source query. I think I've counted 24 distinct paths from my "last query" back to the source file, and I'm not done with my logic yet.

So is my mistake thinking that I should use PQ to implement all this business logic? I'm working off a simplistic reporting tool at the moment, which doesn't give me much space to do the logic.

Thanks!
Amon

TomMartens · ‎06-11-2021

Hey @aseagull ,

I wouldn't call it a mistake, it's the price you have to pay for sourcing files, and not a relational database.

On the one hand you can easily load and transform (implementing your business logic) your data, on the other hand sourcing files comes with the price that this datasource type does not support query folding.

Chris Webb has written a lot about optimizing query performance using files as a data source. Consider the following link as a start: Chris Webb's BI Blog: Comparing The Performance Of CSV And Excel Data Sources In Power Query Chris W...

You might also considering using the M function Table.Buffer Table.Buffer - PowerQuery M | Microsoft Docs This can reduce the number of re-reading the base file.

Next to that, you also have to consider where your files are located, either on a file server or inside a SharePoint library. A file server requires an on-premises gateway if you want to automate the process. If the files are located inside a SharePoint online library a gateway is not necessary.

Hopefully, this provides some new ideas for tackling your challenge.

Regards,

Tom

Did I answer your question? Mark my post as a solution, this will help others!

Proud to be a Super User!
I accept Kudos 😉
Hamburg, Germany