jaykilleen (Regular Visitor)

How to Improve Query Reference performance for large tables

I have read the Tips for Creating Reports and it says:

 

"When you reference a query you only load the source data once regardless of the number of queries that reference the initial query. "

 

This doesn't seem to be the case on my machine. I have one base query named Sales that is a fact table from SQL Server.

 

Sales (~7 million rows of data):
RegionID, ProductID, CustomerID, RepID, DateID, NetRevenue, Quantity, Margin

 

and then 3 queries that reference Sales and group by combinations of the different IDs:

SalesByRep, SalesByCustomer, SalesByProduct

 

All these queries do is reference Sales, group the results and sum the revenue, margin and quantity fields.
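For example, SalesByRep is essentially just this (a sketch; the column names are the ones from my Sales table above):

// SalesByRep (sketch): reference the Sales query, group by RepID,
// and sum the three measure columns
let
    Source = Sales,
    Grouped = Table.Group(
        Source,
        {"RepID"},
        {
            {"NetRevenue", each List.Sum([NetRevenue]), type number},
            {"Quantity", each List.Sum([Quantity]), type number},
            {"Margin", each List.Sum([Margin]), type number}
        }
    )
in
    Grouped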

 

When I refresh each of these queries, I watch Power BI spin for a good 15 minutes and retrieve around 7M records from SQL Server.

 

This doesn't appear to align with what the tips are saying. Has anyone experienced this, and do you have any further tips to try to improve this performance? I have also set the Sales query's 'Load to Report' and 'Enable Refresh of this Query' options to FALSE.

 

 

Is this something I can improve with Table.Buffer or DirectQuery? Any ideas would be greatly appreciated.

 


22 Replies

ImkeF (Super User III)

Re: How to Improve Query Reference performance for large tables

Have you actually tested that the refresh with one referencing query is substantially faster than with 3 (say: 5-7 minutes)?

 

 Do you load your fact table to the model or just the referencing queries?

 

Do your queries contain custom SQL-statements? They would break query folding: https://www.mssqltips.com/sqlservertip/3635/query-folding-in-power-query-to-improve-performance/
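For illustration (hypothetical server and database names): a native query like the first step below breaks folding for everything after it, while navigating to the table keeps folding available:

// Breaks folding: a hand-written native query is opaque to the engine,
// so every subsequent step runs locally
Source = Sql.Database("myServer", "myDb", [Query = "SELECT * FROM dbo.Sales"])

// Keeps folding: navigating to the table lets later steps translate to SQL
Source = Sql.Database("myServer", "myDb"){[Schema = "dbo", Item = "Sales"]}[Data]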

 

 

 

Imke Feldmann




pqian (Microsoft) - Accepted Solution

Re: How to Improve Query Reference performance for large tables

@jaykilleen I see you are aware of things like Table.Buffer, so you are a pretty advanced Power Query user.

 

I can go into a little detail about your question. The answer up front: the tip is somewhat true. The rule to remember here is that each query, when loaded to the report, will always be evaluated in isolation. The queries may even be evaluated by different processes, and thus do not form a dependency tree.

 

For example, say you have a base query A, and then two new queries B and C referencing A. A isn't loaded to the report but B and C are. When you hit the Apply button, B and C will simultaneously start loading to the report. In both evaluations, A will be evaluated separately, because B isn't aware of C.

 

Now with this model in mind, you could argue the tip isn't true at all. However, Power Query keeps an on-disk cache of data seen by evaluations. So if you are within the same cache session and pull on A multiple times, you essentially only pay for it the first time (unless, of course, you are pulling on different pages of A). This cache ONLY applies to raw data coming from the data source; any additional transformations still need to be performed on top of it.

 

Finally, when you load queries to the report, you always want the latest data, so each loading session is essentially a new cache session. Therefore, the tip is somewhat true here: you only pay for the data coming from data source A ONCE per load of all the queries. (Interestingly, for most data sources this is true for duplicates of A as well.) However, you will pay for the transformations on top of A N times, where N = the number of queries being loaded.

 

With this knowledge in mind, you can see that Table.Buffer doesn't really help you either. What it does is guarantee a stable output within a single query (e.g., between steps of a let expression). Essentially you declare a point in your data from which all further transformations will be done on the local copy, versus folded to the remote source.
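As a rough sketch (hypothetical source names): everything above the Table.Buffer call can still fold to the server, while everything below it runs against the local copy:

let
    Source = Sql.Database("myServer", "myDb"),                      // hypothetical names
    SalesTable = Source{[Schema = "dbo", Item = "Sales"]}[Data],    // still folds
    Buffered = Table.Buffer(SalesTable),                            // folding stops here; table is held locally
    Grouped = Table.Group(Buffered, {"RepID"},
        {{"Revenue", each List.Sum([NetRevenue]), type number}})    // performed on the local copy
in
    Grouped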

 

Finally, yes, DirectQuery will most certainly help with the performance here. With DQ you always operate on the remote data source and never keep a local copy of the data, so there isn't a "loading" phase. You pay for the data as the visuals need it (and by then it's mostly aggregated).

 

Does this make sense?

 

Regards,

PQ


ImkeF (Super User III)

Re: How to Improve Query Reference performance for large tables

@pqian thank you so much for this explanation - very much appreciated. Especially as there doesn't seem to be much about this available at the moment - or could you give more guidance on where to find info like this?

 

Is my understanding correct that there wouldn't be a cache to share if B & C fold back to the server?

 

Say query A only addresses the whole table on the server, and query B filters for P&L accounts, for example. B would fold back to the server, so only the accounts selected in B are returned from the server.
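In M, B would be something like this sketch (assuming a hypothetical AccountType column):

let
    Source = A,
    PnLAccounts = Table.SelectRows(Source, each [AccountType] = "P&L")  // hypothetical column; folds into the WHERE clause
in
    PnLAccounts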

 

If query C now selects different accounts from A (Balance Sheet), only those would be returned.

 

Hence: your example of A providing a cache to share for B and C would only work if query folding were broken before B & C are executed.

 

So in cases where DirectQuery isn't an option or becomes too slow for the user (when there are interactions on the reports that would trigger expensive calculations during load), Table.Buffer could actually make sense: namely in those cases where the alternative of multiple folding (sub-)queries is slower than using Table.Buffer as a folding stop at the end of query A?

Admittedly, this might not be the case very often.

Imke Feldmann




pqian (Microsoft)

Re: How to Improve Query Reference performance for large tables

@ImkeF Your observation is correct. Essentially, folding the query differently to the server could potentially break the cache sharing. Whether Table.Buffer makes sense in this case is still unclear to me. In the example you gave, if filtering in B and C reduces the data size dramatically, then it may still be better to push both queries to the server and load the data separately.

 

We are touching on implementation details of Power Query/Power BI Desktop. These are not a public contract and should only serve as guidelines for how optimization should be carried out. So they aren't documented anywhere, although some experimentation should quickly reveal what's going on behind the scenes.

jaykilleen (Regular Visitor)

Re: How to Improve Query Reference performance for large tables

@pqian Thanks for all this detail. I think I will experiment more with DirectQuery and also just the way I build up my queries.

I think I am in a position where I should be utilising DAX in the Report View instead of creating so many queries for basic things like this.

 

This should also help keep my cache low (it is currently not difficult to get close to the 4GB limit, which I know I can increase, but I also run VMs on my machine and they grind to a halt when there is not enough spare memory).

jaykilleen (Regular Visitor)

Re: How to Improve Query Reference performance for large tables

@ImkeF thanks for your thoughts on this one. I hadn't actually done too much testing. I wasn't loading my fact table to the model. I was trying the approach suggested on one of the blogs out there (just tried to find it but couldn't; I'll come back and link to it when I run across it again) where you keep your data in a staging query and then reference it in subsequent queries. Sort of bring everything in and build on top. It helps to separate the business logic from the raw data. But it limits my ability to just load the data into the report and let Power BI manage the relationships.

 

Will need to have a bit more of a think about how I manage my data structure and queries in the Editor 😕

dwang953 (Visitor)

Re: How to Improve Query Reference performance for large tables

As a follow-up to this, I am currently facing the same problem while trying to implement dynamic date filters on my data. Currently, I am creating several queries on the same dataset to create "Last 24 Hours" and "Last 7 Days" views. However, I am worried that this will cause refresh times to increase significantly once my tables get close to ~5 million rows.

 

My main question is: if I filter my dataset to only show the last 24 hours of data, will this decrease the load time compared to loading the entire dataset, or does the query still load the whole dataset and then filter? If it filters on the database side, I don't expect to see more than 20,000 records per day per data table, so should I be alright?

ImkeF (Super User III)

Re: How to Improve Query Reference performance for large tables

Yes, your date filters seem to make sense, and if applied correctly they should fold back to the server and thereby significantly reduce loading time. You can also check that using the profiler and analyzing the queries that are sent to the SQL server.
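For example, a filter along these lines (hypothetical table and column names) should typically fold into a WHERE clause, so only the matching rows are transferred:

let
    Source = Sql.Database("myServer", "myDb"){[Schema = "dbo", Item = "Events"]}[Data],  // hypothetical
    Cutoff = DateTime.LocalNow() - #duration(1, 0, 0, 0),                                // 24 hours ago
    Last24Hours = Table.SelectRows(Source, each [EventDateTime] >= Cutoff)               // folds to a WHERE clause
in
    Last24Hours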

 

The discussion here is about whether some caching will take place in order to speed up the queries. In short: it won't, and it would not make sense in your case, because the performance improvement that comes from transferring only your very limited amount of data from the server would be offset by using a cache here.

 

So you have 2 independent queries where each should be fast enough.

 

 

Imke Feldmann




healthEteam (Regular Visitor)

Re: How to Improve Query Reference performance for large tables

Just a question as I am learning myself.

 

In these examples, where there are multiple queries that are basically refiltered versions of the original query, is it not more efficient to bring only the main query in once and then create the sub-queries within the .pbix model as separate calculated tables that filter the main table?

 

I have a similar situation: I bring in one query of a fact table, and then have created 2 other calculated tables that are similar to your filtered queries above.

 

This way there is only one query that is run vs. 3.
