
jaykilleen
Advocate II

How to Improve Query Reference performance for large tables

I have read the Tips for Creating Reports and it says:

 

"When you reference a query you only load the source data once regardless of the number of queries that reference the initial query. "

 

This doesn't seem to be the case on my machine. I have one base query named Sales, which is a Fact table from SQL Server.

 

Sales (~7 million rows of data)

RegionID

ProductID

CustomerID

RepID

DateID

NetRevenue

Quantity

Margin

 

and then 3 queries that reference Sales and group by combinations of different IDs

 

SalesByRep

SalesByCustomer

SalesByProduct

 

All these queries do is reference Sales, group the results and sum the revenue, margin and quantity fields.
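In M, each of these referencing queries is essentially the following (query and column names taken from above; the exact steps are illustrative):

```
// SalesByRep: reference the Sales query, group by RepID, sum the measures
let
    Source = Sales,
    Grouped = Table.Group(
        Source,
        {"RepID"},
        {
            {"NetRevenue", each List.Sum([NetRevenue]), type number},
            {"Margin",     each List.Sum([Margin]),     type number},
            {"Quantity",   each List.Sum([Quantity]),   type number}
        }
    )
in
    Grouped
```

SalesByCustomer and SalesByProduct are the same apart from the grouping column.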

 

When I refresh each of these queries I watch Power BI spin for a good 15 minutes and retrieve around 7M records from SQL Server.

 

This doesn't appear to align with what the tips are saying. Has anyone experienced this, and do you have any further tips to improve this performance? I have also set the Sales query's 'Load to Report' and 'Enable Refresh of this Query' attributes to FALSE.

 

 

Is this something I can improve with Table.Buffer or DirectQuery? Any ideas would be greatly appreciated.

 

1 ACCEPTED SOLUTION
pqian
Employee

@jaykilleen I see you are aware of things like Table.Buffer, so you are a pretty advanced Power Query user.

 

I can go into a little detail about your question. The answer up front: the tip is somewhat true. The rule to remember here is that each query, when being loaded to the report, will always be evaluated in isolation. The queries may even be evaluated by different processes, and thus do not form a dependent tree structure.

 

For example, say you have a base query A, and two new queries B and C referencing A. A isn't loaded to the report but B and C are. When you hit the Apply button, B and C will simultaneously start loading to the report. In both evaluations, A will be evaluated separately, because B isn't aware of C.

 

Now with this model in mind, you could argue the tip isn't true at all. However, Power Query keeps an on-disk cache of data seen by evaluations. So if you are within the same cache session and pull on A multiple times, you essentially only pay for it the first time (unless, of course, you are pulling on different pages of A). This cache ONLY applies to raw data coming from the data source; any additional transformations still need to be performed on top of it.

 

Finally, when you load queries to the report you always want the latest data, so each loading session is essentially a new cache session. Therefore, the tip is somewhat true here: you only pay for the data coming from data source A ONCE per load of all the queries. (Interestingly, for most data sources this is true for duplicates of A as well.) However, you will pay for the transformations on top of A N times, where N = the number of queries being loaded.

 

With this knowledge in mind, you can see that Table.Buffer doesn't really help you either. What it does is guarantee a stable output within a query (e.g., between multiple let steps). Essentially, you declare a point in your data after which all transformations are done on the local copy, versus folded to the remote source.
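As a sketch of that point-in-the-query behavior (server, database, and table names here are placeholders, not from the original post):

```
let
    Source   = Sql.Database("myServer", "myDb"),               // placeholder source
    Sales    = Source{[Schema = "dbo", Item = "Sales"]}[Data],
    Buffered = Table.Buffer(Sales),    // from here on, steps run on the local copy
                                       // instead of folding back to SQL Server
    Grouped  = Table.Group(Buffered, {"RepID"},
                   {{"NetRevenue", each List.Sum([NetRevenue]), type number}})
in
    Grouped
```

Note that buffering the full 7M-row table forces all of it across the wire first, which is usually the opposite of what you want here.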

 

Finally, yes, DirectQuery will most certainly help with performance here. With DQ you always operate on the remote data source and never keep a local copy of the data, so there isn't a "loading" phase. You pay for the data as the visuals need it (and it's mostly aggregated by then).

 

Does this make sense?

 

Regards,

PQ


23 REPLIES

@ImkeF thanks for your thoughts on this one. I hadn't actually done too much testing. I wasn't loading my Fact table to the model. I was trying the approach suggested on one of the blogs out there (just tried to find it but couldn't; I'll link back to it when I run across it again) where you keep your data in a staging query and then reference it in subsequent queries. Sort of bring everything in and build on top. It helps to separate the business logic from the raw data. This limits my ability to just load the data into the report and let BI manage the relationships.

 

Will need to have a bit more of a think about how I manage my data structure and queries in the Editor 😕

As a follow-up to this, I am currently facing the same problem while trying to implement dynamic date filters on my data. Currently, I am creating several queries against the same dataset to build "Last 24 Hours" and "Last 7 Days" views. However, I am worried that this will cause refresh times to increase significantly once my tables get close to ~5 million rows.

 

My main question is: if I filter my dataset to only show the last 24 hours of data, will this decrease the load time compared to loading the entire dataset, or does the query still load the whole dataset and then filter? If it filters on the database side, I don't expect to see more than 20,000 records per day per data table, so should I be alright?

Yes, your date filters make sense, and if applied correctly they should fold back to the server and thereby significantly reduce loading time. You can verify this using the profiler to analyze the queries that are sent to the SQL server.
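For example, a relative date filter like the following should fold into a WHERE clause, so only the matching rows are transferred (the source, table, and Timestamp column names are assumptions for illustration):

```
let
    Source  = Sql.Database("myServer", "myDb"),    // placeholder source
    Events  = Source{[Schema = "dbo", Item = "Events"]}[Data],
    // keep only the last 24 hours; folds to a WHERE clause on the server
    Last24h = Table.SelectRows(
        Events,
        each [Timestamp] >= DateTime.LocalNow() - #duration(1, 0, 0, 0)
    )
in
    Last24h
```

You can confirm folding by right-clicking the step and checking whether "View Native Query" is available, or by watching the generated SQL in the profiler.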

 

The discussion here is about whether some caching will take place in order to speed up the queries. In short: it won't, and it wouldn't make sense in your case, because the performance benefit of transferring only a very limited amount of data from the server would be offset by using a cache here.

 

So you have 2 independent queries where each should be fast enough.

 

 

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI learning resources here -- Performance Tips for M-queries
