Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
Anonymous
Not applicable

Referenced Query and Caching question - explanation/confirmation required

Hi,

Before I delve into my question I will declare that I have read the below article but I still have questions regarding my specific scenario:
https://blog.crossjoin.co.uk/2016/11/20/referenced-queries-and-caching-in-power-bi-and-power-query/

 

I have a Power BI report, well only the Power Query (PQ) part is complete at the moment.
The PQ model contains 14 tables. However, only a single table connects to the SQL Server database. This single table's connection (Import Mode) does contain a SQL statement to Query-Fold the statement up-stream to the SQL Server database. 

 

The resulting table name is 'K8 Extract'. The full APPLIED STEPS for this table is Source then Changed Type. No other steps are applied.

 

The 13 remaining tables all reference the 'K8 Extract' table and then go on to do a lot unqiue steps (aggregation, grouping, clauses, etc..) thus resulting in greatly different result-sets - hence the need for 13 separate tables.

 

As the 'K8 Extract' table is the only PQ table that actually connects to the SQL Server database, I am expecting that the SQL Server database to only get queried the once. This is my understanding and aim.

 

However, when I do a data refresh from within the Power BI side of PBI Desktop I see the following:
Query.PNG

At the same time as the above moment in time, I executed a SQL query againt the database the PQ table's SQL statement is running against and it returns the below result:
Stats.PNG

The TEXT column's value of 'SELECT eoft.branch_code....' is the actutal SQL statement that is executed against the SQL Server to return the 'K8_Extract' table.

As I am seeing the 'SELECT eoft.branch_code....' multiple times in the above SQL, and as I am seeing mutiple entries in the Apply query changes box, all referencing the same SQL Server/database, I am led to believe that the SQL Server, that I only want to query just the once, is indeed being queried multiple times.

Am I correct in my observation?

What do I need to do (set/configure) so the source SQL Server/database is only query the once and the other 13 tables just reference the 'K8 Extract' table and do not impact the source SQL Server/database over and over?

Thanks in advance.

8 REPLIES 8
v-piga-msft
Resident Rockstar
Resident Rockstar

Hi @Anonymous ,

Actually, if you reference the table from SQL Server in query editor, the underlying data source will be queryed in SQL Server.

What do I need to do (set/configure) so the source SQL Server/database is only query the once and the other 13 tables just reference the 'K8 Extract' table and do not impact the source SQL Server/database over and over?


As  aworkaround, you could export your table in SQL Server to excel and then get data with Excel connector in power bi so that it will not query in the sql server.

Hope @ImkeF   and @Greg_Deckler have other ideas.

Best  Regards,

Cherry

 

Community Support Team _ Cherry Gao
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.
Anonymous
Not applicable

@v-piga-msft  - thanks for your reply.

 

"Actually, if you reference the table from SQL Server in query editor, the underlying data source will be queryed in SQL Server."
If I have understood you correctly then yes I am aware that if you place a SQL query inside the 'SQL statement' part of a SQL Server connection (Import Mode) then the query itself will be executed upstream on the actual SQL Server database. This is my intention.

"As  aworkaround, you could export your table in SQL Server to excel and then get data with Excel connector in power bi so that it will not query in the sql server."

This wouldn't be an option due to company specific reasons.
More importantly, I would like all steps to be handled in Power Query so I am keen to understand if what I have stated, in my first post, is to be expected and therefore I have mis-understood the query-folding aspect, or do I need to configure things differently so to get my desired intention of querying the SQL Server database just the once??

Stachu
Community Champion
Community Champion

I think the essential part of the article in reference to your problem is here:

However, because buffering happens in memory and is not persisted on disk, buffering during the load of one query does not affect the load of another query. If Q1 is buffered when Q2 is loaded, Q1 will be re-buffered when Q3 is loaded.

In other words if you load 14 tables loaded to your model there will be at least 14 calls to SQL Server. If you now have more (e.g. you reference the original table multiple times in ne query) then you can try using Table.Buffer()
https://docs.microsoft.com/en-us/powerquery-m/table-buffer

and change all the references within a particular query to the buffered table


disabling the background data preview should also reduce the number of queries during development time, as described here:

https://community.powerbi.com/t5/Desktop/What-is-doing-quot-Allow-Data-Preview-To-Download-In-The/td...

 



Did I answer your question? Mark my post as a solution!
Thank you for the kudos 🙂

Anonymous
Not applicable

@Stachu thanks for your reply.
I'm glad you mentioned Table.Buffer() as this is something I am looking to implement.

So if I can just confirm exactly how best to use it.

I have the below M code which is the only query that directly connects to the source SQL Server table.
K8.PNG

The other 13 queries I have in Power Query all reference the K8 Extract query. An example of how these other queries are referencing the K8 Extract query is below. All the others queries are very similar and do reference the K8 Extract in the same manner.
RefQ.PNG

When refereshing the data via Power BI I can see, at some point of the refresh, 14 queries hitting the source SQL Server database - one for each Power Query query I presume.

With the intention of the K8 Extract query being the only query to connect/query/hit the source SQL Server database, what is the best way to use the Table.Buffer() function?

What I have done is amend the K8 Extract query to include the Table.Buffer() function (see below):
K8b.PNG

In implementing the above and then refreshing the data via Power BI, I notice the following:
1) Through viewing the Task Manager the CPU is 73%, RAM(16Gb) is 94%, Disk is 100%.
2) Without the Table.Buffer() the whole data refresh took 11 minutes, but with the Table.Buffer() bit added it's still refreshing 1 hour later (and hasn't finished yet).

3) The number of queries hitting the source SQL Server database, at any point so far, is only 5. Now is this due to the Table.Buffer() reducing the number of hits on the source SQL Server database, or is it only 5 as the other queries haven't got round to hitting the source SQL Server as my PC is grinding to a halt?

The whole point of this is to only hit the source SQL Server database once but with this RAM/Disk hogging that's happening I'm wondering if I should put the load on the source SQL Server.

Should using Table.Buffer() have the impact (RAM and Disk massively consumed) that my PC is experiencing?

Do you have any suggestions on how to achieve what I'm trying to in a good way?

Thanks.

Anonymous
Not applicable

UPDATE: this is an update following on immediately from my last post (above).

Due to the data refresh, with the Table.Buffer() implement as shown in the above post, still refreshing after an hour, I left my PC on overnight to allow the refresh to complete. I unlocked my PC the following morning to find my PC still ground to a halt. Unresponsive. I had to hard-boot my PC.

With the PC booted up without any Power BI report running, I...

1) Opened the Power BI report without the Table.Buffer() implemented, and then refreshed the data.
The full refresh took 11 minutes (again). At the half way point of the refresh I checked the Task Manager stats and they were:
CPU  @  3%
Memory  @  53%
Disk  @  4%

2) Opened the Power BI report with the Table.Buffer() implemented (as shown in the above post), and refreshed the data.
My PC came to a halt again. Unreponsive. I managed to open the Task Manager and the stats were:
CPU  @  50%
Memory  @ 93%
Disk  @  100%
I used Task Manager to end the Power BI instance 17 minutes into the data refresh. I have no doubt this refresh would've continued for hours (a day, maybe more) if I hadn't ended the task.

What is going on?
Have I used the Table.Buffer() function incorrectly?
Are the results I'm seeing expected or is something wrong?

How do I ensure the source SQL Server database is only connected to (hit) once?

@Stachu  @ImkeF  @Greg_Deckler  @MarcelBeug 

 

I've @ the above users as I see you are main contributors on Power Query, and as I also see my post as, not only, a question on the Table.Buffer() function but also a question on the possible impact of using such a function. No doubt a very useful post for others in the future. It will be much appreacited if you can help.

Thanks.

Anonymous
Not applicable

Bump.
Is anyone able to confirm/explain my experience when using the Table.Buffer() function?
Thanks.
Anonymous
Not applicable

Although, no response has been issued since my last post, I'm hoping this new info may prompt a response.

In my previous messages, in this thread, I've questioned how when I refresh my data - why does the source database get queried (hit) multiple times when I've placed a Table.Buffer() on the only query that actually connects to the source. I'm still wishing to understand this.
However, I would like to add something further...

 

In PQ, when I make a change to the single query, the only query, that connects to the source database - why is the source database being queried (hit) multiple times. I'm not performing a full refresh where all queries are refreshed (still need to understand why Table.Buffer() isn't doing it's job of hitting the source database just the once), instead this is a change to the single query within PQ.

My understanding is if I've made a change to the SQL statement in one query then only that query is updated/refreshed (you get those yellow dots swooping across the top of the query editor screen whilst it's updating). I'm not clicking of refresh preview, or refresh all, or close and apply. I'm simply making a change to the first step in the query.
Is my understanding correct on this?
If it is correct, then why am I seeing multiple hits to the source?  It seems all the other queries that reference the single query, the only quey, query that connects to the source database, are being refreshed too?  I know this as I can see the source database being queried multple times, vis SSMS, with the SQL statement that resides in the single query.


Hope this makes sense.

Please help with my understanding?!

Thanks.

I cannot help, but just wanted to point out that the issue described above causes a data accuracy issue as well. My use case is that I want to return the top N values from a SQL query, but selected randomly server-side. Then, similar to as described above, I split the data into several different parts and perform transformations, before merging them back together again. The problem this causes is that, because each PQ table seems to query SQL server again, rather than using the first query as a true reference, I get different observations randomly selected in each table! Would love a response from Power BI team as to whether I'm doing something wrong, or this is expected behaviour.

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.

Top Solution Authors
Top Kudoed Authors