This is a bit of a follow on from another post of mine on the same parent requirement. The other post (see below for link) hasn't lead me to an answer so, in attempting to achieve what I need to do, I tried the method of using the Function.InvokeAfter() function. Unfortunately, it hasn't worked either.
In my other post (link above) I was pursuing the Table.Buffer() function but it hasn't resolved my objective - I would still like to know if Table.Buffer() should work and if I am implementing it correctly.
This time, this is what I've done to try and resolve my objective by using the Function.InvokeAfter().
1) Create a SQL connection named Query1. This query has a SQL statement which is folded back to the SQL Server.
2) Create a query named Query1_Referenced which references Query1. It uses the Function.InvokeAfter() function like this:
Source = Function.InvokeAfter(
3) The derived queries, about 10 of them, all now reference the Query1_Referenced query (I thought this extra reference query step will allow an additional Function.InvokeAfter() function to help). Each of the derived queries reference the Query1_Referenced query as follows:
Source = Function.InvokeAfter(
Of course, each of the derived queries have their opening statement as above but then go on to do their own unqiue logic - hence the need for each derived query.
When testing this, not only have I placed the Function.InvokeAfter() function where stated, but I have also unchecked the 'Allow data preview to download in the background' box. I have also tested my method with the 'Enable parallel loading of tables' checked AND unchecked.
My thinking here is to have the Query1 query, which connects to the SQL Server, complete ensuring the data is fully cached. Then have the Query1_Referenced query get the dataset from Query1 (not getting it's data directly from the SQL Server). Then all the derived queries reference the Query1_Referenced query (again, not getting it's data directly from the SQL Server).
When running this test, I can see the source SQL Server is being executed against multiple times with the same SQL statement. I do believe the SQL Statements are executing against the SQL Server once their Function.InvokeAfter() delayed times (so either 60 or 120 seconds) have been met.
Is what I'm trying to acheive possible?
Have I implemented the Function.InvokeAfter() function correctly?
Should I provide enough duration lag (so above the 120 seconds) to ensure that Query1 had fully completed for I kick off Query1_Referenced (and the same for the derived queries)?
Is there any way, all from within Power Query, to have the source SQL Server only be executed against once?
Solved! Go to Solution.
the first query is obviously running on the server, but as it's a custom SQL Power BI doesn't create it's own 'native query'
now imagine following scenario:
Query1 - connection to the SQL database, select a table
Query2 - reference Query1, select 2 columns, filter and group by with sum (all done using the UI)
Query3 - reference Query1, select 3 different columns, group by and count (again, only using the UI)
In this scenario actions in Query2 and Query3 can both be performed by PowerBI and SQL. As long as the query folding is active the actions are happening on the server - that means that PowerBI 'writes' the actual SQL that is executed on the server (the 'native query'), as there was no SQL provided in the beginning. Does this clarify?
please read this thread about queries evaluation chain: https://social.technet.microsoft.com/Forums/en-US/8d5ee632-fdff-4ba2-b150-bb3591f955fb/queries-evalu....
Thanks for your response.
I have read the link you've provided and I have questions.
"So if Q1 queries a SQL Server database called "MyServer/MyDatabase" and returns a single unfiltered table called "MyTable", the query sent to the server might be "select [Col1] from [MyTable]". In this case, the persistent cache will now know the result of sending "select [Col1] from [MyTable]" to "MyServer/MyDatabase". If another M query (whether through referencing Q1, or by querying the same table directly) needs the same result, the persistent cache can provide it, and the result won't have to be fetched a second time from the SQL Server."
My Query1 is a SQL connection uses a complex SQL statement (sub query, joins, multiple where clauses, Group By, Order By)
Q1 - does this complex SQL statement mean it is too complex to stored as "persistent cache"?
My Query1 has the following 'APPLIED STEPS':
Sql.Database (connection to the SQL Server via the complex SQL Statement.
Table.TransformColumnTypes (simply changing same columns types)
Table.AddKey (simply adding an Index Column)
Q2 - Can/Will Query1 with the applied steps be stored as "persistent" cache?
Q3 - Do I need to ensure that Query1 is fully complete, so the data is stored in "persistent cache", before Query2 starts to calculate, as Query2 references Query1? How can I enforce that Query2 does not start its calculation before Query1 completes? Is the Function.InvokeAfter() function required and should I be placing this function as the very first Applied Step in the referencing function?
Q4 - To confirm, is a SQL Server source connection stored in "persistent cache"?
"However, because buffering happens in memory and is not persisted on disk, buffering during the load of one query does not affect the load of another query. If Q1 is buffered when Q2 is loaded, Q1 will be re-buffered when Q3 is loaded."
I do not understand this. I'm reading the above as Q1 is stored in persistent cache and therefore Q2 does not query the source system, it only queries Q1.
Q5 - Why does Q1 need to be re-buffered when Q3 is loaded?
Q6 - Where is the Table.Buffer() function actually to be placed?
For instance, Query1 is the query that will be referenced by multiple queries.
Should the Table.Buffer() function be placed as the very last Applied Step in Query1, so all in Query1 is placed in memory so the referencing queries, and by then referencing Query1 you are taking the data from memory?
Unfortunately for me, I still do not know if what I am trying to achieve, explained via this post and the linked post, is possible with my particular scenario?
Please advise? Thanks.
some answers of mine how I understand the topic.
Nope, the complexity shouldn't be a problem. But the engine decides if it will be stored or not AND if it will be used again or not.
Chris Webb says: As far as I can see, sometimes Power Query reads data from a data source many times for a single query execution. (Source: https://blog.crossjoin.co.uk/2016/11/20/referenced-queries-and-caching-in-power-bi-and-power-query/)
Nope, because only the raw data (fetched from the SQL Server) are stored in the persistent cache.
Function.InvokeAfter won’t help at this place. It is used for delayed invocation of a function within a query – I use it like Thread.Sleep.
The PowerQuery engine creates an execution plan but we don’t see it anywhere. Therefore we don’t know in what order the data sources are requested.
Table.Buffer is a function which scope is limited to a query in which it is executed. If you have Q1 (let x = Table.Buffer(SQLSERVER_TABLE) in x) and Q2 where Q2 references Q1, the Table.Buffer doesn’t have any effect.
The answer to your Q4 question is therefore NO.
Because of the scope which is limited to one query.
Table.Buffer won’t help at this place.
To your initial question “Is there any way, all from within Power Query, to have the source SQL Server only be executed against once?” – I don’t think so.
But if you have some intermediate queries like for ETL, you can allow loading only of the last query. The rest won’t be loaded and therefore won’t send any requests to the DB.
@Nolock thanks for your response once again. Just some follow up thoughts.
Q2 - Based on your comment "because only the raw data (fetched from the SQL Server) are stored in the persistent cache" I decided to do the following to see if the SQL statement would execute/hit the source SQL Server only the once.
i) Create Query1 using a SQL Statement in the connection. The only Applied Step in this query is 'Source' to get the source data. No other steps have been applied as I want to try and have this data in the 'persistent cache'.
ii) Create Query2 which references Query1. This Query2 has the applied steps. Query2 will now be referenced by the other queries that do their own unique thing.
I was hoping that Query1, as just being the connection executing the SQL Statement, will persistent cache the data. Query2 will apply some additional steps, and all other queries will just re-execute/re-evaluate Query2 which at worse will re-calculate/re-evaluate the Query1 which is in the persistent cache and therefore won't hit the SQL Server again.
Unfortunately, having tried the above I can still see all the derived queries being executed against the SQL Server.
The reason why I am keen to understand if it's possible to restrict the source SQL Server being executed against once is because the DBAs will start questioning why I have multple instances of the same SQL Statement being executed against the SQL Server at the same time. Trying to do it all inside Power Query, I guess that it cannot be done. I hope to be proved wrong!!
Q3 - Based on your comment "The PowerQuery engine creates an execution plan but we don’t see it anywhere. Therefore we don’t know in what order the data sources are requested." I guess using the reference functionality will control what order the data sources are requested. If Query2 references Query1 then the Power Query execution plan will make it so that Query1 completes before Query2.
Q4 - Okay, thanks for clarifying this for me. So the Table.Buffer() only works for all processes within the same query (the same advanced editor section of code). Query1's table buffer cannot be called/utilised from another query (ex, Query2).
I do have a question on this though.
If the advanced editor code had something like this:
Source = <connect to the source and retrieve some data>
Remove Columns = ( Source, <remove some columns>)
Group By = (Source, <group by a column> )
In the above very rough example, the Source dataset is being used in two separate processes. By calling 'Source' in each of these lines of code, isn't this in effect having Source as buffered? How would using Table.Buffer( Source ) make it any different?
Thank you again. Having your explanations to my specific questions helps. BTW, I have read all the links you've listed prior to asking my questions but I still have the need to ask them.
Unfortunately, you can push the PowerQuery engine to send a select to a DB only once – there are no hints or something else.
Yes, you are right. Q1 will be executed before Q2. But it doesn’t mean that Q1 will be executed only once. I can happen that there are 2 “threads”: Q1 and Q1->Q2 which are executed in any order or in parallel.
It depends on the data source as mentioned in the Technet Forum.
Table.Buffer in this case does the following:
Source = <connect to the source and retrieve some data> BufferedSource = Table.Buffer(Sourcer) // data are cached in RAM till the end of the query ------- Remove Columns = ( BufferedSource, <remove some columns>) ------- Group By = (BufferedSource, <group by a column> )
Source = <connect to the source and retrieve some data> // depending on source of data it will or will not be cached ------- Remove Columns = ( Source, <remove some columns>) // this step can use query folding if supported by data source ------- Group By = (Source, <group by a column> ) // and I hope (not 100 % sure) this step can also use query folding if supported by data source
When I develop I use as Table.Buffer as List.Buffer, try using query folding but I gave up optimizing a count of database requests. My datasets are refreshed late in the night and therefore not so critical for the database infrastructure.
for SQL you should be able to see if the query folding is still happening. You just need to check if for a given step the 'View Native Query' is not greyed out when you right click on this step
we had similar conversations with DBAs questioning the number of connections to SSAS Multidimensional from Power Query. In the end the added business benefit won, and we kept the extraction process in Power Query. There were 2 main takeouts from the whole discussion
1) better communication on when to trigger the queries refresh, considering the server resources perspective
2) harder to measure - having a self service solution actually lowered the number of adhoc queries (users moved to PowerBI), but if you use direct query that may not be applicable
I guess what I am trying to say is this - is PowerBI optimal from the SQL Server usage perspective? Not for your use case - unfortunately. Do the other benefits make it worth using? Up to you and your organization to decide. If there is additional cost involved the questions is what's the added business benefit to counterbalance it. Not the most exciting discussion, but still important one.
@Stachu - thanks for your response.
Just two points...
1) I take it you're confirming too, by way of not correcting me, that you cannot stop the SQL Server connection's SQL Statement from exectuing on the source SQL Server, not only for the intended PQ query, but also for the multiple PQ reference queries.
2) Interestly, speaking of checking 'View Native Query', on the query (Query1) that is intended to executed on source SQL Server, I see this is greyed out for the Source 'Applied Step'. As this is the very first step of the query, the step that contains only the SQL statement - then I'm surprised why the 'View Native Query' is greyed out??! I would've thought that it would be enabled, especially considering I can see the query (Query1) (not to mention the reference queries) executing on the source SQL Server.
I just want to clarify something.
Query1 is a SQL Server connection. This connection has a SQL Statement (a complex statement). I tested this statement in SSMS first to make sure it's correct, then copy and paste into the Power Query SQL Statement box.
When I execute Query1 I can see it is executing against the source SQL Server from SSMS. To me, this means Query1, at the point of the SQL Statement, is being query folded. - Am I correct here?
The SQL Statement step is the 'Source' step in the Applied Steps. The 'Source' step is the very first Applied Step. No other steps are before it which could make it not query fold, and also taking into consideration SSMS is showing the query is being executed, this 'Source' step IS being query folded. However, when I right-click on the 'Source' step the View Native Query is greyed-out. This greyed-out part tells me it is not being query folded, but it's the first step and I can see it executing on the source SQL Server via SSMS - Is this SQL Statement/first step being query folded or not? (if it's not being queried folded then how is the data getting into Power Query?)
Join us for an in-depth look at the new innovations across Dynamics 365 and the Microsoft Power Platform.
We spoke with Power BI Super User, Greg Deckler, about his charity work
Continue your learning in our online communities.