I am moving our reports from shared capacity to an embedded node, and I'm exceeding the memory of the node when refreshing the imported data. The data being imported is much less than what this report handles successfully in the shared workspace.
Upon investigation, the diagnostics in the gateway log show that the query is not being folded at all - the full table is being retrieved, and the filter on date must then be handled in the mashup engine. This happens identically in the shared workspace and the dedicated capacity workspace.
Here's what View Native Query shows for the last step where it is available:
select [$Outer].[RPT_Tests_Summary_PKColumn2] as [RPT_Tests_Summary_PKColumn],
[$Outer].[KeyDateTime2] as [KeyDateTime],
..... (three nested queries)
from [dbo].[RPT_Tests_Summary] as [_]
where [_].[KeyDateTime] > convert(datetime2, '2016-01-01 00:00:00')
) as [_]
) as [$Outer]
left outer join [dbo].[RPT_TestsOBD] as [$Inner] on ([$Outer].[RPT_Tests_Summary_PKColumn2] = [$Inner].[RPT_Tests_Summary_PKColumn])
) as [$Outer]
left outer join [dbo].[RPT_Stations] as [$Inner] on ([$Outer].[StationId2] = [$Inner].[StationId])
Only two steps remain in the query after this: a sort on the same column being filtered, and a Rename Columns step. Neither of those should prevent query folding. And even if they do, it is my understanding that the folded query to this point will be sent to SQL Server, and only the remaining two steps performed in the mashup engine.
But that's not what is happening. Here is what the Gateway log shows with EmitQueryTrace set to True and advanced logging enabled:
As you can see, only the driving table is being retrieved, and the date filter is not being applied. (The number and size of the packets retrieved in the async stream lends credence to the idea that it is retrieving all rows.) This happens in the shared workspace as well as the dedicated capacity workspace.
This report retrieves data all the way back to 2014 in the shared workspace, but can't retrieve half of that in the A2 node. I thought I could control how much memory was needed (what size node we should choose) by controlling the date range retrieved, but it's not working because the date filter is not being sent to SQL Server.
I am using the June 2018 version of Desktop as well as the June 2018 version of the gateway.
The gateway has no problem executing this query when it is run from a report in Shared Capacity. But it fails when the report is run in Dedicated Capacity - an A2 node. So I don't think the problem is a gateway problem.
An A2 node is supposed to allow 5 GB of cache - more than allowed in Shared Capacity. Yet I am hitting some limit. And Azure shows the memory usage never exceeds 2.7 GB - the same amount it used when it was an A1 node (3 GB cache). I'm just learning how to be a capacity admin, so I may have missed something. Can someone help?
And the reason I am asking about query folding is that I came across it while investigating my basic problem, and it doesn't seem right. Retrieving all the rows certainly increases memory usage, so fixing the folding may well alleviate the problem. However, since that "unfolded" query works in Shared Capacity with its 1 GB limit, it may not help at all.
I'm following up on every possible cause or explanation I can think of or dig up in research as I try to move us from Shared Capacity to Dedicated Capacity. Any help (other than pointing me to the documentation, which I've already worked through - although I suppose it's possible I missed or misinterpreted some step) would be greatly appreciated.
Today I am going to re-create the report from scratch to see if some bug in the report causes either or both of the symptoms - failing in Dedicated Capacity, and not folding the query. If so, I've got a big but known quantity of work: rebuilding all of the reports that link to this one, plus any other reports that fail and all of the reports that link to them. I will file a support ticket if that doesn't work and I don't get any tips that help.
Surely changing a workspace to use Dedicated (embedded) Capacity - or loading an existing PBIX file into a new workspace assigned to Dedicated Capacity, I've tried both - shouldn't be this tough. What am I missing?
Recreating the report made no difference. I created it from scratch, with no other PBI reports in memory or accessed by Desktop since the last reboot - so there couldn't be any artifacts left in the in-memory SSAS. I created just the query and a very simple tabular report - no slicers, no measures, no sort, no other visuals, just the bare bones of the report. On the last step "View Native Query" is enabled, and it shows the date restriction and join being passed in the native query to SQL Server.
The new report will not publish successfully to the dedicated capacity (A1 node). It fails with "An error occurred while attempting to publish 'Data Lookup - Test Summary - Rebuild.pbix': The remote server returned an error: (404) Not Found."
In the workspace in shared capacity, it publishes just fine, and runs successfully. However, it is still performing a "SELECT * FROM RPT_Tests_Summary", according to the gateway diagnostic logs.
I am going to open a support ticket with Microsoft. However, any tips for a newbie capacity admin would be gratefully appreciated.
An update: Microsoft is investigating why refreshing this report succeeds in shared capacity but fails (or takes a lot of memory and time) in embedded capacity (an A1, A2 or A3 node).
I've discussed the apparent lack of query folding with them, but that's not the focus of their investigation at this time.
Can anyone give me an idea why View Native Query shows the folded query but the gateway log does not?
I've done some more testing, recreating the report from scratch again, just retrieving the driving table and filtering on date - not expanding other tables within the query or anything else. No visuals, just retrieving data, as simple as you can get. View Native Query shows the date comparison and the list of columns to be retrieved, but the gateway logs do not.
So it doesn't have anything to do with complexity of the steps or any post-retrieval steps. Any other ideas?
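For concreteness, the stripped-down query is essentially the following M (the server and database names here are placeholders; the table and date column are the real ones from the report):

```
let
    // Placeholder server/database names
    Source = Sql.Database("myserver", "mydb"),
    Summary = Source{[Schema = "dbo", Item = "RPT_Tests_Summary"]}[Data],
    // This filter should fold into a WHERE clause, and View Native Query confirms it does -
    // yet the gateway log still shows a bare SELECT * FROM RPT_Tests_Summary
    Filtered = Table.SelectRows(Summary, each [KeyDateTime] > #datetime(2016, 1, 1, 0, 0, 0))
in
    Filtered
```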
Hi @v-shex-msft, Profiler is a great idea - I should have confirmed the behavior there.
But the gateway log shows something like
SELECT * FROM tab1
instead of the Native Query displayed in the query editor, which would be something like
SELECT tab1.colA, tab2.colB, tab3.colC
FROM tab1
INNER JOIN tab2 ON tab2.ForeignKey = tab1.PrimaryKey
INNER JOIN tab3 ON tab3.ForeignKey = tab2.PrimaryKey
WHERE tab1.DateCol > '2017-01-01'
So I'm supposing Profiler is just going to show me the same query that the gateway log shows. Which means query folding is not taking place. If that's what's happening, there are probably two more queries being sent across the gateway, and the join is being done in the mashup engine - very expensive on performance.
I definitely want the database engine to restrict the retrieved rows on the date column and join the two other tables, without having to write the native query myself - Power BI does that automatically, and maintains it as I maintain the query. Except it looks like it's not happening.
I will find some time to try this out with Profiler to confirm that the gateway log is not misleading me.
No, there was never really an answer other than trying to work through the steps of the query to see what I could do to optimize what was sent to SQL Server.
Since that time, Microsoft has introduced the Performance Analyzer on the View ribbon of Desktop. That's very helpful. It shows how much time is spent in the query sent to the database, as opposed to time spent in DAX and preparing the visual, and it gives you the query that was actually sent. It also shows you the queries for each visual - if you have several visuals on a page, that may need some optimization, too.
Two tips I've found useful:
When you expand a related table (e.g., a dimension), Query Editor doesn't give you the opportunity to name the resulting columns - other than whether or not to prefix them with the query name. If you want some other name, you can add a step to rename the column after you expand it. But that can create a very inefficient nested query to SQL Server. Instead, I go into Advanced Editor and rename the column within the expand step itself, which produces a much better SQL query.
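A sketch of what I mean, with hypothetical step and column names - Table.ExpandTableColumn takes a list of new column names as its fourth argument, so the rename can happen inside the expand itself:

```
// Expand followed by a separate rename step - can generate an inefficient nested query:
// Expanded = Table.ExpandTableColumn(Merged, "RPT_Stations", {"StationName"}, {"RPT_Stations.StationName"}),
// Renamed  = Table.RenameColumns(Expanded, {{"RPT_Stations.StationName", "Station"}}),

// Renaming in the expand step itself - folds to a simple column alias in the SQL:
Expanded = Table.ExpandTableColumn(Merged, "RPT_Stations", {"StationName"}, {"Station"})
```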
Be careful how many filters you create in the query itself (i.e., clicking on a column in Query Editor and filtering it there). Query Editor doesn't always handle that as cleanly as you might like - I had filters applied at several levels on deeply nested queries. Simplifying that got me down to one "where" clause in SQL at the outer level of the query - which SQL Server handles nicely.
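As a sketch (column names hypothetical): one Table.SelectRows step with a combined condition folds to a single WHERE clause, where the same conditions applied as separate filter steps at different nesting levels can each wrap the generated SQL in another subselect:

```
// One filter step with a combined condition:
Filtered = Table.SelectRows(Summary,
    each [KeyDateTime] > #datetime(2017, 1, 1, 0, 0, 0) and [Passed] = true)
// folds to a single clause, roughly:
//   WHERE [KeyDateTime] > convert(datetime2, '2017-01-01 00:00:00') AND [Passed] = 1
```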
Other than that, review the order of steps and how you do them, and see what the query looks like. I've been able to improve the queries sent to SQL Server quite a bit that way.