Hi Guys,
Could you share some experience, perhaps some tips & tricks, concerning situations when a [full/first] incremental refresh in the Service takes very long?
We built a new report recently which refreshes in between 20 and 30 minutes in Power BI Desktop (2 x Azure SQL sources, 20M rows each). This report is configured as incremental and, after publishing, when I start the refresh for the first time, it times out after 5 hours of refreshing.
My first suspicion was obviously the DB, but I no longer think that, since I can refresh fully in Power BI Desktop in a reasonable time. When I run the same SQL in Azure Data Studio, it's even quicker (around 15 min. to finish both queries).
Last week I ran an initial incremental refresh in the Service for a different report and it loaded 33M records in 11 minutes. Same workspace, same SQL Server, same DB. Same report author.
My queries are not complicated at all; it's "select * from Table with (nolock)", which is obviously pretty intensive, but on the other hand it needs to be done at least once before the incremental refresh kicks in. I introduced (nolock) hoping to speed up the query (there are constant updates on records in my table), but the report still times out in the Service after 5 hours (with or without (nolock)).
Is Azure SQL DB at 100% during my query execution? Well, it obviously is, but the same is true when this query is run from either Desktop or Data Studio - so 100% doesn't seem to be a problem, it's just that the query never finishes. Is the query stuck? Well, it's not blocked, sometimes it waits with NULL wait_type, sometimes with IO_QUEUE_LIMIT, sometimes with PAGEIOLATCH_EX - but again, the same is true when I run this query from elsewhere and it finishes nicely every time.
One thing I see which troubles me a bit, is that when I check:
SELECT text, wait_type, *
FROM sys.dm_exec_requests
CROSS APPLY sys.dm_exec_sql_text(sql_handle)
... it always shows my main query without any parameters. I had actually thought that the initial incremental refresh would go year by year.
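In case it helps anyone reproduce this, here is a slightly extended version of that check, joined to sys.dm_exec_sessions so you can tell the Service's connection apart from your own sessions. The program_name filter is a guess on my part - inspect what's actually in that column first and adjust the pattern:

```sql
-- Show running requests with the originating application name.
-- NOTE: the LIKE pattern below is an assumption; check program_name first.
SELECT r.session_id,
       s.program_name,
       r.status,
       r.wait_type,
       r.wait_time,          -- ms spent in the current wait
       r.total_elapsed_time, -- ms since the request started
       t.text
FROM sys.dm_exec_requests r
JOIN sys.dm_exec_sessions s
  ON s.session_id = r.session_id
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) t
WHERE s.program_name LIKE '%Mashup%';
```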
It's not the first report we built and deployed to this workspace, configured as incremental and fully refreshed from the same SQL Server. Such a situation only happened once in the past, when I marked the "Detect data changes" checkbox and the report would never finish refreshing. But that issue is long gone; I no longer tick any checkboxes 🙂
Any ideas what else I could check and try? Or how to even isolate the problem?
I couldn't find any logs/reports that would give me any visibility into how the actual machines behind the Power BI Service are performing.
I would appreciate any hints.
Thank you,
PKD
EDIT: Actually, turning off the second data source (as an attempt to isolate the problem) did not help at all. The report was re-published and the refresh timed out as usual, after being stuck for a couple of hours on select * from...
EDIT 2: I managed to look behind the scenes a bit. Having connected to the logs of the Power BI Service, I can see that during the 4 hours it has been refreshing so far, 16 one-year partitions have been processed (successfully). I don't know exactly what's happening, but it seems that during this full refresh Power BI generated 100 queries, one per year, that look like this:
let
    __AS_Query__ =
        let
            RangeStart = DateTime.FromText("19220101T00:00:00"),
            RangeEnd = DateTime.FromText("19230101T00:00:00")
        in
            PolicyBased_<I replaced some id here>,
    __AS_Table__ = Table.FromValue(__AS_Query__),
    __AS_Compact__ = Table.RemoveColumns(
        __AS_Table__,
        Table.ColumnsOfType(__AS_Table__, { type table, type record, type list })),
    __AS_Effective__ = Table.TransformColumnNames(__AS_Compact__, Text.Clean),
    __M = #shared[[List.ConformToPageReader],[Graph.Nodes],[Value.Lineage]]?,
    __AS_TransformClassification__ = (value) =>
        Value.ReplaceType(
            Table.Distinct(
                List.Accumulate(
                    __M[Graph.Nodes](__M[Value.Lineage](value)),
                    #table({"ColumnName", "LabelId"}, {}),
                    (current, node) => Table.Combine({
                        current,
                        Table.SelectColumns(
                            Table.ExpandRecordColumn(
                                Table.SelectRows(
                                    Value.Traits(node),
                                    each ([Provider] = "SQL" or [Provider] = "MicrosoftInformationProtection")
                                        and [Identifier] = "Classification"),
                                "Value", {"LabelId"}),
                            {"ColumnName", "LabelId"}, MissingField.UseNull)}))),
            type table [ColumnName = nullable text, LabelId = text]),
    __AS_WithClassification__ = __M[List.ConformToPageReader]({__AS_Effective__, __AS_TransformClassification__}),
    __AS_Result__ = if __M[List.ConformToPageReader] = null then __AS_Effective__ else __AS_WithClassification__
in
    __AS_Result__
It looks like Power BI is querying year by year for 100 years. This 100 comes from a setting I introduced as part of the incremental refresh configuration, telling Power BI to retain data for a 100-year period. It may sound silly, but there is no setting that says "take all data". Anyway, the thing is that my DB only contains data for 6 years (2015-2021), so all the queries for years 1921 to 2014 should complete in no time - provided there is an index on the ModifiedAt column, which there is (ModifiedAt is the column used for incremental refresh and the whole RangeStart/RangeEnd processing).
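For completeness, this is the sanity check I mean (table/index names below are illustrative, not my real ones): with an index on ModifiedAt, a probe for one of the empty years should come back practically instantly, which is exactly why the slow empty partitions puzzle me:

```sql
-- Illustrative names; ModifiedAt is the incremental refresh column.
CREATE INDEX IX_MyTable_ModifiedAt ON dbo.MyTable (ModifiedAt);

-- A partition for a year with no data should be answered from the index
-- almost immediately:
SELECT COUNT(*)
FROM dbo.MyTable
WHERE ModifiedAt >= '19220101' AND ModifiedAt < '19230101';
```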
Does that help? Can anyone suggest what can be going on here?
Hi @Federico , good point. Let me note down some insights I've had since:
My final point is (again speaking from experience): connecting 2 tables with 25+M records in a relationship is going to kill your CPU & memory. Not even a Premium capacity is going to help you. I ended up denormalizing the data and solved all the memory problems that way. This may be irrelevant in your case; I'm only saying it because after I "solved" the incremental refresh problem (i.e. turned it off), I immediately found myself having memory problems. There might be better solutions, of course.
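To make the denormalization idea concrete (purely a sketch, with made-up names): instead of loading two big tables and relating them in the model, I pushed the join into the source and loaded one wide table:

```sql
-- Sketch only: FactA/FactB and the join key are placeholders, not my real tables.
-- One wide table replaces a model relationship between two 25M-row tables.
SELECT a.*,
       b.Attribute1,
       b.Attribute2
FROM dbo.FactA a
JOIN dbo.FactB b
  ON b.BusinessKey = a.BusinessKey;
```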
Good luck 🙂
Hi, I would like to know if you managed to solve your problem and if you can share the solution you found. The same is happening to me: I have 25 million records from two different sources and it keeps refreshing until the time-out error occurs.
Thank you
Hi @v-chenwuz-msft,
Thank you for your answer. I was not aware that a native SQL query is not accepted for incremental refresh. The article you quoted is clear on that (thanks!) - it's such a pity that Power BI Desktop is not. Yes, there is a warning that the query might not be foldable. But as a user it's hard to imagine what would be wrong with "select * from <table> with (nolock)" in terms of foldability. It would be really simple for the Desktop to just say that incremental refresh and native queries don't work together. Well... I know now.
Back to the problem: can you actually explain what is going on in the background? The fact that the engine needs to pull all the rows, even year by year, does not scare me too much - I can refresh those 20M rows in a couple of minutes. But why does the full refresh take so long, especially when it's querying "partitions" that contain no data at all? I'm trying to understand whether there's something wrong with my data source/dataset, or whether I can just assume that Power BI completely misbehaves when you combine a native query with incremental refresh. Again - it's not merely slow, there's clearly something wrong, because in 5 hours it managed to finish processing 6 yearly partitions for which there was no data at all (because the data starts in 2015).
Thanks in advance for helping me understand the real reason behind this behaviour.
Hi @PeKaDe ,
I guess you have read this article already.
As you said, there are no parameters in your main query, so I think query folding isn't occurring in the Power BI Desktop model.
Incremental refresh can't use a native SQL query, so it forces the Power Query mashup engine to retrieve all source rows and then apply filters to determine the incremental changes.
(https://docs.microsoft.com/en-us/power-bi/guidance/power-query-folding#guidance )
Maybe something you did when creating your model prevents folding.
I recommend you check whether your model's queries can be folded. Hope this will help you.
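To illustrate the difference (the table and column names here are placeholders, not from your model): when folding works, each yearly partition sends a filtered statement to SQL Server; with a native query it cannot fold, so the full unfiltered statement runs for every partition and the RangeStart/RangeEnd filtering happens afterwards in the mashup engine:

```sql
-- With folding: each yearly partition pushes its date filter to the server.
SELECT *
FROM dbo.MyTable
WHERE ModifiedAt >= '20150101' AND ModifiedAt < '20160101';

-- Without folding (native query): every partition runs the full statement,
-- and the date range is applied later, in memory, by the mashup engine.
SELECT * FROM dbo.MyTable WITH (NOLOCK);
```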
Best Regards
Community Support Team _ chenwu zhu
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.