cancel
Showing results for 
Search instead for 
Did you mean: 
Reply
PeKaDe
Frequent Visitor

Never-ending incremental refresh in the Service

Hi Guys,

Could you share some experience, perhaps some tips & tricks, concerning situations when [full/first] incremental refresh in the Service is taking very long?

 

We built a new report recently, which refreshes in between 20 and 30 minutes in Power BI Desktop (2 x Azure SQL sources, 20M each). This report is configured as incremental and after publishing, when I start the refresh for the first time, it times-out after 5 hours of refreshing.

My first suspicion was obviously the DB but I no longer think that, since I can refresh fully in Power BI Desktop in the reasonable time. When I run the same SQL in Azure Data Studio, it's even quicker (around 15 min. to finish both queries).

Last week I ran an initial incremental refresh in the Service for a different report and it loaded 33M records in 11 minutes. Same workspace, same SQL Server, same DB. Same report author.

 

My queries are not complicated at all, it's "select * from Table with (nolock)" which obviously is pretty intensive but on the other hand it needs to be done at least once before the incremental refresh kicks in. I introduced (nolock) hoping to speed up the query (there are multiple updates on records in my table all the time) but the report still times-out in the service after 5 hours (with or without a (nolock)).

Is Azure SQL DB at 100% during my query execution? Well, it obviously is, but the same is true when this query is run from either Desktop or Data Studio - so 100% doesn't seem to be a problem, it's just that the query never finishes. Is the query stuck? Well, it's not blocked, sometimes it waits with NULL wait_type, sometimes with IO_QUEUE_LIMIT, sometimes with PAGEIOLATCH_EX - but again, the same is true when I run this query from elsewhere and it finishes nicely every time.

 

One thing I see which troubles me a bit, is that when I check:

SELECT text, wait_type, *

FROM sys.dm_exec_requests

CROSS APPLY sys.dm_exec_sql_text(sql_handle)

... it always shows my main query without any parameters. I actually thought that the initial incremental refresh goes year by year.

 

It's not the first report we built or deployed to a given workspace, configured as incremental and fully refreshed from the same SQL Server. Such a situation only happened once in the past when I would mark "Detect data changes" checkbox and the report would never finish refreshing. But that issue is long gone, I no longer try any checkboxes 🙂

Any ideas what else I could check and try? Or how to even isolate the problem?

I couldn't find any logs/reports that would give my any visibility of how the actual machines behind the Power BI Service are performing.

 

I would appreciate any hints.

Thank you,

PKD

 

EDIT: Actually, turning off the second data source (as an attempt to isolate the problem) did not help at all. Report was re-published and the refresh timed-out as usual after being stuck for a couple of hours on select * from...

 

EDIT 2: I managed to look behind the scenes a bit. Having connected the logs from the Power BI Service, I can see that during the 4 hours (this is a time it has been refreshing so far) 16 one-year-big partitions have been processed (successfully). I don't know what's happening but it seems to me, that during this full refresh Power BI generated 100 queries, each one for a year, that look like this:

 

let __AS_Query__ = let RangeStart=DateTime.FromText("19220101T00:00:00"),  RangeEnd=DateTime.FromText("19230101T00:00:00")  in  PolicyBased_<I replaced some id here>, __AS_Table__ = Table.FromValue(__AS_Query__), __AS_Compact__ = Table.RemoveColumns(__AS_Table__, Table.ColumnsOfType(__AS_Table__, { type table, type record, type list })), __AS_Effective__ = Table.TransformColumnNames(__AS_Compact__, Text.Clean), __M = #shared[[List.ConformToPageReader],[Graph.Nodes],[Value.Lineage]]?, __AS_TransformClassification__ = (value) => Value.ReplaceType(Table.Distinct(List.Accumulate(__M[Graph.Nodes](__M[Value.Lineage](value)), #table({"ColumnName", "LabelId"}, {}), (current, node) => Table.Combine({current, Table.SelectColumns(Table.ExpandRecordColumn(Table.SelectRows(Value.Traits(node), each ([Provider] = "SQL" or [Provider] = "MicrosoftInformationProtection") and [Identifier] = "Classification"), "Value", {"LabelId"}), {"ColumnName", "LabelId"}, MissingField.UseNull)}))), type table [ColumnName=nullable text, LabelId = text]), __AS_WithClassification__ = __M[List.ConformToPageReader]({__AS_Effective__, __AS_TransformClassification__}), __AS_Result__ = if __M[List.ConformToPageReader] = null then __AS_Effective__ else __AS_WithClassification__ in __AS_Result__

It looks like Power BI is querying year by year for 100 years. This 100 is caused by a setting which I introduced as part of the incremental refresh to tell Power BI that it should retain data for a 100-year-old period. It may sound silly, but there's no setting to say "take all data". Anyway, the thing is that my DB only contains data for 6 years (2015-2021) so obviously all the queries for years 1921 to 2014 should complete in no time. Provided that there is an index on ModifiedAt column which there is (ModifiedAt is used for incremental refresh and the whole RangeStart and RangeEnd processing).

Does that help? Can anyone suggest what can be going on here?

2 REPLIES 2
PeKaDe
Frequent Visitor

Hi @v-chenwuz-msft,

Thank you for your answer. I was not aware that Direct Query is not accepted in the incremental refresh. The article you quoted is clear on that (thanks!) - it's such a pity that Power BI Desktop is not. Yes, there is a warning that the query might not be foldable. But being a user it's hard to imagine what would be wrong with "select * from <table> with (nolock)" in terms of foldability. It would be really simple to just say in the Desktop that incremental and direct query don't work together. Well... I know now.

 

Back to a problem: can you actually explain what is going on in the background? The fact that the engine needs to pull all the rows, even year by year, does not scare me too much. I'm refreshing those 20M rows in a couple of minutes. But what is the reason that the full refresh is taking so long, especially when it's querying "partitions" that have no data at all? I'm seeking to understanding if there's something wrong with my data source/data set or can I just assume that Power BI is completely misbehaving if you use direct query with incremental. Again - it's not slow, there's clearly something wrong, because in 5 hours it managed to finish processing 6 yearly partitions for which there was no data at all (because data starts in 2015).

 

Thanks in advance for helping me understand the real reason behind this behaviour.

v-chenwuz-msft
Community Support
Community Support

Hi @PeKaDe ,

 

I guess you have read this article already.

As you said ,there is no parameters in your main qurey, so I think query folding isn’t occurring in the Power BI Desktop model.

Incremental refresh can't use a native SQL query. So, it would force the Power Query mashup engine to retrieve all source rows, and then apply filters to determine incremental changes.

(https://docs.microsoft.com/en-us/power-bi/guidance/power-query-folding#guidance  )

Maybe when you create your model had do something prevent folding

https://docs.microsoft.com/en-us/power-query/power-query-folding#transformations-that-prevent-foldin...

I recommend you to check your model when it can be foled. hope this will help you.

  

Best Regards

Community Support Team _ chenwu zhu

 

If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

Helpful resources

Announcements
May 23 2022 epsiode 5 without aka link.jpg

The Power BI Community Show

Welcome to the Power BI Community Show! Jeroen ter Heerdt talks about the importance of Data Modeling.

charticulator_carousel_with_text (1).png

Charticulator Design Challenge

Put your data visualization and design skills to the test! This exciting challenge is happening now through May 31st!

Power BI Dev Camp Session 22 with aka link 768x460.jpg

Check it out!

Mark your calendars and join us on Thursday, May 26 at 11a PDT for a great session with Ted Pattison!

Power BI Release May 2022 768x460.png

Check it out!

Click here to read more about the May 2022 updates!

Top Solution Authors
Top Kudoed Authors