amir_mm
Helper II

Incremental refresh with polling expression

Hello,

I'm facing a dataset refresh issue because we're hitting the memory capacity limit, and we won't be able to upgrade the capacity. We have two large tables that include Created Time Stamp and Modified Time Stamp columns, so I can't implement regular incremental refresh: a row could have a Created Time Stamp in, say, July 2023 and a Modified Time Stamp of today.

 

I read this amazing solution by @cpwebb and I was wondering whether it would be applicable in my case.

 

My goal is to partition the large tables and auto-refresh those partitions (I've already created a PowerShell script in Azure Runbooks and connected it to a Logic App); I just need to make the partitioning work.


I don't think I can go with "detect data changes" in Power BI Desktop, because I would have to choose a different column for it (Modified Time Stamp), and then it would only refresh the latest partition when there is a change in the Modified Time Stamp column, but I need partitions based on Created Time Stamp to be refreshed as well.


I reviewed the article by Chris Webb and followed the steps carefully, but "refreshBookmark" does not appear in the script.
I checked this post, which describes a similar issue, but the solution was not applicable in my case.

 

Any ideas are much appreciated!

 

 

2 ACCEPTED SOLUTIONS
lbendlin
Super User

Go with the timestamp that has the most even distribution (likely the created date). Create a large enough "hot" window to cover most of the jitter (modifications happening soon after creation). For example, make your "hot" partitions the last three months instead of the last month. Then use the Modified date to catch any entries that fall outside the "hot" window and manually refresh their partitions if needed. Plan on doing a full refresh (or a sequential manual refresh of all partitions) every now and then.


No, based on what you've said the problem is with the performance of the SQL queries themselves and nothing to do with Power BI. You need to get a DBA or someone familiar with SQL performance tuning to help you here.


10 REPLIES
cpwebb
Employee

Based on what you told me in the discussion on your other thread, no, I don't think this will help much - what you need to do is tune the underlying data source so that the queries generated for each partition are faster.

Thanks a lot @cpwebb 

 

Currently, we have a 6-month dataset that includes a large table with over 2 million rows. The entire dataset is refreshed in the Power BI service every 30 minutes and takes at most 2-3 minutes.

 

For the other dataset (3 years of data), which has the refresh issues, I partitioned the large table so that each partition contains a maximum of 700k rows. But each of these partitions takes no less than 2 minutes to refresh, which doesn't make sense.

 

Could this be because the Power BI service allocates more resources to its own scheduled refreshes than to refreshes triggered from PowerShell in Azure Runbooks?

 

Thanks again. 

No, based on what you've said the problem is with the performance of the SQL queries themselves and nothing to do with Power BI. You need to get a DBA or someone familiar with SQL performance tuning to help you here.

These runtimes and row counts do not warrant incremental refresh. Your partitions should hold about half as many rows as you can fetch without hitting either a data source timeout or the 5hr limit for partition refresh.

Thank you @cpwebb and @lbendlin.

I asked the team to optimize the SQL queries, but they didn't come up with any solutions.

The problem was with refreshing our largest table, as I kept facing memory capacity errors.

I tried removing all the calculated columns from the table, which led to a successful refresh. (There were over 40 calculated columns.)

In an attempt to pinpoint the problematic columns, I removed 7 columns with seemingly complex queries, and again, the refresh was successful.

However, it appears that these specific columns might not be the root cause: I kept them and removed the rest, and the refresh still succeeded. It seems like the table has a memory threshold that we shouldn't exceed!

 

Is there a way to determine exactly which columns are causing the issue? Unfortunately, we can't remove them from the table.

 

Thanks again. 

We ran into the same issue today. Ingesting a 46 GB dataflow works fine by itself, but as soon as you have calculated columns on that query there's a high likelihood of maxing out the machine memory (even on a 64 GB RAM PC). The spike clearly happens after the dataflow has fully loaded and the calculated columns are starting to be computed.

 

You can try to use DAX Studio Metrics to see what the estimated memory usage for these calculated columns is but it's not an exact science.

 

As much as it pains me to say it: since this is a calculated column, it may be possible to push that change into the Power Query for the dataflow, or in your case into the SQL view.

+1 to this, not using calculated columns and implementing the same logic further upstream will be the best way to solve this problem.

 

I don't think there's any way of breaking down memory usage during refresh by calculated column, and even if there was it might not be helpful because variations in how objects are refreshed in parallel would affect the overall memory usage.

 

Which error message are you getting now exactly? Is it one of the errors mentioned here: https://learn.microsoft.com/en-us/power-bi/enterprise/troubleshoot-xmla-endpoint#resource-governing-... ?

Thanks again @lbendlin and @cpwebb .

Would implementing the same logic as the calculated columns in Power Query, which requires merging tables and some transformations, affect memory and performance again?

 

@cpwebb Yes, the error message is: Resource Governing: This operation was canceled because there wasn't enough memory to finish running it. Either reduce the memory footprint of your dataset by doing things such as limiting the amount of imported data, or if using Power BI Premium, increase the memory of the Premium capacity where this dataset is hosted. More details: consumed memory 3464 MB, memory limit 3464 MB, database size before command execution 1655 MB. 

 

We are on Premium Embedded with an A2 capacity (5 GB memory).

 

In our largest table (the table causing the issue), we have 56 columns, comprising 15 native columns, 20 calculated columns using only the 'RELATED' function, and 21 other calculated columns.
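For context, the RELATED-based columns are simple one-line lookups along an existing relationship, along the lines of the hypothetical sketch below (the table and column names here are illustrative only, not from our model):

-- Hypothetical RELATED-style calculated column on FollowUps;
-- table and column names are made up for illustration.
Site Name = RELATED ( Sites[Name] )

Columns like that are usually much cheaper to compute than the CALCULATE/FILTER columns shown further down.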

I removed 7 columns more or less at random, choosing the ones that appeared more complex, and that resolved the issue.

 

These are some of the columns I removed:

Media(Latest_Photo_jpg) =
VAR Last_date =
    CALCULATE ( MAX ( Medias[CreatedOn] ) )
RETURN
    CALCULATE (
        LASTNONBLANK ( Medias[Media_URL_jpg], 1 ),
        FILTER ( ALL ( Medias ), Medias[FollowUpId] = FollowUps[Id] ),
        Medias[CreatedOn] = Last_date
    )

 

Description(Latest_Notes) =
VAR Last_date_1 =
    CALCULATE ( MAX ( Notes[CreatedOn] ) )
RETURN
    CALCULATE (
        LASTNONBLANK ( Notes[Description], 1 ),
        FILTER ( ALL ( Notes ), Notes[FollowUpId] = FollowUps[ID] ),
        Notes[CreatedOn] = Last_date_1
    )
 
Follow-up URL = IF(FollowUps[IsFollowUp]= TRUE(),"https://--------.com/#/--------/------/"&FollowUps[Id],BLANK())
 
Closed_Date = IF(FollowUps[Status] = "Closed",IF( FollowUps[LogHistory(Latest)] = BLANK(), FollowUps[CreatedOn] , FollowUps[LogHistory(Latest)] ), BLANK())
 
Thanks a lot for looking into this. Much appreciated.

It looks like the way you're using the FILTER function to filter entire tables in CALCULATE could be the problem. See https://xxlbi.com/blog/power-bi-antipatterns-9/
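For illustration, a rewrite of one of those columns along those lines might look like the sketch below. It assumes FollowUps and Notes are related on FollowUps[ID] (one side) and Notes[FollowUpId] (many side), so the context transition triggered by CALCULATE already restricts Notes to the current follow-up and the table-wide FILTER ( ALL ( Notes ), ... ) can be replaced with a plain column predicate:

-- Sketch only: assumes a FollowUps[ID] (1) -> Notes[FollowUpId] (*) relationship,
-- so CALCULATE's context transition already limits Notes to the current follow-up.
Description(Latest_Notes) =
VAR Last_date_1 =
    CALCULATE ( MAX ( Notes[CreatedOn] ) )    -- latest note for this follow-up
RETURN
    CALCULATE (
        LASTNONBLANK ( Notes[Description], 1 ),
        Notes[CreatedOn] = Last_date_1        -- column filter instead of FILTER ( ALL ( Notes ), ... )
    )

Even then, if many columns need this kind of latest-value lookup, computing them upstream in the SQL view (as suggested earlier in the thread) is likely to be the more memory-friendly option.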

lbendlin
Super User

Go with the timestamp that has the most even distribution (likely the created date). Create a large enough "hot" window to cover most of the jitter (modifications happening soon after creation). For example, make your "hot" partitions the last three months instead of the last month. Then use the Modified date to catch any entries that fall outside the "hot" window and manually refresh their partitions if needed. Plan on doing a full refresh (or a sequential manual refresh of all partitions) every now and then.
