amir_mm
Helper II

Incremental refresh with polling expression

Hello,

I'm facing a dataset refresh issue because we're hitting the memory capacity limit, and we won't be able to upgrade the capacity. We have two large tables that include Created Time Stamp and Modified Time Stamp columns, so I can't implement regular incremental refresh: a row could have a Created Time Stamp in, say, July 2023 and a Modified Time Stamp of today.

 

I read this amazing solution by @cpwebb and I was wondering whether it would be applicable in my case.

 

My goal is to partition the large tables and auto-refresh those partitions (I've already created a PowerShell script in Azure Runbooks and connected it to a Logic App); I just need to make the partitioning work.


I don't think I can go with "detect data changes" in Power BI Desktop, because I would have to choose a different column for it (Modified Time Stamp), and then it would only refresh the latest partition when there is a change in the Modified Time Stamp column, but I need partitions based on Created Time Stamp to be refreshed as well.


I reviewed the article by Chris Webb and followed the steps carefully, but "refreshBookmark" does not appear in the script.
I checked this post, which describes a similar issue, but the solution was not applicable in my case.

 

Any ideas are much appreciated!

 

 

2 ACCEPTED SOLUTIONS
lbendlin
Super User

Go with the timestamp that has the most even distribution (likely the created date). Create a large enough "hot" window to cover most of the jitter (modifications happening soon after creation). For example, make your "hot" partitions the last three months instead of the last month. Then use the Modified date to catch any entries that fall outside the "hot" window and manually refresh their partitions if needed. Plan on doing a full refresh (or a sequential manual refresh of all partitions) every now and then.


No, based on what you've said the problem is with the performance of the SQL queries themselves and nothing to do with Power BI. You need to get a DBA or someone familiar with SQL performance tuning to help you here.


10 REPLIES
cpwebb
Employee

Based on what you told me in the discussion on your other thread, no, I don't think this will help much - what you need to do is tune the underlying data source so that the queries generated for each partition are faster.

Thanks a lot @cpwebb 

 

Currently, we have a 6-month dataset that includes a large table with over 2 million rows. The entire dataset is refreshed in the Power BI service every 30 minutes and takes at most 2-3 minutes.

 

For the other dataset (3 years of data), which has the refresh issues, I partitioned the large table so that each partition contains a maximum of 700k rows. But each of these partitions takes no less than 2 minutes to refresh, which doesn't make sense.

 

Could this be because the Power BI service allocates more resources to its own scheduled refreshes than to refreshes triggered from PowerShell in Azure Runbooks?

 

Thanks again. 

No, based on what you've said the problem is with the performance of the SQL queries themselves and nothing to do with Power BI. You need to get a DBA or someone familiar with SQL performance tuning to help you here.

These runtimes and row counts do not warrant incremental refresh. Your partitions should hold about half as many rows as you can fetch without hitting either a data source timeout or the 5hr limit for partition refresh.

Thank you @cpwebb and @lbendlin.

I asked the team to optimize the SQL queries, but they didn't come up with any solutions.

The problem was with refreshing our largest table, as I kept facing memory capacity errors.

I tried removing all the calculated columns from the table, which led to a successful refresh. (There were over 40 calculated columns.)

In an attempt to pinpoint the problematic columns, I removed 7 columns with seemingly complex queries, and again, the refresh was successful.

However, it appears that these specific columns might not be the root cause: I kept them and removed the rest, and the refresh still succeeded. It seems like the table has a memory threshold that we shouldn't exceed!

 

Is there a way to determine exactly which columns are causing the issue? Unfortunately, we can't remove them from the table.

 

Thanks again. 

We ran into the same issue today. Ingesting a 46 GB dataflow works fine by itself, but as soon as you have calculated columns on that query there's a high likelihood of maxing out the machine memory (even on a 64 GB RAM PC). The spike clearly happens after the dataflow has fully loaded and the calculated columns are starting to be computed.

 

You can try to use DAX Studio Metrics to see what the estimated memory usage for these calculated columns is but it's not an exact science.

 

As much as it pains me to say it: since this is a calculated column, it may be possible to push that change into the Power Query for the dataflow, or in your case into the SQL view.

+1 to this, not using calculated columns and implementing the same logic further upstream will be the best way to solve this problem.

 

I don't think there's any way of breaking down memory usage during refresh by calculated column, and even if there was it might not be helpful because variations in how objects are refreshed in parallel would affect the overall memory usage.

 

Which error message are you getting now exactly? Is it one of the errors mentioned here: https://learn.microsoft.com/en-us/power-bi/enterprise/troubleshoot-xmla-endpoint#resource-governing-... ?

Thanks again @lbendlin and @cpwebb .

Would implementing the same logic as the calculated columns in Power Query, which requires merging tables and some transformations, affect memory and performance again?

 

@cpwebb Yes, the error message is: Resource Governing: This operation was canceled because there wasn't enough memory to finish running it. Either reduce the memory footprint of your dataset by doing things such as limiting the amount of imported data, or if using Power BI Premium, increase the memory of the Premium capacity where this dataset is hosted. More details: consumed memory 3464 MB, memory limit 3464 MB, database size before command execution 1655 MB. 

 

We are on Premium Embedded with an A2 capacity (5 GB memory).

 

In our largest table (the table causing the issue), we have 56 columns, comprising 15 native columns, 20 calculated columns using only the 'RELATED' function, and 21 other calculated columns.
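For context, the RELATED-based columns are simple one-line lookups along an existing relationship, along the lines of the hypothetical sketch below (the table and column names here are illustrative only, not from our model):

-- Hypothetical RELATED-style calculated column on FollowUps;
-- table and column names are made up for illustration.
Site Name = RELATED ( Sites[Name] )

Columns like that are usually much cheaper to compute than the CALCULATE/FILTER columns shown further down.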

I removed 7 columns more or less at random, choosing the ones that appeared more complex, and that resolved the issue.

 

These are some of the columns I removed:

Media(Latest_Photo_jpg) =
VAR Last_date =
    CALCULATE ( MAX ( Medias[CreatedOn] ) )
RETURN
    CALCULATE (
        LASTNONBLANK ( Medias[Media_URL_jpg], 1 ),
        FILTER ( ALL ( Medias ), Medias[FollowUpId] = FollowUps[Id] ),
        Medias[CreatedOn] = Last_date
    )

 

Description(Latest_Notes) =
VAR Last_date_1 =
    CALCULATE ( MAX ( Notes[CreatedOn] ) )
RETURN
    CALCULATE (
        LASTNONBLANK ( Notes[Description], 1 ),
        FILTER ( ALL ( Notes ), Notes[FollowUpId] = FollowUps[ID] ),
        Notes[CreatedOn] = Last_date_1
    )
 
Follow-up URL = IF(FollowUps[IsFollowUp]= TRUE(),"https://--------.com/#/--------/------/"&FollowUps[Id],BLANK())
 
Closed_Date = IF(FollowUps[Status] = "Closed",IF( FollowUps[LogHistory(Latest)] = BLANK(), FollowUps[CreatedOn] , FollowUps[LogHistory(Latest)] ), BLANK())
 
Thanks a lot for looking into this. Much appreciated.

It looks like the way you're using the FILTER function to filter entire tables in CALCULATE could be the problem. See https://xxlbi.com/blog/power-bi-antipatterns-9/
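For illustration, a rewrite of one of those columns along those lines might look like the sketch below. It assumes FollowUps and Notes are related on FollowUps[ID] (one side) and Notes[FollowUpId] (many side), so the context transition triggered by CALCULATE already restricts Notes to the current follow-up and the table-wide FILTER ( ALL ( Notes ), ... ) can be replaced with a plain column predicate:

-- Sketch only: assumes a FollowUps[ID] (1) -> Notes[FollowUpId] (*) relationship,
-- so CALCULATE's context transition already limits Notes to the current follow-up.
Description(Latest_Notes) =
VAR Last_date_1 =
    CALCULATE ( MAX ( Notes[CreatedOn] ) )    -- latest note for this follow-up
RETURN
    CALCULATE (
        LASTNONBLANK ( Notes[Description], 1 ),
        Notes[CreatedOn] = Last_date_1        -- column filter instead of FILTER ( ALL ( Notes ), ... )
    )

Even then, if many columns need this kind of latest-value lookup, computing them upstream in the SQL view (as suggested earlier in the thread) is likely to be the more memory-friendly option.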

lbendlin
Super User

Go with the timestamp that has the most even distribution (likely the created date). Create a large enough "hot" window to cover most of the jitter (modifications happening soon after creation). For example, make your "hot" partitions the last three months instead of the last month. Then use the Modified date to catch any entries that fall outside the "hot" window and manually refresh their partitions if needed. Plan on doing a full refresh (or a sequential manual refresh of all partitions) every now and then.
