Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
najmead
Frequent Visitor

De-duplicating incrementally refreshed data

I have an issue with slow dataset loads, and have been exploring the use of incremental refreshes to improve performance.  However, my data is quickly getting out of sync with source.  My understanding of the incremental refresh process is that assuming I set a refresh window of 1 month, if a record that is created two months ago gets updated in source, it will essentially get loaded a second time -- since the original record has now fallen outside the refresh window.  This would certainly explain why my data is getting out of sync.

 

Obviously, I can extend the window, but I'm finding this doesn't really solve the problem -- in order to extend it sufficiently to cover my needs, performance starts to degrade again.

 

Consequently, is there any reason I can't de-duplicate the data with a step in Power Query?  eg, if I add steps after the filter for RangeStart and RangeEnd to sort by last_mod_date and then deduplicate by the primary key... will that work?  Or does the Power Query only apply on the partition that is being refreshed?

2 REPLIES 2
lbendlin
Super User
Super User

Obviously, I can extend the window, but I'm finding this doesn't really solve the problem 

You may be able to find a balance there that works good enough.

 

In any case, you are hitting on the true core of the issue here - incremental refresh is exactly that - incremental.  It expects data to be immutable once it has been added.  What you expect to get (differential refresh) is not what incremental refresh is about, and is not what it can deliver.

 

There is a small remedy in using the "detect data changes"  feature but it comes with a gigantic price tag. I'll let you find out what that is.

 

What you want to do is continue with your incremental refresh, and  occasionally (once a month for example) force a reprocess of a couple of the "older" partitions as well.  Doesn't even have to be a complete flush and fill of the dataset , you can use the XMLA endpoint to target individual partitions.


@lbendlin wrote:

There is a small remedy in using the "detect data changes"  feature but it comes with a gigantic price tag. I'll let you find out what that is.

I'm sorry, but I have no idea what this is referring to.

 

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.

Top Solution Authors
Top Kudoed Authors