Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Earn the coveted Fabric Analytics Engineer certification. 100% off your exam for a limited time only!

Reply
petersmith
New Member

Dataflow incremental refresh with primary key

Hi All,

 

Loving dataflows, but would love the ability to do an incremental refresh in more of an 'upsert' style. That is, maintain the uniqueness of a primary key within the dataset. Any ideas on how to do this? I assume it will look something like excuting some sort of M-style merge after the dataflow incremental refresh pulls in more data.


For refrence my table size is ~1 million rows.

15 REPLIES 15
Anonymous
Not applicable

I SO GLAD someone asked this question too!

Currently incremental refresh works for the data was updated based on datetime stamp, say last 10 days. It does not take into consideration the Primary Key that might have been inserted in the past.
Eg: lets say a "Sales Order" was created 3 months with a Status "Open", but only today its status was changed to "Shipped". Since I am only looking for past 10 days data for incremental refresh, a new Sales Order row gets inserted into the Dataset causing my "Sales Order" table to be duplicated.

How can we avoid this? Ideally along with the Datetime stamp, there needs to be a option where we specify the Unique Key column too. If incremental refresh contains any of the Unique Keys loaded in the past, they need to be deleted too and reinserted.

Anonymous
Not applicable

Hey SamRock,

 

Any update regarding consideration of primary key when incrementally refreshing? I have a similar situation and I am not sure Power BI dataflows can solve my issue.

Hi there

Currently incremental refresh only works with dates and unfortunately not with a primary key




Did I answer your question? Mark my post as a solution!

Proud to be a Super User!







Power BI Blog

Anonymous
Not applicable

Is there any timeline for if/when it will be added as a feature for dataflows? This can be seen as a critical feature for refreshing data dependent on past state changes (i.e. Orders, Status, Deliveries, Inventory, Requests, etc)

The underlying technology for incremental refresh seems to be based on Partitions from SSAS. Therefore my guess is I don't think this will fundamentally change to support a primary key upsert any time soon.

 

Assuming therefore that when a record is updated it will appear in your incremental refresh therefore creating a duplicate you should add extra steps in your MCode to remove duplicates keeping the Max Primary Key therefore pruning the data in your Dataflows ETL process.

Hi there

I do not think that this is something that will be happening anytime soon.

You could have an additional column which is the modified datetime which could be used to update?




Did I answer your question? Mark my post as a solution!

Proud to be a Super User!







Power BI Blog

This possibilite already able disponible to users Pro?

markive
Advocate II
Advocate II

I assume the meaning of the question is similar to my own..

 

I can use a Date field in my data to sync only the last day for example. But my 'Upsert' question is will it be able to replace existing data if the primary key shows it already exists?

 

Lets say I have 10 records, one record created each day. If a user of my system updates record 2 that was originally created 2 says ago but updated today, will the incremental refresh 'Upsert' it? Or will this break the refresh?

 

When I read about partitions and how incremental refresh seemed to work in original Analysis Services it seems to be purely additive for new data incrementally.

Hi there

That would work as either by making the incremental refresh look at the DateTime column of the updated data.

Or you could look at the following below when setting it up

 

image.png





Did I answer your question? Mark my post as a solution!

Proud to be a Super User!







Power BI Blog

Hi there

 

What you can do is to also enable the option for Detect Changes, where you can update the column based on a date, which will be included in the incremental process

 

image.png





Did I answer your question? Mark my post as a solution!

Proud to be a Super User!







Power BI Blog

Anonymous
Not applicable

Hi @GilbertQ  Thanks. But this does not take care of my scenario. If only depends on Max date in "Changes detected". If my date column is not falling in the rangeof teh incremental refresh cycle , it will still miss the older rows. 

Let me know if I missed something

Hi there

My understanding is that it will also look for changes to the changed column and if it is newer than the last date, it will process that data too?




Did I answer your question? Mark my post as a solution!

Proud to be a Super User!







Power BI Blog

Anonymous
Not applicable

Yes it does.. but you will need another date column thats different from the Incremental refresh date.
Do you have any example or scenario that can help me understand how to use this option?

Hi there

Here are more details on how it works

https://docs.microsoft.com/en-us/power-bi/service-dataflows-incremental-refresh




Did I answer your question? Mark my post as a solution!

Proud to be a Super User!







Power BI Blog

GilbertQ
Super User
Super User

Hi there

As it currently stands you have to use a DateKey in order to do the incremental refreshing.

THis will allow you to possibly only refresh the latest day. And with only having 1 million rows in total, it should be really quick.




Did I answer your question? Mark my post as a solution!

Proud to be a Super User!







Power BI Blog

Helpful resources

Announcements
April AMA free

Microsoft Fabric AMA Livestream

Join us Tuesday, April 09, 9:00 – 10:00 AM PST for a live, expert-led Q&A session on all things Microsoft Fabric!

March Fabric Community Update

Fabric Community Update - March 2024

Find out what's new and trending in the Fabric Community.

Top Solution Authors
Top Kudoed Authors