Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Earn the coveted Fabric Analytics Engineer certification. 100% off your exam for a limited time only!

Reply
DebbieE
Community Champion
Community Champion

Premium Incremental Refresh Detect data changes (How does it identify the correct row?)

Doing Incremental Refresh in Premium

I have a last update date in my data. When I detect data changes set to on, it checks the row and if this date has changed then the record is updated. 

However how does it know that the record in Power BI and the record coming in is the same record?

 

I have searched the documentation and cant find anything on this. Im assuming its the Primary key of the record or the key ? However this isnt set anywhere tht I can see

 

What about when you are loading in the fact table? The fact table contains foreign keys. Basically how doesthe incremental refresh know that this is the row that replaces the row in Power BI?

https://docs.microsoft.com/en-us/power-bi/service-premium-incremental-refresh#detect-data-changes Doesnt seem to answer this question

28 REPLIES 28

Would incremental refresh work in the following scenario?

 

Data as of March 2nd 2020:

 

DateKeyTotal SalesLast Modified
March 1st 20201300March 1st 2020
March 2nd 20202200March 2nd 2020

 

Data has changed to the following as of March 3rd 2020

 

DateKeyTotal SalesDate Modified
March 1st 20201400March 3rd 2020
March 2nd 20202600March 3rd 2020

 

i.e. Does incremental refresh work if updates are only made against the same row rather than as NEW ROWS?

Hi there

If you have enabled the "Detect Data Changes" and put it against your Date Modified column then yes it should work.




Did I answer your question? Mark my post as a solution!

Proud to be a Super User!







Power BI Blog

Thanks GilbertQ

 

So what does this mean on the MS support article for incremental refresh? Reason I asked was I thought it had something to do with my example above

 

"The filter on the date column is used to dynamically partition the data into ranges in the Power BI service. Incremental refresh isn't designed to support cases where the filtered date column is updated in the source system. An update is interpreted as an insertion and a deletion, not an actual update. If the deletion occurs in the historical range and not the incremental range, it won’t get picked up. This can cause data refresh failures due to partition-key conflicts."

Hi there

From my understanding the above issue would only happen if historical data was updated and then that caused the incremental dataset to have partition-key conflicts because it has got duplicate values?




Did I answer your question? Mark my post as a solution!

Proud to be a Super User!







Power BI Blog

Hi Gilbert

 

I don't understand what you mean sorry - duplicate values where?

 

Thank you

There is the potential for duplicate values if the incremental refresh was misconfigured.




Did I answer your question? Mark my post as a solution!

Proud to be a Super User!







Power BI Blog

DebbieE
Community Champion
Community Champion

Before I look at this apologies. I put 2019 rather than 2020. Its still habit at the moment

Jay Swaminarayan!

 

This case was exteremely helpful to understand the underlying mechanism of Microsoft PowerBI of Detect Changes with Incremental Refresh and How it works?

 

The question:

How does PowerBI detect changes know, what records to update based on LastUpdateDate without taking Id or Key columns into Account?

 

Note: the following answer is based on my understanding about it.

 

With this example

 

Assume these Refresh Policy Configuration:

Historical Days: 60 Months

Refresh Days:  10 Days

Range Column : OrderDate

Detect Changes: LastUpdateDate

 

 

Date of Upload 25/01/2020 (chanaged for better understanding)

In Power BI already

------------------------------Original Case---------------------

Key    ID     Surname          OrderDate         lastUpdatedate

1       24     Smyth              15/01/2020         15/01/2020

2       48      Jonis               15/01/2020         15/01/2020

3       67      Ellis                15/01/2020          15/01/2020

4       81      Truman          15/01/2020          15/01/2020

 

And you upload this data

Key    ID     Surname     OrderDate              lastUpdatedate

1       24     Smith          15/01/2020              27/09/2019

2       48      Jones          15/01/2020             27/09/2019

3       67      Ellis                15/01/2020          15/01/2020

4       81      Truman          15/01/2020          15/01/2020

-------------------------------------------------------------------

 

For sake of better understanding we add few more rows

------------------------------Original Case---------------------

Key    ID     Surname          OrderDate         lastUpdatedate

1       24     Smyth             15/01/2019

2       48      Jonis              15/01/2019

3       67      Ellis                15/02/2019

4       81      Truman          15/01/2020          15/01/2020

5       54     NSmyth           15/01/2020          15/01/2020

6       58      NJonis            16/01/2020          16/01/2020

7       57      NEllis              16/01/2020          16/01/2020

8       51      NTruman        18/01/2020          18/01/2020

 

 

And we are intending to update this data

Key    ID     Surname          OrderDate         lastUpdatedate

1       24     Smith             15/01/2019         24/01/2019

2       48      Jonis              15/01/2019          15/01/2019

3       67      EllisU               15/02/2019          23/02/2019

4       81      Truman          15/01/2020          15/01/2020

5       54     NSmyth           15/01/2020          24/01/2020

6       58      NJonis            16/01/2020          23/01/2020

7       57      NEllis              16/01/2020          16/01/2020

8       51      NTruman        18/01/2020          18/01/2020

-------------------------------------------------------------------

Records in REDs are updated on respective dates.

 

Now lets understand how PowerBI makes History and Refresh Partitions

 

Historical Partitions based on Months, it must create 60 partitions for each month. Therefore our data will sit this way.

Like....

Key    ID     Surname          OrderDate         lastUpdatedate

-----------------------------HISTORY-P1------------------------

1       24     Smyth             15/01/2019         15/01/2019

2       48      Jonis              15/01/2019         15/01/2019

-----------------------------HISTORY-P2------------------------

3       67      Ellis                15/02/2019          15/02/2019

-----------------------------HISTORY-P3------------------------

4       81      Truman          15/01/2020          15/01/2020

5       54     NSmyth           15/01/2020          15/01/2020

 

Refresh Partitions are created based on granuality, which is days (10 days). Therefore it will create 10 Refresh Partition for each day.

Like....

-----------------------------REFRESH-P1------------------------

6       58      NJonis            16/01/2020          16/01/2020

7       57      NEllis              16/01/2020          16/01/2020

-----------------------------REFRESH-P2------------------------

8       51      NTruman        18/01/2020          18/01/2020

 

Note here that

Detect changes only works with Refresh Range, therefore only Refresh Partitions are updated and the History partitions are just IGNORED. And thus any data updated in the historic periods are not considered at all.
PowerBI maintains the last MAX Updated Value for each Refresh Partitions
REFRESH P1- MAX Last Updated Date = 16/01/2020
REFRESH P2- MAX Last Updated Date = 18/01/2020

 

Now Refresh Operation takes place and gets following data

 

Key    ID     Surname          OrderDate         lastUpdatedate

1       24     Smith             15/01/2019         24/01/2019

2       48      Jonis              15/01/2019          15/01/2019

3       67      EllisU               15/02/2019          23/02/2019

4       81      Truman          15/01/2020          15/01/2020

5       54     NSmyth           15/01/2020          24/01/2020

--------------^^^IGNORED as its in HISTORY PARTITION^^^^------------

 

-----------------------------REFRESH-P1------------------------

6       58      NJonis            16/01/2020          23/01/2020

7       57      NEllis              16/01/2020          16/01/2020

 

-----------------------------REFRESH-P2------------------------

8       51      NTruman        18/01/2020          18/01/2020

 

Now, PowerBI checks the MAX LastUpdateDate for incoming data

For REFRESH-P1 : New: 23/01/2020 (> Old: 16/01/2020)
For REFRESH-P2 : New: 18/01/2020 (== Old: 18/01/2020)
Thus, It has DETECTED Changes in REFRESH-P1 partition, and not in REFRESH-P2.

Therefore all the records will be deleted from REFRESH-P1 partitions (Id 6 & 7 both will be deleted) and Insterted back with the new records (of Id 6 & 7).

It will NOT only delete Id(6) but both as it resides in same partition.

 

And REFRESH-P2 will be ignored completely.

 

This is how PowerBI detect record changes based on LastUpdateDate without using ID/Key columns as it is replacing whole partition.

 

Hope that makes sense.

Helpful resources

Announcements
April AMA free

Microsoft Fabric AMA Livestream

Join us Tuesday, April 09, 9:00 – 10:00 AM PST for a live, expert-led Q&A session on all things Microsoft Fabric!

March Fabric Community Update

Fabric Community Update - March 2024

Find out what's new and trending in the Fabric Community.

Top Solution Authors
Top Kudoed Authors