Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Earn the coveted Fabric Analytics Engineer certification. 100% off your exam for a limited time only!

Reply
Anonymous
Not applicable

Handling of Deleted data in incremental refresh

Hi All

 

I wanted to understand how the rows which are deleted from table are handled in incremental refresh dataset.

Does it identify the deleted rows and delete from dataset OR Do we need to do full refresh in PBI desktop and then publish?

 

Thanks & regards

Harsh Dhawan

9 REPLIES 9
clayhs
Frequent Visitor

  • @GilbertQ What @Arthemis_Runner said, I'm seeing the same issue.
  • I have a record with
    DateCreated: 2022-01-01
    DateUpdated: 2023-01-01
    DateDeleted: NULL
  • My incremental refresh is functioning properly as if I change DateUpdated to 2024-01-01, it will trigger the      partition covering DateCreated: 2022 as the MaxDateUpdated has changed since it was last triggered.
  • However, if update the DateDeleted to 2024-01-01 andI change the filtering portion of my SourceExpression from 
    #"Filtered Rows" = Table.SelectRows (Data, each [DateCreated] >= RangeStart and [DateCreated] <= RangeEnd)
    to
    #"Filtered Rows" = Table.SelectRows (Data, each [DateCreated] >= RangeStart and [DateCreated] <= RangeEnd and [DateDeleted] = null)

    then the partition is not triggered as the record with the updated DateUpdated which would trigger the partition has (I'm guessing here) been filtered out prior to the partition being triggered due to the record no longer have DateDeleted of NULL.

    In summary I need to have a record be soft deleted (have DateDeleted go from NULL to X, have DateUpdated go from Y to X), and for that change to DateUpdated to trigger the partition, and during the ingestion of that partition filter out the record that caused that partition to trigger as it now has a value for DateDeleted.

    Any insights would be greatly appreciated. Seems maybe something has changed as I've watch a couple of Patricks videos and the apparent assumption that you can just filter out the records triggering the paritition to process doesn't seem to hold up anymore.

    FYI: I've also tried the variation below to no avail. Results are the same.
    #"Filtered Rows" = Table.SelectRows (Data, each [DateCreated] >= RangeStart and [DateCreated] <= RangeEnd),
    #"Filtered Rows2" = Table.SelectRows (#"Filtered Rows", each [DateDeleted] = null)
GilbertQ
Super User
Super User

I am not sure as it depends on your data!





Did I answer your question? Mark my post as a solution!

Proud to be a Super User!







Power BI Blog

Key columns: Created date, ETL Date, IsDeleted Flag.

Insert, Update, Delete is possible in any partition at any point

Table to be incrementally refreshed is a fact table 

 

I have access to make database modificaitons if required.

 

Right now the only solution I can think of is to include - where IsDeleted = False, in the RLS logic for all our RLS groups.

GilbertQ
Super User
Super User

Hi there

When using Incremental refresh it is done in the backend in the Power BI Service where the model is hosted.

The process actually drops the partitions which are considered to be outside of the data window of data to keep.





Did I answer your question? Mark my post as a solution!

Proud to be a Super User!







Power BI Blog

Anonymous
Not applicable

Thanks Gilbert for your reply

 

I have a table which contains 12 million rows. and it is modelled based on row level permissions from source system

This table is truncated and loaded on daily basis, Now I am being asked to look at the possibility to do incremental refresh for dataset which is loading this table

 

In case user permissions are removed from any record in source table, that data rows will not be loaded in the next etl run

 

I am not sure how that row will be removed from power bi dataset during the next incremental refresh if it no longer exists in source table now

 

Hope my question makes sense!

 

Cheers

 

 

Hi there

If I understand your question correctly let me try and explain it by like this.

When using incremental refreshing in Power BI, older data will not be refreshed or imported again into Power BI.

Based on your explanation above that would mean that certain data must be deleted or changed because the underlying data has changed.

You have 2 options, the first one is to just reload all the data. I currently have a project where I am uploading 15 million rows and it refreshes in about 20minutes.

Option number 2 is that you can use the detect data changes when using Incremental refresh with Power BI. Here is a great video explaining how to do it by GuyInaCube: https://www.youtube.com/watch?v=JsJWBr1_ktQ




Did I answer your question? Mark my post as a solution!

Proud to be a Super User!







Power BI Blog

Anonymous
Not applicable

Handling deletes is just what I am after. Thanks for posting that link  🙂

 

The approach outlined by Patric is not ideal but it's the next best thing - until that capability is added to incremental refresh.

Yeah for sure, I am hopeful that over time the incremental refresh story will improve.




Did I answer your question? Mark my post as a solution!

Proud to be a Super User!







Power BI Blog

Hi @GilbertQ Patrick's Method in the link doesn't seem to be working. The polling expression folds the query with the delete filter back to the source. This means that the max(lastupdateddate) queries will exclude deleted rows, hence that partition will not be updated.

 

Do you know of any way to handle this?

Helpful resources

Announcements
April AMA free

Microsoft Fabric AMA Livestream

Join us Tuesday, April 09, 9:00 – 10:00 AM PST for a live, expert-led Q&A session on all things Microsoft Fabric!

March Fabric Community Update

Fabric Community Update - March 2024

Find out what's new and trending in the Fabric Community.

Top Solution Authors
Top Kudoed Authors