Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Earn a 50% discount on the DP-600 certification exam by completing the Fabric 30 Days to Learn It challenge.

Reply
Dayna
Helper V
Helper V

Incremental refresh causing duplicate rows

Hello,

 

I've implemented incremental refreshes on one of our fact tables, called 'Transactions'.

 

I can see on a given day there's been a change to some rows, and when this dataflow is consumed within the desktop, I'm receiving duplicate rows but the 'modified' and 'created' dates are different. One shows 30/01/2022 the other is 01/02/2022.

 

My incremental refresh is configured to store data for the last two years, and refresh rows for the last three months based on 'created' date, and automatically detect changes is enabled and set to watch 'modified' date.

 

I know if I do a manual refresh, it'll likely fix the issue as it'll do a normal load from the report, but I don't want to do this, I'd rather incremental worked as I would hope.

 

When I query the row within the service, only the one row is returned - I add this step after the 'canary load' step at the end of all the steps on the workspace.

 

Can anyone assist, or is this expected behaviour?

 

Many thanks,

Dayna

16 REPLIES 16
gustavofreitas
New Member

Hi Dyana,

 

My solution for this case was make a buffer for the sorted set by the modified date (descending), and then remove the duplicates.

   ...

   #"Linhas classificadas" = Table.Buffer ( Table.Sort(#"Colunas removidas", {{"Id", Order.Ascending}, {"Modificado", Order.Descending}})),

    #"Duplicatas removidas" = Table.Distinct(#"Linhas classificadas", {"Id"}) ...
 
Obs.: without applying the buffer, power bi does not guarantee that the first row of sorting will be kept.
Dayna
Helper V
Helper V

Hello,

I'm not seeing duplicates in the service though, only when consumed within the report. Is this expected behaviour for the incremental refresh?

Thanks,
Dayna

Can you confirm that your M query is shaped according to the rules I mentioned?

I've not configured this other than in the UI within the PowerBI service, it has automatically created this:

Table.SelectRows(#"Changed column type 2", each DateTime.From([pro2created]) >= RangeStart and DateTime.From([pro2created]) < RangeEnd)

That looks accurate. How do you know that records are duplicated?

When I load in this dataflow into a report, my numbers for a specific day are inflated. When digging deeper, I can see multiple records for the same ID, which should be impossible as it's unique. 

 

When I consume the dataflow in a new report, it's duplicated when filtering on one of these ID codes. Doing the same in the service though only brings back one result, which is strange... 

hmorrow
Frequent Visitor

Did you get a resolution on this? I'm having the same issue when setting incremental refresh on Power BI Online premium workspace. Any insight you can share would be helpful. Thank you.

Sadly not, I ended up disabling the incremental data refreshes. 

hmorrow
Frequent Visitor

That's too bad. Only way around I found is to remove duplicates and keep the most recent when bringing in the dataflow to a dataset. However, that's not the best option since you'd have you do that for each report you build... 

lbendlin
Super User
Super User

Duplication can only happen when partitions store overlapping date ranges. You must make sure that of the RangeStart and RangeEnd parameters one is inclusive and one is exclusive

 

For example 

 

Date >=RangeStart and Date < RangeEnd

Date >RangeStart and Date <=RangeEnd

 

are both possible (but not at the same time!)

 

However

 

Date >=RangeStart and Date <=RangeEnd

 

will lead to duplicate rows and must be avoided.

This worked for me, thank you!

I have a similar issue. The data refreshing  failed because Power BI service identified duplicate Issue IDs (which is not allowed in data model because issue ID is my primary key column). When I checked the source data (Jira system), there is no dulicate I have set up the filter query as below:

Last_Update_Date >=RangeStart and Last_Update_Date < RangeEnd

 

My guess is during the incremental data refreshing, someone has updated an issue in Jira system causing the Last_Update_Date to be changed during the incremental refreshing process . As a result, this issue is included both in archive data set and refresh data set, causing the same issue to be loaded twice. 

 

Not sure if this is the root cause and if there is a solution to the issue.

 

@lbendlin 

 

This scenario is described in the documentation. Keep in mind that Incremental Refresh expects immutable data.  If your data changes after the fact (which would technically require a differential refresh) then you need to supplement the incremental refresh with an occasional Full refresh of all partitions.

I read the documentations, indeed I need to enable 'get the latest data with direct query' in refresh setup page. This is not practical because 1) I will need to update my PB license to a premium per user one and the same for all users who need access to the Power BI report on service, 2) I need to change the storage mode of relevant tables into 'direct query' mode, this also means I need to rewrite all the DAX measures from scratch. 

 

I'm wondering if there is any other solution. My only need is to shorten the data refreshing cycle from 25 minutes to 5-10 minutes. I'm connecting Power BI to Jira data using a odata connector. 

Your data source is not compatible with Microsoft's implementation of Incremental refresh.  Run a process outside of Power BI (for example on SQL Server) that can properly handle data updates. You can then connect to that SQL server in Direct Query mode for close to real time performance.

Thank you - this was causing my issue, I never would have thought to look for this.

Helpful resources

Announcements
LearnSurvey

Fabric certifications survey

Certification feedback opportunity for the community.

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.