Earn a 50% discount on the DP-600 certification exam by completing the Fabric 30 Days to Learn It challenge.
Hello,
I've implemented incremental refreshes on one of our fact tables, called 'Transactions'.
I can see on a given day there's been a change to some rows, and when this dataflow is consumed within the desktop, I'm receiving duplicate rows but the 'modified' and 'created' dates are different. One shows 30/01/2022 the other is 01/02/2022.
My incremental refresh is configured to store data for the last two years, and refresh rows for the last three months based on 'created' date, and automatically detect changes is enabled and set to watch 'modified' date.
I know if I do a manual refresh, it'll likely fix the issue as it'll do a normal load from the report, but I don't want to do this, I'd rather incremental worked as I would hope.
When I query the row within the service, only the one row is returned - I add this step after the 'canary load' step at the end of all the steps on the workspace.
Can anyone assist, or is this expected behaviour?
Many thanks,
Dayna
Hi Dyana,
My solution for this case was make a buffer for the sorted set by the modified date (descending), and then remove the duplicates.
...
#"Linhas classificadas" = Table.Buffer ( Table.Sort(#"Colunas removidas", {{"Id", Order.Ascending}, {"Modificado", Order.Descending}})),
Hello,
I'm not seeing duplicates in the service though, only when consumed within the report. Is this expected behaviour for the incremental refresh?
Thanks,
Dayna
Can you confirm that your M query is shaped according to the rules I mentioned?
I've not configured this other than in the UI within the PowerBI service, it has automatically created this:
That looks accurate. How do you know that records are duplicated?
When I load in this dataflow into a report, my numbers for a specific day are inflated. When digging deeper, I can see multiple records for the same ID, which should be impossible as it's unique.
When I consume the dataflow in a new report, it's duplicated when filtering on one of these ID codes. Doing the same in the service though only brings back one result, which is strange...
Did you get a resolution on this? I'm having the same issue when setting incremental refresh on Power BI Online premium workspace. Any insight you can share would be helpful. Thank you.
Sadly not, I ended up disabling the incremental data refreshes.
That's too bad. Only way around I found is to remove duplicates and keep the most recent when bringing in the dataflow to a dataset. However, that's not the best option since you'd have you do that for each report you build...
Duplication can only happen when partitions store overlapping date ranges. You must make sure that of the RangeStart and RangeEnd parameters one is inclusive and one is exclusive
For example
Date >=RangeStart and Date < RangeEnd
Date >RangeStart and Date <=RangeEnd
are both possible (but not at the same time!)
However
Date >=RangeStart and Date <=RangeEnd
will lead to duplicate rows and must be avoided.
This worked for me, thank you!
I have a similar issue. The data refreshing failed because Power BI service identified duplicate Issue IDs (which is not allowed in data model because issue ID is my primary key column). When I checked the source data (Jira system), there is no dulicate I have set up the filter query as below:
Last_Update_Date >=RangeStart and Last_Update_Date < RangeEnd
My guess is during the incremental data refreshing, someone has updated an issue in Jira system causing the Last_Update_Date to be changed during the incremental refreshing process . As a result, this issue is included both in archive data set and refresh data set, causing the same issue to be loaded twice.
Not sure if this is the root cause and if there is a solution to the issue.
This scenario is described in the documentation. Keep in mind that Incremental Refresh expects immutable data. If your data changes after the fact (which would technically require a differential refresh) then you need to supplement the incremental refresh with an occasional Full refresh of all partitions.
I read the documentations, indeed I need to enable 'get the latest data with direct query' in refresh setup page. This is not practical because 1) I will need to update my PB license to a premium per user one and the same for all users who need access to the Power BI report on service, 2) I need to change the storage mode of relevant tables into 'direct query' mode, this also means I need to rewrite all the DAX measures from scratch.
I'm wondering if there is any other solution. My only need is to shorten the data refreshing cycle from 25 minutes to 5-10 minutes. I'm connecting Power BI to Jira data using a odata connector.
Your data source is not compatible with Microsoft's implementation of Incremental refresh. Run a process outside of Power BI (for example on SQL Server) that can properly handle data updates. You can then connect to that SQL server in Direct Query mode for close to real time performance.
Thank you - this was causing my issue, I never would have thought to look for this.
User | Count |
---|---|
93 | |
85 | |
76 | |
66 | |
62 |
User | Count |
---|---|
112 | |
97 | |
95 | |
64 | |
58 |