Hello Community,
I have been trying to set up Incremental Refresh for a dataflow for a while now and have hit a mental wall, so I am requesting help. Here goes:
I am trying to set up an incremental refresh for a dataflow that is transactional in nature. For example, this table, call it Orders, is imported from a different source into a SQL Server, which I then point to with Power Query. The SQL Server source is preferable because it has far fewer columns and an inserted_DateTime column that represents the last modification to a single Order row in the SQL database.
I would like to perform an incremental refresh whereby the dataflow stores 5 years of historical data, refreshes every 1-3 days (TBD on the actual refresh period), and detects data changes via the max value. The detect data changes step is where I am stuck. I have tried using the inserted_datetime column as the detect-changes column, with the datetime filter set to the Order created datetime. This results in duplicates; I believe the refresh window does not check the historical data for changes, so if an Order outside the refresh window is updated with a newer inserted_datetime, the Order is duplicated.
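To illustrate what I think is happening, here is a toy simulation of partitioned incremental refresh. This is only my mental model, not the actual Power BI engine, and it assumes the partition filter ends up keyed to the last-modified inserted_datetime, which is one way I can see the duplicates arising:

```python
from datetime import date

# Toy model: each partition holds the rows whose filter-column value falls
# in that partition's date range. Incremental refresh only rebuilds the
# partitions that fall inside the refresh window (delete-insert).

def refresh(partitions, source_rows, refresh_window_start):
    """Rebuild only the partitions at or after refresh_window_start."""
    for part_start, part_end in list(partitions):
        if part_start >= refresh_window_start:
            partitions[(part_start, part_end)] = [
                r for r in source_rows
                if part_start <= r["inserted_datetime"] < part_end
            ]
    return partitions

# Order 1001 was created (and last modified) in 2019.
source = [{"order_id": 1001, "inserted_datetime": date(2019, 6, 1)}]

partitions = {
    (date(2019, 1, 1), date(2020, 1, 1)): [],  # historical partition
    (date(2024, 1, 1), date(2025, 1, 1)): [],  # partition in the refresh window
}
# Emulate the initial full load, which populated the historical partition once.
partitions[(date(2019, 1, 1), date(2020, 1, 1))] = [source[0].copy()]

# The Order is now updated: its last-modified stamp moves into 2024,
# so the next incremental refresh imports it into the 2024 partition.
source[0]["inserted_datetime"] = date(2024, 3, 15)
partitions = refresh(partitions, source, refresh_window_start=date(2024, 1, 1))

all_rows = [r["order_id"] for rows in partitions.values() for r in rows]
print(all_rows)  # [1001, 1001] -- old copy plus the re-imported copy
```

The historical partition is never rebuilt, so the stale copy of the Order survives alongside the freshly imported one, which matches the duplicates I am seeing.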
I have looked at and tried many of the forum resolutions and am now setting up one that many have said should work, but I am not sure I am understanding it correctly, and I am missing data when I attempt it.
I have the Orders table query in the Power Query editor. I created a custom column that simply records the date the row was imported into Power Query, using the following function:
Thanks in advance!!
Thank you for the link and advice. I have already performed these small-scale tests with no errors/duplicates; however, bringing them up to scale is presenting its own problems.
I believe I can narrow my issues with incremental refresh down to a few questions.
We have our historical window and our refresh window. When "Detect data changes" is activated, does it peer into the datetime records beyond the refresh window, into the historical window? If not, what are the suggested ways to resolve this? If we had a 5-year historical window for Orders, but it can take up to 4 years for an Order to change, does the detect-changes method really require a 4-year refresh window for the change to be detected? Because of the delete-insert nature of incremental refresh, this seems highly infeasible for long-timeframe historical tables.
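To make this question concrete, here is the behavior I am assuming "Detect data changes" has: the max of the detect-changes column is stored per partition and compared only for partitions inside the refresh window. The partition boundaries and dates below are hypothetical:

```python
from datetime import date

# Stored max LastUpdated per partition, captured at the previous refresh.
stored_max = {
    date(2020, 1, 1): date(2020, 5, 1),  # historical partition
    date(2024, 1, 1): date(2024, 2, 1),  # partition inside the refresh window
}

# Current max LastUpdated per partition in the source.
source_max = {
    date(2020, 1, 1): date(2024, 3, 1),  # an old Order changed!
    date(2024, 1, 1): date(2024, 2, 1),
}

refresh_window_start = date(2024, 1, 1)

# Only in-window partitions are even compared, so the 2020 change is unseen.
to_refresh = [
    p for p in stored_max
    if p >= refresh_window_start and source_max[p] != stored_max[p]
]
print(to_refresh)  # [] -- the changed 2020 partition is never checked
```

If that assumption is right, the only way to catch the 4-year-old change is to widen the refresh window until it covers that partition, which is exactly what seems infeasible to me.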
Is each "Order Date" used for the filter treated as its own row? I understand we cannot identify primary keys, but if I am understanding the results correctly, each row will have an "Order Date" and a "LastUpdated_Date" (when utilizing detect changes). When "Detect data changes" is on, each Order Date record has a corresponding LastUpdated_Date. During the refresh period, if the max value of LastUpdated_Date is different for that specific Order Date, it will perform the delete-insert for that entire row.
If the above is true, how does the incremental refresh determine that each Order Date is unique? It looks like it must be assigning some sort of index column or row ID to keep Order Dates apart; otherwise, Order Dates that are the same would become one record after the incremental refresh.
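My working assumption for these last two questions, sketched below with made-up rows, is that no row-level key exists at all: rows are grouped into partitions by the filter-column range, and a refresh deletes and reloads an entire partition, so same-dated Orders simply ride along together:

```python
from datetime import date

# Hypothetical source rows; Orders A and B share the same Order Date.
rows = [
    {"order": "A", "order_date": date(2024, 3, 1), "last_updated": date(2024, 3, 1)},
    {"order": "B", "order_date": date(2024, 3, 1), "last_updated": date(2024, 4, 2)},
    {"order": "C", "order_date": date(2024, 4, 5), "last_updated": date(2024, 4, 5)},
]

def partition_key(row):
    """Assign a row to a month partition based on the filter column."""
    d = row["order_date"]
    return date(d.year, d.month, 1)

# Group rows by partition; no index or row ID is needed to keep
# same-dated Orders apart -- they just live in the same partition.
partitions = {}
for r in rows:
    partitions.setdefault(partition_key(r), []).append(r)

# A "refresh" of March replaces the whole March partition with a fresh
# source pull; both same-dated rows are deleted and re-inserted together.
march = date(2024, 3, 1)
partitions[march] = [r for r in rows if partition_key(r) == march]

print({k: len(v) for k, v in partitions.items()})
```

If refresh really operates on whole partitions rather than individual rows, that would explain why identical Order Dates do not collapse into one record; I would appreciate confirmation either way.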
Thanks again for your help.