Hello Community,
I have been trying to set up Incremental Refresh for a dataflow for a while now and have hit a mental wall, so I am requesting help. Here goes:
I am trying to set up an incremental refresh for a dataflow that is transactional in nature. For example, this table, call it Orders, is imported from a different source into a SQL Server, which I then point to with Power Query. The SQL Server source is preferable because it has far fewer columns and an inserted_DateTime column that represents the last modification to a single Order row in the SQL database.
I would like to perform an incremental refresh whereby the dataflow stores 5 years of historical data, refreshes every 1-3 days (TBD on the actual refresh period), and detects data changes via the max value. The detect data changes step is where I am stuck. I have tried using the inserted_datetime column as the detect-changes column, with the datetime filter set to the Order created datetime. This results in duplicates; I believe the refresh window does not check the historical data for changes, so if an Order outside the refresh window is updated with a newer inserted_datetime, the Order is duplicated.
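To illustrate what I think is happening, here is a toy simulation of partitioned incremental refresh. This is only my mental model, not the actual Power BI engine, and it assumes the partition filter ends up keyed to the last-modified inserted_datetime, which is one way I can see the duplicates arising:

```python
from datetime import date

# Toy model: each partition holds the rows whose filter-column value falls
# in that partition's date range. Incremental refresh only rebuilds the
# partitions that fall inside the refresh window (delete-insert).

def refresh(partitions, source_rows, refresh_window_start):
    """Rebuild only the partitions at or after refresh_window_start."""
    for part_start, part_end in list(partitions):
        if part_start >= refresh_window_start:
            partitions[(part_start, part_end)] = [
                r for r in source_rows
                if part_start <= r["inserted_datetime"] < part_end
            ]
    return partitions

# Order 1001 was created (and last modified) in 2019.
source = [{"order_id": 1001, "inserted_datetime": date(2019, 6, 1)}]

partitions = {
    (date(2019, 1, 1), date(2020, 1, 1)): [],  # historical partition
    (date(2024, 1, 1), date(2025, 1, 1)): [],  # partition in the refresh window
}
# Emulate the initial full load, which populated the historical partition once.
partitions[(date(2019, 1, 1), date(2020, 1, 1))] = [source[0].copy()]

# The Order is now updated: its last-modified stamp moves into 2024,
# so the next incremental refresh imports it into the 2024 partition.
source[0]["inserted_datetime"] = date(2024, 3, 15)
partitions = refresh(partitions, source, refresh_window_start=date(2024, 1, 1))

all_rows = [r["order_id"] for rows in partitions.values() for r in rows]
print(all_rows)  # [1001, 1001] -- old copy plus the re-imported copy
```

The historical partition is never rebuilt, so the stale copy of the Order survives alongside the freshly imported one, which matches the duplicates I am seeing.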
I have looked at and tried many of the forum resolutions and am now setting up one that many have said should work, but I am not sure I am understanding it correctly, and I am missing data when I attempt it.
I have the Orders table query in the Power Query editor. I created a custom column that simply records the date the row was imported into Power Query, using the following function:
Thanks in advance!!
Thank you for the link and advice. I have already performed these small-scale tests with no errors/duplicates; however, bringing them up to scale is presenting its own problems.
I believe I can narrow my issues with incremental refresh down to a few questions.
We have our historical window and our refresh window. When "Detect data changes" is activated, does it peer into the datetime records beyond the refresh window, into the historical window? If not, what are the suggested ways to resolve this? If we had a 5-year historical window for Orders, but it can take up to 4 years for an Order to change, does the detect-changes method really require a 4-year refresh window for the change to be detected? Because of the delete-insert nature of incremental refresh, this seems highly infeasible for long-timeframe historical tables.
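To make this question concrete, here is the behavior I am assuming "Detect data changes" has: the max of the detect-changes column is stored per partition and compared only for partitions inside the refresh window. The partition boundaries and dates below are hypothetical:

```python
from datetime import date

# Stored max LastUpdated per partition, captured at the previous refresh.
stored_max = {
    date(2020, 1, 1): date(2020, 5, 1),  # historical partition
    date(2024, 1, 1): date(2024, 2, 1),  # partition inside the refresh window
}

# Current max LastUpdated per partition in the source.
source_max = {
    date(2020, 1, 1): date(2024, 3, 1),  # an old Order changed!
    date(2024, 1, 1): date(2024, 2, 1),
}

refresh_window_start = date(2024, 1, 1)

# Only in-window partitions are even compared, so the 2020 change is unseen.
to_refresh = [
    p for p in stored_max
    if p >= refresh_window_start and source_max[p] != stored_max[p]
]
print(to_refresh)  # [] -- the changed 2020 partition is never checked
```

If that assumption is right, the only way to catch the 4-year-old change is to widen the refresh window until it covers that partition, which is exactly what seems infeasible to me.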
Is each "Order Date" used for the filter treated as its own row? I understand we cannot identify primary keys, but if I am understanding the results correctly, each row will have an "Order Date" and a "LastUpdated_Date" (when utilizing detect changes). When "Detect data changes" is on, each Order Date record has a corresponding LastUpdated_Date. During the refresh period, if the max value of LastUpdated_Date is different for that specific Order Date, it will perform the delete-insert for that entire row.
If the above is true, how does the incremental refresh determine that each Order Date is unique? It looks like it must be assigning some sort of index column or row ID to keep Order Dates apart; otherwise, Order Dates that are the same would become one record after the incremental refresh.
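My working assumption for these last two questions, sketched below with made-up rows, is that no row-level key exists at all: rows are grouped into partitions by the filter-column range, and a refresh deletes and reloads an entire partition, so same-dated Orders simply ride along together:

```python
from datetime import date

# Hypothetical source rows; Orders A and B share the same Order Date.
rows = [
    {"order": "A", "order_date": date(2024, 3, 1), "last_updated": date(2024, 3, 1)},
    {"order": "B", "order_date": date(2024, 3, 1), "last_updated": date(2024, 4, 2)},
    {"order": "C", "order_date": date(2024, 4, 5), "last_updated": date(2024, 4, 5)},
]

def partition_key(row):
    """Assign a row to a month partition based on the filter column."""
    d = row["order_date"]
    return date(d.year, d.month, 1)

# Group rows by partition; no index or row ID is needed to keep
# same-dated Orders apart -- they just live in the same partition.
partitions = {}
for r in rows:
    partitions.setdefault(partition_key(r), []).append(r)

# A "refresh" of March replaces the whole March partition with a fresh
# source pull; both same-dated rows are deleted and re-inserted together.
march = date(2024, 3, 1)
partitions[march] = [r for r in rows if partition_key(r) == march]

print({k: len(v) for k, v in partitions.items()})
```

If refresh really operates on whole partitions rather than individual rows, that would explain why identical Order Dates do not collapse into one record; I would appreciate confirmation either way.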
Thanks again for your help.