Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
richard-powerbi
Post Patron
Post Patron

Can you use incremental refresh files with file properties?

I have a situation where users export Excel files from an engineering program. The amount of Excel files being created will be roughly 100-150 per year. I want to get this data in the data model and specifically in a Dataflow. Is it possible to design something in the Power BI environment in such a way that only new or modified Excel files will be included in the refresh? (Q1)

Let's say I combine Excel files through Power Query and I include the created/modified columns from the file properties in the resulting table, will an incremental refresh in the Dataflow settings cause only the newly created and modified files to be opened and analyzed? Or will all files be opened and read before the engine realizes only 5% of the data was new/modified? (Q2)

6 REPLIES 6

Hi @richard-powerbi ,

 

Q1 - The incremental refresh does not detect any data changes. Only an ongoing period will be refreshed.

Q2 - I think you must have filtered the Excel files by date before combining them. That should work then.


Using incremental refresh with Power BI dataflows

https://docs.microsoft.com/en-us/power-bi/transform-model/service-dataflows-incremental-refresh

 

 

Did I answer your question?
Please mark my post as solution, this will also help others.
Please give Kudos for support.

Marcus Wegener works as Full Stack Power BI Engineer at BI or DIE.
His mission is clear: "Get the most out of data, with Power BI."
twitter - LinkedIn - YouTube - website - podcast


@mwegener but if I filter the Excel files before combining them I remove the old data.... how is that a solution?

Hi @richard-powerbi ,

 

the old data should already be stored in the Power BI Service (Dataflow).

The Power BI Service controls the parameters RangeStart and RangeEnd and thus filters the observation period for new data.

 

Did I answer your question?
Please mark my post as solution, this will also help others.
Please give Kudos for support.

Marcus Wegener works as Full Stack Power BI Engineer at BI or DIE.
His mission is clear: "Get the most out of data, with Power BI."
twitter - LinkedIn - YouTube - website - podcast


@mwegener thanks I understand things partly. Can you confirm below reasoning and answer some of the questions within?

 

Let's say this is a query:

Step 1: Source

Step 2: Filtering source with RangeStart and RangeEnd

Step 3-8: All sorts of transformations

Step 9: Result table

 

This is the table at step 2:

Index    CreatedModified
11-1-2020    5-1-2020
22-1-20202-1-2020
33-1-20204-1-2020
44-1-20204-1-2020
55-1-20205-1-2020

 

Using RangeStart (4-1-2020) and RangeEnd (5-1-2020) only on the Created column anything after step 2 will look like this in PQ Editor at step 9:

Index    CreatedModified
44-1-2020    4-1-2020
55-1-20205-1-2020

 

At the same time it will still look like this in Dataflow (Dataflow removes the RangeStart and RangeEnd filters):

Index    CreatedModified
11-1-2020    5-1-2020
22-1-20202-1-2020
33-1-20204-1-2020
44-1-20204-1-2020
55-1-20205-1-2020

 

  • It is required to have the Created and Modified columns available in the result table (step 9), otherwise you can't select them in incremental refresh setup. Meaning that any other tables like dimensions, referencing this data, also need to have these columns. A problem that I foresee is that in these dimension tables duplicates will be removed which renders these columns useless. Please tell me there is a solution for this? Or tell me this assumption correct:
    • When there are 5 tables in a Dataflow, and 1 table is the 'mother table' (the other 4 tables reference this table) you only need to setup incremental refresh for the mother table and incremental refresh will do it's magic on the 'children tables', meaning that the children tables do not need to have the Created and Modified columns.
  • Incremental refresh for Dataflows looks at the Created and Modified columns in the result table after step 9. It does not impact at step 2, other than removing the RangeStart and RangeEnd filters.
  • For the Created column:
    • The part 'from the past' in 'Store rows from the past' and 'Refresh rows from the past' means it is relative to the last date in the Created column and not relative to the actual date the Dataflow refresh happens.
    • Setting 'Store rows from the past' to 4 days, only Indexes 2, 3, 4 and 5 will be in the result table from the Dataflow. Index 1 will be removed.
    • Setting 'Refresh rows from the past' to 2 days, only Indexes 4 and 5 will be refreshed. Meaning that Index 3 will not be updated. Conclusion: setting 'Refresh rows from the past' to anything less than 'Store rows from the past' is risky if you don't know how far changes will go back. Meaning that Index 2, 3, 4 and 5 will be in the result table and only Indexes 4 and 5 are refreshed.
  • For the Modified Column:
    • Setting 'Only refresh data if the maximum value in this field changes' means that nothing will be refreshed because the maximum value of this column is evaluated for each of the periods in the incremental range. I assume a period in the incremental refresh is equal to the setting 'Refresh rows from the past' and is not equal to the setting 'Store rows from the past'. Meaning that Index 3, even though it has changed in the last 2 days, will not be refreshed and thus have outdated data. Same conclusion as before: setting 'Refresh rows from the past' to anything less than 'Store rows from the past' is risky if you don't know how far changes will go back.
  • With these settings Index 1 will never be included even when it changes.

Anyone?

Hi @richard-powerbi ,

 

I've been very busy this week.

Unfortunately I have no experience with the incremental refresh of data flows.

Here there seems to be differences between the incremental refresh of dataflows and data sets.

 

However, the concept is only suitable for fact tables that do not change after a certain point in time.

 

 

Did I answer your question?
Please mark my post as solution, this will also help others.
Please give Kudos for support.

Marcus Wegener works as Full Stack Power BI Engineer at BI or DIE.
His mission is clear: "Get the most out of data, with Power BI."
twitter - LinkedIn - YouTube - website - podcast


Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.