Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.
I have a situation where users export Excel files from an engineering program. The amount of Excel files being created will be roughly 100-150 per year. I want to get this data in the data model and specifically in a Dataflow. Is it possible to design something in the Power BI environment in such a way that only new or modified Excel files will be included in the refresh? (Q1)
Let's say I combine Excel files through Power Query and I include the created/modified columns from the file properties in the resulting table, will an incremental refresh in the Dataflow settings cause only the newly created and modified files to be opened and analyzed? Or will all files be opened and read before the engine realizes only 5% of the data was new/modified? (Q2)
Hi @richard-powerbi ,
Q1 - The incremental refresh does not detect any data changes. Only an ongoing period will be refreshed.
Q2 - I think you must have filtered the Excel files by date before combining them. That should work then.
Using incremental refresh with Power BI dataflows
https://docs.microsoft.com/en-us/power-bi/transform-model/service-dataflows-incremental-refresh
@mwegener but if I filter the Excel files before combining them I remove the old data.... how is that a solution?
Hi @richard-powerbi ,
the old data should already be stored in the Power BI Service (Dataflow).
The Power BI Service controls the parameters RangeStart and RangeEnd and thus filters the observation period for new data.
@mwegener thanks I understand things partly. Can you confirm below reasoning and answer some of the questions within?
Let's say this is a query:
Step 1: Source
Step 2: Filtering source with RangeStart and RangeEnd
Step 3-8: All sorts of transformations
Step 9: Result table
This is the table at step 2:
Index | Created | Modified |
1 | 1-1-2020 | 5-1-2020 |
2 | 2-1-2020 | 2-1-2020 |
3 | 3-1-2020 | 4-1-2020 |
4 | 4-1-2020 | 4-1-2020 |
5 | 5-1-2020 | 5-1-2020 |
Using RangeStart (4-1-2020) and RangeEnd (5-1-2020) only on the Created column anything after step 2 will look like this in PQ Editor at step 9:
Index | Created | Modified |
4 | 4-1-2020 | 4-1-2020 |
5 | 5-1-2020 | 5-1-2020 |
At the same time it will still look like this in Dataflow (Dataflow removes the RangeStart and RangeEnd filters):
Index | Created | Modified |
1 | 1-1-2020 | 5-1-2020 |
2 | 2-1-2020 | 2-1-2020 |
3 | 3-1-2020 | 4-1-2020 |
4 | 4-1-2020 | 4-1-2020 |
5 | 5-1-2020 | 5-1-2020 |
Anyone?
Hi @richard-powerbi ,
I've been very busy this week.
Unfortunately I have no experience with the incremental refresh of data flows.
Here there seems to be differences between the incremental refresh of dataflows and data sets.
However, the concept is only suitable for fact tables that do not change after a certain point in time.
Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City
Check out the April 2024 Power BI update to learn about new features.
User | Count |
---|---|
112 | |
97 | |
85 | |
67 | |
59 |
User | Count |
---|---|
150 | |
120 | |
100 | |
87 | |
68 |