cancel
Showing results for 
Search instead for 
Did you mean: 

Power BI Incremental Refresh with ETL

I’ve been testing incremental refresh for more than half a year in an enterprise environment and I’d like to share my experience with this feature.

You can read a full description of Power BI incremental refresh here. If you have not worked with incremental refresh yet, I recommend that you read the documentation before.

 

User voice

It was about 2 months ago when I read a blog post about the incremental refresh written by Christian Wade. He says that the Power BI team is pleased to announce that incremental refresh is now available in Pro instead of Premium only. There were more than 5 000 votes of users, who would like to have incremental refresh in Pro and not only in Premium. This is always great news when a feature is available also in Pro and I am thankful for that. But I think most users have not had a chance to try the incremental refresh and they do not know many restrictions waiting for them behind the curtain.

 

GA

I should also mention that the incremental refresh is now general available. That is also great news.

 

Three use cases

I have prepared three use cases which can happen when you want to activate the incremental refresh. Before I start with each of them, I would like to describe the structure of my dataset.

000A.PNG

 

I have created four groups in Power Query Editor: configuration, extract, transform, and load. These logical groups help me structure my ETL process. In the configuration group, there is a database connection called “DB Connection MySqlServer”. It is a central place where all my connections are stored. I do not want to have them in every query separately. Instead, I always reference a connection when creating a new query.

002B.PNG

 

In this group there are also 2 datatime parameters (RangeStart and RangeEnd) – they must have a data type datetime. You should be aware of it.

000C.PNG

 

By the way, I recommend setting both parameters because if you leave them empty, you will not see any data in Power BI Desktop.

The other groups for extract, transform, and load are obvious for every BI professional, right?

 

Use case 1 – just query folding

The incremental refresh would not work without the query folding. What does it mean? After the initial load of data is completed, the next refresh will load only deltas. A delta is a subset of data from a datetime range. More about that here. The point is our query has to support the query folding to be able to load only a subset of data – from RangeStart to RangeEnd. If the query folding does not work, all pieces of data will be loaded every time your scheduled refresh is running. An extensive list of queries which can be folded can be found here.

001A.PNG

 

In this use case, I have created a step called IncRefresh which filters rows using the parameters RangeStart and RangeEnd.

At this point, I have to check if the query folding still works.

001B.PNG

 

Great, I see the SQL code and the parameters are also set up.

Next, I can switch to Power BI Desktop and set up the incremental refresh.

001C.PNG

 

In the screenshot above, there are some important notes. The incremental refresh works only in Power BI service. It makes sense because we need to configure a scheduled refresh to bring it to life.

And there is also a note saying you cannot download a dataset from Power BI service after you have set up the incremental refresh. You should be aware of it and your users / customers, too! They like to download the datasets just for checking if a column or a measure is defined in the right way.

If they try it, the following error message will pop up and you will get a call 😊

005.PNG

 

Use case 2

In the previous use case, we have completely ignored an ETL process. But Power BI Desktop is a self-service BI tool and we want to clean up our data before it appears in reports. At least with a simple ETL in which we break the query folding. What are our options if the query folding is broken?

Let us start with all three steps of ETL. The example is simplified and for demo purpose only.

The first step is an extract. It is the same step which we saw in the previous use case. And we also know that it can be folded to SQL.

002A.PNG

 

Then we create a next step, a transform one, which makes some useless demo transformations. They break the query folding.

002B.PNG

 

And the last step is just a reference to the transform one.

002C.PNG

 

After applying the changes, we can set up the incremental refresh for this query.

002D.PNG

 

Oops, it does not work. The error message says that we need to set up parameters before we can set up incremental refresh on this table. We have already set up parameters, but not used in the last query which is loaded into model.

 

Use case 3

Ok, let’s fix it. We create a new query, which references our transform step. And add the same filter we already have in the extract step.

003A.PNG

 

Now we can go back to Power BI Desktop and try to activate incremental refresh again.

003B.PNG

 

Great, it works. However, there is a warning that Power BI cannot confirm if the M query can be folded. We know that it is possible because the extract step can be folded, and we get filtered data from the data source.

And here is a proof from Azure SQL Server Performance Overview where I can see the folded query:

007.PNG

 

By the way, sometimes you see two where-clauses and other times only one. It depends on the complexity of your query, I think. I have seen that if a query contains two equal filters, a kind of a query optimizer removes one of them.

The other way to check that the incremental refresh works is to see partitions in the model. I prefer Vertipaq Analyzer, but you can also use SQL Server Management Studio. You connect to the dataset in Power BI Service and select a table as on the following screenshot:

partitions.PNG

 

Incremental refresh without a datetime column in a model

Let’s say we have a column Full DateTime. It should be used as the column for the incremental refresh. Unfortunately for Power BI, almost all values in the column are unique. This means the values cannot be compressed in any way. It also means that the column can be the most space consuming one in a table.

006.PNG

 

On the previous screenshot you can see that the column Full DateTime has almost 25 MB. But if I remove the time part of the value in M, there will be only few unique values left which repeat multiple times. We say the column is well compressed. But the column is still there and will be loaded into the model.

A colleague of mine, Michael Wolff, has come up with a genius idea. We do not need a datetime column in the model at all. If you set up the incremental refresh in Power BI Desktop, you do not select any column! It means the incremental refresh checks the usage of the parameters RangeStart and RangeEnd, but it does not care about the column itself. You can remove the datetime column before it will be loaded into the Vertipaq model. It is simply genius!

increfresh delete columns.PNG

 

But you can go even further. The incremental refresh can be set up on top of your files in SharePoint or any other files with a kind of query folding. Miquel Escobar writes about that in his revolutionary blog post.

 

Limitations

There are some limitations you should be aware of.

 

Limited ETL

I have presented a workaround for a simple ETL which cannot be folded to SQL. But I often build complex ETL processes in Power Query with joins. I personally have no idea how I could build incremental refresh with joins. If I miss something, please let me know.

 

Download of PBIX file

As mentioned earlier in this post, you cannot download a PBIX file after you have activated incremental refresh.