Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Earn the coveted Fabric Analytics Engineer certification. 100% off your exam for a limited time only!

Reply
Anonymous
Not applicable

Incremental Refresh on a dataset with Dataflows as a data source

Hi,

 

I have a question about implementing Incremental Refresh on a dataset that connects solely to Dataflows. I tried setting it up but it appears Dataflows do not support query folding.

 

Few questions:

- Is it possible in any way to get Incremental Refresh working on a dataset that runs on Dataflows at this moment?

- If not, is there anything on the roadmap to enable this?

- If not, is there any way to refresh only part of a Dataflow using a workaround?

 

It feels really weird to have all the data in Azure Data Lake (Dataflows) but not being able to load it into a dataset due to memory issues.

34 REPLIES 34
v-yiruan-msft
Community Support
Community Support

Hi @Anonymous ,

You can refer the following documentation to configure incremental refresh for dataflow:

Using incremental refresh with Power BI dataflows

Incremental Refresh for Dataflows

Best Regards

Rena

Community Support Team _ Rena
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

@yingyinr Hi - you did not answer the actual question.

 

@Anonymous is asking whether a user can connect a dataset (using PBI desktop) to an existing dataflow and then turn on incremental refresh on the dataset.

 

We can have incremental refresh datasets for database sources, but can we do the same against a dataflow source? It seems to allow me to do so when I tested it, but I cannot find any official documentation on the subject.

Anonymous
Not applicable

@v-yiruan-msft Did you actually read my message? I'm not trying to set up Incremental Refresh in Dataflows, I'm trying to set it up for a dataset that uses Dataflows as a data source.

 

Could you provide any information on that or perhaps even address my actual questions?

Anonymous
Not applicable

Hi @v-yiruan-msft,

 

Any update on this?

Anonymous
Not applicable

Hey @v-yiruan-msft,

 

Would you happen to have an answer on the questions I asked? Or maybe a colleague could help you out?

 

Thanks so much in advance.

Hi @Anonymous,

Sorry for delay. There are two ways to configure incremental refresh: one is configure increment refresh in Power BI Desktop (it is available for Pro users) and another is use increment refresh with Power BI dataflows(currently only for Premium users). Currently you are connecting to dataflow as data source in Power BI Desktop, since the original data source is the dataflow, you can just configure increment refresh for the dataflow in Power BI Service. 

Best Regards

Rena

Community Support Team _ Rena
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.
Anonymous
Not applicable

Hi @v-yiruan-msft,

 

Thanks for your answer. It is absolutely clear to me that I can set up incremental refresh in dataflows. This is what I currently do. So the main benefit of this is that we only refresh the last 14 days of data each morning (approx. 2 million rows of data). This is working well and I'm happy with the feature.

 

However, our total dataset consists of approximately 70 million rows of data. And each morning, when I'm refreshing my dataset, it will refresh all of these 70 million rows of data. Even when I implement incremental refresh on the dataset, it will still refresh all rows. What I would expect it to do is refresh only the last 14 days of data, so that it only has to refresh 500 MB of data, instead of the entire ~7 GB.

 

How can I make sure that my dataset will only refresh the part that is refreshed in dataflows (the incremental part)?

Hi @Anonymous ,

According to the official documentation, it is recommend to use data sources which support query folding when configure increment refresh. Otherwise, it may execute the complete refresh... You can check the following sentences:

Query folding

It's important the partition filters are pushed to the source system when queries are submitted for refresh operations. To push filtering down means the datasource should support query folding. Most data sources that support SQL queries support query folding. However, data sources like flat files, blobs, and web feeds typically do not. In cases where the filter is not supported by the datasource back-end, it cannot be pushed down. In such cases, the mashup engine compensates and applies the filter locally, which may require retrieving the full dataset from the data source. This can cause incremental refresh to be very slow, and the process can run out of resources either in the Power BI service or in the on-premises data gateway if used.



So please check if the original data sources using in dataflow support query folding or not. You can refer this video to check it.

In addition, hope the following documenations can help you.

Power BI Incremental Refresh and Query Folding

Power BI - Checking Query Folding with View Native Query

Best Regards

Rena

Community Support Team _ Rena
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.
Anonymous
Not applicable

Hi @v-yiruan-msft ,

 

Thanks again for your answer.

 

I do want to emphasize that I'm not trying to set up incremental refresh in Dataflows. I already got that working. I'm trying to set up incremental refresh in the dataset that connects to dataflows. So, because dataflows is the source, the question is if dataflows supports query folding. Does it?

Hi @Anonymous ,

Yes, please check whether the underlying data source used in the dataflow support query folding or not. For the data sources that do not support query folding, you can set incremental refresh, but it may retrieve the entire data anyways. 

Best Regards

Rena

Community Support Team _ Rena
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.
Anonymous
Not applicable

@v-yiruan-msft How is the underlying data source relevant? I'm connecting to Dataflows. The dataset doesn't care where I get the data from before that point, does it?

 

The question is, when I'm refreshing my dataset, do the queries it sends to dataflows get folded and is Incremental Refresh possible in this case? Or will it always load in the entire dataflow?

We have the very same question (dataset incremental refresh using dataflows as source, regardless of how the dataflow itself gets its data).

Has this been answered?

The simple answer is NO.

 

Many have tried workarounds, but nothing official from @microsoft.

@pat_mecee Have you tested this? I ran a test and it seemed to work, but I did not proceed (my responsibility is just to create the dataflow sources and I do not use PBI Desktop or datasets much).

 

It would be nice to have official communication here. 

 

https://docs.microsoft.com/en-us/power-query/dataflows/incremental-refresh#dataflow-incremental-refr...

 

"Dataflow incremental refresh and dataset incremental refresh are designed to work in tandem. It's acceptable and supported to have an incrementally refreshing entity in a dataflow, fully loaded into a dataset, or a fully loaded entity in a dataflow incrementally loaded to a dataset.

Both approaches work according to your specified definitions in the refresh settings."

Helpful resources

Announcements
April AMA free

Microsoft Fabric AMA Livestream

Join us Tuesday, April 09, 9:00 – 10:00 AM PST for a live, expert-led Q&A session on all things Microsoft Fabric!

March Fabric Community Update

Fabric Community Update - March 2024

Find out what's new and trending in the Fabric Community.

Top Solution Authors
Top Kudoed Authors