I have a question about implementing Incremental Refresh on a dataset that connects solely to Dataflows. I tried setting it up but it appears Dataflows do not support query folding.
- Is it possible in any way to get Incremental Refresh working on a dataset that runs on Dataflows at this moment?
- If not, is there anything on the roadmap to enable this?
- If not, is there any way to refresh only part of a Dataflow using a workaround?
It feels really weird to have all the data in Azure Data Lake (Dataflows) but not being able to load it into a dataset due to memory issues.
Would you happen to have an answer on the questions I asked? Or maybe a colleague could help you out?
Thanks so much in advance.
Sorry for delay. There are two ways to configure incremental refresh: one is configure increment refresh in Power BI Desktop (it is available for Pro users) and another is use increment refresh with Power BI dataflows(currently only for Premium users). Currently you are connecting to dataflow as data source in Power BI Desktop, since the original data source is the dataflow, you can just configure increment refresh for the dataflow in Power BI Service.
Thanks for your answer. It is absolutely clear to me that I can set up incremental refresh in dataflows. This is what I currently do. So the main benefit of this is that we only refresh the last 14 days of data each morning (approx. 2 million rows of data). This is working well and I'm happy with the feature.
However, our total dataset consists of approximately 70 million rows of data. And each morning, when I'm refreshing my dataset, it will refresh all of these 70 million rows of data. Even when I implement incremental refresh on the dataset, it will still refresh all rows. What I would expect it to do is refresh only the last 14 days of data, so that it only has to refresh 500 MB of data, instead of the entire ~7 GB.
How can I make sure that my dataset will only refresh the part that is refreshed in dataflows (the incremental part)?
Hi @Databeest ,
According to the official documentation, it is recommend to use data sources which support query folding when configure increment refresh. Otherwise, it may execute the complete refresh... You can check the following sentences:
It's important the partition filters are pushed to the source system when queries are submitted for refresh operations. To push filtering down means the datasource should support query folding. Most data sources that support SQL queries support query folding. However, data sources like flat files, blobs, and web feeds typically do not. In cases where the filter is not supported by the datasource back-end, it cannot be pushed down. In such cases, the mashup engine compensates and applies the filter locally, which may require retrieving the full dataset from the data source. This can cause incremental refresh to be very slow, and the process can run out of resources either in the Power BI service or in the on-premises data gateway if used.
So please check if the original data sources using in dataflow support query folding or not. You can refer this video to check it.
In addition, hope the following documenations can help you.
Hi @yingyinr ,
Thanks again for your answer.
I do want to emphasize that I'm not trying to set up incremental refresh in Dataflows. I already got that working. I'm trying to set up incremental refresh in the dataset that connects to dataflows. So, because dataflows is the source, the question is if dataflows supports query folding. Does it?
Hi @Databeest ,
Yes, please check whether the underlying data source used in the dataflow support query folding or not. For the data sources that do not support query folding, you can set incremental refresh, but it may retrieve the entire data anyways.
@yingyinr How is the underlying data source relevant? I'm connecting to Dataflows. The dataset doesn't care where I get the data from before that point, does it?
The question is, when I'm refreshing my dataset, do the queries it sends to dataflows get folded and is Incremental Refresh possible in this case? Or will it always load in the entire dataflow?
We have the very same question (dataset incremental refresh using dataflows as source, regardless of how the dataflow itself gets its data).
Has this been answered?
@pat_mecee Have you tested this? I ran a test and it seemed to work, but I did not proceed (my responsibility is just to create the dataflow sources and I do not use PBI Desktop or datasets much).
It would be nice to have official communication here.
"Dataflow incremental refresh and dataset incremental refresh are designed to work in tandem. It's acceptable and supported to have an incrementally refreshing entity in a dataflow, fully loaded into a dataset, or a fully loaded entity in a dataflow incrementally loaded to a dataset.
Both approaches work according to your specified definitions in the refresh settings."
Check out new user group experience and if you are a leader please create your group
Click here to read more about the April 2021 Updates!
100+ sessions, 100+ speakers, Product managers, MVPs, and experts. All about Power BI. Attend online or watch the recordings.
See the latest Power BI innovations, updates, and demos from the Microsoft Business Applications Launch Event.