cancel
Showing results for 
Search instead for 
Did you mean: 
Reply
Databeest
Advocate I
Advocate I

Incremental Refresh on a dataset with Dataflows as a data source

Hi,

 

I have a question about implementing Incremental Refresh on a dataset that connects solely to Dataflows. I tried setting it up but it appears Dataflows do not support query folding.

 

Few questions:

- Is it possible in any way to get Incremental Refresh working on a dataset that runs on Dataflows at this moment?

- If not, is there anything on the roadmap to enable this?

- If not, is there any way to refresh only part of a Dataflow using a workaround?

 

It feels really weird to have all the data in Azure Data Lake (Dataflows) but not being able to load it into a dataset due to memory issues.

31 REPLIES 31
ldacey
Frequent Visitor

@Databeest @Anonymous 

 

Any luck at all? I just found this thread while searching for a solution:

 

1) I have a 50+ million row dataflow which updates incrementally (under 1 minute to refresh which is pretty neat)

2) The visualization developers need to create datasets based on my dataflow, and make some additional transformations or even join multiple entities

 

This led to an initial challenge since PBI desktop tried to download all 50 million rows. As a workaround, I created an entity with a small slice of the data. They have parameters for workspace_id, dataflow_id, and entity_name that they can swap over to point to the real historical dataflow once they are done.

 

3) They seem to need to schedule their dataset to refresh (which is based on my dataflow), and it appears to do a complete refresh and does not respect the incremental nature of the dataflow source at all. Seems like a waste of resources and time.

 

 

The ideal state for me would be: 1) I publish an incremental dataflow, 2) people connect to that dataflow and make different datasets, 3) my scheduled dataflow refresh would refresh all downstream datasets incrementally as well

Hi @ldacey ,

 

Unfortunately not. To summarize:

- Incremental Refresh implemented on datasets that connect to Power Bi Dataflows does not work at the moment. The reason is that the queries can not get folded to the data source (https://docs.microsoft.com/en-us/power-query/power-query-folding). Which is weird, obviously.

- If your workspaces are in a Premium capacity, you could try out DirectQuery on Dataflows (https://powerbi.microsoft.com/en-us/blog/power-bi-dataflows-direct-query-support/). I haven't tested this yet, so I can't share any learnings. As far as I know though, there are some limitations to DirectQuery in general (https://docs.microsoft.com/en-us/power-bi/connect-data/desktop-use-directquery). So you might want to scroll through that page before trying it.

 

What we did on our side is manage this in our database, where we create SQL VIEWS per use case which we ingest into datasets directly. Since our data source (Snowflake) supports query folding, we can use Incremental Refresh to ingest the data.

 

Hope this helps!

@Databeest @ldacey @GilbertQ @Anonymous @yingyinr 

 

Any official change in functionality to allow incremental refresh of a dataset against a dataflow with the advent of composite models?

Is it easier just to keep historical and refreshing dataflows separate and then dax union within the data model?

Hi @pat_mecee ,

 

While composite models is a great solution to a whole lot of problems, I'm afraid it won't 'solve' this one. As far as I know, it won't change the environment that Dataflows is set up in and so it does not support query folding (needed for incremental refresh).

 

You could of course try to use DirectQuery on Dataflows, but I'm not sure if you will reach the same level of performance as storing it in-memory in a data model.

 

I am seriously considering cutting out Dataflows entirely and directly connect to the database from the PBI dataset. It seems like less overhead.

ldacey
Frequent Visitor

Based on some tests I did on enhanced compute dataflows (which do support some query folding, see https://ssbipolar.com/2020/09/16/dataflows-enhanced-compute-engine-will-it-fold/ and within that article there is a link to which queries will fold), it seems like incremental dataset refreshes against an incremental dataflow can work.

 

I didn't proceed very far, but I set up an incremental refresh dataset which is pointed towards a dataflow (enhanced compute turned on, incremental refresh turned on). I then checked the XMLA endpoint in SSMS and I inspected the partitions. They contained the dates/rows I was expecting and each time I refreshed the test dataset it updated the correct partitions.

 

But yes, dataflows have been a headache with some weird failures and behavior. Sad because it is nice to allow people to access data and do their own thing without requiring access to the source (either DB or Azure Blob parquet datasets in this case). And I do not need to open up PBI desktop at all which is nice because I can do my job (providing data access) within PBI service. I am hoping for more improvements and expanded documentation..

Hi @ldacey,

 

Thanks for replying. Sounds like that could be a viable solution.

 

In theory, Dataflows are an awesome way to share data sources throughout the company and I still believe it's the way forward. I'm curious to hear from you once you have some more test results.

Anonymous
Not applicable

Hello @yingyinr,

 

Any updates from your/Microsoft side ?

 

Could you please officially confirm, that our requirement

  1. Incremental Refresh of Dataflow
  2. Incremental Refresh of Datasets depending on this Dataflow

is technically possible and supported by Microsoft.

 

Thanks and best regards

Hi @GilbertQ ,

Sorry to bother you again. I read the related blogs you wrote about increment refresh, it have good explanation of increment refresh. May I ask for your help answer @Anonymous 's question about increment refresh of dataflow? Thanks for your help in advance.


Any updates from your/Microsoft side ?

 

Could you please officially confirm, that our requirement

  1. Incremental Refresh of Dataflow
  2. Incremental Refresh of Datasets depending on this Dataflow

is technically possible and supported by Microsoft.


Best Regards

Rena

Community Support Team _ Rena
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

Hi there

As far as I know it is possible to incrementally refresh the dataflow.


I do not think it is currently supported in Power BI pro to incrementally refresh off a dataset depending on a dataflow.




Did I answer your question? Mark my post as a solution!

Proud to be a Super User!







Power BI Blog

Anonymous
Not applicable

Hi @GilbertQ ,

Thanks for your quick response and answer !

 

According to Microsoft (https://docs.microsoft.com/en-us/power-bi/transform-model/service-dataflows-incremental-refresh) "Dataflow incremental refresh and dataset incremental refresh are designed to work in tandem", i.e. in an Enterprise Datawarehouse (DWH) setting the Dataflows (usually many of them) ingesting large amounts of data every night into Power BI Azure (implicit) datalake need to use the Incremental Data Refresh and the depending Datasets, that have to be refreshed after the ingest, must be incrementally refreshed as well to keep within the timelimits set by a nightly dataload (usually only a few hours after all other source systems have submitted the data to the DWH).

 

If this (incremental refresh of Dataflows and incremental refresh of depending Datasets) is not supported/working then Power BI is not really ready for Enterprise DWH reporting/analyses.

 

What is your opinion on this or do you see an alternative way to use Power BI in an Enterprise scenario ?

 

Thanks and best regards

 

Hi there

Always if you want an enterprise DW then you would need to build an enterprise data warehouse that would be separate to dataflows in my opinion.

Whilst dataflows are great they do not have all the features that you would expect to have in a data warehouse table.




Did I answer your question? Mark my post as a solution!

Proud to be a Super User!







Power BI Blog

Anonymous
Not applicable

Hi @GilbertQ 

 

Thank you very much for your answer, you are right that Dataflows do not offer all features of a "real DWH" - therefore the setup described above is using an AWS Redshift DWH => Incremental Refresh on Dataflows => Incremental Refresh on depending Datasets because we would like to use Power BI (Service) to query/anlayse the data from the DWH.

We were planning to use Dataflows as an abstraction layer and ingest mechanism as proposed by Microsoft so that this ingested data can be used by the depending (Shared-) Datasets and thereby to implement a modern Self-Service BI solution.

 

Because of the size of the data and the limited time available during nightly data refesh we require an incremental data refresh on all stages of the dataload (Dataflows & Datasets) to provide our users with updated date every day at 07:00 latest. 

 

So the the incremental data refreshs have to work in tandem - currently it appears that this (incremental refreshs for both Dataflows and depending Datasets) is not working (yet) and we still have no official confirmation from Microsoft if this will be working/implemented in the future. If this is not working/or not designed this way we don't see that Power BI can be used in larger Enterprise environments...

 

Thanks again for your support and best regards

Hi there

Currently you would have to stager them.

And yes currently unless you are using Power BI Premium you can use Enhanced data flow which should allow for incremental refreshing.




Did I answer your question? Mark my post as a solution!

Proud to be a Super User!







Power BI Blog

Anonymous
Not applicable

Hi Databeest,

 

You are absolutely correct - Power BI Dataflows ingest the data from external sources into the (implicit) Azure Datalake so the question is if Datasets can execute "query folding" towards Dataflows as this is a requirement for the efficient implementation of "Incremental Refresh", i.e. the queries have to check any changes in the underlying data partitions as created by the Incremental Refresh settings.

 

So the simple question to Microsoft is - can Datasets use Incremental Refreshs on Dataflows ? As the "tandem" Dataflow => Datasets is the proposed Data Ingest and Query Strategie within Power BI I strongly hope (request !) that this is working - if not Power BI NOT ready for Enterprise Business Intelligence !

 

Thanks for your clarifying on this, best regards

Hi @Anonymous,

 

Thanks for backing me up :).

 

I tried to ask this simple question multiple times in this thread but it doesn't seem to land properly. I just want to know if it's on the roadmap at least. Anyways, it became a lower priority for me since the rollout of Direct Query on Dataflows. Haven't gotten the chance yet to test it properly, but at least in theory it offers a solution to the unnecessary refresh time of the datasets that are running on Dataflows.

Anonymous
Not applicable

Hi @Databeest ,

 

Thanks for your answer and hint regarding "DirectQuery on Dataflows".

 

I am not sure if this is really an "equivalent" solution because like this we cannot use the (speed & performance) advantage of importet Datasets (i.e. in-memory/columnar engine) so I wonder if you have already results and user feedback for interactive queries/filters on "DirectQuery-based" reports ? Are they still performant (latency ?) at all and especially with a larger number of users sending direct queries every time to the underlying Dataflow ?

 

@yingyinrCould you please check again on the "simple question" if the setup "Incremental Data Refresh of Datasets on incrementally refreshed Dataflows" works and is officially supported by Microsoft, thank you.

 

Best regards

@Anonymous Not yet. I also don't think it will be an equally performing solution. While it feels like it should be the type of solution that helps us leverage Dataflows to scale throughout large organisations.

 

Anyways, it doesn't seem like we are getting a satisfying answer from Microsoft regarding our problem.

 

I'll let you know when we've tested the DirectQuery solution.

yingyinr
Community Support
Community Support

Hi @Databeest ,

You can refer the following documentation to configure incremental refresh for dataflow:

Using incremental refresh with Power BI dataflows

Incremental Refresh for Dataflows

Best Regards

Rena

Community Support Team _ Rena
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.
ldacey
Frequent Visitor

@yingyinr Hi - you did not answer the actual question.

 

@Databeest is asking whether a user can connect a dataset (using PBI desktop) to an existing dataflow and then turn on incremental refresh on the dataset.

 

We can have incremental refresh datasets for database sources, but can we do the same against a dataflow source? It seems to allow me to do so when I tested it, but I cannot find any official documentation on the subject.

@yingyinr Did you actually read my message? I'm not trying to set up Incremental Refresh in Dataflows, I'm trying to set it up for a dataset that uses Dataflows as a data source.

 

Could you provide any information on that or perhaps even address my actual questions?

Helpful resources

Announcements
PBI User Groups

Welcome to the User Group Public Preview

Check out new user group experience and if you are a leader please create your group

March Update

Check it Out!

Click here to read more about the March 2021 Updates!

secondImage

The largest Power BI virtual conference

100+ sessions, 100+ speakers, Product managers, MVPs, and experts. All about Power BI. Attend online or watch the recordings.

secondImage

Experience what’s next for Power BI

See the latest Power BI innovations, updates, and demos from the Microsoft Business Applications Launch Event.

Top Solution Authors
Top Kudoed Authors