Cymbolz
Helper III

Dataflow vs Dataset refresh

Haven't found any documentation on how data refresh works with respect to a dataflow and then a dataset sourced from that dataflow.

 

So looking for feedback, based on what I've discovered:

 

  • Both a dataflow and a dataset need their data to be refreshed
  • So I assume the dataflow is much like a data storage component on its own that manages the updating from the data source, wherever that may be
  • And the dataset will refresh data from the dataflow 'storage'
  • Thus a logical refresh sequence (such as setting a scheduled refresh) would see the dataflow update first, then the dataset afterwards (maybe 30 mins later, as I suspect doing both at the same time may not yield the right results)

I've come to this conclusion after seeing the behaviour of having one or the other set for scheduled refresh.

 

I'm also seeing inconsistency in the workspace contents view where it shows last and next refresh times.  

 

For this dataflow, I've toggled off the scheduled refresh but it still shows a Next Refresh time (I would expect not to see any time stamp):

Capture 1.PNG

 

For these datasets, both have had a refresh more recently than indicated here:

 

 Capture 2.PNG

 

Here's the first one:

Capture 3.PNG

And the second:

Capture 4.PNG 

 

A bug?

 

2 ACCEPTED SOLUTIONS
otravers
Continued Contributor

>So I assume the dataflow is much like a data storage component on its own that manages the updating from the data source, wherever that may be

 

That's correct; technically, PBI's Dataflows use Azure Data Lake Storage Gen2 for storage.

 

One use case I plan to use this dual refresh structure for is handling sources (e.g. static files) that don't need scheduled refreshes: they'll be imported into Dataflows but left off the refresh schedule. I've found PBI's scheduled refreshes fail easily, so limiting the service's scheduled refreshes to sources that actually need refreshing should reduce incidents (e.g. web API timeouts, credential issues, etc.).


Hi,

 

After checking internally, it seems that the next refresh time update (without a browser refresh) was fixed and should be available in an upcoming update.

 

I will keep monitoring it.

 

Thanks,

Assaf


27 REPLIES
Jtb790
Advocate I

I believe Dataflows now have DirectQuery capability in Premium, therefore you could simply avoid refreshing the dataset. 

 

Though given the performance and stability of dataflows, I would think a composite model would be a better approach to limit the amount of DirectQuery happening.

SamRock
Advocate II

Hi,

I don't understand... why do we have to schedule the dataset? All the Microsoft articles I've read about Dataflows made it sound like a dataset would automatically connect to the Dataflow and show users the latest data. Does this mean that if I create 10 datasets from my enterprise Dataflow, I need to configure all 10 datasets to refresh separately? On top of that, we have to rely on the API to get faster updates? This is totally unproductive and totally defeats the purpose of having all my data in cloud services already!

I hope this is all not true and I don't have to keep separate schedules for Dataflows and Datasets!

otravers
Continued Contributor

@SamRock dataflows are not pushing data directly to datasets, they're just making the data available in a datalake. You might even have a dataflow that does not have a destination dataset. You do have to schedule refreshes in your datasets separately from your dataflows, just like you have to schedule refreshes against any other data source.

@otravers Thanks for the response. I still feel this defeats the purpose of doing the ETL on Power BI's own storage, where my datasets also reside. The purpose of Dataflows was to have a centralized data source that all reports can consume. Now it's basically telling me to take care of individual report refreshes, in spite of having everything in a "centralized" place?! Ideally a dataset should work like a "direct" query to the Dataflow, not an import.

otravers
Continued Contributor


@SamRock wrote: Ideally Dataset should work like a "Direct" Querying to Dataflow, not an Import

Let me try and explain why I disagree. That is really not "ideal", as DirectQuery is meant to be used over structured data sources, not unstructured data lakes. Azure Data Lake is not supported as a source for DQ:

https://docs.microsoft.com/en-us/power-bi/desktop-directquery-data-sources

 

Moreover, Import performs better than DQ and is really Power BI's preferred mode unless you have too much data or need real-time updates.

 

You can make the case that it should be easier to sync dataflow and dataset refreshes, but I think you're possibly confused about some of the architectural options offered by Power BI. The fact that you're using dataflows as a data source in a dataset should not limit how that dataset can work or be refreshed.

If I have a Power BI report with a single data source that is a Dataflow, and I schedule a refresh of the dataset in the service - that "refresh" is just pulling into the dataset the data already sitting in the dataflow? i.e. the dataset refresh is not triggering another refresh in the lake?

@thisisausername dataflow and dataset refreshes are completely separate. Refreshing one does not automatically refresh the other; if you want to trigger one refresh when the other is done, you'll have to set that up using the APIs.
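To make that concrete, here is a minimal sketch of chaining the two refreshes with the Power BI REST API. The group/dataflow/dataset IDs and bearer token are placeholders, error handling is omitted, and the transaction status strings ("InProgress", "Success") are assumptions based on the public REST docs, so verify them against your tenant before relying on this:

```python
import json
import time
import urllib.request

API = "https://api.powerbi.com/v1.0/myorg"

def dataflow_refresh_url(group_id, dataflow_id):
    # POST to this URL starts a dataflow refresh
    return f"{API}/groups/{group_id}/dataflows/{dataflow_id}/refreshes"

def dataflow_transactions_url(group_id, dataflow_id):
    # GET on this URL lists refresh transactions (newest first), each with a status
    return f"{API}/groups/{group_id}/dataflows/{dataflow_id}/transactions"

def dataset_refresh_url(group_id, dataset_id):
    # POST to this URL starts the downstream dataset refresh
    return f"{API}/groups/{group_id}/datasets/{dataset_id}/refreshes"

def call(url, token, body=None):
    """POST if a body is given, otherwise GET; returns parsed JSON (or None)."""
    data = json.dumps(body).encode() if body is not None else None
    req = urllib.request.Request(
        url, data=data, method="POST" if data else "GET",
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        raw = resp.read()
        return json.loads(raw) if raw else None

def chain_refresh(token, group_id, dataflow_id, dataset_id, poll_secs=60):
    # 1. Kick off the dataflow refresh (email only on failure)
    call(dataflow_refresh_url(group_id, dataflow_id), token,
         body={"notifyOption": "MailOnFailure"})
    # 2. Poll the newest transaction until it leaves the in-progress state
    while True:
        time.sleep(poll_secs)
        txns = call(dataflow_transactions_url(group_id, dataflow_id), token)
        status = txns["value"][0]["status"]
        if status != "InProgress":
            break
    # 3. Only refresh the dataset once the dataflow finished successfully
    if status == "Success":
        call(dataset_refresh_url(group_id, dataset_id), token, body={})
```

You could host something like this in an Azure Function on a timer, which replaces both static schedules with a single trigger.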

 

Edit: watch the following video from Guy in a Cube if this still needs further clarification

https://www.youtube.com/watch?v=xA6YouSI6kY

I have a dataflow currently running and a dataset that links to the dataflow. However, when I schedule an update for the dataset, I notice that my dataflow is also refreshed when the dataset refreshes. I'm not sure if I have set up my dataflow incorrectly. How can I update just the dataset without the dataflow refreshing concurrently?
Lasselakan
Advocate III

Do I understand correctly that you found out that both the Dataflow and the Dataset need to be updated (either manually or scheduled)? That's a quite unfortunate dependency... I expected the Dataflow to need to be scheduled, but the Dataset to be "DirectQuery".

I do love dataflows and what they allow us to do (preparing entities only once and using them everywhere is a dream come true).

However, that's really disappointing... I will now suffer from more delay: the dataflow will have to refresh first (let's say at 9:00), and half an hour later the dataset (9:30). I wish datasets would automatically refresh when the dataflow does...

Yes I don't understand this either. Hopefully someone can elaborate. Even if my report only connects to a single dataflow, it still creates a "dataset" when I publish it and thus I have to manage two refresh schedules (one for dataflow and one for dataset). I can see some use cases but I would think most of the time the preferred functionality is for a dataflow refresh to automatically kick off the associated dataset refresh.

otravers
Continued Contributor

The connection between Dataflow and Dataset refreshes could be automated using the API and MS Flow. Something like this:

 

1. Trigger your dataflow refresh via the API, for instance using PowerShell, which you could host/run from Azure Functions

Incidentally, this approach opens up more fine-grained control of your refreshes than the static scheduling options in the Power BI service. For instance, you could trigger your dataflow refreshes based on sources being updated, or integrate your Power BI refreshes into broader Azure Data Factory processes.

 

2. Use NotifyOption in the API call above with MailOnCompletion sending an email to an inbox monitored by a Flow workflow. This email is your Flow trigger.

 

3. Trigger your "client" dataset refresh via the Power BI REST API as the follow-up action in the same Flow, as explained here:

4. Push a notification (email, RSS, Teams etc.) once the thing is complete, based on a similar flow triggered by the dataset refresh's success (or failure). A side benefit of handling your own notifications is that you get to choose the format, content, notification channel, and recipient(s), whereas the Power BI service hard-codes all of these.

Usmannb
Frequent Visitor

Hi @otravers 
I have created a custom connector in Power Automate to refresh the dataflow, but I am not getting an email on refresh completion. I have followed the links you provided. Do you have any idea why I am not receiving the email?

Hey

 

Make sure you set the "NotifyOption" parameter to "MailOnCompletion". You will receive an email both on failure and on success of the refresh.

 

Refer to this Doc: https://docs.microsoft.com/en-us/rest/api/power-bi/dataflows/refreshdataflow#notifyoption
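For reference, the refresh request body is tiny. Here is a hedged sketch of building it; note that, as it turns out later in this thread, MailOnCompletion is documented but not actually supported for dataflow refreshes (only failure or no notification are):

```python
import json

# Options listed by the REST docs for the refreshes endpoint. Per this
# thread, dataflow refreshes reject MailOnCompletion in practice, so
# MailOnFailure is the safer default for dataflows.
NOTIFY_OPTIONS = {"MailOnCompletion", "MailOnFailure", "NoNotification"}

def refresh_body(notify_option="MailOnFailure"):
    """Build the JSON body for POST .../dataflows/{dataflowId}/refreshes."""
    if notify_option not in NOTIFY_OPTIONS:
        raise ValueError(f"unknown notifyOption: {notify_option!r}")
    return json.dumps({"notifyOption": notify_option})
```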

otravers
Continued Contributor

I ended up writing a blog post elaborating on my initial entry to tie everything together in Power Automate. However, even though I'm using NotifyOption, I'm not getting the emails on dataflow refresh success/failure - not even from the documentation's Try It button. I'll update once I figure out why, but this is puzzling as I followed the documented syntax to the letter.

https://www.oliviertravers.com/power-automate-powerbi-dataset-dataflow-refreshes/

 

@SamRock can you share screenshots of how you set it up?

@otravers 
Here is a screenshot of flow setup. I use the same for Dataflow as well

tmp.PNG

otravers
Continued Contributor

@SamRock and you're actually getting emails for dataflows? I contacted Microsoft support and they pointed me to this page that states:

 

 "Triggers a refresh for the specified dataflow. The only supported mail notification options are either in case of failure, or none. MailOnCompletion is not supported."

 

Edit: I got confirmation that the documentation is incorrect and will be corrected in the next couple of weeks.

otravers
Continued Contributor

New dataflow API call to retrieve dataflow refresh requests:

https://docs.microsoft.com/en-us/rest/api/power-bi/dataflows/getdataflowtransactions
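A small sketch of how you might read the latest refresh status out of that call's response. The "value" array shape and the status strings are assumptions based on the GetDataflowTransactions documentation, not something confirmed in this thread:

```python
def latest_dataflow_status(transactions_json):
    """Return the status of the most recent dataflow refresh transaction.

    transactions_json is the parsed response of
    GET .../groups/{groupId}/dataflows/{dataflowId}/transactions,
    whose "value" array lists transactions newest-first.
    """
    transactions = transactions_json.get("value", [])
    if not transactions:
        return None  # dataflow has never been refreshed
    return transactions[0].get("status")

# Example with a response shaped like the docs describe:
sample = {"value": [{"id": "1", "status": "Success",
                     "startTime": "2021-04-01T09:00:00Z"},
                    {"id": "0", "status": "Failed"}]}
```

This is the piece you would drop into the polling loop of a refresh-orchestration flow instead of waiting on notification emails.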

Usmannb
Frequent Visitor

Hi @otravers 

Please do share once you finally get the email.

otravers
Continued Contributor

Good news, detailed triggers and actions for dataflows are coming soon to Power Automate which will make this refresh orchestration much easier. See 33 minutes into this video:

https://myignite.microsoft.com/sessions/d784f75a-f081-468d-84b2-d500142c5998
