Anonymous
Not applicable

Dataflows being throttled in non-premium workspaces?

I have a few Dataflows in a workspace that is not backed by Premium capacity. These Dataflows retrieve multiple .csv files from some Azure Blob containers (residing in the same region as the Power BI service), then transform/extract/combine the data in those files into various entities. As there is no Premium, there are no linked entities, but there are multiple "not load enabled" entities serving as reusable intermediate steps (a sketch of a typical staging entity is included further down). As the number of entities grows, I was looking into ways to optimize my queries to reduce data load time, yet I noticed something very intriguing: no matter how simple or complicated the transformation behind an entity, the minimum time it takes to refresh one entity is 30s. An entity as simple as directly reading a 10-row, 2-column reference table may take 31s, while a big fact table that combines 3 to 4 intermediate tables with some joins and lookups may take 34s. So I just wonder, is this "minimum 30s" phenomenon a result of:

  1. The Power BI service throttling dataflows (potentially for non-Premium users), or
  2. The connection overhead to the Azure Blob Container (which is incurred per entity instead of per refresh), or
  3. My illusion 🙂

Note that the times mentioned above are as recorded in the "refresh history" modal window for scheduled and on-demand refreshes, not the time it takes to refresh previews during query authoring.
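
For context, each staging entity is a plain Power Query M query along the lines of the sketch below. The storage account, container, and file names are placeholders rather than my real setup, and "Enable load" is simply unchecked for these queries in the dataflow editor:

let
    // Placeholder storage account - returns one row per container
    Source = AzureStorage.Blobs("https://contosostorage.blob.core.windows.net"),
    // Navigate to a (hypothetical) container, then pick one blob by name
    Container = Source{[Name = "sales-data"]}[Data],
    RawFile = Container{[Name = "sales_2024.csv"]}[Content],
    // Parse the CSV and promote the first row to column headers
    Parsed = Csv.Document(RawFile, [Delimiter = ",", Encoding = 65001]),
    Promoted = Table.PromoteHeaders(Parsed, [PromoteAllScalars = true])
in
    Promoted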

I would also be interested to hear from other users. Are you experiencing the same pattern? How about users who are running Dataflows in Premium? Are your dataflows running faster?

5 REPLIES
v-frfei-msft
Community Support

Hi @Anonymous ,

 

Theoretically speaking, running a dataflow in a Premium workspace should be faster. Power BI Premium provides dedicated and enhanced resources to run the Power BI service for your organization, so you get greater scale and performance. For more details, please check the online documentation.

Community Support Team _ Frank
If this post helps, then please consider accepting it as the solution to help other members find it more quickly.
Anonymous
Not applicable

Thanks for your reply. Could you please provide a definitive answer on why the minimum time required to refresh each entity, no matter how small the data or how simple the transformation, is 30s?

Hi @Anonymous ,

 

According to the online documentation, there is no guidance about that. Power BI dataflows use the Power BI data refresh process to keep your data up to date, so refresh time is affected by many factors like network conditions, the performance of the underlying data source, and the size of the data.

Community Support Team _ Frank
If this post helps, then please consider accepting it as the solution to help other members find it more quickly.
Anonymous
Not applicable

With respect, I don't consider this one-size-fits-all generic answer, that dataflow refresh is "affected by many factors like network conditions, the performance of the underlying data source, and the size of the data", acceptable. I understand these must be contributing factors, but they definitely don't explain the whole thing. The reasons are twofold:

  • The minimum time I have ever been able to achieve for a dataflow with the simplest possible entity (a single-cell dummy table created with the #table function and a hard-coded value; see the sketch at the end of this post) is 31 seconds. The result is consistent across several runs. See the examples below.

[Screenshot: Dataflow Refresh - Single Entity - 1.png]
[Screenshot: Dataflow Refresh - Single Entity - 2.png]
[Screenshot: Dataflow Refresh - Single Entity - 3.png]
[Screenshot: Dataflow Refresh - Single Entity - 4.png]

  • For dataflows with multiple entities, the fluctuation I have observed in refresh time is marginal, and in fact unbelievably consistent! For example, I have a dataflow with 9 entities. Some of them are very complicated and involve multiple intermediate tables and lookups. Over the past two weeks of scheduled daily refreshes, the time it took to refresh has always been between 4'40" and 4'42", which is a perfect fit for 9*30 + (10~12) seconds. This is what led me to the theory that "a dataflow with N entities takes about N*30 + M seconds". I then went back to the history of this dataflow when it had fewer entities, and the refresh times followed the same pattern. I am happy to attribute the fluctuation within those M seconds to the unpredictability you mentioned, and I am totally OK with that. But what I am really asking here is: why the minimum 30-second wait for each entity we add?

[Screenshot: Dataflow Rrefresh History - 9 Entities.png]

It would be really helpful if you could either:

  • Communicate with the engineering team and share some insight into why the 30s wait is necessary; or
  • Confirm this is a bug that can be fixed in future releases; or
  • Confirm this is deliberate resource throttling for non-Premium users to encourage us to move to Premium; or
  • Tell me how I can build my dataflow better to achieve a faster load time.
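
For reference, the "simplest possible entity" mentioned above is nothing more than a query like the minimal sketch below (the column name and value are arbitrary):

let
    // Single-cell, hard-coded dummy table - no external data source, no transformation
    Source = #table(
        {"Value"},      // one column
        {{"hello"}}     // one row
    )
in
    Source

Even an entity like this still shows roughly 31 seconds in the refresh history, as in the screenshots above.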

I see your response was thrown in the "too hard to respond" bucket.
