Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
Anonymous
Not applicable

Performance issues while using one dataflow as source for many Datasets

We have a dataflow that's a bit large, with around 55 million records and we use it as a data source for four datasets. 

All datasets use Import mode to get the data from this dataflow. 

Three of those, use only this dataflow as a data source. The fourth uses this same dataflow plus another dataflow which is small with ~100k records only.

 

We have noticed that when we trigger those four datasets to refresh concurrently, the performance is much worse then if we trigger them individually, one at a time. At first we thought it could be a matter of server resources, but we tried running those on an almost dedicated premium P1 capacity and we still found issues, including an error with the following message:

Data source error:The operation was throttled by Power BI Premium because there were too many datasets being processed concurrently. Please try again later.


To put that to the test, we tried refreshing 5 big datasets, all having big dataflows as sources as well, but different dataflows, not the same one.  We did not see the same performance issues in that test, refresh times weren't far away from the expected.

 

I know dataflows are CSV files in Azure Data Lake in the background, so I'm wondering if anybody knows if there's some good practice to follow here, with regards to sharing a dataflow to be imported by many datasets. Is there such a thing as a dataset locking down a dataflow? For example, making it unavailable for other datasets?

If that is the case, then the more concurrent dataset refreshes are executed using the same dataflow, the slower the performance would be. If that is so, I'd be surprised since one of the main advantages of a dataflow is exactly reusability. 

 

Are we missing something here? Is there some setting in the premium capacity that can enhance parallel reading of a dataflow for datasets?

3 REPLIES 3
V-lianl-msft
Community Support
Community Support

Hi @Anonymous ,

 

See if it will help you.

https://radacad.com/how-to-use-dataflow-to-make-the-refresh-of-power-bi-solution-faster 

 

Best Regards,
Liang
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

Anonymous
Not applicable

Hi Liang,

 

The article you shared only speaks about creating dataflows and then using them as a datasource for your datasets, which is what I stated we're already doing. 

 

Thanks,

Daniel

lbendlin
Super User
Super User

Your P1 capacity has a very limited set of renderers - six concurrent and 2400 per hour. What I would do is investigate if your dataset refresh is calling the dataflow twice, similar to what is described in this blog https://blog.crossjoin.co.uk/2020/07/05/why-is-power-bi-running-my-sql-query-twice/ . 

 

Have you tried to use the dataflow source in direct query mode?

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.

Top Solution Authors
Top Kudoed Authors