Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
m13eam
Resolver II
Resolver II

Use of dataflows considerably slowing dataset refresh - any reason why?

I have a situation where I need to load a 60000 csv file into PowerQuery for modelling into a star schema. We don't have Premium so I cannot chain dataflows. Loading the file into a dataflow, and using this as a source for a dataset is causing that dataset to refresh MUCH slower than simply loading the csv directly into a dataset and refreshing that dataset. Does anyone know why?

 

Full background:

Plan a: My initial approach was to load the csv file into a dataflow and perform initial cleansing and transformations on it. Then, within the dataset, use the dataflow as a source and perform the steps required to create the dimensions and facts. However, whilst this approach worked when using a sample extract of 100 rows, it didn't scale. The dataflow refreshed ok with 60k rows, but the dataset didn't - it would always time out. For 100 rows, it took 90 seconds. For 500 rows, it would take 350-450 seconds. It would not complete for all 60000 rows.

 

Plan b: I then created a different dataset that used the csv file as its source. The dataset then performed all the steps the dataflow did, plus all the dim/fact creation that the dataset in Plan a did - so the dataset in plan b is doing more work than in plan a. For 100 rows, it took 5 seconds. For 500 rows, it took 13 seconds. It was able to handle all 60000 rows in just under 10 minutes.

 

So, I'll clearly be going with plan b! But I don't understand why the dataset was so slow in plan a (when it was using the dataflow source that already had transformations done to it). Any ideas would be most welcome!

2 REPLIES 2
v-yinliw-msft
Community Support
Community Support

Hi @m13eam ,

 

Complex or large dataflows can take more time to refresh, as can poorly optimized dataflows. The following sections provide guidance on how to mitigate long refresh durations.

Especially with a lot of transformations because using shared capacity it is sharing the memory and CPU.

 

For more information, you can refer these documentations:

Understanding and optimizing dataflows refresh - Power BI | Microsoft Learn

Dataflows Limitations, restrictions and supported connectors and features - Power BI | Microsoft Lea...

 

Hope these help you.

 

Best Regards,

Community Support Team _Yinliw

If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

 

Thanks but the issue I'm having is not with the dataflow. My issue is with the dataset. In otherwise identical datasets:

  • when it uses a dataflow as its source, it is slow
  • when it uses a csv file as its source, it is fast

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.

Top Solution Authors
Top Kudoed Authors