sjrrkb123
Helper III

Combining Multiple Parquet files into a table in Dataflows Very Slow

I have two dataflows that load the same set of data. Dataflow A reads from a table in SQL Server (on prem), and dataflow B reads multiple .parquet files that need to be combined into a single table.

 

When I run dataflow A, it takes about 1 minute and 30 seconds to do a select * from the table. Below is a screenshot of the refresh time:

sjrrkb123_2-1695751854467.png

 

 

When I run dataflow B, it takes 9-10 minutes to get the same amount of data. Below is a screenshot of the Power Query:

sjrrkb123_0-1695751124608.png

Below is the refresh time (the top query uses [HierarchicalNavigation=true] and the bottom query does not):

sjrrkb123_1-1695751216421.png
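The query in the screenshot is not reproduced in the text. For readers following along, a combine-files query of this general kind might look like the sketch below. It is not the poster's actual query: the storage account, container, and folder in the URL are hypothetical placeholders, and it uses the flat file listing (no HierarchicalNavigation option) rather than the navigation-table variant.

```powerquery
let
    // Hypothetical ADLS Gen2 URL; replace the account, container, and folder with real values.
    Source = AzureStorage.DataLake("https://examplestorage.dfs.core.windows.net/examplecontainer/output"),
    // Keep only the Parquet part files (Spark also writes marker files such as _SUCCESS).
    ParquetFiles = Table.SelectRows(Source, each [Extension] = ".parquet"),
    // Parse each file's binary content into a table.
    Parsed = Table.AddColumn(ParquetFiles, "Data", each Parquet.Document([Content])),
    // Append all per-file tables into a single table.
    Combined = Table.Combine(Parsed[Data])
in
    Combined
```

Each Parquet file is read and parsed separately before the append, so the number and size of the part files written by Spark can affect how long the refresh takes.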

 

A couple of things:

  1. The parquet files are stored in an ADLS Gen2 data lake.
  2. They are the output of processing from a Spark notebook in Azure Synapse.
  3. The SQL Server table has 3.3 million rows.
  4. This setup uses an on-prem gateway to manage the data sources.

 

Why is my ADLS Gen2 load so much slower than the on-prem one?

1 ACCEPTED SOLUTION

It turns out our best solution will be to spin up a VM, install a gateway on it, and use it for the dataflow refresh from ADLS Gen2 data lakes.


3 REPLIES
lbendlin
Super User

Don't use an on-prem gateway for your Azure data sources. You may not even need a VNet gateway.

It turns out our best solution will be to spin up a VM, install a gateway on it, and use it for the dataflow refresh from ADLS Gen2 data lakes.

FYI, you cannot use a VNet gateway for Power BI dataflows.
What is a virtual network (VNet) data gateway (Preview) | Microsoft Learn

I will look into other options and get back to you.
