AlexMB
Regular Visitor

Much more data when connecting to dataflow than directly to SQL

I've set up a dataflow that pulls selected tables from our ERP's SQL database.

 

When I connect to the dataflow from Power BI Desktop, it loads several GB of data even though I'm selecting only a few hundred rows from two tables.

 

If I connect directly to the SQL database, I can load the rows in seconds. Using the dataflow, it takes several minutes.

 

Am I doing something wrong?

 

My approach in both cases is the same:

  • Get Data (SQL vs Dataflow)
  • Select tables (two, in this case)
  • Transform data (filtered down to a few hundred rows in each table)
1 ACCEPTED SOLUTION

Yes. When you use a Dataflow as a source, to get folding, the Enhanced Compute Engine has to be turned on (premium capacity) and you need to use the Dataflows connector (not the Power BI Dataflows one).

Pat

 





Did I answer your question? Mark my post as a solution! Kudos are also appreciated!

To learn more about Power BI, follow me on Twitter or subscribe on YouTube.


@mahoneypa HoosierBI on YouTube



4 REPLIES
AlexMB
Regular Visitor

Thanks @mahoneypat 

 

Could you point me one step further?

 

Google is giving me lots of contradictory information about query folding with dataflows. I don't know the term, so I don't know what I'm looking for.

 

One thing I see mentioned is the "Enhanced Compute Engine". The dataflow doesn't sit within a premium capacity. Is this the root of my issue, perhaps?

Yes. When you use a Dataflow as a source, to get folding, the Enhanced Compute Engine has to be turned on (premium capacity) and you need to use the Dataflows connector (not the Power BI Dataflows one).

Pat

 







Thanks. So outside of Premium, dataflows aren't really a viable option, since any dataset will have to pull entire tables every time.

mahoneypat
Super User

There is likely something breaking "query folding" in your query. A direct SQL statement returns only the desired rows, and you should be able to get a similar refresh time from the dataflow if you maintain query folding. The query editor for Dataflows has indicators that show whether folding is in place, and you can modify or rearrange your steps (e.g., do filtering and column selection first) to potentially maintain it.
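
For anyone new to the term, here is a minimal sketch of the idea behind query folding. It is written in Python against an in-memory SQLite database, not in Power Query, so it is only an analogy and not how Power BI implements folding. The `orders` table, its columns, and the function names are hypothetical. When the filter is "folded", it is sent to the source as part of the query and only the matching rows come back. When folding is broken, the whole table is pulled first and filtered on the client.

```python
import sqlite3


def build_source(n_rows=10_000):
    """Create an in-memory table standing in for an ERP table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
    )
    # One row in every hundred belongs to the region we will filter on.
    conn.executemany(
        "INSERT INTO orders (id, region, amount) VALUES (?, ?, ?)",
        ((i, "EU" if i % 100 == 0 else "US", float(i)) for i in range(n_rows)),
    )
    return conn


def load_folded(conn):
    """Filter and column selection run at the source; only matching rows are transferred."""
    rows = conn.execute(
        "SELECT id, amount FROM orders WHERE region = 'EU' ORDER BY id"
    ).fetchall()
    return rows, len(rows)


def load_unfolded(conn):
    """The whole table is transferred first; the filter then runs on the client."""
    all_rows = conn.execute(
        "SELECT id, region, amount FROM orders ORDER BY id"
    ).fetchall()
    rows = [(row_id, amount) for (row_id, region, amount) in all_rows if region == "EU"]
    return rows, len(all_rows)


conn = build_source()
folded_rows, folded_transferred = load_folded(conn)
unfolded_rows, unfolded_transferred = load_unfolded(conn)

print("rows returned (folded / unfolded):", len(folded_rows), len(unfolded_rows))
print("rows transferred (folded / unfolded):", folded_transferred, unfolded_transferred)
```

Both paths end up with the same 100 rows, but the unfolded path transfers all 10,000 rows to get there. That gap is roughly what shows up as "several GB" versus "a few hundred rows" in the original question.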

 

Pat






