cancel
Showing results for 
Search instead for 
Did you mean: 
Reply
jthomson
Solution Sage
Solution Sage

Query performance question

Hi,

 

Quick one, and from what I understand of Power Query, the answer's probably that it doesn't matter, but let's say I make a simple query by importing in a .csv file, let PBI promote the headers and auto convert the data types, then I delete a large percentage of the columns. I believe the way Power Query works is that it kind of goes backwards, and would recognise that it doesn't need to do any data type conversion on the columns that eventually get deleted? Is there any performance benefit in going back through my query and removing the conversion of a field that's eventually removed?

2 ACCEPTED SOLUTIONS
d_gosbell
Super User
Super User


@jthomson wrote:

Is there any performance benefit in going back through my query and removing the conversion of a field that's eventually removed?


No, your understanding is correct. Conversion steps on columns that are not referenced in the final output should not be executed. 

 

The big issue with sources like CSV files is that there is no query engine behind them. If you have a 100Mb csv and you only need 10% of the columns then query engine cannot just ask for the 10Mb of data that it needs, it still has to load the entire 100Mb and parse through it to pull out just the columns that are needed.

 

With "smarter" data sources like relational databases Power Query can "fold" in operations like filtering and remove columns to create a more targetted query against the database to only request the subset of columns and rows that you need.

View solution in original post

v-luwang-msft
Community Support
Community Support

HI @jthomson ,

In my opinion,power query would recognise that it doesn't need to do any data type conversion on the columns that eventually get deleted.both operations are consistent in terms of performance.

I read an article that explored Query performance performance similar to yours, and the article explored:

Does it make a difference to Power Query merge performance if you remove unwanted columns from your source tables in the step before the merge or in the step afterwards?

Conclusion after testing:

Removing unwanted columns in a step immediately after a merge results in the same performance benefits as removing unwanted columns in a step immediately before a merge

These results may surprise some of you. Why does removing columns after a merge perform so well? The answer is that just because you remove columns from a table in the step after a merge, it doesn’t mean that the Power Query engine actually removes the columns after it has performed the merge. On the contrary, it’s clever enough to realise that if you remove columns in the step after a merge then there’s no need for those columns to be present at the point where it performs the merge.

 

To learn more details,you could read the following article:

Optimising The Performance Of Power Query Merges In Power BI, Part 2: Does When You Remove Columns M...

 

 

Wish it is helpful for you!

 

Best Regards

Lucien

View solution in original post

2 REPLIES 2
v-luwang-msft
Community Support
Community Support

HI @jthomson ,

In my opinion,power query would recognise that it doesn't need to do any data type conversion on the columns that eventually get deleted.both operations are consistent in terms of performance.

I read an article that explored Query performance performance similar to yours, and the article explored:

Does it make a difference to Power Query merge performance if you remove unwanted columns from your source tables in the step before the merge or in the step afterwards?

Conclusion after testing:

Removing unwanted columns in a step immediately after a merge results in the same performance benefits as removing unwanted columns in a step immediately before a merge

These results may surprise some of you. Why does removing columns after a merge perform so well? The answer is that just because you remove columns from a table in the step after a merge, it doesn’t mean that the Power Query engine actually removes the columns after it has performed the merge. On the contrary, it’s clever enough to realise that if you remove columns in the step after a merge then there’s no need for those columns to be present at the point where it performs the merge.

 

To learn more details,you could read the following article:

Optimising The Performance Of Power Query Merges In Power BI, Part 2: Does When You Remove Columns M...

 

 

Wish it is helpful for you!

 

Best Regards

Lucien

View solution in original post

d_gosbell
Super User
Super User


@jthomson wrote:

Is there any performance benefit in going back through my query and removing the conversion of a field that's eventually removed?


No, your understanding is correct. Conversion steps on columns that are not referenced in the final output should not be executed. 

 

The big issue with sources like CSV files is that there is no query engine behind them. If you have a 100Mb csv and you only need 10% of the columns then query engine cannot just ask for the 10Mb of data that it needs, it still has to load the entire 100Mb and parse through it to pull out just the columns that are needed.

 

With "smarter" data sources like relational databases Power Query can "fold" in operations like filtering and remove columns to create a more targetted query against the database to only request the subset of columns and rows that you need.

View solution in original post

Helpful resources

Announcements
UG GA Amplification 768x460.png

Launching new user group features

Learn how to create your own user groups today!

November Power BI Update 768x460.png

Check it Out!

Click here to read more about the November 2021 Updates!

M365 768x460.jpg

Microsoft 365 Collaboration Conference | December 7–9, 2021

Join us, in-person, December 7–9 in Las Vegas, for the largest gathering of the Microsoft community in the world.