Optimising The Performance Of Power Query Merges In Power BI, Part 2: Does When You Remove Columns M...

jthomson · ‎06-17-2021

Hi,

Quick one, and from what I understand of Power Query, the answer's probably that it doesn't matter, but let's say I make a simple query by importing in a .csv file, let PBI promote the headers and auto convert the data types, then I delete a large percentage of the columns. I believe the way Power Query works is that it kind of goes backwards, and would recognise that it doesn't need to do any data type conversion on the columns that eventually get deleted? Is there any performance benefit in going back through my query and removing the conversion of a field that's eventually removed?

d_gosbell · ‎06-17-2021

@jthomson wrote:

Is there any performance benefit in going back through my query and removing the conversion of a field that's eventually removed?

No, your understanding is correct. Conversion steps on columns that are not referenced in the final output should not be executed.

The big issue with sources like CSV files is that there is no query engine behind them. If you have a 100Mb csv and you only need 10% of the columns then query engine cannot just ask for the 10Mb of data that it needs, it still has to load the entire 100Mb and parse through it to pull out just the columns that are needed.

With "smarter" data sources like relational databases Power Query can "fold" in operations like filtering and remove columns to create a more targetted query against the database to only request the subset of columns and rows that you need.

View solution in original post

v-luwang-msft · ‎06-21-2021

HI @jthomson ,

In my opinion，power query would recognise that it doesn't need to do any data type conversion on the columns that eventually get deleted.both operations are consistent in terms of performance.

I read an article that explored Query performance performance similar to yours, and the article explored：

Does it make a difference to Power Query merge performance if you remove unwanted columns from your source tables in the step before the merge or in the step afterwards?

Conclusion after testing：

Removing unwanted columns in a step immediately after a merge results in the same performance benefits as removing unwanted columns in a step immediately before a merge

These results may surprise some of you. Why does removing columns after a merge perform so well? The answer is that just because you remove columns from a table in the step after a merge, it doesn’t mean that the Power Query engine actually removes the columns after it has performed the merge. On the contrary, it’s clever enough to realise that if you remove columns in the step after a merge then there’s no need for those columns to be present at the point where it performs the merge.

To learn more details,you could read the following article:

Optimising The Performance Of Power Query Merges In Power BI, Part 2: Does When You Remove Columns M...

Wish it is helpful for you!

Best Regards

Lucien

View solution in original post

v-luwang-msft · ‎06-21-2021

HI @jthomson ,

In my opinion，power query would recognise that it doesn't need to do any data type conversion on the columns that eventually get deleted.both operations are consistent in terms of performance.

I read an article that explored Query performance performance similar to yours, and the article explored：

Does it make a difference to Power Query merge performance if you remove unwanted columns from your source tables in the step before the merge or in the step afterwards?

Conclusion after testing：

Removing unwanted columns in a step immediately after a merge results in the same performance benefits as removing unwanted columns in a step immediately before a merge

These results may surprise some of you. Why does removing columns after a merge perform so well? The answer is that just because you remove columns from a table in the step after a merge, it doesn’t mean that the Power Query engine actually removes the columns after it has performed the merge. On the contrary, it’s clever enough to realise that if you remove columns in the step after a merge then there’s no need for those columns to be present at the point where it performs the merge.

To learn more details,you could read the following article:

Optimising The Performance Of Power Query Merges In Power BI, Part 2: Does When You Remove Columns M...

Wish it is helpful for you!

Best Regards

Lucien

d_gosbell · ‎06-17-2021

@jthomson wrote:

Is there any performance benefit in going back through my query and removing the conversion of a field that's eventually removed?

No, your understanding is correct. Conversion steps on columns that are not referenced in the final output should not be executed.

The big issue with sources like CSV files is that there is no query engine behind them. If you have a 100Mb csv and you only need 10% of the columns then query engine cannot just ask for the 10Mb of data that it needs, it still has to load the entire 100Mb and parse through it to pull out just the columns that are needed.

With "smarter" data sources like relational databases Power Query can "fold" in operations like filtering and remove columns to create a more targetted query against the database to only request the subset of columns and rows that you need.

Query performance question

Optimising The Performance Of Power Query Merges In Power BI, Part 2: Does When You Remove Columns M...

Optimising The Performance Of Power Query Merges In Power BI, Part 2: Does When You Remove Columns M...

Helpful resources

Microsoft Fabric Learn Together

Power BI Monthly Update - April 2024

Fabric Community Update - April 2024

How to Get Your Question Answered Quickly