Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.
Hi,
Quick one, and from what I understand of Power Query, the answer's probably that it doesn't matter, but let's say I make a simple query by importing in a .csv file, let PBI promote the headers and auto convert the data types, then I delete a large percentage of the columns. I believe the way Power Query works is that it kind of goes backwards, and would recognise that it doesn't need to do any data type conversion on the columns that eventually get deleted? Is there any performance benefit in going back through my query and removing the conversion of a field that's eventually removed?
Solved! Go to Solution.
@jthomson wrote:
Is there any performance benefit in going back through my query and removing the conversion of a field that's eventually removed?
No, your understanding is correct. Conversion steps on columns that are not referenced in the final output should not be executed.
The big issue with sources like CSV files is that there is no query engine behind them. If you have a 100Mb csv and you only need 10% of the columns then query engine cannot just ask for the 10Mb of data that it needs, it still has to load the entire 100Mb and parse through it to pull out just the columns that are needed.
With "smarter" data sources like relational databases Power Query can "fold" in operations like filtering and remove columns to create a more targetted query against the database to only request the subset of columns and rows that you need.
HI @jthomson ,
In my opinion,power query would recognise that it doesn't need to do any data type conversion on the columns that eventually get deleted.both operations are consistent in terms of performance.
I read an article that explored Query performance performance similar to yours, and the article explored:
Does it make a difference to Power Query merge performance if you remove unwanted columns from your source tables in the step before the merge or in the step afterwards?
Conclusion after testing:
Removing unwanted columns in a step immediately after a merge results in the same performance benefits as removing unwanted columns in a step immediately before a merge
These results may surprise some of you. Why does removing columns after a merge perform so well? The answer is that just because you remove columns from a table in the step after a merge, it doesn’t mean that the Power Query engine actually removes the columns after it has performed the merge. On the contrary, it’s clever enough to realise that if you remove columns in the step after a merge then there’s no need for those columns to be present at the point where it performs the merge.
To learn more details,you could read the following article:
Wish it is helpful for you!
Best Regards
Lucien
HI @jthomson ,
In my opinion,power query would recognise that it doesn't need to do any data type conversion on the columns that eventually get deleted.both operations are consistent in terms of performance.
I read an article that explored Query performance performance similar to yours, and the article explored:
Does it make a difference to Power Query merge performance if you remove unwanted columns from your source tables in the step before the merge or in the step afterwards?
Conclusion after testing:
Removing unwanted columns in a step immediately after a merge results in the same performance benefits as removing unwanted columns in a step immediately before a merge
These results may surprise some of you. Why does removing columns after a merge perform so well? The answer is that just because you remove columns from a table in the step after a merge, it doesn’t mean that the Power Query engine actually removes the columns after it has performed the merge. On the contrary, it’s clever enough to realise that if you remove columns in the step after a merge then there’s no need for those columns to be present at the point where it performs the merge.
To learn more details,you could read the following article:
Wish it is helpful for you!
Best Regards
Lucien
@jthomson wrote:
Is there any performance benefit in going back through my query and removing the conversion of a field that's eventually removed?
No, your understanding is correct. Conversion steps on columns that are not referenced in the final output should not be executed.
The big issue with sources like CSV files is that there is no query engine behind them. If you have a 100Mb csv and you only need 10% of the columns then query engine cannot just ask for the 10Mb of data that it needs, it still has to load the entire 100Mb and parse through it to pull out just the columns that are needed.
With "smarter" data sources like relational databases Power Query can "fold" in operations like filtering and remove columns to create a more targetted query against the database to only request the subset of columns and rows that you need.
Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City
Check out the April 2024 Power BI update to learn about new features.
User | Count |
---|---|
109 | |
98 | |
77 | |
66 | |
54 |
User | Count |
---|---|
144 | |
104 | |
101 | |
86 | |
64 |