Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
Anonymous
Not applicable

Does having too many columns in the query editor slow down a report / dataset or have other impacts?

I am having a debate with our sysadmin.   

 

He likes to put every single column from our systems (ERP, CRM, etc) into the dataflows that we use to create our datasets.    He feels that I can delete them in the query editor when creating a report and then when the report gets published the response time and refresh times won't be affected because the columns are not in the actual report.  

 

My feeling is that the report will be affected from a user experience perspective, but also from other issues like opening an App that contains this reports (I've noticed it can take up to 45 seconds to load an App for the first time).    Or that the load on the system for background refreshes will also be affected by having all of those columns in the dataset (and the dataflows).   

 

Can anyone shed some light on this?   Thanks!

1 ACCEPTED SOLUTION

@Anonymous Well, yes, not having query folding slows down data load in general. However, not all data sources support query folding. So, if your data source doesn't support query folding, then fine but in that case, then absolutely if you do not need the columns they should never show up to Power Query because the operation of removing those columns is causing processing to happen on the gateway (even if they are cloud only resources it uses Microsoft's cloud gateway). In short, it is causing extra processing to happen when loading the data. First, the source system has to render that data and ship it over the network. Second, your data refresh has to ingest that data and then get rid of it. Those are all extra network packets and cycles that would not happen if the "view" from the data source didn't include those extra columns to begin with.


@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!:
The Definitive Guide to Power Query (M)

DAX is easy, CALCULATE makes DAX hard...

View solution in original post

4 REPLIES 4
Greg_Deckler
Super User
Super User

@Anonymous If the extraneous columns actually make it to the data model then it definitely impacts dataset size and then potentially user experience. If you remove the extraneous columns in Power Query Editor then it comes down to whether or not query folding is happening. If you can right click on the last step in the query and you have the option to "View Native Query" then query folding is happening and thus the removal steps of the query are not impacting data load. If that is not the case, then there are steps happening locally on the gateway that are definitely impacting data refresh speeds.


@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!:
The Definitive Guide to Power Query (M)

DAX is easy, CALCULATE makes DAX hard...
Anonymous
Not applicable

@Greg_Deckler   Thanks Greg.  I've just checked a number of queries that all come from dataflows (which themselves are coming from our ERP system).   View Native Query is greyed out on all of them.    So if I understand you correctly, this is not a good thing?    We don't use a gateway as far as I know.   

@Anonymous Well, yes, not having query folding slows down data load in general. However, not all data sources support query folding. So, if your data source doesn't support query folding, then fine but in that case, then absolutely if you do not need the columns they should never show up to Power Query because the operation of removing those columns is causing processing to happen on the gateway (even if they are cloud only resources it uses Microsoft's cloud gateway). In short, it is causing extra processing to happen when loading the data. First, the source system has to render that data and ship it over the network. Second, your data refresh has to ingest that data and then get rid of it. Those are all extra network packets and cycles that would not happen if the "view" from the data source didn't include those extra columns to begin with.


@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!:
The Definitive Guide to Power Query (M)

DAX is easy, CALCULATE makes DAX hard...
Anonymous
Not applicable

@Greg_Deckler   Awesome.  Thank you Greg.   And thanks for all you and the other MVP's do for this community!   

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.