Solved: Re: power query refresh really slow- excel file

Anonymous · ‎11-20-2019

Hi Everyone,

New to Power BI and Power Query here.

I'm trying to combine 10 excel files in a folder (each one is about 15 MB), when I apply changes, it took me like 4hours to refresh the query and used all of the memory and CPU!!

I've unchecked a few boxes in Settings to help speed up the load and read countless threads about buffer and folding but they don't seem to be applicable for me or I simply don't know how to use it.

Can anyone please take a look at my query and let me know how I can fix it?

Many Thanks!!!

let
Source=Folder.Files("M:\Dept\Global Interim Data Set (GIDS)"),
#"Removed Other Columns" = Table.SelectColumns(Source,{"Content", "Name"}),
#"Added Custom" = Table.AddColumn(#"Removed Other Columns", "Custom", each Excel.Workbook([Content])),
#"Removed Columns" = Table.RemoveColumns(#"Added Custom",{"Content"}),
#"Expanded Custom" = Table.ExpandTableColumn(#"Removed Columns", "Custom", {"Name", "Data", "Item", "Kind", "Hidden"}, {"Custom.Name", "Custom.Data", "Custom.Item", "Custom.Kind", "Custom.Hidden"}),
#"Filtered Rows1" = Table.SelectRows(#"Expanded Custom", each Text.Contains([Custom.Name], "Client") and not Text.Contains([Custom.Name], "$")),
#"Removed Columns1" = Table.RemoveColumns(#"Filtered Rows1",{"Custom.Name", "Custom.Kind", "Custom.Hidden"}),
#"obtain date" = Table.AddColumn(#"Removed Columns1","Date", each Text.Select([Name],{"0".."9"})),
#"Removed Columns2" = Table.RemoveColumns(#"obtain date",{"Name", "Custom.Item"}),
SkipRows = Table.AddColumn(#"Removed Columns2","Skip first 4",each Table.PromoteHeaders(Table.Skip([Custom.Data],4))),
#"Removed Columns4" = Table.RemoveColumns(SkipRows,{"Custom.Data"}),
unioncol = List.Union(
List.Buffer(
List.Transform(#"Removed Columns4"[Skip first 4], each Table.ColumnNames(_)))),
ExpandCol = Table.Buffer(
Table.ExpandTableColumn(#"Removed Columns4", "Skip first 4",unioncol)),
RemoveEmpty = FnRemoveEmptyColumns(ExpandCol),
ColName = Table.ColumnNames(RemoveEmpty),

#"Removed Duplicates" = Table.Buffer(Table.Distinct(RemoveEmpty, List.Range(Table.ColumnNames(RemoveEmpty),1,162)))
in
#"Removed Duplicates"

dax · ‎11-20-2019

Hi ddm123,

I find a similar post for this problem, you could try to refer to Any way to speed up really slow refresh times in Power Query? | Power Query | Excel Forum for details.

Best Regards,
Zoe Zhi

If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

View solution in original post

Jimmy801 · ‎12-15-2019

Hello @Anonymous

I undestand. Then I can't figure out how these 4 lines of everey sheet are removed after you've combined them.

I would go for

- Create one query that threahts your file like you need it

- Create a parameter with one path/name of a sample path

- replace the filename in your first query with the parameter and convert it to a function

- use the data source to extract the file names of this folder... in case do some filtering

- add new column where you use your funciton and path and name as input.

That should do the trick

If this post helps or solves your problem, please mark it as solution.
Kudos are nice to - thanks
Have fun

Jimmy

View solution in original post

dax · ‎11-20-2019

Hi ddm123,

I find a similar post for this problem, you could try to refer to Any way to speed up really slow refresh times in Power Query? | Power Query | Excel Forum for details.

Best Regards,
Zoe Zhi

If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

Anonymous · ‎12-12-2019

Hi @dax ,

I read the post and converted my file to .csv that did decrease the refreshing time to ~3 hours (slightly increase). is there any other way to speed it up??? Maybe I'm asking for too much....

Jimmy801 · ‎12-13-2019

Hello @Anonymous

very difficult to get an idea what exactly happens here.

- Columns were expanded and without use deleted afterwards,

- by extracting a plain xlsx-file i don't get how a column reference can be a Table in this case for "Table.Skip" on row level - SkipRows = Table.AddColumn(#"Removed Columns2","Skip first 4",each Table.PromoteHeaders(Table.Skip([Custom.Data],4))),

- a customfunction is used (not knowing what it does) - maybe use table.buffer in there

- I don't know if Table.Buffer at the end is very usefull

Maybe a change of concept in retrieving data would be best (maybe to create function to query the files, and put the data together at the end other change logic - difficult to answer if the real goal is not known.

BR

Jimmy

Anonymous · ‎12-14-2019

Hi @Jimmy801 ,

thanks so much for the response. I put the "Table.Skip" because for all the files i'm combining the first 4 rows were not needed to transfer to the query, also the cutsom function was used to remove all the empty columns for the files. I will try to create a function to connect the data to see how much of the difference it will make. Each file contains like over 100 columns and by using power query I'm trying to combine the data together as the files in the folder will be added on a monthly basis, then build a data model out of the flat files with 100 columns here.

I'm just so frustrated that it takes so long to combine the files every time I refresh the data model. One other thing is it seems like for every step added on PQ, the data was retrieved from the very first step that I don't really understand why and I think that's another issue that causes the data super slow to load. by adding Table.Buffer step I thought the purpose is to cache the table in the memory to accelerate the data load. However, I realized that when adding a step after the buffer step, the data was still pulling from the be beginning... Am I understand how table.bufffer works wrong?

Jimmy801 · ‎12-15-2019

Hello @Anonymous

I undestand. Then I can't figure out how these 4 lines of everey sheet are removed after you've combined them.

I would go for

- Create one query that threahts your file like you need it

- Create a parameter with one path/name of a sample path

- replace the filename in your first query with the parameter and convert it to a function

- use the data source to extract the file names of this folder... in case do some filtering

- add new column where you use your funciton and path and name as input.

That should do the trick

If this post helps or solves your problem, please mark it as solution.
Kudos are nice to - thanks
Have fun

Jimmy

Anonymous · ‎12-19-2019

Hi @Jimmy801 ,

Very good suggestion. It decreased the loading time a lot!!!

power query refresh really slow- excel file

Helpful resources

Microsoft Fabric Learn Together

Power BI Monthly Update - April 2024

Fabric Community Update - April 2024