cancel
Showing results for 
Search instead for 
Did you mean: 
Reply
Helper III
Helper III

Combing thousands of excel files using Power Query. Is my workflow correct?

Hi, 

 

I've built a power query function to extract data from excel files. The excel files are in the same format. The files are stored on a Onedrive folder. I've successfully queried the data using get data > sharepoint folder and then filtering and then invoking the custom function to extract the data from columns to rows. In the end I get data from each excel file to be a row. This works. 

 

However, it takes quite some time to refresh this data and the source will grow quite a bit. Right now we are looking at over 2000 excel files. Is there another option I should be looking for to speed this process? I've built it such that a master excel file is doing the power query and then I'm using that as the source for a Power BI dashboard that allows me to look across all the data. How should I go about to speed up this process and making it scaleable? Help. Thanks in advance.  

1 ACCEPTED SOLUTION

Hello @ttseng 

 

exactly, I only added Table.Buffer

So give it a test and let us know

 

Jimmy

View solution in original post

11 REPLIES 11
Super User III
Super User III

Hello @ttseng 

have you been able to solve the problem with the replies given?

If so, please mark it as solution (to help other users find useful content and to acknowledge the work of users that helped you)
Kudoes are nice too

All the best

Jimmy

Hi @Jimmy801 ,

 

Thanks for the follow up. 

 

My code has changed a bit since I last posted, but wanted to confirm that the only change was adding "table.buffer" at the following section:

 

Daily_Data_Entry_Sheet = Table.Buffer(Source{[Item="Daily_Data_Entry",Kind="Sheet"]}[Data]),

 

if so, i can test it on my current query function.

 

Thanks again,

 

Tony

Hello @ttseng 

 

exactly, I only added Table.Buffer

So give it a test and let us know

 

Jimmy

View solution in original post

Hi @Jimmy801 

 

Sorry for the late reply on this, but I was finally able to do a little testing on this. 

 

I ended up just directly querying the files into power bi and skipping the excel step. I was having trouble setting up auto refresh of the data connection in excel (i wanted to not have to open the file and hit refresh). I copied the powery query function into my PBI and rebuilt the connection. I then buit two different versions with one with the table.buffer and then uploaded it to the pbi service and then setup auto refresh.

 

Interestingly, it appears to take a little longer. Both refresh was scheduled for 11:00pm.

 

tablebuffer.PNG

 

Thanks.

Hello @ttseng 

 

was the initially request, do combine thousands of Excel-files, fullfilled?

 

Hello

have you been able to solve the problem with the replies given?
If so, please mark it as solution (to help other users find useful content and to acknowledge the work of users that helped you)
Kudoes are nice too

All the best

Jimmy

Super User III
Super User III

Hello

The approach is all right. If you want you can post your custom function (and if possible a excel file), then i can have a look on this. But reading 2k excel files will take some time. By the way... how long does it take to refresh? Another suggestion is, to use a csv file instead, because they can easier be accessed. But i don't know if this could be a option for you.

All the best

Jimmy

Thanks. 

This is the master template that is being used. I'm extracting the indicator table (C14:E33). I have the function write the province,  district, and facility and then pivot the indicators to columns for each excel file so I get a master table with each row representing each excel file. 

 

Capture.PNG

 

It takes about 20-30 minutes to refresh.

 

Is CSV a feasible option?

Should I be doing this query in Power BI rather than have the excel doing the power query?

Would it be possible to autoamte the refresh without having to open the excel file and udpate the query?

Hello

Could you please post the m code from the advanced editor?

Jimmy

Sure thing. 

(ExcelFiles) =>
let
    Source = Excel.Workbook(ExcelFiles, null, true),
    Daily_Data_Entry_Sheet = Source{[Item="Daily_Data_Entry",Kind="Sheet"]}[Data],
    #"Promoted Headers" = Table.PromoteHeaders(Daily_Data_Entry_Sheet, [PromoteAllScalars=true]),
    #"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"Column1", type text}, {"Column2", type any}, {"DAILY MONITORING TOOL : Instruction  >>  : Please Select Province , District and then Facility ", type text}, {"Column4", type any}, {"Column5", type any}, {"Column6", type any}, {"Column7", type any}, {"Column8", type any}, {"Column9", type any}, {"Column10", type any}, {"Column11", type any}, {"Column12", type any}, {"Column13", type any}}),
    #"Select Columns" = Table.SelectColumns(#"Changed Type",{"DAILY MONITORING TOOL : Instruction  >>  : Please Select Province , District and then Facility ", "Column4", "Column5", "Column6", "Column7", "Column8", "Column9", "Column10", "Column11", "Column12", "Column13"}),
    #"Removed Columns" = Table.RemoveColumns(#"Select Columns",{"Column7", "Column8", "Column9", "Column10", "Column11", "Column12", "Column13"}),
    #"Added Custom" = Table.AddColumn(#"Removed Columns", "Province", each #"Changed Type"[#"DAILY MONITORING TOOL : Instruction  >>  : Please Select Province , District and then Facility "]{2}),
    #"Added Custom1" = Table.AddColumn(#"Added Custom", "District", each #"Changed Type"[Column4]{2}),
    #"Added Custom2" = Table.AddColumn(#"Added Custom1", "Facility ", each #"Changed Type"[Column5]{2}),
    #"Added Custom3" = Table.AddColumn(#"Added Custom2", "Date", each #"Changed Type"[Column4]{54}),
    #"Changed Type1" = Table.TransformColumnTypes(#"Added Custom3",{{"Date", type text}}),
    #"Added Custom4" = Table.AddColumn(#"Changed Type1", "First Name", each #"Changed Type"[Column5]{52}),
    #"Changed Type2" = Table.TransformColumnTypes(#"Added Custom4",{{"First Name", type text}}),
    #"Added Custom5" = Table.AddColumn(#"Changed Type2", "Last Name", each #"Changed Type"[Column5]{53}),
    #"Changed Type3" = Table.TransformColumnTypes(#"Added Custom5",{{"Last Name", type text}}),
    #"Added Custom6" = Table.AddColumn(#"Changed Type3", "Name", each [First Name] & " "&[Last Name]),
    #"Changed Type4" = Table.TransformColumnTypes(#"Added Custom6",{{"Name", type text}}),
    #"Removed Columns1" = Table.RemoveColumns(#"Changed Type4",{"Column6", "First Name", "Last Name"}),
    #"Removed Top Rows" = Table.Skip(#"Removed Columns1",13),
    #"Renamed Columns" = Table.RenameColumns(#"Removed Top Rows",{{"DAILY MONITORING TOOL : Instruction  >>  : Please Select Province , District and then Facility ", "Indicator"}, {"Column4", "Male"}, {"Column5", "Female"}}),
    #"Kept Range of Rows" = Table.Range(#"Renamed Columns",0,19),
    #"Changed Type5" = Table.TransformColumnTypes(#"Kept Range of Rows",{{"Male", type text}, {"Female", type text}}),
    #"Replaced Value1" = Table.ReplaceValue(#"Changed Type5",null,"Null",Replacer.ReplaceValue,{"Male"}),
    #"Replaced Value2" = Table.ReplaceValue(#"Replaced Value1",null,"Null",Replacer.ReplaceValue,{"Female"}),
    #"Added Custom7" = Table.AddColumn(#"Replaced Value2", "Custom", each Text.Combine({[Indicator], "_Male", [Male]}, "-")),
    #"Added Custom8" = Table.AddColumn(#"Added Custom7", "Custom.1", each Text.Combine({[Indicator], "_Female", [Female]}, "-")),
    #"Removed Columns2" = Table.RemoveColumns(#"Added Custom8",{"Indicator", "Male", "Female"}),
    #"Added Custom9" = Table.AddColumn(#"Removed Columns2", "Indicator", each Text.Combine({[Custom], [Custom.1]}, ";")),
    #"Removed Columns3" = Table.RemoveColumns(#"Added Custom9",{"Custom", "Custom.1"}),
    #"Added Custom10" = Table.AddColumn(#"Removed Columns3", "Custom", each Text.Split([Indicator],";")),
    #"Expanded Custom" = Table.ExpandListColumn(#"Added Custom10", "Custom"),
    #"Removed Columns4" = Table.RemoveColumns(#"Expanded Custom",{"Indicator"}),
    #"Renamed Columns1" = Table.RenameColumns(#"Removed Columns4",{{"Custom", "Indicator"}}),
    #"Split Column by Delimiter" = Table.SplitColumn(#"Renamed Columns1", "Indicator", Splitter.SplitTextByEachDelimiter({"-"}, QuoteStyle.Csv, true), {"Indicator.1", "Indicator.2"}),
    #"Changed Type6" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Indicator.1", type text}, {"Indicator.2", type text}}),
    #"Replaced Value" = Table.ReplaceValue(#"Changed Type6","_Male",null,Replacer.ReplaceValue,{"Indicator.2"}),
    #"Pivoted Column" = Table.Pivot(#"Replaced Value", List.Distinct(#"Replaced Value"[Indicator.1]), "Indicator.1", "Indicator.2")
in
    #"Pivoted Column"

Hello @ttseng 

 

try this function and let us know if there was an impact on performance

(ExcelFiles) =>
let
    Source = Excel.Workbook(ExcelFiles, null, true),
    Daily_Data_Entry_Sheet = Table.Buffer(Source{[Item="Daily_Data_Entry",Kind="Sheet"]}[Data]),
    #"Promoted Headers" = Table.PromoteHeaders(Daily_Data_Entry_Sheet, [PromoteAllScalars=true]),
    #"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"Column1", type text}, {"Column2", type any}, {"DAILY MONITORING TOOL : Instruction  >>  : Please Select Province , District and then Facility ", type text}, {"Column4", type any}, {"Column5", type any}, {"Column6", type any}, {"Column7", type any}, {"Column8", type any}, {"Column9", type any}, {"Column10", type any}, {"Column11", type any}, {"Column12", type any}, {"Column13", type any}}),
    #"Select Columns" = Table.SelectColumns(#"Changed Type",{"DAILY MONITORING TOOL : Instruction  >>  : Please Select Province , District and then Facility ", "Column4", "Column5", "Column6", "Column7", "Column8", "Column9", "Column10", "Column11", "Column12", "Column13"}),
    #"Removed Columns" = Table.RemoveColumns(#"Select Columns",{"Column7", "Column8", "Column9", "Column10", "Column11", "Column12", "Column13"}),
    #"Added Custom" = Table.AddColumn(#"Removed Columns", "Province", each #"Changed Type"[#"DAILY MONITORING TOOL : Instruction  >>  : Please Select Province , District and then Facility "]{2}),
    #"Added Custom1" = Table.AddColumn(#"Added Custom", "District", each #"Changed Type"[Column4]{2}),
    #"Added Custom2" = Table.AddColumn(#"Added Custom1", "Facility ", each #"Changed Type"[Column5]{2}),
    #"Added Custom3" = Table.AddColumn(#"Added Custom2", "Date", each #"Changed Type"[Column4]{54}),
    #"Changed Type1" = Table.TransformColumnTypes(#"Added Custom3",{{"Date", type text}}),
    #"Added Custom4" = Table.AddColumn(#"Changed Type1", "First Name", each #"Changed Type"[Column5]{52}),
    #"Changed Type2" = Table.TransformColumnTypes(#"Added Custom4",{{"First Name", type text}}),
    #"Added Custom5" = Table.AddColumn(#"Changed Type2", "Last Name", each #"Changed Type"[Column5]{53}),
    #"Changed Type3" = Table.TransformColumnTypes(#"Added Custom5",{{"Last Name", type text}}),
    #"Added Custom6" = Table.AddColumn(#"Changed Type3", "Name", each [First Name] & " "&[Last Name]),
    #"Changed Type4" = Table.TransformColumnTypes(#"Added Custom6",{{"Name", type text}}),
    #"Removed Columns1" = Table.RemoveColumns(#"Changed Type4",{"Column6", "First Name", "Last Name"}),
    #"Removed Top Rows" = Table.Skip(#"Removed Columns1",13),
    #"Renamed Columns" = Table.RenameColumns(#"Removed Top Rows",{{"DAILY MONITORING TOOL : Instruction  >>  : Please Select Province , District and then Facility ", "Indicator"}, {"Column4", "Male"}, {"Column5", "Female"}}),
    #"Kept Range of Rows" = Table.Range(#"Renamed Columns",0,19),
    #"Changed Type5" = Table.TransformColumnTypes(#"Kept Range of Rows",{{"Male", type text}, {"Female", type text}}),
    #"Replaced Value1" = Table.ReplaceValue(#"Changed Type5",null,"Null",Replacer.ReplaceValue,{"Male"}),
    #"Replaced Value2" = Table.ReplaceValue(#"Replaced Value1",null,"Null",Replacer.ReplaceValue,{"Female"}),
    #"Added Custom7" = Table.AddColumn(#"Replaced Value2", "Custom", each Text.Combine({[Indicator], "_Male", [Male]}, "-")),
    #"Added Custom8" = Table.AddColumn(#"Added Custom7", "Custom.1", each Text.Combine({[Indicator], "_Female", [Female]}, "-")),
    #"Removed Columns2" = Table.RemoveColumns(#"Added Custom8",{"Indicator", "Male", "Female"}),
    #"Added Custom9" = Table.AddColumn(#"Removed Columns2", "Indicator", each Text.Combine({[Custom], [Custom.1]}, ";")),
    #"Removed Columns3" = Table.RemoveColumns(#"Added Custom9",{"Custom", "Custom.1"}),
    #"Added Custom10" = Table.AddColumn(#"Removed Columns3", "Custom", each Text.Split([Indicator],";")),
    #"Expanded Custom" = Table.ExpandListColumn(#"Added Custom10", "Custom"),
    #"Removed Columns4" = Table.RemoveColumns(#"Expanded Custom",{"Indicator"}),
    #"Renamed Columns1" = Table.RenameColumns(#"Removed Columns4",{{"Custom", "Indicator"}}),
    #"Split Column by Delimiter" = Table.SplitColumn(#"Renamed Columns1", "Indicator", Splitter.SplitTextByEachDelimiter({"-"}, QuoteStyle.Csv, true), {"Indicator.1", "Indicator.2"}),
    #"Changed Type6" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Indicator.1", type text}, {"Indicator.2", type text}}),
    #"Replaced Value" = Table.ReplaceValue(#"Changed Type6","_Male",null,Replacer.ReplaceValue,{"Indicator.2"}),
    #"Pivoted Column" = Table.Pivot(#"Replaced Value", List.Distinct(#"Replaced Value"[Indicator.1]), "Indicator.1", "Indicator.2")
in
    #"Pivoted Column"

 

If this post helps or solves your problem, please mark it as solution (to help other users find useful content and to acknowledge the work of users that helped you)
Kudoes are nice too

Have fun

Jimmy

I'm not sure a custom function is needed to do this aggregation. If you are simply appending the files to each other, this is a built-in function. It may be faster to append all to one dataset and then group the results.

 

I have a blog post on how to use the out of box merge process here: https://marqueeinsights.com/how-to-merge-multiple-excel-files-with-power-bi/

 

Hope this helps.

--Treb, Power BI MVP

 

 

https://getstartedwithpowerbi.com

Helpful resources

Announcements
secondImage

Congratulations!

We are excited to announce the Power BI Super Users!

Wave Release 2

Check out the updates in Power BI.

Overview of Power BI 2020 release wave 2!

Microsoft Ignite

Microsoft Ignite

Join digitally, March 2–4, 2021 to explore new tech that's ready to implement. Experience the keynote in mixed reality through AltspaceVR!

secondImage

The largest Power BI virtual conference

100+ sessions, 100+ speakers, Product managers, MVPs, and experts. All about Power BI. Attend online or watch the recordings.

Top Solution Authors