Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Grow your Fabric skills and prepare for the DP-600 certification exam by completing the latest Microsoft Fabric challenge.

Reply
OuluChris
Helper I
Helper I

Power Query Performance with Custom Functions applied to Excel Workbooks

Hi,

 

I'm encountering a severe performance issue when using Power Query in the following circumstances:

 

I'm trying to get the data from 263 Excel workbooks loaded into the data model (263 isn't a fixed number but I don't expect any big variations; say between 200 to 400).

 

There are four distinct data "regions" in the workbooks which I've named "Details", "Finance", "Workload" and "Revenue" and my intent is to get each of these out separately and then add them to the data model with relationships between them. I've only described the process for one of them below but they are all basically the same; it's just different areas of the workbooks that are being extracted.

 

To start I reference the template Excel workbook that has been used to create all the others like this:

 

let
Source = Folder.Files("\\server folder"),
#"\\server folder\_template name xlsm1" = Source{[#"Folder Path"="\\server folder\", Name = "template name.xlsm"]}[Content]
in
#"\\template name xlsm1"

 

I then create a parameter with a type of binary and the default value of the template file. I call this the template parameter

 

With this, I can create the first of my queries as follows:

 

let
Source = Excel.Workbook(#"Template Parameter", null, true),
Navigation = Source{[Name="Resources"]}[Data],
#"Removed Other Columns" = Table.SelectColumns(Navigation,{"Column3", "Column4"}),
#"Kept Range of Rows" = Table.Range(#"Removed Other Columns",3,9),
#"Removed Blank Rows" = Table.SelectRows(#"Kept Range of Rows", each not List.IsEmpty(List.RemoveMatchingItems(Record.FieldValues(_), {"", null}))),
#"Transposed Table" = Table.Transpose(#"Removed Blank Rows"),
#"Promoted Headers" = Table.PromoteHeaders(#"Transposed Table", [PromoteAllScalars=true])
in
#"Promoted Headers"

 

I then create a function with this and call it fnDetails

 

I get a list of all the Excel files using Folder.Files and call this File List.

 

Lastly, I put the two together as follows:

 

let
Source = #"File List",
#"Invoked Custom Function" = Table.AddColumn(Source, "fnProjectDetails", each fnProjectDetails([Content])),
#"Removed Errors" = Table.RemoveRowsWithErrors(#"Invoked Custom Function", {"fnProjectDetails"}),
#"Expanded fnProjectDetails" = Table.ExpandTableColumn(#"Removed Errors", "fnProjectDetails", {"THE COLUMN NAMES BEING EXPANDED"})
in
#"Expanded fnProjectDetails"

 

This all seems like it should work. Unfortunately, whilst it does sorta work, it can take several hours to load the data which is basically pointless. It would be faster to open them all up and write the information down!

 

Task Manager doesn't show the CPU, memory, or disk doing anything unusual, and the network has plenty of spare capacity compared to what is actually being used.

 

I feel sure I'm misunderstanding something but several weeks worth of reading and experimenting has not got me over this issue.

 

Thanks,

 


Chris

4 REPLIES 4
mahoneypat
Employee
Employee

If you can provide a link to two mock files, a query can be provided that combines them as an example.  If you did not define tables for your data ranges (i.e., Format as Table or Ctrl-T), then you will have to extract the data from the whole sheet.  If possible, convert them to Tables and then you can quickly get the contents out with Kind = Table and Name = Details, etc. once you open it with Excel.Workbook.  Since your files already exist, you can't do that, so you will need to continue with the current approach.

Pat

 





Did I answer your question? Mark my post as a solution! Kudos are also appreciated!

To learn more about Power BI, follow me on Twitter or subscribe on YouTube.


@mahoneypa HoosierBI on YouTube


Jimmy801
Community Champion
Community Champion

Hello @OuluChris 

 

I've had the same requirements, but I had to read 2k+ Excel files and I can tell you that the refresh doesn't take too long. Here my concept

- I've created one function that takes the file path as parameter and the output is a table with 3 columns and 1 row for 3 differenct datasets (worksheets)

Jimmy801_0-1612264939376.png

then I created on basic query that reads one time from the folder and apply my function on every file. Then expand the 3 columns, surrounding it with table.buffer

Jimmy801_1-1612265087750.png

next step is to create for every dataset a new query, where I take my basic query, deleting not needed column, expand the columns of the table and in case manipulating further

Jimmy801_2-1612265198216.png

 

If this post helps or solves your problem, please mark it as solution (to help other users find useful content and to acknowledge the work of users that helped you)
Kudoes are nice too

Have fun

Jimmy

 

mahoneypat
Employee
Employee

What is the purpose of opening the template file for every file?  Why not just make 4 functions, each to extract the table from each workbook (Details, Finance, etc.).  I also wouldn't do the Table.Range and Table.SelectRows in the function, but would do that after the tables are combined from all templates.

 

If you do need something from the template file for each file, I would pull extract that in your final query and wrap it in Table.Buffer( ) so it only get called once.  If you can't do without the template file, using Table.Buffer will be key to decrease refresh time significantly (avoid doing it 4 X 263 times).

 

Pat





Did I answer your question? Mark my post as a solution! Kudos are also appreciated!

To learn more about Power BI, follow me on Twitter or subscribe on YouTube.


@mahoneypa HoosierBI on YouTube


Thanks for your thoughts.

 

I like the idea of not opening the template file for every file, and there isn't anything I need from the template, but I don't know how to write the custom function without reference to it. I based what I've got so far on what I read here: https://docs.microsoft.com/en-us/power-query/custom-function

(I didn't use one of the files I'm importing as the template as these will change over time)

 

Also, I don't understand how I could not use the Table.Range and Table.SelectRows in the function. If I don't use them in the function don't I just end up importing the whole sheet?

Helpful resources

Announcements
Europe Fabric Conference

Europe’s largest Microsoft Fabric Community Conference

Join the community in Stockholm for expert Microsoft Fabric learning including a very exciting keynote from Arun Ulag, Corporate Vice President, Azure Data.

RTI Forums Carousel3

New forum boards available in Real-Time Intelligence.

Ask questions in Eventhouse and KQL, Eventstream, and Reflex.

Top Solution Authors
Top Kudoed Authors