Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

The ultimate Microsoft Fabric, Power BI, Azure AI & SQL learning event! Join us in Las Vegas from March 26-28, 2024. Use code MSCUST for a $100 discount. Register Now

Reply
mcolb88
Helper III
Helper III

Power Query Performance with Excel

I built a model in Power BI Desktop using Power Query to retrieve data from a folder of Excel files. I created a query that reads files from a folder, then invokes a function to parse each Excel file performing the necessary translations. When I apply the changes to retrieve all of the data, its loading over 1GB of data on a file that is only 1031 KB.  Any suggestions on what to look for that could be causing the mashup engine to read the file over and over again? 

 

The Excel files are generated by saving a PDF to Excel. This causes some odd formatting that reasults in an Excel file that contains 30 plus columns and each file has a variable number of columns to read. Due to the variable number of columns in each file I had to write a custom merge using List.Accumulate for reach record since the Text.Combine would not work with a list of columns the way I would have expected it to.

 

The way it's functioning, it's unusable from a speed perspective. Any help/suggestions are greatly appreicated.

 

The main query that loads the files form a folder is as follows:

 

let
Source = Folder.Files(SourceDataFolder & "\Examiner"),
#"Filtered Rows" = Table.SelectRows(Source, each (Text.StartsWith([Name], "ALL EXAMINERS ") and [Extension] = ".xlsx")),
#"Removed Other Columns" = Table.SelectColumns(#"Filtered Rows",{"Name", "Folder Path"}),
#"Invoked Custom Function" = Table.AddColumn(#"Removed Other Columns", "ERData", each LoadERFile([Folder Path] & [Name])),
#"Expanded ERData" = Table.ExpandTableColumn(#"Invoked Custom Function", "ERData", {"Order Number", "Cycle Date", "Fee Type", "Fee Amount"}, {"Order Number", "Cycle Date", "Fee Type", "Fee Amount"}),
#"Removed Other Columns1" = Table.SelectColumns(#"Expanded ERData",{"Order Number", "Cycle Date", "Fee Type", "Fee Amount"})
in
#"Removed Other Columns1"

 

and the LoadERFile function that parses the data from a single file is below:

 

let
Source = (ExaminerReimbursementFileName as text) => let
Source = Excel.Workbook(File.Contents(ExaminerReimbursementFileName), null, true),
#"Removed Other Columns" = Table.SelectColumns(Source,{"Data"}),
ColumnList = Table.ColumnNames(#"Removed Other Columns"{0}[Data]),
ExpandedColumnList = List.Transform(ColumnList, each "Data." & _),
#"Expanded Data" = Table.ExpandTableColumn(#"Removed Other Columns", "Data", Table.ColumnNames(#"Removed Other Columns"{0}[Data]), ExpandedColumnList),
#"Filtered Rows" = Table.SelectRows(#"Expanded Data", each (Text.Upper(Text.From([Data.Column3])) <> "MISC FEE TOTAL")),
TransformCommand = List.Transform(ExpandedColumnList, each {_, type text}),
#"Changed to Text" = Table.TransformColumnTypes(#"Filtered Rows", TransformCommand),
#"Added Index" = Table.AddIndexColumn(#"Changed to Text", "Index", 0, 1),
#"Added Merged Record" = Table.AddColumn(#"Added Index", "Merged", each List.Accumulate(Record.FieldValues(Record.SelectFields(#"Added Index"{[Index]}, ExpandedColumnList)), "", (state, current) => if current = null then state else state & Text.From(current) & "|")),
#"Added Upper Column1" = Table.AddColumn(#"Added Merged Record", "Column1", each Text.Upper(Text.From([Data.Column1], "en-US"))),
#"Added Order Number" = Table.AddColumn(#"Added Upper Column1", "Order Number", each if [Column1] = null then null
else if not Text.StartsWith([Column1], "CASE #") then null
else if Text.AfterDelimiter([Column1], "CASE #") = "" then Text.Trim([Data.Column2])
else Text.Trim(Text.AfterDelimiter([Column1], "CASE #", 1))),
#"Filled Down" = Table.FillDown(#"Added Order Number",{"Order Number"}),
#"Added Cycle Date" = Table.AddColumn(#"Filled Down", "Cycle Date", each if [Column1] <> null and Text.StartsWith([Column1],"CYCLE DATE") then Text.Clean(Text.AfterDelimiter([Column1],"CYCLE DATE")) else null),
#"Filled Down1" = Table.FillDown(#"Added Cycle Date",{"Cycle Date"}),
#"Added Fee Type" = Table.AddColumn(#"Filled Down1", "Fee Type", each
if Text.Contains([Merged], "Total") then null
else if Text.Contains([Merged], "Examiner Fee|") or Text.EndsWith([Merged], "Examiner Fee") then "Examiner Fee"
else if Text.Contains([Merged], "Mileage Fee|") or Text.EndsWith([Merged], "Mileage Fee") then "Mileage Fee"
else if Text.Contains([Merged], "Referral Fee|") or Text.EndsWith([Merged], "Referral Fee") then "Referral Fee"
else if Text.Contains([Merged], "Out of Area|") or Text.EndsWith([Merged], "Out of Area") then "Out of Area"
else if Text.Contains([Merged], "- Deductions|") or Text.EndsWith([Merged], "- Deductions") then "Deductions"
else null),
#"Filtered Rows1" = Table.SelectRows(#"Added Fee Type", each ([Fee Type] <> null)),
#"Added Fee Amount" = Table.AddColumn(#"Filtered Rows1", "Fee Amount", each Text.Remove(Text.AfterDelimiter([Merged], [Fee Type] &"|"), "|")),
#"Removed Other Columns1" = Table.SelectColumns(#"Added Fee Amount",{"Order Number", "Cycle Date", "Fee Type", "Fee Amount"})
in
#"Removed Other Columns1"
in
Source


Thanks

1 ACCEPTED SOLUTION

Thanks @v-yulgu-msft that's good information. 

 

I spent  more time disecting the queries and found the source of the problem.

 

Referencing the current record values inthe form of a list using Record.FieldValues(Record.SelectFields(#"Added Index"{[Index]}, ExpandedColumnList)) was causing Power Query to read much more data than was required. This was replaced with using _ to reference to the current record.

 

Using Record.Field(_, current) within the function of a List.Accumulate and iterating the ExpandedColumnList instead of a list of record values kept to reading only the current record and the performance is now back to just a few seconds. It would be very helpful to have a reference that helps us understand what the mashup engine is doing under the hood.

 

Thanks

 

View solution in original post

3 REPLIES 3
v-yulgu-msft
Employee
Employee

Hi @mcolb88,

 

For Power Query reference performance, you could see pqian's comment in this thread.

 

For some tips about how to reduce the memory consumption, please see whether below article is helpful to your scenario:

Performance Tip for Power BI; Enable Load Sucks Memory Up

 

Best regards,
Yuliana Gu

Community Support Team _ Yuliana Gu
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

Thanks @v-yulgu-msft that's good information. 

 

I spent  more time disecting the queries and found the source of the problem.

 

Referencing the current record values inthe form of a list using Record.FieldValues(Record.SelectFields(#"Added Index"{[Index]}, ExpandedColumnList)) was causing Power Query to read much more data than was required. This was replaced with using _ to reference to the current record.

 

Using Record.Field(_, current) within the function of a List.Accumulate and iterating the ExpandedColumnList instead of a list of record values kept to reading only the current record and the performance is now back to just a few seconds. It would be very helpful to have a reference that helps us understand what the mashup engine is doing under the hood.

 

Thanks

 

mcolb88
Helper III
Helper III

So a little more investigation, I stripped the query back to determine where the slowness comes from, as soon as a reference any part of the Merged column, the mashup engine goes form reading 1M to several GB. Why isn't power query reading the data once and performing the translations without re-reading information?

 

Helpful resources

Announcements
March Fabric Community Update

Fabric Community Update - March 2024

Find out what's new and trending in the Fabric Community.

Fabric Community Conference

Microsoft Fabric Community Conference

Join us at our first-ever Microsoft Fabric Community Conference, March 26-28, 2024 in Las Vegas with 100+ sessions by community experts and Microsoft engineering.

Fabric Partner Community

Microsoft Fabric Partner Community

Engage with the Fabric engineering team, hear of product updates, business opportunities, and resources in the Fabric Partner Community.