dh731r
Frequent Visitor

XLSX file is 10 MB, but Power Query loads 100 MB

Hi,

 

My file size is only 10MB, but when I load it into Power Query, it seems to be loading 10 times the file size.

 

I do not have any tables or pivot tables in the dataset and am wondering why this is happening.

[Screenshot: dh731r_1-1636050647545.png]

 

 

Thanks,

Daniel


6 REPLIES
v-stephen-msft
Community Support

Hi @dh731r ,

 

Chris Webb has written a lot about optimizing query performance when using files as a data source. Consider the following link as a start: Chris Webb's BI Blog: Comparing The Performance Of CSV And Excel Data Sources In Power Query

 

You might also consider using the M function Table.Buffer (Table.Buffer - PowerQuery M | Microsoft Docs). This can reduce the number of times the base file is re-read.
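For illustration only, here is a minimal sketch of how Table.Buffer could be applied (the file path, sheet name, and step names are made up for the example, not taken from Daniel's query):

let
    // Hypothetical workbook import: read the sheet once ...
    Source = Excel.Workbook(File.Contents("C:\Data\Sample.xlsx"), null, true),
    Sheet1 = Source{[Item="Sheet1",Kind="Sheet"]}[Data],
    // ... and buffer it, so that later steps work against this in-memory copy
    // instead of re-reading the workbook on every evaluation
    BufferedSheet = Table.Buffer(Sheet1),
    #"Promoted Headers" = Table.PromoteHeaders(BufferedSheet, [PromoteAllScalars=true])
in
    #"Promoted Headers"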

 

Best Regards,

Stephen Tao

 

If this post helps, then please consider accepting it as the solution to help the other members find it more quickly.

ImkeF
Super User

Hi @dh731r ,
usually PQ would load the source data twice: first while retrieving the metadata to generate the evaluation plan, and then to fetch the data itself.
So you must have some transformations in your query (or queries) that trigger multiple evaluations.
I would need to see the M code to determine why that happens.
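As a rough illustration (the step and column names below are invented, not from the actual query): referencing a previous step inside the row function of an iterator is a typical trigger, because that step (and potentially its source) can be evaluated again for each row; buffering the referenced list once avoids the repeated reads:

// Nested reference inside the iterator: #"Lookup Table" may be re-evaluated for every row
#"Added Flag" = Table.AddColumn(#"Some Step", "IsKnown", each List.Contains(#"Lookup Table"[Key], [Key])),

// Buffer the referenced list once, then reuse the in-memory copy for every row
KnownKeys = List.Buffer(#"Lookup Table"[Key]),
#"Added Flag Buffered" = Table.AddColumn(#"Some Step", "IsKnown", each List.Contains(KnownKeys, [Key])),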

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries

dh731r
Frequent Visitor

Appreciate the help!

 

Here is my M code. It doesn't seem to do anything different from what I am doing in my other queries, but those other queries only load the source twice.

 

let
    Source = Folder.Files("U:\DA Database\Macro Inputs"),
    #"Filtered Rows" = Table.SelectRows(Source, each not Text.Contains([Name], "~")),
    #"VAR: Filtered Rows1" = Table.SelectRows(#"Filtered Rows", each Text.Contains([Folder Path], "IFP")),
    // Name = BuildPlan
    #"VAR: Filtered Rows2" = Table.SelectRows(#"VAR: Filtered Rows1", each Text.Contains([Name], "BuildPlan")),
    #"Split Column by Delimiter" = Table.SplitColumn(#"VAR: Filtered Rows2", "Name", Splitter.SplitTextByEachDelimiter({"__"}, QuoteStyle.Csv, true), {"Name.1", "Name.2"}),
    #"Changed Type" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Name.1", type text}, {"Name.2", type text}}),
    #"Extracted Text Before Delimiter" = Table.TransformColumns(#"Changed Type", {{"Name.2", each Text.BeforeDelimiter(_, "."), type text}}),
    #"Removed Other Columns" = Table.SelectColumns(#"Extracted Text Before Delimiter",{"Content", "Name.1", "Name.2"}),
    #"Renamed Columns" = Table.RenameColumns(#"Removed Other Columns",{{"Name.2", "Data Date"}, {"Name.1", "File Name"}}),
    #"Replaced Value" = Table.ReplaceValue(#"Renamed Columns","_","-",Replacer.ReplaceText,{"Data Date"}),
    #"Changed Type1" = Table.TransformColumnTypes(#"Replaced Value",{{"Data Date", type date}}),
    #"Filtered Rows2" = Table.SelectRows(#"Changed Type1", let latest = List.Max(#"Changed Type1"[Data Date]) in each [Data Date] = latest),
    #"Added Custom" = Table.AddColumn(#"Filtered Rows2", "Data Table", each Excel.Workbook([Content])),
    #"Expanded Data Table" = Table.ExpandTableColumn(#"Added Custom", "Data Table", {"Name", "Data", "Item", "Kind", "Hidden"}, {"Name", "Data", "Item", "Kind", "Hidden"}),
    // Hidden = FALSE
    #"VAR: Filtered Rows3" = Table.SelectRows(#"Expanded Data Table", each ([Hidden] = false)),
    // Name <> contain Pivot
    #"VAR: Filtered Rows4" = Table.SelectRows(#"VAR: Filtered Rows3", each not Text.Contains([Name], "Pivot")),
    #"Removed Other Columns1" = Table.SelectColumns(#"VAR: Filtered Rows4",{"Name", "Data"}),
    #"Added Custom1" = Table.AddColumn(#"Removed Other Columns1", "GetColumnNames", each Table.ColumnNames([Data])),
    #"Removed Other Columns3" = Table.SelectColumns(#"Added Custom1",{"GetColumnNames"}),
    #"Expanded GetColumnNames" = Table.ExpandListColumn(#"Removed Other Columns3", "GetColumnNames"),
    #"Removed Duplicates" = Table.Distinct(#"Expanded GetColumnNames"),
    #"Expanded Data" = Table.ExpandTableColumn(#"Removed Other Columns1", "Data", #"Removed Duplicates"[GetColumnNames]),
    #"Added Index" = Table.AddIndexColumn(#"Expanded Data", "Index", 0, 1, Int64.Type),
    #"Table Header Index" = Table.SelectRows(#"Added Index", each Text.Contains([Column1], "IFP BUILD"))[Index]{0},
    #"Table Skip by Index" = Table.Skip(#"Added Index", #"Table Header Index"),
    #"Promoted Headers" = Table.PromoteHeaders(#"Table Skip by Index", [PromoteAllScalars=true]),
    #"Renamed Columns2" = Table.RenameColumns(#"Promoted Headers",{{Table.ColumnNames(#"Promoted Headers"){0}, "Build Year"}}),
    #"Removed Other Columns2" = Table.SelectColumns(#"Renamed Columns2",{"Build Year", "Copper CLLIDA"}),
    #"Extracted Text Before Delimiter1" = Table.TransformColumns(#"Removed Other Columns2", {{"Build Year", each Text.BeforeDelimiter(_, " "), Int64.Type}}),
    #"Changed Type2" = Table.TransformColumnTypes(#"Extracted Text Before Delimiter1",{{"Copper CLLIDA", type text}}),
    ColumnNames = Table.ColumnNames(#"Changed Type2"),
    #"Converted to Table" = Table.FromList(ColumnNames, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    #"Add Null" = Table.AddColumn(#"Converted to Table", "Null", each null),
    #"Renamed Columns1" = Table.RenameColumns(#"Add Null",{{"Column1", "ColumnHeader"}}),
    #"Transpose Table" = Table.Transpose(#"Renamed Columns1"),
    ColList = Table.ToColumns(#"Transpose Table"),
    #"Replace Error with null" = Table.ReplaceErrorValues(#"Changed Type2", ColList),
    #"Renamed Columns3" = Table.RenameColumns(#"Replace Error with null",{{"Build Year", "IFP Year"}}),
    #"Changed Type3" = Table.TransformColumnTypes(#"Renamed Columns3",{{"IFP Year", type text}}),
    #"Grouped Rows" = Table.Group(#"Changed Type3", {"IFP Year", "Copper CLLIDA"}, {{"Multiple PSA Count (IFP)", each Table.RowCount(_), Int64.Type}}),
    #"Added Custom2" = Table.AddColumn(#"Grouped Rows", "Multiple PSA Indicator (IFP)", each if [#"Multiple PSA Count (IFP)"] > 1 then "Yes" else "No"),
    #"Changed Type4" = Table.TransformColumnTypes(#"Added Custom2",{{"Multiple PSA Indicator (IFP)", type text}}),
    #"Grouped Rows1" = Table.Group(#"Changed Type4", {"Copper CLLIDA", "Multiple PSA Count (IFP)", "Multiple PSA Indicator (IFP)"}, {{"Count", each List.Max([IFP Year]), type nullable text}}),
    #"Renamed Columns4" = Table.RenameColumns(#"Grouped Rows1",{{"Count", "IFP Year"}}),
    #"Changed Type5" = Table.TransformColumnTypes(#"Renamed Columns4",{{"IFP Year", Int64.Type}})
in
    #"Changed Type5"

 

 

ImkeF
Super User

Hi @dh731r ,
I believe this is the problematic step:

    #"Filtered Rows2" = Table.SelectRows(#"Changed Type1", let latest = List.Max(#"Changed Type1"[Data Date]) in each [Data Date] = latest),

In that step, you're referencing the whole previous table for each row of your table.
This is where buffering usually helps.
If your source table is very large, you can buffer the list whose maximum you are retrieving in a preceding step, like so:

 

BufferedList = List.Buffer(#"Changed Type1"[Data Date]),
#"Filtered Rows2" = Table.SelectRows(#"Changed Type1", let latest = List.Max( BufferedList ) in each [Data Date] = latest),

 

You can find more performance tips on my blog here: Speed/Performance aspects – The BIccountant

 

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries

dh731r
Frequent Visitor

That helped cut the load down to around 80 MB. How do I find out which other steps cause additional evaluations?

Hi @dh731r ,

the next potential cause for performance problems is that you're referencing a nested object within an iterator:

[Screenshot: ImkeF_0-1636143043905.png]

That might also cause duplicate evaluations.

As you are importing multiple files from a folder here, I'm wondering why you don't use the standard method of doing all the file-related transformations in the helper function that is created automatically when you expand the binaries.
It's hard to follow the code alone and imagine what's actually happening, but you seem to do some heavy logic with column names from the actual files, which might cause the whole content of the files to be evaluated multiple times.
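For reference, a rough sketch of that standard pattern (the function and step names are illustrative, not the auto-generated ones): all per-file logic lives in one function that is invoked once per binary, so each workbook is opened and processed a single time:

// Illustrative per-file function: everything that touches one workbook happens in here
fnTransformFile = (fileContent as binary) as table =>
    let
        Workbook = Excel.Workbook(fileContent, null, true),
        // keep visible, non-pivot sheets (mirrors the filters used in the query above)
        VisibleSheets = Table.SelectRows(Workbook, each [Kind] = "Sheet" and [Hidden] = false and not Text.Contains([Name], "Pivot")),
        Data = VisibleSheets{0}[Data]
    in
        Data,

// Invoke the function once per file, then continue transforming the combined result
#"Invoked Function" = Table.AddColumn(#"Filtered Rows2", "Data Table", each fnTransformFile([Content])),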

 

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries
