Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Earn the coveted Fabric Analytics Engineer certification. 100% off your exam for a limited time only!

Reply
newtoPowerBI
New Member

Query Performance Slow 2,500 KB File Reading as 6GB

Hello,

 

I need help figuring out why a small .csv file is coming into Power Query and reading as 6GB. I have a few merged queries in my workbook so I am wondering if that is whats causuing the slow down. Can anyone help?

 

Below is the source query

 

let
Source = Table.Buffer(Csv.Document(File.Contents("C:\Users\eddy.gramajo\Desktop\Invoice Upload\Billing Report (Post-Upload).csv"),[Delimiter=",", Columns=22, Encoding=1252, QuoteStyle=QuoteStyle.None])),
#"Promoted Headers" = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
#"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"OPPORTUNITY_ID", type text}, {"Arrangement Id", type text}, {"Customer Name", type text}, {"Bill To Customer Name", type text}, {"Item Number", type text}, {"Item Description", type text}, {"Sales Order", Int64.Type}, {"Sales Order Line", Int64.Type}, {"Sales Order Id", Int64.Type}, {"Sales Order Line Id", Int64.Type}, {"Qty Ordered", type text}, {"So Book Date", type text}, {"Qty Shipped", type text}, {"Invoice Number", type text}, {"Invoice Line Id", Int64.Type}, {"Invoice Line", Int64.Type}, {"Qty Invoiced", type number}, {"Invoice Date", type date}, {"Ext Sell Price", type number}, {"Rec Amt", type number}, {"Def Amt", type number}, {"Cancelled Flag", type text}})
in
#"Changed Type"

 

 

Here is an example of a merged query

 

let
Source = Table.Buffer(Csv.Document(File.Contents("C:\Users\eddy.gramajo\Desktop\Invoice Upload\INV_API_01Nov2017_30Nov2017_3cded995-0889-4798-97cb-86c7ff08e2d3-2017.12.05 (1).csv"),[Delimiter=",", Columns=73, Encoding=1252, QuoteStyle=QuoteStyle.None])),
#"Promoted Headers" = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
#"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"TRAN_TYPE", type text}, {"OPPORTUNITY_ID", type text}, {"OPPORTUNITY_NAME", type text}, {"PRODUCT_LINE", type text}, {"PRODUCT_FAMILY", type text}, {"PRODUCT_CLASS", type text}, {"PRODUCT_CATEGORY", type text}, {"BILL_TO_CUSTOMER_NUMBER", Int64.Type}, {"BILL_TO_CUSTOMER_NAME", type text}, {"BILL_TO_STATE", type text}, {"BILL_TO_COUNTRY", type text}, {"SHIP_TO_CUSTOMER_NUMBER", Int64.Type}, {"SHIP_TO_CUSTOMER_NAME", type text}, {"SHIP_TO_STATE_OR_PROVINCE", type text}, {"SHIP_TO_COUNTRY", type text}, {"PO_NUMBER", type text}, {"SO_BOOK_DATE", type date}, {"SALES_ORDER", Int64.Type}, {"SALES_ORDER_ID", Int64.Type}, {"SALES_ORDER_LINE_ID", Int64.Type}, {"SALES_ORDER_LINE", Int64.Type}, {"ORDER_TYPE", type text}, {"INVOICE_DATE", type date}, {"INVOICE_NUMBER", type text}, {"INVOICE_ID", Int64.Type}, {"INVOICE_LINE", Int64.Type}, {"INVOICE_LINE_ID", Int64.Type}, {"ORIG_INV_LINE_ID", type text}, {"ITEM_ID", Int64.Type}, {"ITEM_NUMBER", type text}, {"ITEM_DESCRIPTION", type text}, {"QUANTITY_ORDERED", type text}, {"QUANTITY_INVOICED", type number}, {"QUANTITY_TIER", type text}, {"UNIT_OF_MEASURE", type text}, {"UNIT_LIST_PRICE", type number}, {"UNIT_SELL_PRICE", type number}, {"EXT_LIST_PRICE", type number}, {"EXT_SELL_PRICE", type number}, {"RULE_START_DATE", type date}, {"RULE_END_DATE", type date}, {"DURATION_IN_MONTHS", Int64.Type}, {"TRANS_CURR_CODE", type text}, {"BASE_CURR_CODE", type text}, {"EX_RATE", Int64.Type}, {"RCURR_EX_RATE", Int64.Type}, {"DEF_ACCTG_SEG1", Int64.Type}, {"REV_ACCTG_SEG1", Int64.Type}, {"DEF_ACCTG_SEG2", type text}, {"REV_ACCTG_SEG2", type text}, {"DEFERRED_REVENUE_FLAG", type text}, {"UNBILLED_ACCOUNTING_FLAG", type text}, {"UNDELIVERED_FLAG", type text}, {"ELIGIBLE_FOR_FV", type text}, {"ELIGIBLE_FOR_CV", type text}, {"STANDALONE_FLAG", type text}, {"FLAG_97_2", type text}, {"PCS_FLAG", type text}, {"RETURN_FLAG", type text}, {"CANCELLED_FLAG", type text}, {"ORG_ID", Int64.Type}, {"BUSINESS_UNIT", type text}, {"REGION", type text}, {"DEPARTMENT", type text}, {"SALES_TEAM", type text}, {"DISTRIBUTION_TYPE", type text}, {"SOB_ID", Int64.Type}, {"CUSTOMER_CLASS", type text}, {"INDUSTRY", type text}, {"METRO", type text}, {"GLOBAL_EMP_COUNT", type text}, {"OPEN_JOB_COUNT", type text}, {"SEC_ATTR_VALUE", Int64.Type}}),
#"Removed Columns" = Table.RemoveColumns(#"Changed Type",{"QUANTITY_TIER", "GLOBAL_EMP_COUNT", "OPEN_JOB_COUNT"}),
#"Merged Queries" = Table.NestedJoin(#"Removed Columns",{"SALES_ORDER_LINE_ID"},#"All Arrangement Report",{"Sales Order Line Id"},"All Arrangement Report",JoinKind.LeftOuter),
#"Expanded All Arrangement Report" = Table.ExpandTableColumn(#"Merged Queries", "All Arrangement Report", {"Ext Sell Price"}, {"All Arrangement Report.Ext Sell Price"}),
#"Filtered Rows" = Table.SelectRows(#"Expanded All Arrangement Report", each ([All Arrangement Report.Ext Sell Price] <> null)),
#"Merged Queries1" = Table.NestedJoin(#"Filtered Rows",{"INVOICE_LINE_ID"},#"Billing Report (Pre-Upload)",{"Invoice Line Id"},"Billing Report (Pre-Upload)",JoinKind.LeftOuter),
#"Expanded Billing Report (Pre-Upload)" = Table.ExpandTableColumn(#"Merged Queries1", "Billing Report (Pre-Upload)", {"Invoice Line Id"}, {"Billing Report (Pre-Upload).Invoice Line Id"}),
#"Filtered Rows1" = Table.SelectRows(#"Expanded Billing Report (Pre-Upload)", each ([#"Billing Report (Pre-Upload).Invoice Line Id"] = null)),
#"Removed Columns1" = Table.RemoveColumns(#"Filtered Rows1",{"Billing Report (Pre-Upload).Invoice Line Id"}),
#"Filtered Rows2" = Table.SelectRows(#"Removed Columns1", each [DEPARTMENT] = "Sales and Marketing : B2B Sales and Marketing : 110 Sales - Employer"),
#"Renamed Columns" = Table.RenameColumns(#"Filtered Rows2",{{"All Arrangement Report.Ext Sell Price", "SO Amount"}}),
#"Merged Queries2" = Table.NestedJoin(#"Renamed Columns",{"SALES_ORDER_LINE_ID"},#"Billing Report (Pre-Upload)",{"Sales Order Line Id"},"Billing Report (Pre-Upload)",JoinKind.LeftOuter),
#"Aggregated Billing Report (Pre-Upload)" = Table.AggregateTableColumn(#"Merged Queries2", "Billing Report (Pre-Upload)", {{"Ext Sell Price", List.Sum, "Sum of Billing Report (Pre-Upload).Ext Sell Price"}}),
#"Replaced Value" = Table.ReplaceValue(#"Aggregated Billing Report (Pre-Upload)",null,0,Replacer.ReplaceValue,{"Sum of Billing Report (Pre-Upload).Ext Sell Price"}),
#"Added Custom" = Table.AddColumn(#"Replaced Value", "Total Billings to Date", each [#"Sum of Billing Report (Pre-Upload).Ext Sell Price"] +[EXT_SELL_PRICE]),
#"Changed Type1" = Table.TransformColumnTypes(#"Added Custom",{{"Sum of Billing Report (Pre-Upload).Ext Sell Price", type number}, {"Total Billings to Date", type number}}),
#"Rounded Off" = Table.TransformColumns(#"Changed Type1",{{"Total Billings to Date", each Number.Round(_, 2), type number}}),
#"Added Custom1" = Table.AddColumn(#"Rounded Off", "Rebill?", each if [Total Billings to Date] > [SO Amount] then "Yes" else "No" ),
#"Filtered Rows3" = Table.SelectRows(#"Added Custom1", each ([#"Rebill?"] = "No")),
#"Renamed Columns1" = Table.RenameColumns(#"Filtered Rows3",{{"Sum of Billing Report (Pre-Upload).Ext Sell Price", "Sum of Billing Report Ext Sell Price"}})
in
#"Renamed Columns1"

5 REPLIES 5
yoava
Regular Visitor

Hi @v-shex-msft @newtoPowerBI,

 

I am also experianceing unusual performance degredataion with small files being read as very large ones

I have a 2,500 KB csv that shows the same behaviour described above. 

 

 

 

let
    Source = Csv.Document(File.Contents("C:\input.csv"),[Delimiter=",", Columns=2, Encoding=65001, QuoteStyle=QuoteStyle.Csv]),
    #"Use First Row as Headers" = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
    #"Trimmed Text" = Table.TransformColumns(#"Use First Row as Headers",{{"ADD_CONFIG", Text.Trim, type text}}),
    #"Cleaned Text" = Table.TransformColumns(#"Trimmed Text",{{"ADD_CONFIG", Text.Clean, type text}}),
    #"Filtered Rows" = Table.SelectRows(#"Cleaned Text", each [ADD_CONFIG] <> null and [ADD_CONFIG] <> ""),
    #"Parsed XML" = Table.TransformColumns(#"Filtered Rows",{{"ADD_CONFIG", Xml.Tables}}),
    #"Expanded ADD_CONFIG" = Table.ExpandTableColumn(#"Parsed XML", "ADD_CONFIG", List.Union(List.Transform(#"Parsed XML"[ADD_CONFIG], each Table.ColumnNames(_))))
in
    #"Expanded ADD_CONFIG"

 

 

Using some trial and errors, i figured the performance degradation is directly related to the following expression:

 

    #"Expanded ADD_CONFIG" = Table.ExpandTableColumn(#"Parsed XML", "ADD_CONFIG", List.Union(List.Transform(#"Parsed XML"[ADD_CONFIG], each Table.ColumnNames(_))))

 

If i replace it with an actual comma seperated list (generated using the same List.Union(List.Transform)), the expansion is immediate:

 

= Table.ExpandTableColumn(#"Parsed XML", "ADD_CONFIG", {"foobar0","foobar1","foobar10","foobar100","foobar101","foobar102","foobar103","foobar104","foobar107","foobar108","foobar109","foobar11","foobar110","foobar112","foobar12","foobar13","foobar14","foobar15","foobar16","foobar17","foobar18","foobar19","foobar2","foobar20","foobar21","foobar22","foobar23","foobar24","foobar25","foobar27","foobar28","foobar29","foobar3","foobar30","foobar31","foobar32","foobar33","foobar34","foobar35","foobar36","foobar37","foobar38","foobar39","foobar4","foobar40","foobar41","foobar42","foobar43","foobar44","foobar45","foobar46","foobar47","foobar48","foobar49","foobar5","foobar50","foobar51","foobar52","foobar53","foobar54","foobar55","foobar56","foobar57","foobar58","foobar59","foobar6","foobar60","foobar61","foobar62","foobar63","foobar64","foobar65","foobar67","foobar68","foobar69","foobar7","foobar70","foobar71","foobar72","foobar73","foobar74","foobar75","foobar76","foobar77","foobar78","foobar79","foobar8","foobar81","foobar82","foobar83","foobar84","foobar85","foobar86","foobar87","foobar88","foobar89","foobar9","foobar91","foobar92","foobar93","foobar94","foobar95","foobar96","foobar97","foobar98","foobar99"})

 

 Note that this expansion does not create additional rows, only additional columns. 

v-shex-msft
Community Support
Community Support

Hi @newtoPowerBI,

 

Can you please share pbix file and the csv file for test ? It is hard to find out the issue from your query formula.

 

Notice: please do mask on sensitive data before sharing.

 

Regards,

Xiaoxin Sheng

Community Support Team _ Xiaoxin
If this post helps, please consider accept as solution to help other members find it more quickly.

HI @v-shex-msft

 

It looks like when I merge and aggregate data from another table it increases the size significantly. In one case I merged a query and aggregated a sum of a column and the file read as 10GB. Does anyone know a way to not have this happen?


Thank you!


Eddy

Hi @newtoPowerBI,

 

Have you try to expand summary columns in your merge table?(column which stored list, records, tables )

When you expand merge table, it will generate lots duplicate rows.

 

Regards,

Xiaoxin Sheng

Community Support Team _ Xiaoxin
If this post helps, please consider accept as solution to help other members find it more quickly.

HI @v-shex-msft,

 

Here is a link to the data where most of the query is derived from.

 

https://www.dropbox.com/sh/5kzn1vm38mkis87/AABSRMdhGpGDi6TsOSkpZ5u5a?dl=0

 

Thank you!

Helpful resources

Announcements
April AMA free

Microsoft Fabric AMA Livestream

Join us Tuesday, April 09, 9:00 – 10:00 AM PST for a live, expert-led Q&A session on all things Microsoft Fabric!

March Fabric Community Update

Fabric Community Update - March 2024

Find out what's new and trending in the Fabric Community.