cancel
Showing results for 
Search instead for 
Did you mean: 
Reply
newtoPowerBI
New Member

Query Performance Slow 2,500 KB File Reading as 6GB

Hello,

 

I need help figuring out why a small .csv file is coming into Power Query and reading as 6GB. I have a few merged queries in my workbook so I am wondering if that is whats causuing the slow down. Can anyone help?

 

Below is the source query

 

let
Source = Table.Buffer(Csv.Document(File.Contents("C:\Users\eddy.gramajo\Desktop\Invoice Upload\Billing Report (Post-Upload).csv"),[Delimiter=",", Columns=22, Encoding=1252, QuoteStyle=QuoteStyle.None])),
#"Promoted Headers" = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
#"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"OPPORTUNITY_ID", type text}, {"Arrangement Id", type text}, {"Customer Name", type text}, {"Bill To Customer Name", type text}, {"Item Number", type text}, {"Item Description", type text}, {"Sales Order", Int64.Type}, {"Sales Order Line", Int64.Type}, {"Sales Order Id", Int64.Type}, {"Sales Order Line Id", Int64.Type}, {"Qty Ordered", type text}, {"So Book Date", type text}, {"Qty Shipped", type text}, {"Invoice Number", type text}, {"Invoice Line Id", Int64.Type}, {"Invoice Line", Int64.Type}, {"Qty Invoiced", type number}, {"Invoice Date", type date}, {"Ext Sell Price", type number}, {"Rec Amt", type number}, {"Def Amt", type number}, {"Cancelled Flag", type text}})
in
#"Changed Type"

 

 

Here is an example of a merged query

 

let
Source = Table.Buffer(Csv.Document(File.Contents("C:\Users\eddy.gramajo\Desktop\Invoice Upload\INV_API_01Nov2017_30Nov2017_3cded995-0889-4798-97cb-86c7ff08e2d3-2017.12.05 (1).csv"),[Delimiter=",", Columns=73, Encoding=1252, QuoteStyle=QuoteStyle.None])),
#"Promoted Headers" = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
#"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"TRAN_TYPE", type text}, {"OPPORTUNITY_ID", type text}, {"OPPORTUNITY_NAME", type text}, {"PRODUCT_LINE", type text}, {"PRODUCT_FAMILY", type text}, {"PRODUCT_CLASS", type text}, {"PRODUCT_CATEGORY", type text}, {"BILL_TO_CUSTOMER_NUMBER", Int64.Type}, {"BILL_TO_CUSTOMER_NAME", type text}, {"BILL_TO_STATE", type text}, {"BILL_TO_COUNTRY", type text}, {"SHIP_TO_CUSTOMER_NUMBER", Int64.Type}, {"SHIP_TO_CUSTOMER_NAME", type text}, {"SHIP_TO_STATE_OR_PROVINCE", type text}, {"SHIP_TO_COUNTRY", type text}, {"PO_NUMBER", type text}, {"SO_BOOK_DATE", type date}, {"SALES_ORDER", Int64.Type}, {"SALES_ORDER_ID", Int64.Type}, {"SALES_ORDER_LINE_ID", Int64.Type}, {"SALES_ORDER_LINE", Int64.Type}, {"ORDER_TYPE", type text}, {"INVOICE_DATE", type date}, {"INVOICE_NUMBER", type text}, {"INVOICE_ID", Int64.Type}, {"INVOICE_LINE", Int64.Type}, {"INVOICE_LINE_ID", Int64.Type}, {"ORIG_INV_LINE_ID", type text}, {"ITEM_ID", Int64.Type}, {"ITEM_NUMBER", type text}, {"ITEM_DESCRIPTION", type text}, {"QUANTITY_ORDERED", type text}, {"QUANTITY_INVOICED", type number}, {"QUANTITY_TIER", type text}, {"UNIT_OF_MEASURE", type text}, {"UNIT_LIST_PRICE", type number}, {"UNIT_SELL_PRICE", type number}, {"EXT_LIST_PRICE", type number}, {"EXT_SELL_PRICE", type number}, {"RULE_START_DATE", type date}, {"RULE_END_DATE", type date}, {"DURATION_IN_MONTHS", Int64.Type}, {"TRANS_CURR_CODE", type text}, {"BASE_CURR_CODE", type text}, {"EX_RATE", Int64.Type}, {"RCURR_EX_RATE", Int64.Type}, {"DEF_ACCTG_SEG1", Int64.Type}, {"REV_ACCTG_SEG1", Int64.Type}, {"DEF_ACCTG_SEG2", type text}, {"REV_ACCTG_SEG2", type text}, {"DEFERRED_REVENUE_FLAG", type text}, {"UNBILLED_ACCOUNTING_FLAG", type text}, {"UNDELIVERED_FLAG", type text}, {"ELIGIBLE_FOR_FV", type text}, {"ELIGIBLE_FOR_CV", type text}, {"STANDALONE_FLAG", type text}, {"FLAG_97_2", type text}, {"PCS_FLAG", type text}, {"RETURN_FLAG", type text}, {"CANCELLED_FLAG", type text}, {"ORG_ID", Int64.Type}, {"BUSINESS_UNIT", type text}, {"REGION", type text}, {"DEPARTMENT", type text}, {"SALES_TEAM", type text}, {"DISTRIBUTION_TYPE", type text}, {"SOB_ID", Int64.Type}, {"CUSTOMER_CLASS", type text}, {"INDUSTRY", type text}, {"METRO", type text}, {"GLOBAL_EMP_COUNT", type text}, {"OPEN_JOB_COUNT", type text}, {"SEC_ATTR_VALUE", Int64.Type}}),
#"Removed Columns" = Table.RemoveColumns(#"Changed Type",{"QUANTITY_TIER", "GLOBAL_EMP_COUNT", "OPEN_JOB_COUNT"}),
#"Merged Queries" = Table.NestedJoin(#"Removed Columns",{"SALES_ORDER_LINE_ID"},#"All Arrangement Report",{"Sales Order Line Id"},"All Arrangement Report",JoinKind.LeftOuter),
#"Expanded All Arrangement Report" = Table.ExpandTableColumn(#"Merged Queries", "All Arrangement Report", {"Ext Sell Price"}, {"All Arrangement Report.Ext Sell Price"}),
#"Filtered Rows" = Table.SelectRows(#"Expanded All Arrangement Report", each ([All Arrangement Report.Ext Sell Price] <> null)),
#"Merged Queries1" = Table.NestedJoin(#"Filtered Rows",{"INVOICE_LINE_ID"},#"Billing Report (Pre-Upload)",{"Invoice Line Id"},"Billing Report (Pre-Upload)",JoinKind.LeftOuter),
#"Expanded Billing Report (Pre-Upload)" = Table.ExpandTableColumn(#"Merged Queries1", "Billing Report (Pre-Upload)", {"Invoice Line Id"}, {"Billing Report (Pre-Upload).Invoice Line Id"}),
#"Filtered Rows1" = Table.SelectRows(#"Expanded Billing Report (Pre-Upload)", each ([#"Billing Report (Pre-Upload).Invoice Line Id"] = null)),
#"Removed Columns1" = Table.RemoveColumns(#"Filtered Rows1",{"Billing Report (Pre-Upload).Invoice Line Id"}),
#"Filtered Rows2" = Table.SelectRows(#"Removed Columns1", each [DEPARTMENT] = "Sales and Marketing : B2B Sales and Marketing : 110 Sales - Employer"),
#"Renamed Columns" = Table.RenameColumns(#"Filtered Rows2",{{"All Arrangement Report.Ext Sell Price", "SO Amount"}}),
#"Merged Queries2" = Table.NestedJoin(#"Renamed Columns",{"SALES_ORDER_LINE_ID"},#"Billing Report (Pre-Upload)",{"Sales Order Line Id"},"Billing Report (Pre-Upload)",JoinKind.LeftOuter),
#"Aggregated Billing Report (Pre-Upload)" = Table.AggregateTableColumn(#"Merged Queries2", "Billing Report (Pre-Upload)", {{"Ext Sell Price", List.Sum, "Sum of Billing Report (Pre-Upload).Ext Sell Price"}}),
#"Replaced Value" = Table.ReplaceValue(#"Aggregated Billing Report (Pre-Upload)",null,0,Replacer.ReplaceValue,{"Sum of Billing Report (Pre-Upload).Ext Sell Price"}),
#"Added Custom" = Table.AddColumn(#"Replaced Value", "Total Billings to Date", each [#"Sum of Billing Report (Pre-Upload).Ext Sell Price"] +[EXT_SELL_PRICE]),
#"Changed Type1" = Table.TransformColumnTypes(#"Added Custom",{{"Sum of Billing Report (Pre-Upload).Ext Sell Price", type number}, {"Total Billings to Date", type number}}),
#"Rounded Off" = Table.TransformColumns(#"Changed Type1",{{"Total Billings to Date", each Number.Round(_, 2), type number}}),
#"Added Custom1" = Table.AddColumn(#"Rounded Off", "Rebill?", each if [Total Billings to Date] > [SO Amount] then "Yes" else "No" ),
#"Filtered Rows3" = Table.SelectRows(#"Added Custom1", each ([#"Rebill?"] = "No")),
#"Renamed Columns1" = Table.RenameColumns(#"Filtered Rows3",{{"Sum of Billing Report (Pre-Upload).Ext Sell Price", "Sum of Billing Report Ext Sell Price"}})
in
#"Renamed Columns1"

5 REPLIES 5
yoava
Regular Visitor

Hi @v-shex-msft @newtoPowerBI,

 

I am also experianceing unusual performance degredataion with small files being read as very large ones

I have a 2,500 KB csv that shows the same behaviour described above. 

 

 

 

let
    Source = Csv.Document(File.Contents("C:\input.csv"),[Delimiter=",", Columns=2, Encoding=65001, QuoteStyle=QuoteStyle.Csv]),
    #"Use First Row as Headers" = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
    #"Trimmed Text" = Table.TransformColumns(#"Use First Row as Headers",{{"ADD_CONFIG", Text.Trim, type text}}),
    #"Cleaned Text" = Table.TransformColumns(#"Trimmed Text",{{"ADD_CONFIG", Text.Clean, type text}}),
    #"Filtered Rows" = Table.SelectRows(#"Cleaned Text", each [ADD_CONFIG] <> null and [ADD_CONFIG] <> ""),
    #"Parsed XML" = Table.TransformColumns(#"Filtered Rows",{{"ADD_CONFIG", Xml.Tables}}),
    #"Expanded ADD_CONFIG" = Table.ExpandTableColumn(#"Parsed XML", "ADD_CONFIG", List.Union(List.Transform(#"Parsed XML"[ADD_CONFIG], each Table.ColumnNames(_))))
in
    #"Expanded ADD_CONFIG"

 

 

Using some trial and errors, i figured the performance degradation is directly related to the following expression:

 

    #"Expanded ADD_CONFIG" = Table.ExpandTableColumn(#"Parsed XML", "ADD_CONFIG", List.Union(List.Transform(#"Parsed XML"[ADD_CONFIG], each Table.ColumnNames(_))))

 

If i replace it with an actual comma seperated list (generated using the same List.Union(List.Transform)), the expansion is immediate:

 

= Table.ExpandTableColumn(#"Parsed XML", "ADD_CONFIG", {"foobar0","foobar1","foobar10","foobar100","foobar101","foobar102","foobar103","foobar104","foobar107","foobar108","foobar109","foobar11","foobar110","foobar112","foobar12","foobar13","foobar14","foobar15","foobar16","foobar17","foobar18","foobar19","foobar2","foobar20","foobar21","foobar22","foobar23","foobar24","foobar25","foobar27","foobar28","foobar29","foobar3","foobar30","foobar31","foobar32","foobar33","foobar34","foobar35","foobar36","foobar37","foobar38","foobar39","foobar4","foobar40","foobar41","foobar42","foobar43","foobar44","foobar45","foobar46","foobar47","foobar48","foobar49","foobar5","foobar50","foobar51","foobar52","foobar53","foobar54","foobar55","foobar56","foobar57","foobar58","foobar59","foobar6","foobar60","foobar61","foobar62","foobar63","foobar64","foobar65","foobar67","foobar68","foobar69","foobar7","foobar70","foobar71","foobar72","foobar73","foobar74","foobar75","foobar76","foobar77","foobar78","foobar79","foobar8","foobar81","foobar82","foobar83","foobar84","foobar85","foobar86","foobar87","foobar88","foobar89","foobar9","foobar91","foobar92","foobar93","foobar94","foobar95","foobar96","foobar97","foobar98","foobar99"})

 

 Note that this expansion does not create additional rows, only additional columns. 

v-shex-msft
Community Support
Community Support

Hi @newtoPowerBI,

 

Can you please share pbix file and the csv file for test ? It is hard to find out the issue from your query formula.

 

Notice: please do mask on sensitive data before sharing.

 

Regards,

Xiaoxin Sheng

Community Support Team _ Xiaoxin
If this post helps, please consider accept as solution to help other members find it more quickly.

HI @v-shex-msft

 

It looks like when I merge and aggregate data from another table it increases the size significantly. In one case I merged a query and aggregated a sum of a column and the file read as 10GB. Does anyone know a way to not have this happen?


Thank you!


Eddy

Hi @newtoPowerBI,

 

Have you try to expand summary columns in your merge table?(column which stored list, records, tables )

When you expand merge table, it will generate lots duplicate rows.

 

Regards,

Xiaoxin Sheng

Community Support Team _ Xiaoxin
If this post helps, please consider accept as solution to help other members find it more quickly.

HI @v-shex-msft,

 

Here is a link to the data where most of the query is derived from.

 

https://www.dropbox.com/sh/5kzn1vm38mkis87/AABSRMdhGpGDi6TsOSkpZ5u5a?dl=0

 

Thank you!

Helpful resources

Announcements
Power BI October Update 2021.jpg

Power BI Release

Click here to read more about the October 2021 Release!

Microsoft Ignite 768x460.png

Find your focus

Explore the latest tools,training sessions,technical expertise, networking and more.

Power BI Womens Summit 2021 768 x460.jpg

Interviews, learning sessions, allies, and more!

#PowerBIWomenSummit

Teds Dev Camp Oct. 2021 768x460.jpg

Power BI Dev Camp - October 28th, 2021

Mark your calendars and join us for our next Power BI Dev Camp!