cancel
Showing results for 
Search instead for 
Did you mean: 
Reply
bigsky5829
Frequent Visitor

Optimizing Sharepoint File Source

Hi,

 

I have some queries that take a while to load.

 

Let's say I have 5 source files in csv format.  I am loading each as follows:

let
Source = SharePoint.Contents("https://usicapps.sharepoint.com/sites/OperationalExcellenceTeam/", [ApiVersion = 15]){[Name="Shared Documents"]}[Content]{[Name="Data for Excel Dashboards"]}[Content],
#"Field Passed Certifications csv" = Source{[Name="Field Passed Certifications.csv"]}[Content],
#"Imported CSV" = Csv.Document(#"Field Passed Certifications csv",[Delimiter=",", Columns=7, Encoding=1252, QuoteStyle=QuoteStyle.None]),
#"Promoted Headers" = Table.PromoteHeaders(#"Imported CSV", [PromoteAllScalars=true]),
#"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"Date Submit", type date}, {"Record ID", Int64.Type}, {"Employee ID", Int64.Type}, {"Observer ID", Int64.Type}, {"Certification Type", type text}, {"Employee Name", type text}, {"Observer Name", type text}}),
#"Grouped Rows" = Table.Group(#"Changed Type", {"Employee ID"}, {{"Cert Date", each List.Max([Date Submit]), type nullable date}}),
#"Renamed Columns" = Table.RenameColumns(#"Grouped Rows",{{"Employee ID", "employee id"}})
in
#"Renamed Columns"

 

Each of those queries are connections only.  They are referenced by 10 other queries with merge, expand, and group operations.  It takes 5-10 minutes for the 10 downstream queries to run. What I fear is happening is for the 10 downstream queries, they are all loading the 5 source files independently.  Is there a way to load the source file once, and then use the same load for all downstream queries?

 

I tried modifying the source file queries to add a table.buffer at the end step, but that didn't appear to improve performance:

 

let
Source = SharePoint.Contents("https://usicapps.sharepoint.com/sites/OperationalExcellenceTeam/", [ApiVersion = 15]){[Name="Shared Documents"]}[Content]{[Name="Data for Excel Dashboards"]}[Content],
#"Field Passed Certifications csv" = Source{[Name="Field Passed Certifications.csv"]}[Content],
#"Imported CSV" = Csv.Document(#"Field Passed Certifications csv",[Delimiter=",", Columns=7, Encoding=1252, QuoteStyle=QuoteStyle.None]),
#"Promoted Headers" = Table.PromoteHeaders(#"Imported CSV", [PromoteAllScalars=true]),
#"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"Date Submit", type date}, {"Record ID", Int64.Type}, {"Employee ID", Int64.Type}, {"Observer ID", Int64.Type}, {"Certification Type", type text}, {"Employee Name", type text}, {"Observer Name", type text}}),
#"Grouped Rows" = Table.Group(#"Changed Type", {"Employee ID"}, {{"Cert Date", each List.Max([Date Submit]), type nullable date}}),
#"Renamed Columns" = Table.Buffer(Table.RenameColumns(#"Grouped Rows",{{"Employee ID", "employee id"}}))
in
#"Renamed Columns"

 

3 REPLIES 3
v-jingzhang
Community Support
Community Support

Hi @bigsky5829 

 

You have several "Merged Queries" steps by "Employee ID" column, maybe you can try tirlibibi17's method from Power Query - Merging more than 2 queries - Index issue : excel (reddit.com). Merge queries first and perform other transformation steps later. 

 

Or you may try the method from this video (How to Merge Two or More Excel Tables with Power Query - YouTube). Create only connections for base queries and load data of the result table into Excel. 

 

Best Regards,
Community Support Team _ Jing
If this post helps, please Accept it as Solution to help other members find it.

KNP
Super User
Super User

Table.Buffer can certainly help.

It can be quite a bit of guess work where to put it to improve performance.

 

It is a little difficult to tell without seeing the query dependencies (you could post a screenshot of that if you like), but I would suggest adding the Table.Buffer at the start of the queries that reference the source/stage queries.

Also, while you're testing this, remove the Table.Buffer from the source/stage queries.

 

Have I solved your problem?
Please click Accept as Solution so I don't keep coming back to this post, oh yeah, others may find it useful also ;).
chrome-9xf-Zagzel-B

If you found this post helpful, please give Kudos.
It gives me a sense of instant gratification and, if you give me Kudos enough times, magical unicorns will appear on your screen.
If you find my signature vaguely amusing, please give Kudos.
KIfp67uy-Sr
Proud to be a Super User!PBI-Super-User-Rank-30x30-1x
bigsky5829
Frequent Visitor

Unfortunately adding the table.buffer into the main query didn't help.  I added it in for the "base" file and for each file as I joined them into the query.

 

I would post a screenshot but to see everything at once, it is too zoomed out and too small.  Instead I'll post the query below and a simplified flow chart I put together.

 

Interestingly, it takes about the same time to reload if I refresh all, or if I just refresh one query.  

 

 

There are 10 of these queries in the files.  Grey boxes are sharepoint csvs.  Blue boxes are two tables in excel.  Green boxes are 20 tables in excel.  Each query is identical except they reference different green boxes.  For example, query 1 references tables 1a and 4a, query 2 references tables 1b and 4b, etc.  This so each user can open the excel file (on sharepoint) and navigate to their tab that is configured for them.

bigsky5829_0-1634843161328.png

 

let
    Source = #"employee base",
    #"Removed Columns" = Table.RemoveColumns(Source,{"Employee Action Date"}),
    #"Merged Queries6" = Table.NestedJoin(#"Removed Columns", {"District"}, District3, {"District"}, "District1", JoinKind.Inner),
    #"Added Custom2" = Table.AddColumn(#"Merged Queries6", "HireDateRange", each HireDateRange),
    #"Expanded HireDateRange" = Table.ExpandTableColumn(#"Added Custom2", "HireDateRange", {"Hire Date Start", "Hire Date End"}, {"Hire Date Start", "Hire Date End"}),
    #"Filtered Rows" = Table.SelectRows(#"Expanded HireDateRange", each [Employee Original Hire Date]>=[Hire Date Start] and [Employee Original Hire Date]<=[Hire Date End]),
    #"Added Custom3" = Table.AddColumn(#"Filtered Rows", "includeterms", each IncludeTerms),
    #"Expanded includeterms" = Table.ExpandTableColumn(#"Added Custom3", "includeterms", {"Include Terms?"}, {"Include Terms?"}),
    #"Filtered Rows1" = Table.SelectRows(#"Expanded includeterms", each if [#"Include Terms?"] = false then ([Employment Status] <> "Term") else 1=1),
    #"Merged Queries" = Table.NestedJoin(#"Filtered Rows1", {"Employee ID"}, #"damages new", {"locator_id"}, "damages new", JoinKind.LeftOuter),
    #"Expanded damages new" = Table.ExpandTableColumn(#"Merged Queries", "damages new", {"deviated_locator_at_fault", "locator_id"}, {"damages new.deviated_locator_at_fault", "damages new.locator_id"}),
    #"Added Custom" = Table.AddColumn(#"Expanded damages new", "dmg count", each if [damages new.locator_id] = null then 0 else 1),
    #"Grouped Rows" = Table.Group(#"Added Custom", {"Employee", "Employee ID", "Employee Original Hire Date", "Employment Status", "District", "Employee Termination Date"}, {{"FL Damages", each List.Sum([dmg count]), type number}, {"AF Damages", each List.Sum([damages new.deviated_locator_at_fault]), type nullable number}}),
    #"Replaced Value2" = Table.ReplaceValue(#"Grouped Rows",null,0,Replacer.ReplaceValue,{"AF Damages"}),
    #"Merged Queries1" = Table.NestedJoin(#"Replaced Value2", {"Employee ID"}, #"2021 locates hours weekly", {"Employee ID"}, "2021 locates hours weekly", JoinKind.LeftOuter),
    #"Expanded 2021 locates hours weekly" = Table.ExpandTableColumn(#"Merged Queries1", "2021 locates hours weekly", {"Closed Locates"}, {"2021 locates hours weekly.Closed Locates"}),
    #"Grouped Rows1" = Table.Group(#"Expanded 2021 locates hours weekly", {"Employee", "Employee ID", "Employee Original Hire Date", "Employment Status", "District", "Employee Termination Date", "FL Damages", "AF Damages"}, {{"Closed Locates", each List.Sum([2021 locates hours weekly.Closed Locates]), type nullable number}}),
    #"Merged Queries3" = Table.NestedJoin(#"Grouped Rows1", {"Employee ID"}, #"Post-Certification Training Form 1", {"Employee ID of the Trainee"}, "Post-Certification Training Form 1", JoinKind.LeftOuter),
    #"Expanded Post-Certification Training Form 1" = Table.ExpandTableColumn(#"Merged Queries3", "Post-Certification Training Form 1", {"ID", "Date of the Training"}, {"Post Cert Rides", "Date of the Training"}),
    #"Replaced Value1" = Table.TransformColumns(#"Expanded Post-Certification Training Form 1",{{"Post Cert Rides", each if _ = null then 0 else 1, Int64.Type}}),
    #"Grouped Rows2" = Table.Group(#"Replaced Value1", {"Employee", "Employee ID", "Employee Original Hire Date", "Employment Status", "District", "Employee Termination Date", "FL Damages", "AF Damages", "Closed Locates"}, {{"Post Cert Rides", each List.Sum([Post Cert Rides]), type number}, {"Last Post Cert Ride", each List.Max([Date of the Training]), type nullable date}}),
    #"Merged Queries4" = Table.NestedJoin(#"Grouped Rows2", {"Employee ID"}, ojts, {"employee id"}, "ojts", JoinKind.LeftOuter),
    #"Expanded ojts" = Table.ExpandTableColumn(#"Merged Queries4", "ojts", {"Number of OJTs", "Latest OJT"}, {"Number of OJTs", "Latest OJT"}),
    #"Merged Queries5" = Table.NestedJoin(#"Expanded ojts", {"Employee ID"}, certifications, {"employee id"}, "certifications", JoinKind.LeftOuter),
    #"Expanded certifications" = Table.ExpandTableColumn(#"Merged Queries5", "certifications", {"Cert Date"}, {"Cert Date"}),
    #"Added Custom1" = Table.AddColumn(#"Expanded certifications", "quallist", each quals3),
    #"Expanded quallist" = Table.ExpandTableColumn(#"Added Custom1", "quallist", {"Qualifications"}, {"Qualifications"}),
    #"Merged Queries2" = Table.NestedJoin(#"Expanded quallist", {"Employee ID", "Qualifications"}, quals, {"EmployeeId", "QualificationPathName"}, "quals", JoinKind.LeftOuter),
    #"Expanded quals" = Table.ExpandTableColumn(#"Merged Queries2", "quals", {"Qual Status"}, {"Qual Status"}),
    #"Removed Columns1" = Table.RemoveColumns(#"Expanded quals",{"Employee ID"}),
    #"Pivoted Column" = Table.Pivot(#"Removed Columns1", List.Distinct(#"Removed Columns1"[Qualifications]), "Qualifications", "Qual Status")
in
    #"Pivoted Column"

Helpful resources

Announcements
Power BI December 2021 Update_carousel 768x460.jpg

Check it Out!

Click here to read more about the December 2021 Updates!

User Group Leader Meeting January 768x460.png

Calling all User Group Leaders!

Don't miss the User Group Leader meetings on January, 24th & 25th, 2022.

Jan 2022 Dev Camp 768x460 copy.png

Power BI Dev Camp- January 27th, 2022

Mark your calendars and join us for our next Power BI Dev Camp!

Top Solution Authors