Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Earn the coveted Fabric Analytics Engineer certification. 100% off your exam for a limited time only!

Reply
ilyasss
Helper II
Helper II

Sharepoint refresh very very slow - optimized the query

Hello guys, 

 

Hope you are well. 

 

I have a query to retrieve some data from an excel file on sharepoint. It works but it's really really slow to update for a small file : 5 minutes for a file with less than 4000 rows and 6 columns. I get the impression the issue it's how PBI goes to the file with the function Sharepoint.contents... 

 

Do you have any idea to optimise the query code below ? Many thanks  

 

 

 

let
    Source = SharePoint.Contents("https:XXXX.sharepoint.com/sites/"),
    Y = Source{[Name="Y"]}[Content],
    P = Y{[Name="W"]}[Content],
    Z = P{[Name="Z"]}[Content],
    #"Z" = Z{[Name="Z"]}[Content],
    #"U" = #"Z"{[Name="U"]}[Content],
    #"Filtered Hidden Files1" = Table.SelectRows(#"Alert", each [Attributes]?[Hidden]? <> true),
    #"Filtered Hidden Files2" = Table.SelectRows(#"Filtered Hidden Files1", each [Attributes]?[Hidden]? <> true),
    #"Filtered Hidden Files3" = Table.SelectRows(#"Filtered Hidden Files2", each [Attributes]?[Hidden]? <> true),
    #"Invoke Custom Function1" = Table.AddColumn(#"Filtered Hidden Files3", "Transform File", each #"Transform File"([Content])),
    #"Renamed Columns1" = Table.RenameColumns(#"Invoke Custom Function1", {"Name", "Source.Name"}),
    #"Expanded Table Column1" = Table.ExpandTableColumn(#"Renamed Columns1", "Transform File", Table.ColumnNames(#"Transform File"(#"Sample File"))),
    #"Removed Columns" = Table.RemoveColumns(#"Expanded Table Column1",{"Content", "Source.Name", "Extension", "Date accessed", "Date modified", "Date created", "Attributes", "Folder Path"}),
    #"Removed Blank Rows" = Table.SelectRows(#"Removed Columns", each not List.IsEmpty(List.RemoveMatchingItems(Record.FieldValues(_), {"", null}))),
    #"Filtered Rows" = Table.SelectRows(#"Removed Blank Rows", each ([DateTime] <> null)),
    #"Changed Type" = Table.TransformColumnTypes(#"Filtered Rows",{{"DateTime", type date}}),
in
    #"Filtered Rows1"

 

 

 

9 REPLIES 9
edhans
Super User
Super User

SharePoint.Contents is definitly much faster than SharePoint.Folder, so you are on the right track there.


It looks like you've been modifying and copyin M code though. For example, you are filtering hidden files 3 times.

    #"Filtered Hidden Files1" = Table.SelectRows(#"Alert", each [Attributes]?[Hidden]? <> true),
    #"Filtered Hidden Files2" = Table.SelectRows(#"Filtered Hidden Files1", each [Attributes]?[Hidden]? <> true),
    #"Filtered Hidden Files3" = Table.SelectRows(#"Filtered Hidden Files2", each [Attributes]?[Hidden]? <> true),

That is the same thing, every time.

And I don't know what the custom function being invoked is. I know it is related to combining files, but that could be your issue.

 

Honestly, as simple as the remaining transformations are, I'd start over with a clean connection and simple combine operation. And you don't need to do the complex combine if you are always going for the same file, but you didn't specify that in your post - if the file name changes, or if you are combining multiple files. If it doesn't change, just drill down to the folder and click on the file itself, not the COMBINE button.



Did I answer your question? Mark my post as a solution!
Did my answers help arrive at a solution? Give it a kudos by clicking the Thumbs Up!

DAX is for Analysis. Power Query is for Data Modeling


Proud to be a Super User!

MCSA: BI Reporting

Hello @edhans ,

 

Thank you for help. 

 

Well spotted, the hidden filters have been added by the invoke functio so I didn't want to touch them. Finally I delete two of them and it didn't change anything in terms of performance. 

 

I used a invoke function because in the future, I expecte more than one file in this folder so I prefer to use the invoke function now. Nevertheless, I tried without combining and the performances are similar. 

 

I noticed that PBi downloads 38 MB of data whereas the file is only 900 KB. Does it seem normal to you ? Furthermore, even at the first steps, for example Source or Navigation, it takes ages to PBi to get the information about % of cell empty, errors... which is basic info and the file is small so getting this should be fast but it takes 1-2 minutes.  Then I really get the impression that the issue it's how the PBI get the file in sharepoint... 

 

 

Thank you very much.

 

Hi @ilyasss 

 

Thanks for the explanation on why you are using combine. That is good forward thinking. I often "combine" for the same reasons even if only one file today. 😁

 

But there is something else odd about your code. Look at the first lines.

 

let
    Source = SharePoint.Contents("https:XXXX.sharepoint.com/sites/"),
    Y = Source{[Name="Y"]}[Content],
    P = Y{[Name="W"]}[Content],
    Z = P{[Name="Z"]}[Content],
    #"Z" = Z{[Name="Z"]}[Content],
    #"U" = #"Z"{[Name="U"]}[Content],

 

What is all of that intending to do? Are those your folder names? It seems a bit odd. Normally the first place you navigate to is Shared Documents.

 

Below is how the normal navigation process happens when you use SharePoint.Contents(), right before you either click on or combine files.

 

let
    Source = SharePoint.Contents("https://xxxxx.sharepoint.com/sites/GNDN", [ApiVersion = 15]),
    #"Shared Documents" = Source{[Name="Shared Documents"]}[Content],
    General = #"Shared Documents"{[Name="General"]}[Content],
    GNDN = General{[Name="GNDN"]}[Content]
in
    GNDN

 

 

As to your comment why is Power BI pulling 38MB of data for a 900KB file, SharePoint data pulls are extremely inefficient, but SharePoint.Contents() mitigates that. However, it isn't magical. It still has to navigate through the folders you select, and if some of your folders have thousands of files, Power BI will read in ALL of the file names before moving to the next folder. It doesn't just move to the end folder like you'd think it would.

 

How many files are in those folders? I think that might be the answer to the issue here.



Did I answer your question? Mark my post as a solution!
Did my answers help arrive at a solution? Give it a kudos by clicking the Thumbs Up!

DAX is for Analysis. Power Query is for Data Modeling


Proud to be a Super User!

MCSA: BI Reporting

The same thing happens to me and I think your answer is logical since I load in a one drive that I have loaded with many files my code is the following and it takes forever.

Let
Source = SharePoint.Files("https://xxx/sites/xxx", [ApiVersion = 15]),
#"Custom added" = Table.AddColumn(Source, "Custom", each Excel.Workbook([Content])),
#"Filtered Rows1" = Table.SelectRows(#"Custom added", each ([Name] <> "DimDepartments.xlsx" and [Name] <> "DimTiendas.xlsx")),
#"Custom expanded" = Table.ExpandTableColumn(#"Filtered Rows1", "Custom", {"Name", "Data", "Item", "Kind", "Hidden"}, {"Name.1", "Data", "Item", "Kind", "Hidden"}),
#"Filtered Rows" = Table.SelectRows(#"Custom Expanded", each ([Item] = "INFORMATION LOAD")),
#"Removed Columns" = Table.RemoveColumns(#"Filtered Rows",{"Kind", "Hidden", "Extension", "Date accessed", "Date modified", "Date created", "Attributes", "Folder Path", "Content", "Name", "Name.1", "Item"}),
#"Data expanded" = Table.ExpandTableColumn(#"Removed columns", "Data", {"Column1", "Column2", "Column3", "Column4", "Column5", "Column6", "Column7", "Column8", "Column9", "Column10", "Column11", "Column12", "Column13", "Column14", "Column15", "Column16", "Column17", "Column18", "Column19", "Column20", "Column21"}, {"Column1", "Column2", "Column3", "Column4", "Column5", "Column6", "Column7", "Column8", "Column9", "Column10", "Column11", "Column12", "Column13", "Column14", "Column15", "Column16", "Column17", "Column18", "Column19", "Column20", "Column21"}),
#"Top Rows Removed" = Table.Skip(#"Data Expanded",12),
#"Promoted Headers" = Table.PromoteHeaders(#"Top Rows Removed", [PromoteAllScalars=true]),
#"Filtered Rows1 Important" = Table.SelectRows(#"Promoted Headers", each ([Status] = "Square" or [Status] = "Missing" or [Status] = "Unassociated Leftover")),
#"Renamed Columns" = Table.RenameColumns(#"Filtered Rows1 Important",{{"STORE", "DimStore"}}),
#"Columns Removed1 Not Needed" = Table.RemoveColumns(#"Renamed Columns",{"Column1", "Model", "Required", "Checked", "Cost", "Observations", "DAMAGED COMPARISON", "DD", "Month"}),
#"Filtered Rows1 Removing Voids" = Table.SelectRows(#"Removed Columns1 Not Needed", each ([Status] <> null and [Status] <> "" and [Status] <> "X")),
#"Aggregate Conditional Column" = Table.AddColumn(#"Filtered Rows1 Removing Voids", "SKU Status", each if [Status] = "Unassociated Leftover" then "Non-Associated SKU" else "Associated SKU"),
#"Changed Type" = Table.TransformColumnTypes(#"Conditional Column Added",{{"License", Type Text}, {"Sku", Int64.Type}, {"Description", Type Text}, {"Difference", Int64.Type}, {"Damaged Units", type any}, {"Status", type text}, {"Monetary Difference", type number}, {"Date", type date}, {"Performed By", type text}, {"Multiply Squares", type number}, {"Multiplie Damaged", Int64.Type}, {"DimTienda", type text}, {"SKU Status", type text}})
in
#"Changed type"

I can change the SharePoint.Files function to SharePoint.Contents?

Hello @edhans

 

Thank you. 

 

I corrected the query a little bit. 

let
    Source = SharePoint.Contents("https:XXXX.sharepoint.com/sites/DocumentCenter"),
    Y = Source{[Name="Y"]}[Content],
    P = Y{[Name="W"]}[Content],
    Z = W{[Name="Z"]}[Content],
    #"Z" = Z{[Name="Z"]}[Content],
    #"U" = #"Z"{[Name="U"]}[Content],

First it goes to the Document Center. Is it the same as the shared document ? 

After that, it goes through the different folders to get the correct file.

 

I would say that each folder containes a dozen folders/files, which doesn't seem to be a lot... But doing it 4 times ( folder within a folder within a folder...) to get the right file could explain the slow performances of this query. 

 

By any chance, is there a way to link the query directly to the right folder to avoid the previous steps excluding moving the folder  ? 

 

Many thanks, 

 

 

DocumentCenter is your site name. 

 

THe Y is the first folder, which just seems strange to me. But not sure how your folders are named. No, you cannot go straight to a folder. You have to start at the top, and navigate down unfortunately.

 

You could test the speed theory by copying that file to the top level folder and see if it is any faster. Then make a decision to permenantly move or deal with the speed.



Did I answer your question? Mark my post as a solution!
Did my answers help arrive at a solution? Give it a kudos by clicking the Thumbs Up!

DAX is for Analysis. Power Query is for Data Modeling


Proud to be a Super User!

MCSA: BI Reporting
nandukrishnavs
Super User
Super User

@ilyasss  Are you using the SharePoint folder connector? If not, use that connector.



Did I answer your question? Mark my post as a solution!
Appreciate with a kudos
🙂


Regards,
Nandu Krishna

Hi @nandukrishnavs

 

Thank you.  I have tried with this but it's slower ( 15-20 minutes) because the sharepoint folder connector scans every files on sharepoint and there is hundreds of thousands of files on this sharepoint... It's why I decided to use the sharepoint.Contents function which is faster in this case. 

Since you have so many files to filter through (not sure if query folding happens with Sharepoint connectors), did you try to connect directly to the Excle file with the Web connector?  The link shows how to do it with a OneDrive file but the process is the same with Sharepoint.  Although, if you are writing queries like that, I'm guessing you already know about this approach.

 

https://www.c-sharpcorner.com/article/connect-to-onedrive-excel-file-from-power-bi-desktop/

 

If this works for you, please mark it as solution.  Kudos are appreciated too.  Please let me know if not.

Regards,

Pat

 

 

 

 





Did I answer your question? Mark my post as a solution! Kudos are also appreciated!

To learn more about Power BI, follow me on Twitter or subscribe on YouTube.


@mahoneypa HoosierBI on YouTube


Helpful resources

Announcements
April AMA free

Microsoft Fabric AMA Livestream

Join us Tuesday, April 09, 9:00 – 10:00 AM PST for a live, expert-led Q&A session on all things Microsoft Fabric!

March Fabric Community Update

Fabric Community Update - March 2024

Find out what's new and trending in the Fabric Community.