cancel
Showing results for 
Search instead for 
Did you mean: 
Reply
androo
Frequent Visitor

Power Query Scraping Sharepoint Folder - Most efficient Filtering Strategy

I fairly frequently have need to scrape data from multiple (usually Excel) files stored in sharepoint libraries. Unfortunately most of our sharepoint libraries were created by simply porting over the old fileshare structures. The result is that the path lengths are often very long. As it's beyond my powers to get the structures flattened any time soon I just have to live with them. So, I was wondering if anyone can tell me what might be the best filtering strategy.

 

For example. If this is the root:

let
Source = SharePoint.Files("https://officesharedservice.sharepoint.com/sites/hfs/finance/", [ApiVersion = 15]),

 

Is filtering to the target folder in one line like this,

 

#"Filtered Rows" = Table.SelectRows(Source, each Text.Contains([Folder Path], "Divisions/Finance and Resources/Income/Thing/Things Up/1234.56/GEN/"))

 

Faster or slower than,

 

#"Filtered Rows" = Table.SelectRows(Source, each Text.Contains([Folder Path], "Divisions/")),
#"Filtered Rows1" = Table.SelectRows(#"Filtered Rows", each Text.Contains([Folder Path], "Finance and Resources/")),
#"Filtered Rows2" = Table.SelectRows(#"Filtered Rows1", each Text.Contains([Folder Path], "Income")),
#"Filtered Rows3" = Table.SelectRows(#"Filtered Rows2", each Text.Contains([Folder Path], "Thing")),
#"Filtered Rows4" = Table.SelectRows(#"Filtered Rows3", each Text.Contains([Folder Path], "Things Up")),
#"Filtered Rows5" = Table.SelectRows(#"Filtered Rows4", each Text.Contains([Folder Path], "1234.56")),
#"Filtered Rows6" = Table.SelectRows(#"Filtered Rows5", each Text.Contains([Folder Path], "GEN"))

 

Other strategies might be to have one filter line but with multiple "and contains" clauses, or, just one filter for say the last ten characters of the path needed. Anyone know what is likely to work best (i.e. fastest), or if it makes no difference what I do?

 

Of course it would be much better if we could just enter the target in full in the first source statement, but, for sharepoint, apparently not possible.

 

 

 

1 ACCEPTED SOLUTION
v-yingjl
Community Support
Community Support

Hi @androo ,

You can refer this blog about monitoring the runningtime of each query step to distinguish:

Timing Power Query Queries 

 

Best Regards,
Community Support Team _ Yingjie Li
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

View solution in original post

6 REPLIES 6
androo
Frequent Visitor

Thanks for that. I was hoping for something more robustly helpful and everyday useful, a rule of thumb even. But, I can see that this is a question that goes to the heart of how power query works, and, unfortunately, it is a black box. It looks like that's the best answer I'm going to get though.

androo
Frequent Visitor

I can see that logic, but, on the other hand searching for the entire path in one go essentially means looking in every room of every building in the city until you find the right one. On the other hand searching for the path in separate chunks, "what neighbourhood", what street", "what floor is it on", what wing of the building, north, south east or west, etc etc, narrows the search each time and seems to me could be as fast or faster. Can someone provide a technically reasoned theory as to which is the better approach? 

v-yingjl
Community Support
Community Support

Hi @androo ,

You can refer this blog about monitoring the runningtime of each query step to distinguish:

Timing Power Query Queries 

 

Best Regards,
Community Support Team _ Yingjie Li
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

View solution in original post

androo
Frequent Visitor

Ok, I can see that that might make it a little quicker to build new queries of this kind, but, what I'm asking isn't that. I'm asking what filtering approach works fastest (say the whole path or part of the path in one filter, or multiple filters that might run faster as the code narrows the search), or, if it makes no difference at all.

v-yingjl
Community Support
Community Support

Hi @androo ,

The whole path parameter could be faster than more single parts of path parameter because each query statement would send a query to the source, more queries, more requests, I think.

 

Best Regards,
Community Support Team _ Yingjie Li
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

v-yingjl
Community Support
Community Support

Hi @androo ,

You can use create a new query to extract the last 10 characters of the path by Text.end() function, then create a parameter to use the list query, filter the source table using the parameter:

parameter.pngfilter.png

You can refer this viedeo which introduces create a parameter from a query:

Populate a Power BI parameter list using a query 

 

Best Regards,
Community Support Team _ Yingjie Li
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

Helpful resources

Announcements
PBI User Groups

Welcome to the User Group Public Preview

Check out new user group experience and if you are a leader please create your group

March Update

Check it Out!

Click here to read more about the March 2021 Updates!

secondImage

The largest Power BI virtual conference

100+ sessions, 100+ speakers, Product managers, MVPs, and experts. All about Power BI. Attend online or watch the recordings.

secondImage

Experience what’s next for Power BI

See the latest Power BI innovations, updates, and demos from the Microsoft Business Applications Launch Event.

Top Solution Authors
Top Kudoed Authors