Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Earn the coveted Fabric Analytics Engineer certification. 100% off your exam for a limited time only!

Reply
androo
Advocate II
Advocate II

Power Query Scraping Sharepoint Folder - Most efficient Filtering Strategy

I fairly frequently have need to scrape data from multiple (usually Excel) files stored in sharepoint libraries. Unfortunately most of our sharepoint libraries were created by simply porting over the old fileshare structures. The result is that the path lengths are often very long. As it's beyond my powers to get the structures flattened any time soon I just have to live with them. So, I was wondering if anyone can tell me what might be the best filtering strategy.

 

For example. If this is the root:

let
Source = SharePoint.Files("https://officesharedservice.sharepoint.com/sites/hfs/finance/", [ApiVersion = 15]),

 

Is filtering to the target folder in one line like this,

 

#"Filtered Rows" = Table.SelectRows(Source, each Text.Contains([Folder Path], "Divisions/Finance and Resources/Income/Thing/Things Up/1234.56/GEN/"))

 

Faster or slower than,

 

#"Filtered Rows" = Table.SelectRows(Source, each Text.Contains([Folder Path], "Divisions/")),
#"Filtered Rows1" = Table.SelectRows(#"Filtered Rows", each Text.Contains([Folder Path], "Finance and Resources/")),
#"Filtered Rows2" = Table.SelectRows(#"Filtered Rows1", each Text.Contains([Folder Path], "Income")),
#"Filtered Rows3" = Table.SelectRows(#"Filtered Rows2", each Text.Contains([Folder Path], "Thing")),
#"Filtered Rows4" = Table.SelectRows(#"Filtered Rows3", each Text.Contains([Folder Path], "Things Up")),
#"Filtered Rows5" = Table.SelectRows(#"Filtered Rows4", each Text.Contains([Folder Path], "1234.56")),
#"Filtered Rows6" = Table.SelectRows(#"Filtered Rows5", each Text.Contains([Folder Path], "GEN"))

 

Other strategies might be to have one filter line but with multiple "and contains" clauses, or, just one filter for say the last ten characters of the path needed. Anyone know what is likely to work best (i.e. fastest), or if it makes no difference what I do?

 

Of course it would be much better if we could just enter the target in full in the first source statement, but, for sharepoint, apparently not possible.

 

 

 

1 ACCEPTED SOLUTION

Hi @androo ,

You can refer this blog about monitoring the runningtime of each query step to distinguish:

Timing Power Query Queries 

 

Best Regards,
Community Support Team _ Yingjie Li
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

View solution in original post

6 REPLIES 6
androo
Advocate II
Advocate II

Thanks for that. I was hoping for something more robustly helpful and everyday useful, a rule of thumb even. But, I can see that this is a question that goes to the heart of how power query works, and, unfortunately, it is a black box. It looks like that's the best answer I'm going to get though.

androo
Advocate II
Advocate II

I can see that logic, but, on the other hand searching for the entire path in one go essentially means looking in every room of every building in the city until you find the right one. On the other hand searching for the path in separate chunks, "what neighbourhood", what street", "what floor is it on", what wing of the building, north, south east or west, etc etc, narrows the search each time and seems to me could be as fast or faster. Can someone provide a technically reasoned theory as to which is the better approach? 

Hi @androo ,

You can refer this blog about monitoring the runningtime of each query step to distinguish:

Timing Power Query Queries 

 

Best Regards,
Community Support Team _ Yingjie Li
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

androo
Advocate II
Advocate II

Ok, I can see that that might make it a little quicker to build new queries of this kind, but, what I'm asking isn't that. I'm asking what filtering approach works fastest (say the whole path or part of the path in one filter, or multiple filters that might run faster as the code narrows the search), or, if it makes no difference at all.

Hi @androo ,

The whole path parameter could be faster than more single parts of path parameter because each query statement would send a query to the source, more queries, more requests, I think.

 

Best Regards,
Community Support Team _ Yingjie Li
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

v-yingjl
Community Support
Community Support

Hi @androo ,

You can use create a new query to extract the last 10 characters of the path by Text.end() function, then create a parameter to use the list query, filter the source table using the parameter:

parameter.pngfilter.png

You can refer this viedeo which introduces create a parameter from a query:

Populate a Power BI parameter list using a query 

 

Best Regards,
Community Support Team _ Yingjie Li
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

Helpful resources

Announcements
April AMA free

Microsoft Fabric AMA Livestream

Join us Tuesday, April 09, 9:00 – 10:00 AM PST for a live, expert-led Q&A session on all things Microsoft Fabric!

March Fabric Community Update

Fabric Community Update - March 2024

Find out what's new and trending in the Fabric Community.

Top Solution Authors
Top Kudoed Authors