SharePoint File Retrieval Limit

Status: Needs Info
by vega Member on ‎11-20-2017 08:07 AM

Hello,

 

I posted this in the Desktop section of the forums, but I feel that this is a bug and it is something that is hindering a project that my company is working on. 

I believe that there is a limit to the number of files that Power BI can retrieve from SharePoint. When retrieving from SharePoint, Power BI retrieves from the root folder and extracts all files from the root folder and its sub-folders, without hierarchy. My company has many subfolders in our SharePoint directory, with many files in it. When querying, I notice that many files are missing when the files are listed in the query editor. I feel that the missing files are due to the large number of files that we have in this particular library. Is there a known file limit for SharePoint?

 

Thanks,

Status: Needs Info
Comments
by v-haibl-msft Super Contributor
on ‎11-21-2017 07:03 PM

@vega

 

I'm also not able to repro this issue on my side. Are you able to see the "missing" files with OData Feed connector? The query should be like below and it will list all the files in your sharepoint folder.

 

let
    Source = OData.Feed("https://***.sharepoint.com/_vti_bin/listdata.svc"),
    Documents_table = Source{[Name="Documents",Signature="table"]}[Data]
in
    Documents_table

 

Best Regards,
Herbert

by Vicky_Song Established Member
on ‎11-21-2017 07:03 PM
Status changed to: Needs Info
 
by hermesgoddess Frequent Visitor
on ‎02-08-2018 10:02 AM

Hi. 

 

I am having the same issue. I have a large volume of files within my Sharepoint Hierarchy. When I pull in new files with new data, I *need* to be able to locate the new file source file from that initial drop-down list of files so that I can define the header rows (since they vary depending on my CVS data source). I just created three new data source files, uploaded to SharePoint, and they are not appearing in that list of example files to set up my header rows & source data info.

 

I checked my OData Feed list of documents. The newest files DO show up here. 

(my newest files)

Newest files in OData Feed List.PNG

I only have 462 rows (files or folders). That doesn't seem like it should be too much. And such a random number to be "maxed out" at. Doesn't make sense to me. You'll notice there are duplicates of my files currently in my folders. The first iteration was saved as an XLSX file since the files themselves are pulling from an outside add-in, which can be refreshed from within Excel. I wasn't sure if I would break that feed in Excel by converting the file to CSV. But when I couldn't see the files saved as XLSX files, I converted to CSV becasue all of my other files in SharePoint are CSV right now. Didn't help though. I cannot see either the XLSX files or the CSV files in my "Example Files" drop-down list. 

by edhans Senior Member
‎02-11-2018 08:00 AM - edited ‎02-11-2018 08:02 AM

The fix is to change the source statement.

 

When you originally create the link, PowerQuery will generate a source line like this:

 

= SharePoint.Files("https://****.sharepoint.com/sites/SiteName", [ApiVersion = 15])

 

Remove the ApiVersion part that is highlighted. It should now read the entire library, which might take time depending on the file count. With the ApiVersion in place, I think it limits it to 5,000 files, and I don't know which 5,000 files. It certianly isn't the most recent 5,000 files as new files are usually skipped.

 

= SharePoint.Files("https://****.sharepoint.com/sites/SiteName")

 

I wish it would always read in the most recent 5,000 files as that is what most people would be reporting over.

 

Or better yet, let us target specific folders instead of having to start at the root of the Sharepoint library and filter down to specific folders.

 

by Jaap_Olsthoorn Member
on ‎06-06-2018 07:08 AM

Yes please, let us start at a particular folder! Our sharepoint has way too many files to not use the "apiversion" bit, but that never gets me the files I need.

by RJacob Frequent Visitor
on ‎07-12-2018 03:06 PM

I am obsering the same issue. When there are a large number of files in SharePoint, Power BI is unable to find files.

by edhans Senior Member
on ‎07-15-2018 07:33 AM

@Jaap_Olsthoorn - I'm afraid you only have two options:

  1. Remove the APIVersion string. That is ok up to 15,000-20,000 files, but even then that takes a LONG time to retrieve the data.
  2. Archive files. We've done this for some of our sites. We create an "Accounting - Archive" site for the "Accounting" sharepoint site, and just mass archive a ton of older files that we rarely access any longer. Because many of our folders are by year, it makes it somewhat easy to do, but it would be a HUGE pain for sites where that isn't the case.

The real fix would be for the Power BI and Sharepoint teams to get together and let us directly target a specific folder in the Get Data screen instead of having to start at the root every time.

 

by Jmenas Member
on ‎08-09-2018 03:05 AM

Hi,

 

I agree with what @edhans, skipping the root folder would be way more efficient. Not even filtering by type works than this a big issue.

 I tried removing the API string but that didn't work.


Hopefully, it is fixed soon! 

Best,
J.

 

by 1tcook Occasional Visitor
on ‎09-05-2018 10:02 AM

This could be due to a limitation in SharePoint called the List View Threshold. Even though you can store millions of rows of data in a list there is a 5.000 item limit of what Sharpoint will show in one view. A document library is a specialized list.

by edhans Senior Member
on ‎09-05-2018 10:21 AM

Then how come I can go to a Sharepoint site in Office 365 today, click "Export to Excel" and a Query.IQY file gets downloaded and will read in the file name, path, last user to modify, item type (file or folder), file size, and last edit date for over 13,500 files in under 2 minutes? 

Power Query would choke on that. It should use the same logic to get the files we need. THe IQY file is doing this:

 

<LIST><VIEWGUID>XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX</VIEWGUID><LISTNAME>{XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX}</LISTNAME><LISTWEB>https://OurSite.sharepoint.com/sites/SalesArchive/_vti_bin</LISTWEB><LISTSUBWEB></LISTSUBWEB><ROOTFOLDER>undefined</ROOTFOLDER></LIST>

I modified the string a bit to hide our company specific info.

 

Frustrating we can get this into Excel so fast, but Power Query (Excel or Power BI) won't read it in near that fast nor complete.