hjaf
Advocate I

SharePoint list query alternative or optimization

Hello everyone!

 

Because SharePoint list queries quickly become too slow to work with in Power BI, I have experimented with the query that SharePoint generates for Excel. It is vastly more efficient in Excel than getting the same information through the SharePoint query in Power BI. I think the Excel export query is essentially a query of the list view, with all the lookup values flattened. I can't seem to find a way to recreate this query in Power BI. It looks like it uses an OLE DB provider, but I'm not sure how to continue from there, and some posts suggest that this is not supported by Power BI.
Connectionstring for excel: "Provider=Microsoft.Office.List.OLEDB.2.0;Data Source="";ApplicationName=Excel;Version=12.0.0.0"
Anyone have experience dealing with this?

 

I think the reason queries against SharePoint resources become extremely slow is all the lookup and choice columns: to get a column's value, I have to expand it. An alternative to exploring these view queries is optimizing the queries themselves; suggestions are most welcome! 😄

1 ACCEPTED SOLUTION
mahoneypat
Super User

@hjaf FYI: I finally made a video describing this approach and am adding it here for others who may find this post. It also gets the count of items and makes the right number of API calls.

Get SharePoint List Data with Power BI ... Fast - YouTube

 

Also, a reminder to mark one of these as the solution.

 

Regards,

Pat





Did I answer your question? Mark my post as a solution! Kudos are also appreciated!

To learn more about Power BI, follow me on Twitter or subscribe on YouTube.


@mahoneypa HoosierBI on YouTube


View solution in original post

15 REPLIES
CmdrKeene
Helper IV

I'm loving this approach to use the paged response to get 5000-item chunks of list items quickly, but I'm wondering if anyone has gotten filters working with that in the same query? 

I realize I can retrieve all items and then filter in the query editor, but I'd love to get a pre-filtered output from sharepoint.

I tried adding $filter=Modified ge datetime'2021-05-08T09:59:32Z' to my URL string and at first I thought it was working, but then I realized my final results were showing a lot of duplication (the same items appearing many times). I think maybe the combination of filtering and pagination is causing the issue, but I'm not sure.

 


CmdrKeene
Richard_VTR
New Member

Hi @mahoneypat ... I am using your approach to get the items via the REST API, and it works really fast. But I have published the file to the Power BI service, and I cannot set up scheduled refresh because of this error: "You can't schedule refresh for this dataset because the following data sources currently don't support refresh".

 

Do you know how to overcome this problem?

 

Thanks

Someone in this forum said to look up a solution here: https://blog.crossjoin.co.uk/2016/08/16/using-the-relativepath-and-query-options-with-web-contents-i...

 

I did so, and after reading a number of posts on the subject I was able to get it working in the Power BI service. I won't bore you with the details; here's the modified M code you need to use instead.

 

What this does is use a static URL as the first parameter of the Web.Contents() function, then use a little-known (apparently) second parameter to pass the rest of the URL (the relative path) and the query/value pairs.

 

You should be able to just modify the TENANT/SITE/LIST parts of this and be ready to go. I modified this from the original on the blog post, and it even has the original commented-out bits; you can remove those if you want.

 

I actually hard-coded most of the paths instead of using baseurl or other variables. I probably could have still used them but I wanted immediate success 🙂

 

This has been an absolute lifesaver to me. Now my list with 120K items takes only 15 seconds to refresh.

 

 

let
    baseurl = "https://TENANT.sharepoint.com/sites/SITE/_api/web/lists/GetByTitle('LIST')/",
    itemcount = Json.Document(Web.Contents("https://TENANT.sharepoint.com/sites/SITE/_api/web/lists/GetByTitle('LIST')/" & "ItemCount", [Headers=[Accept="application/json"]]))[value],
    skiplist = List.Numbers(0, Number.RoundUp(itemcount/5000), 5000),
    #"Converted to Table" = Table.FromList(skiplist, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    #"Renamed Columns" = Table.RenameColumns(#"Converted to Table", {{"Column1", "Skip"}}),
    #"Changed Type" = Table.TransformColumnTypes(#"Renamed Columns", {{"Skip", type text}}),
    fieldselect = "&$top=5000", // all fields with no expansion
    //fieldselect = "&$top=5000&$select=Id,Title,Person,Date", // list desired fields (no expansion)
    //fieldselect = "&$top=5000&$select=Id,Title,Choice,LookupColumn/Title,LookupColumn/Project,LookupColumn/ProjectStatus,Date,Person/LastName,Person/FirstName,Person/EMail&$expand=LookupColumn,Person",
    //Custom1 = Table.AddColumn(#"Changed Type", "Items", each Json.Document(Web.Contents(baseurl & "items?$skipToken=Paged=TRUE%26p_ID=" & [Skip] & fieldselect, [Headers=[Accept="application/json"]]))),
    Custom1 = Table.AddColumn(#"Changed Type", "Items", each Json.Document(Web.Contents(
        "https://TENANT.sharepoint.com/sites/SITE/_api/web/lists/GetByTitle('LIST')",
        [
            RelativePath = "/items?$skipToken=Paged=TRUE%26p_ID=" & [Skip] & fieldselect,
            Headers = [Accept="application/json"]
        ]
    ))),
    #"Expanded Items" = Table.ExpandRecordColumn(Custom1, "Items", {"value"}, {"value"}),
    #"Expanded value" = Table.ExpandListColumn(#"Expanded Items", "value")
in
    #"Expanded value"

 

 


CmdrKeene

Glad you found that approach. I was about to post a link to the same article.

Pat

 








Thank you, your video helped me. After I moved to version 2 everything was faster.

If you think version 2 is fast, wait until you try the "good" method using the REST API.  I have a list with more than 150,000 items, and it refreshes in about 8 seconds.

 


CmdrKeene

Thank you times a bazillion.


CmdrKeene
mahoneypat
Super User

Sharepoint List data sources can be slow.  Fortunately, there is a much faster way.  Try a blank query with this formula as your Source.

= Json.Document(Web.Contents("https://<YourTenantName>.sharepoint.com/sites/<YourSiteName>/_api/web/lists/GetByTitle('<YourListNam...", [Headers=[Accept="application/json"]]))

 

Replace all the parts in < >, including the < >.  You will get a JSON response.
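To drill into that JSON response, one possible continuation is below (a sketch; the placeholder URL follows the same pattern as above and is an assumption, since the URL in the post is truncated):

```
let
    // Same pattern as above, with the /items endpoint spelled out
    Source = Json.Document(Web.Contents(
        "https://<YourTenantName>.sharepoint.com/sites/<YourSiteName>/_api/web/lists/GetByTitle('<YourListName>')/items",
        [Headers = [Accept = "application/json"]])),
    // The list items arrive as a list of records in the "value" field
    items = Table.FromRecords(Source[value])
in
    items
```

Table.FromRecords assumes every item record has the same fields; expanding a record column with Table.ExpandRecordColumn, as in the other replies here, is the more forgiving route.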

 

There is another version that does pagination if your list is >5000 items.  Please let me know if you need that one.

 

If this solution works for you, please mark it as the solution.  Kudos are appreciated too.  Please let me know if not.

Regards,

Pat

 

 

 





Did I answer your question? Mark my post as a solution! Kudos are also appreciated!

To learn more about Power BI, follow me on Twitter or subscribe on YouTube.


@mahoneypa HoosierBI on YouTube


@mahoneypat Awesome!
Yes, I most definitely need to use pagination 🙂 nearly 20k items in the lists 🙂

Replace everything after the ? with the following:

 

?$skipToken=Paged=TRUE%26p_ID=30&$top=5000", [Headers=[Accept="application/json"]]))

 

I would make a list with = {0, 5000, 10000, 15000, 20000}, or something more dynamic for when the list gets bigger. Convert that to a table and add a custom column that concatenates the list value in place of the 30 in the snippet above. Then expand the table to get all your data.
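The steps just described might look like this in M (a sketch; <TenantName>, <SiteName> and <ListName> are placeholders, and the fixed 20,000 upper bound is the example value from the list above):

```
let
    // The skip offsets: one per 5000-item page
    skiplist = {0, 5000, 10000, 15000, 20000},
    #"Converted to Table" = Table.FromList(skiplist, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    // One REST call per page, substituting each offset for the 30 in the snippet above
    #"Added Items" = Table.AddColumn(#"Converted to Table", "Items",
        each Json.Document(Web.Contents(
            "https://<TenantName>.sharepoint.com/sites/<SiteName>/_api/web/lists/GetByTitle('<ListName>')/items?$skipToken=Paged=TRUE%26p_ID=" & Text.From([Column1]) & "&$top=5000",
            [Headers = [Accept = "application/json"]]))),
    // Expand each response's "value" list into rows
    #"Expanded Items" = Table.ExpandRecordColumn(#"Added Items", "Items", {"value"}),
    #"Expanded value" = Table.ExpandListColumn(#"Expanded Items", "value")
in
    #"Expanded value"
```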

 

Sharepoint Lists can be slow.  This approach has saved much refresh time.

 

If this works for you, please mark it as solution.  Kudos are appreciated too.  Please let me know if not.

Regards,

Pat

 







@mahoneypat  you are awesome!

I have tried replying and asking for guidance, only to have my post marked as spam. But I eventually came up with a solution. Do you concur with it? Quote or correct it, and I'll mark your reply as the solution so that other people get the whole picture 🙂

The query I ended up with that seems to be working (placeholder values replaced):

  • Column1 values are created with: List.Generate(() => 0, each _ < 120000, each _ + 5000)
  • SPItems: Json.Document(Web.Contents("https://TennantShortName.sharepoint.com/sites/SiteName/_api/web/lists(guid'ListGUID')/items?$skipToken=Paged=TRUE%26p_ID=" & Text.From([Column1]) & "&$top=5000", [Headers=[Accept="application/json"]]))


Complete commented query:

 

let
    Source = List.Generate(() => 0, each _ < 120000, each _ + 5000),  // Generate a list that increments 5000 up to max value 120 000
    #"Converted to Table" = Table.FromList(Source, Splitter.SplitByNothing(), null, null, ExtraValues.Error), // Convert the list into a table
    #"Added Custom" = Table.AddColumn(#"Converted to Table", "SPItems", each Json.Document(Web.Contents("https://<TennantShortName>.sharepoint.com/sites/<SiteName>/_api/web/lists(guid'<ListGUID>')/items?$skipToken=Paged=TRUE%26p_ID="&Text.From([Column1])&"&$top=5000", [Headers=[Accept="application/json"]]))), //custom column that does the actual query to sharepoint 
    //Note: replace <TennantShortName>, <SiteName> and <ListGUID> 
    //Instead of using list guid, you can use list names with GetByTitle(): "https://<TennantShortName>.sharepoint.com/sites/<SiteName>/_api/Web/Lists/GetByTitle('<List Title>')/items?$skipToken=Paged=TRUE%26p_ID="&Text.From([Column1])&"&$top=5000"
    #"Expanded SPItems" = Table.ExpandRecordColumn(#"Added Custom", "SPItems", {"odata.metadata", "odata.nextLink", "value"}, {"odata.metadata", "odata.nextLink", "value"}), //expand the results 
    #"Removed Duplicates1" = Table.Distinct(#"Expanded SPItems", {"odata.nextLink"}), // Remove the duplicated nextLink items to get unique items
    #"Expanded value" = Table.ExpandListColumn(#"Removed Duplicates1", "value"), // Expand the results from queries into new rows
    #"Expanded value1" = Table.ExpandRecordColumn(#"Expanded value", "value", {"Id", "Title"}), // Expand wanted columns in the sharepoint list
    #"Removed Columns" = Table.RemoveColumns(#"Expanded value1",{"Column1", "odata.metadata", "odata.nextLink"}),  // Remove columns from the initial query.
    #"Changed Type" = Table.TransformColumnTypes(#"Removed Columns",{{"Id", Int64.Type}, {"Title", type text}}) // Type setting
in
    #"Changed Type"

 




That looks good to me. I'm glad it works for you. I'm curious: how much of a refresh-time improvement did you see?

I wanted to write a blog post on this for hoosierbi.com (my blog about doing pro bono Power BI work for non-profits), and your question is exactly on this subject. I ended up doing a query version and a function version, which makes it easier to modify/reuse. The function could be used if there were multiple lists with the same columns and you had a table of tenant, site, and list values (tenant, of course, would not change within a company).

Here they are:

As a function -

let
    Source = (TenantName, SiteName, ListName) =>
        let
            site = SiteName,
            tenant = TenantName,
            list = ListName,
            getdata = Json.Document(Web.Contents("https://" & tenant & ".sharepoint.com/sites/" & site & "/_api/web/lists/GetByTitle('" & list & "')/items?$top=5000", [Headers=[Accept="application/json"]]))
        in
            getdata
in
    Source

As a query -

let
    Source =
        let
            site = "NameOfMySite",
            tenant = "NameOfMyTenant",
            list = "NameOfMyList",
            getdata = Json.Document(Web.Contents("https://" & tenant & ".sharepoint.com/sites/" & site & "/_api/web/lists/GetByTitle('" & list & "')/items?$top=5000", [Headers=[Accept="application/json"]]))
        in
            getdata
in
    Source

If this works for you, please mark it as the solution. Kudos are appreciated too. Please let me know if not.

Best regards

Pat







I probably had a lot of sub-optimal steps in the standard method I used previously, so I went from literally 3-4 hours down to less than 3 minutes! But even just getting the raw data with the traditional query method still took over an hour; I believe that's due to the many choice and lookup columns in the list.

 

It would be interesting to create a function that determines the maximum ID and automatically gets all the items. I think there is some room for further optimization regarding the overlapping queries this method produces: because the first 5k items have IDs ranging from 5k to 80k, the first 16 or so queries basically return the same data. But anyway, going from hours to mere minutes is a giant leap I am very satisfied with 😄 Thank you again @mahoneypat

PS Updates to the previous query: I realized it did not clear out all the duplicates, so I added an additional duplicate removal on IDs. I also put the tenant ID, site name, and list into parameters, which made it a bit easier to configure 🙂
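For the "maximum ID" idea, a possible sketch (untested; it assumes the REST endpoint accepts $orderby=Id desc so a single call returns the highest Id, and the angle-bracket names are placeholders):

```
let
    baseurl = "https://<TennantShortName>.sharepoint.com/sites/<SiteName>/_api/web/lists/GetByTitle('<ListTitle>')/items",
    // Ask for just the single highest-Id item to learn the top of the Id range
    maxid = Json.Document(Web.Contents(
        baseurl & "?$orderby=Id%20desc&$top=1",
        [Headers = [Accept = "application/json"]]))[value]{0}[Id],
    // Offsets 0, 5000, 10000, ... that cover the whole Id range
    skiplist = List.Generate(() => 0, each _ <= maxid, each _ + 5000)
in
    skiplist
```

The resulting list could then be fed into the paged query above in place of the hard-coded List.Generate bound.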

Ok, this is promising! 

 

I successfully got the first 5000 items in a breeze; however, the way you suggest iterating/paging the query is somewhat unclear to me.
In my case the lowest ID starts at 5495, and item #5000 has an ID around 80k. The "density" of the ID range varies because of creations and deletions over time, and because SharePoint doesn't immediately re-use IDs. I also noticed that SharePoint provides an odata.nextLink value for the next 5k items; maybe I can create an iteration that continues until this property no longer appears. Right now this appears to happen at about 120,000 (even though the list contains only around 20k items).
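The odata.nextLink idea could be sketched with List.Generate, which keeps requesting pages until no nextLink comes back (untested; it assumes odata.nextLink is an absolute URL, which would also make the query hard to schedule-refresh in the service):

```
let
    start = "https://<TennantShortName>.sharepoint.com/sites/<SiteShortName>/_api/web/lists/GetByTitle('<List Title>')/items?$top=5000",
    // State = [Page, Next]; keep fetching while there is a URL left to follow
    states = List.Generate(
        () => [Page = null, Next = start],
        each [Page] <> null or [Next] <> null,
        each if [Next] = null then [Page = null, Next = null]
             else [
                 Page = Json.Document(Web.Contents([Next], [Headers = [Accept = "application/json"]])),
                 Next = Record.FieldOrDefault(Page, "odata.nextLink")
             ],
        each [Page]
    ),
    pages = List.RemoveNulls(states),
    // Flatten every page's "value" list into one list of item records
    items = List.Combine(List.Transform(pages, each [value]))
in
    items
```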

 

I can probably use a list as you suggested and have the list stop at 200,000 in 5,000 increments, but I'm not exactly sure how to make a query that iterates over this list. Can you provide an example?

PS:
I had to add "/items?" to the URI, so the current URI for Web.Contents() in my case is as follows. Could you use these URIs in your answer, so that when I tag the post as a solution other people will have a better understanding 🙂


= Json.Document(Web.Contents("https://<TennantShortName>.sharepoint.com/sites/<SiteShortName>/_api/Web/Lists/GetByTitle('<List Title>')/items?%24skiptoken=Paged%3dTRUE%26p_ID%3d<StartAtId>&%24top=5000" , [Headers=[Accept="application/json"]]))

or, using the list GUID, the URL looks like this:
https://<TennantShortName>.sharepoint.com/sites/<SiteShortName>/_api/web/lists(guid'<item list guid>')/items?%24skiptoken=Paged%3dTRUE%26p_ID%3d<StartAtId>&%24top=5000
