Anonymous
Not applicable

Handling a Paginated API with Prebuilt Next Page URI in Response [EDITED]

I am trying to access data from the Asana API.

 

When making a call to the api for say, "projects," 

GET /projects?limit=5&workspace=xxxxxxx

Or in PBI, 

Json.Document(Web.Contents("https://app.asana.com/api/1.0" & "/projects?" & "limit=5" & "&workspace=xxxxxxx", [Headers=[#"Content-Type"="application/json", Authorization="Bearer somePAT"]]))

The api will respond with a list of projects and their attributes, along with a "next_page" object like so: 

{
  "data": [
    {
      "gid": "12345678910111212",
      "name": "Project 1",
      "resource_type": "project"
    },
    {
      "gid": "12345678910111213",
      "name": "Project 2",
      "resource_type": "project"
    }

    ...
  ],
  "next_page": {
    "offset": "someSequence",
    "path": "/projects?limit=5&workspace=xxxxxxxx&offset=someSequence",
    "uri": "https://app.asana.com/api/1.0/projects?limit=5&workspace=xxxxxxxx&offset=someSequence"
  }
}

Obviously I want to extract the list of projects, but since pagination is compulsory with a maximum of 100 records per call, I need a way to make successive calls based on the contents of "next_page". The "uri" field is a fully formed request URL, so there's no need to do any calculations with offsets or limits. I just need to be able to access the projects, pull the next uri, and then make another call using that next uri. This needs to happen until a "next_page" is no longer returned in the response.

 

I've done a decent bit of research and have yet to find a workable solution.

 

What's the best way to go about this in Power BI? Any tips or useful code snippets or references to appropriate documentation would be greatly appreciated.

 

=================================================

EDIT:

I have since played around a bit and am trying to use List.Generate to solve my problem. In pseudocode, this is my intention:

records = {set of records returned from first api call}
while that same call hasNextPage():
    build the next api call from the original uri 
    append the returned records to our existing set of records

 

Here's what I've come up with so far

let
    uri = "https://app.asana.com/api/1.0/projects?limit=100&workspace=xxxxxxx",
    headers = [Headers=[#"Content-Type"="application/json", Authorization="Bearer somePAT"]],

    //Loop over pages
    alldata = List.Generate(
        () =>  Json.Document(Web.Contents(uri, headers))[data],
        each hasNextPage(uri),
        //how do I change the uri here, to apply the new offset each iteration?
        each getRecords(uri & "&offset=" & getNextOffset(uri & "&offset=" & Json.Document(Web.Contents(uri, headers))[next_page][offset]))
    ),
    output = Table.FromList(alldata, Splitter.SplitByNothing(), null, null, ExtraValues.Error)
in
    output

 

Where hasNextPage(uri) makes a call to the uri, then checks whether the next_page object is null, signifying the end of the available pages of records. getRecords just returns the raw data from the source.
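For concreteness, hasNextPage amounts to something like this (a sketch; my real version is equivalent, and note that it re-queries the uri each time it is asked):

```
hasNextPage = (uri as text) =>
    // true while the response still advertises another page
    Json.Document(Web.Contents(uri, headers))[next_page] <> null
```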

let
    Source = (uri) => let
        Source = Json.Document(Web.Contents(uri, [Headers=[#"Content-Type"="application/json", Authorization="Bearer somePAT"]])),
        data = Source[data]
    in data
in Source

 

This now gives me a single column filled with lists, each of which contains 100 records. The problem is that it runs forever and returns hundreds of thousands of records (far more than should be returned). It never actually finishes loading the rows if I apply changes.
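I suspect the fix is to carry the parsed response through List.Generate as state, instead of re-calling the original uri in the condition and selector (which would explain the infinite loop, since the offset never advances). A sketch of that shape, untested, with field names following the Asana response above:

```
let
    baseUri = "https://app.asana.com/api/1.0/projects?limit=100&workspace=xxxxxxx",
    headers = [Headers=[#"Content-Type"="application/json", Authorization="Bearer somePAT"]],
    getPage = (uri as text) => Json.Document(Web.Contents(uri, headers)),
    // state = the parsed response of the page just fetched; null once we run out of pages
    pages = List.Generate(
        () => getPage(baseUri),
        (state) => state <> null,
        (state) => if state[next_page] = null then null else getPage(state[next_page][uri])
    ),
    // flatten {page1 records, page2 records, ...} into one list, then make a table
    allData = List.Combine(List.Transform(pages, each _[data])),
    output = Table.FromRecords(allData)
in
    output
```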

 

Can someone help me debug the List.Generate function I'm running above?

 

1 ACCEPTED SOLUTION
Anonymous
Not applicable

Solved it using a custom recursive function which grabs the offset, builds the next page's uri, appends the data to a rolling total of the data, then makes another call with the new uri, provided a next page exists to be called upon. 

 

(baseuri as text) =>
let
    headers = [Headers=[#"Content-Type"="application/json", Authorization="Bearer APIKEY"]],

    initReq = Json.Document(Web.Contents(baseuri, headers)),
    initData = initReq[data],
    //We want to get data = {lastNPagesData, thisPageData}, where each list has the limit # of Records, 
    //then we can List.Combine() the two lists on each iteration to aggregate all the records. We can then
    //create a table from those records
    gather = (data as list, uri) =>
        let
            //get new offset from active uri
            newOffset = Json.Document(Web.Contents(uri, headers))[next_page][offset],
            //build new uri using the original uri so we don't append offsets
            newUri = baseuri & "&offset=" & newOffset,
            //get new req & data
            newReq = Json.Document(Web.Contents(newUri, headers)),
            newdata = newReq[data],
            //add that data to rolling aggregate
            data = List.Combine({data, newdata}),
            //if there's no next page of data, return. if there is, call @gather again to get more data
            check = if newReq[next_page] = null then data else @gather(data, newUri)
        in check,
    //before we call gather(), we want to see if it's even necessary. First request returns only one page? Return.
    outputList = if initReq[next_page] = null then initData else gather(initData, baseuri),
    //then place records into a table. This will expand all columns available in the record.
    expand = Table.FromRecords(outputList)
in
    expand

This returns a fully expanded table of records from all pages of data.

 

Extensions of functionality or efficiency modifications are more than welcome!
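For reference, if the function above is saved as a query named GetAllPages (the name is arbitrary), it can be invoked like so, with the workspace id and limit as in the original question:

```
let
    Projects = GetAllPages("https://app.asana.com/api/1.0/projects?limit=100&workspace=xxxxxxx")
in
    Projects
```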

 


17 REPLIES
KJanssens
Helper II

For anyone looking for a way to limit the calls when there are a lot of pages:

You can add a row-count limit like below:

 

... SNIP ...
            //add that data to rolling aggregate
            data = List.Combine({data, newdata}),
            datacount = List.Count(data),
            //if there's no next page of data, return. if there is, call @gather again to get more data
            check = if newReq[links] = null or datacount >= 10000 then data else @gather(data, newUri)
... SNIP ...

In this case it will stop once 10,000 rows have been fetched.

shriti1995
Frequent Visitor

(baseurl as text) =>
let
    initReq = Json.Document(Web.Contents(baseurl)),
    nextUrl = initReq[#"@odata.nextLink"],
    initValue = initReq[value],
    gather = (data as list, url) =>
        let
            newReq = Json.Document(Web.Contents(url)),
            newNextUrl = newReq[#"@odata.nextLink"],
            newData = newReq[value],
            data = List.Combine({data, newData}),
            Converttotable = Record.ToTable(newReq),
            Pivot_Columns = Table.Pivot(Converttotable, List.Distinct(Converttotable[Name]), "Name", "Value", List.Count),
            Column_Names = Table.ColumnNames(Pivot_Columns),
            Contains_Column = List.Contains(Column_Names, "@odata.nextLink"),
            check = if Contains_Column = true then @gather(data, newNextUrl) else data
        in
            check,
    Converttotable = Record.ToTable(initReq),
    Pivot_Columns = Table.Pivot(Converttotable, List.Distinct(Converttotable[Name]), "Name", "Value", List.Count),
    Column_Names = Table.ColumnNames(Pivot_Columns),
    Contains_Column = List.Contains(Column_Names, "@odata.nextLink"),
    outputList = if Contains_Column = true then @gather(initValue, nextUrl) else initValue,
    expand = Table.FromRecords(outputList)
in
    expand

@Anonymous Your code helped me a lot to get the values. Here is another way to get the data, for the case where the last API page doesn't return null but simply omits the link.

@Syndicate_Admin @rwaanders @Anonymous @wes2015 

 

I am using this code and am very close to getting it correct. Is there anyone who could help me if I provided some more details? Please and thank you. I have been stuck for over a week trying to figure this out and can't find any good examples or documentation that fit my exact scenario.

rwaanders
Frequent Visitor

Hi there,

 

I'm working with the same kind of API (https://manual.yesplan.be/en/developers/rest-api/#pagination), but I'm getting stuck on the "loop".

 

See below my code, which gets stuck on different pages each time. Most of the time on page 2, but sometimes on page 3.

When I open the URL that throws the 404 (for example: https://odeon.yesplan.nl/api/events?page=2&book=4758791937&api_key=********) in a browser, I get more detailed information:

 

{"contents":{},"message":"Page key \"2\" is not the current page for the book, which is \"3\"."}        

When I then change page=2 to page=3, I do get the correct results. If I then refresh the page again, it throws the above error once more, stating that the current page should be 4. Editing the code to page=4 again gives me the correct information.

 

It's almost as if the query makes a double request, which advances the page index to 3 while the query is still trying to get page 2.

 

Do you know how to fix this?

 

See code:

 

(baseuri as text) =>
let
    initReq = Json.Document(Web.Contents(baseuri)),
    initData = initReq[data],
    gather = (data as list, uri) =>
        let
            //get the next page url from the active uri
            newOffset = Json.Document(Web.Contents(uri))[pagination][next],
            //append the api key to the new page url
            newUri = newOffset & "&api_key=**********",
            //get new req & data
            newReq = Json.Document(Web.Contents(newUri)),
            newdata = newReq[data],
            //add that data to rolling aggregate
            data = List.Combine({data, newdata}),
            //if there's no next page of data, return. if there is, call @gather again to get more data
            check = if newReq[pagination][next] = null then data else @gather(data, newUri)
        in check,
    //before we call gather(), we want to see if it's even necessary. First request returns only one page? Return.
    outputList = if initReq[pagination][next] = null then initData else gather(initData, baseuri),
    //then place records into a table. This will expand all columns available in the record.
    expand = Table.FromRecords(outputList)
in
    expand

 

Note, if i change this part in the code:

 

check = if newReq[pagination][next] = null then data else @gather(data, newUri)

 

To this:

 

check = if newReq[pagination][next] = null then data else data 

It will only loop once, since I removed the recursion. I then get the first 200 rows (which is 2 pages, 100 rows per page). So it looks like the loop is using the wrong/old url[pagination][next].

 

Any idea on how to fix this? 
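One likely culprit, given the error message, is that gather makes two requests per iteration: one to re-read pagination.next from the active uri, and one to actually fetch the next page. If the Yesplan book invalidates a page key after it is read, that extra request would explain the skipped pages. A sketch of a single-request variant, passing the already-parsed response forward instead of re-fetching (untested against this API; the api_key placeholder is kept from the code above):

```
    gather = (req as record, data as list) =>
        let
            //build the next uri from the response we already have (no extra request)
            newUri = req[pagination][next] & "&api_key=**********",
            newReq = Json.Document(Web.Contents(newUri)),
            //add the new page's records to the rolling aggregate
            newData = List.Combine({data, newReq[data]}),
            check = if newReq[pagination][next] = null then newData else @gather(newReq, newData)
        in
            check,
    outputList = if initReq[pagination][next] = null then initData else gather(initReq, initData),
```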



 

Hi @Anonymous, this looks like something I could use in my case.

 

I have borrowed your script and applied it to the API I'm working with, but I'm hitting a wall when the looping is finished.

 

I suspect this has something to do with the fact that when there are no more results, the API does not return the "next_page" part of the JSON; it's just not there.

 

When running the API with a limit of 1, the paging is there

wes2015_0-1633680782156.png

 

When running with a limit of 100 (there are 33 rows in the demo database), the "paging" part is gone

wes2015_1-1633680862673.png

 

I have played around with a try/otherwise solution, but it just won't work.

Did you have the same issue?

 

Wes

Anonymous
Not applicable

This isn't something I had to deal with, because the API I was using returns "next_page": null once we've exhausted all the data, instead of just returning nothing. That said, all you should need to do is modify my stop condition

check = if newReq[next_page] = null then data else @gather(data, newUri)

and the initial request checker

outputList = if initReq[next_page] = null then initData else gather(initData, baseuri),

to check whether the paging key is contained in the JSON. You might be able to do it with error handling the way you've been trying, but off the top of my head, you could also cast the JSON to regular text and use Text.Contains to see whether "paging" exists as a key substring.
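Another option, assuming your response parses into a record like mine does, is Record.HasFields, which checks the parsed record for a field directly rather than going through text or column names. A sketch of how both conditions could look:

```
//stop condition: recurse only while the parsed response still has a "paging" field
check = if Record.HasFields(newReq, "paging") then @gather(data, newUri) else data

//initial request checker, same idea
outputList = if Record.HasFields(initReq, "paging") then gather(initData, baseuri) else initData
```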

Thanks, yes I amended the stop condition to look for the "paging" column as shown below, and the query runs without errors now.

 

But it's still not returning more than 100 rows.

 

 

The List.Contains(Table.ColumnNames()) part works by itself in a standalone query, but I'm not sure how to check whether it works inside the stop condition in the "gather" function.

 

Do you see any major flaws?

 

let
    baseuri = "https://api.hubapi.com/crm/v3/objects/contacts?limit=100&archived=false&hapikey=" & apikey,

    //headers = [Headers=[#"Content-Type"="application/json", Authorization="Bearer " & apikey ]],

    initReq = Json.Document(Web.Contents(baseuri)),
    #"Converted to Table" = Record.ToTable(initReq),
    initData = initReq[results],

    //We want to get data = {lastNPagesData, thisPageData}, where each list has the limit # of Records,
    //then we can List.Combine() the two lists on each iteration to aggregate all the records. We can then
    //create a table from those records
    gather = (data as list, uri) =>
        let
            //get new offset from active uri
            newOffset = Json.Document(Web.Contents(uri))[paging][next][after],

            //build new uri using the original uri so we don't append offsets
            newUri = baseuri & "&after=" & newOffset,

            //get new req & data
            newReq = Json.Document(Web.Contents(newUri)),
            newdata = newReq[results],

            //add that data to rolling aggregate
            data = List.Combine({data, newdata}),

            //if there's no next page of data, return. if there is, call @gather again to get more data
            check = if List.Contains(Table.ColumnNames(newReq as table), "paging") = true then @gather(data, newUri) else data
        in
            check,

    //before we call gather(), we want to see if it's even necessary. First request returns only one page? Return.
    outputList = if List.Contains(Table.ColumnNames(#"Converted to Table" as table), "paging") = true then gather(initData, baseuri) else initData,

    //then place records into a table. This will expand all columns available in the record.
    expand = Table.FromRecords(outputList)
in
    expand

 

 

Hi, 

 

I am a newbie to API pagination. This code is the easiest to follow on pagination I have found.
When I copy the code and update the credentials, I get the below error. Please help!!

 

Pagination error.PNG

Thanks, Bianca

Anonymous
Not applicable

Hey Bianca, not all APIs are built the same! Depending on the API, the output JSON will have a different structure and may not have a "next_page" field. You need to figure out how your API handles pagination and modify the code accordingly. Add some details about the API you're using and maybe I can help.

@Anonymous thank you so much!! This code helped me a lot!

Thanks to you, I learned a new way to paginate API requests in Power Query!

Superb work! Congratulations!!

mahoneypat
Microsoft

Please see this video for one way to do this.

Power BI - Tales From The Front - REST APIs - YouTube

 

Regards,

Pat





Did I answer your question? Mark my post as a solution! Kudos are also appreciated!

To learn more about Power BI, follow me on Twitter or subscribe on YouTube.


@mahoneypa HoosierBI on YouTube


Anonymous
Not applicable

Great video! Unfortunately, what you explain only works if you can predetermine counts and offsets. I am not given a "total count" on which I can base successive requests. Each request returns a next_page, which gives me the next random offset token:

"next_page": {
    "offset": "yJ0eXAiOiJKV1QiLCJhbGciOiJIRzI1NiJ9",
    "path": "/projects?limit=5&workspace=xxxxxxxx&offset=yJ0eXAiOiJKV1QiLCJhbGciOiJIRzI1NiJ9",
    "uri": "https://app.asana.com/api/1.0/projects?limit=5&workspace=xxxxxxxx&offset=yJ0eXAiOiJKV1QiLCJhbGciOiJIRzI1NiJ9"
  }

So for each request, I need to pull the actual data into a table as well as grab the offset (or more easily, the fully constructed uri), and append the returned data to the same table.

 

In Python, I'd just have a loop to run through each JSON return, grab the offset, and make the next call. That's the functionality I'd like here, but I'm having trouble translating it to a functional language like M/Power Query.

If you know how many to expect, you can hard-code it into the List.Numbers function (if you can't dynamically use a $count to get it). You could also go past it and see which calls error or come back empty.

Pat

 






Anonymous
Not applicable

Like I said above, I don't have the ability to preemptively find out a count of existing records. I just need to continually make the call, check for an offset in next_page, then append that offset to my uri and repeat until next_page = null in the returned JSON.

 

I've made some edits to my original question. Could you take a look at that? I'm relying on List.Generate and having some trouble.
