cancel
Showing results for 
Search instead for 
Did you mean: 
Reply
KenvM Frequent Visitor
Frequent Visitor

Getting Data from Elastic Stack with REST with Scroll

Hello,

 

I'm pulling data into Power BI using the Elastic Stack REST API, but I'm running into a problem when trying to pull more than 10k records.

 

In order to do so, I need to use the scroll function: https://www.elastic.co/guide/en/elasticsearch/reference/master/search-request-scroll.html

 

The first call:

 

 

GET /_search?scroll=1m 
{ "query": ... }

Will return the first set of results plus a scroll id: 

 

DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAAD4...

 

Which is used for subsequent calls:

 

 

GET /_search {
  "scroll_id": "DXF1Z..."
}

 

 

Each call returns a set of records plus a new scroll id. The scroll id is then passed onto the next record, and so on. 

 

For performance reasons, I don't want to increase the number of records that can be returned by the REST API.

 

Is it possible to solve this using Power BI?


Thanks,


Ken

 

2 ACCEPTED SOLUTIONS

Accepted Solutions
Super User
Super User

Re: Getting Data from Elastic Stack with REST with Scroll

There have been a few different threads like this but I believe the solution lies in creating a Power Query function that you call recursively. Here is an example of functions and recursion on a completely unrelated topic but might get you there:

 

https://community.powerbi.com/t5/Community-Blog/Using-Recursion-to-Solve-Hex-to-Decimal-Conversion/b...

 


Did I answer your question? Mark my post as a solution!

Proud to be a Datanaut!


KenvM Frequent Visitor
Frequent Visitor

Re: Getting Data from Elastic Stack with REST with Scroll

Thanks for pointing me in the right direction.

 

For anyone else that runs into this problem, here's the first iteration of a working function: 

 

let
    RecursiveElasticFetch = (queryUrl, scrollUrl, scrollId, counter) =>

    let
        Counter = if (counter = null) then 0 else counter,

        Results = if (scrollId = null) then            
            Json.Document(Web.Contents(queryUrl))
        else
            Json.Document(Web.Contents(scrollUrl&scrollId, [Headers=[MyHeader=Text.From(Counter)]])),

        ParsedResults = Table.FromList(Results[hits][hits], Splitter.SplitByNothing(), null, null, ExtraValues.Error),

        Return = if (Counter < 10) then
            ParsedResults & RecursiveElasticFetch(queryUrl, scrollUrl, Results[_scroll_id], Counter+1)
        else
            ParsedResults 
    in
        Return 
in
    RecursiveElasticFetch

For the parameters:

 

queryUrl - The query URL that kicks off the search: http://elasticsrch-dev:9200/intranet*/_search?scroll=1m&source={....}

scrollUrl - The URL used for subsequent searches (the scroll id is appended at the end): http://elasticsrch-dev:9200/_search/scroll?scroll=1m&scroll_id=

scrollId - Used to pass the scroll_id to subsequent calls. Leave it blank for the first call

counter - Used to limit the number of iterations. Leave it blank for the first call. 

 

18 REPLIES 18
Super User
Super User

Re: Getting Data from Elastic Stack with REST with Scroll

There have been a few different threads like this but I believe the solution lies in creating a Power Query function that you call recursively. Here is an example of functions and recursion on a completely unrelated topic but might get you there:

 

https://community.powerbi.com/t5/Community-Blog/Using-Recursion-to-Solve-Hex-to-Decimal-Conversion/b...

 


Did I answer your question? Mark my post as a solution!

Proud to be a Datanaut!


KenvM Frequent Visitor
Frequent Visitor

Re: Getting Data from Elastic Stack with REST with Scroll

Thanks for pointing me in the right direction.

 

For anyone else that runs into this problem, here's the first iteration of a working function: 

 

let
    RecursiveElasticFetch = (queryUrl, scrollUrl, scrollId, counter) =>

    let
        Counter = if (counter = null) then 0 else counter,

        Results = if (scrollId = null) then            
            Json.Document(Web.Contents(queryUrl))
        else
            Json.Document(Web.Contents(scrollUrl&scrollId, [Headers=[MyHeader=Text.From(Counter)]])),

        ParsedResults = Table.FromList(Results[hits][hits], Splitter.SplitByNothing(), null, null, ExtraValues.Error),

        Return = if (Counter < 10) then
            ParsedResults & RecursiveElasticFetch(queryUrl, scrollUrl, Results[_scroll_id], Counter+1)
        else
            ParsedResults 
    in
        Return 
in
    RecursiveElasticFetch

For the parameters:

 

queryUrl - The query URL that kicks off the search: http://elasticsrch-dev:9200/intranet*/_search?scroll=1m&source={....}

scrollUrl - The URL used for subsequent searches (the scroll id is appended at the end): http://elasticsrch-dev:9200/_search/scroll?scroll=1m&scroll_id=

scrollId - Used to pass the scroll_id to subsequent calls. Leave it blank for the first call

counter - Used to limit the number of iterations. Leave it blank for the first call. 

 

wendt_1 Frequent Visitor
Frequent Visitor

Re: Getting Data from Elastic Stack with REST with Scroll

Hey @KenvM,

 

Thanks for sharing this iteration! Have you made any new changes? I'm having an issue after the second loop of this script. First call returns one set of records, second call returns a new set, third and all subsequent calls returns a repeat set of data (not sure if from first or second call, but I could verify if needed). 

Have you encountered this yourself, and if so, have you found a solution?

Link to a more detailed description on the Elastic forums here: Elastic Forums

Thanks!

KenvM Frequent Visitor
Frequent Visitor

Re: Getting Data from Elastic Stack with REST with Scroll

I looked at your example posted on the Elastic Search site, and I think the problem may be in this section:

 

 

        ## If function is called w/o a scrollID, then make the initial query, otherwise, query with a scrolling query
        Results = if (scrollID = null) then
            Json.Document(Web.Contents(url,[Content = Text.ToBinary(urlBody)]))
        else
            Json.Document(Web.Contents(scrollURL,[Content = Text.ToBinary(scrollBody&scrollID&scrollEND)])),

 

 

I believe you need to pass the scrollId as part of the query on subsequent calls. Otherwise, you get a new request each time.

 

Try:

 

        ## If function is called w/o a scrollID, then make the initial query, otherwise, query with a scrolling query
        Results = if (scrollID = null) then
            Json.Document(Web.Contents(url,[Content = Text.ToBinary(urlBody)]))
        else
            Json.Document(Web.Contents(scrollURL&scrollId,[Content = Text.ToBinary(scrollBody&scrollID&scrollEND)])),

 

 

wendt_1 Frequent Visitor
Frequent Visitor

Re: Getting Data from Elastic Stack with REST with Scroll

Thanks for pointing that out! I actually am passing the scrollID with that calll, but I'm passing it as part of the body as opposed to the URL. Because the _scroll_id value passed back by Elasticsearch grows with the number of shards you have (if my understanding is correct), it can get crazy long. The one I'm returning is 13,873 characters long. Too long to send back as part of the URL. One forum post on that here.

 

The bolded part below is where I'm passing it.

        ## If function is called w/o a scrollID, then make the initial query, otherwise, query with a scrolling query
        Results = if (scrollID = null) then
            Json.Document(Web.Contents(url,[Content = Text.ToBinary(urlBody)]))
        else
            Json.Document(Web.Contents(scrollURL,[Content = Text.ToBinary(scrollBody&scrollID&scrollEND)])),

 

KenvM Frequent Visitor
Frequent Visitor

Re: Getting Data from Elastic Stack with REST with Scroll

Good to know, thanks! 

 

I replicated the problem in our environment and I believe I have a fix.

 

Power BI caches results from repeat calls to the same URL - this is why I added in the customer header. Adding the same snippet to your call seems to fix the problem for me.

 

            Json.Document(Web.Contents(scrollURL,[Content = Text.ToBinary(scrollBody&scrollID&scrollEND), Headers=[MyHeader=Text.From(Counter)]])),

 

Let me know if this works for you.

 

 

wendt_1 Frequent Visitor
Frequent Visitor

Re: Getting Data from Elastic Stack with REST with Scroll

You. Are. A lifesaver. I had no idea that Power cached results, and your solution worked like a charm. Thank you for the time you put in to help me with this issue!

Anonymous
Not applicable

Re: Getting Data from Elastic Stack with REST with Scroll

Hi Dear all, Im currently working with this function but I need to connect at ElasticSearch server using credentials (Basic usernameSmiley Tongueassword) , Do you know if this is possible?

 

I could not get a good result Smiley Sad

 

 

thanks a lot for your help.

Highlighted
KenvM Frequent Visitor
Frequent Visitor

Re: Getting Data from Elastic Stack with REST with Scroll

I haven't needed to do so yet.

 

I believe that the solutions on this page may help you though. They should how to pass headers through when making a web.contents call.

 

https://community.powerbi.com/t5/Desktop/Issue-with-getting-data-via-API-with-bearer-token/td-p/1262...