cancel
Showing results for 
Search instead for 
Did you mean: 
Reply
bagge_rob Frequent Visitor
Frequent Visitor

Refresh error when combining multiple Web queries

We are receiving the following error when Power Bi Service tries to refresh our dataset with the query below.

 

[Unable to combine data] Section1/Query1/AutoRemovedColumns1 is accessing data sources that have privacy levels which cannot be used together. Please rebuild this data combination.

 

let
    // The FirstQuery return a list of records and a scrollId
    FirstQuery = Json.Document(Web.Contents("https://<url to elasticsearch cluster>.com", [Query=[scroll="5m", size="10"]])),

    // The SecondQuery uses the scrollId from FirstQuery to make its call
    SecondQuery = Json.Document(Web.Contents("https://<url to elasticsearch cluster>.com", [Query=[scroll_id=FirstQuery[_scroll_id]]])),
    
    // The list of records from FirstQuery and SecondQuery are combined and returned.
    CombinedResults = FirstQuery[hits][hits] & SecondQuery[hits][hits]
in
    CombinedResults

 

As you can see the query makes two requests to an Elasticsearch cluster and combines their response data. The second query uses the response from the first query as a query param. The authentication is done with Basic Auth and we use Power BI's native Basic Auth to authenticate the data source.

 

The query works fine locally using Power BI Desktop, it just fails when refrehsing on Power BI service.

 

This looks like a bug as AutoRemovedColumns1 is not part of our query. We have tried all different combinations of privacy settings Power BI offers, but nothing works. The only way we can get it to work is to hard-code the basic auth header into the web calls themselves and set the Authentication for the query within Power BI to Anonymous.

 

Does anyone have any suggestions, or is this a bug that Microsoft need to look into?

10 REPLIES 10
Kane Regular Visitor
Regular Visitor

Re: Refresh error when combining multiple Web queries

I am also interested in this question.

MarkCBB Member
Member

Re: Refresh error when combining multiple Web queries

@bagge_rob Did you ever find out what was causing this? I am getting the same problem.

bagge_rob Frequent Visitor
Frequent Visitor

Re: Refresh error when combining multiple Web queries

Yeah we finally got it working.

 

What we did was to abstract the queries in to Power Query functions and then have one query that called the other functions to create a response. When doing this the online refresh worked.

 

If you are particularly interested in making scroll queries to Elasticsearch we used this recursive function to do so:

() =>
let
    FetchElasticsearchData= (CallsRemaining, InScrollId, ListOfResults) =>
        let
            Response = Function.Invoke(ContinueScrollQuery, {InScrollId, "CacheBust" & Number.ToText(CallsRemaining)}),
            OutScrollId = Response[_scroll_id]
        in
            if CallsRemaining > 0 
                then @FetchElasticsearchData(CallsRemaining- 1, OutScrollId, ListOfResults & {Response[hits][hits]})
                else ListOfResults,

    DocumentsPerRequest = 1000,
    StartScrollQueryResult = Function.Invoke(StartScrollQuery, {DocumentsPerRequest}),
    Result =  FetchElasticsearchData(Number.RoundDown(StartScrollQueryResult[hits][total]/DocumentsPerRequest), StartScrollQueryResult[_scroll_id], {StartScrollQueryResult[hits][hits]})
in
    Result

Where StartScrollQuery and ContinueScrollQuery are referenced functions.

 


With this said though we quickly found out that Power BI is bad at transforming large chunks of JSON data into the tabular format it needs to do its visualisations. The transformation part of the query often took up to 10 minutes for around 500 000 blobs of JSON data.

What we ended up doing in the end was to write a NodeJS script to stage the elasticsearch data into a csv file in an Amazon S3 bucket and then pulling data from there instead. This made fetching large quantities of data very fast.

Hope this helps,
Rob

MarkCBB Member
Member

Re: Refresh error when combining multiple Web queries

@bagge_rob Thank you for the reply, that doesnt really help me, but it has given me a place to start investigating. My Data is coming from Excel Files location on my local machine. I belive the error is from a merge that I have done, so I am going to try and find a way around it. 

 

This is the error I get:

[Unable to combine data] Section1/MAIN_HISTORY/AutoRemovedColumns1 references other queries or steps, so it may not directly access a data source. Please rebuild this data combination.

This is my query:

let
    Source = Folder.Files(File_Dir & "\eXceler8\Projects\Excel Add-Ins\Clients\iRam Internal\Live files\iRAM_ADDIN_APP_01\SSF\SF\SFS\1PBI_DB\History"),

    #"Filtered Rows5" = Table.SelectRows(Source, each Text.Contains([Name], Vendor)),

    #"Filtered Rows" = Table.SelectRows(#"Filtered Rows5", each ([Folder Path] = File_Dir & "\eXceler8\Projects\Excel Add-Ins\Clients\iRam Internal\Live files\iRAM_ADDIN_APP_01\SSF\SF\SFS\1PBI_DB\History\")),

    #"Filtered Rows1" = Table.SelectRows(#"Filtered Rows", each Text.StartsWith([Name], "Main") and not Text.Contains([Name], "Conflicted")),

    #"Removed Other Columns" = Table.SelectColumns(#"Filtered Rows1",{"Content"}),

    #"Added Custom" = Table.AddColumn(#"Removed Other Columns", "GetExcelData", each Excel.Workbook([Content])),

    #"Expanded GetExcelData" = Table.ExpandTableColumn(#"Added Custom", "GetExcelData", {"Name", "Data", "Item", "Kind", "Hidden"}, {"GetExcelData.Name", "GetExcelData.Data", "GetExcelData.Item", "GetExcelData.Kind", "GetExcelData.Hidden"}),

    #"Filtered Rows2" = Table.SelectRows(#"Expanded GetExcelData", each ([GetExcelData.Kind] = "Sheet") and ([GetExcelData.Name] = "MAIN")),

    #"Removed Other Columns1" = Table.SelectColumns(#"Filtered Rows2",{"GetExcelData.Data"}),

    #"Expanded GetExcelData.Data" = Table.ExpandTableColumn(#"Removed Other Columns1", "GetExcelData.Data", {"Column1", "Column2", "Column3", "Column4", "Column5", "Column6", "Column7", "Column8", "Column9", "Column10", "Column11", "Column12", "Column13", "Column14", "Column15", "Column16", "Column17", "Column18", "Column19", "Column20", "Column21", "Column22", "Column23", "Column24", "Column25", "Column26", "Column27", "Column28", "Column29", "Column30", "Column31", "Column32", "Column33", "Column34", "Column35", "Column36", "Column37", "Column38", "Column39"}, {"GetExcelData.Data.Column1", "GetExcelData.Data.Column2", "GetExcelData.Data.Column3", "GetExcelData.Data.Column4", "GetExcelData.Data.Column5", "GetExcelData.Data.Column6", "GetExcelData.Data.Column7", "GetExcelData.Data.Column8", "GetExcelData.Data.Column9", "GetExcelData.Data.Column10", "GetExcelData.Data.Column11", "GetExcelData.Data.Column12", "GetExcelData.Data.Column13", "GetExcelData.Data.Column14", "GetExcelData.Data.Column15", "GetExcelData.Data.Column16", "GetExcelData.Data.Column17", "GetExcelData.Data.Column18", "GetExcelData.Data.Column19", "GetExcelData.Data.Column20", "GetExcelData.Data.Column21", "GetExcelData.Data.Column22", "GetExcelData.Data.Column23", "GetExcelData.Data.Column24", "GetExcelData.Data.Column25", "GetExcelData.Data.Column26", "GetExcelData.Data.Column27", "GetExcelData.Data.Column28", "GetExcelData.Data.Column29", "GetExcelData.Data.Column30", "GetExcelData.Data.Column31", "GetExcelData.Data.Column32", "GetExcelData.Data.Column33", "GetExcelData.Data.Column34", "GetExcelData.Data.Column35", "GetExcelData.Data.Column36", "GetExcelData.Data.Column37", "GetExcelData.Data.Column38", "GetExcelData.Data.Column39"}),
    #"Promoted Headers" = Table.PromoteHeaders(#"Expanded GetExcelData.Data"),

    #"Filtered Rows3" = Table.SelectRows(#"Promoted Headers", each [MAIN_ID] <> "MAIN_ID"),

    #"Changed Type" = Table.TransformColumnTypes(#"Filtered Rows3",{{"MAIN_ID", type text}, {"SITE_ID", type text}, {"ARTICLE_ID", type text}, {"VENDOR_ID", type text}, {"VENDOR_NAME", type text}, {"VENDOR_CODE", type text}, {"CHANNEL", type text}, {"CAT NAME", type text}, {"SUB CAT NAME", type text}, {"ARTICLE", type text}, {"BARCODE", type text}, {"VEND. PROD.", type text}, {"MATERIAL DESCRIPTION", type text}, {"REPLENISHMENT", type text}, {"SALES RANKING", type text}, {"BMC", type text}, {"DSC ALERT", type text}, {"DATE LAST SOLD", type text}, {"SITE", type text}, {"SITE NAME", type text}, {"SITE RANKING", type text}, {"PR ST", type text}, {"SELLING UOM", type text}, {"LOOKUP", type text}}),
    #"Uppercased Text" = Table.TransformColumns(#"Changed Type",{{"MAIN_ID", Text.Upper}, {"SITE_ID", Text.Upper}, {"ARTICLE_ID", Text.Upper}, {"VENDOR_ID", Text.Upper}, {"VENDOR_NAME", Text.Upper}, {"VENDOR_CODE", Text.Upper}, {"CHANNEL", Text.Upper}, {"CAT NAME", Text.Upper}, {"SUB CAT NAME", Text.Upper}, {"ARTICLE", Text.Upper}, {"BARCODE", Text.Upper}, {"VEND. PROD.", Text.Upper}, {"MATERIAL DESCRIPTION", Text.Upper}, {"REPLENISHMENT", Text.Upper}, {"SALES RANKING", Text.Upper}, {"BMC", Text.Upper}, {"DSC ALERT", Text.Upper}, {"DATE LAST SOLD", Text.Upper}, {"SITE", Text.Upper}, {"SITE NAME", Text.Upper}, {"SITE RANKING", Text.Upper}, {"PR ST", Text.Upper}, {"SELLING UOM", Text.Upper}, {"LOOKUP", Text.Upper}}),
    #"Renamed Columns" = Table.RenameColumns(#"Uppercased Text",{{"Column39", "DATE"}}),

    #"Changed Type1" = Table.TransformColumnTypes(#"Renamed Columns",{{"DATE", type date}, {"PROMO SP", Currency.Type}, {"INCL SP", Currency.Type}, {"NETT COST", Currency.Type}, {"MAC", Currency.Type}, {"SOH", Int64.Type}, {"SOO", Int64.Type}, {"SIT", Int64.Type}, {"STO", Int64.Type}, {"NUM. OF SITE", Int64.Type}, {"NUM. OF SKU", Int64.Type}, {"AVE MONTHLY SALES", Int64.Type}, {"ACT DSC", Int64.Type}, {"LAST RECPT Y/M", type date}}),
    #"Removed Columns" = Table.RemoveColumns(#"Changed Type1",{"VENDOR_CODE", "CAT NAME", "SUB CAT NAME", "ARTICLE", "BARCODE", "VEND. PROD.", "MATERIAL DESCRIPTION", "SITE", "SITE NAME"}),
    #"Filtered Rows4" = Table.SelectRows(#"Removed Columns", each ([MAIN_ID] <> null)),

    #"Replaced Value" = Table.ReplaceValue(#"Filtered Rows4",null,"UNKNOWN",Replacer.ReplaceValue,{"SALES RANKING"}),

    #"Replaced Value1" = Table.ReplaceValue(#"Replaced Value",null,"NOT ON F&R",Replacer.ReplaceValue,{"REPLENISHMENT"}),

    #"Filtered Rows6" = Table.SelectRows(#"Replaced Value1", each [MAIN_ID] <> null),

    #"Replaced Value2" = Table.ReplaceValue(#"Filtered Rows6","NOT FOUND","0",Replacer.ReplaceText,{"PR ST"}),

    #"Merged Queries" = Table.NestedJoin(#"Replaced Value2",{"ARTICLE_ID"},ARTICLES,{"ARTICLE_ID"},"NewColumn",JoinKind.LeftOuter),

    #"Expanded NewColumn" = Table.ExpandTableColumn(#"Merged Queries", "NewColumn", {"ARTICLE"}, {"ARTICLE"}),
    #"Replaced Value3" = Table.ReplaceValue(#"Expanded NewColumn","1000002213-WINLEX E (PTY) LTD (TOBACCO)","1000002214-WINLEX E (PTY) Ltd (CONFECTION",Replacer.ReplaceText,{"MAIN_ID", "SITE_ID", "ARTICLE_ID", "VENDOR_ID", "VENDOR_NAME"})

in
    #"Replaced Value3"

 

Does anything there stand out for you?

 

 

MarkCBB Member
Member

Re: Refresh error when combining multiple Web queries

I was able to sort out my challage, I found the answer in this post, Matt shared a blog post by Ken Plus.

http://community.powerbi.com/t5/Desktop/Formula-Firewall-Query-references-other-queries-so-it-may-no...

 

Highlighted
Dejazeus Occasional Visitor
Occasional Visitor

Re: Refresh error when combining multiple Web queries

Hi Rob. Bumped into this post looking for power query examples that can directly grab files in S3 buckets. You elude to that in your response here. Any chance you could elaborate or even share some example code?

 

Thanks in advance!

Cheers,

-m@

Robi2 Frequent Visitor
Frequent Visitor

Re: Refresh error when combining multiple Web queries

Hi, @bagge_rob

 

I am using your example to try and fix the same issue you had at the beginning, but I keep getting the same error when trying to auto refresh. I posted in detail about it here.

 

Could you please also share the code in your 2 functions that actually go and get the data? (StartScrollQuery and ContinueScrollQuery)

 

I am trying to figure out what I haven't done correctly.

 

Thanks,

Robert

bagge_rob Frequent Visitor
Frequent Visitor

Re: Refresh error when combining multiple Web queries

Hello,

 

Sorry for such a late reply. Unfortunately I'm not at the same company anymore and don't have access to the code. Even though the approach worked we ended up abondoning it due to it taking a lot of time fetching all the data. We ended up creating a AWS Elasticbeanstalk app that periodically streamed the contents of our Elasticsearch datastore to a csv file that we stored in a AWS S3 bucket. Power BI then downloaded the csv-file instead. The only problem with this approach is that Power BI at the time didn't have a good way to authenticate to AWS, PowerQuery didn't have the required functions to create a AWS signature. We went around this by having a AWS Lambda function authenticate Power BI and redirect it to the csv file if successful (Potentially you can do this directly with AWS API Gateway now). Very complex solution to what seemed to be an easy problem to solve, but the immaturity of Power BI at the time made this tricky.

 

I think the change in approach helped us speed up the "download and transform data" process from like 10 minutes to 10 seconds. This worked pretty well for us as it lets Power BI do what it is good at (working with tabular data). We coded the streaming app in NodeJS, but any modern programming language should be able to do this.

 

If you have a lot of data you might want to split it into several csv files that you load with a recursive query like the one in the top of this thread. (If you can do without I'd recommend that).

 

Hope this is of any help,

cheers!

Rob

Robi2 Frequent Visitor
Frequent Visitor

Re: Refresh error when combining multiple Web queries

Hi,

 

Thanks for your in depth reply! Smiley Happy

 

We were actually considering doing pretty much the same thing, just use elasticdump to create JSON files, put them on Azure and then have Power BI get them from there.

Now after reading your answer, this is definitely what we will do, and I would also recommend this to everyone as it makes the whole process a lot simpler to maintain and you can also stop relying on your gateway to have the data up to date. (Because Power BI can directly connect to Azure)

 

Thanks again,

Robert