I am writing a custom connector to interface with an API, and I'm running into some incredibly frustrating issues with the way Web.Contents behaves. The connector looks something like this:
shared Connector.Contents = (IDs as list) =>
let
    longLivedAuthToken =
        let
            shortLivedAuthToken = Extension.CurrentCredential()[Username] & Extension.CurrentCredential()[Password],
            response = Binary.Buffer(Web.Contents("/Login", [Headers = [Authorization = shortLivedAuthToken]])) // This does not get cached
        in
            response[longLivedAuthToken],
    BigCSV =
        let
            table = Binary.Buffer(Web.Contents("/BigCSVEndpoint", [Headers = [Authorization = longLivedAuthToken]])),
            // Neither does this. It looks like this doesn't use the *value* longLivedAuthToken,
            // but rather replaces it with the actual call (think a C macro)
            promoted = Table.PromoteHeaders(table),
            named = CustomRenameFunction(promoted, Table.First(promoted))
        in
            named,
    manySmallCSVs =
        let
            downloads = List.Transform(IDs, each Binary.Buffer(Web.Contents("/SmallCSVEndpoint", [Headers = [Authorization = longLivedAuthToken], Query = _]))),
            // Or this. Again, longLivedAuthToken seems to be replaced
            // with the actual call rather than used as a value
            promoted = List.Transform(downloads, each Table.PromoteHeaders(_)),
            renamed = List.Transform(promoted, each CustomRenameFunction(_, Table.First(_)))
        in
            renamed,
    navtable =
        let
            allCSVs = List.Combine({manySmallCSVs, {BigCSV}}),
            table = Table.GenerateNavigationTableFromList(allCSVs)
        in
            table
in
    navtable;
My issue is twofold:
Firstly, I can see in the server logs that a login attempt is made 7-8 times for each CSV in the import (in general there are 7-10 CSVs). This eventually causes the /Login endpoint to return a 409 Conflict error (too many concurrent users of the endpoint), which causes the whole import to fail. I have tried buffering the result of the login call by wrapping the Web.Contents call in Binary.Buffer(), but it does not seem to help.
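The first thing I would try for the login problem is collapsing the token into a plain text scalar in its own step, so the CSV requests reference a small value instead of the whole login expression. This is a hedged sketch, not a verified fix: Extension.CurrentCredential() is the documented SDK call, but the login URL, the JSON response shape, and the longLivedAuthToken field name below are placeholders standing in for whatever your API actually returns.

```
// Hypothetical helper: reduce the login response to a text token.
// The URL and response fields are assumptions, not your real API.
GetToken = () as text =>
    let
        cred = Extension.CurrentCredential(),
        shortLivedAuthToken = cred[Username] & cred[Password],
        response = Web.Contents(
            "https://example.com/Login",     // assumed base URL
            [Headers = [Authorization = shortLivedAuthToken]]
        ),
        json = Json.Document(response),      // assumes a JSON login response
        token = json[longLivedAuthToken]     // assumed field name
    in
        token;
```

As far as I can tell, this still cannot stop separate evaluations (roughly one per loaded table) from each calling /Login once, because the buffers and intermediate values do not persist across evaluations; it only avoids redundant logins within a single evaluation.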
Secondly, the same issue manifests with the CSV downloads. Hitting these endpoints with a browser generally takes ~2 seconds to download each file in a single request, but for some bizarre reason importing them through the connector takes minutes, and on the server side there are literally hundreds of requests being made. (It looks like Power BI is downloading them in small chunks and making subsequent requests to get the next chunk? This is not programmed behaviour on the API side, but most of the requests contain an "Expect: 100-continue" header, which seems to indicate this is what is happening.)
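For the chunked-download side, the usual advice is to wrap the response in Binary.Buffer before parsing, which forces the whole stream to be read once instead of being re-requested as the parser streams through it. I am already buffering in the connector above, but to make the pattern explicit, here is a sketch; the base URL and the token parameter are placeholders I made up, not part of the real API:

```
// Sketch: force a single full download, then parse the buffered bytes.
FetchCsv = (relativePath as text, token as text) as table =>
    let
        raw = Binary.Buffer(
            Web.Contents(
                "https://example.com",           // assumed base URL
                [RelativePath = relativePath,
                 Headers = [Authorization = token]]
            )
        ),
        csv = Csv.Document(raw),
        promoted = Table.PromoteHeaders(csv)
    in
        promoted;
```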
I have found that disabling parallel loading solves the issue of the /Login attempts returning a 409 status code, but only because the calls happen in series rather than in parallel; the same total number of calls is still made, and the whole import takes much longer (it scales with the number of CSVs).
An interesting thing I've noted is that there seem to be three distinct phases to the import. The first phase occurs immediately after supplying the ID and credentials, and results in ~3 calls to the /Login endpoint and a similarly small number of calls to the CSV endpoints. This phase ends when the preview window opens with all the CSV files in the navigation table unchecked. The names of these tables are derived from specific cells in the tables, so they must already have been loaded into memory for the names to be pulled out (and the endpoints send the CSVs in full). At this point I would expect that all the necessary requests have been made and that no further API calls are needed (otherwise how could the tables be dynamically named based on data inside them?).
Now when I start checking the boxes for the tables, phase 2 starts, and calls start being made again for what I'm assuming is preview data. Why does this have to happen? Shouldn't the data already be stored/cached?
Phase 2 ends and phase 3 starts when I click the Load button. This causes all the endpoints to be hit again, accounting for a further 4-5 calls per CSV, which finally overwhelms the /Login endpoint and the load errors out.
I managed to eke out an error message from phase 3 that looks something like this:
Formulas:

section Section1;
shared _csv1 = let
    Source = Connector.Contents("471516961986314240"),
    _csv11 = Source{[Key="_csv1"]}[Data]
in
    _csv11;
shared _csv2 = let
    Source = Connector.Contents("471516961986314240"),
    _csv21 = Source{[Key="_csv2"]}[Data]
in
    _csv21;
shared _csv3 = let
    Source = Connector.Contents("471516961986314240"),
    _csv31 = Source{[Key="_csv3"]}[Data]
in
    _csv31;
...
This error message contains one shared _csvX = let ... entry for each checkbox I checked. What it looks like it's doing internally is calling the connector (which fetches all the CSVs) once for every CSV, which again makes no sense, because the CSVs have already been imported. So if the import generates 7 output CSVs, this explicitly fetches 7*7 = 49 CSVs. This ~n^2 relationship coincides with the number of login attempts I am seeing. I'm assuming this is not intended behaviour, and that Source = Connector.Contents("...") should be cached from the initial import that generated the navtable I interacted with, or that at least only one more call to it should be made. It's impossible to find out what is and isn't being cached, and the lazy evaluation model means that even if I write Binary.Buffer(), which should keep an in-memory copy so that subsequent uses of the output come from cache rather than from calling Web.Contents, the call to Binary.Buffer itself seems to be deferred until the contents are actually needed. That means the call can happen outside the initial scope where caching the information would have been useful.
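One restructuring that might at least break the ~n^2 pattern (a sketch, under the assumption that each generated _csvX query really only needs its own CSV) is to make each [Data] field of the navigation table a deferred call that downloads only that one CSV, instead of an eager structure that forces all of them. FetchCsv here is a hypothetical helper that downloads and parses a single CSV by ID.

```
// Sketch: record fields in M are evaluated lazily on demand, so each
// row's [Data] is only fetched when that particular table is loaded.
// FetchCsv is a hypothetical single-CSV download helper.
BuildNavRows = (IDs as list) as table =>
    let
        rows = List.Transform(
            IDs,
            (id) => [Key = id, Data = FetchCsv(id)]  // not fetched yet
        ),
        nav = Table.FromRecords(rows)
    in
        nav;
```

With this shape, 7 generated queries should cost roughly 7 downloads rather than 7*7, though each evaluation would presumably still log in separately.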
I think that this is essentially what is going wrong:
let
    rand = List.Random(2),
    x = rand,
    y = rand,
    z = rand
in
    {x{0}, y{0}, z{0}}
In this example, you would expect rand to be evaluated into a list containing two random numbers, with x, y, and z each referencing that list. The output would then be a list containing the same number three times.
This is not the case. The evaluation of List.Random(2) is put off until it is absolutely needed, and the evaluation model generates something like {(List.Random(2)){0}, (List.Random(2)){0}, (List.Random(2)){0}} as the actual output. For a non-deterministic call like this, you end up with three different numbers.
If you add a buffer and change the code to
let
    rand = List.Buffer(List.Random(2)), // Buffer this now
    x = rand,
    y = rand,
    z = rand
in
    {x{0}, y{0}, z{0}}
then the output is a list containing the same number three times.
It seems as if buffering Web.Contents in my connector does not have the same effect, and I'm left with a waterfall of API calls that should not need to be made.
Any help would be greatly appreciated.
I actually have a similar problem, and the Table.Buffer function also didn't help. When my navigator appears, 3 calls have been made as well. I figured the only workaround would be to uncheck "Enable Parallel Loading of Tables" and/or load queries slowly, disabling load of the ones already in the query editor. I'm an intern who just made a custom connector as my first project, so I can't help you, but if you find any solution I would appreciate it.
Kind Regards.
Hi @Anonymous ,
If I understand your scenario correctly, you are having problems when you import data via the custom connector in Power BI?
If so, could you show the error message details with the screenshot?
In addition, you could refer to this blog first, which should be helpful.
Best Regards,
Cherry