Hello,
I have a very large amount of data, so I need to split it into multiple queries to keep the data source from crashing.
But then I have to merge those queries back together to have a single query and transform the fields.
However, when the queries are referenced again, they are all loaded from the source at the same time, which crashes the data source.
Since a reference also reloads the data source, what alternative is there to referencing?
Hi @PWRBI,
Theoretically, switching off "Parallel table loading" in the settings may help avoid concurrent access to the data source. But if you are having these problems, I suspect the data source does not support anything like pagination or query folding. In that case, splitting the query into separate queries may do the opposite of what you intend: each query may load the entire data source and only then apply its filters. Without knowing exactly what you are doing it is hard to tell, but this is quite a plausible scenario. Maybe rethink your strategy and test once more whether you can load the entire dataset from the source in one go. Alternatively, can you split the data source itself into smaller tables or files? Splitting makes sense if each query requests a separate file or table; in that case, also try disabling parallel data loading in the settings.
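As an aside, if your source does support query folding, keeping a simple filter in the query lets the engine push it down to the source, so only the matching rows are transferred. A minimal sketch, assuming a hypothetical SQL Server source (`MyServer`, `MyDb`) with a table `dbo.Sales` that has a `Year` column:

```
let
    // Hypothetical connection details; replace with your own
    Source = Sql.Database("MyServer", "MyDb"),
    Sales = Source{[Schema = "dbo", Item = "Sales"]}[Data],
    // A plain row filter like this can usually be folded back
    // to the source as a WHERE clause
    Filtered = Table.SelectRows(Sales, each [Year] = 2023)
in
    Filtered
```

You can check whether folding happened by right-clicking the step and looking at "View Native Query" in the Query Editor.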
Cheers,
John
Thank you for your feedback.
So the individual queries work great and I have already deactivated parallel queries.
I have the following example:
Query 1
Query 2
Query 3
Query 4
Query 5
These 5 queries run one after the other and the database has no problems.
In the master query I refer to queries 1 - 5, and in this query all five queries are started at the same time, so the database crashes.
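For context, the master query presumably looks something like the sketch below (the query names are assumed); it is this single combine step that makes the engine evaluate all five source queries at once on refresh:

```
let
    // Referencing the five queries here triggers all of them
    // to load simultaneously when this query refreshes
    Combined = Table.Combine({Query1, Query2, Query3, Query4, Query5})
in
    Combined
```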
Hi @PWRBI,
I see your point. Yes, you are right: since this is a single query referring to several other queries, it triggers them all to run at once regardless of the settings.
Do you want to try playing along these lines:
let
    Source_1 = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlSK1YlWMgKTxkqxsQA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Column1 = _t]),
    Source_2 = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlSK1YlWMgKTxkqxsQA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Column1 = _t]),
    Source_3 = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlSK1YlWMgKTxkqxsQA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Column1 = _t]),
    // Delay each additional source before combining it
    Combine_1 = Table.Combine({Source_1, Function.InvokeAfter(() => Source_2, #duration(0, 0, 0, 10))}),
    Combine_2 = Table.Combine({Combine_1, Function.InvokeAfter(() => Source_3, #duration(0, 0, 0, 10))})
in
    Combine_2
The idea behind it is to manually force a delay before getting the data from each table (using Function.InvokeAfter). You will need to measure roughly how long each query takes to load and then add about 20% on top for the duration parameter. I think this may work, but the performance will of course be quite suboptimal.
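In isolation, Function.InvokeAfter simply waits for the given duration and then evaluates a zero-argument function; a minimal illustration:

```
let
    // Waits roughly five seconds, then evaluates the function;
    // Delayed ends up as 2
    Delayed = Function.InvokeAfter(() => 1 + 1, #duration(0, 0, 0, 5))
in
    Delayed
```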
Kind regards,
John