Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.
Hi,
I am fetching some data with the Google BigQuery connector and doing an inner join between a customer dim table and the fact table.
This is taking some time and I want to hear if anyone have any best practice tips for what can make it go faster? I have looked into Native Query and that doesn't work very well with the BigQuery connector, but I want to hear if anyone else have any tips or ideas.
Best,
Ali A
Hi @Anonymous ,
depends on how you do the merge - assuming in PQ, not on the server side via SQL or alike, this can help a bit:
Also try to reduce the size of the tables being merged to a bare minimum (i.e. filter off all unnecessary data, remove all unnecessary columns before the merge, not after).
Depending on the size of your fact table, try to Table.Buffer() it before the merge and test if this improves performance (note that it can actually make it worse).
Kind regards,
JB
This is a very good tip.
I will try this.
Do you have any pracitcal examples of Table.Buffer() ?
Thanks for responding!
Best,
Ali A
@Anonymous
When I do the inner join it reads all the row from the fact table and then does the inner join.
Can't this be optimized in some way that it doesn't have to load all of this every single time? I
Best,
Ali A
Hi @Anonymous
1. Table.Buffer is pretty simple:
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlSK1YlWMkIijcGkCRJpCiNjAQ==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [#"Column 1" = _t]),
#"Changed column type" = Table.TransformColumnTypes(Source, {{"Column 1", Int64.Type}}),
Buffer = Table.Buffer(#"Changed column type")
in
Buffer
This command force recalculation of the table (just the first layer of scalars, if any cell in the table refer to a list or a function or another table, they are not resolved at this stage) and loads it into memory. Advantage: the table is loaded in the memory and therefore process as quickest as possible. Downside: the size of the memory is finit and big tables force using disk, which makes the etire query run slower. Therefore it requires a real-time testing in eery application. But as a rule of thumb: small tables that are used several/referred to times inside one execustion chain (!) should be buffered as this reduce the number of loads from connector.
2. I am not sure that I can help with your last question, I am not using BigData. If this is SQL like, I think you can prefilter and extract a list of values from your main table to pre-filter the fact table on the server side, but I think this cure may be worse than the disease. What may heppen is the fact table is loaded more than once - as far as I know this happens on merges in some scenarios. Test if buffering the fact table will help.
Kind regards,
John
Thanks again for a great response.
I am testing some things and made a support case with Microsoft.
I also think there are other things here. What I ended up doing is using native query which seems to work, but makes it a bit more difficult moving between DEV and PROD.
Just have to continue working on it.
Best,
Ali A
Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City
Check out the April 2024 Power BI update to learn about new features.