Re: Optimizing inner join between dim and fact tab...

Anonymous · ‎11-09-2019

Hi,

I am fetching some data with the Google BigQuery connector and doing an inner join between a customer dim table and the fact table.

This is taking some time and I want to hear if anyone have any best practice tips for what can make it go faster? I have looked into Native Query and that doesn't work very well with the BigQuery connector, but I want to hear if anyone else have any tips or ideas.

Best,

Ali A

Anonymous · ‎11-09-2019

Hi @Anonymous ,

depends on how you do the merge - assuming in PQ, not on the server side via SQL or alike, this can help a bit:

https://blog.crossjoin.co.uk/2018/03/16/improving-the-performance-of-aggregation-after-a-merge-in-power-bi-and-excel-power-query-gettransform/

Also try to reduce the size of the tables being merged to a bare minimum (i.e. filter off all unnecessary data, remove all unnecessary columns before the merge, not after).

Depending on the size of your fact table, try to Table.Buffer() it before the merge and test if this improves performance (note that it can actually make it worse).

Kind regards,

JB

Anonymous · ‎11-09-2019

This is a very good tip.

I will try this.

Do you have any pracitcal examples of Table.Buffer() ?

Thanks for responding!

Best,

Ali A

Anonymous · ‎11-09-2019

@Anonymous

When I do the inner join it reads all the row from the fact table and then does the inner join.

Can't this be optimized in some way that it doesn't have to load all of this every single time? I

Best,

Ali A

Anonymous · ‎11-10-2019

Hi @Anonymous

1. Table.Buffer is pretty simple:

let
  Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlSK1YlWMkIijcGkCRJpCiNjAQ==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [#"Column 1" = _t]),
  #"Changed column type" = Table.TransformColumnTypes(Source, {{"Column 1", Int64.Type}}),
  Buffer = Table.Buffer(#"Changed column type")
in
  Buffer

This command force recalculation of the table (just the first layer of scalars, if any cell in the table refer to a list or a function or another table, they are not resolved at this stage) and loads it into memory. Advantage: the table is loaded in the memory and therefore process as quickest as possible. Downside: the size of the memory is finit and big tables force using disk, which makes the etire query run slower. Therefore it requires a real-time testing in eery application. But as a rule of thumb: small tables that are used several/referred to times inside one execustion chain (!) should be buffered as this reduce the number of loads from connector.

2. I am not sure that I can help with your last question, I am not using BigData. If this is SQL like, I think you can prefilter and extract a list of values from your main table to pre-filter the fact table on the server side, but I think this cure may be worse than the disease. What may heppen is the fact table is loaded more than once - as far as I know this happens on merges in some scenarios. Test if buffering the fact table will help.

Kind regards,

John

Anonymous · ‎11-11-2019

Thanks again for a great response.

I am testing some things and made a support case with Microsoft.

I also think there are other things here. What I ended up doing is using native query which seems to work, but makes it a bit more difficult moving between DEV and PROD.

Just have to continue working on it.

Best,

Ali A

Optimizing inner join between dim and fact table

Helpful resources

Microsoft Fabric Learn Together

Power BI Monthly Update - April 2024

Fabric Community Update - April 2024