Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
Anonymous
Not applicable

Optimizing inner join between dim and fact table

Hi,


I am fetching some data with the Google BigQuery connector and doing an inner join between a customer dim table and the fact table.

 

This is taking some time and I want to hear if anyone have any best practice tips for what can make it go faster? I have looked into Native Query and that doesn't work very well with the BigQuery connector, but I want to hear if anyone else have any tips or ideas.

 

Best,

 

Ali A

5 REPLIES 5
Anonymous
Not applicable

Hi @Anonymous ,

 

depends on how you do the merge - assuming in PQ, not on the server side via SQL or alike, this can help a bit:

https://blog.crossjoin.co.uk/2018/03/16/improving-the-performance-of-aggregation-after-a-merge-in-power-bi-and-excel-power-query-gettransform/

 

Also try to reduce the size of the tables being merged to a bare minimum (i.e. filter off all unnecessary data, remove all unnecessary columns before the merge, not after).

 

Depending on the size of your fact table, try to Table.Buffer() it before the merge and test if this improves performance (note that it can actually make it worse).

 

Kind regards,

JB

Anonymous
Not applicable

This is a very good tip.


I will try this.

 

Do you have any pracitcal examples of Table.Buffer() ?


Thanks for responding!

 

Best,

 

Ali A

Anonymous
Not applicable

@Anonymous 

 

When I do the inner join it reads all the row from the fact table and then does the inner join.

 

Can't this be optimized in some way that it doesn't have to load all of this every single time? I

 

Best,

 

Ali A

Anonymous
Not applicable

Hi @Anonymous 

 

1. Table.Buffer is pretty simple:

let
  Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlSK1YlWMkIijcGkCRJpCiNjAQ==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [#"Column 1" = _t]),
  #"Changed column type" = Table.TransformColumnTypes(Source, {{"Column 1", Int64.Type}}),
  Buffer = Table.Buffer(#"Changed column type")
in
  Buffer

This command force recalculation of the table (just the first layer of scalars, if any cell in the table refer to a list or a function or another table, they are not resolved at this stage) and loads it into memory. Advantage: the table is loaded in the memory and therefore process as quickest as possible. Downside: the size of the memory is finit and big tables force using disk, which makes the etire query run slower. Therefore it requires a real-time testing in eery application. But as a rule of thumb: small tables that are used several/referred to times inside one execustion chain (!) should be buffered as this reduce the number of loads from connector.

 

2. I am not sure that I can help with your last question, I am not using BigData. If this is SQL like, I think you can prefilter and extract a list of values from your main table to pre-filter the fact table on the server side, but I think this cure may be worse than the disease. What may heppen is the fact table is loaded more than once - as far as I know this happens on merges in some scenarios. Test if buffering the fact table will help.

 

Kind regards,

John

Anonymous
Not applicable

Thanks again for a great response.

 

I am testing some things and made a support case with Microsoft.

 

I also think there are other things here. What I ended up doing is using native query which seems to work, but makes it a bit more difficult moving between DEV and PROD.

 

Just have to continue working on it.

 

Best,

 

Ali A

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.

Top Solution Authors
Top Kudoed Authors