cancel
Showing results for 
Search instead for 
Did you mean: 
Reply
aliahmad Regular Visitor
Regular Visitor

Optimizing inner join between dim and fact table

Hi,


I am fetching some data with the Google BigQuery connector and doing an inner join between a customer dim table and the fact table.

 

This is taking some time and I want to hear if anyone have any best practice tips for what can make it go faster? I have looked into Native Query and that doesn't work very well with the BigQuery connector, but I want to hear if anyone else have any tips or ideas.

 

Best,

 

Ali A

5 REPLIES 5
jborro Member
Member

Re: Optimizing inner join between dim and fact table

Hi @aliahmad ,

 

depends on how you do the merge - assuming in PQ, not on the server side via SQL or alike, this can help a bit:

https://blog.crossjoin.co.uk/2018/03/16/improving-the-performance-of-aggregation-after-a-merge-in-po...

 

Also try to reduce the size of the tables being merged to a bare minimum (i.e. filter off all unnecessary data, remove all unnecessary columns before the merge, not after).

 

Depending on the size of your fact table, try to Table.Buffer() it before the merge and test if this improves performance (note that it can actually make it worse).

 

Kind regards,

JB

aliahmad Regular Visitor
Regular Visitor

Re: Optimizing inner join between dim and fact table

This is a very good tip.


I will try this.

 

Do you have any pracitcal examples of Table.Buffer() ?


Thanks for responding!

 

Best,

 

Ali A

aliahmad Regular Visitor
Regular Visitor

Re: Optimizing inner join between dim and fact table

@jborro 

 

When I do the inner join it reads all the row from the fact table and then does the inner join.

 

Can't this be optimized in some way that it doesn't have to load all of this every single time? I

 

Best,

 

Ali A

jborro Member
Member

Re: Optimizing inner join between dim and fact table

Hi @aliahmad 

 

1. Table.Buffer is pretty simple:

let
  Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlSK1YlWMkIijcGkCRJpCiNjAQ==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [#"Column 1" = _t]),
  #"Changed column type" = Table.TransformColumnTypes(Source, {{"Column 1", Int64.Type}}),
  Buffer = Table.Buffer(#"Changed column type")
in
  Buffer

This command force recalculation of the table (just the first layer of scalars, if any cell in the table refer to a list or a function or another table, they are not resolved at this stage) and loads it into memory. Advantage: the table is loaded in the memory and therefore process as quickest as possible. Downside: the size of the memory is finit and big tables force using disk, which makes the etire query run slower. Therefore it requires a real-time testing in eery application. But as a rule of thumb: small tables that are used several/referred to times inside one execustion chain (!) should be buffered as this reduce the number of loads from connector.

 

2. I am not sure that I can help with your last question, I am not using BigData. If this is SQL like, I think you can prefilter and extract a list of values from your main table to pre-filter the fact table on the server side, but I think this cure may be worse than the disease. What may heppen is the fact table is loaded more than once - as far as I know this happens on merges in some scenarios. Test if buffering the fact table will help.

 

Kind regards,

John

aliahmad Regular Visitor
Regular Visitor

Re: Optimizing inner join between dim and fact table

Thanks again for a great response.

 

I am testing some things and made a support case with Microsoft.

 

I also think there are other things here. What I ended up doing is using native query which seems to work, but makes it a bit more difficult moving between DEV and PROD.

 

Just have to continue working on it.

 

Best,

 

Ali A

Helpful resources

Announcements
New Topics Started Badges Coming

New Topics Started Badges Coming

We're releasing new versions of the badge that everyone's talking about. ;) Check your inbox for notifications.

MBAS 2020

Save the new date (and location)!

Our business applications community is growing—so we needed a different venue, resulting in a new date and location. See you there!

Difinity Conference

Difinity Conference

The largest Power BI, Power Platform, and Data conference in New Zealand

Top Solution Authors
Top Kudoed Authors (Last 30 Days)