Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
Anonymous
Not applicable

Best practice - INNER JOIN or LEFT OUTER INNER when dim table is missing key (OLTP to OLAP)

Hi,

 

https://www.dropbox.com/s/h0o64n50lj0b6w6/OLTPtoOLAP.pbix?dl=0

Please take a look at my .pbix file in the dropbox (link to file above).

 

I have taken a basic OLTP schema and an attempted to create an OLAP schema to optimise the data for Power BI reporting. I have a question on what is the best practice in creating the data model.

The OLTP tables (Clients, Colors, Countries, InvoiceLines, Invoices, Stock) are loaded without 'enable load'. I merged the InvoiceLine table with the Invoices table via an INNER JOIN on 'InvoiceID'. I used an INNER JOIN as there really shouldn't be any Invoices without InvoiceLines and visa-versa. This new table is the FACT table. Through looking at the applied steps you can see what I've done.

 

My question relates to this, to produce a Country dimension I needed to take the CountryID from the CLIENTS table and place it in the FACT table. CountryID is taken from the CLIENTS table, rather then the COUNTRIES table, as there is a direct join between the FACT table and CLIENT table. Through looking at the applied steps you will see I have merged the FACT table and CLIENTS table based on 'clientID'. Now the FACT table has the CountryID column present to be able to look up against the COUNTRY table/dimension.

The FACT table has an entry/row with the 'clientid' column having a value of 32, but the CLIENTS tables doesn't have an entry for a client with the clientid of 32. I presume this is bad data and no constraints use.

If I do this merge on a INNER JOIN on clientid then the entry in the FACT table with client32 is removed from the FACT table going forward, due to the way the INNER JOIN works.

The other option is if I merge using a LEFT OUTER JOIN on the FACT table the FACT table entry/row with clientid of 32 will remain in the FACT table but could never be sliced via the CLIENT dimension due to 32 not being in the client. I guess the CLIENT dimension will show the slicer as blank or unknown (depending on if I replaced the values).

 

What is the correct practice here:

1) If I use an INNER JOIN the measure in the FACT table will disappear which distorts the numbers/calculation/measures. I don't think this is clever and as you will removing valuable information.

2) If I use the LEFT OUTER JOIN then the measure can be used in the FACT table via another slicer (i.e. date).

 

Hopefully you get my drift here. What is the correct thing to do?  Any tips?
Thanks.

Note: this is the not the finished report, I know I need to remove columsn I've moved to the FACT table, but it is merely a version to demonstrate my question.

3 REPLIES 3
v-huizhn-msft
Employee
Employee

Hi @Anonymous,

This link is invalid, I am still confusing your requirement without .pbix file. Please update it for further analysis.

Best Regards,
Angelia

Anonymous
Not applicable

Hmm, it must have disappeared over the weekend.

 

@v-huizhn-msft please try this link:

https://www.dropbox.com/s/13rdc93dd8q9ejv/OLTPtoOLAP.pbix?dl=0

 

Hi @Anonymous,

If you want to keep valuable information, you'd better use LEFT OUTER INNER. You can create two new tables, one using INNER JOIN, another one using LEFT OUTER INNER, then you can create report based on one of them.

Best Regards,
Angelia

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.