Frequent Visitor

Which columns to merge on when creating a star schema data model from a flat file (Excel/CSV)



I have some doubts about the process of creating a star schema from a flat file. I watched the Guy in a Cube tutorial on creating a dimension table with an index column, but he does it for only one column. He then merges the main table (which becomes the fact table) with the dimension table on that one column and brings in the ID column from the new dimension table.


In a real-life situation we would do this for multiple dimension columns, so how should we merge the index column from the dimension into the fact table? I've seen tutorials where people join on the lowest-granularity column (for example, ZIP code in a Geo dimension), and that seems fine, but what if future data is incorrect and has one ZIP code in two regions? Shouldn't we merge on all of the dimension's columns when bringing the index into the fact table?


Super User III

Ideally you want to join on index columns that have no semantics attached to them (basically a list of integers). But in the real world that is often wishful thinking. Create your composite keys as needed (for example ISO country code plus postal/zip code) and use these.  Keep the key size as small as you can.
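
To make the composite-key idea concrete, here is a minimal sketch in plain Python rather than Power Query, since the thread itself contains no code. The column names (`country`, `zip`, `region`, `sales`), the sample rows, and the helper functions are all hypothetical. It builds a Geo dimension keyed on country code plus ZIP, assigns an integer index, swaps that index into the fact rows, and raises an error if the same key ever shows up with two different regions, which is the bad-data case raised in the question.

```python
# Hypothetical flat-file rows, as they might come from a CSV export.
flat_rows = [
    {"country": "US", "zip": "10001", "region": "East", "sales": 120.0},
    {"country": "US", "zip": "94105", "region": "West", "sales": 80.0},
    {"country": "CA", "zip": "10001", "region": "Ontario", "sales": 45.0},
    {"country": "US", "zip": "10001", "region": "East", "sales": 30.0},
]

# Composite natural key: the smallest set of columns assumed to identify
# a dimension row uniquely (an assumption made for this example).
KEY_COLUMNS = ("country", "zip")


def natural_key(row):
    """Return the composite key of a row as a tuple."""
    return tuple(row[col] for col in KEY_COLUMNS)


def build_geo_dim(rows):
    """Build the dimension rows and a lookup from composite key to integer id.

    Raises ValueError if one composite key appears with two different regions.
    """
    key_to_id = {}
    dim_rows = []
    for row in rows:
        key = natural_key(row)
        if key not in key_to_id:
            # Assign the next integer as a meaningless surrogate index.
            key_to_id[key] = len(dim_rows) + 1
            dim_rows.append({
                "geo_id": key_to_id[key],
                "country": row["country"],
                "zip": row["zip"],
                "region": row["region"],
            })
        else:
            existing = dim_rows[key_to_id[key] - 1]
            if existing["region"] != row["region"]:
                raise ValueError(
                    f"key {key} maps to both {existing['region']!r} "
                    f"and {row['region']!r}"
                )
    return dim_rows, key_to_id


def build_fact(rows, key_to_id):
    """Replace the descriptive columns in each row with the integer geo_id."""
    return [
        {"geo_id": key_to_id[natural_key(row)], "sales": row["sales"]}
        for row in rows
    ]


geo_dim, geo_key_to_id = build_geo_dim(flat_rows)
fact_sales = build_fact(flat_rows, geo_key_to_id)
```

In this sketch, ZIP 10001 in the US and ZIP 10001 in Canada become separate dimension rows because the key includes the country code. Merging on every dimension column would not add any safety here. The uniqueness check on the small composite key is what catches a ZIP that is assigned to two regions. In Power Query the rough equivalent would be a merge that selects both key columns.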


As for the hierarchies - that's totally up to you. You can spend a lot of effort to create and maintain hierarchies, or you can let the data model do the work for you.
