
Which columns to merge on when creating a star schema data model from a flat file (Excel/CSV)



I have some doubts about the process of creating a star schema from a flat file. I checked out the Guy in a Cube tutorial on creating a dimension table with an index column, but he does it on only one column. He then merges the main table (which becomes the fact table) with the dimension table on that one column and pulls in the ID column from the newly created dimension table.


In a real-life situation we would perform this operation on multiple dimension columns. How should we then merge the index column from the dimension table into the fact table? I've seen tutorials where people join on the lowest-granularity column (for example ZIP code in a Geo dimension), and that seems fine, but what if future data is incorrect and has one ZIP code in two regions? Shouldn't we perform the merge on all of the dimension's columns?


Super User III

Ideally you want to join on index columns that have no semantics attached to them (basically a list of integers). But in the real world that is often wishful thinking. Create composite keys as needed (for example ISO country code plus postal/ZIP code) and merge on those. Keep the key as small as you can.
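
To make the composite-key idea concrete, here is a minimal sketch in plain Python rather than Power Query, purely for illustration. The column names (`country`, `zip`, `region`, `sales`), the sample rows, and the `geo_id` surrogate column are all assumptions, not anything from your data. It builds the dimension from the distinct key combinations, assigns an integer index, and refuses to continue if one key turns up with conflicting attributes (your "one ZIP in two regions" case).

```python
# Hypothetical flat-file rows; column names and values are illustrative only.
flat_rows = [
    {"country": "US", "zip": "10001", "region": "East", "sales": 100},
    {"country": "US", "zip": "10001", "region": "East", "sales": 50},
    {"country": "CA", "zip": "10001", "region": "West", "sales": 70},
    {"country": "US", "zip": "94105", "region": "West", "sales": 30},
]

# Composite natural key: as few columns as are needed to identify a row.
KEY_COLUMNS = ("country", "zip")


def build_dimension(rows, key_columns, attribute_columns):
    """Return (dim_rows, key_to_id) with an integer surrogate key per distinct key.

    Raises ValueError if the same key appears with different attribute values.
    """
    key_to_id = {}
    dim_rows = []
    for row in rows:
        key = tuple(row[c] for c in key_columns)
        attrs = {c: row[c] for c in attribute_columns}
        if key in key_to_id:
            existing = dim_rows[key_to_id[key] - 1]
            if any(existing[c] != attrs[c] for c in attribute_columns):
                raise ValueError(f"conflicting attributes for key {key}")
            continue
        geo_id = len(dim_rows) + 1  # surrogate key with no meaning attached
        key_to_id[key] = geo_id
        dim_rows.append({"geo_id": geo_id, **dict(zip(key_columns, key)), **attrs})
    return dim_rows, key_to_id


def build_fact(rows, key_columns, key_to_id, measure_columns):
    """Replace the dimension columns in each row with the surrogate key."""
    fact = []
    for row in rows:
        key = tuple(row[c] for c in key_columns)
        fact.append({"geo_id": key_to_id[key], **{m: row[m] for m in measure_columns}})
    return fact


dim_geo, geo_lookup = build_dimension(flat_rows, KEY_COLUMNS, ("region",))
fact_sales = build_fact(flat_rows, KEY_COLUMNS, geo_lookup, ("sales",))
```

With this sample data, keying on `zip` alone would raise the conflict error, because ZIP 10001 appears under two regions, while keying on country plus ZIP does not. The lookup uses only the key columns, not every dimension column, which is the point of keeping the key small.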


As for the hierarchies - that's totally up to you. You can spend a lot of effort to create and maintain hierarchies, or you can let the data model do the work for you.
