Which columns to merge on when creating a star schema data model from a flat file (Excel/CSV)
I have a question about the process of creating a star schema from a flat file. I watched the Guy in a Cube tutorial on creating a dimension table with an index column, but he does it for only one column. He then merges the main table (which will become the fact table) with the dimension table on that one column and brings the ID column from the dimension table into the fact table.
In a real-life situation we would do this for multiple dimension columns, so how should we merge the index column from the dimension into the fact table? I've seen tutorials where people join on the lowest-granularity column (for example, ZIP code in a Geo dimension), and that seems fine, but what if future data is incorrect and has one ZIP in two regions? Shouldn't we merge on all of the columns from the dimension?
Ideally you want to join on index columns that have no semantics attached to them (basically a list of integers). But in the real world that is often wishful thinking. Create your composite keys as needed (for example ISO country code plus postal/zip code) and use these. Keep the key size as small as you can.
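To make the composite-key idea concrete, here is a minimal sketch in plain Python rather than Power Query, so treat it as an illustration of the logic and not as the exact steps you would click through. The column names (`country`, `zip`, `region`, `sales`), the sample rows, and the helper functions `build_geo_dim` and `build_fact` are all made up for the example. It builds a Geo dimension keyed on country code plus ZIP, assigns an integer index to each distinct key, and refuses to continue if the same key ever shows up with two different regions.

```python
# Hypothetical flat-file rows; the column names and values are illustrative only.
flat_rows = [
    {"country": "US", "zip": "10001", "region": "Northeast", "sales": 120.0},
    {"country": "US", "zip": "94105", "region": "West", "sales": 80.0},
    {"country": "DE", "zip": "10001", "region": "Berlin", "sales": 50.0},
    {"country": "US", "zip": "10001", "region": "Northeast", "sales": 30.0},
]


def build_geo_dim(rows):
    """Build a Geo dimension keyed on the composite (country, zip).

    Returns the dimension rows and a lookup from composite key to integer index.
    Raises ValueError if one composite key appears with two different regions.
    """
    lookup = {}
    dim = []
    for row in rows:
        natural_key = (row["country"], row["zip"])
        if natural_key not in lookup:
            geo_id = len(dim) + 1  # meaningless integer index, starting at 1
            lookup[natural_key] = geo_id
            dim.append({
                "geo_id": geo_id,
                "country": row["country"],
                "zip": row["zip"],
                "region": row["region"],
            })
        else:
            existing = dim[lookup[natural_key] - 1]
            if existing["region"] != row["region"]:
                raise ValueError(
                    f"Key {natural_key} maps to both "
                    f"{existing['region']!r} and {row['region']!r}"
                )
    return dim, lookup


def build_fact(rows, lookup):
    """Replace the descriptive geo columns with the integer index from the dimension."""
    return [
        {"geo_id": lookup[(row["country"], row["zip"])], "sales": row["sales"]}
        for row in rows
    ]


geo_dim, geo_lookup = build_geo_dim(flat_rows)
fact = build_fact(flat_rows, geo_lookup)
```

Note that ZIP `10001` appears in both the US and DE rows, which is why ZIP alone would not be a safe key here, while country plus ZIP is. The region column is not part of the key; it is only checked for consistency, so bad future data surfaces as an error instead of silently duplicating fact rows.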
As for the hierarchies - that's totally up to you. You can spend a lot of effort to create and maintain hierarchies, or you can let the data model do the work for you.