Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
ptphuy
Frequent Visitor

Benefits and Issues of Snowflake schema vs Star schema

Hello everyone,

 

Currently, I have star schema in my data model which contains 1 fact table with 5 dimensions (& hierarchy in each dimention). With this model, the fact table contains too many rows (47mil rows) because it has to map to the lowest row of the hierarchy in each dimension. Therefore, I would like to redesign my model to snowflake schema which (I think) will reduce the amount of rows in the fact table because the fact table only need to map with the highest row of the hierarchy in the dimension. My question is that when I design my data model in snowflake schema, Is Power BI able to drill up and drill down value in the hierarchy of dimension? (For example, if I have hierarchy with 3 levels, I will break them down into 3 tables - the highest level will be mapped with fact table, the second level will be mapped with the first level and the third level will be mapped with the second level). In addition, Is there any disadvantages of snowflake schame in comparison with star schema? Please help. Thank you so much!

1 ACCEPTED SOLUTION
Anonymous
Not applicable

Start by removing the fact_row_id , this will greatly reduce model size and is not necesarry because the rest of the columns are joined to dimension tables. Try it and see the result!

 

The drill down is no problem, you can even create a drill down path for non-related dimensions in visuals. 

 

 

View solution in original post

8 REPLIES 8
Anonymous
Not applicable

Hi ptphuy,

 

beforing considering changing to a snowflake model I would spent some time finding out why your current model should be rebuild? Is there s performance issue? The reason I ask is that 47mil rows shouldn't be that big of a problem in Power BI. Because the engine will store unique column values there is a high compression on your data model. Also, the tabular model will store the data in memory so please check if you have enough memory.

 

Do you have columns with a lot of unique values? So for example an index, autonumber or concatenation of multiple columns in one column? Removing these can greatly improve performance. Also, do you use complex calculations or do you encouter issues with a simple SUM measure?

 

Switching to a snowflake means you get a lot more joins, that can decrease performance. 

 

Please share the reason why you consider changing your model. 

Hi @Anonymous,

 

Thanks for your advice. I would like to answer your questions below:

 

Beforing considering changing to a snowflake model I would spent some time finding out why your current model should be rebuild? Is there s performance issue? The reason I ask is that 47mil rows shouldn't be that big of a problem in Power BI. Because the engine will store unique column values there is a high compression on your data model. Also, the tabular model will store the data in memory so please check if you have enough memory. --> The reason why I rebuild my data model is that actually I have 5-6 fact tables which contain around 47mil rows up to 120mil rows. I need to combine them into one single dataset, and this leads to exceeding file size upload capacity of Power BI. I believe that the file size of dataset will be reduced by rebuilding data model (which will reduce number of rows of fact tables).

 

Do you have columns with a lot of unique values? So for example an index, autonumber or concatenation of multiple columns in one column? Removing these can greatly improve performance. Also, do you use complex calculations or do you encouter issues with a simple SUM measure? --> currently, fact tables only contains raw data with unique row_id for each dimension. (row_id is the lowest level of hierarchy in each dimension)

For example:

fact_row_id   dim_product_row_id    dim_customer_row_id  .....

1                    1                                  1

2                    2                                  1

3                    3                                  2

4                    4                                  4

 

Switching to a snowflake means you get a lot more joins, that can decrease performance. --> I agree with this, but I am not sure that Power BI support drill up & drill down the hierarchy when switching to a snowflake.

 

Please share the reason why you consider changing your model. 

Anonymous
Not applicable

Start by removing the fact_row_id , this will greatly reduce model size and is not necesarry because the rest of the columns are joined to dimension tables. Try it and see the result!

 

The drill down is no problem, you can even create a drill down path for non-related dimensions in visuals. 

 

 

Hi @Anonymous,

 

I would like to ask you one more question.

 

Start by removing the fact_row_id , this will greatly reduce model size and is not necesarry because the rest of the columns are joined to dimension tables. Try it and see the result!

--> After few days, I tried to rebuild my data model and actually snowflake schema didnt reduce the number of rows in the fact table. Because fact table still has to map with the lowest level of hierarchy in the dimension. I cant remove the fact_row_id because it represents each transaction of a day. So my question is if I dont remove fact_row_id, the number of the fact table is still the same in both star and snowflake schema, right?

Anonymous
Not applicable

That's correct, you still get the same detail level.

 

About the fact_row_id, is it connected to a dimension? So does it store valuable business information? If it is only used to count the transactions you could remove it and calculate the number of rows based on the dimensions and store that in a column. If this key represents a ordernumber or something this won't apply but it looked like it holds a technical number. 

 

It would look like this:

 

dim_product_row_id    dim_customer_row_id  ..... Transactions

1                                  1                                        1 

2                                  1                                        1

3                                  2                                        1

4                                  4                                        1

 

Knowing that unique column values are stored, this removes the cardinality of the fact_row_id.

Actually each fact_row_id has different transaction day, so that's why even I remove the fact_row_id, the number of rows in the fact table doesnt decrease. Thank you for your advice 🙂

Hi @Anonymous,

 

Thank you for your help. I will rebuild my data model and test it 🙂

Anonymous
Not applicable

Thanks, hope you succeed! Feel free to contact me if you have any further questions. Send me a PM if you want to take a look at the model itself. 

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.