Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.
Hello everybody
I have a question about data modelling and could not find a best practice how to handle it.
I wonder if it is better to create a broad fact table and more dimensions or vice versa.
By broad fact table with many dimensions I mean something like this:
The alternative would be just a slim fact table, but broader dimension tables:
Of course, our real world uses cases are a lot more complex than this small demo, however I think it shows the point.
My thoughts for model 1:
- Easier for report creators to find the right fields
- Easier to change or enhance
My thoughts for model 2:
- Use of Power BI hierarchies is easier
- Thanks to Display folders, it can still be very well organized
Have you got any literature about this or any best practices you would like to share.
It would be very welcome
This is a really broad question but the short answer is that generally, too many dimensions is bad. You can have many different dimensions but as you get near 20 or 30 total, you almost definitely have better modeling options at the source level. 1 dimension with 20 attributes is good, 20 dimensions with 1 attribute each is not so good. The problems that come with high numbers of dimensions are that it can create ambiguity and complicate slicing/filtering/relationships between tables as your model gets more complex.
I would highly recommend the SQLBI courses (intro courses are free and still contain tons of useful info) and any of their articles. They cover top level data modeling pretty well.. https://www.sqlbi.com/p/introduction-to-data-modeling-for-power-bi-video-course/
All right, thank you. Your answer mostly matches my experience and intuition.
But let's get it a little bit further and say we have two fact tables: One for actual sales and one for planned sales. However, the plan table is not as granular as the actual sales and only plans amount for month, team and customer. I added the green lines for these relationships
Problem with these green relationships: they are all many-to-many. So what would be a good way here. My thoughts for different options:
1. Extract the plan-dimensions from the dimension table and give them each their own table (which would look similar to my first approach in the original post)
2. Create a dummy entry in the dimension tables. This only has the information about the planned dimensions and NULLs for the unused. For example:
TeamID | TeamName | EmployeeId | EmployeeName
1 | Team A | 999 | Planning Dummy
This way, I still can compare actual vs plan
3. Somehow put the planning into the original fact table.
4. Ditch the whole star schema and go back to a snowflake.
Btw. I am looking more for a general discussion and tips, not for a solution for this special case. My actual use case is way bigger, but isn't it always 😉
Thanks and best regarss
Patrick
Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City
Check out the April 2024 Power BI update to learn about new features.
User | Count |
---|---|
110 | |
94 | |
82 | |
66 | |
58 |
User | Count |
---|---|
151 | |
121 | |
104 | |
87 | |
67 |