Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.
My data model consists of Students, Programs, Applications, Registrations, Graduations, and ECTS. A student apply to multiple programs in several periods (Academic Year + Academic Session). An application can have a registration associated with it, and if all goes well, the student will also graduate. Every year, the student earns a number of credits (ECTS). I may add more tables later, for example a "Grade Point Average" table (which would work the same as the ECTS table).
There's an Excel file with dummy data:
And I have created two PBIX files:
https://www.dropbox.com/s/dai8z6nunjcr5ey/AppRegGrad.pbix?dl=0
https://www.dropbox.com/s/bqwgfo5sy7m1vop/AppRegGrad%20-%20Star.pbix?dl=0
My initial approach was to create a flat (Frankenstein) table. It's very blunt, but it definitely works. I should mention that my real dataset will not be very large (less than a million records). I just left outer join merge everything, and the measures are pretty easy. Probably not a very future proof solution though.
My second attempt was to create a model that I assume is a snowflake. I'm directly linking Applications to Registrations by creating a key (Year+Session+Program+Student). However, the ECTS and Graduations table do not have the Session variable, so I need a different key (Year+Program+Student). On the other side, the Students_v2 does not have a Program variable, so for that I need yet another key (Year+Session+Student). It seems convuluted to me, but visually it looks logical, because there is a flow from admission to registration to graduation. Also, it works. Though I don't know yet how to calculate something such as "Study Length", as it involves variables from two different tables (Registrations.Date, Graduations.Graduation Date). I'm also concerned about using long text strings as keys, as I prefer using integers for keys (I have a php/MySQL background).
You can find these first two designs in https://www.dropbox.com/s/dai8z6nunjcr5ey/AppRegGrad.pbix?dl=0
My final attempt was to create a star schema.
See https://www.dropbox.com/s/bqwgfo5sy7m1vop/AppRegGrad%20-%20Star.pbix?dl=0
These seem to be recommended a lot, and it should be the way to go unless you have some good reasons not to use it. So far my reason is that I can't figure out how to make it work, but I don't think that's a good reason.
The trick here is to create one or more common dimensions. In this case Calendar and Programs.
However... my Students_v2 table does not have a Program variable, and the Date is not unique. Plus my ECTS and Graduations tables do not have the Academic Session variable. I could add those variables, but that means grabbing the data from the Applications table, probably through some merging.
I know I can take the easy way out because my dataset is small. But I'd like to do it the right way.
What would be the best approach here? And just in case the best approach does not involve a star schema, would it still be possible to use a star schema, or is "forcing" a common dimension a bad idea?
My question is related to:
https://community.powerbi.com/t5/Desktop/Relationship-between-Fact-to-Fact-tables/m-p/648556
Solved! Go to Solution.
I would start with a model like this and then make adjustments if needed.
Many thanks for pointing me in the right direction. Your answer along with some Celtic music from Adrian von Ziegler, helped me understand where I went wrong with my star schema.
Downloads:
https://www.dropbox.com/s/eh7uhgo2luo7h40/AppRegGrad_v2.png?dl=0
https://www.dropbox.com/s/2xlvjkwjdmw2som/AppRegGrad_v2.pbix?dl=0
Made a mistake. I wrote that I could add my Students_v2 data (Gender, Nationality) to the Applications data, but that does NOT work. Why not? Because it will not filter through to the other tables such as Registrations. In order to make this work, we once again have to create a common dimension. In the case of Students_v2 we need to build a key out of the Student Number and the date. Remove all other columns except Gender, and Nationality. And then deduplicate. Add the same keys to Applications, Registrations, Graduations, and ECTS (basically every table where you want to apply the Gender or Nationality filter to).
Star schema is best pratice but I cant see See https://www.dropbox.com/s/bqwgfo5sy7m1vop/AppRegGrad%20-%20Star.pbix?dl=0
Please check the link is shared
Dropbox informed me that PBIX files can't be previewed. So the only option is to download the file. Apologies for the inconvenience. I will add some screenshots next time I have a model related question.
Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City
Check out the April 2024 Power BI update to learn about new features.
User | Count |
---|---|
114 | |
97 | |
86 | |
70 | |
62 |
User | Count |
---|---|
151 | |
120 | |
103 | |
87 | |
68 |