Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
Doranov
Regular Visitor

Flatten, Snow, Star ("forcing" a common dimension?)

My data model consists of Students, Programs, Applications, Registrations, Graduations, and ECTS. A student apply to multiple programs in several periods (Academic Year + Academic Session). An application can have a registration associated with it, and if all goes well, the student will also graduate. Every year, the student earns a number of credits (ECTS). I may add more tables later, for example a "Grade Point Average" table (which would work the same as the ECTS table).

 

There's an Excel file with dummy data:

https://www.dropbox.com/scl/fi/nvgg829jqrfb22wpyr1k3/AppRegGrad.xlsx?dl=0&rlkey=5mj7m16zr38erlaljtoo...

And I have created two PBIX files:

https://www.dropbox.com/s/dai8z6nunjcr5ey/AppRegGrad.pbix?dl=0

https://www.dropbox.com/s/bqwgfo5sy7m1vop/AppRegGrad%20-%20Star.pbix?dl=0

 

My initial approach was to create a flat (Frankenstein) table. It's very blunt, but it definitely works. I should mention that my real dataset will not be very large (less than a million records). I just left outer join merge everything, and the measures are pretty easy. Probably not a very future proof solution though.

 

My second attempt was to create a model that I assume is a snowflake. I'm directly linking Applications to Registrations by creating a key (Year+Session+Program+Student). However, the ECTS and Graduations table do not have the Session variable, so I need a different key (Year+Program+Student). On the other side, the Students_v2 does not have a Program variable, so for that I need yet another key (Year+Session+Student). It seems convuluted to me, but visually it looks logical, because there is a flow from admission to registration to graduation. Also, it works. Though I don't know yet how to calculate something such as "Study Length", as it involves variables from two different tables (Registrations.Date, Graduations.Graduation Date). I'm also concerned about using long text strings as keys, as I prefer using integers for keys (I have a php/MySQL background).

 

You can find these first two designs in https://www.dropbox.com/s/dai8z6nunjcr5ey/AppRegGrad.pbix?dl=0

 

My final attempt was to create a star schema.

See https://www.dropbox.com/s/bqwgfo5sy7m1vop/AppRegGrad%20-%20Star.pbix?dl=0

These seem to be recommended a lot, and it should be the way to go unless you have some good reasons not to use it. So far my reason is that I can't figure out how to make it work, but I don't think that's a good reason.

The trick here is to create one or more common dimensions. In this case Calendar and Programs.

However... my Students_v2 table does not have a Program variable, and the Date is not unique. Plus my ECTS and Graduations tables do not have the Academic Session variable. I could add those variables, but that means grabbing the data from the Applications table, probably through some merging.

 

I know I can take the easy way out because my dataset is small. But I'd like to do it the right way.

What would be the best approach here? And just in case the best approach does not involve a star schema, would it still be possible to use a star schema, or is "forcing" a common dimension a bad idea?

 

My question is related to:

https://community.powerbi.com/t5/Desktop/Relationship-between-Fact-to-Fact-tables/m-p/648556

https://community.powerbi.com/t5/Desktop/Advice-required-on-the-Pros-and-Cons-on-normalising-a-very-...

 

1 ACCEPTED SOLUTION
AlexisOlson
Super User
Super User

I would start with a model like this and then make adjustments if needed.

AlexisOlson_0-1638029332247.png

View solution in original post

5 REPLIES 5
AlexisOlson
Super User
Super User

I would start with a model like this and then make adjustments if needed.

AlexisOlson_0-1638029332247.png

Many thanks for pointing me in the right direction. Your answer along with some Celtic music from Adrian von Ziegler, helped me understand where I went wrong with my star schema.

 

  • My Students_v2 table was basically a copy of Applications, except with different variables. So I should either add those to Applications, or use the v1 version where the Student Number variable is unique. That does involve making a choice; do I take the initial or the last known values? To clarify, a student's nationality can change over time. My v2 table supports those changes where the v1 table does not.
  • In theory I could also merge ECTS with Registrations, as there is a 1-to-1 (logical) relationship. A student can only have credits if the student has a registration, and each registration always has credits. So having a separate table for this might be overkill. To make it work as a separate table, I had to add a "Registration Date" variable. This is because of that pesky Academic Session.
  • The Graduations table also needed an extra date variable. One may wonder why, because there's already a Graduate Date. However, I need a date that corresponds to the initial registration. I called it the Priority Date (stole this term from a patent database). This way we can answer the question: "How many students who started in January 2019 in program 1000 graduated?". We can even easily calculate the Study Length. It's the Graduation Date minus the Priority Date.
  • So where does this Priority Date come from? It's calculated on the Registrations data. This time I cheated and added it to the Excel file. But you can also get it with a simple Group By and taking the minimum date.

Downloads:

https://www.dropbox.com/s/eh7uhgo2luo7h40/AppRegGrad_v2.png?dl=0

https://www.dropbox.com/scl/fi/wrtg696a0v8mtll2y0qhg/AppRegGrad_v2.xlsx?dl=0&rlkey=br1lbq0j71s2yds87...

https://www.dropbox.com/s/2xlvjkwjdmw2som/AppRegGrad_v2.pbix?dl=0

 

Made a mistake. I wrote that I could add my Students_v2 data (Gender, Nationality) to the Applications data, but that does NOT work. Why not? Because it will not filter through to the other tables such as Registrations. In order to make this work, we once again have to create a common dimension. In the case of Students_v2 we need to build a key out of the Student Number and the date. Remove all other columns except Gender, and Nationality. And then deduplicate. Add the same keys to Applications, Registrations, Graduations, and ECTS (basically every table where you want to apply the Gender or Nationality filter to).

speedramps
Super User
Super User

Star schema is best pratice but I cant see See https://www.dropbox.com/s/bqwgfo5sy7m1vop/AppRegGrad%20-%20Star.pbix?dl=0 

 

Please check the link is shared

 

Dropbox informed me that PBIX files can't be previewed. So the only option is to download the file. Apologies for the inconvenience. I will add some screenshots next time I have a model related question.

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.