Solved: Re: Question on building a proper data model

Hi, I would like to know whether my data modeling is working for Power BI.

The dataset I am using is training course for students and corporate. The original data has 3 tables separated by individual program. The purpose of my visualization is to analyze the 3 programs from all students in a single dashboard.

Here is the original data after being imported to Power BI:

original data.png

Here is the data pre-processing:

- Remove unneeded column

DPT table

Remove column – No, Date, Quarter DTP table Remove column – Count, Email, Date LLD table Remove column – Email, To calculate, Learning Hours

- Rename column name & impute missing value with “Not given”

DPT table

Trainee = Name, Training Provider = Provider, Course name = Course, Focus area = Domain DTP table Participant Name = Name, Event/Training Name = Course, Training providers = Provider

- Create new column and impute them with “No given” and put in same position (for append tables later)

DTP table Level LLD table Company, Provider, Level

- Create a new column called Program and impute value as it program name for each row.

Post cleaned table:

post cleaned table.png

After appending the 3 tables and calling it Master:

post append.png

Then, I duplicate the Master table to create Student, Provider and Program table. In each table remove irrelevant columns, remove duplicates and create unique ID.

Final data model:

Final data model.png

The focus is the Program, Provider and Student tables. The rest of the tables will be deactivated the relationship when creating calculated columns and measures before I make any correction to the data model.

The Provider table can be connected to Student and Program tables using provider column.

Is there any proper approach to build the data model?

From my data model in the last picture, does it mean that the Provider table is a fact table while the Student and Program tables are dimensions?

1 ACCEPTED SOLUTION

Please see my response, keeping in mind that I have not seen your data. There are probably a few variations on your model that could work well.

1. In the query editor, you shouldn't have to worry about circular references. You could merge Provider into your Program table.

2. The Master table would be your fact table and you could have two dimension tables (student and Program).

3. Transforming your 3 tables and appending them is a good thing to do. Just disable load on your original 3 tables.

4. I think having Program table is good, but it looks like provider is just an attribute of Program, so I would merge that in. Your Student table should not have the Provider column or any other Program metadata.

5. You can call the Master table whatever you want, but I expect it is an Assignments table. Each row connects a student to a Program, correct?

Pat

Did I answer your question? Mark my post as a solution! Kudos are also appreciated!

To learn more about Power BI, follow me on Twitter or subscribe on YouTube.

@mahoneypa HoosierBI on YouTube

View solution in original post

4 REPLIES 4

Please see my response, keeping in mind that I have not seen your data. There are probably a few variations on your model that could work well.

1. In the query editor, you shouldn't have to worry about circular references. You could merge Provider into your Program table.

2. The Master table would be your fact table and you could have two dimension tables (student and Program).

3. Transforming your 3 tables and appending them is a good thing to do. Just disable load on your original 3 tables.

5. You can call the Master table whatever you want, but I expect it is an Assignments table. Each row connects a student to a Program, correct?

Pat

Did I answer your question? Mark my post as a solution! Kudos are also appreciated!

To learn more about Power BI, follow me on Twitter or subscribe on YouTube.

@mahoneypa HoosierBI on YouTube

Hi @mahoneypat ,

Thank you for the explanation.

I hope to learn from you and hope you could check on the data and advise me better. I have sent you a private message in here for the modfied dataset.

Thank you and looking forward to your reply.

Without seeing your actual data, here are some comments:

It is a good thing to transform and then append the 3 tables. However, once you do that, you can disable load on the original 3 tables as they are no longer needed (right click on the query in query editor and uncheck 'Enable load'.
Instead of duplicating the Master query, you can choose reference instead (also a right click option). This will improve overall refresh performance as you are not repeating the same transformations multiple times.
It is also good to create dimension tables vs. one big table, but you just need to keep the relationship keys in your fact table (e.g., you don't need Provider, just Provider ID).
Also, I would merge Provider into your Program table, and relate Program to master. In then end, I think you need 2 dim tables (Program and Student) and the one fact table (Master). But there is no one right answer as long as you make a star schema that enables the analysis you need.

Pat

Did I answer your question? Mark my post as a solution! Kudos are also appreciated!

To learn more about Power BI, follow me on Twitter or subscribe on YouTube.

@mahoneypa HoosierBI on YouTube

@mahoneypat

Thanks for the reply.
I would like to confirm few things.
1. I have Course ID in Program table and Student ID in Student table, I cannot merge the IDs in both tables to prevent circular relationship. Is this correct?

2. In star schema, the middle table is fact. In my final data model, the Provider table is fact but it does not contain any measure/data type in numerical form that allow me to do aggregation. Is this consider as fact?

3. Is there any best practise or method to start off to build data model from multiple identical data source (for example Excel from entity A, B, C and etc) because each excel is recorded by different people/department.

4. In your fourth suggestion, would it be advisorable to seperate the individual table from a wide big table based on their theme? Thus, having Provider and Program tables is advisable?

5. Still on your fourth suggestion, I may also can have Student, Provider, Program tables and have relationship with Master table by individual key on it to make it as a fact table. However, what is the Master table should I called it since Master is meaningless to the table it is just the result of the appending the 3 individual tables.