The dataset I am using is training course for students and corporate. The original data has 3 tables separated by individual program. The purpose of my visualization is to analyze the 3 programs from all students in a single dashboard.
Here is the original data after being imported to Power BI:
Here is the data pre-processing:
- Remove unneeded column
DPT table
Remove column – No, Date, Quarter DTP table Remove column – Count, Email, Date LLD table Remove column – Email, To calculate, Learning Hours
- Rename column name & impute missing value with “Not given”
DPT table
Trainee = Name, Training Provider = Provider, Course name = Course, Focus area = Domain DTP table Participant Name = Name, Event/Training Name = Course, Training providers = Provider
- Create new column and impute them with “No given” and put in same position (for append tables later)
DTP table Level LLD table Company, Provider, Level
- Create a new column called Program and impute value as it program name for each row.
Post cleaned table:
After appending the 3 tables and calling it Master:
Then, I duplicate the Master table to create Student, Provider and Program table. In each table remove irrelevant columns, remove duplicates and create unique ID.
Final data model:
The focus is the Program, Provider and Student tables. The rest of the tables will be deactivated the relationship when creating calculated columns and measures before I make any correction to the data model.
The Provider table can be connected to Student and Program tables using provider column.
Is there any proper approach to build the data model?
From my data model in the last picture, does it mean that the Provider table is a fact table while the Student and Program tables are dimensions?