11-13-2022 13:17 PM - last edited 11-13-2022 13:19 PM
Analyzing the Titanic Kaggle dataset to identify which class was most likely to survive the Titanic disaster. Data source: DataDNA October Challenge
The provided data is a themed dataset and may not accurately reflect the actual events of the Titanic disaster. The Titanic sample dataset contains records for only 418 passengers.
Identify which class was most likely to survive.
To find out how well each characteristic correlates with survival, I based my approach on the characteristics available in the dataset.
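One way to compare characteristics against survival is to compute the share of survivors within each group. The snippet below is only a sketch with pandas: the rows are made-up toy values, not the challenge data, and the column names `Pclass`, `Sex` and `Survived` are assumed from the usual Kaggle Titanic layout. The helper `survival_rate_by` is a hypothetical name introduced for illustration.

```python
import pandas as pd

# Toy rows for illustration only; NOT the DataDNA challenge data.
# Column names are assumed to follow the Kaggle Titanic convention.
df = pd.DataFrame({
    "Pclass":   [1, 1, 2, 2, 3, 3, 3, 3],
    "Sex":      ["female", "male", "female", "male",
                 "female", "male", "male", "male"],
    "Survived": [1, 0, 1, 0, 1, 0, 0, 0],
})

def survival_rate_by(frame: pd.DataFrame, column: str) -> pd.Series:
    """Return the mean of Survived (0/1) per group, highest rate first."""
    return (
        frame.groupby(column)["Survived"]
        .mean()
        .sort_values(ascending=False)
    )

# Share of survivors per passenger class and per sex.
rate_by_class = survival_rate_by(df, "Pclass")
rate_by_sex = survival_rate_by(df, "Sex")

print(rate_by_class)
print(rate_by_sex)
```

Because `Survived` is coded as 0 or 1, the group mean is the survival rate of that group, so the same helper can be reused for any categorical column.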
I checked the dataset for errors or possibly inaccurate values in each characteristic, then either corrected those values or excluded the affected samples.
Age data in this dataset is incomplete: it contains 86 null values, which are excluded when measuring the correlation between age and survival rate. The cabin and ticket features are dropped from the analysis because they have too few data points.
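The cleaning described here could look roughly like this in pandas. It is a sketch under assumptions, not the steps actually used in the report: the rows are invented toy values, and the column names `Age`, `Cabin`, `Ticket` and `Survived` are assumed from the usual Kaggle Titanic layout.

```python
import numpy as np
import pandas as pd

# Toy rows for illustration only; NOT the DataDNA challenge data.
# Column names are assumed to follow the Kaggle Titanic convention.
df = pd.DataFrame({
    "Survived": [1, 0, 1, 0, 1, 0],
    "Age":      [22.0, np.nan, 35.0, 54.0, np.nan, 4.0],
    "Cabin":    [None, "X1", None, None, None, None],
    "Ticket":   ["T1", "T2", "T3", "T4", "T5", "T6"],
})

# Drop the features that are left out of the analysis.
cleaned = df.drop(columns=["Cabin", "Ticket"])

# Count missing ages before deciding how to handle them.
missing_age = int(cleaned["Age"].isna().sum())

# Exclude rows with a null age only for the age-related measurement,
# so the other characteristics can still use the full set of rows.
age_rows = cleaned.dropna(subset=["Age"])
age_corr = age_rows["Age"].corr(age_rows["Survived"])

print(f"missing ages: {missing_age}, rows used for age: {len(age_rows)}")
print(f"age vs. survival correlation: {age_corr:.3f}")
```

Keeping `cleaned` and `age_rows` as separate frames means the null ages only shrink the sample for the age measurement, not for the class or sex breakdowns.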