In the dashboard above, I analyzed each team's performance during the group stage of the World Cup to predict the winner of each game in the round of 16.
The Data
I collected the data from
https://www.foxsports.com/soccer/fifa-world-cup/stats
After careful consideration of which features to use, I settled on the features in the table below to build my machine learning model. (My past soccer experience played a role in deciding which data matters most.)
The Algorithm
I used Microsoft Azure Machine Learning Studio to create the model. Choosing a regression model was a matter of trial and error: I tried several to see which one gave the best prediction values, and settled on the Decision Forest Regression model from the Studio's built-in algorithms.
Decision trees are non-parametric models that perform a sequence of simple tests for each instance, traversing a binary tree data structure until a leaf node (decision) is reached.
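To make the "sequence of simple tests" concrete, here is a minimal Python sketch of how a single tree is traversed. It is only an illustration: the `Node` class, the two features (goals scored and goals conceded per game), the thresholds, and the leaf values are all made up for the example and are not taken from my Azure model.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Node:
    """One node of a binary decision tree (hypothetical structure)."""
    feature: Optional[int] = None    # index of the feature tested at this node
    threshold: float = 0.0           # go left if x[feature] <= threshold
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    value: Optional[float] = None    # set only on leaf nodes (the decision)


def predict(node: Node, x: Sequence[float]) -> float:
    """Walk from the root to a leaf, applying one simple test per level."""
    while node.value is None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.value


# Made-up tree: feature 0 = goals scored per game, feature 1 = goals conceded per game.
tree = Node(
    feature=0, threshold=1.5,
    left=Node(value=0.3),
    right=Node(
        feature=1, threshold=1.0,
        left=Node(value=0.8),
        right=Node(value=0.55),
    ),
)

# A team scoring 2.0 and conceding 0.5 goals per game ends in the 0.8 leaf.
score = predict(tree, [2.0, 0.5])
```

Each prediction only costs one comparison per level of the tree, which is where the efficiency mentioned in this section comes from.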
Decision trees have these advantages:
- They are efficient in both computation and memory usage during training and prediction.
- They can represent non-linear decision boundaries.
- They perform integrated feature selection and classification and are resilient in the presence of noisy features.
This regression model consists of an ensemble of decision trees. Each tree in a regression decision forest outputs a Gaussian distribution as a prediction. An aggregation is performed over the ensemble of trees to find a Gaussian distribution closest to the combined distribution for all trees in the model.
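One common way to do this kind of aggregation is moment matching: treat the per-tree Gaussians as an equally weighted mixture and pick the single Gaussian with the same mean and variance. The sketch below assumes that approach and equal tree weights. I have not verified that Azure Machine Learning Studio does exactly this internally, and the function name `combine_gaussians` is my own.

```python
from typing import Sequence, Tuple


def combine_gaussians(preds: Sequence[Tuple[float, float]]) -> Tuple[float, float]:
    """Collapse per-tree (mean, variance) predictions into one Gaussian.

    Assumption: the trees form an equally weighted mixture, and the combined
    Gaussian is chosen by matching the mixture's mean and variance.
    """
    if not preds:
        raise ValueError("need at least one tree prediction")
    n = len(preds)
    mean = sum(m for m, _ in preds) / n
    # E[x^2] of the mixture is the average of (variance + mean^2) over the trees.
    second_moment = sum(v + m * m for m, v in preds) / n
    variance = second_moment - mean * mean
    return mean, variance


# Two hypothetical trees that disagree: the combined variance grows to reflect
# both each tree's own uncertainty and the spread between the trees.
mean, variance = combine_gaussians([(1.0, 0.5), (3.0, 0.5)])
```

With this scheme the combined mean is just the average of the tree means, while the combined variance is the average tree variance plus the variance of the tree means.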