a data science blog

A Tale of the Titanic Through the Eyes of “Data”

For this analysis I used a data set published on http://biostat.mc.vanderbilt.edu/twiki/bin/view/Main/DataSets

The data covers 1309 passengers who were on board the Titanic; crew members are excluded. It contains attributes such as age, class, gender, and fare.

I used R to analyze and visualize this data. Age was missing for 263 passengers, so I used a regression tree model to fill in these missing values.
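
To give a feel for how a regression tree can fill in missing ages, here is a minimal sketch in Python rather than R. It is not the model used for this post: the predictors and settings of the original tree are not described here, so the sketch assumes a single predictor (`pclass`), a single split, and a handful of invented rows. The function names `fit_stump` and `impute_age` are hypothetical.

```python
from statistics import mean

def sse(values):
    """Sum of squared errors of the values around their mean."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def fit_stump(rows, feature, target):
    """Fit a one-split regression tree: pick the threshold on `feature`
    that minimises the total squared error of `target` on both sides."""
    known = [r for r in rows if r[target] is not None]
    levels = sorted({r[feature] for r in known})
    best = None
    for lo, hi in zip(levels, levels[1:]):
        threshold = (lo + hi) / 2
        left = [r[target] for r in known if r[feature] <= threshold]
        right = [r[target] for r in known if r[feature] > threshold]
        cost = sse(left) + sse(right)
        if best is None or cost < best[0]:
            best = (cost, threshold, mean(left), mean(right))
    if best is None:
        raise ValueError("need at least two distinct feature values")
    return best[1:]  # (threshold, left mean, right mean)

def impute_age(rows, stump, feature="pclass"):
    """Return copies of the rows with missing ages replaced by the
    mean age of the leaf each row falls into."""
    threshold, left_mean, right_mean = stump
    filled = []
    for r in rows:
        r = dict(r)
        if r["age"] is None:
            r["age"] = left_mean if r[feature] <= threshold else right_mean
        filled.append(r)
    return filled

# Invented toy rows, NOT taken from the real data set.
rows = [
    {"pclass": 1, "age": 40.0}, {"pclass": 1, "age": 38.0},
    {"pclass": 2, "age": 30.0}, {"pclass": 2, "age": 28.0},
    {"pclass": 3, "age": 22.0}, {"pclass": 3, "age": 26.0},
    {"pclass": 1, "age": None}, {"pclass": 3, "age": None},
]

stump = fit_stump(rows, "pclass", "age")
filled = impute_age(rows, stump)
```

A full regression tree repeats this split search inside each resulting group and can use several predictors at once; the sketch stops after one split to keep the idea visible.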

Being a woman, being under 18, and holding a class 1 ticket were some of the factors that helped people on the Titanic survive.

On 15 April 1912, the British passenger liner Titanic, travelling from Southampton, UK to New York City, USA, sank after colliding with an iceberg. As the giant ship broke apart, the lives of the 1309 passengers in this data set were suddenly in grave danger. The ship was carrying people from different classes of society and different age groups, and many were travelling with family members, including parents, siblings, spouses, and children. What followed was one of the greatest disasters of the century: of these 1309 passengers, only 500 survived. What helped these 500 people escape death? Let’s see what the “Data” reveals.

Of the 1309 passengers, 466 (approximately 36%) were female and 843 (around 64%) were male.

The ship was carrying people of all ages, from a 2-month-old child to an 80-year-old.

To learn more about how age relates to survival, I binned the ages into three categories: people below 18 years of age were labeled ‘Children’, people above 60 years of age were labeled ‘Senior Citizen’, and the rest were labeled ‘Adult’. Below we can see the number of people aboard in each of the 3 categories.
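
The binning rule can be sketched in a few lines. This is an illustrative Python version, not the R code behind this post. The function name `age_group` and the sample ages are made up, and the sketch assumes that ages of exactly 18 and exactly 60 count as ‘Adult’.

```python
from collections import Counter

def age_group(age):
    """Label an age in years using the cut-offs described above."""
    if age < 18:
        return "Children"
    if age > 60:
        return "Senior Citizen"
    return "Adult"  # assumption: 18 and 60 themselves count as Adult

# Hypothetical ages chosen to exercise both boundaries.
ages = [0.17, 17, 18, 35, 60, 61, 80]
groups = [age_group(a) for a in ages]
counts = Counter(groups)
```

Counting the labels with `Counter` gives the number of passengers per category, which is the quantity plotted for the real data.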

Similarly, I analyzed the number of people who embarked from each city and travelled in each class.

As we can see, most people embarked at Southampton, followed by Cherbourg and Queenstown. If we look at the number of people in each class and their gender distribution, as shown in the figure below, class 3 had the highest number of people of both genders, followed by class 1 and class 2.

Having looked at these descriptive statistics about the people who embarked on the Titanic, let’s look at some of the factors that contributed to the survival of the 500 people.

So what made these 500 people lucky?

Gender

As we can see from the graph above, 339 of the 466 females on board survived, a survival rate of approximately 73%, whereas only 161 of the 843 males survived, a mere 19%. One possible reason for this huge difference is that there may have been many married women on board, and since there were very few lifeboats (only 20, in fact), husbands might have tried to save their wives first at the cost of their own lives. Being female was therefore one of the most important factors for survival.
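
The survival rates quoted here follow directly from the counts. As a quick arithmetic check, the sketch below recomputes them in Python; the helper name `survival_rate` is made up for illustration, and the counts are the ones given in this post.

```python
def survival_rate(survived, total):
    """Survival rate as a percentage of the group total."""
    return 100 * survived / total

# (survived, total) per gender, as reported in this post.
counts = {"female": (339, 466), "male": (161, 843)}

rates = {g: round(survival_rate(s, n), 1) for g, (s, n) in counts.items()}
overall = round(survival_rate(500, 1309), 1)
```

The same helper works for any other grouping (class, age group, port) once the survived and total counts for each group are known.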

Class

As we can see in the graph above, around 200 of the 323 people in class 1 survived, which is approximately 62%. Class 2 followed, with 119 of 277 people surviving, a rate of approximately 43%. Class 3 passengers were not so lucky, with a survival rate of just 25%. The reason for such a large difference could be where the three classes were located on the ship. It may have been easier for class 1 and class 2 passengers to reach the lifeboats than for class 3 passengers.

Age

Age was also one of the most important factors for survival, as can be seen from the graph above. For children (under 18) the survival rate was almost 51%, followed by adults at approximately 38%. Senior citizens had a very low survival rate, with only 18 of 65 surviving. The high survival rate among children suggests that they were given preference over adults and senior citizens when the lifeboats were loaded.

Point of Entry

Point of entry was also one of the factors associated with survival. As we can see from the plot above, around 150 of the 270 people who embarked at Cherbourg survived, the highest survival rate of the three entry points. One possible reason is that many of the people who embarked at Cherbourg were in class 1 and were therefore well placed to reach the lifeboats.
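
One way to check the class 1 explanation is to cross-tabulate port of embarkation against class and compare the share of class 1 passengers at each port. The sketch below shows the mechanics in Python on a few invented rows. The rows are not taken from the real data set, and the helper name `class_share` is hypothetical.

```python
from collections import Counter

# Hypothetical (port, class) pairs, invented for illustration only.
passengers = [
    ("Cherbourg", 1), ("Cherbourg", 1), ("Cherbourg", 2), ("Cherbourg", 3),
    ("Southampton", 1), ("Southampton", 2), ("Southampton", 3),
    ("Southampton", 3), ("Southampton", 3),
    ("Queenstown", 3), ("Queenstown", 3),
]

# Cross-tabulation: number of passengers for each (port, class) pair.
table = Counter(passengers)

def class_share(table, port, pclass):
    """Fraction of a port's passengers who travelled in the given class."""
    port_total = sum(n for (p, _), n in table.items() if p == port)
    return table[(port, pclass)] / port_total

first_class_share = {
    port: class_share(table, port, 1)
    for port in ("Cherbourg", "Southampton", "Queenstown")
}
```

If the real data showed a clearly higher class 1 share for Cherbourg than for the other two ports, that would support the explanation; the toy rows here only demonstrate the calculation.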

Family size

I created a new attribute, family size, by adding up the number of siblings, spouses, children, and parents travelling with a passenger, plus the passenger themselves. As we can see, the survival rate was better for people with a family size of 2 to 4 and lower for people with a family size greater than 4. It is reasonable to guess that people in larger families tried to gather all their family members before attempting to save themselves. A family size of 1 means the person was travelling alone, and the survival rate for this category is very poor, as can be seen from the graph above. Many of these solo travelers may have purchased a class 3 ticket, which would explain the lower survival rate.
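
The family size attribute is a simple sum. The Python sketch below assumes the data set stores the counts in two columns, `sibsp` (siblings and spouses aboard) and `parch` (parents and children aboard); those column names are not given in this post, and the bucket labels are made up to mirror the three ranges discussed above.

```python
def family_size(sibsp, parch):
    """Siblings/spouses plus parents/children plus the passenger."""
    return sibsp + parch + 1

def family_bucket(size):
    """Group family sizes into the three ranges discussed above."""
    if size == 1:
        return "alone"
    if size <= 4:
        return "small (2 to 4)"
    return "large (more than 4)"

# Hypothetical passengers as (sibsp, parch) pairs.
examples = [(0, 0), (1, 0), (1, 2), (3, 2)]
sizes = [family_size(s, p) for s, p in examples]
buckets = [family_bucket(n) for n in sizes]
```

Bucketing the sizes this way makes it easy to compute one survival rate per group instead of one per individual family size.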

From the above analysis we can see that a women-and-children-first strategy was applied in the Titanic rescue operation, and that people in class 1 and class 2 had an advantage over people in class 3 in reaching the lifeboats and ultimately in surviving. People in small families (2 to 4 members) may have been able to gather all their family members quickly, and so had a higher survival rate than those in larger families.