A Python Data Visualization Codebook on Suicide Deaths in India in 2018
Hello, world!! Our Earth herself is the mother of data. Data is beneath your feet and above your head: everywhere. It is all freely available, but you need to infer knowledge from this abundance of data. That inference is what we need to gain knowledge, to decide our goals and to plan a successful roadmap. So data analysis, without a doubt, plays a crucial role in our lives.
Today we will infer some knowledge about the high number of suicide deaths in India in 2018. In 2018, 134,516 people died by suicide in India, which is indeed a big concern to address. The analysis is performed with pandas, matplotlib, seaborn and plotly. So let's jump into the coding:
Import Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
If you are working in a Jupyter notebook, you can add one more line along with the imports: %matplotlib inline, which shows your graphs and charts inside the notebook itself.
Pro Tip: It’s always good to save your notebook somewhere in the cloud. Jovian is a platform for sharing and collaborating on Jupyter notebooks and data science projects. You can add your notebooks to your Jovian.ml account.
import jovian
Setting Up Environment
The code below sets up the environment for your graphs. You can choose your own style. See more configuration options in the official seaborn documentation.
import matplotlib
matplotlib.rcParams['figure.dpi'] = 200
matplotlib.rcParams['savefig.dpi'] = 200
sns.set_style('whitegrid')
sns.set_context('paper')
Reading Dataset File
You can read the tables from a range of files using the pandas library. Here we will read the data from a CSV file.
#reading dataset
suicide_df = pd.read_csv('NCRB-ADSI-2018-Table-2.5.csv')
Let’s see the first ten rows of the suicide_df dataframe.
from IPython.display import display
with pd.option_context("display.max_rows", None, "display.max_columns", None):
    display(suicide_df.head(10))
Data Visualization
Now that the data is loaded in our dataframe, we can start visualizing different aspects of it. The best way to extract information is to ask questions about the data, like:
- Question 1: Visualize the gender-wise deaths in India
- Question 2: Gender-wise suicide cases (deaths) in all states
- Question 3: Mortality rate in different States and Union Territories
Let’s address each question one by one:
Here, we will select a specific row with a few columns.
total_deaths = suicide_df[suicide_df['State/UT/City'] == 'Total (All India)'][['Total - Male', 'Total - Female', 'Total - Transgender']]
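The selection above does two things at once: it filters rows with a boolean mask and then picks columns by name. Here is a minimal sketch of the same idea on a tiny made-up dataframe. The names demo_df, mask and demo_total are hypothetical and the numbers are invented; only the column names mirror the ones used in this post.

```python
import pandas as pd

# Made-up numbers; only the column names mirror the ones used in this post.
demo_df = pd.DataFrame({
    'State/UT/City': ['State A', 'State B', 'Total (All India)'],
    'Total - Male': [10, 20, 30],
    'Total - Female': [4, 6, 10],
    'Total - Transgender': [0, 1, 1],
})

# Step 1: a boolean mask that is True only for the row we want.
mask = demo_df['State/UT/City'] == 'Total (All India)'

# Step 2: .loc picks rows by the mask and columns by name in one go.
gender_cols = ['Total - Male', 'Total - Female', 'Total - Transgender']
demo_total = demo_df.loc[mask, gender_cols]

# .values[0] turns the single remaining row into a plain array of numbers.
print(demo_total.values[0])
```

Using .loc with a mask and a column list is equivalent to the chained brackets used above; it is just one way to spell the same selection.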
Get the values and labels for the bar chart. Give a title.
title='Gender-wise Total deaths in all over India- 2018'
label = ['Male', 'Female', 'Transgender']
x = total_deaths.values[0]
Now it is time to plot this piece of information in a bar chart.
Pro Tip: Always try to write a function rather than repeat a piece of code.
def get_bar_chart(title, label, values, text_rotation=270):
    plt.figure(figsize=(12, 6))
    plt.title(title, fontsize=14)
    plt.barh(label, values)
    plt.box(False)
    plt.yticks(rotation=text_rotation, va='center')
    # Annotate each bar with its value
    for index, value in enumerate(values):
        plt.text(value, index, str(value), rotation=text_rotation)

get_bar_chart(label=label, values=x, title=title)
Save your work in your jovian.ml account.
jovian.commit(project='data_visualization_suicide_in_India_2018')
Now, we will tackle the next question. The suicide_df dataframe holds all sorts of data, so first we take out the ‘State’ related data and save it in another dataframe, state_df.
state_df = suicide_df[suicide_df.Category == 'State']
Note: state_df is a filtered slice of the suicide_df dataframe, so renaming or deleting its columns in place may produce a SettingWithCopyWarning. To avoid this, take an explicit copy with the .copy() method. The above line can then be written as:
state_df = suicide_df[suicide_df.Category == 'State'].copy()
Now, we can rename the ‘State/UT/City’ column to ‘State’ using the .rename() method:
state_df.rename({'State/UT/City':'State'}, axis=1, errors="raise", inplace=True)
state_df.head()
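To see that a copied dataframe really is independent of the original, here is a small sketch on made-up rows. The names demo_df and demo_state_df are hypothetical; the column names mirror the post's.

```python
import pandas as pd

# Made-up rows; 'Category' and 'State/UT/City' mirror the post's column names.
demo_df = pd.DataFrame({
    'Category': ['State', 'City', 'State'],
    'State/UT/City': ['State A', 'City X', 'State B'],
})

# Filter and copy, so the result is an independent dataframe.
demo_state_df = demo_df[demo_df.Category == 'State'].copy()

# In-place changes now touch only the copy.
demo_state_df.rename(columns={'State/UT/City': 'State'}, inplace=True)

print(list(demo_state_df.columns))  # ['Category', 'State']
print(list(demo_df.columns))        # ['Category', 'State/UT/City']
```

The original dataframe keeps its column name, which is exactly what we want when several derived dataframes come from the same source.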
Now, select the useful columns and rows from the state_df dataframe.
statewise_total_deaths = state_df[state_df.State != 'Total (States)'][['State', 'Total - Male', 'Total - Female', 'Total - Transgender','Total - Total']]
Here, we have just removed the ‘Total (States)’ row from the state_df dataframe and kept the columns we need. Now, we will sort the statewise_total_deaths dataframe by the ‘Total - Total’ column:
statewise_total_deaths = statewise_total_deaths.sort_values('Total - Total', ascending=False)
In the next two steps, we first .drop() the ‘Total - Total’ column, since the statewise_total_deaths dataframe is already sorted in descending order, and then reshape the dataframe from wide to long form using the .melt() method. It is always recommended to look through the official documentation.
statewise_total_deaths = statewise_total_deaths.drop(columns=['Total - Total'])  # drop the total deaths column after sorting
statewise_total_deaths.head()
statewise_total_deaths = statewise_total_deaths.melt('State', var_name='Genders', value_name='Deaths')
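If .melt() is new to you, the sketch below shows what it does to a tiny wide table. The dataframe names wide_df and long_df and all the numbers are made up for illustration.

```python
import pandas as pd

# Made-up counts in the same wide layout: one column per gender.
wide_df = pd.DataFrame({
    'State': ['State A', 'State B'],
    'Total - Male': [10, 20],
    'Total - Female': [4, 6],
})

# melt keeps 'State' as the identifier and stacks the remaining columns
# into (Genders, Deaths) pairs: one row per state and gender.
long_df = wide_df.melt('State', var_name='Genders', value_name='Deaths')

print(long_df)
```

Two states and two gender columns become four rows. This long form is what seaborn needs to use ‘Genders’ as the hue.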
Now we can visualize this piece of information using seaborn plot methods.
g = sns.catplot(y="State", x="Deaths", hue='Genders', data=statewise_total_deaths, kind='bar')
plt.subplots_adjust(top=0.9)
plt.box(False)
g.fig.suptitle('Statewise Suicide Cases(deaths) in India- 2018');
Now we have an idea of which states have a high number of suicide deaths. In terms of total and male deaths, Maharashtra is far ahead of the other states, whereas in female suicide cases West Bengal leads.
Don't forget to save your work on your jovian.ml account.
jovian.commit()
In the third question, we need to find the mortality rate in different States and Union Territories. The mortality rate is defined as the ratio of suicide deaths to the total population of that state or union territory; here it is expressed per 100,000 people.
So, we need another dataset that has state-wise population data for 2018.
population_df = pd.read_csv('indian_states_population.csv')
Now, copy another dataframe from the suicide_df dataframe, excluding the ‘City’ category.
total_df = suicide_df[suicide_df.Category != 'City'][['State/UT/City', 'Total - Total']].copy()
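Now, rename the columns of total_df to the names used from here on, then merge the total_df and population_df dataframes:
# Rename to the names used from here on (assumes population_df has a 'State or union territory' column)
total_df = total_df.rename(columns={'State/UT/City': 'State or union territory', 'Total - Total': 'Suicide(deaths)'})
total_df = total_df.merge(population_df, on='State or union territory')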
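A plain merge keeps only the rows whose key appears in both dataframes, so a state spelled differently in the two files silently disappears. Below is a minimal sketch of one way to spot such rows, using a left merge with indicator=True. The dataframes deaths_df and pop_df and their numbers are made up for illustration.

```python
import pandas as pd

# Made-up frames; 'State B ' has a trailing space to simulate a name mismatch.
deaths_df = pd.DataFrame({
    'State or union territory': ['State A', 'State B ', 'State C'],
    'Suicide(deaths)': [10, 20, 30],
})
pop_df = pd.DataFrame({
    'State or union territory': ['State A', 'State B', 'State C'],
    'Population': [1000, 2000, 3000],
})

# A left merge keeps every row of deaths_df; indicator=True adds a '_merge'
# column saying whether a partner row was found in pop_df.
checked = deaths_df.merge(pop_df, on='State or union territory',
                          how='left', indicator=True)

unmatched = checked.loc[checked['_merge'] == 'left_only',
                        'State or union territory'].tolist()
print(unmatched)  # ['State B ']
```

If the list is empty, every state found its population row and the simpler merge is safe to use.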
So far, we have picked out a dataframe from suicide_df, renamed certain columns and merged it with another dataframe. Now we will compute the mortality rate for each state and union territory and save it in a new column:
total_df['Mortality rate(in 100K)'] = np.round((total_df['Suicide(deaths)'] / total_df['Population']) * 1e5, 4)
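As a quick sanity check of the formula, rate = deaths / population × 100,000, here is the same arithmetic in plain Python. The function name rate_per_100k and the numbers are hypothetical, chosen only to make the result easy to verify by hand.

```python
def rate_per_100k(deaths: int, population: int) -> float:
    """Deaths per 100,000 inhabitants, rounded to four decimals."""
    if population <= 0:
        raise ValueError("population must be positive")
    return round(deaths / population * 1e5, 4)


# Hypothetical state: 1,500 deaths in a population of 12 million.
print(rate_per_100k(1500, 12_000_000))  # 12.5
```

Expressing the rate per 100,000 people makes states of very different sizes comparable, which raw death counts are not.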
We could use this mortality rate column as is and plot its values on the map, but for a clearer color scale we take its base-10 logarithm.
total_df['Mortality scale(log10(x))'] = np.log10(total_df['Mortality rate(in 100K)'])
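To get a feel for what the log scale does, here is a small sketch with made-up rates (the list rates is hypothetical). Every tenfold increase in the rate adds exactly 1 on the log10 scale, so a few very high values no longer dominate the color range.

```python
import math

# Hypothetical rates per 100K, spanning two orders of magnitude.
rates = [0.5, 5.0, 50.0]

# Each tenfold step in the rate becomes a step of 1 on the log10 scale.
log_rates = [math.log10(r) for r in rates]

for rate, log_rate in zip(rates, log_rates):
    print(f"{rate:>5} -> {log_rate:.3f}")
```

Two things to keep in mind: rates below 1 give negative values, and a rate of exactly 0 has no logarithm (math.log10(0) raises a ValueError, while np.log10(0) returns -inf with a warning), so a region with zero deaths would need separate handling.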
Now let’s glance at the final dataframe we got:
total_df.head()
Now we will plot this information on a map. For that we need a .geojson file that contains the boundary coordinates of the states and UTs. You can download it from GitHub.
import json
with open('states_india.geojson', 'r') as f:
    indian_states = json.load(f)
Now let’s see the keys available in each feature of the geojson file:
indian_states['features'][0].keys()
Each entry in ‘features’ holds three keys: ‘type’, ‘geometry’ and ‘properties’. The ‘geometry’ key holds all the coordinates of a State or UT, whereas ‘properties’ holds further attributes such as the state name and state code. Now we will create a dictionary to map between the geojson objects and total_df. We also add one more key to each geojson ‘feature’ object, the state id, which is simply the state code from its ‘properties’.
map_dict = {}
for feature in indian_states['features']:
    feature['id'] = feature['properties']['state_code']
    map_dict[feature['properties']['st_nm']] = feature['id']
The last line of the loop fills the map_dict dictionary: the state name from ‘properties’ is the key and the state code is the value.
map_dict
Out[125]:
{'Telangana': 0,
'A & N Islands': 35,
'Andhra Pradesh': 28,
'Arunachal Pradesh': 12,
'Assam': 18,
'Bihar': 10,
'Chhattisgarh': 22,
'Daman & Diu': 25,
'Goa': 30,
'Gujarat': 24,
'Haryana': 6,
'Himachal Pradesh': 2,
'Jammu & Kashmir': 1,
'Jharkhand': 20,
'Karnataka': 29,
'Kerala': 32,
'Lakshadweep': 31,
'Madhya Pradesh': 23,
'Maharashtra': 27,
'Manipur': 14,
'Chandigarh': 4,
'Puducherry': 34,
'Punjab': 3,
'Rajasthan': 8,
'Sikkim': 11,
'Tamil Nadu': 33,
'Tripura': 16,
'Uttar Pradesh': 9,
'Uttarakhand': 5,
'West Bengal': 19,
'Odisha': 21,
'D & N Haveli': 26,
'Meghalaya': 17,
'Mizoram': 15,
'Nagaland': 13,
'Delhi': 7}
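The loop that builds map_dict can be tried out without the real file. The sketch below runs it on a hand-written miniature of the GeoJSON structure; demo_geojson and demo_map are hypothetical names, the state codes are made up, and the geometries are left empty because only ‘properties’ matters for the lookup.

```python
# A tiny hand-written stand-in for the real file: same keys, made-up codes.
demo_geojson = {
    'type': 'FeatureCollection',
    'features': [
        {'type': 'Feature', 'geometry': None,
         'properties': {'st_nm': 'State A', 'state_code': 1}},
        {'type': 'Feature', 'geometry': None,
         'properties': {'st_nm': 'State B', 'state_code': 2}},
    ],
}

demo_map = {}
for feature in demo_geojson['features']:
    # Promote the code to a top-level 'id' key on the feature itself.
    feature['id'] = feature['properties']['state_code']
    demo_map[feature['properties']['st_nm']] = feature['id']

print(demo_map)  # {'State A': 1, 'State B': 2}

# Every name should map to a distinct id, otherwise regions would collide.
assert len(set(demo_map.values())) == len(demo_map)
```

The top-level ‘id’ matters because plotly's choropleth matches the values passed in locations against each feature's ‘id’ by default.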
Now, using the .apply() method, we map each state name in total_df to its id through the map_dict dictionary:
total_df['id'] = total_df['State or union territory'].apply(lambda x: map_dict[x])
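One caveat: the lambda above raises a KeyError as soon as a state name is missing from the dictionary. Here is a minimal sketch of how such names could be found first, using Series.map, which returns NaN for unknown keys instead of failing. The names demo_lookup and names are made up for illustration.

```python
import pandas as pd

demo_lookup = {'State A': 1, 'State B': 2}
names = pd.Series(['State A', 'State B', 'State Z'])  # 'State Z' has no code

# .map() with a dict yields NaN for unknown keys instead of raising KeyError.
ids = names.map(demo_lookup)

missing = names[ids.isna()].tolist()
print(missing)  # ['State Z']
```

Any name that shows up in the missing list is spelled differently in the two sources and needs to be fixed before plotting.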
Now we can check it once again:
total_df.head()
Now we will plot this information in a map.
import plotly.express as px

fig = px.choropleth(total_df,
                    locations='id',
                    geojson=indian_states,
                    color='Mortality scale(log10(x))',
                    hover_name='State or union territory',
                    hover_data=['Mortality rate(in 100K)', 'Population', 'Suicide(deaths)', 'Sex ratio'])
fig.update_geos(fitbounds='locations', visible=False)
fig.show()
total_df = total_df.sort_values(by='Mortality rate(in 100K)', ascending=False)
Conclusion:
This analysis gives an overview of suicide deaths across the states and UTs of India in 2018. It could help in understanding where suicide rates are high, and could thus inform suicide prevention strategies.
Now don’t forget to save your work in your jovian.ml account. Peace out!!
GitHub Repo: https://github.com/officialPrasanta/Data-Analysis-Projects/tree/master/Suicides-in-India-2018
Jovian.ml notebook: https://jovian.ml/fprasanta2016/data-visualization-starter-project