Hi All,
I've seen many media reports and movies based on crime in India. So, when I stumbled upon this dataset on Kaggle, I thought of doing an Exploratory Data Analysis on it. And as an Indian, I have many unanswered questions on crime. Please note that this blog does NOT involve any future prediction.
This case study is limited to "Rapes in India". Why?
Because, out of the whole spate of crimes, this particular crime gets the most media attention (both international, particularly nytimes.com and bbc.com, and widely in national media), and that gives India a very poor name across the globe. There is even a wiki page on Rape in India (https://en.wikipedia.org/wiki/Rape_in_India).
So,
- Is Indian society really so pathetic and made up of sick people?
- Can we quantify this claim?
- Can we zero in on those places where it's dangerous for women?
- How has the decade of the 2000s fared for the crime of rape vs other serious offences in India?
- Where does India stand across the globe on this crime?
- Has anything changed after the Nirbhaya incident?
I have to thank Mr. Rajanand Ilangovan, who took huge efforts to collect various data from the Government of India website and post it on Kaggle for public use. For this study, I have used the following sheets -
- 20_Victims_of_rape.csv
- Rape_Victims_Table_3A.3_2016.csv
Data sets used for this task are from the below link -
So, let's peek into the "Victims of Rape in India from 2001 - 2010" government data. This is gender neutral data -
And here is the most recently released data, on the victims of the year 2016 -
As one can observe, the above dataset is more verbose and provides more details about the crime, the criminal and the follow-up.
Important
I've split this study into two sections for a slew of reasons -
- Before Nirbhaya incident (BN) - Data from the decade of the 2000s, up to 2010.
- Post Nirbhaya incident (PN) - Data of the year 2016.
So, the historical data (BN) is worked upon first. Thereafter, I study the (PN) data and check whether India really changed after the horrific Nirbhaya incident. If it changed, then how, and by how much?
1. BEFORE NIRBHAYA - DATA SET UNDERSTANDING
This data set has data for each state and UT of India. The first row gives the total cases of rape. This total is then sub-divided into two sub-groups, "Incest" and "Other forms of Rape". Incest cases have a specific row, while all other cases come under the next row.
Thus, the total no. of cases = incest case + other types of cases.
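The identity above can be checked row by row before any analysis. Here is a minimal sketch, assuming the sheet has columns named Area_Name, Year, Subgroup and Victims_of_Rape_Total and the three subgroup labels used below. These names, and the made-up sample rows, are assumptions for illustration and not values read from the actual file.

```python
import csv
import io

# Hypothetical sample rows. Column names and subgroup labels are assumptions
# about the layout of 20_Victims_of_rape.csv; the numbers are made up.
SAMPLE = """Area_Name,Year,Subgroup,Victims_of_Rape_Total
State A,2001,Total Rape Victims,100
State A,2001,Victims of Incest Rape,8
State A,2001,Victims of Other Rape,92
State B,2001,Total Rape Victims,50
State B,2001,Victims of Incest Rape,5
State B,2001,Victims of Other Rape,44
"""

def find_mismatches(csv_text):
    """Return (area, year) keys where total != incest + other."""
    totals = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = (row["Area_Name"], int(row["Year"]))
        totals.setdefault(key, {})[row["Subgroup"]] = int(row["Victims_of_Rape_Total"])
    mismatches = []
    for key, groups in sorted(totals.items()):
        expected = (groups.get("Victims of Incest Rape", 0)
                    + groups.get("Victims of Other Rape", 0))
        if groups.get("Total Rape Victims") != expected:
            mismatches.append(key)
    return mismatches

mismatches = find_mismatches(SAMPLE)
print(mismatches)
```

Any key that shows up in the result is a row group worth inspecting by hand before trusting the totals.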
Now, let's check the metadata of the data set -
The data set does not have columns such as date on which crime was committed, exact area in latitude and longitude to further pin point the location of crime, or the date/time on which crime was reported, status of investigation, final status of crime conviction, etc.
Had such columns been present, a seasonality factor could have been calculated. For example, if date, time and geographical factors were present, then the analysis could drill down better. As background information, it's widely known that in rural India, where open defecation is still a practice, sexual violence is linked to it (https://www.bbc.com/news/world-asia-india-27635363). Similarly, the rainy season, late nights or secluded afternoon hours, and weekends/festive holidays are a few of the time slots which see a spike in the occurrence of crime as per studies over the web. But this data set does not provide such input columns to research these factors further.
Moving on, after doing the basic cleansing, let's consider only the sub-group of "Total Rape Victims" for each state and UT of India for further analysis. This will give the holistic picture.
2. STARTING OF BASIC EDA
Insights -
- From the above, we can see that victims in the age group 18-30 are the most vulnerable.
- Also shocking is the high victim count in the 0-10 age group, indicating a high incidence of sexual misconduct against innocent children.
Insights -
- There has been a consistent increase in rape cases, with the years 2002 and 2003 being a slight exception. So, as a society we are clearly not doing well!
Insights -
- The count of underage (minor) victims indicates high child abuse. Almost 40% of the victims are minors, below 18 years of age.
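The share of minors quoted above comes from adding up the under-18 age bands and dividing by the total. A minimal sketch of that arithmetic follows. The band labels mirror the age groups discussed in this section, while the counts are made-up numbers and the exact column names in the source sheet are an assumption.

```python
# Hypothetical victim counts per age band for one year (made-up numbers).
victims_by_age = {
    "0-10": 40,
    "10-14": 90,
    "14-18": 220,
    "18-30": 420,
    "30-50": 200,
    "50+": 30,
}

# Bands that count as minors (below 18 years).
MINOR_BANDS = ("0-10", "10-14", "14-18")

def minor_share(counts):
    """Fraction of all victims that fall in the under-18 age bands."""
    total = sum(counts.values())
    minors = sum(counts[band] for band in MINOR_BANDS)
    return minors / total

def most_affected_band(counts):
    """Age band with the highest victim count."""
    return max(counts, key=counts.get)

share = minor_share(victims_by_age)
top_band = most_affected_band(victims_by_age)
print(f"Share of minors: {share:.1%}, most affected band: {top_band}")
```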
3. STATE WISE CRIME DISTRIBUTION
Now, let's see this area wise and check how each state has fared. Do we see anomalies in any area?
Insights -
- The states of "Madhya Pradesh, Maharashtra, Uttar Pradesh, Assam, Bihar and West Bengal" were most impacted during this period. While Madhya Pradesh didn't improve its law and order situation, West Bengal saw the sharpest deterioration, going from bad to worse.
- The situation in the states of "Odisha, Tamil Nadu, Punjab, Jharkhand, Chhattisgarh and Andhra Pradesh" worsened during this period.
- Barring the state of Assam, the North Eastern states don't report rape related crime in excess.
- States like Bihar and UP have an inconsistent graph of rape cases. In a few years it improved, while a few other years were really dark times.
Insights -
- Relative to population, the density of crime is higher in the states of "Mizoram, Tripura, Delhi, Madhya Pradesh, Chhattisgarh and Assam", with Mizoram being the worst of all.
- This is somewhat at odds with the earlier heat map, where the North Eastern states of Mizoram and Tripura did not stand out. The heat map shows raw counts, so states with a small population can rank low there and still rank high by density.
- The state which is most impacted (by both count and density) is Madhya Pradesh.
Change in a state's population did not impact the crime data over the years.
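The gap between the count-based and the population-based rankings is easier to see with a small example. The sketch below uses three hypothetical states with made-up figures, chosen so that the smallest state ranks last by count and first by rate; none of the numbers come from the data set.

```python
# Hypothetical (reported cases, population) per state for a single year.
states = {
    "Large State": (3000, 60_000_000),
    "Mid State": (1200, 30_000_000),
    "Small State": (90, 1_000_000),
}

def rate_per_100k(cases, population):
    """Reported cases per 100,000 residents."""
    return cases / population * 100_000

# Rank once by raw count and once by population-adjusted rate.
by_count = sorted(states, key=lambda s: states[s][0], reverse=True)
by_rate = sorted(states, key=lambda s: rate_per_100k(*states[s]), reverse=True)

print("Ranked by count:", by_count)
print("Ranked by rate: ", by_rate)
```

A count heat map answers "where do most cases occur", while the rate answers "where is an individual resident most exposed", so the two views are best read together.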
4. RELATIONSHIP OF VICTIM WITH OFFENDER
Now, let's check out how the victims and offenders are related. This will help to understand if offenders are insiders or outsiders.
Insights -
Barring the year 2006, it's clearly visible that across India the victim is usually raped by a person known to them, be it a parent/close family member, another relative, a neighbor or another known person (these are the known sub-sections in the input dataset, as individual columns).
This also concurs with the point made by the then Delhi Police Chief (Mr. Neeraj Kumar) after the infamous Dec 2012 Delhi rape case. In a televised media interview he claimed that most rapes in India are committed by a person known to the victim, thus boasting of the effectiveness of his police force, and that the heinous crime of raping an unknown lady on the streets of Delhi is not a norm but an exception. (I could not find the link to the video, otherwise I would have posted it here too.)
P.S. - One can further explore how other crimes against women happen across India, to evaluate the overall safety of women. This is explored in Section 7 below.
Insights -
- The states of West Bengal, Madhya Pradesh, Odisha, Jharkhand and Mizoram have the most worrying state of affairs as far as rape by an unknown person is concerned, with West Bengal being the most unsafe.
- Most other areas of India are safer on this measure, including New Delhi, which has been termed the "Rape Capital" (http://www.walkthroughindia.com/lifestyle/the-5-most-unsafe-indian-cities-for-women/). Yet one cannot neglect, as mentioned in the above article, that the National Capital of India is the most unsafe city for women, as 514 rape cases were reported there in 2011. India’s capital is ranked first among the 35 main cities in the country, reporting the highest number of cases of rape, sexual harassment, molestation and assault.
5. WHY OF RAPE ? WHAT FACTORS CONTRIBUTE ?
Can we understand the socio-economic factors leading to rape? Are there any demographic reasons? Let's consider the data for the year 2010 to understand better.
The most reliable and verbose pan-India data is the 2011 census data. This data was collected from the year 2010 until 2011, making it a year-long data collection exercise. So, it's the closest match to the 2010 crime data.
Insights -
Literacy rate is negatively correlated with the rape rate, particularly for the "illiterate" education category.
Insights -
The count of female workers is negatively correlated with the rape rate.
Insights -
Urban households, houses with a bathing facility and houses with a latrine within them are negatively correlated with the rape rate.
Rural households having a latrine within their premises are also negatively correlated with the rape rate.
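The "negatively correlated" statements above refer to the sign of a correlation coefficient computed across states. A minimal sketch of the Pearson coefficient is given below, written by hand so that it has no dependencies. The two lists are made-up state-level values for illustration, not figures from the census or the crime data.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical values, one entry per state (made-up numbers).
literacy_rate = [60.0, 65.0, 70.0, 80.0, 90.0]  # percent literate
rape_rate = [4.0, 3.5, 3.6, 2.0, 1.5]           # cases per 100K population

r = pearson(literacy_rate, rape_rate)
print(f"Pearson r = {r:.2f}")
```

A value of r close to -1 means the two columns move in opposite directions across states. It does not by itself show that one factor causes the other.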
Let's pause on the factors here and look at the macro picture.
6. FORCIBLE RAPE CASES
Let's see how India performed on rapes committed after kidnapping the victim. Such cases mostly involve offenders who are unknown to the victim.
Insights -
Only a minority of the reported cases of rape in the span of 10 years were committed after abducting or kidnapping the victim. The rest were mostly committed by persons known to the victim. This is in line with what was shown in Section 4 above.
7. HOW SAFE IS INDIA FOR WOMEN ?
Let's digress from the crime of rape and include other sexual abuses in India. Let's take a step back and understand how safe women are overall in India. What does the data say?
Insights -
Women are almost 50% of the Indian population, but this data shows that crime against them is a comparatively small part of the total.
In the later part of the decade, the number of cases against women increased a bit. So did the overall number of crime cases in India.
Other serious crimes such as Murder, Robbery, Dacoity, Kidnapping, etc. form the bulk of crimes in India.
8. INDIAN JUDICIARY'S IMPACT
Let's check out the effectiveness of investigation and conviction in Rape related crime in India. Here is the judiciary's data-set -
Insights -
While the bar plot shows how rape cases increased over the years, convictions in courts remained almost constant. Ideally, convictions should have kept pace with the crime count. Secondly, the conviction count is only about 30% of the number of reported crimes, showing the poor state of the investigation and judicial system.
Number of arrested persons is always more than the reported cases of crime. This indicates multiple offenders were arrested for a particular crime.
Of the total number of arrested persons, fewer were brought to trial. This indicates that the investigation had some gaps, leading to the acquittal of a large number of suspects.
Most importantly, the Quantum of Punishment is absent in the source data. If this data were added, the analysis would be more insightful.
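The observations in this section boil down to three ratios: arrests per reported case, the share of arrested persons brought to trial, and convictions per reported case. A minimal sketch follows. The field names and the yearly counts are hypothetical, chosen only to illustrate the calculation, and are not taken from the judiciary sheet.

```python
# Hypothetical yearly counts (made-up numbers, assumed field names).
yearly = [
    {"year": 2001, "cases": 1000, "arrested": 1300, "tried": 1100, "convicted": 310},
    {"year": 2010, "cases": 1400, "arrested": 1800, "tried": 1500, "convicted": 330},
]

def funnel_ratios(row):
    """Ratios describing how cases move from report to conviction."""
    return {
        "arrests_per_case": row["arrested"] / row["cases"],
        "tried_share_of_arrested": row["tried"] / row["arrested"],
        "convictions_per_case": row["convicted"] / row["cases"],
    }

ratios = {row["year"]: funnel_ratios(row) for row in yearly}
for year, r in ratios.items():
    print(year, {name: round(value, 2) for name, value in r.items()})
```

One caveat with this kind of ratio is that convictions recorded in a year need not belong to cases reported in the same year, so it is a rough indicator rather than a true conviction rate.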
9. RAPE RATE AND RAPE VS POPULATION IN INDIA
Let's check how, from 2001 to 2010, the count of rapes changed with the increase in India's population. Please note that the population for 2010 is not available. Thus, the average yearly population growth of India is subtracted from the 2011 population. This gives the approximate Indian population for 2010.
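One reading of this estimation step is a simple linear back-calculation: take the average yearly growth between two census years and subtract one year of it from the later census. The sketch below follows that reading. The census figures are rounded approximations used as placeholders, and the function name is hypothetical.

```python
def back_estimate(pop_end, pop_start, years_between, years_back=1):
    """Estimate the population `years_back` years before the end census,
    assuming the same absolute growth in every year between the two censuses."""
    avg_annual_growth = (pop_end - pop_start) / years_between
    return pop_end - years_back * avg_annual_growth

# Rounded, approximate census totals (assumed placeholders).
POP_2001 = 1_029_000_000
POP_2011 = 1_210_000_000

pop_2010 = back_estimate(POP_2011, POP_2001, years_between=10)
print(f"Estimated 2010 population: {pop_2010:,.0f}")
```

A compound-growth assumption would give a slightly different figure. Over a single year the difference is small compared with the total.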
Insights -
Almost a Linear increase in Rape rate over the decade.
Insights -
Again, a linear increase.
Insights -
So, the percentage increase in population is slightly more than the percentage increase in the cases of rape in India. This assumes that both series are set to the same level at the start of the decade (2001), so that only their relative growth is compared.
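Putting both series on a common baseline can be done by indexing each one to 100 in 2001 and comparing the index values for later years. A minimal sketch is below. The case counts and population figures are made-up numbers for illustration only.

```python
def index_to_base(series, base_year):
    """Rescale a {year: value} series so that the base year equals 100."""
    base = series[base_year]
    return {year: value / base * 100 for year, value in series.items()}

# Hypothetical series (made-up numbers).
cases = {2001: 200, 2010: 230}          # reported cases, arbitrary units
population = {2001: 1000, 2010: 1180}   # population in millions

cases_idx = index_to_base(cases, 2001)
population_idx = index_to_base(population, 2001)

print("Cases index 2010:     ", round(cases_idx[2010], 1))
print("Population index 2010:", round(population_idx[2010], 1))
```

Whichever index ends higher in 2010 is the series that grew faster over the decade.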
10. WORLD DATA COMPARISON
Let's see the world statistics of Rape.
This is how the world fared in the year 2005. No stats were available for the year 2001. Rape rate is defined as the number of rape cases reported per 100 K population of a country.
Insights -
As seen above, India is consistently placed near the bottom of the world ranking by rape rate. The above plots are for the years 2005 and 2010.
But if that's true, why has India got a bad name across the planet?
12. POST NIRBHAYA - WHAT HAS CHANGED IN INDIA? HOW MUCH, IF ANY ?
Now, let's check out the conviction rate for crime against women in POST NIRBHAYA era in the year 2016 -
Insights -
There is a sharp decline in conviction rate in 2016 as compared to entire decade of 2000.
Now, let's see if crime has decreased or increased.
CONCLUSIONS
Pre - Nirbhaya Era -
- There has been a consistent increase in rape cases over the years. The number of cases rose almost in line with the rise in population.
- Most victims are in the age group 18-30. Also, a significant number of victims are minors.
- The most impacted states as far as rape counts are concerned are Madhya Pradesh, Maharashtra, Uttar Pradesh, Assam, Bihar and West Bengal. The states most impacted relative to their population are Mizoram, Tripura and Chhattisgarh, as well as a few of the above states.
- In the majority of cases, the victim and the offender know each other.
- Given the above data in public domain, Indian society is NOT as dangerous for women in sexual crimes, at least as depicted in media. (Please refer EPILOGUE section.)
- Conviction rate is quite abysmal in rape crime.
- Quantum of punishment is absent in the source data, apart from other major loopholes in the source data.
- The factors associated with the rape rate are, firstly, households having a latrine within them (both rural and urban), which correlate negatively with it, and secondly the literacy rate, where higher literacy goes with a lower rape rate.
Post Nirbhaya Era -
- There is significant rise in Rape Cases lately (Would it be because laws have changed ? Or because victims are more aware/ open to raise their voice? )
- Conviction Rate has decreased. (Would it be because, reported crimes do not have credible evidence to prove them ? Or because investigation standards have not improved ?)
EPILOGUE
Given the above study, one can be critical of it and question even the source data on several counts such as -
- VERACITY – Governments can lie just like an individual.
- ACCURACY – Data collection quality can be questioned.
- VERBOSITY – Geographical data is not available. India changes every 5 km. So, columns such as zip-code, latitude, longitude, etc. would help.
- REAL TIME – Most data available in public domain is historical. Rather, it should be as close to real-time as possible.
- ACTUAL – The most important factor is that rape and sexual abuse related crimes are often unreported, due to shame, guilt, fear, etc.
Secondly, 2016 data of the "Post Nirbhaya" era can be an aberration.
Lastly, if Rape is a crime/sin/epidemic/disease, then this is not just an Indian problem but a global issue. So, as Indian philosophy suggests with "Vasudhaiva Kutumbakam" (meaning, the world is one family), let's fight it out globally.
