Sunday, 21 June 2020

The relevance of WHY of a task?


This pandemic and the lock-down that has followed have taught us, consciously or unconsciously, many things.

When the lock-down started, I volunteered to mop, clean and tidy things in my home. That was a relief for my family members, who suitably handled the kitchen chores. For the initial few days, I used to jump with glee each morning to clean the house. But slowly and surely my wife started finding “quality issues” in my work. A hair strand left here, or some dirt left there. So, to motivate me, delicious homemade dishes were prepared to entice me. That got me going for the next few days. But I started taking these perks for granted. And yet again, the quality dwindled.

So, after a proper dirty cleaning day, I was left to reflect on the "why of things".

I drew a parallel between this task and the motivation that drives me to work in any field, say in the office. Whenever I start in a new office, or on a big assignment or project, I start with a lot of motivation. However, as time passes, it dwindles or wavers. It is never constant each day. And the perks, bonus or official promotion play the role of a carrot. But I do not work for a carrot each day. So, why should I work with enthusiasm? And how do I keep it up and going strong?

The answer lies in figuring out the why of the task at hand. And does that answer really motivate you?

So, I work in the Feature Library team, which creates the host of features used in Machine Learning models. These models audit the medical bills of insurance companies. The audit process yields valuable savings for medical companies. Some of this money is used for valuable R&D work such as finding cures... you never know, maybe even for Covid!

This is a big motivator for me: my actions, akin to an ant trying to move a piece of dirt, will go on to move a hill tomorrow. I am part of a big chain. Similarly, my daily efficiency in keeping my home clean will help my family be healthy and happy. And this is irrespective of the perks in the office or being served delicious dishes at home :)

So, what is the answer to your “why” for a task?

#mondayblues #motivation #covid #pandemic #purpose #identity #why #what #psychology #self-development #self-perception #signature #Social Psychology #develop #growth #inspiration #mindful #psych #relationships #social maturity #authenticity #reason 

 

P.S. - If one wants to delve deep, here is a link to read and ponder.

Sunday, 5 April 2020

Outlier - Knowledge derivation



For every rule, there exist exceptions. And outliers are also exceptions in data. So, before I come to the topic of insights which can be derived from outliers, let me share an anecdote.


In the early days of my career at Infosys during the mid 2000s, I was working in Pune. I had a colleague from South India (Madurai to be precise). Let's name him Jai Muttuswamy. Now, he was technically very good and a humble, down-to-earth boy. There were a few folks in the team who used to mock his ways and his accent, and he used to play along, mocking theirs in turn. We had some of the most memorable funny days as a team.

One fine day, during an official party chat, someone called him "Jaiii", which resembled the way Veeru calls the character "Jai" at the end of the epic movie "Sholay". My friend Jai went blank. It turned out that Jai had never seen or heard about Sholay!




Now it was a big deal for a few of us :). So much so that, the next day, a few of the team members conducted an unofficial poll to understand the demography of the team. The results were that we had folks from 23 states of India (India then was composed of 28 states and 7 Union Territories). And the team size was close to 100. And it turned out that, for almost all of us, Sholay was one movie that we had definitely watched. While a few team members felt very strongly for the movie, a few others felt otherwise. But almost all had watched it once. Certainly, Jai was an exception. An outlier.

Now, let me clarify here, I'm not giving this example to showcase any language or regional divide in the country of India. And Jai also took the fun in his stride and had his own unique ways to give it back to us. Some of the best professional memories and bonds were formed in those days of Infosys, Pune.
So, coming to the topic, what can we gain from an outlier?

First, like most real-world data, the data here was skewed too. Because the lone outlier sits far below everyone else, the skew is to the left (negative) rather than to the right, and the outlier's Z score is strongly negative.
Secondly, such an outlier datum may show that Hindi movies are uncommon among folks hailing from Jai's demography. Can we generalize this hypothesis? Perhaps yes, if a survey happens in that city or state. So, an outlier calls for more drill-down in the future.

Thirdly, whether an outlier provides scope for further analysis is best left to the analyst. It depends on the data analyst to see the why behind the outlier. Should the outlier be considered or ignored?

Outliers are of different types, namely -

  • Point Outlier
  • Contextual Outlier
  • Collective Outlier

Standard deviation is computed on data to understand its spread, and it is sensitive to outliers. So is the mean. But the median is resistant to outliers. So, given the scenario and domain, outliers can impact the data analysis.
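
To make the sensitivity concrete, here is a minimal sketch using only Python's standard `statistics` module. The numbers are made up for illustration (nine ordinary points plus one extreme value) and `summarize` is just a helper name I picked.

```python
import statistics

# Made-up illustrative values: nine typical points, then one extreme outlier.
values = [10, 11, 12, 12, 13, 13, 14, 15, 16]
with_outlier = values + [100]

def summarize(data):
    """Return (mean, median, sample standard deviation) for a list of numbers."""
    return statistics.mean(data), statistics.median(data), statistics.stdev(data)

mean_a, median_a, sd_a = summarize(values)
mean_b, median_b, sd_b = summarize(with_outlier)

print(f"without outlier: mean={mean_a:.2f} median={median_a} sd={sd_a:.2f}")
print(f"with outlier:    mean={mean_b:.2f} median={median_b} sd={sd_b:.2f}")
```

The mean and the standard deviation jump once the extreme value is added, while the median stays where it was.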

Now, how to detect outliers? A point is an outlier -
  • if it is located 1.5(IQR) or more below Q1 or
  • if it is located 1.5(IQR) or more above Q3, where IQR stands for Interquartile Range (Q3 - Q1).
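
The rule above can be sketched in a few lines of Python. This is only an illustration: the function name `iqr_outliers` and the sample numbers are my own, and quartiles can be computed by several conventions, so I am assuming the "inclusive" method of `statistics.quantiles` here.

```python
import statistics

def iqr_outliers(data, k=1.5):
    """Return the points lying k*IQR or more below Q1 or above Q3."""
    # quantiles() with n=4 returns the three quartile cut points Q1, Q2, Q3.
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x <= lower or x >= upper]

# Made-up sample: one value sits far away from the rest.
sample = [10, 11, 12, 12, 13, 13, 14, 15, 16, 100]
print(iqr_outliers(sample))
```

A different quartile convention may move the fences slightly, so borderline points can be flagged by one method and not by another.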
Pictorially speaking, IQR can be best described  like this -


So, for an outlier, an analyst has a few options to handle the scenario -
  • Can we remove the outlier?
  • Can we replace the values with a statistical measure - say, the 5th and 95th percentile values or, for the sake of more accuracy, the 1st and 99th percentile values?
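
The second option, replacing extreme values with percentile values (often called capping or winsorizing), could look like the sketch below. The helper name `cap_to_percentiles`, the sample data and the choice of the "inclusive" percentile method are assumptions made for illustration.

```python
import statistics

def cap_to_percentiles(data, low_pct=5, high_pct=95):
    """Clamp every value into the [low_pct, high_pct] percentile range."""
    # With n=100, quantiles() returns 99 cut points; cuts[i - 1] is the i-th percentile.
    cuts = statistics.quantiles(data, n=100, method="inclusive")
    low, high = cuts[low_pct - 1], cuts[high_pct - 1]
    return [min(max(x, low), high) for x in data]

# Made-up sample with one extreme value at the end.
sample = [10, 11, 12, 12, 13, 13, 14, 15, 16, 100]
capped = cap_to_percentiles(sample)      # 5th / 95th percentile caps
capped_wide = cap_to_percentiles(sample, low_pct=1, high_pct=99)
print(capped)
print(capped_wide)
```

Capping keeps the row count intact, which matters when the other columns of the same record are still useful; removal is simpler but throws that information away.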
That's all for Outliers.


P.S. - Jai watched Sholay a few weekends later and he was fairly amused after watching the movie :)







Sunday, 8 March 2020

EDA in Crime of Rape In India - Using Python

Hi All,

I've seen many media reports and movies based upon crime in India. So, when I stumbled upon this data-set on Kaggle, I thought of doing an Exploratory Data Analysis on it. And as an Indian, I have many unanswered questions on crime. Please note that this blog does NOT involve any future prediction.
This case study is limited to "Rapes in India". Why?
Because, out of the whole spate of crimes, this particular crime draws the most media attention (both international, particularly nytimes.com and bbc.com, and widely in national media), and it gives India a very poor name across the globe. There is even a wiki page on Rape in India (https://en.wikipedia.org/wiki/Rape_in_India).
So,
  • Is Indian society really so pathetic and made up of sick people?
  • Can we quantify this claim?
  • Can we zero in on those places where it's dangerous for women?
  • How has the decade of the 2000s fared in the crime of rape vs other serious offences in India?
  • How does India stand across the globe on this crime?
  • Has anything changed post the Nirbhaya incident?
Data sets used for this task are from the link below -
https://www.kaggle.com/rajanand/crime-in-india
I have to thank Mr. Rajanand Ilangovan, who took huge efforts to collect various data from the Government of India website and post them on Kaggle for public use. For this study, I have used the following sheets -

  1. 20_Victims_of_rape.csv
  2. Rape_Victims_Table_3A.3_2016.csv




So, let's peek into the "Victims of Rape in India from 2001 - 2010" government data. This is gender-neutral data -











And here is the most recently released data, on victims in the year 2016 -

As one can observe, the above data-set is more verbose and provides more details about the crime & criminal and the follow up.


Important


I've split this study into two sections for a slew of reasons -
  • Before Nirbhaya incident (BN) - Data from the decade of the 2000s, which runs until 2010.
  • Post Nirbhaya incident (PN) - Data of the year 2016.

So, initially the historical data (BN) is worked upon. Thereafter, I study the (PN) data and check if India really changed post the horrific Nirbhaya incident. If it changed, then how and how much?



1. BEFORE NIRBHAYA - DATA SET UNDERSTANDING


This data set has data for each state and UT of India. For each of them, the first row gives the total cases of rape. This total is then sub-divided into two sub-groups: "Incest" and "Other forms of Rape". Incest cases have a specific row, while all other cases come under the next row.
Thus, the total no. of cases = incest cases + other types of cases.
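
This identity can be checked mechanically before trusting the totals. Below is a minimal sketch, assuming each row carries an area, a year, a sub-group label and a case count. The column names, the sub-group labels and the two placeholder states with made-up numbers are my assumptions for illustration, not values taken from the data set.

```python
from collections import defaultdict

# Placeholder rows with made-up numbers; column and sub-group names are assumptions.
rows = [
    {"Area_Name": "State A", "Year": 2001, "Subgroup": "Total Rape Victims", "Rape_Cases_Reported": 50},
    {"Area_Name": "State A", "Year": 2001, "Subgroup": "Victims of Incest Rape", "Rape_Cases_Reported": 5},
    {"Area_Name": "State A", "Year": 2001, "Subgroup": "Victims of Other Rape", "Rape_Cases_Reported": 45},
    {"Area_Name": "State B", "Year": 2001, "Subgroup": "Total Rape Victims", "Rape_Cases_Reported": 20},
    {"Area_Name": "State B", "Year": 2001, "Subgroup": "Victims of Incest Rape", "Rape_Cases_Reported": 2},
    {"Area_Name": "State B", "Year": 2001, "Subgroup": "Victims of Other Rape", "Rape_Cases_Reported": 17},
]

def find_mismatches(rows):
    """Return the (area, year) keys where total != incest + other."""
    cases = defaultdict(dict)
    for row in rows:
        key = (row["Area_Name"], row["Year"])
        cases[key][row["Subgroup"]] = row["Rape_Cases_Reported"]
    return [
        key for key, by_group in cases.items()
        if by_group["Total Rape Victims"]
        != by_group["Victims of Incest Rape"] + by_group["Victims of Other Rape"]
    ]

print(find_mismatches(rows))
```

In the placeholder rows, "State B" is deliberately inconsistent (2 + 17 is not 20), so it is the one key reported.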
Now, let's check the metadata of the data set - 









The data set does not have columns such as the date on which the crime was committed, the exact location in latitude and longitude to further pinpoint the crime, the date/time on which the crime was reported, the status of investigation, the final status of conviction, etc.
Had such columns been present, the seasonality factor could have been calculated. For example, if date, time and geographical factors were present, then the analysis could drill down better. As background information, it's widely reported that in rural India, where open defecation is still a practice, it is inter-linked with sexual violence (https://www.bbc.com/news/world-asia-india-27635363). Similarly, the rainy season, late nights or secluded afternoon hours, and weekends/festive holidays are a few of the time slots which see a spike in the occurrence of crime, as per studies over the web. But this data set does not provide such input columns to research such factors further.

Moving on, after doing the basic cleansing, let's consider only the sub-group of "Total Rape Victims" for each state and UT of India for further analysis. This will give the holistic picture.




2. STARTING OF BASIC EDA





Insights -
  • From the above, we can see that victims in the age group 18-30 are the most vulnerable.
  • Also shocking is the high victim count in the 0-10 age group, indicating a high incidence of sexual abuse of innocent children.












Insights -
  • There has been a consistent increase in rape cases, with the years 2002 and 2003 being a slight exception. So, as a society we are clearly not doing well!























Insights -
  • Underage (minor) victims indicate high child abuse. Almost 40% of victims are minors, below 18 years of age.



3. STATE WISE CRIME DISTRIBUTION


Now, let's see this area-wise: how has each state fared? Do we see anomalies in any area?



















Insights -
  • The states of "Madhya Pradesh, Maharashtra, Uttar Pradesh, Assam, Bihar and West Bengal" were most impacted during this period. While Madhya Pradesh didn't improve its law and order situation, West Bengal's situation deteriorated the most, going from bad to worse.
  • The situation in the states of "Odisha, Tamil Nadu, Punjab, Jharkhand, Chhattisgarh and Andhra Pradesh" worsened during this period.
  • Barring the state of Assam, the other North Eastern states don't report rape-related crime in excess.
  • States like Bihar and UP have an inconsistent graph in cases of rape. In a few years it improved, while a few years were really dark times.





















Insights -

  • Relative to population, the density of crime is higher in the states of "Mizoram, Tripura, Delhi, Madhya Pradesh, Chhattisgarh and Assam", with Mizoram being the worst of all.
  • This is somewhat at odds with the earlier heat map, where the North Eastern states of Mizoram and Tripura were not identified.
  • The state which is most impacted (count & density) is Madhya Pradesh.
  • Change in a state's population did not impact the crime data over the years.
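
Why a state can be absent from the count-based view yet top the population-based one is easier to see with numbers. The sketch below uses two placeholder regions with made-up figures (not taken from the data set) and a helper, `rate_per_100k`, that I named for illustration.

```python
def rate_per_100k(cases, population):
    """Reported cases per 100,000 residents."""
    return cases / population * 100_000

# Placeholder figures for illustration only; not taken from the data set.
regions = {
    "Large state": {"cases": 3000, "population": 70_000_000},
    "Small state": {"cases": 90, "population": 1_000_000},
}

# Rank once by raw count and once by population-adjusted rate.
by_count = max(regions, key=lambda name: regions[name]["cases"])
by_rate = max(regions, key=lambda name: rate_per_100k(**regions[name]))

print("highest count:", by_count)
print("highest rate: ", by_rate)
```

A small population in the denominator lets a modest count translate into a high rate, which is consistent with small North Eastern states surfacing only in the density view.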

4. RELATIONSHIP OF VICTIM WITH OFFENDER


Now, let's check out how the victims and offenders are related. This will help to understand if offenders are insiders or outsiders.




















Insights -


  • Barring the year 2006, it's clearly visible that, pan India, the victim is usually raped by a person known to them - be it a parent/close family member, other relative, neighbor or other known person (these are the known sub-sections in the input data-set, as individual columns).

  • This also concurs with the point made by the then Delhi Police Chief (Mr. Neeraj Kumar) after the infamous Dec 2012 Delhi rape case, in a televised media interview, where he claimed that most rapes in India are committed by a known person, thus boasting of the effectiveness of his police force, and that the heinous crime of raping an unknown lady on the streets of Delhi is not the norm but rather an exception. (I could not find the link to the video, otherwise I would have posted it here too.)
P.S. - How other crimes against women happen across India can be further explored to evaluate the overall safety of women. This is explored in Section 7 below.


















Insights -

  • The states of West Bengal, Madhya Pradesh, Odisha, Jharkhand and Mizoram have the most worrying state of affairs as far as rapes by unknown persons are concerned, with West Bengal being the most unsafe.
  • Most other areas of India, including New Delhi, termed the "Rape Capital" (http://www.walkthroughindia.com/lifestyle/the-5-most-unsafe-indian-cities-for-women/), are safer by this measure. Yet one cannot neglect, as mentioned in the above article, that the National Capital of India is the most unsafe city for women, as 514 rape cases were reported there in 2011. India’s capital is ranked first among the 35 main cities in the country, reporting the highest number of cases of rape, sexual harassment, molestation and assault.


5. WHY OF RAPE ? WHAT FACTORS CONTRIBUTE ?

Can we understand the socio-economic factors leading to rape? Any demographic reasons? Let's consider the data for the year 2010 to understand better.

And the most reliable and verbose pan-India data is the 2011 census data. This data was collected from the year 2010 until 2011, making it a year-long data collection exercise. So, it's the closest to the 2010 crime data.

























Insights -


  • Literacy rate is negatively correlated with rape rate, particularly illiterate education.
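
"Negatively correlated" here means the correlation coefficient between the two columns is below zero. As a minimal sketch, assuming the Pearson coefficient (which is also what pandas' `DataFrame.corr` computes by default), it can be written out by hand. The input numbers are made up purely to show the sign of the result.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(spread_x * spread_y)

# Made-up values: as the first series rises, the second one falls.
factor = [60, 65, 70, 80, 90]
rate = [9.0, 8.5, 7.0, 6.5, 4.0]
print(round(pearson_r(factor, rate), 3))
```

A value close to -1 means the two series move in opposite directions almost in lockstep; a value close to 0 means no linear relationship.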

























Insights -

  • Female worker count is negatively correlated with rape.

























Insights -

  • Urban households, houses with a bathing facility and houses with a latrine within them are negatively correlated with rape rate.


  • Rural households having a latrine within their premises are also negatively correlated with rape rate.

Let's pause on the factors here and look at the macro picture.


6. FORCIBLE RAPE CASES



Let's see how India performed when rape was forced by kidnapping. That means it's mostly by offenders who are unknown to the victim.















Insights -

  • Only a minority of the reported cases of rape in the span of 10 years were committed after abducting or kidnapping the victim. The rest were committed by those known to the victim. This is in accordance with what is shown in section 4 above.





7. HOW SAFE IS INDIA FOR WOMEN ?


Digressing from the crime of rape to include other sexual abuses in India, let's take a step back and understand how safe women are overall in India. What does the data say?















Insights -

  • Women are almost 50% of the Indian population, but this data shows that social crime against them is fairly low.




  • In the later part of the decade, the number of cases against women increased a bit. So did the overall crime cases in India.



  • Other serious crimes such as Murder, Robbery, Dacoity, Kidnapping, etc. form the bulk of crimes in India.




8. INDIAN JUDICIARY'S IMPACT



Let's check out the effectiveness of investigation and conviction in Rape related crime in India. Here is the judiciary's data-set -


















Insights -

  • While the data in the bar plot shows how rape cases increased over the years, convictions in courts remained almost constant. Ideally, these should have risen in line with the crime rate. Secondly, the conviction count is only about 30% of the number of crimes, showing the poor state of the investigation and judicial system.




  • The number of arrested persons is always more than the number of reported cases. This indicates that multiple offenders were arrested for a particular crime.




  • Of the total number of arrested persons, fewer were brought to trial. This indicates that the investigation had some gaps, leading to the acquittal of suspects in large numbers.




  • Most importantly, the Quantum of Punishment is absent in the source data. If this data were added, the analysis would be more insightful.





9. RAPE RATE AND RAPE VS POPULATION IN INDIA



Let's check out how, from 2001 to 2010, the count of rape changed with the increase in the population of India. Please note that the population for 2010 is not available. Thus, the average yearly population growth of India is subtracted from the 2011 population to arrive at the total Indian population for 2010. This gives the approximate Indian population for 2010.
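
One way to read this estimate is as a simple linear step back from the 2011 census. The sketch below follows that reading; the function name is mine, and the inputs are rounded census totals in millions (about 1,029 for 2001 and 1,211 for 2011), so the result is approximate by construction.

```python
def estimate_prior_year(pop_start, pop_end, years_between):
    """Estimate the population one year before pop_end by removing the average yearly increase."""
    average_yearly_increase = (pop_end - pop_start) / years_between
    return pop_end - average_yearly_increase

# Rounded census totals in millions: 2001 and 2011, ten years apart.
pop_2010 = estimate_prior_year(1029, 1211, 10)
print(f"Estimated 2010 population: {pop_2010:.1f} million")
```

A compound-growth version (dividing by one plus the yearly growth rate) would give a slightly different figure; for a one-year step the gap is small.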





























Insights -

  • An almost linear increase in rape rate over the decade.























Insights -

  • Again, a linear increase.























Insights -

  • So, the population increased by a slightly higher percentage than the cases of rape in India, given the assumption that at the start of the decade (2001) both rape & population start from the same baseline.
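
Putting two series with very different magnitudes on the same baseline amounts to expressing each one as a percentage change from its 2001 value. Here is a minimal sketch of that step; the yearly values are made up and only show the mechanics.

```python
def percent_change_from_base(series):
    """Percentage change of each value relative to the first value in the series."""
    base = series[0]
    return [(value - base) / base * 100 for value in series]

# Made-up yearly values, used only to show the mechanics.
cases = [200, 210, 230]
population = [1000, 1060, 1180]

cases_growth = percent_change_from_base(cases)
population_growth = percent_change_from_base(population)
print(cases_growth[-1], population_growth[-1])
```

Both series start at 0%, so the two lines can be drawn on one axis and compared directly.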






10. WORLD DATA COMPARISON



Let's see the world statistics of Rape.






















This is how the world fared in the year 2005. No stats were available for the year 2001. Rape rate is defined as the number of rape cases reported per 100 K population of a country.





















Insights -

As seen above, India is consistently placed near the bottom in the world. The above plots are for the years 2005 and 2010.

But if that's true, why has India got a bad name across the planet?



11. POST NIRBHAYA - WHAT HAS CHANGED IN INDIA? HOW MUCH, IF ANY ?




















Now, let's check out the conviction rate for crime against women in POST NIRBHAYA era in the year 2016 -





Insights -

  • There is a sharp decline in the conviction rate in 2016 as compared to the entire decade of the 2000s.



















Now, let's see if crime has decreased / increased.






CONCLUSIONS 


Pre - Nirbhaya Era -

  1. There has been a consistent increase in rape cases over the years. The number of cases rose almost in line with the rise in population.
  2. Most victims are in the age group (18-30). Also, a significant number of victims are minors.
  3. The most impacted states as far as rape counts are concerned - Madhya Pradesh, Maharashtra, Uttar Pradesh, Assam, Bihar and West Bengal. The states most impacted relative to their population - Mizoram, Tripura, Chhattisgarh as well as a few of the above states.
  4. The majority of times, the victim and offender know each other.
  5. Given the above data in the public domain, Indian society is NOT as dangerous for women in sexual crimes, at least not as depicted in media. (Please refer to the EPILOGUE section.)
  6. The conviction rate is quite abysmal in rape crime.
  7. Quantum of punishment is absent in the source data, apart from other major loopholes in the source data.
  8. Factors impacting the rape rate are, firstly, households having a latrine facility within them (both rural and urban), which is negatively correlated with it. Literacy rate also helps: it too is negatively correlated with the rape rate.


Post Nirbhaya Era -
  1. There is a significant rise in rape cases lately. (Would it be because laws have changed? Or because victims are more aware/open to raising their voice?)
  2. The conviction rate has decreased. (Would it be because reported crimes do not have credible evidence to prove them? Or because investigation standards have not improved?)

EPILOGUE




Given the above study, one can be critical of it and even question the source data on several counts, such as -

  1. VERACITY – Governments can lie just like an individual.
  2. ACCURACY – Data collection quality can be questioned.
  3. VERBOSITY – Geographical data is not available. India changes every 5 km. So, columns such as zip-code, latitude, longitude, etc. would assist.
  4. REAL TIME – Most data available in public domain is historical. Rather, it should be as close to real-time as possible.
  5. ACTUAL – The most important factor is that rape and sexual abuse related crimes are often unreported, due to shame, guilt, fear, etc.


Secondly, 2016 data of the "Post Nirbhaya" era can be an aberration.

Lastly, if rape is a crime/sin/epidemic/disease, then this is not just an Indian problem but rather a global issue. So, as Indian philosophy suggests with "Vasudhaiva Kutumbakam" (meaning, the world is one family), let's fight it out globally.