In the early days of my career at Infosys, during the mid-2000s, I was working in Pune. I had a colleague from South India (Madurai, to be precise). Let's call him Jai Muttuswamy. He was technically very good and a humble, down-to-earth guy. A few folks on the team used to mock his ways and his accent, and he used to play along, mocking theirs in turn. We had some of our most memorable and funny days as a team.
One fine day, during a chat at an official party, someone called out to him "Jaiii", the way Veeru calls out to the character "Jai" at the end of the epic movie "Sholay". My friend Jai went blank. It turned out that Jai had never seen or even heard of Sholay!
Now, this was a big deal for a few of us :). So much so that the next day a few team members ran an unofficial poll to understand the demography of the team. The results showed that we had folks from 23 states of India (India was then composed of 28 states and 7 Union Territories), and the team size was close to 100. It turned out that Sholay was the one movie nearly all of us had watched. A few team members felt very strongly about the movie and a few others felt otherwise, but almost everyone had watched it at least once. Jai was the exception. An outlier.
Let me clarify here: I'm not giving this example to showcase any language or regional divide in India. Jai took the fun in his stride and had his own unique ways of giving it back to us. Some of my best professional memories and bonds were formed in those days at Infosys, Pune.
So, coming to the topic: what can we gain from an outlier?
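Firstly, like most real-world data, the data here was skewed. Almost everyone had watched the movie and the lone exception sits at the low end, so the distribution is left-skewed (negatively skewed), and the outlier's Z-score is strongly negative.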
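To put rough numbers on the skew and the Z-score, here is a minimal sketch in Python. It assumes a team of exactly 100 (the post only says "close to 100"), codes "watched Sholay" as 1 and "not watched" as 0, and treats Jai as the only 0. The function names are mine, purely for illustration.

```python
import math

# Assumed data: 99 team members watched the movie (1), Jai did not (0).
responses = [1] * 99 + [0]


def mean(values):
    return sum(values) / len(values)


def population_std(values):
    # Population standard deviation (divide by n, not n - 1).
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))


def z_score(x, values):
    # How many standard deviations x lies from the mean of values.
    return (x - mean(values)) / population_std(values)


def skewness(values):
    # Population skewness: the mean of the cubed z-scores.
    return mean([z_score(v, values) ** 3 for v in values])


jai_z = z_score(0, responses)
skew = skewness(responses)

print(f"Jai's Z-score: {jai_z:.2f}")
print(f"Skewness:      {skew:.2f}")
```

Under these assumptions Jai's Z-score comes out at roughly -9.95, and the skewness is negative as well, which is what a tail on the low side looks like. A common rule of thumb flags anything beyond 3 standard deviations from the mean as an outlier, so Jai qualifies comfortably.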
Secondly, an outlier data point like this may suggest that Hindi movies are less familiar to folks from Jai's demography. Can we generalize this hypothesis? Not from a single data point, but perhaps yes if a proper survey is run in that city or state. So an outlier calls for further drill-down.
Thirdly, whether an outlier offers scope for further analysis is best left to the analyst. It is up to the data analyst to look for the why behind the outlier. Should the outlier be considered or ignored?
Outliers are of different types, namely -