Analysis of Elections in India

By Mohd Hassaan

Introduction

India is the largest democracy in the world. It holds two main kinds of elections: state elections and national elections. The national election is contested for the 543 elected seats of the Lok Sabha, the lower house of Parliament, and more than 500 million people voted in the 2014 election. What could be more interesting than analysing the data of Indian elections over the years?

In this project we analyse the data of India’s national elections from 1977 to 2014.

Source: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/26526

Initial Exploration

Structure of the data

## 'data.frame':    73081 obs. of  11 variables:
##  $ st_name   : Factor w/ 43 levels "Andaman & Nicobar Islands",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ year      : int  1977 1977 1980 1980 1980 1980 1980 1980 1980 1980 ...
##  $ pc_no     : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ pc_name   : Factor w/ 769 levels "Adilabad","Adilabad ",..: 38 38 38 38 38 38 38 38 38 38 ...
##  $ pc_type   : Factor w/ 5 levels "","GEN","SC",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ cand_name : Factor w/ 56601 levels "'Aids Man' Prakash Taterao Landge",..: 21485 28684 41951 1795 22788 21435 39393 23046 44757 21115 ...
##  $ cand_sex  : Factor w/ 4 levels "F","M","NULL",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ partyname : Factor w/ 1424 levels "A-Chik National Congress(Democratic)",..: 562 576 562 562 562 562 647 562 646 562 ...
##  $ partyabbre: Factor w/ 1071 levels "A S P","Aa S P",..: 427 423 427 427 427 427 499 427 497 427 ...
##  $ totvotpoll: int  25168 35400 109 125 405 470 717 1123 2034 15856 ...
##  $ electors  : int  85308 85308 96084 96084 96084 96084 96084 96084 96084 96084 ...

In the election data set we have the state name (st_name), constituency name (pc_name), total electors (electors), votes polled (totvotpoll), candidate name (cand_name), and so on.

Adding new variables

Before starting the analysis, we add some new variables to the data. The first is proportion, which gives the share of a constituency’s electors that voted for a candidate (totvotpoll divided by electors). The second is win_vote, the number of votes polled by the winning candidate in that constituency and year. The third is won, a Boolean variable that records whether the candidate won or lost.

##                     st_name year pc_no                   pc_name pc_type
## 1 Andaman & Nicobar Islands 1977     1 Andaman & Nicobar Islands     GEN
## 2 Andaman & Nicobar Islands 1977     1 Andaman & Nicobar Islands     GEN
## 3 Andaman & Nicobar Islands 1980     1 Andaman & Nicobar Islands     GEN
## 4 Andaman & Nicobar Islands 1980     1 Andaman & Nicobar Islands     GEN
## 5 Andaman & Nicobar Islands 1980     1 Andaman & Nicobar Islands     GEN
## 6 Andaman & Nicobar Islands 1980     1 Andaman & Nicobar Islands     GEN
##           cand_name cand_sex                partyname partyabbre
## 1       K.R. Ganesh        M             Independents        IND
## 2 Manoranjan Bhakta        M Indian National Congress        INC
## 3   Ramesh Mazumdar        M             Independents        IND
## 4     Alagiri Swamy        M             Independents        IND
## 5       Kannu Chemy        M             Independents        IND
## 6         K.N. Raju        M             Independents        IND
##   totvotpoll electors  proportion
## 1      25168    85308 0.295025086
## 2      35400    85308 0.414966943
## 3        109    96084 0.001134424
## 4        125    96084 0.001300945
## 5        405    96084 0.004215062
## 6        470    96084 0.004891553
##            st_name           year          pc_no      
##  Uttar Pradesh :14791   Min.   :1977   Min.   : 1.00  
##  Bihar         : 7727   1st Qu.:1989   1st Qu.: 7.00  
##  Maharashtra   : 6458   Median :1996   Median :18.00  
##  Tamil Nadu    : 5309   Mean   :1997   Mean   :22.31  
##  Andhra Pradesh: 5236   3rd Qu.:2004   3rd Qu.:33.00  
##  Madhya Pradesh: 5196   Max.   :2014   Max.   :85.00  
##  (Other)       :28364                                 
##           pc_name      pc_type                 cand_name     cand_sex    
##  Belgaum      :  567      : 8070   None Of The Above:  543   F   : 3648  
##  Nalgonda     :  563   GEN:54862   Ashok Kumar      :   87   M   :68885  
##  East Delhi   :  434   SC : 7293   Om Prakash       :   78   NULL:  542  
##  Chandni Chowk:  344   SC :   15   Raj Kumar        :   57   O   :    6  
##  Lucknow      :  319   ST : 2841   Ram Singh        :   54               
##  Outer Delhi  :  319               Rajesh Kumar     :   51               
##  (Other)      :70535               (Other)          :72211               
##                     partyname       partyabbre      totvotpoll    
##  Independent             :31458   IND    :41127   Min.   :     0  
##  IND                     : 5619   INC    : 4800   1st Qu.:   872  
##  Independents            : 4050   BJP    : 3350   Median :  2743  
##  Indian National Congress: 3919   BSP    : 2624   Mean   : 49835  
##  Bharatiya Janata Party  : 2329   SP     : 1057   3rd Qu.: 19185  
##  Bahujan Samaj Party     : 1670   JD     :  943   Max.   :863358  
##  (Other)                 :24036   (Other):19180                   
##     electors         proportion           win_vote         won         
##  Min.   :  19471   Min.   :0.0000000   Min.   :     0   Mode :logical  
##  1st Qu.: 912985   1st Qu.:0.0007908   1st Qu.:212244   FALSE:67147    
##  Median :1099503   Median :0.0025292   Median :281953   TRUE :5934     
##  Mean   :1122277   Mean   :0.0483608   Mean   :301058                  
##  3rd Qu.:1329086   3rd Qu.:0.0186813   3rd Qu.:372227                  
##  Max.   :3368399   Max.   :0.6827193   Max.   :863358                  
## 
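The code that produced these columns is not shown in the report. As a minimal base R sketch, the three variables could be derived as follows. The toy rows and the contest key (state, year and constituency number) are assumptions made for illustration, not the author’s actual code.

```r
# Toy rows that mirror the dataset's columns (values are illustrative).
elections <- data.frame(
  st_name    = c("A", "A", "A", "B", "B"),
  year       = c(1977, 1977, 1977, 1977, 1977),
  pc_no      = c(1, 1, 1, 1, 1),
  totvotpoll = c(25168, 35400, 109, 500, 900),
  electors   = c(85308, 85308, 85308, 2000, 2000)
)

# Share of the constituency's electors who voted for each candidate.
elections$proportion <- elections$totvotpoll / elections$electors

# Assumed contest key: one contest per state, year and constituency number.
contest <- paste(elections$st_name, elections$year, elections$pc_no, sep = "|")

# Highest vote count within each contest, repeated on every row of it.
elections$win_vote <- ave(elections$totvotpoll, contest, FUN = max)

# A candidate won if their votes equal the contest maximum.
elections$won <- elections$totvotpoll == elections$win_vote

print(elections)
```

With this rule an exact tie would mark both candidates as winners, so real data may need an extra tie-breaking step.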

Cleaning the data

The summary above shows that the minimum winning vote is 0, which cannot be a genuine result. These may not be data-entry errors: many times a poll was cancelled on suspicion of booth capturing, and many times a result was withheld by the courts. Such observations are very few (5, to be exact), but they can affect our analysis, so we remove them from the data set as outliers.

## [1] "Nation election data has 73076 Observation after filtering out Outliars."
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    9600  212244  282014  301078  372227  863358
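A filter of this kind can be sketched in base R as below. The toy data frame is an assumption for illustration; only the column names totvotpoll and win_vote come from the report.

```r
# Toy data: contest "Y" has a recorded winning vote of 0 (illustrative).
elections <- data.frame(
  pc_name    = c("X", "X", "Y", "Y"),
  totvotpoll = c(100, 50, 0, 0),
  win_vote   = c(100, 100, 0, 0)
)

# Keep only rows from contests whose winner received at least one vote.
cleaned <- elections[elections$win_vote > 0, ]

print(paste("Rows removed:", nrow(elections) - nrow(cleaned)))
print(summary(cleaned$win_vote))
```

Filtering on win_vote rather than totvotpoll drops every row of an affected contest, not only the candidates with zero votes.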

Univariate Plots Section

Now we are going to analyse our data one variable at a time.

1. Number of contestants in each election year

Here we can see a consistent increase in the number of contestants, which goes up to about 14,000 in 1996. After that it drops to less than 5,000 in 1998 and decreases even further in 1999.

The peak in 1996 could be because many independent candidates contested: it was the first election after India opened up its economy, which gave Indian citizens many new opportunities, and contesting an election was one of them.
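A count of this kind needs only one row per candidate and the year column. The following base R sketch uses a made-up year vector (an assumption, not the real data) to show the idea.

```r
# Toy vector of election years, one entry per candidate (illustrative).
year <- c(1977, 1977, 1980, 1996, 1996, 1996, 1998)

# Number of candidates in each election year.
per_year <- table(year)
print(per_year)

# Year with the most candidates.
peak_year <- names(per_year)[which.max(per_year)]
print(peak_year)

barplot(per_year, xlab = "Election year", ylab = "Number of candidates")
```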

2. Number of contestants by gender

Men easily outnumber women in politics: women make up only about 5 percent of all candidates (3,648 of 73,081 rows). Candidates recorded as other (O) number just 6, so they have a long way to go in Indian politics as far as the numbers are concerned. It will be more interesting to investigate how many of the few women who contested did well in the elections.
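The shares can be checked directly from the cand_sex counts printed in the summary output earlier in the report. This is a small worked calculation in base R, not the author’s original code.

```r
# Candidate counts by cand_sex, taken from the summary() output above.
counts <- c(F = 3648, M = 68885, "NULL" = 542, O = 6)

# Percentage share of each category among all candidate rows.
share <- round(100 * counts / sum(counts), 2)
print(share)
```

The counts add up to the 73,081 rows of the data set, so the shares are relative to all candidates, including those with a missing (NULL) gender.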

3. Number of contestants by state

The chart above has few surprises, as the state with the highest population (Uttar Pradesh) has the highest number of candidates. Maharashtra, the second most populous state, stands third in number of candidates, after Bihar. West Bengal, the fourth most populous state, stands only 9th in number of candidates.

The data may look surprising, but not too much to an Indian who is aware of the political system of the country.

Bihar and Uttar Pradesh have long been hubs of political movements, and they supply a large share of the country’s politicians.

West Bengal was governed by the communist-led Left Front for 34 years (1977 to 2011), and such governments are often believed to have a not-so-loving attitude toward democracy. This could explain how West Bengal ended up with fewer candidates despite its large population.

4. Distribution of the vote proportion obtained by candidates

First, let’s see the summary of total votes polled.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##      11     872    2744   49838   19193  863358

Now let’s see the summary of the vote proportion.

##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
## 0.0000103 0.0007910 0.0025294 0.0483641 0.0186860 0.6827193

In totvotpoll the median is very small compared with the mean, which gives a rough idea that most candidates got very few votes. The proportion summary reflects this too: the median proportion is 0.0025 (about 0.25 percent of electors), which is tiny compared to the maximum (0.68) or the mean (0.048). Three quarters of the candidates did not even reach 2 percent (the third quartile is 0.0187).

Now let’s plot this.

The data is highly skewed, as most candidates lost with very few votes. It is better to view the data on a log10 scale.

The log10 transformation gives a better picture of the data: the distribution is now close to normal.
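The effect of the transformation can be reproduced with synthetic data. The sketch below uses randomly generated right-skewed values, which are an assumption standing in for the proportion column, and draws both histograms with base R graphics.

```r
set.seed(1)  # reproducible synthetic data

# Synthetic right-skewed vote shares (NOT the election data): most values
# are tiny, a few are large, capped at 1 like a real proportion.
proportion <- pmin(10^rnorm(1000, mean = -2.6, sd = 1), 1)

op <- par(mfrow = c(1, 2))  # two panels side by side
hist(proportion, main = "Raw scale", xlab = "proportion")
hist(log10(proportion), main = "log10 scale", xlab = "log10(proportion)")
par(op)                     # restore the previous plotting settings

# On the raw scale the mean sits far above the median (right skew);
# after log10 the two are close to each other.
print(c(mean = mean(proportion), median = median(proportion)))
print(c(mean = mean(log10(proportion)), median = median(log10(proportion))))
```

A proportion of exactly 0 would become -Inf under log10, so this only works once zero-vote rows have been removed or handled.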

5. Which state loves women more

Uttar Pradesh is at the top, with the most women winning elections, followed by West Bengal and Bihar. There are many states in which women politicians rarely won, such as Tripura, Sikkim and Jharkhand.

Univariate Analysis

Structure

The structure of the national election data is simple, as seen at the start of the analysis. Each row represents a candidate who contested in a particular year. We have 11 columns:

  • st_name (name of state)
  • year (year of election)
  • pc_no (parliamentary constituency number)
  • pc_name (parliamentary constituency name)
  • pc_type (parliamentary constituency type)
  • cand_name (candidate name)
  • cand_sex (candidate sex)
  • partyname (political party name)
  • partyabbre (party abbreviation)
  • totvotpoll (total votes the candidate got)
  • electors (total electors in the constituency)

st_name, pc_name, pc_type, cand_name, cand_sex, partyname and partyabbre are of type factor and year, pc_no, totvotpoll and electors are of type integer.

Main feature(s) of interest

partyname, st_name, year and cand_sex are the essential features of interest, which allowed me to answer fundamental questions such as:

  • Which state has the most electors?
  • Do women in India have the capacity to run for election?
  • In which year did the most candidates run for election?

and many more.

Other features which will help support my investigation into my feature(s) of interest

totvotpoll and electors are the other two main features, which help me explore a few more relationships such as:

  • How does vote share differ between genders?
  • How does vote share differ with the number of electors?
  • Are independent candidates able to attract votes over the years?
  • How do states differ in vote share?

Did you create any new variables from existing variables in the dataset?

I made three new variables:

  • proportion - the portion of the total electors that voted for a candidate.
  • win_vote - the number of votes polled by the winning candidate for that constituency in that year.
  • won - whether a particular candidate won the election or not.

Unusual distributions and their remedy

There are constituencies whose win_vote is 0, which cannot be a genuine result. The reason might be that the election for that constituency was halted by a court or the Election Commission, which has happened many times in India. So I removed those entries as outliers.

Observations

  • Among all the elections, 1996 is the year in which the most candidates contested.
  • Other-gender and female candidates are far fewer than male candidates.
  • Most candidates are from Uttar Pradesh (the most populous state). West Bengal (the fourth most populous state), despite its huge population, is not among the top six states by number of candidates.
  • Most winning women candidates are from Uttar Pradesh, followed by West Bengal.

Bivariate Plots and analysis

Now we dig a little deeper by adding one more dimension, based on what looked interesting in the plots above.

1. Proportion based on gender

## Nat_election_gender$cand_sex: F
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
## 0.0000214 0.0010392 0.0040463 0.0704614 0.1110793 0.5154583 
## -------------------------------------------------------- 
## Nat_election_gender$cand_sex: M
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
## 0.0000103 0.0007745 0.0024392 0.0475199 0.0175360 0.6827193 
## -------------------------------------------------------- 
## Nat_election_gender$cand_sex: O
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
## 0.0007498 0.0009032 0.0013373 0.0014541 0.0020291 0.0022823

From the box plot and summary above we can say that women candidates have a somewhat better vote proportion than men: the median for women is 0.0040 against 0.0024 for men, and the third quartile is 0.111 against 0.018. Since women candidates are few, we have to look at the female contribution compared to men a little more closely.
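A per-gender summary of this shape can be produced with by() in base R. The sketch below uses a small made-up data frame (an assumption) with the report’s column names cand_sex and proportion.

```r
# Toy data (illustrative values only).
cand <- data.frame(
  cand_sex   = c("F", "F", "F", "M", "M", "M", "M"),
  proportion = c(0.001, 0.004, 0.110, 0.0008, 0.002, 0.003, 0.017)
)

# Minimum, quartiles, mean and maximum of proportion for each gender.
print(by(cand$proportion, cand$cand_sex, summary))

# Medians side by side for a direct comparison.
med <- tapply(cand$proportion, cand$cand_sex, median)
print(med)

# Box plot on a log10 scale, since the raw values are heavily skewed.
boxplot(log10(proportion) ~ cand_sex, data = cand,
        xlab = "cand_sex", ylab = "log10(proportion)")
```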

2. Women candidates by state

We can see that men and women who won the election share the same distribution of proportion.

3. Women candidates by party

Let’s see which parties fielded more women candidates over the years and how being in a party affects a woman candidate’s chance of winning.

Here we can see that, among the top 10, most women contested as Independents and lost. It is very hard for a woman to contest and win an election as an independent. On the other hand, if a woman contests on behalf of a political party, her chances of winning are better.

The Indian National Congress is the party with the highest number of winning women candidates, followed by the Bharatiya Janata Party.

3. Male candidates by party

Let’s see which parties fielded more male candidates over the years and how being in a party affects a male candidate’s chance of winning.

From the above data we can say that it is not just women: male candidates are also likely to fail if they contest as Independents. The chances of winning are higher if the candidate belongs to a political party.

4. Rise and Fall of Women

Here we are going to see how many female candidates made it to the parliament in each election.

It is good to see a slow but steady upward trend in women taking more seats in parliament over the years.

5. Rise and fall of women’s charisma

Here we see how women candidates’ vote proportion varies over the years.

2014 may be the year in which the most women won the election, but their vote share was lower than in previous elections.

6. Response to Independents’ Call

Here we see the performance of independent candidates over the years.

According to the graph above, 1991 and 2014 were the worst years for independent candidates, followed by the 2004 election, in which only 5 of them won.

To conclude, elections in India are much tougher for an independent candidate than for a candidate with a political party behind him or her.
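Counting independent winners per year can be sketched in base R as follows. The toy rows are an assumption; the label "IND" for independents and the columns partyabbre and won follow the report.

```r
# Toy results (illustrative values only).
results <- data.frame(
  year       = c(1991, 1991, 1991, 2004, 2004, 2004),
  partyabbre = c("IND", "INC", "IND", "IND", "BJP", "IND"),
  won        = c(FALSE, TRUE, FALSE, TRUE, TRUE, FALSE)
)

# Keep independent candidates only.
ind <- results[results$partyabbre == "IND", ]

# Winners and contestants per year among independents.
wins        <- tapply(ind$won, ind$year, sum)
contestants <- tapply(ind$won, ind$year, length)

print(data.frame(year        = names(wins),
                 wins        = as.vector(wins),
                 contestants = as.vector(contestants),
                 win_rate    = as.vector(wins / contestants)))
```

Reporting the win rate next to the raw count helps separate "few independents won" from "few independents contested".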

7. States conscious of democracy

Here we see the votes polled out of the total electorate in each state.

The bar graph above shows that Lakshadweep, a group of islands that is not a state but a union territory, has people who are more conscious about elections than those of any state.

One interesting thing in the chart is that states like West Bengal, Nagaland and Tripura, which suffer from insurgent militant groups in their regions, have a higher vote percentage than other states.

Kerala, the most literate state in the country, is also in the top 10.
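One way to compute such a turnout figure is sketched below in base R. It assumes that electors is repeated on every candidate row of a contest, so the rows are first collapsed to one per contest before summing; the toy data and this two-step approach are assumptions, not the author’s code.

```r
# Toy candidate-level rows (illustrative values only).
votes <- data.frame(
  st_name    = c("A", "A", "A", "A", "B", "B"),
  year       = c(2014, 2014, 2014, 2014, 2014, 2014),
  pc_no      = c(1, 1, 2, 2, 1, 1),
  totvotpoll = c(300, 200, 1000, 400, 100, 50),
  electors   = c(1000, 1000, 2000, 2000, 500, 500)
)

# Step 1: one row per contest. electors is constant within a contest,
# so it can be part of the grouping key; votes are summed over candidates.
contest <- aggregate(totvotpoll ~ st_name + year + pc_no + electors,
                     data = votes, FUN = sum)

# Step 2: add up votes and electors per state, then divide.
state <- aggregate(cbind(totvotpoll, electors) ~ st_name,
                   data = contest, FUN = sum)
state$turnout <- state$totvotpoll / state$electors

print(state[order(-state$turnout), ])
```

Summing electors directly over candidate rows would count each constituency once per candidate and understate turnout in crowded contests.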

8. Proportion vs Electors

Here we look at how large a share of votes a candidate gathers depending on the number of electors in the constituency.

In the scatter plot above we can see that most winning candidates won with a vote share in the range of 15% to 45%. There are very few winning candidates with less than 10% of vote share, and also very few candidates with a vote share greater than 50%.

One interesting point is that there is hardly any candidate in a constituency with more than about 2,000,000 electors who has a vote share of more than 25% or 30%.

Let’s look at it in different years.

In the first scatter plot (faceted by year) we can see a pattern: in each successive election, more and more candidates got a similar vote share but at a much higher number of total electors.

The same shift in total electors is visible in the second graph, which compares the first election (1977) with the last (2014). Also, the variance in the 1977 election is greater than in 2014.

There could be two reasons for this:

1. Population growth.

2. A consistent use of the politics of populism.

We will investigate our second point further.

9. Caste Division in Parliament

India is a caste-based society. It has thousands of castes, which play an important role in elections, and some seats in the parliament are reserved so that only candidates from particular groups can contest them. Let’s see how many reserved seats there have been in the parliament over the election years.

There are two main groups that have reservation:

1) Scheduled Castes (SC) 2) Scheduled Tribes (ST)

Everybody else contests in the general category.

##  GEN   SC   ST 
## 4185  793  408

We can see from the above that the seats reserved for candidates from particular groups barely increased: for SC from 78 in 1977 to 84 in 2014, and for ST from 38 in 1977 to 47 in 2014.

This may reflect that caste dominance has increased in India over these years.
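Seat counts per category and year can be tabulated from the winners’ rows. The base R sketch below uses made-up rows (an assumption) and also strips stray whitespace, because the raw pc_type factor in the summary output lists SC twice, which suggests a duplicate level with a trailing space.

```r
# Toy winners, one row per seat (illustrative values only).
winners <- data.frame(
  year    = c(1977, 1977, 1977, 2014, 2014, 2014, 2014),
  pc_type = c("GEN", "SC", "ST", "GEN", "SC ", "SC", "ST")
)

# Remove leading/trailing whitespace so "SC " and "SC" count together.
winners$pc_type <- trimws(winners$pc_type)

# Seats per year (rows) and constituency type (columns).
seats <- table(winners$year, winners$pc_type)
print(seats)
```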

Bivariate Analysis

Feature relationships observed and surprises

  • Though women candidates are fewer in number, women secured a better median proportion than men: a larger share of women (in %) got a good proportion of votes than men.

  • There is a slow increase in winning women candidates over the years, but surprisingly their vote share drops in the later elections.

  • Independent candidates may be the most numerous contestants, but they are the most likely to lose. On the other hand, only a few political parties get the majority of seats.

  • The top states in terms of being conscious of democracy are either union territories or states dealing with some kind of militant insurgency.

Strongest Relationship

The strongest relationship is between the electors and the vote percentage. Most winning candidates have a vote share between 20% and 40%, but a higher vote share is less likely when total electors are very high.

It is also very unlikely to have a vote share of more than 50%, even for a small number of total electors.

Another interesting point is that with every election the number of electors increases, but the variance in the proportion decreases slightly.

Multivariate Plots Section

1. Electors vs proportion by year

Here the change in colour as we move from bottom to top in total electors represents population growth over the years, but we also see a decrease in variance as the population grows.

2. Electors vs Vote Share by Gender

In the first chart we can see that women did better than men at getting higher vote shares in constituencies with many electors.

In the second graph we can see that 2014 was one of the most successful years for women candidates in larger constituencies in terms of vote share.

3. Top Political Parties

Now we are going to see how the vote share varies among the top political parties over the years.

We can see that, out of the top ten parties, two have the largest share: the BJP and the INC.

## top_parties$year: 1977
##    BJP    BLD    CPI    CPM    DMK    INC INC(I)     JD     SP    TDP 
##      0    295      7     22      2    152      0      0      0      0 
## -------------------------------------------------------- 
## top_parties$year: 1980
##    BJP    BLD    CPI    CPM    DMK    INC INC(I)     JD     SP    TDP 
##      0      0     10     37     16      0    353      0      0      0 
## -------------------------------------------------------- 
## top_parties$year: 1984
##    BJP    BLD    CPI    CPM    DMK    INC INC(I)     JD     SP    TDP 
##      2      0      6     22      2    414      0      0      0     30 
## -------------------------------------------------------- 
## top_parties$year: 1989
##    BJP    BLD    CPI    CPM    DMK    INC INC(I)     JD     SP    TDP 
##     85      0     12     33      0    197      0    143      0      2 
## -------------------------------------------------------- 
## top_parties$year: 1991
##    BJP    BLD    CPI    CPM    DMK    INC INC(I)     JD     SP    TDP 
##    120      0     14     35      0    244      0     59      0     13 
## -------------------------------------------------------- 
## top_parties$year: 1996
##    BJP    BLD    CPI    CPM    DMK    INC INC(I)     JD     SP    TDP 
##    160      0     12     32     17    141      0     46     17     16 
## -------------------------------------------------------- 
## top_parties$year: 1998
##    BJP    BLD    CPI    CPM    DMK    INC INC(I)     JD     SP    TDP 
##    182      0      9     32      6    141      0      6     20     12 
## -------------------------------------------------------- 
## top_parties$year: 1999
##    BJP    BLD    CPI    CPM    DMK    INC INC(I)     JD     SP    TDP 
##    181      0      4     33     12    115      0      0     26     29 
## -------------------------------------------------------- 
## top_parties$year: 2004
##    BJP    BLD    CPI    CPM    DMK    INC INC(I)     JD     SP    TDP 
##    138      0     10     43     16    145      0      0     36      5 
## -------------------------------------------------------- 
## top_parties$year: 2009
##    BJP    BLD    CPI    CPM    DMK    INC INC(I)     JD     SP    TDP 
##    116      0      4     16     18    206      0      0     23      6 
## -------------------------------------------------------- 
## top_parties$year: 2014
##    BJP    BLD    CPI    CPM    DMK    INC INC(I)     JD     SP    TDP 
##    282      0      1      9      0     44      0      0      5     16

From the summary above we can say that INC (Indian National Congress) has been the most dominant party over the years, but we see a fascinating shift in 2014, when INC went down to 44 seats from 206 in 2009, while BJP, which started with 2 seats in 1984, went up to 282 seats in 2014.

Still, only a few political parties get the majority of seats in the election.

Note:

  • BLD was the outcome of a grand alliance (a merger of many parties) to defeat INC. They won the 1977 election, but neither the party nor its government lasted.

  • INC(I) was the faction of INC led by Indira Gandhi, formed after the party split in 1978. It contested the 1980 election under that name, with many of the old politicians of INC.
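The size of the 2014 shift can be worked out from the seat counts in the per-year tables above. This is a small base R calculation on those published numbers, not part of the original analysis code.

```r
# Seats won, copied from the per-year tables above.
seats <- rbind(
  "2009" = c(BJP = 116, INC = 206),
  "2014" = c(BJP = 282, INC = 44)
)

# Change in seats between the two elections.
change <- seats["2014", ] - seats["2009", ]
print(change)

# Combined seats of the two parties in each year.
print(rowSums(seats))
```

The two changes are of almost the same size with opposite signs, so the combined total of the two parties stays nearly constant.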

4. Independents’ Vote Share Analysis

Independent candidates have experienced a lot of ups and downs over the years. For women it looks like a lost battle, and even for men it is not so good.

It is hard to find any pattern in the plot above.

5. Top Parties’ Performance

Here we look at the performance of the top three parties found above. They are:

  • Indian National Congress (INC)
  • Bharatiya Janata Party (BJP)
  • Communist Party of India (Marxist) (CPM)

INC may be the party that has wooed voters the longest, but from the graph above we can say that BJP has an edge in constituencies with more electors.

Here we look at the top two political parties, which have opposite performance charts: one is BJP, which went from nothing to a majority in the parliament, and the other is INC, which went from almost all the seats to just 44 seats.

Winning candidates by party and gender over the years

For both parties, women’s share of winners in each year is always less than 15 percent, and in most cases less than 10 percent, but we see an overall increase in winning women candidates in recent years.

For BJP, 2014 is the first time that 30 women (the highest ever for BJP) won the election.

Final Plots and Summary

1. Caste Division in Parliament

India is a caste-based society. It has thousands of castes, which play an important role in elections, and some seats in the parliament are reserved so that only candidates from particular groups can contest them. Let’s see how many reserved seats there have been in the parliament over the election years.

There are two main groups that have reservation:

1) Scheduled Castes (SC) 2) Scheduled Tribes (ST)

Everybody else contests in the general category.

We can see from the above that the seats reserved for candidates from particular groups increased: for SC from 78 in 1977 to 84 in 2014, and for ST from 38 in 1977 to 47 in 2014. As a result, the number of seats in the general category decreased.

This reflects how caste dominance has increased in India over these years.

2. States conscious of democracy

This graph shows how seriously the citizens of each state take elections, that is, how many of them went to cast their votes.

The bar graph above shows that Lakshadweep, a group of islands that is not a state but a union territory, has a population that is more conscious about elections than that of any state.

One interesting thing in the chart is that states like West Bengal, Nagaland and Tripura, which suffer from insurgent militant groups in their regions, have a higher vote percentage than other states.

Kerala, the most literate state in the country, is also in the top 10.

3. Political Shift in Parliament

From the chart above we can say that INC (Indian National Congress) has been the most dominant party over the years, but we see a fascinating shift in 2014, when INC went down to 44 seats from 206 in 2009, while BJP, which started with 2 seats in 1984, went up to 282 seats in 2014.

But the most bitter truth of all is that only a few political parties took the majority of seats.

Note:

  • BLD was the outcome of a grand alliance (a merger of many parties) to defeat INC. They won the 1977 election, but neither the party nor its government lasted.

  • INC(I), which won in 1980, was the faction of INC led by Indira Gandhi, formed after the party split in 1978. It contested the election under that name, with many of the old politicians of INC.


Reflection

Issues with the data

The data has several issues.

  • There is very little information about each candidate.

  • The data is not very dirty, but I had to remove some observations in which a candidate got zero votes and was still recorded as the winner.

  • I had to create a few variables: win_vote (the votes of the winning candidate), won (whether a candidate won or not) and proportion (the share of electors that voted for a candidate).

  • Many political parties had names with different spellings, so I had to use the party’s abbreviation.

  • I had to change many columns into factors.
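The spelling problem is visible in the summary output earlier in the report, where independents appear under three different partyname values but a single partyabbre value. A small base R check on those published counts illustrates why the abbreviation is the safer grouping key:

```r
# Counts from the summary() output: three spellings of the same label
# in partyname, and the single abbreviation in partyabbre.
name_counts <- c("Independent" = 31458, "IND" = 5619, "Independents" = 4050)
abbre_count <- c(IND = 41127)

# The three spellings add up exactly to the abbreviation's count.
print(sum(name_counts))
print(sum(name_counts) == abbre_count[["IND"]])
```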

What was achieved

  • A cleaner picture of parliamentary seat reservation.

  • A rough idea of how winning candidates’ proportion relates to the number of electors in their constituencies.

  • How the dominance of the top political parties shifted over the years.

  • A clear gender bias among the political parties.

  • Performance of women and of independent candidates.

  • A fair picture of which states’ citizens are interested in or conscious about elections and voting.

  • How only a few political parties have been ruling India for so long.

Could be done in future

  • Performance of winning candidates who ran for election more than once.

  • Performance of runners-up who ran in the next election.

  • Performance of political parties based on how many candidates they change in every election.

  • Which political party dominates which state.