Wednesday, July 21, 2010

The state of the Wiki in India (part 1)

I am really happy that Salmaan Haroon accepted my invitation to be a guest on my blog. He took the time to write down his vision for Wikimedia in India. People who want to express a contrasting view are welcome to do so, provided they are as thoughtful.
Thanks,
      GerardM

I wonder what comes to mind when someone mentions India to outsiders: yoga, meditation, the Taj Mahal, strong deterring images of a hot third world country, huddled masses a billion strong - the second largest population in the world.


India currently provides the fourth largest share of traffic to Wikipedia according to Alexa.com, behind the United States, Japan and Germany. The thing to bear in mind with this statistic is the sheer size of the potential audience here: Germany with its 81 million people and Japan with its 127 million both have a far higher proportion of internet users than India, with its population of over a billion. Currently only 7% of India's population is online, and that 7% is providing the 4th largest traffic to Wikipedia.org; in Japan the figure is 75%. This is staggeringly low even among developing nations; 36% of Brazil and 31% of China are currently online. The majority of the traffic originating from India is directed towards Wikipedia, and the English Wikipedia in particular. The bulk of the traffic is generated from Delhi and Mumbai according to Google Trends, followed by the states of Tamil Nadu and Karnataka. Also of note in the Google Trends figures is that traffic has been steadily rising since last year, nearly doubling from January 2009.


What are the factors behind the drastically low figure of 7% internet usage in the Indian population in comparison to, say, China? It is actually a myriad of factors in addition to the poverty, corruption and highly uneven distribution of wealth that are endemic to developing countries. First, there is the uneven population distribution: 70% of India still resides in rural areas, less affected by the rapid urbanization of recent years. There are currently 47 cities with over one million residents. Mumbai is currently the largest city in India and the 5th largest in the world in terms of population, with over 20 million denizens, followed by Delhi, the capital of the country, as the 8th largest metropolitan area in the world with over 18 million inhabitants. Bangalore, the Silicon Valley of India, is also a very important city in terms of internet usage, with over 5 million of some of the country's most literate and technically versatile population. Other factors are also to blame: the languid pace of IT infrastructure development, with years of bureaucratic protocols hampering the speed of technological progress, and the comparatively high cost of internet access.


So what’s the point of interest for Wikimedia in the country? India is poised for exponential growth in terms of internet users in the coming year, with the advent of newer technologies and massive investments by private companies. This, combined with strong economic growth in the recent year rivalling China's, puts India at the forefront of growth in terms of internet users. India is already the second largest market in the world for mobile internet users. Secondly, one third of the population in India is currently under the age of 15, representing a massive opportunity in terms of social networking and other web 2.0 properties. A young population would not only provide sustained growth for years to come but also easier accessibility and visibility to a wider population.


The biggest issue in engaging the Indian audience is language. India has one of the world's most culturally, linguistically and genetically diverse populations. Here are some facts about Indian languages: there are 22 languages that are recognized as official languages by the Indian government, and there are over 1600 other languages and dialects in total. India has two major and distinctive linguistic families: the Indo-Aryan, used by up to 70% of Indians, and the Dravidian, with over 22% of speakers. The Indo-Aryan family easily dominates in terms of usage, with Hindi having the largest number of speakers at an estimated 350-400 million; Hindi also happens to be the principal official language of the Republic of India. The second most spoken language in India is Bengali with over 83 million speakers, followed by Telugu at over 74 million, Marathi at 71 million and Tamil at 60 million speakers (all figures from the 2001 census of India). The major Dravidian languages are Tamil, Telugu, Kannada and Malayalam. There are also some Austro-Asiatic and Tibeto-Burman languages making up a small minority.

The designation of Hindi as the official language has been a heated issue in the country for over 50 years. The base of Hindi speakers is centered on North and Central India, including the capital and seat of the government, Delhi, and its neighbouring states. The rest of the country, particularly some southern states, has been less accepting of Hindi, objecting that a foreign language was being unnecessarily forced on them while their own distinct cultural and linguistic identity was being de-emphasized, since Hindi is not openly spoken or used in many southern states. There have been multiple instances of anti-Hindi agitations historically in the southern states. Linguistic rights have become an even more delicate issue with some political parties voicing their opposition to Hindi.


The issue of note here is the lingua franca of the country: English. English has a long history of usage in India; over the centuries under British rule it became established as the language of administration and higher education. It is officially recognized as a subsidiary or secondary official language by the Republic of India. The importance of English is agreed upon by the vast majority of literate Indians, quite apart from any anti-Hindi position; English is considered a necessity to remain globally competitive. English bridges the language barrier in many of the southern states, reaching across very culturally and linguistically divided communities. In some metropolitan areas, it’s not unusual to find families who use and prefer English as their mother tongue and have been doing so for decades. A recent discussion about the anti-Hindi issue on one news channel put this to a young studio audience, when a suggestion of parting with English and having Hindi in addition to a mother tongue was raised. This suggestion was laughed off by the majority of the audience, since the program itself was in English, on an English news channel. It was also the only language understood by the entire audience, which comprised speakers from different states.


I had the privilege of attending Wikimania this year in Gdansk. In one of the discussions related to Strategic Planning, I was asked about the history and usage of traditional encyclopedias in India, and whether the encyclopedias we had during school were in English. I may not have been the right candidate for this question, being a bit young; I was born at the right time for the tech bubble, so I had Encyclopædia Britannica and eventually Wikipedia on my computer before I ever had to look for an encyclopedia in a bookstore. The answer, however, was yes. The point that I wanted to make was this: the computers that I and the majority of the country used for years were predominantly Windows based. Microsoft Windows didn’t have support for different Indian languages without the Language Interface Pack, which came out much later, and even then it was not used by the majority of the user base due to issues with character input. This might be in contrast to, say, China, where localization was easily and widely available. So the majority of the Indian community that is online has to understand and use English with some degree of comfort. This might also point to some correlation between literacy rate, economic factors and internet usage in India.


India represents a very rich cultural and linguistic base for the rest of the world. Going ahead, the Wikimedia Foundation needs to decide whether it’s interested in pursuing a strong user base in India, full stop, regardless of which language of Wikipedia it uses, or whether it wants a diverse linguistic base. I personally would suggest establishing a strong user base first and getting a high visibility rate in India before a diverse base could be nurtured. So far none of Wikipedia’s Indian language projects have been able to break the 100,000 article mark - Hindi for example has slightly over 50,000 articles with over 250 active users, followed by Telugu at over 45,000 articles and Marathi at over 30,000 articles. Needless to say, these statistics don’t take into account the quality and the size of the articles in question.


Recently, several translation projects for Wikipedia have been undertaken by Google and by other community-supported, machine-based translation projects, which might offer a hopeful avenue for growth. The community consensus so far, however, has been against the use of machine translation, especially since Indian languages present a very distinct linguistic subset. There have also been some concerns that this might be against the nature of the projects, since such an approach is not community oriented and the articles themselves aren’t contributed by individual editors. Perhaps a more organic or hybrid approach would be needed to tackle this issue.

Salmaan Haroon

All statistics and figures mentioned above are taken from the English Wikipedia except where explicitly mentioned otherwise. Traffic statistics from Alexa.com for Wikipedia.org, traffic analysis from Google Trends.