Sunday, May 01, 2011

#SEO and #Wikipedia


It is often observed that once a Wikipedia article is written, it does well in the search engines. Still, search engine optimisation is relevant to the Wikimedia communities: because we aim to share information with all the people of the world, and because we want to double our reach in five years.

When more people find our articles, we do a better job of achieving our aims. There are some things we can do to increase our reach, and there are things the search engines should do.

The most obvious thing we can do is to make sure the articles are there to be found. They should be rich in wiki-links and illustrations; even citing sources makes some difference. Providing web fonts enhances people's ability to read our articles, and this should also affect the behaviour of search engines.

When the search engines know that we provide web fonts, they can treat searches like കൗതുകം and കൌതുകം as equivalent; for a native Malayalam speaker these are the same, and as we present text in the latest Unicode encoding, a search engine could return our articles as a result for both searches.

Our articles are often found on the first page of search results. If your aim is more traffic for Wikipedia, write articles on subjects that are popular at the time you write them.
Thanks,
      GerardM

3 comments:

Bawolff said...

In reply to:
>When the search engines know that we
>provide web-fonts, they can make
>searches like കൗതുകം and കൌതുകം
>equivalent; for a native Malayalam these
>are the same and as we present a text in the latest Unicode encoding a search
>engine could present us as a result for
>both searches.

Wait, so the search engines are going to normalize in one direction, and only one direction? That doesn't really make sense. If the search engines are going to normalize the different Malayalam sequences to the same thing, they're going to make both equivalent, and searches for either will be exactly the same... Also, how do web fonts enter into this at all? The font only changes how the text is displayed. Google et al. are computers; they only look at the code point of each character. They don't need any fonts.
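A minimal Python sketch of the point about code points, assuming the first spelling is encoded with U+0D57 MALAYALAM AU LENGTH MARK and the second with U+0D4C MALAYALAM VOWEL SIGN AU (the actual page text may differ). The `fold` function is a hypothetical, non-standard mapping, not something any search engine is known to do. Because it is applied to both the query and the indexed text, a search for either spelling matches both, which is the symmetry the comment describes; no font is involved at any step.

```python
import unicodedata

# Assumption: newer spelling uses U+0D57, older spelling uses U+0D4C.
NEW_STYLE = "\u0d15\u0d57\u0d24\u0d41\u0d15\u0d02"  # KA, AU LENGTH MARK, TA, U, KA, ANUSVARA
OLD_STYLE = "\u0d15\u0d4c\u0d24\u0d41\u0d15\u0d02"  # KA, VOWEL SIGN AU, TA, U, KA, ANUSVARA


def code_points(text: str) -> list[str]:
    """Return the code points of text as U+XXXX strings; this is all a search index sees."""
    return [f"U+{ord(ch):04X}" for ch in text]


def fold(text: str) -> str:
    """Hypothetical search fold: NFC first (so U+0D46 U+0D57 composes to U+0D4C),
    then map VOWEL SIGN AU (U+0D4C) to AU LENGTH MARK (U+0D57)."""
    return unicodedata.normalize("NFC", text).replace("\u0d4c", "\u0d57")


def matches(query: str, document: str) -> bool:
    """Substring search with the same fold applied to both sides."""
    return fold(query) in fold(document)


if __name__ == "__main__":
    print(code_points(NEW_STYLE))
    print(code_points(OLD_STYLE))
    print(NEW_STYLE == OLD_STYLE)         # False: different code points
    print(matches(NEW_STYLE, OLD_STYLE))  # True
    print(matches(OLD_STYLE, NEW_STYLE))  # True
```

Which direction the fold maps in does not matter for matching, as long as the query and the documents go through the same function.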

GerardM said...

You can only normalise when people have the right fonts to see a text. As we use the latest version of Unicode AND provide webfonts, anyone can read our text.

Knowing that we provide webfonts is what allows Google et al. to normalise for OUR website.
Thanks,
GerardM

Santhosh Thottingal സന്തോഷ് തോട്ടിങ്ങല്‍ said...

This is not that simple. Search engines cannot (and should not) do that normalization unless a canonical equivalence is defined between the two code points. Remember, search is not done by search engines alone; string comparison happens everywhere. So the ideal solution is to get a canonical equivalence defined for these two code points, and optionally to deprecate one of them. Once we implement that equivalence in glibc, CLDR, ICU, etc., our collation, searching and string comparisons become error free.

For the കൗതുകം/കൌതുകം pattern, webfonts are not required, since all fonts show it properly. Linguistically both words are the same, with the same meaning, but technically they are completely different words. That is a problem.
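The missing canonical equivalence can be checked with Python's standard `unicodedata` module. This is a minimal sketch, again assuming the two spellings use U+0D57 and U+0D4C respectively. U+0D4C decomposes canonically to U+0D46 U+0D57, while U+0D57 has no decomposition at all, so none of the four standard normalization forms rewrites one spelling into the other. Treating them as equal would need either a tailored, application-specific rule or a change in Unicode itself.

```python
import unicodedata

# Assumption: newer spelling uses U+0D57 (AU LENGTH MARK),
# older spelling uses U+0D4C (VOWEL SIGN AU).
NEW_STYLE = "\u0d15\u0d57\u0d24\u0d41\u0d15\u0d02"
OLD_STYLE = "\u0d15\u0d4c\u0d24\u0d41\u0d15\u0d02"

FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def equal_under(form: str, a: str, b: str) -> bool:
    """True if a and b are identical after Unicode normalization form `form`."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


def show_decompositions() -> None:
    """Print the decomposition mapping of each AU sign ('' means none)."""
    for ch in ("\u0d4c", "\u0d57"):
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {unicodedata.decomposition(ch)!r}")


if __name__ == "__main__":
    show_decompositions()
    for form in FORMS:
        # Every standard form keeps the two spellings distinct.
        print(form, equal_under(form, NEW_STYLE, OLD_STYLE))
```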