Wednesday, February 29, 2012

Twitter works well in English; 140 characters suffice. The objective of a tweet is to say something using only a few words. For some languages 140 characters provide a wealth of space (think Chinese), while other languages need several characters to form one compound character (think Tamil), which makes it hard to get the message out.

When you reserve space in a database for comments, the same logic applies. It may be an idea to configure the number of characters but the database itself should allow for a language like Tamil.
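A minimal sketch in Python of why the reserved space matters, assuming the text is stored as UTF-8 and the limit is counted in bytes (both are assumptions for illustration, not a description of how any particular database is configured). The helper names `storage_cost` and `fits` are made up for this example.

```python
def storage_cost(text: str) -> tuple[int, int]:
    """Return (code points, UTF-8 bytes) needed to store the text."""
    return len(text), len(text.encode("utf-8"))


def fits(text: str, max_bytes: int) -> bool:
    """True when the text fits a field limited to max_bytes of UTF-8."""
    return len(text.encode("utf-8")) <= max_bytes


english = "Tamil"
# The word "Tamil" in the Tamil script, spelled out as code points.
tamil = "\u0ba4\u0bae\u0bbf\u0bb4\u0bcd"

print(storage_cost(english))  # (5, 5)
print(storage_cost(tamil))    # (5, 15)
```

The same word takes three times the bytes in the Tamil script, so a field sized for English fills up much sooner.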

This is one aspect of internationalisation that is a bit deeper than making sure that characters are shown correctly. It is cool to see people who champion their language pick up on issues like this. Srikanth submitted bug 34586 and it is getting an adequate response.

To my complete surprise, the "other" Wikimedia projects are now supported by the Wikipedia Mobile software. It is a bit of a hack because on some browsers you may get a security message stating "wikipedia" as the target even though you requested your own project instead.

When your #Wikipedia does not have a mobile main page yet, with some HTML trickery you can have one. One of the more recent projects to present a menu is the Kannada Wikipedia; I was also told that the Odia Wikipedia recently acquired a menu.

For languages like Odia and Kannada there is one additional bit of nuisance. It is the lack of support for the script used for the language in most browsers. It would be cool when the support of WebFonts is part of the mobile roadmap.
Thanks,
GerardM

The argument for closure is compelling: since the start in 2003 only 273 articles have been written and most activity is by bots, vandals and from people cleaning up the mess. A week ago I found someone willing to localise the Babel extension in Zulu but I am afraid that is as much as I can expect.

Monday, February 27, 2012

Only those messages that need to be customised should exist locally. When many messages have only been localised locally, as on the Assamese Wikipedia, the challenge is to copy the messages to translatewiki.net and, once the LocalisationUpdate has done "its thing", remove the local messages.

Robin's tool does provide the necessary information about the messages that exist only on the local Wiki. For somebody who knows Assamese, the first step is a matter of copying the existing messages to translatewiki and verifying if they are correct. This improves support for the Assamese language on every Wiki.

The job of deleting the local messages that are the same is something best left to a bot. It is best practice to remove them because it prevents problems down the line. A bot should perform this task on every WMF wiki.

Robin's tool should be run on any MediaWiki wiki. When this task has been performed on many wikis, there will be enough data to indicate the messages that typically have a local override. This could be the next step in optimising your local MediaWiki setup.
Thanks,
GerardM

When this font becomes available at for instance the English Wikisource, the text of the Epic of Gilgamesh or the Cyrus cylinder can be made available in the original format and we can be reasonably sure that any device that supports web fonts will show the characters as defined in the Akkadian font.

The Akkadian font is available in the Debian distribution and was developed by George Douros. There are several other fonts for historic texts created by Mr Douros. They can be made available. This however will happen on request.

English Wikipedia without WebFont support

There is a point to making WebFonts available at this time. However, we will not push this functionality until we are done with a unified interface which integrates both input methods and web fonts. Currently the easiest way of identifying text in another language is using templates, e.g. {{lang-nl|Dit is Nederlands}}.
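To illustrate why marking text with such a template helps, here is a rough Python sketch that turns a {{lang-xx|...}} call into an HTML span carrying a language attribute. This is an assumption about the general idea only; the real templates are written in wikitext and may produce different markup. The function name `tag_language` is invented for this example.

```python
import html
import re

# Matches {{lang-xx|text}} or {{Lang-xx|text}} with a 2 or 3 letter code.
LANG_TEMPLATE = re.compile(r"\{\{[Ll]ang-([a-z]{2,3})\|([^{}|]*)\}\}")


def tag_language(wikitext: str) -> str:
    """Replace each lang template with a span that declares its language."""
    def to_span(match: re.Match) -> str:
        code, text = match.group(1), match.group(2)
        return f'<span lang="{code}">{html.escape(text)}</span>'

    return LANG_TEMPLATE.sub(to_span, wikitext)


print(tag_language("{{lang-nl|Dit is Nederlands}}"))
# <span lang="nl">Dit is Nederlands</span>
```

Once the language is declared in the markup, software such as a web font selector no longer has to guess which font a passage needs.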
Thanks,
GerardM

Friday, February 24, 2012

The Cyrus cylinder is arguably an important piece of world heritage. It consists of a text written in the cuneiform script. The text is still relevant after two thousand years, so much so that there is a TED presentation about it.

A translation of the text is available at Wikisource. Given that the English Wikisource will support WebFonts next week, having a freely licensed font to do justice to the Cyrus cylinder is the next step before we can reliably show the text as it exists on the cylinder. Having a translation adds value to the source.

Obviously texts in the English language will not be affected. We do not support fonts for the Latin script yet but we could when it makes sense. The texts that are to be affected by WebFonts need to be identified as being in a specific language. For Hebrew it would be something like {{Lang-he}}.

The Localisation team wants to have a better user interface to indicate texts in another language. This is when we expect to be ready to suggest enabling WebFonts on something like the English Wikipedia.

Your #Wikipedia may have many messages that can be safely deleted. Robin created just the tool to inform you what messages can be safely deleted because they are identical to the ones provided to you by MediaWiki.

These are identical messages on the Marathi Wikipedia that can be deleted because they are exactly the same. This is a job that any local admin can perform. It will make sure that when a message is changed, the updated message will become available. There is even a list that can be used to have a bot delete these messages; this may save someone from a lot of boring work.

Really interesting are the custom messages; they are the ones where the local message differs from the message provided by MediaWiki. They come in two flavours:

When the messages that are correct on any wiki are included at translatewiki.net, they will become available within a couple of days thanks to the LocalisationUpdate process. At that time these messages can be safely deleted.
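The split between identical and custom messages can be sketched in a few lines of Python. This is not how Robin's tool works internally; it only assumes that both the local messages and the messages shipped with MediaWiki are available as key/text pairs. The message keys and texts below are invented.

```python
def classify_local_messages(local: dict, defaults: dict):
    """Split local messages into (identical, custom) lists of keys.

    identical: same text as the shipped default, so safe to delete.
    custom:    differs from the default, or has no default at all.
    """
    identical, custom = [], []
    for key, text in sorted(local.items()):
        if defaults.get(key) == text:
            identical.append(key)
        else:
            custom.append(key)
    return identical, custom


# Hypothetical data for illustration only.
defaults = {"save": "Save page", "search": "Search"}
local = {"save": "Save page", "search": "Find", "sitenotice": "Welcome"}

print(classify_local_messages(local, defaults))
# (['save'], ['search', 'sitenotice'])
```

The first list is what a bot could delete; the second list is what needs a human who knows the language.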

Thursday, February 23, 2012

When you are happily using your Android, it is still possible that it does not support the language that you are really using. As a consequence Android will in its standard configuration not enable you to indicate that you are using another language than the pre-configured ones. It will therefore not indicate the correct language in the meta data for languages like Neapolitan, Marathi or Frisian.

The "More locale" app allows you to add locales to your system. What I know is that you can add one with an ISO-639 code and you can specify a country code. Particularly when your smart phone already supports the script associated with the language, the result is wonderful. You can have Neapolitan or Frisian enabled.

Given that the "Wikipedia mobile" app is localised in many languages that Android does not have as a standard locale, you may find that the "More locale" app will magically turn the localisation on in your language as well.

The screenshot you see to the right is taken on Amir's Android phone. His system has standard support for the Devanagari script and consequently it shows really well in our eyes. It is really relevant to check if a future phone supports you. Later versions of Android tend to do a better job.

Languages like Amharic, Gujarati and Malayalam do not have the same standard font support. For the scripts associated with these languages web fonts would be great but sadly we have not managed to make them work. Another option could be including fonts with our smartphone app. However tempting, it is much to be preferred when fonts can be included on the system level. Having the fonts is great when all you want to do is display texts, but I am sure that people want to enter text in their language as well.

All in all, you can trick Android to use your language. The real challenge will be to enable the use of scripts that have no default support. Default support will make the "More locale" app redundant but until that time, it is very much recommended.
Thanks,
GerardM

Wednesday, February 22, 2012

Some of the localisations in use for MediaWiki need improvement. It can be for all kinds of reasons including bad grammar, typos and spelling errors, but also because a message changed. When you are an administrator on a wiki, you can change such MediaWiki messages on your local wiki; however, you should not.

When you improve a message locally, the result will be local and it will override the messages that come with the MediaWiki software. This includes the daily messages that are new or have been updated and come to you through the LocalisationUpdate process.
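A small Python sketch of this override behaviour, assuming a simple "local first, shipped second" lookup. The real MediaWiki message cache is more involved; the message key and texts here are invented.

```python
from collections import ChainMap

# Hypothetical messages: what ships with the software and a local override.
shipped = {"savearticle": "Save page"}
local = {"savearticle": "Save this page"}

# Lookups consult the local overrides first, then the shipped messages.
messages = ChainMap(local, shipped)

# An update arrives for the shipped message ...
shipped["savearticle"] = "Publish page"
print(messages["savearticle"])  # Save this page  (the update is hidden)

# ... and only shows once the local override is removed.
del local["savearticle"]
print(messages["savearticle"])  # Publish page
```

This is why an unnecessary local message keeps a wiki from receiving improved texts.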

When you improve a message at translatewiki.net, you will find the experience much more productive compared to working on your local wiki. Tools are available to you that suggest translations and help you proofread what is already there. If anything, translatewiki.net exists to make you as a localiser as efficient as possible. You will find that once you get going, you are at least ten times more productive.

For some people localising on-line is what prevents them from being productive. When the local connectivity to the servers of translatewiki.net leaves a lot to be desired, it is sometimes better to work off-line.

This is the answer to why you should localise at translatewiki.net. However, there is a reason why some messages need to be changed locally. It is because the message needs to be adapted to the local wiki. Such messages may be about policies, contact information or whatever else is not standard. Only a few people need to bother with this and they are the local admins of a wiki; they are the ones who can change messages locally.
Thanks,
GerardM

One of the languages in the Incubator is Mapudungun. It is spoken in Chile and Argentina and many of the Mapuche people are offended by the English name used for their language. The result is that they refuse to have anything to do with the effort to have a Wikipedia for their language.

The Wikipedia article on the Mapuche language mentions the name that should not be used and it states: "The latter was the name given to the Mapuche by the Spaniards but nowadays both the Mapuche and others avoid this usage". The question is: why does it say Araucanian, where does it come from and how can it be fixed?

Tuesday, February 21, 2012

Git is a version control system. The Wikimedia Foundation is going to use it and frankly, I have no clue why. What I do understand is that the waving around of hands and the exclamation of buzz words looks good. I am not aware of the existence of a cost benefit analysis, something that would make things understandable.

My impression is that because of the introduction of git, the quality of the localisations and the internationalisations of MediaWiki and its extensions will be negatively impacted.

As it is, changes committed to Subversion, the current version control system, result in new messages in translatewiki.net within a day. These new messages become available to the twn community and as a result there is a steady stream of messages that need textual improvement and/or an explanation. As development progresses, the messages improve and the localisation is under way. At the time new functionality is ready for deployment it is localised in many languages.

With Git, there is no such thing as new messages that make their way to translatewiki.net. As I understand it this will only happen when the developer is done, the software is reviewed and is moved into "production" status. This means no internationalisation review, no textual improvements, no help texts and no localisations once software goes into production.

When a translator costs one hundredth part of a developer and we support three hundred languages then reducing the cost of development is not necessarily where we make our big wins. Our language support benefits from the current model and I fail to see why we should git going.

Really, what is the global benefit from using git over Subversion?
Thanks,
GerardM

Monday, February 20, 2012

#Wikipedia is without a doubt a good example of a multilingual project. On a good day we support 280+ languages. On a sad day there are more than 6000 languages we do not yet support.

This year a lot has happened at the Wikimedia Foundation to support languages. As we now have a Localisation team, we support quite a few languages with web fonts and input methods. We are improving our translation capability. The internationalisation for our PHP and JavaScript code supports grammatical gender and plural.

Best of all is what we aim to achieve: we make editing MediaWiki as easy as it is to edit in English.

This year we will host an "office hour" at 18:00 UTC on IRC. The subject is very much how we can help you and how you can help us support your language. We welcome any suggestions that improve the use of MediaWiki in your language:

Having a #font for the Ge'ez script allows you to see what a language like Amharic or Tigrinya looks like. Chances are that many people who have the font on their system can also read a language that uses the script.

For those people who do not have a font for this script on their computer, the Amharic Wikipedia now supports the AbyssinicaSIL font. For the people who do not know this, Amharic is spoken in Ethiopia.

When you use the current Firefox, you will get an experience like the one below. We already know that a next version of Chrome will properly support it. I do not know about the other browsers. What I do know is that Firefox is doing a sterling job for me.

Friday, February 17, 2012

The American entertainment industry is struggling to increase its margins. This is done on the basis of "ownership of the industry". Read this Politico article for an update on how they want to sway your opinion.

There are several things in their approach that upset me. They call Wikipedia a company and by doing this they deny the existence of the community that created the wealth of knowledge embedded in our projects. The Wikimedia Foundation is, like them, not an organisation that creates content but an organisation that represents those that do... with one stark difference; it is not business the WMF represents but people.

In the Politico article, it is said that putting the "stars" in the limelight of their campaign may hurt their value. It hurt Metallica in the past and it may hurt others in the future. This is in stark contrast with what happens when content is not an "industrial product". Our community does have people able and willing to explain the value we represent to our world. Its true value can be measured for instance in our traffic statistics for Wikipedia or in the CD's and books provided to schools all over the world. We do not complain when people reuse the content of our Wikis, we know it increases the number of people our content reaches. When it does, it eventually has more people come to us directly and contribute. The same mechanism works for the industry; the people who are branded as "pirates" spend most on industry produced content.

The RIAA and the MPAA are proud to be American; it is in their name. This raises the question whether they represent the industry as a whole or only American companies. When what they propose is implemented, it will affect the Internet negatively and it will have its effects beyond the US border.

Compare this to the WMF; it is proud to support over 280 language communities. It provides infrastructure to all of them and it will welcome even more people and even more languages. The people it enables are industrious and they provide the world with the quality content they create. They do not need "industry representation" to make it useful or valuable. Sadly it is the distorted message of how industry is represented that makes it necessary to waste time, effort and emotions on what should be a non-issue.
Thanks,
GerardM

Thursday, February 16, 2012

#McAfee scans links from Facebook. This would be a good idea when done right. As it is, links to Wikipedia are flagged on Facebook as "potentially unsafe by our trusted partner".

This is rather silly because the Wikimedia Foundation has been a partner of Facebook longer than this new "trusted partner". Facebook, I trust that you can agree that you have got it wrong. It is "wonderful" to have trusted partners who talk on your behalf and get it so magnificently wrong.

The National library warns that it may still be possible that the font does not display properly. This is why we really need to test this font. We want to know in which configurations it works properly. It is likely that some legacy systems will just not work and we need to know if providing this font works for you.

In #Ubuntu, the fonts for #Malayalam and several other scripts are a bit broken. As you can see in the Google+ screenshot, headers are not displayed. This is probably because the font provided by Ubuntu does not support bold or something.

Obviously, the best solution is when the fonts used by Ubuntu provide the full support applications expect. When these fonts fail as they do, it is possible to provide web fonts. Google is very much into web fonts but it seems that they have a preoccupation with the one script that is already so well supported.

It would be so cool when the people who develop fonts would concentrate their efforts where they provide the most benefit. Enabling the full functionality of a script and its associated languages on the Internet is arguably more relevant than prettifying single webpages.
Thanks,
GerardM

Wednesday, February 15, 2012

#Wikipedia supports more than 280 different projects and each and every one is in one language. In the name of the project you find an indication of what language it is, and in the meta-data you find what language it is.

As a consequence it should be obvious that the tk.wikipedia.org, the one in the Turkmen language is not written in Turkish. Your software does not need to guess what language it is because we make it plain. It says somewhere in the code:

In my understanding it means that the content that follows is in the "tk" language. Yes, I will continue to report to you that you reported an incorrect language. The only thing is that you do not need to guess what language it is; we do indicate it properly.
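Reading a declared language instead of guessing it takes very little code. The Python sketch below assumes that the page declares its language through the standard HTML `lang` attribute on the `html` element; the sample markup is invented for illustration and is not copied from any Wikipedia page.

```python
from html.parser import HTMLParser


class LangSniffer(HTMLParser):
    """Remember the lang attribute of the first <html> element seen."""

    def __init__(self):
        super().__init__()
        self.lang = None

    def handle_starttag(self, tag, attrs):
        if tag == "html" and self.lang is None:
            self.lang = dict(attrs).get("lang")


def declared_language(page: str):
    """Return the language code the page declares, or None if absent."""
    sniffer = LangSniffer()
    sniffer.feed(page)
    return sniffer.lang


sample = '<!DOCTYPE html><html lang="tk" dir="ltr"><head></head><body></body></html>'
print(declared_language(sample))  # tk
```

A tool that trusts this value reports Turkmen for such a page and never has to fall back on guessing.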
Thanks,
GerardM

Tuesday, February 14, 2012

In #India, people realise that having a Wikisource is really relevant. Recently we celebrated the start of the Marathi Wikisource and now the spotlight is on the creation of a Gujarati Wikisource.

The Marathi Wikisource is special because the Maharashtra government promised source material to be published on their new project. We can hope that this will set an example for other Indian state governments.

The information needed at the moment of creation is being collected. They already created a logo for the Gujarati Wikisource. They finished the localisation of the compulsory messages in record time and now it is time for the language committee to assess if all the requirements are met.

When the language committee is satisfied, they will ask the Wikimedia Foundation board for permission to create the new project. This formality is typically granted and it is then for the WMF operations people to create a new project. It is at this stage that several configuration options have to be available and it is great that this information is already being collected.
Thanks,
GerardM

For all those people who have no clue what this scientific theory is about, it is wonderful that they have at least the opportunity to learn what Mr Darwin said in his own time. When they object to the evolution theory, they have to appreciate that science did not abandon this theory and has amassed a wealth of additional supporting evidence not known at the time of Mr Darwin.

The significance of this new wonderful resource is that it shows the humble beginnings of a theory that is still as vibrant as it used to be. It is still as controversial as it used to be but now it is possible to learn about how this theory evolved during the lifetime of Mr Darwin. To learn what science has learned in additional knowledge, you have to study the subject itself.

Many people have opposed the evolution theory. It would be a worthy Wikisource project to accumulate the original sources of the ideas proposed against the evolution theory.
Thanks,
GerardM

Sunday, February 12, 2012

In order to improve the scripting in MediaWiki, it will become possible to add Lua code. Many of the existing templates are not really examples of readability or performance and, in addition to this, many templates have been copied to many other wikis.

Changing these templates and making them useful in other languages is quite a job. It requires knowledge of the scripting language(s) involved and obviously of the language and the script(s) that need supporting.

At translatewiki.net a request has been made to use messages for templates that are used on many projects. Templates like "done", "not done", "support", "oppose" and "neutral" are the ones suggested. It is one of those suggestions that gets a lot of sympathy from the translatewiki community.

Localising the messages and getting them to every project is a problem we have a solution for. Making sure that templates exist and remain on the latest version will need some thought. With the introduction of Lua imminent, this is an opportune moment to give the distribution of templates some thought.

Doing this by hand is already hard if not impossible. With Wikipedia going strongly towards 300 projects, it would be great when the support in the many languages of our communities can already start in the Incubator.
Thanks,
GerardM

Thursday, February 09, 2012

Friends of mine have a running shop. On Monday and Thursday morning I go there because Karin does not feel secure when she is alone.

Today, when I walked to the shop I saw her standing outside, there was a lot of police. They had fire bombed the shop next door for a third time. This time with so much explosive material that a wall between the two shops collapsed.

These !@#$% do not consider what collateral damage may result; there are apartments above the shop. A technical survey had to be done to establish the condition of the roof. They do not consider the trauma they do to people like Kees and Patty (the shop keepers) and Karin, the sales lady. It does affect me; it is why I write this blog post.

Such Mafia practices are what you do not expect in your neighbourhood. It is what you do not expect to touch your life, your mood. Today I am good for nothing so that is what I will do for the rest of the day.
Thanks and sorry for imposing on you all,
Gerard

Wednesday, February 08, 2012

#MediaWiki aims to support over 300 languages. Not all languages are equal in the way they express things like a plural. The rule for English is easy: one, many. One apple, many apples; never mind how many, it will be apples.

Other languages express plurals differently. How they are expressed is defined in the CLDR standard. They use a formula and consequently languages that express plural in the same way share the same formula.

These formulas, when known for a language, will be used in MediaWiki when generating messages including numbers. As we already know the language, we just have to look up how a number fits what rule. Using these formulas helps because it prevents us from having to write separate code for each and every language and, when the rules become known for more languages, they will just fit in.
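How shared formulas avoid separate code per language can be sketched in Python. The two rules below are simplified to non-negative whole numbers and written from memory of the CLDR plural categories, so treat them as an illustration rather than as the CLDR data itself; this is also not MediaWiki's actual implementation. All function names are invented.

```python
def rule_one_other(n: int) -> str:
    """Shared by languages such as English, Dutch and German."""
    return "one" if n == 1 else "other"


def rule_one_few_many(n: int) -> str:
    """Shared by languages such as Russian and Ukrainian."""
    if n % 10 == 1 and n % 100 != 11:
        return "one"
    if 2 <= n % 10 <= 4 and not 12 <= n % 100 <= 14:
        return "few"
    return "many"


# Supporting a new language means adding one entry, not new code.
PLURAL_RULES = {
    "en": rule_one_other,
    "nl": rule_one_other,
    "de": rule_one_other,
    "ru": rule_one_few_many,
    "uk": rule_one_few_many,
}


def plural_category(language: str, n: int) -> str:
    """Look up which plural category the number n falls into."""
    return PLURAL_RULES[language](n)


def format_count(language: str, n: int, forms: dict) -> str:
    """Pick the word form that belongs to the plural category of n."""
    return f"{n} {forms[plural_category(language, n)]}"


apples = {"one": "apple", "other": "apples"}
print(format_count("en", 1, apples))  # 1 apple
print(format_count("en", 7, apples))  # 7 apples
print(plural_category("ru", 21))      # one
print(plural_category("ru", 22))      # few
print(plural_category("ru", 25))      # many
```

The message author only supplies the word forms; the number-to-category step is the same lookup for every language.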

The challenge is to ensure that the CLDR will support every one of the existing 6000+ languages of which we at present only support a twentieth.

Monday, February 06, 2012

On the Incubator, a lot of thought goes into providing the best functionality to the people who want a project in a new language. Often these people do not speak English. Providing information to these fledgling projects has so far been done in English; we did not really have an easy way of providing translations.

Today at Incubator, the Translate extension has been enabled. People who do know English are now able to translate essential information in their language. This will make it easier for people to find their way and work on the initial documents for a language.

With this functionality in place, it is also a great moment to update these documents. Spending time now saves time later. SPQRobin made the request, Niklas did the configuration and once Robin is ready it will be for the Incubator community to prepare the way for those people who do not know English.
Thanks,
GerardM

When I use the #Wikipedia label, an article is deemed to be of interest to the "Planet Wikimedia". I write often about Wikipedia but it is hardly the only thing that is relevant to what is called the planet Wikimedia. Wikipedia is not the only Wikimedia project and it is not the only project that has relevance.

I wrote about Wikisource the other day and it was read by people from the Italian Wikisource community. They did comment on it not being available on the planet.

The issue of access to my blog being limited to Wikipedia has been raised in the past. The suggestion was made that I should use the label "Wikipedia" more liberally. That is something I refused because it destroys the relevance of the label.

Some may consider this not the best way to get things changed. Other ways have been tried. This will however register (again) the reluctance to consider other projects. Projects that are essential to honour the aim of our foundation because not all knowledge is encyclopaedic.
Thanks,
GerardM

Sunday, February 05, 2012

In the #Sakha Republic, the cost of using the Internet depends on the location of the servers serving the data. The rationale is that it is expensive to get content out of the rest of the world. As a consequence the local websites and forums are more popular and those websites in the local language are doing fine.

We do have a Wikipedia in the Sakha language and we do care about the availability of Wikipedia at zero costs. So the question is very much how can we reduce the costs for the Sakha people for Wikipedia (in any language).

If the ISPs are willing to bring the information into their country for everyone, they may consider the use of Squid. What the Wikimedia Foundation then can consider is supporting this squid by invalidating its content in the same way it does its own squids.
Thanks,
GerardM

At #Fosdem you meet people from all over Europe. I met with one Wikisourcerer and we had a talk about this project. Even though it is an official Wikimedia Foundation project, it is very much a Cinderella; an unloved daughter that is blossoming into beauty.

The one thing Wikisource is lacking is a bit of TLC. Compare it with the attention showered over GLAM, the contrast could not be starker. We all love GLAM (I do), it has had a global WMF fellow and now there is a person doing the works for the USA and it has the attention of many chapters. Wikisource is what some people do.

Actually what these people do, particularly when they are organised is very relevant and it supports one of the pillars of what we do; the public domain. The public domain is strengthened because it fulfils the role industry does not. It ensures the availability of works that are considered to be of no commercial interest. It brings attention to works that have a timeless value and for many languages it actually ensures that schools have books for kids to read.

There are several projects where people are digitising books, transliterating them. The one thing lacking is learning from best practices. The best source projects organised the acquisitions of books, the scanning, the transliteration, the distribution. The Malayalam Wikisourcerers for instance published a CD that was sent to all schools in Kerala.

Wikisource is a project that our chapters are particularly well positioned to engage in. It is practical and when done well it will help us preserve and strengthen the public domain.
Thanks,
GerardM

A conference like #FOSDEM is big, some of the talks are influential and consequently it is a place where people come to learn about what is happening and how people think about technology, and open source technology at that.

At breakfast we were sitting next to a gentleman working for Intel. He was very much into mobile communications so we reached out to him and discussed our need to support input methods for languages like Tamil, Malayalam ... We told him that we provide web fonts.

You never know at the start of a conversation what will come out of it. What we do know is that support that is low in the technology stack has the most effect.
Thanks,
GerardM

When I publish on my blog, it is published to an international public. Once I have blogged, a tweet on #Twitter is produced to make more people interested in reading the article. When I tweet, it is intended for a worldwide public.

The moves by Google and Twitter to publish my blog and my tweets in country-specific domains upset me. Publishing my blog on for instance "http://ultimategerardm.blogspot.com.au/" upsets me. It upsets me because it allows for easy censorship of Blogger and Twitter. It will be Google or Twitter who decide what gets published where. Their excuse that the governments of countries ask them to do this does not impress me much.

I am not amused with the situation. I like Blogger, its functionality is great. However, I categorically deny Google the right to box me in; kettle is now the word that comes to mind. It is not okay to restrict my access to the Internet for what I blog. It is not okay to restrict what the readers of blogs can read.

We have had our SOPA moment and like with SOPA, this is how the Internet is damaged. As this is not acceptable, I will have to reconsider what to do next. We will have to consider what to do next.

Saturday, February 04, 2012

We Wikimedians took the bus to FOSDEM. This proved to be a problem for some: the bus was full. As the buses are driving again, we are sad to have left some people out in the cold.

We are here for many reasons; some are on their way to Pune, some are here because Fosdem is awesome and a place to be. Several of us are here to meet open source developers looking for challenges and one of us is looking for the best of them.

I am here at Fosdem to talk about languages and language support. I would love to connect to other projects who are oriented to the whole of the web and all its languages.

So when you are into language and language support, find me and talk to me.
Thanks,
GerardM

Friday, February 03, 2012

In time for #FOSDEM, #Wikimedia people were looking for a place to work, have great connectivity and talk shop. At BetaGroup Coworking we found a warm welcome, and experienced how its customers are provided with what they need. For us it was a white board and guess what, someone brought chocolate cake.

We found a quiet spot where we discussed the usability and the user interface of the Translate functionality. Having Jon Harald with us proved to be a boon. He did the project management for the Fundraiser localisations and brought us a large amount of observations and suggestions.

Having a white board allowed us to easily visualise our ideas and as easily replace them with something else, something better. Once we achieved a common understanding, Jon Harald's input was translated into a "user story" by Siebrand and this will eventually end up as more work for Niklas.

PS According to Jon Harald it is not cold ... not cold enough to wear a coat.
Thanks,
GerardM

Thursday, February 02, 2012

When you have a user of every #Wikipedia, it is fairly obvious that on most Wikipedias you do not know anything of that language. When I provided this information on the Wikipedia in Haitian, the message came out in French.
This was wrong on many levels particularly because the localisation of the Babel extension in Haitian is hardly recent. It pre-dates the current release of MediaWiki.

The good news is that this bug has been fixed. We know that it affected the Babel extension but it would not surprise us when it affects other extensions as well.

When you find all of a sudden many more localisations available for your language, you can assume that it is because bug 33768 has been fixed by Roan.
Thanks,
GerardM

Wednesday, February 01, 2012

There is a #Wikipedia in 280+ languages. However, not all languages are equal. Some of the languages are not recognised as languages. From a support point of view this is not good. We prefer to make use of information that is available in standards. When the standard bodies have accepted information about a language, we can be relatively certain that what we do will be exactly what is done by other applications.

For the "Normandian" Wikipedia, the language is not recognised. It does not mean that there is no support possible or available. The language can be localised at translatewiki.net. The language falls back to English but looking at the text, it may be better to have French as a fall back language. It is unlikely that we know about special needs like plural or if we need to address women using the grammatical female form.
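What a fall back language does can be sketched in Python. The chain below is hypothetical: it shows the suggested French fall back, with "nrm" used as the code for this Wikipedia, and it is not the actual MediaWiki configuration. The message key and texts are invented as well.

```python
# Hypothetical fall back configuration, for illustration only.
FALLBACKS = {"nrm": "fr", "fr": "en"}


def fallback_chain(language: str) -> list:
    """List the languages to try in order, always ending with English."""
    chain = [language]
    while chain[-1] in FALLBACKS and FALLBACKS[chain[-1]] not in chain:
        chain.append(FALLBACKS[chain[-1]])
    if "en" not in chain:
        chain.append("en")
    return chain


def lookup(key: str, language: str, catalogue: dict):
    """Return (text, language it came from) for the first match found."""
    for code in fallback_chain(language):
        text = catalogue.get(code, {}).get(key)
        if text is not None:
            return text, code
    raise KeyError(key)


catalogue = {
    "en": {"search": "Search"},
    "fr": {"search": "Rechercher"},
    "nrm": {},  # not localised yet
}

print(fallback_chain("nrm"))               # ['nrm', 'fr', 'en']
print(lookup("search", "nrm", catalogue))  # ('Rechercher', 'fr')
```

With such a chain a reader sees the closest available language for every message that has not been localised yet.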

We are reaching out to all our languages and we aim to provide proper support to all of them. With "Nourmande" it is extremely obvious that we do not know what to do. It is unlikely that any of us has the literature that answers these questions and really, each of our Wikipedias must have people who know enough of their language to support us.