Archive for the ‘I18N’ Category

Official: WordPress default theme translation is forbidden

Thursday, September 6th, 2007

Whether the WordPress default theme can be translated is an issue that keeps popping up once in a while. While many translators want to make it usable for most people in the world, Matt Mullenweg has resisted the idea with a flat refusal. Well, at least there is a final answer now.

First the (old) news: the default theme won’t be i18n-ed. The reason in short: for the sake of simplicity.

I beg you, let’s not make a flame war on that, I am almost sure this policy won’t change, so we’d better focus on making our life easier, instead of losing precious time for pointless discussions. Thank you!

This is probably due to Matt’s ever-increasing power. I gave up a long time ago, seeing that it just wastes my time. Am I wise? Other translators have probably also realised that this is a dead end: the wp-polyglot list is usually as quiet as a mouse. Anyway, case closed.

Serendipity’s Chinese support is unusable

Saturday, May 5th, 2007

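Because I am a bit dissatisfied with WordPress (some people at Automattic treat translators as second-class citizens), I had the idea of checking whether another blog engine could replace it. I just tried Serendipity. In short, its Chinese support is garbage and unusable, and its UTF-8 character encoding support looks as if it was never tested. To use Traditional Chinese, the database had better use only Big5, and you have to add a made-up “tw” language code to your browser’s language list. Normally a browser sends ‘zh-hk’ for Hong Kong or ‘zh-tw’ for Taiwan, whereas ‘tw’ in the official ISO 639 list stands for a language called Twi.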

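Still, it is better to use it in English, because seeing those Chinese “translations” makes me furious. They were put up without anyone ever testing whether they work: not only is the meaning of a good part of them completely wrong, some sentences even display a raw ‘%s’. Even trying it feels like a waste of time. I surrender; I will stay with WordPress for now, although its handling of database character encoding is just as bad. At the moment I do not have the energy or motivation to rescue something born with congenital heart disease.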
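The language-code mismatch can be sketched in a few lines of Python. This is only an illustration, not Serendipity’s actual code: the function names, the table of available locales and the ‘zh-hk’ alias are all assumptions made up for the example. It shows that matching on the tags browsers really send (‘zh-tw’, ‘zh-hk’) works, while a bare ‘tw’ matches nothing Chinese.

```python
# Hypothetical sketch: choosing a UI locale from an Accept-Language header.
# AVAILABLE, ALIASES and the function names are invented for illustration.

AVAILABLE = {"en", "zh-tw", "zh-cn"}
ALIASES = {"zh-hk": "zh-tw"}  # assumption: serve Traditional Chinese to Hong Kong


def parse_accept_language(header):
    """Return the language tags in the header, highest q-value first."""
    entries = []
    for position, part in enumerate(header.split(",")):
        piece = part.strip()
        if not piece:
            continue
        tag, _, params = piece.partition(";")
        quality = 1.0
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0
        # Sort by descending quality, then by original order.
        entries.append((-quality, position, tag.strip().lower()))
    return [tag for _, _, tag in sorted(entries)]


def pick_locale(header, default="en"):
    """Pick the first tag that is available directly or through an alias."""
    for tag in parse_accept_language(header):
        if tag in AVAILABLE:
            return tag
        if tag in ALIASES:
            return ALIASES[tag]
    return default


print(pick_locale("zh-tw,zh;q=0.8,en;q=0.5"))
print(pick_locale("zh-hk,en;q=0.5"))
print(pick_locale("tw"))  # 'tw' is Twi in ISO 639, so nothing Chinese matches
```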

WordPress discriminates against international users

Sunday, April 1st, 2007

This is NOT an April Fool’s joke!

It is surprising to see that WordPress, with translations in lots of languages, actually doesn’t welcome efforts to have it properly internationalised.

In this discussion, RTL (right-to-left) language support in a WordPress theme is said to be ‘intrusive’. Intrigued by what ‘intrusive’ means? Here is the relevant change for RTL support: it involves 2 changed lines of code, an extra CSS file not used by default, and a few images which are supposed to be localized image files. And this RTL setting is not even enabled by default. Now, THAT particular comment is even more intrusive than the change itself, because it means people outside Europe or America should be denied the right to improve their language support, and anything better is at the mercy of speakers of English and other European languages.

Although some WordPress developers see the need for proper I18N if WordPress is to be adopted worldwide, others reject that belief, for example by:

  1. calling any non-English, non-European language a ‘minority’
  2. claiming that any localised theme would ‘scare away’ theme designers
  3. etc, etc

From the very beginning, localisation of applications has been an uphill battle, mainly because of the difficulty of accommodating every kind of difference between languages: punctuation, number formats, monetary formats, date/time formats and string ambiguity, to name a few. In recent years some progress has been made, and a few large categories of obstacles (like RTL support and CJK support) are no longer unsolvable problems. When software lacks proper I18N support, it is usually because of a lack of I18N knowledge, or simple ignorance when no improvement requests have been received; but this kind of blatant disregard is rare in today’s world, where globalization is one of the vital elements of worldwide adoption.

KDE4 now uses standard gettext

Monday, January 15th, 2007

In the early days of KDE, when even gettext did not yet support plural forms, KDE used its own _n: format to express plurals:

msgid ""
"_n: Open %n file\n"
"Open %n files"
msgstr "開啟 %n 個檔案"

This scheme actually has a big problem. What decides the plural form? It is decided by a single translation entry in kdelibs:

msgid ""
"_: Dear translator, please do not translate this string in any form, but pick "
"the _right_ value out of NoPlural/TwoForms/French... If not sure what to do "
"mail thd@kde.org and coolo@kde.org, they will tell you. Better leave that out "
"if unsure, the programs will crash!!\n"
"Definition of PluralForm - to be set by the translator of kdelibs.po"
msgstr "NoPlural"

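In short, one sentence decides life or death. The trouble is: does Chinese never have plurals? No! Sometimes Chinese does have plural forms. Saying that Chinese has no plurals only refers to the case of a number plus a measure word, where for example 1 horse, 2 horses and so on all become 「n 隻馬」. The plural of 「你」 (you) is 「你們」, and the plural of 「她」 (she) is 「她們」; those cases still need to distinguish plurals, but because of that one entry in kdelibs, Chinese was ruled to be absolutely unable to have plurals.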

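But things are different now. KDE4 uses gettext’s standard way of expressing plurals, that is, a plural-form declaration is added to the header of each file, and kdelibs no longer passes sentence. For translators this is really good news: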

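  1. each file can decide the matter for itself;
  2. translators no longer need to tell apart a different plural mechanism for each kind of software, which reduces confusion.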
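To make the per-file header concrete, here is a minimal Python sketch. The two `Plural-Forms` lines are the declarations commonly used for Chinese and English; the helper names, the sample strings with `%d`, and the hand-written Python equivalents of the C expressions are assumptions for illustration, not how gettext itself evaluates them.

```python
import re

# Plural-Forms lines as they might appear in the headers of two PO files.
ZH_HEADER = "Plural-Forms: nplurals=1; plural=0;"
EN_HEADER = "Plural-Forms: nplurals=2; plural=(n != 1);"


def parse_plural_forms(header):
    """Extract (nplurals, expression text) from a Plural-Forms line."""
    match = re.search(
        r"nplurals\s*=\s*(\d+)\s*;\s*plural\s*=\s*(.+?)\s*;", header
    )
    if match is None:
        raise ValueError("no Plural-Forms declaration found")
    return int(match.group(1)), match.group(2)


# Hand-transcribed Python versions of the two C expressions above.
PLURAL_RULES = {
    "0": lambda n: 0,
    "(n != 1)": lambda n: int(n != 1),
}


def choose_msgstr(header, msgstrs, n):
    """Pick the msgstr[i] that the file's own header selects for n."""
    nplurals, expression = parse_plural_forms(header)
    if len(msgstrs) != nplurals:
        raise ValueError("expected %d plural forms" % nplurals)
    index = PLURAL_RULES[expression](n)
    return msgstrs[index] % n


print(choose_msgstr(EN_HEADER, ["Open %d file", "Open %d files"], 1))
print(choose_msgstr(EN_HEADER, ["Open %d file", "Open %d files"], 3))
print(choose_msgstr(ZH_HEADER, ["開啟 %d 個檔案"], 3))
```

Because the rule lives in each file’s header, a catalog that does need to distinguish plurals can declare a different rule without affecting any other file.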

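As for msgctxt, another format in the new gettext, hardly any software uses it yet. Right now gtk+ and qt each use their own format to add context to entries, but hopefully the situation will improve once gettext 0.15 is more widely adopted.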
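What a message context buys the translator can be sketched as follows. This is a toy dictionary lookup, not the gettext API: the function names, the context labels and the sample translations are assumptions for the example. The only detail borrowed from gettext is that compiled catalogs join context and msgid with an EOT (`\x04`) character.

```python
# Toy model of context-aware lookup; names and entries are illustrative only.
CONTEXT_SEPARATOR = "\x04"  # EOT, used by gettext to join msgctxt and msgid


def catalog_key(msgid, msgctxt=None):
    """Build the lookup key for a message, with or without a context."""
    if msgctxt is None:
        return msgid
    return msgctxt + CONTEXT_SEPARATOR + msgid


# The same English msgid needs two different translations.
CATALOG = {
    catalog_key("Open", "menu action"): "開啟",
    catalog_key("Open", "door state"): "開著",
}


def translate(msgid, msgctxt=None):
    """Return the translation, falling back to the msgid itself."""
    return CATALOG.get(catalog_key(msgid, msgctxt), msgid)


print(translate("Open", "menu action"))
print(translate("Open", "door state"))
print(translate("Open"))  # no context given, so no entry matches
```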

What is Code Page 951 (CP951)?

Tuesday, September 12th, 2006

Most Chinese Windows users will have heard of CP950, which is the implementation of the Big5 character mapping in Traditional Chinese Windows. However, what the heck is CP951? Is it somehow related to CP950? Yes! This code page exists, but it is rarely mentioned on the internet, and so far I haven’t managed to find any page that clearly documents it, not even on M$’s web site.

Most of the content has now been moved to a separate static page, since it has some research value. Visit that page for more detail.

CJK font testing

Monday, August 14th, 2006

These days I’ve been spending a lot of time testing CJK (Chinese, Japanese, Korean) fonts in browsers and on the Linux desktop, especially the mystery of how fontconfig chooses which font to use for each glyph. CJK fonts are notorious for fighting against each other when a font has to be found to display CJK unified glyphs. Since Keith Packard (the author of fontconfig) hasn’t provided any in-depth explanation of how it works, everybody has resorted to guesswork. (Asking him directly by email doesn’t help either: his usual reply is that it automagically works, with no useful information at all.)

I haven’t achieved my goals yet. They are:

  1. Pick font according to language: for Chinese, Japanese and Korean web pages, if the lang attribute is specified, the corresponding font for each language is used. Chinese fonts should NOT always override all the others.
  2. Pick uniformly: for all non-CJK web pages, don’t pick a random font for each glyph individually. Always prefer a single Chinese font (because I’m in a Chinese environment), unless some glyphs are Japanese or Korean Han characters. In that case...
  3. Mix and match: attempt to match the font faces of each language properly, based on their stroke style. For example, uming (“AR PL ShanHeiSun Uni”) should be mixed with specific Japanese fonts (“Sazanami Mincho”, “Kochi Mincho”) and a Korean font (“UnBatang”), because they have very similar stroke styles (Song/Ming style, or in Chinese, 宋體/明體).
  4. Use English for English: when displaying Latin glyphs, always use Latin fonts like DejaVu and Bitstream Vera. Latin glyphs in Chinese fonts are unconditionally crappy. They are blurred at smaller sizes (though Chinese users won’t set their display to a smaller font size, as CJK glyphs would become unreadable), unlike normal Latin fonts, which have solid outlines. I’m not sure how freetype 2.2.x fares, but even if its anti-aliasing has improved compared with 2.0.x, it will be a long time before everybody switches to 2.2.
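The goals above can be modelled in a few lines of Python. This is only a sketch of the desired outcome, not of how fontconfig actually matches fonts: the function name, the code point ranges used to classify glyphs, and the choice of “DejaVu Serif” as the Latin face are assumptions for the example, while the CJK font names are the ones mentioned in the list.

```python
# Hypothetical model of the font-picking goals; not fontconfig's algorithm.

# Song/Ming-style fonts per language, as named in the goals above.
SERIF_BY_LANG = {
    "zh": "AR PL ShanHeiSun Uni",
    "ja": "Sazanami Mincho",
    "ko": "UnBatang",
}
LATIN_FONT = "DejaVu Serif"   # assumption: serif member of the DejaVu family
DEFAULT_CJK_LANG = "zh"       # "because I'm in a Chinese environment"


def is_latin(ch):
    # Basic Latin through Latin Extended-B.
    return ord(ch) < 0x0250


def is_kana(ch):
    # Hiragana and Katakana blocks.
    return 0x3040 <= ord(ch) <= 0x30FF


def is_hangul(ch):
    # Precomposed Hangul syllables.
    return 0xAC00 <= ord(ch) <= 0xD7A3


def pick_font(ch, lang=None):
    """Return the font that should render ch on a page tagged with lang."""
    if is_latin(ch):
        return LATIN_FONT                      # goal 4
    if is_kana(ch):
        return SERIF_BY_LANG["ja"]             # goal 3: matching stroke style
    if is_hangul(ch):
        return SERIF_BY_LANG["ko"]
    primary = (lang or DEFAULT_CJK_LANG).split("-")[0].lower()
    # Goal 1 when lang is given, goal 2 (single default font) otherwise.
    return SERIF_BY_LANG.get(primary, SERIF_BY_LANG[DEFAULT_CJK_LANG])


for ch, lang in [("A", "ja"), ("漢", "ja"), ("漢", None), ("の", None), ("한", None)]:
    print(repr(ch), lang, "->", pick_font(ch, lang))
```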

However, there are lots of problems in the setup:

  1. Mix and mismatch: while Song/Ming style can be matched with Serif, there is no match for Sans (or “unmodulated”, the better term internationally). The best match for an unmodulated font face is Hei style (黑體, meaning “black style”). Publicly available Japanese (“Sazanami Gothic”) and Korean (“UnDotum”) fonts closely resemble Hei style, but no such Chinese font is available. Kai style (楷體) is simply another style, not a substitute for Hei style.
  2. Hard to use English fonts for Latin glyphs: the Latin glyphs inside both uming and ukai are actually in a monospaced Serif face. However, the punctuation and other symbols are not of uniform width; they are proportional. What should I call that? Anyway, my attempts to override them with DejaVu have been unsuccessful so far.
  3. Browser issue: a browser may or may not be able to support CSS properly and pick glyphs according to language. Look at the 3 screenshots below:
    [Screenshot: fonts on Firefox]
    [Screenshot: fonts on Konqueror]
    [Screenshot: fonts on Opera]

    Among Firefox, Opera and Konqueror, only Firefox manages to obey the lang attribute in <span lang="xxx"></span> and pick the correct font among the CJK ones. Konqueror only picks a font from the current locale when the Sans, Serif or Monospace aliases are used. Opera doesn’t even obey font-family in CSS (I guess it is using uming with no anti-aliasing). That means I have to stick to Firefox for font testing in the browser.

  4. How about other fonts? I’m pretty sure things can go wild if other, untested fonts are used, and that happens a lot among Chinese users, where borrowing Windows TTF files (mingliu and simsun) and other commercial fonts is common practice. Making fontconfig behave properly with these fonts is tedious work.

Testing HKSCS-2004

Monday, August 7th, 2006

I have just finished writing two web pages for testing HKSCS-2004, and the results were not what I expected.

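When Arne first announced that uming/ukai had HKSCS-2004 support, I was indeed quite happy, but after using the fonts more I felt something was off: glyphs are misshapen (the exclamation mark sits too far to the left, and the components of some characters are too large or too small), and missing glyphs turn up frequently. Besides, I have been studying fonts for various languages over the last few days, so on a whim I tested whether the Hong Kong characters in uming/ukai are all right. The result is a little disappointing: the May 2006 version is missing 5 characters (Unicode code points):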

  • U+0251
  • U+0261
  • U+4491
  • U+FFED
  • U+27F2E

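Whether these characters will ever be needed is questionable, but the fonts fall just short of being able to claim complete HKSCS-2004 coverage. Adding these characters should not be hard, though.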
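A coverage check of this kind boils down to a set difference. The sketch below is not the test pages themselves; the function names are made up, and the `covered` set is a stand-in for whatever code points a real font reports (reading them from a font file would need a font library, which is left out here).

```python
# Hypothetical coverage check; the function names and sample data are
# illustrative, not taken from the test pages.

# The five code points reported missing from the May 2006 fonts.
REPORTED_MISSING = [0x0251, 0x0261, 0x4491, 0xFFED, 0x27F2E]


def format_code_point(code_point):
    """Format an integer as U+XXXX, with at least four hex digits."""
    return "U+%04X" % code_point


def missing_code_points(required, covered):
    """Return the required code points the font does not cover, sorted."""
    return sorted(set(required) - set(covered))


# Stand-in for a font's character map: it covers only one of the five.
covered = {0x4491}

for code_point in missing_code_points(REPORTED_MISSING, covered):
    print(format_code_point(code_point))
```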


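16:28 update: I have just contacted Arne and also filed a bug report. He said the font needs some kind of conversion (honestly, I don’t quite understand it), but he definitely responded, so by and large I am reassured.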


2006-09-12 update: the latest version of uming/ukai (2006-09-03) has added these characters.