2007-12-10
As promised in the previous post (in Chinese, sorry), here is the full advisory for the WordPress SQL injection vulnerability I mentioned. Excerpt below:
It is found that the search function provided within WordPress fails to sanitize input based on different character sets. So if WordPress tries to query MySQL database using certain specific character sets, WordPress search function is exploitable using charset-based SQL injection.
Currently known exploitable character sets include Big5, GBK and GB18030. All of them may use a backslash ('\') as part of a multibyte character. WordPress with a MySQL database created with any other character set that has this property may also be exploitable.
Executing this attack alone exposes all database content through the web interface without any need for authentication. However, if combined with other exploits (such as the cookie authentication vulnerability disclosed earlier), any remote user can obtain WordPress admin privileges, resulting in server compromise.
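A minimal sketch of the general mechanism behind charset-based injection, not WordPress's actual code: `naive_addslashes` is a hypothetical byte-wise escaper that knows nothing about the character set, and the search term is made up for illustration. It shows how the backslash inserted by the escaper is absorbed as the trail byte of a Big5 character, leaving the quote unescaped.

```python
def naive_addslashes(raw: bytes) -> bytes:
    """Hypothetical byte-wise escaper that ignores the character set."""
    out = bytearray()
    for byte in raw:
        if byte in (0x00, 0x22, 0x27, 0x5C):  # NUL, ", ', backslash
            out.append(0x5C)                  # prefix with a backslash
        out.append(byte)
    return bytes(out)


# In Big5 the trail byte of some characters is 0x5C, the ASCII backslash.
assert "功".encode("big5") == b"\xa5\x5c"

# Made-up search term: a Big5 lead byte (0xA5) followed by a single quote.
payload = b"\xa5' OR 1=1 -- "
escaped = naive_addslashes(payload)

# A database that reads the bytes as Big5 pairs 0xA5 with the inserted
# backslash, so the quote that follows is no longer escaped.
seen_by_database = escaped.decode("big5")
print(seen_by_database)

# UTF-8 never uses bytes below 0x80 inside a multibyte sequence,
# so the same trick has nothing to absorb the backslash with.
assert 0x5C not in "功".encode("utf-8")
```

The same reasoning applies to any character set whose multibyte characters may contain 0x5C as a trail byte, which is the property the advisory names.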
Actually, I had long suspected this was exploitable, though I only made a real effort to verify it a few days ago. Given the security track record of WordPress, this is entirely within expectations.
Chinese sites stubborn enough to keep using Big5 or GBK encoding in the database are in jeopardy; otherwise most sites should be fairly safe from this exploit (as most should be using UTF-8). The latin1 character set (used in most earlier default WordPress installations) is not vulnerable either. But contrary to common belief, it looks like mysql_real_escape_string() doesn’t fix the problem at all. Can anybody confirm or deny this?
2007-12-10 20:55 update: GB18030 is not vulnerable. MySQL 5.0.x doesn’t support this character set at all; I don’t know about the 5.1 series.
2007-05-05
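I am somewhat dissatisfied with WordPress (some people at Automattic treat translators as second-class citizens), so I had the idea of trying whether another blog engine could replace it. I just tried Serendipity. In short, its Chinese support is garbage and unusable, and its UTF-8 support looks as if it was never tested. To use Traditional Chinese at all, the database had better use only Big5, and you have to add a made-up “tw” language code to your browser's language list. Normally a browser sends ‘zh-hk’ for Hong Kong or ‘zh-tw’ for Taiwan, whereas ‘tw’ in the official ISO 639 list stands for a language called Twi.
Even so, it is better to stick with English, because the Chinese “translation” made me furious. It was put in without anyone checking whether it works: not only are parts of it completely wrong in meaning, some sentences even show a literal ‘%s’. Even trying it out feels like a waste of time. I give up; it is WordPress for now, although its handling of database character encoding is just as bad. At the moment I have neither the energy nor the motivation to rescue something born with a congenital heart defect.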
2006-09-12
Most Chinese Windows users should have heard about CP950, which is the implementation of the Big5 character mapping inside traditional Chinese Windows. However, what the heck is CP951? Is it somehow related to CP950? Yes! This code page exists, but it is rarely mentioned on the internet, and so far I haven’t managed to find any page that clearly documents it, not even on M$’s web site.
Most of the content has now been moved to a separate static page, since it has some research value. Visit that page for more detail.
2006-08-24
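When Roger So said back then that Big5HKSCS ought to die as soon as possible, I did not understand why. Now I feel it fully.
I have written on Chinese Wikipedia that the Big5HKSCS version in glibc is far too old, to the point of being incompatible with the Big5 in everyday use. One can more or less guess that the Big5 part of glibc's Big5HKSCS was taken from a hopelessly outdated Big5 table from the 1980s; Big5 itself has been revised since, but glibc's Big5HKSCS has never been updated. For example, two symbols that both exist in Big5 convert to Unicode differently depending on whether the source is treated as Big5 or as Big5HKSCS: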
Results of conversion with iconv

| Byte Sequence | 0xA1FE | 0xA241 |
| --- | --- | --- |
| Big5 → Unicode | / (U+FF0F) | ∕ (U+2215) |
| Big5HKSCS → Unicode | Conversion error | / (U+FF0F) |
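Unicode defines U+2215 and U+FF0F as DIVISION SLASH and FULLWIDTH SOLIDUS respectively. The former is used in mathematical expressions; the latter is the fullwidth version of the ASCII slash. In Unicode the latter is fullwidth but the former is not.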
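A minimal sketch for repeating this kind of comparison, assuming Python's built-in `big5` and `big5hkscs` codecs rather than glibc's iconv. Python ships its own conversion tables, so its results may well differ from the iconv results reported here; the helper names are made up for illustration.

```python
import unicodedata


def decode_to_codepoints(raw: bytes, codec: str):
    """Decode raw bytes and list the code points, or None on a conversion error."""
    try:
        text = raw.decode(codec)
    except UnicodeDecodeError:
        return None
    return [f"U+{ord(ch):04X}" for ch in text]


def describe(codepoint: int) -> str:
    """Return the Unicode name and East Asian width class of a code point."""
    ch = chr(codepoint)
    name = unicodedata.name(ch)
    width = unicodedata.east_asian_width(ch)
    return f"U+{codepoint:04X} {name} (East Asian width: {width})"


# Compare how each codec treats the two byte sequences discussed above.
for raw in (b"\xa1\xfe", b"\xa2\x41"):
    for codec in ("big5", "big5hkscs"):
        print(raw.hex(), codec, decode_to_codepoints(raw, codec))

# Look up the two code points in the Unicode character database.
for codepoint in (0x2215, 0xFF0F):
    print(describe(codepoint))
```

Returning `None` for a failed conversion keeps a conversion error visible as its own outcome instead of hiding it behind a replacement character.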
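I am currently writing a small script that converts text files using the old HKSCS (in Big5HKSCS or UTF-8) to UTF-8 using HKSCS-2004. The key point is that older versions of ISO 10646 assigned many Hong Kong characters to the Private Use Area (PUA), whereas ISO 10646:2003 has moved the vast majority of them into regular areas outside the PUA, so the script updates the code points of these characters wherever possible. I did not expect so many obstacles: glibc's outdated HKSCS alone means I cannot trust iconv (or Perl's iconv module), and I have to find another way to check whether a file is validly encoded Big5HKSCS. Fortunately, Perl's Encode module has a more accurate Big5HKSCS table: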
Results of conversion with the Perl Encode module

| Byte Sequence | 0xA1FE | 0xA241 |
| --- | --- | --- |
| Big5 → Unicode | / (U+FF0F) | ∕ (U+2215) |
| Big5HKSCS → Unicode | | |
10 pm update: the script is now online.
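The script itself is not reproduced here. As a rough sketch of the remapping idea only, the Python below moves Private Use Area code points to new code points through a lookup table and reports any PUA characters left over. `remap_pua` and `DEMO_TABLE` are hypothetical names, and the single table entry is a placeholder, not real HKSCS data.

```python
# The three Private Use ranges defined by Unicode.
PUA_RANGES = (
    (0xE000, 0xF8FF),      # Private Use Area in the BMP
    (0xF0000, 0xFFFFD),    # Supplementary Private Use Area-A
    (0x100000, 0x10FFFD),  # Supplementary Private Use Area-B
)


def is_private_use(ch: str) -> bool:
    """True if the character lies in one of the Private Use ranges."""
    codepoint = ord(ch)
    return any(low <= codepoint <= high for low, high in PUA_RANGES)


def remap_pua(text: str, table: dict):
    """Replace mapped PUA characters; return the new text and the unmapped PUA code points."""
    converted = []
    unmapped = set()
    for ch in text:
        if is_private_use(ch):
            if ord(ch) in table:
                converted.append(chr(table[ord(ch)]))
                continue
            unmapped.add(ord(ch))  # keep the character, but report it
        converted.append(ch)
    return "".join(converted), sorted(unmapped)


# Placeholder mapping for demonstration only, NOT real HKSCS data.
DEMO_TABLE = {0xE000: 0x4E00}

new_text, leftovers = remap_pua("a\ue000b\ue001", DEMO_TABLE)
print(new_text, [f"U+{cp:04X}" for cp in leftovers])
```

Reporting the leftovers instead of dropping them matches the goal of updating code points wherever possible while leaving characters without a new home untouched.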
2006-08-07
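I have just finished two web pages for testing HKSCS-2004, and the result surprised me.
When Arne announced that uming/ukai had gained HKSCS-2004 support, I was quite happy. After using the fonts more, though, things felt off: glyphs are badly shaped (the exclamation mark sits too far to the left, and the components of some characters are too large or too small), and missing characters turn up often. Since I have been studying fonts for various languages over the past few days anyway, I decided on a whim to test whether the Hong Kong characters in uming/ukai are in order. The result is a little disappointing: the May 2006 version lacks 5 characters (Unicode code points):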
- U+0251
- U+0261
- U+4491
- U+FFED
- U+27F2E
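Whether these characters will ever be needed is questionable, but the fonts fall just short of covering HKSCS-2004 completely. Adding these characters should not be hard, though.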
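A coverage check of this kind can be sketched in a few lines of Python. The set of covered code points below is a made-up stand-in for a font's character map (in practice it would be read from the font file with a font library), and `missing_codepoints` is a hypothetical helper, not the test pages mentioned above.

```python
# The five code points listed above.
REQUIRED = [0x0251, 0x0261, 0x4491, 0xFFED, 0x27F2E]


def missing_codepoints(required, covered):
    """List, in U+XXXX form, the required code points absent from the covered set."""
    return [f"U+{cp:04X}" for cp in required if cp not in covered]


# Made-up coverage set for demonstration; not taken from any real font.
demo_covered = {0x0251, 0x4491}

print(missing_codepoints(REQUIRED, demo_covered))
```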
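16:28 update: I have just contacted Arne and also filed a bug report. He said the font needs some conversion (to be honest I do not quite understand), but he did respond, so I am more or less reassured.
2006-09-12 update: the latest version of uming/ukai (2006-09-03) has added these characters.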