Posts tagged ‘character set’

WordPress Charset SQL Injection Vulnerability

2007-12-10

As promised in the previous post (in Chinese, sorry), here is the full advisory for the WordPress SQL injection vulnerability I mentioned. An excerpt follows:

The search function provided by WordPress fails to sanitize input correctly for certain character sets. If WordPress queries a MySQL database using one of these character sets, the search function is exploitable through charset-based SQL injection.

Character sets currently known to be exploitable include Big5, GBK and GB18030. All of them may use a backslash ('\') as part of a multibyte character. A WordPress installation whose MySQL database was created with any other character set having this property may also be exploitable.

On its own, this attack exposes all database content through the web interface without any authentication. Combined with other exploits (such as the cookie authentication vulnerability disclosed earlier), it lets any remote user obtain WordPress admin privileges, resulting in server compromise.
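
The mechanism behind this class of bug can be shown in a few lines. The following is a minimal Python sketch, not the actual WordPress code and not the exploit from the advisory: `naive_escape` is a hypothetical stand-in for any escaping routine that works byte by byte without knowing the character set, and the payload is purely illustrative.

```python
def naive_escape(raw: bytes) -> bytes:
    """Hypothetical byte-level escaper that ignores the character set."""
    out = bytearray()
    for byte in raw:
        if byte in b"\\'\"":
            out.append(0x5C)  # put a backslash in front of the special byte
        out.append(byte)
    return bytes(out)


# Illustrative attacker input: byte 0xBF followed by a single quote.
payload = b"\xbf' OR 1=1 -- "
escaped = naive_escape(payload)

# The escaper produced 0xBF 0x5C 0x27. Read as GBK, 0xBF 0x5C forms one
# two-byte character, so the backslash no longer escapes the quote.
as_gbk = escaped.decode("gbk")
print(escaped)
print(as_gbk[1:])

# Big5 has the same property: some characters end in the byte 0x5C.
print("許".encode("big5"))
```

Because the backslash added by the escaper is absorbed into a multibyte character, the quote that follows it reaches the database unescaped.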

Actually, I have long suspected that this was exploitable, though I only made a real effort to verify it a few days ago. Given the security track record of WordPress, this is entirely to be expected.

Chinese sites stubborn enough to keep using Big5 or GBK encoding in the database are in jeopardy; most other sites should be fairly safe from this exploit (as most should be using UTF-8). The latin1 character set (used in most earlier default WordPress installations) is not vulnerable either. But contrary to common belief, it looks like mysql_real_escape_string() doesn't fix the problem at all. Can anybody confirm or deny this?
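
One possible explanation, offered as an assumption rather than a verified answer to the question above: an escaping routine can only help if it knows which character set the server will use to read the bytes. The PHP manual notes that mysql_real_escape_string() is only affected by a character set configured at the server level or through mysql_set_charset(). The Python sketch below models that idea; `charset_aware_escape` is a hypothetical function, not MySQL's implementation.

```python
def charset_aware_escape(raw: bytes, charset: str) -> bytes:
    """Hypothetical model: decode with the given charset, then escape characters."""
    text = raw.decode(charset)  # strict decoding: invalid sequences raise an error
    escaped = text.replace("\\", "\\\\").replace("'", "\\'")
    return escaped.encode(charset)


payload = b"\xbf' OR 1=1 -- "

# The escaper believes the connection is latin1, but the server reads GBK:
# the result is the same dangerous byte sequence a byte-level escaper produces.
wrong = charset_aware_escape(payload, "latin-1")
print(wrong.decode("gbk"))

# The escaper knows the real charset: the malformed input is rejected.
try:
    charset_aware_escape(payload, "gbk")
    rejected = False
except UnicodeDecodeError:
    rejected = True
print(rejected)
```

In this model the escaping is only as good as its knowledge of the connection character set; whether that is what happens inside a given PHP and MySQL setup would still need to be tested.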

2007-12-10 20:55 update: GB18030 is not vulnerable. MySQL 5.0.x doesn't support this character set at all; I don't know about the 5.1 series.

Go to hell, WordPress

2007-12-08

I will probably post this to full-disclosure/bugtraq shortly.

WordPress SQL injection screenshot

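Want to know what the e10adc3949ba59abbe56e057f20f883e in the screenshot stands for? Look that value up at www.xmd5.net and you will find out which password I used when setting up this test WordPress installation.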
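
Such lookup sites work because an unsalted hash of a common password is always the same value. Here is a minimal Python sketch of the idea, assuming the value in the screenshot is a plain MD5 digest; the three-entry wordlist is a made-up stand-in for the much larger tables such sites keep.

```python
import hashlib


def md5_hex(password: str) -> str:
    """Return the hex MD5 digest of a password, with no salt."""
    return hashlib.md5(password.encode("utf-8")).hexdigest()


# Hypothetical tiny wordlist; real lookup tables hold millions of entries.
wordlist = ["password", "123456", "qwerty"]
lookup = {md5_hex(word): word for word in wordlist}

target = "e10adc3949ba59abbe56e057f20f883e"
print(lookup.get(target))
```

With a salted hash, the same password would produce a different digest on every installation, and a precomputed table like this would be of no use.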

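Taken by itself, this vulnerability can at most display the contents of the whole database; combined with other vulnerabilities, however, it becomes unstoppable. For example, a recently published WordPress cookie vulnerability (affecting 1.5 - 2.3.1) lets anyone become the WordPress admin at will, but it requires being able to read the admin's username and password in order to forge the cookie needed for an admin login. The vulnerability I found happens to extract the admin's username and password without direct access to the database, which is exactly the necessary and sufficient condition for that cookie vulnerability.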

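Still, there is probably no need to worry too much: the preconditions for the vulnerability I found are quite strict, and most people should not be affected. But if anyone is using Big5, GBK, GB2312 or the like as the database charset, it is time to consider migrating to UTF-8.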

By the way, anyone planning to suggest that I notify Automattic first can save their breath. They ignore quite a few security advisories until a public exploit appears, and I have come to loathe that.

Serendipity's Chinese support is unusable

2007-05-05

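Because I am somewhat dissatisfied with WordPress (some people at Automattic treat translators as second-class citizens), I had the idea of checking whether another blog engine could replace it. I have just tried Serendipity. In short, its Chinese support is garbage and unusable, and its UTF-8 support looks as if it was never tested. To use Traditional Chinese, you had best use only Big5 in the database and add a made-up "tw" language code to your browser's language list. Normally a browser sends 'zh-hk' for Hong Kong or 'zh-tw' for Taiwan, whereas 'tw' in the official ISO 639 list denotes a language called Twi.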

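Even so, it is better to stick with English, because the Chinese "translation" made me furious. It was published without anyone checking whether it works: some of the meanings are completely wrong, and raw '%s' placeholders even show up in sentences. Even trying it felt like a waste of time. I give up; I will stay with WordPress for now, although its handling of database character encodings is just as bad. At the moment I have neither the energy nor the motivation to rescue something born with a congenital heart defect.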

Multibyte character? No, not considered.

2007-01-13

Lately I’ve been struggling with one of the Recent Comments WordPress plugins. As the title says, it belongs to the vast category of software that does not support multi-byte characters. It is not especially bad; it is just too visible on my blog. All software that fails to support multi-byte strings is equally evil or ignorant.

This plugin has a slight advantage over similar but simpler plugins: it breaks long ‘words’ (such as URLs), so an extremely long URL won’t damage the blog layout. However, it uses multibyte-unsafe PHP functions like substr(), strlen(), and especially wordwrap(), which has no multibyte-safe equivalent (unlike substr() and strlen(), which have mb_substr() and mb_strlen()). The net result is that some comments get a line break inserted in the middle of a multibyte character!
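
The failure is easy to reproduce outside PHP. The Python sketch below is only an analogy to substr() versus mb_substr(): `truncate_bytes` and `truncate_chars` are hypothetical helpers, and the sample comment is made up.

```python
def truncate_bytes(text: str, limit: int) -> bytes:
    """Cut after `limit` bytes, as a byte-oriented function would."""
    return text.encode("utf-8")[:limit]


def truncate_chars(text: str, limit: int) -> str:
    """Cut after `limit` characters, as a multibyte-aware function would."""
    return text[:limit]


comment = "中文留言測試"  # sample comment: 6 characters, 18 bytes in UTF-8

# Cutting at byte 4 lands inside the second character.
broken = truncate_bytes(comment, 4).decode("utf-8", errors="replace")
intact = truncate_chars(comment, 4)

print(len(comment), len(comment.encode("utf-8")))
print(broken)  # first character followed by a replacement character
print(intact)
```

Any function that counts or cuts bytes will eventually split a character this way once the text contains anything beyond ASCII.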

The most obvious fix is to replace wordwrap() with a saner function. After attempting something stupid (like trying to write my own function… the only possible outcome was to give up), it was time to turn to my savior (read: Google) for help. Eventually this htmlwrap() script written by Brian Huisman caught my attention. Quoting its author:

Built for use in the Orca Forum and Blog, the htmlwrap() function safely wraps HTML formatted text by breaking strings of characters over a certain length. It’s great for use anywhere where generated HTML output is built from user input.

A BIG plus: it is UTF-8 safe! So I simply replaced all instances of wordwrap() with htmlwrap(), and half of the problem was solved. The remaining half is actually two problems:

  1. While the plugin claims it can chop off a comment after a certain number of characters, it actually means that many bytes minus the length of the comment author's name. That is certainly not what a blog admin would expect, though I doubt many people really count the characters.
  2. A word is defined only as a run of bytes separated by spaces. However, the ‘word wrap’ rule is vastly different for Asian languages, especially CJK: no space is inserted between characters at all; a whole sentence, or a whole paragraph, can contain no whitespace. Line breaks can occur before any character except (most) punctuation marks.
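
The CJK rule in the second point can be sketched roughly as follows. This is a simplified Python illustration, not the plugin's code: the punctuation set is a small hand-picked sample, and width is counted in characters rather than display columns.

```python
# Small illustrative sample of punctuation that should not start a line.
NO_BREAK_BEFORE = set(",。、;:!?)」』】》〉")


def wrap_cjk(text: str, width: int) -> list:
    """Break after `width` characters, but never right before punctuation."""
    lines = []
    current = ""
    for ch in text:
        # Start a new line only when this one is full and the next
        # character is allowed to begin a line.
        if len(current) >= width and ch not in NO_BREAK_BEFORE:
            lines.append(current)
            current = ""
        current += ch
    if current:
        lines.append(current)
    return lines


sample = "今天天氣很好,我們去公園。"
for line in wrap_cjk(sample, 6):
    print(line)
```

A line may run one or more characters over the width when punctuation follows, which is the price of never starting a line with a comma or full stop.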

Checking for punctuation to avoid bad line breaks is more time-consuming, but the rest tends to be easy. The remaining work therefore consists of replacing string functions with their multibyte-safe equivalents, plus some preg_match() calls to make sure that only an incomplete English word at the end of the text is trimmed. Here is a comparison before and after the modification:
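
The trimming step might look something like the following Python sketch. It is an assumption about the general approach, not the modified plugin: `truncate_comment` is a hypothetical function, and "English word" is simplified to a run of ASCII letters and digits.

```python
import re


def truncate_comment(text: str, limit: int) -> str:
    """Cut to `limit` characters; drop a half-cut ASCII word at the end."""
    if len(text) <= limit:
        return text
    cut = text[:limit]
    # If the next character continues an ASCII word, the cut fell inside
    # that word, so remove the partial word from the end.
    if re.match(r"[A-Za-z0-9]", text[limit]):
        cut = re.sub(r"[A-Za-z0-9]+$", "", cut)
    return cut.rstrip()


print(truncate_comment("這是一個 wonderful plugin", 9))
print(truncate_comment("這是一個很長的中文留言", 5))
```

CJK text is cut exactly at the character limit, while a trailing English word is either kept whole or removed.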

Before: [image: WordPress comment with bad line breaking]
After: [image: WordPress comment with good line breaking]

However, I have assumed that people are using UTF-8 encoding. Most people should already be using it, but still, it is better to be forewarned than to regret it later. I am not sure that submitting this change upstream is a good idea, since it is not generic enough to cope with every multi-byte encoding or language; so far it only handles CJK comments in UTF-8.

Anyway, if you want to try it, save the content of this file and rename the file to get-recent-comments.php. Place the file in the WordPress plugin directory and pray, er, have fun!

What is Code Page 951 (CP951)?

2006-09-12

Most Chinese Windows users have probably heard of CP950, the implementation of the Big5 character mapping in Traditional Chinese Windows. But what the heck is CP951? Is it somehow related to CP950? Yes! This code page exists, but it is rarely mentioned on the internet, and so far I have not managed to find any page that clearly documents it, not even on M$’s web site.

Most of the content has now been moved to a separate static page, since it has some research value. Visit that page for more detail.

Big5HKSCS, which deserves to "die"

2006-08-24

When Roger So said that Big5HKSCS had better die soon, I did not understand why. Now I fully feel it.

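As I wrote on Chinese Wikipedia, the Big5HKSCS in glibc is far too old, to the point of being incompatible with the Big5 in everyday use. As one can roughly guess, the Big5 portion of glibc's Big5HKSCS was taken from a hopelessly outdated Big5 code table from the 1980s; Big5 itself has been revised since, but glibc's Big5HKSCS has never been updated. For example, for two symbols that exist in both, converting from Big5 to Unicode gives different results from converting from Big5HKSCS to Unicode: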

iconv conversion results

Byte sequence          0xA1FE              0xA241
Big5 → Unicode         / (U+FF0F)         ∕ (U+2215)
Big5HKSCS → Unicode    conversion error    / (U+FF0F)

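Unicode defines U+2215 and U+FF0F as DIVISION SLASH and FULLWIDTH SOLIDUS respectively. The former is used in mathematical expressions; the latter is the fullwidth version of the ASCII slash. In Unicode the latter is fullwidth but the former is not.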
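
The same comparison can be run against other converters. Below is a small Python sketch that lists the code points each codec produces for the two byte sequences. Python ships its own mapping tables, separate from glibc's iconv, so its results are not stated here and may differ from the table above; `code_points` is a hypothetical helper name.

```python
def code_points(raw: bytes, codec: str) -> list:
    """Decode `raw` with `codec` and list the resulting code points."""
    try:
        text = raw.decode(codec)
    except UnicodeDecodeError:
        return ["conversion error"]
    return [f"U+{ord(ch):04X}" for ch in text]


# The two byte sequences from the comparison, tried with both codecs.
for raw in (b"\xa1\xfe", b"\xa2\x41"):
    for codec in ("big5", "big5hkscs"):
        print(raw.hex(), codec, code_points(raw, codec))
```

Comparing this output with that of iconv on the same machine is a quick way to spot mapping tables that disagree.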

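I am currently writing a small script that converts text files using the old HKSCS (in Big5HKSCS or UTF-8) to UTF-8 using HKSCS-2004. The key point is that older versions of ISO 10646 assigned many Hong Kong characters to the Private Use Area (PUA), whereas ISO 10646:2003 has moved the vast majority of them to official code points outside the PUA, so the script updates the code points of these characters wherever it can. I did not expect it to be this difficult, though: at the very least, glibc's outdated HKSCS means I cannot trust iconv (or Perl's iconv module), and I have to find another way to check whether a file is validly encoded Big5HKSCS. Fortunately, Perl's Encode module has a more accurate Big5HKSCS code table: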

Perl Encode module conversion results

Byte sequence          0xA1FE         0xA241
Big5 → Unicode         / (U+FF0F)    ∕ (U+2215)
Big5HKSCS → Unicode

Update, 10 p.m.: the script is now online.
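
The two core steps described above, validating the encoding and moving PUA code points to their official locations, can be sketched in Python as follows. This is not the script itself: the function names are hypothetical, and the one-entry mapping is a placeholder, not real HKSCS data; a real table would come from the official HKSCS-2004 mapping files.

```python
def is_valid(raw: bytes, codec: str) -> bool:
    """Check whether `raw` decodes cleanly with the given codec."""
    try:
        raw.decode(codec)
        return True
    except UnicodeDecodeError:
        return False


def is_private_use(ch: str) -> bool:
    """True if the character lies in one of Unicode's Private Use Areas."""
    cp = ord(ch)
    return (0xE000 <= cp <= 0xF8FF
            or 0xF0000 <= cp <= 0xFFFFD
            or 0x100000 <= cp <= 0x10FFFD)


def remap_pua(text: str, table: dict) -> tuple:
    """Apply the mapping, then report PUA characters that remain."""
    converted = text.translate(table)
    leftovers = sorted({ch for ch in converted if is_private_use(ch)})
    return converted, leftovers


# PLACEHOLDER mapping for illustration only, not taken from HKSCS.
PLACEHOLDER_TABLE = {0xE000: 0x4E00}

converted, leftovers = remap_pua("A\ue000\ue001", PLACEHOLDER_TABLE)
print(converted, [f"U+{ord(ch):04X}" for ch in leftovers])
```

Reporting the leftover PUA characters matters because not every character was given an official code point, so a converted file may still need manual review.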

Testing HKSCS-2004

2006-08-07

I have just finished writing two web pages for testing HKSCS-2004, and the results surprised me.

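When Arne announced that uming/ukai had gained HKSCS-2004 support, I was indeed quite happy. After using the fonts more, though, I noticed problems that turn up often: misshapen glyphs (the exclamation mark sits too far to the left, and some characters have components that are too large or too small), missing characters, and so on. Besides, I have been looking into fonts for various languages over the past few days, so on a whim I tested whether the Hong Kong characters in uming/ukai are in order. The result is a little disappointing: the May 2006 version lacks 5 characters (Unicode code points):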

  • U+0251
  • U+0261
  • U+4491
  • U+FFED
  • U+27F2E

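Whether these characters will ever actually be used is questionable, but the fonts fall just short of full HKSCS-2004 coverage. Adding the missing characters should not be difficult, though.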
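
Rows for a test page of this kind can be generated from a list of code points. The Python sketch below is only an illustration, not the pages mentioned above: `make_row` is a hypothetical helper that emits a label plus an HTML numeric character reference, so the browser has to find a glyph for each character.

```python
# The five code points reported as missing above.
MISSING = [0x0251, 0x0261, 0x4491, 0xFFED, 0x27F2E]


def make_row(code_point: int) -> str:
    """One line of a test page: label plus HTML numeric character reference."""
    return f"U+{code_point:04X} &#x{code_point:X};"


for cp in MISSING:
    # Code points above U+FFFF lie outside the Basic Multilingual Plane.
    plane = "BMP" if cp <= 0xFFFF else "supplementary plane"
    print(make_row(cp), plane)
```

Viewing the generated rows with a single font forced makes missing glyphs show up as empty boxes or fallback glyphs.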


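16:28 update: I have just contacted Arne and also filed a bug report. He said the font needs some kind of conversion (honestly, I do not quite understand it), but he definitely responded, so I am mostly reassured.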

2006-09-12 update: the latest version of uming/ukai (2006-09-03) has added these characters.