Posts tagged: utf8

Ruby 1.8.6 and Unicode

Using Ruby 1.8.6 (p111)

In irb

$KCODE='u'
'aébvHögtåwHÅFuG'.scan(/./u)

nicely yields

 ["a", "é", "b", "v", "H", "ö", "g", "t", "å", "w", "H", "Å", "F", "u", "G"]

and

 'aébvHögtåwHÅFuG'.scan(/[\x00-\x7F]/u).join('')

neatly removes all non-ASCII characters.
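Because scan(/./u) returns one array element per character rather than per byte, other character-level operations can be built on top of it. The ASCII filter can also be written without a regex. Decode the string into code points with unpack('U*'), then re-encode the survivors with pack('U*'). Both directives are available in 1.8.6. This is a minimal sketch that assumes valid UTF-8 input. The helper names are hypothetical, and the m flag is added so that "." also matches newlines:

```ruby
# Hypothetical helpers for UTF-8 strings, written to behave the same on
# Ruby 1.8.6 and on later versions.

# One array element per UTF-8 character. /u treats the pattern as UTF-8;
# /m lets "." match "\n" as well.
def utf8_chars(str)
  str.scan(/./mu)
end

# Reverse by characters rather than by bytes.
def utf8_reverse(str)
  utf8_chars(str).reverse.join('')
end

# Keep at most n characters.
def utf8_truncate(str, n)
  utf8_chars(str)[0, n].join('')
end

# Drop everything outside 7-bit ASCII using code points instead of a regex.
# unpack('U*') decodes UTF-8 into integers (raising ArgumentError on
# malformed input); pack('U*') encodes the kept ones back into UTF-8.
def strip_non_ascii(str)
  str.unpack('U*').select { |cp| cp < 0x80 }.pack('U*')
end

s = 'aébvHögtåwHÅFuG'
puts utf8_reverse('aébvHö') # prints öHvbéa
puts utf8_truncate(s, 2)    # prints aé
puts strip_non_ascii(s)     # prints abvHgtwHFuG
```

On 1.8.6 the built-in String#reverse and s[0, 2] operate on bytes. They can therefore cut é or ö in half and leave invalid UTF-8. Going through the character array avoids that.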

Ruby and UTF8 Encoding

For plain Ruby 1.8.6, setting $KCODE='u' doesn't seem to do anything on its own.
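A likely reason it appears inert is that, in Ruby 1.8, $KCODE mainly changes how regular expressions treat multibyte characters. String#length, String#[] and String#reverse keep working on bytes. The sketch below uses hypothetical helper names byte_count and char_count. It counts both in a way that gives the same answers on 1.8.6 and on later versions, where String#length counts characters instead:

```ruby
# Hypothetical helpers: count bytes and characters so that the results
# match on Ruby 1.8.6 (String#length counts bytes) and on 1.9+
# (String#length counts characters).

# One entry per byte, regardless of $KCODE or string encoding.
def byte_count(str)
  str.unpack('C*').length
end

# One entry per UTF-8 character; /m so "\n" is counted too.
def char_count(str)
  str.scan(/./mu).length
end

s = 'aébvHögtåwHÅFuG'
puts byte_count(s) # prints 19 (é, ö, å and Å take two bytes each)
puts char_count(s) # prints 15
```

On 1.8.6 itself, require 'jcode' together with $KCODE='u' adds String#jlength, which returns the character count. The jcode library was dropped in Ruby 1.9, so the sketch avoids it.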

Other options (the first is built in, the rest are third-party):

Regex patterns take the /u flag to enable UTF-8 matching (see "Parsing UTF-8 encoded strings in Ruby").

Chilkat, though I can't see where to download or buy the Ruby libraries. It looks like it may be Windows-only.

Nikolai’s UTF8 Library

Oniguruma is a Unicode-aware regex engine available for Ruby 1.8.6. It's unclear how involved that is, or whether it's compatible with UTF-8/Unicode support in other packages, e.g. the character-encodings gem.

Rails 2.x

Rails 2.x handles UTF-8 encoding out of the box, but only if you use the '.chars' accessor on strings (see the RubyOnRails Wiki), which relies on ActiveSupport's Unicode and multibyte extensions.
