Using Ruby 1.8.6 (p111)
In irb
$KCODE='u'
'aébvHögtåwHÅFuG'.scan(/./u)
nicely yields
["a", "é", "b", "v", "H", "ö", "g", "t", "å", "w", "H", "Å", "F", "u", "G"]
and
'aébvHögtåwHÅFuG'.scan(/[\x00-\x7F]/u).join('')
neatly removes all non-ASCII characters.
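As a rough sketch, the two expressions above could be wrapped in small helpers. The method names utf8_chars and strip_non_ascii are made up for illustration, and the $KCODE assignment is guarded on the assumption that only Ruby 1.8 honours it.

```ruby
# $KCODE is a Ruby 1.8 setting; skip it on later versions.
$KCODE = 'u' if RUBY_VERSION < '1.9'

# Hypothetical helper: split a UTF-8 string into an array of characters.
# The /u flag makes the pattern match whole UTF-8 characters, not bytes.
def utf8_chars(str)
  str.scan(/./u)
end

# Hypothetical helper: keep only 7-bit ASCII characters (0x00 to 0x7F).
def strip_non_ascii(str)
  str.scan(/[\x00-\x7F]/u).join('')
end

sample = 'aébvHögtåwHÅFuG'
puts utf8_chars(sample).length
puts strip_non_ascii(sample)
```

Note that /./ without the m flag does not match newlines, so utf8_chars as written drops line breaks from multi-line input.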
For plain Ruby 1.8.6
$KCODE='u' doesn’t seem to do anything.
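One possible workaround in a plain script, which does not rely on $KCODE at all, is to go through String#unpack and Array#pack with the 'U*' directive, which converts between UTF-8 text and integer codepoints. This is only a sketch: the helper names utf8_length and utf8_reverse are hypothetical, and I am assuming the input is valid UTF-8.

```ruby
# Hypothetical helper: count characters (codepoints) rather than bytes.
def utf8_length(str)
  str.unpack('U*').length
end

# Hypothetical helper: reverse a string character by character,
# so multi-byte sequences are not torn apart.
def utf8_reverse(str)
  str.unpack('U*').reverse.pack('U*')
end

puts utf8_length('aébvHögtåwHÅFuG')
puts utf8_reverse('aé')
```

Working on codepoints avoids the byte-oriented String methods, at the cost of building an intermediate array of integers.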
Third-party solutions:
Regex patterns take the /u flag to enable Unicode (see “Parsing UTF-8 encoded strings in Ruby”).
Chilkat, though I can’t see where to download or buy the Ruby libraries. It looks like it may be Windows-only.
Nikolai’s UTF8 Library
Oniguruma is a Unicode-aware regex engine available for Ruby 1.8.6. It’s unclear how involved installing it is, or whether it’s compatible with the UTF-8/Unicode support in other packages, e.g. the character-encodings gem.
Rails 2.x
Rails 2.x handles UTF-8 encoding out of the box, IFF you use the ‘.chars’ accessor on strings (see the RubyOnRails Wiki). It uses the ActiveSupport Unicode and multibyte extensions.