Ruby – how to detect the encoding of a string
Posted by Kasper Tidemann on Monday 22nd of March 2010 10:51:59 PM
With file uploads in Ruby on Rails, e.g. an upload of a 2 KB CSV file, you'll often run into trouble trying to work out the encoding of the uploaded file data (a Tempfile or similar IO object) stored in params[:my_upload_form][:uploaded_file], or whatever you've named your input field.
If you want to keep everything in one encoding, you can use Iconv.conv('UTF-8', <source encoding>, string) to convert the data from the input field to UTF-8. But for the iconv() wrapper to work properly, it needs to know which encoding to convert from. So how do you find that out?
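To see why the source encoding matters, here is a small sketch. It uses String#encode, which is built into Ruby 1.9 and later and plays the same role as Iconv.conv here. The sample bytes and variable names are made up for illustration and are not part of the upload example.

```ruby
# The same raw bytes read differently depending on the assumed source encoding.
raw = "caf\xE9".b  # four raw bytes; 0xE9 is "é" in ISO-8859-1

# Right guess: ISO-8859-1 -> UTF-8 gives "café".
from_latin1 = raw.dup.force_encoding('ISO-8859-1').encode('UTF-8')

# Wrong guess, silently accepted: in Windows-1251 the byte 0xE9 is "й".
from_cp1251 = raw.dup.force_encoding('Windows-1251').encode('UTF-8')

# Wrong guess, detectable: the bytes are not valid UTF-8 to begin with.
valid_as_utf8 = raw.dup.force_encoding('UTF-8').valid_encoding?

puts from_latin1
puts from_cp1251
puts valid_as_utf8
```

A wrong source encoding does not always raise an error. It can just as easily produce readable but incorrect text, which is why guessing well matters.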
Try the rchardet gem by Jeff Hodges. Here is an example of how to use it:
require 'rchardet'
require 'iconv'
[...]
# Read the upload's raw bytes, then let rchardet guess their encoding
data = params[:my_upload_form][:uploaded_file].read
cd = CharDet.detect(data)
encoding = cd['encoding']
converted_string = Iconv.conv('UTF-8', encoding, data)
The above is not bulletproof: detection is only a statistical guess, and cd['encoding'] can be nil when rchardet cannot decide. But it'll get you going. If you have alternative ideas, please comment to let us all know.





