AppleScript and Perl and Unicode ... Oh my!
Once upon a time, I wanted to announce on my web site what I was currently
listening to. This was back when I was working from home on my Mac instead
of trudging out to the office to sit at a corporate-issued Windows box.
My setup was simple and pretty typical. I had an AppleScript that queried
iTunes for information about the currently playing track and handed it over
to a Perl script, which formatted it and uploaded it to my web site. Simple,
eh? It actually is.
But I have a lot of world music, and anything I played from another country
showed up as garbage on the web page. So I thought I would describe the hell
I went through to get Perl to deal with Unicode.
Sure, Perl 5.8 does support Unicode … but not really. You need a module to
help, and the problem is picking one from the plethora of Unicode-related
modules on CPAN. If you don't have things set up correctly, any multibyte
characters look like garbage. (BTW: I highly recommend grabbing Unicode
Checker, a program that displays lots of great information about each
Unicode character.)
iTunes returns text to AppleScript as UTF-8 (one of several encodings of
Unicode … yes, the lovely thing about standards is that there are so many to
choose from, even within Unicode).
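But Perl doesn't automatically know that those bytes are UTF-8 … unless told
otherwise, it treats each byte as a separate character, so a two-byte
character such as "ó" turns into two characters of junk. To convert the text
into a string Perl can deal with, we need the following module: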
use Unicode::Transform;
my $uni = utf8_to_unicode($original_text);
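For comparison, Perl 5.8 and later also ship the core Encode module, which can do the same UTF-8 decoding without anything extra from CPAN. The sketch below uses Encode in place of Unicode::Transform and assumes the track title arrives as raw UTF-8 bytes; the sample title is just an illustration, not what my script actually receives.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical input: the raw UTF-8 bytes of "Sigur Rós" as the AppleScript
# might pass them along. The two bytes \xC3\xB3 together encode a single
# character, U+00F3 (ó).
my $bytes = "Sigur R\xC3\xB3s";

# Decode the bytes into a Perl character string (core Encode module,
# used here instead of Unicode::Transform).
my $text = decode('UTF-8', $bytes);

# Before decoding, Perl counts one character per byte; afterwards it
# counts the ó as one character.
printf "bytes: %d, characters: %d\n", length($bytes), length($text);
```

This prints `bytes: 10, characters: 9`, which is exactly the difference between "a pile of bytes" and "text Perl understands."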
But what I really wanted was to convert the text into HTML character
entities. Let's grab yet another Perl module:
use HTML::Entities;
my $esc = encode_entities($uni);
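To see why the decode step has to come before the entity step, here is a dependency-free sketch. The `to_html_refs` helper is hypothetical and is only a hand-rolled stand-in for `encode_entities`, not HTML::Entities itself: it turns `&`, `<`, `>`, `"`, `'` and everything outside printable ASCII into hexadecimal numeric character references, whereas `encode_entities` prefers named entities such as `&oacute;` where it has one.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: escape &, <, >, " and ' plus any character outside
# printable ASCII (0x20-0x7E) as a hexadecimal numeric character reference.
sub to_html_refs {
    my ($text) = @_;
    $text =~ s/([&<>"']|[^\x20-\x7E])/sprintf('&#x%X;', ord $1)/ge;
    return $text;
}

# Decode first, then escape: the ó becomes a single &#xF3; reference.
my $html = to_html_refs(decode('UTF-8', "Sigur R\xC3\xB3s & friends"));
print "$html\n";    # Sigur R&#xF3;s &#x26; friends

# Escaping the undecoded bytes instead yields two references, &#xC3;&#xB3;,
# which a browser renders as two wrong characters instead of the ó.
my $garbled = to_html_refs("Sigur R\xC3\xB3s");
print "$garbled\n"; # Sigur R&#xC3;&#xB3;s
```

The second call reproduces the kind of garbage that showed up on my page before I added the decoding step.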
Now things look and work fabulously when I listen to some of my Nordic Punk.