Howardism Musings from my Awakening Dementia
My collected thoughts flamed by hubris

AppleScript and Perl and Unicode ... Oh my!

Once upon a time, I wanted to announce on my web site what I was currently listening to. This was back when I was working from home on my Mac and not trudging out to the office to sit at a corporate-issued Windows box.

My setup is simple and pretty typical. An AppleScript queries iTunes for information about the currently playing track and hands it over to a Perl script, which deals with formatting it and transferring it up to my web site. Simple, eh? It actually is.

But I have a lot of world music, and playing anything from another country would result in garbage on the web page. So I thought I would mention the hell I had to go through to get Perl to deal with Unicode.

Sure, Perl 5.8 does support Unicode … but not really. You need a module to help, and the problem is the plethora of Unicode-related modules on CPAN. If you don't have things set up correctly, any multi-byte characters look like garbage. (BTW: I highly recommend grabbing Unicode Checker, a program that displays lots of great information about each Unicode character.)

iTunes returns text to AppleScripts as UTF-8 (that's one of the encodings of Unicode … yes, the lovely thing about standards is there are so many to choose from--including within Unicode).

But Perl does not assume that the bytes it is handed are UTF-8 … until they are decoded into its internal character strings, it treats every byte as a separate character. In order to convert things so that Perl can deal with them, we need to use the following module:

use Unicode::Transform;
my $uni = utf8_to_unicode($original_text);
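
For comparison, here is a minimal sketch of the same decoding step using Encode, which has shipped with Perl since 5.8. This is an alternative I'm assuming does the same job here, not what the script above uses, and the sample string is made up for illustration:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Made-up sample: the raw UTF-8 bytes for "Björk", as they might
# arrive from an external program.
my $original_text = "Bj\xC3\xB6rk";

# Decode the bytes into a Perl character string.
my $uni = decode('UTF-8', $original_text);

# Before decoding Perl counts bytes; afterwards it counts characters.
printf "%d bytes in, %d characters out\n",
    length($original_text), length($uni);
# prints "6 bytes in, 5 characters out"
```

The two-byte sequence for "ö" collapses into a single character, which is exactly what stops the garbage from showing up later.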

But what I wanted was to convert the text over to HTML character entities. Let's grab yet-another-Perl-module:

use HTML::Entities;
my $esc = encode_entities($uni);
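
To make it clearer what that escaping amounts to, here is a rough hand-rolled sketch of the whole decode-then-escape pipeline using only core modules. The helper name `html_escape_utf8` and the sample track name are hypothetical, and it is deliberately cruder than HTML::Entities, which prefers named entities such as `&oacute;` where they exist:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: take raw UTF-8 bytes, return ASCII-only HTML.
sub html_escape_utf8 {
    my ($bytes) = @_;

    # Step 1: turn the bytes into a Perl character string.
    my $text = decode('UTF-8', $bytes);

    # Step 2: escape the characters that are significant in markup.
    # The ampersand goes first so later entities are not re-escaped.
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;

    # Step 3: replace every non-ASCII character with a numeric
    # character reference built from its code point.
    $text =~ s/([^\x00-\x7F])/sprintf('&#%d;', ord($1))/ge;

    return $text;
}

# Made-up sample: "Sigur Rós & friends" as raw UTF-8 bytes.
print html_escape_utf8("Sigur R\xC3\xB3s & friends"), "\n";
# prints "Sigur R&#243;s &amp; friends"
```

Since the result is plain ASCII, it survives whatever encoding the web page happens to declare.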

Now things look and work fabulously when I listen to some of my Nordic Punk.
