Tag Archives: Unicode

irssi encoding problem.

If you have a problem with character encoding in irssi, there is an easy way to fix it.
In my setup I run irssi inside a screen session and connect from a Windows 7 computer via PuTTY.

First of all, make sure PuTTY itself is set to the correct encoding (UTF-8). You can check this in the configuration dialog, on the Window -> Translation page.

The second thing you need to do is start screen with Unicode support. You can do this with the -U switch, i.e. screen -U.

The third and last thing you need to do is set the correct encoding in irssi itself: just type /set term_charset utf-8 in the chat window. Running /save afterwards keeps the setting for future sessions.

From now on your irssi should have full Unicode support.

UTF-8 filenames under Windows / PHP

If you are reading this, you probably have problems with files whose names contain Unicode characters.

My scenario was simple:

  1. A user uploads a file whose name contains Unicode characters (e.g. my_unicode_ąść.jpg).
  2. PHP handles the upload, moves the uploaded file to a directory on the server, and creates a database record with the original filename (e.g. my_unicode_ąść.jpg).
  3. Because PHP lacks Unicode support, the filename is saved with the wrong encoding: the multibyte string is treated as a sequence of one-byte characters, so a name in a multibyte Unicode encoding renders incorrectly (e.g. my_unicode_ść.jpg).
  4. Any later call that uses the name stored in the database fails, because the two names differ.
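Step 3 is easier to see by looking at the bytes. The snippet below is only an illustrative sketch: it writes "ąść" as explicit UTF-8 byte escapes (so it does not depend on how the script file is saved) and shows that PHP's plain string functions count and split bytes, not characters. The variable names are made up for the example.

```php
<?php
// "ąść" written as explicit UTF-8 bytes: ą = C4 85, ś = C5 9B, ć = C4 87.
$name = "my_unicode_\xC4\x85\xC5\x9B\xC4\x87.jpg";

// strlen() counts bytes, not characters: 18 characters occupy 21 bytes.
$byteCount = strlen($name);

// Splitting "ąść" byte by byte gives six one-byte pieces, none of which
// is a complete character on its own.
$pieces = str_split("\xC4\x85\xC5\x9B\xC4\x87");

echo $byteCount, "\n";                    // 21
echo count($pieces), "\n";                // 6
echo bin2hex(implode('', $pieces)), "\n"; // c485c59bc487
```

When those six bytes are interpreted in a one-byte code page, each becomes a separate character, which is why the name on disk no longer matches the name in the database.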

To get rid of this problem I used the PHP UTF8 library.

It is as simple as that:

require_once "../utf8toascii/utf8_to_ascii.php";
// Convert the uploaded file's original name to its closest ASCII form
$filename = utf8_to_ascii($_FILES['Filedata']['name']);

From now on, before working with a filename, we first convert every Unicode character to its closest ASCII equivalent.
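To show what "closest ASCII equivalent" means in practice, here is a minimal sketch of the idea. It is not the PHP UTF8 library's implementation: the helper name ascii_filename and its tiny lookup table are hypothetical, cover only the three Polish letters from the example above, and replace every other non-ASCII byte with an underscore.

```php
<?php
// Hypothetical helper, not part of the PHP UTF8 library: a tiny
// transliteration table for illustration only.
function ascii_filename($name)
{
    // UTF-8 byte sequences mapped to their closest ASCII letters.
    $map = array(
        "\xC4\x85" => 'a', // ą
        "\xC5\x9B" => 's', // ś
        "\xC4\x87" => 'c', // ć
    );
    $name = strtr($name, $map);

    // Replace every remaining non-ASCII byte with an underscore.
    return preg_replace('/[^\x00-\x7F]/', '_', $name);
}

echo ascii_filename("my_unicode_\xC4\x85\xC5\x9B\xC4\x87.jpg"), "\n"; // my_unicode_asc.jpg
```

The real library ships far larger tables, so it should be preferred for anything beyond a quick experiment. The important part is that the same converted name is used both on disk and in the database.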

We have to do this and wait patiently for the PHP 6 release, which promises native UTF-8 support.