UTF-8 filenames under Windows / PHP

If you are reading this, you probably have problems with files whose names contain Unicode characters.

My scenario was simple:

  1. A user uploads a file whose name contains Unicode characters (e.g. my_unicode_ąść.jpg)
  2. PHP handles the upload, moves the uploaded file to a directory on the server, and creates a database record with the original filename (e.g. my_unicode_ąść.jpg)
  3. Because PHP lacks Unicode support, the filename is saved with the wrong encoding: the multibyte string is split into one-byte characters, so every multibyte UTF-8 character renders wrongly (e.g. my_unicode_Ä…Å›Ä‡.jpg when the bytes are read as Windows-1252)
  4. Any later call that uses the name stored in the database fails, because the two names differ.
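
The mismatch in steps 3 and 4 can be reproduced in a few lines. This is only a sketch: it assumes the mbstring extension is loaded, and it assumes the bytes end up being read as Windows-1252, which depends on the system code page and may differ on your machine.

```php
<?php
// "ąść" written as explicit UTF-8 bytes, so the example does not depend
// on the encoding this script file is saved in.
$utf8 = "\xC4\x85\xC5\x9B\xC4\x87";

// strlen() counts bytes; mb_strlen() counts characters in the given encoding.
$byteCount = strlen($utf8);
$charCount = mb_strlen($utf8, 'UTF-8');

// Assumption: every byte is read as one Windows-1252 character.
// Converting "from Windows-1252 to UTF-8" reproduces the name that would be shown.
$misread = mb_convert_encoding($utf8, 'UTF-8', 'Windows-1252');

printf("%d bytes, %d characters\n", $byteCount, $charCount);
printf("stored name equals database name: %s\n", $misread === $utf8 ? 'yes' : 'no');
```

Three characters become six bytes, and once each byte is treated as a character of its own, the name on disk no longer equals the name in the database.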

To get rid of this problem I used the PHP UTF8 library.

It is as simple as this:

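// Load the transliteration function from the PHP UTF8 library
include("../utf8toascii/utf8_to_ascii.php");
// Keep the ASCII-only name and use it for both the saved file and the database record
$filename = utf8_to_ascii($_FILES['Filedata']['name']);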

From now on, before working with a filename, we first convert every Unicode character to its closest ASCII equivalent.
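
Here is one way the conversion could be wired into the whole upload step, so that the file on disk and the database record always get the same name. It is a sketch, not a finished handler: `ascii_filename()` and `$uploadDir` are hypothetical names of mine, not part of the PHP UTF8 library, and the extra character filtering is my own precaution rather than something the library requires.

```php
<?php
// Hypothetical helper (not part of the PHP UTF8 library).
function ascii_filename($utf8Name)
{
    // Use the library's transliteration when it has been included;
    // otherwise fall through to the byte-wise replacement below.
    $name = function_exists('utf8_to_ascii') ? utf8_to_ascii($utf8Name) : $utf8Name;

    // Replace everything outside a conservative ASCII set with "_".
    // This also turns "/" and "\" into "_", so no directory part survives.
    return preg_replace('/[^A-Za-z0-9._-]/', '_', $name);
}

// Assumption: uploads go to an "uploads" directory next to this script.
$uploadDir = __DIR__ . '/uploads';

if (isset($_FILES['Filedata'])) {
    $safeName = ascii_filename($_FILES['Filedata']['name']);

    // Save the file under the ASCII name...
    if (move_uploaded_file($_FILES['Filedata']['tmp_name'], $uploadDir . '/' . $safeName)) {
        // ...and store exactly the same $safeName in the database record,
        // so later lookups by the stored name find the file.
    }
}
```

The point of the design is that the name is converted once and the result is reused everywhere. Two different original names can map to the same ASCII name, so a real handler would probably also add a unique prefix or check for an existing file.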

We have to do this and wait patiently for the PHP6 release, which promises native UTF-8 support.
