I have been tracking the progress of Unicode support in PHP, and it seems the devs have finally decided to implement it. Andrei Zmievski described the implementation at the PHP Conference 2005; you can find the slides here (PDF, 394 kB). Andi Gutmans recently outlined the actual roadmap on the mailing lists, here and there.

Well, that's really great news.
But from our point of view, the real problem in moving to Unicode is how to convert the data in the database.
And I think this should be done in such a manner that it doesn't require site administrators to do any extra work, meaning there should be some way to do it automatically.
Interesting -- thanks for the links!
The description of all the aspects of Unicode is good. I hope they refine ("design") what they actually expose to programmers -- and that the slides are just 'early thinking about alternatives'. I don't know anything about the libraries they are proposing to use.
We can do a lot with Unicode before this stuff makes it in -- it's just that regexes and strlen() don't work very well outside the ASCII range, but we can probably cope with that. I've done plenty of Unicode work with PHP, even on early PHP 4.x -- it certainly works for the mostly simple string manipulation we do.
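To make the strlen() and regex point concrete, here is a rough sketch of the byte-versus-character mismatch and one way to cope with it. It assumes the data is UTF-8 and that the optional mbstring extension is loaded; the variable names are just for illustration.

```php
<?php
// "café", with the "é" written as explicit UTF-8 bytes so the
// encoding of this source file does not matter.
$s = "caf\xC3\xA9";

// strlen() counts bytes: the two-byte "é" makes this 5, not 4.
$bytes = strlen($s);

// mb_strlen() counts characters once it is told the encoding
// (assumption: the mbstring extension is available).
$chars = mb_strlen($s, 'UTF-8');

// Without the /u modifier, "." matches one byte, so four dots
// cannot cover the five bytes of the string.
$byteMatch = preg_match('/^....$/', $s);

// With /u, PCRE treats pattern and subject as UTF-8 and "."
// matches a whole character.
$utf8Match = preg_match('/^....$/u', $s);

printf("bytes=%d chars=%d byteMatch=%d utf8Match=%d\n",
    $bytes, $chars, $byteMatch, $utf8Match);
// prints: bytes=5 chars=4 byteMatch=0 utf8Match=1
```

So for simple work it is mostly a matter of remembering which functions count bytes and adding /u to patterns that have to see characters.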
Advanced string manipulation across the whole Unicode range is really hard (uppercasing combined code points is tricky, as is defining what \w means in a regexp), and that's what they are aiming for. It's great that it's happening at last... but it shouldn't hold us back.
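As a small illustration of why combined code points are awkward, the sketch below writes the same visible "é" two ways: as one precomposed code point and as "e" followed by a combining accent. It assumes UTF-8 data and the mbstring extension, and it only shows the mismatch; it does not attempt normalization.

```php
<?php
// Precomposed: U+00E9 (LATIN SMALL LETTER E WITH ACUTE).
$precomposed = "\xC3\xA9";

// Decomposed: "e" followed by U+0301 (COMBINING ACUTE ACCENT).
$decomposed = "e\xCC\x81";

// Both render as the same glyph, but the strings are not equal...
$sameBytes = ($precomposed === $decomposed);

// ...and they differ in length even when counting code points.
$lenPre = mb_strlen($precomposed, 'UTF-8');
$lenDec = mb_strlen($decomposed, 'UTF-8');

// Uppercasing maps each code point on its own, so the two results
// still look alike on screen but remain different byte sequences.
$upperPre = mb_strtoupper($precomposed, 'UTF-8');
$upperDec = mb_strtoupper($decomposed, 'UTF-8');

var_dump($sameBytes, $lenPre, $lenDec, $upperPre === $upperDec);
```

Comparing, searching, or measuring such strings reliably means normalizing them to one form first, which is presumably part of what the library work is meant to take care of.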