
Encoding

This blog has moved to a new server, a move not without problems. The biggest problem turned out to be the bizarre way in which WordPress has stored characters in the database. Perhaps this has changed in newer versions of WP, but I am stuck with an old database full of posts. It took ages to get the data over to the new server in a way that would display special characters correctly. And by “special” I don’t necessarily mean accented letters or non-Latin characters – even typographically correct quotation marks caused problems.

You can find a vivid description of the pain involved in Derek Sivers’ 2006 post “Turning MySQL data in latin1 to utf8”.

It seems to be working now: the Latin-1 content has been pushed to the UTF-8 database, with everything above 0128 encoded either in UTF-8 or as a numeric entity. The whole exercise pretty much took away all enthusiasm for upgrading WordPress.
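
For anyone facing the same cleanup, here is a rough Python sketch of the two ideas mentioned above. It is not the procedure I actually used on the database. The function names are made up for illustration, and the repair step assumes the typical kind of damage, where UTF-8 bytes were read back as Windows-1252 (which is what MySQL calls latin1). Other kinds of damage would need a different approach.

```python
def to_numeric_entities(text: str) -> str:
    """Replace every non-ASCII character with a decimal numeric entity.

    Uses the standard "xmlcharrefreplace" error handler, so a curly
    quote such as U+201C becomes "&#8220;". ASCII text passes through.
    """
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


def repair_mojibake(text: str) -> str:
    """Try to undo UTF-8 bytes that were wrongly decoded as Windows-1252.

    Assumption: the text was damaged in exactly this way. If it does not
    fit that pattern, the input is returned unchanged.
    """
    try:
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


if __name__ == "__main__":
    # A left curly quote that went through the wrong decoding step.
    damaged = "\u00e2\u20ac\u0153quoted"
    fixed = repair_mojibake(damaged)
    print(fixed)
    print(to_numeric_entities(fixed))
```

Storing numeric entities keeps the data pure ASCII, so it survives any later charset confusion, at the price of posts that are harder to search and edit in the database.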
