Pat Gunn (dachte) wrote,

PostgreSQL migration gotcha

If you're migrating from PostgreSQL 7 to 8 and your data uses Latin-1 characters (e.g. umlauts, accents, and similar), importing your old databases into Postgres 8 will fail with errors like the following:

pg_restore: ERROR: Unicode characters greater than or equal to 0x10000 are not supported

unless you create the Postgres8 database as follows:


createdb mydatabasename --encoding=LATIN1

I don't know if there's a way to make pg_dump/createdb translate automagically from Latin-1 to Unicode, but thinking about it, that would probably be undesirable anyway, because it could impose charset changes on whatever clients were used to access the database.

While Unicode is generally a good idea, the migration is a huge pain in the butt, and UTF-8 is, in my opinion, a stupid misdesign. It's not cool that a string's length in bytes doesn't consistently track its length in characters; the long-term correct decision would have been to use UTF-32. It's irritating how often short-sighted decisions (like the lib64/lib snafu on amd64) end up dirtying a clean, efficient design for what generally proves to be only a limited gain.
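To make the byte-length vs. character-length complaint concrete, here is a small sketch that counts how many bytes a few strings occupy in UTF-8 and in UTF-32. It assumes a UTF-8 terminal and script file and the standard `iconv` utility (glibc or GNU libiconv); the helper names `utf8_bytes` and `utf32_bytes` are just illustrative.

```shell
#!/bin/sh
# Sketch: compare the encoded size of a string in UTF-8 vs. UTF-32.
# Assumes this script is saved as UTF-8 and that iconv is installed.

utf8_bytes() {
    # printf '%s' emits the string's bytes unchanged (UTF-8 here);
    # wc -c counts bytes; tr strips the padding some wc versions add.
    printf '%s' "$1" | wc -c | tr -d ' '
}

utf32_bytes() {
    # UTF-32LE uses exactly 4 bytes per code point and writes no byte-order mark.
    printf '%s' "$1" | iconv -f UTF-8 -t UTF-32LE | wc -c | tr -d ' '
}

for s in 'abc' 'äöü' 'naïve'; do
    echo "$s: UTF-8 $(utf8_bytes "$s") bytes, UTF-32 $(utf32_bytes "$s") bytes"
done
```

Both `abc` and `äöü` are three characters, but in UTF-8 they take 3 and 6 bytes respectively, while in UTF-32 each takes 12 bytes (4 per code point). Note that UTF-32 fixes the size per code point, not per user-visible character, so combining accents still count separately.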

Tags: tech, warning