As you can see, times are very similar – there are differences between data types, but there is no clear “winner”, and I'm much more inclined to assume that these differences come from random fluctuations of load on my machine than from anything else.

Why wasn't the record selected? The reason is simple: char(n) values are right-padded with spaces.
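
For context, a minimal sketch of the kind of query that trips over the padding – the table x with column y defined as char(15) and holding 'depesz' is my assumption, not necessarily the exact setup used earlier:

# SELECT * FROM x WHERE y LIKE '%sz';
 y
---
(0 rows)

The stored value is 'depesz' plus padding spaces up to the declared length, so the pattern '%sz' no longer matches the end of the string.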

This also means that this will work:

# SELECT * FROM x WHERE y = 'depesz ';
        y
-----------------
 depesz
(1 row)

Given this, remember that char(n) will actually use more disk space for strings shorter than “n”, because it right-pads them to the required length.
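
A quick way to see the overhead (pg_column_size reports the size of a value in bytes; the exact numbers may differ by a few bytes of header, so treat them as approximate):

# SELECT pg_column_size('depesz'::char(100)) AS char_size,
         pg_column_size('depesz'::text)      AS text_size;
-- char_size: roughly 100 bytes plus header, since the padding is stored
-- text_size: just the 6 characters plus header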

So, what about varchar, varchar(n), and text?

Varchar and text are the same, so we can treat them as one. But to avoid confusion with varchar(n), and because text is simply shorter (in terms of characters in the name), I prefer text.

So, we're left with 2 data types: varchar(n) and text.

The obvious benefit of varchar(n) is that it has a built-in size limit.

Which is all cool, until you have to change this limit. Which happens a lot.

Of course, one can say that you can just:

ALTER TABLE xxx ALTER COLUMN yyy TYPE VARCHAR(NEW_LIMIT);

Well, first, let me say that I am discussing only making the limit larger. If the new limit has to be smaller than before (I've never seen such a case, but it's technically possible), the table has to be scanned to make sure that all values fit within the new limit.
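
To illustrate what that scan means in practice – if any existing value exceeds the new, smaller limit, the whole ALTER simply fails (table and column names here are placeholders):

# ALTER TABLE x ALTER COLUMN y TYPE varchar(3);
ERROR:  value too long for type character varying(3)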

So, what happens when you make the limit larger?

PostgreSQL has to rewrite the table. This has 2 very important drawbacks:

Requires an exclusive lock on the table for the duration of the operation

In case of non-trivial tables, it will take a considerable amount of time

These 2 points together make it (in my opinion) a no-go.

I'd rather use text without any limitation than consider making a database that will require (for example) a shutdown of the site to change the limit of an email field from 75 to 100 characters (been there, done that, cursed a lot).

So, what is the best way to be able to limit field size – in a way that will not lock the table when increasing the limit? Or at least one that will do its job without a table rewrite, as that takes too long.
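
Before looking at the results, here is a minimal sketch of how such locking can be observed – run the ALTER in one session and inspect pg_locks from another (table x and the varchar(100) target are placeholders, not the exact test case):

-- session 1:
BEGIN;
ALTER TABLE x ALTER COLUMN y TYPE varchar(100);
-- leave the transaction open for a moment

-- session 2:
SELECT locktype, mode, granted
FROM pg_locks
WHERE relation = 'x'::regclass;

-- session 1:
COMMIT;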

Unfortunately, as you can see, this transaction obtained AccessExclusiveLock on table x. This means that for 2.5 seconds nobody can use it. That doesn't sound bad, does it? Unfortunately, it does. My table is relatively small. What's more, merely waiting for the lock automatically blocks all subsequent transactions trying to reach the table.

As you can see, there is only a ShareLock on the table while changing the domain – this means that writes will not work, but selects will. That is a huge gain in comparison with “ALTER TABLE” and its AccessExclusiveLock on the table, which blocked everything.
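
For completeness, a sketch of the text-plus-domain approach referred to here (the domain and constraint names are mine; PostgreSQL generates a name like depesz_text_check automatically if you don't give one):

CREATE DOMAIN depesz_text AS text CHECK ( length(VALUE) <= 75 );
CREATE TABLE x ( y depesz_text );

-- raising the limit later – no table rewrite:
ALTER DOMAIN depesz_text DROP CONSTRAINT depesz_text_check;
ALTER DOMAIN depesz_text ADD CONSTRAINT depesz_text_check
    CHECK ( length(VALUE) <= 100 );

Note that ADD CONSTRAINT still scans all existing rows that use the domain to validate them, which is why the lock is held for the duration of the scan.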

Is that all we can do? Basically, yes. We could theoretically make a check that gets the limit from a custom GUC, but it looks slow, and it is definitely error-prone, as the values can be modified per-session.
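
A sketch of what such a GUC-based check might look like (the setting name myapp.y_limit is hypothetical, and older releases needed custom_variable_classes for custom settings), which also shows exactly why it is fragile:

CREATE TABLE x (
    y text CHECK ( length(y) <= current_setting('myapp.y_limit')::int )
);

-- any session can change this, silently altering what the check means:
SET myapp.y_limit = '10000';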

So, while there is no clear winner, I believe that text + domain is really good enough for most cases, and if you want really transparent limit changes, it looks like a trigger is the only choice.
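
A minimal sketch of the trigger variant (the function name, trigger name, and the 75-character limit are placeholders) – the point is that changing the limit is just a CREATE OR REPLACE FUNCTION, with no table rewrite and no long-held table lock:

CREATE OR REPLACE FUNCTION y_length_check() RETURNS trigger AS $$
BEGIN
    IF length(NEW.y) > 75 THEN
        RAISE EXCEPTION 'y is too long (maximum 75 characters)';
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER y_length_check
    BEFORE INSERT OR UPDATE ON x
    FOR EACH ROW
    EXECUTE PROCEDURE y_length_check();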

To sum it all up:

char(n) – takes too much space when dealing with values shorter than n, and can lead to subtle errors because of the added trailing spaces; plus, it is problematic to change the limit

varchar(n) – it's problematic to change the limit in a live environment

varchar – just like text

text – for me, the winner: over the (n) data types because it lacks their problems, and over varchar because it has a distinct name

UPDATE

Based on Caleb's comment, I did a test of the speed of data loading for various ways of getting a text datatype with a limited length.
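
A sketch of how such a load-speed comparison can be set up (table names, the 100-character limit, and the data file path are placeholders, not the exact test):

CREATE TABLE t_varchar ( v varchar(100) );
CREATE TABLE t_check   ( v text CHECK ( length(v) <= 100 ) );
CREATE DOMAIN limited_text AS text CHECK ( length(VALUE) <= 100 );
CREATE TABLE t_domain  ( v limited_text );

\timing on
COPY t_varchar FROM '/tmp/words.txt';
COPY t_check   FROM '/tmp/words.txt';
COPY t_domain  FROM '/tmp/words.txt';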

Comments

After 2 years of using PostgreSQL in our project, I have changed from varchar(n) to text completely.
My experience is that varchar(n) not only makes changing the length painful, it is also not helpful. Users are never pleased with a document title limited to 50 characters! What's the drawback if they want the title to be up to 80 chars? I don't see the value in limiting such a need. One may argue: how can we show it in a report if it's over the 50-character limit? No problem – I design the report to wrap such text, and it doesn't matter.
I am very impressed with PostgreSQL. It's a very powerful RDBMS and helps in developing my project, a GWT project.
Our demo project, iLegal, UAT: url=http://ilegal-uat.cloudora.net , user = user12_48, password=p@ssword

Hi Depesz,
after reading your article I've done several tests on a real-world application which I've been working on for several years. It's a sort of data warehouse that works with about 1.5 billion rows.

I have two systems with different hardware and OSs. I've done several tests varying only the table definitions. The original tables had character(n) and character varying(n) columns; in the new tables the same columns were changed to text. So basically, on each system I had two schemas that differed only in their text/non-text columns. Over those tables I created partitions and indexes, and clustered and vacuumed them. I didn't use triggers or domains, so my scenario is simpler than yours and focuses only on pure text vs. non-text column definitions.

PostgreSQL 9.0.13 (OS X Server)
——————————————-
I saw that the data loading process (COPY, INDEX, CLUSTER and VACUUM) is ~4% faster using text, but my transactions (which involve partitions and many indexes over string columns) were ~12% slower compared to the non-text tables.

PostgreSQL 9.4.1 (Ubuntu)
——————————————-
Results are promising and aligned with your findings. The loading-process time difference is the same as on PostgreSQL 9.0, but transactions are now ~1% faster than with the non-text tables.

What about size? Indexes are smaller on both systems, but the overall size gain is negligible (a few MB against 15 GB of tables). This may only slightly increase the probability of fitting the indexes inside RAM.