As you can see, times are very similar – there are differences between data types, but there is no clear "winner", and I'm much more inclined to assume that these differences come from random fluctuations of load on my machine than from anything else.

Why wasn't the record selected? The reason is simple: char(n) values are right-padded with spaces.

This also means that this will work:

# SELECT * FROM x WHERE y = 'depesz ';
        y
-----------------
 depesz
(1 row)

Given this – remember that char(n) will actually use more disk space for strings shorter than "n", because it will right-pad them to the required length.
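A quick way to see this on disk – a minimal sketch, with made-up table and column names (not from the tests above):

-- char(10) stores 'abc' padded to 10 characters; text stores just the 3
CREATE TABLE pad_test (c char(10), t text);
INSERT INTO pad_test VALUES ('abc', 'abc');

SELECT pg_column_size(c) AS char_bytes,
       pg_column_size(t) AS text_bytes
FROM pad_test;
-- char_bytes comes out larger, because of the padding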

So, what about varchar, varchar(n) and text?

Varchar and text are the same, so we can treat them as the same; but to avoid confusion with varchar(n), and because text is simply shorter (in terms of characters in the name), I prefer text.

So, we're left with 2 data types: varchar(n) and text.

The obvious benefit of varchar(n) is that it has a built-in size limit.

Which is all cool, until you have to change this limit. Which happens a lot.

Of course, one can say that you can just:

ALTER TABLE xxx ALTER COLUMN some_column TYPE VARCHAR(new_limit);

Well, first – let me say that I am discussing only making the limit larger here. If the new limit has to be smaller than before (I've never seen such a case, but it's technically possible) – the table has to be scanned to be sure that all values fit within the new limit.

So, what happens when you make the limit larger?

PostgreSQL has to rewrite the table. Which has 2 very important drawbacks:

It requires an exclusive lock on the table for the duration of the operation

In the case of non-trivial tables, it will take a considerable amount of time

These 2 points together make it (in my opinion) a no-go.

I'd rather use text without any limitation than consider building a database that would require (for example) a site shutdown to change the limit of an email field from 75 to 100 characters (been there, done that, cursed a lot).

So, what is the best way to limit field size – in a way that will not lock the table when increasing the limit? Or at least – will do its job without a table rewrite, as this takes too long.

Unfortunately, as you can see, this transaction obtained an 'AccessExclusiveLock' on table x. This means that for 2.5 seconds nobody can use it. It doesn't sound bad, does it? Unfortunately – it does. My table is relatively small. What's more – waiting to get the lock automatically blocks all subsequent transactions trying to reach the table.
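(For reference – a sketch of how one can watch these locks from another session, assuming the table really is named x as in the examples:)

SELECT locktype, relation::regclass AS locked_table, mode, granted
FROM pg_locks
WHERE relation = 'x'::regclass;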

As you can see – there is only a ShareLock on the table while changing the domain – this means that writes will not work, but selects will. Which is a huge gain in comparison with "ALTER TABLE" and its AccessExclusiveLock on the table – which blocked everything.
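For reference, a minimal sketch of the TEXT + DOMAIN approach discussed here – the domain and constraint names, and the limits, are just examples, not the exact ones from my tests:

-- a domain over text, with the length limit as a named constraint
CREATE DOMAIN limited_text AS text;
ALTER DOMAIN limited_text ADD CONSTRAINT max_length CHECK (length(VALUE) <= 75);

-- used like any other type
CREATE TABLE users (email limited_text);

-- raising the limit later means swapping the constraint, not rewriting the table
ALTER DOMAIN limited_text DROP CONSTRAINT max_length;
ALTER DOMAIN limited_text ADD CONSTRAINT max_length CHECK (length(VALUE) <= 100);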

Is that all that we can do? Basically – yes. We could theoretically make a check that gets the limit from a custom GUC, but it looks slow, and is definitely error-prone – as the values can be modified per session.
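Purely as an illustration of that (dismissed) idea – with a hypothetical GUC name – it would look something like this, and any session could simply SET the value to something else:

-- hypothetical: the limit comes from a custom setting instead of a literal
ALTER DOMAIN limited_text ADD CONSTRAINT max_length_from_guc
    CHECK (length(VALUE) <= current_setting('myapp.text_limit')::int);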

So, while there is no clear winner, I believe that TEXT + DOMAIN is really good enough for most cases, and if you want really transparent limit changes – it looks like a trigger is the only choice.
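A rough sketch of what such a trigger could look like – table, column and limit are hypothetical; the point is that changing the limit later only means changing the function, not the table:

CREATE OR REPLACE FUNCTION check_email_length() RETURNS trigger AS $$
BEGIN
    -- reject values longer than the current limit
    IF length(NEW.email) > 100 THEN
        RAISE EXCEPTION 'email too long (max 100 characters)';
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER users_email_length
    BEFORE INSERT OR UPDATE ON users
    FOR EACH ROW EXECUTE PROCEDURE check_email_length();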

To sum it all up:

char(n) – takes too much space when dealing with values shorter than n, and can lead to subtle errors because of adding trailing spaces, plus it is problematic to change the limit

varchar(n) – it's problematic to change the limit in live environment

varchar – just like text

text – for me the winner – over the (n) data types because it lacks their problems, and over varchar – because it has a distinct name

UPDATE

Based on Caleb's comment, I did a test of data load speed for various ways of getting a text datatype with a limited length.

Comments

Nice. I was always a fan of DOMAINs, now I have one more reason to use them. This decoupling of constraints from data is a great feature. And it’s standards-compliant, so should be portable to other databases.

In SQL Server they have deprecated the “text” datatype and advise use of “varchar(max)” instead: http://msdn.microsoft.com/en-us/library/ms187993.aspx
I just wish Oracle would do the same. The lob datatypes have way too many restrictions compared with a varchar2.
I think types like “char” and “text” expose implementation details.

I’ve been bitten too many times by needing to change the size of a varchar(n) as well. Been going with text for a while now based on mailing list info and some more basic measurements. Glad to see it looks like the right call.

I guess I’m the only one that disapproves of the overuse of text.
First of all, the fact that your table becomes unusable by changing the length I consider a bit of a problem and one of my pet peeves with PostgreSQL. I don’t have this much of an issue with other databases, so I consider it something that must be improved on in PostgreSQL. Cause I get asked that question all the time — why is it so damn slow to increase a field in PostgreSQL?

Secondly, hmm, I use tools to view data that really consider text and varchar(x) as very different animals. They rely on that to determine the width to show a field and also (gasp!), whether a field is searchable or not. I know it’s sad, but we must live in a world that has to support other databases in addition to PostgreSQL. In most other databases, text storage is very different from varchar(n), and so generic tools such as ODBC drivers and so forth treat them differently. Not to mention I just like the fact that I can generate a create table statement and have the meta data right there to know what the maximum size is I can expect in a table. I like the genericness of that approach to seeing max length.

@Regina:
You’re most likely not the only one that doesn’t like “text”.

Compatibility (whether it will work with other databases) was one of the angles that I wanted to write about, but then I realized that in the most important (in my, not so enlightened, opinion) database – Oracle – it’s also not varchar.

As for using tools that have problems with text and don’t have them with varchar – well, whatever I write expresses my opinions, and, thanks $DEITY, I don’t have to deal with such tools/interfaces/whatever.

As with every “rule” in every field – there are exceptions. What I write is just general guideline (according to me) that can be ignored, bent, or ridiculed 🙂

But I’m surprised Postgres isn’t smart enough to take a SharedLock instead of an ExclusiveLock when altering the table in this manner. Is there a good reason for that, or could Postgres use a SharedLock in the future? Failing that, it would be good to update the “character types” documentation. I guess 99% of people use varchar(n) instead of domain/trigger, because they don’t know any better.

@Hubert,
Yep, I know. What would be really cool, though, is to have my cake and eat it too. You can have your text with your domains, but I’m lazy — I just want to limit the amount of text people put in my fields (very easily) and have my meta data view show it.

If the performance is exactly the same, then why can’t PostgreSQL internally do the domain thing but publish the meta data I wanna see and other tools rely on? Then it wouldn’t need to rewrite tables, right?

Really I don’t care about implementation details — I just like my meta data the way I like my meta data and don’t want to go sniffing in domain logic to find it. My ER diagrammer lists the sizes the way I like it. It’s so important for ETL. Okay, now you can ridicule me too :).

@Regina – well, I think that Pg should have an even simpler way to update the max length – after all – since the binary format of ‘xx’::varchar(10) and ‘xx’::varchar(1000) is (afaik) the same, it should be (theoretically) enough to update the catalog and set the new max length (of course only in the case when the new length is greater than the previous one). But I’m not sure if it will ever happen.

Could ALTER TABLE actually use a temporary index on length(value) for the constraint check? Never tried it, actually…

And what about changing a varchar(n) limit in the system catalog directly for the needed table column? Increasing the limit is actually always safe. Do you see any problems with that, other than that it is not good to play with the catalog? What ALTER TABLE actually does is check that the new limit, which it is going to write into the catalog, is not violated by the current data. And it probably drops current execution plans.
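(For illustration only – a sketch of the catalog change being described; for varchar, atttypmod stores the declared length plus 4, and this is definitely not something to do casually on a production system:)

-- raise the declared limit of column y on table x to 100 characters
UPDATE pg_attribute
   SET atttypmod = 100 + 4
 WHERE attrelid = 'x'::regclass
   AND attname = 'y';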

The real issue around constraints which you kinda touched on is that “must be length n or shorter” isn’t a real constraint, or at least hasn’t been one any time I’ve seen so far. Real constraints are more complicated. Perhaps pointing this out in your DOMAIN example would help.

Maybe I missed it… or maybe, like the difference between the data types, it’s negligible. But what, if any, difference is there for inserts and updates on the different kinds of constraints? I’m kinda a fan of thinking that varchar(x) will be the fastest, and that inserts into my table are probably being done more often than I’m increasing that limit. If there is, I may want to use varchar or even char, if I know the width is unlikely to change for a very long time (like a zip code, which is not a number). Basically you’ve shown it’s a pain if you ever need to change… yeah, ok, but what if it adds a lot of overhead to a write-heavy server?

Also worth noting that although this code works in Postgres, many less mature databases don’t support this part of the standard (and by “many” I’m looking at MySQL). CHECKs and DOMAINs are simply not an option there at this time, so if you’re building something to support both…

@Valentine Gogichashvili
It’s actually not always safe. You have to do a bit more if your change pushes the table from not needing a toast table to needing one. Apparently talking PG into creating the toast table for you at the same time you change the catalog is not so simple in later versions (although not impossible). And if you don’t realize you need to do it, things get all mucked up.

@TEOŚ:
Have you read the blog post? I mean – it’s one of the first things that I stress: the binary formats are exactly the same, and I test it to show that there is no visible performance difference.

I don’t see justification for such a general statement (yet). Recently I adapted single-threaded code to parallel computation in a nearly real-time app in such a way that the original single-threaded version is still usable. Rules work just fine in integration of data streams, despite their limited capabilities. The syntax of RULE is less complex than that of a trigger (define function + define trigger), and the execution time seems OK at first glance (I didn’t test in depth).

I go with just text in 99% of the cases, too.
Where a limit is needed, varchar(n) still seems a poor choice (changing the limit still requires a table rewrite and would break views, etc.).
I was only considering text + constraint vs. text-domain with constraint, and whether the changes in 9.1 affect the advantage of the domain – less aggressive lock while changing the constraint. Maybe I just have to run some tests.

Thanks for the research. I was looking for performance differences between varchar and text and you showed me that there is no real difference.

Your post has a technical point of view. For the decision of which datatype to use, however, I would first consider functional requirements. The real problem for me with constraints (domains or varchar(n) alike) is that they come in too late. Functional data corruption and SQL injection are the first concerns. The second requires input validation before any execution of SQL. The first might be the reason why ALTER TABLE puts an exclusive lock on the table.

Hence I prefer the use of ALTER TABLE instead of domains, because alteration of a domain does not affect rows already in the database. That means that, with domains, the same column can be used for different types of functional data. That makes it very hard to understand reports based on those columns.

In your discussion of using DOMAIN+CONSTRAINT to flexibly limit TEXT length I note your adverse comment about the time taken to check a large table if the constraint limit is changed.

From the PostgreSQL 9.2 documentation I note:
ALTER DOMAIN … ADD domain_constraint [ NOT VALID ]
“These checks can be suppressed … Newly inserted or updated rows are always checked…”

Seems to be the best of both worlds… assuming the existing data is still valid, it does not need checking; and all newly entered/altered data, i.e., the reason for the design change, will be checked against the updated constraint.

Untested: Could this be an instantaneous ‘no locks’ solution?
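In terms of the syntax quoted above, the idea would look roughly like this – the domain and constraint names are just examples:

-- swap in the larger limit without re-checking existing rows
ALTER DOMAIN limited_text DROP CONSTRAINT max_length;
ALTER DOMAIN limited_text ADD CONSTRAINT max_length
    CHECK (length(VALUE) <= 36) NOT VALID;

-- existing rows can still be checked later, if ever needed
ALTER DOMAIN limited_text VALIDATE CONSTRAINT max_length;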

On the general question of text vs. varchar(n) I infer from the following that at least Joe Celko favours setting lower limits in suitable (most?) contexts:
“These large sizes tend to invite bad data. You give someone … an NVARCHAR(255) column and eventually it will get a Buddhist sutra in Chinese Unicode.”
[http://joecelkothesqlapprentice.blogspot.com.au/2007/06/db-table-design-question.html]

@Gavan:
While using invalid checks is an interesting hack, I wouldn’t really use it in production.
Of course – the fact that you can have invalid checks, and later on validate them makes changing constraints easier.

As for Celko. Well. I appreciate a lot of stuff that he did and does, but I definitely don’t agree with him on many things (starting with his favorite structure for trees).

The way I read the specification, the checks are not ‘invalid’ (despite the semantics saying this); rather, the checks are not applied to existing data, only to new data.

If the original domain text had a constraint of 24, and the design changes to 36, then it is provably certain that the new constraint (36) will not invalidate any existing data.

Importantly, as you mention, it would be good to avoid the wasted effort of a big production table getting rechecked for existing valid data, i.e., data that had already passed the more restrictive constraint.

Obviously this is not the case for a 36->24 design change. But moving the limit down is a much bigger problem than just checking constraints, and it’s almost certainly never going to happen, i.e., not a likely problem ‘in production’.

I would see the sequence as follows.

1. Implementation: text field with unqualified length constraint of 24, with the belief/hope this will never change.

Great job. Thanks for your cool work. I am moving from MySQL to PostgreSQL. According to the test, there is no need to expect that char would outperform varchar. But I still prefer varchar to text since it makes the metadata more understandable.