Database Administrators Stack Exchange is a question and answer site for database professionals who wish to improve their database skills and learn from others in the community. It's 100% free, no registration required.

How are bit strings handled in other languages such as PHP, Java, C#, C++, etc., through drivers like Npgsql, ODBC, etc.

For question #1, using smallint or bigint will be much more storage efficient, and would perhaps offer a performance gain since integers are supported everywhere. Most programming languages handle bit operations on integers with ease. If that is the case, what is the point of introducing the bit-string data type? Is it Only for cases that need a large amount of bit masks? Bit field indexing maybe? I'm more curious about how bit field indexing is done in PostgreSQL.

For #2, I'm confused, more than curious. For example, what if I store week day bit masks in a bit(7) field, one bit for a day, with lowest bit representing Monday. Then I query for the value in PHP and C++. What will I get? The documentation says I'll have a bit string, however a bit string is not something I can use directly - as with integers. Then in this case, should I give up on bit field?

Erwin's answer on SO is great (and if you don't mind copying it over @Erwin, it would be useful to have here), but I'd like to add my own caution: in most cases you wouldn't contemplate storing information in bit strings on an RDBMS - using separate boolean columns in the normal solution regardless of storage 'efficiency'.
–
Jack Douglas♦Sep 28 '12 at 13:39

@Erwin I don't see why not - there is some overlap between the sites and they are both supposed to stand alone (so for instance we wouldn't - and anyway couldn't - close a question here as a duplicate if there was an identical question on SO). Our focus is more on 'expert' issues, but IMO your answer fits that category as it stands :)
–
Jack Douglas♦Sep 28 '12 at 14:25

@JackDouglas: Well, makes sense. And how could I possibly disagree after the praise you slipped in, anyway? ;)
–
Erwin BrandstetterSep 28 '12 at 14:32

2 Answers
2

Conditions for queries and partial indexing are easy to write and read and meaningful.

A boolean column occupies 1 byte. For only a few variables this occupies the least space.

Unlike the other options boolean columns allow NULL values for individual bits if you should need that. You can always define columns NOT NULL if you don't.

Optimizing storage

If you have more than a hand full variables but less than 33, an integer column may serve you best. (Or a bigint for up to 64 variables.)

Occupies 4 bytes on disk.

Very fast indexing for exact matches (= operator).

Handling individual values may be slower / less convenient than with bit string or boolean.

With even more variables, or if you want to manipulate the values a lot, or if you don't have huge tables and disk space / RAM is no issue, or if you are not sure what to pick, I would consider bit(n) or bit varying(n).

Occupies at least 5 bytes (or 8 for very long strings) plus 1 byte for each group of 8 bits (rounded up).

Yes, I agree with you. Currently, I'm using samllint to store weekdays' bit mask. It suited the case, storage efficiency / performance wide. However, if I would have some more indexing/filtering on bit masks, it'll fail, due to low performance.
–
Jackey CheungSep 29 '12 at 4:19

All PostgreSQL types are useful for some things and less useful for others. In general, you get more out of worrying about functionality first and performance later. PostgreSQL has a large number of functions for manipulating various kinds of data types and these are no exception.

I would expect on the application layer, unless your db driver handles it through some sort of type conversion, you'd get a string representation and have to handle this. So it may or may not be useful in that capacity.

Where it is likely useful is when you want to select records based on bitwise operations, such as a bitwise or or a bitwise and, or otherwise manipulate the data in SQL queries. Unless you are doing this, many of the more esoteric features of PostgreSQL are less helpful.

Note also for longer strings of binary information there is a large object interface which allows you to do streaming etc. and a bytea interface which allows a more compact string representation.

tl; dr: If you need it you will know it. Otherwise file it away in the "reserved for future use" section of your mind.