"Heikki Linnakangas" <heikki(at)enterprisedb(dot)com> writes:
> I've started working on revamping Free Space Map, using the approach
> where we store a map of heap pages on every nth heap page. What we need
> now is discussion on the details of how exactly it should work.

You're cavalierly waving away a whole boatload of problems that will
arise as soon as you start trying to make the index AMs play along
with this :-(. Hash, for instance, has very narrow-minded ideas about
page allocation within its indexes.

Also, I don't think that "use the special space" will scale to handle
other kinds of maps such as the proposed dead space map. (This is
exactly why I said the other day that we need a design roadmap for all
these ideas.)

The idea that's becoming attractive to me while contemplating the
multiple-maps problem is that we should adopt something similar to
the old Mac OS idea of multiple "forks" in a relation. In addition
to the main data fork which contains the same info as now, there could
be one or more map forks which are separate files in the filesystem.
They would be named by relfilenode plus an extension; for instance, a
relation with relfilenode NNN would have a data fork in file NNN (plus
perhaps NNN.1, NNN.2, etc.) and a map fork named something like NNN.map (plus
NNN.map.1 etc as needed). We'd have to add one more field to buffer
lookup keys (BufferTag) to disambiguate which fork the referenced page
is in. Having bitten that bullet, though, the idea trivially scales to
any number of map forks with potentially different space requirements
and different locking and WAL-logging requirements.
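To make the fork idea concrete, here is a minimal C sketch, assuming a single map fork and 1 GB segment files of 131072 8 kB blocks. The names ForkNumber, MAIN_FORKNUM, MAP_FORKNUM, make_tag, tags_equal and tag_filename are hypothetical, and the BufferTag layout is illustrative rather than the real definition. It shows the extra fork field in the lookup key and how a (relfilenode, fork, block) triple maps onto files named NNN, NNN.1, NNN.map, NNN.map.1.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical fork identifiers: one main data fork plus one map fork. */
typedef enum ForkNumber
{
    MAIN_FORKNUM = 0,           /* files NNN, NNN.1, NNN.2, ... */
    MAP_FORKNUM = 1,            /* files NNN.map, NNN.map.1, ... */
    NUM_FORKS = 2
} ForkNumber;

/* Buffer lookup key with the extra fork field; layout is illustrative. */
typedef struct BufferTag
{
    unsigned int relfilenode;   /* which relation's files */
    ForkNumber   forknum;       /* which fork of that relation */
    unsigned int blocknum;      /* block number within that fork */
} BufferTag;

/* Assumed segment size: 131072 blocks of 8 kB, i.e. 1 GB per file. */
#define BLOCKS_PER_SEGMENT 131072u

static const char *const fork_suffix[NUM_FORKS] = {"", ".map"};

/* Build a tag; zero it first so any padding bytes are deterministic
 * if the tag is later hashed as raw bytes. */
static BufferTag
make_tag(unsigned int relfilenode, ForkNumber forknum, unsigned int blocknum)
{
    BufferTag tag;

    memset(&tag, 0, sizeof(tag));
    tag.relfilenode = relfilenode;
    tag.forknum = forknum;
    tag.blocknum = blocknum;
    return tag;
}

/* Two tags denote the same page only if all three fields match. */
static int
tags_equal(const BufferTag *a, const BufferTag *b)
{
    return a->relfilenode == b->relfilenode &&
        a->forknum == b->forknum &&
        a->blocknum == b->blocknum;
}

/* Write the name of the segment file holding the tagged block into buf;
 * returns snprintf's result. */
static int
tag_filename(const BufferTag *tag, char *buf, size_t len)
{
    unsigned int segno = tag->blocknum / BLOCKS_PER_SEGMENT;

    assert(tag->forknum == MAIN_FORKNUM || tag->forknum == MAP_FORKNUM);
    if (segno == 0)
        return snprintf(buf, len, "%u%s", tag->relfilenode,
                        fork_suffix[tag->forknum]);
    return snprintf(buf, len, "%u%s.%u", tag->relfilenode,
                    fork_suffix[tag->forknum], segno);
}
```

With this key, block 5 of the data fork and block 5 of the map fork of relation 16384 are distinct buffers, living in files 16384 and 16384.map respectively; in this sketch, another map fork would cost only one more enum value and suffix.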
Another possible advantage is that a new map fork could be added to an
existing table without much trouble, which is certainly something we'd
need if we ever hope to get update-in-place working.

The main disadvantage I can see is that for very small tables, the
percentage overhead from multiple map forks of one page apiece is
annoyingly high. However, most of the point of a map disappears if
the table is small, so we might finesse that by not creating any maps
until the table has reached some minimum size.
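The small-table arithmetic is easy to check: a one-page table with two one-page map forks has 2/1 = 200% overhead, while a 1000-page table whose two maps still fit in one page each pays only 0.2%. Below is a minimal C sketch of that calculation and of the "no maps below some minimum size" rule; the 8-page cutoff and the names map_overhead_percent, should_create_maps and MIN_DATA_PAGES_FOR_MAPS are hypothetical placeholders, since no actual threshold is proposed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical cutoff; the proposal only says "some minimum size". */
#define MIN_DATA_PAGES_FOR_MAPS 8u

/* Space used by nmaps map forks of pages_per_map pages each, as a
 * percentage of the data fork's size. */
static double
map_overhead_percent(unsigned int data_pages, unsigned int nmaps,
                     unsigned int pages_per_map)
{
    assert(data_pages > 0);     /* overhead is undefined for an empty fork */
    return 100.0 * nmaps * pages_per_map / data_pages;
}

/* Report whether the data fork has reached the size at which
 * creating map forks would be worthwhile. */
static bool
should_create_maps(unsigned int data_pages)
{
    return data_pages >= MIN_DATA_PAGES_FOR_MAPS;
}
```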
regards, tom lane