On Wed, Dec 29, 2010 at 6:46 PM, Noah Misch <noah(at)leadboat(dot)com> wrote:
>> I think this scenario will be more common than you might think. Tables don't contain random data; they contain data that the DBA thinks is valid. The situation where the data is mostly as you expect but with a few kooky rows is, in my experience, extremely common.
>
> Perhaps. A few kooky rows is indeed common, but we're talking about a specific
> breed of kookiness: 99.9% of the rows have identical bits after an ALTER TYPE
> transformation expression, and 0.1% have different bits. Is that common?
I think it's common enough to be worth worrying about.
> Expanding on my introduction, none of the following can yield a negative
> verification scan; the result is always positive or an error:
>
> CREATE DOMAIN loosedom AS text;
> CREATE DOMAIN tightdom AS text CHECK (value LIKE '<%/>');
> CREATE TABLE t (c varchar(6));
> INSERT INTO t VALUES ('<abc/>'),('<de/>');
> ALTER TABLE t ALTER c TYPE varchar(8);
> ALTER TABLE t ALTER c TYPE text;
> ALTER TABLE t ALTER c TYPE loosedom;
> ALTER TABLE t ALTER c TYPE xml USING c::xml;
> ALTER TABLE t ALTER c TYPE varchar(64);
> ALTER TABLE t ALTER c TYPE tightdom;
+1 for trying to optimize these cases (but maybe after we optimize the
varchar -> text and varchar(less) -> varchar(more) cases to skip the
scan altogether).
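
Those no-scan cases are the ones where the new type accepts every value of the old type with identical storage. A minimal sketch (table name t2 is illustrative):

```sql
-- Widening a varchar limit, or dropping it entirely, cannot change
-- or invalidate any stored value, so no scan would be needed at all.
CREATE TABLE t2 (c varchar(6));
INSERT INTO t2 VALUES ('<abc/>');
ALTER TABLE t2 ALTER c TYPE varchar(10);  -- varchar(less) -> varchar(more)
ALTER TABLE t2 ALTER c TYPE text;         -- varchar -> text
```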
> Adding a bpchar into the mix makes a negative verification scan possible, as
> does a USING clause having a truncating effect. Continuing the example, these
> can and would get a negative verification scan:
> ALTER TABLE t ALTER c TYPE character(6);
> ALTER TABLE t ALTER c TYPE varchar(5) USING c::varchar(5);
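
The bpchar case can come back negative because of padding rather than truncation; a quick illustration (the bit-level claim is the thread's, not verified here):

```sql
-- character(6) pads short values with blanks to the declared width,
-- so '<de/>' (5 chars) is stored as '<de/> ' and its bits no longer
-- match the original varchar value.
SELECT length('<de/>'::character(6)::text) AS padded_length;  -- padded to 6
```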
> Plenty of academic USING clause examples exist:
> ALTER TABLE t ALTER c TYPE varchar(8) USING CASE c WHEN '<de/>' THEN 'foo' ELSE c END;
I am not really convinced that there's much value in optimizing these
cases. They're not likely to arise very often in practice, and DBAs
like predictability. There's tangible value in being able to say
"this is going to scan your table at most once - it might rewrite it,
or it might just verify what's there, or it might decide no scan is
necessary, but the absolute worst case is one scan with rewrite".
That's simple to understand and simple to document and probably
simpler to code too, and I think it covers very nearly all of the
cases people are likely to care about in practice.
> Yes. Indeed, that's the intuitive basis for my hypothesis that the verification
> scan will usually fail early if it is going to fail at all. I don't advocate this approach to pick up
> edge cases, but to pick up reasonable cases _without explicit annotations_
> showing them to be achievable. Take the text->xml example, certainly of genuine
> value if not top-frequency. I see three ways to ensure we do a verification
> scan for it:
>
> 1. Have the user tell us: ALTER TABLE t ALTER c TYPE xml USING c::xml IMPLICIT
> 2. Mark the text->xml cast as "possibly no-rewrite" and look for that
> 3. Do a verification scan every time
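
Option 2 would presumably hang the new property off the existing cast catalog; the "possibly no-rewrite" column is hypothetical, but the cast entry it would annotate is real:

```sql
-- The text -> xml cast already has a pg_cast entry; a hypothetical
-- "possibly no-rewrite" flag could be recorded alongside it.
SELECT castsource::regtype, casttarget::regtype, castfunc::regproc
FROM pg_cast
WHERE castsource = 'text'::regtype
  AND casttarget = 'xml'::regtype;
```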
I think for any pair of types (T1, T2) we should first determine
whether we can skip the scan altogether. If yes, we're done. If no,
then we should have a way of determining whether a verify-only scan is
guaranteed to be sufficient (in your terminology, the verification
scan is guaranteed to return either positive or error, not negative).
If yes, then we do a verification scan. If no, we do a rewrite.
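
That two-step test could be sketched as (pseudocode; the predicate names are placeholders, not PostgreSQL functions):

```
on ALTER TABLE ... ALTER col TYPE T2, where col currently has type T1:
    if no_scan_needed(T1, T2):               -- e.g. varchar(6) -> varchar(8)
        done: skip the scan altogether
    else if verify_only_sufficient(T1, T2):  -- scan can only be positive or error
        run a verification scan; no rewrite
    else:
        rewrite the table
```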
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company