SQL sets

Suppose we have a table of widgets, with various descriptive attributes:

widget

| id | size | shape  | color  | alignment |
|----|------|--------|--------|-----------|
| 1  | Big  | Round  | Purple | Chaotic   |
| 2  | Tiny | Star   | Purple | Lawful    |
| 3  | Tiny | Linear | Red    | Neutral   |

And another table whose rows correspond to sets of widgets:

widget_set

| id | description                            |
|----|----------------------------------------|
| 1  | Widgets of sizes other than Tiny       |
| 2  | Chaotic-aligned or Star-shaped widgets |
| 3  | Big and Red widgets                    |

Now we'd like to set things up so we can look up the widgets that belong to a
given set (or look up the sets that contain a given widget). The most
natural way would be to define a view:

```sql
create view widget_set_member as
select 1 as widget_set_id
     , w.id as widget_id
from widget w
where w.size != 'Tiny'
union
select 2 as widget_set_id
     , w.id as widget_id
from widget w
where w.alignment = 'Chaotic'
   or w.shape = 'Star'
union
select 3 as widget_set_id
     , w.id as widget_id
from widget w
where w.size = 'Big'
  and w.color = 'Red';
```
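One quick way to check the view against the sample rows is to load them into an in-memory database. The sketch below assumes SQLite, accessed through Python's standard sqlite3 module, because the article doesn't name a database engine. The view is the one above, lightly reformatted.

```python
import sqlite3

# In-memory SQLite database (an assumption: the article names no engine).
conn = sqlite3.connect(":memory:")
conn.executescript("""
create table widget (
    id        integer primary key,
    size      text,
    shape     text,
    color     text,
    alignment text
);
insert into widget values
    (1, 'Big',  'Round',  'Purple', 'Chaotic'),
    (2, 'Tiny', 'Star',   'Purple', 'Lawful'),
    (3, 'Tiny', 'Linear', 'Red',    'Neutral');

-- The hand-written view: one union branch per set.
create view widget_set_member as
select 1 as widget_set_id, w.id as widget_id
from widget w
where w.size != 'Tiny'
union
select 2 as widget_set_id, w.id as widget_id
from widget w
where w.alignment = 'Chaotic' or w.shape = 'Star'
union
select 3 as widget_set_id, w.id as widget_id
from widget w
where w.size = 'Big' and w.color = 'Red';
""")

members = conn.execute(
    "select widget_set_id, widget_id from widget_set_member order by 1, 2"
).fetchall()
print(members)  # [(1, 1), (2, 1), (2, 2)]
```

Set 3 yields no rows because no widget is both Big and Red.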

One limitation of this method is that new sets of widgets can't be defined
with data-manipulation (insert/update/delete) statements alone; each new set
requires redefining the view. We could avoid this problem by using a table
instead of a view:

widget_set_member

| widget_set_id | widget_id |
|---------------|-----------|
| 1             | 1         |
| 2             | 1         |
| 2             | 2         |

This method has its own limitation, though: changes to the
widget table won't be automatically reflected in
widget_set_member. Luckily, there is a way to have the best of
both worlds. We'll start with an attribute_type table whose
rows represent the columns of descriptive attributes of widgets:

attribute_type

| id | name      |
|----|-----------|
| 1  | size      |
| 2  | shape     |
| 3  | color     |
| 4  | alignment |

Then we'll use that table to add an attribute_member view, which
turns the structure of the widget table into
data:

```sql
create view attribute_member as
select w.id as widget_id
     , at.id as attribute_type_id
     , at.name as attribute_type_name
     , case at.name
         when 'size' then w.size
         when 'shape' then w.shape
         when 'color' then w.color
         when 'alignment' then w.alignment
       end as attribute_value
from widget w
   , attribute_type at;
```
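The cross join produces one row per widget per attribute type, and the CASE expression picks the matching column. The minimal sketch below assumes SQLite through Python's sqlite3 module. It shows the rows produced for widget 1 and the total row count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table widget (
    id integer primary key, size text, shape text, color text, alignment text
);
insert into widget values
    (1, 'Big',  'Round',  'Purple', 'Chaotic'),
    (2, 'Tiny', 'Star',   'Purple', 'Lawful'),
    (3, 'Tiny', 'Linear', 'Red',    'Neutral');

create table attribute_type (id integer primary key, name text not null unique);
insert into attribute_type values
    (1, 'size'), (2, 'shape'), (3, 'color'), (4, 'alignment');

-- Cross join: one row per (widget, attribute type); the CASE picks the column.
create view attribute_member as
select w.id as widget_id
     , at.id as attribute_type_id
     , at.name as attribute_type_name
     , case at.name
         when 'size' then w.size
         when 'shape' then w.shape
         when 'color' then w.color
         when 'alignment' then w.alignment
       end as attribute_value
from widget w
   , attribute_type at;
""")

widget1 = conn.execute(
    "select attribute_type_name, attribute_value from attribute_member"
    " where widget_id = 1 order by attribute_type_id"
).fetchall()
row_count = conn.execute("select count(*) from attribute_member").fetchone()[0]
print(widget1)    # [('size', 'Big'), ('shape', 'Round'), ('color', 'Purple'), ('alignment', 'Chaotic')]
print(row_count)  # 12
```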

The criteria that define our sets will be boolean predicates based on the
descriptive attributes of widgets. The primitive conditions will test that a
particular attribute has a particular value (say, that the size is Big).
Several of those primitive predicates can then be combined into one compound
predicate with and, or, and not. We'll represent those compound predicates in
disjunctive normal form: an or of one or more conjunctions, each of which is
an and of primitive predicates, some possibly negated. attribute_member gives
us the primitive predicates, so the next layer up will be a widget_subset
table corresponding to conjunctions, a widget_subset_pred table showing which
(possibly negated) primitive criteria make up a conjunction, and a
widget_subset_member view showing which widgets satisfy the criteria for
which subsets.

widget_subset

| id |
|----|
| 1  |
| 2  |
| 3  |
| 4  |

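widget_subset_pred

| widget_subset_id | attribute_type_id | attribute_value | negated         |
|------------------|-------------------|-----------------|-----------------|
| 1                | 1 (size)          | Tiny            | 1 (negated)     |
| 2                | 4 (alignment)     | Chaotic         | 0 (not negated) |
| 3                | 2 (shape)         | Star            | 0 (not negated) |
| 4                | 1 (size)          | Big             | 0 (not negated) |
| 4                | 3 (color)         | Red             | 0 (not negated) |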

```sql
create view widget_subset_member as
select wss.id as widget_subset_id
     , w.id as widget_id
from widget_subset wss
   , widget w
where (select count(*)
       from widget_subset_pred wssp
          , attribute_member am
       where wss.id=wssp.widget_subset_id
         and wssp.negated=1
         and wssp.attribute_type_id=am.attribute_type_id
         and wssp.attribute_value=am.attribute_value
         and am.widget_id=w.id)=0
  and (select count(*)
       from widget_subset_pred wssp
          , attribute_member am
       where wss.id=wssp.widget_subset_id
         and wssp.negated=0
         and wssp.attribute_type_id=am.attribute_type_id
         and wssp.attribute_value=am.attribute_value
         and am.widget_id=w.id)=
      (select count(*)
       from widget_subset_pred wssp
       where wss.id=wssp.widget_subset_id
         and wssp.negated=0);
```
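The view implements a conjunction by counting. A widget belongs to a subset when two conditions hold: it matches none of the subset's negated predicates, and it matches all of the subset's non-negated predicates (the match count equals the predicate count). For a subset with no non-negated predicates, the second test compares 0 with 0, so only the negations constrain it.

The sketch below assumes SQLite through Python's sqlite3 module and adds column types and constraints of its own. It loads the example data and lists the members of each subset.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table widget (
    id integer primary key, size text, shape text, color text, alignment text
);
insert into widget values
    (1, 'Big',  'Round',  'Purple', 'Chaotic'),
    (2, 'Tiny', 'Star',   'Purple', 'Lawful'),
    (3, 'Tiny', 'Linear', 'Red',    'Neutral');

create table attribute_type (id integer primary key, name text not null unique);
insert into attribute_type values
    (1, 'size'), (2, 'shape'), (3, 'color'), (4, 'alignment');

create view attribute_member as
select w.id as widget_id, at.id as attribute_type_id,
       case at.name when 'size' then w.size when 'shape' then w.shape
                    when 'color' then w.color when 'alignment' then w.alignment
       end as attribute_value
from widget w, attribute_type at;

create table widget_subset (id integer primary key);
insert into widget_subset values (1), (2), (3), (4);

create table widget_subset_pred (
    widget_subset_id  integer not null references widget_subset (id),
    attribute_type_id integer not null references attribute_type (id),
    attribute_value   text    not null,
    negated           integer not null check (negated in (0, 1))
);
insert into widget_subset_pred values
    (1, 1, 'Tiny',    1),  -- not (size = Tiny)
    (2, 4, 'Chaotic', 0),  -- alignment = Chaotic
    (3, 2, 'Star',    0),  -- shape = Star
    (4, 1, 'Big',     0),  -- size = Big ...
    (4, 3, 'Red',     0);  -- ... and color = Red

-- Member when no negated predicate matches and every non-negated one does.
create view widget_subset_member as
select wss.id as widget_subset_id, w.id as widget_id
from widget_subset wss, widget w
where (select count(*)
       from widget_subset_pred wssp, attribute_member am
       where wss.id = wssp.widget_subset_id and wssp.negated = 1
         and wssp.attribute_type_id = am.attribute_type_id
         and wssp.attribute_value = am.attribute_value
         and am.widget_id = w.id) = 0
  and (select count(*)
       from widget_subset_pred wssp, attribute_member am
       where wss.id = wssp.widget_subset_id and wssp.negated = 0
         and wssp.attribute_type_id = am.attribute_type_id
         and wssp.attribute_value = am.attribute_value
         and am.widget_id = w.id)
    = (select count(*)
       from widget_subset_pred wssp
       where wss.id = wssp.widget_subset_id and wssp.negated = 0);
""")

subset_members = conn.execute(
    "select widget_subset_id, widget_id from widget_subset_member order by 1, 2"
).fetchall()
print(subset_members)  # [(1, 1), (2, 1), (3, 2)]
```

Subset 4 (Big and Red) is empty. Widget 1 and widget 3 each match only one of its two predicates.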

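Next, we'll add a widget_set_pred table showing which subsets (conjunctions)
make up a set (disjunction), and a widget_set_member view showing which
widgets satisfy the criteria for which sets. For the example sets, set 1
consists of subset 1, set 2 of subsets 2 and 3, and set 3 of subset 4.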
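The article doesn't spell out widget_set_pred or widget_set_member. What follows is one hypothetical way to define them: each widget_set_pred row names one disjunct of a set, and the view is a join with distinct.

To keep the sketch short, it replaces widget_subset_member with a stand-in table holding the memberships that view yields for the sample data. Subset 1 holds widget 1, subset 2 holds widget 1, subset 3 holds widget 2, and subset 4 is empty. The sketch assumes SQLite through Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table widget_set (id integer primary key, description text not null);
insert into widget_set values
    (1, 'Widgets of sizes other than Tiny'),
    (2, 'Chaotic-aligned or Star-shaped widgets'),
    (3, 'Big and Red widgets');

-- Stand-in for the widget_subset_member view (its output for the sample data).
create table widget_subset_member (widget_subset_id integer, widget_id integer);
insert into widget_subset_member values (1, 1), (2, 1), (3, 2);

-- Hypothetical layout: each row makes a subset one disjunct of a set.
create table widget_set_pred (
    widget_set_id    integer not null references widget_set (id),
    widget_subset_id integer not null,
    primary key (widget_set_id, widget_subset_id)
);
insert into widget_set_pred values (1, 1), (2, 2), (2, 3), (3, 4);

-- A widget is in a set if it is in at least one of the set's subsets;
-- distinct removes duplicates when it satisfies several disjuncts.
create view widget_set_member as
select distinct wsp.widget_set_id, wssm.widget_id
from widget_set_pred wsp
join widget_subset_member wssm
  on wssm.widget_subset_id = wsp.widget_subset_id;
""")

set_members = conn.execute(
    "select widget_set_id, widget_id from widget_set_member order by 1, 2"
).fetchall()
print(set_members)  # [(1, 1), (2, 1), (2, 2)]
```

The result reproduces the hand-maintained widget_set_member table from earlier.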

That's it. With these tables and views, a new set can be defined by
inserting rows for its criteria into the widget_subset,
widget_subset_pred, widget_set, and widget_set_pred tables. Members of
the set will then be immediately visible in the widget_set_member view. We
don't need to create or modify any views, and we don't need to keep any
separate tables in sync when the widget table changes. However, as you might
guess from looking at the widget_subset_member view, queries may be rather
slow on larger data sets. The view pairs every subset with every widget and
runs three correlated subqueries for each pair. On top of that,
attribute_member is itself computed from a cross join of widget and
attribute_type, so lookups on attribute_value generally can't be served by
an ordinary index.
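To illustrate the DML-only workflow, the sketch below defines a hypothetical fourth set, "Purple widgets that are not Lawful", using plain inserts. It then updates a widget and queries again.

It assumes SQLite through Python's sqlite3 module. It also assumes a hypothetical widget_set_member definition (a join with distinct over widget_set_pred and widget_subset_member), because the article doesn't show one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Schema and sample data from the article, condensed.
conn.executescript("""
create table widget (id integer primary key, size text, shape text, color text, alignment text);
insert into widget values (1, 'Big', 'Round', 'Purple', 'Chaotic'),
    (2, 'Tiny', 'Star', 'Purple', 'Lawful'), (3, 'Tiny', 'Linear', 'Red', 'Neutral');
create table attribute_type (id integer primary key, name text);
insert into attribute_type values (1, 'size'), (2, 'shape'), (3, 'color'), (4, 'alignment');
create view attribute_member as
select w.id as widget_id, at.id as attribute_type_id,
       case at.name when 'size' then w.size when 'shape' then w.shape
                    when 'color' then w.color when 'alignment' then w.alignment
       end as attribute_value
from widget w, attribute_type at;
create table widget_set (id integer primary key, description text);
create table widget_subset (id integer primary key);
create table widget_subset_pred (widget_subset_id integer, attribute_type_id integer,
                                 attribute_value text, negated integer);
create table widget_set_pred (widget_set_id integer, widget_subset_id integer);
create view widget_subset_member as
select wss.id as widget_subset_id, w.id as widget_id
from widget_subset wss, widget w
where (select count(*) from widget_subset_pred wssp, attribute_member am
       where wss.id = wssp.widget_subset_id and wssp.negated = 1
         and wssp.attribute_type_id = am.attribute_type_id
         and wssp.attribute_value = am.attribute_value and am.widget_id = w.id) = 0
  and (select count(*) from widget_subset_pred wssp, attribute_member am
       where wss.id = wssp.widget_subset_id and wssp.negated = 0
         and wssp.attribute_type_id = am.attribute_type_id
         and wssp.attribute_value = am.attribute_value and am.widget_id = w.id)
    = (select count(*) from widget_subset_pred wssp
       where wss.id = wssp.widget_subset_id and wssp.negated = 0);
-- Hypothetical widget_set_member definition.
create view widget_set_member as
select distinct wsp.widget_set_id, wssm.widget_id
from widget_set_pred wsp
join widget_subset_member wssm on wssm.widget_subset_id = wsp.widget_subset_id;
""")

# Hypothetical set 4, defined purely with inserts: one subset (5) whose
# conjunction is "color = Purple and not (alignment = Lawful)".
conn.executescript("""
insert into widget_set values (4, 'Purple widgets that are not Lawful');
insert into widget_subset values (5);
insert into widget_subset_pred values (5, 3, 'Purple', 0), (5, 4, 'Lawful', 1);
insert into widget_set_pred values (4, 5);
""")

def members_of(set_id):
    """Widget ids currently in the given set, in ascending order."""
    rows = conn.execute(
        "select widget_id from widget_set_member where widget_set_id = ? order by 1",
        (set_id,)).fetchall()
    return [r[0] for r in rows]

before = members_of(4)  # widget 2 is Purple but Lawful; widget 3 is Red
# Changes to widget show up immediately; nothing else needs to be kept in sync.
conn.execute("update widget set color = 'Purple' where id = 3")
after = members_of(4)
print(before, after)  # [1] [1, 3]
```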

We can improve performance by creating another table to enumerate all
possible attribute values, and another to associate widgets with the
attributes that apply to them (replacing the attribute_member
view):

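attribute

| id | attribute_type_id | value   |
|----|-------------------|---------|
| 1  | 1                 | Big     |
| 2  | 1                 | Tiny    |
| 3  | 2                 | Round   |
| 4  | 2                 | Star    |
| 5  | 2                 | Linear  |
| 6  | 3                 | Purple  |
| 7  | 3                 | Red     |
| 8  | 4                 | Chaotic |
| 9  | 4                 | Lawful  |
| 10 | 4                 | Neutral |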

We can set a uniqueness constraint on the
(attribute_type_id, value)
combination.
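In DDL, the attribute table and its uniqueness constraint might look like the following. The sketch assumes SQLite through Python's sqlite3 module, and the column types are guesses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table attribute_type (id integer primary key, name text not null unique);
insert into attribute_type values
    (1, 'size'), (2, 'shape'), (3, 'color'), (4, 'alignment');

-- Every possible attribute value; each (attribute_type_id, value) pair appears once.
create table attribute (
    id                integer primary key,
    attribute_type_id integer not null references attribute_type (id),
    value             text    not null,
    unique (attribute_type_id, value)
);
insert into attribute values
    (1, 1, 'Big'), (2, 1, 'Tiny'),
    (3, 2, 'Round'), (4, 2, 'Star'), (5, 2, 'Linear'),
    (6, 3, 'Purple'), (7, 3, 'Red'),
    (8, 4, 'Chaotic'), (9, 4, 'Lawful'), (10, 4, 'Neutral');
""")

try:
    # A second "size = Big" row violates the uniqueness constraint.
    conn.execute("insert into attribute values (11, 1, 'Big')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)  # True
```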

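attribute_member

| attribute_type_id | attribute_id | widget_id |
|-------------------|--------------|-----------|
| 1                 | 1            | 1         |
| 2                 | 3            | 1         |
| 3                 | 6            | 1         |
| 4                 | 8            | 1         |
| 1                 | 2            | 2         |
| 2                 | 4            | 2         |
| 3                 | 6            | 2         |
| 4                 | 9            | 2         |
| 1                 | 2            | 3         |
| 2                 | 5            | 3         |
| 3                 | 7            | 3         |
| 4                 | 10           | 3         |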

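attribute_type_id is not semantically necessary, since it can be derived from
attribute_id. Keeping it, though, lets us create a uniqueness constraint on
(attribute_type_id, widget_id) to ensure we don't accidentally give one
widget two different colors, etc. That constraint is only trustworthy if each
row's attribute_type_id agrees with the type of the attribute it points to. A
composite foreign key from (attribute_id, attribute_type_id) to the
corresponding columns of attribute enforces that agreement.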
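A sketch of attribute_member with both guarantees, assuming SQLite through Python's sqlite3 module:

- The unique constraint blocks a second value of the same type for one widget.
- The composite foreign key to attribute (id, attribute_type_id) blocks a row whose attribute_type_id disagrees with its attribute. An example is (4, 4, 1), which would file Star (a shape) as widget 1's alignment.

The composite key needs a matching unique key on attribute, and SQLite only enforces foreign keys after `pragma foreign_keys = on`. Widget 4 is hypothetical and starts with no attributes, so the type check can be seen separately from the uniqueness check.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("pragma foreign_keys = on")  # SQLite enforces FKs only when this is on
conn.executescript("""
create table widget (id integer primary key);
insert into widget values (1), (2), (3), (4);  -- widget 4 is hypothetical, no attributes yet

create table attribute (
    id                integer primary key,
    attribute_type_id integer not null,
    value             text    not null,
    unique (attribute_type_id, value),
    -- Implied by the primary key, but a composite foreign key needs an
    -- explicit unique key on exactly the referenced columns.
    unique (id, attribute_type_id)
);
insert into attribute values
    (1, 1, 'Big'), (2, 1, 'Tiny'),
    (3, 2, 'Round'), (4, 2, 'Star'), (5, 2, 'Linear'),
    (6, 3, 'Purple'), (7, 3, 'Red'),
    (8, 4, 'Chaotic'), (9, 4, 'Lawful'), (10, 4, 'Neutral');

create table attribute_member (
    attribute_type_id integer not null,
    attribute_id      integer not null,
    widget_id         integer not null references widget (id),
    -- At most one attribute of each type per widget.
    unique (attribute_type_id, widget_id),
    -- The stored type must be the referenced attribute's own type.
    foreign key (attribute_id, attribute_type_id)
        references attribute (id, attribute_type_id)
);
insert into attribute_member values
    (1, 1, 1), (2, 3, 1), (3, 6, 1), (4, 8, 1),
    (1, 2, 2), (2, 4, 2), (3, 6, 2), (4, 9, 2),
    (1, 2, 3), (2, 5, 3), (3, 7, 3), (4, 10, 3);
""")

def rejected(sql):
    """Run one statement; return True if a constraint rejected it."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False

second_color = rejected("insert into attribute_member values (3, 7, 1)")    # widget 1 is already Purple
type_mismatch = rejected("insert into attribute_member values (4, 4, 4)")   # Star (a shape) as an alignment
valid_rejected = rejected("insert into attribute_member values (4, 8, 4)")  # Chaotic alignment: accepted
print(second_color, type_mismatch, valid_rejected)  # True True False
```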

Then we would modify the widget_subset_pred table to have an
attribute_id column instead of attribute_type_id
and attribute_value, and redefine the
widget_subset_member view:

```sql
create view widget_subset_member as
select wss.id as widget_subset_id
     , w.id as widget_id
from widget_subset wss
   , widget w
where (select count(*)
       from widget_subset_pred wssp
          , attribute_member am
       where wss.id=wssp.widget_subset_id
         and wssp.negated=1
         and wssp.attribute_id=am.attribute_id
         and am.widget_id=w.id)=0
  and (select count(*)
       from widget_subset_pred wssp
          , attribute_member am
       where wss.id=wssp.widget_subset_id
         and wssp.negated=0
         and wssp.attribute_id=am.attribute_id
         and am.widget_id=w.id)=
      (select count(*)
       from widget_subset_pred wssp
       where wss.id=wssp.widget_subset_id
         and wssp.negated=0);
```
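Below is a minimal SQLite sketch (through Python's sqlite3 module, an assumption) of the attribute_id version. widget_subset_pred is rewritten to use the attribute ids from the attribute table:

- not Tiny is attribute 2 (negated)
- Chaotic is attribute 8
- Star is attribute 4
- Big and Red are attributes 1 and 7

Declaring (attribute_id, widget_id) as attribute_member's primary key gives the correlated subqueries an index to probe. Whether that pays off depends on the engine and the data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
create table widget (id integer primary key);
insert into widget values (1), (2), (3);

-- Attribute ids follow the article's attribute table.
create table attribute_member (
    attribute_type_id integer not null,
    attribute_id      integer not null,
    widget_id         integer not null,
    primary key (attribute_id, widget_id)  -- also serves as an index for the lookups
);
insert into attribute_member values
    (1, 1, 1), (2, 3, 1), (3, 6, 1), (4, 8, 1),
    (1, 2, 2), (2, 4, 2), (3, 6, 2), (4, 9, 2),
    (1, 2, 3), (2, 5, 3), (3, 7, 3), (4, 10, 3);

create table widget_subset (id integer primary key);
insert into widget_subset values (1), (2), (3), (4);

-- Predicates now name an attribute row instead of a (type, value) pair.
create table widget_subset_pred (
    widget_subset_id integer not null,
    attribute_id     integer not null,
    negated          integer not null check (negated in (0, 1))
);
insert into widget_subset_pred values
    (1, 2, 1),             -- not Tiny
    (2, 8, 0),             -- Chaotic
    (3, 4, 0),             -- Star
    (4, 1, 0), (4, 7, 0);  -- Big and Red

create view widget_subset_member as
select wss.id as widget_subset_id, w.id as widget_id
from widget_subset wss, widget w
where (select count(*)
       from widget_subset_pred wssp, attribute_member am
       where wss.id = wssp.widget_subset_id and wssp.negated = 1
         and wssp.attribute_id = am.attribute_id
         and am.widget_id = w.id) = 0
  and (select count(*)
       from widget_subset_pred wssp, attribute_member am
       where wss.id = wssp.widget_subset_id and wssp.negated = 0
         and wssp.attribute_id = am.attribute_id
         and am.widget_id = w.id)
    = (select count(*)
       from widget_subset_pred wssp
       where wss.id = wssp.widget_subset_id and wssp.negated = 0);
""")

subset_members = conn.execute(
    "select widget_subset_id, widget_id from widget_subset_member order by 1, 2"
).fetchall()
print(subset_members)  # [(1, 1), (2, 1), (3, 2)]
```

The memberships match those of the attribute_type_id/attribute_value version.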

The disadvantage of these extra tables is that we now have to maintain the
data in the attribute and attribute_member tables, which is more complicated
than maintaining widget alone. On the other hand, the descriptive columns
could be dropped from the widget table, so the data would still need to be
maintained in only one place. The structure of the original widget table
could then be reproduced as a view.
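One way to reproduce the original widget columns is conditional aggregation over attribute_member. The sketch below assumes SQLite through Python's sqlite3 module. The view name widget_described is hypothetical, and in this sketch widget itself keeps only id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- widget keeps only its id; descriptive data lives in attribute_member.
create table widget (id integer primary key);
insert into widget values (1), (2), (3);
create table attribute_type (id integer primary key, name text not null unique);
insert into attribute_type values (1, 'size'), (2, 'shape'), (3, 'color'), (4, 'alignment');
create table attribute (id integer primary key, attribute_type_id integer not null, value text not null);
insert into attribute values
    (1, 1, 'Big'), (2, 1, 'Tiny'), (3, 2, 'Round'), (4, 2, 'Star'), (5, 2, 'Linear'),
    (6, 3, 'Purple'), (7, 3, 'Red'), (8, 4, 'Chaotic'), (9, 4, 'Lawful'), (10, 4, 'Neutral');
create table attribute_member (attribute_type_id integer, attribute_id integer, widget_id integer);
insert into attribute_member values
    (1, 1, 1), (2, 3, 1), (3, 6, 1), (4, 8, 1),
    (1, 2, 2), (2, 4, 2), (3, 6, 2), (4, 9, 2),
    (1, 2, 3), (2, 5, 3), (3, 7, 3), (4, 10, 3);

-- Pivot back to one column per attribute type. Left joins keep widgets
-- that lack some attributes; those columns come out null.
create view widget_described as
select w.id
     , max(case when at.name = 'size'      then a.value end) as size
     , max(case when at.name = 'shape'     then a.value end) as shape
     , max(case when at.name = 'color'     then a.value end) as color
     , max(case when at.name = 'alignment' then a.value end) as alignment
from widget w
left join attribute_member am on am.widget_id = w.id
left join attribute a on a.id = am.attribute_id
left join attribute_type at on at.id = a.attribute_type_id
group by w.id;
""")

rows = conn.execute("select * from widget_described order by id").fetchall()
for row in rows:
    print(row)
# (1, 'Big', 'Round', 'Purple', 'Chaotic')
# (2, 'Tiny', 'Star', 'Purple', 'Lawful')
# (3, 'Tiny', 'Linear', 'Red', 'Neutral')
```

Here max() only collapses each group to a single value. It relies on the (attribute_type_id, widget_id) uniqueness constraint to guarantee at most one candidate per cell; without that constraint, it would silently pick one of several values.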

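Note that this table structure can't ensure that every widget has one
attribute of each type. Some widgets might be missing a size, an alignment,
etc. This would be somewhat like having a null value in the corresponding
column of the original widget table, but the semantics in the set membership
views would be different. In the hand-written widget_set_member view, a null
size makes w.size!='Tiny' unknown, so the widget is left out of set 1. In
widget_subset_member, a missing size row matches no negated predicate, so the
widget counts as having a size other than Tiny.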
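The difference can be seen with a hypothetical widget 4 that has no size. The sketch below assumes SQLite through Python's sqlite3 module and evaluates "size other than Tiny" two ways:

- with the hand-written column comparison
- with the negated-predicate count from widget_subset_member in its attribute_id form

Subset 1 has no non-negated predicates, so the other half of that view's condition reduces to 0 = 0 and is omitted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Column design: hypothetical widget 4 has a null size.
create table widget (id integer primary key, size text);
insert into widget values (1, 'Big'), (2, 'Tiny'), (3, 'Tiny'), (4, null);

-- Row design: widget 4 simply has no size row (attribute 1 = Big, 2 = Tiny).
create table attribute_member (attribute_type_id integer, attribute_id integer, widget_id integer);
insert into attribute_member values (1, 1, 1), (1, 2, 2), (1, 2, 3);

-- Subset 1, "size other than Tiny", as a single negated predicate.
create table widget_subset_pred (widget_subset_id integer, attribute_id integer, negated integer);
insert into widget_subset_pred values (1, 2, 1);
""")

# Hand-written comparison: null != 'Tiny' is unknown, so widget 4 is excluded.
by_column = [row[0] for row in conn.execute(
    "select id from widget where size != 'Tiny' order by id")]

# Negated-predicate count: no size row means no match, so widget 4
# passes as "not Tiny".
by_count = [row[0] for row in conn.execute("""
    select w.id
    from widget w
    where (select count(*)
           from widget_subset_pred wssp
           join attribute_member am on am.attribute_id = wssp.attribute_id
           where wssp.widget_subset_id = 1 and wssp.negated = 1
             and am.widget_id = w.id) = 0
    order by w.id""")]
print(by_column, by_count)  # [1] [1, 4]
```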