Each table shown is in Sixth Normal Form (6NF) and contain no indexes other than the primary key. The schema represents a typical user/group mapping where there is a many-to-many relationship between users and groups (i.e., a user can belong to multiple groups, and a group contains multiple users).

I've randomly generated and loaded data into the tables. There is roughly twice as many entries in the mapping table as the domain tables:

Common queries will be noticeably slower with this amount of data. For example, getting a list of groups for a specific user can take over one second. Using the EXPLAIN command we can find out why, e.g.,

The culprit is the full table scan on foobar_users_groups, which is perhaps the worst thing we can do since it's the largest table with almost a quarter million rows. In the EXPLAIN output, if you look in the column identified as rows, you'll see that 240999 rows were scanned.

Selecting which users belong to a specific group performs the same full table scan of foobar_users_groups. If we were to load millions of users and groups the performance would be unacceptable by any reasonable expectation.

Indexes

I want to add indexes to these tables to prevent full table scans. In fact, one of the main problems is that we're starting with a username and groupname and neither of those are indexed, only the user_id and group_id have indexes (from the primary key). Since these tables are fully normalized, there is no reason that username and groupname were not given the UNIQUE identifier in the create table statements, e.g.,

The query is no longer scanning entire tables, in fact, the largest estimated row count is from the foobar_users_groups mapping table-- EXPLAIN estimates it needs to scan 12 rows, which corresponds correctly with the number of groups that will be returned.

However, querying users in a specific group remains slow, consider the following:

The query is no longer doing a full table scan of foobar_users_groups, but it is doing a full table scan of the foobar_users table. The problem is, the primary key on foobar_users_groups is a composite key consisting of (user_id, group_id). The query (as written) is scanning all of the user_id's that might be part of that composite (user_id, group_id).

Adding a separate group_id index on the foobar_users_groups table will solve this, i.e.,