From: Rick James
Date: September 20 2012 4:29pm
Subject: RE: Making Myself Crazy
List-Archive: http://lists.mysql.com/mysql/228208
Message-Id: <2E7DD7ADE53B044C8C8BCD9C5829E1EB148BB5E0C1@SP2-EX07VS01.ds.corp.yahoo.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
below..
> -----Original Message-----
> From: Jan Steinman [mailto:Jan@stripped]
> Sent: Wednesday, September 19, 2012 8:59 PM
> To: mysql@stripped
> Subject: Re: Making Myself Crazy
>=20
> Thanks for your help, Rick!
>=20
> Interspersed are some questions and rationales for you to shoot down...
> :-)
[Rick James] :)
>=20
> > From: Rick James
> >
> > s_product_sales_log has no PRIMARY KEY. All InnoDB tables 'should'
> have an explicit PK.
>=20
> This table really has no identifying information. There could be two
> identical, valid rows, if the same person sold the same amount of the
> same product to the same other person on the same day.
[Rick James] The PK provides a _unique_ identification for each row. Each =
other key depends on the PK to find the row.
A PK is a UNIQUE key is an INDEX.
If you don't provide a PK, InnoDB will provide an inaccessible, 6-byte, num=
ber for the purpose.
id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY
is a common technique. It is 4 bytes. (Or less if you use MEDIUMINT, etc)
(Hence, I said "should", not must.)
> All the foreign keys were indexed. Is there something I don't
> understand about something a PK field does? If an individual record
> cannot be uniquely identified by its information, is there really any
> need for a primary key?
>=20
> None the less, I added field "ID" as an unsigned autoincrement INT and
> made it PK.
>=20
> > INT(5) is not what you think. INT is always a 32-bit, 4-byte
> quantity, regardless of the number.
>=20
> >
> > Use TINYINT UNSIGNED, SMALLINT UNSIGNED, etc. wherever reasonable.
>=20
> Understood. I make all my keys UINT even when they could be smaller,
> because I've gotten into some gnarly consistency problems. with foreign
> keys.
[Rick James] Yes, types must be consistent for FKs. (I don't know the deta=
ils of what aspects of the 'type' can be relaxed.) Usually that happens 'a=
utomatically' when you build the schema and decide that foo_id is TINYINT U=
NSIGNED, and apply that to all tables with foo_id.
Changing the type after the fact can be trickier, especially because of PKs=
.
>=20
> >> KEY `is_value_added` (`is_value_added`),
> > A single-column INDEX on a flag, ENUM, and other low-cardinality
> field, is almost always useless.
>=20
> Why is that? Surely, even a flag separates the record space into two?
[Rick James] Yes, it separates. But let's look at the effort. There are t=
wo 'files', one with all the data, one with all of this index.
Plan A: Use the index. Walk through half of the index 'file' (actually a =
BTree). For each 'row' encountered, reach over into the data -- this may b=
e a random lookup. Back and forth, back and forth.
Plan B: Ignore the index; simply scan all the data 'file'. As you scan, c=
heck the row for the flag.
Plan A is likely to involve more I/O, hence be slower. So, the optimizer c=
hooses Plan B. There is no exact cutoff between the two approaches; it is =
somewhere around 20%.
Further note, that to reach from the Index into the Data requires a BTree l=
ookup to find the data row. This uses the PRIMARY KEY, which is implicitly=
included in the secondary key's BTree. (Hence one reason that InnoDB must=
have a PK.)
>=20
> > Performance issues...
> >> WHERE YEAR(sales.`Date`) =3D '{{{1}}}'
> > won't use
> >> KEY `Date` (`Date`),
> > because the column (Date) is hidden in a function. A workaround:
> > WHERE `Date` >=3D '{{{1}}}-01-01'
> > AND `Date` < '{{{1}}}-01-01' + INTERVAL 1 YEAR
>=20
> Thanks! Good catch.
>=20
> > JOINing two subqueries -- There is no way to index either of them, so
> the JOIN will have to do a table scan of one temp table for every row
> of the other temp table.
> > (The alternative is to CREATE TEMPORARY TABLE... with an index, for
> one of the subqueries.)
>=20
> But I made sure the subqueries were the smallest possible sets --
> essentially, the domain of s_profit_centre, which only has 12 records.
[Rick James] Good. With 12, the performance won't be too bad.
>=20
> I had the entire thing coded up into one massive JOIN of everything,
> and it took 30 minutes to run! By LEFT JOINing down to a dozen or fewer
> records, it seems to run in reasonable time, even though it's two
> subqueries that are not indexed.
>=20
> > It would probably be better to move the mt.tot!=3D0 test inside:
> >
> > GROUP BY `Profit Centre`
> > ) mt ON mt.pcid =3D tt.pcid
> > WHERE mt.tot !=3D 0
> > ) xx
> > -->
> > GROUP BY `Profit Centre`
> > HAVING tot !=3D 0 -- added
> > ) mt ON mt.pcid =3D tt.pcid
> > -- removed: WHERE mt.tot !=3D 0
> > ) xx
> >
> > That would make mt have fewer rows, hence that unindexed JOIN could
> run faster.
>=20
> My first attempt to do that produced an error. And again, both the
> subqueries will have 12 or fewer records, so I'm wondering if this
> really helps anything.
[Rick James] From where I sit, it is hard to say, I can't tell if the HAVIN=
G would decrease the "12".
>=20
> Thanks for your help!
[Rick James] You are welcome. (I can also be found on forums.mysql.com.)
>=20
> [clip]
>=20
> ----------------
> :::: People see what they have been conditioned to see; they refuse to
> see what they don't expect to see. -- Merle P. Martin
> :::: Jan Steinman, EcoReality Co-op ::::
>=20
>=20
>=20
>=20
>=20
> --
> MySQL General Mailing List
> For list archives: http://lists.mysql.com/mysql
> To unsubscribe: http://lists.mysql.com/mysql