(Notes of mine from a project using SQLite, 2009-08-10)
I leaned towards conclusion that the
DISGUSTINGTIME-LENGTHENING [EXPURGATED] SCALING seems in line with
join implementation by ‘nested loops’ discussed by
http://www.sqlite.org/cvstrac/wiki?p=QueryPlans .
Logically it seemed I could do MUCH better
even by doing subqueries manually in separate queries
and feeding results thereof into later query/ies
with a long in (?, ?, ?, ?, ….) clause.

Native types of db

changes as sqlite evolved —> change optimization strategies.

once upon a time, every thing was a string

now more native types

some native type comparisons in db would seem faster than others

might want to structure some queries and/or tables accordingly

Some weird query behavior I’ve seen:

I selected MIN, MAX from a table indexed on k;
was slower than selecting each in a separate query.
Actual code of my test (comments added afterwards, duh):

Database Locking

sqlite normal default behavior is to lock entire db file on write:

no reads during write

Reducing db locking collisions

To make overall program performance more orderly and predictable,
in a multithreaded / multiprocess piece of software
you might want to do some things, e.g. db writing,
through a 1-at-a-time
coordinator, at least where that’s convenient and totally reasonable.
[I pointed out someone in the audience who I knew had done this. :-) ]

some possible methods

FIFO job queue

does things in what’s probably the expected order

priority queue

job stack (FILO) :-)

maybe could be helpful for averting some perverse
virtual memory swapping situations

AUDIENCEDISCUSSIONMAYBE?

Sample code?

Other wrapping code to help decide how to deal with collisions?

Conclusions

optimizing your pysqlite dbs and queries can make performance differences of
order(s) of magnitude

even in unobvious or surprising ways

I told you about real problems and shortcomings

instead of trying to blow happy smoke

and how to work around them!

Some general considerations

the usual SQL stuff that everyone knows or learns

e.g. create indices

caching is enemy of performance testing!

workarounds

order of criteria in a query

narrow down quickly

weird optimizations by sqlite e.g. min, max

db’s native types

transactions

bundling —> speedup

bug in Python 2.5 pysqlite —> workaround

database locking collisions can be reduced to reduce performance weirdness

Use

This presentation and its outline/notes on the web
http://chrisniswander.com will link to this and other presentation notes.