10 Answers
10

It is similar to the C# solution. Your list of value stays in memory (no temporary table) and it will not be persisted to disk and you use a parameterized query so the query executer doesn't have to reparse every query. I have no tried it with java but I think that it will be fast.

Agree with 1. Do you have any info or links with performance stats for 2 - seems a bit non-intuitive? Any guidance as to the tilting point as to IN becomes inefficient vs insert to temp and then JOIN to temp? Thnx
–
StuartLCSep 4 '10 at 11:31

That is what I answer during the interview. The interviewer then comment this answer must have something to do afterward. I guess he is talking about concurrent access to table and housekeeping.
–
woakSep 5 '10 at 8:28

Collection type MDSYS.SDO_ELEM_INFO_ARRAY is used because if we use this already predefined Oracle type we don't have to define our own Oracle type. You can fill collection MDSYS.SDO_ELEM_INFO_ARRAY with max 1048576 numbers.

If you already have an index on the value field and the values are not available in any tables to join or sub-select from, then I don't think there are any possibilities to optimize. In the special case where your values are really "val1", "val", ... you could use a like query which would uses an index to search on the prefix. But I asume that was just an example.

If query is executed frequently, values val1, val2 etc are fixed and table is large (having say 20,000 or more rows) then store all the values in a another table (say temp table) and join both tables on value field.

If table in the query below is large, there should be index on value field to improve the performance.

SELECT field
FROM table
WHERE value IN ('val1', 'val2', 'val3', ... 'valn')

Both the table should be analyzed.

The reason for better performance is that the optimizer will choose best join method depending on the characteristic of the table. If table in above query is very large, the join will be nested loop join and above table should have an index on the column val.

If table in above query is very small (say fewer then 200-300 rows) the new table (temp table) should have index on val column.

I got acceptable performance (execution time close to fetching rows unconditionally) when doing a similar query with the following approach.

static final int MAX_QUERY_SET = 1000;

I iterate the values and make separate query every MAX_QUERY_SET value. So for 10K values I have 10 queries. I process the queries sequentially.

After implementing this algorithm I was able to play with the constant. For either value of 30 or 3000 I got 3x longer execution time. So I sticked to 1000.

This may not work if you can't process multiple queries. My experience was gathered on a different database (Pervasive, with a limit of 65K characters per statement), but I think this question is quite general and its conclusions should be common.