I have a table which is a link table from objects in my SQL Server 2012 database, (annonsid, annonsid2). This table is used to create chains of triangle or even rectangles to see who can swap with who.

This is the query I use on the table Matching_IDs which has 1,5 million rows in it, producing 14 million possible chains using this query:

I must improve performance to take maybe 1 second or less, Is there a faster way to do this? The query takes about 1 minute on my computer. I normally use a WHERE m.annonsid=x, but it takes just the same amount of time, cause it has to go through all possible combinations anyway.

What indexes do you have defined on the table? An index (annonsid2,annonsid) may be a good option.
–
Oded♦Dec 29 '12 at 22:38

I have a clustered index of both columns, and one non-clustered for every column (2)
–
infinity1975Dec 29 '12 at 22:39

Where does the query plan say time is taken?
–
Oded♦Dec 29 '12 at 22:40

Well, it takes about 2 seconds from when i Press run until it starts to show results, but it doesn't mean the query is that fast right?, cause when I just do a Count(m.annonsid) instead of getting all the records, its taking around 1 minute.
–
infinity1975Dec 29 '12 at 22:41

Could you append query plan to the question?
–
VladimirDec 29 '12 at 22:41

#1 and #2 is similair (22 seconds) same as my normal query, i managed to use a distinct table which went down from 1 minute to 22 seconds. The 3rd query is very fast, 2 seconds. I have 8gb ram.
–
infinity1975Dec 30 '12 at 11:35

Actually what I really want is to remove the 1 to 1 id table, but I have no clue on how I can know which ad matches which in an easy way. I thought about having alot of bigint's and using bitwise and between the ads. This is narrowing the search down to 15.000 ads (rows) instead of 1,5 million. BUT, the end result will anyway be 15 million so it might not be worth it.
–
infinity1975Dec 30 '12 at 11:44

What do you mean by 1 to 1 id table? Regarding performance: I just did a test on my laptop. Counting 6 mio rows takes a while (20 secs) and so I doubt you can get much faster.
–
alzaimarDec 30 '12 at 23:35

It seems like you already indexed this quite well. You can try converting the hash to a merge join by adding the right multi-column index, but it won't give you the desired speedup of 60x.

I think this index would be on annonsid, annonsid2 although I might have made a mistake here.

It would be nice to materialize all of this but indexed views do not support self-joins. You can try to materialize this query (unaggregated) into a new table. Whenever you execute DML against the base table, also update the second table (using either application logic or triggers). That would allow you to query blazingly fast.

You should make this query a bit more separated. I think first you should create a table, where you can store the primary key + annonsid, annonsid2 -if annosid is not the primary key itself.

DECLARE @AnnonsIds TABLE
(
primaryKey int,
-- if you need later more info from the original rows like captions
-- AND it is not (just) the annonsid
annonsid int,
annonsid2 int
)

If you declare a table, and you have index on this column, it is quite fast to get the specified rows by the WHERE annonsid = @annonsid OR annonsid2 = @annosid

After the first step you have a much smaller (I guess), and "thin" table to work with. Then you just have to use the joins here or make a temp table and a CTE on it.

I think it should be faster, depending on the selectivity of the condition in the WHERE, of yourse if 1.1 million rows fits in it, then it does not make sense, but if just a couple hundred or tousend, then you should give it a try!

You could denormalize the data by adding a table RelatedIds with AnnonsId, RelatedAnnonId and Distance. For every value of AnnonsId the table would contains rows for each RelatedAnnonId and the number of relations that need to be traversed to reach it, aka Distance. Triggers on the existing MatchingIds table would maintain the new table with some configured maximum value for Distance, e.g. 3 to handle rectangular shares. Index the table on (AnnonsId, Distance).

Edit: An index on (Distance, AnnonsId) will allow you to quickly find rows that have enough related entries to form a particular shape. Adding a column for MaxDistance may be useful if you want to be able to exclude rows based on, for example, having a triangular but not rectangular relationship.

The new query would inner join RelatedIds as RI on RI.AnnonsId = m.AnnonsId and RI.Distance <= @MaxDistance with the desired "shape" dictating the value of @MaxDistance.

It should provide much better performance on the select. Downsides are another table with a large number of rows and overhead of the triggers when altering the MatchingIds table.

this was way to complex for me to understand :D, What I already have is a table relatedids, this is the table "Matching_IDs" who have (annonsid, annonsid2), What I dont understand is the "Distance" and how to get it?, and what the triggers will do.
–
infinity1975Dec 31 '12 at 9:38

The problem persist I think?, the problem is the performance, not to create the outcome. This trigger is just the same as using my query by providing an id making it run on only that id, and then save the result in a second table. Am I wrong?
–
infinity1975Jan 2 '13 at 10:41

@infinity1975 - The difference is that the trigger divides the heavy lifting into a relatively small amount of work every time the base table is changed. A downside is that the work is always performed, even if you never query for "shapes". The upside is that the "shapes" query should be much faster.
–
HABOJan 2 '13 at 14:31

Yes, I have a solution for this, except the distance parameter. I have one table for directswaps and one for triangleswaps.
–
infinity1975Jan 2 '13 at 17:21