HELP WITH OPTIMIZING SP

I'm quite new to optimizing stored procedures. I'm currently developing an application that will be added to a website with 6000+ daily users, and the current SP will be run once for each unique user.
This basically means it will be run a lot in the beginning when we release the application, and as time goes by the procedure will only be run for each new user: 1-300 each day.

When I monitor it with Profiler it takes around 1000-10000 ms in duration, uses around 300-1000 CPU, and has approx. 10000 reads and 16 writes.

I guess the temp tables are partly to blame for the bad performance, but I just can't find a way around them.

I know I'm using old-style SQL and not joins, and I know I should change that, but I don't think that is the problem with this procedure.

Code follows below; here are some comments on the code:
Each user has one or many sessions (in this case they ALWAYS have one).
Each user has responded to a set of questions (responsevalue).
The responsevalue is stored with a reference to the session id and the item id (question id).
Each question is connected to many faktors, with a given calc_value for each connection.
Select nr 1 calculates scores for the faktors by inserting values into a temp table.
It looks kind of like this:

user id : faktor id : question id : responsevalue : calcvalue : score
1 : 1 : 1 : 1 : 2 : 2
1 : 1 : 2 : 3 : 1 : 3
1 : 2 : 1 : 1 : 4 : 4

etc..
The score on each line is calculated by multiplying the responsevalue by the current calcvalue.

Next we need to calculate user scores on a faktor level (tscores).

This basically means you run this algorithm, where sum(score) is the sum of all the scores for the user on this faktor:
((sum(score) - average) / std_deviance) * 10 + 50
The variables average and std_deviance are found in a table called konstanter. Each faktor will have many rows in the konstanter table, but only one for each language (spraak).
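As a sketch, the tscore step described above could be written as a single set-based insert. All table and column names here are taken from the thread's description (with @user_sid and @spraak as parameters and #scores as the temp table from select nr 1), so treat them as assumptions rather than the actual procedure code:

```sql
-- Sketch only: sum the per-question scores per faktor and normalize them
-- with the constants for the caller's language.
insert into tscores_faktor (mp_user_sid, faktor_sid, tscore)
select  @user_sid,
        t.faktor_sid,
        ((sum(t.score) - k.average) / k.std_deviance) * 10 + 50
from    #scores t
        inner join konstanter k
            on k.faktor_sid = t.faktor_sid
where   k.language_sid = @spraak
group by t.faktor_sid, k.average, k.std_deviance
```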

What I have done so far is to run the first select into a temp table, then insert and calculate tscores from the temp table directly into the table where I want the scores.

The table item_response is very large; it contains 300+ rows for each user.
All other tables except user and session are relatively small (<1000 rows).

Do you have indexes on those ids? Can you paste the execution plan for these queries? What's the total size of each table?
Your tempdb can contribute to the performance problem since you're using temporary tables. However, that can only be one of many factors. Let's start with the execution plan and indexes.

Without looking further: you must drop the select ... into #tempTable code and use create table #tempTable with a normal insert. The reason is that each time you use select into, tempdb is locked until the data are populated.

In any case you ALWAYS want to explicitly create temp tables; don't rely on select ... into. It is simply lazy and makes the table creation less explicit. You also always want to create all temporary tables at the start of the procedure. This will reduce procedure recompiles.
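A minimal sketch of the pattern being recommended here, with column names and the join path (session -> item_response -> qstnr_item_faktor) assumed from how the tables were described earlier in the thread:

```sql
-- Create the temp table explicitly at the start of the procedure...
create table #scores (
    faktor_sid    int not null,
    item_sid      int not null,
    responsevalue int not null,
    calc_value    int not null,
    score         int not null
)

-- ...then populate it with a normal insert instead of select ... into.
insert into #scores (faktor_sid, item_sid, responsevalue, calc_value, score)
select  qif.faktor_sid,
        r.item_sid,
        r.responsevalue,
        qif.calc_value,
        r.responsevalue * qif.calc_value
from    session s
        inner join item_response r on r.session_sid = s.session_sid
        inner join qstnr_item_faktor qif on qif.item_sid = r.item_sid
where   s.mp_user_sid = @user_sid
```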

Temp table creation is a concurrency and recompilation issue and should be resolved. However, from a single-execution performance point of view, you will gain the most by optimizing the index design, as cmdr and Twan already suggested.

First off, I have now modified the procedure to look like Twan's example.
The suggested indexes were already in place, except the one on the qstnr table, which is very small anyway (max 10 rows).

Statistics from Profiler show the following results:
Difference in reads/writes/CPU: no major difference. Results still hover around 5000-9000 reads, 50-250 writes, 50-200 CPU.
Duration: the new duration is around 100-200 ms when the procedure is run by itself; when it is run multiple times in quick succession the duration goes up to 1000-6000 ms.
Difference from the old design: it seems to be faster when run alone (old: 400-600 ms), but when run multiple times there is no big difference.

So there does seem to be something to gain by using your design, and I am very grateful for that.

I have not yet tried ranjitjain's design, but I will give it a test drive. I will come back and post results for that.

Other questions:
Q: Can you paste the execution plan for these queries?
A: Yes, but how do I convert the execution plan to a readable format for you guys? I guess I could do a screenshot, but then the vital information you get when hovering over the steps is missing. Any good suggestions?

Q: By the way, do you need the results to be accurate? The NOLOCK might be responsible for incorrect resultsets.
A: Data for qstnr_item_faktor, qstnr and konstanter will not change more than once a month, and when that happens the application goes offline. session and item_response only allow inserts, and all the inserts for the given user have been completed by the time this procedure runs. So even though the data has to be accurate, I have chosen to go with the nolock option to allow inserts from other users to happen while this calculation is running.

One thing that worries me is that other simple, fast-running queries seem to block and stack up when this procedure is run at the same time. A query with an average duration of 0-10 ms suddenly takes 1000 ms when run right after this procedure,
even with nolock on all the tables involved.

Thanks again for all your help! I will come back with an update after trying ranjitjain's design.

For a textual execution plan, change "Results in Grid" to "Results in Text" on the Query menu of Query Analyzer.

Would it make a difference if you created a temp table where you store the base information for the calculations, then inserted from the temp table into the target table? It might also make a difference to do the calculations on the temp table before copying the data into the target table, instead of doing the calculation during the copy.

I have now tried ranjitjain's design and here are the results.
Maybe I have forgotten to mention this, but when I run the test I launch 100 users at random times within 2 minutes. These users do a full application run: they post all the responses and have a calculated score at the end.
This means the results may not be correct with a sample as small as this. If I'm lucky, those users may calculate their scores at completely different times, giving me the impression that the procedure is running faster.

However, the results so far show that ranjitjain's design has an avg duration of 270-600 ms while Twan's is at 150-400 ms.
Reads, writes, CPU: no big difference. And when run multiple times in quick succession the procedure takes long in both cases: 1-3000 ms.

Thanks to both of you guys, but I guess I will keep working from Twan's example!

Hi,
Thanks for posting your test results.
So the table variables are faster than derived tables.
Let's consider Twan's example.
Can you check the code below, which I have commented? I think that column is irrelevant here.

If you remove the item_sid column from the table variable, you will have a primary key of only faktor_sid, and then there will be a problem when inserting rows with the same faktor_sid (there will be 40 rows in the table variable for each faktor_sid).
A solution may be not to put a primary key on the table variable at all. Is that advisable?
If so, do you think I will gain any performance by doing it?
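For reference, the two variants being discussed might look like this sketch (the column list is illustrative, not the actual procedure's):

```sql
-- Variant 1: composite primary key, so several rows per faktor_sid are
-- allowed as long as the (faktor_sid, item_sid) pair is unique.
declare @temp table (
    faktor_sid int not null,
    item_sid   int not null,
    score      int not null,
    primary key (faktor_sid, item_sid)
)

-- Variant 2: no primary key at all. Inserts avoid the uniqueness check,
-- but any later aggregation by faktor_sid gets no help from an index.
declare @temp2 table (
    faktor_sid int not null,
    score      int not null
)
```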

After running a new test I'm getting results with an average duration of 195 ms for the procedure.
That's a remarkable difference from the first draft!
But a couple of the executions still have durations of up to 1600 ms, dragging some other usually fast-running inserts on the item_response table down with them.

Otherwise SQL has a tendency to do the joins first and then the where conditions, which can be very bad for performance.

Also, the nolock option should in my opinion only be used where it is absolutely necessary. I must admit that in over 10 years of working with SQL Server I can probably count on one hand the number of times I've had to use it in production code.

I haven't tried your latest suggestion, Twan, but I will try it once the weekend is over and I'm back at work. I will come back with test results then. The same goes for the indexes; I think I've got them all, but I will check again.

One thing I may have forgotten to mention is that this procedure is more than just this code, and the test results I'm posting are from when the whole procedure is run. In short, the procedure has 2 main operations: calculate faktor scores (the code you guys have been looking at) and calculate fasett scores. The fasett calculation is pretty much exactly the same as the faktor calculation, only with connections to fasett tables instead of faktor tables.

All the changes have been made to both parts of the procedure, so the test results should be accurate in terms of the percentage gained compared to the old version, but in reality the reads, writes, duration, etc. are about half of what I have posted for just this bit of code.

OK, here are the results from trying some new designs.
First off, I've upped the test scenario to 300 users doing a full application run. The users are started randomly within 2 minutes and each user completes all tasks within 20 secs. (At launch we expect at most 6000 users doing the same within 2 hours, with each user taking 10-30 mins to complete the run.)

I tried mmarovic's example with some modifications from Twan, and implemented them in the code for the whole procedure.
What happened was this:
Reads increased from 7000 to 10000.
Writes decreased slightly (average of 20 -> average of 14).
CPU increased from an average of 120 to 180.
Duration increased from an avg of 180 ms to 400 ms.

So from what I can see, the first suggestion with table variables is still the fastest.

Next up, the indexes. Here is what Twan suggested:
konstanter(language_sid, faktor_sid), ideally clustered (the columns can be the other way around too)
My current indexes: konstanter(faktor_sid, language_sid) nonclustered
and konstanter(fasett_sid, language_sid) nonclustered.
(There can be only one clustered index, and konstanter contains data for both faktor and fasett, one of which will always be NULL.)

item_response(session_sid, item_sid), ideally nonclustered
My current index: item_response(session_sid, item_sid) nonclustered.
(session_sid and item_sid are the primary key of this table, which also makes it a clustered index, right?)

tscores_faktor(mp_user_sid), ideally clustered
My current index: tscores_faktor(faktor_sid, tscore) nonclustered.
(mp_user_sid and faktor_sid are the primary key, which makes them a clustered index, right?)
I need the faktor_sid, tscore index for another procedure, so I can't remove that. But should I create the table using a nonclustered primary key and create a clustered mp_user_sid index? Does the index really matter when inserting rows?
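Spelled out as DDL, the suggestions above could look like the following sketch. The index names are made up, and each clustered index assumes the table's primary key is not already clustered (otherwise the create would fail):

```sql
-- konstanter: clustered on (language_sid, faktor_sid) as suggested
create clustered index IX_konstanter_lang_faktor
    on konstanter (language_sid, faktor_sid)

-- item_response: nonclustered on (session_sid, item_sid)
create nonclustered index IX_item_response_session_item
    on item_response (session_sid, item_sid)

-- tscores_faktor: keep the existing nonclustered (faktor_sid, tscore)
-- index for the other procedure, and add a clustered index on mp_user_sid
create clustered index IX_tscores_faktor_user
    on tscores_faktor (mp_user_sid)
```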

OK, so for those with a special interest, here is the whole procedure including the fasett bit (just to get the whole picture).

--For all the scores in the table variable where konstants are found, we insert into tscores_fasett using this statement
insert into tscores_fasett (mp_user_sid, fasett_sid, tscore)
select @user_sid,
       t.fasett_sid,
       ((t.meansnudd - k.average) / k.std_deviance) * 10 + 50
from @tempfasett t
inner join konstanter k with (nolock)
        on k.fasett_sid = t.fasett_sid
where k.language_sid = @spraak
group by t.fasett_sid,
         k.average,
         k.std_deviance,
         t.meansnudd

--For all the scores in the table variable where there is no konstant, we have to search the lookup table for the right tscore
insert into tscores_fasett (mp_user_sid, fasett_sid, tscore)
select @user_sid,
       t.fasett_sid,
       l.tscore
from @tempfasett t
inner join lookup l with (nolock)
        on t.fasett_sid = l.fasett_sid
where l.language_sid = @spraak
  and t.meansnudd between l.min_value and l.max_value

end

So, what do you think?
Is there a way I can gain even more performance?
Maybe it's all down to my fairly amateurish way with indexes?
I'm still using the with (nolock) option, even though I can't see any difference in performance with or without it.
One other thing to point out is that all the tests are run on a SQL Server with limited resources (bad hardware); the production server will be much faster.

I would still like to try my original implementation. While I agree with Twan that putting language_sid = @spraak in the join clause might potentially improve the execution plan, I am afraid that putting the condition s.mp_user_sid = @user_sid there could make it worse. My guess is that this condition filters out the most rows from the resultset, so instead of being a join condition it should be used in the where clause. However, I can only guess without seeing the execution plan.

They look fine. If session_sid is the clustered primary key on the session table, then you don't need session_sid as the second column of the index on mp_user_sid. However, I don't expect a dramatic impact from removing that column from the index. Also, that second column may be useful for a different query (if there are where conditions on both columns).

Quote (mmarovic, replying to "I still like the results from the table variable procedure better..."):
"That's fine, I just wanted to see what happens when the execution plan I had in mind was tested. I see a UDF call in another part of the procedure. This is the place where improvements are possible."

Yes, I guess that's what's really dragging it down.
The problem is that each item is connected to the fasett table through qstnr_item_fasett (qif). The qif table contains information on whether or not the score value for this specific item, connected to this specific fasett, has to be turned. If the score has to be turned, the procedure has to look up the midrange of the item (in the item table) and flip the score accordingly.
And since this check-and-flip has to be done for every item, it's bound to process slowly...
And I can't even store the flipped score in item_response, since it changes from fasett to fasett...

If you have some suggestions for how I can denormalize or in any other way work around this, please let me know.

Thanks for all the help so far! I've come a long way since the beginning!

I modified the UDF slightly and hallelujah! Reads decreased to 3000, and CPU accordingly!
Then I tried mirko's code once again, and the reads decreased to 1500!
Avg duration with mirko's code: 160 ms.

Why didn't I think of that before? Thanks for reminding me, mirko! I feel so stupid.

Maybe there is still more to gain? I will try, as adriaan suggested, putting the values in a table variable.
I'll be back to post the results from that.

OK, I've tried table variables and this seems faster, although with more reads than just the modified UDF (which had 1500 reads). This has about 3000 reads but runs at sub-100 ms speeds!
Not consistently, though; the avg is around 150 ms, with some runs sub-100 and some 1000+.

Below is the code that populates the flipscore table variable.
Do you think I will gain something if I change the column qif.dir from varchar to an int describing its direction?
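As a sketch of how the check-and-flip could be done inline rather than in a UDF: this assumes qif.dir (as an int, with 1 meaning "turn the score") and assumes "flipping around the midrange" means 2 * mid_range - responsevalue; both the flag convention and the flip formula are my assumptions, not something stated in the thread.

```sql
-- Sketch: compute the (possibly flipped) score with an inline CASE.
-- qif.dir = 1 meaning "flip", and the flip formula, are assumptions.
insert into @flipscore (fasett_sid, item_sid, score)
select  qif.fasett_sid,
        r.item_sid,
        case when qif.dir = 1
             then 2 * i.mid_range - r.responsevalue
             else r.responsevalue
        end
from    item_response r
        inner join qstnr_item_fasett qif on qif.item_sid = r.item_sid
        inner join item i on i.item_sid = r.item_sid
where   r.session_sid = @session_sid
```

An int (or tinyint) dir column would also let the CASE compare numbers instead of strings, which is slightly cheaper than a varchar comparison.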

Are you able to run your code from QA with statistics IO and time both turned on, and post the resulting messages?

As adriaan hints, the index on qstnr currently has the columns in the reverse order of what may be more optimal for this query. Try reversing the column order in the nonclustered index to language_id, qstnr_id.

Q: Are you able to run your code from QA with statistics IO and time both turned on, and post the resulting messages?
A: Of course, but do you mean the code with or without the UDF? Here it is without:
One strange thing is that the first time I run it, it goes slower than the second time. Look at the statistics.

The difference is mostly due to the delete on the fasett table. In the first call physical reads are required; the second time they aren't. There is definitely a clustered index on this table with user_id as the first column?

For the fasett inserts, try using a union (or better yet a union all) rather than the two separate inserts.

For the item_response index, add responsevalue to the end of the index. This will cover the query, unless there are other columns from the item_response table used by the queries (not all columns have the 'alias.' at the front of them, which makes this harder to work out).

What are the tables fasett and web_user? I've not seen them in the code. Are they part of a view?

Some points in general:

- A primary key can be defined as either clustered or nonclustered. You want to make sure that you don't have a clustered primary key as well as a nonclustered index on the same columns unless you have a specific reason to, e.g. you're using the nonclustered index to cover a query and need the clustered index for range queries.
- Don't put an order by on an insert into ... select statement, unless you have an identity column that needs a predetermined order.
- Don't put additional columns in the group by clause; only put in the columns that make the rows unique. In your example you have things like fasett id followed by average, etc. Only add the extra columns if it is possible for a single fasett id to have multiple averages; otherwise all you're achieving is slowing down the group by without any benefit.
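Applied to the two tscores_fasett inserts posted earlier, the union all suggestion would look roughly like this sketch (the group by is dropped here per the last bullet, which is only safe if the konstanter join cannot produce duplicate rows per fasett):

```sql
insert into tscores_fasett (mp_user_sid, fasett_sid, tscore)
select  @user_sid, t.fasett_sid,
        ((t.meansnudd - k.average) / k.std_deviance) * 10 + 50
from    @tempfasett t
        inner join konstanter k with (nolock) on k.fasett_sid = t.fasett_sid
where   k.language_sid = @spraak
union all
select  @user_sid, t.fasett_sid, l.tscore
from    @tempfasett t
        inner join lookup l with (nolock) on t.fasett_sid = l.fasett_sid
where   l.language_sid = @spraak
  and   t.meansnudd between l.min_value and l.max_value
```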

Based on all your feedback, this is the best code I've put together so far:

In this version there is no UDF, and the statistics/time IO from the previous post is from when this code is run.
It looks like most of the reads/scans occur in the lookup table. Is there an index I'm missing?
So far lookup has this index: lookup(fasett_sid, language_sid).

OK, I just read your post, Twan. Give me a few minutes to sort this out and I'll post back with new results.
One question though, and this may be an amateurish one: how do I make a clustered primary key nonclustered without dropping the whole table?
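For reference, one way to do this in T-SQL is to drop the primary key constraint and re-add it as nonclustered; the constraint and index names below are assumptions for illustration:

```sql
-- Drop the existing (clustered) primary key constraint...
alter table tscores_faktor drop constraint PK_tscores_faktor

-- ...re-add it as a nonclustered primary key...
alter table tscores_faktor
    add constraint PK_tscores_faktor
    primary key nonclustered (mp_user_sid, faktor_sid)

-- ...and then build the clustered index separately.
create clustered index IX_tscores_faktor_user
    on tscores_faktor (mp_user_sid)
```

Note that dropping and re-adding the constraint rebuilds the indexes, so on a large table this is best done off-hours.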

I have to group by average and std_deviance because it is possible to have multiple averages and deviances for each fasett, though only one per language.

Q: You say "most reads/scans" - do you mean index scans?
A: Table 'LOOKUP'. Scan count 85, logical reads 1105, physical reads 0, read-ahead reads 0.
None of the other tables has a scan count and logical read count as high as this.
I will experiment with some index changes on this table; maybe a clustered index is better than a nonclustered one?

I also tried a union all instead of the two separate inserts into tscores_fasett.
I did not change any other code, and when I tested it, it seemed to run slower than the double insert.
Maybe union puts more pressure on the hardware? My inferior SQL test server is not particularly well equipped in that department...

With the reversed, clustered index on lookup(language_sid, fasett_sid), the reads in the lookup table effectively decreased from 1000 to 170.

Twan:
The statistics IO output shows reads in WEB_USER, FAKTOR and FASETT.
The procedure contains no views and I don't really know why those tables have reads...
The web_user table is the parent of session and contains user information.
The faktor and fasett tables are the parents of the tables containing faktor_sid/fasett_sid and contain basic information about the faktors and fasetts. None of that is used in this procedure, so again I don't know why it reads through those tables.

Yes, the item_response table has an index with session_sid, item_sid, responsevalue.

Since I've now included the item table as part of the procedure, I need to put a decent index on it as well. From my point of view it seems logical to create a nonclustered index on item(item_sid, mid_range).
I've also added a nonclustered index on web_user(mp_user_sid).
Does that seem logical, or have I missed something important?

Nonclustered on item_sid, mid_range sounds sensible, provided there isn't already a clustered index on item_sid; or, if there is, only if the execution plan without the nonclustered index shows a clustered index scan rather than a seek.

The same goes for web_user: if there is already a clustered index on mp_user_sid then the nonclustered index may not make any difference; again, check the execution plan.

If you do create a nonclustered composite index that includes the primary key, make sure you make it a unique nonclustered index, since this affects SQL's optimiser.
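For example, since item_sid appears to be the primary key of item, the composite index suggested earlier already identifies rows uniquely and can be declared as such (a sketch; the index name and the uniqueness of item_sid are assumptions):

```sql
-- Declaring the index unique tells the optimiser that each key matches
-- at most one row.
create unique nonclustered index IX_item_sid_midrange
    on item (item_sid, mid_range)
```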