Database Administrators Stack Exchange is a question and answer site for database professionals who wish to improve their database skills and learn from others in the community. It's 100% free, no registration required.

Columns HASHKEY has 19 digit BIGINT values, which can DUPLICATE. But values will be always of 19 digits.

I have application mainly depends on data in this table. Application retrieves data from this table with query like:

select HASHKEY, NOTE_ID, TIME_OFFSET
from TABLE_NAME
where HASHKEY in (<list of around 30000 hashkeys>)

But this query takes around 2 minutes to retrieve data. This is my problem. The application is real time processing application, and need to retrieve data in about 5 seconds. How can I tweak things in server, so time to retrieve data can be decreased?

I have indexed the table by HASHKEY column when I created table, but still retrieving data is taking much time.

Is there any setting which I can do in database to so time can be decreased? I will welcome any type of solution. But I need to solve this. I am not very expert in this.

Also HASHKEY is just random values of 19 digits, no relation with other values.

Could you please edit the question to include the entire scripted table definition, including all indexes? We need to know exactly what you mean about indexing the table by HASHKEY. Also, when the application runs one of those queries, are the HASHKEY values sequential, or random? Posting an execution plan would also help. I get the feeling that even if there is an index on HASHKEY, too many rows are being asked for and the plan is table scanning all the time. Which edition of SQL Server (Standard, Enterprise, etc.) are you using?
–
Jon SeigelSep 13 '13 at 17:08

2

Yeah. I'd need to know about your table definition and indexes, too. Is your clustered index on the non-unique HASHKEY? Is HASHKEY sequential? Also, if your query really does give a list of 30,000 hashkeys with a "hashkey in," well, that's not going to provide you with optimal performance.
–
Katherine VillyardSep 13 '13 at 17:15

what can be problem? Please help me? What are the ways to boost query performance when situations is like: so many records, around 500 million and want to retrieve only 30000 records. Records can have duplicate values, say out of all records, only 60% are distinct.
–
UDPLoverSep 13 '13 at 17:45

I am not DB expert, I am Java developer and created DB myself, now problem arose, I have not did advanced things in DB, just created table, and indexed non-unique column HASHKEY. HASHKEY values are random.
–
UDPLoverSep 13 '13 at 17:49

@Meraman How are you passing the haskeys on the IN? (I mean, are they retrieved from another SELECT?, are you writing them?). Also, if HASHKEY is always a 19 digit number, you might wanna think on using a CHAR(19) instead
–
LamakSep 13 '13 at 18:25

1 Answer
1

Your table is a "heap," which means that it doesn't have a clustered index. The short version of what that means is that your data isn't laid out on the disk in a logical way. A heap really isn't an optimal structure for an actively updated table. Here's an excellent video on heaps.

So, my suggestions--please test them before deploying them in production--are:

This will take some time to complete AND will be a blocking query, so do it during downtime.

This will give you a clustered index and lay out your table on a completely random sequential number, unfortunately, but since you don't have any non-unique values that might be the best course of action.

I would then include the columns note_id and timeoffset in your nonclustered index on HASHKEY.

(Or you could do it through the GUI. If you right-click on the index and choose properties, there's an included columns pane.)

This will hopefully have the effect of replacing your heap scans with nonclustered index seeks, but since your query has a "where in [30,000 items], it might do a index scan instead. Either way, it won't be rooting around in the heap, which I expect will be a good thing.

I will be retrieving data based on HASHKEY values, not based on ID values. Like select * from TABLE_NAME where HASHKEY in (....), not like select * from TABLE_NAME where ID in (....)
–
UDPLoverSep 13 '13 at 20:10

Yes, I understand that. On the other hand, since HASHKEY is not unique, it's a poor choice for a clustered index. Also, select * is... not optimal.
–
Katherine VillyardSep 13 '13 at 21:27