
The T2_PK index needed an extra 518 (4588 - 4070) leaf blocks and an additional 2 branch blocks because of the VARCHAR2 data type.

Certainly the NUMBER data type requires less space than the VARCHAR2 data type. Any search on an index requires a minimum of 3 logical I/Os. That is why I got the same elapsed time from both indexes. If I start searching thousands of rows using the index, I will get different response times.

But if you are going to search for a single row, or fewer than 100 rows, using the index, then either data type will work efficiently. Please remember that choosing a data type for a column depends entirely on the application.
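To illustrate the space observation above, here is a rough Python sketch of Oracle's documented storage sizes. This is a simplification of my own (not from the thread): NUMBER is modeled as one exponent byte plus one byte per two significant decimal digits, and a single-byte-characterset VARCHAR2 as one byte per character; length bytes, negatives, and the special case of zero are ignored.

```python
# Simplified model of Oracle's documented storage formats: a NUMBER is
# stored as one exponent byte plus one byte per two significant decimal
# digits (trailing zeros are not stored); a VARCHAR2 in a single-byte
# character set uses one byte per character. Length bytes, negatives,
# and the special case of zero are ignored here.

def number_bytes(n: int) -> int:
    digits = len(str(abs(n)).rstrip("0")) or 1   # significant digits
    return 1 + (digits + 1) // 2                 # exponent byte + packed digits

def varchar2_bytes(n: int) -> int:
    return len(str(n))                           # one byte per character

for value in (7, 123, 123456, 99999999):
    print(value, number_bytes(value), varchar2_bytes(value))
```

For small values the two are comparable, but the gap grows with the digit count, e.g. an 8-digit number takes about 5 bytes as a NUMBER versus 8 bytes as a VARCHAR2 string.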

Just to return to my original point lest it be forgotten: basing the choice of data type for a PK on any alleged performance benefit of one type over another is not a good way to spend your valuable work hours. You are chasing performance benefits of fractions of a percent for your application, and you risk degrading the flexibility/supportability/expandability of your data model.

i) use a data type that is appropriate for the data being stored.
ii) if your pk is synthetic, and thus probably numeric, then NUMBER is evidently the correct choice.

First of all, I don't want to object to any of tamilselvan's observations, because they are all justified and logical. However, I do find the following deserves some additional explanation:

Originally posted by tamilselvan: Certainly the NUMBER data type requires less space than the VARCHAR2 data type. Any search on an index requires a minimum of 3 logical I/Os. That is why I got the same elapsed time from both indexes. If I start searching thousands of rows using the index, I will get different response times.

The above is true only under the presumption that in the VARCHAR2 column you are actually storing numeric values (converted to strings) - in that case, of course, the space consumed for the same numeric values will be greater in a VARCHAR2 column than if they were stored in a NUMBER column. Of course, no one in his right mind would want to store, let's say, ******* values in a VARCHAR2 column. However, you can also think of it another way:

If we want to assign unique ******* values to those 1.8 million rows, those ******* numbers will on average consume somewhere between 4 and 5 bytes each - so those numbers will require about 8 million bytes of storage (that's just my estimate, don't take it literally).

Now let's say that we assign unique VARCHAR2 values to those 1.8 million records, where the values can be composed of normal alphanumeric characters (A-Z, a-z, 0-9, if we set aside those other more "exotic" characters). Since 62^3 is only 238,328, we need 4 such characters to cover 1.8 million unique values - so each value will consume no more than 4 bytes, about 7.2 million bytes in total (with a single-byte character set).

So in effect, uniquely identifying 1.8 million rows with ******* numeric values requires roughly 10-15% more space than using "normal" alphanumeric values! That's quite normal, because in effect in the first case we are using a base-10 numbering system, while in the second case we are using a base-62 (or similar) numbering system, which is more efficient storage-wise. Of course, it is a totally different issue which of the two options has more practical use.....
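For the curious, the key-length arithmetic in the post above can be checked with a short Python sketch. The 1.8 million row count is the post's own figure; the alphabet sizes (10 for decimal digits, 62 for A-Z, a-z, 0-9) are back-of-envelope assumptions:

```python
import math

ROWS = 1_800_000  # row count used in the post above

def chars_needed(rows: int, alphabet_size: int) -> int:
    """Fixed key length needed to give each row a unique value."""
    return math.ceil(math.log(rows, alphabet_size))

# Decimal keys need 7 digits (10^6 < 1.8M <= 10^7), while
# alphanumeric keys need only 4 characters, since 62^3 = 238,328
# falls short of 1.8M but 62^4 is roughly 14.7 million.
print(chars_needed(ROWS, 10), chars_needed(ROWS, 62))  # 7 4
```

The larger alphabet packs more distinct values into each byte, which is the whole of the base-10 versus base-62 argument.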

So my point is: saying that using a numeric data type for a PK is more efficient compared to the VARCHAR2 data type because of the saving in space consumed by the underlying index is only true *under certain circumstances* - in general, it can be just the other way around!

I think slimdave has really given the most accurate bottom line of this whole issue:

Originally posted by slimdave i) use a data type that is appropriate for the data being stored.
ii) if your pk is synthetic, and thus probably numeric, then NUMBER is evidently the correct choice.

Everything else is just speculation....

Jurij Modic
ASCII a stupid question, get a stupid ANSI
24 hours in a day .... 24 beer in a case .... coincidence?

Originally posted by jmodic: You are talking nonsense here.
Wrong conclusions due to previously explained misunderstandings. The time difference, even on a big table, would be negligible, I can assure you.

The time difference would be of the order of milliseconds... PS: below is a demo from my test environment.

The Pin column of reporting_PRODUCT_Hier is of VARCHAR2 type.
The Pin column of temp_reporting_PRODUCT_Hier is of NUMBER type.

There is another thing to consider. Using surrogate keys could cause your application to need more joins, thus negating your 15-millisecond performance gain. I think that the real answer depends on what type of data you are trying to store. Whether or not to use VARCHAR2s or NUMBERs depends on a number of factors. There are situations where VARCHAR2s would be preferable and situations where they would not. One other consideration is that if you are using a true surrogate key, then the value of the key is probably derived from a sequence. Fetching the sequence might take a few milliseconds, thus throwing off your numbers.

I know I haven't been around for a while, and I completely confess to not having the patience to read this entire thread. However, I simply cannot believe that this argument is still going on. Okay, yes, if someone actually goes through the trouble of making some nonsensical series of letters and characters to make a unique key, then okay, maybe storing a varchar is more efficient. HOWEVER, this is very rarely the case and you all know it. People usually use string PKs because they have some existing string value that they want to use, like Department Name. Now we should all be able to agree that such a value will definitely not be more efficient than a surrogate, numeric PK.

And efficiency of space is far more important than slimdave appears to realize. This is because the size of the data has a direct effect on the efficiency of the buffer cache. The buffer cache is one of the most important pieces of performance, what with disk accesses being orders of magnitude slower than memory accesses. Further, there is a geometric curve to cache utilization. If your data is 3 times larger than it should be, only 1/3 of what could fit in the cache actually does, which makes you 3 times more likely to need to read from disk. Since the data is also 3 times larger, each read takes 3 times as long. So your database now spends 9 times more on disk access than it should.
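The 3-times/9-times reasoning above is simple multiplication; here is a minimal sketch of it (the factor 3 is just the post's illustrative bloat factor, not a measured value):

```python
# The buffer-cache argument above as arithmetic. `bloat` is how many
# times larger the data is than it needs to be (3x in the post).

def disk_io_penalty(bloat: int) -> int:
    cache_miss_factor = bloat    # only 1/bloat of the rows fit in cache
    transfer_factor = bloat      # each miss reads bloat times the bytes
    return cache_miss_factor * transfer_factor

print(disk_io_penalty(3))  # 9 - the "9 times" figure from the post
```

In other words, the penalty grows as the square of the bloat factor, which is why oversized key columns hurt more than their raw byte count suggests.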

Further, keys are even more important than other fields. Let's look at Department Name. In an accounting application, for example, just how many tables will contain the FK of the Department Name? Lots. Further, if that were a dynamic table, all of these FKs might have indexes on them (to avoid the idiotic locking issue, if nothing else). So now, we have this very large field duplicated in many tables, and further duplicated in many indexes. This single, bad PK choice is now costing a whole lot of cache and thus a whole lot of performance.

So, yes, the difference specifically between a string and a numeric value may be small, given similar sizes, and a string might actually be able to be tighter, but first of all, who actually does that? What most novices will take away from this argument is that it doesn't matter if I use large strings as PKs - jmodic and slimdave said that was fine. Large PKs need to be avoided at all costs!

Now, even when people *do* attempt to use 'tight' string PKs, they often run into another problem: that of meaningful keys, which I have also vented against on many occasions here. For Department names, people might try to make some cute, 3-letter abbreviations for each department. Of course, when this company merges with another one that is twice as big, that little abbreviation scheme goes out the window. And on and on. There are many benefits to having meaningless keys, including performance benefits, and there are many maintenance, concurrency, and other problems with using meaningful keys. I can elaborate if need be.

There are also many problems with multiple-field keys, which is another side-effect of using string keys. People use string keys because they have them and they mean something and they're easy. They then get propagated to other tables. When a cross-reference table is created, it gets the keys from the 2 original tables because they have meaning... When a child table is created, it gets the key from the parent and another field, because the person is thinking that way. When what they *should* be doing is to create a surrogate, single-field, numeric, sequence-populated PK for each table. While this may not be the best solution in 100% of cases, it will be a better solution than what is actually done 95% of the time!

This solution, in general, leads to tighter, faster keys, tighter indexes, better cache utilization, a smaller, faster database with less concurrency and maintenance headaches down the line (which are caused by meaningful and multi-field keys).
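To make the surrogate-key suggestion concrete, here is a toy Python sketch (not Oracle SQL; the table and column names are all made up) of the maintenance benefit: child rows reference a sequence-populated numeric id, so renaming a department touches a single parent row and every foreign key stays valid.

```python
import itertools

# Hypothetical in-memory model of a parent table with a surrogate PK.
seq = itertools.count(1)          # stands in for an Oracle sequence

departments = {}                  # dept_id -> row

def add_department(name: str) -> int:
    dept_id = next(seq)           # sequence-populated surrogate key
    departments[dept_id] = {"name": name}
    return dept_id

acc = add_department("Accounting")
eng = add_department("Engineering")

# Child rows carry the short numeric FK, not the department name.
employees = [
    {"name": "Smith", "dept_id": acc},
    {"name": "Jones", "dept_id": eng},
]

# Renaming the department touches one parent row; no child row changes.
departments[acc]["name"] = "Finance"
print([departments[e["dept_id"]]["name"] for e in employees])
# ['Finance', 'Engineering']
```

With a meaningful string key like the department name, the rename would instead have to cascade through every child table and index that embeds it.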

So I do agree with slimdave on one point - this is not something that you should waste cycles worrying about - follow this simple rule to a better database. There are always special-case exceptions, but the practice, IMHO, for each case is to assume that a surrogate key is better until proven otherwise.

I got into this because of some ridiculous pseudo-measurements that attempted to prove the benefits of numerics over varchars using bogus methodologies, but don't take my defence of varchars as being a dismissal of synthetic numeric keys.

if someone actually goes through the trouble of making some nonsensical series of letters and characters to make a unique key, then okay, maybe storing a varchar is more efficient. HOWEVER, this is very rarely the case and you all know it.

Why would anyone do this? No-one has suggested doing it - it would be a dumb idea.

efficiency of space is far more important than slimdave appears to realize.

Haven't even mentioned field size, but you seem to be assuming that varchar2's must be lengthy fields -- could be a varchar2(2) though couldn't it? Long varchars are surely a bad idea for key fields, so that's a good point.

What most novices will take away from this argument is that it doesn't matter if I use large strings as PKs - jmodic and slimdave said that was fine.

I doubt it -- jmodic and I never said any such thing, so I don't see how they could. Again, you're assuming that varchar2's are lengthy fields.

There are many benefits to having meaningless keys, including performance benefits, and there are many maintenance, concurrency, and other problems with using meaningful keys. I can elaborate if need be.

No-one's arguing about the benefits of synthetic keys; they are generally the best choice, but there are cases where a synthetic key is neither required nor beneficial - how about US state abbreviations, or international currency abbreviations (USD, GBP, etc.)? Nothing wrong with using those as varchar2's.

People use string keys because they have them and they mean something and they're easy. They then get propagated to other tables. When a cross-reference table is created, it gets the keys from the 2 original tables because they have meaning... When a child table is created, it gets the key from the parent and another field, because the person is thinking that way. When what they *should* be doing is to create a surrogate, single-field, numeric, sequence-populated PK for each table. While this may not be the best solution in 100% of cases, it will be a better solution than what is actually done 95% of the time!

Sure, use a synthetic numeric key for such cases. It is the appropriate choice.

So I do agree with slimdave on one point - this is not something that you should waste cycles worrying about - follow this simple rule to a better database. There are always special-case exceptions, but the practice, IMHO, for each case is to assume that a surrogate key is better until proven otherwise.

Yes, synthetic keys are a reasonable choice in 99% of cases. Let's not just dismiss the other 1% of special cases.

Anyway, this lengthy discussion has turned out pretty well, I think, as we've covered a lot of diverse considerations.