Actual table size decrease

09-08-2010, 04:41 AM

Hey,

I'm looking at a table here that is kind of big, but it doesn't need to be. I'm wondering if I should suggest changing how it's stored: inserts would gain an added layer of complexity, but the queries should be faster.

Does anyone have a nicer way to optimize a table that stores values of different data types in one column? I know what data type each value is; it just seems a huge waste dumping them all into the same column, and good luck trying to do a query on date ranges as it is!

Comment

It sounds a bit odd that you have anything from a one-character string '1' to 64000 characters of XML in the same column. Could it be that all these parameters are mostly present, so that you should instead have 4 different columns in this table? One int, one date, etc., that contain the data. That way you can also have different indexes on them: for example, on the large VARCHAR() you might want a fulltext index, while on the other columns you want normal indexes.
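A rough sketch of what that could look like (all table and column names here are made up for illustration, not taken from the original schema):

```sql
-- Hypothetical layout: one nullable column per data type, plus a
-- type tag so the application knows which column holds the value.
CREATE TABLE device_parameter (
  device_id   INT NOT NULL,
  param_name  VARCHAR(100) NOT NULL,
  value_type  ENUM('int','date','text') NOT NULL,
  int_value   INT NULL,
  date_value  DATETIME NULL,
  text_value  VARCHAR(64000) NULL,
  PRIMARY KEY (device_id, param_name),
  KEY idx_int_value  (int_value),
  KEY idx_date_value (date_value),
  FULLTEXT KEY ft_text_value (text_value)  -- FULLTEXT requires MyISAM in MySQL 5.x
) ENGINE=MyISAM;
```

Each row fills exactly one of the typed columns, so queries on dates or numbers can use ordinary typed indexes instead of string comparisons.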

Doing your 4-table join strategy is something I would definitely advise against (based on your description it sounds like you are going to have a lot of work and no direct benefit).

Quote:

Data fills this table from xml, the parameter value could be as small as an int '1' or as big as a large string containing log data

Technically it wouldn't be an "int", since it would be the string '1', so as long as you still query with "WHERE yourVarCharColumn = '1'" there is no implicit conversion.
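To illustrate the difference (table and column names hypothetical): comparing the column to a quoted string stays a string comparison, while comparing it to a bare number forces MySQL to convert the column value for every row, which also defeats any index on the column.

```sql
-- String-to-string comparison: no conversion, an index is usable.
SELECT * FROM device_parameter WHERE param_value = '1';

-- String-to-number comparison: the column is converted to a
-- number row by row, so an index on param_value cannot be used.
SELECT * FROM device_parameter WHERE param_value = 1;
```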

But your main concern seems to be that you think this table will be excessively large, and it sounds like you think a VARCHAR(64000) will be very large even when it only contains a small string.
That is not so: the size of a VARCHAR(64000) value is stringLength + 2 bytes.
So the string '1' will only occupy _3_ bytes of storage.

As for indexes (if you have used a normal index), you can use the yourVarCharColumn(x) syntax in your index creation to say that only the first x bytes of the string should be part of the index.
The point is to reduce the size of the index: usually the first 20-30 bytes are enough to locate the matching records, and the rest of the data is just payload which is better fetched from the table. This way the index itself can be kept small and fast.
But this is not true for fulltext indexes, which are a totally different beast.
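The prefix-index syntax looks like this (table and column names hypothetical):

```sql
-- Only the first 20 bytes of the value take part in the index;
-- equality and prefix lookups still work, but the index stays small.
CREATE INDEX idx_value_prefix ON device_parameter (param_value(20));
```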

Comment

It sounds a bit odd that you have anything from a one-character string '1' to 64000 characters of XML in the same column. Could it be that all these parameters are mostly present, so that you should instead have 4 different columns in this table? One int, one date, etc., that contain the data. That way you can also have different indexes on them: for example, on the large VARCHAR() you might want a fulltext index, while on the other columns you want normal indexes.

This is a device/parameter_value table, and a device could have 0-5000 different parameters, so one device might have 2000 parameters whose values I have to store, while another device might have 3000.

If a completely new parameter is discovered, I have to deal with it and store the parameter name and value.

A device might tell me it has a parameter with value "mylogin",
then an instant later tell me it's got a parameter with value "this is a really long log...."

I've no control over the parameters on the devices. I think 64000 was an arbitrary value chosen when this table was created, or just the largest it could handle at the time without moving to BLOB/TEXT.

sterin wrote on Wed, 08 September 2010 14:59

Doing your 4-table join strategy is something I would definitely advise against (based on your description it sounds like you are going to have a lot of work and no direct benefit).

True, a join would create problems on the app side of things, but the union might be OK? It'd return one value and the data type would be known. It should be extremely quick since it's all on a small indexed field, but it's looking like I shouldn't need to split it anyhow.
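Roughly what I was picturing, if the values had been split into per-type tables (all names hypothetical):

```sql
-- Probe each typed table by the same key; at most one branch
-- returns a row, and the branch label tells the app the type.
SELECT 'int'  AS value_type, CAST(int_value AS CHAR) AS value
  FROM param_int  WHERE device_id = 42 AND param_name = 'uptime'
UNION ALL
SELECT 'date', CAST(date_value AS CHAR)
  FROM param_date WHERE device_id = 42 AND param_name = 'uptime'
UNION ALL
SELECT 'text', text_value
  FROM param_text WHERE device_id = 42 AND param_name = 'uptime';
```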

sterin wrote on Wed, 08 September 2010 14:59

Quote:

Data fills this table from xml, the parameter value could be as small as an int '1' or as big as a large string containing log data

Technically it wouldn't be an "int", since it would be the string '1', so as long as you still query with "WHERE yourVarCharColumn = '1'" there is no implicit conversion.

Ah, true again, but dates would be an issue to search on, as would querying values below a certain threshold. Basically all searches are restricted to strings, or have to do a conversion/function call on the data?
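For example, a date-range search over the string column would have to look something like this (column and table names hypothetical); because the function is applied to the column, every row gets converted and no index on the raw string can help:

```sql
-- STR_TO_DATE runs on every row before the range check,
-- so this is a full table scan regardless of indexes.
SELECT *
  FROM device_parameter
 WHERE STR_TO_DATE(param_value, '%Y-%m-%d')
       BETWEEN '2010-01-01' AND '2010-09-01';
```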

But your main concern seems to be that you think this table will be excessively large, and it sounds like you think a VARCHAR(64000) will be very large even when it only contains a small string.
That is not so: the size of a VARCHAR(64000) value is stringLength + 2 bytes.
So the string '1' will only occupy _3_ bytes of storage.

Is this only InnoDB? Currently all the tables in this lovely mess are MyISAM; doesn't this work a bit differently there?

As it is, the index was on the first 333 characters of the parameter value, but it could easily drop to 20.

The sizing issue with huge varchars is all in my head now, I think. I'd just been looking at hints and tips and might have been assuming too much, like "don't use large values for PRIMARY indexes". I can manage the size of the indexes as you said above.

It's also in my head that there has to be a nicer way to organise the data; it's too generic as it is. But having VARCHAR(64000) shouldn't be an issue of any sort (apart from not being able to do datatype-specific searches on the data). The size was my concern for now: I was thinking we were just wasting space and that it'd have an effect on searches and insert speeds, as well as disk space.