Something occurred to me the other day: are specific types still necessary, or are they a legacy that is holding us back? What I mean is: do we really need short, int, long, bigint, and so on?

I understand the reasoning: variables/objects are kept in memory, memory needs to be allocated, and therefore we need to know how big a variable can be. But really, shouldn't a modern programming language be able to handle "adaptive types"? That is, if something is only ever assigned values in the shortint range, it uses fewer bytes, and if it is suddenly assigned a very big number, the memory is allocated accordingly for that particular instance.
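Python 3's int, as far as I can tell, already behaves like this: one type whose storage grows with the value. A quick check in CPython (the exact byte counts are an implementation detail):

    import sys

    small = 42
    huge = 42 ** 100          # same type, much bigger value

    print(type(small) is type(huge))  # True
    print(sys.getsizeof(small))       # ~28 bytes on 64-bit CPython
    print(sys.getsizeof(huge))        # ~96 bytes: storage grew to fit the value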

Float, real, and double are a bit trickier, since the required type depends on what precision you need. Strings, however, should be able to take up less memory in many instances: in .NET, where mostly ASCII text is used, strings always take up double the memory anyway because of the Unicode (UTF-16) encoding.

One argument for specific types might be that the size is part of the specification: for example, a variable should never be able to exceed a certain value, so we make it a shortint. But why not have type constraints instead? It would be much more flexible and powerful to be able to set permissible ranges and values on variables (and properties).
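Something like this sketch, say (hypothetical Python; RangedInt and Percentage are names I've made up, not an existing feature):

    class RangedInt:
        """A constraint: the permissible range is declared once, checked on use."""
        def __init__(self, lo, hi):
            self.lo, self.hi = lo, hi

        def __call__(self, value):
            if not (self.lo <= value <= self.hi):
                raise ValueError(f"{value} is outside [{self.lo}, {self.hi}]")
            return value

    Percentage = RangedInt(0, 100)
    score = Percentage(87)    # fine
    score = Percentage(140)   # raises ValueError: 140 is outside [0, 100]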

I realize that revamping the type architecture would be an immense undertaking, since it is so tightly integrated with the underlying hardware, and things like serialization might become tricky indeed. But from a programming perspective it would be great, no?

PHP, Ruby, Perl and others do not require you to state the types of variables. The environment figures it out for you.
– FrustratedWithFormsDesigner Feb 2 '11 at 20:15


Unicode strings don't have to take up additional memory when they happen to be used for ASCII only (UTF-8).
– delnan Feb 2 '11 at 20:20


But there's a difference between variant and adaptive types, IMO. Variants are not typed at all but get typed when assigned, while adaptive types would be typed, just more loosely. (And I like the concept of type constraints.)
– konrad Feb 2 '11 at 20:25

10 Answers

I totally believe this to be the case. Semantic constraints are worth more than constraints of implementation. Worrying about the size of something feels like worrying about the speed of something when object-oriented programming was coming about.

+1 When it comes to data storage/transmission (e.g., networking), the constraints are fundamental to maximizing the efficiency of the protocol/implementation. Also, there's a lot of ground to be gained if typed collections are available. Other than that, it's safe to assume that efficiency can take a back seat (especially if it decreases the possibility of semantic errors).
– Evan Plaice Jan 17 '12 at 5:42

Adaptive types mean logic to do the adaptation, which means work at runtime to run that logic (templates and compile-time evaluation would require a specific type; type inference is a special case where you get the best of both worlds). That extra work can be acceptable in environments where performance is not critical and the system stays a reasonable size. In other environments it is not (embedded systems are one, where you sometimes have to use 32/64-bit integer types for CPU performance and 8/16-bit integer types to save static memory).

@delnan: replaced auto-typing with late binding, which is what I meant :)
– Matthieu Feb 2 '11 at 20:30

There are lots of general-purpose languages that resolve types at runtime, Common Lisp to name just one. (For performance purposes you can declare types in Common Lisp, so you can do so in performance-critical sections only.)
– David Thornley Feb 2 '11 at 20:56

@David Thornley: "enforcing" strong typing may have been too strong, "promoting" would be more appropriate; I updated my answer accordingly. A language that lets you choose between the two kinds of binding depending on the situation is certainly better than being forced into one way or the other, especially when not doing low-level programming and focusing on logic.
– Matthieu Feb 2 '11 at 21:04

Simplicity, Memory, and Speed
When I declare a variable, the memory for that variable is allocated in one block. To support a dynamically growing variable, I would have to add the concept of non-contiguous memory to that variable (either that, or reserve the largest block the variable could ever need). Non-contiguous memory reduces performance on assignment/retrieval; allocating the largest possible block is wasteful in the scenario where I only need a byte but the system reserves a long.

Think of the trade-offs between an array and a vector (or linked list). With an array, seeking a specific position is a simple matter of taking the start position and shifting the memory pointer x spaces to locate the new position. Think of an int as a bit[32]: reading the int means walking through that array to get all the bit values.

To make a dynamic number type, you have to change that from an array of bits to a vector of bits. Reading your dynamic number now involves going to the head, getting that bit, asking where the next bit is in memory, moving to that location, getting that bit, and so on. For each bit in the dynamic number, you're doing three operations: read (current), read (address of next), move (next). Imagine reading the values of a million numbers; that's millions of extra operations. It might seem insignificant, but think about the systems (like financial ones) where every millisecond matters.
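To make the cost concrete, here is a toy model of that non-contiguous representation (illustrative Python only; no real runtime stores numbers this way):

    class BitNode:
        """One bit of a 'dynamic number', which knows where the next bit lives."""
        def __init__(self, bit, next_node=None):
            self.bit = bit
            self.next = next_node

    def read_value(head):
        value, place = 0, 0
        node = head
        while node is not None:
            value |= node.bit << place  # read (current)
            node = node.next            # read (address of next) + move (next)
            place += 1
        return value

    # 0b101, stored least-significant bit first
    print(read_value(BitNode(1, BitNode(0, BitNode(1)))))  # 5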

The decision was made that putting the onus on the developer to check sizes and validate values is a small trade-off compared to hurting the performance of the whole system.

The other alternative is to implement numbers the way ArrayLists are implemented: the array is re-allocated when the number outgrows the current size. You also have to account for the case where the user WANTS the overflow to wrap around.
– Mike Brown Feb 2 '11 at 20:43

That's true, but somewhat of a simplification. You could come up with a more efficient array structure that, while not as fast as a statically typed one, could be "fast enough" for most cases. For instance, you could keep information about blocks of different types; if the array weren't completely jagged, that wouldn't cost much more memory or performance. Or the array could sacrifice some memory for an index of some sort, or even optimize itself based on its content. And you could still have the option of fixing the memory size through a type constraint when you need the performance.
– konrad Feb 2 '11 at 20:46

To be fair, it's not as brutal as you make out. Cf. my upcoming answer.
– Paul Nathan Feb 2 '11 at 21:20

Specific types are required for hardware-centric languages and projects. One example is on-the-wire network protocols.

But let's create - for fun - a varint type in a language like C++. Build it from a new'd array of ints.

It's not hard to implement addition: XOR the bits together and check for carries, and if a carry ripples out of the top, new up an extra upper byte and carry the bit over. Subtraction follows trivially in two's-complement representation. (This is essentially a ripple-carry adder.)
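A toy sketch of that scheme (in Python rather than C++, and word-by-word rather than gate-by-gate, so take it as an illustration of the rippling carry, not a real implementation):

    def varint_add(a, b):
        """Add two varints stored as lists of 8-bit words, least significant first."""
        result, carry = [], 0
        for i in range(max(len(a), len(b))):
            total = (a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0) + carry
            result.append(total & 0xFF)  # the low 8 bits stay in this word
            carry = total >> 8           # the carry ripples to the next word
        if carry:
            result.append(carry)         # "new" an upper byte for the carry-out
        return result

    print(varint_add([0xFF], [0x01]))  # [0, 1], i.e. 256 in two words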

But you give up the properties that hardware-centric code needs:

Deterministic time: you now have a syscall (new) that may trigger at points that are not necessarily controllable.

Deterministic space: the memory footprint depends on the values stored, not on the declaration.

And semi-software math is slow. [*]

If you need to be using a hardware-layer language and also need to be operating at a high (slow) level and don't want to embed a scripting engine, a varint makes a lot of sense. It's probably written somewhere.

[*] Cf. hardware math algorithms for faster ways of doing it; usually the trick is parallel operations, though.

This is a good question. It explains why a language such as Python does not need "short, int, long, bigint, etc.": integers are, well, integers (there is a single integer type in Python 3), and they have no size limit (beyond the memory of the computer, of course).

As for Unicode, the UTF-8 encoding (one of the Unicode standard's encodings) uses only a single byte per ASCII character, so it's not that bad.
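A quick Python check of the sizes (with UTF-16-LE standing in for a two-bytes-per-character encoding like .NET's):

    text = "hello"
    print(len(text.encode("utf-8")))      # 5 bytes: one per ASCII character
    print(len(text.encode("utf-16-le")))  # 10 bytes: two per character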

More generally, dynamic languages seem to go in the direction you mention. However, for efficiency reasons, more constrained types are useful in some instances (like programs that must run fast). I don't see much change in the foreseeable future, as processors organize data in bytes (or groups of 2, 4, 8, etc. bytes).

On a language-theory basis you're right. Types should be based on a set of legal states, the transformations available to those states, and the operations performable on those states.

This is roughly what OOP in its typical form gives you, however. In fact, in Java you are effectively talking about the BigInteger and BigDecimal classes, which allocate space based on how much is required to store the object. (As FrustratedWithFormsDesigner noted, many scripting-type languages are even further along this path and don't even require a declaration of type; they will store whatever you give them.)

Performance is still relevant, however, and since it's costly to switch types at runtime and since compilers can't guarantee the maximum size of a variable at compile-time, we still have statically-sized variables for simple types in many languages.

I realize that some sort of dynamic/adaptive typing seems costly and less performant than what we have now, and with current compilers it certainly would be. But are we 100% sure that if you built a language and compiler from the ground up, you couldn't make them, if not as fast as statically typed ones, at least fast enough to be worth it?
– konrad Feb 2 '11 at 20:37

Yes, you can make it feasibly fast (but probably never as fast as a static system for numbers). But the "is it worth it" part is trickier. Most people work with data whose range fits comfortably in an int or a double, and if it doesn't, they're aware of it, so dynamic value sizing is a feature that they don't need to pay for.
– jprete Feb 2 '11 at 20:42

Like all programmers, I of course dream of some day making my own language ;)
– konrad Feb 2 '11 at 20:47

@jprete: I disagree; most people are unaware of possible large intermediate results. Such a language can and has been made fast enough for most purposes.
– David Thornley Feb 2 '11 at 21:00

It depends on the language. For higher-level languages like Python, Ruby, Erlang, and such, you only have the concept of integral and decimal numbers.

However, for a certain class of languages, having these types is very important. When you are writing code to read and write binary formats like PNG, JPEG, etc., you need to know precisely how much information is being read at a time. The same goes for writing operating system kernels and device drivers. Not everyone does this, and in the higher-level languages people use C libraries to do the detailed heavy lifting.
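For instance, a PNG header fixes the exact width and byte order of every field, which Python's struct module makes visible (the file path here is just illustrative):

    import struct

    # After the 8-byte PNG signature comes the IHDR chunk: a 4-byte length,
    # the 4-byte type "IHDR", then width and height as 4-byte big-endian ints.
    with open("image.png", "rb") as f:
        f.seek(16)  # 8 (signature) + 4 (length) + 4 (type)
        width, height = struct.unpack(">II", f.read(8))
    print(width, height)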

In short, there is still a place for the more specific types, but many development problems don't require that precision.

I understand the reasoning: variables/objects are kept in memory, memory needs to be allocated, and therefore we need to know how big a variable can be. But really, shouldn't a modern programming language be able to handle "adaptive types"? That is, if something is only ever assigned values in the shortint range, it uses fewer bytes, and if it is suddenly assigned a very big number, the memory is allocated accordingly for that particular instance.

Float, real, and double are a bit trickier, since the required type depends on what precision you need. Strings, however, should be able to take up less memory in many instances: in .NET, where mostly ASCII text is used, strings always take up double the memory anyway because of the Unicode (UTF-16) encoding.

Fortran has had something similar (I don't know whether this is what you mean exactly, since I'm really seeing two questions here). For example, from F90 upwards you don't need to explicitly define a type's size, so to speak. That is good not only because it gives you a central place to define your data types, but also because it gives you a portable way of defining them. REAL*4 is not the same in all implementations on all processors (and by processor I mean CPU + compiler), not by a long shot.

selected_real_kind(p, r) returns the kind value of a real data type with decimal precision of at least p digits and an exponent range of at least r.

Programming languages have been moving in that direction. Take strings, for example. In older languages you had to declare the size of a string, like PIC X(42) in COBOL, DIM A$(42) in some versions of BASIC, or [VAR]CHAR(42) in SQL. In modern languages you just have one dynamically allocated string type and don't need to think about the size.

Integers are different, however:

What I mean is: do we really need short, int, long, bigint, and so on?

Take a look at Python. It used to distinguish between the machine-sized (int) and arbitrary-sized (long) integers. In 3.x the former is gone (the old long is the new int) and nobody misses it.

But there's still a specialized type for sequences of 8-bit integers in the form of bytes and bytearray. Why not use a tuple or list of integers, respectively? True, bytes does have extra string-like methods that tuple doesn't, but surely efficiency had a lot to do with it.
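A rough measurement of that efficiency gap (CPython, 64-bit; exact numbers vary by version):

    import sys

    data = bytes(range(256))   # one byte per element
    as_list = list(data)       # one pointer per element, plus the int objects

    print(sys.getsizeof(data))     # ~289 bytes
    print(sys.getsizeof(as_list))  # ~2100 bytes, before counting the ints themselves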

Float, real, and double are a bit trickier, since the required type depends on what precision you need.

Not really. The "everything is double-precision" approach is very common.
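Python, for one, takes exactly this approach: its only float type is an IEEE 754 double underneath:

    import sys

    print(sys.float_info.mant_dig)  # 53 mantissa bits, i.e. a C double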

Maybe base types should declare the basic intent of the type: int for "ordinary" numbers, double for all normal "decimals" (though shouldn't ints be able to have decimals too, for simplicity?), money for working with amounts, and byte for working with binary data. A type constraint declared through an attribute could then specify the allowed range, decimal precision, nullability, and even allowed values. It would be cool if you could create custom, reusable types that way.
– konrad Feb 3 '11 at 8:20

@konrad: IMHO, the reason "unsigned" integers cause such headaches in C is that they're sometimes used to represent numbers and sometimes used to represent members of a wrapping abstract algebraic ring. Having separate "ring" and "unsigned number" types could ensure that code like unum64 += ring32a-ring32b will always yield the correct behavior, regardless of whether the default integer type is 16 bits or 64 [note that the use of += is essential; an expression like unum64a = unum64b + (ring32a-ring32b); should be rejected as ambiguous.]
– supercat Feb 14 '14 at 22:31
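A Python sketch of the separation supercat describes (Ring32 and the scenario are invented for illustration; Python's own ints never wrap):

    class Ring32:
        """A 32-bit wrapping 'ring' type, distinct from ordinary unbounded numbers."""
        MASK = 0xFFFFFFFF

        def __init__(self, value):
            self.value = value & self.MASK

        def __sub__(self, other):
            return Ring32(self.value - other.value)  # difference wraps mod 2**32

    # Elapsed timer ticks across a rollover: the wrapped difference is correct
    # even though the raw subtraction would be hugely negative.
    now, then = Ring32(5), Ring32(0xFFFFFFFE)
    print((now - then).value)  # 7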