Hello.
It would be very good to be able to save classes to disk in a safe
manner, so that (maybe only public?) fields can be saved and then read
in, even if a class has been subclassed or expanded (not too hard, with
current memory model), or even if the underlying machine is different
(hard). But even saving would probably become much harder if powerful
data reordering for arrays of classes is implemented.
For this, I think unions are a special problem. A smart union type
(a "switch"?) would have to be introduced, which would keep track of the
active field and thus provide debugging capabilities. BTW, a parsing
library and many other uses would benefit from such a "switch", since it
would be shorter to write and easier to maintain than a union.
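For comparison, C++17's std::variant is an existing library version of such a "switch". It records which member is active. Reading an inactive member is detected instead of silently reinterpreting the bits, as a plain union would. Below is a minimal sketch; the parser Token type is hypothetical and only for illustration:

```cpp
#include <cassert>
#include <string>
#include <variant>

// Hypothetical parser token: a tagged union of integer, real, or identifier.
// std::variant stores the index of the active member next to the value.
using Token = std::variant<long, double, std::string>;

// Name of the currently active field, useful in debugging output.
const char* activeField(const Token& t) {
    switch (t.index()) {
        case 0: return "integer";
        case 1: return "real";
        case 2: return "identifier";
        default: return "valueless";  // only after an exception during assignment
    }
}

// Checked read: succeeds only if the integer member is the active one.
// A plain union would instead reinterpret whatever bits are stored.
bool tryGetInteger(const Token& t, long& out) {
    if (const long* p = std::get_if<long>(&t)) {
        out = *p;
        return true;
    }
    return false;
}
```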
Another useful thing is ML-style pattern matching, which I have wished
for before. I was thinking about a possible implementation, but then I got
busy with other things. Yesterday I stumbled over a document describing
*exactly this*: a C++ extension for this feature. I have only looked
briefly at the document. Maybe their syntax is overwrought, but it might be
worth a look anyway.
http://citeseer.nj.nec.com/leung96cbased.html
-i.
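C++17 has no ML-style match. std::visit with an "overloaded" set of lambdas comes roughly close: there is one branch per alternative, and the compiler rejects a visitor that misses a case. The "overloaded" helper is a common idiom, not a standard library type. This is not the syntax proposed in the paper above, only a sketch with hypothetical Circle/Rect types. It has no nested patterns or guards:

```cpp
#include <cassert>
#include <variant>

// Common C++17 idiom: merge several lambdas into one overloaded visitor.
template <class... Fs> struct overloaded : Fs... { using Fs::operator()...; };
template <class... Fs> overloaded(Fs...) -> overloaded<Fs...>;

// Hypothetical sum type, roughly ML's: shape = Circle of float | Rect of float * float
struct Circle { double r; };
struct Rect { double w, h; };
using Shape = std::variant<Circle, Rect>;

// Roughly "match s with Circle r -> ... | Rect (w, h) -> ...".
// Leaving out a case is a compile-time error, since std::visit needs a
// callable for every alternative.
double area(const Shape& s) {
    return std::visit(overloaded{
        [](const Circle& c) { return 3.141592653589793 * c.r * c.r; },
        [](const Rect& r) { return r.w * r.h; },
    }, s);
}
```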

Hello.
It would be very good to be able to save classes to disk in a safe
manner, so that (maybe only public?) fields can be saved and then read
in, even if a class has been subclassed or expanded (not too hard, with
current memory model), or even if the underlying machine is different
(hard). But even saving would probably become much harder if powerful
data reordering for arrays of classes is implemented.
For this, I think unions are a special problem. A smart union type
(a "switch"?) would have to be introduced, which would keep track of the
active field and thus provide debugging capabilities. BTW, a parsing
library and many other uses would benefit from such a "switch", since it
would be shorter to write and easier to maintain than a union.

Some of the code generators we use at work automatically create binary
load and save functions. In the early 90's we used them at QuickLogic,
but we ran into difficulties maintaining binary backwards compatibility
with our simple binary dumps. We also found that a simple memory image
of binary data structures typically takes up more space than a carefully
designed ASCII format (which takes up more than a carefully designed
binary format).
As a result, no one has used the binary load/save feature in a decade.
It sounds cool. I even wrote code in one of the generators to do it. It
just hasn't been as useful as I thought it would be.
Instead of building functions like binary load/save into the language,
I'd recommend providing the hooks for users to do it with code
generators. Even if there's no direct generation capability in the
language, there are a few things that could make D work better than C++
does with code generators. In particular:
- Having a way to split up class definitions into multiple parts.
For example, an 'extend' keyword in front of a class could mean we're
adding to an existing class. This isn't inheritance. We'd be modifying
a class directly rather than creating a new one.
- Do the same thing for modules, functions, variables, and class methods.
It's kind of nice for a code generator to be able to put a few fields
here, add a few statements there, and add a couple of functions to an
existing module. For example, the auto-generated recursive destructors
we use were hell to write for C. Every kind of class relationship
supported had to be considered in big switch statements to generate all
the different parts of the function. Really ugly. When targeting a
language that supports these after-the-fact extensions, the complexity
of the code generator was reduced tremendously. The same code that adds
fields to the parent and child classes also adds a few statements to the
recursive destructor. It's much nicer.
Extensions like these allow code generators like ClassWizard to simply
add files to your project, and not need to modify your hand written
files. No more parsing the whole language to do a simple generator. No
more ugly /* !!! Do not edit this !!! */ machine-generated crud in my files.
If you were to go the whole 9 yards, you might also allow a similar
feature: not just extensions... replacement! You could use something
like a replace keyword in front of your module or class or function or
method or variable.
With this syntax, you can add little edit files to your projects that
fix problems in a library you've been handed.
For example, if you run into a performance problem with a third party
library (like that never happens ;-)) and track it down to the use of a
singly linked list instead of doubly linked, you type a few lines of
code in a patch file, and problem solved! For the next ten years that
it takes your library vendor to get around to fixing the problem, you
have a workaround that usually works with their new releases.
What do you think?
Bill Cox

Hello. Sorry it took me so long to become aware of this post. :)
Comments embedded.
-i.
Bill Cox wrote:

Hi, Ilya.
Ilya Minkov wrote:

Hello.
It would be very good to be able to save classes to disk in a safe
manner, so that (maybe only public?) fields can be saved and then read
in, even if a class has been subclassed or expanded (not too hard, with
current memory model), or even if the underlying machine is different
(hard). But even saving would probably become much harder if powerful
data reordering for arrays of classes is implemented.
For this, I think unions are a special problem. A smart union type
(a "switch"?) would have to be introduced, which would keep track of the
active field and thus provide debugging capabilities. BTW, a parsing
library and many other uses would benefit from such a "switch", since it
would be shorter to write and easier to maintain than a union.

Some of the code generators we use at work automatically create binary
load and save functions. In the early 90's we used them at QuickLogic,
but we ran into difficulties maintaining binary backwards compatibility
with our simple binary dumps. We also found that a simple memory image
of binary data structures typically takes up more space than a carefully
designed ASCII format (which takes up more than a carefully designed
binary format).

Hm. You have mentioned dynamic properties a while ago. With them, you
probably wouldn't have such difficulties.
There also has to be some framework that allows extending the
format, even if the serialisation code is written manually. Basic
support for it would mean that the base class has a (stub) method for
converting an object into a stream of data (.Serialize?, analogous to
the current ToHash and ToString). In the simplest case you would then
implement this method with statements like "serstream ~ thisproperty.Serialize".
This would also imply that .Serialize is implemented for the basic
types. Reading would work analogously.
Languages with purely dynamic object methods seem to have one problem
fewer here. However, an implicit serialisation sequence would also make
it possible to interpret some data that can no longer be represented in
the object directly because of changes.
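Here is a minimal sketch of that stub-method scheme, written in C++ rather than D. Free serialize/deserialize overloads cover the basic types. A class chains them over its fields, which is the equivalent of "serstream ~ thisproperty.Serialize". The names Serializable, Person, serialize and deserialize are hypothetical, and the length-prefixed text encoding is just one possible choice:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>

// Overloads for the basic types. Strings are length-prefixed so that
// spaces inside them survive the round trip.
void serialize(std::ostream& out, std::int32_t v) { out << v << ' '; }
void serialize(std::ostream& out, const std::string& s) { out << s.size() << ':' << s << ' '; }

void deserialize(std::istream& in, std::int32_t& v) { in >> v; }
void deserialize(std::istream& in, std::string& s) {
    std::size_t n = 0;
    char colon = 0;
    in >> n >> colon;
    assert(colon == ':');
    s.resize(n);
    in.read(&s[0], static_cast<std::streamsize>(n));
}

// Base class with do-nothing stub hooks, in the spirit of ToString/ToHash.
class Serializable {
public:
    virtual ~Serializable() = default;
    virtual void serialize(std::ostream&) const {}
    virtual void deserialize(std::istream&) {}
};

// A user class overrides the hooks by chaining its fields; the :: prefix
// selects the free overloads instead of the member functions.
class Person : public Serializable {
public:
    std::string name;
    std::int32_t age = 0;

    void serialize(std::ostream& out) const override {
        ::serialize(out, name);
        ::serialize(out, age);
    }
    void deserialize(std::istream& in) override {
        ::deserialize(in, name);
        ::deserialize(in, age);
    }
};
```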
As for the framework, XML is one example of it. Though I consider it
appropriate for such things, I would also prefer to have an equivalent
binary format (with conversion utilities back and forth), since it would
work faster and take up less space.
BTW, I could make such an XML-like framework: a function like
ToXMLData, overloaded for the basic types. A user can overload it for
his own types, and for classes it should call the corresponding method
of the class. It should be doable with interfaces. Then there would be a
way to compose one XMLData from many, and to save it all in binary or
convert it into real XML.
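Here is a rough sketch of such a ToXMLData framework, again in C++. It has overloads for basic types, an interface for classes, composition by nesting children, and a renderer that produces real XML. XMLData, toXMLData, XMLConvertible, escapeXML, toXML and Point are all hypothetical names. The binary encoding step is left out:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical generic tree that values are converted into before being
// written out as XML (or, in a fuller version, as an equivalent binary form).
struct XMLData {
    std::string tag;
    std::string text;               // content of a leaf (basic types)
    std::vector<XMLData> children;  // composed content (classes)
};

// Overloads for basic types; users add overloads for their own types.
XMLData toXMLData(const std::string& tag, int v) { return {tag, std::to_string(v), {}}; }
XMLData toXMLData(const std::string& tag, const std::string& v) { return {tag, v, {}}; }

// Classes opt in through an interface.
struct XMLConvertible {
    virtual ~XMLConvertible() = default;
    virtual XMLData toXMLData(const std::string& tag) const = 0;
};
XMLData toXMLData(const std::string& tag, const XMLConvertible& obj) { return obj.toXMLData(tag); }

// Escape the characters that would break the markup.
std::string escapeXML(const std::string& s) {
    std::string out;
    for (char c : s) {
        if (c == '<') out += "&lt;";
        else if (c == '>') out += "&gt;";
        else if (c == '&') out += "&amp;";
        else out += c;
    }
    return out;
}

// Render an XMLData tree as real XML text.
std::string toXML(const XMLData& d) {
    std::string s = "<" + d.tag + ">" + escapeXML(d.text);
    for (const XMLData& child : d.children) s += toXML(child);
    return s + "</" + d.tag + ">";
}

// Example class: composes one XMLData from the XMLData of its fields.
struct Point : XMLConvertible {
    int x = 0, y = 0;
    XMLData toXMLData(const std::string& tag) const override {
        return {tag, "", {::toXMLData("x", x), ::toXMLData("y", y)}};
    }
};
```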
And I have to consider the Pizza contest. Don't expect much, though:
I'm not the major brain here, I'm only 20, and I just started studying
CS. Besides, I *never* eat at Pizza Hut, but rather at Restaurant
Italy, Asado Steak, and some others. I still have over 100 restaurants
to explore. :)

As a result, no one has used the binary load/save feature in a decade.
It sounds cool. I even wrote code in one of the generators to do it. It
just hasn't been as useful as I thought it would be.

For static languages, binary dumps are much less useful than for dynamic ones.

Instead of building functions like binary load/save into the language,
I'd recommend providing the hooks for users to do it with code
generators. Even if there's no direct generation capability in the
language, there are a few things that could make D work better than C++
does with code generators. In particular:
- Having a way to split up class definitions into multiple parts.
For example, an 'extend' keyword in front of a class could mean we're
adding to an existing class. This isn't inheritance. We'd be modifying
a class directly rather than creating a new one.
- Do the same thing for modules, functions, variables, and class methods.
It's kind of nice for a code generator to be able to put a few fields
here, add a few statements there, and add a couple of functions to an
existing module. For example, the auto-generated recursive destructors
we use were hell to write for C. Every kind of class relationship
supported had to be considered in big switch statements to generate all
the different parts of the function. Really ugly. When targeting a
language that supports these after-the-fact extensions, the complexity
of the code generator was reduced tremendously. The same code that adds
fields to the parent and child classes also adds a few statements to the
recursive destructor. It's much nicer.

These are all good ideas. Also consider that one could have very few
classes in an application, but very many methods to add to them. Then it
would make sense to split each class across multiple files for easy
navigation and editing. This means, however, that all these units have
to be compiled together. Dependencies can be awful to track.

Extensions like these allow code generators like ClassWizard to simply
add files to your project, and not need to modify your hand written
files. No more parsing the whole language to do a simple generator. No
more ugly /* !!! Do not edit this !!! */ machine-generated crud in my files.
If you were to go the whole 9 yards, you might also allow a similar
feature: not just extensions... replacement! You could use something
like a replace keyword in front of your module or class or function or
method or variable.

Ouch.

With this syntax, you can add little edit files to your projects that
fix problems in a library you've been handed.
For example, if you run into a performance problem with a third party
library (like that never happens ;-)) and track it down to the use of a
singly linked list instead of doubly linked, you type a few lines of
code in a patch file, and problem solved! For the next ten years that
it takes your library vendor to get around to fixing the problem, you
have a workaround that usually works with their new releases.

And I have to consider the Pizza contest. Don't expect much, though:
I'm not the major brain here, I'm only 20, and I just started studying
CS. Besides, I *never* eat at Pizza Hut, but rather at Restaurant
Italy, Asado Steak, and some others. I still have over 100 restaurants
to explore. :)

You've got a lot of knowledge about computer languages for being only
20. Pretty impressive. I'm 39, just old enough to have actually had a
job programming in Fortran on a PDP-11/45.
-- Bill

of binary data structures typically takes up more space than a carefully
designed ASCII format (which takes up more than a carefully designed
binary format).

Hm. You have mentioned dynamic properties a while ago. With them, you
probably wouldn't have such difficulties.

I thought a simple example might illustrate the trouble I had with binary
save formats. Suppose we're saving a directed graph to disk. Its classes
look like:
class Node {
    LinkedList<Edge> inEdges, outEdges;
    bool visited, marked;
    char *name;
}
class Edge {
    Node fromNode, toNode;
}
Now, let's assume I have a graph that in a text file would be represented
as:
A B C
B C E
C A D
D A B C
E B D E
The first column holds the node names, and the remaining symbols are the
destinations of edges. This takes 34 bytes, including newlines.
If we stream binary to the disk, I assume all Edges and Nodes wind up
there. Assume the LinkedList class is just a head pointer, and that each
Node also has a name and two Booleans that I could pack into 1 byte.
Each Node would take 7 bytes. Each Edge has two Node pointers and two
next pointers. They would take 16 bytes.
On disk, the simple binary dump of 5 Nodes and 12 Edges takes
5*7 + 12*16 = 227 bytes. That's a whole lot worse than 34 bytes.
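The arithmetic above can be checked mechanically. This sketch counts the text bytes (characters plus one newline per line) and the edges. It then applies the assumed sizes of 7 bytes per Node and 16 bytes per Edge. Those two figures are the post's assumptions and are taken as inputs, not derived. The function names are hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// The example graph: node name followed by the destinations of its edges.
const std::vector<std::string> kGraph = {
    "A B C", "B C E", "C A D", "D A B C", "E B D E"};

// Text size: every character plus one newline per line.
std::size_t textBytes(const std::vector<std::string>& lines) {
    std::size_t n = 0;
    for (const auto& line : lines) n += line.size() + 1;
    return n;
}

// Edge count: symbols after the node name. Names here are single
// characters separated by single spaces, so a line of length L holds
// (L + 1) / 2 symbols, one of which is the node itself.
std::size_t edgeCount(const std::vector<std::string>& lines) {
    std::size_t e = 0;
    for (const auto& line : lines) e += (line.size() + 1) / 2 - 1;
    return e;
}

// Binary dump size under the assumed per-object sizes from the post.
std::size_t binaryBytes(std::size_t nodes, std::size_t edges,
                        std::size_t nodeSize = 7, std::size_t edgeSize = 16) {
    return nodes * nodeSize + edges * edgeSize;
}
```

With these inputs the binary dump comes out at roughly 6.7 times the size of the text file.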
As for compatibility, suppose we later on convert our LinkedList
relationships to DoublyLinkedList. First, the binary size gets worse, while
the text file doesn't. Second, we now have to write converters to be able to
load the old binary files. We could gain some backward compatibility by
using an even larger binary format that tags all the fields, but what's the
point? Are we trying to be efficient, or just trying to avoid writing a parser?
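To put the parser question in perspective, a reader and a writer for the text format above fit in a couple of dozen lines. This is a minimal sketch that assumes whitespace-separated names and an adjacency map; Graph, parseGraph and writeGraph are hypothetical names. Note that std::map writes the nodes back in sorted name order:

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Adjacency map: node name -> names of the nodes its edges point to.
using Graph = std::map<std::string, std::vector<std::string>>;

// Parse lines of the form "name dest dest ...".
Graph parseGraph(std::istream& in) {
    Graph g;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string name, dest;
        if (!(fields >> name)) continue;  // skip blank lines
        std::vector<std::string>& out = g[name];
        while (fields >> dest) out.push_back(dest);
    }
    return g;
}

// Write the graph back in the same format (std::map sorts nodes by name).
void writeGraph(std::ostream& out, const Graph& g) {
    for (const auto& [name, dests] : g) {
        out << name;
        for (const auto& d : dests) out << ' ' << d;
        out << '\n';
    }
}
```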
File size isn't important for most apps. Look at how large MS Word files
are. No one cares. I work with design files representing .13u chips. A
small file for us might be 100 meg. Not only does the text version reduce
the size, but our users demand text so they can hack our data structures
with Perl scripts.
Bill

File size isn't important for most apps. Look at how large MS Word files
are. No one cares. I work with design files representing .13u chips. A
small file for us might be 100 meg. Not only does the text version reduce
the size, but our users demand text so they can hack our data structures
with Perl scripts.

You hit on a big advantage with text files: they can be checked visually
for correctness, and can be edited with ordinary text editors. Binary files
require a custom dumper/editor to be written.
One reason I don't use .doc files is that I need a specific version of
the word processor installed to read them. 20 years from now, who will have
that? (Yes, I have 20 year old files I still use.) With ascii text format,
I'm covered.

It would be very good to be able to save classes to disk in a safe
manner, so that (maybe only public?) fields can be saved and then read
in, even if a class has been subclassed or expanded (not too hard, with
current memory model), or even if the underlying machine is different
(hard). But even saving would probably become much harder if powerful
data reordering for arrays of classes is implemented.

This is in DLI under the pickle.d module. It transfers a class field
image, so new and reordered fields don't matter, and it handles pointers,
references, and arrays so that each referenced object is transferred
only once. The only non-portable part is a dependency on IEEE floating point.
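The two properties described here can be sketched as follows. This is not pickle.d's code or API, only an illustration under assumed names (Record, Node, save, load). Fields are stored by name, so added or reordered fields don't break old data. Each object gets an id the first time it is reached, so it is written once even when several pointers (or a cycle) refer to it:

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// One saved object: fields keyed by name, not by position.
using Record = std::map<std::string, std::string>;

struct Node {
    std::string name;
    int weight = 1;        // imagine this field was added in a later version
    Node* next = nullptr;  // saved as the id of the target object
};

// Save the nodes reachable through `next`. Each object gets an id the first
// time it is reached, so it is written once even if the chain loops back.
std::vector<Record> save(const Node* root) {
    std::map<const Node*, int> ids;
    std::vector<const Node*> order;
    for (const Node* n = root; n && !ids.count(n); n = n->next) {
        ids[n] = static_cast<int>(order.size());
        order.push_back(n);
    }
    std::vector<Record> records;
    for (const Node* n : order) {
        Record r{{"name", n->name}, {"weight", std::to_string(n->weight)}};
        if (n->next) r["next"] = std::to_string(ids.at(n->next));
        records.push_back(r);
    }
    return records;
}

// Load into `nodes`: missing fields keep their defaults, unknown fields are
// ignored, and ids are turned back into pointers into `nodes`.
void load(const std::vector<Record>& records, std::vector<Node>& nodes) {
    nodes.assign(records.size(), Node{});
    for (std::size_t i = 0; i < records.size(); ++i) {
        const Record& r = records[i];
        if (auto it = r.find("name"); it != r.end()) nodes[i].name = it->second;
        if (auto it = r.find("weight"); it != r.end()) nodes[i].weight = std::stoi(it->second);
        if (auto it = r.find("next"); it != r.end()) {
            std::size_t id = std::stoul(it->second);
            assert(id < nodes.size());
            nodes[i].next = &nodes[id];
        }
    }
}
```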

For this, I think unions are a special problem. A smart union type
(a "switch"?) would have to be introduced, which would keep track of the
active field and thus provide debugging capabilities. BTW, a parsing
library and many other uses would benefit from such a "switch", since it
would be shorter to write and easier to maintain than a union.

Unions don't get serialisation. If you want to save a union, save the
active state.
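Saving "the active state" amounts to writing a tag saying which member is active, followed by just that member. Below is a minimal sketch in which a C++17 std::variant stands in for a tagged union. saveValue, loadValue and the text encoding are hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <iomanip>
#include <sstream>
#include <variant>

using Value = std::variant<int, double>;

// Write the tag (index of the active member), then only that member.
// 17 significant digits make a double round-trip exactly through text.
void saveValue(std::ostream& out, const Value& v) {
    out << v.index() << ' ';
    std::visit([&out](const auto& x) { out << std::setprecision(17) << x << ' '; }, v);
}

// Read the tag first, then construct exactly that alternative.
Value loadValue(std::istream& in) {
    std::size_t tag = 0;
    in >> tag;
    assert(tag < std::variant_size_v<Value>);
    if (tag == 0) {
        int i = 0;
        in >> i;
        return i;
    }
    double d = 0;
    in >> d;
    return d;
}
```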

It would be very good to be able to save classes to disk in a safe
manner, so that (maybe only public?) fields can be saved and then read
in, even if a class has been subclassed or expanded (not too hard, with
current memory model), or even if the underlying machine is different
(hard). But even saving would probably become much harder if powerful
data reordering for arrays of classes is implemented.

This is in DLI under the pickle.d module. It transfers a class field
image, so new and reordered fields don't matter, and it handles pointers,
references, and arrays so that each referenced object is transferred
only once. The only non-portable part is a dependency on IEEE floating point.

Cool. Thanks.
So it handles endianness.
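Whether pickle.d does it exactly this way is an assumption. The usual technique is to write multi-byte values in a fixed byte order using shifts, which produces the same bytes on any host. Floating point is where the IEEE dependency comes in. The sketch copies a double's bits as-is, so it only works if both ends use IEEE 754; the static_assert checks this on the compiling side. putU32, getU32, putF64 and getF64 are hypothetical names:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <limits>
#include <vector>

// Append a 32-bit value in little-endian byte order, whatever the host uses.
void putU32(std::vector<std::uint8_t>& buf, std::uint32_t v) {
    for (int i = 0; i < 4; ++i) buf.push_back(static_cast<std::uint8_t>(v >> (8 * i)));
}

// Rebuild the value from its bytes; the shifts make this host-independent.
std::uint32_t getU32(const std::vector<std::uint8_t>& buf, std::size_t pos) {
    assert(pos + 4 <= buf.size());
    std::uint32_t v = 0;
    for (int i = 0; i < 4; ++i) v |= static_cast<std::uint32_t>(buf[pos + i]) << (8 * i);
    return v;
}

// Doubles: copy the raw bits, which is only portable between IEEE 754 hosts.
static_assert(std::numeric_limits<double>::is_iec559 && sizeof(double) == 8,
              "this sketch assumes 64-bit IEEE 754 doubles");

void putF64(std::vector<std::uint8_t>& buf, double d) {
    std::uint64_t bits = 0;
    std::memcpy(&bits, &d, sizeof bits);
    putU32(buf, static_cast<std::uint32_t>(bits));
    putU32(buf, static_cast<std::uint32_t>(bits >> 32));
}

double getF64(const std::vector<std::uint8_t>& buf, std::size_t pos) {
    std::uint64_t bits = getU32(buf, pos) | (static_cast<std::uint64_t>(getU32(buf, pos + 4)) << 32);
    double d = 0;
    std::memcpy(&d, &bits, sizeof d);
    return d;
}
```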

Unions don't get serialisation. If you want to save a union, save the
active state.

OK...
But do you doubt the usefulness of a switching union?
Thanks a lot.
-i.