Nine reasons not to use serialization

Although .NET provides a number of quick and easy ways to serialize and deserialize data, do not use them. This article explains why.

Introduction

If you want to know how to get your application to save information to disk or the registry, then a quick skim through MSDN magazine or a quick search on newsgroups will give you the answer: serialization.

Mark your classes with the [Serializable] attribute and there you go. It’s a simple matter of creating a Formatter and a Stream and a couple of lines later it’s done. Alternatively, you could mark up your class with the necessary attributes and use XML Serialization.
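
The quick-and-easy approach being described looks roughly like this (a minimal sketch; the Settings class and file name are invented for illustration):

```csharp
using System;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

[Serializable]
public class Settings
{
    public string UserName;
    public int FontSize;
}

public class Program
{
    public static void Main()
    {
        Settings settings = new Settings();
        settings.UserName = "alice";
        settings.FontSize = 12;

        // A Formatter and a Stream, and a couple of lines later it's done.
        BinaryFormatter formatter = new BinaryFormatter();
        using (FileStream stream = File.Create("settings.bin"))
        {
            formatter.Serialize(stream, settings);
        }

        // Deserializing is just as short.
        using (FileStream stream = File.OpenRead("settings.bin"))
        {
            settings = (Settings)formatter.Deserialize(stream);
        }
    }
}
```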

All very simple, but unfortunately all very wrong. There are a number of reasons why you should not opt for the simple approach. Here are nine important ones.

1. It forces you to design your classes a certain way

XML serialization only works on public properties and fields, and on classes with public parameterless constructors. That means your classes need to be accessible to the outside world: you cannot have private or internal classes, or serialize private data. In addition, it forces restrictions on how you implement collections.
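
For example, a class has to be shaped something like this before XmlSerializer will accept it (the Customer class is invented for illustration):

```csharp
using System.IO;
using System.Xml.Serialization;

// The class must be public, with a public parameterless constructor,
// or XmlSerializer will throw at runtime.
public class Customer
{
    // Only public read/write properties and public fields get serialized;
    // private state is silently ignored.
    public string Name;
    public int Id;
}

public class Demo
{
    public static void Main()
    {
        XmlSerializer serializer = new XmlSerializer(typeof(Customer));
        Customer c = new Customer();
        c.Name = "Example";
        c.Id = 42;
        using (StreamWriter writer = new StreamWriter("customer.xml"))
        {
            serializer.Serialize(writer, c);
        }
    }
}
```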

2. It is not future-proof for small changes

If you mark your classes as [Serializable], then all the private data not marked as [NonSerialized] will get dumped. You have no control over the format of this data. If you change the name of a private variable, then your code will break.

You can get around this by implementing the ISerializable interface. This gives you much better control of how data is serialized and deserialized. Unfortunately …
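
An ISerializable implementation decouples the persisted names from your field names, so renaming a private variable no longer breaks old data (a sketch; the Account class and "Owner" key are invented):

```csharp
using System;
using System.Runtime.Serialization;

[Serializable]
public class Account : ISerializable
{
    private string m_Owner;   // renaming this field no longer breaks old data

    public Account(string owner)
    {
        m_Owner = owner;
    }

    // Deserialization constructor: read values back under the names we chose.
    protected Account(SerializationInfo info, StreamingContext context)
    {
        m_Owner = info.GetString("Owner");
    }

    // We control exactly what gets written, and under what names.
    public void GetObjectData(SerializationInfo info, StreamingContext context)
    {
        info.AddValue("Owner", m_Owner);
    }
}
```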

3. It is not future-proof for large changes

Type information is stored as part of the serialization information. If you change your class names or strong-name your assemblies, you’re going to hit all sorts of problems. Even if you manage to code the necessary contortions to get round this, you’re going to find that …

4. It is not future-proof for massive changes

.NET isn’t going to be around in five years or so. If you start implementing the ISerializable interface in your code now, then its tendrils are going to be everywhere in five years’ time. Your code is going to be full of little hacks to cope with version changes, class renaming, refactoring, and so on. Some time in the future, .NET will be superseded by something even more wonderful. Nobody knows what this something wonderful will be, but you can bet that writing code to read data serialized by version 1.1 of the .NET Framework is going to be a pig. I wrote some VB6 code five years ago and used the Class_ReadProperties and Class_WriteProperties events to access PropertyBag objects. A neat, easy way of storing information to disk, I thought. And it was, until .NET came along, and then I was stuck.

5. It is not secure

Using XML serialization is inherently insecure. Your classes need to be public, and they need to have public properties or fields. In addition, XML serialization works by creating temporary files. If you think you’re creating only temporary in-memory representations of your data (for example, a string that you’re going to post to a web service), then the files left on disk pose a potential security risk. If, instead, you implement the ISerializable interface and are persisting sensitive internal data, then even if you’re not exposing private data through your classes, anyone can serialize your data to a file of their choosing and read it that way, since GetObjectData is a public method.

6. It is inefficient

XML is verbose. And, if you are using the ISerializable interface, type information gets stored along with data. This makes serialization very expensive in terms of disk space.

7. It is a black box

The odds are you don’t really know how serialization works. I certainly don’t. This means that there are going to be all sorts of quirks and gotchas that you can’t even conceive of when you start using it. Did you know that XML serialization actually uses the CodeDom? When you think you’re creating a bunch of XML, .NET is actually doing some sort of compilation. What are the implications of that? The only thing I know is that I will not know about them until it’s too late.

8. It is slow

When I did some research for a previous article (http://www.devx.com/dotnet/Article/16099/0), I noticed a few interesting things. I wrote a class that contained two double values. I created 100,000 instances of this class, stored them to disk, and then read them back again. I did this two ways. First of all, I did it the “proper” way, by implementing ISerializable, creating a BinaryFormatter, and using the Serialize and Deserialize methods. Secondly, I did it the “dirty” way, by blasting the data straight out into a Stream. Which way was faster? Perhaps not surprisingly, the dirty way. About 50 times faster. Surprised? I was.
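
The “dirty” approach from that comparison looks roughly like this (a sketch, not the article’s actual benchmark code; the Point class is invented):

```csharp
using System.IO;

public class Point
{
    public double X;
    public double Y;

    // Blast the data straight out into the stream: no type information,
    // no formatter, just two doubles.
    public void Write(BinaryWriter writer)
    {
        writer.Write(X);
        writer.Write(Y);
    }

    // Read the fields back in the same order they were written.
    public void Read(BinaryReader reader)
    {
        X = reader.ReadDouble();
        Y = reader.ReadDouble();
    }
}
```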

9. It is weird

ISerializable does a lot of cunning work. This means that it doesn’t necessarily behave the way you might expect. When you deserialize a collection of objects, for example, the constructors won’t get called in the order that you might think. Take the following code sample:
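
A sketch of the kind of code being described (the original sample is not reproduced here, so all names other than m_Collection and "Collection" are guesses):

```csharp
using System;
using System.Collections;
using System.Runtime.Serialization;

[Serializable]
public class Child : ISerializable
{
    public Child() { }

    protected Child(SerializationInfo info, StreamingContext context) { }

    public void GetObjectData(SerializationInfo info, StreamingContext context) { }
}

[Serializable]
public class Parent : ISerializable
{
    private ArrayList m_Collection = new ArrayList();

    public Parent()
    {
        m_Collection.Add(new Child());
    }

    protected Parent(SerializationInfo info, StreamingContext context)
    {
        m_Collection = (ArrayList)info.GetValue("Collection", typeof(ArrayList));
        // Surprise: m_Collection is non-null here, but it does not yet
        // contain any Child objects. They are filled in later, after the
        // formatter has finished fixing up object references.
    }

    public void GetObjectData(SerializationInfo info, StreamingContext context)
    {
        info.AddValue("Collection", m_Collection);
    }
}
```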

This code essentially serializes and deserializes a parent object that contains a collection of child objects. You cannot, however, access the child objects from within the deserialization constructor of the parent object. The m_Collection object has been created, a value has been assigned to it, and info.GetValue("Collection", typeof(ArrayList)) has been called, but the m_Collection object does not yet contain any child objects. This is necessary given the way that serialization works, but it is not obvious behaviour. This, and other things, means that using serialization can be non-intuitive, and very hard to debug.

Have no regrets

Although .NET provides a number of quick and easy ways to serialize and deserialize data, do not use them. A week, a month, a year, or five years down the line you will regret it.

License

This article has no explicit license attached to it but may contain usage terms in the article text or the download files themselves. If in doubt please contact the author via the discussion board below.

Comments and Discussions

It is now five years since this article was written, and where exactly is .NET?
It's here and in your face in many new applications, it's integrated into your operating system, and it's here to stay.
Get used to it. And get used to serializing.

Obviously you've got nothing useful (on the topic of either the article or my post) to contribute.
Your post is not helpful or accurate either. I've been a member for less than 2 years. I'll only post if I feel that my knowledge on the article is of value, not some pointless crap about some other member's status.

You mean, like poking a guy because his prediction didn't turn out right? Pointless like that?

What exactly were you "contributing" to the conversation with your comment?

And for the record, you took my comment wrong. I was criticizing your motive for commenting in the first place, not your experience or lack thereof on CodeProject. I'm trying to say to you the same thing you seem to be accusing me of: "Let's try to be constructive." Gloating about something silly like that is not constructive. Waiting around until his post hit the five year mark, just so you can say "See! You're wrong!" is sad. And I stand by that.

This is an opinion piece and offers no real substance. The author clearly does not know what he's talking about and doesn't bother to offer an alternative to serialization. Perhaps this article was written as a joke? Either way, it should be removed.

Totally agree with that. I've been working with .NET XML serialization for over a year, with a lot of classes that needed to be serialized. The good part is that XML serialization is easy and fast to get going. The bad part is I've never seen a single class that passed through the XML serialization process without a lot of changes that would make any OO programmer sick. For instance:
1) I don’t want my class to have a parameterless constructor. It does not make sense to have one, so don’t force me to.
2) I must avoid the ICollection interface, otherwise the serializer will ignore my properties.
3) All properties have to be get/set, when most of the time, it is not necessary, or doesn’t make sense at all.
4) TimeSpan cannot be serialized (at least up to 2.0), so again you modify your class in order to serialize this information properly. (OK, not very important.)

You should try SerializableAttribute and use BinaryFormatter, or Newtonsoft.Json with IgnoreSerializableAttribute disabled.
That way, no constructor will be called, all fields are serialized (public and private), and no properties are serialized (properties are often just wrappers for private fields). Lots of framework classes (including nearly all collections and dictionaries) are actually properly marked with SerializableAttribute.
Overall this way of serializing keeps the design impact to a minimum, and I've been using it for years.
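
If I read this suggestion right, it refers to Json.NET's DefaultContractResolver.IgnoreSerializableAttribute setting; a sketch of what that might look like (the Order class is invented for illustration):

```csharp
using System;
using Newtonsoft.Json;
using Newtonsoft.Json.Serialization;

[Serializable]
public class Order
{
    private int m_Id = 7;             // private fields get serialized too
    public string Note = "hello";
}

public class Demo
{
    public static void Main()
    {
        // With IgnoreSerializableAttribute set to false, Json.NET honours
        // [Serializable] and serializes fields rather than properties.
        DefaultContractResolver resolver = new DefaultContractResolver();
        resolver.IgnoreSerializableAttribute = false;

        JsonSerializerSettings settings = new JsonSerializerSettings();
        settings.ContractResolver = resolver;

        string json = JsonConvert.SerializeObject(new Order(), settings);
        Order copy = JsonConvert.DeserializeObject<Order>(json, settings);
    }
}
```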

First of all, as said by many others, serialization is not only used for persistence. If you need to send data across a wire (i.e. a network), you need to turn your object into a machine-independent, memory-layout-independent series of bytes to transfer. What other mechanism do you suggest for doing this?
The fact that you cannot guarantee (with only one design iteration) that your object will work in the future is a problem that goes back to the ancient “C++ fragile base class” problem. If the layout of your object changes, code has to be written to deal with this. I don't see how you can avoid this problem no matter what method you use.

After using .NET binary serialization for almost two years, I agree it's a very bad choice, for all the reasons mentioned. I have a horrible performance bottleneck in a system serving a few hundred TCP/IP connections. After intensive investigation it's very apparent that BinaryFormatter is the reason behind it.

You clearly have very little clue what you're talking about. Since BinaryFormatter, as the name suggests, serializes and deserializes in binary format, the overhead associated with it is absolutely minimal. I somewhat doubt the quality of your profiling, but even if your code spends much of its time serializing or deserializing, it does not follow that any other way of serializing would perform better. In most if not all cases the performance of serializing and deserializing objects is limited by the stream's read/write performance, not the actual (de)serialization, since the data is already in binary format and no conversion at all takes place.

In terms of native object serialization, you are asking for trouble, and it should be avoided for anything other than a cheesy network-send hack. However, XML as a standard is not going anywhere, and things will be stored and serialized to and from it for years to come, regardless of whether .NET exists.

The structure of the XML should be designed with extensibility in mind, and any future-proofing should be up to the developer. As for having to 'design' it into your application: of course you do; any saving mechanism requires design to implement, whether binary, XML, database or JSON. However, the mechanical components of your application should never be serialized directly; that is extremely bad practice.

If you create an XmlDocument and write nodes to it (which, incidentally, is what the XmlSerializer does), you are still locking yourself into the same format as using the XmlSerializer, but you can't change it as easily.

At the end of the day, serialization is just a storage medium, for either short or long term. If you want to secure it, encrypt the stream as you write it to a file.

The problem with XML is that it is very slow to format and parse. Very slow. If you sent a CSV text file you'd be much better off. If you removed the commas and replaced them with character-count prefixes for each string, you'd be better off still.

You can quite happily write to a binary stream instead and transmit that (remember to convert to network byte order, like every TCP/IP stream should), and you'll be platform-independent with only a little bit of work... but your stream will be a fraction of the size of your XML document, and will be significantly faster to serialise. If you think that today's computing resources are so fast that the overhead of using XML is acceptable, then you haven't been sending it over a wireless or mobile connection.
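
That conversion to network byte order might look like this in C# (a sketch; the Wire class is invented, and the read assumes the full four bytes are available):

```csharp
using System;
using System.IO;
using System.Net;

public class Wire
{
    // Write an int in network byte order (big-endian), as every
    // TCP/IP stream should, so any platform can read it back.
    public static void WriteInt32(Stream stream, int value)
    {
        byte[] bytes = BitConverter.GetBytes(IPAddress.HostToNetworkOrder(value));
        stream.Write(bytes, 0, bytes.Length);
    }

    // Read four bytes and convert back to host byte order.
    public static int ReadInt32(Stream stream)
    {
        byte[] bytes = new byte[4];
        stream.Read(bytes, 0, 4);
        return IPAddress.NetworkToHostOrder(BitConverter.ToInt32(bytes, 0));
    }
}
```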

XML is 'cool', but it's a pretty poor solution to almost every task it's used for.

My point is that serialization is essential as a tool for persistence of hierarchical data, regardless of the medium used.

The type of serialization is implementation-dependent, but serializing to XML is perfectly acceptable for long-term storage on a file system. Similarly, it would not be appropriate for short-term persistence over a low-bandwidth connection.

gbjbaanb wrote:

The problem with XML is that it is very slow to format and parse. Very slow. If you sent a CSV text file you'd be much better off. If you removed the commas and replaced them with character-count prefixes for each string, you'd be better off still.

I completely agree if your emphasis is on speed and minimising size. But premature optimisation wastes development time.

Sure, XML is a good document format. It's good at persisting documents.

It's not good for a lot of the other things it's used for, probably because everyone and his dog thinks XML = magic bullet that solves all problems.

The internet began as a way of transmitting documents, and HTML worked well for that, but once you start sending data, especially dynamic data, it starts to fall apart. You don't see anyone converting their data streams to HTML and sending fully-parsed documents to the browser, do you? No, the internet would become even slower!

For a good example, look at the differences between XML and JSON or YAML. XML requires you to read the entire document into memory, is bloated with tags, and is inefficient to read and write. JSON is a lot more stream-friendly and a lot smaller.

XML was a good first step in document formats, and had the handy ability to be usable for object serialisation. People have seen its disadvantages and created better formats since then.

Any managed Framework assembly can be disassembled with a tool like Reflector (http://www.aisto.com/roeder/dotnet/) into your favorite language. Personally, I think the .NET Framework is an open book. If you want to know what's going on under the hood, just disassemble it and look for yourself!

See "custom serialization" in msdn library. (I know this didn't exist when the article was written).

The best practice and easiest way (introduced in version 2.0 of the .Net Framework) is to apply the following attributes to methods that are used to correct data during and after serialization:

OnDeserializedAttribute

OnDeserializingAttribute

OnSerializedAttribute

OnSerializingAttribute
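
In C#, those callbacks look like this (the Document class and field names are illustrative):

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;

[Serializable]
public class Document
{
    private string m_Path;

    [NonSerialized]
    private FileInfo m_File;   // derived state, not persisted

    public Document(string path)
    {
        m_Path = path;
        m_File = new FileInfo(path);
    }

    // Callback methods must return void and take a single StreamingContext,
    // or the formatter throws at runtime.
    [OnDeserialized]
    private void AfterDeserialization(StreamingContext context)
    {
        // Correct data after deserialization: rebuild the non-serialized field.
        m_File = new FileInfo(m_Path);
    }
}
```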

Using attributes should help to decouple your OO design from the lock-ins of the platform; i.e. you mention that in five years' time you don't want your code to be littered with references to .NET serialization interfaces... attributes are a level higher.
For example, you could tag your save method:

[OnSerializedAttribute]
public void Save(StreamingContext context) { }

That pattern is relevant to every platform; you would simply follow the new platform's convention for invoking the save method.

The fact that you make the comment '.NET won't be around in 5 years' really takes away any credibility that this article has. The article was written almost 3 years ago (?), so you've almost been proved wrong, if you haven't already conceded that... .NET 3.0 has recently been released and the platform/framework is very rich in features.
You can imagine as dotnet goes forward more emphasis will be put on performance, any tuning to the framework means we get a performance boost - for nothing!

I'm a little scared to see you're in the business of promoting MS products if this is your attitude.

Of course you need to write classes a certain way to enable them to be serialisable. Anyone who has experience in object-oriented programming (no, VB6 and below does not count, and in some cases neither does .NET) will know what serialisability is for. Learn OO, and then you will see serialising is a good thing.

2, 3 and 4: It is not future-proof for small, large or massive changes

Serialising allows you to transfer data objects between applications, making the data portable from one program to another. I.e., use a class library for your object model; when the need arises to use the data from a previous application, you utilise this library of classes to read the data.

When a class becomes redundant and you deprecate its use, or a new class is needed, don't write a whole new class from scratch; use OO programming techniques and extend the class you already have. This means that new fields, properties and methods can be implemented, and older data can be read into the new class.

5. It is not secure

There are various techniques to obfuscate the data. Don't be so naive.

6. It is inefficient

Yes serialisation can be inefficient BUT this TOO can be streamlined.

7. It is a black box

Serialisation is an open book. It takes the properties and stores them to disk, a stream, a database, etc. Extremely black and white.

8. It is slow

Once again this can be streamlined.

9. It is weird

Yes, constructors don't get called, because you're not creating an instance of the object; the object is being loaded into another instance. Simple.