Saturday, November 2, 2013

Serialization in Java

Imagine you want to save the state of one or more objects. Without serialization, you would use one of the I/O classes to write out the state of the instance variables of every object you want to save. The worst part is trying to reconstruct a new object that is identical to the one you saved. You need some protocol for the way you write and restore the state of each object, or you may end up assigning the wrong values to variables. For instance, when saving an object that has instance variables for height and weight, you might write the height and weight as two ints in a file, but the order in which you wrote them is significant. It would be all too easy to re-create the object but mix up the height and weight values, using the saved height as the value for the new object's weight and vice versa.

Serialization lets you simply say "save this object and all of its instance variables," unless you have marked a variable as transient, which means "don't save this variable's value."

The magic of basic serialization happens with just two methods: one to serialize objects and write them to a stream, and a second to read the stream and deserialize objects.

ObjectOutputStream.writeObject() // serialize and write
ObjectInputStream.readObject() // read and deserialize

A class that needs special handling can also define its own private writeObject and readObject methods. Such a writeObject method is responsible for writing the state of the object for its particular class so that the corresponding readObject method can restore it. The method does not need to concern itself with the state belonging to the object's superclasses or subclasses. State is saved by writing the individual fields to the ObjectOutputStream using the writeObject method or by using the methods for primitive data types supported by DataOutput.

The readObject method is responsible for reading and restoring the state of the object for its particular class using data written to the stream by the corresponding writeObject method. The method does not need to concern itself with the state belonging to its superclasses or subclasses. State is restored by reading data from the ObjectInputStream for the individual fields and making assignments to the appropriate fields of the object. Reading primitive data types is supported by DataInput.

Let's have a basic example before going further.
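The original listing is not reproduced here, so the following is only a minimal sketch of what serializing an object might look like. The class name SerializeCar, the helper save(), and the initial value of 'a' are assumptions made for illustration; the Car class, its variable 'a', and the file name "javalatte.ser" follow the surrounding text.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerializeCar {

    // Hypothetical class; the field name 'a' follows the discussion in the text.
    static class Car implements Serializable {
        private static final long serialVersionUID = 1L;
        int a = 10;
    }

    // Wraps a FileOutputStream in an ObjectOutputStream and writes the Car.
    static void save(Car car, String fileName) throws IOException {
        try (ObjectOutputStream out =
                new ObjectOutputStream(new FileOutputStream(fileName))) {
            out.writeObject(car);
        }
    }

    public static void main(String[] args) throws IOException {
        save(new Car(), "javalatte.ser");
        System.out.println("Car saved to javalatte.ser");
    }
}
```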

After running this program you can see the file "javalatte.ser" in your current directory, where you ran the program. This is how we can serialize an object.
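Reading the object back could look like the sketch below. To stay self-contained it writes the file itself before reading it; the class name DeserializeCar, the helpers save() and load(), and the value 25 are assumptions for illustration only.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class DeserializeCar {

    // Hypothetical class; the field name 'a' follows the discussion in the text.
    static class Car implements Serializable {
        private static final long serialVersionUID = 1L;
        int a = 10;
    }

    static void save(Car car, String fileName) throws IOException {
        try (ObjectOutputStream out =
                new ObjectOutputStream(new FileOutputStream(fileName))) {
            out.writeObject(car);
        }
    }

    // readObject() returns Object, so the result is cast back to Car.
    static Car load(String fileName) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in =
                new ObjectInputStream(new FileInputStream(fileName))) {
            return (Car) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Car original = new Car();
        original.a = 25;
        save(original, "javalatte.ser");

        Car restored = load("javalatte.ser");
        System.out.println("a = " + restored.a); // prints "a = 25"
    }
}
```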

This is how we can de-serialize the object; you can see the value of variable 'a' by running this program.

Let's go through the above program:

We declare that the Car class implements the Serializable interface. Serializable is a marker interface; it has no methods to implement.

We make a new Car object, which as we know is serializable.

We serialize the Car object by invoking the writeObject() method.

Before that, we had to create a FileOutputStream to write the object to. Then we wrapped the FileOutputStream in an ObjectOutputStream, which is the class that has the magic serialization method that we need.

We de-serialize the Car object by invoking the readObject() method. The readObject() method returns an Object, so we have to cast the deserialized object back to a Car.

Point to remember:

Serialization does not write out the fields of any object that does not implement the java.io.Serializable interface. Subclasses of classes that are not serializable can themselves be serializable. In this case the non-serializable class must have a no-arg constructor to allow its fields to be initialized, and it is the responsibility of the subclass to save and restore the state of the non-serializable class. It is frequently the case that the fields of that class are accessible (public, package, or protected) or that there are get and set methods that can be used to restore the state.

What is Saved?

The class of the object.

The class signature of the object.

All instance variables not declared transient.

Objects referred to by non-transient instance variables.

If a duplicate object occurs when traversing the graph of references, only one copy is saved, but references are coded so that the duplicate links can be restored.
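The last point can be checked with a small experiment. The sketch below uses hypothetical Engine and Garage classes (not from the original post) in which two fields refer to the same object, and confirms that the link is still shared after a round trip through a byte array.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SharedReferenceDemo {

    // Hypothetical classes used only for this illustration.
    static class Engine implements Serializable {
        int horsepower = 120;
    }

    static class Garage implements Serializable {
        Engine first;
        Engine second;
    }

    // Serializes the object to memory and reads it straight back.
    static Garage roundTrip(Garage garage) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(garage);
        }
        try (ObjectInputStream in =
                new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Garage) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Engine shared = new Engine();
        Garage garage = new Garage();
        garage.first = shared;
        garage.second = shared; // same object referenced twice

        Garage copy = roundTrip(garage);
        System.out.println(copy.first == copy.second); // true: one Engine, two links
        System.out.println(copy.first == shared);      // false: the copy is a new object
    }
}
```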

Uses of Serialization

Means to make objects persistent.

Means to communicate objects over a network.

Means to make a copy of an object.
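As an illustration of the third use, a deep copy can be made by serializing an object to memory and reading it back. This is only a sketch: the helper name deepCopy is an assumption, and every object in the graph must be serializable for it to work.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;

public class DeepCopyDemo {

    // Hypothetical helper: copies an object graph via an in-memory stream.
    static <T extends Serializable> T deepCopy(T original)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(original);
        }
        try (ObjectInputStream in =
                new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            @SuppressWarnings("unchecked")
            T copy = (T) in.readObject();
            return copy;
        }
    }

    public static void main(String[] args) throws Exception {
        ArrayList<ArrayList<String>> original = new ArrayList<>();
        ArrayList<String> inner = new ArrayList<>();
        inner.add("a");
        original.add(inner);

        ArrayList<ArrayList<String>> copy = deepCopy(original);
        copy.get(0).add("b"); // changes only the copy's inner list

        System.out.println(original); // [[a]]
        System.out.println(copy);     // [[a, b]]
    }
}
```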

Main Serialization Methods

Saving Objects

public ObjectOutputStream(OutputStream out) throws IOException
Also throws: SecurityException

public final void writeObject(Object obj) throws IOException
Also throws: InvalidClassException, NotSerializableException

public void flush() throws IOException
Writes any buffered output bytes and flushes through to the underlying stream.

An ObjectOutputStream writes primitive data types and graphs of Java objects to an OutputStream. The objects can be read (reconstituted) using an ObjectInputStream. Persistent storage of objects can be accomplished by using a file for the stream. If the stream is a network socket stream, the objects can be reconstituted on another host or in another process.

Only objects that support the java.io.Serializable interface can be written to streams.

The default serialization mechanism for an object writes the class of the object, the class signature, and the values of all non-transient and non-static fields. References to other objects (except in transient or static fields) cause those objects to be written also.

In the simplest and most common case, serializing an object involves doing two things: creating an ObjectOutputStream and calling writeObject() with a single "top-level" instance.

The methods implemented by ObjectOutputStream can be grouped into three categories: methods that write information to the stream, methods used to control the stream's behavior, and methods used to customize the serialization algorithm.

Methods that write information to the stream

public void write(byte[] b);

public void write(byte[] b, int off, int len);

public void write(int data);

public void writeBoolean(boolean data);

public void writeByte(int data);

public void writeBytes(String data);

public void writeChar(int data);

public void writeChars(String data);

public void writeDouble(double data);

public void writeFields( );

public void writeFloat(float data);

public void writeInt(int data);

public void writeLong(long data);

public void writeObject(Object obj);

public void writeShort(int data);

public void writeUTF(String s);

public void defaultWriteObject( );

The stream manipulation methods

public void reset( );

public void close( );

public void flush( );

public void useProtocolVersion(int version);

Methods that customize the serialization mechanism

public ObjectOutputStream.PutField putFields( );

protected void annotateClass(Class cl);

protected void annotateProxyClass(Class cl);

protected boolean enableReplaceObject(boolean enable);

protected Object replaceObject(Object obj);

protected void drain( );

protected void writeObjectOverride(Object obj);

protected void writeClassDescriptor(ObjectStreamClass classdesc);

protected void writeStreamHeader( );

These methods are more important to people who tailor the serialization algorithm to a particular use or develop their own implementation of serialization.

ObjectInputStream

An ObjectInputStream deserializes primitive data and objects previously written using an ObjectOutputStream. ObjectInputStream is used to recover those objects previously serialized. ObjectInputStream ensures that the types of all objects in the graph created from the stream match the classes present in the Java Virtual Machine. Classes are loaded as required using the standard mechanisms.

Only objects that support the java.io.Serializable or java.io.Externalizable interface can be read from streams.

The method readObject is used to read an object from the stream. Java's safe casting should be used to get the desired type. In Java, strings and arrays are objects and are treated as objects during serialization. When read they need to be cast to the expected type.

Primitive data types can be read from the stream using the appropriate method on DataInput.
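The sketch below shows primitives written with the DataOutput-style methods of ObjectOutputStream and read back with the matching DataInput-style methods of ObjectInputStream. The values must be read in the same order in which they were written; the class and helper names are assumptions for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

public class PrimitiveFieldsDemo {

    // Writes two ints and a string, in that order.
    static byte[] write(int height, int weight, String name) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeInt(height);
            out.writeInt(weight);
            out.writeUTF(name);
        }
        return bytes.toByteArray();
    }

    // Reads the values back in exactly the order they were written.
    static String read(byte[] data) throws IOException {
        try (ObjectInputStream in =
                new ObjectInputStream(new ByteArrayInputStream(data))) {
            int height = in.readInt();
            int weight = in.readInt();
            String name = in.readUTF();
            return name + " height=" + height + " weight=" + weight;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(read(write(180, 75, "Sam"))); // Sam height=180 weight=75
    }
}
```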

The default deserialization mechanism for objects restores the contents of each field to the value and type it had when it was written. Fields declared as transient or static are ignored by the deserialization process.

Reading an object is analogous to running the constructors of a new object. Memory is allocated for the object and initialized to zero (null). No-arg constructors are invoked for the non-serializable classes, and then the fields of the serializable classes are restored from the stream, starting with the serializable class closest to java.lang.Object and finishing with the object's most specific class.

How Serialization works in case of Object Graphs

When saving an object whose instance variables are all primitive types, things are straightforward. But what if the instance variables are themselves references to objects? What gets saved? For instance, suppose we have a Dog class that holds a reference to a Collar:

class Dog {
    private Collar theCollar;
    .....
}

Now make a dog. First, you make a Collar for the Dog:

Collar c = new Collar(3);

Then make a new Dog, passing it the Collar:

Dog d = new Dog(c, 8);

Now what happens if you save the Dog? If our purpose is to save and then restore a Dog, and the restored Dog is to be an exact duplicate of the Dog that was saved, then the Dog needs a Collar that is an exact duplicate of the Dog's Collar at the time the Dog was saved. That means both the Dog and the Collar should be saved. And what if the Collar itself had references to other objects, like perhaps a Color object? This gets quite complicated very quickly.

Fortunately, the Java serialization mechanism takes care of all of this. When you serialize an object, Java serialization takes care of saving that object's entire "object graph." That means a deep copy of everything the saved object needs to be restored. For example, if you serialize a Dog object, the Collar will be serialized automatically. And if the Collar class contained a reference to another object, THAT object would also be serialized, and so on.

Let's see an example of this
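The original listing is not available, so here is a hedged reconstruction of a first attempt in which neither Dog nor Collar implements Serializable. The class name SerializedDog and the file name "Dog.ser" follow the text; the serialize() helper and the exact class bodies are assumptions.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.OutputStream;

public class SerializedDog {

    // Neither class implements Serializable yet, so writeObject() will fail.
    static class Collar {
        private int collarSize;
        Collar(int size) { collarSize = size; }
        int getCollarSize() { return collarSize; }
    }

    static class Dog {
        private Collar theCollar;
        private int dogSize;
        Dog(Collar collar, int size) { theCollar = collar; dogSize = size; }
        Collar getCollar() { return theCollar; }
    }

    // Hypothetical helper: writes the Dog to whatever stream it is given.
    static void serialize(Dog dog, OutputStream target) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(target)) {
            out.writeObject(dog);
        }
    }

    public static void main(String[] args) {
        Dog d = new Dog(new Collar(3), 8);
        System.out.println("Before serializing Dog...");
        try (FileOutputStream file = new FileOutputStream("Dog.ser")) {
            serialize(d, file);
        } catch (IOException e) {
            e.printStackTrace(); // a NotSerializableException naming the Dog class
        }
    }
}
```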

Sample output

Before serializing Dog...

java.io.NotSerializableException: com.Serialization.Dog

at java.io.ObjectOutputStream.writeObject0(Unknown Source)

at java.io.ObjectOutputStream.writeObject(Unknown Source)

at com.Serialization.SerializedDog.main(SerializedDog.java:15)

We got this exception because we forgot to implement the Serializable interface. Change the above code:

public class Dog implements Serializable {

....

}

Now we run it again. Sample output:

Before serializing Dog...

java.io.NotSerializableException: com.Serialization.Collar

at java.io.ObjectOutputStream.writeObject0(Unknown Source)

at java.io.ObjectOutputStream.defaultWriteFields(Unknown Source)

at java.io.ObjectOutputStream.writeSerialData(Unknown Source)

at java.io.ObjectOutputStream.writeOrdinaryObject(Unknown Source)

at java.io.ObjectOutputStream.writeObject0(Unknown Source)

at java.io.ObjectOutputStream.writeObject(Unknown Source)

at com.Serialization.SerializedDog.main(SerializedDog.java:14)

What did we forget? The Collar class must ALSO be Serializable. If we modify the Collar class and make it serializable, then there's no problem:

public class Collar implements Serializable {..}

Now we get a file in the current directory named "Dog.ser".

Sample Output

after Deserializing dog state Collar size: 3
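With both fixes applied, a complete version might look like the sketch below. It is a reconstruction under assumptions (the class name SerializedDogFixed and the saveAndRestore() helper are invented for illustration), not the original listing.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerializedDogFixed {

    static class Collar implements Serializable {
        private int collarSize;
        Collar(int size) { collarSize = size; }
        int getCollarSize() { return collarSize; }
    }

    static class Dog implements Serializable {
        private Collar theCollar;
        private int dogSize;
        Dog(Collar collar, int size) { theCollar = collar; dogSize = size; }
        Collar getCollar() { return theCollar; }
    }

    // Writes the Dog (and, automatically, its Collar) and reads it back.
    static Dog saveAndRestore(Dog dog, String fileName)
            throws IOException, ClassNotFoundException {
        try (ObjectOutputStream out =
                new ObjectOutputStream(new FileOutputStream(fileName))) {
            out.writeObject(dog);
        }
        try (ObjectInputStream in =
                new ObjectInputStream(new FileInputStream(fileName))) {
            return (Dog) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Dog restored = saveAndRestore(new Dog(new Collar(3), 8), "Dog.ser");
        System.out.println("after Deserializing dog state Collar size: "
                + restored.getCollar().getCollarSize());
    }
}
```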

But what if making the Collar class serializable was not an option? Are we stuck with a non-serializable Dog? What do you do then if you want to save a Dog?

That's where the transient modifier comes in. If you mark the Dog's Collar instance variable with transient, then serialization will simply skip the Collar during serialization.

Let's change the above code and see the output.
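A sketch of that change, assuming a Collar class that cannot be made serializable: the Dog now serializes without error, but its Collar reference comes back as null. The class name TransientCollarDemo and the in-memory roundTrip() helper are assumptions for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class TransientCollarDemo {

    // Assumed to be outside our control, so it stays non-serializable.
    static class Collar {
        private int collarSize;
        Collar(int size) { collarSize = size; }
        int getCollarSize() { return collarSize; }
    }

    static class Dog implements Serializable {
        private transient Collar theCollar; // skipped during serialization
        private int dogSize;
        Dog(Collar collar, int size) { theCollar = collar; dogSize = size; }
        Collar getCollar() { return theCollar; }
    }

    static Dog roundTrip(Dog dog) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(dog);
        }
        try (ObjectInputStream in =
                new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Dog) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Dog restored = roundTrip(new Dog(new Collar(3), 8));
        // The transient field was not saved, so it holds its default value.
        System.out.println("Collar after deserialization: " + restored.getCollar());
        // Calling restored.getCollar().getCollarSize() here would throw
        // a NullPointerException.
    }
}
```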

You will get the following output when you de-serialize the Dog object.

java.lang.NullPointerException

at com.Serialization.DeSerializedDog.main(DeSerializedDog.java:14)

So NOW what can we do? This is where writeObject and readObject come into the picture.

The Dog has a Collar, and the Collar has state that should also be saved as part of the Dog's state. But…the Collar is not Serializable, so we must mark it transient. That means when the Dog is deserialized, it comes back with a null Collar. What can we do to somehow make sure that when the Dog is deserialized, it gets a new Collar that matches the one the Dog had when the Dog was saved?

Java serialization has a special mechanism just for this—a set of private methods you can implement in your class that, if present, will be invoked automatically during serialization and deserialization. It's almost as if the methods were defined in the Serializable interface, except they aren't.

These methods let you step into the middle of serialization and deserialization. So they're perfect for letting you solve the Dog/Collar problem: when a Dog is being saved, you can step into the middle of serialization and say, "By the way, I'd like to add the state of the Collar's variable (an int) to the stream when the Dog is serialized." You've manually added the state of the Collar to the Dog's serialized representation, even though the Collar itself is not saved.

Of course, you'll need to restore the Collar during deserialization by stepping into the middle and saying, "I'll read that extra int I saved to the Dog stream, and use it to create a new Collar, and then assign that new Collar to the Dog that's being deserialized."

Now we will change our Dog class, which currently marks its Collar field as transient, so that it implements the writeObject and readObject methods.
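The original listing is missing, so the following is a sketch of how the final Dog class might look. The private writeObject and readObject signatures are the ones the serialization mechanism looks for; the class name CustomDogDemo and the roundTrip() helper are assumptions for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class CustomDogDemo {

    // Still not serializable.
    static class Collar {
        private int collarSize;
        Collar(int size) { collarSize = size; }
        int getCollarSize() { return collarSize; }
    }

    static class Dog implements Serializable {
        private transient Collar theCollar;
        private int dogSize;
        Dog(Collar collar, int size) { theCollar = collar; dogSize = size; }
        Collar getCollar() { return theCollar; }

        private void writeObject(ObjectOutputStream out) throws IOException {
            out.defaultWriteObject();               // normal serialization first
            out.writeInt(theCollar.getCollarSize()); // then the extra int
        }

        private void readObject(ObjectInputStream in)
                throws IOException, ClassNotFoundException {
            in.defaultReadObject();                 // normal deserialization first
            theCollar = new Collar(in.readInt());   // rebuild the Collar manually
        }
    }

    static Dog roundTrip(Dog dog) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(dog);
        }
        try (ObjectInputStream in =
                new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Dog) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Dog restored = roundTrip(new Dog(new Collar(3), 8));
        System.out.println("after Deserializing dog state Collar size: "
                + restored.getCollar().getCollarSize());
    }
}
```

The extra int must be read in readObject at the same position at which writeObject wrote it, which is why both methods call the default mechanism first.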

Sample output when you de-serialize the object after serializing it:

after Deserializing dog state Collar size: 3

Points to be noted for the above code:

When you invoke defaultWriteObject() from within writeObject(), you're telling the JVM to do the normal serialization process for this object. When implementing writeObject(), you will typically request the normal serialization process and do some custom writing and reading too.

Remember, the most common reason to implement writeObject() and readObject() is when you have to save some part of an object's state manually.

How Inheritance Affects Serialization

If a superclass is Serializable, then according to normal Java interface rules, all subclasses of that class automatically implement Serializable implicitly. But what happens if a superclass is not marked Serializable, but the subclass is? Can the subclass still be serialized even if its superclass does not implement Serializable?

To fully understand the implications, look at the difference between an object that comes from deserialization and an object created using new. When an object is created using new, the following things happen, in this order:

All instance variables are assigned default values.

The constructor is invoked, which immediately invokes the superclass constructor or another overloaded constructor, until one of the overloaded constructors invokes the superclass constructor.

All superclass constructors complete.

Instance variables that are initialized as part of their declaration are assigned their initial value (as opposed to the default values they're given prior to the superclass constructors completing).

The constructor completes.

But these things do NOT happen when an object is deserialized. When an instance of a serializable class is deserialized, the constructor does not run, and instance variables are NOT given their initially assigned values.

For example, imagine you have a class Foo that declares an instance variable num and assigns it the int value 3, and includes a method changeNum() that changes the instance variable's value to 10.
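Such a class could be sketched as follows. The names Foo, num, and changeNum() come from the text; the wrapper class FooDemo, the constructor message, and the roundTrip() helper are assumptions added so the behavior can be observed.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class FooDemo {

    static class Foo implements Serializable {
        int num = 3; // initial value assigned in the declaration

        Foo() {
            System.out.println("Foo constructor runs");
        }

        void changeNum() {
            num = 10;
        }
    }

    static Foo roundTrip(Foo foo) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(foo);
        }
        try (ObjectInputStream in =
                new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Foo) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Foo foo = new Foo();  // the constructor message is printed once, here
        foo.changeNum();
        Foo restored = roundTrip(foo); // no constructor message this time
        System.out.println("num after deserialization: " + restored.num); // 10
    }
}
```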

Obviously if you serialize a Foo instance after the changeNum() method runs, the value of the num variable should be 10. When the Foo instance is deserialized, you want the num variable to still be 10! You obviously don't want the initialization to happen.

The point is, when an object is deserialized we do NOT want any of the normal initialization to happen. We don't want the constructor to run, and we don't want the explicitly declared values to be assigned. We want only the values saved as part of the serialized state of the object to be reassigned. Of course, if you have variables marked transient, they will not be restored to their original state (unless you implement readObject()), but will instead be given the default value for that data type.

To understand this, suppose we have an Animal class which is not serializable and a Dog class which extends Animal and is serializable.

Because Animal is NOT serializable, any state maintained in the Animal class, even though the state variable is inherited by the Dog, isn't going to be restored with the Dog when it's deserialized! The reason is that the (unserialized) Animal part of the Dog is going to be reinitialized just as it would be if you were making a new Dog. That means all the things that happen to an object during construction will happen, but only to the Animal parts of a Dog.

This example will explain which variable will and will not be restored with the appropriate values when an object is deserialized.
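The original listing is missing; the sketch below is a reconstruction under assumptions. The weight variable comes from the text, while the name field, the concrete values 42 and 35, and the class name InheritanceDemo are invented for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class InheritanceDemo {

    // Not serializable; its implicit no-arg constructor runs on deserialization.
    static class Animal {
        int weight = 42;
    }

    static class Dog extends Animal implements Serializable {
        String name;

        Dog(int weight, String name) {
            this.weight = weight; // inherited state, NOT saved
            this.name = name;     // Dog's own state, saved
        }
    }

    static Dog roundTrip(Dog dog) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(dog);
        }
        try (ObjectInputStream in =
                new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Dog) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Dog d = new Dog(35, "Fido");
        System.out.println("before: " + d.name + " " + d.weight); // before: Fido 35

        Dog restored = roundTrip(d);
        System.out.println("after: " + restored.name + " " + restored.weight); // after: Fido 42
    }
}
```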

The key here is that because Animal is not serializable, when the Dog was deserialized, the Animal constructor ran and reset the Dog's inherited weight variable.

Serialization Is Not for Statics

You should think of static variables purely as CLASS variables. They have nothing to do with individual instances, and serialization applies only to OBJECTS. What would happen if you deserialized three different Dog instances, all of which were serialized at different times, and all of which were saved when the value of a static variable in class Dog was different? Which instance would "win"? Which instance's static value would be used to replace the one currently in the one and only Dog class that's currently loaded? See the problem? Static variables are NEVER saved as part of the object's state, because they do not belong to the object!

Saving static variables with each serialized object would also cause the following problems:

It would make a redundant copy of the same variable in multiple objects, which is inefficient.

The static variable can be modified by any object, so a serialized copy would be stale or out of sync with the current value.
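A small experiment makes this visible. In the sketch below (the kennel field and all names are hypothetical), the static value is changed after the Dog has been serialized; deserializing the Dog does not bring the old static value back.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class StaticFieldDemo {

    static class Dog implements Serializable {
        static String kennel = "North"; // class-level state, never serialized
        String name;                    // instance state, serialized

        Dog(String name) { this.name = name; }
    }

    static byte[] toBytes(Dog dog) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(dog);
        }
        return bytes.toByteArray();
    }

    static Dog fromBytes(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in =
                new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (Dog) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Dog.kennel = "North";
        byte[] saved = toBytes(new Dog("Rex"));

        Dog.kennel = "South"; // changed after the Dog was saved
        Dog restored = fromBytes(saved);

        System.out.println(restored.name + " " + Dog.kennel); // Rex South
    }
}
```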

As simple as serialization code is to write, versioning problems can occur in the real world. If you save a Dog object using one version of the class, but attempt to deserialize it using a newer, different version of the class, deserialization might fail.

Version ID

To avoid incompatible changes, each class has a version ID that is included, along with its fully qualified name, with each serialized object of the class. This number is known as the stream unique identifier (SUID).

The SUID is a hash value of type long whose computation depends on the signature of the class members that are neither static nor transient.

The value can be set explicitly by giving a value to the static final field serialVersionUID.

The JDK comes with a utility command, serialver, that reports the version ID of a class.

Three ways to generate SerialVersionUID

serialver classname - utility command

If you are using Eclipse, hover over the name of the serializable class; the warning's quick fix offers to generate a serialVersionUID.

Just specify your own serialVersionUID: give a number and append an “L” to it.

private static final long serialVersionUID = 6L;
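To see which version ID a class ends up with, you can ask its ObjectStreamClass descriptor. The sketch below uses two hypothetical classes, one with an explicit serialVersionUID and one without; for the second, the printed value is the computed default and depends on the class definition.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

public class VersionIdDemo {

    // Hypothetical class with an explicit version ID.
    static class Versioned implements Serializable {
        private static final long serialVersionUID = 6L;
        int value;
    }

    // Hypothetical class that relies on the computed default.
    static class Unversioned implements Serializable {
        int value;
    }

    // Returns the version ID the serialization runtime uses for the class.
    static long suidOf(Class<?> type) {
        return ObjectStreamClass.lookup(type).getSerialVersionUID();
    }

    public static void main(String[] args) {
        System.out.println(suidOf(Versioned.class));   // 6
        System.out.println(suidOf(Unversioned.class)); // computed hash value
    }
}
```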

InvalidClassException

If an attempt is made to deserialize an object whose stored value of serialVersionUID disagrees with the value belonging to the current version of the class, an InvalidClassException is thrown.

Suppose we want to alter a class definition in a way that has no substantial effect on the deserialization process. If the version ID changes as a result, deserialization is still prevented.

at com.Serialization.CattyDeSerialVersion.main(CattyDeSerialVersion.java:11)

Exception in thread "main" java.lang.NullPointerException

at com.Serialization.CattyDeSerialVersion.main(CattyDeSerialVersion.java:17)

How Serialization Detects When a Class Has Changed

In order for serialization to gracefully detect when a versioning problem has occurred, it needs to be able to detect when a class has changed. As with all the other aspects of serialization, there is a default way that serialization does this. And there is a way for you to override the default.

The default involves a hashcode. Serialization creates a single hashcode, of type long, from the following information:

The class name and modifiers

The names of any interfaces the class implements

Descriptions of all methods and constructors except private methods and constructors

Descriptions of all fields except private static and private transient fields

This single long, called the class's stream unique identifier (often abbreviated suid), is used to detect when a class changes. It is an extraordinarily sensitive index.

How to Serialize and De-serialize a Singleton Object

A serializable class can define a readResolve() method. By implementing the readResolve method, a class can directly control the types and instances of its own instances being deserialized.

readResolve() lets a class preserve the singleton contract during deserialization.

The readResolve method is called when ObjectInputStream has read an object from the stream and is preparing to return it to the caller.

ObjectInputStream checks whether the class of the object defines the readResolve method. If the method is defined, the readResolve method is called to allow the object in the stream to designate the object to be returned. The object returned should be of a type that is compatible with all uses.

If it is not compatible, a ClassCastException will be thrown when the type mismatch is discovered.

For example, a Singleton class could be created for which only a single instance exists within a virtual machine. The readResolve method would be implemented to determine whether that Singleton was already defined and to substitute the preexisting equivalent Singleton object to maintain the identity constraint. In this way the uniqueness of Singleton objects can be maintained across serialization.

readResolve() is used for replacing the object read from the stream. A typical use is enforcing singletons: when an object is read, replace it with the singleton instance. This ensures that nobody can create another instance by serializing and deserializing the singleton.
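A minimal sketch of the idea, using a hypothetical Registry singleton: readResolve() returns the existing instance, so the object that comes out of the stream is discarded in its favor.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SingletonDemo {

    // Hypothetical singleton used only for this illustration.
    static class Registry implements Serializable {
        private static final long serialVersionUID = 1L;
        static final Registry INSTANCE = new Registry();

        private Registry() { }

        // Called by ObjectInputStream after the object has been read;
        // whatever is returned here is handed to the caller instead.
        private Object readResolve() {
            return INSTANCE;
        }
    }

    static Registry roundTrip(Registry registry)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(registry);
        }
        try (ObjectInputStream in =
                new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Registry) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Registry restored = roundTrip(Registry.INSTANCE);
        System.out.println(restored == Registry.INSTANCE); // true
    }
}
```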

I hope that by now you have got the idea of the serialization process in Java. Now let's summarize the above in the form of questions and answers.

Questions and Answers

If class A does not implement Serializable but a subclass B implements Serializable, will the fields of class A be serialized when B is serialized?
Ans: Only the fields of Serializable objects are written out and restored. The object may be restored only if it has a no-arg constructor that will initialize the fields of non-serializable supertypes. If the subclass has access to the state of the superclass, it can implement writeObject and readObject to save and restore that state.

Does object serialization support encryption?
Ans: Object serialization does not contain any encryption/decryption in itself. It writes to and reads from Java streams, so it can be coupled with any available encryption technology.

What do you mean by Serialization in Java?
Ans: Serialization is a mechanism by which you can save or transfer the state of an object by converting it to a byte stream. This can be done in Java by implementing the Serializable interface. Serializable is a marker interface which needs to be implemented for transferring an object over a network or persisting its state to a file.

Why is Serialization required?
Ans: See the "Uses of Serialization" section above.

What is the Difference between Externalizable and Serializable Interfaces?
Ans: Serializable is a marker interface, so you are not forced to implement any methods. Externalizable, however, contains two methods, readExternal() and writeExternal(), which must be implemented. The Serializable interface provides a built-in serialization mechanism, which can be inefficient at times. The Externalizable interface is designed to give you greater control over the serialization mechanism, and its two methods give you the opportunity to improve the performance of serializing specific objects based on application needs.

When will you use Serializable or Externalizable interface? and why?
Ans: Most of the time, when you want to serialize attributes selectively, you can use the Serializable interface with the transient modifier for variables that should not be serialized. However, the Externalizable interface can be really effective when you have to serialize only some dynamically selected attributes of a large object. For example, when you have a big Java object with hundreds of attributes and you want to serialize only a dozen dynamically selected attributes to keep the state of the object, you should use the Externalizable interface's writeExternal method to selectively serialize the chosen attributes.
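As a rough illustration of the Externalizable approach, the sketch below writes only two chosen fields of a hypothetical Person class and skips a derived one. Note that an Externalizable class needs a public no-arg constructor, because that constructor is invoked before readExternal() is called.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;

public class ExternalizableDemo {

    // Hypothetical class used only for this illustration.
    public static class Person implements Externalizable {
        String name;
        int age;
        String cachedLabel; // derived value, deliberately not written

        public Person() { } // required public no-arg constructor

        Person(String name, int age) {
            this.name = name;
            this.age = age;
            this.cachedLabel = name + " (" + age + ")";
        }

        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            out.writeUTF(name);
            out.writeInt(age);
        }

        @Override
        public void readExternal(ObjectInput in) throws IOException {
            name = in.readUTF();
            age = in.readInt();
        }
    }

    static Person roundTrip(Person person) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(person);
        }
        try (ObjectInputStream in =
                new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Person) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Person restored = roundTrip(new Person("Ann", 30));
        System.out.println(restored.name + " " + restored.age); // Ann 30
        System.out.println(restored.cachedLabel);               // null
    }
}
```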

How to improve Serialization performance?
Ans: The performance of the serialization process depends heavily on the number and size of the attributes you are going to serialize for an object. Below are some tips you can use for speeding up the marshaling and un-marshaling of objects during the Java serialization process:
1. Mark the unwanted or non-serializable attributes as transient.
2. Save only the state of the object, not the derived attributes. Sometimes we keep derived attributes as part of the object, but serializing them can be costly.
3. Serialize only attributes with NON-default values. For example, serializing an int variable with the value zero just takes extra space, whereas choosing not to serialize it saves that work.
4. Use the Externalizable interface and implement the readExternal and writeExternal methods to dynamically identify the attributes to be serialized. Sometimes custom logic is needed for the serialization of various attributes.

Does setting the serialVersionUID class field improve Java serialization performance?
Ans: Declaring an explicit serialVersionUID field in your classes saves some CPU time only the first time the JVM process serializes a given class. However, the gain is not significant. When you have not declared the serialVersionUID, its value is computed by the JVM once and subsequently kept in a soft cache for future use.

What are the alternatives to Serialization?
Ans:
1. Saving object state to a database; this is the most common technique used by most applications.
2. XML-based data transfer is another popular mechanism, and a lot of XML-based web services use it to transfer data over a network.
3. JSON data transfer is a more recently popular data transfer format.

What are transient variables? What role do they play in Serialization process?
Ans: The transient keyword in Java is used to indicate that a field should not be serialized. When the object is de-serialized, transient variables are not restored from the stream and instead take the default value for their type. Marking unwanted fields as transient can help you boost serialization performance.

Why does serialization NOT save the value of static class attributes? Why static variables are not serialized?
Ans: Java variables declared as static are not considered part of the state of an object, since they are shared by all instances of that class. Saving static variables with each serialized object would have the following problems:
1. It would make a redundant copy of the same variable in multiple objects, which is inefficient.
2. The static variable can be modified by any object, so a serialized copy would be stale or out of sync with the current value.

Is it possible to customize the serialization process? How can we customize the Serialization process?
Ans: Yes, the serialization process can be customized. When an object is serialized, ObjectOutputStream.writeObject() is invoked to save it, and when an object is read, ObjectInputStream.readObject() is invoked. What most people do not know is that the Java Virtual Machine gives you the option to define writeObject and readObject methods in your own class as per your needs. Once this is done, these two methods will be invoked by the JVM instead of the default serialization process. Classes that require special handling during serialization and deserialization must implement these special methods with the exact signatures. Examples are explained above.

How can a sub-class of Serializable super class avoid serialization?
Ans: The writeObject() and readObject() methods should be implemented in your class so that a NotSerializableException is thrown by these methods. This is done by customizing the Java serialization process.
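A sketch of this technique, with hypothetical Parent and Child classes: Child inherits Serializable from Parent but rejects any attempt to serialize or deserialize it. The canSerialize() helper is an assumption added only to observe the result.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class NoSerializationDemo {

    static class Parent implements Serializable {
        int id = 1;
    }

    // Serializable by inheritance, but refuses to take part.
    static class Child extends Parent {
        private void writeObject(ObjectOutputStream out) throws IOException {
            throw new NotSerializableException("Child must not be serialized");
        }

        private void readObject(ObjectInputStream in) throws IOException {
            throw new NotSerializableException("Child must not be deserialized");
        }
    }

    // Hypothetical helper: reports whether the object can be written.
    static boolean canSerialize(Object candidate) throws IOException {
        try (ObjectOutputStream out =
                new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(candidate);
            return true;
        } catch (NotSerializableException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(canSerialize(new Parent())); // true
        System.out.println(canSerialize(new Child()));  // false
    }
}
```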

What is the difference between Serializable and Externalizable interface in Java?
Ans:
1. Serializable is the parent interface of Externalizable, and Externalizable is the child interface of Serializable.
2. The Serializable interface is implemented to use the default serialization mechanism, but the Externalizable interface is implemented to customize the serialization mechanism.
3. The Serializable interface has no methods, but the Externalizable interface declares two methods to implement: readExternal() and writeExternal().
4. If the serialization process turns out to be slow, there is little room to tune performance using Serializable, but we can write better code for serialization using Externalizable.

What is serialVersionUID? What would happen if a class does not define this?
Ans: serialVersionUID is the version of the serialized class. If the class does not explicitly declare it, a default serialVersionUID is computed at runtime from the details of the class. During de-serialization, the serialVersionUID in the byte stream and the one in the loaded class are compared, and if they do not match, an InvalidClassException is thrown. If they match, the byte stream data is mapped to the attributes of the class to reconstruct the object.

If you know anyone who has started learning Java, why not help them out! Just share this post with them.