Update

Please see the comments from Jean-philippe Bempel in the comment section. He mentioned a real example of how a deadlock can happen from JVM optimization. One of the reasons I like to blog as much as possible is that I can learn from the community if I misunderstood something. Thank you!

What is a volatile variable?

volatile is a keyword in Java. You cannot use this as a variable or method name. Period.

Seriously, jokes aside, what is a volatile variable? When should we use it?

Ha ha, sorry, couldn’t help it.

We typically use the volatile keyword when we share variables between more than one thread in a multi-threaded environment, and we want to avoid memory inconsistency errors caused by these variables being cached in the CPU cache.

Consider the following example of producer/consumer, where we are producing/consuming items one at a time –

In the above class, the produce method generates a new value by storing its argument into value and changing the hasValue flag to true. Before doing so, its while loop checks whether the hasValue flag is true, which signifies the presence of a new value not yet consumed; if it is, the loop requests the current thread to sleep. This sleeping loop only stops once the hasValue flag has been changed to false, which is only possible after the value has been consumed by the consume method. The consume method, in turn, requests the current thread to sleep while no new value is available. When a new value is produced by the produce method, consume terminates its sleeping loop, consumes the value, and clears the hasValue flag.
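The listing is not reproduced here, so the following is only a minimal sketch of what such a class might look like, reconstructed from the description above. The names value, hasValue, produce and consume come from the text; the class name, the String value type, the sleep interval and the interrupt handling are assumptions.

```java
// Sketch of the producer/consumer class described above (no volatile yet).
class ProducerConsumer {
    private String value = "";
    private boolean hasValue = false; // deliberately NOT volatile in this version

    public void produce(String newValue) {
        // Sleep while the previous value has not been consumed yet.
        while (hasValue) {
            sleepBriefly();
        }
        value = newValue;
        hasValue = true;
    }

    public String consume() {
        // Sleep while no new value is available.
        while (!hasValue) {
            sleepBriefly();
        }
        String result = value;
        hasValue = false; // clear the flag so the producer can continue
        return result;
    }

    private static void sleepBriefly() {
        try {
            Thread.sleep(50); // assumed interval
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
    }
}
```

Used from a single thread (produce, then consume) this behaves as expected; the trouble discussed next only shows up once two threads share the object.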

Now imagine that two threads are using an object of this class – one is trying to produce values (the writer thread), and another one is consuming them (the reader thread). The following test illustrates this approach –

This example will produce the expected output most of the time, but it also has a real chance of running into a deadlock!

How?

Let’s talk about computer architecture a bit.

We know that a computer consists of CPUs and Memory Units (and many other parts). Even though the main memory is where all of our program instructions and variables/data reside, during program execution CPUs can store copies of variables in their internal memory (which is known as CPU cache) for performance gain. Since modern computers have more than one CPU, there is more than one CPU cache as well.

In a multi-threaded environment, it’s possible for more than one thread to execute at the same time, each one on a different CPU (although this is totally dependent on the underlying OS), and each of them may copy variables from main memory into its corresponding CPU cache. When a thread accesses these variables, it will then access these cached copies, not the actual ones in the main memory.

Now let’s assume that the two threads in our test are running on two different CPUs, and the hasValue flag has been cached on either one of them (or both). Now consider the following execution sequence –

writerThread produces a value, and changes the hasValue to true. However, this update is only reflected in the cache, not in the main memory.

readerThread is trying to consume a value, but its cached copy of the hasValue flag is set to false. So even though a value has been produced by the writerThread, it cannot consume it as the thread cannot break out of the sleeping loop (hasValue is false).

Since the readerThread is not consuming the newly generated value, writerThread cannot proceed either as the flag is not being cleared, and hence it will be stuck in its sleeping loop.

And we have a deadlock on our hands!

This situation will only change if the hasValue flag is synchronized across all caches, and the program as written has no control over when, or whether, that happens.

What’s the solution then? And how does volatile fit into this example?

If we just mark the hasValue flag as volatile, we can be sure that this type of deadlock will not occur –

private volatile boolean hasValue = false;

Marking a variable as volatile will force each thread to read the value of that variable directly from the main memory. Also each write to a volatile variable will be flushed into the main memory immediately. If the threads decide to cache the variable, it will be synced with the main memory on each read/write.

After this change, consider the previous execution steps which led to deadlock –

Writer thread produces a value, and changes hasValue to true. This time the update will be directly reflected into the main memory (even if it’s cached).

Reader thread is trying to consume a value, and checking the value of hasValue. This time every read will force the value to be fetched directly from the main memory, so it will pick up the change made by the writer thread.

Reader thread consumes the generated value, and clears the value of the flag. This new value will go to the main memory (if it’s cached, then the cached copy will also be updated).

Writer thread will pick up this change as every read is now accessing the main memory. It will continue to produce new values.

And voila! We are all happy ^_^ !
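To make the fixed version concrete, here is a hedged sketch of the same class with hasValue marked volatile, plus a small two-thread driver. The class name, the int value type, the runDemo helper and the 1 ms pause are assumptions made for illustration, not the original test.

```java
import java.util.ArrayList;
import java.util.List;

class VolatileProducerConsumer {
    private int value;                          // plain field, published through hasValue
    private volatile boolean hasValue = false;  // the only change from the broken version

    void produce(int newValue) {
        while (hasValue) {
            pause();          // previous value not consumed yet
        }
        value = newValue;
        hasValue = true;      // volatile write
    }

    int consume() {
        while (!hasValue) {   // volatile read
            pause();
        }
        int result = value;
        hasValue = false;
        return result;
    }

    private static void pause() {
        try {
            Thread.sleep(1);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
    }

    // Hypothetical driver: one writer thread, one reader thread.
    static List<Integer> runDemo(int count) {
        VolatileProducerConsumer pc = new VolatileProducerConsumer();
        List<Integer> consumed = new ArrayList<>();
        Thread writer = new Thread(() -> {
            for (int i = 1; i <= count; i++) {
                pc.produce(i);
            }
        });
        Thread reader = new Thread(() -> {
            for (int i = 0; i < count; i++) {
                consumed.add(pc.consume());
            }
        });
        writer.start();
        reader.start();
        try {
            writer.join();
            reader.join(); // join makes the reader's list contents visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return consumed;
    }
}
```

With one writer and one reader, every produced value should be consumed exactly once and in order.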

I see. Is this all volatile does, forcing threads to read/write variables directly from memory?

Actually it has some further implications. Accessing a volatile variable establishes a happens-before relationship between program statements.

What is a happens-before relationship?

A happens-before relationship between two program statements is a sort of guarantee which ensures that any memory writes by one statement are visible to the other statement.

How does it relate to volatile?

When we write to a volatile variable, it creates a happens-before relationship with each subsequent read of that same variable. So any memory writes that have been done until that volatile variable write, will subsequently be visible to any statements that follow the read of that volatile variable.

Err….Ok….I sort of got it, but maybe an example would be good.

Ok, sorry about the vague definition. Consider the following example –

Let’s assume that the above two snippets are being executed by two different threads – thread 1 and 2. When the first thread changes hasValue, it will not only flush this change to main memory, but it will also cause the previous three writes (and any other previous writes) to be flushed into the main memory as well! As a result, when the second thread accesses these three variables it will see all the writes made by thread 1, even if they were all cached before (and these cached copies will be updated as well)!

This is exactly why we did not have to mark the value variable in our first example with volatile as well. Since we wrote to that variable before accessing hasValue, and read from it after reading hasValue, it was automatically synced with the main memory.
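The two snippets are not shown here, so this is a small sketch of the situation being described, under the assumption that thread 1 writes three plain fields and then the volatile flag, while thread 2 waits for the flag and then reads the three fields. The field names first, second, third and hasValue follow the text; the class and method names are hypothetical.

```java
class Publication {
    private int first;
    private int second;
    private int third;                 // three plain (non-volatile) fields
    private volatile boolean hasValue; // the volatile flag

    // Thread 1: three ordinary writes followed by the volatile write.
    void write() {
        first = 5;
        second = 6;
        third = 7;
        hasValue = true;
    }

    // Thread 2: volatile read first, then the three ordinary reads.
    int[] readWhenReady() {
        while (!hasValue) {
            Thread.yield();
        }
        return new int[] {first, second, third};
    }

    static int[] runDemo() {
        Publication p = new Publication();
        Thread writer = new Thread(p::write);
        writer.start();
        int[] seen = p.readWhenReady();
        try {
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen;
    }
}
```

Once the reader has seen hasValue as true, the happens-before relationship means it also sees 5, 6 and 7, even though none of those three fields is volatile.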

This has another interesting consequence. The JVM is famous for its program optimizations. Sometimes it reorders program statements to boost performance without changing the output of the program. As an example, it can change the following sequence of statements –

first = 5;
second = 6;
third = 7;

into this –

second = 6;
third = 7;
first = 5;

However, when the statements involve accessing a volatile variable, it will never move a statement that occurs before a volatile write to after it. Which means, it will never transform this –

first = 5;
second = 6;
third = 7;
hasValue = true;

into this –

first = 5;
second = 6;
hasValue = true;
third = 7; // Order changed to appear after volatile write! This will never happen!

even though from the perspective of program correctness both of them seem to be equivalent. Note that the JVM is still allowed to reorder the first three writes among themselves as long as they all appear before the volatile write.

Similarly, the JVM will not move a statement that appears after a volatile read so that it executes before that read. For example, if a thread reads hasValue and then reads first, second and third, those three reads will never be moved ahead of the read of hasValue.

However, the JVM can certainly reorder those three reads among themselves, as long as they keep appearing after the volatile read.

I sense a performance penalty has to be paid for volatile variables.

You got that right. Volatile variables force main memory access, and accessing main memory is always much slower than accessing CPU caches. They also prevent certain program optimizations by the JVM, further reducing performance.

Can we always use volatile variables to maintain data consistency across threads?

Unfortunately not. When more than one thread reads and writes the same variable, marking it as volatile is not enough to maintain consistency. Consider the following UnsafeCounter class –

The code is pretty self-explanatory. We are incrementing the counter in one thread, and decrementing it in another the same number of times. After running this test we expect the counter to hold 0, but this is not guaranteed. Most of the time it will be 0, but sometimes it will be -1, -2, 1, 2, i.e., any integer in the range [-5, 5].

Why does this happen? It happens because both the increment and the decrement operation of the counter are not atomic – they do not happen all at once. Both of them consist of multiple steps, and the sequences of steps can overlap with each other. So you can think of an increment operation as follows –

Read the value of the counter.

Add one to it.

Write back the new value of the counter.

and a decrement operation as follows –

Read the value of the counter.

Subtract one from it.

Write back the new value of the counter.

Now, let’s consider the following execution steps –

First thread has read the value of the counter from memory. Initially it’s set to zero. It then adds one to it.

Second thread has also read the value of the counter from memory, and saw that it’s set to zero. It then subtracts one from it.

First thread now writes back the new value of counter to memory, changing it to 1.

Second thread now writes back the new value of counter to memory, which is -1. The first thread’s increment has been overwritten, so the counter ends up at -1 instead of 0.
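The steps above can be tried out with a sketch along the following lines. The class name UnsafeCounter comes from the text; the method names, the runDemo helper and the loop structure are assumptions, since the original listing is not reproduced here.

```java
class UnsafeCounter {
    private volatile int counter; // volatile gives visibility, NOT atomicity

    void inc() {
        counter++; // read, add one, write back: three separate steps
    }

    void dec() {
        counter--; // read, subtract one, write back
    }

    int get() {
        return counter;
    }

    // Hypothetical driver: one thread increments, another decrements,
    // each the same number of times.
    static int runDemo(int times) {
        UnsafeCounter c = new UnsafeCounter();
        Thread up = new Thread(() -> {
            for (int i = 0; i < times; i++) {
                c.inc();
            }
        });
        Thread down = new Thread(() -> {
            for (int i = 0; i < times; i++) {
                c.dec();
            }
        });
        up.start();
        down.start();
        try {
            up.join();
            down.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return c.get();
    }
}
```

Calling UnsafeCounter.runDemo(5) will usually return 0, but any value from -5 to 5 is possible when updates are lost. With so few iterations the race is rarely visible; a larger count makes it much easier to observe.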

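There are two common ways to fix this: guard the counter with synchronized methods, or use an AtomicInteger. My personal choice is the one using AtomicInteger, as the synchronized one hampers performance by allowing only one thread at a time to access any of the inc/dec/get methods.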
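For reference, the two alternatives mentioned here might look roughly like the sketch below. Only the inc/dec/get method names and the use of synchronized and AtomicInteger are taken from the text; the class names and the CounterDemo helper are hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Variant 1: every method holds the object's monitor, so the
// read-modify-write steps of inc/dec can no longer interleave.
class SynchronizedCounter {
    private int counter; // no volatile needed, synchronized gives visibility too

    synchronized void inc() { counter++; }
    synchronized void dec() { counter--; }
    synchronized int get() { return counter; }
}

// Variant 2: AtomicInteger performs the increment/decrement atomically
// without locking.
class AtomicCounter {
    private final AtomicInteger counter = new AtomicInteger();

    void inc() { counter.incrementAndGet(); }
    void dec() { counter.decrementAndGet(); }
    int get() { return counter.get(); }
}

class CounterDemo {
    // Runs inc and dec on two threads, each the given number of times.
    static void runConcurrently(Runnable inc, Runnable dec, int times) {
        Thread up = new Thread(() -> {
            for (int i = 0; i < times; i++) inc.run();
        });
        Thread down = new Thread(() -> {
            for (int i = 0; i < times; i++) dec.run();
        });
        up.start();
        down.start();
        try {
            up.join();
            down.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    static int atomicResult(int times) {
        AtomicCounter c = new AtomicCounter();
        runConcurrently(c::inc, c::dec, times);
        return c.get();
    }

    static int synchronizedResult(int times) {
        SynchronizedCounter c = new SynchronizedCounter();
        runConcurrently(c::inc, c::dec, times);
        return c.get();
    }
}
```

Both variants should always end at 0, no matter how the two threads interleave.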

I notice that the synchronized version does not mark the counter as volatile. Does this mean……..?

Yup. Using the synchronized keyword also establishes a happens-before relationship between statements. Entering a synchronized method/block establishes a happens-before relationship between the statements that appear before it and the ones inside the method/block. For a full list of what establishes a happens-before relationship, please go here.

That’s all I have to say about volatile for the time being. All the examples have been uploaded in my github repo.

A compile-time error will be issued if you do not explicitly call any of the available parent constructors, because in this case the compiler tries to automatically call the no-arg constructor of the parent, and since it does not have any, an error will occur.

But I thought the compiler will always define the no-arg constructor for me?

Nope.

The moment you defined a constructor for the parent by yourself, the compiler stopped interfering. Which means, now it will not automatically define the default constructor for you. If you want a no-arg constructor now, you will have to define one by yourself.

So if I now explicitly define a no-arg constructor in the parent, the error will be resolved?

That is one of the two ways to solve it. The other one is given below –

Using super you can explicitly call a parent constructor, providing the required arguments and thus choosing an appropriate overloaded version. This is exactly how the compiler called the parent constructors in the first and the second examples, except the super call was invisible to us. The compiler automatically put it when it compiled our code.

You need to be aware of one thing though – the super call should be the first statement of the child constructor, otherwise the compiler will throw an error. As a consequence, you cannot use super() and this() in the same constructor at the same time.
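As a minimal sketch of the explicit super call described above (the Parent and Child classes and their fields are hypothetical, chosen only for illustration):

```java
class Parent {
    protected final String name;

    // The only constructor: because it is defined explicitly,
    // the compiler no longer generates a no-arg constructor.
    Parent(String name) {
        this.name = name;
    }
}

class Child extends Parent {
    private final int age;

    Child(String name, int age) {
        super(name);    // explicit parent constructor call, placed first
        this.age = age;
    }

    String describe() {
        return name + " (" + age + ")";
    }
}
```

Removing the super(name) line would make the compiler insert an implicit super() call, which fails to compile here because Parent has no no-arg constructor.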

What is this()?

You use this() to call the constructor of the same class. Usually you use it to call an overloaded version of the constructor which contains common initialization logic for the class, like below –
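A small sketch of constructor chaining with this(), using a hypothetical Rectangle class where the two-argument constructor holds the common initialization logic:

```java
class Rectangle {
    private final int width;
    private final int height;

    Rectangle() {
        this(1, 1);          // delegate to the two-argument constructor
    }

    Rectangle(int side) {
        this(side, side);    // a square is a rectangle with equal sides
    }

    // All initialization logic lives in one place.
    Rectangle(int width, int height) {
        this.width = width;
        this.height = height;
    }

    int area() {
        return width * height;
    }
}
```

Every constructor ends up in the two-argument version, so validation or defaults only need to be written once.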

In my last article I showed two different ways to read/write persistent entity state – field and property. When field access mode is used, JPA directly reads the state values from the fields of an entity using reflection. It directly translates the field names into database column names if we do not specify the column names explicitly. In case of property access mode, the getter/setter methods are used to read/write the state values. In this case we annotate the getter methods of the entity states instead of the fields using the same annotations. If we do not explicitly specify the database column names then they are determined following the JavaBean convention, that is by removing the “get” portion from the getter method name and converting the first letter of the rest of the method name to lowercase character.
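To get a feel for the difference between the two modes, here is a plain-Java sketch that does not use JPA at all. It only imitates what a provider does conceptually: reading state straight from a field via reflection, reading it through a getter, and deriving a default name from a getter name by the JavaBean convention. The AccessModes class, its nested Address class and the helper names are all hypothetical.

```java
import java.beans.Introspector;
import java.lang.reflect.Field;
import java.lang.reflect.Method;

class AccessModes {
    static class Address {
        private String city = "paris";

        // A getter may transform the state before handing it out.
        public String getCity() {
            return city.toUpperCase();
        }
    }

    // Field access: read the private field directly using reflection.
    static Object readViaField(Object entity, String fieldName) {
        try {
            Field field = entity.getClass().getDeclaredField(fieldName);
            field.setAccessible(true);
            return field.get(entity);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Property access: go through the getter method instead.
    static Object readViaGetter(Object entity, String getterName) {
        try {
            Method getter = entity.getClass().getMethod(getterName);
            return getter.invoke(entity);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // JavaBean convention: drop "get" and lowercase the first letter.
    static String defaultPropertyName(String getterName) {
        return Introspector.decapitalize(getterName.substring("get".length()));
    }
}
```

With field access the raw stored value is seen, while with property access the value is whatever the getter returns, which is why the choice of access mode matters when getters do more than return the field.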

We can specify which access mode to use for an entity by using the @Access annotation in the entity class declaration. This annotation takes an argument of type AccessType (defined in the javax.persistence package) enum, which has two different values corresponding to two different access modes – FIELD and PROPERTY. As an example, we can specify property access mode for the Address entity in the following way –

As discussed before, we are now annotating the getter method of the entity id with the @Id, @GeneratedValue and @Column annotations.

Since column names will now be determined by parsing the getter methods, we do not need to mark the transientColumn field with the @Transient annotation anymore. However, if the Address entity had any other method whose name started with “get”, we would need to apply @Transient to it.
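The listing is not reproduced here, so the following is only a sketch of how such a property-access mapping might look. It needs the JPA API (javax.persistence) on the classpath. The table name tbl_address, the column name address_id, the fields other than transientColumn and the IDENTITY generation strategy are assumptions.

```java
import javax.persistence.Access;
import javax.persistence.AccessType;
import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;
import javax.persistence.Table;

@Entity
@Table(name = "tbl_address")
@Access(AccessType.PROPERTY)
public class Address {
    private Integer id;
    private String city;

    // No getter exists for this field, so property access never sees it
    // and no @Transient is needed.
    private String transientColumn;

    // In property access mode the mapping annotations go on the getter.
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    @Column(name = "address_id")
    public Integer getId() {
        return id;
    }

    public void setId(Integer id) {
        this.id = id;
    }

    public String getCity() {
        return city;
    }

    public void setCity(String city) {
        this.city = city;
    }
}
```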

If an entity has no explicit access mode information, just like our Address entity that we created in the first part of this series, then JPA assumes a default access mode. This assumption is not made at random. Instead, JPA first tries to figure out the location of the @Id annotation. If the @Id annotation is used on a field, then field access mode is assumed. If the @Id annotation is used on a getter method, then property access mode is assumed. So even if we remove the @Access annotation from the Address entity in the above example the mapping will still be valid and JPA will assume property access mode –

You should never declare a field as public if you use field access mode. All fields of the entity should have either private (best!), protected or default access type. The reason behind this is that declaring the fields as public will allow any unprotected class to directly access the entity states which could defeat the provider implementation easily. For example, suppose that you have an entity whose fields are all public. Now if this entity is a managed entity (which means it has been saved into the database) and any other class changes the value of its id, and then you try to save the changes back to the database, you may face unpredictable behaviors (I will try to elaborate on this topic in a future article). Even the entity class itself should only manipulate the fields directly during initialization (i.e., inside the constructors).

In case of property access mode, if we apply the annotations on the setter methods rather than on the getter methods, then they will simply be ignored.

It’s also possible to mix both of these access types. Suppose that you want to use field access mode for all but one state of an entity, and for that one remaining state you would like to use property access mode because you want to perform some conversion before writing/after reading the state value to and from the database. You can do this easily by following the steps below –

Mark the entity with the @Access annotation and specify AccessType.FIELD as the access mode for all the fields.

Mark the field for which you do not like to use the field access mode with the @Transient annotation.

Mark the getter method of the property with the @Access annotation and specify AccessType.PROPERTY as the access mode.

The following example demonstrates this approach as the postcode has been changed to use property access mode –
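A hedged sketch of the three steps, again assuming the JPA API is on the classpath. Only the postcode state and the annotations named in the steps come from the text; the other fields, the generation strategy and the trim() conversion are placeholders for whatever the real entity does.

```java
import javax.persistence.Access;
import javax.persistence.AccessType;
import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;
import javax.persistence.Transient;

@Entity
@Access(AccessType.FIELD)            // step 1: field access is the default
public class Address {
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Integer id;

    private String city;

    @Transient                       // step 2: keep field access away from postcode
    private String postcode;

    @Access(AccessType.PROPERTY)     // step 3: this one state uses its getter/setter
    @Column(name = "postcode")
    public String getPostcode() {
        return postcode;
    }

    public void setPostcode(String postcode) {
        // Hypothetical conversion applied when the state is read from the database.
        this.postcode = postcode == null ? null : postcode.trim();
    }
}
```

Without the @Transient on the field, postcode would be mapped twice, once through the field and once through the getter.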

The important thing to note here is that if we do not annotate the class with the @Access annotation to explicitly specify the field access mode as the default one, and we annotate both the fields and the getter methods, then the resultant behavior of the mapping will be undefined. Which means the outcome will totally depend on the persistence provider i.e., one provider might choose to use the field access mode as default, one might use property access mode, or one might decide to throw an exception!

That’s it for today. If you find any problems/have any questions, please do not hesitate to comment!

In my last post I showed a simple way of persisting an entity. I explained the default approach that JPA uses to determine the default table for an entity. Let’s assume that we want to override this default name. We may want to do so because the data model was designed and fixed beforehand and the table names do not match our class names (I have seen people create tables with a “tbl_” prefix, for example). So how should we override the default table names to match the existing data model?

Turns out, it’s pretty simple. If we need to override the default table names assumed by JPA, then there are a couple of ways to do it –

We can use the name attribute of the @Entity annotation to provide an explicit entity name to match with the database table name. For our example we could have used @Entity(name = "tbl_address") in our Address class if our table name was tbl_address.

We can use a @Table (defined in the javax.persistence package) annotation just below the @Entity annotation and use its name attribute to specify the table name explicitly –

From these two approaches the @Table annotation provides more options to customize the mapping. For example, some databases like PostgreSQL have a concept of schemas, using which you can further categorize/group your tables. Because of this feature you can create two tables with the same name in a single database (although they will belong to two different schemas). To access these tables you then add the schema name as the table prefix in your query. So if a PostgreSQL database has two different schemas named public (which is sort of like default schema for a PostgreSQL database) and document, and both of these schemas contain tables named document_collection, then both of these two queries are perfectly valid –

-- fetch from the table under public schema
SELECT *
FROM public.document_collection;
-- fetch from the table under document schema
SELECT *
FROM document.document_collection;

In order to map an entity to the document_collection table in the document schema, you will then use the @Table annotation with its schema attribute set to document –
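A minimal sketch of that mapping, assuming the JPA API is on the classpath. The entity class name DocumentCollection and its id field are hypothetical; the table and schema names are the ones used in the queries above.

```java
import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Table;

@Entity
@Table(name = "document_collection", schema = "document")
public class DocumentCollection {
    @Id
    private Integer id;

    public Integer getId() {
        return id;
    }
}
```

Here the schema is given through the dedicated schema attribute, so the table name itself stays unqualified.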

Inlining the schema name into the table name instead (for example, setting name to "document.document_collection") is not guaranteed to work across all JPA implementations, because the JPA specification does not define support for it. So it’s better not to make a habit of doing this even if your persistence provider supports it.

Let’s turn our attention to the columns next. In order to determine the default columns, JPA does something similar to the following –

At first it checks to see if any explicit column mapping information is given. If no column mapping information is found, it tries to guess the default values for columns.

To determine the default values, JPA needs to know the access type of the entity states i.e., the way to read/write the states of the entity. In JPA two different access types are possible – field and property. For our example we have used the field access (actually JPA assumed this from the location/placement of the @Id annotation, but more on this later). If you use this access type then states will be written/read directly from the entity fields using the Reflection API.

After the access type is known, JPA then tries to determine the column names. For field access type JPA directly treats the field name as the column names, which means if an entity has a field named status then it will be mapped to a column named status.

At this point it should be clear to us how the states of the Address entities got saved into the corresponding columns. Each of the fields of the Address entity has an equivalent column in the database table tbl_address, so JPA directly saved them into their corresponding columns. The id field was saved into the id column, city field into the city column and so on.

OK then, let’s move on to overriding column names. As far as I know there is only one way (if you happen to know of any other way please comment in!) to override the default column names for entity states, which is by using the @Column (defined in the javax.persistence package) annotation. So if the id column of the tbl_address table is renamed to be address_id then we could either change our field name to address_id, or we could use the @Column annotation with its name attribute set to address_id –

You can see that in all the above cases the defaults that JPA uses are quite sensible, and in most cases you will be happy with them. However, changing the default values is also very easy and can be done very quickly.

What if we have a field in the Address entity that we do not wish to save in the database? Suppose that the Address entity has a field named transientColumn which does not have any corresponding column in the database table –

If you run your code with the above change then you will get an exception which looks something like the one below –

Exception in thread "main" java.lang.ExceptionInInitializerError
at com.keertimaan.javasamples.jpaexample.Main.main(Main.java:33)
Caused by: javax.persistence.PersistenceException: Unable to build entity manager factory
at org.hibernate.jpa.HibernatePersistenceProvider.createEntityManagerFactory(HibernatePersistenceProvider.java:83)
at org.hibernate.ejb.HibernatePersistence.createEntityManagerFactory(HibernatePersistence.java:54)
at javax.persistence.Persistence.createEntityManagerFactory(Persistence.java:55)
at javax.persistence.Persistence.createEntityManagerFactory(Persistence.java:39)
at com.keertimaan.javasamples.jpaexample.persistenceutil.PersistenceManager.<init>(PersistenceManager.java:31)
at com.keertimaan.javasamples.jpaexample.persistenceutil.PersistenceManager.<clinit>(PersistenceManager.java:26)
… 1 more
Caused by: org.hibernate.HibernateException: Missing column: transientColumn in jpa_example.tbl_address
at org.hibernate.mapping.Table.validateColumns(Table.java:365)
at org.hibernate.cfg.Configuration.validateSchema(Configuration.java:1336)
at org.hibernate.tool.hbm2ddl.SchemaValidator.validate(SchemaValidator.java:155)
at org.hibernate.internal.SessionFactoryImpl.<init>(SessionFactoryImpl.java:525)
at org.hibernate.cfg.Configuration.buildSessionFactory(Configuration.java:1857)
at org.hibernate.jpa.boot.internal.EntityManagerFactoryBuilderImpl$4.perform(EntityManagerFactoryBuilderImpl.java:850)
at org.hibernate.jpa.boot.internal.EntityManagerFactoryBuilderImpl$4.perform(EntityManagerFactoryBuilderImpl.java:843)
at org.hibernate.boot.registry.classloading.internal.ClassLoaderServiceImpl.withTccl(ClassLoaderServiceImpl.java:398)
at org.hibernate.jpa.boot.internal.EntityManagerFactoryBuilderImpl.build(EntityManagerFactoryBuilderImpl.java:842)
at org.hibernate.jpa.HibernatePersistenceProvider.createEntityManagerFactory(HibernatePersistenceProvider.java:75)
… 6 more

The exception is saying that the persistence provider could not find any column in the database whose name is transientColumn, and we did not do anything to make it clear to the persistence provider that we do not wish to save this field in the database. The persistence provider treated it like any other field in the entity that is mapped to a database column.

In order to fix this problem, we can do any of the following –

We can annotate the transientColumn field with the @Transient (defined in javax.persistence package) annotation to let the persistence provider know that we do not wish to save this field, and it does not have any corresponding column in the table.

We can use the transient keyword that Java has by default.

The difference between these two approaches that comes to my mind is that, if we use the transient keyword instead of the annotation, then if one of the Address entities gets serialized from one JVM to another then the transientColumn field will get reinitialized again (just like any other transient fields in Java). For the annotation, this will not happen and the transientColumn field will retain its value across the serialization. As a rule of thumb, I always use the annotation if I do not need to worry about serialization (and in most of the cases I don’t).
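The serialization side of this can be checked without JPA. The sketch below uses a hypothetical serializable Address class with one ordinary field and one field marked with the transient keyword, and round-trips it through Java serialization in memory. Note that after deserialization a transient field holds its default value (null for a reference type).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class TransientDemo {
    static class Address implements Serializable {
        private static final long serialVersionUID = 1L;

        String city = "Paris";
        transient String transientColumn = "scratch"; // skipped by serialization
    }

    // Serializes the object to a byte array and reads it back.
    static Address roundTrip(Address original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Address) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The copy keeps its city but loses transientColumn. A field annotated only with @Transient would be serialized like any other field, because the annotation is meaningful to the persistence provider, not to Java serialization.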

In my two previous articles I explained how to set up JPA in a Java SE environment. I do not intend to write the setup procedure for a web application because most of the tutorials on the web do exactly that. So let’s skip over directly to object relational mapping, or entity mapping.

Object-relational mapping (ORM, O/RM, and O/R mapping) in computer science is a programming technique for converting data between incompatible type systems in object-oriented programming languages. This creates, in effect, a “virtual object database” that can be used from within the programming language. There are both free and commercial packages available that perform object-relational mapping, although some programmers opt to create their own ORM tools.

Typically, mapping is the process through which you provide necessary information about your database to your ORM tool. The tool then uses this information to read/write objects into the database. Usually you tell your ORM tool the table name to which an object of a certain type will be saved. You also provide the column names to which an object’s properties will be mapped. Relations between different object types also need to be specified. All of this seems like a lot of work, but fortunately JPA follows what is known as the “Convention over Configuration” approach, which means that if you choose to use the default values provided by JPA, you will have to configure very little of your application.

In order to properly map a type in JPA, you will at a minimum need to do the following –

Mark your class with the @Entity annotation. These classes are called entities.

Mark one of the properties/getter methods of the class with the @Id annotation.

And that’s it. Your entities are ready to be saved into the database because JPA configures all other aspects of the mapping automatically. This also shows the productivity gain that you can enjoy by using JPA. You do not need to manually populate your objects each time you query the database, saving you from writing lots of boilerplate code.

Let’s see an example. Consider the following Address entity which I have mapped according to the above two rules –

Let’s take a step back at this point and think about what we would have needed to do if we had used plain JDBC for persistence. We would have had to manually write the insert queries and map each of the attributes to the corresponding columns, which would have required a lot of code.

An important point to note about the example is the way I am setting the id of the entities. This approach will only work for short examples like this, but for real applications this is not good. You’d typically want to use, say, auto-incremented id columns or database sequences to generate the id values for your entities. For my example, I am using a MySQL database, and all of my id columns are set to auto increment. To reflect this in my entity model, I can use an additional annotation called @GeneratedValue in the id property. This tells JPA that the id value for this entity will be automatically generated by the database during the insert, and it should fetch that id after the insert using a select command.

With the above modifications, my entity class becomes something like this –
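Since the listing is not reproduced here, the following is only a sketch of an Address entity that follows the two rules plus @GeneratedValue. It assumes the JPA API is on the classpath; the IDENTITY strategy (a natural fit for MySQL auto-increment columns) and every field other than id and city are assumptions.

```java
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;

@Entity                                   // rule 1: mark the class as an entity
public class Address {
    @Id                                   // rule 2: mark the identifier
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Integer id;                   // filled in by the database on insert

    private String city;
    private String street;                // hypothetical extra state

    public Integer getId() {
        return id;
    }

    public String getCity() {
        return city;
    }

    public void setCity(String city) {
        this.city = city;
    }

    public String getStreet() {
        return street;
    }

    public void setStreet(String street) {
        this.street = street;
    }
}
```

There is deliberately no setId method, since the id is no longer assigned by hand.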

How did JPA figure out which table to use to save Address entities? Turns out, it’s pretty straightforward –

When no explicit table information is provided with the mapping then JPA tries to find a table whose name matches with the entity name.

The name of an entity can be explicitly specified by using the “name” attribute of the @Entity annotation. If no name attribute is found, then JPA assumes a default name for an entity.

The default name of an entity is the simple name (not fully qualified name) of the entity class, which in our case is Address. So our entity name is then determined to be “Address”.

Since our entity name is “Address”, JPA tries to find if there is a table in the database whose name is “Address” (remember, in most cases database table names are case-insensitive). From our schema, we can see that this is indeed the case.

So how did JPA figure out which columns to use to save property values for Address entities?

At this point I think you will be able to easily guess that. If you cannot, stay tuned for my next post!

For those who are just starting out with JPA, let me explain some of the components of this configuration.

The first section is used for configuring JDBC connection that will be used by the persistence provider. Usually we specify the JDBC url, database username, password and fully qualified name of the Driver class in this section. The second section configures some property values for hibernate, and is explained below –

The “hibernate.show_sql” property specifies whether or not hibernate will print the queries in the log file (provided that you have configured log4j properly). This is especially helpful if you want to see which queries are being executed for reading/writing/deleting entities. In the production environment you can set it to false so that the queries will not be logged.

The “hibernate.format_sql” property specifies whether or not the queries will be formatted to a more readable form before logging.

The “hibernate.dialect” property specifies which SQL dialect we intend to use. If you do not know what dialects are then please read this excellent answer on StackOverflow.

The “hibernate.hbm2ddl.auto” property is a very interesting one. By changing its value you can enable hibernate to create/drop your database tables for you, or validate an existing schema against your mapping. This has also been explained very well on StackOverflow.

The last section configures the connection pool that hibernate will use for the database. Hibernate usually provides a built-in connection pooling mechanism which is good enough for development and testing, but is not suitable for production environment. So to get the optimal connection pooling behavior in production you need to use something more mature. C3P0 is a popular production-grade connection pooling library which is very easy to use with Hibernate. All you need to do is just specify the property values like minimum/maximum number of connections in the pool, timeout values etc., and the rest will be taken care of by Hibernate.
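The configuration file itself is not reproduced here. As a rough illustration of the three sections, the sketch below collects the same kinds of settings into a plain Java map. The property keys are the standard JPA and Hibernate names; every value (driver class, dialect, validate mode, pool sizes, timeout) is an assumption, and the JpaSettings class is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class JpaSettings {
    static Map<String, String> forMySql(String url, String user, String password) {
        Map<String, String> props = new LinkedHashMap<>();

        // Section 1: JDBC connection used by the persistence provider.
        props.put("javax.persistence.jdbc.driver", "com.mysql.jdbc.Driver"); // assumed driver
        props.put("javax.persistence.jdbc.url", url);
        props.put("javax.persistence.jdbc.user", user);
        props.put("javax.persistence.jdbc.password", password);

        // Section 2: Hibernate behaviour.
        props.put("hibernate.show_sql", "true");
        props.put("hibernate.format_sql", "true");
        props.put("hibernate.dialect", "org.hibernate.dialect.MySQL5Dialect"); // assumed dialect
        props.put("hibernate.hbm2ddl.auto", "validate");

        // Section 3: C3P0 connection pool (values are placeholders).
        props.put("hibernate.c3p0.min_size", "5");
        props.put("hibernate.c3p0.max_size", "20");
        props.put("hibernate.c3p0.timeout", "300");

        return props;
    }
}
```

A map like this can be passed as the second argument of Persistence.createEntityManagerFactory(persistenceUnitName, properties), where its entries override or add to what persistence.xml declares for that unit.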

There is also another important point that I have skipped in the last article. In order for the entities to be found by the persistence provider in a Java SE environment, they will have to be listed in the persistence.xml file as follows –

Doing this will ensure that your entities will be found by the persistence provider and will be ready for persistence. However, hibernate auto-scans the packages for classes marked with the @Entity annotation and makes them persistent, so we did not have to worry about that. Keep in mind that this is also the case when you use JPA in a Java EE environment (i.e., your application runs in a full-blown application server). In this case the application server scans the application during its deployment and finds the classes marked as entities. If you run your application in a Java SE environment using a provider which does not have this type of auto-scanning ability, then you will be required to list the entities in the persistence.xml file like above.

That’s it for setting up JPA in a Java SE environment. Hopefully once you have this setup you will be easily able to persist your entities without much trouble.