My Views on ColdFusion, Java and related technologies


One of the most convenient things about ColdFusion ORM is that it lets you build your database automatically from the object model. It generates the tables from the CFC mappings you provide, defines all the necessary constraints from the relationships and property definitions, and allows you to populate the database with some initial data using a script file. It also keeps your database in sync with your application: if you add a new CFC to your application or add new properties to your components, you don't have to go to the database and add them. CF-ORM (or rather Hibernate) will do that for you.

Though this is really nice, ColdFusion does not do it by default. You need to enable it explicitly if you want it to build the database for you, by setting 'ormsettings.dbcreate' to 'dropcreate' or 'update' in Application.cfc.

ormsettings.dbcreate can have the following values:

dropcreate : With this setting, CF-ORM drops the table if it exists and then creates it. This starts the application with a clean slate, so be careful with it: all your data will be deleted and the tables will be created afresh whenever the application starts or whenever ORM is initialized. This setting is awesome at development time. With this setting, you can also specify a SQL script file to populate the tables once they are created, using ormsettings.sqlscript in Application.cfc.

update : With this setting, CF-ORM will create the table if it does not exist or update it if it exists. This setting is very convenient because you don't need to change the database tables yourself whenever the application changes; Hibernate will do that for you. However, that is not true all the time (I can see a lot of people complaining or logging bugs about it). Here is what Hibernate does when you have this setting on:

Create table if a new CFC mapping is found in the application or if the table name of a CFC is changed. If the table name is changed for a CFC, it will not rename the table in the DB. It will simply create a new table leaving the old table as it is.

For an existing table, add a column if a new property is added or if the column name for a property is changed. If the column name is changed for a property, it will not rename the column but will simply create another column with the new name.

For an existing column, add a constraint if a new foreign key constraint is required for a relationship. However, none of these are modified in the table: datatype, length, not-null, unique, precision, scale, index, uniqueKey.

Change the id generation strategy if generator is changed for id column.

none : This is the default setting, where tables are not created or modified by CF-ORM. It uses the existing tables in the database. You should switch to this setting once the application goes into production.
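To make the settings above concrete, here is a minimal sketch of an Application.cfc for development. The application name, datasource name and script file name are placeholders I made up for illustration; only the ormsettings keys come from the discussion above.

```cfml
// Application.cfc (sketch; names below are hypothetical)
component {
    this.name = "OrmDemo";        // hypothetical application name
    this.datasource = "ormdemo";  // hypothetical datasource defined in the CF Administrator
    this.ormenabled = true;

    this.ormsettings = {
        // "dropcreate" rebuilds the tables on every ORM initialization: development only.
        // Use "update" to add missing tables/columns, or "none" (the default) in production.
        dbcreate = "dropcreate",
        // Script that seeds the freshly created tables; applies with dropcreate.
        sqlscript = "initialdata.sql"
    };
}
```

Switching dbcreate to "none" before going live keeps ORM from touching the schema at all.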

One of the frequent questions that comes up for ORM is: can I use database views instead of tables? And the answer is "of course"! From the ORM perspective, there is no difference between a database view and a table. Any query that ORM generates will work on a view in the same way as it does on a table. So while defining the persistence metadata for your CFC, just use the view name instead of the table name and you should be all set.

Of course, views are used just for queries and not for insert/update/delete. Hence methods like EntitySave/EntityDelete, which try to insert, update or delete on the view, will not change the view and will throw an error when the ORM session is flushed.
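As a rough sketch of what this looks like, the CFC below is mapped to a view. The view name, column names and property names are hypothetical. The readonly attribute is, to my knowledge, a supported component attribute for read-only mappings, but treat it as an assumption and check it against your ColdFusion version.

```cfml
// EmployeeSummary.cfc (sketch; view and column names are hypothetical)
component persistent="true" table="vw_employee_summary" readonly="true" {
    // A view still needs a property that uniquely identifies each row.
    property name="id" column="employee_id" fieldtype="id";
    property name="fullName" column="full_name";
    property name="deptName" column="dept_name";
}
```

Loading works the same as for a table-backed entity, for example EntityLoad("EmployeeSummary").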

In my last two posts, I mentioned that immediate fetching or lazy fetching can cause the 'N+1 select problem'. If you are wondering what exactly this is, read on.

Consider the example of Department and Employees. When you call EntityLoad("Department"), the following SQL statements will be executed.

SELECT * FROM department;
SELECT * FROM employees WHERE deptId = ?

The first query is executed once (1) and the second query is executed as many times as there are departments (N). Thus the above EntityLoad call results in N+1 SQL executions, which can become a performance bottleneck. Because of the N+1 statements, this is known as the 'N+1 select problem'. This will almost always happen when the fetching is "immediate" (using fetch="select"), and it can also happen with lazy loading.

With immediate fetching it is obvious why this would happen. When lazy="true", it happens when the association is accessed immediately on each of the owning objects (each department in this case).

If you think this could be happening in your application, use either of these two options.

Set lazy="false" and use fetch="join" so that the departments and their employees get loaded together (eager fetch).

Keep lazy="true" but load the departments using HQL with a join. So instead of using EntityLoad("Department"), run an HQL query that joins in the employees through ORMExecuteQuery().
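The second option could be sketched as follows. This is only an illustration: it assumes the Department entity has a one-to-many property named "employees" and a "name" property, neither of which is spelled out above.

```cfml
<cfscript>
// Assumes a Department entity with a lazy one-to-many property named "employees".
// "join fetch" loads each department together with its employees in a single SQL join;
// "distinct" keeps a department from appearing once per employee in the result.
departments = ORMExecuteQuery("select distinct d from Department d left join fetch d.employees");

for (i = 1; i <= arrayLen(departments); i++) {
    // The collection was initialized by the join, so this does not trigger a SELECT per department.
    employees = departments[i].getEmployees();
    writeOutput(departments[i].getName() & ": " & arrayLen(employees) & " employees<br>");
}
</cfscript>
```

The mapping stays lazy for every other code path; only this query pays for the join.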

In the previous post, we talked about different fetching strategies and when to use them. In this post, we will go a little deeper into lazy loading, which is the most popular and commonly used fetching strategy.

As we said in the earlier post, with this strategy, when you load an entity, ColdFusion ORM will load the entity's data but its relations and any mapped collections are not loaded. They are loaded only when you ask for them, i.e. by calling the getter method and accessing the result. Thus the relations and collection mappings are lazily loaded. To give an example, when a Department is loaded, its employees are not loaded; they are loaded only when getEmployees() is called.

There are three types of lazy loading provided by ColdFusion ORM for relationships.

lazy : This is the default lazy loading and it applies to collection mappings and to one-to-many and many-to-many relationships. In this case, when you call the accessor for the collection/relation, the collection is fully loaded. Thus when you call EntityLoad() for a particular department, its employees are not loaded at that time. When you call dept.getEmployees(), all the employee objects belonging to the department get loaded. This is achieved by setting lazy="true" on the relationship property definition in the CFC (here, Department.cfc).

Extra lazy : This applies to one-to-many and many-to-many relationships. It is similar to lazy loading but goes one step further and does not load the associated objects for calls like size() or contains(Object). This means that calls like ArrayLen(dept.getEmployees()), ArrayContains(dept.getEmployees(), anEmployee) or ArrayFind(dept.getEmployees(), anEmployee) will not result in loading any employee object. They will just execute the SQL for finding the size or for finding whether the employee belongs to the department. The employee objects are loaded only when an employee is accessed from this collection. This is very useful if the collection is huge. This is achieved by setting lazy="extra" on the relationship property definition in the CFC (here, Department.cfc).

proxy : This applies to one-to-one and many-to-one relationships. When an object is loaded, the associated object is not loaded from the database. ColdFusion only creates a proxy object for the related object, and when any method is invoked on that proxy, its data is loaded from the database and populated in the proxy object. To give an example, if the Employee-Department relation is lazy, then when an Employee is loaded the department is not loaded, and when you call employee.getDepartment() you only get a proxy object. When you call any method on the proxy object, a query is executed on the DB to load the department's data. This is achieved by setting lazy="true" on the relationship property definition in the CFC (here, Employee.cfc).
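The three options above might be mapped roughly like this. It is a sketch rather than the original example: the table names, column names and the id/name properties are assumptions, and the two components live in separate files.

```cfml
// Department.cfc (sketch; table and column names are assumed)
component persistent="true" table="department" {
    property name="deptId" fieldtype="id" generator="native";
    property name="name";
    // lazy="true": the employees are loaded when the collection is first used.
    // Use lazy="extra" instead to answer size/contains checks without loading them.
    property name="employees" fieldtype="one-to-many" cfc="Employee" fkcolumn="deptId" lazy="true";
}

// Employee.cfc (sketch; a separate file)
component persistent="true" table="employees" {
    property name="empId" fieldtype="id" generator="native";
    property name="firstName";
    // lazy="true" on many-to-one: getDepartment() hands back a proxy until a method is called on it.
    property name="department" fieldtype="many-to-one" cfc="Department" fkcolumn="deptId" lazy="true";
}
```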

An important thing to note here is – An entity is loaded only once in the request (in Hibernate session to be more specific) and there will always be only one copy of it in the request. So for Employee-Department relationship, which is lazy, if the department is already loaded, calling employee.getDepartment() will not create a proxy object and will return the loaded department object.

Lazy loading can be disabled by setting lazy=”false” on the relationship property definition in the CFC.

Choosing an appropriate lazy loading option is very important for the performance of your application. Extra lazy means more trips to the database (and each trip to the DB is expensive) but less data in memory, whereas no lazy loading means a huge object graph in memory. So you need to strike a balance depending on what the application needs.

While lazy loading is very useful in reducing the amount of data loaded from the database, and thus the number of objects in memory, overdoing it can have the opposite effect. Let's say that in your application, whenever you load an object, you always access its associated data; lazy loading will then cause the 'N+1 select problem' again. This means that a huge number of SQL statements will be executed, which can be avoided by using eager fetch or by using HQL with a join (see the query example under "Eager Fetch" in this post).

There are some other important things to remember/note while using lazy loading

A lazy collection (including one-to-many and many-to-many) is not immediately loaded when you call the getter for the relationship. The SQL is executed only when you access anything on the result of the getter (get its size, iterate over it, and so on). lazy="extra" is a little lazier still (see "Extra lazy" above).

has*** methods on the entity for relationship are optimized in such a way that it will not result into loading the associated object.

You can quite easily hit the famous "LazyInitializationException". Mark Mandel explains this nicely in his post on "Explaining Hibernate Sessions". Ray Camden also talks about his experience with it here. So you need to be careful when using detached objects.

If you are retrieving ORM entities in Flex, even if you set lazy="false", ColdFusion will not send the whole object graph. If you need the relation data to be serialized to Flex, you need to set remotingfetch="true" on the relationship property. More on this later.

In any application that needs database interaction, DB operations are the key to application performance. Most performance problems arise because the SQL being executed is not optimized, because a huge number of queries is executed, because a query loads too much data, because columns are not properly indexed, or because nothing is cached and the application always hits the DB. In this series, I will try to cover the different strategies that you need for a well-performing ORM-based application.

As we all know, the fundamental strategy to tune an application performance is to optimize the sql queries. As a general practice, object retrieval using many round trips to the database is avoided and you would fetch all the required data for a particular operation using a single SQL query using Joins to retrieve related entities. Also, you would fetch only the data that is required i.e data will not be fetched unnecessarily if it is not needed so as to reduce the load on the DB. However this becomes an issue when you use ORM because you no longer write the SQL queries yourself and queries are generated and executed by the underlying ORM engine.

Thankfully, an ORM engine like Hibernate provides various hooks to optimize the SQL as well as the number of trips made to the database. The most important of these hooks is the "fetching strategy", which defines what data will be fetched, when and how.

There are four fetching strategies for loading an object and its associations. (We will use Department-Employee relationship for all the explanation)

Immediate fetching : In this strategy, the associated object is fetched immediately after the owning entity is fetched, either from the database using a separate SQL query or from the secondary cache. This is usually not an efficient strategy unless the associated object is cached in the secondary cache or separate queries are more efficient than a join query. You can define this strategy by setting lazy="false" and fetch="select" on the relationship property definition in the CFC.

With this strategy, on loading the department object, its employee objects will be loaded immediately using a separate SQL query. As a result, this strategy is extremely vulnerable to the 'N+1 select problem'.
pros : The association is loaded immediately and hence the associated object can be accessed even after the ORM session is closed.
cons : A large number of SQL statements get executed, causing higher traffic between the application and the database. The association is loaded even if it might not be needed.

When to use : When the association is almost always read after loading the object and executing separate sql is more efficient than executing a join query.
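A mapping for immediate fetching might look like the sketch below. The table name, id property and foreign key column are assumptions; only lazy="false" and fetch="select" come from the description above.

```cfml
// Department.cfc (sketch; table and column names are assumed)
component persistent="true" table="department" {
    property name="deptId" fieldtype="id" generator="native";
    // Immediate fetching: the employees are loaded right after the department,
    // with a separate SELECT per department.
    property name="employees" fieldtype="one-to-many" cfc="Employee" fkcolumn="deptId" lazy="false" fetch="select";
}
```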

Lazy fetching : In this strategy, the associated object or collection is fetched lazily, i.e. only when required. For example, when you load a Department object, its associated employees are not loaded at all; they are loaded only when you access them. This results in a new request to the database, but it controls how much data is loaded and when it is loaded. This helps in reducing the database load because you fetch only the data that is required. We will talk about this in much more detail in the next post. For the time being, let's just say this is the most commonly used strategy and the default one, for obvious reasons. You can define this strategy by setting lazy="true" or lazy="extra" on the relationship property.

pros : Only the minimum required data is loaded. This avoids loading the entire object graph into memory and hence the performance is generally good.
cons : If the association is always accessed after loading, this results in extra SQL execution. If the loaded object is accessed in another ORM session (i.e. has become detached), extra care must be taken to avoid errors like 'LazyInitializationException' or 'NonUniqueObjectException'.

When to use : When the association is not immediately read after loading the object. This is the most commonly used and default strategy.

Eager fetching : In this strategy, the associated object or collection is fetched together with the owning entity using a single SQL join query. Thus, this strategy reduces the number of trips to the database and is a good optimization when you always access the associated object immediately after loading the owning entity. You can define this strategy by setting fetch="join" on the relationship property definition in the CFC.

With this strategy, on loading the department object, both department and employees data will be fetched from the database using a single join query.

Even if eager fetching is not defined in the CFC metadata, it can be done at runtime using ORMExecuteQuery. This is very powerful in scenarios where you want the association to be lazily loaded in most cases but immediately loaded in a few. In those cases, use a join in the HQL and execute it using ORMExecuteQuery.

pros : The association is loaded immediately and hence the associated object can be accessed even after the ORM session is closed. The association is loaded using a single join query, which is usually more efficient than executing multiple queries.
cons : The association is loaded even if it might not be needed. Since the query used is a join query, the result set returned by the DB will typically contain a lot of repetitive data. If used for more than one collection of an entity, this creates a cartesian product of the collections' data and thus a huge result set.

When to use : When the association is almost always read after loading the object. More suitable for many-to-one and one-to-one association or single collection where the associated objects can be loaded using join query without much overhead.
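For eager fetching defined in the metadata, a mapping could be sketched as follows. The table name, id property and foreign key column are assumptions made for the example.

```cfml
// Department.cfc (sketch; table and column names are assumed)
component persistent="true" table="department" {
    property name="deptId" fieldtype="id" generator="native";
    // Eager fetching: department and employees come back from one SQL join.
    property name="employees" fieldtype="one-to-many" cfc="Employee" fkcolumn="deptId" lazy="false" fetch="join";
}
```

With this in place, loading a single department by id brings its employees along in the same query, so keep it to associations that are almost always needed.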

Batch fetching : This strategy tells Hibernate to optimize the second SQL select in immediate fetching or lazy fetching so that it loads a batch of objects or collections in a single query. This allows you to load a batch of proxied objects or uninitialized collections that are referenced in the current request. This is a blind-guess optimization technique but it is very useful in nested tree loading. The concept of batch fetching is slightly confusing (at least I got confused when I first read about it), so you need to pay careful attention to this. It can be specified using the "batch-size" attribute on the CFC or on the relationship property, so there are two ways to tune batch fetching: on the CFC and on the collection.

Batch fetching at CFC level : This allows batch fetching of proxied objects and hence applies to one-to-one and many-to-one relationships. To give an example, consider the Employee-Department example where there are 25 employee instances loaded in the request (ORM session). Each employee has a reference to a department and the relationship is lazy. Therefore the employee objects will contain proxied objects for Department. If you now iterate through all the employees and call getDepartment() on each, by default 25 SELECT statements will be executed to retrieve the proxied departments, one for each Department proxy object. This can be batched by specifying the 'batch-size' attribute on the Department CFC like

<cfcomponent table="Department" batch-size="10" …>

When you call getDepartment() on the first employee object, Hibernate will see that departments should be batch fetched, and hence it will fetch 10 of the department objects that are proxied in the current request.
So for 25 employee objects, Hibernate will execute at most three queries, in batches of 10, 10 and 5.
Note that batch-size at component level does not mean that whenever you load a Department object, 10 department objects will get loaded into the session. It just means that if there are proxied instances of Department in the session, 10 of those proxied objects will get loaded together.

Batch fetching at collections : This allows batch fetching of value collections and of one-to-many or many-to-many relationships that are uninitialized. To give an example, consider the Department-Employee one-to-many relationship where there are 25 departments loaded and each department has a lazy collection of employees. If you now iterate through the departments and call getEmployees() on each, by default 25 SELECT statements will be executed, one for each Department to load its employee objects. This can be optimized by enabling batch fetching, which is done by specifying "batch-size" on the relationship property, for example with a value of 10.

One important thing to understand here is that batch-size does not mean that 10 employees will be loaded at a time for a department. It actually means that 10 employee collections (i.e. the employees for 10 department objects) will be loaded together.
When you call getEmployees() on the first department, the employees for 9 other departments will also be fetched along with the one that was asked for.

The value for the batch-size attribute should be chosen based on the expected number of proxied objects or uninitialized collections in the session.
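Both levels of batch fetching can be sketched in a single Department.cfc. One caveat: the text above uses Hibernate's name, batch-size, while as far as I know the released ColdFusion 9 attribute is spelled batchsize. Treat that spelling, along with the table and column names, as assumptions to check against your server's documentation.

```cfml
// Department.cfc (sketch; attribute spelling and names are assumptions, see the note above)
// Component level: up to 10 proxied Department objects are initialized per query.
component persistent="true" table="department" batchsize="10" {
    property name="deptId" fieldtype="id" generator="native";
    // Collection level: the employee collections of up to 10 departments are loaded per query.
    property name="employees" fieldtype="one-to-many" cfc="Employee" fkcolumn="deptId" lazy="true" batchsize="10";
}
```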

A few days back Manju logged a bug in CF-ORM saying 'lazy' does not work for a many-to-one relation, and only on a non-Windows machine at that. At first, I simply rejected the bug because a) ORM cannot have anything to do with the OS and therefore, if it works on Windows, it works on all platforms, and b) I know it works. But he did not agree and I had to take a look at that machine. And apparently he was right: lazy was not working! The related entity was in fact getting loaded immediately. (Question for you: how would you know whether lazy is working or not?)

Even after seeing this, I did not believe it and asked him to replicate it on another system, and he successfully showed it to me on one more system. He did agree that it works fine on most configurations; the problem exists only on a few systems.

This got me thinking: why would a relation get loaded immediately even after it is marked lazy? The only answer would be that someone is accessing that lazy field and calling some method on it. I checked his code that was loading the entities to see if there was any case where the field would get loaded, and unfortunately there was none.

And then suddenly it hit me: what if "memory tracking" is switched on? That would access each field of each object recursively to compute the size of the object, and that can definitely cause this. I immediately checked the server monitor and "memory tracking" was right there staring at me in red! It was indeed enabled. I asked Manju to check the other system as well (where lazy was not working) and memory tracking was enabled there too.

So the lesson – If the ‘memory tracking’ is enabled on the server, the relationship will no longer remain lazy. And btw, you should enable “Memory tracking” on the server only if you need to track the memory for some troubleshooting. Memory tracking is really really expensive in terms of performance.

Another reason why it might not work for you could be – if you are sending the object to flex. Currently, during serialization, the related objects also get sent irrespective of the lazy attribute set on the relationship. We are still working on it and hopefully by the time we release, this will be fixed.

When you use ORM for developing an application, SQLs are generated and executed by the underlying ORM engine (i.e Hibernate for ColdFusion ORM). However, for both troubleshooting and performance optimization, it is crucial to monitor what queries are getting generated. It can help you find out if there is any error in mapping that you have provided as well as it can help you decide various tuning strategies.

ColdFusion can log the SQL generated by ORM either to the console or to a file. At the same time, it leaves enough hooks for you to log it anywhere you want.

ColdFusion ORM provides two ways to log the SQLs.

Using application setting to log to console : This is a quick and simple way to log the sql to console. This is enabled by setting “logsql” in ormsettings.

<cfset this.ormsettings.logsql="true">

This setting is self-sufficient: it will log all the SQL executed by Hibernate to the console (or to the file where the server output goes). However, this is not a very flexible option. The SQL is always written to the console, where it is mixed with any other output that goes to the console. Also, this option will not show the DDL queries used for creating or updating tables. It only logs the SQL for the entity operations.

Using log4J.properties: Hibernate uses log4j for its logging and you can completely control its logging (including SQL) by modifying the log4j.properties. log4j.properties is present under <cf_home>/lib directory. Please note that you don’t need to do any application specific setting for this.

I will go into the details of using log4j.properties for SQL logging. Here is a snippet from the log4j.properties file that is shipped with ColdFusion.

log4j.logger.org.hibernate.SQL : Defines whether the SQL executed for entity operations will be logged and where it will be logged. The second value for this, i.e. 'HIBERNATECONSOLE', is an appender that controls where the SQL will be logged. In the above example HIBERNATECONSOLE is a 'console' appender, which means it will log the SQL to the console.

log4j.logger.org.hibernate.type : Defines whether parameter values used for parametrized query will be logged.

Here is the complete log4j.properties for logging SQL to the console. Of course, after changing this you need to restart the server. If you need to log the parameter values used for queries, you need to uncomment '#log4j.logger.org.hibernate.type=DEBUG' as well.

What if you want to log the SQL to a file and not to the console? That is pretty easy. You just need to change the appender used here (HIBERNATECONSOLE) so that it is a 'FileAppender' instead of a 'ConsoleAppender', with its 'File' property naming the log file.

For a standalone ColdFusion installation, the file 'hibernatesql.log' will be created in the /logs directory. You can also specify a full path to the file in the property 'log4j.appender.HIBERNATECONSOLE.File' and the log will be written there.
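A file-based configuration along these lines could be sketched as below. This is not the file shipped with ColdFusion: the appender classes and property names are standard log4j 1.x, while the file path and the conversion pattern are placeholders I chose for illustration.

```properties
# Sketch only: send Hibernate's SQL logging to a file instead of the console.
log4j.logger.org.hibernate.SQL=DEBUG, HIBERNATECONSOLE
# Uncomment to also log the parameter values bound to the queries.
#log4j.logger.org.hibernate.type=DEBUG

# The appender keeps its name but is now a FileAppender.
log4j.appender.HIBERNATECONSOLE=org.apache.log4j.FileAppender
# Hypothetical path; point this at wherever you want the log written.
log4j.appender.HIBERNATECONSOLE.File=/opt/coldfusion9/logs/hibernatesql.log
log4j.appender.HIBERNATECONSOLE.Append=true
log4j.appender.HIBERNATECONSOLE.layout=org.apache.log4j.PatternLayout
# Hypothetical pattern: timestamp, level, message.
log4j.appender.HIBERNATECONSOLE.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %p - %m%n
```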

That was easy, wasn't it? What if you want a rolling log file, so that the log file does not grow indefinitely? That is fairly easy too. All you need to do is use an appropriate appender, such as log4j's RollingFileAppender.

Now that you have seen how easy it is to change one appender for another, you can pretty much log it any way you want. log4j comes with many other interesting appenders that you can use just as easily.
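A rolling variant might be sketched like this. RollingFileAppender, MaxFileSize and MaxBackupIndex are standard log4j 1.x; the path, size limit, backup count and pattern are arbitrary values picked for the example.

```properties
# Sketch only: roll the SQL log over once it reaches a size limit.
log4j.logger.org.hibernate.SQL=DEBUG, HIBERNATECONSOLE

log4j.appender.HIBERNATECONSOLE=org.apache.log4j.RollingFileAppender
# Hypothetical path.
log4j.appender.HIBERNATECONSOLE.File=/opt/coldfusion9/logs/hibernatesql.log
# Start a new file at roughly 500KB and keep the five most recent backups.
log4j.appender.HIBERNATECONSOLE.MaxFileSize=500KB
log4j.appender.HIBERNATECONSOLE.MaxBackupIndex=5
log4j.appender.HIBERNATECONSOLE.layout=org.apache.log4j.PatternLayout
log4j.appender.HIBERNATECONSOLE.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %p - %m%n
```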

How many times did you write the accessor methods (better known as getters and setters) for the fields of your CFC? And how many times did you feel that it was really a mundane job and wished there was a better way to do this? With ColdFusion 9, we have done just that! You no longer need to write those accessors. ColdFusion will automatically generate the accessor methods for you in the object. All you need to do is to define the properties using cfproperty and set the attribute ‘accessors’ to true on the component.

The implicit setter/getter uses the 'variables' scope, which is a private scope for an object, for storing the data, i.e. the setter method puts the data in the variables scope and the getter method reads it from there.

What if you want to override a setter/getter method? Of course you can do that. You just need to define the method in your CFC and ColdFusion will call your method instead of the implicit one.

Hmm... what if you don't want ColdFusion to generate the getter, the setter, or both for a particular property? You can disable them by adding getter="false" and/or setter="false" on the property.
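As a small sketch of what this looks like, here is a Person.cfc with implicit accessors and a page that uses them. The property names and the test page are my own illustration; the two parts are separate files.

```cfml
// Person.cfc (sketch)
component accessors="true" {
    // ColdFusion generates getFirstName()/setFirstName() and so on for each property.
    property name="firstName";
    property name="lastName";
    property name="age";
    property name="city";
}

<cfscript>
// test.cfm (sketch; a separate file)
person = new Person();
person.setFirstName("John");  // stored as variables.firstName inside the object
person.setCity("Boston");
writeOutput(person.getFirstName() & " lives in " & person.getCity());
</cfscript>
```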

So if you define a property

property name="city" getter="false" setter="false";

ColdFusion will not generate getCity() and setCity() methods for this CFC.

The implicit methods can also do some nice validation if you have given the ‘type’ for properties. Consider the same Person.cfc with type specified where we have made age as a numeric type.

It will throw a nice error saying “The value cannot be converted to a numeric”.
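A sketch of the typed version and of a call that trips the check is shown below. The test page and the sample value are my own, and the two parts are separate files.

```cfml
// Person.cfc (sketch)
component accessors="true" {
    property name="firstName" type="string";
    property name="lastName" type="string";
    // The implicit setAge() rejects values that cannot be converted to a number.
    property name="age" type="numeric";
}

<cfscript>
// test.cfm (sketch; a separate file)
person = new Person();
try {
    person.setAge("not a number");
} catch (any e) {
    // Show whatever message the implicit setter raised.
    writeOutput(e.message);
}
</cfscript>
```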

What's more interesting is that you can do a whole lot of validation using cfproperty. We have added two more attributes on cfproperty named 'validate' and 'validateparams'. These attributes allow you to do even more advanced validation of the property data when the setter method is called for a property. (This is similar to cfparam or cfform validation.)

The possible values for validate attributes are

string

boolean

integer

numeric

date

time

creditcard: A 13-16 digit number conforming to the mod10 algorithm.

email: A valid e-mail address.

eurodate: A date-time value. Any date part must be in the format dd/mm/yy. The format can use /, -, or . characters as delimiters.

regex: Matches input against the pattern specified in validateparams.

ssn: A U.S. social security number.

telephone: A standard U.S. telephone number.

UUID: A ColdFusion Universally Unique Identifier, formatted 'XXXXXXXX-XXXX-XXXX-XXXXXXXXXXXXXXXX', where 'X' is a hexadecimal number.

guid: A Universally Unique Identifier of the form “XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX” where ‘X’ is a hexadecimal number

zipcode: U.S., 5- or 9-digit format ZIP codes

The 'validateparams' attribute available with <cfproperty> takes the parameters required by the validator specified in the 'validate' attribute. This needs to be specified in the implicit struct notation.

min: Minimum value if ‘validate’ is integer/numeric/

max: Maximum value if the ‘validate’ is integer/numeric/

minLength: Minimum length of the string if the ‘validate’ is string

maxLength: Maximum length of the string if the ‘validate’ is string

pattern: regex expression if the validator specified in ‘validate’ attribute is regex

Consider the Person again where we will add few more properties with some validation

Now when you create an object of Person and call setState(state), before setting the data for state in the variables scope, ColdFusion will validate that the state value provided is of type 'string' and that its length is 2. Similarly, setZip() will validate that the input data is a valid zip code and setPhone() will validate that the input data is a valid telephone number.

For string type properties, you can also do regular expression validation, which allows all sorts of validation on a string property.
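The validations described above could be declared roughly as follows. This is a sketch: the property list is assumed from the description, and validateparams is written in implicit struct notation as noted earlier.

```cfml
// Person.cfc (sketch)
component accessors="true" {
    property name="firstName" type="string";
    property name="lastName" type="string";
    property name="age" type="numeric";
    // setState() accepts only a string of exactly two characters.
    property name="state" validate="string" validateparams="{minLength=2,maxLength=2}";
    // setZip() accepts only a U.S. 5- or 9-digit ZIP code.
    property name="zip" validate="zipcode";
    // setPhone() accepts only a standard U.S. telephone number.
    property name="phone" validate="telephone";
}
```

Calling person.setState("Massachusetts") on such an object should fail validation, while person.setState("MA") goes through.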

And to top it all, all of these goodies come at a much lower cost than regular UDFs. Yes, you read that right. Implicit methods perform much better than the regular UDFs that you write in a CFC. To confirm that, let's use a CFC where we use implicit methods for 'firstName' and write our own accessors for 'lastName'.

That's almost eight times faster than a regular UDF. Isn't that sweet? You get those auto-generated methods, a whole lot of auto-magical validation and, to top that, incredible performance. I am sure writing accessors for your CFC fields will become a thing of the past now!

Update : Post public beta, we have disabled implicit getters/setters for non-persistent CFCs. To enable them, you need to set accessors="true" on the component. For persistent CFCs, the implicit getters/setters are always enabled. I have changed the examples above to reflect this.
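A comparison of this kind could be set up along the following lines. This is a sketch of the idea rather than the original test: the iteration count is arbitrary, the two parts are separate files, and the timings you get will depend on your server.

```cfml
// Person.cfc (sketch)
component accessors="true" {
    // Implicit getFirstName()/setFirstName() are generated for this property.
    property name="firstName";
    // No implicit accessors here; the hand-written ones below are used instead.
    property name="lastName" getter="false" setter="false";

    function getLastName() {
        return variables.lastName;
    }

    function setLastName(lastName) {
        variables.lastName = arguments.lastName;
    }
}

<cfscript>
// perftest.cfm (sketch; a separate file)
person = new Person();
iterations = 100000; // arbitrary count, chosen only for illustration

start = getTickCount();
for (i = 1; i <= iterations; i++) {
    person.setFirstName("John");
    person.getFirstName();
}
implicitMs = getTickCount() - start;

start = getTickCount();
for (i = 1; i <= iterations; i++) {
    person.setLastName("Smith");
    person.getLastName();
}
udfMs = getTickCount() - start;

writeOutput("implicit: " & implicitMs & " ms, hand-written: " & udfMs & " ms");
</cfscript>
```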