Saturday, February 26, 2011

You never develop code without version control, so why do you develop your database without it? Liquibase is a database-independent library for tracking, managing and applying database changes.

Liquibase works with XML files; these files describe which changes should be applied to the database (create, drop, insert, delete, update, managing constraints...). They also contain a revision code (id), so when a migration from an old database to a new one is performed, Liquibase knows whether each change has already been applied. If not, the database is updated.

Internally, Liquibase creates a change history table (DATABASECHANGELOG) where it records which changesets have already been applied. If a change in the changelog file is not in that table, Liquibase executes it and records it so that it is skipped on subsequent runs.

Now that you know what Liquibase does, what I am going to explain is how I use it in a real use case.

If your company develops server-side software, Liquibase is important, but you can survive without it, because you update only one machine and, in theory, updates are successive: you would not jump from version 1.0.0 to 1.0.2 on a production server. Of course, this only applies if you don't distribute your server code (your clients connect directly to your server); if you do distribute it (like Sonar, for example), you have the same problem with database updates that standalone applications have.

I develop standalone software: every client has its own installation (they do not connect to a central server), and we have hundreds of clients, each running a different version of our product. Although we are constantly developing upgrades, not all clients update at the same time, so a typical scenario is that some clients have version 1.0.1, others 1.0.3, and the latest version is an update to 1.1.0. When a client installs a new version, our product itself is installed correctly; the problem is that usually our database changes too, and our clients want backward compatibility with previous versions, and of course they don't want to lose database records.

Moreover, our changes can involve adding new tables, or adding/removing "static" information in the database. Before we used Liquibase, we created SQL scripts for jumping between versions, and then, depending on the client's version, we sent them the required SQL scripts so they could update the software and the database. As you can imagine, this was chaotic: we had to create SQL scripts for every version, and those scripts had to be sent to the client with the risk of human error, sending the wrong ones or not all of those required (think about jumping from 1.0.0 to 2.0.0: we had to execute one script for each intermediate version). If a failure occurred, the application stopped working and the database was left inconsistent.

As you can imagine, Liquibase solves this problem. First of all, it uses XML, which is more readable than plain SQL; second, it allows us to define a single file for migrating from one version to another, knowing exactly which changes were applied in each version; and last, it is automatic: once the changelog files are created, Liquibase knows which database version is currently installed and which patches should be applied.

Let me show you a simple example:

We have one Liquibase master file which imports all the changelog files that should be applied to update the database to the latest version (a master file where we add a new include line for each version we release):
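A minimal master file along these lines could look like the following sketch (the schema version and exact file names are assumptions based on the paths mentioned in this example):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<databaseChangeLog
        xmlns="http://www.liquibase.org/xml/ns/dbchangelog"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xsi:schemaLocation="http://www.liquibase.org/xml/ns/dbchangelog
            http://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-2.0.xsd">

    <!-- one include line per released version, in order -->
    <include file="META-INF/database-change/db.changelog-1.0.xml"/>
    <include file="META-INF/database-change/db.changelog-1.1.xml"/>
</databaseChangeLog>
```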

You can see that two files are imported: the first one (META-INF/database-change/db.changelog-1.0.xml) contains the changes for version 1.1, and the second one contains the changes for migrating from 1.1 to 1.2. When Liquibase reads this file, it executes the included files sequentially. If a file has already been imported, Liquibase skips it and tries the next one.

Note that Liquibase creates two tables for managing database versions, and in our case two inserts into the Liquibase tables are performed, one for each changelog file.

In red are the statements that Liquibase executes the first time it runs.

In orange are the insertions for updating from 1.0 to 1.1. Note that the departments table is created, and it is also recorded that db.changelog-1.1 has been executed.

In blue is the database structure that was already inserted in version 1.0.

In green are the statements executed for updating from 1.1 to 1.2. The last line is an insert into a table created in version 1.1. Obviously, this insert will always work, because Liquibase ensures that the 1.1 script is executed before the 1.2 one.
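As a sketch, the two changesets involved might look like this (column names and author are assumptions for illustration):

```xml
<!-- in the 1.1 changelog: create the departments table -->
<changeSet id="1" author="bob">
    <createTable tableName="departments">
        <column name="id" type="int">
            <constraints primaryKey="true" nullable="false"/>
        </column>
        <column name="name" type="varchar(50)"/>
    </createTable>
</changeSet>

<!-- in the 1.2 changelog: insert into the table created in 1.1 -->
<changeSet id="1" author="bob">
    <insert tableName="departments">
        <column name="id" valueNumeric="1"/>
        <column name="name" value="Accounting"/>
    </insert>
</changeSet>
```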

Liquibase also supports custom refactorings, where you specify the exact SQL sentence you want to execute.

Liquibase can be executed as a standalone application, but also from an Ant script, a Maven build, a Servlet listener, Grails or Spring. In my case, I have chosen the Spring integration, because I am already using Spring, and because I wanted the application to check the changelog files each time it starts, to see if there is any new upgrade to process.
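The Spring integration boils down to one bean definition; a sketch (the dataSource reference and changelog path are assumptions for this example):

```xml
<bean id="liquibase" class="liquibase.integration.spring.SpringLiquibase">
    <property name="dataSource" ref="dataSource"/>
    <property name="changeLog"
              value="classpath:META-INF/database-change/db.changelog-master.xml"/>
</bean>
```

With this bean in place, Liquibase runs against the configured data source every time the Spring context starts.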

Liquibase not only updates the database; it can also roll back changes, create a diff file between versions, generate documentation about changes in javadoc fashion, or simply not run the update directly but produce an SQL file so a DBA can review the updates.

But wait, there is even more. Do you know Spring profiles, released in Spring 3.1? If not, take a look, and see how perfectly the Liquibase context parameter matches Spring profiles. The context parameter indicates that a changeset should only be applied when Liquibase is running in the given context. For example, you can have a changelog that inserts data for your unit testing queries, and another one that creates some tables. The inserts should be applied only when the test context is active, but the table creation should always be applied.

In the <changeSet></changeSet> tag you specify in which context it should be applied. If no context is set, the change is always executed. Keep in mind that it is good practice to keep test changesets and production changesets in the same file; it is only a matter of configuring the contexts correctly.

<changeSet id="2" author="bob" context="test"/>

The previous changeSet will be executed only when the test context is specified. The Spring integration (SpringLiquibase) has a property called contexts where you indicate which contexts are currently active.

Both the changeSet tag and the Spring property support comma-separated values for configuring more than one context.

Then, thanks to Spring 3.1, you define two SpringLiquibase instances, each one in a different Spring profile: one with test contexts (for example test, test-integration, ...), and the other one without contexts.
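A sketch of that setup, assuming Spring 3.1 nested beans elements (profile names, the dataSource reference and the changelog path are assumptions):

```xml
<beans profile="test">
    <bean id="liquibase" class="liquibase.integration.spring.SpringLiquibase">
        <property name="dataSource" ref="dataSource"/>
        <property name="changeLog"
                  value="classpath:META-INF/database-change/db.changelog-master.xml"/>
        <!-- test changesets (context="test") are applied too -->
        <property name="contexts" value="test,test-integration"/>
    </bean>
</beans>

<beans profile="production">
    <bean id="liquibase" class="liquibase.integration.spring.SpringLiquibase">
        <property name="dataSource" ref="dataSource"/>
        <property name="changeLog"
                  value="classpath:META-INF/database-change/db.changelog-master.xml"/>
        <!-- no contexts property configured in this profile -->
    </bean>
</beans>
```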

Now when we release a new version, we simply package everything into a war and send it to the client. When the war is deployed on their server, the new version is installed, with software updates, database updates and, more importantly, the client has not lost any data. No post-installation steps or SQL scripts are required.

Wednesday, February 23, 2011

JUnit 4 has many features which can be considered "hidden". I am sure that developers who always read the JUnit changelogs will know some of these features, but for those who don't read changelogs, I am going to uncover them:

@Ignore: this annotation tells the JUnit runner that the test should not be executed. This annotation makes sense in TDD/BDD environments, where you write the unit test classes first and then the source code. It is important that, if you have not implemented a feature yet, it does not appear as failed in your CI system but as ignored, so that with a quick view (in Jenkins?) you can get an idea of which specifications pass, which fail, and which are pending implementation.
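A minimal sketch (class and method names are made up for illustration):

```java
import org.junit.Ignore;
import org.junit.Test;

public class InvoiceTest {

    @Test
    public void shouldCalculateTotal() {
        // specification already implemented and tested
    }

    // Not implemented yet: CI reports this as "ignored", not "failed".
    @Ignore("pending implementation")
    @Test
    public void shouldApplyDiscounts() {
    }
}
```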

@RunWith(Parameterized.class): this annotation is used for data-driven tests. A test class defined as parameterized takes a large set of test data with the expected result for each entry, and verifies the expected value against the calculated value. Think of a spreadsheet where the last column is the expected value and the previous columns are input data. Your test then simply iterates over each row, using the first N-1 columns as input parameters and the Nth column as the expected value. This annotation is useful when you need flexibility between data and expected results, which can vary depending on the input.
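A sketch of the spreadsheet analogy (the addition example is made up for illustration):

```java
import static org.junit.Assert.assertEquals;

import java.util.Arrays;
import java.util.Collection;

import org.junit.Test;
import org.junit.runner.RunWith;
import org.junit.runners.Parameterized;
import org.junit.runners.Parameterized.Parameters;

@RunWith(Parameterized.class)
public class AdditionTest {

    private final int a;
    private final int b;
    private final int expected;

    // JUnit calls this constructor once per data row.
    public AdditionTest(int a, int b, int expected) {
        this.a = a;
        this.b = b;
        this.expected = expected;
    }

    // Each row: N-1 input columns plus the expected value in the last column.
    @Parameters
    public static Collection<Object[]> data() {
        return Arrays.asList(new Object[][] {
            { 1, 1, 2 },
            { 2, 3, 5 },
            { 0, 0, 0 }
        });
    }

    @Test
    public void shouldAddBothOperands() {
        assertEquals(expected, a + b);
    }
}
```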

@Rule TemporaryFolder: if your test should write data to a file or read data from a file, this file cannot be created in a static directory, because the computer that executes the test might not have write permissions, or the parent path might simply not exist; more importantly, if you forget to delete those temporary files, your testing environment starts to fill up with junk data. To avoid that, the TemporaryFolder rule is available. It provides methods for creating files and folders, and everything created through it is deleted after each test execution.
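A sketch (the export scenario is made up for illustration):

```java
import static org.junit.Assert.assertTrue;

import java.io.File;

import org.junit.Rule;
import org.junit.Test;
import org.junit.rules.TemporaryFolder;

public class ExportTest {

    // The folder (and everything inside it) is deleted after each test.
    @Rule
    public TemporaryFolder folder = new TemporaryFolder();

    @Test
    public void shouldWriteExportFile() throws Exception {
        File output = folder.newFile("export.csv");
        // ... code under test writes to output ...
        assertTrue(output.exists());
    }
}
```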

@Rule TestWatchman: usually when a test passes or fails, the result is notified to the user: with a green/red bar when running inside an IDE, or stored into a surefire report in the case of Maven. But what if you wanted to notify/report the result of a test differently? The TestWatchman class helps you here. It defines two methods (failed and succeeded) that are called depending on the result of each test. This can be useful for a QA department. Imagine your QA team requires a spreadsheet with a column for the specification id, the next column for the test name that checks that specification, and a third one stating PASS or FAIL, so anyone can see the status of the project at a glance. Because filling this spreadsheet manually is inhuman, you need to automate the spreadsheet modification: simply extend TestWatchman to modify the spreadsheet depending on the result.
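A sketch of the idea; printing stands in for the spreadsheet update, which is left out:

```java
import org.junit.Rule;
import org.junit.Test;
import org.junit.rules.MethodRule;
import org.junit.rules.TestWatchman;
import org.junit.runners.model.FrameworkMethod;

public class ReportedTest {

    // Hook into the result of every test method; a real implementation
    // would update the QA spreadsheet here instead of printing.
    @Rule
    public MethodRule watchman = new TestWatchman() {
        @Override
        public void succeeded(FrameworkMethod method) {
            System.out.println(method.getName() + ": PASS");
        }

        @Override
        public void failed(Throwable e, FrameworkMethod method) {
            System.out.println(method.getName() + ": FAIL");
        }
    };

    @Test
    public void shouldCheckSpecification() {
        // ... test body ...
    }
}
```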

@Rule Timeout: this rule defines the maximum time a test may take. If the test has not finished after the defined time, its execution is cancelled and it is considered failed. I usually use this feature in performance tests, for monitoring Hibernate queries. In my work, response time is so important that queries taking more than 60 ms should be changed, refactored, or at least justified. Thanks to the Timeout rule, I have a flag (the test fails) that warns me when some query does not meet this limit. It can also be used in systems where some kind of response-time guarantee between heterogeneous systems is required.

@Rule
public MethodRule globalTimeout = new Timeout(60);

@RunWith(Categories.class): Categories is a new feature in JUnit used for grouping tests. Tests are grouped (using the @Category annotation) into different types using interfaces. Test classes or test methods can be annotated, so a test class can contain test methods of different groups. Using Categories implies:

- Defining the categories. In fact this means creating an interface, named after the group.

- Annotating test classes or methods with @Category, pointing to one of those interfaces.

- Creating a suite annotated with @RunWith(Categories.class) that includes or excludes the desired categories.

Categories are really helpful for grouping your test cases so that you don't execute all of them every time. For example, in my projects I have a Performance category. This category groups the tests that validate that performance is acceptable for my requirements. The performance test suite is not run every night (as the unit tests are), but only on Friday nights.
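The three steps above can be sketched in one file (class and method names are made up for illustration):

```java
import org.junit.Test;
import org.junit.experimental.categories.Categories;
import org.junit.experimental.categories.Categories.IncludeCategory;
import org.junit.experimental.categories.Category;
import org.junit.runner.RunWith;
import org.junit.runners.Suite.SuiteClasses;

public class CategoriesExample {

    // Step 1: the category is just a marker interface named after the group.
    public interface Performance {}

    // Step 2: annotate the tests (or the whole class) with @Category.
    public static class QueryTest {
        @Test
        public void shouldFindUsers() { /* fast unit test */ }

        @Category(Performance.class)
        @Test
        public void shouldFindUsersUnder60Ms() { /* slow performance test */ }
    }

    // Step 3: a suite that runs only the Performance tests,
    // e.g. scheduled for Friday nights.
    @RunWith(Categories.class)
    @IncludeCategory(Performance.class)
    @SuiteClasses(QueryTest.class)
    public static class PerformanceTestSuite {}
}
```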

@RunWith(Theories.class): a theory is something that is expected to be true for all data. Theories should be used for finding bugs in edge cases. Theories are like Parameterized tests in that both use a large set of test data. The difference is that with theories there is no expected value for each row; there is only one valid result, because a theory should hold true for all data.

@RunWith(Theories.class)
public class CarTheories {

    @DataPoints
    public static Car[] data() {
        return getAllCars();
    }

    // A theory method has one parameter on which the theory is asserted.
    @Theory
    public void theoryAllCarsShouldHaveFourWheels(Car car) {
        assertThat(car.getNumberWheels(), is(4));
    }
}

I have explained some of the new features in JUnit 4.8.x. In fact there are more @Rule classes, but I have decided to show only those that are useful in my daily work. I hope you find it interesting, and I am sure your unit testing life will change after reading this post.

Saturday, February 19, 2011

From Wikipedia: Continuous Integration aims to improve the quality of software, and to reduce the time taken to deliver it, by replacing the traditional practice of applying quality control after completing all development. Continuous Integration involves integrating early and often, so as to avoid the pitfalls of "integration hell". The practice aims to reduce rework and thus reduce cost and time.

Reading on, Wikipedia talks about recommended practices:

Maintain a code repository.

Automate the build.

Make the build self-testing.

Everyone commits to the baseline every day, and every day a build is made.

Find integration problems fast.

Test in a clone of the production environment.

Make it easy to get the latest deliverables.

Everyone can see the results of the latest build and can correct problems quickly.

Reading on, you can find the advantages of CI and the (only a few) disadvantages. If you look closely, the disadvantages section talks about the required hardware and the initial setup time.

What if you had a single file (a debian package) that you double-click and voilà! Your continuous integration server is ready to work. Then you wouldn't need to spend time configuring it, or searching the internet for tools or plugins to make your system completely workable.

One of the best people to talk about CI is John Ferguson Smart from Wakaleo. His main purpose is helping people build software better, and in his presentations (I recommend you attend his workshops and public talks), you can learn which tools are best for you.

So we have the knowledge (thanks to John), and we have the software (thanks to the open source community); all that is missing is an installer that sets up the whole CI architecture on a server (applications, plugins), configures it (system environment), and connects everything together. And that's where I come into play. I am developing a debian package that will install all the required applications, configure them and interconnect them.

Apache Tomcat 7.0.6: Tomcat 7 is installed in /usr/share/apache-tomcat-7.0.6, but with the catalina.sh file modified to add the Sonar options to CATALINA_OPTS, and with JENKINS_HOME and PLEXUS_NEXUS_WORK pointing to their working directories.

Nexus Open Source 1.8.0.1: the server is installed at :8080/nexus with predefined repositories. The internal repository is set to /opt/nexus.

Jenkins 1.396: the continuous integration server is installed at :8080/jenkins. The Jenkins home is /opt/jenkins. It is already configured to work with the installed Maven and Sonar. It also comes with the following plugins:

First known issue: I don't know why, but my postinst file does not read my JAVA_HOME, so it does not configure the Jenkins JAVA_HOME correctly. I am sure the problem is in my Linux installation, so it is possible that on your computer it works perfectly; if not, go to /opt/jenkins/config.xml and, inside the tag just under the JDK 6 tag, write your JDK path (the same as JAVA_HOME). It can also be set using the Jenkins configuration menu. I suspect that, for the root user, the JAVA_HOME variable defined in /etc/profiles is not available.

Sonar 2.5 is installed into Tomcat at :8080/sonar, its configuration directory is /opt/sonar, and it is already configured to work with Nexus.

Template Repository 0.1: this is a module I have created, installed at :8080/template-repository. What you find here are the PMD and Checkstyle files used by Sonar for scanning projects, so developers can download them and use the same Sonar configuration in their IDE plugin. There is also a settings.xml configured with the Nexus repository, so developers only need to download it and copy it to the Maven configuration directory.

Maven parent POM with distributionManagement addressing the Nexus server, and repositories also pointing to Nexus. There are also three "must have" dependencies for TDD (Hamcrest, Mockito and JUnit); a profile named metrics (which can be activated with -Pmetrics) containing metrics for PMD, Cobertura, Findbugs and Checkstyle; a profile called source-javadoc which creates a jar with the attached source files; and another one with javadocs. Three plugins are also configured: maven-compile, maven-deploy (configured for uploading to Nexus) and versions-maven. Although it is defined as a parent POM, this POM is used for both inheritance and aggregation.

Template POM configured to be used with the previous parent POM. I have written down some SCM configurations; just uncomment the line that configures your SCM. Finally, it defines the maven-release plugin.

Second known issue: sometimes, I don't know why, Tomcat does not start; it shows a message that Nexus is being deployed and stays there forever. I have noticed that this only happens the first time you start Tomcat. If it happens, simply stop Tomcat (Ctrl+C) and restart it; the second time it works perfectly.

The project has no name, and for now I don't have any ideas, so the package is called cis (Continuum Integration System). I hope that in future versions I can find a better name; any suggestion would be welcome.

I am also open to suggestions about which plugins should be added, or which components a CI system should have.

Final note: for now, Tomcat is not started as a daemon, but I have an init.d file prepared. In the final release this file will also be copied into /etc/init.d, but for now I prefer a manual start using the catalina.sh run command.

This is Milestone 1; it is possible that for Milestone 2 you will have to remove the previous installation. Install it to see how it works and whether it can be helpful in your company.

Eclipse update site with useful plugins. Because all developers should use the same plugins, the system manager can upload plugins there and make them available to the project. Developers only need to define this update site in their Eclipse.

This week the SpringSource people have uploaded the first milestone of Spring Framework 3.1.

For my daily work, environment abstraction is the best feature in this milestone.

Environment abstraction allows us to group bean definitions for activation in specific environments. For example, you can define two data sources, where one will be used in testing (an HSQL configuration) and the other in production (PostgreSQL). I am sure that at some point in your life using Spring, you have uploaded the test database configuration into production code by mistake. Environment abstraction tries to avoid this situation.

I have just shown you a typical example, but you can also create a custom resolution of placeholders, so that, for example, in test mode you use a properties file with the log level property set to TRACE, and in production a properties file set to INFO.

Another example that I find interesting is acceptance tests. One of my tasks is designing and developing planners for robots. In unit testing we mock the hardware interface to the robot. But in acceptance tests, instead of mocking the hardware interface, we inject an emulator interface, so you can see how a fictional software robot moves on your screen and validate visually that all movements are correct.

In previous Spring versions, what we had was three Spring XML files: one for production, another one for acceptance tests, and one main file that imports one of the other two, so depending on the environment we import one file or the other. As you can imagine, this implies a manual step when switching between the production and testing environments; although it is not tedious, it can be forgotten, leaving the emulator hardware interface in production instead of the driver interface to the robot.

With Spring 3.1, only one file is required, containing both configurations; depending on the environment, Spring loads the required beans.

The first thing you will notice is that the beans tag is still the root, but it can also be nested inside itself. The first two nested beans elements have an attribute called profile. This attribute states when the beans they contain are active. The first group is active when the configured profile is test or integration, while the second one is active when the production profile is active. The last definition is always active.
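A sketch of such a file (bean classes, URLs and profile names are assumptions for illustration; note that nested beans elements must come after any top-level bean definitions):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.springframework.org/schema/beans
           http://www.springframework.org/schema/beans/spring-beans-3.1.xsd">

    <!-- always active -->
    <bean id="messageService" class="com.example.MessageService"/>

    <!-- active only when the test or integration profile is enabled -->
    <beans profile="test,integration">
        <bean id="dataSource"
              class="org.springframework.jdbc.datasource.DriverManagerDataSource">
            <property name="url" value="jdbc:hsqldb:mem:testdb"/>
        </bean>
    </beans>

    <!-- active only when the production profile is enabled -->
    <beans profile="production">
        <bean id="dataSource"
              class="org.springframework.jdbc.datasource.DriverManagerDataSource">
            <property name="url" value="jdbc:postgresql://dbserver/app"/>
        </bean>
    </beans>
</beans>
```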

I keep writing "depending on the environment", and I suppose you are wondering how you specify that environment.

There is an environment property called spring.profiles.active where you specify the current profile. To set this property you can use an export in the case of *nix systems, an execution parameter (-Dspring.profiles.active), or the context.xml in the case of web applications, ...

For example, a valid environment definition would be:

export spring.profiles.active=test

In case you are using Java-based container configuration, @Profile("profileName") can be used on bean definitions to specify the profile name.
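A minimal sketch with Java-based configuration (the class name and the HSQL URL are assumptions for illustration):

```java
import javax.sql.DataSource;

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.context.annotation.Profile;
import org.springframework.jdbc.datasource.DriverManagerDataSource;

@Configuration
@Profile("test")
public class TestDataConfig {

    // This data source is only registered when the "test" profile is active.
    @Bean
    public DataSource dataSource() {
        return new DriverManagerDataSource("jdbc:hsqldb:mem:testdb");
    }
}
```

The profile can also be activated programmatically before refreshing the context, e.g. with AnnotationConfigApplicationContext: call ctx.getEnvironment().setActiveProfiles("test"), then register the configuration class and refresh.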

ApplicationContext has a getEnvironment() method; the returned environment lets you query which profiles are active and, on a configurable context, set them programmatically.

I have developed a simple example with two classes, where one prints the passed message in upper case, and the other one prints the message between sharp characters. Depending on the profile configuration, the message format changes. Moreover, the same example is provided using Java-based container configuration.

I hope you find this new feature as useful as I do, and more importantly, once you have configured your machines with the required environment variable, you won't have to worry any more about changing configuration files.

Thursday, February 17, 2011

Maven 3 presents a new way to configure POM files. In Maven 2, POM files had to be written as XML. In Maven 3 you can use the old way, but also a Groovy file. Groovy is a nice scripting language that offers facilities for implementing a DSL (Domain-Specific Language).

As you may have noted, the Groovy file is as expressive as the XML file, but much more readable: only two lines instead of six.
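To illustrate the "two lines instead of six" claim, here is a sketch; the exact Groovy shorthand is an assumption (the polyglot Maven syntax of the time may differ), but the shape of the comparison holds:

```groovy
// XML form (six lines):
// <dependency>
//   <groupId>junit</groupId>
//   <artifactId>junit</artifactId>
//   <version>4.8.2</version>
// </dependency>

// Groovy form (two lines):
dependencies {
    dependency 'junit:junit:4.8.2'
}
```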

After seeing this new form, I got curious about how the Maven people had created the parser with Groovy. In fact, it all comes down to closures and dynamic invocation, both capabilities implemented by Groovy. So my next step was expressing some XML configuration as a Groovy configuration and developing the "parser" in Groovy.

First of all I chose a Liquibase configuration file, which I have truncated a lot (to keep the example simple).

Then I wrote down how I wanted the Groovy configuration file to look:
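A reconstruction of that Groovy configuration, consistent with the fragments quoted further down (column names are assumptions):

```groovy
databaseChangeLog {
    createTable {
        id { type "int" }
        name { type "varchar" }
    }
}
```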

In Groovy, a closure is composed of a definition (name), plus arguments and statements to execute. In formal terms:

name { [closureArguments->] statements }

If you look closely at the Groovy configuration file, this can be seen in the first statement:

databaseChangeLog {
....
}

You have a closure name, a brace, no arguments, and a list of statements.

In Groovy, a function can be called like in Java, functionName(param1, param2, ...); but also without parentheses.

An example can be found in line 3:

id {type "int"}

The function called is named type, and a string parameter is passed with the value "int".

We are almost done: we know that databaseChangeLog and createTable are closure names, as are id and name.

And I suppose you are wondering: "you are right, I can see closures, I can see functions, but I need to read the configured values". You are right, and that is done by a Groovy class that resides in another script file.

With this information we are almost done; let me show the implementation and comment on the last trick.

First of all, we need to define an ExpandoMetaClass (which allows you to dynamically add methods, constructors, properties and static methods using a neat closure syntax) to match the parent method, in our case databaseChangeLog.

In this line we are telling Groovy that the statements (method calls) defined inside the databaseChangeLog closure should be delegated to a specific delegate class, and that after setting this delegation (Closure.DELEGATE_FIRST) the closure (the method content) should be executed (cl()).

The DatabaseChangeLogDelegate delegate defines that when the createTable method is found, it should create a new delegate, print that a table is created, and finally call the closure.

Let's recapitulate what Groovy executes after all the initialization completes, or more formally, when script.run() (which runs our configuration Groovy script) is executed.

The script's main method (databaseChangeLog) is executed; this method is found in the ExpandoMetaClass, so Groovy creates the new delegate and calls databaseChangeLog. This method has another method call (createTable) in its body. How does Groovy try to execute it? By looking it up in the delegate. Look at the DatabaseChangeLogDelegate class: it contains a createTable method. This method creates a new delegate, cl.delegate = new AttributeNameDelegate();, for the nested methods, and executes the closure (cl()), which implies resolving two more methods, id and name.

I know it is a little bit confusing; think of the delegate class as an @Around aspect in Spring AOP, which can execute some logic before and after, except that instead of a proceed call it has a call to the closure (cl()).

And now the last trick. You might think: "yes, you are redefining methods whose names you know, like databaseChangeLog, createTable or type. But what happens with methods that are defined by the user and can have any valid function name, like id or name? Each table will define its own attribute names, so how do you deal with methods whose names you don't know at development time?"

Groovy defines a method on delegates called methodMissing. This method is called when the method definition is not found in the current delegate class.

At first glance it seems very complicated code, but you will find that it is easy to develop, and repetitive: once you have defined one delegate, the rest are exactly the same except for the required method names and configuration.
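A minimal sketch of the whole parser, following the class names mentioned above (the implementation details are an assumption, and printing stands in for real value handling):

```groovy
class AttributeNameDelegate {
    // "type" is a known method inside each column closure.
    def type(String value) {
        println "  type: ${value}"
    }

    // Unknown column names (id, name, ...) land here; delegate their
    // closure body back to this class so "type" can be resolved.
    def methodMissing(String name, args) {
        println " column: ${name}"
        Closure cl = args[0]
        cl.delegate = this
        cl.resolveStrategy = Closure.DELEGATE_FIRST
        cl()
    }
}

class DatabaseChangeLogDelegate {
    def createTable(Closure cl) {
        println "A Table is Created"
        cl.delegate = new AttributeNameDelegate()
        cl.resolveStrategy = Closure.DELEGATE_FIRST
        cl()
    }
}

// Parse the configuration script and add databaseChangeLog to its
// metaclass, delegating the closure body to DatabaseChangeLogDelegate.
def script = new GroovyShell().parse('''
databaseChangeLog {
    createTable {
        id { type "int" }
        name { type "varchar" }
    }
}
''')

script.metaClass.databaseChangeLog = { Closure cl ->
    cl.delegate = new DatabaseChangeLogDelegate()
    cl.resolveStrategy = Closure.DELEGATE_FIRST
    cl()
}

script.run()
```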

Some of the advantages of the DSL approach for configuration files are:

Less information to express the same thing. Our new configuration file for Liquibase is essentially the XML file but without the noise generated by the XML tags.

More concise and more readable.

Enhanced extensibility. Because it is based on a scripting language, it can be extended with script statements. For example, in the case of Maven, if you wanted to execute some Groovy statements during a specific goal, you would have to import a plugin (GMaven) and configure it. See the GMaven website for examples of why this feature can be interesting: http://docs.codehaus.org/display/GMAVEN/Executing+Groovy+Code. But if your configuration file is already a Groovy file, you can put Groovy statements in natively, without adding or configuring any new plugin.

With Groovy you can create your own Domain-Specific Language. What I suggest is that you start thinking about creating a DSL for a configuration file when you are writing a submodule that will be reused in many other projects, each one requiring a different configuration.

Of course, a DSL can be used in situations other than those described here. Some of you may think that XML is the best way to create configuration files, but I think it is always nice to know the different valid possibilities and what they offer.