Operator New

2017/07/09

Introduction

I recently wrote a short tutorial on Building REST services with Spring Boot, JPA and MySQL, in two parts: Part 1 and Part 2.

I decided to add an essential part to any serious project with an SQL store: management of Database Migrations.

In the real world, where requirements change and schemas cannot be fully designed up front, you will face a real problem sooner rather than later: how do you manage changes to the database schema once the application or web service is running?

I wrote an article some time ago, which still seems to be quite valid. You can read there the details of a recommended development workflow to cope with Database migrations in all phases of development.

In this article, I will apply the same ideas from my previous article to a more up-to-date app: a Spring Boot Rest Web service with MySQL.

Adding the Liquibase plug-in for Maven

Let's add the Liquibase plug-in.
What we want to achieve in this first step is to keep evolving our model and working exactly as we did before adding Liquibase: the Hibernate Maven plugin will take care of recreating an up-to-date schema every time we run tests or launch the application.

We add the dependency:

Our initial Liquibase changeset will be empty.

We need to add the Database Changelog File for liquibase: the file where all changes managed by Liquibase are registered. It will initially be empty.
We add the file src/main/resources/db/db.changelog.xml:

We add the liquibase.properties file:

We finally add the plug-in with relevant configuration:

With this configuration, we are telling Liquibase not to attempt any database migrations. This keeps our normal workflow of updating the model with JPA annotations and running tests, while the Hibernate plugin takes care of dropping and re-creating the database.

This is what we see if we run mvn clean test

Generating DB diff automatically with Liquibase: First migration

We have finished evolving our model and adding the additional logic and tests.
We are now happy and ready to commit a change.
This point could even be the very first version of your DB schema!

Let's generate the DB diff with Liquibase.
The generated diff file will be incorporated into the registered Liquibase DB schema updates. Additionally, when running our app, Liquibase will take care of migrating the DB schema to the latest version registered in our codebase.

To make all this magic happen, let's add a profile in our pom file so we can generate the DB diff anytime.

Liquibase has generated the file src/main/resources/db/db-20170709_144112.changelog.xml for us, with these contents:

Great! We can now add this filename to our global DB changelog file:

Subsequent migrations with Liquibase

To check our migration mechanism works well, let's update our model with a version field and generate again a DB diff via mvn process-test-resources -Pdb-diff.

Liquibase generates this file:

This seems like magic!

Automatic DB migration embedded in the app

Adding Liquibase to our dependencies has also added a Liquibase Spring bean to our app. This bean runs at application startup, checks the registered changesets against the app's DB and brings the DB schema up to date automatically by applying any pending migrations.

It would be good to see this in action at development time, so we can test it.

Let's add another profile to our maven project for this.

This profile skips any schema generation by the Hibernate plugin and drops the database. This way, when the app starts, the Liquibase Spring bean will kick in and be forced to run all registered migrations.

2017/05/29

Introduction

If you have developed a nice web app with a lot of content, you will sooner or later face undesired web scraping.

The undesired web scraping will sometimes come from an unmasked bot, with user agents such as Go-http-client, curl, Java and others. But sometimes you will have to deal with bots pretending to be almighty Googlebot or some other legitimate bot.

In this article I will propose a defense that mitigates undesired web scraping and detects fake bots disguised under a legitimate bot name (user agent), without compromising the response time.

This defense can be integrated in any rack-based web app, such as Ruby on Rails or Sinatra.

Request Throttling

If your website has a lot of content, no reasonable human visitor will access that many pages. Let's say that your visitor is a very avid reader who enjoys your content a lot. How many pages do you think they can visit:

per minute?

per hour?

per day?

Our defense strategy will be based on accumulating the number of requests coming from a single IP address for different slots of time.
When one IP address exceeds a pre-configured reasonably high number of requests for the given interval, our app will respond with an HTTP 429 "Too many requests" code.

To the rescue comes rack-attack: a rack middleware for blocking and throttling abusive requests.

Rack-attack stores request information in a configurable cache, with Redis and Memcached as some of the possible cache stores. If you are using Resque, you will probably want to use Redis for rack-attack too.

Here's a possible implementation of rack-attack:

Let's go through the code.

Any request whose path starts with one of these entries will be a candidate for throttling:

We set up a reasonable maximum number of requests for each of the intervals of time we will consider for request throttling:
This is arbitrary and you can choose different intervals of time.

We would like to limit the number of requests within 60 seconds coming from the same IP:

When this throttle block returns a non-falsey value, a counter will be incremented in the Rack::Attack.cache. If the throttle's limit is exceeded, the request will be blocked.
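To make the counting concrete, here is a minimal sketch of fixed-window throttling per IP address. It is written in Java only to keep the added examples on this blog in one language, and it is not rack-attack's code: the class name FixedWindowThrottle, the in-memory map (standing in for the Redis or Memcached cache) and the key format are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

final class FixedWindowThrottle {
    private final int limit;
    private final long periodSeconds;
    // Hypothetical in-memory stand-in for the cache store. Entries are never
    // expired here, whereas a real cache would drop them once the period ends.
    private final Map<String, Integer> counters = new HashMap<>();

    FixedWindowThrottle(int limit, long periodSeconds) {
        this.limit = limit;
        this.periodSeconds = periodSeconds;
    }

    /** Counts one request and returns true when the app should answer with HTTP 429. */
    synchronized boolean isThrottled(String ip, long epochSeconds) {
        // All requests that fall in the same period share one counter.
        long window = epochSeconds / periodSeconds;
        String key = window + ":" + ip;
        int count = counters.merge(key, 1, Integer::sum);
        return count > limit;
    }

    public static void main(String[] args) {
        FixedWindowThrottle perMinute = new FixedWindowThrottle(2, 60);
        long now = System.currentTimeMillis() / 1000;
        for (int i = 1; i <= 3; i++) {
            boolean throttled = perMinute.isThrottled("203.0.113.7", now);
            System.out.println("request " + i + " throttled: " + throttled);
        }
    }
}
```

One instance per interval (minute, hour, day) with its own limit would cover the different slots of time discussed above.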

We will slightly modify the default rack-attack algorithm so that legitimate web indexers are let through without delay.
Here's the new implementation of the algorithm:

Our new algorithm is basically the same as the original rack-attack one, except for the addition of these lines, which check whether the request comes from one of our allowed search crawlers:

What this block does is:

Check if the request comes from a Search Engine, identified by its user agent

If that's the case, assume the bot is genuine and verify its authenticity offline, so we do not delay the response. If it turns out to be fake, it will be blocked in subsequent requests

This algorithm will typically take just a few milliseconds.
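The "allow now, verify later" idea can be sketched as follows. This is a Java illustration of the flow only, not the Ruby code of this post: the names CrawlerGate, Decision and slowCheck are hypothetical, the two in-memory sets stand in for the cache of verified and fake bots, and the user agent test is reduced to a single substring check.

```java
import java.util.Locale;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;
import java.util.function.BiPredicate;

final class CrawlerGate {
    enum Decision { ALLOW, BLOCK, THROTTLE }

    private final Set<String> verified = ConcurrentHashMap.newKeySet();
    private final Set<String> fake = ConcurrentHashMap.newKeySet();
    private final Executor background;                  // stands in for the job queue
    private final BiPredicate<String, String> slowCheck; // (userAgent, ip) -> authentic?

    CrawlerGate(Executor background, BiPredicate<String, String> slowCheck) {
        this.background = background;
        this.slowCheck = slowCheck;
    }

    Decision decide(String userAgent, String ip) {
        if (!claimsToBeCrawler(userAgent)) {
            return Decision.THROTTLE; // ordinary visitors go through normal throttling
        }
        if (fake.contains(ip)) {
            return Decision.BLOCK; // already exposed by an earlier verification
        }
        if (!verified.contains(ip)) {
            // Hand the slow verification to the background so the response is not delayed.
            background.execute(() -> {
                if (slowCheck.test(userAgent, ip)) {
                    verified.add(ip);
                } else {
                    fake.add(ip);
                }
            });
        }
        return Decision.ALLOW; // optimistic: trust the user agent for now
    }

    static boolean claimsToBeCrawler(String userAgent) {
        return userAgent != null && userAgent.toLowerCase(Locale.ROOT).contains("googlebot");
    }
}
```

The trade-off is that a fake bot gets a few requests through before the background check finishes, in exchange for never slowing down a genuine crawler.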

Here's the Rails ActiveJob that will verify the authenticity of the bot. It can be backed by a Resque queue.

Verify Bot

Let's see a possible implementation of VerifyBot.
Methods that VerifyBot will have:

verify: given a user agent and IP, verify the authenticity of the bot

allowed_user_agent: true for the user agents from bots we will allow

fake_bot: true for bots already verified as fake

allowed_bot: true for bots already verified as authentic

VerifyBot will use Redis to cache bots that have already been verified, marking each one as either safe or fake. These two lists will be stored as Redis sets.

With these, only the implementation of the BotValidator is missing to complete the puzzle.

Bot Validator

The authenticity of popular search engine bots can be verified with a reverse-forward DNS lookup. For instance, this is what Google recommends to verify Googlebot:

Run a reverse DNS lookup on the accessing IP address

Verify that the domain name is in either googlebot.com or google.com

Run a forward DNS lookup on the domain name retrieved in step 1. Verify that it is the same as the original accessing IP address
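The three steps above can be sketched with the JDK's InetAddress class. This is a rough Java illustration under assumptions, not the validator of this post: the class name GooglebotDnsCheck and its method names are made up, and error handling is reduced to returning false.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

final class GooglebotDnsCheck {
    private static final List<String> ALLOWED_DOMAINS = Arrays.asList("googlebot.com", "google.com");

    /** Step 2: true when the host name is one of the allowed domains or a subdomain of one. */
    static boolean inAllowedDomain(String host) {
        String h = host.toLowerCase(Locale.ROOT);
        if (h.endsWith(".")) {
            h = h.substring(0, h.length() - 1); // drop a trailing root dot
        }
        for (String domain : ALLOWED_DOMAINS) {
            if (h.equals(domain) || h.endsWith("." + domain)) {
                return true;
            }
        }
        return false;
    }

    /** Runs the reverse lookup, the domain check and the forward lookup. */
    static boolean verify(String ip) {
        try {
            InetAddress address = InetAddress.getByName(ip);
            // Step 1: reverse lookup; the JDK returns the textual IP when it fails.
            String host = address.getCanonicalHostName();
            if (host.equals(address.getHostAddress()) || !inAllowedDomain(host)) {
                return false;
            }
            // Step 3: forward lookup must lead back to the original address.
            for (InetAddress forward : InetAddress.getAllByName(host)) {
                if (forward.equals(address)) {
                    return true;
                }
            }
            return false;
        } catch (UnknownHostException e) {
            return false;
        }
    }
}
```

Requiring a dot before the domain is what rejects look-alike names such as fakegooglebot.com.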

Our BotValidator will have two main methods:

allowed_user_agent: true for user agents from bots we will allow

do_validation: true if the user agent can be authenticated. It will raise an exception in the case of a fake bot

Subclasses for each bot we want to validate will implement the methods:

validates? : true if the subclass is responsible for validating the given user agent

is_valid? : true when the bot is validated for the given user agent and IP address
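The split between the two main methods and the per-bot subclasses could look roughly like this. It is a Java sketch of the design only, not the Ruby BotValidator of this post: FakeBotException, GooglebotValidator and the injected dnsCheck predicate are hypothetical names introduced for the example.

```java
import java.util.List;
import java.util.function.Predicate;

abstract class BotValidator {
    static final class FakeBotException extends RuntimeException {
        FakeBotException(String message) {
            super(message);
        }
    }

    /** True if this validator is responsible for the given user agent. */
    abstract boolean validates(String userAgent);

    /** True when the bot is authentic for the given user agent and IP address. */
    abstract boolean isValid(String userAgent, String ip);

    /** True when some registered validator handles this user agent. */
    static boolean allowedUserAgent(List<BotValidator> validators, String userAgent) {
        for (BotValidator validator : validators) {
            if (validator.validates(userAgent)) {
                return true;
            }
        }
        return false;
    }

    /** True if authenticated, false if no validator applies, exception for a fake bot. */
    static boolean doValidation(List<BotValidator> validators, String userAgent, String ip) {
        for (BotValidator validator : validators) {
            if (validator.validates(userAgent)) {
                if (validator.isValid(userAgent, ip)) {
                    return true;
                }
                throw new FakeBotException("Fake bot " + userAgent + " from " + ip);
            }
        }
        return false;
    }
}

final class GooglebotValidator extends BotValidator {
    private final Predicate<String> dnsCheck; // assumed to run the reverse-forward DNS lookup

    GooglebotValidator(Predicate<String> dnsCheck) {
        this.dnsCheck = dnsCheck;
    }

    @Override
    boolean validates(String userAgent) {
        return userAgent != null && userAgent.contains("Googlebot");
    }

    @Override
    boolean isValid(String userAgent, String ip) {
        return dnsCheck.test(ip);
    }
}
```

Supporting another search engine then means adding one more subclass to the list, without touching the two main methods.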

2017/05/28

Introduction

In the first part of this tutorial we saw how to build a skeleton Java app from scratch based on the Spring framework and implemented persistence to a MySQL database.

In this second part, we will implement a REST web service with the Spring framework.

I'll be using Maven 3 (version 3.0.5) and the Java 8 SDK. Search online for instructions on installing these in your environment.

Step 2: Implement a REST endpoint with Spring

In order to use the Spring framework as the basis for our REST endpoint, we need to add the necessary dependencies to our existing pom.xml:

We already have a model persisted to MySql and now we will add a controller with a method index that retrieves all persisted instances of our model.

We will annotate this method so that it is published as a REST endpoint when running our app within a Servlet container.

The Spring annotations added to our code are:

@RestController This declares our class as a controller returning domain objects instead of views. Spring will take care of the JSON serialization automatically via the Jackson serializer.

@RequestMapping(value="/games", method=RequestMethod.GET) This maps GET requests for the path /games to our controller method.

We can now add a test for our new REST endpoint.

In our test, instead of running our controller within an external application server, we use the Spring class MockMvc, which directs requests straight to our controller and makes our test faster.

If we now run mvn clean test:

Running our REST endpoint

We are now ready to package our app and run it.

If we run mvn clean package:

We now have a jar and we can just run it. Yes!!! That's right: we can just run it directly!
Spring has generated an uber jar: a jar with all the dependencies needed to run our app, including an embedded servlet container. The default is Tomcat, but you can easily swap it for Jetty or any other container you prefer.

If we launch the command

java -jar target/spring-boot-mysql-0.0.1-SNAPSHOT.jar

We can see on the console that Tomcat has started and is listening on port 8080 for requests!

If we now run our test with mvn clean test, we have a build failure: we have no tables in our MySql schema and dbunit cannot insert the test data.

At this point, we need to generate a DDL script for our schema.

There are a number of options. You could opt for a Spring solution.

We will apply a more generic solution from a third party which works on Spring and non-Spring frameworks: Hibernate Maven Plugin from juplo.de. This is a completely new implementation of the Hibernate Maven plugin updated to Hibernate 5.

We need to add these lines to our pom.xml:

And the file src/test/resources/hibernate.properties needed by the hibernate-maven-plugin:

Notice in the updated pom.xml:

The hibernate-maven-plugin must appear before the dbunit-maven-plugin, so that the database tables are created before the dbunit sample data is inserted.

Additionally, the file src/test/resources/hibernate.properties needs to be filtered by the standard maven resources plugin.

If we run mvn clean test, our test is finally passing after creating the database tables and populating them with unit test data:

We leave for a future part publishing a REST web service for our model and handling automatic Database migration with Liquibase.

As we will be updating the User services, we generate the full source code from the newly created signup directory:

mvn appfuse:full-source

User email verification

How can we verify a new user's email?

1. A new user signs up and fills in an email address which we want to verify as valid.

2. After the user submits their data, we'll generate a unique and difficult-to-guess token for each user who signs up. The new user won't be able to log in until they complete the email verification process.

3. We'll send an email with a URL from our app which will include this generated token. As AppFuse supports multi-language, we'll generate the email in the active locale.

4. When the new user receives the confirmation email, they will be able to visit the included URL with the unique token to say "Hey, it's me. I've received your difficult-to-guess token at the email address I gave you". We will then mark this user as confirmed.

User Signup Confirmation: Service and Model Layers

Ok. Let's add a new Java interface for our new Signup confirmation service. The service will be responsible for starting a user's data confirmation process and confirming the user's data. We will apply it to email verification, but it could be applied to mobile phone number verification as well.

Hold on. What is this WebAppContext type? We'll need to include our web app URL in the generated email. As we'll implement the confirmation at the service level, we want to avoid adding an unnecessary dependency on servlet classes. After all, we're working at the service layer.
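As an illustration of the idea, a servlet-free abstraction could be as small as the sketch below. Everything in it is an assumption: the single getBaseUrl() method, the ConfirmUrls helper, and the /signupConfirm path and token parameter are hypothetical and only show how the service layer can build the confirmation URL without touching servlet classes.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical shape: the web layer implements this and passes it to the service.
interface WebAppContext {
    String getBaseUrl();
}

final class ConfirmUrls {
    /** Builds the URL to include in the confirmation email. */
    static String confirmUrl(WebAppContext context, String token) {
        String base = context.getBaseUrl();
        if (base.endsWith("/")) {
            base = base.substring(0, base.length() - 1); // avoid a double slash
        }
        try {
            // The path and parameter name are made up for this example.
            return base + "/signupConfirm?token=" + URLEncoder.encode(token, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }
}
```

In the web layer, the implementation can read the scheme, host and context path from the current request, while tests can supply a fixed URL.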

Implementing our new Service

We can now write an implementation for our new SignupConfirmService to verify a new user's email address.
Our implementation will use AppFuse MailEngine service to send email, a ResourceBundle for mail subject i18n, and the Java SecureRandom class to generate a unique and difficult to guess token for each new user.
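A token of this kind can be generated in a few lines with SecureRandom. This is a minimal sketch under assumptions, not necessarily the code used in the service: the class name ConfirmTokens, the 32-byte length and the URL-safe Base64 encoding are choices made for the example.

```java
import java.security.SecureRandom;
import java.util.Base64;

final class ConfirmTokens {
    private static final SecureRandom RANDOM = new SecureRandom();

    /** Returns a random token that can be placed in a URL without escaping. */
    static String newToken() {
        byte[] bytes = new byte[32]; // 256 bits of randomness, an arbitrary but generous size
        RANDOM.nextBytes(bytes);
        return Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
    }

    public static void main(String[] args) {
        System.out.println(newToken());
    }
}
```

Using SecureRandom rather than java.util.Random matters here, because the token is the only proof that the visitor received the email.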

Our implementation fails in the new UserManagerImpl signup() method, when saveUser() is called a second time to save the fields updated by the SignupConfirmationService implementation. The saveUser() method eventually calls UserDao's getUserPassword() to check whether the password needs to be re-encrypted. But getUserPassword() is annotated not to support transaction propagation, which makes it fail because our new user is created within a transaction that has not finished yet:

The second saveUser() call in our new signup method clearly does not need to re-encrypt the user password. We can fix it with a private method that just saves the user instance without checking the password encoding, plus a bit of refactoring to avoid duplicated code:

We can now write a new test for our SignupConfirmService class. We'll use Wiser to mock an SMTP server, and verify that startConfirm() sets a value on the user's emailConfirmToken, that an email is sent and that the email's body contains the generated token:

It will initially fail as we need to add resources for email subject i18n and the Velocity templates. In MailResources.properties:

email.signup.subject=Confirm your email Address

In signupConfirm.vm Velocity template:

Hello ${userFirstName},
To enable your account, please click in the following link or copy it onto the address bar of your favourite browser.
${signupConfirmURL}

After that, the test passes.

User Signup Confirmation: Web Layer

After implementing the service layer, we can now implement the web layer.

When a new user clicks on the sign up link, a page with a form is shown and the user enters their info.

After completion of the form, the user will press the signup button and, if configured to confirm email before the account is enabled, a page will be displayed informing the user that an email has been sent to their email address and they need to confirm their account by following email instructions.

In AppFuse with Struts2, the SignupAction class implements the Struts2 action for Signup. The save() method currently saves the new signed up user, sends them an email of signup welcome and logs the new user in.

We will change the implementation to call the new UserManager.signup() method, eliminate the welcome email and only log in the new user if the app is NOT configured to confirm email before the account is enabled.

2012/11/30

Introduction

In an agile development scenario, new versions are released and deployed frequently, and your database schema changes continuously.

To deal with these database changes, a mechanism should be in place. In Ruby on Rails, you have it out-of-the-box and it works great. But in Java web apps, you have to find a solution and plug it into your own projects.

We will implement an automatic database update mechanism for Java web apps trying to meet the following goals:

Automatic: can be integrated for automatic updates as a Spring bean or as a servlet listener

Our App Before Liquibase

If we use Hibernate and the hibernate3-maven-plugin, our database schema is automatically kept up-to-date during development: hibernate3-maven-plugin extracts schema info from the JPA annotations and the Hibernate configuration.

We will use a project based on the AppFuse framework, with flavours for Spring MVC, Struts2, Tapestry, JSF, but this is applicable to any java web app based on any framework.

From the AppFuse quickstart page, I copy the maven command to generate an initial Spring MVC app from AppFuse archetypes:

By inspecting the project's pom.xml, we can see the database schema is kept up-to-date during development by generating drop and create DDL commands, extracted from the JPA annotations in our model classes.

This is fine during development, because we do not need to worry about database migrations yet. By running mvn process-test-resources or mvn jetty:run, we can see that the tables are dropped and recreated each time we run our web app with Jetty:

Managing db updates in our project

Liquibase at run-time: will automatically update the server schema as needed on deployment, including generation of the first database version for an empty schema

Liquibase at build-time

We will be evolving our app and when we have something to commit and push to our project's global repo, we can then generate the database migrations, if any.

We will add to our maven project the liquibase plugin and needed executions in order to:

generate database diff changelogs at any time when we want to consolidate our model updates

generate database production data dumps as changelogs at any time to consolidate app preloaded db data

exercise liquibase db migrations at any time for rapid testing (jetty:run)

Liquibase at runtime

The first time we deploy the app, it will contain the initial db changelog for an empty schema. Liquibase will generate all the db tables and populate with initial database data (default users, user roles, any lookup tables...). On subsequent deployments, our app will contain an additional db changelog to bring the server database schema up-to-date.

Integrating the db update at app startup

Liquibase can perform automatic db update at runtime by looking at the registered change sets in a changelog file and checking if they are applied against a table in our schema called DATABASECHANGELOG. It will create it automatically if it does not exist.
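Conceptually, the startup check boils down to a set difference between the change sets registered in the changelog and those recorded as applied. The sketch below is a deliberately simplified model, not Liquibase's code: the class name PendingChangeSets is made up, and each change set is reduced to a single string, whereas Liquibase tracks more attributes per change set.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class PendingChangeSets {
    /**
     * Returns the registered change sets that are not recorded as applied,
     * in the order in which they appear in the changelog.
     */
    static List<String> pending(List<String> registered, Set<String> applied) {
        List<String> result = new ArrayList<>();
        for (String changeSet : registered) {
            if (!applied.contains(changeSet)) {
                result.add(changeSet);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical ids: what the changelog file registers ...
        List<String> registered = Arrays.asList("20121130_100000", "20121205_173000");
        // ... and what the DATABASECHANGELOG table would list as applied.
        Set<String> applied = new HashSet<>(Arrays.asList("20121130_100000"));
        System.out.println(pending(registered, applied));
    }
}
```

On an empty schema the applied set is empty, so every registered change set is pending, which is how the first database version gets generated.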

We can implement the automatic db update either with a Spring bean or with a Servlet listener.

Working on our app during development: evolving our model

During development, we will probably be making many changes to our app. We do not want to spend time on db migrations for now. We just evolve our model, annotate it with JPA and, when doing rapid testing with jetty:run, generate the db from scratch.

Liquibase maintains a table with applied change sets to our database. Based on the contents of that table, on startup our app will run liquibase to apply any missing migration to our db.

Generating preloaded db data if any

Many web apps will have a set of data preloaded in the db: initial set of internal user accounts, available user roles, list of applicable taxes, ... whatever.

These can also be defined as changesets so that Liquibase can update the database for us when running our app.

To generate data changesets, the Liquibase Maven plugin won't be of much help, as it does not include a goal for this. Instead, since the plugin bundles it, we'll call the Liquibase main Java class directly, as if we were using it from the command line. We'll do it with the exec-maven-plugin in a db-data Maven profile so that we can generate the preloaded db data at any time:

In AppFuse, the profile prod feeds the database with production data instead of test data. We use this profile to regenerate the db and populate it with production data.
After this, we can use the db-data profile to generate our changelog for the initial db data.

Exercising the automatic db migration during rapid testing

Our app is now ready to perform all registered db migrations when deployed on a server. However, it would also be nice to exercise this during development, when we launch jetty:run for rapid testing of our unpackaged app.

You had better erase these, or you will run into trouble if you set a different schema name in your JDBC configuration file.

For a full list of the maven liquibase plugin goals and params, you can run this command:

mvn liquibase:help

It is up-to-date, as opposed to the liquibase site documentation.

Liquibase validates db change sets by comparing some attributes of the change sets present in the changelog file against those registered in the DATABASECHANGELOG table as applied change sets. It compares:

the full path filename that contains the change set

the MD5 checksum of the change sets

If any of these is different, it will try to re-apply the changeset. If you want to avoid this, you can clear the corresponding fields in the db table and liquibase will refill them with the actual values.
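The comparison can be pictured with the sketch below. It is only an approximation for illustration: the class and method names are hypothetical, and a plain MD5 of the change set text is used, which is not necessarily how Liquibase computes the checksum it stores.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class ChangeSetFingerprint {
    /** Hex-encoded MD5 of the given text. */
    static String md5Hex(String text) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is required on every Java platform", e);
        }
    }

    /** True when both the recorded path and the recorded checksum still match. */
    static boolean matchesRecord(String recordedPath, String recordedMd5,
                                 String currentPath, String currentText) {
        return recordedPath.equals(currentPath) && recordedMd5.equals(md5Hex(currentText));
    }
}
```

Moving the changelog file or editing an already applied change set makes matchesRecord return false in this model, which corresponds to the two mismatches described above.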

I also like to consolidate a set of related changesets from a changelog file into a single changeset. Instead of having many changesets, each one creating a table, an index, a referential integrity constraint, etc., I tend to group many of these updates into a single changeset.

Liquibase autogenerates a numeric id to identify each changeset. I prefer to assign it a timestamp, as it gives more info and they still appear ordered.
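A timestamp id of this kind is easy to produce. The following is a small sketch, assuming a yyyyMMdd_HHmmss format and a hypothetical ChangeSetIds helper; any fixed-width format with the most significant field first would sort the same way.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

final class ChangeSetIds {
    // Fixed-width fields, most significant first, so text order equals time order.
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyyMMdd_HHmmss");

    static String timestampId(LocalDateTime when) {
        return when.format(FORMAT);
    }

    public static void main(String[] args) {
        System.out.println(timestampId(LocalDateTime.now()));
    }
}
```

Because every field is zero-padded, sorting the ids as plain strings keeps the changesets in chronological order.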

2012/11/07

I have been working as a developer for more than 20 years. I enjoy software development.

In these years I have worked with a variety of technologies: C, C++, ObjectStore and Poet at the beginning of my professional career. Later on, Java and the incipient servlets, Oracle, MySql. Then Struts, Hibernate, Lucene ... And lately, I am working with Java, Spring, Spring Security, Hibernate Search, Struts 2, Apache CXF, Bootstrap and jQuery, to name a few.

The same goes for engineering practices: from the waterfall lifecycle to spiral to agile.