Why Audit Logging?

Audit logging in the database may be one of those things that you have to experience to appreciate. Some of our clients have audit logging set up and others do not. When you’ve never experienced it it might not seem like a big deal. After you have it in place though and start to use it you wonder how you could ever live without it. By audit logging I mean having a snapshot in the database made every time a record is added, modified, or removed, along with when the change happened and (ideally) who made the change and why. The changes are typically stored in additional database tables, one for each regular table in the database. Each logging table records a snapshot of each record in the table it monitors, every time it changes.

Troubleshooting/debugging: I can’t tell you how many times I’ve used audit logging to troubleshoot a bug report. Why is this product’s price suddenly $5 less than it used to be? Well, user X lowered it on April 22 last year at 9am. When did we change this discount from 20% off to 25% off? User Y did that, exactly 9 months and 21 days ago. So many answers to so many questions are right there in the logging tables.

Usage metrics: It can be incredibly valuable to understand how an application is being used. Regular application logs can tell you this, but only to a point. Audit logs tell you not only what was done but what the results were. For example, SoftSlate administrators have the ability to manipulate orders after they are placed by customers, including adding items (say, if a customer service rep is upselling an item on the phone to the customer). You might need to know how often these tools are used and what the impact of the changes is. You can look through the Request Log to find out how many times a request was made to say, add an item. By querying the audit logs though, you can also know which items were most commonly added, what the financial impact was, and which CSRs were responsible for the upsells. Again, there is a fantastic depth to the information at hand in the audit logs. In this case for example, you could calculate commissions for your reps quite easily to encourage them to do more upselling.

Compliance: Anyone who’s had the pleasure (or lack thereof) of going through SOX auditing will appreciate how invaluable audit logs can be. SOX is concerned with having a firm handle on financial information. For an ecommerce application, it might require some of the following:

A report on changes to an order (e.g. before and after values, who changed it, when it was changed, and why it was changed, etc.). Items added and removed, shipping methods changed, and discounts added or removed.

A report showing all pricing or discount changes for a specific period, including product price changes, and discounts created or modified.

A report showing when administrative users were activated or deactivated, and when their permissions or roles changed.

The above is just a small sampling of the kinds of reports you might need to come up with. With audit logging in place, each of these reports becomes a matter of constructing an SQL query. Really, in most cases, one query and you’re done. Without audit logging in place, you’d have to make code changes in the application to record exactly what you need – for each requirement. It could keep someone busy for a long time.

Why Envers?

I have only one other technical approach to compare Envers to in my experience: database triggers that write to logging tables on every insert, update, and delete. We implemented this for clients in MySQL and it worked quite well for us for quite some time. If you want to see an example, here is the rough idea (although we implemented it differently).

People frown on database triggers, but I think you can make a decent argument for them in this particular case. The triggers aren’t implementing any business logic. They are very low-level and dumb: they just record the fact that you’ve changed this, that, or the other thing. And one huge advantage is that you can be sure the trigger will catch everything, including bulk statements. As far as I can tell Envers cannot record changes made to the database when using a bulk insert, update, or delete statement, where more than one entity is affected. Depending on the circumstance, this issue could be very important to you. Another advantage of database triggers over Envers is the fact that of course Envers doesn’t track changes made outside the context of Hibernate. Any changes made directly in the database, or by some other application, won’t be tracked. If this situation applies to you, you might want to consider triggers.

You have to hand it to Envers: random issues aside, it is easy to implement. Just add the .jar file to your project, create the logging tables, and it will start tracking. The annotations that tell it which properties to audit and which to skip, along with its other configuration options, cover most purposes. The ease of implementation, the fact that it runs inside your application itself, and the configurability are its big pluses, I feel.

All that said, here are some of the issues we encountered. (By the way, Envers is available starting with version 3.3.5 of SoftSlate Commerce.)

Envers Is Turned On By Default, But We Wanted It Turned Off By Default

If you add the Envers jar file to your application, it will automatically be enabled. What this means is that it will register event listeners with Hibernate. Event listeners respond to every insert, update, and delete made against entities (but not bulk inserts, updates, or deletes). If you inspect the SessionFactory in a debugger and drill down into the registered listeners, you’ll find them.

One of the first questions I had as I added the hibernate-envers.jar was: what is the upgrade path for our existing users? And, more to the point, how can we turn auditing on and off? Surely not everyone is going to want all these extra tables filling up all over the place. If you are upgrading SoftSlate, as opposed to installing from scratch, the feature should be off by default until you decide to turn it on. However, as soon as Envers sees those @Audited annotations, it is going to start trying to audit. And if the logging tables don’t exist, it’s going to throw a nasty exception.

If the tables do exist, well, it’s going to start filling them up, which you might not want either. So, how to turn this thing on and off? I finally found a reference to this simple property that I can add to hibernate.properties:

hibernate.listeners.envers.autoRegister=false

Sure enough, that did the trick. With that in place, Envers does nothing even if your classes are mapped with @Audited. This is what I needed, but I faced another issue with SoftSlate: we don’t distribute hibernate.properties with upgrades, only with new installations. The reason is simply that we don’t want to overwrite that file, because it has your specific database connection settings in it. So folks who are upgrading would have to manually add the above configuration to their hibernate.properties file before upgrading, and we want to avoid making them do that. I realized I wanted to reverse Envers’ default behavior: instead of registering Envers by default, I want it to start up with Envers disabled by default. I want folks to deliberately enable it by adding this to their own hibernate.properties file:

hibernate.listeners.envers.autoRegister=true

This was relatively easy to do at the moment the SessionFactory is created and right after the properties are loaded from hibernate.properties:

We are adding the property to the Hibernate configuration if it doesn’t already exist, and setting it to false, in effect reversing Envers’ enabled-by-default behavior. Now existing users can upgrade with no worries about anything breaking. If they then wish to leverage Envers, they can manually add hibernate.listeners.envers.autoRegister=true to their properties file.
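The gist of that startup tweak can be sketched in plain Java (the class and method names here are hypothetical; SoftSlate’s actual code hooks into its SessionFactory bootstrap):

```java
import java.util.Properties;

public class EnversDefaults {

    static final String AUTO_REGISTER = "hibernate.listeners.envers.autoRegister";

    // If the user hasn't set the property in hibernate.properties,
    // default it to false so Envers stays off until deliberately enabled.
    static Properties applyEnversDefault(Properties props) {
        if (props.getProperty(AUTO_REGISTER) == null) {
            props.setProperty(AUTO_REGISTER, "false");
        }
        return props;
    }

    public static void main(String[] args) {
        Properties props = new Properties(); // as if just loaded from hibernate.properties
        applyEnversDefault(props);
        System.out.println(props.getProperty(AUTO_REGISTER)); // prints "false"
    }
}
```

A user who has explicitly set the property keeps their value; only the absent case is defaulted.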

Tracking Which Entities (i.e., Tables) Were Changed in Each Revision

Envers has a few bells and whistles, and after playing around with them, I realized we wanted all of the bells and all of the whistles enabled, because it’s all good stuff.

The use case here is say you are auditing 50 tables. Now you have a “revision” and you want to look at what changed across the entire database in that revision. For example, you might find that something weird happened in sscOrder in revision 422, and you might want to look at everything that happened in all the other tables in the same revision. (By the way, a “revision” essentially means a snapshot of the state of the database after each transaction is committed.) This could prove very annoying when you have 50 tables. You have to ask all 50 audit tables if they have something for that revision:

SELECT * FROM sscAudAdministrator WHERE REV=422;
SELECT * FROM sscAudAdministratorRole WHERE REV=422;
SELECT * FROM sscAudAttribute WHERE REV=422;
...

Fifty SELECT statements. Envers’ REVCHANGES table gives you a shortcut: it tells you which of the 50 classes (i.e., tables) actually changed in the revision, so you know which audit tables to look in.
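With entity-change tracking enabled (via org.hibernate.envers.track_entities_changed_in_revision=true in hibernate.properties), one query replaces all 50, because Envers records each changed entity name in the REVCHANGES table:

```sql
-- Which entities (and therefore which audit tables) changed in revision 422?
SELECT ENTITYNAME FROM REVCHANGES WHERE REV = 422;
```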

Tracking When Records Are Deleted

I definitely recommend adding the following property:

org.hibernate.envers.store_data_at_delete=true

Otherwise, you won’t really be able to tell what was deleted. Yes, it will record the primary key of the deleted record, but that is not very useful once the record is gone. This property tells Envers to store the data of the final revision of the entity, right before it is deleted. It could very well come in handy.

Deleting When Auditing Mapped Collections without all-delete-orphan

On the topic of deleting records, I was having a hard time with the following exception when DELETING my Category entity:

I had no idea what was going on, so I started adding @NotAudited to the collection properties in my Category entity. First I stopped auditing all of the collection properties, and sure enough that worked and I could delete again. Then I turned auditing back on for each one in turn, and I narrowed it down to my Category.products and Category.subcategoryCollection properties. The thing these two collections have in common, which the other collection mappings do not, is that their cascade attribute is not all-delete-orphan:

Just to verify I changed the two mappings to all-delete-orphan and sure enough, the deletes work fine. I’m still unclear as to why Envers doesn’t like these mappings – seems like they should be perfectly valid. I really don’t want to cascade deletes here – I don’t want to delete the subcategories under a parent category when the parent category is deleted. Might be something else I’m doing wrong. In the meantime, I’m leaving @NotAudited in there for those two.

Which got me thinking, is it really that interesting to audit collections? Particularly if the child is being audited itself, do I really need to know that a given category’s set of products changed? I think it might be enough to know, from the product’s perspective for example, that its primary category changed. I’m pondering adding @NotAudited to all the collection properties everywhere. I’m also wishing Envers had a configuration that lets me do this globally.

Tracking Which Fields Were Changed

org.hibernate.envers.global_with_modified_flag=true

Without this set to true, the logging tables simply record a snapshot of the fields being audited each time a change is made. It’s up to you to compare each audit log record with the previous one for the same entity, if you want to know what actually changed. Setting org.hibernate.envers.global_with_modified_flag=true tracks not only the values of each audited field, but also whether it changed from the previous version. It seems useful, although it does add twice as many columns to the audit tables. For each audited field, a corresponding boolean field is added, which simply flags whether or not the corresponding field changed during the revision. For example, here you can see that in REV 13 the “name” field changed, by seeing that name_MOD=1:

During queries it can be a great help to limit the records to those where just the field you care about changed. Otherwise, you’d have to pull out both the revision in question and the previous revision, and compare the two. Not impossible, just a bit harder. (Please note that in the above query, I’ve had to CAST the bits to ints for them to show up correctly in the mysql terminal application.)

The Modified Flags and Updating Detached Entities

If you are paying attention, you’ll notice there’s a problem in that we have a lot of false positives in the above query. In revision 12, both code_MOD and name_MOD are true, but as you can clearly see, neither the code nor the name actually changed. This threw me for a loop for a while, especially since I didn’t get these false positives in my audit table for Products:

Turns out there was a subtle difference in how I was updating Roles vs. Products. In the code, when updating Roles, I ran session.update against a detached Role object. In this scenario, Envers sees all the fields as being modified (possibly because there is no existing entity bean in the Hibernate session that has become “dirty”). By contrast, in the case of Products, our code first loads the Product entity into the Hibernate session and adjusts its properties before calling session.update. I fixed the issue with Roles by switching it to the same pattern of loading the entity first, and sure enough the _MOD fields have no more false positives.

As part of our upcoming release, we have changed the behavior for all our entities to load them first so as to make the _MOD fields accurate. (As of this writing however, you may still see this problem for Orders.) While I think I understand why this is happening, it seems like a bug to me with Envers. You should be able to run session.update against a detached object, have it update, and Envers should be able to track which fields were changed and which were not. (Or, make it clear that it can’t.)

Tracking When a Revision Was Replaced

These two properties can be very valuable: when they are enabled, Envers stores both the revision in which a change took place and the revision in which the record was next modified. It also stores the timestamp of when the revision was replaced, which can be used to purge the audit tables. To be precise, they add the REVEND and REVEND_TSTMP columns to each of the audit tables:
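For reference, the two settings in question are Envers' standard validity-audit properties (double-check the names against your Envers version's documentation):

```
org.hibernate.envers.audit_strategy=org.hibernate.envers.strategy.ValidityAuditStrategy
org.hibernate.envers.audit_strategy_validity_store_revend_timestamp=true
```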

Now, to find the price the product most recently changed from, you can use a simple join on REVEND = REV to grab the previous revision: REVEND identifies the subsequent revision that replaced the log entry.

Another advantage of REVEND and REVEND_TSTMP specifically is they help if you need to purge old audit logs from time to time. It should be obvious that if the application is very busy, the audit tables will get large over time. Storing the timestamp of when a given revision was replaced makes it easier to develop a purging mechanism. You can now write a query like this:

DELETE FROM sscAudProduct WHERE REVEND_TSTMP < '2016-01-01';

And presto, you have purged the audit records prior to a given date. This just gives you some flexibility in terms of archiving and managing the audit records.

One issue with enabling these properties: they don't work if you are adding Envers to a database with existing tables, unless you first seed the logging tables with the data from those tables. When you edit a record that predates the audit logs, the following exception is thrown because Envers can't find the previous revision for the record (and thus can't update its REVEND field):

I understand the rationale for throwing an exception in this case. If you are relying on these audit logs to be an airtight representation of the changes made to the system, you'll want to throw an exception and terminate the transaction if there is any sort of problem with the audit table updates. Can't update the audit logs? Then NO TRANSACTION FOR YOU.

However, I can also imagine many situations where throwing an exception here is overkill. The audit logs can still be quite useful, if not 100% airtight, when the system occasionally fails to update REVEND. So for our implementation, we replaced the built-in ValidityAuditStrategy that handles the REVEND updates with our own version. All our version does differently is swallow the exceptions and log them. This allows the transaction to complete. Yes, REVEND fails to update, but we find we can live with that in most cases. Fortunately Envers makes it easy to override the default behavior here. We simply modified the audit_strategy property in hibernate.properties to use our own class:
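The entry looks something like this (the package name here is illustrative; use the fully-qualified name of your overriding class):

```
org.hibernate.envers.audit_strategy=com.softslate.commerce.businessobjects.envers.AppValidityAuditStrategy
```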

Another advantage of this approach is that you don't have to run all INSERTs through Hibernate. Because we are swallowing the exceptions, we can insert records directly into the database without creating an initial audit log entry.

None of this is necessary if you seed all the audit tables with existing data before turning Envers on; in that case you can use the default ValidityAuditStrategy. Which leads us to the next section:

Seeding the Logging Tables with Existing Data

As mentioned above, you can't track when revisions are replaced in Envers unless (a) you are starting completely from scratch with your database, (b) you first seed the logging tables from the data in your existing tables, or (c) you override Envers' default ValidityAuditStrategy with something like SoftSlate's AppValidityAuditStrategy, which swallows the exceptions. Fortunately, if you need to go the seeding route, it is not too hard. If you are upgrading SoftSlate, for example, before turning Envers on you can run the following SQL file from the upgrade distribution: /WEB-INF/src/resources/envers/seed_audit_tables.sql

Note that we have to insert our first revision record into sscREVINFO, with an id of 1 and a timestamp (this one corresponds to 2010 to avoid any overlap with new records!). From there, it is a simple matter of inserting one record in the logging table for each existing record in the table being monitored, to provide a baseline revision for the database.
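The pattern in that seed file boils down to something like the following (column lists abbreviated for illustration; REVTYPE 0 is Envers' code for an insert, and the timestamp is in milliseconds since the epoch):

```sql
-- One baseline revision, dated January 1, 2010:
INSERT INTO sscREVINFO (id, timestamp) VALUES (1, 1262304000000);

-- One baseline audit row per existing record, all pointing at revision 1:
INSERT INTO sscAudProduct (productID, REV, REVTYPE, name, unitPrice)
SELECT productID, 1, 0, name, unitPrice FROM sscProduct;
```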

Obviously, this will come close to doubling the size of your database quite quickly! So try it out in a development environment first and give some consideration to which tables and fields you need to audit. By default, SoftSlate audits most tables (excluding sscRequestLog, sscEmail, and the Quartz tables) and most fields. Also, as noted above, if you are upgrading you must deliberately add this to your hibernate.properties file for auditing to be enabled:

hibernate.listeners.envers.autoRegister=true

As a summary of the above sections, here's what our default Envers configuration settings look like in hibernate.properties:
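Collecting the settings discussed above, the configuration looks roughly like this (illustrative; the distributed defaults may differ slightly):

```
# Off by default for upgrades; set to true to enable auditing:
hibernate.listeners.envers.autoRegister=true
org.hibernate.envers.store_data_at_delete=true
org.hibernate.envers.global_with_modified_flag=true
org.hibernate.envers.track_entities_changed_in_revision=true
org.hibernate.envers.audit_strategy=org.hibernate.envers.strategy.ValidityAuditStrategy
org.hibernate.envers.audit_strategy_validity_store_revend_timestamp=true
```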

Reports from Audit Data

Now for the good stuff - getting some sweet reports out of the auditing tables. I'll give a few examples of reports represented by just a query made directly against the database. As of this writing we're looking into making a formal interface for some of these queries in the Administrator.

Example Report: Financial History of an Order

This first report sort of tells the financial "story" of an order after it was placed:

If you look at the financial fields on the far right of the query output, you'll get a history of the order totals for this order - when the totals went up or down. Plus, with the administratorUserName and servletPath columns, you can understand both who made the changes and what requests were sent to the server to make them.

Let's explain the query a bit:

lastRevInRequest. You’ll notice there is a fun derived table in this query. Don’t fear it! This is simply filtering out the revisions in the audit table to just those that correspond to the last time the record changed in a given application request. Each time SoftSlate takes in a request, it writes a record to the sscRequestLog table and assigns a “requestLogID”. But during the course of a given request, a given record might change multiple times. For example, when adding an item to the cart, the sscOrder record for the cart will be updated once after the item is added and again after discounts are applied. You’ll end up with multiple REVs in sscAudOrder for the same request. Most of the time, we’re not concerned about these intermediate revisions. We’re really only concerned about the last revision for each request – what the request turned the record into when it was done doing its work. The derived table simply filters out all these intermediate revisions.
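The derived table's shape is roughly the following (illustrative; SoftSlate's actual query differs):

```sql
-- For each request, keep only the last (highest) revision that touched the order:
SELECT ri.requestLogID, MAX(a.REV) AS lastRev
FROM sscAudOrder a
JOIN sscREVINFO ri ON a.REV = ri.id
WHERE a.orderID = 1001   -- hypothetical order ID
GROUP BY ri.requestLogID
```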

sscRequestLog and sscREVINFO.requestLogID. SoftSlate extends Envers’ revision entity so we can store the requestLogID of each request in the sscREVINFO table. This allows us to join it with sscRequestLog, where a bunch of other useful information lives, such as the user name of the person making the request, the request URL, the timestamp of the request (although that is already in sscREVINFO), the user’s IP address, and more. (It should be said that request logging is only available with the Enterprise Edition.)

Example Report: The Last Change Made to a Product

Next up, let’s focus on products. Our goal is to find out when someone last changed a given product’s price. We’ll use the REVEND column to quickly pull out the previous audit record for the product:

from sscAudProduct c join sscAudProduct p on c.REV=p.REVEND and c.productID = p.productID
This joins sscAudProduct with itself, pulling both the current state of the record and the previous state. This is easy to do because of REVEND: each time the product changes, the previous revision’s REVEND is populated with the REV of the new revision. It allows for a quick way to compare two consecutive changes.

join sscREVINFO ri on c.REV = ri.id JOIN sscRequestLog rl on ri.requestLogID = rl.requestLogID
Here we’re joining sscREVINFO and from there sscRequestLog, in order to find out who did it (rl.administratorUserName). Note: you have to have request logging turned on for sscREVINFO.requestLogID to be populated (available with the Enterprise Edition).

where c.REVEND IS NULL
This identifies the current revision. REVEND starts out as null. It is only populated when the record changes next. So for this product, REVEND is null for the current values in the table that is being audited.
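Assembled, the whole query might look something like this (the price column name is illustrative):

```sql
SELECT p.unitPrice AS previousPrice,
       c.unitPrice AS currentPrice,
       rl.administratorUserName,
       ri.timestamp
FROM sscAudProduct c
JOIN sscAudProduct p ON c.REV = p.REVEND AND c.productID = p.productID
JOIN sscREVINFO ri ON c.REV = ri.id
JOIN sscRequestLog rl ON ri.requestLogID = rl.requestLogID
WHERE c.REVEND IS NULL
  AND c.productID = 42;  -- hypothetical product ID
```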

I thought I would post the following handy little scripts, which have proved invaluable to us.

Nightly Database Backups

The first script simply makes a backup of a SoftSlate Commerce database. We run this guy nightly for all of our clients. Most likely you already have a backup system in place, but many times it may be hard to get at a snapshot of data from a particular day. Occasionally it can be very valuable to load up a backup in order to perform auditing, or even to restore specific pieces of data that were changed by mistake (e.g., the pricing or descriptions of products that were changed when they shouldn’t have been). The backups we make have come in handy many times for these sorts of reasons – essentially to bail out our clients from a mistake they’ve made. First, for MySQL:
#!/bin/sh
dir=/var/local/databaseBackups
username=user
databaseName=database
password=pass
cd $dir
mydate=$(date +%Y%m%d_%H%M%S)
mysqldump -u $username -p$password --skip-lock-tables --single-transaction --quick $databaseName > $dir/$databaseName.$mydate.dump.sql
gzip $dir/$databaseName.$mydate.dump.sql
chmod 400 *.gz
find $dir/$databaseName.*.dump.sql.gz -mtime +30 -exec rm -f {} \;

As you can see, the script purges any backups older than 30 days (which you can adjust by tweaking it). You can put it in your server’s crontab to keep a running set of backups covering the last x days of data.

Restoring a Piece of Data from a Backup

When it comes time to restore one of these backups, you can create a new, separate database and load the backup file into it – using MySQL here for example:
mysql> create database backup_20130816;

Gunzip the backup and load it into the new database:
gunzip softslate.20130816.dump.sql.gz
mysql -u user -p backup_20130816 < softslate.20130816.dump.sql

Then, as an example of how you can restore specific pieces of data, try using an update statement across multiple databases. Using MySQL again as an example, where we want to restore all products' prices:
mysql> update softslate.sscProduct live join backup_20130816.sscProduct backup on live.productID = backup.productID set live.unitPrice = backup.unitPrice;

Pretty sweet, if you need to go backwards in time for a certain piece of data!

Clearing Personal Information from a SoftSlate Commerce Database

This next recipe clears out all personal information about customers from a SoftSlate database. We have run it countless times to prep data for our development or staging environments. Every once in a while, you need to refresh your dev environment with data from the live store, so you can get the latest products and other settings. You might also want to refresh the customer and order tables so you can work with a realistic set of data if you need to develop code against the orders and customers. But you don't want it to be too realistic - any personal information about the customers should be cleaned out, under the principle that only those who need to know the info should have access to it.

The process we use is to make a backup of the database and load it into a new, separate database.
mysqldump -u user -p --skip-lock-tables --single-transaction --quick softslate > softslate.dump.sql
mysql> create database for_staging;
mysql -u user -p for_staging < softslate.dump.sql
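The clean-up itself is a batch of UPDATE statements run against the copy. A couple of illustrative examples (the real script covers every table holding personal data, and these table and column names are guesses):

```sql
UPDATE for_staging.sscCustomer
SET email = CONCAT('customer', customerID, '@example.com'),
    password = 'scrubbed';

UPDATE for_staging.sscOrderDelivery
SET firstName = 'Test', lastName = 'Customer',
    address1 = '123 Main St', phone = '555-0100';
```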

Finally, make a backup of the new, cleaned-up database, and use *that* in your development or staging environment:
mysqldump -u user -p --skip-lock-tables --single-transaction --quick for_staging > for_staging.dump.sql

I hope these come in handy, and I'd be curious to know if anyone has similar (or better) scripts for these tasks.

We spend a lot of time here working for clients, helping them to customize their SoftSlate Commerce stores. That often means extending the built-in SoftSlate domain model (i.e., adding new fields to the database and the programming code) to keep track of or support some special business logic.

This is why we are very excited by the SoftSlate Extender tool. The SoftSlate Extender automatically generates the necessary code to add new fields to existing SoftSlate Commerce entities, or even new entities altogether. This includes all the code required to view and manipulate the fields and entities in the SoftSlate Administrator. (If you’re familiar with Grails think “grails generate-all”.) In just a couple of steps, you can add your new field to the existing application, and view and update it within the Administrator application alongside all the core fields. Where before this process could take a few hours of manually creating or updating files, with the SoftSlate Extender it can take just minutes. Joy!

The SoftSlate Extender is available starting with version 3.3.0. We’re calling it “beta” for now because it currently only supports generating SQL for the new database objects in MySQL format, and also there are a couple minor limitations when it comes to extending an existing entity (though it is certainly still worth using for that purpose).

The Extender tool lives entirely within the following package: com.softslate.commerce.administrator.extender, or, on the file system, in the /WEB-INF/src/com/softslate/commerce/administrator/extender directory. Inside there, you’ll find a lot of .vm files, which are Velocity templates. The Extender uses Velocity templates to represent the boilerplate Java classes, JSP templates, and other files it needs to generate. You’ll also find a README file, but I’ll go over everything the README discusses here as well.

Operating the extender boils down to this:

Modify extender-custom.properties with the fields/entities you want to add to the system, and with the settings you want to use for the Extender.

Run the ExtenderCommandLine class as a Java application (e.g., in Eclipse, right-click on ExtenderCommandLine.java and select Run As … Java Application). If you’re using Eclipse, refresh the project to view the new/modified files.

Make any tweaks you need to the output (e.g., adding a database index for the field).

Run the CREATE TABLE or ALTER TABLE statements output by the Extender, which will be placed in .sql files for you under the businessobjects package.

Test the new entity or fields in the Administrator.

Take an early lunch!

Here’s what a typical configuration might look like in the configuration file, extender-custom.properties:

entity=The fully-qualified class name of the Bean you wish to create or extend.

foreignKeys=A comma-separated list of the fully-qualified class names of other Beans the entity needs to reference as foreign keys. ID fields referencing each foreign key will be generated in the new entity (or added to the existing entity).

fields=A comma-separated list of all the fields for the new entity, or the new fields being added to an existing entity. Each field is composed of the name of the field, followed by a pipe character (|), followed by the data type for the field (String, Integer, Double, or Boolean).

createAdminInterface=Set to ‘false’ to skip generating Struts configurations, Struts Forms, Struts Actions, and JSPs for the Administrator interface for the new entity or fields.

updateExistingEntity=If you are adding a new field to an existing entity, you should set this to true. If you are creating a brand new entity, set it to false. If it’s set to true, rather than create all the files for a brand new entity, it will just create the files needed to add the new field or fields. If you set overwriteExistingFiles to true, it will also attempt to merge just the new code for the new field(s) into the existing code.

overwriteExistingFiles=Set to true to have the Extender generate the code in the project’s WEB-INF directory itself, overwriting existing files (or merging into existing files if updateExistingEntity=true). Otherwise, the Extender places its output inside a new directory it creates, WEB-INF/extenderOutput. Our suggestion: if you have a source control system, set this to true. It’s nice to overwrite because you can simply refresh the project, use source control to review the diffs, and easily revert the changes if you need to.

webinfDirectoryPath=It would be rare that you need to populate this, unless you have altered the directory structure of the project.
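Putting it all together, a typical extender-custom.properties might look like this (the class and field names here are hypothetical):

```
entity=com.softslate.commerce.businessobjects.product.Product
foreignKeys=com.softslate.commerce.businessobjects.product.Manufacturer
fields=warehouseCode|String,reorderLevel|Integer,clearance|Boolean
createAdminInterface=true
updateExistingEntity=true
overwriteExistingFiles=false
```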

We hope you enjoy working with the Extender and that it saves you a ton of time. Drop us a line to let us know how it’s going!

In version 3.2.8, we’re adding a couple of configuration parameters to our Advanced Search (i.e., Lucene) implementation in the Enterprise Edition. I have to say, I’m always impressed with Lucene’s API: it is surprisingly easy to implement seemingly complex features with it. Kudos to the Lucene folks!

We decided we would give administrators control over a few of the aspects of how Lucene analyzes searches inputted by users. For those who are looking for them, here is where the settings can be found on our Advanced Search Settings screen:

Here is the full scoop on what these settings do.

Advanced Search Match Style

The first setting, Advanced Search Match Style, selects the technique the system uses to match documents for user-entered Lucene searches. The options are listed in rough order from narrowest to broadest.

Exact Match (wraps search in double quotes)

All Terms (uses AND operator between terms)

All Terms Fuzzy (AND operator and tilde appended to each term)

All Terms Wildcard (AND operator and asterisk appended to each term)

Any Term (OR operator – this is Lucene’s default)

Any Term Fuzzy (OR operator and tilde appended to each term)

Any Term Wildcard (OR operator and asterisk appended to each term)

There are a couple Lucene features at play here. Let’s go through them briefly.

The Default Operator: AND or OR

First, let’s talk about the “default operator” for searches. Lucene uses the OR operator by default. For example, if a user enters hubble space telescope as his search term, by default it returns products that match either “hubble” or “space” or “telescope”. For very large product catalogs, or if your users commonly search for multiple-word phrases where one of the words is very common (like “space”, perhaps), you might find the OR operator returns too many matches. In that case you can change the matching style to one of the “All Terms” options so it uses the AND operator instead. That requires a match on all terms in the user’s input, e.g., “hubble” and “space” and “telescope”. In terms of the Java code, this setting tells us which way to call QueryParser.setDefaultOperator(). Incidentally, the user can still switch the search to a different operator by entering the terms AND or OR himself. E.g., hubble OR telescope will return matches on either “hubble” or “telescope”, even if you’ve specified an “All Terms” option in the setting.
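As an illustrative sketch (not SoftSlate’s actual code), here is what the “All Terms” style effectively amounts to, expressed as a rewrite of the user’s input; forcing explicit AND operators between terms is equivalent to calling QueryParser.setDefaultOperator() with the AND operator:

```java
// Hypothetical helper showing the effect of the "All Terms" style:
// every whitespace-separated term must match.
public class MatchStyle {

    // "hubble space telescope" becomes "hubble AND space AND telescope".
    public static String allTerms(String input) {
        return String.join(" AND ", input.trim().split("\\s+"));
    }

    public static void main(String[] args) {
        System.out.println(allTerms("hubble space telescope"));
        // → hubble AND space AND telescope
    }
}
```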

Fuzziness and Wildcards

Second, we have options to automatically add fuzziness or a wildcard to each of the terms entered by the user. Fuzziness means that Lucene will forgive a small spelling mistake or fat-fingering, and consider a term a match if it is close to another term. E.g., “huble” might match “hubble” if one of the Fuzzy options is enabled. In Java code, this involves appending a tilde character (~) to the end of each term entered by the user. For example, the above search would be submitted to Lucene as “hubble~ space~ telescope~”. Note that if one of the non-Fuzzy settings is chosen, the user can still perform a fuzzy search by appending the tildes himself in the search string; the setting determines whether or not SoftSlate automatically adds them if they are not there. Similarly for the Wildcard options: if one of those is set, SoftSlate automatically appends an asterisk to each term, making it match on partial words. E.g., “spa” would be submitted as “spa*” and therefore match “space”. Note again that the user does have some control over the situation. Even if a Wildcard setting is not chosen, he can add an asterisk to a term to turn it into a wildcard search. You might be wondering: what if the setting specifies Fuzzy and the user inputs a wildcard? In this case the user wins and the input is left alone as a wildcard search. Any term the user submits with a special character appended to it is left alone (out of respect for his wishes).
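As a minimal sketch of the rewriting described above (not SoftSlate’s actual code; the class and method names are our own), a suffix is appended to each term unless the user already ended the term with ~ or *, in which case the term is left alone:

```java
// Hypothetical helper: append a fuzzy (~) or wildcard (*) suffix
// to each term, respecting any marker the user already supplied.
public class TermSuffixer {

    public static String append(String input, char suffix) {
        StringBuilder out = new StringBuilder();
        for (String term : input.trim().split("\\s+")) {
            if (out.length() > 0) out.append(' ');
            // The user wins: leave explicitly marked terms alone.
            if (term.endsWith("~") || term.endsWith("*")) {
                out.append(term);
            } else {
                out.append(term).append(suffix);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(append("hubble space telescope", '~'));
        // → hubble~ space~ telescope~
        System.out.println(append("hubble space*", '~'));
        // → hubble~ space*   (the user's wildcard is preserved)
    }
}
```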

Stemming and Wildcards

We discovered we had to resolve an interesting problem using wildcard searches with the Snowball analyzer. The Snowball analyzer can be specified via the “Lucene Analyzer” setting. Its purpose is to “stem” the terms that are indexed and submitted by users. Stemming means a term is broken down to its root so that, for example, “running” and “runs” both become the root term “run”. This posed a problem when combined with the options to automatically add wildcards to each term the user submits. Internally, Lucene assumes the wildcard is part of the user’s input, not added automatically for them, so by design it does not “stem” wildcard searches when they are submitted. However, the terms are already indexed in stemmed form. So if “running” was submitted and SoftSlate added a wildcard to make “running*”, Lucene saw the asterisk and skipped stemming the term. But since the index contained only “run”, not “running”, the submission failed to match the original “running” that was submitted! To make a long story short, to work around this we also add a search for the original term the user entered – minus the wildcard at the end – to make sure it is covered. That is, “running” is reworked as “running OR running*”.
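The workaround can be sketched in a few lines (again an illustrative sketch, not SoftSlate’s actual code): when a wildcard has been added automatically, the original term is OR’d in as well, so the stemmable original still reaches the analyzer:

```java
// Hypothetical helper for the Snowball-plus-wildcard workaround:
// Lucene skips stemming wildcard terms, so search for both the
// stemmable original and the wildcard form.
public class WildcardStemFix {

    public static String rewrite(String term) {
        return "(" + term + " OR " + term + "*)";
    }

    public static void main(String[] args) {
        System.out.println(rewrite("running"));
        // → (running OR running*)
    }
}
```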

Advanced Search Fuzziness

The Advanced Search Fuzziness setting is a factor from 0.0 to 1.0 that tells the system how fuzzy fuzzy searches should be. This setting applies if one of the Fuzzy options is selected for the Advanced Search Match Style, or if the user himself appends a tilde (~) to the end of a search term. A higher value makes fuzzy searches less fuzzy (i.e., narrower); a lower value makes them broader. In terms of the Java code, the value of this setting is piped directly into QueryParser.setFuzzyMinSim(). Lucene’s default, by the way, is 0.5.
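To give an intuition for what that minimum-similarity factor controls: classic Lucene fuzzy matching scores candidate terms by edit-distance similarity, and a candidate matches when its score is at least the configured minimum. The following is a rough, self-contained illustration of that idea (not Lucene’s actual implementation; the similarity formula here, 1 minus edit distance divided by the shorter term’s length, is an assumption for demonstration):

```java
// Rough demo of edit-distance-based fuzzy similarity.
public class FuzzyDemo {

    // Standard Levenshtein edit distance via dynamic programming.
    static int editDistance(String a, String b) {
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;
        for (int i = 1; i <= a.length(); i++)
            for (int j = 1; j <= b.length(); j++)
                d[i][j] = Math.min(
                    Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                    d[i - 1][j - 1]
                        + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1));
        return d[a.length()][b.length()];
    }

    static double similarity(String a, String b) {
        int shorter = Math.min(a.length(), b.length());
        return 1.0 - (double) editDistance(a, b) / shorter;
    }

    public static void main(String[] args) {
        // "huble" is one edit away from "hubble": similarity 0.8,
        // so it matches with a 0.5 minimum but not with a 0.9 minimum.
        System.out.println(similarity("huble", "hubble"));
        // → 0.8
    }
}
```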

Many ecommerce web site owners use UPS and FedEx and need to print labels. I would guess a majority of these site owners print their labels through software like UPS WorldShip or FedEx Ship Manager. Others may use the UPS or FedEx web sites to create shipments and labels. But, what if you want more advanced integration options?

Both of these services allow you to pull down labels when creating shipments. FedEx has labels in PNG format while UPS uses GIF. You can print these to a laser printer or inkjet printer without an issue. In fact, it is recommended. But, what happens when you try to print the labels to a Zebra thermal printer?

Well, we have had success printing the FedEx labels directly to the Zebra printer driver. The driver is able to print the image without much issue. Recently we needed to do something similar for UPS. We did not have very good luck when printing the UPS GIF image. The GIF that is returned does not print very clearly and is not crisp. It does look OK when printing to an inkjet printer, but not the thermal printer. To be fair to the Zebra, the original GIF is not all that great when it comes to image quality.

We had the new challenge of finding an alternative way to print the UPS label directly from a web browser, as we were doing with the raster images. The solution we chose was to integrate an open source plugin called jZebra (https://code.google.com/p/jzebra/).

We changed the code to ask UPS for ZPL labels and stored the base64 string in the database. We could then use jZebra’s print64 method to print the commands directly to the Zebra thermal printer from the browser window. The quality was the same as using WorldShip. Many thanks to the people working on the jZebra Java plugin!
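A minimal sketch of the server-side half of this approach, assuming the ZPL arrives as plain text (the UPS API calls, database storage, and the jZebra/browser side are omitted; the class and method names are our own for illustration):

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper: encode a ZPL label as the base64 string to
// store in the database and later hand to jZebra's print64 method.
public class ZplLabel {

    public static String encode(String zpl) {
        return Base64.getEncoder()
                .encodeToString(zpl.getBytes(StandardCharsets.UTF_8));
    }

    public static String decode(String base64) {
        return new String(Base64.getDecoder().decode(base64),
                StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // A trivial ZPL document: start format, draw text, end format.
        String zpl = "^XA^FO50,50^ADN,36,20^FDSample^FS^XZ";
        String stored = encode(zpl);
        System.out.println(stored);
        System.out.println(decode(stored).equals(zpl)); // → true
    }
}
```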

SoftSlate develops a Java shopping cart application so you can imagine where we stand on this debate, if debate is the right word. Preference? Opinion? Does there need to be a winner or loser here?

The argument has been made that Java is more scalable. The counter-argument has been made that PHP is scalable too and after all, Facebook is a PHP application. The counter-counter-argument has been made that yes, Facebook is a PHP application, but they had to do a whole bunch of stuff to make it scale.

On the other hand, the argument has been made that PHP is easier to learn (which may explain why there are hundreds of PHP shopping carts and not nearly as many in Java) and, maybe more importantly, faster to develop prototypes with, which is very important indeed when you think about how valuable it is to be agile. I will also note that you are reading this on a page created in PHP.

Which language do you prefer? And what’s the essential factor for you?

I’ll speak for myself: apart from all its other strengths, Java feels easier to keep clean. By way of illustration, may I present here my family room:

It’s a mess of toys as you can see, and it’s almost always like this, except for those days we cajole our two kids into picking up. On those days, the pristine conditions seem to last for about an hour, if we’re lucky. But during that hour, oh yes, it feels almost … civilized.

You’ll note the cubbies and various boxes and such. These are artifacts of an organizational system. The system is there, it’s just only being partly used. I think Java is much like our family room. You can still make a hell of a mess if you are careless, but at least there are lots of cubbies and boxes there for you (i.e., standards, design patterns, and frameworks), if you choose to use them.

Does this mean PHP does not have cubbies? No, it has them too, but I tend to think they might not be as big or fancy. You might have to construct some of them yourself. And (maybe more relevantly), fewer of your neighbors use them, so it’s not embarrassing when they come over and discover that you don’t either. It may be easier to play with all the toys, but it’s easier to make a mess with them too.

One of the important things you need to do when starting an e-commerce store is to integrate with Facebook. The first and easiest step is to add the Facebook Like Button. In this example, we will be adding the Like Button to a SoftSlate store called eCustom, but the concepts are the same for any web store.

Decide where to put your Like Button
Now that you have a shiny new Facebook page, you need to have people Like your page. For this example, we have decided to put the Like Button on the top of every page in the header and on each product page.

Decide where to put the like button

Get The Main Facebook Like Button
Facebook has a nice tool for creating the code needed for the Like Button. The first button we will create is one that Likes your Facebook page itself, not the web page it appears on. This button “subscribes” the user to your Facebook page so they get updates about it. When you instead link the button to a page on your own site, it only pushes out a one-time recommendation to the user’s friends. We will use that second type of link for the individual product pages later.

This tool has many options, which are explained below the tool on Facebook; you should tailor them to your needs. The important thing here is that we are using the URL of the Facebook page, as described above. Now, click the “Get Code” button to get the actual code we will use in SoftSlate.

Get the Code for the Facebook Like Button

Add the Facebook JavaScript SDK code to header.jsp
Facebook recommends adding the SDK code (the top box in the picture above) into the HTML directly after the <BODY> tag. In SoftSlate, the body tag is usually in the layout templates. Probably the best place to add the code is header.jsp, which is included after the body tag in all SoftSlate layouts. If you do not have header.jsp in your custom layout directory, you will need to copy it from default-xhtml into the custom directory to override it before adding the code as shown in the picture below.
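For reference, the asynchronous SDK bootstrap generated by Facebook’s tool has the following general shape (treat this as a hedged example rather than the exact code, which you should copy from Facebook’s tool; your app ID and locale will differ from the placeholders here):

```html
<!-- Facebook JavaScript SDK bootstrap, pasted near the top of
     header.jsp, just inside the body. YOUR_APP_ID is a placeholder. -->
<div id="fb-root"></div>
<script>(function(d, s, id) {
  var js, fjs = d.getElementsByTagName(s)[0];
  if (d.getElementById(id)) return;
  js = d.createElement(s); js.id = id;
  js.src = "//connect.facebook.net/en_US/all.js#xfbml=1&appId=YOUR_APP_ID";
  fjs.parentNode.insertBefore(js, fjs);
}(document, 'script', 'facebook-jssdk'));</script>
```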

Add Facebook SDK Code to Custom Header JSP

Add the HTML5 Facebook Like Tag to your Header
You can put this button that Likes your Facebook page in your header, footer, or sidebar; in fact, you can place it wherever you like. In this example, we will be putting it in header.jsp as well.

Like Facebook Page in Header JSP

Check out the Result and Start Asking People to Like your Page!

How Does the Like Button Look?

Add a Similar Button to your Product Pages
We will add the same tag to our custom product.jsp file with a few small tweaks. This button will carry a dynamic URL to the product page in the tag. That means people will not be Liking your Facebook page to follow it; rather, they will be recommending the product to their friends. We will also turn the Send button on. The code looks like this in the JSP:

Product Page Like Button Code
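As a hedged sketch of what that tag might look like in product.jsp (the EL expression for the product URL is an assumption for illustration and will depend on how your templates build product links):

```html
<%-- Hypothetical product-page Like/Send button: the data-href is
     dynamic per product, and data-send="true" enables the Send
     button alongside Like. --%>
<div class="fb-like"
     data-href="http://www.example.com/product/${product.code}.html"
     data-send="true"
     data-layout="button_count"
     data-show-faces="false"></div>
```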

You can use this same concept on content pages or category pages if you like. You should give your visitors the opportunity to recommend as many pages on your site as possible. Below is how the button looks on one of the eCustom product pages:

Like Button on Product Page

What to do Next?
Interact with and engage your customers using your new Facebook page and Likes. A SoftSlate coupon code for people who Like your web site is always a great way to jump-start your Likes!