Monday, June 30, 2014

If you've done any pure Java development in recent years you've probably found the need to work with JSON objects. Lots of SDKs include their own JSON parser or bundle the json.org parser. But what if you're writing POJO (Plain Old Java Object) libraries?

Here's a real quick and simple way to get the json.org code and compile it into a reusable library.

Monday, June 23, 2014

This past April I had the pleasure of speaking at #ChefConf 2014 on the topic of Windows Web Server Management With ASP.NET. In that presentation I briefly touched on the topic of creating a Lightweight Resource and Provider (LWRP) for Web Deploy.
Today I thought I would walk you through creating that LWRP. The first thing you need to do when creating an LWRP is create the resource definition. The resource definition defines both the actions of the LWRP as well as its attributes. Our web deploy LWRP needs to support Web Deploy's sync verb. I've chosen :sync for the name of my action because it corresponds to the domain specific language used by msdeploy.
There are three pieces of information that we want the consumers of our LWRP to be able to change in their recipe. The first is the name of the package being deployed. The second is the destination where the package should be deployed to. And last is any additional parameters that should be passed along to msdeploy.
We're also going to implement the initialize method of the resource in order to set :sync as the default action. This is not strictly necessary, but it saves the consumer from having to set the action in their recipe.
Here's what our resource definition looks like.
your_cookbook/resources/web_deploy.rb
actions :sync
attribute :package, :kind_of => String, :name_attribute => true
attribute :destination, :kind_of => String, :default => "auto"
attribute :parameters, :kind_of => Hash

def initialize(name, run_context=nil)
  super
  @action = :sync
end
Now that we've got our resource defined, our next step in creating our Web Deploy LWRP is to create an implementation of that resource. We do this by defining a provider with the same name as the resource we previously defined. One of the really nice things about LWRPs is that Chef takes care of most of the heavy lifting in defining your provider. All we really have to do is implement the actions we declared in our resource.
Our :sync action implementation does three main things. First, it builds the msdeploy command. This command is exactly what you'd type if you were running it from a command prompt. Notice as we generate the command that we're setting the package and destination from the attributes with the same names on our new resource.
The second thing our implementation does is append any parameters set on the new resource to the msdeploy command.
The last thing our implementation does is execute the command using the built-in execute resource. It is VERY important to note that this IS NOT idempotent, meaning that this command will get run every time the recipe is run. This is really important to think about when creating your LWRP, as you will need to build in the idempotency for implementations that run dynamically generated scripts.
your_cookbook/providers/web_deploy.rb
action :sync do
  msdeploy_cmd = "\"%programfiles%\\IIS\\Microsoft Web Deploy V2\\msdeploy.exe\" "
  msdeploy_cmd << "-verb:sync "
  msdeploy_cmd << "-source:package=\"#{@new_resource.package}\" " unless @new_resource.package.nil?
  msdeploy_cmd << "-dest:\"#{@new_resource.destination}\" " unless @new_resource.destination.nil?

  @new_resource.parameters.each do |name, value|
    msdeploy_cmd << "-setParam:name=\"#{name}\",value=\"#{value}\" "
  end unless @new_resource.parameters.nil?

  Chef::Log.info("MSDeploy Command: #{msdeploy_cmd}")

  execute "webdeploy" do
    command msdeploy_cmd
  end
end
You can build in that idempotency fairly easily in Chef. The first thing you need to do is define an attr_accessor on your resource. Something like attr_accessor :has_run. Then in your provider you need to override the load_current_resource method and perform any check necessary that will validate whether you need to perform your action or not. For example:
def load_current_resource
  # ... perform check to see if your action has already been performed ...
end
After you have that plumbing in place you can update your action :sync implementation to wrap the command execution in a check to see if the action has already been run. For example:
if @new_resource.has_run
  Chef::Log.info("Web Deploy has already been run")
else
  # ... put the msdeploy command execution logic here ...
end
Now you have an LWRP which will allow you to idempotently deploy a web deploy package. Using your new LWRP within a recipe is very simple. Here's an example where we get the source package and additional parameters from an attribute. Notice that we don't set the destination parameter because our resource definition includes a default.
your_cookbook_web_deploy "#{node['webapp']['local_directory']}/#{node['webapp']['package']}" do
  parameters node['webapp']['parameters']
end

Monday, June 16, 2014

In my previous post Back To The Basics: The Hash Table I described the Hash Table data structure, provided a basic template for the class in Java, as well as walked you through the hashing algorithm. In this post we’ll take a look at the CRUD operations. Actually we'll really only talk about addition, retrieval, and deletion because adding and updating in a Hash Table are the same operation.

Let's take a look at the add, get, and remove operations first in their simplest form.
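The original snippets didn't survive, so here's a sketch of what those simplest forms might look like; the class name SimpleHashTable, the prime size 127, and the Math.abs guard are my own assumptions.

```java
// A naive Hash Table: add, get, and remove with no collision handling yet.
public class SimpleHashTable {
    private Object[] values = new Object[127]; // prime-sized backing array

    private int getKeyIndex(String key) {
        // Math.abs guards against negative hash codes
        return Math.abs(key.hashCode() % values.length);
    }

    public void add(String key, Object value) {
        values[getKeyIndex(key)] = value; // blindly overwrites on a collision
    }

    public Object get(String key) {
        return values[getKeyIndex(key)];
    }

    public Object remove(String key) {
        int index = getKeyIndex(key);
        Object value = values[index];
        values[index] = null;
        return value;
    }
}
```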

Looking at these three operations we can now see why a Hash Table in practice provides near constant time addition and retrieval operations. The only non-constant time operation performed is the generation of the hash code, but this operation can be optimized and its resulting value cached (if necessary), effectively making the add and get operations approach O(1).

What you may notice by looking at these operations is that they don't handle collisions. The add operation simply overwrites the existing value associated with the hash regardless of whether it's actually associated with the key. The get operation returns whatever value is stored at the index generated by the getKeyIndex method. And, like the add operation, the remove operation removes whatever is associated with the hash.

We can try to optimize our hashing algorithms as much as we want but we'll never truly be able to avoid collisions 100% of the time. The reason is that we'd have to know all the possible items that can be hashed in order to create a truly perfect hashing algorithm. Since this is not really possible for a general purpose Hash Table, it's important that we add some code that can handle collisions when hashing the key.

The most straightforward way to handle hash collisions is to store the values at each index in a Linked List. Instead of setting the value associated with a particular key as the value in the backing array, the value will be set to a Linked List which will contain all the key/value pairs that map to a particular hash. This will require us to add a little complexity to our Hash Table but will allow us to avoid putting our Hash Table into an unexpected state when a collision occurs.

The first thing we need to do to avoid collisions is to define an object that can represent a key/value pair item that gets stored in our Linked List for each value. Since we're using Java we're going to create a class that implements Map.Entry<String, Object>.
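A sketch of such a class (the name HashEntry is my own; the original code was not preserved):

```java
import java.util.Map;

// A key/value pair stored in each bucket's Linked List.
public class HashEntry implements Map.Entry<String, Object> {
    private final String key;
    private Object value;

    public HashEntry(String key, Object value) {
        this.key = key;
        this.value = value;
    }

    @Override
    public String getKey() {
        return key;
    }

    @Override
    public Object getValue() {
        return value;
    }

    @Override
    public Object setValue(Object newValue) {
        Object oldValue = value;
        value = newValue;
        return oldValue; // Map.Entry's contract: return the previous value
    }
}
```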

Now we can update our add, get, and remove methods to support using a Linked List as the backing array. We're going to start with the get method because we'll define a helper method that the addition operation can also take advantage of.

Our new get method is fundamentally the same as our previous get method, with the only exception being that we've pushed the actual retrieval logic into its own method named findEntry. First it tries to find the item. If it does, it returns its value; otherwise it returns null. The new get method looks like:
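A sketch of that pair of methods (the demo class name, the SimpleEntry stand-in for the entry class, and the seed helper are my own additions for a runnable example):

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.LinkedList;
import java.util.Map;

public class HashTableGetDemo {
    // Each slot now holds a Linked List of entries rather than a bare value.
    @SuppressWarnings("unchecked")
    private LinkedList<Map.Entry<String, Object>>[] values = new LinkedList[127];

    private int getKeyIndex(String key) {
        return Math.abs(key.hashCode() % values.length);
    }

    // Walks the bucket's list looking for the entry whose key matches.
    private Map.Entry<String, Object> findEntry(String key) {
        LinkedList<Map.Entry<String, Object>> list = values[getKeyIndex(key)];
        if (list == null) {
            return null;
        }
        for (Map.Entry<String, Object> entry : list) {
            if (entry.getKey().equals(key)) {
                return entry;
            }
        }
        return null;
    }

    public Object get(String key) {
        Map.Entry<String, Object> entry = findEntry(key);
        return entry == null ? null : entry.getValue();
    }

    // Demonstration-only seeding helper; the real add method is covered next.
    void seed(String key, Object value) {
        int index = getKeyIndex(key);
        if (values[index] == null) {
            values[index] = new LinkedList<>();
        }
        values[index].add(new SimpleEntry<>(key, value));
    }
}
```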

Our new add method is also the same as our previous add method with the exception of two things. First, we've added logic to create the Linked List if it doesn't already exist. Second, we've pushed the actual addition logic into its own addToList method. This allows the add method to maintain a single responsibility. The new add method looks like:

The addToList method is not very complicated either. It tries to get a handle to the existing object using the findEntry method we've already defined. If it does not find an entry it adds the new key/value pair to the list otherwise it updates the existing entry value. Here's what the addToList method looks like:
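Putting the add method and its addToList helper together as one runnable sketch (the class name is mine, and SimpleEntry stands in for a Map.Entry implementation; a small get is included only so the behavior can be exercised):

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.LinkedList;
import java.util.Map;

public class HashTableAddDemo {
    @SuppressWarnings("unchecked")
    private LinkedList<Map.Entry<String, Object>>[] values = new LinkedList[127];

    private int getKeyIndex(String key) {
        return Math.abs(key.hashCode() % values.length);
    }

    public void add(String key, Object value) {
        int index = getKeyIndex(key);
        // Create the bucket's Linked List the first time it's needed.
        if (values[index] == null) {
            values[index] = new LinkedList<>();
        }
        addToList(values[index], key, value);
    }

    // Updates the existing entry on a key match, otherwise appends a new pair.
    private void addToList(LinkedList<Map.Entry<String, Object>> list,
                           String key, Object value) {
        for (Map.Entry<String, Object> entry : list) {
            if (entry.getKey().equals(key)) {
                entry.setValue(value);
                return;
            }
        }
        list.add(new SimpleEntry<>(key, value));
    }

    public Object get(String key) {
        LinkedList<Map.Entry<String, Object>> list = values[getKeyIndex(key)];
        if (list != null) {
            for (Map.Entry<String, Object> entry : list) {
                if (entry.getKey().equals(key)) {
                    return entry.getValue();
                }
            }
        }
        return null;
    }
}
```

Note that the update path is what makes add and update the same operation, as mentioned at the top of the post.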

Finally, our new remove method is also able to benefit from our findEntry method. It first tries to find the existing value. If it finds it then it removes it from the list and returns the value of the entry. Otherwise it returns null. The new remove method looks like:
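A sketch of the remove method (again with assumed names; findEntry and a minimal add are repeated here so the example stands on its own):

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.LinkedList;
import java.util.Map;

public class HashTableRemoveDemo {
    @SuppressWarnings("unchecked")
    private LinkedList<Map.Entry<String, Object>>[] values = new LinkedList[127];

    private int getKeyIndex(String key) {
        return Math.abs(key.hashCode() % values.length);
    }

    private Map.Entry<String, Object> findEntry(String key) {
        LinkedList<Map.Entry<String, Object>> list = values[getKeyIndex(key)];
        if (list == null) {
            return null;
        }
        for (Map.Entry<String, Object> entry : list) {
            if (entry.getKey().equals(key)) {
                return entry;
            }
        }
        return null;
    }

    public void add(String key, Object value) {
        int index = getKeyIndex(key);
        if (values[index] == null) {
            values[index] = new LinkedList<>();
        }
        values[index].add(new SimpleEntry<>(key, value));
    }

    // Removes the matching entry from its bucket and returns its value, or null.
    public Object remove(String key) {
        Map.Entry<String, Object> entry = findEntry(key);
        if (entry == null) {
            return null;
        }
        values[getKeyIndex(key)].remove(entry);
        return entry.getValue();
    }
}
```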

There are a few additional things to consider when creating or using Hash Tables. Concurrency can become an issue when the hash table is used in a multi-threaded environment. If you use a hash table in a multi-threaded environment you'll need to make sure that the implementation you use is thread safe.

Another thing to consider when creating or using Hash Tables is that because the Hash Table relies on a fixed-size array as its backing object store, it may eventually need to grow as more items are added. The cost of performing a resize operation is typically O(N), as each item in the existing array needs to be copied to a new, larger array. This can be done all at once or incrementally.

Monday, June 9, 2014

A few weeks ago I began a series entitled Back To The Basics. This series is intended to go over various software data structures that are helpful to understand as a software engineer. I started the series with the Binary Tree. The next data structure I'd like to take a look at in this series is the Hash Table.

Hash Tables are very important data structures to understand. At their most basic level they allow for associative array functionality with near constant time addition, deletion, and retrieval. The ability for a Hash Table to provide near constant time operations depends on how good the hashing algorithm is. The goal of the hashing algorithm is to avoid collisions, where two different values create the same hash. When a collision is encountered the Hash Table strays farther away from being able to provide constant time operations.

The ability to provide near constant time operations makes the Hash Table ideal for use as a cache or an index as items can be inserted and retrieved with very little overhead.

While most modern programming languages and frameworks provide several optimized implementations of the Hash Table, let's take a look at what it would take to build a Hash Table from scratch which can handle adding, removing, and retrieving values.

Let's first start by defining our HashTable class. The most basic definition of a Hash Table will define the underlying array which will store our values and our constructor will initialize the store. The important thing to pay attention to here is the size of our initial array. We're going to initialize it to a size using a prime number because it will help us evenly distribute values within our backing store. The reason for this will become more apparent when we create our hashing function.
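A minimal sketch of that class definition (the original snippet wasn't preserved; the specific prime 127 is my assumption, any prime works):

```java
public class HashTable {
    // Backing store for our values. A prime length helps spread
    // hashed keys more evenly across the array.
    private Object[] values;

    public HashTable() {
        this.values = new Object[127];
    }
}
```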

The getKeyIndex method is pretty simple. It first gets a hash code for the key and then returns an index location based on modding the hash code with the length of the array that holds our values. The reason we return a mod is so that we can make sure that the index returned is within the bounds of our array. And because we've used a prime number for our array length we can get better distribution of hashes in our array so that they're not clustered together.
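In code that might look like the following sketch; the Math.abs call is my addition, since Java hash codes can be negative and a negative index would fall outside the array:

```java
public class KeyIndexDemo {
    private Object[] values = new Object[127]; // prime length assumed

    // Mod the key's hash code by the array length so the resulting
    // index always falls within the bounds of the backing array.
    int getKeyIndex(String key) {
        return Math.abs(key.hashCode() % values.length);
    }
}
```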

Right now the getKeyIndex method is relatively boring and hides the magic of the actual hashing algorithm, because our keys are strings and we're able to rely on the String class's hashCode method. But what if there were no hashCode method on String? How would we go about writing our own?

A good hashing algorithm will try to provide uniform distribution of values in order to avoid collisions which will increase the cost of the operation. The cost is increased because for every collision you have to resolve it somehow. The work it takes to resolve the collision is relative to the number of collisions for that key and the algorithm used to resolve the collisions. One way to resolve the collisions is to store the values in a Linked List and iterate through the list till we find the item with our hash code. This will cause our resolve algorithm to have an O(n) cost where n is the number of collisions for that key.

Now let's build our string hash algorithm. Because strings are made up of individual characters and those characters have an integer value we could create a hashing function that just sums the character values. For example:
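A sketch of that naive approach (the class and method names are my own):

```java
public class NaiveStringHash {
    // Hash a string by simply summing the integer value of each character.
    static int hash(String key) {
        int hash = 0;
        for (char c : key.toCharArray()) {
            hash += c;
        }
        return hash;
    }
}
```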

This has a problem in that it's likely to cause collisions. For example let's say we wanted to hash the string "dog". We'd start our hash with an initial value of 0. Then we'd iterate through the characters in the string summing them as we go. For dog this would produce a hash of 314 ((d)100 + (o)111 + (g)103). Now let's say we wanted to create a hash of the string "cav". This would also create a hash of 314 ((c)99 + (a)97 + (v)118). This is a collision, which is something we want to avoid.

The good thing is that there's a pretty simple way to help avoid these collisions. If we change our algorithm slightly to multiply the character value against a constant, then we can avoid these collisions. Notice that I multiply the constant by the current hash value during each iteration. This helps make the hash unique because the value contributed by a character differs depending on its position within the string, which decreases our likelihood of a collision.
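Sketched in code, with 31 as the constant (31 is the value that reproduces the hashes quoted in this post; other odd primes work too):

```java
public class PolynomialStringHash {
    // Multiplying the running hash by a constant each iteration makes a
    // character's contribution depend on its position within the string.
    private static final int CONSTANT = 31;

    static int hash(String key) {
        int hash = 0;
        for (char c : key.toCharArray()) {
            hash = hash * CONSTANT + c;
        }
        return hash;
    }
}
```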

With our new algorithm dog will hash to 99644 and cav will hash to 98264. Because we've added a constant that is multiplied against the current hash value at each iteration we've increased the likelihood that the resulting hash code is unique.

The delete algorithm consists of three main scenarios. First, deleting a leaf node. Second deleting a node with only one child. Last, deleting a node with two children.

The first scenario is the easiest to account for. Remember that a leaf node is a node with no child nodes, so deleting a leaf node simply requires nulling out the reference to it in its parent.

The second scenario, deleting a node with only one child, is also pretty straightforward. You simply change the pointer that points to the node being deleted so that it points to the deleted node's child.

The third scenario is the more complicated one. When deleting a node with two children, you replace the node being deleted with either the minimum value node from the right subtree or the maximum value node from the left subtree, and then delete that replacement node from the subtree it came from.
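All three scenarios can be handled in one recursive method. The sketch below is my own (node and method names assumed, not from the original post) and uses the minimum of the right subtree for the two-child case:

```java
public class BinaryTreeDeleteDemo {
    static class Node {
        int value;
        Node left, right;
        Node(int value) { this.value = value; }
    }

    // Deletes value from the subtree rooted at node; returns the new subtree root.
    static Node delete(Node node, int value) {
        if (node == null) return null;
        if (value < node.value) {
            node.left = delete(node.left, value);
        } else if (value > node.value) {
            node.right = delete(node.right, value);
        } else {
            // Scenario 1: leaf node -- just null out the parent's reference.
            if (node.left == null && node.right == null) return null;
            // Scenario 2: one child -- point the parent at that child.
            if (node.left == null) return node.right;
            if (node.right == null) return node.left;
            // Scenario 3: two children -- copy in the minimum value from the
            // right subtree, then delete that node from the right subtree.
            Node min = node.right;
            while (min.left != null) min = min.left;
            node.value = min.value;
            node.right = delete(node.right, min.value);
        }
        return node;
    }
}
```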

About Me

I've been in the technology industry for 14+ years, mostly as a Software Developer. I've worked on projects ranging from a Ruby on Rails portal, a high performance/low latency .NET search stack, and R&D for building infrastructure in the cloud (including automated server builds using Chef), to architecting and building iOS and Android mobile platforms and frameworks.

I've also spent some time building Android, iOS, Windows Phone, and Bada mobile applications focused on the presentation of long form audio/video.

The middle of my career was spent at MSNBC working on their content management publishing platform and their video syndication system, as well as being a co-creator of their iOS iPad framework from which the Rachel Maddow, Today Show, and Nightly News iPad apps were built.

Finally, the beginning of my career was spent writing software on government contracts, with a few years spent writing criminal investigation software.