Software Engineering/Architecture Blog

Friday, 20 May 2016

In most cases where I've seen Ansible being used to automate ops tasks, it runs in Push mode. In this approach, playbooks run from a given host where Ansible is installed. The Ansible host interprets the tasks and applies them by "pushing" them to all target hosts through SSH.
What often goes unnoticed when you start playing with it is that the same results can be achieved using a totally different flow, the Pull mode.

There isn't much about it in the official documentation, but the idea is pretty simple. Instead of pushing playbooks to the target hosts, in pull mode you make each target host "pull" them from a given repository. By doing this, there is no need for a single machine playing the Ansible host role; in this scenario, that responsibility is spread across the machines in the datacenter.
There is nothing special needed to get it running. Both Pull and Push are available after following the installation steps available here.
Let's say I want to deploy the application I built here using Pull mode on all my cluster machines. After having Ansible properly installed on the target hosts, the following command should be run:
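The original command listing is missing from this copy; a minimal `ansible-pull` invocation might look like this (the repository URL is hypothetical):

```shell
# Pull the repository and run the playbook named local.yml from its root
ansible-pull -U https://github.com/myuser/my-playbooks.git
```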

This command connects to GitHub and downloads the entire repository locally. After doing this, Ansible looks for a file named local.yml. This file should contain all the tasks, or references to the files that have them, in order to perform a playbook.
An interesting approach is to make the target hosts pull the remote repository from time to time. By doing this, changes are applied on all target machines asynchronously and in the background as soon as they are available in the repository. That can be quite interesting when we're talking about provisioning hundreds or thousands of machines; this mode will scale much better than Push mode. It can be achieved by simply setting up a cron job that calls a script encapsulating the pull command described before, like this:
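The original script is missing from this copy; a sketch of what the cron entry and wrapper script might look like (paths and the repository URL are hypothetical):

```shell
# /etc/cron.d/ansible-pull: run the wrapper every 15 minutes
*/15 * * * * root /usr/local/bin/run-ansible-pull.sh

# /usr/local/bin/run-ansible-pull.sh
#!/bin/sh
ansible-pull -U https://github.com/myuser/my-playbooks.git >> /var/log/ansible-pull.log 2>&1
```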

The Pull mode can also be useful for changing application configuration more dynamically. By using tags, I can apply just the log4j config changes as soon as they hit the remote repository:
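The original snippet is missing here; a sketch of that invocation (the tag name and repository URL are hypothetical):

```shell
# Run only the tasks tagged "log4j" from the pulled playbook
ansible-pull -U https://github.com/myuser/my-playbooks.git --tags log4j
```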

As we can see, there is a range of scenarios where Pull mode can be useful. That said, it could be a bit more flexible by letting the user specify which playbook to run (it only looks for a file named local.yml; anything different is going to produce an error). Users also need to be careful when pushing code to the repository while using this feature: badly written code can break an entire datacenter without you noticing.

Sunday, 1 May 2016

When talking about automation, Ansible is definitely one of the simplest and easiest to use frameworks. It has a pretty low learning curve due to its comprehensive DSL, which is easy to understand. You also don't need to install anything on the server being provisioned (agentless architecture), which makes the setup simple. Everything looks great while the provisioning process has only two or three script files, but as soon as you add more functionality, there will be some issues to deal with:

Reuse: certain provisioning tasks are common to all servers. How do we organise them in such a way that they can be reused easily?

Organisation: as with any programming code, without maintenance and good engineering practices the provisioning process becomes difficult to maintain and understand. Naming, module organisation and conventions are all aspects that need to be taken into account.

Ansible Roles

Ansible Roles are conventions that you, as a programmer, need to follow in order to achieve a good level of reuse and modularisation. These conventions were added in version 1.2; before that, the way to achieve a better level of reuse was to separate scripts into different files and include them in the scripts where you wanted to reuse them.

The documentation is very sparse when describing how Roles work, but the idea is pretty simple. Using Roles, you will be able to automatically load tasks, variables, files and handlers when provisioning a server or a group of them.

Let's look at an example. Here I'm provisioning a Java application service. The server that runs this application will need the following roles:
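The original playbook listing is missing from this copy; a sketch of what it might look like (the host group name is hypothetical):

```yaml
# site.yml: apply the common and service roles to the app servers
- hosts: app-servers
  roles:
    - common
    - service
```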

The common role is one that every server in my infrastructure needs to have (reuse), which in this case means having the JDK installed. The other role is called service, which covers the things needed to run the service itself.

Ansible will automatically look for directories called common and service inside the main roles directory and execute all the steps defined in them.
For the role service, we have:

vars: variables used by the role's tasks are loaded from here.

tasks: the steps the role executes are defined here, starting from main.yml.

handlers: actions triggered by tasks, such as restarting a service, live here.

files: all files used by tasks are loaded from here.
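Following these conventions, the directory layout for this example would look roughly like this (file names beyond the conventional main.yml are illustrative):

```
roles/
├── common/
│   └── tasks/
│       └── main.yml
└── service/
    ├── vars/
    │   └── main.yml
    ├── tasks/
    │   └── main.yml
    ├── handlers/
    │   └── main.yml
    └── files/
```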

There are still more directories that can be defined, like templates and defaults. They aren't present in this example but are still useful. This is the full working example that provisions a server able to run this Java application.

Using roles is great because they are expressive. Working with them properly, you are able to say what a given server is, which is much more declarative than just using include directives. The directory conventions are good for defining patterns for the whole team to follow from day one, and reuse is achieved by defining very granular roles that can be used in different playbooks.

Friday, 15 April 2016

Agility when building, testing, packaging and deploying software is certainly a key quality aspect to pursue. There is no specific recipe to get there, but among the several things that need to be done, avoiding manual steps and not reinventing the wheel by relying on solutions well established in the community are two of them. In this direction, there is a very interesting project maintained by Netflix called Nebula.

The Nebula Project is a series of individual Gradle plugins, each focused on providing a very specific functionality in the usual development pipeline tasks. Today I'm going to talk about one of them, nebula-os-package.

The main idea

The main idea behind it is to package a JVM-based application and its metadata as a native OS package. The plugin is able to generate DEB and RPM packages, which are the most popular package formats in the Linux world.

Using

First of all, add the plugin to your build.gradle file.

Then, specify the plugin dependency:
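The original build.gradle snippets are missing from this copy; a sketch of what applying the plugin might look like (the plugin version is hypothetical, check the project's page for the current one):

```groovy
buildscript {
    repositories { jcenter() }
    dependencies {
        // the nebula-os-package plugin artifact
        classpath 'com.netflix.nebula:gradle-ospackage-plugin:3.4.0'
    }
}

apply plugin: 'java'
apply plugin: 'nebula.ospackage'
```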

Now you need to say how the application is going to be laid out on the host after the package is installed.
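The original listing is missing here; a sketch of an ospackage block matching the points described below (package name, version and paths are hypothetical):

```groovy
ospackage {
    packageName = 'my-service'
    version = '1.0.0'

    // base directory for everything in the package on the target host
    into '/opt/my-service'

    // application jar and its dependencies go under /opt/my-service/lib
    from(jar.outputs.files) { into 'lib' }
    from(configurations.runtime) { into 'lib' }

    // startup scripts generated by the application plugin
    from('build/scripts') { into 'bin' }

    // configuration files from the resources folder
    from('src/main/resources') { into 'conf' }
}
```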

A couple of important things are happening here:

Specifying the package name and version.

Defining under which directory the package is going to be placed after being installed on the target host.

All jars produced by Gradle during the build (the application jar and its dependencies) are going to be placed under /opt/packageName/lib on the target host.

The same applies to the configuration files under the resources folder.

The scripts generated by Gradle when building a Java application are going to be used to start it up on the target host.

With everything properly set, just execute the build command followed by the package task specified in the build file. The Debian package is going to be placed at projectName/build/distributions.

Yes, these are all valid points. Actually, this is how we've been doing it so far when releasing applications outside of the J2EE world. But by doing it like this, tasks like deploying, starting/stopping, updating and removing applications are all on you. Scripts will need to be created to manage all of this, so it's one more thing that Ops and Dev teams will need to care about.

When deploying applications as native OS packages, you can leverage a whole set of tools that are already there, and none of the scripts mentioned before would be needed. This is a valid point that affects agility when releasing and maintaining software.

Friday, 8 April 2016

Less than one year ago, Google officially launched a new cloud messaging service on the market, Google Pub/Sub. I have to confess I only looked at it more carefully after reading a very nice blog post by the Spotify engineering team describing their experience testing this service.

As the name suggests, the processing model is based on the publish/subscribe pattern, implemented by most of the best brokers available on the market.
The message consumption works in two models:

Polling: you can configure your client to poll the topics from time to time.

Push: you register a URL that the Google Pub/Sub service is going to call when messages arrive at the topic (webhook-like).

Besides that, here are some characteristics that, in my opinion, make this tool an interesting solution:

Durability

Messages sent to the Google Pub/Sub messaging system won't be lost. Google guarantees message retention for up to 7 days, which is pretty reasonable. This is a must in scenarios where back pressure needs to be implemented and some other consistency requirements need to be met.

Reliability

The Google Pub/Sub messaging service is available in almost all regions where their cloud service is working (North America, Europe and Asia). The advantage here is multi-region availability and replication: it is all Google's responsibility to keep everything in sync and available for you. The operational costs of making systems like this work on premise are sky high, so relying on a third-party service is a very relevant point to consider.

Low Latency and High Throughput

Results shown by the Spotify engineering team and the Google documentation itself are very enthusiastic (1M messages/second). That makes it a considerable option for near-real-time processing systems. We should never trust benchmarks 100%; ideally we should try it ourselves. But similar numbers are in the Google documentation as well, so it's relevant.

APIs

Choose your flavor

Java

Python

Objective C

GO

PHP

Javascript

REST

Once you pick one, it's reasonable to check its performance to be sure it meets your needs. The Spotify team had some issues with the Java API, so they moved to REST.

Billing model

It looks pretty fair ($0.40/million messages for the first 250M). Just be aware that the real cost is also going to consider:

Number of API calls: sending, consuming and acknowledging a message are considered 3 separate API calls. The message size is also taken into account: Google charges in blocks of 64 KB, so if a message contains 128 KB, it's going to count as 2 calls.

Storage

Network

So, the final price may be tricky to calculate. Be aware that it may cost more than you expect it to be.
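To make the arithmetic above concrete, here is a small sketch (plain Python, not tied to any official Google tooling) of how the request count and cost could be estimated, using the post's assumptions of 3 API calls per message, 64 KB billing blocks and $0.40 per million requests:

```python
import math

def pubsub_request_count(num_messages, message_kb):
    """Estimate billable API requests: publish + pull + ack = 3 calls,
    each counted once per 64 KB block of the message size."""
    blocks = math.ceil(message_kb / 64)
    return num_messages * 3 * blocks

def request_cost_usd(requests, price_per_million=0.40):
    """Cost of the API requests alone (storage and network not included)."""
    return requests / 1_000_000 * price_per_million

# A 128 KB message spans 2 blocks, so one message costs 6 billable requests
print(pubsub_request_count(1, 128))  # 6

# One million 64 KB messages -> 3M requests
print(round(request_cost_usd(pubsub_request_count(1_000_000, 64)), 2))  # 1.2
```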

When talking about data ingestion, this is definitely a tool to keep on the radar. The comparison with Apache Kafka is inevitable at this point; in my opinion they are similar in some aspects but differ in others:

They are both production ready for real-time log analytics. There are several benchmarks on the web using both and showing they are robust alternatives. That said, Apache Kafka has been on the market longer, which gives it some advantage.

They differ in the processing model. Unlike Kafka, Google Pub/Sub doesn't let consumers rewind back on the topic. On this point, Google Pub/Sub works more like a traditional messaging system (without the overheads). Before being released as a product, Google Pub/Sub was a tool used internally by the Google team, and they didn't have this requirement so far. That said, I won't be surprised to see this feature in future releases.

In my opinion, the ability to reprocess messages in a topic makes Apache Kafka a more relevant option when implementing a Kappa Architecture. Still, it's always interesting to see another option for data ingestion on the market. More use cases would definitely make it more popular.

There is a producer example here. Google lets you use this service for free for 60 days, which is awesome for POCs in case you want to try it.

Monday, 3 August 2015

Contracts are a crucial part of any service's anatomy. In a SOA/MSA approach, contracts expose the service behaviour that consumers will rely on. This behaviour is usually exposed as a series of interfaces and input/output structures, and implemented by internal components; the latter are usually hidden from external consumers. The structure just described may look like this:

In this example, modules are split, there is only one dependency flow (internal modules don't know about external ones), each module has its own set of dependencies, and consumers rely on abstractions (the contract). So far so good.
When consuming services through REST or SOAP, consumers need to serialise and deserialise the structures the service exposes. This means they need to hold a copy of these structures locally in order to get this job done. Now let's say you want to avoid this structure replication across all consumers inside your company. Considering services are built following the structure illustrated before, the contract module could be split apart and then shared with the consumers. In this scenario, you may get rid of the structure replication issue, but you will be running into another. When shipping the contract module as is to consumers, they will now depend on the same libraries that the contract depends on; otherwise consumers won't be able to use it. What about services moving to a library-A version that isn't compatible with previous ones? A scenario like this:

If all consumers don't upgrade to the same library-A version as the service, they will be broken. The more consumers you have in this model, the more synchronisation between them is needed to deploy; the more services you have, the worse it gets. The agility expected when choosing this architectural approach will be harder to achieve.
One possible solution for this problem is to accept that consumers will hold local copies and just deal with it. Accepting that they will have some extra work is still better than the coupling scenario described before.
You can also design your service keeping the contract module with as few dependencies as possible. It may look like this:

This design guideline can increase a service's flexibility when sharing the contract as a library. Consumers would only be affected by contract behaviour changes, and even then there are techniques to mitigate these effects. It is also an example that succeeding with SOA/MSA approaches also depends on good design and architecture choices.

Sunday, 28 June 2015

In this post, I talked about the ideas behind the AWS Lambda computation service and how it works. The example presented shows how it can be deployed and used. Even having a working example, there is an issue in the way I'm using it: all the steps regarding the deploy process are manual. That just goes against agility. Manual deploys like these are error prone; the more complex the application gets, the more expensive it will be to maintain. The side effects of maintaining manual deploy steps are endless, so there should be an alternative to automate it and make AWS Lambda really as cost effective as it promises to be.

Kappa seems to fill this gap. It is a command line tool that greatly simplifies the process of getting lambdas deployed to the cloud. All the steps described in the mentioned post can be automated. Now we're talking!

Setup

Before starting, be sure you have Python (2.7.x) and pip available on the command line.

Installing kappa:

I strongly advise building it from source, since there are important bugs that seem to have been fixed recently:
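The original commands are missing from this copy; installing from source would look roughly like this (assuming the project still lives in Mitch Garnaat's GitHub repository, check before running):

```shell
git clone https://github.com/garnaat/kappa.git
cd kappa
pip install -r requirements.txt
python setup.py install
```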

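The config file listing is also missing here; a sketch of what a kappa config might contain, matching the descriptions below (names, region and bucket are hypothetical, and the exact layout varies between kappa versions):

```yaml
---
profile: personal            # awscli profile used to authenticate
region: us-east-1
iam:
  policy:
    name: LambdaS3Policy     # created by kappa if it doesn't exist yet
  role:
    name: LambdaS3Role
lambda:
  name: s3-processor
  zipfile_name: s3-processor.zip
  description: Process uploads from S3
  handler: index.handler     # function runtime configs
  runtime: nodejs
  memory_size: 128
  timeout: 30
  test_data: input.json      # example request used to test the function
  event_sources:
    - arn: arn:aws:s3:::my-bucket   # changes here trigger the function
```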
In the config file, a few things are worth noting:
There should be a profile that kappa will use to authenticate itself on Amazon and create the function on my behalf. We are going to see it later in the awscli configuration.
The policies assigned to this lambda: in case they aren't there yet, kappa will create them for me.
The function runtime configs.
The file that contains an example request in order to test the function. It is useful when we want to be sure everything is working fine after the deploy is over.
Finally, I'm setting where events will come from. In this case, any change in the given bucket will trigger a call to my function.

Now it's time to configure aws-cli. The only configuration needed is the security profile; kappa will use it as stated before:

Create the following file in case it isn't already there: ~/.aws/credentials, and put the following content in it:
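A sketch of the credentials file (the profile name and keys are placeholders; the profile must match the one referenced in the kappa config):

```
[personal]
aws_access_key_id = YOUR_ACCESS_KEY_ID
aws_secret_access_key = YOUR_SECRET_ACCESS_KEY
```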

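The command listing is missing from this copy; the sequence would look roughly like this (subcommand names have varied between kappa versions, so check `kappa --help`):

```shell
kappa --config config.yml create             # create the function on Amazon
kappa --config config.yml add_event_sources  # listen to changes on the bucket
kappa --config config.yml invoke             # test it with the sample event data
kappa --config config.yml status             # check the deployed function status
```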
This should be enough to see the function deployed in the aws-console. The previous commands, in order, did:

Create the function on Amazon.

Make it listen to changes on a given bucket.

Test the deployed function using fake data (simulating an event).

Check the status of the deployed function on Amazon.

Since Kappa lets me automate all the deploy tasks, I'm able to create a smarter deploy process. I worked on an example of how it could be done here. I may have forgotten to mention some detail about the process of getting it to work, so in that case leave me a message and I'll be glad to help.

Sunday, 7 June 2015

Less than one year ago, Amazon launched a new computation service, AWS Lambda. It promises to simplify the process of building applications by hosting and running code for you. All the infrastructure, and some scalability and failover aspects, are Amazon's concern. It also integrates pretty well with other Amazon services like SQS, SNS, DynamoDB, S3, etc. The code hosted there can even be called externally by other applications using the aws-sdk.

Here, I'll show how to use this service by doing something very simple. The idea is to implement some code that listens to an event (PUT) in a given S3 bucket, applies some processing to the file content, and sends it to an SQS queue.

This service restricts the language and platform the code is implemented in. A NodeJS module needs to be exported and called after being deployed into the Amazon infrastructure. So, if you are not familiar with Javascript and NodeJS, I would advise you to step back and look at some documentation first.
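The original listing is missing from this copy; a sketch of what the handler might have looked like, based on the description below (the queue URL is hypothetical, and the event fields follow the standard S3 event format):

```javascript
// Modules needed by the implementation. The aws-sdk is available by
// default in the Lambda runtime; the others must be packed into the zip.
var aws = require('aws-sdk');
var async = require('async');

var s3 = new aws.S3();
var sqs = new aws.SQS();

exports.handler = function (event, context) {
    // The event only carries metadata; extract bucket and key
    // so we can fetch the uploaded object itself.
    var bucket = event.Records[0].s3.bucket.name;
    var key = event.Records[0].s3.object.key;

    // async.waterfall keeps the chain of dependent callbacks readable
    async.waterfall([
        function download(next) {
            s3.getObject({ Bucket: bucket, Key: key }, next);
        },
        function send(data, next) {
            sqs.sendMessage({
                QueueUrl: 'https://sqs.us-east-1.amazonaws.com/123456789012/my-queue',
                MessageBody: data.Body.toString('utf-8')
            }, next);
        }
    ], function (err, result) {
        if (err) {
            context.fail(err);
        } else {
            console.log('Message sent successfully: ' + result.MessageId);
            context.succeed(result.MessageId);
        }
    });
};
```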

At the top, we import the modules needed by the implementation. All these modules need to be packed when deploying the application, except the aws-sdk, which is available by default at runtime.

Next, we get information from the event. When listening to an event from an S3 bucket, what you receive is the event metadata. So, if you want to do something with the object that was uploaded, you need to extract the event metadata, fetch the uploaded object, and then do something with its content.

From that point on, the code is a series of callbacks that depend on each other's results. To avoid the callback hell scenario, I used an external lib that makes these function dependencies a bit clearer to read.

In order to be sure everything is ok before deploying it, go to the console and run the "npm install" command. It will check all the code dependencies and put them into a specific directory.

Now it's time to set it up on the Amazon infrastructure. The AWS Lambda service lets you upload the code with its dependencies inside a zip file. When using this option, be careful when creating the zip: the Javascript file that contains the code shown before needs to be at the root ("/") of the zip file, otherwise it won't work. Worse than that, when running the code, the console is going to show an error message that does not point in this direction.

Once you have your code properly packed, go to the AWS console, access the "Lambda" option and ask to create a new function. The screen presented should look like this:

There, I'm putting basic information about what I'm uploading. The most relevant pieces are the Handler (Javascript file + the module name to be called) and the Memory and Timeout values (Amazon will use this information for billing). There is also the execution role: if you don't have one yet, create it using the options available in the combo box. Once you manage to finish this step, the module is ready to be called. The last step is to go to the bucket I'm interested in monitoring and make it trigger this function every time a new event happens, by changing the bucket properties.

An additional and reasonable step would be to test the deployed function in order to be sure everything is ok. To do that, go to the console where all the functions are listed, select the function you want to test, press "Actions" and select "Edit/Test". A new screen will be presented. On the left side, there is a section called "Sample Events" that simulates some real use cases. To test this function, pick the "S3 Put" option and adapt the event, setting valid bucket and file names.

If everything went fine, you should be able to see a message like this in the Execution result area:

"Message sent successfully: 9431465....."

Some additional billing information should be displayed as well, and that is it. From this point you are sure that the function is properly deployed and ready to be used.
The working example can be found here.