"innovative technology I think people should read about"

Category Archives: cloud

I’ve had the opportunity to work with a few different infrastructure automation tools such as Puppet, Chef, Heat and CloudFormation but Ansible just has a simplicity to it that I like, although I admit I do have a strong preference for Puppet because i’ve used it extensively and have had good success with it.

In one of my previous project I was creating a repeatable solution to create a Docker Swarm cluster (before SwarmKit) with Consul and Flocker. I wanted this to be completely scripted to I climbed on the shoulders of AWS, Ansible and Docker Machine.

The script would do 4 things.

Initialize a security group in an existing VPC and create rules for the given setup.

Create machines using Docker-Machine of Consul and Swarm

Use AWS CLI to output the machines and pipe them to a python script that processes the JSON output and creates an Ansible inventory.

Use the inventory to call Ansible to run something.

This flow can actually be used fairly reliable not only for what I used it for but to automate a lot of things, even expand an existing deployment.

Next, the important piece is the pipe ( `|`). This signifies we pass the output from the command on the left of the | to the command on the right which is create_flocker_inventory.py so that the output is used as input to the script.

So what does the python script do with the output? Below is the script that I used to process the JSON output. It first setups up an _AGENT_YML variable that contains YAML for a configuration then the main() function takes the JSON from json.loads() in the script initialization and creates an array of dictionaries that represent instances and opens a file and writes each instance to the Ansible inventory file called “ansible_inventory”. After that the “agent.yml” is written to a file along with some secrets from the environment.

After this processes the JSON from the AWS CLI, all that remains is to run Ansible with our newly created Ansible inventory. In this case, we pass the inventory and configuration along with the ansible playbook we want for our installation.

Conclusion

Overall this flow can be used along with other Cloud CLI tools such as Azure, GCE etc that can output instance state that you can pipe to a script for more processing. It may not be the most effective way but if you want to get a semi complex environment up and running in a repeatable fashion for development needs it has worked pretty well to follow the “pre-setup_get-output_prcocess-output_install_config” flow outlined above.

What is FIO?

fio is a tool that will spawn a number of threads or processes doing a particular type of I/O action as specified by the user. The typical use of fio is to write a job file matching the I/O load one wants to simulate. – (https://linux.die.net/man/1/fio)

fio can be a great tool for helping to measure workload I/O of a specific application workload on a particular device or file. Fio proves to be a detailed benchmarking tool used for workloads today with many options. I personally came across the tool while working at EMC when needing to benchmark Disk I/O of application running in different Linux container runtimes. This leads me to my next topic.

Why Docker based fio-tools

One of the projects I was working on was using Docker on AWS and various private cloud deployments and we wanted to see how workloads performed on these different cloud environments inside Docker container with various CPU, Memory, Disk I/O limits with various block, flash, or DAS based storage devices.

One way to wanted to do this was to containerize fio and allow users to pass the workload configuration and disk to the container that was doing the testing.

The first part of this was to containerize fio with the option to pass in JOB files by pathname or by a URL such as a raw Github Gist.

The Dockerfile (below) is based on Ubuntu 14 which admittedly can be smaller but we can easily install fio and pass a CMD script called run.sh.

What does run.sh do? This script does a few things, is checked that you are passing a JOBFILE name (fio job) which without REMOTEFILES will expect it to exist in `/tmp/fio-data` it also cleans up the fio-data directory by copying the contents which may be jobs files out and then back in while removing any old graphs or output. If the user passes in REMOTEFILES it will be downloaded from the internet with wget before being used.

There are two other Dockerfiles that are aimed at doing two other operations. 1. Producing graphs of the output data with fio2gnuplot and serving the graphs and output from a python SimpleHTTPServer on port 8000.

All Dockerfiles and examples can be found here (https://github.com/wallnerryan/fio-tools) and it also includes an All-In-One image that will run the job, generate the graphs and serve them all in one which is called fiotools-aio.

Other Examples

Important

Your fio job file should reference a mount or disk that you would like to run the job file against. In the job fil it will look something like: directory=/my/mounted/volume to test against docker volumes

If you want to run more than one all-in-one job, just use -v /tmp/fio-data instead of -v /tmp/fio-data:/tmp/fio-data This is only needed when you run the individual tool images separately

Notes

The fio-tools container will clean up the /tmp/fio-data volume by default when you re-run it.

If you want to save any data, copy this data out or save the files locally.

How to get graphs

When you serve on port 8000, you will have a list of all logs created and plots created, click on the .png files to see graph (see below for example screen)

Testing and building with codefresh

As a side note, I recently added this repository to build on Codefresh. Right now, it builds the fiotools-aio Dockerfile which I find most useful and moves on but it was an easy experience that I wanted to add to the end of this post.

Navigate to https://g.codefresh.io/repositories? or create a free account by logging into codefresh with your Github account. By logging in with Github it will have access to your repositories you gave access to and this is where the fio-tools images are.

I added the repository as a build and configured it like so.

This will automatically build my Dockerfile and run any integration tests and unit tests I may have configured in codefresh, thought right now I have none but will soon add some simple job to run against a file as an integration test with a codefresh composition.

Conclusion

I found over my time using both native linux tools and docker-based or containerized tools that there is need for both sometimes and in fact when testing container-native application workloads sometimes it is best to get metrics or benchmarks from the point of view of the application which is why we chose to run fio as a microservice itself.

I’ve been researching and working in the area of modern microservices for the past ~3 to 4 years and have always seen a strong relationship between Modern Microservices with tools and cultures like Docker and DevOps back to Service-Oriented Architecture (SOA) and design. I traced SOA roots back to Gartner Research in 1996 [2] or at least this is what I could find, feel free to correct me here if I haven’t pegged this. More importantly for this post I will briefly explore SOA concepts and design and how they relate to Modern Microservices.

Microservice Architectures (MSA) (credit to meetups and conversations with folks at meetups), are typically RESTful and based on HTTP/JSON. MSA is an architectural style not a “thing” to conform exactly to. In other words, I view it as more of a guideline. MSAs are derived from multiple code bases and each microservice (MS) has or can have its own language it’s written in. Because of this, MSAs typically have better readability and simpler deployments for each MS deployed which in turn leads to better release cycles as long as the organization surrounding the MS teams is put together effectively (more on that later). An MSA doesn’t NEED to be a polyglot of but will often naturally become one because teams may be more familiar with one language over the other which helps delivery time especially if the interfaces between microservices are defined correctly, it truly doesn’t matter most of the time. It also enables scale at a finer level instead of worrying about the whole monolith which is more agile. Scaling a 100 lines of Golang that does one thing well can be achieved much easier when you dont have to worry about other parts of your application that dont need or you dont want to scale in the monolith. In most modern MSAs, the REST interfaces mentioned earlier can be considered the “contract” between microservices in an MSA. These contracts should self describing as they can be, meaning using formats like JSON which is human readable and well-organized.

Overall an MSA doesn’t just have technical benefits but could also mean fewer reviews and approvals because of smaller context boundaries for each microservice team. Better acquisition and on-boarding because you dont have to be so strict about language preference, instead of retooling, you can ingest using polyglot.

Motivations for SOA, from what I have learned, is typically business transformation oriented which shouldn’t be surprising. The enterprise based SOA transformation on a large budget but the motivation is different now with modern MSA, now its quick ROI and better technology to help scale using DevOps practices and platforms.

Some things to consider while designing your modern MSA that I’ve heard and stuck with me:

You can do microservices with or without service discovery / catalog. Does it over complicate things?

The referenced text[1] that I use for a comparison or similar concepts and differences in this post talk about a vast number of important topics related to Service-Oriented Architecture. Such topics include the overall challenges of SOA, service reuse, deployment efficiency, integration of application and data, agility, flexibility, alignment, reference architectures, common semantics, semantic pitfalls, legacy application integration, governance, security, service discovery, inventory and registration, best practices and more. This post does not go into depth of each individual part but instead this post aims at looking at some of the similarities and differences of SOA and modern microservices.

Service-Oriented Architecture:

Some of concepts of SOA that I’d like to mention (not fully encompassing):

Technologies widely used were SOAP, XML, WSDL, XSD and lots of Java

SOAs typically had a Service Bus or ESB (Enterprise Service Bus) a complex middleware aimed at providing access and masking of interfaces.

Identification and Inventory

Value chain and business model is more about changing the entire business process

Modern Microservces:

Technologies widely used are JSON, REST/HTTP and Polyglot services.

Communication is done over HTTP and the interfaces are abstracted using RESTful contracts.

Service Discovery

Value chain and business model is about efficiencies, small teams and DevOps practices while eliminating cilos.

The Bulkhead Analogy

I want to spend a little bit of time on one of the analogies that stuck with me about modern microservices. This was the Bulkhead analogy which I cannot for the life of me remember where I heard it or seem to google a successful author so credit to who or whom ever you are.

The bulkhead analogy is pretty simple actually but has a powerful statement for microservice design. The analogy is such that a MSA, like a large ship is made up of many containers (or in the ships case, bulkheads) that have boundaries between them and hold different component of the ships such as engines, cargo, pumps etc. In MSA, these containers hold different functions or processes that do something wether its handle auth requests, connection to a DB, service a lookup or transformation mechanism it doesn’t matter, just that in both cases you want all containers to be un-damaged for everything to be running the best it can.

The bulkhead analogy goes further to say that if a container gets damaged and takes on water then the entire ship should not sink due to one or few failures. In MSA this can be applied by saying that a few broken microservices should not be designed in a way where there failure would take down your entire application or business process. It essence designs the bulkheads or containers to take damage and remain afloat or “running”.

Again, this analogy is quite simple, but when designing your MSA it’s important to think about these details and is why doing things like proper RESTful design and Chaos testing is worth your time in the long run.

Similarities and Differences or the two architectures / architecture styles:

Given the little glimpse of information I’ve provided above about service oriented architectures and microservices architectures I want to spend a little time talking about the obvious similarities and differences.

Similarities

Both SOA and MSA do the following:

Code or service reuse

Loose coupling of services

Extensibility of the system as a whole

Well-defined, self-contained services or functions that overall help the business process or system

Services Registries/Catalogs to discover services

Differences

Some of the differences that stick out to me are:

Focus on business process, instead of the focus of many services making one important business process MSA focuses on allowing one thing (containerized process) to do one thing and do it well. This allows tighter context boundaries for microservices.

SOA tailors towards SOAP, XML, WSDL while MSA favors JSON, REST and Polyglot. This is one of the major differences to me, even though its just a tech difference this RESTful polyglot paradigm enables MSAs to thrive with todays developers.

The value chain and business model is more DevOps centric allowing the focus to be on loosely coupled teams that break down cilos and can focus on faster release cycles and CI/CD of their services rather than with SOA teams typically still had one monolithic view of the ESB and services without the DevOps focus.

Conclusion

Overall this post was mainly a complete high-level overview of what I think are some of the concepts and major differences between traditional SOA and Modern Microservices that stemmed from a course I took during my masters that explored SOA while I was in the industry working on Microservices. The main point I would say I have is that SOA and MSA are very similar but MSA being SOA’s offspring in a way using modern tooling and architecture approaches to todays scaleable data center.

Note* by no means did I cover SOA or MSA to do them any real justice, so I suggest looking into some of the topics talked about here or reading through some of the references below if your interested.

It has been a crazy past few months leading up to DockerCon in San Francisco and I wanted to share some thoughts about current events and some of the work we have been participating in and around the Docker community and ecosystem now that we’re post-conference this week.

I have been working in open-source communities for more than five years now across technology domains including software defined networking, infrastructure & platform as a service, and container technologies. Working on projects in the Openflow/SDN, Openstack, and container communities has had its ups and downs but Docker is arguably the hottest tech communities out there right now.

There are so many developments going on in this ecosystem around pluggable architectures, logging, monitoring, migrations, networking and enabling stateful services in containers. Before I talk more in depth about persistence and some of the work my team and partners have been up to I want to highlight some of the major takeaways from the conference and the community right now.

The theme at @DockerCon was “docker in production” and by this I mean docker is ready to be used in production. Depending on who you ask and how docker is being used, production and containers with microservices can either be “hell” as Bryan Cantrill put it (If you haven’t seen Bryan’s Talks on the unix philosophy and production debugging, I highly recommend any of his sessions, especially the ones from the recent Orielly conference and this past weeks DockerCon) or it can really help your applications break down into their bounded domains with highly manageable and efficient teams going through the CI/CD build/ship/deploy process efficiently well. Netflix OSS [http://netflix.github.io/] is always a great example of this done well and many talks by Adrian Cockcroft dig into this sufficiently. You can also see my last post [https://aucouranton.com/2015/04/10/what-would2-microservices-do/] about microservices which will help with some context here.

So, Is Docker ready for production? Here are a list of what you need to make sure you have a handle on before productions use, and I might add that these are also big topics at DockerCon technically and there are many projects and problems still being actively solved in these spaces, I’ll try to list a few as I go through.

• Networking

Docker’s “aquirehire-sition” of the Socketplane startup culminated in the ability to use lib network [https://github.com/docker/libnetwork] which mean you can connect your docker daemons from hosts to host to allow container to easily send IP traffic over layer 2 networks. Libnetwork is maintains outside of the main docker daemon and abstracts the networking nicely and most importantly abstracts too much network knowledge away from the end user and make things “just work”

• Security

Docker is hardening the registry and images as well as the docker engine itself. I got to chat with Eric Windisch from Docker about what he had been focusing on with docker security. The docker engine’s security and hardening is at the forefront of focus because any vulnerabilities there means the rest of your container could be compromised. There is a lot of work going on around basic source hardening as well as other techniques using apparmour and selinux. Looking forward to seeing how to security aspect of Docker unfolds with other projects like Lightwave from VMware.

• Logging & Monitoring & Manageability

Containers are great, but once your running thousands to tens of thousands of them in products the need for great tooling to help debug, audit, troubleshoot and manage is a necessity. There seems to be lots of great tooling coming out to help manage containers. First docker talked about “Project Orca”, an opinionated container stack that aims at combining Docker Engine, Docker Swarm, Networking, GUI, Docker Compose, security, plus tools for installation, deployment, configuration. This of course isn’t always the way docker will be run but would be nice to have a way to get all this up and running quickly with manageability. Other tools like loggly, cadvisor, ruxit, datadog, log entries and others are all competing to be the best options here and quite frankly thats great!

• Pluggability

Docker has given the power to its ecosystem by telling it wants a pluggable, extensible toolset that allows for different plugins to work with their network, auth, and storage stacks. This provides a way similar to openstack for customers and users to say plugin an openvswitch network driver, along with lightwave for auth and EMC ScaleIO for persistent storage. Pretty cool stuff considering docker is only just over 2 years old!

• Stateful / Persistant services

This last bullet here is near and dear to my employer EMC and we have done some really awesome work by partnering up with ClusterHQ (The Data Container Poeple) [https://clusterhq.com/] who’s open-source Flocker project can manage volumes for your containers and enables mobility and HA for those volumes when you want to go ahead and start moving around or recovering your containerized applications, really cools stuff.

We had the opportunity to host a meetup at Pivotal Labs in San Francisco to showcase the partnership and drivers, at the Pivotal labs office we had a number of people come out for a few live demos, some beer and great food and conversation. Here is a gist for the ScaleIO demo we ran at the meetup showcasing Flocker + ScaleIO running on Amazon AWS deploying a MEAN stack application that ingested twitter data and placed it into MongoDB using node and express.https://gist.github.com/wallnerryan/7ccc5455840b76c07a70

At DockerCon Clint Kitson and I along with some ECS folks had a packed house for our partner tutorial session showcasing the ClusterHQ, RexRay and ECS announcements at DockerCon. The room was pas standing room only and attendees started to fill the floor. We hoped to have a bit more time to let the folks with laptops actually get to hack on some of the work we did but unfortunately pressed into 40 minutes we did what we could!

EMC also integrates through a native Go implementation called RexRay, a way to manage your persistence volumes but without the auto-mobility Flocker gives. RexRay is really cool in the way its working on letting you use multiple backends at the same time, say EC2 EBS Volumes as well as EMC ScaleIO.

In all, persistence and containers is a here to stay and there are a number of reason why and some items to keep and eye on. First the stateless and 12-factor app was the rage, but its not realistic and people are realizing state exists and running stateful containers like databases is actually important to the microservices world. This is because all containers have state, even if its “stateless” there is in memory state like running programs and open sockets that may need to be dealt with in certain use cases like live migration. If you haven’t seen the live movement of realtime Quake playing and moving between cities across oceans please watch, its great! (I’ll post the video once I find it)

Data is becoming a first class citizen in these containerized environments. As more workloads gets mapped to container architectures we see the need for the import and of data consistency, integrity and availability come into play for services that produce state and need it to persist. Enter enterprise storage into the mix. Over the week and weekend we saw a number of companies and announcements around this including some of our own at EMC. A few offerings that caught my eye are:

Private/Public IaaS, and PaaS environments are some of the fastest moving technology domains of their kind right now. I give credit to the speed of change and adoption to the communities that surround them, open-source or within the enterprise. As someone who is in the enterprise but contributes to open-source, it is surprising to many to find out that within the enterprise there is a whole separate community around these technologies that is thriving. That being said, I have been working with os-level virtualization technologies now for the past (almost) 3 years and in the past 2 years most people are familiar with the “Docker-boom” and now with the resurgence or SoA/Microservices I think its worth while exploring what modern technologies are involved, what drivers for change and how it affects applications and your business alike.

Containers and Microservices are compelling technologies and architectures, however, exposing the benefits and understanding where they come from is a harder subject to catch onto. Deciding to create a microservice architecture for your business application or understanding which contexts are bound to which functionality can seems like the complexity isn’t worth it in the long wrong. So here I explore some of the knowledge of microservices that is already out there in understanding when and why to turn to microservices and figuring out why, as in many cases, there really isn’t a need to.

In this post I will hopefully talk about some of major drivers and topics of microservices and how I see them in relationship to data-center technologies and applications. These are my own words and solely my opinion, however I hope this post can help those to understand this space a little better. I will talk briefly about, Conway’s Law, what it means to Break Down the Silos, why it is important to Continuous delivery, the importance of the unix philosophy, how to define a microservice, their relationship to SOA, the complexity involved in the architecture, what changes in the organization must happen, what companies and products are involved int this space, how to write a microservice, the importance of APIs and service discovery, and layers of persistence.

Microservices

The best definition in my opinion is “A Microservice fits in your head”. There are other definitions involving an amount of pizza, or a specific amount for lines of code, but I don’t like putting these boundaries on what a microservices is. In the simplest, a microservice is something that is small enough to conceptually fit in your head without really having to think to much about it. You can argue about how much someone can fit in their head etc, but then thats just rubbish and un-important to me.

I like to bring up the unix principle here as ice heard from folks at Joyent and other inn he field, this is the design that programs are designed to do one thing and do that one thing well. Like “ls” or “cat” for instance, typically if you design a microservice this way, you can limit its internal failure domain because it does one thing and exposes and API to do so. Now, microservices is a loaded term, and just like SoA there are similarities in these two architectures. But they are just that, architectures and I will add that you can find similarities in many of their parts but the some of the main differences is that SoA used XML, SOAP, typically a Single Message Bus for communication and a shared data source for services. Microservices uses more modern lightweight protocols like RESTful APIs, JSON, HTTP, RPC and typically a single microservice is attached to its own data source, whether is a copy, shard or a its own distributed database. This helps with multi-tenancy, flexibility and context boundaries that help scale such an architecture like microservices. One of the first things people start to realize when deep divining into microservices is the amount of complexity that comes out of slicing up the monolith because inherently you need to orchestrate, monitor, audit, and log many more processes, containers, services etc than you did with a typical monolithic application. The fact that these architectures are much more “elastic and ephemeral” than others forces technical changes that center around the smallest unit of business logic that helps deliver business value when combine with other services to deliver the end goal. This way each smaller unit can have its own change lifecycle, scale independently and be developed free of other dependencies within the typical monolith.

This drives the necessity to adopt a DevOps culture and change organizationally as each service should be developed by independent, smaller teams that can each release code within their own cycles. Teams still need to adhere to the invisible contracts that are between the services, these contracts are the APIs themselves between the services which talk to one another. I could spend an entire post on this topic but there is a great book called “Migrating to Cloud-Native Application Architectures” by Matt Stine of Pivotal (Which is free, download here) that talks about organizational changes, api-based collaboration, microservices and more. There is also a great post by Martin Fowler (here) that talks about microservices and the way Conways law affects the organization.

Importance of APIs

I want to briefly talk about the importance of orchestration, choreography and the important of the APIs that exist within a microservices architecture. A small note on choreography, this is another terms that may be new but its related to orchestration. Choreography is orchestration turned on its head, instead of an orchestration unit signaling when things happen, the intelligence is pushed to the endpoints and those endpoints react to events of changing environments, therefore each service known its own job. A great comparison of this is (here) in the book “Building Microservices” by Sam Newman. Rest APIs are at the heart of this communication, if an event is received from a customer of user, a choreography chain is then initialized and each endpoint talks to each other via these APIs, therefore, these APIs must remain robust, backward compatible and act as contracts between how services interact. A great post of the Netflix microservices work (here) explains this in a little more detail.

If the last few paragraphs and resources make some sense, you end up with a combination of loosely coupled services, strict boundaries, APIs (contracts), robust choreography and vital health and monitoring for all services deployed. These services can be scaled, monitored and moved independently without risk and react well to failures. Some of the exa plea of tools to hel you do this can be found at http://netflix.github.io. This all sounds great, but without taking the approach of “design for the integrations not the infrastructure/platform” (which I’ve heard a lot but can’t quite figure out who the quote belongs to, coudos to who you are :] ) this can fail pretty easily. There is a lot of detail I didnt cover in the above and I suggest looking into the sources I listed for a start on getting into the details of each part. For now I am going to turn to a few topics within microservices, Service Discovery & Registration and Data Persistence layers in the stack.

Service Discovery and Registration

Distributed systems at scale using microservices need a way to registry and discover what services and endpoints are available, enter Service Discovery tools like Consul, Etcd, Zookeepr, Eureka, and Doozerd. (others not listed) These tools make it easy for services to call this layer and find out a way to consume what else is available. Typically this helps one service find out how to connect and use another service. There are three main processes IMO for applications to use this layer:

Registration

when a service gets installed or “comes up” it needs to initially be registered with the discovery layer. An example of this is Registrator (https://github.com/gliderlabs/registrator) which reacts to docker containers starting and sends key/value pairs of data to a tool like Consul or Etcd to keep current discovery data about the service. Such information could be IP Endpoint, Port, API URL/Path, Resources, etc that can be used by the service.

Discovery

Discovery is the other end of Registration, when a service wants to use a (for instance) “proxy”, how will it know where the proxy lives or how to access it? Typically in applications this information is in a configuration file or hard coded into the app, whith service discovery all the app needs to do is know how to implement the information owned by the registration mechanism. For instance, an app can start and immediately say “Where is ‘Proxy'” and the discovery mechanism can respond by saying “Here is the Proxy thats closest to you” or “Here is the first Proxy available” along with the IP and Port of that proxy, the app can then just use those values, typically given in JSON or XML and use them inside the application thus not ever hardcoding any configuration anywhere.

Consume

Last but smallest is when the applications received the response back from the discovery mechanism it must know how to process and the the data. E.g. if your asking for a proxy or asking for a database the information given back would be different for the proxy versus how you would actually access the database.

Persistence and Backing Services

Today most applications are stateless applications, which means that they do not own any persistence themselves. You can think of a stateless application as a web-server, this web-server processes requests and talks to a database, but the database is another microsevice and this is where all the state lives. We can scale the web-server as much as we want and even actively load balance those endpoints without ever worrying about any state. However, that database I mentioned in the above example is something we should worry about, because we want our data to be available, and protected at all times. Though, you can’t (today) spin up your entire application stack (e.g. MongoDB, Express.js, Angular.js and Node.js) all in different services (containers) and not worry about how your data is stored, if you do this today you need persistent volumes that can be flexible enough to move with your apps container, which is hard to do today, the data container is just not as flexible as we need it be in todays architectures like Mesos and Cloud Foundry. Today persistence is added via Backing Services (http://12factor.net/backing-services) which are persistence / data layers that exist outside of the normal application lifecycle. This mean that in order to use a database one must first create the backing service then bind it to the application. Cloud Foundry does this today via “cf create-service and cf bind-service APPLICATION SERVICE_INSTANCE” where SERVICE_INSTANCE is the backing store, you can see more about that here. I won’t dig into this anymore other to say that this is problem that needs to be solved, and making your data services as flexible as the rest of your microservices architecture is not easy feat. The below link is a great article by Luke Marsden of ClusterHQ that talks about this very issue. http://www.infoq.com/articles/microservices-revolution

I also wanted to mention an interesting note on persistence in the way Netflix deploys Cassandra. All the data that Netflix uses is deployed on Amazon on EC2 instance and they use ephemeral storage! Which means when the node dies all their data is gone. But alas! they don’t worry about this type of issue anymore because Cassandra’s distributed, self healing architecture allows Netflix to move around their persistence layers and automatically scale them out when needed. I found out they do run incremental backups to S3 by briefly speaking with Adrian Cockcroft at offices hours at the Oreilly Software Architecture conference. I found this to be a pretty interesting point to how Netflix runs its operations for its data layers with Cassandra showing that these cloud-native, flexible, and de-coupled applications are actually working in production and remain reliable and resilient.

Major Players

Some of the major players in the field today include: (I may have missed some)

Pivotal

Apache Mesos

Joyent (Manta)

IBM Bluemix

Cloud Foundry

Amazon Elastic Container Service

Openshift by RedHat

OpenStack Magnum

CoreOS (with Kubernetes) see (Project Tectonic)

Docker (and Assorted Tools/Binaries)

Cononical’s LXD

Tutum

Giant Swarm

There are many other open source tools at work like Docker Swarm, Consul, Registrator, Powerstrip, Socketplane.io (Now owned by Docker Inc), Docker Compose, Fleet, Weave. Flocker and many more. This is just a token to how this field of technology is booming and were going to see many fast changes in the near future. Its clear that future importance of deploying a service and not caring about the “right layers” or infrastructure will be key. Enabling data flexibility without tight couplings to the service is part of an the overall application design or the data service. These architectures can be powerful for your applications and for your data itself. Ecosystems and communities alike are clearly coming together to help and try to solve problems for these architectures, I’m sure some things are coming so keep posted.

Microservices on your laptop

One way to get some experience with these tools is to run some examples on your laptop, checkout Lattice (https://github.com/cloudfoundry-incubator/lattice/) from Cloud Foundry which allows you to run some microservice-like containerized workloads. This post is more about the high-level thinking, and I hope to have some more technical posts about some of the technologies like Lattice, Swarm, Registrator and others in the future.

Continuos Integration

Recently I have been playing with a way to easily setup build automation for many small projects Continuos Delivery workflows are perfect for the smaller projects I have so settings up test / build automation is super useful and is even better when it can be spun up for a new project in a matter of minutes so I decides to work with Tutum to run docker-based Jenkins that connects into Github repositories.

In this post I explore setting up continuous integration using the Tutum.co platform using amazon AWS, a Jenkins Docker image, and a simple repository that have a C program that calculates primes numbers as an example of automating the build process when a new push happens to Github.

What I haven’t done for the post is explore using a jenkins slave as a docker engine, but hopefully in the future I can update some experiences I have had doing so which basically allows me to have custom build environments by publishing specialized docker images as what Jenkins creates and builds within. This can be helpful if you have many different projects and need specialized build environments for continuos integration.

Tutum, Docker and Jenkins

What I did first was setup a Tutum account, which was really simple. Just go to https://dashboard.tutum.co/accounts/login and sign in or create an account, I just used my github account and it got me going really quickly. Tutum has a notion of Stacks, Services and Nodes.

Node

A node is an agent for your service to run on. This can be a VM from Amazon, Digital Ocean, Microsoft Azure, or IBM Softlayer. You can also “bring your own” node by making a host publicly reachable and running the Tutum Agent on it.

Service

A service is a container, running some process(s)

Stack

A stack is a collection of Services that can be deployed together. You can use a tutum.yml file which looks and feels just like a Docker Compose yml file to deploy multiple services.

Deploy Jenkins

To deploy Jenkins you must first create a Node, then head to Services and click “create service” Jenkins will be our service.

We can search the Docker Hub for Jenkins images, I’ll choose aespinosa/jenkins because its based on ubuntu, and running the build slave directly on the same node this makes things easy since I am farmiliar.

Fill out some basic information about the Service, like published ports, volumes, environment variables, deployment strategy etc. When your finished, click “Create and Deploy”

Once this node is deployed, as you made sure you forwarded/published a port, we can see our Jenkins endpoint under “Endpoints”

To configure this inside of Jenkins, create a new build item and under the SCM portion click Git and add your repository URL as well as your credentials.

We also can add a trigger for build to happen on new commits.

Our build steps are fairly simple for this, just install the dependencies and run configure, make, and make install.

Now to test our new build out, make a change to a file and push to the master branch.

Your should see an active build start, you should see it in your Build History

You can dig on that build # and actually see the commit that it relates to, making it really nice to see which changes broke your build.

This should let us add the build status to our Github page like the below.

Jenkins also allows us to view build trends.

Conclusion

I wanted to take a bit of time to run through a Continuous Integration example using Jenkins, Tutum, and GitHub to show how you can quickly get up and running with these cloud native platforms and technologies. What I didn’t show it how to also add Docker Engines as Jenkins Slaves for custom environments, if you would like to see that let me know. I might get some time to update this with an example in the future as well.

What is continuous integration?

If your wondering, here is a great article by Martin Fowler that does a really great job explaining what CI is, the benefits of doing so, drivers for CI and how to get there. Continuous Integration

Over the past few months one of the areas worth exploring within the container ecosystem is how it works with external services and applications. I currently work in EMC CTO Advanced Development so naturally my interest level is more about data services, but because my background working with SDN controllers and architectures is still one of my highest interests I figured I would get to know what Powerstrip was by working with Socketplane’s Tech Release.

*Disclaimer:

This is not the official integration for powerstrip with sockeplane merged over the last week or so, I was working on this in a rat hole and it works a little differently than the one that Socketplane merged recently.

What is Powerstrip?

Powerstrip is a simple proxy for docker requests and responses to and from the docker client/daemon that allows you to plugin “adapters” that can ingest a docker request, perform an action, modification, service setup etc, and output a response that is then returned to Docker. There is a good explaination on ClusterHQ’s Github page for the project.

Powerstrip is really a prototype tool for Docker Plugins, and a more formal discussion , issues, and hopefully future implementation of Docker Plugins will come out of such efforts and streamline the development of new plugins and services for the container ecosystem.

Using a plugin or adapter architecture, one could imagine plugging storage services, networking services, metadata services, and much more. This is exactly what is happening, Weave, Flocker both had adapters, as well as Socketplane support recently.

Example Implementation in GOlang

I decided to explore using Golang, because at the time I did not see an implementation of the PowerStripProtocol in Go. What is the PowerStripProtocol?

The Powerstrip protocol is a JSON schema that Powerstrip understands so that it can hook in it’s adapters with Docker. There are a few basic objects within the schema that Powerstrip needs to understand and it varies slightly for PreHook and PostHook requests and responses.

Pre-Hook

The below scheme is what PowerStripProtocolVersion: 1 implements, and it needs to have the pre-hook Type as well as a ClientRequest.

Post-Hook

The below scheme is what PowerStripProtocolVersion: 1 implements, and it needs to have the post-hook Type as well as a ClientRequest and a Server Response. We add ServerResponse here because post hooks are already processed by Docker, therefore they already have a response.

Golang Implementation of the PowerStripProtocol

What this looks like in Golang is the below. (I’ll try and have this open-source soon, but it’s pretty basic :] ). Notice we implement the main PowerStripProtocol in a Go struct, but the JSON tag and options likes contain an omitempty for certain fields, particularly the ServerResponse. This is because we always get a ClientRequest in pre or post hooks but now a ServerResponse.

We can implement these Go structs to create Builders, which may be Generic, or serve a certain purpose like catching pre-hook Container/Create Calls from Docker and setting up socketplane networks, this you will see later. Below are generall function heads that return an Marshaled []byte Go Struct to gorest.ResponseBuilder.Write()

Putting it all together

Powerstrip suggests that adapters be created as Docker containers themselves, so the first step was to create a Dockerfile that built an environment that could run the Go adapter.

Dockerfile Snippets

First, we need a Go environment inside the container, this can be set up like the following. We also need a couple of packages so we include the “go get” lines for these.

Next we need to enable our scipt (ADD’ed earlier in the Dockerfile) to be runnable and use it as an ENTRYPOINT. This script takes commands like run, launch, version, etc

Our Go-based socketplane adapter is laid out like the below. (Mind the certs directory, this was something extra to get it working with a firewall).

“powerstrip/” owns the protocol code, actions are Create.go and Start.go (for pre-hook create and post-hook Start, these get the ClientRequests from:

POST /*/containers/create

And

POST /*/containers/*/start

“adapter/” is the main adapter that processes the top level request and figures out whether it is a prehook or posthook and what URL it matches, it uses a switch function on Type to do this, then sends it on its way to the correct Action within “action/”

“actions” contains the Start and Create actions that process the two pre hook and post hook calls mentioned above. The create hook does most of the work, and I’ll explain a little further down in the post.

Now we can run “docker buid -t powerstrip-socketplane .” in this directory to build the image. Then we use this image to start the adapter like below. Keep in mind the script is actually using the “unattended nopowerstrip” options for socketplane, since were using our own separate one here.

So what’s really going on under the covers?

In my implementation of the powerstrip adapater, the adapter does the following things

Adapter recognizes a Pre-Hook POST /containers/create call and forwards it to PreHookContainersCreate

PreHookContainersCreate checks the client request Body foe the ENV variable SOCKETPLANE_CIDR, if it doesn’t have it it returns like a normal docker request. If it does then it will probe socketplane to see if the network exists or not, if it doesn’t it creates it.

In either case, there will be a “network-only-container” created connected to the OVS VXLAN L2 domain, it will then modify the response body in the ModifiedClientRequest so that the NetworkMode gets changed to –net:container:<new-network-only-container>.

Then upon start the network is up and the container boots likes normal with the correct network namespace connected to the socketplane network.