{code}
https://blog.thecodeteam.com

A final thank you from the {code} Team
https://blog.thecodeteam.com/2018/02/22/final-thank-code-team/
Thu, 22 Feb 2018 19:38:15 +0000

From the bottom of our hearts, we want to extend our greatest THANK YOU to the open source community for the amazing support and interactions you’ve given us these past 3.5 years. It has been an honor and privilege to work with you all. The {code} Team has always been committed to the open source community, and we have had a wonderful time working with all of you. Our most successful open source contributions, REX-Ray and its associated projects, are moving to a neutral governance model, which we believe will be great for the continued collaboration and growth of the projects.

There were many success stories throughout the {code} Team’s open source journey:

With you, the {code} Community grew to become one of the largest open source-focused Slack teams with over 5,600 members, collaborating on a wide variety of successful open source projects and initiatives.

The DevHigh5 program was a success, resulting in 100+ open source projects from {code} Community members being shepherded through processes surrounding intellectual property, documentation, and community support.

The {code} Catalyst program brought together 38 amazing individuals from 26 organizations and 10 countries, whom we consider some of the best and most passionate open source community contributors across the globe.

Communications via {code} Webinars, {code} Catalyst Spotlights, Engineering Roundtables and updates, and speaking engagements at events all over the world helped us all to deliver the open source and community message.

Last but not least, all the fun we had at {code} Assemblies is just another example of the relationships we built, which will live on as part of the {code} legacy.

The {code} Team’s last official day is Friday, March 2nd.

Personally and from {code}, we truly cannot thank you enough for all of the joy, feedback and support you’ve given us. We couldn’t have dreamt of a better community of people to work with.

Context Switching Made Easy under Kubernetes powered Docker for Mac 18.02.0 – Collabnix
https://blog.thecodeteam.com/2018/02/16/context-switching-made-easy-under-kubernetes-powered-docker-for-mac-18-02-0-collabnix/
Fri, 16 Feb 2018 16:40:00 +0000

Say bye to kubectx! I have been a great fan of kubectx and kubectl, which have been a fast way to switch between clusters and namespaces, until I came across Docker for Mac 18.02. With the newer Docker for Mac 18.02 RC build, it is just a matter of a “toggle”. Life has become […]

Whose Job Is It Anyway? Kubernetes, CRI, & Container Runtimes
https://blog.thecodeteam.com/2018/02/15/whose-job-is-it-anyway-kubernetes-cri-container-runtimes/
Thu, 15 Feb 2018 17:36:09 +0000

A talk given at the Cloud Native London meetup, February 6, 2018, on the role of container runtimes in Kubernetes and the introduction of the Container Runtime Interface…

Role-based Access Control for Kubernetes with Docker EE
https://blog.thecodeteam.com/2018/02/14/role-based-access-control-for-kubernetes-with-docker-ee/
Wed, 14 Feb 2018 14:11:00 +0000

Last week we released the latest beta for Docker Enterprise Edition. Without a doubt, one of the most significant features in this release is providing a single management control plane for both Swarm and Kubernetes-based clusters – including clusters made up of both Swarm and Kubernetes workers. This offers customers unparalleled choice in how they manage both their traditional and cloud native applications. When we were looking at doing this release, we knew we couldn’t just slap a GUI on top of Kubernetes and call it good. We wanted to find areas where we could simplify and secure the deployment of applications onto Kubernetes nodes. One such area is role-based access control (RBAC). Docker EE 17.06 introduced an enhanced RBAC solution that provided flexible and granular access controls across multiple teams and users. While Kubernetes first introduced a basic RBAC solution with the Continu…

Namespace Context Switching Made Easy under Kubernetes powered Docker for Mac 18.02.0 – Collabnix
https://blog.thecodeteam.com/2018/02/08/namespace-context-switching-made-easy-under-kubernetes-powered-docker-for-mac-18-02-0-collabnix/
Thu, 08 Feb 2018 11:41:42 +0000

Say bye to kubectx! I have been a great fan of kubectx and kubectl, which have been a fast way to switch between clusters and namespaces, until I came across Docker for Mac 18.02. With the newer Docker for Mac 18.02 RC build, it is just a matter of a “toggle”. Life has become […]

Getting Started with a Service Mesh – a Linkerd Intro
https://blog.thecodeteam.com/2018/02/05/getting-started-service-mesh-linkerd-intro/
Mon, 05 Feb 2018 14:05:59 +0000

As prominent as microservices have become, there is a need to make sure network services are communicating properly and that changes can be made effectively. Techniques such as traffic analysis, blue/green testing, load balancing, circuit breaking, and more can be rolled into this new networking model called a “service mesh”. For an introduction to it all, read William Morgan’s article What’s a service mesh? And why do I need one? from Buoyant. Then, as everyday conversation becomes filled with words like Linkerd, Istio, and Envoy, you won’t be as puzzled.

Many companies run internal load balancers, like HAProxy or F5, that assume responsibility for routing traffic between microservices. This can become cumbersome because those tools weren’t designed to handle inter-app communication, especially in large-scale cloud native applications. Linkerd replaces this with something purpose-built for service-to-service traffic.

As of the publish date of this article, the Cloud Native Computing Foundation sponsors the Linkerd project at the “inception” stage, which means it has the potential to graduate to the incubation stage. Envoy is currently at the incubation stage, so it’s worth noting there are two routes (no pun intended) to get started with service meshes.

To get a jumpstart on the overview, architecture, and technical jargon of Linkerd, consider watching Alex Leong give a Linkerd 101 talk and taking a few minutes to peruse the documentation. As with learning any piece of software, keep in mind what the end state should look like when complete. My goal in learning Linkerd is to eventually run it in a full mesh mode within Kubernetes. The docs show how to do this very easily; however, they don’t necessarily explain what’s happening. So this blog starts on a much smaller scale, using a single host with Docker installed to simplify the network. This gives a basic understanding of packet flow.

A single host removes cumbersome networking that could involve things like CNI and port forwarding with Kubernetes. If you know your machine’s IP address, the subnet of the docker0 bridge, and how to expose a port with Docker, then the traffic path becomes very easy to understand.

This simple demonstration uses NGINX in its most basic form, with Linkerd acting as a proxy to intercept and forward requests. Wrapping everything in containers keeps things clean, and no binaries need to be installed on your machine. Deploy the NGINX container with the docker command line, publishing host port 8888 and forwarding it to container port 80:

docker run -d --name webtest -p 8888:80 nginx

Verify NGINX access through the configured port of 8888 on localhost:
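A quick check from the terminal (assuming curl is available) should return the NGINX welcome page:

curl -i http://localhost:8888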

The Docker examples miss some key configuration steps and don’t necessarily explain how the router works. If you found this article because Linkerd didn’t work as shown in the examples, it’s because the ip field is not specified in all the necessary places. Let’s take a look at the Linkerd configuration and what each component accomplishes.

The admin section is where the Admin UI is configured. The standard default is to specify port 9990 and the local IP of 0.0.0.0. The routers section is where the magic happens in this example. Linkerd can work with the http, http/2, thrift, and mux protocols. Since NGINX speaks standard HTTP/1.1, the http protocol is noted in the config. It’s possible to have multiple routers all using the http protocol, so a label is used to differentiate the routers and their landing pages within the Admin UI. The biggest learning curve is dtabs: the logical rules that determine how traffic is forwarded. In this example, we are taking the default /svc request and translating it to a specific IP address for the deployed NGINX service.

In some cases you could use the local address, but since this is being encapsulated with Docker, the machine’s IP needs to be used. This can be abstracted further using something like Consul or ZooKeeper, which could track multiple instances of NGINX for proper load balancing. Dtabs become increasingly advanced as the notions of prefix matching, namers with service discovery, and wildcards are introduced. The servers section is how the service is configured to accept incoming requests: this is the port advertised as the proxy, and requests arriving on it are routed through the dtab. For this demo, requests are accepted on port 8080 and sent to NGINX on port 8888. This file will be called webtest.yml.
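Putting those pieces together, a minimal sketch of what webtest.yml might contain is shown below. The host IP 192.168.1.100 is a placeholder; substitute your own machine’s address.

admin:
  ip: 0.0.0.0
  port: 9990

routers:
- protocol: http
  label: webtest
  # forward anything arriving under /svc to the NGINX port published on the host
  dtab: |
    /svc/* => /$/inet/192.168.1.100/8888;
  servers:
  - ip: 0.0.0.0
    port: 8080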

Now it’s time to deploy Linkerd! Using the configuration file, the ports for the Admin UI and the new NGINX proxy service need to be exposed with Docker. In addition, the container needs access to the webtest.yml file, which is passed to Linkerd at startup.
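One way to launch it, as a sketch that assumes the official buoyantio/linkerd image and the mount path shown here (adjust the tag and paths to your environment):

docker run -d --name linkerd \
  -p 9990:9990 -p 8080:8080 \
  -v $(pwd)/webtest.yml:/config.yml \
  buoyantio/linkerd:1.3.5 /config.yml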

Now open a tab to http://localhost:8080 and the NGINX welcome screen will appear. Put this tab alongside the Linkerd Admin UI and verify there is 1 total connection. Repeatedly hit the refresh button to see the requests spike in real time.

You just deployed your first Linkerd service that functioned as a proxy for NGINX! This simple example sets a baseline for understanding traffic flow, port allocation, and more. This is an integral step before diving into more advanced setups that include advanced routers, configuring namers and namerd, and full mesh within Kubernetes.

Before going too far, be aware of HTTPS and TLS. Linkerd has the ability to use TLS, but certificates and untrusted domains can make it very difficult to troubleshoot. After discovering the linkerd-tcp project, I found it showed promise for proxying traffic like MySQL communication and HTTPS (not just HTTP). I was able to successfully use a combination of linkerd, linkerd-tcp, and namerd to create another service mesh for proxying Dell EMC ScaleIO gateway traffic. Unfortunately, the linkerd-tcp project looked to be stalled and no longer getting attention: the Docker image was out of date, building from source was broken, and metrics could only be viewed by forwarding to Prometheus. Linkerd-tcp has recently been updated to improve parity, however. The next phase of Buoyant’s journey is a new service mesh called Conduit that is tailor-made for Kubernetes. Learn more about Conduit from The New Stack’s podcast at KubeCon 2017. Welcome to the fast-paced and grueling ecosystem of service meshes!

Open Source, Container, and Community Events for 2018
https://blog.thecodeteam.com/2018/01/25/open-source-container-community-events-2018/
Thu, 25 Jan 2018 18:52:01 +0000

The {code} team’s 2018 events calendar is coming together. This is where to find us so we can connect with you to talk community, open source, and tech!

We’ll be updating this list as we add events and specifics on our activities, so visit our events page periodically for the updated information, including registration discount codes, speaking sessions, and special events.

Join the {code} Team to see how the rise of Artificial Intelligence and the Internet of Things intersects with the open source community. Be a part of a short series of interactive lightning talks and compete in a live open source space-racing video game premiere.

In the Code & Modern Ops track at Dell Technologies World, learn new methodologies to better manage your cloud-native infrastructure, design patterns like microservices and the 12-Factor App, and organizational best practices like Agile and DevOps. Our code and modern ops track covers 30+ in-depth topics to meet your transformation needs.

The Cloud Native Computing Foundation’s flagship conference gathers adopters and technologists from leading open source and cloud native communities. Join Kubernetes, Prometheus, OpenTracing, Fluentd, Linkerd, gRPC, CoreDNS, containerd, rkt, CNI, and more, as the community gathers for three days to further the education and advancement of cloud native computing.

DockerCon is the community and container industry conference for makers and operators of next generation distributed apps built with containers. The three-day conference provides talks by practitioners, hands-on labs, an expo hall of Docker ecosystem innovators and great opportunities to share your experiences with other virtual container enthusiasts.

Open Source Summit Europe connects the open source ecosystem, delving into the newest technologies and latest trends touching open source, including networking, serverless, edge computing, AI and much more.

KubeCon + CloudNativeCon gathers adopters and technologists from leading open source and cloud native communities to further the education and advancement of cloud native computing.

Not attending any of the conferences? There are other ways to connect! If you live in a city where one of these conferences will be held, you can always reach out to us directly on Slack or Twitter to find opportunities to meet us outside of the event, such as a {code} Assembly or other get-togethers where we talk tech.

It’s going to be a great year and we are looking forward to seeing you soon!

The Open Source {code} Story
https://blog.thecodeteam.com/2018/01/17/open-source-code-story/
Wed, 17 Jan 2018 18:37:47 +0000

When looking at the open source community one thing really stands out: transparency. This incredibly important aspect makes it possible for everyone to see what’s happening within a project, and also helps others when looking for answers. We’ve learned a few things along the way through working with the community and us all sharing best practices, and as part of the 20th anniversary of open source we would like to share our story with you.

This blog post will cover the {code} Team, the {code} Community and what it’s made up of, and the DevHigh5 and {code} Catalyst programs.

Recognizing the big shift

No one working in cloud and data centers should be surprised that organizations have changed how they run their IT departments. Applications are written and deployed differently, moving away from monoliths to microservices. Organizations operate their data centers by applying development principles to operations through open source software and community collaboration. Open source software is used heavily in development, testing, and production. In a survey done in 2016, 90% of respondents said open source improves their efficiency, interoperability, and innovation, and 65% of companies were contributing to open source projects.

This type of “innovation-through-openness” has proven that global collaboration on code and inclusivity of diverse intellectual contributions advance the technological state of the art and solve problems faster.

Recognizing this shift, Dell Technologies (whose family of brands includes Dell EMC) knew that—in order to stay relevant in the data center and software infrastructure of the future—it needed to invest in its own open source initiative. When we reached out to users to understand why they were adopting open source software, we found it wasn’t necessarily about cost or a desire to contribute back to a project. The main reasons users wanted open source were that it provides them with freedom, innovation, flexibility, and integration:

Users want the freedom to run software anywhere, for any purpose

Users want the opportunity to innovate, develop and participate in open source projects

Users want the flexibility to choose the software and hardware that fits their needs

Users want to be able to integrate software with existing infrastructure

On August 29, 2014 {code} launched as a strategic initiative with support from executive management. Three main principles drive {code}’s approach to open source:

Open source efforts are developed in the best interests of the community

Projects are executed with complete transparency and openness

Open source technologies are made to be consumable by the widest range of users and organizations

The {code} Team contributes to and creates open source projects, acts in the interest of building a community, and drives awareness of emerging technology trends. It consists of three programs, each operating with these core tenets in mind: the {code} Community, the DevHigh5 program, and the {code} Catalyst program.

The {code} Community started in June 2015, and has grown to more than 4,800 members who have open dialogues across company boundaries on topics ranging from contributions to cloud native projects, persistent storage in containers, virtual reality, and hardware hacking. Members include developers, project managers, users, recruiters, and tinkerers.

Through the DevHigh5 program, {code} has created and shepherded more than 100 open source projects which solve community challenges. Through guidance, promotion, and community support, these projects are able to thrive and get the recognition they deserve.

The {code} Catalyst program brings together passionate open source aficionados from across the globe. The program is focused on promoting their work and establishing an ecosystem of creative individuals who improve and move open source forward.

This article explains how {code}’s community-oriented approach has helped Dell Technologies and Dell EMC achieve new innovations through its participation in community-focused efforts that focus on transparency, inclusivity, and collaboration.

Introducing the {code} Community

“We need a way to communicate with other developers who are interested in open source.”

That statement drives the {code} Community and its activities. In 2015, the {code} Team identified the need for a place where internal and external developers could communicate, collaborate on projects, and promote their work. With this in mind, the team crafted a plan to build a community of and for open source developers. When {code} looked at different methods of communicating across teams and company borders, the team noticed that there were several modern approaches available—something other than distribution lists and forums—eventually leading to the decision to adopt Slack as the community’s primary platform for communication and collaboration.

At the time, there were no indications that the {code} Community would ever grow as large as it has, encompass as many people and projects as it currently does, or have as big an impact on the wider organization as it does today.

On June 18, 2015, the doors to the {code} Community on Slack opened, and invitations were sent to internal employees who were already involved in or wanted to know more about open source. Shortly after that, {code} established a public community website to make sure people could join without needing a personal invitation. The {code} Community quickly grew to 30 members, then 50, then 100, and, within just nine months, reached 1,000 members. The most amazing aspect of this growth was that internal employees weren’t the only people participating; users, partners, and customers of open source projects from {code} all wanted to interact and collaborate. Even direct competitors are part of the {code} Community, which says a lot about the nature of the open source community itself.

Since the {code} Community is open to everyone, everyone needs to follow the rules of the community. Community members must all agree to adhere to the Code of Conduct before they are able to join, and guidelines for contributing to different parts of the {code} Community are communicated to every new member with an automated message as soon as they join. Based on these ground rules, the {code} Community members engage each other in collaboration at both strategic and engineering levels. The members continuously discuss new ideas and challenges around cloud native projects, persistent storage in containers, virtual reality, hardware hacking, drone racing, and much more. They help each other get inspired, suggest reading and learning material, and debug and fix issues, regardless of organizational affiliation.

By the time the {code} Community celebrated its second anniversary in June 2017, it had more than 3,600 members. It is still growing at an exponential rate, reaching more than 5,000 members in December 2017.

By having an open mindset and using modern communication and collaboration tools, the {code} Community has worked to institute best practices for how Dell Technologies integrates into the open source community. There are large and small open source projects run in the open by Dell Technologies’ employees and business units, shared between and collaborated on with thousands of community members. This direct feedback-loop enhances innovation, speeds up development and shows that Dell Technologies is focused and invested in the future of open source software, driving the future of IT.

The DevHigh5 program

“How can we make it easier for users, partners, and employees to open source their projects and promote them?”

That question drives the DevHigh5 program. After starting the {code} Team, it was quickly realized that many individuals within the organization shared the belief that software should be open source and shared with the world. Employees had been working on tools, scripts, and applications to augment existing products and solutions, and the {code} Team was delighted to see that this was not just a one-time occurrence but rather that ongoing projects lived and thrived in the open source community. The fact that there was a group of individuals who were interested in contributing and giving back to the open source community made the creation of the DevHigh5 program easier than anticipated.

The DevHigh5 program was launched in November 2014 to recognize and promote open source contributions from users, partners, and employees. This promotion is done through social media, prominent placement on the {code} Team’s project site, guest blog posts, newsletter, visibility at open source tradeshows, and featured conference sessions. DevHigh5 projects range from those developed by individuals to those developed by business units.

The DevHigh5 program helps projects go from unpublished to fully open sourced. The program gives guidance on how to structure the project code; helps with naming, documentation, licenses, and logos; and gives the project a place in the {code} Community to continue working on the project in the open.

Throughout the lifespan of the DevHigh5 program, many project owners have approached the {code} Team with questions about how to run projects in the open, build communities around their projects, and work as good open source citizens. They ask for guidance on how to best approach the open source community, how to share information without sharing confidential IP, and how to get feedback and contributions on projects by utilizing the {code} Community. The {code} Team has been very fortunate to see many of these interactions end up in successful open source projects such as REX-Ray and RackHD, with internal staff, external partners, and users working and collaborating side by side in the open to create and innovate.

By being inclusive and acting in the interest of building a community focused on promoting the work of others, the DevHigh5 program has shepherded and promoted more than 100 open source projects. This has helped to support an open culture between Dell Technologies and its users, partners, and employees—leading to more customer deployments, faster feedback loops, and greater innovation that enrich both the community and the business value it provides.

The {code} Catalyst program

“How can we help promote the work of great open source minds across the world, and create an ecosystem of those who lead and advance emerging technologies?”

That question guides the third and final component of {code}’s community strategy.

As the {code} Community and its projects continued to grow in popularity, there was a need to expand the community to involve open source leaders who are passionate about new technologies and sharing knowledge.

Launched in December 2016, the {code} Catalyst program brings together prominent members of the open source community across the globe. The members are passionate open source aficionados, bloggers, professional speakers, book authors, community leaders, and developers. The program is designed to promote the work and advocacy of the {code} Catalyst members, and establish an ecosystem of creative individuals who improve and advance the open source space.

With the focus of the {code} Catalyst program being on global collaboration and promotion, individuals who may seem like competitors based on their respective organization affiliation are now part of the same community, all pushing for the same goal: bringing the best out of the open source community.

As a way of giving back to the open source community, the {code} Catalyst program covers several ways of supporting and promoting each member. This includes promoting their work on social media, producing public video interviews, supporting them in the CFP process, co-presenting to a global audience at virtual and physical events, participating in engineering roundtables, providing early access to project information, attending exclusive {code} Assemblies that bring together open source leaders at events worldwide, interacting with the {code} Community, and networking with industry luminaries.

{code} Catalyst members are seen as open source leaders and provide advancements in many areas of the open source community. They are teaching others by sharing knowledge in the {code} Community, presenting at monthly {code} Webinars or at global events, and blogging and writing on interesting open source topics. They are also a part of larger conversations around current and possible future {code} related projects, giving valuable feedback that helps inform project roadmaps. The members are also asked to give feedback on how to improve the program to ensure that the {code} Catalyst program is constantly growing and changing to become a better and more engaging place for everyone involved.

Final thoughts and conclusion

By focusing on transparency, inclusivity, adaptability, collaboration, and the {code} Community, a space has been created within Dell Technologies for open source to thrive. Several factors have led to the success of the {code} Team and the {code} Community:

Executive support was critical for getting the open source initiative started and for its continued growth. It helped the {code} Team greatly when getting started, as we needed to help other internal teams fully understand open source and its consequences and benefits.

The fact that there were already many individuals within the organization who shared our open source mindset helped make the transition from closed-source-only to open source-friendly an easier (but still daunting) task. This was the basis of the {code} Community and also drove the DevHigh5 program from the start. The support from the DevHigh5 contributors has been extremely important for the team’s mission and the community.

The corporate support we received from legal for licensing and marketing for public relations ensured that projects were vetted and promoted properly. This led to having a simplified process that lowers the burden on the creators and on the {code} Team, while still ensuring accountability and responsibility when publishing open source code. This was crucial to the success of several open source projects.

Supporting the organization as it continues to shift toward becoming a large contributor has furthered Dell Technologies’ trajectory in this area. By being involved in open source projects, taking leadership roles, and embracing the community, we are now involved in many large open source projects that transform the way users all over the globe manage their IT. This provides high strategic value for the organization’s products and its relationships with its customers. This also strengthens our credibility with customers: Visible contribution to projects they are leveraging builds trust by demonstrating a commitment to a shared vision of future IT management.

By ensuring that the {code} Community, the DevHigh5 program, and the {code} Catalyst program are completely open to everyone, foster creativity, and value member contributions, {code} now has the ability to reach and collaborate with more people than ever before and be involved in new trends that are impacting the global IT market.

The Changing Face of Data Analytics – Fast Data displaces Big Data
https://blog.thecodeteam.com/2018/01/09/changing-face-data-analytics-fast-data-displaces-big-data/
Tue, 09 Jan 2018 22:07:57 +0000

Introduction

Big Data, Fast Data, Internet of Things, Machine Learning – what is the current landscape? There are many tools – Hadoop, Kafka, Spark – to make the management of data easier and faster, but what is the right approach?

I will contend that there is no one “right approach”. Solutions are changing fast and often. This doesn’t mean that you should do nothing and wait. You should understand the tools available, your needs, and then proceed with a “big picture” approach using a platform that offers support for all the popular tools, along with the collateral business logic you will need to interact with your analytics.

In this blog I explain the current environment, existing tools, the pros and cons, and tips on how you can navigate the changing face of data analytics.

Want to hear more, in person? Join me and the Mesosphere team for a Meetup January 11 7-9pm in Playa del Rey, CA. Register here.

Hadoop Origin Story

When Google published the “Google File System” and “MapReduce” papers, it inspired a developer community to apply the concepts in what became the Hadoop project, released in 2006.

Hadoop saw a quick uptake with some rising internet giants of the era (Yahoo, Twitter, Facebook, and LinkedIn). Later, supported distributions (Cloudera, Hortonworks) led traditional enterprises to embark on Hadoop based projects.

The path to present day has not been without issues. Some might even say that Hadoop has lived a reverse “superhero origin story”, starting out with much excitement and fanfare, only to see the world move away, becoming an orphan as distros bundle Spark for use with Hadoop’s Zookeeper and HDFS components.

“I can’t find a happy Hadoop customer. It’s sort of as simple as that,” says Bob Muglia. His role as CEO of Snowflake Computing may justify suspicion, but Bobby Johnson, who built Facebook’s Hadoop analytics stack, is also a critic: “there’s a bunch of things that people have been trying to do with it for a long time that it’s just not well suited for.” He goes on to criticize Hadoop’s complexity and performance. Source: “Hadoop Has Failed Us, Tech Experts Say”.

Some criticism may be based on unwarranted expectations, along with advances in technology. A lot has changed in the 11 years since Hadoop’s architecture was forged. Network bandwidth is higher, storage latency lower, and many use cases call for near immediate results.

Hadoop was built for batch processing of big datasets. It delivered advantages compared to traditional relational databases when dealing with unstructured data, but was subject to misapplication and disappointing results in use cases needing transaction processing and connections to legacy business logic built for relational databases. Batch processing is a poor match for interactive user facing applications.

The Changing Nature of Demand

The world is moving away from batch-oriented to stream-oriented data processing because people want notifications and answers faster. Batch will never go away, but you can run batch jobs through stream processing, while the converse is problematic. It is very difficult to serve mobile apps, IoT, interactive gaming, and many other application types responsively through batch processing.

“Most decisions should probably be made with somewhere around 70% of the information you wish you had. If you wait for 90%, in most cases, you’re probably being slow.”

Jeff Bezos

If you follow discussions about the Internet of Things, you’ve heard some stunning predictions of device counts and market size. Even if you are skeptical as to the exact numbers, it is fair to make these observations:

IoT is going to generate way more data than we have today. Even a tiny device can generate a lot of data. And we will have a lot of devices.

Tiny devices are not going to hold data for long. It will need to go somewhere, or be lost.

Some control loops and user interaction requirements will demand low latency or be so critical (public safety or economic loss issues) that processing will need to be done quickly, and near the data origination point, rather than in a centralized cloud.

While there are tasks that can be characterized as having a discrete lifecycle with a start and end, other activities are essentially continuous processing with never ending inputs and outputs.

If you intend to utilize IoT inputs for business purposes and don’t take advantage of machine learning in the workflow, you will miss opportunities, be too slow, and incur unneeded costs.

The Progression of Data Analytics to Become Faster

Hadoop is a distributed computing framework. It uses the MapReduce programming paradigm to execute jobs in parallel on a multi-node cluster. The Hadoop project spawned an ecosystem of projects. Some, like Zookeeper and the HDFS filesystem, continue to be repurposed and used in other projects. Others, such as MapReduce, have been effectively left behind by replacements with more attractive attributes.
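For readers who have never touched MapReduce, here is a rough sketch of the paradigm using Hadoop Streaming, which lets plain shell commands act as the mapper and reducer (the jar path and input/output directories are assumptions and vary by distribution): the mapper emits one word per line, Hadoop sorts by key during the shuffle, and the reducer counts consecutive duplicates, giving a parallel word count.

hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-*.jar \
  -input /data/text \
  -output /data/wordcount \
  -mapper "tr -s ' ' '\n'" \
  -reducer "uniq -c"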

Memory vs Disk

Hadoop I/O bandwidth: HDFS (based on disk) was OK in 2006. Its use of HDFS to stage intermediate results is a processing-speed “bottleneck” by modern standards.

Spark is designed to do faster processing on in-memory data. It can utilize Hadoop’s HDFS where persistence is required. It is a distributed computing library – not a complete framework – designed to work as a pipeline with additional “pluggable” components to ingest, process, and store data.

Batch vs Micro-batching

The move to memory vs. disk is not the only avenue of change in data analytics.

Hadoop is limited to batch processing of a job.

Spark can be used for batch processes, as well as near real time stream processing. Spark supports “micro-batching” where batch sizes are many times smaller. (Spark does not support streaming in the strictest sense but it can provide pseudo real time streaming when used with a very small batch size.)

Other open source projects such as Apache Storm do stream processing in a strict sense (processing handles a single event at a time), but can also support micro-batching. (For comparison, Spark might have a latency measured in seconds, while Storm might produce results in milliseconds.)

Dealing with Complexity

Bringing Home the “free puppy”

Many an enterprise analytics project got started when a developer went to a tech conference and had an “Aha!, what if?” experience.

Alas, there is rarely a single data input source

A prototype looks good and an initial proof of concept succeeds. But as the project moves toward production it is scaled up with more than one data source.

And there is rarely a single use for a data input source

And other useful applications for these same inputs are identified. Some uses might accommodate batch processing. Others might demand faster processing (micro-batching or true streaming).

Suddenly the solution becomes the problem

A combinatorial explosion becomes difficult to manage – from a development and operational perspective.

Enter the message exchange model

Apache Kafka stores messages which come from arbitrarily many processes called “producers”. The data can thereby be partitioned into different “partitions” within different “topics”. Within a partition, the messages are indexed and stored together with a timestamp. Other processes called “consumers” can query messages from partitions. Kafka runs on a cluster of one or more servers, and the partitions can be distributed across cluster nodes.

Apache Kafka efficiently processes real-time and streaming data when implemented along with Apache Storm, Apache HBase, and Apache Spark. Deployed as a cluster on multiple servers, Kafka handles its entire publish-and-subscribe messaging system with the help of four APIs (Producer, Consumer, Streams, and Connector).
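To make the producer/consumer model concrete, here is a minimal sketch using the console tools that ship with Kafka, assuming a single broker on localhost:9092 and ZooKeeper on localhost:2181 (paths and the topic name are illustrative):

# create a topic with three partitions
bin/kafka-topics.sh --create --zookeeper localhost:2181 \
  --replication-factor 1 --partitions 3 --topic sensor-events

# a producer: every line typed here becomes a message on the topic
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic sensor-events

# a consumer (in a second terminal): reads the topic from the beginning
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 \
  --topic sensor-events --from-beginning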

Kafka creator Jay Kreps was responsible for running a large Hadoop cluster at LinkedIn. Kreps and other engineers from the project left to form Confluent, a company focused on Kafka.

Kafka is designed to be a general purpose message broker that can handle millions of rapid-fire events. It features low latency with “at-least-once” delivery of messages to consumers. Kafka also supports queuing of data for offline consumers, supporting both real-time and offline consumption – which is useful for batch operations – and tolerance of temporary outages at a consumption layer. Performance is compatible with real time use cases.

Future

The progression from batch, to microbatch, to streaming, to messaging is not likely to stop there. This is suitable for many classes of application – but not all. And if there is any lesson to be observed in the history of data analytics, it is that human ingenuity will result in new and better solutions in the near future.

Kafka can generally provide “at least once” delivery, but a playful, mocking remark is that the two hardest problems in distributed systems are:

Exactly once delivery of messages

Preservation of order of messages

Exactly once delivery of messages

Certain patterns can be used to throw out duplicates, but a standardized solution would be attractive for some forms of applications.

Application Requirements variation

Pravega is an example of recent work that continues the pattern of new open source projects supplementing or replacing the old. It is a streaming storage solution that can deliver transactional behavior with exactly-once delivery and order preservation, along with long-term data retention.

How to Choose your Analytics Solution

If you have been following along, you might conclude that you should just jump in and deploy Hadoop, Spark, and Kafka. Alas, it’s not that simple.

Any realistic production application will add more components: business logic, a user interface, and an assortment of collateral stateful backing stores and microservices. These will plug into and out of your analytics pipeline.

And if there is a lesson to be learned from the 10+ years since the advent of Hadoop, it is that whatever is the “best” solution at this point in time will not be in a few years.

Under the Apache Foundation alone there are 38 projects in the “big data” category, perhaps 12 of which can be classified as built to handle streaming. We will define streaming as a never-ending sequence of records potentially originating from multiple sources.

Tradeoffs include low vs high latency, ability to horizontally scale to handle changing volume, machine learning support, filtering and transform plugin support, and support for legacy interfaces such as SQL.

This scenario is not unlike what happened in the 90’s during the early days of the internet. Someone circa 1995 might have asked “What language should I use to build a website?” This would have been the wrong question to ask.

In the 90’s, books and articles were published with tables comparing features of Java vs. .NET and so on. These did little more than list the general properties of each platform, leaving the impression that you might be equally successful with any of them. While technically true, the world quickly moved on to more highly evolved “suites” that offered publication of your website at a much higher level of abstraction – with perhaps the most important feature being flourishing user communities that supported conferences, support forums, and best-practice articles and presentations.

Something very much like this is taking place now in the form of the “SMACK Stack”.

Your solution should be based on Container Orchestration

Rather than give you a table comparing features of 12+ streaming analytics tools, (likely obsolete within weeks anyway), my advice is to seek a platform which can host all the popular tools, and then choose whatever shows the most traction in the form of user experience stories in media and at tech conferences. Be prepared to see your choice leapfrogged a few years later. Expect that you will use multiple, different streaming platforms simultaneously, either because you are in transition, or have applications with varying requirements. Accept that this is a good thing. This is exactly why you need a platform that is versatile and flexible enough to host whatever comes along.

Rather than pick a winner between VHS and BetaMax, understand that a replacement by DVD, Blu-ray, etc. will come along. Base your technology choice on flexibility to change.

You also want a platform that supports whatever collateral applications and services you are likely to use. Things like TensorFlow, SQL and NoSQL datastores, and scalable user interface platforms.

The Internet of Things may lead you to require data ingestion and analytics compute capacity at edge locations, along with applications in public clouds. Bandwidth constraints, response latency needs, and resiliency can demand local processing and data reduction. Having a solution that makes applications run portably from edge to public cloud matters.

You need to retain freedom to choose what you run and where you run it. Even if you identify and deploy the best solution available today, something better will eventually come around. What you need is a flexible platform that lets you deploy not just analytics tools of your choice, but the collateral applications and services that are needed to go with it. And you want this to be simple to scale and maintain. You want to train your staff in one technology/API and reuse these skills everywhere.

Expect that your needs will change annually, seasonally, and even on intervals of seconds based on changing workloads.

The Apache Mesos, DC/OS, and Kubernetes platforms are the leading candidates for this role. These fall into the category of container orchestrators and can be overlaid on top of public clouds or an on-prem data center. They are designed to deploy broad ranges of applications and services under automation.

These orchestrators handle the mapping of dynamically changing workloads to variable resources, along with the service discovery, networking, and security issues that arise in practice. Their use of container technology allows better portability and more responsive scaling compared to virtualization under hypervisors alone.

Apache Mesos and DC/OS

Most of the leading open source analytics projects are sponsored by the Apache Foundation, leading to a high correlation of user deployments on the Mesos platform. These projects publish container images as part of their release cycles, so a platform that allows you to deploy workloads at a container level of abstraction is critical.

Kubernetes (and DC/OS, SWARM, PKS distributions)

Kubernetes can be used in conjunction with the Helm project, StatefulSets, and Operators to automate the deployment and management of a modern analytics application stack. Example: https://github.com/kubernetes/charts/tree/master/incubator/kafka
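As a rough illustration of that approach, a Helm 2-style install of the incubator Kafka chart referenced above might look like the following (the repo URL, release name, and label selector are assumptions; check the chart’s README for current instructions):

# add the incubator chart repository and deploy Kafka as a release named my-kafka
helm repo add incubator https://kubernetes-charts-incubator.storage.googleapis.com/
helm install --name my-kafka incubator/kafka

# watch the broker and ZooKeeper pods come up
kubectl get pods -l release=my-kafka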