Open Source Insider

Companies normally keep things pretty quiet in the run-up to their annual user conferences, so they can pepper the press with a bag of announcements designed to show how much market momentum and traction they have going.

Not so with ScyllaDB: the company has been dropping updates in advance of its Scylla Summit event in what is perhaps an unusually vocal kind of way.

The company has just announced Alternator, an open source software project designed to enable application-level and API-level compatibility between Scylla and Amazon’s NoSQL cloud database, Amazon DynamoDB.

Scylla itself is a real-time big data database that is fully compatible with Apache Cassandra and is known for its ‘shared-nothing’ approach (a distributed-computing architecture in which each update request is satisfied by a single node, i.e. a processor/memory/storage unit) to increase throughput and storage capacity.

Scylla’s DynamoDB-compatible API will be available for use with Scylla Open Source, supporting the majority of DynamoDB use cases and features.

Alternator allows DynamoDB users to migrate to an open source database that runs anywhere i.e. on any cloud platform, on-premises, on bare-metal, virtual machines or Kubernetes.

Reversing a trend

“Cloud vendors routinely commercialise open source software,” said Dor Laor, CEO and co-founder, ScyllaDB. “With Alternator, we’re reversing that trend by creating open source options for a commercial cloud product. Open source software is all about disrupting the existing model and creating new opportunities for users. True to our roots, we’ve first released the Alternator source upstream for feedback and exploration; later this year we’ll incorporate it in our free open source distribution, followed by our enterprise and hosted products.”

Both Scylla and DynamoDB have their roots in the Dynamo paper, which described a NoSQL database with ‘tunable’ consistency.

Scylla’s close-to-the-hardware design claims to improve on DynamoDB’s price/performance ratio, which is meant to democratize access to real-time big data.

Cluster luck

Alternator also frees developers to access their data with fewer limits by eliminating payment per operation — they can run as many operations as their clusters support.

Let’s also note that Alternator gives developers the ability to control the number of replicas and the balance of cost vs. redundancy to suit their applications. They can set and change the replica number per datacentre, the number of zones and the consistency level on a per-query basis.
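To make the per-query consistency trade-off a little more concrete, here is a minimal Go sketch of the quorum arithmetic commonly used in Dynamo-style stores. The function names are hypothetical and are not part of any Scylla or Alternator API; the sketch only encodes the usual rule of thumb that a read is guaranteed to overlap an acknowledged write when read replicas plus write replicas exceed the replication factor.

```go
package main

import "fmt"

// quorum returns the smallest majority of rf replicas.
// (Hypothetical helper for illustration only.)
func quorum(rf int) int {
	return rf/2 + 1
}

// overlaps reports whether a read that contacts r replicas must hit at
// least one replica that acknowledged a write sent to w replicas, given
// a replication factor of rf.
func overlaps(r, w, rf int) bool {
	return r+w > rf
}

func main() {
	rf := 3
	q := quorum(rf)
	fmt.Println("quorum size:", q)
	fmt.Println("quorum read + quorum write overlap:", overlaps(q, q, rf))
	fmt.Println("single read + single write overlap:", overlaps(1, 1, rf))
}
```

Raising the replica count buys redundancy at the cost of more storage and more nodes per quorum; lowering the consistency level per query trades that overlap guarantee for latency.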

Software is eating the world… and open source software is creating a new set of recipes, chewing it up and sticking it all into a completely different kind of sandwich with a whole new set of condiments and relishes.

Sumo Logic says that there is open source disruption in as many as four of the six key levels of the traditional IT stack.

The machine-generated data logs & metrics management company made the statement at its annual Illuminate developer conference held in California this September.

The company states that open source has disrupted the modern application stack – and that: today we see that four of the six tiers that make up the modern application stack have been disrupted by open source – and open source solutions for containers, orchestration, infrastructure and application services are leading this transformation.

This is perhaps a good point to stand back and ask what those six levels might be according to the Sumo Logic view of the world.

Six pack stack

DevSecOps management

Application services

Custom application code

Application runtime infrastructure

Database and storage services

Infrastructure, container and orchestration

Staying in open source, the company says that as customers adopt multi-cloud, Kubernetes adoption significantly rises. Enterprises are betting on Kubernetes to drive their multi-cloud strategies.

“Multi-cloud and open source technologies, specifically Kubernetes, are hand-in-hand dramatically reshaping the future of the modern application stack,” said Kalyan Ramanathan, vice president of product marketing for Sumo Logic. “For companies, the increased adoption of services to enable and secure a multi-cloud strategy are adding more complexity and noise, which current legacy analytics solutions can’t handle. To address this complexity, companies will need a continuous intelligence strategy that consolidates all of their data into a single pane of glass to close the intelligence gap.”

The logical question we must ask, then, is… how long will it be before open source exerts massive disruption across every level of the six-layer stack?

The two areas least affected (as per Sumo Logic’s yardstick at least) are application services and, slightly quirkily, the infrastructure layer… even though this layer is essentially born of open source technologies in the first place, the point is that the disruption factor here is lower.

The recent Open Source Summit was held in the balmy climes of San Diego and, among the news emanating from the event itself, the Computer Weekly Open Source Insider team were made aware of announcements made by The Linux Foundation itself.

The foundation announced its intent to form the non-profit Confidential Computing Consortium.

Across industries, computing is moving to span multiple environments, from on premises to public cloud to edge. As companies move these workloads to different environments, they need protection controls for sensitive IP and workload data and are increasingly seeking greater assurances and more transparency of these controls.

Current approaches in cloud computing address data at rest and in transit — but encrypting data-in-use is considered the third and possibly most challenging step to providing a fully encrypted lifecycle for sensitive data.

What is confidential computing?

Confidential computing will enable encrypted data to be processed in memory without exposing it to the rest of the system. This should reduce exposure for sensitive data and, it is claimed, provide greater control and transparency for users.

The first project to be contributed to the Consortium is the Open Enclave SDK, an open source framework that allows developers to build Trusted Execution Environment (TEE) applications using a single enclaving abstraction. Developers can build applications once that run across multiple TEE architectures.

The Confidential Computing Consortium will bring together hardware vendors, cloud providers, developers, open source experts and academics to accelerate the confidential computing market; influence technical and regulatory standards; and build open source tools that provide the right environment for TEE development. The organisation will also anchor industry outreach and education initiatives.

“Confidential computing provides new capabilities for cloud customers to reduce trusted computing base in cloud environments and protect their data during runtime. Alibaba launched Alibaba Encrypted Computing technology powered by Intel SGX in Sep 2017 and has provided commercial cloud servers with SGX capability to our customers since April 2018. We are very excited to join CCC and work with the community to build a better confidential computing ecosystem,” said Xiaoning Li, chief security architect, Alibaba Cloud.

Google VP of security Royal Hansen added to this story by noting that for users to make the best choice in terms of how to protect their workloads, they need to be met with a common language and understanding around confidential computing.

“As the open source community introduces new projects like Asylo and OpenEnclave SDK, and hardware vendors introduce new CPU features that change how we think about protecting programs, operating systems, and virtual machines, groups like the Confidential Computing Consortium will help companies and users understand its benefits and apply these new security capabilities to their needs,” said Hansen.

The proposed structure for the Consortium includes a Governing Board, a Technical Advisory Council and separate technical oversight for each technical project.

Scylla [pronounced: sill-la] was (and to all intents and purposes still is) a Greek god era sea monster whose mission is to haunt and torment the rocks of a narrow strait of water opposite the Charybdis [pronounced: car-ib-diss] whirlpool.

Ships that sailed too close to her rocks (she was thought to have been created from a beautiful nymph) would risk having sailors killed by the razor-sharp shards of Scylla’s darting heads.

Scylla and ScyllaDB on the other hand are neither mythological, sea-based nor dangerous to your health… but this open source-centric real-time big data database does have shards.

Scylla uses a sharded design on each node, meaning each CPU core handles a different subset of data. It is fully compatible with Apache Cassandra and embraces a shared-nothing approach that increases throughput and storage capacity as much as 10X that of Cassandra.

Shared-nothing

A shared-nothing architecture (SN) is a distributed computing architecture in which each update request is satisfied by a single node (processor/memory/storage unit) — the intent is to eliminate contention among nodes… and in this way, nodes do not share memory or storage because they access them independently.
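As a rough illustration of the shard-per-core idea, the Go sketch below hashes each key to exactly one shard, and each shard keeps its own private map. All names here are hypothetical and the code is single-threaded for brevity; it is not how Scylla is implemented (Scylla is written in C++), and a real engine would pin each shard to a CPU core and pass messages between them.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// shardFor maps a key to one of n shards using an FNV-1a hash.
// Assumption: any stable hash would do for this illustration.
func shardFor(key string, n int) int {
	h := fnv.New32a()
	h.Write([]byte(key))
	return int(h.Sum32() % uint32(n))
}

// shard owns its own data; no other shard ever touches this map.
type shard struct {
	data map[string]string
}

// store is a fixed set of shards, standing in for the cores of one node.
type store struct {
	shards []shard
}

func newStore(n int) *store {
	s := &store{shards: make([]shard, n)}
	for i := range s.shards {
		s.shards[i].data = make(map[string]string)
	}
	return s
}

// put routes the write to the single shard that owns the key.
func (s *store) put(key, value string) {
	s.shards[shardFor(key, len(s.shards))].data[key] = value
}

// get routes the read to the same owning shard.
func (s *store) get(key string) (string, bool) {
	v, ok := s.shards[shardFor(key, len(s.shards))].data[key]
	return v, ok
}

func main() {
	s := newStore(4)
	s.put("ride:42", "completed")
	v, ok := s.get("ride:42")
	fmt.Println(v, ok, "owned by shard", shardFor("ride:42", 4))
}
```

Because every key has exactly one owner, no locks are needed around the per-shard maps, which is the contention the shared-nothing design sets out to avoid.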

ScyllaDB has now announced that Southeast Asia’s leading ‘super app’, Grab, is using Scylla Enterprise to support real-time data processing and time-series data for the millions of transactions it processes every day.

What is a super app?

According to gojek, “A super app is many apps within an umbrella app. It’s an OS that unbundles the tyranny of apps. It’s the portal to the Internet for a mobile-first generation. More often than not, it will likely operate at the intersection of logistics/hyper-local delivery, commerce, payments and social.”

Grab offers a wide range of everyday services such as on-demand rides, food and package delivery and mobile payments. The Grab app has been downloaded over 152 million times and its service area covers Cambodia, Indonesia, Malaysia, Myanmar, the Philippines, Singapore, Thailand and Vietnam.

Every Grab transaction needs to be processed in near-real time.

To ensure its operations stay fast, responsive and reliable, Grab developed a microservices architecture based on data streaming with Apache Kafka. These streams power Grab’s business and provide a source of intelligence.

Close to the metal

Grab’s engineering teams aggregate and republish the streams using a low-latency metadata store built on Scylla Enterprise, a massively scalable real-time NoSQL database designed for high-performance applications. Coded ‘close to the metal’ in C++, Scylla Enterprise offers high availability with improved latency, throughput, resource efficiency and fault tolerance.

Grab’s Trust and Safety Engineering team uses the metadata stored in Scylla to monitor and analyse activity streams for signs of illicit transactions.

“Scylla does a fantastic job for us in helping Grab process the massive amounts of data generated everyday, and turns it into usable information for the rest of the business. When you deal with millions and millions of events coming in everyday, being able to efficiently find and remove duplicate events, pull together aggregate data, and run joins across multiple real-time streams can be challenging,” said Brian Trenbeath, a technical program manager at Grab.

He has pointed to the stream of articles that agonise over how to extract the most from that scarce resource, ‘the good developer’.

Some push it and write about how to turn ‘good developers’ (by which they mean anyone with a modicum of skill) into spectacular coding robots that churn out thousands of lines of perfect user-centric code.

But, he argues, it can’t be done.

So what does ‘good’ even look like…? … and when developers are good, what keeps them motivated?

Drumgoole writes…

On the subject of code, you’d be surprised to learn that good and bad code can often look amazingly similar. So it [i.e. code] alone cannot always be easily analysed. This static analysis rarely uncovers the kinds of live problems that really destroy a system’s utility.

So if we don’t know what’s good, how do we define better?

Instead of defining good systems, we should try and define good programmers in some abstract way. What mould do they fit into? Do they work well with people? What’s their past experience? This somewhat intangible (dubious even) list goes on.

Aye, there’s the runtime rub

Well here’s the rub of it. Even the best developers (especially the best developers) don’t know what makes them better than the average developer. What’s worse, bad developers don’t know why they are substandard. So when we talk about what motivates developers we cannot afford to talk at a level of abstraction that could be the same for doctors, firemen, lawyers or any post-graduate profession.

Programming, software engineering and development – whatever you choose to call it – is just different.

Let me tell you from my experience what the best programmers look like and what continues to motivate them:

Constant Learning: Smart people can usually weed out low performers much faster than management can ever detect them. I worked at a large investment bank in London and the team I was on was responsible for core infrastructure. They simply wouldn’t tolerate people that could not operate at their level but they created some of the finest software of their generation (and it’s still being used to this day). They also continually learned from each other. Once you find yourself in a team where everyone is challenged and pulling in the same direction – that’s powerful.

Fear of defeat: The willingness to apply themselves to a mountain of minutiae to get to the bottom of a problem. The original hacker team in the Trinity College maths department trying to boot a 4.1 BSD tape for which they had no viable tape drives (they managed it after a few weeks) is a good example. A programmer at that same bank wrote out hex dumps by hand from an in-memory dump to debug a memory leak. Now that’s something.

Being the resilient ones: Great developers believe in themselves, bet on themselves and have faith in themselves to get the job done. Programming is often relentless so that inner belief is a key driving force. Feel like you don’t have it? Embrace it. Don’t wallow or dwell on failures; acknowledge the situation, learn from your mistakes, and move forward. Now that’s resilience.

Seeing the job through: The need to get a system delivered, documented and used as a first priority. In the developer’s mind, half done is simply not done. We’ve all started a book and given up; remember how bad that felt? On the flip side, completion never felt so sweet. Now go off and tell all your friends about it.

Taking full responsibility: The capability to apply themselves to not just the ‘cool’ part of the system but the boring dingy parts: the localisation, the documentation, the logging, the build system and installer requires dedication.

Interesting right?

What’s clear is that cultures must be built around a set of values that ensure that these kinds of behaviours can predominate and flourish.

Go-faster bean bags

If you build a culture like this… then suddenly the foosball table and the bean bags will seem like the management equivalent of go-faster stripes.

People like nice offices and decent salaries (that’s table stakes) but they will neither join nor stay for those aspects of the job alone. They stay for great work, great colleagues and the opportunity to change the world – one application at a time.

This is a guest post for the Computer Weekly Open Source Insider blog written by Matt Boyle in his capacity as lead software engineer at Curve.

Curve allows users to spend money from all their accounts with one Curve card – and hopes to simplify your finance through one secure mobile app.

Boyle writes…

Emerging only in 2009, Golang is still relatively new and not as widely used as other mainstream coding languages.

This young language was incubated inside Google, and has already been proven to perform well on a massive scale. We wanted to share with you a few reasons why we love Golang (Go) and how Curve is using it.

Go has excellent characteristics for scalability and services written using it typically have very small memory footprints. Because code is compiled into a single static binary, services can also be containerised with ease, making it much simpler to build and deploy. These attributes make Go an ideal choice for companies building microservices, as you can easily deploy into a highly available and scalable environment such as Kubernetes.

Go has everything you need to build APIs as part of its standard library.

It has an easy-to-use and performant HTTP server out of the box, which eliminates some of the exploration and paralysis that can occur when teams are faced with designing a new project. If you were to use other languages such as Java or Node, choosing a server framework is often a significant obstacle in a team dynamic.
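For readers who have not seen it, here is a minimal sketch of an endpoint built only on the standard library's net/http package. The /health route and the handler names are assumptions made for the example, not anything from Curve's codebase. The sketch exercises the handler in-process with httptest so that it runs without binding a port.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// healthHandler replies with a small plain-text body.
func healthHandler(w http.ResponseWriter, r *http.Request) {
	fmt.Fprint(w, "ok")
}

// newMux wires the hypothetical /health route onto a standard ServeMux.
func newMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/health", healthHandler)
	return mux
}

// call sends a GET for path straight into the handler, without a network
// socket, and returns the recorded status code and body.
func call(h http.Handler, path string) (int, string) {
	req := httptest.NewRequest(http.MethodGet, path, nil)
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Code, rec.Body.String()
}

func main() {
	// In a real service you would instead run:
	//   http.ListenAndServe(":8080", newMux())
	code, body := call(newMux(), "/health")
	fmt.Println(code, body)
}
```

Nothing outside the standard library is imported, which is the point being made above: routing, serving and testing helpers all ship with the language.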

Automated formatter

There’s also another way it makes for smoother group workflow: code formatting is a first class concern, with an automated formatter [Ed: yes, that’s now a word] built into the language. With other languages, a lot of time and energy can be wasted agreeing on code formatting and which style guide to follow.

Go completely removes the need for this conversation.

Go is very easy to learn. Although finding engineers with significant production Go experience can be challenging, at Curve we have had great success with hiring people from Java and PHP backgrounds and upskilling them in Go.

It usually only takes about a week or two to begin actively contributing production-ready code. We have also found that developers end up preferring Go. It really is simple yet effective: Go favours “what you see is what you get” – which means readable clear code with few complex abstractions.

This makes peer review a much easier task, whether it’s a colleague’s code or even huge open source projects such as Kubernetes.

We are strong advocates of TDD and Go has a fantastic test framework built into the language. Just by naming a file with _test.go and adding some test functions within that file, Go can automatically run all of your unit tests at lightning speed. This makes TDD easy to learn and use as part of the development cycle.

Kinky, in places

There are still a few kinks to work out, but we’ve found that it doesn’t take away from the functionality of Go.

For example, one particularly contentious feature is that Go does not have explicit interface implementation: a type satisfies an interface implicitly, simply by having the right methods. Opinions are divided on this as many developers are used to declaring the relationship, and it can make it tricky to determine which interfaces your struct satisfies. This is because you do not write X implements Y as you may in other languages. However, it is something you quickly learn to be okay with.
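A short sketch may help here. The Notifier interface and EmailNotifier type below are hypothetical names invented for the example. The type never declares that it implements the interface, yet it can be passed wherever the interface is expected. The blank-identifier assignment is a common Go idiom for getting the compiler to confirm the relationship, which goes some way towards answering the "what does my struct satisfy?" question.

```go
package main

import "fmt"

// Notifier is a hypothetical interface used only for illustration.
type Notifier interface {
	Notify(msg string) string
}

// EmailNotifier never mentions Notifier; it satisfies the interface
// simply by having a method with the matching signature.
type EmailNotifier struct {
	Address string
}

// Notify formats the message for the configured address.
func (e EmailNotifier) Notify(msg string) string {
	return "to " + e.Address + ": " + msg
}

// Compile-time check: the build fails on this line if EmailNotifier
// ever stops satisfying Notifier.
var _ Notifier = EmailNotifier{}

// send accepts any value that satisfies Notifier.
func send(n Notifier, msg string) string {
	return n.Notify(msg)
}

func main() {
	fmt.Println(send(EmailNotifier{Address: "dev@example.com"}, "build passed"))
}
```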

Dependency management was also originally overlooked by the team developing Go at Google. As such, the open source community stepped in and created Glide and Dep. Both were admirable attempts at solving dependency management but also came with their own set of problems.

As of Go 1.11, support has been added for modules and this has become the official dependency management tool. This has received mixed feedback, and there are definitely more improvements to be made in this area.

Vibrant open source community

Despite these growing pains, what really takes Go above and beyond is its vibrant community. In London there is a great meetup community that is very welcoming and open to collaboration. Everyone is friendly, helpful and keen to develop Go further, together. The Go open source community is thriving — some game-changing projects such as Istio, Kubernetes and Docker are all written in Go and available to download, contribute to and extend on GitHub.

It is this dynamic and innovative yet straightforward makeup that makes Go the ideal coding language for developing a company like our own.

Curve attended the Gophercon Golang conference in the UK this year.

The organisation has this month worked to improve its open source marketplace with features that focus on faster code deployment.

First deployed in December 2018, the Codefresh Marketplace [kind of like an app store] allows developers to find commands without having to learn a proprietary API — this is because every step, which is browsable in the pipeline builder, is a simple Docker image.

The Marketplace contains a set of pipeline steps provided both by Codefresh and partners, such as Blue-Green and Canary deployment steps for Kubernetes, Aqua security scanning and Helm package and deployment.

As Octopus Deploy reminds us here, “Canary deployments are a pattern for rolling out releases to a subset of users or servers. The idea is to first deploy the change to a small subset of servers, test it, and then roll the change out to the rest of the servers. The canary deployment serves as an early warning indicator with less impact on downtime: if the canary deployment fails, the rest of the servers aren’t impacted.”
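To illustrate the canary pattern in code, here is a minimal Go sketch that sends a fixed percentage of users to the canary release and the rest to the stable one. The function name and the user-ID scheme are assumptions made for the example; this is not a Codefresh step or an Octopus Deploy API, just the routing idea in isolation.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// routeTo picks "canary" for roughly canaryPercent of users and "stable"
// for everyone else. Hashing the user ID keeps each user on the same
// side for as long as the percentage is unchanged.
func routeTo(userID string, canaryPercent uint32) string {
	h := fnv.New32a()
	h.Write([]byte(userID))
	if h.Sum32()%100 < canaryPercent {
		return "canary"
	}
	return "stable"
}

func main() {
	// Count how 1,000 hypothetical users split with a 10% canary.
	counts := map[string]int{}
	for i := 0; i < 1000; i++ {
		counts[routeTo(fmt.Sprintf("user-%d", i), 10)]++
	}
	fmt.Println(counts)
}
```

Rolling the change out to everyone is then a matter of raising the percentage to 100; rolling back is setting it to 0.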

Blue-Green deployment (as defined by Cloud Foundry here) is a technique that reduces downtime and risk by running two identical production environments, one called Blue and one called Green.

“At any time, only one of the environments is live, with the live environment serving all production traffic. For this example, Blue is currently live and Green is idle,” notes Cloud Foundry at the above link.
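The Blue-Green idea can be sketched in a few lines of Go: a router holds the name of the live environment and a cutover flips it. The router type and its methods are hypothetical names for this example and do not correspond to Cloud Foundry or Codefresh tooling.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// router sends all traffic to whichever environment is currently live.
type router struct {
	live atomic.Value // always holds a string: "blue" or "green"
}

func newRouter(initial string) *router {
	r := &router{}
	r.live.Store(initial)
	return r
}

// Live returns the environment currently serving production traffic.
func (r *router) Live() string {
	return r.live.Load().(string)
}

// Idle returns the environment that is not serving traffic.
func (r *router) Idle() string {
	if r.Live() == "blue" {
		return "green"
	}
	return "blue"
}

// Cutover makes the idle environment live. The store is atomic, so
// readers see either the old or the new value; concurrent Cutover calls
// would need extra coordination that this sketch leaves out.
func (r *router) Cutover() {
	r.live.Store(r.Idle())
}

func main() {
	r := newRouter("blue")
	fmt.Println("live:", r.Live(), "idle:", r.Idle())
	r.Cutover() // the new release was deployed to green and verified
	fmt.Println("live:", r.Live(), "idle:", r.Idle())
}
```

If the new release misbehaves, calling Cutover again returns traffic to the previous environment, which is where the reduction in risk comes from.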

Private steps

Additional new functionality in Codefresh includes the ability to create private steps for a specific team, a new section for items maintained by Codefresh and automatic scanning and security checking of Marketplace additions.

“Our steps Marketplace provides building blocks for your pipelines. It is very easy to search for a keyword and see if there is a step for that method,” said Dan Garfield, Chief Technology Evangelist for Codefresh. “We look forward to communities adding more plugins as the adoption of Docker within companies skyrockets and the benefits of Docker-based tooling become more clear.”

All plugins are open source and users can contribute to the collection by creating a new plugin.

Now open sourced by Facebook under an MIT licence, Hermes is supposed to supercharge startup times, drain less memory and result in a smaller overall application code footprint.

Why focus on startup times?

Because application startup times impact what the tech industry likes to call Time To Interaction or TTI (a measure of the period between an application being launched and the user being able to use it)… and that’s a real make or break factor for software houses that pump out mass market applications.

How does it do it?

Part of the secret sauce in Hermes is its ability to execute what is known as bytecode precompilation.

Bytecode precompilation allows code to be processed employing a technique known as Ahead Of Time (AOT) compilation.

“Commonly, a JavaScript engine will parse the JavaScript source after it is loaded, generating bytecode. This step delays the start of JavaScript execution. To skip this step, Hermes uses an Ahead Of Time compiler which runs as part of the mobile application build process. As a result, more time can be spent optimising the bytecode, so the bytecode is smaller and more efficient. Whole-program optimisations can be performed, such as function deduplication and string table packing,” noted Facebook, in a technical statement.

Facebook itself says that as mobile applications are growing larger and more complex, larger apps using JavaScript frameworks often experience performance issues as developers add features and complexity.

According to a summary press statement, “To increase the performance of Facebook’s apps, we have teams that continuously improve our JavaScript code and platforms. As we analysed performance data, we noticed that the JavaScript engine itself was a significant factor in startup performance and download size. With this data in hand, we knew we had to optimise JavaScript performance in the more constrained environments of a mobile phone compared to a desktop or laptop.”

Hermes currently targets the ES6 specification and the team intends to keep current with the JavaScript specification as it evolves.

With its roots and foundations in the open source Apache Cassandra database, Santa Clara headquartered DataStax insists that it likes to keep things open.

As such, the company is opening a wider aperture on its collaboration with VMware by offering DataStax production support on VMware vSAN, now in hybrid and multi-cloud configurations.

For the record, VMware vSAN (the artist formerly known as Virtual SAN) is a hyper-converged software-defined storage (SDS) technology that ‘pools’ direct-attached storage devices across a VMware vSphere cluster to create a distributed, shared data store.

So, think about it… DataStax is known for its ability to provide an always-on distributed hybrid cloud database for real-time applications at scale — and, VMware is known (at least with vSAN) for its ability to coalesce distributed storage resources.

Consistent infrastructure

The end result of the two technologies combined should, in theory, if not in practice, deliver a more consistent infrastructure and data/application management experience across on-premises, hybrid and multi-cloud applications.

The software engineering here is hybrid and multi-cloud-ready with capabilities to deliver operational and deployment consistency. There is built-in enterprise-grade availability here too.

The firms claim that customers can avoid cloud lock-in with unified operations between environments and across clouds with a single interface for end-to-end security and infrastructure management.

Progressive cloud: defined

So then, what is a ‘progressive’ cloud strategy?

A progressive cloud strategy (in the context of this discussion at least) is one that seeks to run essentially distributed database resources (plural) uniformly from development to production across the essentially distributed multi-cloud world of the hybrid cloud — and across different departmental zones, digital workflows, world regions, datacentres and device endpoints…

… and this (as above) is what the two firms here are seeking to achieve.

“For enterprises with a progressive cloud strategy, our expanded collaboration enables them to prevent cloud vendor lock-in, improve developer productivity by being able to easily test use cases in minutes, and ultimately, rely on DataStax for enterprise data management and VMware as the platform for modern applications,” said Kathryn Erickson, senior director of strategic partnerships at DataStax.

Erickson insists that DataStax is focused on making it easy for developers to use and manage DataStax by expanding VMware vSAN’s footprint to show that distributed systems do not need special treatment in their software stack.

Red Hat… no, wait, stop there — not Red Hat the IBM company, actually just Red Hat — that’s how the company is still putting out news stories.

We’ll start again, open source enterprise software company Red Hat has announced a point release for Red Hat Enterprise Linux (RHEL) as it now hits its 7.7 version.

But what could Red Hat have put into version 7.7 that it failed to markedly address in version 7.6 may we ask?

The company points to terms like ‘enhanced consistency and control’ across cloud infrastructures (plural) for IT operations teams.

There’s also ‘modern supported container creation tools’ for enterprise application developers — as opposed to the old fashioned ones, that shipped in 7.6, presumably.

This version also moves to what Red Hat calls Maintenance Phase I, which does sound like the workshop power-down time that Star Wars TIE fighters need to go through in order to recharge their nuclear cells.

Infrastructure stability

In reality, Maintenance Phase I is all about Red Hat working to try and ‘maintain infrastructure stability’ for production environments and enhancing the reliability of the operating system.

Red Hat doesn’t go into detail to explain how it works to maintain infrastructure stability, but we can guess that this means looking at how the operating system behaves when exposed to different application types, running different data workloads, requiring different compute, storage, Input/Output actions, analytics engine calls (and so on and so on)… and then firming up the core build of the kernel itself so that it’s strong and flexible enough to handle life in the real world.

Future minor releases of Red Hat Enterprise Linux 7 will now focus solely on retaining and improving this stability rather than what have been called ‘net-new’ features.

Toolkit treats

Red Hat Enterprise Linux subscribers are able to migrate across platform versions as support and feature needs dictate. To help with the process, Red Hat offers tools, including in-place upgrades, which help to streamline and simplify migrating from one Red Hat version to another.

NOTE: Let’s remember that Red Hat Enterprise Linux version 8.0 does already exist, so this is Red Hat updating a version at a slightly lower level for customers who wish to progress their upgrade paths one point at a time.

This release also adds support for image builder, a Red Hat Enterprise Linux utility that enables IT teams to build cloud images for major public cloud infrastructures, including Amazon Web Services, Microsoft Azure and Google Cloud Platform.

Red Hat Enterprise Linux 7.7 also introduces support for live patching the underlying Linux kernel. Live patching support enables IT teams to apply kernel updates to remediate Critical or Important Common Vulnerabilities and Exposures (CVEs) while reducing the need for system reboots.