Uber needs to visualize data on a range of different surfaces. A smartphone user sees cars moving around on a map as they wait for their ride to arrive. Data scientists and operations researchers within Uber study the renderings of traffic moving throughout a city.

Data visualization is core to Uber, and the company has developed a stack of technologies around visualization in order to build appealing, highly functional applications. deck.gl is a library for high-performance visualization of large datasets. luma.gl is a set of components targeting high-performance rendering. These and other tools make up vis.gl, the data visualization technology that powers Uber.
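As a rough sketch of how deck.gl is used (the data endpoint and record shape here are hypothetical), a layer describes how to map records to geometry, and the library handles the GPU rendering:

```typescript
import { Deck } from "@deck.gl/core";
import { ScatterplotLayer } from "@deck.gl/layers";

// Hypothetical record shape: one entry per vehicle, with [longitude, latitude].
type Vehicle = { position: [number, number]; active: boolean };

new Deck({
  initialViewState: { longitude: -122.4, latitude: 37.8, zoom: 11 },
  controller: true,
  layers: [
    new ScatterplotLayer({
      id: "vehicles",
      data: "https://example.com/vehicles.json", // hypothetical endpoint
      getPosition: (d: Vehicle) => d.position,
      getFillColor: (d: Vehicle) => (d.active ? [0, 160, 255] : [120, 120, 120]),
      getRadius: 30, // meters
    }),
  ],
});
```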

Uber’s visualization team included Ib Green, who left Uber to co-found Unfolded.ai, a company that builds geospatial analytics products. He joins the show to discuss his work on visualization products and libraries at Uber, as well as the process of taking that work to found Unfolded.ai. Full disclosure: I am an investor in Unfolded.ai.

Prisma: Modern Database Tooling with Johannes Schickling
http://softwareengineeringdaily.com/2020/06/04/prisma-modern-database-tooling-with-johannes-schickling/
Thu, 04 Jun 2020

A frontend developer issuing a query to a backend server typically has to issue that query through an ORM or as a raw database query. Prisma is an alternative to both of these data access patterns, allowing for easier database access through auto-generated, type-safe query building tailored to an existing database schema.

By integrating with Prisma, the developer gets a database client that has query autocompletion, and an API server with less boilerplate code. Prisma also has a system called Prisma Migrate, which simplifies database and schema migrations.
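As a rough sketch of what this looks like in practice (the `User` model and its fields here are hypothetical), the generated client might be used like this:

```typescript
// Assumes a hypothetical Prisma schema along these lines:
//   model User { id Int @id @default(autoincrement())  email String @unique }
import { PrismaClient } from "@prisma/client";

const prisma = new PrismaClient();

async function main() {
  // The client is generated from the schema, so `user`, `where`, and the
  // shape of the returned records are all typed and autocomplete in the editor.
  const users = await prisma.user.findMany({
    where: { email: { endsWith: "@example.com" } },
  });
  console.log(users);
}

main().finally(() => prisma.$disconnect());
```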

Johannes Schickling is CEO of Prisma, and he joins the show to talk about the developments of Prisma that have occurred since we last spoke, and where the company is headed.

Tecton: Machine Learning Platform from Uber with Kevin Stumpf
http://softwareengineeringdaily.com/2020/06/03/tecton-machine-learning-platform-from-uber-with-kevin-stumpf/
Wed, 03 Jun 2020

Machine learning workflows have had a problem for a long time: taking a model from the prototyping step and putting it into production is not an easy task. A data scientist who is developing a model is often working with different tools, a smaller data set, or different hardware than the environment to which that model will eventually be deployed.

This problem existed at Uber just as it does at many other companies. Models were difficult to release, iterations were complicated, and collaboration between engineers could never reach a point that resembled a harmonious “DevOps”-like workflow. To address these problems, Uber developed an internal system called Michelangelo.

Some of the engineers working on Michelangelo within Uber realized that there was a business opportunity in taking the Michelangelo work and turning it into a product company. Thus, Tecton was born. Tecton is a machine learning platform focused on solving the same problems that existed within Uber. Kevin Stumpf is the CTO at Tecton, and he joins the show to talk about the machine learning problems of Uber, and his current work at Tecton.

TIBCO: Embedded Visualizations’ Impact on Software Trends
http://softwareengineeringdaily.com/2020/06/02/tibco-embedded-visualizations-impact-on-software-trends/
Tue, 02 Jun 2020

The Grease Behind Software Systems

The Information Bus Company (TIBCO), as it was originally known, may not be as well known as some of the other large tech companies, even though it was founded in the late 90s. Presently, TIBCO has many different products that serve a variety of use cases, so it can be difficult to understand exactly what the company does if this article is your first introduction to it.

Figure 1

Here are the important pieces of context: TIBCO initially offered a messaging bus to enterprises; this product grew in popularity; the business evolved with trends and developed more products; and now TIBCO is a global software force. At the time of writing, TIBCO’s customers include T-Mobile and NASA. The evolution of TIBCO’s product line was fueled, in part, by a number of acquisitions. One such acquisition was Jaspersoft, an embedded analytics tool, in 2014.

Jaspersoft, originally called Panscopic, emerged from an open source project called JasperReports, started in 2001. JasperReports was a Java library that could produce reports in a number of formats, including PDF, HTML, and CSV. Jaspersoft eventually acquired the intellectual property behind JasperReports. Jaspersoft’s pre-acquisition market position and its post-acquisition transition away from a Java monolith provide insight into how visualization tools are changing to maintain relevance. If you’re specifically interested in learning more about Jaspersoft’s successful migration from monolith to microservices, check out this episode of Software Engineering Daily.

Use Case for Embedded Analytics

Data can be easier to consume if it is visualized properly. Large enterprises need tools that can generate data visualizations in order to make better business decisions. In other words, data visualizations can help a business increase its bottom line. Data visualization frameworks fall under the umbrella of business intelligence tools. Sherman Wood describes early business intelligence as “a dashboard in a portal-based application… where you looked at all of your BI” in this episode of Software Engineering Daily. This pattern has fallen by the wayside, while embedded analytics are increasing in popularity. It’s interesting to note that microservices were also increasing in popularity during this timeframe.

Figure 2

Embedded analytics are visualizations that live within internal applications that typically allow users to take some kind of action. Sherman Wood notes that portals dedicated to visualizations cause “…context switching to another application to actually take some action.” As detailed in the episode mentioned previously, Jaspersoft migrated from a monolithic Java application to a microservice-oriented architecture. This migration was driven by evolving business intelligence needs; it’s more difficult to support embedded visualizations with monolithic applications than with microservices.

Sherman Wood believes the initial impetus for this migration was Jaspersoft becoming “…the first BI platform on AWS, in their marketplace.” Alongside Jaspersoft beginning to ship their product in containers on AWS, their customer base began “…shifting from users of a standalone BI server to those who are SaaS providers”, as noted by Sherman. This necessitated an architectural shift in Jaspersoft’s software.

Figure 3

The Unavoidable Problems in Data Visualization

Modularizing Jaspersoft’s architecture was a critical step towards meeting the demands of customers operating SaaS businesses. In other words, Jaspersoft had to conform to SaaS architecture and all that came with it, including containerization. Dumping a Java monolith into a Docker container is a great way to give your customers a sub-par experience, to say the least; microservices were a must.

Though it borders on a truism, it should still be reaffirmed that sub-par user experiences do not optimize a business’s bottom line. An evergreen problem in the realm of data visualization, noted Jaspersoft Principal Architect Sherman Wood, is “finding the data and then making it operationally available.” Wood also mentions that “you often see microservice after microservice being created to support individual visualizations.” If you’re interested in learning more about ubiquitous problems in the data visualization arena from Jaspersoft engineers, check out this episode of Software Engineering Daily.

Finding data and making it available are problems that will exist for the foreseeable future, but breaking up the Jaspersoft monolith was an attempt to mitigate their impact. How the responsibility for solving these problems should be split between backend and frontend technologies is debatable. Regardless of how they should be split, it’s interesting to examine why a debate about this partitioning of responsibility can exist at all.

Figure 4

Figure 4 helps illustrate this by presenting one way backend and frontend technologies can interact to produce a visualization. Granted, this is a very high-level overview, but the gist is there. By no means does Figure 4 attempt to represent the optimal interaction between backend and frontend technologies for rendering a visualization. Rather, Figure 4 outlines one approach: all of the logic lives in the microservice, and the frontend is agnostic to the data it renders.

The pattern presented in Figure 4 isn’t nearly as common as it used to be. In this episode of Software Engineering Daily, Jaspersoft engineer Chad Lunley rightly states “computers are so fast right now that we can offload a lot of that backend, technical stuff to … the frontend.” In today’s age, the client-facing side of an application is usually backed by much more computational power than it was a decade ago. All of this compute power ultimately enables developers to expand how they allow users to interact with data and meet the increased expectations of users.
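A minimal sketch of that client-side pattern, assuming a hypothetical `/api/metrics` endpoint: the browser fetches raw records and does the scaling and drawing itself, instead of receiving a finished chart from a backend service.

```typescript
// Fetch raw data points and render them client-side on a <canvas>,
// rather than asking a backend service for a pre-rendered chart.
type Point = { x: number; y: number };

async function renderChart(canvas: HTMLCanvasElement): Promise<void> {
  const res = await fetch("/api/metrics"); // hypothetical endpoint
  const points: Point[] = await res.json();

  const ctx = canvas.getContext("2d")!;
  const maxX = Math.max(...points.map((p) => p.x));
  const maxY = Math.max(...points.map((p) => p.y));

  ctx.clearRect(0, 0, canvas.width, canvas.height);
  for (const p of points) {
    // Scale into canvas space; flip y so larger values draw higher.
    const cx = (p.x / maxX) * canvas.width;
    const cy = canvas.height - (p.y / maxY) * canvas.height;
    ctx.fillRect(cx - 2, cy - 2, 4, 4);
  }
}
```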

Figure 5

Though TIBCO® Jaspersoft is only one of TIBCO’s many products, it is representative of how TIBCO has evolved with the times. Jaspersoft’s evolution also touches on a number of pervasive themes in the world of software, such as decomposing monoliths, the increasing computational power of client-side devices, and the rising expectations of users. Its history is an interesting case study in a number of technological trends and is worth exploring further. The episodes of Software Engineering Daily linked in this article are well worth your time.

HoloClean: Data Quality Management with Theodoros Rekatsinas
http://softwareengineeringdaily.com/2020/06/02/holoclean-data-quality-management-with-theodoros-rekatsinas/
Tue, 02 Jun 2020

Many data sources produce new data points at a very high rate. With so much data, the issue of data quality emerges. Low quality data can degrade the accuracy of machine learning models that are built around those data sources. Ideally, we would have completely clean data sources, but that’s not very realistic. One alternative is a data cleaning system, which can allow us to clean up the data after it has already been generated.

HoloClean is a statistical inference engine that can impute, clean, and enrich data. HoloClean is centered around “The Probabilistic Unclean Database Model”, which allows two systems, an “intension” and a “realizer”, to work together to fill in missing fields and fix erroneous fields in data.

HoloClean was created by Theo Rekatsinas, and he joins the show to talk about the problem of fast, unclean data, and his work with HoloClean. We also talk about other problems in machine learning and the engineering workflows around data.

Disaggregated Servers with Yiying Zhang
http://softwareengineeringdaily.com/2020/06/01/disaggregated-servers-with-yiying-zhang/
Mon, 01 Jun 2020

Server infrastructure traditionally consists of monolithic servers containing all of the necessary hardware to run a computer. These different hardware components are located next to each other, and do not need to communicate over a network boundary to connect the CPU and memory.

LegoOS is a model for disaggregated, network-attached hardware. LegoOS disseminates the traditional operating system functionalities into loosely-coupled hardware and software components. By disaggregating data center infrastructure, the overall resource usage and failure rate of server infrastructure can be improved.

Yiying Zhang is an assistant professor of computer science at UCSD. Her research focuses on operating systems, distributed systems, and datacenter networking. She joins the show to discuss her work and its implications for data centers and infrastructure.

Kubernetes vs. Serverless with Matt Ward
http://softwareengineeringdaily.com/2020/05/29/kubernetes-vs-serverless-with-matt-ward/
Fri, 29 May 2020

Kubernetes has become a highly usable platform for deploying and managing distributed systems.

The user experience for Kubernetes is great, but is still not as simple as a full-on serverless implementation–at least, that has been a long-held assumption. Why would you manage your own infrastructure, even if it is Kubernetes? Why not use autoscaling Lambda functions and other infrastructure-as-a-service products?

Matt Ward is a listener of the show and an engineer at Mux, a company that makes video streaming APIs. He sent me an email that said Mux has been having success with self-managed Kubernetes infrastructure, which they deliberately opted for over a serverless deployment. I wanted to know more about what shaped this decision to opt for self-managed infrastructure, and the costs and benefits that Mux has accrued as a result.

Matt joins the show to talk through his work at Mux, and the architectural impact of opting for Kubernetes instead of fully managed serverless infrastructure.

Distributed Systems Research with Peter Alvaro
http://softwareengineeringdaily.com/2020/05/28/distributed-systems-research-with-peter-alvaro/
Thu, 28 May 2020

Every software company is a distributed system, and distributed systems fail in unexpected ways.

This ever-present tendency for systems to fail has led to the rise of failure testing, otherwise known as chaos engineering. Chaos engineering involves the deliberate failure of subsystems within an overall system to ensure that the system itself can be resilient to these kinds of unexpected failures.

Peter Alvaro is a distributed systems researcher who has published papers on a range of subjects, including debugging, failure testing, databases, and programming languages. He works with both academia and industry. Peter joins the show to discuss his research topics and goals.

Brex Engineering with Cosmin Nicolaescu
http://softwareengineeringdaily.com/2020/05/27/brex-engineering-with-cosmin-nicolaescu/
Wed, 27 May 2020

Brex is a credit card company that provides credit to startups, mostly companies which have raised money. Brex processes millions of transactions, and uses the data from those transactions to assess creditworthiness, prevent fraud, and surface insights for the users of their cards.

Brex is full of interesting engineering problems. The high volume of transactions requires data infrastructure to support all those transactions coming through the platform. As a credit card company, Brex needs to integrate with credit card networks and banking systems. There are internal systems for applications such as dispute resolution.

Cos Nicolaescu is the CTO at Brex. He joins the show to discuss engineering at Brex, the dynamics of a credit card company, and his strategies around management. It was an instructive look inside a rapidly growing fintech company.

Edge Machine Learning with Zach Shelby
http://softwareengineeringdaily.com/2020/05/26/edge-machine-learning-with-zach-shelby/
Tue, 26 May 2020

Devices on the edge are becoming more useful with improvements in the machine learning ecosystem. TensorFlow Lite allows machine learning models to run on microcontrollers and other devices with only kilobytes of memory. Microcontrollers are very low-cost, tiny computational devices. They are cheap, and they are everywhere.

The low-energy embedded systems community and the machine learning community have come together in a collaborative effort called tinyML. tinyML encompasses improvements in microcontrollers, lighter-weight frameworks, better deployment mechanisms, and greater power efficiency.

Zach Shelby is the CEO of Edge Impulse, a company that makes a platform called Edge Impulse Studio. Edge Impulse Studio provides a UI for data collection, training, and device management. As someone creating a platform for edge machine learning usability, Zach was a great person to talk to about the state of edge machine learning and his work building a company in the space.

Software Daily
http://softwareengineeringdaily.com/2020/05/23/software-daily/
Sat, 23 May 2020

For the last five months, we have been working on a new version of Software Daily, the platform we built to host and present our content. We are creating a platform that integrates the podcast with a set of other features that make it easier to learn from the audio interviews. Software Daily includes the following:

Written questions and answers. After listening to a set of episodes about a topic, you may enjoy writing about those topics. It also helps us build up a written content base.

Company and topic postings. Do you have a new company or project that we should cover in the show? You can create a new topic, and we will look at it for coverage in the podcast.

A freeform “Write” feature to write about subjects you are learning about.

Jobs board. If you are looking for a software worker in a particular niche, post your job on Software Daily.

The world of software is large, and growing bigger every day. Software Daily is a place to explore this world of software companies and projects.

If the podcast is a useful resource for you to learn about software, then Software Daily might also provide you with value. This post (and episode) is a brief description of the features that we have built into Software Daily.

Premium RSS Feed with No Ads

If you want to listen to Software Engineering Daily without ads, you can become a paid subscriber, paying $10/month or $100/year by going to softwaredaily.com/subscribe. We now have an RSS feed that paid customers can add to a podcast player like Overcast (on iOS) or Podcast Addict (on Android). You can also listen to the premium episodes using our apps for iOS or Android.

Whether you are a listener who is fine with listening to ads, or you are a listener who pays to hear episodes without ads, we are happy to have you tuning in.

Listeners often want to find all our episodes on React, or Kubernetes, or serverless, or self-driving cars. We have been covering these topics for years, and much of the old content has retained its value. Software Daily allows you to easily find all the episodes relating to a subject that you are interested in.

Additionally, episode transcripts have interactive features with highlighting, commenting, and discussions. We want to create a Medium-like experience for the episodes.

Writers

Software Daily is a place where listeners can write about the topics they are listening to. When you are listening to lots of episodes about a topic such as GraphQL, you may find it useful to write about that topic as a form of active learning. The topic pages also have a Q&A section. Post questions about a topic, or post an answer. Engage in the community dialogue surrounding a topic you are passionate or curious about. If there is a topic you want to write about, check out softwaredaily.com/write.

We will be turning the best written content into short podcast episodes published on the weekends where we will read your contribution and mention your name. If you write something awesome, we want to turn it into audio for larger distribution.

Question and Answer

Every topic on Software Daily has a Q&A section. We have covered lots of niche software companies and open source projects, and on Software Daily we want to collect more information about the world of software with Q&A.

If you want to write about a specific company or topic that you heard about on Software Daily, Q&A is also an option. Our goal with Q&A is to provide a companion experience to listening to the podcast. It is not always easy to retain what you hear in a podcast episode. Answering some questions after you listen to an episode can help with that retention.

Jobs Board

Are you looking to hire someone specific in the world of software? Post a job on the Software Daily jobs board. We will be announcing some of these jobs on the podcast, especially the more interesting postings, and ones that align with content we are producing.

Thanks

We appreciate you tuning into Software Daily. We would welcome your feedback, and hope you take the time to check out SoftwareDaily.com.

RedwoodJS with Tom Preston-Werner
http://softwareengineeringdaily.com/2020/05/22/redwoodjs-with-tom-preston-werner/
Fri, 22 May 2020

Over the last 5 years, web development has matured considerably. React has become a standard for frontend component development. GraphQL has seen massive growth in adoption as a data fetching middleware layer. The hosting platforms have expanded beyond AWS and Heroku, to newer environments like Netlify and Vercel.

These changes are collectively known as the JAMStack, and they raise the question: how should an app be built today? Can a framework offer guidance for how the different layers of a JAMStack app should fit together?

RedwoodJS is a framework for building JAMStack applications. Tom Preston-Werner is one of the creators of RedwoodJS, as well as a co-founder of GitHub and Chatterbug, a language learning app. He joins the show to talk about the future of JAMStack development, and his goals for RedwoodJS.
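As a sketch of the kind of guidance Redwood offers, a “cell” is the framework’s convention for tying a GraphQL query to its loading, empty, failure, and success states; the `posts` model and fields here are hypothetical, and Redwood makes `gql` available to cells automatically.

```typescript
// web/src/components/BlogPostsCell/BlogPostsCell.tsx
// A Redwood "cell": export a query plus one component per render state,
// and the framework wires up the data fetching and state handling.
export const QUERY = gql`
  query BlogPostsQuery {
    posts {
      id
      title
    }
  }
`;

export const Loading = () => <div>Loading…</div>;

export const Empty = () => <div>No posts yet.</div>;

export const Failure = ({ error }: { error: Error }) => (
  <div>Error: {error.message}</div>
);

export const Success = ({ posts }: { posts: { id: number; title: string }[] }) => (
  <ul>
    {posts.map((post) => (
      <li key={post.id}>{post.title}</li>
    ))}
  </ul>
);
```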

ArcGIS: Geographic Information Software with Max Payson
http://softwareengineeringdaily.com/2020/05/21/arcgis-geographic-information-software-with-max-payson/
Thu, 21 May 2020

Geospatial analytics tools are used to render visualizations for a vast array of applications. Data sources such as satellites and cellular data can gather location data, and that data can be superimposed over a map. A map-based visualization can allow the end user to make decisions based on what they see.

ArcGIS is one of the most widely used geospatial analytics platforms. It is created by ESRI, the Environmental Systems Research Institute, which was started in 1969. Today, ESRI products have 40% of the global market share of geospatial analytics software.

Max Payson is a solutions engineer at ESRI, and he joins the show to talk about applications of ArcGIS, and the landscape of GIS more broadly.

RudderStack: Open Source Customer Data Infrastructure with Soumyadeb Mitra
http://softwareengineeringdaily.com/2020/05/20/rudderstack-open-source-customer-data-infrastructure-with-soumyadeb-mitra/
Wed, 20 May 2020

Customer data infrastructure is a type of tool for saving analytics and information about your customers. The company that is best known in this category is Segment, a very popular API company. This customer data is used for making all kinds of decisions around product roadmap, pricing, and design.

RudderStack is a company built around open source customer data infrastructure. RudderStack can be self-hosted, allowing users to deploy it to their own servers and manage their data however they please. Soumyadeb Mitra is the creator of RudderStack, and he joins the show to talk about the space of customer data infrastructure, and his own company.

Matterport 3-D Imaging with Japjit Tulsi
http://softwareengineeringdaily.com/2020/05/19/matterport-3-d-imaging-with-japjit-tulsi/
Tue, 19 May 2020

Matterport is a company that builds 3-D imaging for the inside of buildings, construction sites, and other locations that require a “digital twin.” Generating digital images of the insides of buildings has a broad spectrum of applications, and there are considerable engineering challenges in building such a system.

Matterport’s hardware stack involves a camera built in-house by the company. The camera can take 360-degree scans of a room, stitch the imagery together, and make the digital twin available on the cloud.

Japjit Tulsi works at Matterport, and he joins the show to discuss 3-D imaging, and his role as CTO of the company.

Frontend Performance with Anycart’s Rafael Sanches
http://softwareengineeringdaily.com/2020/05/18/frontend-performance-with-anycarts-rafael-sanches/
Mon, 18 May 2020

There are many bad recipe websites. Every time I navigate to one, it feels like my browser is filling up with spyware. The page loads slowly, everything seems broken, and I can feel the 25 different JavaScript adtech tags interrupting each other. Whether I am searching for banana bread or a spaghetti sauce recipe, recipe sites usually make me lose my appetite.

Anycart is a recipe platform that allows users to buy all of the ingredients for the recipe and have those ingredients delivered. It’s a vertically integrated content site and delivery system. It is also beautifully designed and extremely performant. I learned about it from Zack Bloom, who works at Cloudflare, as he mentioned it as a case study in performance.

Rafael Sanches is a founder of Anycart, and he joins the show to talk about building a recipe delivery service, and the performance innovations that were necessary to build it.

AWS Virtualization with Anthony Liguori
http://softwareengineeringdaily.com/2020/05/15/aws-virtualization-with-anthony-liguori/
Fri, 15 May 2020

Amazon’s virtual server instances have come a long way since the early days of EC2. There is now a wide variety of configuration options for spinning up an EC2 instance, which can be chosen based on the workload that will be scheduled onto the virtual machine. There are also Fargate containers and AWS Lambda functions, creating even more options for someone who wants to deploy virtualized infrastructure.

The high demand for virtual machines has led Amazon to move down the stack, designing custom hardware such as the Nitro security chip and low-level software such as the Firecracker virtual machine monitor. AWS has also built Outposts, which allow for on-prem usage of AWS infrastructure.

Anthony Liguori is an engineer at AWS who has worked on a range of virtualization infrastructure: software platforms, hypervisors, and hardware. Anthony joins the show to talk about virtualization at all levels of the stack.

An Introduction to API Management and NGINX
http://softwareengineeringdaily.com/2020/05/14/an-introduction-to-api-management-and-nginx/
Thu, 14 May 2020

This article is based on the content found in this episode of Software Engineering Daily. This episode features NGINX product manager Kevin Jones. All quotes from Jones can be found in this episode’s transcript.

Setting the Stage for API Management

The term “API” is, and has been for quite some time, ubiquitous within the context of computing. Today, web-based APIs are among the most prevalent kinds of APIs. Salesforce introduced the first web-based API in early 2000, which birthed the notion of Internet-as-a-service (IaaS).

Much has changed within the arena of web infrastructure over the past twenty years. These changes have influenced the body of thought surrounding APIs, as well as how APIs are operationalized. Jones notes “a rise in services, the amounts of services, and […] an increase in various protocols being used to communicate over the Internet” as high-level changes within the scope of web infrastructure.

In addition to changes within web infrastructure, the increased popularity and number of mobile devices has drastically increased web usage. The near-simultaneous rise of microservice-based applications led to even more communication over the Internet. Jones states the resulting effects concisely: “more devices, more connections, and more requests being processed throughout the internet.”

Microservice-based architecture also increases the number of APIs within a system. The initial wave of companies breaking apart their monoliths in order to extract services brought an inherent increase in APIs and requests being processed. Communication protocols are defined by each microservice; this is the microservice’s API. Microservices come in all different flavors. Their uses vary widely, environments may be different, needs for role-based access can change, and hardware may be located in different geographic locations.

Now, this is where the notion of API management becomes relevant. A common way to implement API management is by using NGINX, a popular and reliable reverse proxy. NGINX has an API management module that provides users with a control plane that sits on top of API gateways.

What is API Management?

API management is often discussed in tandem with API gateways. The two concepts are sometimes used interchangeably, though they aren’t synonymous. Technically, an API gateway is a reverse proxy that sits between an API and its consumers. API management refers to the process of maintaining a group of API gateways; an API management tool is the control plane for a collection of gateways.

Exploring the functionality of API gateways can help shed light on the benefits offered by an API management tool. These benefits include the ability to perform authentication, rate limiting, routing, and canary rollouts. NGINX can act as a reverse proxy and assume the role of an API gateway. Not only does NGINX offer fine-grained control over all of the aforementioned capabilities, but it offers many others as well. The configuration of NGINX is controlled by directives. Visit the NGINX directives documentation for a comprehensive list of NGINX’s potential use cases.
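As a rough illustration of those gateway duties expressed as NGINX directives (the upstream services, ports, and paths here are hypothetical), a gateway configuration sketch might look like this, placed inside the `http` context:

```nginx
# Hypothetical gateway configuration: per-client rate limiting plus routing
# to two backend services. Names and paths are illustrative only.
limit_req_zone $binary_remote_addr zone=api_limit:10m rate=10r/s;

upstream users_service    { server users:8080; }
upstream payments_service { server payments:8080; }

server {
    listen 80;
    server_name api.example.com;

    location /api/users/ {
        limit_req zone=api_limit burst=20 nodelay;
        proxy_pass http://users_service;
    }

    location /api/payments/ {
        limit_req zone=api_limit burst=20 nodelay;
        proxy_pass http://payments_service;
    }
}
```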

Note that Figure 5 only illustrates an overview of the benefits of an API gateway. Assuming a microservice-based architecture is being used in this hypothetical, the API endpoints shown would most likely each belong to a number of different microservices. To that end, there would most likely be more than one API gateway; this figure is not attempting to demonstrate an archetypal pattern for how an API gateway fits into a system’s architecture. If you’re interested in learning more about using an API gateway to help build a microservice-based application, check out this blog post from NGINX.

Figure 5 pictures a single-tier API gateway. A two-tier gateway pattern is often used to separate the responsibilities of security teams from those of SRE and DevOps teams. The idea behind this pattern is to separate high-level functionality, like security and access control, from service-dependent functionality, like routing. Jones notes another benefit of this pattern: “microservice[s] sitting behind that internal router gateway […] don’t have to go all the way back out through to the Internet and come into the DMZ again.”

There aren’t hard-and-fast rules describing the optimal API gateway pattern for a system. One could consider a service mesh as a system with an API gateway at each instance of a service, with sidecar proxies acting as gateways; this is where a tool like NGINX would be hosted. Jones notes the largest benefit of this pattern: “you can really have a fine-grained configuration all the way up into the container.” Check out this blog post from NGINX, if you’re interested in learning more about service meshes.

Takeaways

Software architecture is constantly changing. Different protocols are used to communicate between services, and the preferred protocols and formats, like RPC and JSON, may change. The environments a system’s API gateways run in may change as well. A popular API management tool is NGINX Controller. It manages these gateways in an infrastructure-agnostic way, removing the burden of maintaining each gateway in its particular environment.

International Consumer Credit Infrastructure with Brian Regan and Misha Esipov
http://softwareengineeringdaily.com/2020/05/14/international-consumer-credit-infrastructure-with-brian-regan-and-misha-esipov/
Thu, 14 May 2020

A credit score is a rating that allows someone to qualify for a line of credit, which could be a loan such as a mortgage, or a credit card. We are assigned a credit score based on a credit history, which could be related to work history, rental payments, or loan repayments.

One problem with the credit scoring system is that it is not internationalized. If I am coming from Brazil, I have the rental history of someone living in Brazil. That information does not naturally get ported over to the United States. There needs to be a system for translating a foreign credit history into a US credit history.

Nova Credit is a company that makes a credit passport: a system that allows users in one geographic location to use the credit history they have built up to obtain credit in another location, namely the United States. Brian Regan and Misha Esipov work at Nova Credit, and they join the show to talk about how the company works and the problem it solves.

Grapl: Graph-Based Detection and Response with Colin O’Brien
http://softwareengineeringdaily.com/2020/05/13/grapl-graph-based-detection-and-response-with-colin-obrien/
Wed, 13 May 2020

A large software company such as Dropbox is at a constant risk of security breaches. These security breaches can take the form of social engineering attacks, network breaches, and other malicious adversarial behavior. This behavior can be surfaced by analyzing collections of log data.

Log-based threat response is not a new technique. But how should those logs be analyzed? Grapl is a system for modeling log data as a graph, and analyzing that graph for threats based on how nodes in the graph have interacted. By building a graph from log data, Grapl can classify interaction patterns that correspond to threats.
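To make the idea concrete, here is a toy illustration (not Grapl’s actual API): build a parent-to-child process graph from log events, then flag an interaction pattern that is rarely legitimate.

```typescript
// Toy graph-based detection: model process-creation logs as a
// parent -> child graph, then flag a suspicious spawning pattern.
type ProcessEvent = { pid: number; parentPid: number; image: string };

function detectSuspiciousChildren(events: ProcessEvent[]): ProcessEvent[] {
  const byPid = new Map<number, ProcessEvent>();
  const children = new Map<number, ProcessEvent[]>();
  for (const e of events) {
    byPid.set(e.pid, e);
    const siblings = children.get(e.parentPid) ?? [];
    siblings.push(e);
    children.set(e.parentPid, siblings);
  }

  // Pattern: a word processor spawning a shell is rarely legitimate.
  const flagged: ProcessEvent[] = [];
  for (const [parentPid, spawned] of children) {
    const parent = byPid.get(parentPid);
    if (parent && parent.image.endsWith("winword.exe")) {
      flagged.push(...spawned.filter((c) => c.image.endsWith("cmd.exe")));
    }
  }
  return flagged;
}
```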

Colin O’Brien is the creator of Grapl, and he joins the show to discuss security, as well as threat detection and response.

Static Analysis for Infrastructure with Guy Eisenkot
http://softwareengineeringdaily.com/2020/05/12/static-analysis-for-infrastructure-with-guy-eisenkot/
Tue, 12 May 2020

Infrastructure-as-code tools are used to define the architecture of software systems. Common infrastructure-as-code tools include Terraform and AWS CloudFormation. When infrastructure is defined as code, we can use static analysis tools to analyze that code for configuration mistakes, just as we could analyze a programming language with traditional static analysis tools.

When a developer writes a program, that developer might use static analysis to parse a program for common mistakes–memory leaks, potential null pointers, and security holes. The concept of static analysis can be extended to infrastructure as code, allowing for the discovery of higher level problems such as insecure policies across cloud resources.
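A minimal sketch of this kind of check, assuming a CloudFormation-style template parsed as JSON: walk the resources and flag S3 buckets whose ACL grants public access. This illustrates the technique, not Bridgecrew’s implementation.

```typescript
// Static check over an infrastructure-as-code template (CloudFormation-style
// JSON), flagging S3 buckets configured with a public ACL.
type Template = {
  Resources: Record<string, { Type: string; Properties?: Record<string, unknown> }>;
};

function findPublicBuckets(template: Template): string[] {
  const violations: string[] = [];
  for (const [name, resource] of Object.entries(template.Resources)) {
    if (resource.Type !== "AWS::S3::Bucket") continue;
    const acl = resource.Properties?.["AccessControl"];
    if (acl === "PublicRead" || acl === "PublicReadWrite") {
      violations.push(name); // logical resource ID of the offending bucket
    }
  }
  return violations;
}
```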

Guy Eisenkot is an engineer with Bridgecrew, a company that makes static analysis tools for security and compliance. Guy joins the show to talk about cloud security and how static analysis can be used to improve the quality of infrastructure deployments.

Social Distancing Data with Ryan Fox Squire
http://softwareengineeringdaily.com/2020/05/11/social-distancing-data-with-ryan-fox-squire/
Mon, 11 May 2020

Social distancing has been imposed across the United States. We are running an experiment unlike anything before it in history, and it is likely to have a lasting impact on human behavior. By looking at location data of how people are moving around today, we can examine the real-world impacts of social distancing.

SafeGraph is a company that provides geospatial location data to be used by developers and researchers. Much of their data is aggregated from cell phone GPS pings which identify where anonymized users are in the world. This data set provides the basis for SafeGraph’s social distancing metrics, which measure how frequently people are coming into contact with one another.
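As a toy sketch of how such a metric might be computed (this is not SafeGraph’s methodology): treat two devices as having come into contact when they ping from the same grid cell within the same time window.

```typescript
// Toy contact metric from anonymized GPS pings. Grid cell size and time
// window are illustrative parameters, not SafeGraph's.
type Ping = { deviceId: string; lat: number; lon: number; timestamp: number };

function countContacts(
  pings: Ping[],
  cellDeg = 0.001,       // grid cell size in degrees (~100m of latitude)
  windowMs = 5 * 60_000  // 5-minute time windows
): number {
  // Bucket pings by (grid cell, time window), collecting distinct devices.
  const buckets = new Map<string, Set<string>>();
  for (const p of pings) {
    const key = [
      Math.floor(p.lat / cellDeg),
      Math.floor(p.lon / cellDeg),
      Math.floor(p.timestamp / windowMs),
    ].join(":");
    let devices = buckets.get(key);
    if (!devices) {
      devices = new Set();
      buckets.set(key, devices);
    }
    devices.add(p.deviceId);
  }

  let contacts = 0;
  for (const devices of buckets.values()) {
    const n = devices.size;
    contacts += (n * (n - 1)) / 2; // unordered pairs of distinct devices
  }
  return contacts;
}
```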

Ryan Fox Squire works at SafeGraph, and he returns to the show to discuss social distancing metrics and the research that has come out of studying these metrics.

Dropbox Engineering with Andrew Fong
http://softwareengineeringdaily.com/2020/05/08/dropbox-engineering-with-andrew-fong/
Fri, 08 May 2020

Dropbox is a consumer storage product with petabytes of data. Dropbox originally ran on the cloud, backed by S3. Once there was a high enough volume of data, Dropbox created its own data centers, designing hardware for the express purpose of storing user files.

Over the last 13 years, Dropbox has developed the hardware, software, networking, data center infrastructure, and operational procedures that make its cloud storage product best in class.

Andrew Fong has been an engineer at Dropbox for 8 years. He joins the show to talk about how the Dropbox engineering organization has changed over that period of time, and what he is doing at the company today.

Pravega: Storage for Streams with Flavio Junquiera
http://softwareengineeringdaily.com/2020/05/07/pravega-storage-for-streams-with-flavio-junquiera/
Thu, 07 May 2020

“Data stream” is a term that can be used in multiple ways. A stream can refer to data in motion or data at rest.

When a stream is data in motion, an endpoint is receiving new pieces of data on a continual basis. Each new data point is sent over the wire and captured by the other end. Another way a stream can be represented is as a sequence of events that have been written to a storage medium. This is a stream at rest.

Pravega is a system for storing large streams of data. Pravega can be used as an alternative to systems like Apache Kafka or Apache Pulsar. Flavio Junquiera is an engineer at Dell EMC who works on Pravega. He joins the show to talk about the history of stream processing and his work on Pravega.

Advanced Redis with Alvin Richards
http://softwareengineeringdaily.com/2020/05/06/advanced-redis-with-alvin-richards/
Wed, 06 May 2020

Redis is an in-memory object storage system that is commonly used as a cache for web applications. This core primitive of in-memory object storage has created a larger ecosystem encompassing a broad set of tools. Redis is also used for creating objects such as queues, streams, and probabilistic data structures.
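As a rough illustration of those core primitives, here is a minimal sketch using the redis-py client; the connection details, key names, and payloads are assumptions made up for this example.

    import json

    import redis

    # Connect to a local Redis instance (host and port are assumed defaults).
    r = redis.Redis(host="localhost", port=6379)

    # Cache pattern: store a serialized object with a TTL so stale entries expire.
    r.set("user:42:profile", json.dumps({"name": "Ada"}), ex=300)
    cached = r.get("user:42:profile")
    profile = json.loads(cached) if cached else None

    # Queue pattern: producers LPUSH work items; a consumer blocks on BRPOP.
    r.lpush("jobs", json.dumps({"task": "resize_image", "id": 7}))
    _queue, payload = r.brpop("jobs")
    job = json.loads(payload)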

Machine learning systems also need access to fast, in-memory object storage. RedisAI is a newer module for supporting machine learning tasks. For serverless computing, RedisGears allows for the execution of functions close to your Redis instance. RedisEdge allows for edge computing with Redis.

Alvin Richards returns to the show to discuss the expansion of Redis to becoming a broad suite of in-memory tools, as well as the resiliency properties of Redis and usage patterns for the tool. RedisLabs is a sponsor of Software Engineering Daily, and RedisConf is a virtual conference around Redis that runs May 12-13. If you are interested in Redis, you can check out RedisConf for free by going to RedisConf.com.

Multicloud MySQL with Jiten Vaidya and Anthony Yeh
http://softwareengineeringdaily.com/2020/05/05/multicloud-mysql-with-jiten-vaidya-and-anthony-yeh/
Tue, 05 May 2020

For many applications, a transactional MySQL database is the source of truth. To make a MySQL database scale, some developers deploy their database using Vitess, a sharding system built on top of Kubernetes.

Jiten Vaidya and Anthony Yeh work at PlanetScale, a company that focuses on building and supporting MySQL databases sharded with Vitess. Their experience comes from working at YouTube, which has a massive, rapidly growing database for storing the information about videos on the site. Sharding is not the only database problem that YouTube faced. Availability was another issue.

At YouTube, the database operators want YouTube’s MySQL cluster to be resilient to the failure of an entire data center. Similarly, a developer deploying an important MySQL database to the cloud wants their database to be resilient to the failure of an entire cloud provider. Jiten and Anthony join the show to talk about their work building multicloud support for MySQL, and their process of deploying a consistent MySQL database in Azure, GCP, and AWS.

Isolation with Courtland Allen and Anurag Goel
http://softwareengineeringdaily.com/2020/05/04/isolation-with-courtland-allen-and-anurag-goel/
Mon, 04 May 2020

We are all living in social isolation due to the quarantine from COVID-19. Isolation is changing our habits and our moods, ravaging the economy, and changing how we work. One positive change is that more people have been reconnecting with their friends and family over frequent calls and video chats.

Isolation is not a normal way for humans to live. We are social animals, and we need social interaction. We’ve changed how we use Internet products. There has been an evolution of trends in online shopping, social networking, and video communication software.

Courtland Allen is the founder of Indie Hackers and Anurag Goel is the founder of Render, a new cloud provider. Both Courtland and Anurag are friends of mine, and join this episode to talk about how their lives are changing as a result of social isolation.

Data Lakehouse with Michael Armbrust
http://softwareengineeringdaily.com/2020/05/01/data-lakehouse-with-michael-armbrust/
Fri, 01 May 2020

A data warehouse is a system for performing fast queries on large amounts of data. A data lake is a system for storing high volumes of data in a format that is slow to access. A typical workflow for a data engineer is to pull data sets from this slow data lake storage into the data warehouse for faster querying.

Apache Spark is a system for fast processing of data across distributed datasets. Spark is not usually thought of as a data warehouse technology, but it can be used to fulfill some of the same responsibilities. Delta is an open source storage layer that sits on top of a data lake. Delta integrates closely with Spark, creating a system that Databricks refers to as a “data lakehouse.”

Michael Armbrust is an engineer with Databricks. He joins the show to talk about his experience building the company, and his perspective on data engineering, as well as his work on Delta, the storage system built for the Spark ecosystem.

Components of Modern Data Pipelines
http://softwareengineeringdaily.com/2020/04/30/components-of-modern-data-pipelines/
Thu, 30 Apr 2020

Figure 1

Data flows to and from systems through data pipelines. The motivations for data pipelines include the decoupling of systems, avoidance of performance hits where the data is being captured, and the ability to combine data from different systems. Pipelines are also well-suited to help organizations train, deploy, and analyze machine learning models. Figure 1 provides a high-level overview of how a machine learning pipeline may be architected.

This article does not take a deep dive into the machine learning side of data pipelines. If you’re interested in learning about data engineering’s intersection with machine learning, check out this episode of Software Engineering Daily. This episode features two Stripe engineers, Rob Story and Kelly Rivoire, working on machine learning infrastructure, as well as an API they built for machine learning workloads.

Without systematized pipelines, a number of issues may arise. For example, systems with many data sources may suffer from increased complexity, especially as the number of data sources increases and the types of data change. Data veracity is another concern; well-built pipelines ease the process of extracting, transforming, and loading (ETL) data consistently.

Data pipelines can integrate with a number of systems, including visualization tools and third party software, such as Salesforce. Moving data into these kinds of systems is commonly one of the last steps in a data pipeline. More common is the movement of data into a data warehouse, lake, or mart. All of these data stores serve different purposes, each of which is explored later in this article.

Data processing systems sit on top of these data stores and transportation systems. The two main categories of data processing systems are online transaction processing (OLTP) and online analytical processing (OLAP). Each serves an important role within a system. As Starburst CEO Justin Borgman notes in this episode of Software Engineering Daily, “You’re always going to need your OLTP operational style system to serve your application.” Starburst offers its own version of Presto, a popular open source OLAP query engine that originated at Facebook.

Figure 2

Many of the topics covered in this article are better explained with examples. The notion of a data pipeline encompasses an end-to-end system. As such, it is useful to carry one consistent hypothetical through the different components of the larger system. From this point forward, we’ll refer to a hypothetical startup called “Claire’s Cakes,” an on-demand cake delivery service.

OLTP & OLAP

Online transaction processing (OLTP) and online analytical processing (OLAP) are the two primary ways of characterizing data processing systems. Don’t be fooled by the similarity of these acronyms: OLTP and OLAP systems are very different from one another.

Think of OLTP systems as managing an organization’s day-to-day transactions; the system’s CRUD operations and database are optimized for transactional throughput. Some common examples of OLTP databases are MySQL, PostgreSQL, and MemSQL. If you’re interested in learning more about the technology backing common OLTP systems, check out these episodes of Software Engineering Daily.

OLAP systems lie on the other side of the coin. Rather than capturing and persisting data, OLAP systems help an organization generate insights about data. The arena of OLAP encompasses various types of OLAP such as relational (ROLAP), multidimensional (MOLAP), and hybrid (HOLAP). Check out these episodes of Software Engineering Daily to learn more about OLAP technology.

Figure 3

Let’s examine OLTP more closely. Typically, discussions about OLTP center on databases. OLTP databases process transactions. In the context of a data pipeline, these transactions are the events that affect an organization’s business applications. Building off of this point, it’s often useful for business applications to access records by row. So, OLTP systems often make use of row-oriented SQL databases.

To understand why row-based access is more useful for business applications, let’s refer to a hypothetical series of events caused by one of the customers of Claire’s Cakes. The customer places an order for two cakes, adds a third cake after placing the initial order, updates the drop-off location, and rates the cake courier’s service after receiving his order. Accessing the database fields in a columnar fashion would be inefficient; all of the fields associated with this customer are stored in a single row. So, row-based access is preferable in OLTP databases.
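To make this concrete, here is a minimal sketch of row-oriented OLTP access using Python’s built-in sqlite3 module; the schema and values are hypothetical stand-ins for the Claire’s Cakes order store.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE orders (
               id INTEGER PRIMARY KEY,
               customer TEXT,
               cakes INTEGER,
               dropoff TEXT,
               rating INTEGER
           )"""
    )

    # Each event in the customer's story reads or writes a single row.
    conn.execute(
        "INSERT INTO orders (customer, cakes, dropoff) VALUES (?, ?, ?)",
        ("Bob", 2, "12 Main St"),
    )
    conn.execute("UPDATE orders SET cakes = 3 WHERE customer = ?", ("Bob",))
    conn.execute(
        "UPDATE orders SET dropoff = ? WHERE customer = ?", ("14 Oak Ave", "Bob")
    )
    conn.execute("UPDATE orders SET rating = 5 WHERE customer = ?", ("Bob",))
    conn.commit()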

In addition to row-based access, OLTP databases have a number of other requirements. Former Uber software engineer Zhenxiao Luo states “reliability is more important” for OLTP systems than OLAP systems because OLTP systems “serve online data, so performance has a really real-time requirement.” Luo’s full interview on Software Engineering Daily provides an end-to-end overview of Uber’s data platform, including discussions of both OLTP and OLAP systems at Uber.

Figure 4

Let’s look at the other side of the coin: OLAP systems. At a high level, OLAP systems are responsible for the parts of a data pipeline following initial storage in an OLTP database. Figure 3 illustrates the overarching theme in an OLAP system: data is taken from disparate data sources and loaded into a new kind of data store. This process is known as extract, transform, load (ETL), or extract, load, transform (ELT), in some cases. Data is typically moved to a data warehouse, lake, or mart. From here, data is queried to derive analytics.
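To ground those steps, here is a minimal ETL sketch in Python; the table names, columns, and cleanup rules are all hypothetical.

    import sqlite3

    def extract(oltp: sqlite3.Connection):
        # Pull raw order rows out of the transactional store.
        return oltp.execute("SELECT customer, cakes FROM orders").fetchall()

    def transform(rows):
        # Conform the rows to the warehouse schema: normalize names and
        # drop malformed records before they are loaded.
        return [
            (customer.strip().lower(), cakes)
            for customer, cakes in rows
            if cakes is not None and cakes > 0
        ]

    def load(warehouse: sqlite3.Connection, rows):
        # Write the conformed rows into the analytics store.
        warehouse.executemany(
            "INSERT INTO orders_fact (customer, cakes) VALUES (?, ?)", rows
        )
        warehouse.commit()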

OLAP systems can be divided into two large categories: relational online analytical processing (ROLAP) and multidimensional online analytical processing (MOLAP). Each of these subcategories could be an article in and of itself; we will not dive deeply into them here. Essentially, ROLAP is OLAP applied to relational databases, and MOLAP is OLAP applied to multidimensional databases. If you’re interested in learning more about the differences between these flavors of OLAP systems, check out this article.

The differences between OLAP and OLTP systems are most prominent when the MOLAP flavor is considered. The defining characteristics of MOLAP are the storage of massive amounts of data, the ability to optimize ad hoc queries, the prioritization of response time over transaction throughput, and a column-oriented storage model. Clearly, these stand in contrast to the canonical OLTP system.

Let’s examine why a column-oriented storage model is beneficial for OLAP systems. Suppose a data scientist employed by Claire’s Cakes wants to calculate the daily total number of cakes ordered over the past week. He or she would want an efficient way of grabbing a particular field across all orders placed: the number of cakes ordered. Querying a single field across unrelated entities highlights the benefit of the columnar storage model: aggregation is more efficient. As George Fraser stated in this episode of Software Engineering Daily, “[…]column stores actually go way beyond just the file format. Every level of the database is implemented in different, and in many cases opposite, ways.”
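A toy comparison of the two layouts makes the difference visible; the records below are made up.

    # Row-oriented layout: each record is stored as one unit, so the
    # aggregation walks every field of every row to read one column.
    rows = [
        ("bob", 2, "12 Main St"),
        ("ann", 5, "9 Oak Ave"),
        ("eve", 1, "3 Elm Rd"),
    ]
    total_from_rows = sum(order[1] for order in rows)

    # Column-oriented layout: each field is stored contiguously, so the same
    # aggregation scans exactly one tightly packed array.
    columns = {
        "customer": ["bob", "ann", "eve"],
        "cakes": [2, 5, 1],
        "dropoff": ["12 Main St", "9 Oak Ave", "3 Elm Rd"],
    }
    total_from_columns = sum(columns["cakes"])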

Data Movement and Where Data Lives

Figure 5

The canonical way to move data from one data store to another is an ETL process. ETL is an acronym for “extract, transform, load.” George Fraser, the CEO of Fivetran, a company that builds data connectors, explains that ETL “just refers to the nitty-gritty of the process of getting data from a source […] into a database.”

An ETL process loads data into a kind of data store known as a data warehouse. Data warehouses have structured data; there is an exact schema for data loaded into a data warehouse. Not only does a schema improve how efficiently queries are served, but it also provides a level of consistency that’s able to serve the needs of employees across an organization.

Another common data movement pattern is ELT: extract, load, transform. Rather than having structure applied before it is loaded, the data is simply dumped into a data store. This kind of data store is known as a data lake. As one may imagine, loading data into a data lake is easier than loading it into a data warehouse: the data doesn’t need to be transformed before the loading step. However, this comes at a price: a schema must be applied to the data at query time.
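A small sketch shows the tradeoff: loading is trivial, but a schema must be imposed on every read. The records here are hypothetical.

    import json

    # ELT: raw events are dumped into the "lake" untransformed.
    lake = [
        '{"customer": "bob", "cakes": 2}',
        '{"customer": "ann", "cakes": "5"}',  # an inconsistent type survives the load
    ]

    def total_cakes(raw_records):
        # Schema-on-read: parsing and type coercion happen at query time,
        # on every query, instead of once at load time.
        total = 0
        for blob in raw_records:
            record = json.loads(blob)
            total += int(record.get("cakes", 0))
        return total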

Figure 6

Whether a pipeline uses ETL or ELT, there are two main processing paradigms in the world of data movement: batch and stream processing. A breakdown of batch and stream processing systems could be a separate article entirely. Batch processing performs ETL on chunks of data that have already been stored, whereas stream processing performs ETL on data that has yet to be persisted. Popular stream processing frameworks include Kafka, Flink, Samza, and Storm.

Batch processing can yield near-real-time analytics. However, near real time does not fit all business use cases; stream processing is often used when true real-time analytics are required. If you’re interested in learning more about the differences between batch and stream processing, check out these related articles and podcasts.
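The contrast can be sketched in a few lines of Python; both functions and their inputs are hypothetical.

    def batch_job(stored_chunks):
        # Batch: transform data that has already been persisted, typically
        # on a schedule (e.g., yesterday's files in the lake).
        total = 0
        for chunk in stored_chunks:
            total += sum(int(record["cakes"]) for record in chunk)
        return total

    def stream_job(event_source):
        # Stream: transform each event as it arrives, before it is
        # persisted (event_source could wrap a Kafka consumer, for example).
        running_total = 0
        for event in event_source:
            running_total += int(event["cakes"])
            yield running_total  # analytics are available continuously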

JAMStack Content Management with Scott Gallant, Jordan Patterson, and Nolan Phillips
http://softwareengineeringdaily.com/2020/04/30/jamstack-content-management-with-scott-gallant-jordan-patterson-and-nolan-phillips/
Thu, 30 Apr 2020

A content management system (CMS) defines how the content on a website is arranged and presented. The most widely used CMS is WordPress, the open source tool that is written in PHP. A large percentage of the web consists of WordPress sites, and WordPress has a huge ecosystem of plugins and templates.

Despite the success of WordPress, the JAMStack represents the future of web development. JAM stands for JavaScript, APIs, and Markup. In contrast to the monolithic WordPress deployments, a JAMStack site consists of loosely coupled components. And there are numerous options for a CMS in this environment.

TinaCMS is one such option. TinaCMS is an acronym for “Tina Is Not A CMS”, and it is a toolkit for content management. Scott Gallant, Jordan Patterson, and Nolan Phillips work on TinaCMS, and they join the show to explore the topic of content management on the JAMStack.

LinkedIn: Building with agility for our members in times of need
http://softwareengineeringdaily.com/2020/04/29/linkedin-building-with-agility-for-our-members-in-times-of-need/
Wed, 29 Apr 2020

by Maria Zhang. Reposted from LinkedIn.

The needs of our members and customers have evolved rapidly in recent weeks, and we’ve been focused on evolving alongside them to meet their needs to the best of our ability. Recently, we introduced the ability for healthcare and essential organizations that have urgent hiring needs to post jobs for free on LinkedIn, and unlocked free learning paths on LinkedIn Learning to help people navigate the shift to remote work. Both updates were in service of providing our members access to tools and skills to help them navigate these uncertain macroeconomic times.

With talent needs and the way we work changing rapidly, we needed to be agile and develop these two new features in a condensed timeline while maintaining our high standards in areas like security and intuitive design.

While we’re still learning and adapting with each new day, there are several best practices that our engineering teams have embraced to help us rapidly execute these recent offerings.

Leverage where possible

It can be natural to try to tackle a new problem with new tricks, but when time is a critical factor, sometimes the best path forward involves leveraging solutions you’ve already found. Building the workflow that allows companies to post free jobs wasn’t as simple as flipping a switch. It involved making adjustments to the infrastructure used for job postings, creation, billing, and our recommendation systems. Since there was no complete framework already in place to offer free jobs on LinkedIn, the team came up with a solution that would allow us to build and launch within two weeks.

We realized the simplest way to make these critical jobs available faster was to modify our existing billing and relevance systems to allow for a $0 budget for posting a job. Leveraging other existing structures also helped ensure these jobs would go through the same anti-abuse heuristics and AI as regular, paid postings.

We also wanted to make sure that the posting workflow for free jobs was easy to understand, as we didn’t have the time to create a new interactive guide. Instead, we used our existing popup and page banner infrastructure, which are normally used for notifications, to provide guidance for members and customers on how to successfully post their free job. We would have preferred to create a more customized support system, but using this existing resource allowed us to put the product in members’ hands more quickly.

Bring together a “tiger team”

Challenging times often bring out the best in us, including a desire to help solve challenges for our teammates, neighbors, and friends. To help tackle these projects, we were fortunate to have employees from several cross-functional teams raise their hands to create a focused tiger team to get these updates built and provided to members quickly.

To guide their work, the tiger team for the free job postings project identified three pillars that would govern their work:

Leverage existing infrastructure where possible;

Create designs that could be quickly implemented;

Maintain a good product experience, as well as a high security and trust standard.

These pillars allowed the team to stay focused and to leverage both the existing technical stacks and the product design, with the emphasis on our top priority: trust. They became our true north for choosing the best approaches when building effective solutions quickly. As a fully remote operation, this focused team needed to maintain constant contact using collaboration tools, working together to quickly resolve any blockers. The decision-making process, alongside the pillars above, had been fine-tuned to facilitate fast execution.

Embrace “hacks” to accelerate

Craftsmanship is one of our core values as an engineering team, but sometimes, you need to be willing to “hack” a sound solution, even if it is not yet perfected, when prioritizing time to market. Of course, this tradeoff should be treated as an exception, not a rule, in order to avoid accruing technical debt. Our team weighed the amount of time it would take to write new code and build out a scalable platform from scratch alongside our current member needs, and ultimately decided that this approach was warranted under the circumstances. In the case of rolling out free courses on LinkedIn Learning, getting creative about piecing together existing code helped us achieve the desired end result on a condensed timeline.

The main challenge facing the LinkedIn Learning team was making the courses accessible for free and blocking the calls to action to subscribe or purchase individual courses that are normally woven into the system for guests (users who are not logged into the platform) and non-paid users. The team quickly evaluated the existing architecture, assessed risks, got buy-in from the right stakeholders, and then implemented a hack that leveraged code from a previous project that would enable us to move quickly in our goal of providing free learning tools to members during this time. This hack consisted of changes that had been deployed to codebases for both logged-in and guest experiences as a way to block upsells from appearing on courses in these learning paths. Using this hack enabled us to optimize for time-to-market, while maintaining our standards for trust, safety, and member experience. The team is now building a tool that would allow content and marketing teams to remove the call to action to subscribe or purchase on unlocked courses in bulk in special circumstances.

Takeaway

The daily changing dynamics around the world right now mean our members’ needs—and how we can best serve them as a platform—are constantly evolving. It has required us to be agile and quickly engineer solutions that would otherwise have taken longer to bring to life, while reminding us that in unprecedented times, flexibility, teamwork, and agility are paramount.

Prefect Dataflow Scheduler with Jeremiah Lowin
http://softwareengineeringdaily.com/2020/04/29/prefect-dataflow-scheduler-with-jeremiah-lowin/
Wed, 29 Apr 2020

A data workflow scheduler is a tool used for connecting multiple systems together in order to build pipelines for processing data. A data pipeline might include a Hadoop task for ETL, a Spark task for stream processing, and a TensorFlow task to train a machine learning model.

The workflow scheduler manages the tasks in that data pipeline and the logical flow between them. Airflow is a popular data workflow scheduler that was originally created at Airbnb. Since then, the project has been adopted by numerous companies that need workflow orchestration for their data pipelines. Jeremiah Lowin was a core committer to Airflow for several years before he identified several features of Airflow that he wanted to change.

Prefect is a dataflow scheduler that was born out of Jeremiah’s experience working with Airflow. Prefect’s features include data sharing between tasks, task parameterization, and a different API than Airflow. Jeremiah joins the show to discuss Prefect, and how his experience with Airflow led to his current work in dataflow scheduling.
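For a flavor of that API, here is a minimal sketch written against the Prefect 1.x interface that was current around the time of this episode; the flow name, task bodies, and parameter values are placeholder assumptions.

    from prefect import Flow, Parameter, task

    @task
    def extract(url):
        # Placeholder for fetching raw records from `url`.
        return [1, 2, 3]

    @task
    def transform(records):
        return [r * 10 for r in records]

    @task
    def load(records):
        print(f"loading {records}")

    # Tasks share data directly through return values, and Parameter makes
    # the same flow reusable across inputs.
    with Flow("etl") as flow:
        url = Parameter("url", default="https://example.com/data")
        load(transform(extract(url)))

    flow.run()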

CockroachDB with Peter Mattis
http://softwareengineeringdaily.com/2020/04/28/cockroachdb-with-peter-mattis/
Tue, 28 Apr 2020

A relational database often holds critical operational data for a company, including user names and financial information. Since this data is so important, a relational database must be architected to avoid data loss.

Relational databases need to be distributed systems in order to provide the fault tolerance necessary for production use cases. If a database node goes down, the database must be able to recover smoothly without data loss, and this requires having all of the data in the database replicated beyond a single node.

If you write to a distributed transactional database, that write must propagate to each of the other nodes in the database. If you read from a distributed database, that read must return the same data that any other database reader would see. These constraints can be satisfied differently depending on the design of the database system. As a result, there is a vast market of distributed databases from cloud providers and software vendors.
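As a conceptual illustration of why both constraints can be satisfied, here is a toy majority-quorum store in Python. To be clear, this is not how CockroachDB is implemented (CockroachDB builds on Raft consensus); every name here is hypothetical.

    import random

    class Replica:
        def __init__(self):
            self.version = 0
            self.value = None

    class QuorumStore:
        # A write lands on a majority of replicas; a read consults a
        # majority. Any two majorities overlap, so every reader sees the
        # most recent write.

        def __init__(self, n=5):
            self.replicas = [Replica() for _ in range(n)]
            self.quorum = n // 2 + 1

        def write(self, value, version):
            for replica in random.sample(self.replicas, self.quorum):
                replica.version, replica.value = version, value

        def read(self):
            sampled = random.sample(self.replicas, self.quorum)
            return max(sampled, key=lambda r: r.version).value

    store = QuorumStore(n=5)
    store.write("balance=100", version=1)
    assert store.read() == "balance=100"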

CockroachDB is an open source, globally consistent relational database. CockroachDB is heavily informed by Google Spanner, the relational database that Google uses for much of its transactional workloads. Peter Mattis is a co-founder of CockroachDB, and he joins the show to discuss the architecture of CockroachDB, the process of building a business around a database, and his memories working on distributed systems at Google. Full disclosure: CockroachDB is a sponsor of Software Engineering Daily.

Dask: Scalable Python with Matthew Rocklin
http://softwareengineeringdaily.com/2020/04/27/dask-scalable-python-with-matthew-rocklin/
Mon, 27 Apr 2020

Python is the most widely used language for data science, and there are several libraries that are commonly used by Python data scientists including Numpy, Pandas, and scikit-learn. These libraries improve the user experience of a Python data scientist by giving them access to high level APIs.

Data science is often performed over huge datasets, and the data structures that are instantiated with those datasets need to be spread across multiple machines. To manage large distributed datasets, a library such as scikit-learn can use a system called Dask. Dask allows the instantiation of data structures such as a Dask dataframe or a Dask array.
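As a brief sketch of what that looks like from the user’s side (the file path and column names below are made up):

    import dask.dataframe as dd

    # A Dask dataframe partitions one logical table across many pandas
    # dataframes, which can be spread over the machines in a cluster.
    df = dd.read_csv("s3://example-bucket/orders-*.csv")

    # Operations build a lazy task graph; .compute() executes it in parallel.
    daily_totals = df.groupby("order_date")["amount"].sum().compute()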

Matthew Rocklin is the creator of Dask. He joins the show to talk about distributed computing with Dask, its use cases, and the Python ecosystem. He also provides a detailed comparison between Dask and Spark, which is also used for distributed data science.

Rasa: Conversational AI with Tom Bocklisch
http://softwareengineeringdaily.com/2020/04/24/rasa-conversational-ai-with-tom-bocklisch/
Fri, 24 Apr 2020

Chatbots became widely popular around 2016 with the growth of chat platforms like Slack and voice interfaces such as Amazon Alexa. As chatbots came into use, so did the infrastructure that enabled chatbots. NLP APIs and complete chatbot frameworks came out to make it easier for people to build chatbots.

The first suite of chatbot frameworks were largely built around rule-based state machine systems. These systems work well for a narrow set of use cases, but fall over when it comes to chatbot models that are more complex. Rasa was started in 2015, amidst the chatbot fever.

Since then, Rasa has developed a system that allows a chatbot developer to train their bot through a system called interactive learning. With interactive learning, I can deploy my bot, spend some time talking to it, and give that bot labeled feedback on its interactions with me. Rasa has open source tools for natural language understanding, dialogue management, and other components needed by a chatbot developer.

Tom Bocklisch works at Rasa, and he joins the show to give some background on the field of chatbots and how Rasa has evolved over time.

Cloudburst: Stateful Functions-as-a-Service with Vikram Sreekanti
http://softwareengineeringdaily.com/2020/04/23/cloudburst-stateful-functions-as-a-service-with-vikram-sreekanti/
Thu, 23 Apr 2020

Serverless computing is a way of designing applications that do not directly address or deploy application code to servers. Serverless applications are composed of stateless functions-as-a-service and stateful data storage systems such as Redis or DynamoDB.

Serverless applications allow the entire architecture to scale up and down, because each component is naturally scalable. This pattern can be used to create a wide variety of applications: the functions-as-a-service handle the compute logic, and the data storage systems handle the storage. But these applications do not give the developer as much flexibility as an ideal serverless system might, because the developer needs to use cloud-specific state management systems.

Vikram Sreekanti is the creator of Cloudburst, a system for stateful functions as a service. Cloudburst is architected as a set of VMs that can execute functions-as-a-service that are scheduled onto them. Each VM can utilize a local cache, as well as an autoscaling key-value store called Anna which is accessible to the Cloudburst runtime components. Vikram joins the show to talk about serverless computing and his efforts to build stateful serverless functionality.
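The following sketch illustrates the general stateful functions-as-a-service pattern described here. It is a conceptual toy, not Cloudburst’s actual API; every name in it is hypothetical.

    # Stands in for the autoscaling key-value store (the role Anna plays).
    shared_kv = {}

    class FunctionExecutor:
        # One executor per VM, pairing function execution with a local cache.

        def __init__(self):
            self.local_cache = {}  # cuts round trips to the shared store

        def get(self, key):
            if key not in self.local_cache:
                self.local_cache[key] = shared_kv.get(key)
            return self.local_cache[key]

        def put(self, key, value):
            # Write through so other executors can eventually observe it.
            self.local_cache[key] = value
            shared_kv[key] = value

        def invoke(self, fn, *args):
            return fn(self, *args)

    def increment_counter(ctx, key):
        count = (ctx.get(key) or 0) + 1
        ctx.put(key, count)
        return count

    executor = FunctionExecutor()
    executor.invoke(increment_counter, "page_views")  # -> 1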

Heroku CI/CD
http://softwareengineeringdaily.com/2020/04/22/heroku-ci-cd/
Wed, 22 Apr 2020

Heroku Flow

You’ve written great code for an application. Now what? Depending on the size and structure of your organization, testing, branching, and deployment may be handled in different ways. In a large organization with highly specialized departments, developers may hand off new features or bug fixes to other teams to handle integration and deployment. Smaller teams may rely more on the developers to handle operations processes or testing. Perhaps you’re an army of one; in that case, you’re responsible for the whole pipeline to get your code from your local machine to production. Regardless of scope, application deployment can be a complex process with several moving parts. To manage the developer-to-user pipeline, teams often use a workflow known as “CI/CD,” which typically refers to “Continuous Integration/Continuous Delivery.” CI/CD workflows grew out of the application of Agile development practices to integration and delivery processes.

“Continuous Integration” and “Continuous Delivery” can mean something different to everyone, but each has broadly accepted defining features. Continuous Integration refers to a development practice where changes to a codebase are pushed to a version control repository and merged into the master branch frequently. Typically, these changes trigger testing to verify integration, though tests are not strictly part of the definition of CI. Continuous Delivery is a logical next step after Continuous Integration: a delivery model involving frequent builds, short release cycles, and automated processes. Like CI, CD workflows frequently feature automated testing that ensures the changes being pushed work properly before being released.

Heroku Flow is a continuous delivery solution available as part of Heroku, a platform-as-a-service provider. Heroku Flow incorporates several features of the Heroku system into an integrated process. Heroku is designed to be a developer-focused product, so individual elements of Heroku Flow were built to solve specific problems faced by internal Heroku developers, or users of the platform. In our interview with Andy Appleton, an engineer on the Heroku Flow team, he described how CI/CD tooling was a natural outgrowth of the Heroku platform, but that the overall evolution of Flow was a piece-by-piece process that was later integrated into an end-to-end toolset.

“The way that our team likes to work is we picked one feature that we could deliver in a fairly short amount of time, which was the automatic deployment piece at the beginning. Then we think to ourselves what feels like it’s missing now. Where is the pain point? Then we build review apps. Then once we’ve got review apps, then you start saying, ‘Well, there’s no formal definition of staging or production.’ Then we start to think about pipelines and it sort of rolls on one thing to the next to the next.”

Even though Heroku Flow was built in an evolutionary process, the Heroku team has organized the various functions created for CI/CD into a structured deployment workflow. Overall, Heroku is an opinionated, convention-over-configuration platform, which uses plug-and-play components to abstract away some of the lower-level infrastructure tasks. Heroku Flow fits within that design philosophy by building visual tools to manage some of the more abstract or complex stages of a CI/CD workflow. For example, Heroku Pipelines takes the multi-environment testing and deployment process and makes it visual. The environment feels similar to a Kanban board system like Jira or Trello, which are familiar technologies for developers working in an Agile environment.

Pipelines also work with Heroku CI, a “low-configuration test runner” that allows easy automated testing. Moving code from one stage to another can be done manually or automatically, and tests can be applied at any stage. Heroku CI uses “disposable apps” that run quickly and have a strong test/prod parity since Heroku has access to the production environment configuration information. Developers can also use Heroku ChatOps to move code through pipelines via a Slack integration, and receive Slack notifications about pull requests, test results, and merges.

Heroku Flow provides Review Apps, which are temporary test apps spun up for every opened pull request on GitHub. Review Apps provide a convenient way to allow any user in an organization to take a “test drive” of the code in a given pull request, which is especially helpful for non-technical users who may not be comfortable running a branch in their local environment. Review Apps last for the duration of the pull request and are destroyed when the code is merged. If a developer makes a new commit on the feature branch in the pull request, the Review App will be automatically updated allowing for quick iteration and rapid feedback cycles.

Heroku Flow puts Release Phase tasks directly in an app’s Dashboard, including database migration, uploading assets to a CDN, cache management, and more. Heroku already allows developers to manage tools such as Postgres databases with a few commands in the Dashboard or the Heroku CLI, and this philosophy of simplicity and streamlining carries over to release and deployment. These tasks can be run automatically, ensuring high-quality and reliable releases.

A continuous integration and delivery pipeline offers a number of advantages for a development team, including accelerated time to market, dependable releases, and improved developer productivity. Heroku Flow provides intuitive and effective tooling for a CI/CD workflow and is definitely worth a look for developers looking to capture the benefits of CI/CD. For more on Heroku Flow, check out our interview with Andy Appleton of Heroku, an engineer on the Flow team. For more on Heroku in general, check out our Heroku archives or visit their website.

NGINX API Management with Kevin Jones
http://softwareengineeringdaily.com/2020/04/22/nginx-api-management-with-kevin-jones/
Wed, 22 Apr 2020

NGINX is a web server that can be used to manage the APIs across an organization. Managing these APIs involves deciding on the routing and load balancing across the servers which host them. If the traffic of a website suddenly spikes, the website needs to spin up new replica servers and update the API gateway to route traffic to those new replicas.

Some servers should not be accessible to outside traffic, and policy management is used to configure the security policies of different APIs. And as a company grows, the number of APIs also grows, increasing the complexity of managing routing logic and policies.

Kevin Jones is a product manager with NGINX. He joins the show to discuss how API management has changed with the growth of cloud and mobile, and how NGINX has evolved over that period of time. Full disclosure: NGINX is a sponsor of Software Engineering Daily.

Frontend Monitoring with Matt Arbesfeld
http://softwareengineeringdaily.com/2020/04/21/frontend-monitoring-with-matt-arbesfeld/
Tue, 21 Apr 2020

Web development has historically had more work being done on the server than on the client. The observability tooling has reflected this emphasis on the backend. Monitoring tools for log management and backend metrics have existed for decades, helping developers debug their server infrastructure.

Today, web frontends have more work to do. Rich components in frameworks such as React and Angular might respond quickly without waiting for a network request, with their mutations being processed entirely in the browser. This results in better user experiences, but more work is being done on the client side, away from the backend observability tools.

Matt Arbesfeld is a co-founder of LogRocket, a tool that records and plays back browser sessions and allows engineers to look at those sessions to understand what kinds of issues are occurring in the user’s browser. Matt joins the show to talk about the field of frontend monitoring, and the engineering behind his company LogRocket.

Zoom Vulnerabilities with Patrick Wardle
http://softwareengineeringdaily.com/2020/04/20/zoom-vulnerabilities-with-patrick-wardle/
Mon, 20 Apr 2020

Zoom video chat has become an indispensable part of our lives. In a crowded market of video conferencing apps, Zoom managed to build a product that performs better than the competition, scaling with high quality to hundreds of meeting participants, and millions of concurrent users.

Zoom’s rapid growth in user adoption came from its focus on user experience and video call quality. This focus on product quality came at some cost to security quality. As our entire digital world has moved onto Zoom, the engineering community has been scrutinizing Zoom more closely, and discovered several places where the security practices of Zoom are lacking.

Patrick Wardle is an engineer with a strong understanding of Apple products. He recently wrote about several vulnerabilities he discovered on Zoom, and joins the show to talk about the security of large client-side Mac applications as well as the specific vulnerabilities of Zoom.

Facebook OpenStreetMap Engineering with Saurav Mohapatra and Jacob Wasserman
http://softwareengineeringdaily.com/2020/04/17/facebook-openstreetmap-engineering-with-saurav-mapatra-and-jacob-wasserman/
Fri, 17 Apr 2020 09:00:16 +0000http://softwareengineeringdaily.com/?p=9180Facebook applications use maps for showing users where to go. These maps can display businesses, roads, and event locations. Understanding the geographical world is also important for performing search queries that take into account a user’s location. For all of these different purposes, Facebook needs up-to-date, reliable mapping data. OpenStreetMap is an open system for

Facebook applications use maps for showing users where to go. These maps can display businesses, roads, and event locations. Understanding the geographical world is also important for performing search queries that take into account a user’s location. For all of these different purposes, Facebook needs up-to-date, reliable mapping data.

OpenStreetMap is an open system for accessing mapping data. Anyone can use OpenStreetMap to add maps to their application. The data in OpenStreetMap is crowdsourced by users who submit updates to the OpenStreetMap database. Since anyone can submit data to OpenStreetMap, there is a potential for bad data to appear in the system.
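As a small illustration of how open this data is, anyone can query OpenStreetMap through the public Overpass API. The bounding box and tag filter below are arbitrary examples, not anything from Facebook's pipeline.

```python
# Query OpenStreetMap's crowdsourced data via the public Overpass API.
# The bounding box (part of San Francisco) and the "cafe" tag are arbitrary.
import requests

query = """
[out:json][timeout:25];
node["amenity"="cafe"](37.77,-122.43,37.79,-122.40);
out body;
"""
resp = requests.post("https://overpass-api.de/api/interpreter", data={"data": query})
for node in resp.json()["elements"][:5]:
    print(node["id"], node.get("tags", {}).get("name", "unnamed"))
```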

Facebook uses OpenStreetMap for its mapping data, including in important applications where bad data would meaningfully harm users. To avoid this, Facebook builds infrastructure tools to improve the quality of its maps. Saurav Mohapatra and Jacob Wasserman work at Facebook on its mapping infrastructure, and join the show to talk about the tooling Facebook has built around OpenStreetMap data.

NGINX Service Mesh with Alan Murphy
http://softwareengineeringdaily.com/2020/04/16/nginx-service-mesh-with-alan-murphy/
Thu, 16 Apr 2020

NGINX is a web server that is used as a load balancer, an API gateway, a reverse proxy, and other purposes. Core application servers such as Ruby on Rails are often supported by NGINX, which handles routing the user requests between the different application server instances.

This model of routing and load balancing between different application instances has matured over the last ten years, as both the number of servers and the variety of services have grown.

A pattern called “service mesh” has grown in popularity. A service mesh embeds routing infrastructure closer to individual services by giving each one a sidecar proxy. The sidecars are connected to each other, so a request between any two services passes through their proxies, and a central control plane distributes routing and security policies to all of the proxies.
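To make the sidecar idea concrete, here is a minimal single-process sketch of a proxy that fronts a local service. The ports and the policy hook are illustrative, not any particular mesh's implementation.

```python
# Minimal sidecar-style HTTP proxy: every request to the service first passes
# through this local proxy, where a real mesh would add mTLS, retries,
# routing rules, and telemetry. Ports are illustrative.
import http.server
import urllib.request

UPSTREAM = "http://localhost:9000"  # the local service this sidecar fronts

class SidecarProxy(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        # Policy hook: inspect, log, or rewrite the request here.
        with urllib.request.urlopen(UPSTREAM + self.path) as upstream:
            body = upstream.read()
        self.send_response(upstream.status)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    http.server.HTTPServer(("localhost", 8080), SidecarProxy).serve_forever()
```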

Alan Murphy works at NGINX, and he joins the show to give a brief history of NGINX and how the product has evolved from a reverse proxy and edge routing tool to a service mesh. Alan has worked in the world of load balancing and routing for more than a decade, having been at F5 Networks for many years before F5 acquired NGINX. We also discussed the business motivations behind the merger of those two companies. Full disclosure: NGINX is a sponsor of Software Engineering Daily.

Shopify React Native with Farhan Thawar
http://softwareengineeringdaily.com/2020/04/15/shopify-react-native-with-farhan-thawar/
Wed, 15 Apr 2020

Shopify is a platform for selling products and building a business. It is a large e-commerce company with hundreds of engineers and several different mobile apps. Shopify’s engineering culture is willing to adopt new technologies aggressively, trying new tools that might provide significant leverage to the organization.

React Native is one of those technologies. React Native can be used to make cross-platform mobile development easier by allowing code reuse between Android and iOS. React Native was developed within Facebook, and has been adopted by several other prominent technology companies, with varying degrees of success.

Many companies have seen improvements to their mobile development and release process. However, in a previous episode, we talked with Airbnb about their adoption of React Native, which was less successful.

Farhan Thawar is a VP of engineering at Shopify. He joins the show to talk about Shopify’s experience using React Native, the benefits of cross-platform development, and his perspective on when it is not a good idea to use React Native.

Ceph Storage System with Sage Weil
http://softwareengineeringdaily.com/2020/04/14/ceph-storage-system-with-sage-weil/
Tue, 14 Apr 2020

Ceph is a storage system that can be used for provisioning object storage, block storage, and file storage. These storage primitives can be used as the underlying medium for databases, queueing systems, and bucket storage. Ceph is used in circumstances where the developer may not want to use public cloud resources like Amazon S3.

As an example, consider telecom infrastructure. Telecom companies that run their own data centers need software layers that let their operators and developers spin up databases and other abstractions with the same easy experience that a cloud provider like AWS offers.
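One concrete way this plays out: Ceph's RADOS Gateway speaks an S3-compatible API, so standard S3 tooling can point at a private cluster instead of AWS. The endpoint and credentials below are placeholders.

```python
# Use boto3 against a Ceph RADOS Gateway instead of Amazon S3. The endpoint
# and credentials are placeholders for a private cluster's values.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="http://ceph-rgw.internal:7480",  # RGW's default port
    aws_access_key_id="EXAMPLE_KEY",
    aws_secret_access_key="EXAMPLE_SECRET",
)

s3.create_bucket(Bucket="backups")
s3.put_object(Bucket="backups", Key="db.dump", Body=b"example payload")
```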

Sage Weil has been a core developer on Ceph since 2005, and the company he started around Ceph sold to Red Hat for $175 million. Sage joins the show to talk about the engineering behind Ceph and his time spent developing companies.

Collaborative SQL with Rahil Sondhi
http://softwareengineeringdaily.com/2020/04/13/collaborative-sql-with-rahil-sondhi/
Mon, 13 Apr 2020

Data analysts need to collaborate with each other in the same way that software engineers do. They also need a high quality development environment.

These data analysts are not working with programming languages like Java and Python, so they are not using an IDE such as Eclipse. Data analysts predominantly use SQL, and the tooling for a data analyst to work with SQL is often a SQL explorer tool that lacks the kind of collaborative experience that we would expect in the age of Slack and GitHub.

Rahil Sondhi is the creator of PopSQL, a collaborative SQL explorer. He created PopSQL after several years in the software industry, including 4 years at Instacart. Rahil joins the show to talk about the frictions that data analysts encounter when working with databases, and how those frictions led to the design of PopSQL.

Reserved Instances with Aran Khanna
http://softwareengineeringdaily.com/2020/04/10/reserved-instances-with-aran-khanna/
Fri, 10 Apr 2020

When a developer spins up a virtual machine on AWS, that virtual machine could be purchased using one of several types of cost structures. These cost structures include on-demand instances, spot instances, and reserved instances.

On-demand instances are often the most expensive, because the developer gets reliable VM infrastructure without committing to long-term pricing. Spot instances are cheap, spare compute capacity with lower reliability that is available across AWS infrastructure. Reserved instances allow a developer to purchase longer-term VM contracts at a lower price.
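A rough sketch of the underlying arithmetic, using made-up prices (real rates vary by instance type, region, and term):

```python
# Back-of-the-envelope break-even for a reserved instance. Prices are made
# up; check the AWS price list for real rates.
on_demand_hourly = 0.096     # hypothetical on-demand rate
reserved_hourly = 0.060      # hypothetical effective 1-year reserved rate
hours_per_year = 24 * 365

# A reservation is paid for whether or not the instance runs, so it only
# pays off if utilization exceeds the ratio of the two rates.
break_even_utilization = reserved_hourly / on_demand_hourly
print(f"Reservation wins above {break_even_utilization:.0%} utilization")

utilization = 0.50  # instance actually runs half the year
on_demand_cost = on_demand_hourly * hours_per_year * utilization
reserved_cost = reserved_hourly * hours_per_year  # paid regardless of use
print(f"On-demand: ${on_demand_cost:,.0f}  Reserved: ${reserved_cost:,.0f}")
```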

Reserved instances can provide significant savings, but it can be difficult to calculate how much infrastructure to purchase. Aran Khanna is the founder of Reserved.ai, a company that builds cost management tools for AWS. He joins the show to talk about the landscape of cost management, and what he is building with Reserved.ai.

Snorkel: Training Dataset Management with Braden Hancock
http://softwareengineeringdaily.com/2020/04/09/snorkel-training-dataset-management-with-braden-hancock/
Thu, 09 Apr 2020

Machine learning models require the use of training data, and that data needs to be labeled. Today, we have high quality data infrastructure tools such as TensorFlow, but we don’t have large high quality data sets. For many applications, the state of the art is to manually label training examples and feed them into the training process.

Snorkel is a system for scaling the creation of labeled training data. In Snorkel, human subject matter experts create labeling functions, and these functions are applied to large quantities of data in order to label it.

For example, if I want to generate training data about spam emails, I don’t have to hire 1000 email experts to look at emails and determine if they are spam or not. I can hire just a few email experts, and have them define labeling functions that can indicate whether an email is spam. If that doesn’t make sense, don’t worry. We discuss it in more detail in this episode.
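Here is a small sketch of what labeling functions look like in the open source snorkel Python package; the heuristics and data are invented for the spam example above.

```python
# Snorkel-style labeling functions for the spam example. Requires
# pip install snorkel pandas; heuristics and data are illustrative.
import pandas as pd
from snorkel.labeling import PandasLFApplier, labeling_function

SPAM, NOT_SPAM, ABSTAIN = 1, 0, -1

@labeling_function()
def lf_contains_prize(x):
    # Heuristic: promotional language suggests spam.
    return SPAM if "prize" in x.text.lower() else ABSTAIN

@labeling_function()
def lf_known_sender(x):
    # Heuristic: mail from a trusted domain is probably not spam.
    return NOT_SPAM if x.sender.endswith("@mycompany.com") else ABSTAIN

emails = pd.DataFrame({
    "text": ["Claim your prize now!", "Lunch tomorrow?"],
    "sender": ["promo@spam.biz", "alice@mycompany.com"],
})

# Apply every labeling function to every row, producing a label matrix that
# Snorkel's LabelModel can then denoise into probabilistic training labels.
applier = PandasLFApplier([lf_contains_prize, lf_known_sender])
L_train = applier.apply(emails)
print(L_train)
```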

Braden Hancock works on Snorkel, and he joins the show to talk about the labeling problems in machine learning, and how Snorkel helps alleviate those problems. We have done many shows on machine learning in the past, which you can find on SoftwareDaily.com. Also, if you are interested in writing about machine learning, we have a new writing feature that you can check out by going to SoftwareDaily.com/write.

Cadence: Uber’s Workflow Engine with Maxim Fateev
http://softwareengineeringdaily.com/2020/04/08/cadence-ubers-workflow-engine-with-maxim-fateev/
Wed, 08 Apr 2020

A workflow is an application that involves more than just a simple request/response communication. For example, consider a session of a user taking a ride in an Uber. The user initiates the ride, and the ride might last for an hour. At the end of the ride, the user is charged for the ride and sent a transactional email.

Throughout this entire ride, there are many different services and database tables being accessed across the Uber infrastructure. The transactions across this infrastructure need to be processed despite server failures which may occur along the way.

Workflows are not just a part of Uber. Many different types of distributed operations at a company might be classified as a workflow: banking operations, spinning up a large cluster of machines, performing a distributed cron job.
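A toy sketch of the core idea, not the Cadence API: record each completed step durably, so a worker that crashes mid-workflow can resume without repeating work. Real engines like Cadence persist this history server-side.

```python
# Toy durable-workflow sketch: checkpoint each completed step to disk so a
# restarted worker skips finished steps. Purely illustrative.
import json
import os

HISTORY_FILE = "ride_workflow_history.json"

def load_history():
    if os.path.exists(HISTORY_FILE):
        with open(HISTORY_FILE) as f:
            return json.load(f)
    return []

def run_step(name, fn, history):
    if name in history:               # completed before a crash: skip it
        return
    fn()
    history.append(name)
    with open(HISTORY_FILE, "w") as f:
        json.dump(history, f)         # durably checkpoint progress

def start_ride():   print("ride started")
def charge_rider(): print("rider charged")
def send_receipt(): print("receipt emailed")

history = load_history()
for step_name, step_fn in [("start", start_ride),
                           ("charge", charge_rider),
                           ("receipt", send_receipt)]:
    run_step(step_name, step_fn, history)
```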

Maxim Fateev is the founder of Temporal.io, and the co-creator of Cadence, a workflow orchestration engine. Maxim developed Cadence when he was at Uber, seeing the engineering challenges that come from trying to solve the workflow orchestration problem. Before Uber, Maxim worked at AWS on the Simple Workflow Service, which was also a system for running workflows. Altogether, Maxim has developed workflow software for more than a decade.

Debugging in a Remote World
http://softwareengineeringdaily.com/2020/04/07/debugging-in-a-remote-world/
Tue, 07 Apr 2020

To learn more about Rookout, go to softwareengineeringdaily.com/rookout.

As long as there has been software, there has been a need to maintain and debug it; and as the world we live in is changing, so does the way we debug software. One of the key changes the world is seeing is remote work. Working remotely, and hence debugging remotely, isn’t new, but it’s now taking the world by storm, whether we want it (modern workflows) or not (as forces like COVID-19 reshape our society).

The debugging relations matrix

Like most things in life, this new way of working comes with both pros and cons. To understand them, and how this growing trend will affect developers and the people around them, we should look at the debugging relations matrix. In it, “Dev” stands for developers (software engineers, production engineers, SREs, and so on), “Code” stands for the software being developed, used, and debugged, “User” stands for the person or persons using the software or overall product, and “Support” stands for the initial staff responding to detected issues or user requests.

User / Code – Users can’t help: With the complexity of modern software, migration to the cloud, and the overall expansion of backend components, users have very little access to the software and very little ability to understand it. While this isn’t affected directly by remote work, it does affect the other axes.

Dev / Code – Everything is remote and going further; companies are adopting tools to adapt: With SaaS and cloud, code in production is always remote, and often so are staging and even the baseline development environments. The same is true for IoT and edge computing solutions, and these of course join the classic remote customer on-premise deployments. On top of this, remote work makes it impossible or impractical for developers to get closer to the software even in the cases where it was technically feasible. This expansion means all software ends up equally remote, and organizations know it makes sense to invest in remote access and monitoring: budgets for monitoring and logging are increasing, and remote access for debugging is becoming commonplace.

Dev / User – “PEBKAC”: Developers (on average) often find customer interactions challenging, coming from often introverted and sarcastic or cynical mindsets. A common saying among devs is PEBKAC: problem exists between keyboard and chair. Working remotely makes such cases much harder both to confirm and to disprove.

Support / User – Getting to the bottom line / root cause becomes harder: Person-to-person communication suffers greatly from remote interactions, and that manifests here most dramatically, as both sides are remote from the code and from understanding it. While this axis usually has the most channels set in place (chats, conference calls, customer relationship management, case management), getting the full picture, recreating incidents on the customer side, or replicating incidents on the support side remains difficult and becomes even more so. The pain points of this axis stream down (rather than trickle down) to the Dev/Support axis.

Dev / Support – innovation is coming: This realm of communication is ripe for innovation:

The need to allow remote access democratizes access to software and debugging, paving the road for new shared tools.

Unlike the Support/User channel which has seen multiple new offerings, communication between dev and support remains much as it did a decade ago. Mainly using ticketing systems (e.g. Jira) with the slight exception of tasks/project management solutions (e.g. Monday) which aren’t really geared towards the needs here.

The conversation around debugging, replicating issues, and obtaining and observing the right points of data – was always complex and has become even more so with the explosion of software and going remote.

Support / Code – An opportunity to empower support engineers, and a challenge in uplifting them: While tier-3 or tier-4 support engineers are capable of debugging, their limited access to the software, remotely or otherwise, makes it impractical for them to attempt it even when they have the skill. Support’s limited access for debugging also affects their ability to communicate with developers, as both sides look at the same issues through different lenses and fail to build a joint picture of the situation. As remote work democratizes access (everyone is as far from, or as close to, the data as anyone else), debugging solutions that adapt to both remote work and varied skill sets present a huge opportunity to empower support engineers at all levels: enabling them to debug and solve issues on their own (removing pressure from R&D) and to communicate better with their fellow engineers.

How to debug remotely – a modern balancing act

Looking at the intersection of developers, support, code, and users, we’ve seen that there are many challenges that arise from the combination of the explosion of software and the growing trend of remote work. There’s no avoiding the question of what we can do to improve our debugging capabilities in this remote world. And for the best results we are required to balance bridging the gaps that are created by the new shift, while embracing the new mindsets that benefit from it.

Scale vs Familiarity

One of the main tradeoffs in moving to remote work is scale versus familiarity: while we lose the intimacy and closeness to both our code (few instances) and our customers (physically remote), we gain command-and-control solutions to manage things from afar and at greater scale. The first step, then, is to acknowledge this change and harness tools and solutions that either have a scale mindset built into them or empower our teams to adopt that mindset themselves.

A classic example would be to adopt microservices and distributed software principles, which at their core allow teams to break software down into smaller components, so they can be more easily deployed and scaled up, but also debugged: multiple team members can connect to multiple shared components without having to synchronize their access.

Another example can be seen in a tool like Rookout, where scale and distribution are built into the fundamentals of the tool. It flips the usual debugging flow on its head: instead of having the developer connect (over the network) to each and every component they wish to debug, all the components connect back to a command-and-control system, which lets developers choose on demand which components to debug in an elastic system, without having to struggle with connections.

Screenshot from Rookout – showing distributed debugging filter
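In the same spirit, here is a toy, single-process illustration of a “non-breaking breakpoint” that snapshots local state without pausing the program. It uses only Python’s standard sys.settrace hook and is not how Rookout itself is implemented.

```python
# Capture local variables as a function executes, without stopping it.
# A purely illustrative, single-process take on non-breaking breakpoints.
import sys

captured = []

def tracer(frame, event, arg):
    # Snapshot locals on each line of compute(); never pause execution.
    if frame.f_code.co_name == "compute" and event == "line":
        captured.append(dict(frame.f_locals))
    return tracer

def compute(x):
    y = x * 2
    return y + 1

sys.settrace(tracer)
compute(21)
sys.settrace(None)
print(captured[-1])  # e.g. {'x': 21, 'y': 42}
```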

A Team effort – support and engineering

Another key area highlighted by the debugging relations matrix is team communication, specifically between support and R&D teams. As communication suffers, we can bridge the gap by investing more in culture, methods, and tools that augment it. These should go beyond shallow ticketing approaches and focus on conveying quality data, rich with details on incidents, software, and user behaviours, enabling team members to share live, detailed, full pictures, and specifically live debug sessions, so they can hand off investigations with ease.

Example screenshot of a live sharable debug session in Rookout

Summary

As the explosion of software collides with a world adopting more and more remote work, many challenges arise for developing, maintaining, and debugging software. The effects can be seen all across the debugging relations matrix: from the developers themselves and the users, through the support teams, and of course the software itself.

Teams are now required to walk a fine line, balancing between bridging the gaps created by the shift (e.g. gaps in communication and in data access and availability) and embracing the new mindsets that benefit from it (e.g. distributed software and scale, remote access and control, shareable interfaces and sessions).

Indeed a challenging tightrope balancing act, but those who perfect it won’t only survive the shift, they will lead the future.

ksqlDB: Kafka Streaming Interface with Michael Drogalis
http://softwareengineeringdaily.com/2020/04/07/ksqldb-kafka-streaming-interface-with-michael-drogalis/
Tue, 07 Apr 2020

Kafka is a distributed stream processing system that is commonly used for storing large volumes of append-only event data. Kafka has been open source for almost a decade, and as the project has matured, it has been used for new kinds of applications.

Kafka’s pubsub interface for writing and reading topics is not ideal for all of these applications, which has led to the creation of ksqlDB, a database system built for streaming applications that uses Kafka as the underlying infrastructure for storing data.
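As a sketch of what this looks like in practice, ksqlDB statements are plain SQL submitted to the server’s /ksql REST endpoint. The stream definition below is invented, and the server address assumes a default local install.

```python
# Submit a ksqlDB statement to the server's /ksql REST endpoint.
# The stream definition is invented; localhost:8088 is ksqlDB's default.
import requests

statement = """
    CREATE STREAM rides (ride_id VARCHAR, city VARCHAR)
      WITH (KAFKA_TOPIC='rides', VALUE_FORMAT='JSON');
"""

resp = requests.post(
    "http://localhost:8088/ksql",
    json={"ksql": statement, "streamsProperties": {}},
)
print(resp.json())
```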

Michael Drogalis is a principal product manager at Confluent, where he helped develop ksqlDB. Michael joins the show to discuss ksqlDB, including the architecture, the query semantics, and the applications which might want a database that focuses on streams. We have done many great shows on Kafka in the past, which you can find on SoftwareDaily.com. Also, if you are interested in writing about Kafka, we have a new writing feature that you can check out by going to SoftwareDaily.com/write.

Godot Game Engine with Juan Linietsky
http://softwareengineeringdaily.com/2020/04/06/godot-game-engine-with-juan-linietsky/
Mon, 06 Apr 2020

Building a game is not easy. The development team needs to figure out a unique design and gameplay mechanics that will attract players. There is a great deal of creative work that goes into making a game successful, and these games are often built with low budgets by people who are driven by the art and passion of game creation.

A game engine is a system used to build and run games. Game engines let the programmer work at a high level of abstraction, by providing interfaces for graphics, physics, and scripting. Popular game engines include Unreal Engine and Unity, both of which require a license that reduces the amount of money received by the game developer.

Godot is an open source and free to use game engine. The project was started by Juan Linietsky, who joins the show to discuss his motivation for making Godot. We have done some great shows on gaming in the past, which you can find on SoftwareDaily.com. Also, if you are interested in writing about game development, we have a new writing feature that you can check out by going to SoftwareDaily.com/write.

V8 Lite with Ross McIlroy
http://softwareengineeringdaily.com/2020/04/03/v8-lite-with-ross-mcilroy/
Fri, 03 Apr 2020

V8 is the JavaScript engine that runs Chrome. Every popular website makes heavy use of JavaScript, and V8 manages the execution environment of that code. The code running in your browser can run faster or slower depending on how “hot” the codepath is. If a certain line of code is executed frequently, that code might be optimized to run faster.

V8 is running behind the scenes in your browser all the time, evaluating the code in your different tabs and determining how to manage that runtime in memory. As V8 observes and analyzes your code, it needs to allocate resources to determine what code to optimize. This process can be quite memory intensive, and can add significantly to Chrome’s overall memory footprint.

Ross McIlroy is an engineer at Google, where he worked on a project called V8 Lite. The goal of V8 Lite was to significantly reduce the memory overhead of V8. Ross joins the show to talk about JavaScript memory consumption, and his work on V8 Lite. We have done some great shows on JavaScript in the past, which you can find on SoftwareDaily.com. Also, if you are interested in writing about JavaScript, we have a new writing feature that you can check out by going to SoftwareDaily.com/write.

Serverless Development with Jeremy Daly
http://softwareengineeringdaily.com/2020/04/02/serverless-development-with-jeremy-daly/
Thu, 02 Apr 2020

Serverless tools have come a long way since the release of AWS Lambda in 2014. Serverless apps were originally architected around Lambda, with the functions-as-a-service being used to glue together larger pieces of functionality and API services.

Today, many of the common AWS services such as API Gateway and DynamoDB have functionality built in to be able to respond to events. These services can use Amazon EventBridge to connect to each other. In many cases, a developer does not need AWS Lambda to glue services together in order to build an event-driven application.
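For instance, a service can publish an event to EventBridge directly with the AWS SDK, and a rule can route it to another service with no Lambda in between. The bus name and event shape below are illustrative.

```python
# Publish an event to Amazon EventBridge with boto3; a rule on the bus can
# route it to another service. Bus name and event fields are illustrative.
import json
import boto3

events = boto3.client("events")

events.put_events(
    Entries=[{
        "Source": "orders.service",
        "DetailType": "OrderPlaced",
        "Detail": json.dumps({"order_id": "1234", "total": 42.50}),
        "EventBusName": "default",
    }]
)
```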

Jeremy Daly is the host of the Serverless Chats podcast, a show about patterns and strategies in serverless architecture. Jeremy joins the show to talk about modern serverless development, and the new tools available in the AWS ecosystem.

Audio Data Engineering with Allison King
http://softwareengineeringdaily.com/2020/04/01/audio-data-engineering-with-allison-king/
Wed, 01 Apr 2020

Cortico is a non-profit that builds audio tools to improve public dialogue. Allison King is an engineer at Cortico, and she joins the show to talk about the process of building audio applications.

One of these applications was a system for ingesting radio streams, transcribing the radio, and looking for duplicate information across the different radio stations. In a talk at Data Council, Allison talked through the data engineering architecture for processing these radio streams, and the patterns that she found across the radio streams, including clusters of political leanings.
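One simple way to flag near-duplicate segments across stations, purely as an illustration and not Cortico’s actual pipeline, is to shingle each transcript into word n-grams and compare their overlap:

```python
# Flag near-duplicate transcript segments by comparing word-shingle overlap.
# Transcripts and the n-gram size are illustrative.
def shingles(text, n=5):
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

station_a = "the governor announced a new transit plan this morning in the capitol"
station_b = "this morning in the capitol the governor announced a new transit plan"

overlap = jaccard(shingles(station_a), shingles(station_b))
print(f"shingle overlap: {overlap:.2f}")  # higher overlap suggests syndication
```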

Another project from Cortico is called Local Voices Network. The Local Voices Network is built around a piece of hardware called a “digital hearth”, a specialized device that records discussions among people in a community. These community discussions are made available to journalists, public officials, and political candidates, creating a listening channel that connects these communities and stakeholders. Much of our conversation is focused on the engineering of the digital hearth, this device that sits in the center of community discussions.

Managing Cloud Data Services with Heroku
http://softwareengineeringdaily.com/2020/03/31/managing-cloud-data-services-with-heroku/
Tue, 31 Mar 2020

Nearly all modern web applications depend on persisted data in order to function. Since the introduction of JavaScript in the late 1990s, developers have demanded more functionality from their web-based programs than traditional “static” websites could provide. Today, single-page applications (SPAs) such as Gmail provide a dynamic user experience by interacting with the server to rewrite individual components of a page. Increasing demands for interactivity and a customized UX, along with a broadened horizon of what a website could or should do, meant that persistent data storage was more and more critical to a web application’s engineering.

Heroku was the first, and remains the most prominent, Layer 2 Cloud Provider. Heroku is a “Platform-as-a-Service” provider that builds upon the infrastructure of Layer 1 Cloud Providers, such as AWS, to create a streamlined, developer-first platform for the deployment and management of 12-Factor Web Apps. Heroku is a strong proponent of 12-Factor Web App best principles, and the 12-Factor “manifesto” was written by Heroku engineers.

One critical element of 12-Factor Web Apps is so-called “backing services”: that is, “any service the app consumes over the network as part of its normal operation.” Best practices dictate that these backing services, including databases, should be treated as “attached resources,” and an app’s code should be agnostic to whether the resource is accessed locally or over a network.

The principle of “loose coupling” of backing services comes with an implicit contract: that the increased flexibility will not create a trade-off in availability or durability. This is especially important for services such as attached databases, which hold critical data such as user account information. Users of web applications expect fast, accurate rendering of their data.

Heroku provides several managed data resources, including PostgreSQL, Redis, and Kafka. In addition, the Heroku Elements Marketplace contains dozens of add-ons available to developers using Heroku’s platform-as-a-service offering.

Heroku’s flagship data management offering is Heroku Postgres. PostgreSQL is an open-source relational database management system (RDBMS) that has been widely adopted since its release in 1996 due to its support for a wide variety of data types, its ACID-compliant transactions, and its use of write-ahead logging to increase fault tolerance. Heroku adopted Postgres in 2007, and it continues to be the most popular data storage offering on the platform. Heroku Postgres allows users to manage schema migrations, database access controls, and scaling from the Heroku platform. A Heroku Postgres database can be shared between several applications by a simple set of commands from the CLI. Heroku Postgres has a feature called rollback, which acts like a time machine for the database, allowing a developer to “roll back” the database to a previous point in time without affecting the present state of the database.
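In practice, the attached-resource pattern means the app just reads the DATABASE_URL config var that Heroku sets when a Postgres add-on is attached. A minimal sketch using the psycopg2 driver:

```python
# Connect to an attached Heroku Postgres database via the DATABASE_URL
# config var (set by Heroku). Requires psycopg2 (pip install psycopg2-binary).
import os
import psycopg2

conn = psycopg2.connect(os.environ["DATABASE_URL"])
with conn, conn.cursor() as cur:
    cur.execute("SELECT version();")
    print(cur.fetchone()[0])
```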

In addition to Postgres and the add-ons in the Elements Marketplace, Heroku offers official integrations with Redis and Kafka. Redis is a key-value store that supports a wide variety of abstract data types. While Redis traditionally holds all data in memory, Heroku Redis is configured to persist data to disk by using an Append-Only File (AOF) and maintaining a high-availability standby for failover. Heroku Redis also provides tools to federate data with Postgres; this ability to manage data from multiple sources in a streamlined fashion is another advantage of a platform-as-a-service offering abstracting away the work of creating a common data model across data sources.
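Heroku Redis follows the same attached-resource convention through a REDIS_URL config var. A minimal sketch with the redis-py client; the key name and TTL are illustrative.

```python
# Connect to an attached Heroku Redis instance via the REDIS_URL config var.
# Requires redis-py (pip install redis); the key and TTL are illustrative.
import os
import redis

r = redis.from_url(os.environ["REDIS_URL"])
r.set("session:42", "active", ex=3600)  # expire after one hour
print(r.get("session:42"))
```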

Heroku also offers a managed Kafka service for streaming data. Apache Kafka is a distributed streaming platform which provides four core APIs (Producers, Consumers, Streams, and Connectors) which allow communication across a distributed system using an abstraction called a “topic.” A topic is a stream of records, created by Producers, which other members in a distributed application can subscribe to (these are the Consumers). Kafka builds on the concept of an event-driven architecture (EDA), which uses messages between services as the drivers of application state. Kafka also acts as a transport for large volumes of immutable event streams, making it a key tool for real-time data streaming and parallel processing of Big Data. Kafka acts as a “distributed commit log”, storing key-value records of these messages across several nodes in a cluster. Kafka also works hand-in-hand with Zookeeper, which helps orchestrate nodes across the cluster and perform failover migration.
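A minimal sketch of the producer/consumer topic model using the kafka-python client; the broker address and topic are illustrative, and on Heroku Kafka the connection details come from config vars.

```python
# Produce to and consume from a Kafka topic with kafka-python
# (pip install kafka-python). Broker and topic names are illustrative.
from kafka import KafkaConsumer, KafkaProducer

producer = KafkaProducer(bootstrap_servers="localhost:9092")
producer.send("user-signups", b'{"user_id": "42"}')
producer.flush()

consumer = KafkaConsumer(
    "user-signups",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
    consumer_timeout_ms=5000,  # stop iterating when the topic goes quiet
)
for record in consumer:
    print(record.value)
```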

As distributed applications scale, the management complexity increases rapidly. As before, Heroku’s focus is on streamlining the management of data resources. Heroku Kafka allows management of Kafka through the web platform or the CLI, while lower-level configuration tasks are abstracted away. Heroku Kafka allows straightforward monitoring of Kafka clusters, and is built to be secure and compliant with regulations involving streaming of Personal Identifiable Information (PII).

For developers building 12-Factor Web Apps, or any cloud-based applications that can benefit from a streamlined development-to-production workflow, Heroku’s data management tools offer significant benefits. Less time spent in configuration can equate to more time spent coding the application itself. For more information on Heroku Postgres, we did a deep dive on the subject with Jon Daniel, an infrastructure engineer at Heroku. For more information on Heroku, check out their website, or visit our Heroku archives at SoftwareDaily.com.

Facebook Messenger Engineering with Mohsen Agsen
http://softwareengineeringdaily.com/2020/03/31/facebook-messenger-engineering-with-mohsen-agsen/
Tue, 31 Mar 2020

Facebook Messenger is a chat application that millions of people use every day to talk to each other. Over time, Messenger has grown to include group chats, video chats, animations, facial filters, stories, and many more features. Messenger is a tool for utility as well as for entertainment.

Messenger is used on both mobile and desktop, but application size matters most on mobile, where many users have devices without much storage space.

As Messenger has accumulated features, the iOS code base has grown larger and larger. Several generations of Facebook engineers have rotated through the company with the responsibility of working on Facebook Messenger, which has led to different ways of managing information within the same codebase. The iOS codebase had room for improvement.

Project Lightspeed was a project within Facebook that had the goal of making Messenger on iOS much smaller. Mohsen Agsen is an engineer with Facebook, and he joins the show to talk about the process of rewriting the Messenger app.

Pika Dependency Management with Fred Schott
http://softwareengineeringdaily.com/2020/03/30/pika-dependency-management-with-fred-schott/
Mon, 30 Mar 2020

Modern web development involves a complicated toolchain for managing dependencies. One part of this toolchain is the bundler, a tool that puts all your code and dependencies together into static asset files. The most popular bundler is webpack, which was originally released in 2012, before browsers widely supported ES Modules.

Today, every major browser supports the ES Module system, which improves the efficiency of JavaScript dependency management. Snowpack is a system for managing dependencies that takes advantage of the browser support for ES Modules. Snowpack is made by Pika, a company that is developing a set of web technologies including a CDN, a package catalog, and a package code editor.

Fred Schott is the founder of Pika and the creator of Snowpack. Fred joins the show to talk about his goals with Pika, and the ways in which modern web development is changing.

Cloud Kitchen Platform with Ashley Colpaart
http://softwareengineeringdaily.com/2020/03/27/cloud-kitchen-platform-with-ashley-colpaart/
Fri, 27 Mar 2020

Food delivery apps have changed how the restaurant world operates. After seven years of mobile food delivery, the volume of food ordered through these apps has become so large that entire restaurants can be sustained solely through the order flow that comes in from the apps. This raises the question as to why you even need an “on-prem” restaurant.

A cloud kitchen is a large, shared kitchen where food is prepared for virtual restaurants. These virtual restaurants exist only on mobile apps. There are no waiters, there are only the food delivery couriers who pick up the food from these warehouse-sized food preparation facilities.

A virtual restaurant entrepreneur could open up multiple restaurants operated from the same cloud kitchen. The mobile app user might see separate restaurant listings for a pizza place, a cookie bakery, and a Thai food restaurant, when all of them are operated by the same restaurateur.

Ashley Colpaart is the founder of The Food Corridor, a system for cloud kitchen management. Ashley joins the show to talk about the dynamics of virtual restaurants and the cloud kitchen industry.

Remote Team Management with Ryan Chartrand
http://softwareengineeringdaily.com/2020/03/26/remote-team-management-with-ryan-chartrand/
Thu, 26 Mar 2020

Remote engineering work makes some elements of software development harder, and some elements easier. With Slack and email, communication becomes more clear cut. Project management tools lay out the responsibilities and deliverables of each person. GitHub centralizes and defines the roles of developers.

On the other hand, remote work subtracts the role of nuanced conversation. There is no water cooler or break room. Work can become systematic, rigid, and completely transactional. Your co-workers are your allies, but they feel less like friends when you don’t see them every day. For some people, this can have a devastating long-term impact on their psyche.

Managers have the responsibility of ensuring the health and productivity of the people that work with them. Managing an all-remote team includes a different set of challenges than an in-person team.

Ryan Chartrand is the CEO of X-Team, a team of developers who work across the world and collaborate with each other remotely. X-Team partners with large companies who need additional development work. Ryan joins the show to talk about the dynamics of leading a large remote workforce, as well as his own personal experiences working remotely.

Sorbet: Typed Ruby with Dmitry Petrashko
http://softwareengineeringdaily.com/2020/03/25/sorbet-typed-ruby-with-dmitry-petrashko/
Wed, 25 Mar 2020

Programming languages are dynamically typed or statically typed. In a dynamically typed language, the programmer does not need to declare if a variable is an integer, string, or other type. In a statically typed language, the developer must declare the type of the variable upfront, so that the compiler can take advantage of that information.

Dynamically typed languages give a programmer flexibility and fast iteration speed. But they also introduce the possibility of errors that can be avoided by performing type checking. This is one of the reasons why TypeScript has risen in popularity, giving developers the option to add types to their JavaScript variables.

Sorbet is a typechecker for Ruby. Sorbet allows for gradual typing of Ruby programs, which helps engineers avoid errors that might otherwise be caused by the dynamic type system. Dmitry Petrashko is an engineer at Stripe who helped build Sorbet. He has significant experience in compilers, having worked on Scala before his time at Stripe. Dmitry joins the show to discuss his work on Sorbet, and the motivation for adding type checking to Ruby.
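Sorbet itself is Ruby, but the gradual-typing idea it implements is easy to see in Python, where optional type hints checked by a tool like mypy play a similar role. This analogy is ours, not from the episode.

```python
# Gradual typing in Python as an analogy for what Sorbet does for Ruby:
# annotations are optional, and a checker (mypy) flags mismatches before
# runtime, while unannotated code keeps working as before.
def total_cents(amounts: list[float]) -> int:
    return round(sum(amounts) * 100)

print(total_cents([1.50, 2.25]))  # OK: 375

# total_cents("oops")  # would crash at runtime; mypy flags it statically:
#   error: Argument 1 to "total_cents" has incompatible type "str"
```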

We realize humanity is going through a hard time right now with the Coronavirus pandemic, but we all have skills useful to fight this battle. Head over to codevid19.com to join the world’s largest pandemic hackathon!

Datomic Architecture with Marshall Thompson
http://softwareengineeringdaily.com/2020/03/24/datomic-architecture-with-marshall-thompson/
Tue, 24 Mar 2020

Datomic is a database system based on an append-only record keeping system. Datomic users can query the complete history of the database, and Datomic has ACID transactional support. The data within Datomic is stored in an underlying database system such as Cassandra or Postgres. The database is written in Clojure, and was co-authored by the creator of Clojure, Rich Hickey.

Datomic has a unique architecture, with a component called a Peer that gets embedded in an application backend. A Peer stores a subset of the database's data in memory within that backend, improving the latency of queries that hit this caching layer.

Marshall Thompson works at Cognitect, the company that supports and sells the Datomic database. Marshall joins the show to talk about the architecture of Datomic, its applications, and the life of a query against the database.

Google Cloud Networking with Lakshmi Sharma
Mon, 23 Mar 2020
http://softwareengineeringdaily.com/2020/03/23/google-cloud-networking-with-lakshmi-sharma/

A large cloud provider has high volumes of network traffic moving through data centers throughout the world. These providers manage the infrastructure for thousands of companies, across racks and racks of multitenant servers and undersea cables that carry network packets to their destinations.

Google Cloud Platform has grown steadily into a wide range of products, including database services, machine learning, and containerization. Scaling a cloud provider requires both technical expertise and skillful management.

Lakshmi Sharma is the director of product management for networking at Google Cloud Platform. She joins the show to discuss the engineering challenges of building a large scale cloud provider, including reliability, programmability, and how to direct a large hierarchical team.

ClickUp Engineering with Zeb Evans and Alex Yurkowski
Fri, 20 Mar 2020
http://softwareengineeringdaily.com/2020/03/20/clickup-engineering-with-zeb-evans-and-alex-yurkowski/

Over the last fifteen years, there has been a massive increase in the number of new software tools. This is true at the infrastructure layer: there are more databases, more cloud providers, and more open-source projects. And it’s also true at a higher level: there are more APIs, project management systems, and productivity tools.

ClickUp is a project management and productivity system for organizations and individuals. The goal of ClickUp is to create a system that integrates closely with other project management systems, popular SaaS tools, and the Google Suite of docs and spreadsheets. The company was started in 2016, and despite raising zero outside capital, it has grown as rapidly as many venture-backed companies.

Zeb Evans and Alex Yurkowski are the founders of ClickUp. They join the show to talk about their experience building the company. We talk through their process of scaling the infrastructure, and their philosophy of moving fast. This episode has some useful strategic advice for anyone who is looking to take a product to market and iterate quickly–even if that product is bootstrapped. Full disclosure: ClickUp is a sponsor of Software Engineering Daily.

Pulumi: Infrastructure as Code with Joe Duffy
Thu, 19 Mar 2020
http://softwareengineeringdaily.com/2020/03/19/pulumi-infrastructure-as-code-with-joe-duffy/

Infrastructure-as-code allows developers to use programming languages to define the architecture of their software deployments, including servers, load balancers, and databases.

There have been several generations of infrastructure-as-code tools. Systems such as Chef, Puppet, Salt, and Ansible provided a domain-specific imperative scripting language that became popular along with the early growth of Amazon Web Services. Hashicorp’s Terraform project created an open source declarative model for infrastructure. Kubernetes YAML definitions are also a declarative system for infrastructure as code.

Pulumi is a company that offers a newer system for infrastructure as code, combining declarative and imperative syntax. Pulumi programs can be written in TypeScript, Python, Go, or .NET. Joe Duffy is the CEO of Pulumi, and he joins the show to talk about his work on the Pulumi project and his vision for the company. Joe also discusses his twelve years at Microsoft, and how his work in programming language tooling shaped how he thinks about building infrastructure-as-code.
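
To make the declarative-plus-imperative combination concrete, here is a minimal sketch of what a Pulumi program can look like in Python, one of the supported languages. It assumes the pulumi and pulumi_aws packages, and the resource names are illustrative.

```python
# A minimal sketch of a Pulumi program in Python. Assumes the pulumi and
# pulumi_aws packages; resource names are illustrative.
import pulumi
import pulumi_aws as aws

# Imperative Python (a loop) producing declarative cloud resources:
buckets = [aws.s3.Bucket(f"logs-{region}") for region in ("us", "eu")]

# Exported values are shown by the CLI after `pulumi up`.
pulumi.export("bucket_names", [bucket.id for bucket in buckets])
```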

Infrastructure Investing with Vivek Saraswat
Wed, 18 Mar 2020
http://softwareengineeringdaily.com/2020/03/18/infrastructure-investing-with-vivek-saraswat/

Software investing requires a deep understanding of the market, and an ability to predict what changes might occur in the near future. At the level of core infrastructure, software investing is particularly difficult. Databases, virtualization, and large scale data processing tools are all complicated, highly competitive areas.

As the software world has matured, it has become apparent just how big these infrastructure companies can become. Consequently, the opportunities to invest in these infrastructure companies have become highly competitive.

When a venture capital fund invests into an infrastructure company, the fund will then help the infrastructure company bring their product to market. This involves figuring out the product design, the sales strategy, and the hiring roadmap. A strong investor will be able to give insight into all of these different facets of building a software company.

Vivek Saraswat is a venture investor with Mayfield, a venture fund that focuses on early to growth-stage investments. Vivek joins the show to discuss his experience at AWS, Docker, and Mayfield, as well as his broad lessons around how to build infrastructure companies today.

Sisu Data with Peter Bailis
Tue, 17 Mar 2020
http://softwareengineeringdaily.com/2020/03/17/sisu-data-with-peter-bailis/

A high volume of data can contain a high volume of useful information. That fact is well understood by the software world. Unfortunately, it is not a simple process to surface useful information from this high volume of data. A human analyst needs to understand the business, formulate a question, and determine what metrics could reveal the answer to such a question.

Sisu is a system for automatically surfacing insights from large data sets within companies. A user of Sisu can select a database column that they are interested in learning more about, and Sisu will automatically analyze the records in the database to look for trends and relationships between that column and the other columns.

For example, if I have a database of user purchases, including how much money those users spent on each purchase, I can ask Sisu to analyze the purchase price column, and find what kinds of attributes correlate with a high purchase price. Perhaps there will be correlations such as age and city that I can use to understand my customers better. Sisu can automatically surface these correlations and display them to me to help me make business decisions.
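
This is not Sisu's interface, but the kind of analysis described above can be hand-rolled with pandas to show what the product automates; the data here is made up.

```python
# Not Sisu's interface: a hand-rolled pandas sketch of the kind of
# analysis described above, relating other columns to purchase price.
import pandas as pd

purchases = pd.DataFrame({
    "price": [120.0, 15.0, 95.0, 20.0, 140.0],
    "age":   [41,    23,   38,   25,   45],
    "city":  ["SF",  "LA", "SF", "LA", "SF"],
})

# Numeric attribute: linear correlation with price.
print(purchases["age"].corr(purchases["price"]))

# Categorical attribute: average price per group.
print(purchases.groupby("city")["price"].mean())
```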

Peter Bailis is the CEO of Sisu Data and an assistant professor at Stanford. Peter joins the show to give his perspective on the development of Sisu, which came out of his research on data-intensive systems, including MacroBase, an analytic monitoring engine that prioritizes human attention.

Location Data with Ryan Fox Squire
Mon, 16 Mar 2020
http://softwareengineeringdaily.com/2020/03/16/location-data-with-ryan-fox-squire/

Physical places have a large amount of latent data. Pick any location on a map, and think about all of the questions you could ask about that location. What businesses are at that location? How many cars pass through it? What is the soil composition? How much is the land on that location worth?

The world of web-based information has become easy to query. We can use search engines like Google, as well as APIs like Diffbot and Clearbit. Today, the physical world is not so easy to query, but it is becoming easier. Location data as a service is a burgeoning field, with some vendors offering products for satellite data, foot traffic, and other specific location-based domains.

SafeGraph is a company that provides location data-as-a-service. SafeGraph data sets include data about businesses, patterns describing human movement, and geometric representations describing the shape and size of buildings. Ryan Fox Squire develops data products for SafeGraph, and he joins the show to talk about the engineering and strategy that goes into building a data-as-a-service company.

Descript with Andrew Mason
Fri, 13 Mar 2020
http://softwareengineeringdaily.com/2020/03/13/descript-with-andrew-mason/

Descript is a software product for editing podcasts and video. It is a deceptively powerful tool, and its software architecture includes novel usage of transcription APIs, text-to-speech, speech-to-text, and other domain-specific machine learning applications. Some of the most popular podcasts and YouTube channels use Descript as their editing tool because it provides a set of features that are not found in other editing tools such as Adobe Premiere or a digital audio workstation.

Descript is an example of the downstream impact of machine learning tools becoming more accessible. Even though the company only has a small team of machine learning engineers, these engineers are extremely productive due to the combination of APIs, cloud computing, and frameworks like TensorFlow.

Descript was founded by Andrew Mason, who also founded Groupon and Detour, and Andrew joins the show to describe the technology behind Descript and the story of how it was built. It is a remarkable story of creative entrepreneurship, with numerous takeaways for both engineers and business founders.

Flyte: Lyft Data Processing Platform with Allyson Gale and Ketan Umare
Thu, 12 Mar 2020
http://softwareengineeringdaily.com/2020/03/12/flyte-lyft-data-processing-platform-with-allyson-gale-and-ketan-umare/

Lyft is a ridesharing company that generates a high volume of data every day.

This data includes ride history, pricing information, mapping, routing, and financial transactions. The data is stored across a variety of different databases, data lakes, and queueing systems, and is processed at scale in order to generate machine learning models, reports, and data applications.

Data workflows involve a set of interconnected systems such as Kubernetes, Spark, Tensorflow, and Flink. In order for these systems to work together harmoniously, a workflow manager is often used to orchestrate them together. A workflow platform lets a data engineer have a high-level view into how data moves through the system, and can be used to reason about retries, resource utilization, and scalability.

Flyte is a data processing system built and open-sourced at Lyft. Allyson Gale and Ketan Umare work at Lyft, and they join the show to talk about how Flyte works, and why they needed to build a new workflow processing system when there are already tools available such as Airflow.
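
As a rough sketch of what a workflow definition can look like, recent releases of Flyte's Python SDK, flytekit, expose task and workflow decorators; the task bodies below are placeholders, not anything from Lyft's actual pipelines.

```python
# A rough sketch of a workflow in Flyte's Python SDK, flytekit (recent
# releases expose @task and @workflow decorators). Task bodies here are
# illustrative placeholders.
from typing import List
from flytekit import task, workflow

@task
def extract(n: int) -> List[int]:
    return list(range(n))

@task
def train(rows: List[int]) -> float:
    return sum(rows) / max(len(rows), 1)  # stand-in for real model training

@workflow
def pipeline(n: int = 10) -> float:
    # Flyte derives the dependency graph from these calls, so each step
    # can be retried, scaled, and monitored independently.
    return train(rows=extract(n=n))
```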

Cloud Investing with Danel Dayan
Wed, 11 Mar 2020
http://softwareengineeringdaily.com/2020/03/11/cloud-investing-with-danel-dayan/

Cloud computing caused a fundamental economic shift in how software is built. Before the cloud, businesses needed to buy physical servers in order to operate, an up-front cost that often amounted to tens of thousands of dollars. Cloud computing turned that up-front capital expense into an ongoing operational expense.

Although the initial motivation for moving onto cloud providers might have been decreased cost, over time the cloud providers have developed unique services that make software even easier to build than before. There has also been a proliferation of new software infrastructure companies that have been built on top of the cloud providers, giving rise to new databases, logging companies, and platform-as-a-service products.

Danel Dayan is a venture investor with Battery Ventures and a co-author of the State of the OpenCloud 2019, a report that compiles a wide set of statistics and information on how cloud computing and open source are impacting the software industry. Danel joins the show to talk about his work as an investor, as well as his previous career at Google, where he worked on mergers and acquisitions.

If you want to reach Danel you can email him at ddayan@battery.com or tweet at him via @daneldayan.

The views expressed in this podcast are the interviewee’s own and not those of Battery Ventures or of any person or organization affiliated or doing business with Battery Ventures. Further, the information discussed in the podcast is not intended for use by any current or potential investor in any investment fund affiliated with Battery Ventures. For more information about Battery Ventures’ potential financing capabilities for prospective portfolio companies, please refer to our website. Matillion, Sumo Logic and Woven are Battery portfolio companies and are discussed for illustrative purposes only. For a full list of all Battery investments, please click here.

OneGraph: GraphQL Tooling with Sean Grove
Tue, 10 Mar 2020
http://softwareengineeringdaily.com/2020/03/10/onegraph-graphql-tooling-with-sean-grove/

GraphQL is a system that allows frontend engineers to make requests across multiple data sources using a simple query format. In GraphQL, a frontend developer does not have to worry about the request logic for individual backend services. The frontend developer only needs to know how to issue GraphQL requests from the client, and these requests are handled by a GraphQL server.
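
As a generic illustration of that client-side simplicity, a GraphQL request is just a query document posted to a single endpoint; the endpoint and schema below are hypothetical.

```python
# A generic client-side GraphQL request with the requests library; the
# endpoint and schema are hypothetical.
import requests

query = """
query RecentOrders($limit: Int!) {
  orders(limit: $limit) {
    id
    customer { name }
    charge { amount }  # could be resolved from a third-party API
  }
}
"""

resp = requests.post(
    "https://api.example.com/graphql",  # hypothetical GraphQL server
    json={"query": query, "variables": {"limit": 5}},
)
print(resp.json()["data"]["orders"])
```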

GraphQL is mostly used to issue queries across internal databases and services. But many of the data sources that a company needs to query in modern infrastructure are not databases–they are APIs like Salesforce, Zendesk, and Stripe. These API companies might store a large percentage of the data that a given company needs to query, and executing queries, subscriptions, and joins against these APIs is not a simple task.

OneGraph is a company that builds integrations with third-party services and exposes them through a GraphQL interface. Sean Grove is a founder of OneGraph, and he joins the show to explain the problem that OneGraph solves, how OneGraph is built, and some of the difficult engineering challenges required to design OneGraph.

DBT: Data Build Tool with Tristan Handy
Mon, 09 Mar 2020
http://softwareengineeringdaily.com/2020/03/09/dbt-data-build-tool-with-tristan-handy/

A data warehouse serves the purpose of providing low latency queries for high volumes of data. A data warehouse is often part of a data pipeline, which moves data through different areas of infrastructure in order to build applications such as machine learning models, dashboards, and reports.

Modern data pipelines are often associated with the term “ELT” or Extract, Load, Transform. In the “ELT” workflow, data is taken out of a source such as a data lake, loaded into a data warehouse, and then transformed within the data warehouse to create materialized views on the data. Data warehouse queries are usually written in SQL, and for the last 50 years, SQL has been the primary language for executing these kinds of queries.

DBT is a system for data modeling that allows the user to write queries that involve a mix of SQL and a templating language called Jinja. Jinja allows the analyst to blend imperative code along with the declarative SQL. Tristan Handy is the CEO of Fishtown Analytics, the company that created DBT, and he joins the show to discuss how DBT works, and the role it plays in modern data infrastructure.
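
This is not dbt itself, but the SQL-plus-Jinja blend can be illustrated with the jinja2 package that dbt builds on; the table and column names below are made up.

```python
# Not dbt itself: the SQL-plus-Jinja blend, shown with the jinja2 package
# that dbt builds on. The table and column names are made up.
from jinja2 import Template

model = Template("""
select
    order_id,
    {% for method in payment_methods %}
    sum(case when payment_method = '{{ method }}' then amount end)
        as {{ method }}_amount{{ "," if not loop.last }}
    {% endfor %}
from payments
group by order_id
""")

# The imperative loop expands into one aggregation column per method.
print(model.render(payment_methods=["bank_transfer", "credit_card"]))
```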

React Best Practices with Kent Dodds
Fri, 06 Mar 2020
http://softwareengineeringdaily.com/2020/03/06/react-best-practices-with-kent-dodds/

ReactJS developers have lots of options for building their applications, and those options are not easy to work through. State management, concurrency, networking, and testing all have elements of complexity and a wide range of available tools. Take a look at any specific area of JavaScript application development, and you can find highly varied opinions.

Kent Dodds is a JavaScript teacher who focuses on React, JavaScript, and testing. In today’s episode, Kent provides best practices for building JavaScript applications, specifically React. He provides a great deal of advice on testing, which is unsurprising considering he owns TestingJavaScript.com. Kent is an excellent speaker who has taught thousands of people about JavaScript, so it was a pleasure to have him on the show.

Kent is also speaking at Reactathon, a JavaScript conference taking place March 30th and 31st in San Francisco. This week we will be interviewing speakers from Reactathon, and if you are interested in JavaScript and the React ecosystem then stay tuned, and if you hear something you like, you can check out the Reactathon conference in person.

React Stack with Tejas Kumar
Thu, 05 Mar 2020
http://softwareengineeringdaily.com/2020/03/05/react-stack-with-tejas-kumar/

JavaScript fatigue. This phrase has been used to describe the confusion and exhaustion around the volume of different tools required to be productive as a JavaScript developer. Frameworks, package managers, typing systems, state management, GraphQL, and deployment systems–there are so many decisions to make.

In addition to the present-day tooling choices, a JavaScript developer needs to watch the emerging developments in the ecosystem. ReactJS is evolving at a rapid clip, and newer primitives such as React Hooks and React Suspense allow developers to handle concurrency and networking more robustly.

Tejas Kumar works with G2i, a company that connects React developers with organizations that are looking for high-quality engineers. His role at G2i is head of vetting, which requires him to assess engineers for their competency in JavaScript-related technologies. Tejas joins the show to discuss the modern stack of technologies that a React developer uses to build an application. Full disclosure: G2i, where Tejas works, is a sponsor of Software Engineering Daily.

Tejas is also speaking at Reactathon, a JavaScript conference taking place March 30th and 31st in San Francisco. This week we will be interviewing speakers from Reactathon, and if you are interested in JavaScript and the React ecosystem then stay tuned, and if you hear something you like, you can check out the Reactathon conference in person.

JavaScript Deployments with Brian LeRoux
Wed, 04 Mar 2020
http://softwareengineeringdaily.com/2020/03/04/javascript-deployments-with-brian-leroux/

Full-stack JavaScript applications have been possible since the creation of NodeJS in 2009. Since then, the best practices for building and deploying these applications have steadily evolved with the technology.

ReactJS created consolidation around the view layer. The emergence of AWS Lambda created a new paradigm for backend execution. Serverless tools such as DynamoDB offer autoscaling abstractions. CDNs such as Cloudflare and Fastly can now do processing on the edge.

Brian LeRoux is the founder of Begin.com, a hosting and deployment company built on serverless tools. He’s also the primary committer to Architect, a framework for defining applications to be deployed to serverless infrastructure. Brian joins the show to talk about his work in the JavaScript ecosystem and his vision for Begin.com.

Brian is also speaking at Reactathon, a JavaScript conference taking place March 30th and 31st in San Francisco. This week we will be interviewing speakers from Reactathon, and if you are interested in JavaScript and the React ecosystem then stay tuned, and if you hear something you like, you can check out the Reactathon conference in person.

React Fundamentals with Ryan Florence
Tue, 03 Mar 2020
http://softwareengineeringdaily.com/2020/03/03/react-fundamentals-with-ryan-florence/

ReactJS began to standardize frontend web development around 2015. The core ideas around one-way data binding, JSX, and components caused many developers to embrace React with open arms. A large number of educators have emerged to help train developers who want to learn React.

A new developer learning React has numerous questions around frameworks, state management, rendering, and other best practices. In today’s episode, those questions are answered by Ryan Florence, a co-founder of React Training.

React Training is a company devoted to helping developers learn React; it trains teams at large companies such as Google and Netflix. Ryan has a strong understanding of how to be productive with React, and in today’s episode, he explains some of the fundamentals that commonly confuse new students of React.

Ryan is also speaking at Reactathon, a JavaScript conference taking place March 30th and 31st in San Francisco. This week we will be interviewing speakers from Reactathon, and if you are interested in JavaScript and the React ecosystem then stay tuned, and if you hear something you like, you can check out the Reactathon conference in person.

NextJS with Guillermo Rauch
Mon, 02 Mar 2020
http://softwareengineeringdaily.com/2020/03/02/nextjs-with-guillermo-rauch/

When ReactJS became popular, frontend web development became easier. But React is just a view layer. Developers who came to React expecting a full web development framework like Ruby on Rails or Django were required to put together a set of tools to satisfy that purpose.

A full-stack JavaScript framework has numerous requirements. How does it scale? How does it handle server-side rendering versus client-side rendering? Should GraphQL be included by default? How should package management work?

Guillermo Rauch is the creator of NextJS, a popular framework for building React applications. He is also the CEO of ZEIT, a cloud hosting company. Guillermo joins the show to discuss NextJS, and his vision for how the React ecosystem will evolve in the near future, as features such as React Suspense and Concurrent Mode impact the developer experience.

Guillermo is also speaking at Reactathon, a JavaScript conference taking place March 30th and 31st in San Francisco. This week we will be interviewing speakers from Reactathon, and if you are interested in JavaScript and the React ecosystem then stay tuned, and if you hear something you like, you can check out the Reactathon conference in person.

Makerpad: Low Code Tools with Ben Tossell
Thu, 27 Feb 2020
http://softwareengineeringdaily.com/2020/02/27/makerpad-low-code-tools-with-ben-tossell/

Low-code tools can be used to build an increasing number of applications. Knowledge workers within a large corporation can use low-code tools to augment their usage of spreadsheets. Entrepreneurs can use low-code tools to start businesses even without knowing how to code.

Modern low-code tools have benefited from steady improvements in cloud infrastructure, front-end frameworks like ReactJS, and browser technology such as the V8 JavaScript engine. These building blocks led to popular low-code products such as Webflow, Bubble, Retool, and Airtable, which are supported by a broad selection of domain-specific APIs such as Stripe, Twilio, and Zapier.

Ben Tossell runs Makerpad, a site devoted to low-code and no-code applications. Makerpad describes how to use these tools to design sophisticated applications that don’t require you to write code. But they do require a different kind of software engineering. To create applications intelligently with low-code tools, you need to know how the tools fit together, and you need to be willing to persist through a process of iteration and debugging that is similar to traditional software engineering.

Ben joins the show to talk about his experience building low-code tools, the use cases for these tools, and his predictions for how they will impact the future of software.

Slack Frontend Architecture with Anuj Nair
Thu, 27 Feb 2020
http://softwareengineeringdaily.com/2020/02/27/slack-frontend-architecture-with-anuj-nair/

Slack is a messaging application with millions of users. The desktop application is an Electron app, which is effectively a web browser dedicated to running Slack. This frontend is built with ReactJS and other JavaScript code, and the application is incredibly smooth and reliable, despite its complexity.

When a user boots up Slack, the application needs to figure out what data to fetch and where to fetch it from. Companies that use Slack heavily have thousands of messages in their history, and Slack needs to determine which of those should be pulled into the client. There are profile images, and logos, and custom emojis, all of which are used to define the user’s custom workspace experience.

Anuj Nair joined Slack in late 2017. In the years since, he has helped rewrite the Slack frontend client, including work on the bootup experience, the caching infrastructure, and the role of service workers. Anuj joins the show to discuss his work on the Slack frontend architecture and the canonical view-layer problems that Slack faces.

Parabola: No-Code Data Workflows with Alex Yaseen
Wed, 26 Feb 2020
http://softwareengineeringdaily.com/2020/02/26/parabola-no-code-data-workflows-with-alex-yaseen/

Every company has a large number of routine data workflows. These data workflows involve spreadsheets, CSV files, and tedious manual work to be done by a knowledge worker.

For example, data might need to be taken from Salesforce, filtered for new customers, and piped into Mailchimp. Or perhaps you need to sort all your customers to find only the ones who have spent more than $50.
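
For contrast, here is roughly what such a workflow looks like when written as code with pandas; the file and column names are hypothetical.

```python
# The same kind of workflow written as code with pandas; the file and
# column names are hypothetical.
import pandas as pd

customers = pd.read_csv("salesforce_export.csv")

# Filter for new customers, the step a knowledge worker would otherwise
# do by hand in a spreadsheet.
new_customers = customers[customers["status"] == "new"]

# Keep only the customers who have spent more than $50.
big_spenders = new_customers[new_customers["total_spend"] > 50]

big_spenders.to_csv("mailchimp_upload.csv", index=False)
```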

These data workflows might require some basic knowledge of SQL, or an understanding of how to make an API request. Not everyone knows how to execute these technical commands. A software company can be slowed down due to a shortage of technical analysts who have the necessary programming skills to build these data workflows.

Parabola is a low-code tool for building data workflows. Parabola lets the user drag and drop different components together to build an application without using a programming language. Parabola lowers the technical barrier for knowledge workers who want to build these kinds of data workflows. Alex Yaseen is the CEO of Parabola, and he joins the show to talk about the ideas behind Parabola and his goals with the company.

Decentralized Finance with Tom Schmidt
Tue, 25 Feb 2020
http://softwareengineeringdaily.com/2020/02/25/decentralized-finance-with-tom-schmidt/

Cryptocurrencies today serve two purposes: store of value and speculation.

The application infrastructure that has been built around cryptocurrency is mostly to support these use cases. At some point in the future, perhaps cryptocurrencies can be used as a global medium of exchange that is accepted at the grocery store. Perhaps we will use the blockchain for supply chain management, and as a universal ledger for real estate ownership.

But today, cryptocurrencies are mostly used for speculative trading. Users buy and sell different cryptocurrencies and stablecoins, looking to make short-term profits. And the markets for trading cryptocurrencies have evolved to have a sophistication that looks like the centralized markets of derivatives and leverage-based day trading.

The term “decentralized finance” refers to this phenomenon of cryptocurrency lending markets. Decentralized finance increases the volume of speculative capital by providing liquidity through smart contracts. This short-term liquidity is often collateralized by a volatile cryptocurrency such as Ethereum, creating an opportunity for a type of market participant called a “liquidator.”
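
The liquidation mechanic follows from simple collateralization arithmetic; the prices and threshold below are illustrative only, not any specific protocol's parameters.

```python
# Illustrative collateralization arithmetic behind the liquidation
# dynamic; the numbers are made up, not any protocol's parameters.
eth_collateral = 10.0   # ETH locked as collateral
eth_price = 160.0       # USD per ETH; volatile
debt_usd = 1200.0       # stablecoins borrowed against the collateral
min_ratio = 1.5         # required collateral-to-debt ratio

ratio = (eth_collateral * eth_price) / debt_usd  # 1600 / 1200 = 1.33

if ratio < min_ratio:
    # Undercollateralized: a liquidator can repay the debt and claim
    # the collateral, typically at a discount.
    print("position is open to liquidation")
```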

Tom Schmidt is an investor with Dragonfly Capital, a cryptoasset investment firm. Tom joins the show to describe the dynamics of decentralized finance.

Infrastructure Management with Joey Parsons
Mon, 24 Feb 2020
http://softwareengineeringdaily.com/2020/02/24/infrastructure-management-with-joey-parsons/

At Airbnb, infrastructure management is standardized across the organization. Platform engineering teams build tools that allow the other teams throughout the organization to work more effectively. A platform engineering team handles problems such as continuous integration, observability, and service discovery.

Other teams throughout a company use the tools that a platform engineering team builds. For example, there is a team at Airbnb that builds the search and discovery system that is used by customers who are looking for a place to stay. That team does not want to have to worry about how they are deploying, how their service is being logged, and how to scale up. All of that should be taken care of by the platform engineering team.

At a large company like Airbnb, there is so much happening across the infrastructure. Services are being deployed, services are having outages, databases are being resharded. With all of this change occurring, it can be difficult for a team to pinpoint the cause of a service outage. Digging through logs and dashboards is often insufficient.

Joey Parsons is the founder of Effx, a company that is building a platform for observing and managing the changes across the infrastructure. Effx is like a newsfeed for a service. An application instrumented with Effx gives the engineers a single endpoint that they can navigate to for understanding the history of their service.

Joey joins the show to talk about his experience as an infrastructure engineer at Airbnb, and how that experience informs the work of his new company, Effx.

Courier with Troy Goode
Fri, 21 Feb 2020
http://softwareengineeringdaily.com/2020/02/21/courier-with-troy-goode/

A gig economy application generates lots of notifications: SMS messages, mobile push updates, emails, and native application updates. If you order a ride from Uber, you might receive a text message and a push notification at the same time. If an app overloads the user with notifications, the user might end up annoyed and delete the app from their phone.

But perhaps all of these notifications are necessary. You would rather get three simultaneous notifications from your food delivery app than fail to get your food on time. If you are the mobile application developer building the food delivery app, what other choice do you have?

At large companies such as Linkedin, there are entire teams devoted to figuring out how to optimize the notifications that they send you. It has a surprisingly large impact on the usability of a mobile application. Troy Goode is the founder of Courier, a company that provides notification optimization.

This might sound like a small, trivial problem. But it actually has a large impact on the usage of apps. And it is not an easy engineering problem. Troy joins the show to talk about the problem that Courier solves and the backend infrastructure that powers it. Courier is built entirely on serverless APIs. This is a great case study in how to build a completely scalable infrastructure product based on serverless tools.

]]>A gig economy application generates lots of notifications. There is SMS, mobile phone updates, emails, and native application updates. If you order a ride from Uber, you might receive a text message and a push notification at the same time.A gig economy application generates lots of notifications. There is SMS, mobile phone updates, emails, and native application updates. If you order a ride from Uber, you might receive a text message and a push notification at the same time. If an app overloads the user with notifications, the user might end up annoyed andSoftware Engineering Daily1:16:138866LinkedIn Kafkahttp://softwareengineeringdaily.com/2020/02/20/linkedin-kafka/?utm_source=rss&utm_medium=rss&utm_campaign=linkedin-kafka
Thu, 20 Feb 2020 16:00:58 +0000

This article is part 2 in a series about LinkedIn's data journey. You can read the first part, on LinkedIn's data infrastructure, here.

It bears repeating that LinkedIn is a massive player in the software industry in terms of the number of active users and website interactions. With a user base of over 675 million people and growing, LinkedIn engineers face data challenges on a scale not commonly experienced in the industry. Over the years, these challenges have paved the way for various innovative methodologies and tools. Of the tools that have come out of LinkedIn, the most famous is Kafka.

Kafka forms the backbone of operations at LinkedIn. Most of the data communication between different services within the LinkedIn environment goes through Kafka. It is used in cases such as database replication, stream processing, and data ingestion.

What is Kafka?

Kafka is a distributed streaming platform. This definition might sound abstract, but it captures the core capabilities of Kafka: it is a distributed system for handling streaming data. Streaming data is data that is constantly being generated, often by numerous sources, and continuously ordered in time, in contrast with batch data: bounded, historical data that is usually stored in databases.

Kafka deals with records. A record is a single unit of information: a collection of bytes, optionally accompanied by a key and other metadata. Records produced to the same topic and partition are bundled into batches to reduce network latency costs.

Kafka handles this type of data with the concept of topics, which are in essence distributed commit logs that act as message queues. A producer writes data into topics, which can be partitioned and replicated for scalability and fault tolerance. Each partition in the topic is essentially a commit log, an append-only, time-ordered data structure.
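As a hedged sketch of what producing to a partitioned topic looks like from application code, here is a minimal example using the third-party kafka-python client; the broker address, topic name, and record contents are hypothetical.

```python
# A minimal producer sketch using the kafka-python package (pip install kafka-python).
# The broker address, topic name, and record contents are hypothetical.
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    # Records with the same key hash to the same partition, preserving
    # per-key ordering within that partition's append-only commit log.
    key_serializer=lambda k: k.encode("utf-8"),
    value_serializer=lambda v: v.encode("utf-8"),
)

# Each send appends a record to one partition of the topic; the client
# batches records headed to the same topic-partition to cut network costs.
for user_id, action in [("u1", "viewed_job"), ("u2", "liked_post"), ("u1", "sent_message")]:
    producer.send("user-activity", key=user_id, value=action)

producer.flush()  # block until the batched records have been delivered
```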

Each Kafka server is called a broker. A broker stores the data produced by producers on disk and serves consumers' read requests. Multiple brokers together form a Kafka cluster.

The need for Kafka comes from decoupling the processes that produce data from the services that analyze or consume it. Direct links between them quickly become tangled and require coordinated effort across frontend and backend teams. Moreover, the same piece of data can be used for a wide variety of tasks, each of which may require it to be preprocessed in a different way or delivered in a different format.

Kafka solves these problems using a push-pull model that lets producers push data into topics and lets consumers pull data whenever they need to. By allowing persistence within the topics, instead of records disappearing once they are consumed, Kafka allows multiple consumers to read data from the same topic. Built with topics and partitions, Kafka is horizontally scalable for the varying needs of numerous organizations.
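A matching consumer sketch (again kafka-python, with hypothetical names) shows the pull side: because records persist in the topic rather than disappearing on consumption, a second process started with a different group_id would independently receive the same records.

```python
# Consumer sketch (kafka-python): independent consumer groups each read
# the full stream at their own pace, since records persist in the topic.
# Broker address, topic, and group names are hypothetical.
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "user-activity",
    bootstrap_servers="localhost:9092",
    group_id="analytics",          # a second process with group_id="replication"
    auto_offset_reset="earliest",  # would receive the same records independently
    value_deserializer=lambda v: v.decode("utf-8"),
)

for record in consumer:
    # Each consumer group tracks its own offset into each partition's log.
    print(record.partition, record.offset, record.value)
```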

There is a whole ecosystem built around Kafka at LinkedIn

Kafka was originally designed to facilitate activity tracking and to collect application metrics and logs at LinkedIn. Currently, messages relayed by Kafka fall into five main categories: queueing, metrics, logs, database replication, and tracking data.

Kafka brokers facilitate message queues between different applications and services. This happens in a streaming fashion, as new data gets added to topics continuously. In the past, storing this data in Hadoop and performing batch processing was enough for most use cases. However, as online services evolved, high-latency batch processing gave way to low-latency, near-real-time processing. From detecting congestion and unusual traffic to updating recommendations immediately after a user's action, near-real-time processing has become a necessity for online platforms.

At LinkedIn, Samza was developed to connect Kafka, the distributed stream messaging platform, to stream processing; it later became an Apache incubator project. Apache Samza is a distributed stream processing system built on the concepts of streams and jobs that operate on those streams. For LinkedIn's operations that require near-real-time responses, the streams are facilitated by Kafka and the processing is done by Samza. By delegating work that would normally be done with Map/Reduce jobs in Hadoop to Samza, LinkedIn engineers can provide a rich user experience and make real-time decisions in the case of an anomaly.
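Samza's actual APIs are Java, so purely as an illustration of the streams-and-jobs model rather than of Samza itself, here is a minimal Python sketch of a stream job: a callback invoked once per message with local state, in contrast to a periodic Map/Reduce batch pass. All names and the event shape are hypothetical.

```python
# Illustration only: Apache Samza's real API is Java. This sketch mimics the
# shape of a stream job -- a callback invoked per incoming message, with local
# state -- rather than a periodic batch pass. All names are hypothetical.
from collections import Counter

class PageViewCounterJob:
    """Consumes a stream of page-view events and maintains running counts."""

    def __init__(self):
        self.counts = Counter()  # stands in for a job's local key-value state store

    def process(self, message: dict) -> None:
        # Called once per record, so results are updated in near real time.
        self.counts[message["page"]] += 1
        if self.counts[message["page"]] % 1000 == 0:
            self.emit_alert(message["page"])

    def emit_alert(self, page: str) -> None:
        print(f"traffic spike candidate: {page}")

job = PageViewCounterJob()
for event in [{"page": "/jobs"}, {"page": "/feed"}, {"page": "/jobs"}]:
    job.process(event)
```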

Real-time processing does not always have to rely on data that has been recently created. Streams can be created from historical and static data, depending on the use case. To this end, LinkedIn has developed and open-sourced Brooklin, a distributed streaming service that can consume data from diverse sources and produce streams for other applications to use. Brooklin focuses on data movement. It can be seen as a streaming bridge that moves data across different environments, including different public clouds, databases, and messaging systems. As the parallel suggests, Brooklin can act as both a consumer and a producer of Kafka topics.

That being said, not all use cases call for near real-time responses. Batch processing is still common and valuable. Batch processing at LinkedIn is performed using Apache Hadoop.

Gobblin, a library developed by LinkedIn and later donated to the Apache Foundation, is a data integration framework that bridges the gap between Hadoop and multiple data sources with different data types. Gobblin is used at LinkedIn for this exact purpose: to ingest data that does not require immediate processing into the data lake. Since virtually all data passes through Kafka, Kafka is one of the main sources for Gobblin's data ingestion.

Beyond the tools that deal directly with data flowing through Kafka, LinkedIn has developed numerous tools for operating Kafka itself, serving use cases like monitoring and handling the operational challenges that come with scale. If you look at LinkedIn's repositories on GitHub, you can see that a number of them relate to Kafka.

There are many more tools, like Kafka Monitor and Burrow, that are actively used at LinkedIn and open-sourced for the Kafka community. This ecosystem shows how important Kafka is to LinkedIn's operations, and how dedicated the company is to pushing Kafka to new horizons.

LinkedIn as a developer of Kafka

LinkedIn, as the main contributor to Kafka, has an internal Kafka development team that serves as a reliable point of contact within the company for any Kafka-related support needs. Because it relies so heavily on Apache Kafka, LinkedIn maintains internal release branches that diverge from upstream Kafka. Since the ecosystem around Kafka is vast and the amount of data that pours into LinkedIn's operations is enormous, these internal releases address scalability and operability issues specific to LinkedIn. This branch of Kafka has recently been open-sourced on GitHub.

Kafka forms the backbone of LinkedIn's stack, just as it is used by many other organizations and developers daily. LinkedIn has great influence over Kafka as the initial developer of the tool and has helped shape the ecosystem around it. This level of commitment from a company as large as LinkedIn speaks volumes about the value of Apache Kafka.

Data Infrastructure Investing with Eric Anderson
http://softwareengineeringdaily.com/2020/02/20/data-infrastructure-investing-with-eric-anderson/?utm_source=rss&utm_medium=rss&utm_campaign=data-infrastructure-investing-with-eric-anderson
Thu, 20 Feb 2020 10:00:43 +0000

In a modern data platform, distributed streaming systems are used to read data coming off of an application in real-time. There are a wide variety of streaming systems, including Kafka Streams, Apache Samza, Apache Flink, Spark Streaming, and more.

When Eric Anderson joined the show back in 2016, he was working at Google on Google Cloud Dataflow, a managed service for handling streaming data. Today, he works as an investor at Scale Venture Partners. In his current job, he analyzes companies built around data infrastructure, developer tooling, and other enterprise engineering domains.

Eric also hosts the podcast Contributor, which explores open source maintainers and the stories of their projects. His podcast has featured the creators of projects such as Envoy, Alluxio, and Chef. In today’s episode, Eric returns to the show to discuss data infrastructure, investing, and the evolving world of open source.

Materialize: Streaming SQL on Timely Data with Arjun Narayan and Frank McSherry
http://softwareengineeringdaily.com/2020/02/19/materialize-streaming-sql-on-timely-data-with-arjun-narayan-and-frank-mcsherry/?utm_source=rss&utm_medium=rss&utm_campaign=materialize-streaming-sql-on-timely-data-with-arjun-narayan-and-frank-mcsherry
Wed, 19 Feb 2020 10:00:47 +0000

Distributed stream processing frameworks are used to rapidly ingest and aggregate large volumes of incoming data. These frameworks often require the application developer to write imperative logic describing how that data should be processed.

For example, a high volume of clickstream data that is getting buffered to Kafka needs to have a stream processing system evaluate that data to prepare it for a data warehouse, Spark, or some other queryable environment. In practice, many developers simply want to have the high volume of data become queryable in the fewest number of steps possible.

Materialize is a streaming SQL engine that maintains materialized views over streaming data. The views are incrementally updated over time and reconciled with new data that may arrive out of order.
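Since Materialize exposes a PostgreSQL-compatible interface, a rough sketch of the workflow might look like the following; the connection details, the assumed pre-existing pageviews source, and the exact SQL are illustrative rather than verbatim Materialize syntax.

```python
# Sketch only: Materialize speaks the Postgres wire protocol, so a standard
# driver such as psycopg2 can talk to it. Connection details, the assumed
# pre-existing "pageviews" source, and the SQL here are illustrative.
import psycopg2

conn = psycopg2.connect(host="localhost", port=6875, dbname="materialize", user="materialize")
conn.autocommit = True
cur = conn.cursor()

# Define a view that the engine keeps incrementally up to date as new
# (possibly out-of-order) records arrive on the underlying stream.
cur.execute("""
    CREATE MATERIALIZED VIEW page_counts AS
    SELECT url, count(*) AS views
    FROM pageviews
    GROUP BY url
""")

# Reading the view returns the current, already-maintained result;
# no batch job has to run at query time.
cur.execute("SELECT url, views FROM page_counts ORDER BY views DESC LIMIT 10")
for url, views in cur.fetchall():
    print(url, views)
```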

Arjun Narayan and Frank McSherry are the co-founders of Materialize, a company whose technology is based on the Naiad paper, which was written at Microsoft Research. Arjun and Frank join the show to talk about modern streaming systems and their strategy for taking an academic paper and productizing it.

LinkedIn Data Infrastructure
http://softwareengineeringdaily.com/2020/02/18/linkedin-data-infrastructure/?utm_source=rss&utm_medium=rss&utm_campaign=linkedin-data-infrastructure
Tue, 18 Feb 2020 16:00:10 +0000

LinkedIn has become a staple for the modern professional, whether it’s used for searching for a new job, reading industry news, or keeping up with professional connections.

As a rapidly growing platform that serves more than 675 million users today, LinkedIn is a company that can boast one of the largest user bases in the world. How these users interact with the site and react to recommendations aggregates into a massive dataset. Operating on a scale that few companies experience, LinkedIn has an amount of data that brings interesting engineering problems and opens up ripe opportunities for innovation in areas like data infrastructure and tooling.

Even though LinkedIn is a 16-year-old company, its data infrastructure journey is far from over. That journey covers a wide range of practices, from roughly 20 servers in a small data center in 2008, to smarter data centers built around the world, and more recently, as of July 2019, the beginning of a multi-year migration to the public cloud with Azure. Throughout this journey, LinkedIn engineers have faced a variety of challenges, documented their solutions as lessons learned along the way, and built and open-sourced invaluable tools like Kafka and Voldemort, used by millions of other engineers.

In the early years of LinkedIn, the data infrastructure relied on a single data center, hosted with a retail data center provider. In those days, the priority, with data being served from a single data center, was availability – keeping the site up for users.

As the number of users grew and new features were released, adding data center capacity through a retail provider became less cost-effective. This is when LinkedIn started its own data center, gradually fanning out to rely on multiple data centers. By not only expanding their data centers in number, but also designing the fabric of the data centers in a smart way, LinkedIn grew into its modern infrastructure, able to handle millions of users.

LinkedIn has showcased a multi-perspective strategy on handling growth. The most prominent strategies have been expanding the number and the capacity of data centers, building smarter data centers, and creating tooling around massive data to enable faster integration of data into workflows to propel innovation.

Data Sources at LinkedIn

LinkedIn's data comes from three main sources. The first is transactional data from users: every action a user takes, from status updates to post likes and job views, must be stored. The second is telemetry data, which comes from monitoring applications to gain insight into how the different components of the platform are performing. The third, one without an upper bound according to Surlaker, is derived data, generated by developers for purposes such as analysis data sets and training machine learning models.

These types of data are common for web applications with user interactions. Things get complicated when data has to be consolidated in a standard format to enable a unified experience for the developers in a company.

The data sources can be widely different: historical data usually comes from RDBMSs designed for OLAP, current transactional data comes from NoSQL databases and streams, and logs arrive in a variety of formats. The paradigm in which the data arrives also matters: ingesting streaming data and consuming batch data may have different requirements.

LinkedIn’s main answer to handling these diverse sets of data has been through tooling. Luckily for the general developer community, many of these tools have been open-sourced over the years.

Open Source Tools

One of LinkedIn’s strategies for dealing with the massive amounts of data that are being constantly generated is to empower engineers by developing tools to deal with different aspects of the data, from ingestion to storage.

LinkedIn has built and open-sourced a variety of tools over the years. One of these tools, Kafka, built by LinkedIn and donated to Apache Software Foundation, forms the backbone of data operations at LinkedIn alongside Hadoop. Kafka, a distributed streaming platform, acts as a low-latency data collection system for the real-time data generated by LinkedIn’s user base.

Project InVersion

In the fast-moving world of startups, technical debt is often overlooked. Technical debt refers to an accumulation of deficiencies that make it harder to add new features to a system. The most common way of accumulating it is by releasing features quickly without considering the future sustainability of the overall system, a practice common among startups looking to attract users and investors with shiny new features.

In 2011, after the company's initial public offering, LinkedIn's technical debt hit a critical point. Infrastructure practices that had been in place for years, and problems that compounded as new features were added on top of them, could no longer be contained. LinkedIn opted for a risky infrastructure overhaul, now referred to as Project InVersion.

For two months in 2011, LinkedIn stopped rolling out new features as developers focused on improving and modernizing their infrastructure – a full team effort to get rid of the technical debt of the last eight years. This overhaul included developing new tools that automated testing, accelerated the process of rolling out features and updating the platform, and in the end, completely transformed LinkedIn’s backbone.

Challenges with ML

LinkedIn offers a personalized experience to each of its users. The way posts in the feed are sorted, the job recommendations a user sees, and other recommendations must be specific to everyone on the platform. Machine learning models are the main power behind these operations.

An example from recommendations on LinkedIn, powered by AI.

LinkedIn has many teams for each ML application, from Feeds to Communities. Each of these areas poses unique challenges in defining the right objectives, applying the correct modeling technique, and successfully serving complex models with low latency at scale. Each model must be tightly integrated within the serving stack specific to its problem space. At the same time, there must be a single unified framework that provides a battery of tools to solve the myriad challenges that come with dealing with complex models that operate on a very large set of data.

LinkedIn’s solution is Pro-ML.

The goal of Pro-ML is to double the effectiveness of machine learning engineers while simultaneously opening the tools for AI and modeling to engineers from across the LinkedIn stack.

The Pro-ML approach divides ML practices into layers as part of the machine learning development lifecycle.

Each of these layers is a step toward building machine learning models for production. LinkedIn finds it helpful to standardize these steps so that engineers across teams can share innovations by simply swapping components with one another. LinkedIn also provides automation and additional hints to help users find mistakes in their models faster.

In machine learning parlance, a “feature” is a piece of the data that the model uses to make a prediction. An example might be how many connections in common a user has with someone who posted an item in his or her feed. Features used in various machine learning models are collected into the Feature Marketplace in a searchable format. These features are available when making predictions when the user visits the site, but must be simulated when testing out an idea during model training. LinkedIn has had many challenges in the past with ensuring features are computed the same way during model training and prediction. Pro-ML offers a tool called Frame that unifies feature access and computation in all of the environments.
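As a hedged illustration of the train/serve consistency problem (this is not Frame's API; the feature, data shapes, and function names are hypothetical), the core discipline is to route both training and serving through a single feature definition:

```python
# Illustration of the train/serve consistency problem Frame addresses;
# this is not Frame's API. Feature names and data shapes are hypothetical.

def common_connections(user_connections: set, author_connections: set) -> int:
    """Single definition of the feature, shared by training and serving."""
    return len(user_connections & author_connections)

# Offline: compute the feature from historical snapshots to build training rows.
def training_row(snapshot: dict) -> dict:
    return {
        "common_connections": common_connections(
            snapshot["user_connections"], snapshot["author_connections"]
        ),
        "label": snapshot["clicked"],
    }

# Online: compute the same feature from live data at prediction time.
def serving_features(user: dict, author: dict) -> dict:
    return {
        "common_connections": common_connections(
            user["connections"], author["connections"]
        )
    }

# Because both paths call common_connections(), the model sees the feature
# computed identically during training and in production.
row = training_row(
    {"user_connections": {1, 2, 3}, "author_connections": {2, 3, 4}, "clicked": 1}
)
print(row)  # {'common_connections': 2, 'label': 1}
```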

LinkedIn also has several open-source tools to integrate machine learning workflows into their infrastructure needs, such as TonY and Photon ML.

Photon ML was born of similar needs: it is a machine learning library built on Spark. Rather than deep learning, Photon ML focuses on Generalized Linear Models and Generalized Linear Mixed Models (GLMix). Models built with Photon ML power features where response prediction is useful, namely recommendation components such as job recommendations, feed ranking, and “People You May Know.”
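As a rough sketch of what response prediction with a generalized linear model looks like, here is a minimal example using scikit-learn for illustration; Photon ML itself runs on Spark, and this is not its API. The features and data below are hypothetical.

```python
# Response prediction with a generalized linear model, sketched with
# scikit-learn for illustration; Photon ML runs on Spark and this is not
# its API. The features and data are hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Rows: [common_connections, same_industry, profile_views_last_week]
X = np.array([[12, 1, 3], [0, 0, 1], [25, 1, 9], [2, 0, 0]])
y = np.array([1, 0, 1, 0])  # did the member respond (click/apply)?

model = LogisticRegression().fit(X, y)

# The predicted response probability can be used to rank job
# recommendations or feed items for a given member.
candidate = np.array([[8, 1, 2]])
print("P(response) =", model.predict_proba(candidate)[0, 1])
```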

Journey to Cloud

LinkedIn has been using Azure for some of its operations, such as Microsoft's Content Moderator APIs, part of Azure Cognitive Services, for detecting inappropriate content, and the Text Analytics APIs for machine translation. The choice to use Cognitive Services is telling: LinkedIn has proven over the years, through the numerous projects its engineers have built and open-sourced, that the company is not averse to tackling a problem from the root and developing its own solution. There is a trade-off between the effort put in by engineers at LinkedIn and the cost of using a service from a provider. Beyond this trade-off, however, comes the question of reliability and scale, especially for a company like LinkedIn, unique in the amount of data and the number of users its platform serves.

Recently, LinkedIn's Senior VP of Engineering, Mohak Shroff, announced that the company will be moving to the public cloud under the umbrella of Azure. This is a critical move, and a deliberate one, according to Shroff: after periodically weighing the pros and cons of the public cloud from multiple angles, ranging from applicability to the bare economics, the company decided it would be a worthy next step.

These considerations are significant. The decision to use Azure services shows the company's trust in Azure to handle data operations at the scale of LinkedIn.

To learn more about what the engineers over at LinkedIn are building to connect the world’s professionals, check out the company’s blog.

Go Networking with Sneha Inguva
http://softwareengineeringdaily.com/2020/02/18/go-networking-with-sneha-inguva/?utm_source=rss&utm_medium=rss&utm_campaign=go-networking-with-sneha-inguva
Tue, 18 Feb 2020 10:00:29 +0000

A cloud provider gives developers access to virtualized server infrastructure. When a developer rents this infrastructure via an API call, a virtual server is instantiated on physical machines. That virtual server needs to be made addressable through the allocation of an IP address to make it reachable from the open Internet. When the virtual server starts to receive too much traffic, that traffic needs to be load balanced with another virtual server.

The backend networking code that runs a cloud provider needs to be fast, secure, and memory-efficient. Languages that fit that description include C++, Rust, and Go. Digital Ocean’s low-level networking code is mostly written in Go.

Sneha Inguva is an engineer with Digital Ocean who has written and spoken about writing networking applications using Go. She joins the show to talk about her work at Digital Ocean, including the implementation of a DHCP server, a network server that assigns IP addresses and other parameters to devices that sit on that network.
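As a toy illustration of the bookkeeping at the heart of a DHCP server (DigitalOcean's implementation is written in Go and handles far more, such as lease expiry and option negotiation; all names here are hypothetical):

```python
# Toy sketch of the lease bookkeeping at the heart of a DHCP server.
# A real implementation (DigitalOcean's is in Go) also handles lease
# expiry, renewal, and option negotiation. All names are hypothetical.
import ipaddress

class LeasePool:
    def __init__(self, cidr: str):
        net = ipaddress.ip_network(cidr)
        self.free = list(net.hosts())   # addresses available to hand out
        self.leases = {}                # MAC address -> assigned IP

    def allocate(self, mac: str) -> str:
        # A device that asks again gets its existing lease back.
        if mac in self.leases:
            return str(self.leases[mac])
        ip = self.free.pop(0)
        self.leases[mac] = ip
        return str(ip)

    def release(self, mac: str) -> None:
        ip = self.leases.pop(mac, None)
        if ip is not None:
            self.free.append(ip)

pool = LeasePool("10.0.0.0/29")
print(pool.allocate("aa:bb:cc:dd:ee:01"))  # 10.0.0.1
print(pool.allocate("aa:bb:cc:dd:ee:01"))  # same lease returned again
```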

Great Expectations: Data Pipeline Testing with Abe Gong
http://softwareengineeringdaily.com/2020/02/17/great-expectations-data-pipeline-testing-with-abe-gong/?utm_source=rss&utm_medium=rss&utm_campaign=great-expectations-data-pipeline-testing-with-abe-gong
Mon, 17 Feb 2020 10:00:50 +0000

A data pipeline is a series of steps that takes large data sets and creates usable results from them. At the beginning of a data pipeline, a data set might be pulled from a database, a distributed file system, or a Kafka topic. Throughout a data pipeline, different data sets are joined, filtered, and statistically analyzed.

At the end of a data pipeline, data might be put into a data warehouse or Apache Spark for ad-hoc analysis and data science. At this point, the end user of the data set expects the data to be clean and accurate. But how do we get any guarantees about its correctness?

Abe Gong is the creator of Great Expectations, a system for data pipeline testing. In Great Expectations, the developer creates tests called “expectations”, which verify certain characteristics of the data set at different phases in a data pipeline. This helps ensure that the end result of a multi-stage data pipeline is correct.
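A small sketch of what declaring expectations looks like with Great Expectations' pandas-flavored API of that era (the file path and column names are hypothetical, and API details vary by version):

```python
# Sketch of declaring expectations against a data set with Great
# Expectations' pandas-flavored API; the file path and column names are
# hypothetical, and API details vary by library version.
import great_expectations as ge

df = ge.read_csv("rides.csv")  # a pandas DataFrame augmented with expect_* methods

# Each expectation verifies a characteristic of the data at this pipeline stage.
result_ids = df.expect_column_values_to_not_be_null("ride_id")
result_fare = df.expect_column_values_to_be_between("fare_usd", min_value=0, max_value=500)

# An expectation returns a validation result whose success flag can gate
# whether the pipeline proceeds to its next stage.
if not (result_ids.success and result_fare.success):
    raise ValueError("data quality check failed; halting this pipeline stage")
```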

Abe joins the show to discuss the architecture of a data pipeline and the use cases of Great Expectations.

Data Warehouse ETL with Matthew Scullion
http://softwareengineeringdaily.com/2020/02/14/data-warehouse-etl-with-matthew-scullion/?utm_source=rss&utm_medium=rss&utm_campaign=data-warehouse-etl-with-matthew-scullion
Fri, 14 Feb 2020 10:00:29 +0000

A data warehouse provides low-latency access to large volumes of data. It is a crucial piece of infrastructure for a large company, because it can be used to answer complex questions involving a large number of data points. But a data warehouse usually cannot hold all of a company's data at any given time. Users need to move a subset of the data into the data warehouse by reading large files from a data lake on disk and loading that data into the warehouse.

The process of moving data from one place into another is broken down into three sequential steps, often called “ETL” (extract, transform, load) or “ELT” (extract, load, transform). In ETL, the data is extracted from a source such as a data lake, transformed into a schema that is customized for the data warehouse application, and then loaded into the data warehouse. In ELT, the last two steps are reversed, because modern systems can often leave the necessary schema transformation until after the data has been loaded into the data warehouse.
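A minimal sketch of the ETL flavor, with SQLite standing in for a real data warehouse and hypothetical file paths and schema:

```python
# Minimal ETL sketch: extract from a data-lake file, transform into the
# warehouse schema, load into a table. Paths and schema are hypothetical;
# SQLite stands in for a real data warehouse client.
import csv
import sqlite3

# Extract: read raw rows from a file in the data lake.
with open("lake/orders.csv", newline="") as f:
    raw_rows = list(csv.DictReader(f))

# Transform: reshape rows into the schema the warehouse application expects.
rows = [
    (r["order_id"], r["country"].upper(), float(r["amount_usd"]))
    for r in raw_rows
    if r["amount_usd"]  # drop rows with missing amounts
]

# Load: write the transformed rows into the warehouse table.
warehouse = sqlite3.connect("warehouse.db")
warehouse.execute(
    "CREATE TABLE IF NOT EXISTS orders (order_id TEXT, country TEXT, amount_usd REAL)"
)
warehouse.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
warehouse.commit()
```

In the ELT variant, the raw rows would be loaded into the warehouse first, and the reshaping would be expressed as SQL run inside the warehouse afterward.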

Matthew Scullion is the CEO of Matillion, a company that specializes in building tools for data transformations. Matthew joins the show to talk about the problem of data transformation, and how that problem has evolved over the nine years since he started Matillion.

If you enjoy the show, you can find all of our past episodes about data infrastructure by going to SoftwareDaily.com and searching for the technologies or companies mentioned. And if there is a subject that you want to hear covered, feel free to leave a comment on the episode, or send us a tweet @software_daily.

The Rise of Platform Engineering
http://softwareengineeringdaily.com/2020/02/13/setting-the-stage-for-platform-engineering/?utm_source=rss&utm_medium=rss&utm_campaign=setting-the-stage-for-platform-engineering
Thu, 13 Feb 2020 16:00:28 +0000

The rise of microservices, container orchestration, and the like has introduced novel engineering challenges. Platform engineering teams have formed at a number of organizations to shoulder these responsibilities. In some respects, the role of a platform engineer hasn't drastically changed from that of other DevOps-related roles; there is some truth in the claim that “Platform Engineer” is nothing but a new title. However, a number of factors continue to shift the traditional responsibilities of a Site Reliability Engineer (SRE).

These factors include the increased popularity and extensibility of cloud providers, Kubernetes, and infrastructure as code. Paradigms introduced by these factors unlock many superpowers for an organization, such as service discovery and the ability to horizontally scale with ease, which could potentially lead to more money in the bank.

Mature companies with legacy infrastructure are mobilizing in preparation for the great migration to the cloud, and cloud providers are ready to accept them with open arms. But with this migration comes a need for expertise in the cloud and container orchestration, so organizations are beginning to question whether they should form a platform engineering team. Companies born shortly before, or during, the cloud era don't have as many of these concerns; there are fewer, if any, legacy systems to wrestle. It's very common for companies to begin, and remain, on cloud providers without ever managing on-prem systems.

As mentioned, the role “Platform Engineer” is considered by some to be just a different title for a job traditionally performed by an infrastructure team. To understand why this is not exactly true, let's take a closer look at what platform engineering looks like today.

What is platform engineering?

This is a loaded question. Asking ten engineers this same question would likely yield ten different answers. That said, there would probably be a number of similar themes. The most prominent theme would likely be similar to the idea of bridging the gap between software and hardware. In other words, platform engineers enable application developers to put software into the hands of users in an easier manner. This broad stroke manifests itself in a number of different ways. Some of these ways could be standardizing an organization’s Kubernetes deployments, ensuring infrastructure is auditable, automating various deployment processes, and writing documentation for application developers.

The responsibilities of a platform engineering team should not be confused with those of a DevOps team. They're similar in some respects, though they vary in others. Examining where platform engineering and DevOps diverge can help to explain the growing popularity of this new team. For one, the concept of DevOps predates that of platform engineering and has matured in sync with technological progress. Originally, DevOps was fairly ad hoc. For example, if a team within an organization wanted to host a new website, coordination between this team and a DevOps team was necessary. Contrast this with the notion of platform engineering. Platform engineers build systems that other teams can build on. To continue the example, if the same team had a platform that took care of hosting the website, no coordination would be necessary between this team and the platform engineering team.

Another significant difference is the role of an API boundary, as well as how explicit this boundary is, within the context of each role's responsibilities. This ties in with the suggestion that DevOps tends to be more ad hoc than platform engineering. Both DevOps and platform engineering teams are concerned with deployments, service accounts, and infrastructure. However, DevOps teams aren't building platforms that expose explicit APIs and abstractions offering flexibility to application developers; platform teams are.

To further describe the role platform engineering plays in an organization, let's consider an example. Suppose an insurance company founded in the 1980s has started to shift its infrastructure to the cloud. Now, suppose that within this organization the software engineers are split into two categories: application development and infrastructure. Before the cloud era, it was common for the infrastructure engineers to resemble a backend team that offered APIs.

These responsibilities are most often fulfilled through the use of infrastructure as code (IaC). Some common infrastructure as code tools are Terraform, Vagrant, Chef, Puppet, and AWS CloudFormation. A number of these tools are open source. Generally, the platform built by platform engineers is composed of these open-source tools. An organization’s platform engineers tailor infrastructure as code tools to the needs of the organization’s application developers. Below is a figure that illustrates how infrastructure as code and platform engineers fit into a development team, as well as how these tools, ultimately, lead to more features.

Infrastructure as code

Infrastructure as code is one of a number of factors that have helped boost the notion of a platform engineer within the collective conscience. But, it underpins many of these additional factors, so it deserves a closer examination. Before the era of infrastructure as code, a human had to manually configure infrastructure.

In retrospect, manually configuring infrastructure is problematic; the element of human error is always a risk. Humans are more error-prone and expensive than computers, and they are orders of magnitude slower. Infrastructure as code removes the risk of human error, reduces cost, and improves the speed at which teams within an organization can iterate. The fewer humans involved in a systematic process, the better.

One notable benefit of infrastructure as code is that it can be checked into version control. This is especially beneficial for enterprises that may be ramping up cloud infrastructure quickly. Version control platforms like GitHub provide context for a system's infrastructure by keeping records of changes and, arguably just as important, of the review process. GitHub's pull request review process is a great example: it's a place for discussion to take place. Whether this type of review is ideal for all kinds of pull requests is arguable, but it is a huge upside for infrastructure as code.

As with many technologies, there are different approaches to infrastructure as code. The most common are declarative and imperative models. Declarative frameworks, such as that offered by Kubernetes, require users to define a desired state. Users don’t specify how this state should be achieved. In a declarative model, the system develops a plan to reach and maintain the specified state. Imperative frameworks require users to specify commands in a particular order, in order to reach a desired state.

At first, the imperative model may seem more intuitive, since a number of popular programming languages, like Go, are procedural. However, it is not the more popular approach to infrastructure as code, because the imperative model does not scale. Complexity grows quickly with the number of components in a system: users have to execute the correct commands in the correct order across more and more machines. Contrast this with the declarative model, where users describe the desired end state and the framework is responsible for developing and executing a plan to reach it. Complexity grows far more slowly with the number of components, because the framework takes care of the heavy lifting.
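A toy sketch of the declarative pattern, independent of any particular IaC tool: the user supplies only the desired end state, and the framework diffs it against reality and derives the plan itself (the resource names are hypothetical).

```python
# Toy sketch of the declarative model: the user states the desired end state,
# and the framework diffs it against the actual state and derives the plan.
# Resource names are hypothetical.
desired = {"web": 3, "worker": 2}   # user declares: 3 web servers, 2 workers
actual = {"web": 1, "worker": 4}    # what currently exists

def plan(desired: dict, actual: dict) -> list:
    """Compute the actions needed to converge actual state on desired state."""
    actions = []
    for name in sorted(desired.keys() | actual.keys()):
        diff = desired.get(name, 0) - actual.get(name, 0)
        if diff > 0:
            actions.append(f"create {diff} x {name}")
        elif diff < 0:
            actions.append(f"destroy {-diff} x {name}")
    return actions

# The user never writes these commands; the framework derives them.
print(plan(desired, actual))  # ['create 2 x web', 'destroy 2 x worker']
```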

In some cases, the flexibility an imperative model offers is preferable to a declarative model that abstracts it away. Thankfully, tools such as Terraform, Vagrant, and CloudFormation keep this complexity manageable: they are largely declarative but leave room for imperative escape hatches where flexibility is needed. If you're interested in learning more about these technologies, check out this episode of Software Engineering Daily, a conversation with Mitchell Hashimoto, the founder of HashiCorp, about application development and why the importance of automation scales with the complexity of infrastructure.

When does an organization need a platform engineering team?

Organizations weigh real tradeoffs when considering a platform engineering team. On one hand, building one detracts resources from building business logic and developing features. On the other, a platform engineering team may build tooling and infrastructure that increases engineering productivity. Without a platform team in place, it's likely that some engineers have taken it upon themselves to assume a platform engineer-like role. Organizationally, this can become a challenge and place a burden on all engineers; without a definitive set of engineers responsible for an organization's platform, rogue engineers acting in a platform engineering capacity will probably not be effective. Put simply, deciding whether to build a platform engineering team is a matter of weighing short-term costs against long-term gains.

Building a platform engineering team is easier said than done. The following may come as a surprise to readers living in the Bay Area: legacy infrastructure is common within enterprise organizations and can result in a lot of confusion about platform engineering. Platform engineering at a startup founded in the cloud era looks very different from platform engineering at a pre-cloud era enterprise. Unlike startups born in the cloud era, many pre-cloud era enterprises have on-prem systems that have yet to be migrated to the cloud. Additionally, as if migrating to the cloud weren't challenging enough, pre-cloud era enterprises tend to have more red tape and bureaucracy standing in the way of organizational changes. The short-term downsides of creating a platform engineering team can be magnified by these kinds of organizational barriers.

Organizations should consider the short-term losses that creating a platform engineering team may cause. A strong indication that a platform engineering team would be beneficial is the observation of different product teams building similar features or trying to accomplish similar tasks. Product teams could see an increase in productivity if a platform team is formed. Platform engineering is interesting because it can raise an entire organization's efficiency. This should be taken into consideration before writing off the need for a platform engineering team.

If an enterprise organization does form a platform team, an effort to continue, or begin, migrating to the cloud is almost inevitable. Migrating to the cloud forces organizations to choose which cloud vendor, or vendors, to use. Let’s examine how the choice between using a single, as opposed to multiple, cloud vendors may affect an organization.

To multi-cloud, or not to multi-cloud, that is the question

The first step in developing a cloud strategy is deciding if a multi-cloud strategy is needed. The best multi-cloud strategy may be to not use services from different vendors and embrace all that one has to offer. Sure, using a single cloud provider has drawbacks, but it can prove to be vastly simpler than any multi-cloud approach. This isn’t to say the benefits of going multi-cloud don’t outweigh those of the simplicity of using a single cloud vendor. Examining both the benefits and drawbacks of multi-cloud architecture can shed light on how going multi-cloud may affect a given organization.

The benefits of going multi-cloud include being able to use the best services available for a given task and limiting the risk of outages in any given geographical region. The benefit most frequently touted by multi-cloud advocates is the freedom it provides: you're not locked into any single vendor's ecosystem. Vendor lock-in is the situation in which a vendor has become one of the organization's dependencies and the organization cannot substitute alternative solutions; the fear is that the work of swapping a current dependency for another would be far too costly.

There are different ways one could interpret the notion of cloud-vendor lock-in. On one hand, there’s the risk that an organization using a single vendor deems a critical service as sub-par and wants another option. Or, suppose this critical service begins to cost more than anticipated. In this scenario, it would be difficult to decide what the next step should be. However, it could be argued that this service isn’t actually critical for the organization. Popular cloud vendors offer a wide array of products; it’s possible that the alleged critical service could simply be replaced. Additionally, these cloud vendors have guides on migrating to and from their platforms.

Yes, there are some benefits to going multi-cloud. There are also some downsides. One downside is the fact that there’s more organizational complexity; this is difficult to deny. For example, each platform’s security accounts must be managed. Aside from organizational complexity, it can be difficult to find engineers who are knowledgeable about multiple cloud platforms. More ramp-up time might need to be allocated for new hires if a multi-cloud system is in place. Another downside to a multi-cloud environment is the increased surface area for potential threats. Organizations may weigh this downside differently, depending on the kinds of data an organization deals with. But, generally, it’s safe to assume your organization values security.

To conclude this examination of the multi-cloud option, let's take a closer look at using only a single cloud. Sure, there are risks of lock-in and of not being able to use the best services for a given task. However, in many cases, using one cloud vendor can greatly simplify a system. Going all-in on a cloud vendor will inevitably mean less organizational and engineering overhead. A cloud's unique products and capabilities are a primary reason to choose a vendor, rather than opting for the least common denominator of multiple cloud vendors. Finally, there's no point in worrying whether a particular cloud will fail to meet your organization's needs at some point in the future: there's fierce competition amongst the top cloud vendors, and competitive pressure minimizes the risk of any single vendor not delivering top-notch products.

Looking ahead

In this episode of Software Engineering Daily, Abby Fuller, a principal technologist at Amazon, noted that “…folks [organizations] should always have a modernization plan.” Broadly, plans for modernization have been ubiquitous across all industries, but they often have implications for upgrading technology being used. The need for modernization amongst enterprise organizations stems from a number of factors, all of which have strong ties to the rise of cloud computing. These factors include the rise of Big Data, increased popularity of microservices, and momentum behind Kubernetes in the container orchestration space.

The difficulties of container orchestration gave rise to Kubernetes, and Kubernetes won the orchestration wars. This has allowed engineers to develop transferable skills. Mindshare continues to build around Kubernetes, and a number of pre-cloud organizations have started their migrations to it. So, many organizations that are migrating to the cloud and using microservices are considering Kubernetes as their container orchestration framework. Not only does it simplify the container orchestration process, but it can attract developers who are interested in technologies closer to the bleeding edge, a demographic that tends to be attractive to pre-cloud organizations interested in moving more of their infrastructure to the cloud.

Forming a platform engineering team is one way an organization could begin modernizing their engineering culture. The aforementioned trends, like the prominence of cloud vendors and the shift to a microservice architecture, make forming a platform engineering team even more appealing. But, modernizing engineering culture can be achieved in other ways, as well. A platform engineering team should not be created for the sake of having a platform engineering team. As with most things in the technology space, it depends.

One article will most likely not be enough for an organization to form a strong opinion about whether a platform engineering team would be beneficial. Building, and maintaining, awareness of the technology landscape can help an organization form a stronger opinion. If you’re interested in learning more about the role of software platforms within organizations, check out this episode of Software Engineering Daily covering Cruise’s approach to platform engineering.

What is a Layer 2 Cloud Provider?
http://softwareengineeringdaily.com/2020/02/13/what-is-a-layer-2-cloud-provider/?utm_source=rss&utm_medium=rss&utm_campaign=what-is-a-layer-2-cloud-provider
Thu, 13 Feb 2020 16:00:07 +0000

The rise of “cloud infrastructure” has presented a dilemma for developers: what is the appropriate level of complexity for a cloud provider to handle? In the last decade, the options for cloud engineering services have expanded exponentially, leading to a vast array of product offerings. While the variety and complexity offered by cloud providers such as AWS presents advantages to developers with economies of scale and scope, these same factors may be unnecessary or burdensome to other firms that seek a more focused and streamlined experience.

Heroku is a Layer 2 cloud provider offering a “Platform-as-a-Service.” Heroku was founded in 2007 and has evolved alongside large cloud providers such as AWS. While Heroku is in some ways a competitor to large cloud providers for clients who wish to deploy web applications, it also builds on AWS cloud infrastructure and acts as a “middleman” between the cloud provider and the user. Heroku's focus is on 12-Factor Web Apps, a set of guiding principles for building web applications that Heroku engineer Mark Turner said “really does define what describes an application suitable for Heroku.”

Given the scope and competitive power of cloud providers, Heroku’s longevity and continued use indicate its users find lasting value that is worth the additional marginal costs. We have conducted several interviews with employees at Heroku in the past several years, and each has had a unique take on Heroku’s value proposition, but a few major themes stood out.

Full disclosure: Heroku is a sponsor of Software Engineering Daily.

In the late 2000s, the rise of cloud computing offerings such as Amazon's EC2 heralded a shift in how software was deployed and managed over the web. Formerly, so-called “bare metal” servers were built in on-premises data centers that acted as the host for a company's web software. Cloud providers such as Amazon Web Services offered “virtualization” services, whereby server hardware was partitioned by a hypervisor into virtual machines, which could be allocated to client applications for hosting.

Virtualization abstracted away the underlying hardware management; subsequently, cloud providers built infrastructure services to interact with and manage VMs. This gave rise to Infrastructure-as-a-Service offerings, which “provide high-level APIs used to dereference various low-level details of underlying network infrastructure like physical computing resources, location, data partitioning, scaling, security, backup, etc.” Cloud providers that focus on the management of hardware up through the IaaS level are called “Layer 1 Cloud Providers.” Layer 1 cloud providers require significant scale and scope to operate effectively due to the complex operational challenges of managing hardware and server operations at a granular level.

Today, Amazon Web Services is the dominant Layer 1 cloud provider. AWS accounts for nearly half of the market for IaaS providers. The second-largest Layer 1 provider, Microsoft Azure, trails at a distant 15.5%. AWS currently offers 212 services on its platform, and offers an array of certification courses training developers and “solutions architects” on its various products.

Above is a diagram of a simple application architecture deployed with AWS services. As applications grow in scale, the number and complexity of the services used can increase dramatically. The advantages of precise configuration and complex architectures are significant for companies with large development and operations teams matched with broad and active customer bases. However, growth in the scope of services creates challenges of its own. As Mark Turner put it:

“Everything that companies like AWS offer as the wide disparate services that they provide means that you really get a giant kit of solutions to problems that you don’t even know you have sometimes.”

“We are essentially taking on the role of your operations team, we’re doing a lot of preventative maintenance. We’re doing a lot of just things that you would never think about behind-the-scenes, and that’s sort of what we bake into our costs.”

Streamlining the developer experience does not translate to simplifying the underlying infrastructure; in fact, the task of adapting layer 1 technologies to a layer 2 interface is a difficult engineering task. Heroku manages several of the underlying infrastructural tasks that a firm may face when managing a web application. For example, Heroku manages application scaling, provides access to autoscaling, and runs a metrics pipeline on Kafka and Cassandra to provide “health checks” essential to an autoscaling process. Heroku engineer Andy Appleton noted that “the entire Heroku product is a developer product,” and that the focus was on creating the best possible UX to add value on top of the Layer 1 AWS cloud framework. Heroku also offers several services such as managed Postgres databases, Redis, version control, and continuous deployment. All of these services and more may be easily integrated with a minimum of time spent managing configuration. Mark Turner described some of the operational needs of Heroku customers:

“We might have customers that have workloads, where each single process uses 14 gigs of memory. At the same time, they might have something that where each process uses 2 megabytes and they want each of those things to schedule and boot up instantaneously. That’s the orchestration layer problems we deal with that makes it hard.

Then it’s the isolation and security boundaries between all of that stuff that also makes it hard, and auditing and patching and maintaining those boundaries is the hard part. Then you factor in layering on those workflows that power that Heroku experience, that those containers encapsulate down at the bottom of it; it all adds up into I think just a hard system to build.”

When a cloud platform expands, it can do so either “horizontally” or “vertically.” Horizontal scaling is the expansion of scope. For example, Amazon adding new available functions to its cloud platform represents scope expansion. Horizontal expansion allows a cloud provider to tackle new problems, or to solve old problems in more efficient ways. The adoption of Kubernetes for container orchestration represented horizontal expansion for Layer 1 cloud providers, including Google's GKE and AWS's EKS. Heroku has also undergone horizontal expansion, adding services such as Kafka to its suite of products. We spoke with Tom Crayford, an engineer at Heroku, at length about adapting Kafka to a managed, streamlined Heroku experience. Despite the scope expansion, Heroku makes efforts to focus its expanded offerings on products that work in mostly the same way in order to minimize the work necessary to add or switch services. From Jon Daniel, talking about Heroku's managed Postgres offerings:

“The nice thing about that is being able to have a fairly standard, almost single-tenant configuration set up on that instance, and the configuration changes themselves as to like what version of Postgres or what plan type is really based on mounting. So we know that everything is going to act very similarly from one to the other and there’s not a lot of one-off configuration happening in there.”

On the other hand, vertical expansion is “creep up the developer experience stack,” which is when a cloud provider deepens its service offerings by creating new abstractions on top of the base layer. It's worth noting that the cloud itself, as we think about it in terms of modern software engineering, is a layer of abstraction on top of server hardware. From Mark Turner:

“Where we expend our energy in leveling up the platform’s capabilities is really important. We pay a lot of attention to how we spend that energy. It’s really important for us to make calls that aren’t good for us, because there’s no way that Heroku is going to directly compete with AWS, or is there a GCP. That’s just not the game we play in. We’re just not that game.”

Even a company as large as Amazon can only expand so much at a time, and thus it must face tradeoffs between horizontal and vertical expansion. In fact, Layer 1 cloud providers operate at somewhat of a disadvantage in terms of offering an opinionated developer experience, because they face competitive pressure to offer a product that can fit every need a client may face. If a large client demands Kubernetes, AWS must build a Kubernetes product. Layer 2 cloud providers like Heroku can afford to be more selective because the needs of their target clients tend to be more clearly defined. From Andy Appleton:

“I think there’s…80% of applications which have very similar requirements and kind of operate in the same way. A big goal is to serve those 80% very closely so that… it’s as close to being no operational burden on the team as possible. Then when you get customers who have much more specific requirements, to try and build the Heroku platform in such a way that there’s these escape hatches, or ways to drop down a level and let them do the thing that they want to do.”

While the big tech companies may find competitive value in adopting the newest, shiniest cloud technologies, the operational burden of extra complexity in a tech stack may be unnecessary for organizations without the need or resources for a highly configured approach. 98.2% of firms in the United States have fewer than 100 employees, and it can be safely assumed that most of those firms cannot devote all 100 employees to configuring load balancers. A comparison may be made to Spring Boot, a “convention over configuration” web framework that became the top web framework for Java developers due in large part to its ease of use. As mentioned before, streamlining does not equal simplifying, and Heroku maintains a significant competitive moat, thanks both to the technological challenge of wrangling cloud services and to the business challenge of maintaining margins. Heroku charges a premium over what one would pay to use AWS alone, and maintaining the value of that margin is a primary business objective.

Despite the business and technological challenges of operating in the gap between clients and Layer 1 cloud providers, Heroku’s products offer a sustainable value proposition to developers and tech companies who seek to reduce overhead, pass off complexity, and allocate resources most effectively towards their core business. As Jon Daniel put it:

“…I don’t have to think about building a web framework. I can just focus on building an application that provides value to my business, and using Heroku is similar….You just focus on building your apps.”

Anyscale with Ion Stoica
http://softwareengineeringdaily.com/2020/02/13/anyscale-with-ion-stoica/
Thu, 13 Feb 2020 10:00:34 +0000

Machine learning applications are widely deployed across the software industry.

Most of these applications use supervised learning, a process in which labeled data sets are used to find correlations between the labels and trends in the underlying data. But supervised learning is only one application of machine learning. Another broad set of machine learning methods is described by the term “reinforcement learning.”

Reinforcement learning involves an agent interacting with its environment. As the model interacts with the environment, it learns to make better decisions over time based on a reward function. Newer AI applications will need to operate in increasingly dynamic environments, and react to changes in those environments, which makes reinforcement learning a useful technique.
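
To make that loop concrete, here is a minimal, self-contained sketch in Python: a toy five-state environment with an invented reward, and a tabular Q-learning agent that improves its decisions from that reward signal. It illustrates the observe-act-reward cycle, not any particular production system.

```python
import random

# A toy environment: the agent walks a 5-state chain and is rewarded at the end.
class ChainEnv:
    LENGTH = 5

    def reset(self) -> int:
        self.state = 0
        return self.state

    def step(self, action: int):
        # action 0 = move left, action 1 = move right (clamped to the chain)
        self.state = max(0, min(self.LENGTH - 1, self.state + (1 if action == 1 else -1)))
        done = self.state == self.LENGTH - 1
        return self.state, (1.0 if done else 0.0), done

env = ChainEnv()
q = [[0.0, 0.0] for _ in range(ChainEnv.LENGTH)]  # value estimate per (state, action)
alpha, gamma, epsilon = 0.5, 0.9, 0.1

def choose_action(state: int) -> int:
    # Epsilon-greedy: explore occasionally, otherwise exploit (ties broken randomly).
    if random.random() < epsilon or q[state][0] == q[state][1]:
        return random.randrange(2)
    return 0 if q[state][0] > q[state][1] else 1

for episode in range(200):
    state, done = env.reset(), False
    while not done:
        action = choose_action(state)
        next_state, reward, done = env.step(action)
        # Q-learning update: nudge the estimate toward reward plus discounted future value.
        q[state][action] += alpha * (reward + gamma * max(q[next_state]) - q[state][action])
        state = next_state

print(q)  # "move right" should now score higher in every state
```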

Reinforcement learning has several attributes that make it a distinctly different engineering problem than supervised learning. Reinforcement learning relies on simulation and distributed training to rapidly examine how different model parameters could affect the performance of a model in different scenarios.

Ray is an open source project for distributed applications. Although Ray was designed with reinforcement learning in mind, the potential use cases go beyond machine learning, and could be as influential and broadly applicable as distributed systems projects like Apache Spark or Kubernetes. Ray is a project from the Berkeley RISE Lab, the same place that gave rise to Spark, Mesos, and Alluxio.
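
Ray’s core abstraction is small enough to show in a few lines. The sketch below is a hedged illustration, not a benchmark: the decorated function is a stand-in for the kind of simulation rollout a reinforcement learning workload would fan out across a cluster, while ray.init, @ray.remote, and ray.get are Ray’s actual public API.

```python
import ray

ray.init()  # starts a local Ray runtime; connects to a cluster in production

@ray.remote
def run_rollout(seed: int) -> float:
    # Stand-in for one simulation episode; a real RL workload would
    # step an environment here and return the episode's total reward.
    import random
    random.seed(seed)
    return sum(random.random() for _ in range(1000))

# Launch 16 rollouts in parallel; ray.get blocks until all results are ready.
futures = [run_rollout.remote(seed) for seed in range(16)]
rewards = ray.get(futures)
print(sum(rewards) / len(rewards))
```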

The RISE Lab is led by Ion Stoica, a professor of computer science at Berkeley. He is also the co-founder of Anyscale, a company started to commercialize Ray by offering tools and services for enterprises looking to adopt Ray. Ion Stoica returns to the show to discuss reinforcement learning, distributed computing, and the Ray project.

If you enjoy the show, you can find all of our past episodes about machine learning, data, and the RISE Lab by going to SoftwareDaily.com and searching for the technologies or companies you are curious about. And if there is a subject that you want to hear covered, feel free to leave a comment on the episode, or send us a tweet @software_daily.

Flink and BEAM Stream Processing with Maximilian Michels
http://softwareengineeringdaily.com/2020/02/12/flink-and-beam-stream-processing-with-maximilian-michels/
Wed, 12 Feb 2020 10:00:07 +0000

Distributed stream processing systems are used to read large volumes of data and perform operations across those data streams.

These stream processing systems often build on the MapReduce algorithm for collecting and aggregating large volumes of data, but instead of processing a calculation over a single large batch of data, they process data on an ongoing basis. There are many different stream processing systems for this same use case: Storm, Spark, Flink, Heron, and many others.

Why is that? When there seems to be much more consolidation around the Hadoop MapReduce batch processing technology, why are there so many stream processing systems?

One explanation is that aggregating the results of a continuous stream of data is a process that very much depends on time. At any given point in time, you can take a snapshot of the stream of data, and any calculation based on that data is going to be out of date by the time that your calculation is finished. There is a latency between when you start calculating something, and when you finish calculating it.

There are other design decisions for a distributed stream processing system. What data do you keep in memory? What do you keep on disk? How often do you snapshot your data to disk? What is the method for fault tolerance? What are the APIs for consuming and processing this data?
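
One place those decisions become visible is windowing. Here is a small sketch using the Apache Beam Python SDK (the event data is invented): each element carries its own event time, and a fixed 60-second window determines which snapshot of the stream each count belongs to.

```python
import apache_beam as beam
from apache_beam.transforms import window

# Hypothetical (key, event-time-in-seconds) pairs.
events = [("checkout", 5), ("checkout", 42), ("search", 61), ("checkout", 70)]

with beam.Pipeline() as pipeline:
    (
        pipeline
        | beam.Create(events)
        # Attach each element's own event time, rather than its arrival time.
        | beam.Map(lambda kv: window.TimestampedValue(kv[0], kv[1]))
        # Aggregate per 60-second window: the "snapshot" each calculation sees.
        | beam.WindowInto(window.FixedWindows(60))
        | beam.combiners.Count.PerElement()
        | beam.Map(print)
    )
```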

Maximilian Michels has worked on the Apache Flink and Apache BEAM stream processing systems, and currently works on data infrastructure at Lyft. Max joins the show to discuss the tradeoffs of different stream processing systems and his experiences in the world of data processing.

You can find all of our past episodes about data infrastructure by going to SoftwareDaily.com and searching for the technologies or companies mentioned. And if there is a subject that you want to hear covered, feel free to leave a comment on the episode, or send us a tweet @software_daily.

Druid Analytics with Jad Naous
http://softwareengineeringdaily.com/2020/02/11/druid-analytics-with-jad-nauous/
Tue, 11 Feb 2020 10:00:19 +0000

Large companies generate large volumes of data. This data gets dumped into a data lake for long-term storage, then pulled into memory for processing and analysis. Once it is in memory, it is often read into a dashboard, which presents a human with a visualization of the data.

The end-user consuming this data is often a data scientist looking for trends and designing new machine learning models. Another kind of user is the operational analyst, who creates complex queries across this data to find latencies in the infrastructure, or slices and dices clickstream data coming from online advertisements to figure out how to tweak advertising algorithms and spend money more effectively.

For an operational analyst, a key use case for a data warehouse is fast, interactive querying. The operational analyst needs to be able to query the data to quickly create a dashboard, make judgments based on that dashboard, and then change the query slightly to look at a slightly different dashboard.

Druid is a high-performance database that is used for these kinds of queries. Druid is used for ad-hoc queries and operational analytics. Imply Data is a company that builds visualization, monitoring, and security around Druid. Jad Naous is vice president of R&D for Imply, and he joins the show to talk about the use case for Druid, the architecture, and the business model of Imply.
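
For a sense of what that interactive loop looks like in practice, here is a hedged sketch in Python. Druid brokers accept SQL over HTTP at the /druid/v2/sql endpoint; the broker address, datasource, and column names below are hypothetical, while __time is Druid’s built-in timestamp column.

```python
import requests

# Hypothetical broker address and datasource; /druid/v2/sql is Druid's SQL endpoint.
DRUID_SQL = "http://localhost:8082/druid/v2/sql"

query = """
SELECT campaign, COUNT(*) AS clicks
FROM clickstream              -- hypothetical datasource
WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '1' HOUR
GROUP BY campaign
ORDER BY clicks DESC
LIMIT 10
"""

resp = requests.post(DRUID_SQL, json={"query": query})
resp.raise_for_status()
for row in resp.json():  # one JSON object per result row
    print(row)
```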

If you enjoy the show, you can find all of our past episodes about data infrastructure by going to SoftwareDaily.com and searching for the technologies or companies mentioned. And if there is a subject that you want to hear covered, feel free to leave a comment on the episode, or send us a tweet @software_daily.

The Data Exchange with Ben Lorica
http://softwareengineeringdaily.com/2020/02/10/the-data-exchange-with-ben-lorica/
Mon, 10 Feb 2020 10:00:29 +0000

Data infrastructure has been transformed over the last fifteen years.

The open source Hadoop project led to the creation of multiple companies based around commercializing the MapReduce algorithm and Hadoop distributed file system. Cheap cloud storage popularized the usage of data lakes. Cheap cloud servers led to wide experimentation for data tools. Apache Spark emerged from academia, and Apache Kafka came out of the corporate challenges faced by LinkedIn.

Over these 15 years, Ben Lorica has been following the world of data engineering as an engineer, a conference organizer, and a podcaster. When he was host of the O’Reilly Data Show, his material served as inspiration for some of the episodes of this podcast. Today he hosts The Data Exchange podcast and writes The Data Exchange newsletter. Ben joins the show to talk about modern data engineering, and his opinion on the past and future of data infrastructure.

If you enjoy the show, you can find all of our past episodes about data infrastructure by going to SoftwareDaily.com and searching for the technologies or companies mentioned. And if there is a subject that you want to hear covered, feel free to leave a comment on the episode, or send us a tweet @software_daily.

Presto with Justin Borgman
http://softwareengineeringdaily.com/2020/02/07/presto-with-justin-borgman/
Fri, 07 Feb 2020 10:00:21 +0000

A data platform contains all of the data that a company has accumulated over the years. Across a data platform, there is a multitude of data sources: databases, a data lake, data warehouses, a distributed queue like Kafka, and external data sources like Salesforce and Zendesk.

A user of the data platform often has a question that requires multiple data sources to answer. How does this user join two data sources from a data lake? How does this user join data across a transactional database and a data lake? How does the user join data from two different data warehouse technologies?

Presto is an open source tool originally developed at Facebook. Presto allows a user to query a data platform with a SQL statement. That query gets parsed and executed across the data platform to read from any heterogeneous data source. For some use cases, Presto is replacing the Hadoop MapReduce-based technology Hive. For other use cases, Presto is solving a problem in a completely novel way.
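
From the client side, a federated query can look like the sketch below, which uses the presto-python-client package. The coordinator address and the two catalogs being joined are hypothetical, but the catalog.schema.table addressing is how Presto reaches heterogeneous sources from a single SQL statement.

```python
import prestodb

# Connect to a (hypothetical) Presto coordinator; the default catalog and
# schema only matter for unqualified table names.
conn = prestodb.dbapi.connect(
    host="presto-coordinator.internal",
    port=8080,
    user="analyst",
    catalog="hive",
    schema="default",
)
cur = conn.cursor()

# One SQL statement spanning two backends: a Hive data lake table and a
# Postgres table, each addressed as catalog.schema.table.
cur.execute("""
    SELECT u.plan, COUNT(*) AS views
    FROM hive.web.page_views v
    JOIN postgresql.crm.users u ON v.user_id = u.id
    GROUP BY u.plan
""")
for row in cur.fetchall():
    print(row)
```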

Justin Borgman joins the show to discuss the motivation for Presto, the problems it solves, and the architecture of Presto. He also talks about the company he started, Starburst Data, which sells and supports technologies built around Presto.

If you enjoy the show, you can find all of our past episodes about data infrastructure by going to SoftwareDaily.com and searching for the technologies or companies mentioned. And if there is a subject that you want to hear covered, feel free to leave a comment on the episode, or send us a tweet @software_daily.

Nubank Data Engineering with Sujith Nair
http://softwareengineeringdaily.com/2020/02/06/nubank-data-engineering-with-sujith-nair/
Thu, 06 Feb 2020 10:00:03 +0000

Nubank is a popular bank based in Brazil. Nubank has more than 20 million customers, and has accumulated a high volume of data over the six years since it was started. Mobile computing and cloud computing have given rise to “challenger banks” that operate more like software companies.

A data platform is a collection of different technologies that move data into different storage formats and applications, so that different members of an organization can access that data. New data often enters an organization through an OLTP database, which supports user transactions. That data is copied into a data lake, which provides cheap bulk storage. From the data lake, the data is moved into a data warehouse system for fast access. Along the way, tools like Kafka, Spark, and S3 are used to implement the needs of the data platform.
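
As a hedged sketch of one hop in that pipeline, the PySpark structured-streaming job below reads new transaction events from a Kafka topic and lands them in an S3-backed data lake as Parquet. The broker, topic, and bucket names are hypothetical, and the job assumes the spark-sql-kafka connector is on the classpath.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("transactions-to-lake").getOrCreate()

# Read new transaction events from a Kafka topic (names are hypothetical).
transactions = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "transactions")
    .load()
)

# Land the raw events in the data lake as Parquet on S3 for cheap bulk storage.
query = (
    transactions.selectExpr("CAST(value AS STRING) AS payload", "timestamp")
    .writeStream
    .format("parquet")
    .option("path", "s3a://data-lake/raw/transactions/")
    .option("checkpointLocation", "s3a://data-lake/checkpoints/transactions/")
    .start()
)
query.awaitTermination()
```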

Data platform architecture is not an exact science. Different companies build their data platform based on their own unique requirements. Previous shows have covered the data infrastructure of companies like Lyft, Uber, and Facebook. Today’s show is another case study in data infrastructure, with a modern bank.

In a previous episode, we covered the engineering of Nubank. Sujith Nair from Nubank joins today’s show to talk about the data infrastructure of the company.

If you enjoy the show, you can find all of our past episodes about data infrastructure by going to SoftwareDaily.com and searching for the technologies or companies mentioned. And if there is a subject that you want to hear covered, feel free to leave a comment on the episode, or send us a tweet @software_daily.

Changelog Podcasting with Adam Stacoviak and Jerod Santo
http://softwareengineeringdaily.com/2020/02/05/changelog-podcasting-with-adam-stacoviak-and-jerod-santo/
Wed, 05 Feb 2020 10:00:05 +0000

The Changelog is a podcast about the world of open source. As open source has become closely tied with the entire software development lifecycle, The Changelog has expanded its coverage to the broader software industry.

Since starting the podcast ten years ago, Adam Stacoviak and Jerod Santo have become full-time podcasters, and they have started several other podcasts within the Changelog network, including Go Time, JS Party, and Practical AI. Throughout all of their shows, there is a consistent theme of technical, entertaining conversations about software.

In the last decade, so much has changed within open source: GitHub became the de facto social network for open source; Kubernetes created a widely used platform for distributed systems; React has given frontend developers a component system to consolidate around. Adam and Jerod return to the show to discuss their perspective on the past and future of open source, and their learnings from interviewing influential software professionals for 10 years.

Rive: Animation Tooling with Guido and Luigi Rosso
http://softwareengineeringdaily.com/2020/02/04/rive-animation-tooling-with-guido-and-luigi-rosso/
Tue, 04 Feb 2020 10:00:42 +0000

Animations can be used to create games, app tutorials, and user interface components. Animations can be seen in messaging apps, where animated reactions can convey rich feelings over a text interface. Loading screens can become less boring through animation, and voice assistant products can feel more alive through animation.

But we still don’t see much animation in our everyday applications. This is partly because animation tooling is difficult to use. To make an animation, the typical workflow is to go into a tool like After Effects, render your animation, and then export that animation in a movie format. This format is not dynamic enough to be easily used across the wide variety of development platforms.

The animation library Lottie improved this tooling by creating a system for exporting animations to JSON and allowing them to easily scale up and down as vectors. But the animations were still simple and unidirectional, and the developer had little freedom to move an animation in response to user input.

Rive is a system for creating dynamic, movable animated objects. Rive allows for the creation of animated elements that respond to user input. Rive has a tool that runs in the browser and allows the user to define the animation.

The animations in Rive use a bone system that allows animators and designers to define the points of the animated sprite that the developer can then manipulate with code. This improves the painful handoff process that exists between animators and developers, and gives the developer some programmatic control.
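
Rive’s runtime is its own, but the core idea of a bone system can be sketched generically: a point on a sprite is positioned by composing the transforms of a chain of bones, so code can pose an entire character by changing a few angles. A minimal two-bone example in Python, with invented lengths and angles:

```python
import math

def bone_chain_endpoint(lengths, angles):
    """Forward kinematics for a 2D bone chain.

    Each bone rotates relative to its parent; the endpoint is where a
    sprite vertex bound to the last bone would be drawn.
    """
    x, y, heading = 0.0, 0.0, 0.0
    for length, angle in zip(lengths, angles):
        heading += angle  # child angles are relative to the parent bone
        x += length * math.cos(heading)
        y += length * math.sin(heading)
    return x, y

# An "arm" with two bones; a developer animates it by driving the angles
# from user input rather than baking frames into a video file.
upper_arm, forearm = 40.0, 30.0
print(bone_chain_endpoint([upper_arm, forearm], [math.radians(45), math.radians(-30)]))
```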

Guido and Luigi Rosso are the founders of Rive, and they join the show to talk about the frictions of animation tooling and what they have built to improve it.
