ReadWrite - Typesafe
http://readwrite.com/tag/Typesafe
Copyright 2015 Wearable World Inc. | Tue, 03 Mar 2015 12:21:16 -0800

The Big-Data Tool Spark May Be Hotter Than Hadoop, But It Still Has Issues

<!-- tml-version="2" --><p>Hadoop is hot. But its kissing cousin Spark is even hotter.</p><p>Indeed, Spark is hot like Apache Hadoop was half a decade ago. Spawned at UC Berkeley’s AMPLab, Spark is a fast data processing engine that works in the Hadoop ecosystem, replacing MapReduce. It is designed to perform both batch processing (similar to MapReduce) and new workloads like streaming, interactive queries, and iterative algorithms, like those commonly found in machine learning and graph processing.</p><p>San Francisco-based Typesafe, sponsors of a<a href="http://readwrite.com/2014/10/20/java-8-adoption-apache-spark-internet-of-things"> popular survey on Java developers I wrote about last year</a> and the commercial backers of Scala, Play Framework, and Akka, recently conducted a <a href="http://info.typesafe.com/COLL-20XX-Spark-Survey-Report_LP.html?lst=RW&amp;lsd=COLL-20XX-Spark-Survey-Trends-Adoption-Report">survey of developers about Spark</a>. More than 2,000 (2,136 to be exact) developers responded. Of the findings, three conclusions jump out:</p><ol><li><strong>Spark awareness and adoption are seeing hockey-stick-like growth.</strong> Google Trends <a href="http://www.google.com/trends/explore#q=apache%20spark&amp;cmpt=q&amp;tz=">confirms</a> this. 
The survey shows that 71% of respondents have at least evaluation or research experience with Spark, and 35% are now using it or plan to use it.</li><li><strong>Faster data processing and event streaming are the focus for enterprises.</strong> By far the most desirable features are Spark's vastly improved processing performance over MapReduce (over 78% mention this) and the ability to process event streams (over 66% mention this), which MapReduce cannot do.</li><li><strong>Perceived barriers to adoption are not major blockers.</strong> When asked what's holding them back from the Spark revolution, respondents mentioned their own lack of experience with Spark and the need for more detailed documentation, especially for more advanced application scenarios and performance tuning. They mentioned perceived immaturity, in general, and also integration with other middleware, like message queues and databases. Lack of commercial support, which is still spotty even by the Hadoop vendors, was also a concern. Finally, some respondents mentioned that their organizations aren't in need of big data solutions at this time.</li></ol><p>I spoke to Typesafe’s architect for Big Data Products and Services, Dean Wampler (<a href="https://twitter.com/deanwampler">@deanwampler</a>), on his thoughts about the rise of Spark. 
Wampler <a href="http://www.infoq.com/presentations/spark-scala-mapreduce-java">recently recorded a talk</a> on why he thinks Spark/Scala are rapidly replacing MapReduce/Java as the most popular Big Data compute engine in the enterprise.</p><h2>Striking The Spark</h2><div tml-image="ci01c56a771001efe2" tml-image-caption="Dean Wampler" tml-render-layout="right"><figure><img src="http://a2.files.readwrite.com/image/upload/c_fill,cs_srgb,dpr_1.0,q_80,w_620/MTI3NjI1MjE5NDg4NjYzNTYy.jpg" /><figcaption>Dean Wampler</figcaption></figure></div><p><strong>ReadWrite</strong>:&nbsp;<em>For those venturing into Spark, what are the most common hurdles?</em></p><p><strong>Wampler</strong>:&nbsp;It’s mostly around things like acquiring expertise and having good documentation with deep, non-trivial examples. Many people aren’t sure how to manage, monitor, and tune their jobs and clusters. Commercial support for Spark is still limited, especially for non-YARN deployments. However, even among the Hadoop vendors, support is still spotty.&nbsp;</p><p>Spark still needs to mature in many ways, especially the newer modules, such as Spark SQL and Spark Streaming. Older tools, like Hadoop and MapReduce, have had a longer runway and hence more time to be hardened and expertise to be documented. All these issues are being addressed and they should be resolved relatively soon.</p><p><strong>RW</strong>:&nbsp;<em>I hear people ask "where are you running Spark?" all the time, suggesting a pretty broad range of resource management strategies, e.g., standalone clusters, YARN, Mesos. Do you believe the industry will tend to run Big Data clusters in isolation, or do you see the industry eventually moving to running Big Data clusters alongside other applications in production?&nbsp;</em></p><p><strong>DW</strong>: I think most organizations will still use fewer, larger clusters, just so their operations teams have fewer clusters to watch. 
Mesos and YARN really make this approach attractive. Conversely, Spark makes it easier to set up small, dedicated clusters for specific problems. Say you’re ingesting the Twitter firehose. You might want a dedicated cluster tuned optimally for that streaming challenge. Maybe it forwards “curated” data to another cluster, say a big one used for data warehousing.</p><h2>Keeping The Spark Alive</h2><p><strong>RW</strong>:&nbsp;<em>Is the operations side of Spark different from the operations side of MapReduce?</em></p><p><strong>DW</strong>:&nbsp;For batch jobs, it’s about the same. Streaming jobs, however, raise new challenges.&nbsp;</p><p>For a typical batch job, whether it’s written in Spark or MapReduce, you submit a job to run, it gets its resources from YARN or Mesos, and once it finishes, the resources are released. However, in Spark streaming, the jobs run continuously, so you might need more robust recovery if the job dies, to ensure stream data isn’t lost.&nbsp;</p><p>Another problem is resource allocation. For a batch job, it’s probably okay to give it a set of resources and have those resources locked up for the job’s life cycle. (Note, however, some dynamic management is already done by YARN and Mesos.) Long-running jobs really need more dynamic resource management, so you don’t have idle resources during relatively quiescent periods, or overwhelmed resources during peak times.&nbsp;</p><p>Hence, you really want the ability to grow and shrink resource allocations, where scaling up and down is automated. This is not a trivial problem to solve and you can’t rely on human intervention either.</p><p><strong>RW</strong>: <em>Let’s talk about the Scala / Spark connection. Does Spark require knowledge of Scala? Are most people using Spark also well versed in Scala? 
And is it more the case that Scala users are those who tend to favor Spark, or is Spark creating a “pull” effect into Scala?</em></p><p><strong>DW</strong>: Spark is written in Scala and it is pulling people towards Scala. Typically they’re coming from a Big Data ecosystem already, and they are used to working with Java, if they are developers, or languages like Python and R, if they are data scientists.&nbsp;</p><p>Fortunately for everyone, Spark supports several languages - Scala, Java, and Python, with support for R coming. So people don’t necessarily have to switch to Scala.&nbsp;</p><p>There has been a lag in the API coverage for the other languages, but the Spark team has almost closed the gap. The rule of thumb is that you’ll get the best runtime performance if you use Scala or Java, and you’ll get the most concise code if you use Scala or Python. So, Spark is actually drawing people to Scala, but it doesn’t require you to be a Scala expert.&nbsp;</p><p>I like the fact that Spark uses the more mainstream features of Scala. It doesn’t require mastery of more advanced constructs.</p><p><em>Photo courtesy of <a href="http://www.shutterstock.com">Shutterstock</a></em></p>It's the cool kid these days, but it's flunking some subjects.
http://readwrite.com/2015/01/27/spark-scala-hadoop-typesafe-dean-wampler
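The conciseness Wampler credits to Scala and Python is easy to see in the canonical word count. The sketch below uses plain Scala collections, whose map/filter/group combinators Spark's RDD API deliberately mirrors, so it runs without a cluster; on actual Spark the source would come from `sc.textFile(...)` and the groupBy/map pair would typically be a single `reduceByKey`. The data is made up for illustration.

```scala
// Word count in the combinator style Spark borrows from Scala collections.
// On a cluster, `lines` would be an RDD built from sc.textFile("hdfs://...")
// and the groupBy/map pair below would be a single distributed reduceByKey.
val lines = List("spark is hot", "hadoop is hot", "spark is fast")

val counts: Map[String, Int] =
  lines
    .flatMap(_.split("\\s+"))                   // line -> words
    .groupBy(identity)                          // word -> all its occurrences
    .map { case (word, ws) => (word, ws.size) } // word -> count
```

The equivalent in classic MapReduce-era Java is a mapper class, a reducer class, and a driver, which is much of why respondents in the survey above cited concise code as a draw.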
Work | Tue, 27 Jan 2015 07:00:00 -0800 | Matt Asay

How Retailers Are Going "Reactive" To Keep Their Online Operations Humming

<!-- tml-version="2" --><p>Every year, Black Friday is used as a barometer for the overall health of retail and e-commerce. One of the more interesting insights from this year’s Black Friday came courtesy of Web performance company <a href="http://www.catchpoint.com/">Catchpoint System’s report</a> that e-commerce websites had sharply degraded in performance between 2013 and 2014, with desktop-centric e-commerce pages stumbling to a 20% performance hit while mobile pages slowed by 57%.&nbsp;</p><p>Not exactly the best way to set the cash register on fire, at least not in a good way.</p><p>The problem, it turns out, is scale. Or, rather, online retailers' ability to scale.&nbsp;To get some insight into modern scale challenges for retail as we close out the holiday season, I spoke with Kathryn Murphy, senior vice president of Apps and Platform at Tomax, a Salt Lake City retail software company that specializes in brick-and-mortar retailers (e.g., L.L. Bean, Swarovski and Sportsman’s Warehouse), but also provides a suite of cloud-based retail solutions that support more than 25,000 customer store locations.</p><blockquote><p><strong>See also: <a href="http://readwrite.com/2014/09/19/reactive-programming-jonas-boner-typesafe">As Systems Get More Complex, Programming Is Getting "Reactive"</a></strong></p></blockquote><p>Tomax is on the front lines with retailers, and has seen an increasingly "<a href="http://readwrite.com/2014/09/19/reactive-programming-jonas-boner-typesafe">reactive</a>" approach to programming pay dividends as they attempt to scale.</p><p><strong>ReadWrite</strong>:&nbsp;<em>What are the new pressures that you see in retail? 
What could explain the struggle of retailers to keep up with the demands of scale?</em></p><p><strong>Murphy</strong>: The biggest pressure in retail is the shift in power to the consumer. Several years ago, consumers got pretty comfortable shopping online and on their mobile devices with online retail giants like Amazon. This quickly became the new expectation from the consumer.&nbsp;</p><p>Consumers don't care how hard it may be to deliver that experience. They expect a big box retailer like Best Buy to have connected, real-time systems. The consumer expects to be able to order online, see the inventory in a local store, and pick it up later today. They expect the associates to have mobile devices and access to all the same information. All of these new devices, new endpoints, new interactions from the consumer and the associates have made the retail systems world unpredictable. </p><p><strong>RW</strong>: <em>How much of the year do retail IT pros spend worrying about and preparing for the holiday surges?</em></p><p><strong>KM</strong>: For the highly seasonal retailers, it’s everything. As evidence, they all implement “lock down” periods in the fourth quarter. Everything must be done and ready for the holiday season by September.&nbsp;</p><p>Additionally, they all purchase infrastructure based on “peak season”—this is a phenomenon unto itself. So many retailers are dealing with orders of magnitude differences in system load during the holiday. Honestly, it’s a real opportunity for elastic systems that can scale when the retailer needs it - and pay only for use—instead of making enormous upfront capital expenditures and paying for peak load infrastructure all year just to handle the holiday season traffic.</p><p><strong>RW</strong>:<em> How has the proliferation of devices</em>—<em>in the hands of customers as well as employees</em>—<em>affected the retail landscape?</em></p><p><strong>KM</strong>: It has two major effects. 
The employee devices have created hundreds more endpoints in a retail environment. No longer is it a fixed number of registers and a few scanning devices in the backroom. Everyone has a device, and they are doing everything from helping customers and ringing up transactions to performing store inventory.&nbsp;</p><p>And these devices are not hard-wired into the environment; they are constantly losing connectivity, being replaced, etc. This has the effect of creating unpredictability and instability.&nbsp;</p><p>Then add the consumer and their new empowerment to perform transactions themselves, and now we have that many more devices, and many different types of devices, in the mix. And since it is the consumer’s device, they are free to interact with the retailer system at any time. This element alone puts pressure on typical retailer IT practices like “nightly system maintenance” windows.</p><p><strong>RW</strong>: <em>Why is resilience so important in retail?</em></p><p><strong>KM</strong>: All of this technology is highly dependent on networks and system availability, both of which fail.&nbsp;</p><p>And yet, a single failure for either a consumer or associate can be so impactful that it can result in the loss of a customer for the retailer. It’s critical to have “smart apps” that can gracefully recover and transition when failures occur without the user noticing or having to do anything special.</p><p><strong>RW</strong>: <em>In a <a href="https://www.youtube.com/watch?v=oXywS0gPN84">recent video</a> Tomax talked a lot about concurrency being a major challenge for retailers. What does concurrency mean in your retail stores? How does <a href="http://readwrite.com/2014/09/19/reactive-programming-jonas-boner-typesafe">reactive programming</a> help?</em></p><p><strong>KM</strong>: In retail, we see some unique concurrency challenges with the proliferation of devices. 
For example, when a big shipment comes in (and in the holiday season, these can be really big!), it seems obvious that you would want as many employees as possible scanning the merchandise in and preparing it for the retail floor.&nbsp;</p><p>Concurrency is immediately a problem since more than one employee is scanning merchandise, and sometimes even the same item, at the same time. The system needs to inform <strong>both</strong>&nbsp;employees, in real time, about the other person’s updates.&nbsp;</p><p>Traditional systems want to “lock” a record while one person works on it and make the other person wait. Reactive systems take a different approach and leverage work queues and threads to keep communication flowing freely back and forth.&nbsp;</p><p>We've had great success using&nbsp;<a href="http://typesafe.com/">Typesafe</a>'s&nbsp;reactive stack, as it frees retailers from having to train their users to use their systems in a certain way. Instead, this reactive approach makes it easier to build intuitive applications that respond to the technical environment around them, rather than forcing users into the system's rigid way of doing things.</p><p><em>Lead image courtesy of <a href="http://www.shutterstock.com">Shutterstock</a></em></p>New challenges arise as power shifts to mobile-shopping consumers.
http://readwrite.com/2014/12/24/online-retail-ecommerce-scale-reactive-tomax
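The contrast Murphy draws, record locks versus work queues, can be sketched in a few lines. Below, two associates' scans of the same SKU land on a shared queue and a single consumer folds them into the inventory, so neither scanner ever waits on a lock. All names are illustrative; this is a conceptual sketch, not Tomax's actual stack.

```scala
import java.util.concurrent.LinkedBlockingQueue

// Each scan is an immutable message; no one locks the inventory record.
final case class Scan(associate: String, sku: String, qty: Int)

val queue = new LinkedBlockingQueue[Scan]()

// Two associates scan concurrently, sometimes the same item.
List(
  Scan("alice", "sku-1", 2),
  Scan("bob",   "sku-1", 3),  // same SKU as alice: no lock, no waiting
  Scan("bob",   "sku-2", 1)
).foreach(queue.put)

// One consumer drains the queue and applies updates in arrival order, so no
// record is ever contended; in a reactive system it would also push the new
// totals back out to both associates' devices.
var inventory = Map.empty[String, Int]
while (!queue.isEmpty) {
  val s = queue.take()
  inventory = inventory.updated(s.sku, inventory.getOrElse(s.sku, 0) + s.qty)
}
```

The design choice is the one Murphy describes: contention is handled by serializing messages through a queue rather than by blocking writers on a per-record lock.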
Web | Wed, 24 Dec 2014 07:04:13 -0800 | Matt Asay

As Systems Get More Complex, Programming Is Getting "Reactive"

<!-- tml-version="2" --><p>Hardware keeps getting smaller, more powerful and more distributed. To keep up with growing system complexity, there's a growing software revolution—called “reactive” development—that defines how to architect applications that are going to participate in this new world of multicore, cloud, mobile and Web-scale systems. </p><div tml-image="ci01b7298abdb7860e" tml-image-caption="Jonas Bonér" tml-render-size="small" tml-render-position="right"><figure><img src="http://a3.files.readwrite.com/image/upload/c_fill,cs_srgb,dpr_1.0,q_80,w_620/MTE5NDg0MDYzNzg1NDUzMDcx.jpg" /><figcaption>Jonas Bonér</figcaption></figure></div><p>One of the leaders of the reactive-software movement is distributed computing expert and <a href="http://typesafe.com/">Typesafe</a>&nbsp;co-founder and CTO Jonas Bonér, who published the original Reactive Manifesto in September 2013.&nbsp;</p><p>Similar to the early days of the "agile" software development movement, reactive programming got early traction with a hardcore fan base (mostly functional programming, distributed computing and performance experts) but is starting to creep into more mainstream development conversations as high-profile organizations like <a href="http://www.infoq.com/interviews/jafar-husain-netflix-reactive-programming-rx">Netflix</a> adopt and evangelize the reactive model. </p><blockquote><p><strong>See also: <a href="http://readwrite.com/2014/09/17/netflix-chaos-engineering-for-everyone">Netflix's Chaos Engineering Should Be Mandatory—Everywhere</a></strong></p></blockquote><p>I caught up with Bonér to ask him about reactive's traction on the eve of publishing version 2.0 of the <a href="http://www.reactivemanifesto.org/">Reactive Manifesto</a>. 
Beware: This stuff gets deep very quickly.</p><h2>A Reactive Solution To Broken Development</h2><p><em><strong>ReadWrite</strong>: </em><em>So what’s </em>not<em>&nbsp;reactive about software today, and what needs to change? </em></p><p><strong>Jonas Bonér</strong>: Basically what’s “broken” ties back to software having synchronous call request chains and poor isolation, yielding single points of failure and too much contention. The problem exists in different parts of the application infrastructure.</p><p>At the database layer, most SQL/RDBMS databases still rely on a thread pool or connection pool accessing the database through blocking APIs. So if you exhaust the thread pool by blocking all available threads then everything stops. This problem goes all the way down to the native drivers that the vendors provide, and the JDBC standard specification (for accessing relational databases)—which doesn’t support non-blocking/asynchronous access.&nbsp;</p><p>It will take years before it’s supported. </p><p>In the service layer, we usually see a tangled mix of highly contended, shared mutable state managed by strongly coupled deep request chains. This makes this layer immensely hard to scale and to make resilient. The problem is usually “addressed” by adding more tools and infrastructure; clustering products, data grids, etc. But unfortunately this won’t help much at all unless we address the fundamental underlying problem.&nbsp;</p><p>This is where reactive can help; good solid principles and practices can make all the difference—in particular relying on share nothing designs and asynchronous message passing. 
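</p><p>One concrete payoff of the share-nothing, asynchronous style Bonér describes shows up when independent service calls are started together instead of chained serially: response time is bounded by the slowest call rather than the sum, which is also his web-layer point that follows. A minimal sketch with Scala's standard futures; the three "service calls" here are made-up stubs, not any real API:</p>

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

// Three independent service calls; each runs off the caller's thread
// and shares no mutable state with the others.
def fetchUser(): Future[String]    = Future { Thread.sleep(50); "user-42" }
def fetchOrders(): Future[Int]     = Future { Thread.sleep(50); 3 }
def fetchOffers(): Future[Boolean] = Future { Thread.sleep(50); true }

// Kick all three off first, THEN combine. Had we inlined the calls in the
// for-comprehension, they would run serially (sum of the latencies);
// started up front, they run concurrently (roughly the max).
val users  = fetchUser()
val orders = fetchOrders()
val offers = fetchOffers()

val page: Future[String] =
  for { u <- users; o <- orders; f <- offers } yield s"$u has $o orders, offers=$f"

val rendered = Await.result(page, 2.seconds)
```

<p>If the underlying pool has only one available thread, the calls can still serialize, which is the related point above: blocking APIs that exhaust the thread pool defeat this kind of composition.</p><p>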
</p><blockquote><p><strong>See also: <a href="http://readwrite.com/2014/07/10/akka-jonas-boner-concurrency-distributed-computing-internet-of-things">How One Developer Set Out To Make The Internet Of Things Manageable</a></strong></p></blockquote><p>In the Web layer, we often see request chains executed in a completely serial fashion, meaning that the response time is the sum of the time it takes to do everything, which can sometimes be hundreds of different service calls. This means that keeping latency under control—bounded, within the SLAs and predictable—while allowing for scale, is both technically very challenging and requires a lot more hardware thanks to inefficient usage of resources.&nbsp;</p><p>In a reactive application you would split up the work in many small composable chunks and run them in parallel—which will bound the latency to a max of the longest performing chunk and make very efficient use of the resources available. </p><p>There are many other ways that today’s software is not reactive, but those are a few of the big ones.</p><h2>Defining Reactive</h2><p><em><strong>ReadWrite</strong>: </em><em>What’s the goal of the reactive movement? What are you trying to accomplish?</em></p><p><strong>JB</strong>: A lot of companies have been doing reactive without calling it "reactive" for quite some time, in the same way companies did agile software development before it was called "agile." But giving an idea a name and defining a vocabulary around it makes it easier to talk about and communicate with people. It makes it easier to explain and bring to market a set of principles that are known to work well together. 
</p><p>Not everyone’s view of agile fits into the agile definition, but what has become agile—and the reason for all these experts to write up the <a href="http://agilemanifesto.org/">Agile Manifesto</a>—is that they knew which principles worked well together and complemented each other in a cohesive story.&nbsp;</p><p>This is what reactive is all about. We found these core principles to work well together in a cohesive story. People have used these approaches for years, but this grouping and this reactive story has meaning, in the same sense as agile. And it provides a baseline for solving problems against the wish-list of application behavior that everyone wants. </p><h2>The Future Of Reactive Programming</h2><p><em><strong>ReadWrite</strong>: </em><em>What are the next steps for the reactive movement? </em></p><p><strong>JB</strong>: The reactive principles trace all the way back to the 1970s&nbsp;(e.g., <a href="https://en.wikipedia.org/wiki/Tandem_Computers">Tandem Computers</a>) and 1980s (e.g., <a href="https://en.wikipedia.org/wiki/Erlang_(programming_language)">Erlang</a>), but scale challenges are for everybody today. You don’t have to be Facebook or Google anymore to have these types of problems. There’s more data being produced by individual users, who consume more data, and expect so much more, faster. There’s more data to shuffle around at the service layer; replication that needs to be done instantaneously, and the need to go to multiple nodes almost instantaneously.&nbsp;</p><p>And the opportunities have changed, where virtualization and containerization make it easy to spin up nodes at almost no cost—but where it’s much harder for the software to keep up with those nodes in an efficient way.</p><p>So in many ways the next steps for reactive are to just keep refining its view of that problem set—and the application characteristics that developers should aspire to—to conquer them. 
Martin Thompson, Roland Kuhn, Dave Farley and I have in fact just rewritten the <a href="http://www.reactivemanifesto.org">Reactive Manifesto</a>, taking a lot of great feedback we have been getting from the community into account, distilling it into a much shorter and simpler document. </p><p>But the big next step for Reactive is expanding beyond principles, to also bring in more specific tools, techniques, patterns and best practices, thereby making it approachable for the masses.&nbsp;</p><p>We are planning to write an appendix to the Reactive Manifesto in which we can dive in and provide more hands-on, practical advice on how to design and implement reactive systems. We’re also starting to see more vendors provide solutions and tools that support building reactive systems, more technical books being published about reactive, and more presentations (and even full tracks) at events tied to reactive—so this is already happening.&nbsp;</p><p>One good example of this is the <a href="http://reactconf.com">React</a> conference that I’m helping to organize, which will be a great place to learn how to build reactive systems from some of our top thought leaders in the industry and discuss its principles and practices.</p><p><em><strong>ReadWrite</strong>: </em><em>What are the green field areas that you think will drive the requirements for reactive in the future?</em></p><p><strong>JB</strong>: There are several:</p><ol><li>One interesting area that has a lot of debate is around “microservices”—which basically is a conversation around what the smallest ideal isolation of a single “service” and its behavior looks like.&nbsp;</li><li>Another is the emerging need to stream large volumes of potentially infinite data streams in real-time, while keeping latency predictable and without overloading the server.&nbsp;</li><li>This ties in to the rising need for reactive Big Data solutions (sometimes called “fast data”)—providing (close to) real-time analytics and data 
processing.&nbsp;</li><li>Internet of Things is another huge driver for new approaches to application infrastructure, where machines and devices are generating new challenges for managing and replicating bursts of data throughout distributed environments, and where individual nodes have new requirements for starting/stopping/dealing with failure based on events.</li></ol><p><em>Lead photo by <a href="https://www.flickr.com/photos/dominik99/384027019">nerovivo</a></em></p>A new way to develop for the cloud.
http://readwrite.com/2014/09/19/reactive-programming-jonas-boner-typesafe
Hack | Fri, 19 Sep 2014 05:00:00 -0700 | Matt Asay

How One Developer Set Out To Make The Internet Of Things Manageable

<!-- tml-version="2" --><p>Six years ago, Swedish programmer <a href="https://twitter.com/jboner">Jonas Bonér</a>&nbsp;set about trying to crack some of the most challenging problems in distributed computing. These included scalability, so that a system as large as the Internet of Things won't fail no matter how large it gets; elasticity, a way of making sure that its computing problems are matched with the right hardware and software at the right time; and fault-tolerance. And he wanted to make sure his system would work in a "concurrent" world in which zillions of calculations are happening at once—and often interacting with one another.</p><div tml-image="ci01a87e1ee248860f" tml-image-caption="" tml-render-position="right" tml-render-size="medium"><figure><img src="http://a5.files.readwrite.com/image/upload/c_fill,cs_srgb,w_620/MTIxNDI3Mjk0MjA1MjE2MjY5.png" /><figcaption></figcaption></figure></div><p>He may or may not have been listening to ABBA while doing so.</p><p>Bonér had built compilers, runtimes and open-source frameworks for distributed applications at vendors like BEA and Terracotta. He’d experienced the scale and resilience limitations of existing technologies—CORBA, RPC, XA, EJBs, SOA, and the various Web Services standards and abstraction techniques that Java developers have used to deal with these problems over the last 20 years.</p><p>He’d lost faith in those ways of doing things.</p><p>This time he looked outside of Java and classical enterprise computing for answers. He spent some time with concurrency-oriented programming languages like&nbsp;<a href="https://mozart.github.io/">Oz</a> and <a href="http://www.erlang.org/">Erlang</a>. 
Bonér liked how Erlang managed failure for services that simply could not go down—i.e., things like telecom switches for emergency calls—and how principles from Erlang and Oz might also be helpful in solving concurrency and distributed computing problems for mainstream enterprises.</p><p>In particular he saw a software concept called the&nbsp;<a href="http://en.wikipedia.org/wiki/Actor_model">actor model</a>—which emphasizes loose coupling and embracing failure in software systems and dataflow concurrency—as a bridge to the future.</p><p tml-render-size="medium" tml-render-position="right"><strong>See also: <a href="http://readwrite.com/2013/06/14/whats-holding-up-the-internet-of-things">What's Holding Up The Internet Of Things</a></strong></p><p>After about three to four months of intense thinking and hacking, Bonér shared his vision for the <a href="http://permalink.gmane.org/gmane.comp.lang.scala/16486">Akka Actor Kernel</a> (now simply “Akka”) on the Scala mailing list, and about a month later shared the first public release of <a href="https://github.com/akka/akka/tree/v0.5">Akka 0.5</a> on GitHub.</p><p>Today <a href="http://akka.io">Akka</a>—celebrating the five year anniversary for its first public release on July 12—is the open source middleware that major financial institutions use to handle billions of transactions, and that massively trafficked sites like Walmart and <a href="http://readwrite.com/2014/05/08/gilt-eric-bowman-interview-scala-rails-jvm-reactive-platform">Gilt use to scale their services for peak usage</a>.</p><p>I recently caught up with Bonér—now CTO and co-founder of <a href="http://typesafe.com">Typesafe</a>—to get his take on where Akka has seen traction, how it has evolved through the years and why its community views it as the open-source platform best poised to handle the back-end challenges of the Internet of Things, which is introducing a new order of complexity for distributed applications.</p><h2>How To Manage Failure When 
Everything Happens At Once</h2><div tml-image="ci01a87e1ee244860f" tml-image-caption="Jonas Bonér" tml-render-size="small" tml-render-position="right"><figure><img src="http://a5.files.readwrite.com/image/upload/c_fill,cs_srgb,dpr_1.0,q_80,w_620/MTE5NDg0MDYxMjM1NDQ3MzEx.jpg" /><figcaption>Jonas Bonér</figcaption></figure></div><p><strong>ReadWrite: </strong><em> What is the problem set that initially leads developers to Akka?</em></p><p><strong>Jonas Bonér:</strong> Akka abstracts concurrency, elasticity/scale-on-demand and resilience into a single unified programming model, by embracing share-nothing design and asynchronous message passing. This gives developers one thing to learn, use and maintain regardless of deployment model and runtime topology.</p><p>The typical problem set is people want the ability to scale applications both up and out; i.e., utilize both multicore and cloud computing architectures. The way I see it, these scenarios are essentially the same thing: it is all scale-out. Either you scale-out on multicore, where you have multiple isolated CPUs communicating over a QPI link, or you scale-out on multiple nodes, where you have multiple isolated nodes communicating over the network.&nbsp;</p><p>Understanding and accepting this fact by embracing share-nothing and message-driven architectures makes things so much simpler.</p><p>The other main reason people turn towards Akka is that managing failure in an application is really hard. Unfortunately, to a large extent, failure management is something that historically has been either ignored or handled incorrectly.&nbsp;</p><h2>Failing At Failure Management</h2><p>The first problem is that the strong coupling (between components) of long, synchronous request chains raises the risk of cascading failures throughout the application. 
The second major problem is that the traditional way to represent failure in the programming model is through exceptions thrown in the user’s thread, which leads to defensive programming with the error handling (using try-catch) tangled with the business logic and scattered across the whole application.&nbsp;</p><p>Asynchronous message passing decouples components by adding an asynchronous communication boundary—allowing fine-grained and isolated error handling and recovery through compartmentalization. It also allows you to reify errors as messages to be sent through a dedicated error channel for management outside of the user call chain and not just throw it in the caller’s face. </p><p>The broad scenarios where Akka gets a lot of traction are those where there are a lot of users and unexpected peaks in visitors, environments where there are a lot of concurrently connected devices and use cases where there is just a ton of raw data or analytics that need to be crunched. Those are all domains where managing scale and failure are of critical importance, and those are where Akka quickly got a lot of traction.</p><h2>In The Actor's Studio</h2><p><strong>RW: </strong><em> What is an “actor,” and why is the actor model that’s been around for more than 40 years seeing a renaissance?</em></p><p><strong>JB:</strong> Actors are very lightweight components—you can easily run millions of live actors on commodity hardware—that help developers focus on communications and functions between services. An actor encapsulates state and behaviour and communicates through its own dedicated message queue, called its “mail box.” All communication between actors is message-driven, asynchronous and fire-forget.&nbsp;</p><p>Actors decouple the reference to the actor from the runtime actor instance by adding a level of indirection—the so-called ActorRef—through which all communication needs to take place. 
This enables the loose coupling that forms the basis for both location transparency—enabling true elasticity through an explicit model for distributed computing—and the failure model that I mentioned. </p><p>The actor model provides a higher level of abstraction for writing concurrent and distributed systems—it frees the developer from having to deal with explicit locking and thread management, and makes it easier to write correct concurrent and parallel systems. Working with actors also gives you a very dynamic and flexible programming model that allows you to upgrade actors independently of each other and shift them around nodes without changing the code—all driven through configuration or adaptively by the runtime behavior of the system.</p><p>Like you say, actors are really nothing new. They were defined in a <a href="http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.77.7898">1973 paper</a> by Carl Hewitt, were discussed for inclusion in the <a href="http://classes.soe.ucsc.edu/cmps112/Spring03/readings/Ingalls78.html">original version</a> of Smalltalk in 1976 and have been popularized by the Erlang programming language, which emerged in the mid-1980s. They have been used by Ericsson, for example, with great success to build highly concurrent and extremely reliable (99.9999999% availability—equal to 31 milliseconds of downtime per year) telecom systems.</p><p>The main reason the actor model is growing in popularity is that it is a great way to implement <a href="http://www.reactivemanifesto.org/">reactive applications</a>, making it easier to write systems [that] are highly concurrent, scalable, elastic, resilient and responsive. It was, like a lot of great technology, ahead of its time, but now the world has caught up and it can start delivering on its promises.</p><h2>Scaling The Internet Of Things</h2><p><strong>RW: </strong><em> There is a lot of interest in Akka in the context of the Internet of Things (IoT). 
What’s your view of the scale challenges that are unique to IoT?</em></p><p><strong>JB:</strong> The Internet of Things—with the explosion of sensors—adds a lot of challenges in how to deal with all of these simultaneously connected devices producing lots of data to be retrieved, aggregated, analyzed and pushed back out to the devices while maintaining responsiveness. Challenges include managing huge bursts in traffic in receiving sensor data at peak times, processing these large amounts of data both in batch and in real time, and running massive simulations of real-world usage patterns. Some IoT deployments also require the back-end services to manage the devices, not just absorb the data sent from the devices.</p><p>The back-end systems managing all this need to be able to scale on demand and be fully resilient. This is a perfect fit for reactive architectures in general and Akka in particular.</p><p>When you are building services to be used by potentially millions of connected devices, you need a model for coping with information flow. You need abstractions for what happens when devices fail, when information is lost and when services fail. Actors have delivery guarantees and isolation properties that are perfect for the IoT world, making it a great tool for simulating millions of concurrently connected sensors producing real-time data. </p><p><strong>RW: </strong><em> Typesafe recently collaborated with a number of other vendors on the </em><a href="http://readwrite.com/2014/04/17/real-time-data-streaming-viktor-klang-typesafe-reactivestreams-jvm#feed=/search?keyword=typesafe&amp;awesm=~oHYphD597E7q9F"><em>reactive streams specification</em></a><em>, as well as introducing its own&nbsp;</em><a href="http://typesafe.com/blog/typesafe-announces-akka-streams"><em>Akka Streams</em></a><em>. 
What do the challenges look like for data streaming in an IoT world?</em></p><p><strong>JB:</strong> If you have millions of sensors generating data and you can't deal with the rate at which that data arrives—that’s one early problem set we’re seeing on the back end of IoT—you need a means to apply back-pressure to devices and sensors when you aren't ready or don't have the capacity to accept more data. If you look at the end-to-end IoT system—with millions of devices, the need to store data, cleanse it, process it, run analytics, without any service interruption—the requirement for asynchronous, non-blocking, fully back-pressured streams is critical.&nbsp;</p><p>We see Akka Streams playing a really important role in keeping up with inbound rates and managing overflow, so that there are proper data bulkheads in IoT systems.</p><p><em>Lead image courtesy of <a href="http://www.shutterstock.com">Shutterstock</a>; image of Bonér courtesy of Jonas Bonér</em></p>Jonas Bonér came up with a way to handle heavy, bursty information flows from billions of connected devices.http://readwrite.com/2014/07/10/akka-jonas-boner-concurrency-distributed-computing-internet-of-things
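The actor Bonér describes (encapsulated state, a private mailbox, fire-and-forget sends) can be sketched in a few lines of plain Java. This is a teaching toy, not the Akka API; the `CounterActor` name and its string messages are invented for the illustration.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Toy actor: private state, a mailbox, and a single thread that processes
// one message at a time. Callers never touch the state or take a lock;
// they only send asynchronous, fire-and-forget messages.
class CounterActor {
    private final BlockingQueue<String> mailbox = new LinkedBlockingQueue<>();
    private int count = 0;     // encapsulated: only the actor's own thread reads or writes it
    private final Thread loop;

    CounterActor() {
        loop = new Thread(() -> {
            try {
                while (true) {
                    String msg = mailbox.take();   // messages are handled sequentially
                    if (msg.equals("increment")) count++;
                    else if (msg.equals("stop")) return;
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        loop.start();
    }

    // Asynchronous send: returns immediately, never blocks the caller.
    void tell(String msg) { mailbox.offer(msg); }

    // For demonstration only: wait for the actor to stop, then read its final state.
    int awaitStop() {
        try { loop.join(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        return count;
    }
}
```

Sending a message returns at once; the state change happens later on the actor's own thread. That asynchronous boundary is the decoupling Bonér credits for isolated error handling, and real Akka layers ActorRefs, supervision and distribution on top of this core idea.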
Hack | Thu, 10 Jul 2014 07:04:00 -0700 | Matt Asay
How Gilt's Insane Traffic Spikes Pushed It Off Rails To Scala<!-- tml-version="2" --><p>Imagine building a gorgeous e-commerce web site that hums along with a nice steady flow of traffic—and then, once a day for about 15 minutes, its traffic spikes by 100 times. That’s a challenge few developers face, but it describes what Eric Bowman (<a href="https://twitter.com/ebowman">@ebowman</a>), vice president of architecture, had to figure out at flash-sales site <a href="http://www.gilt.com/">Gilt.com</a>.</p><p></p><div tml-image="ci01b27ff3d0018266" tml-render-position="right" tml-render-size="medium"><figure><img src="http://a2.files.readwrite.com/image/upload/c_fill,cs_srgb,dpr_1.0,q_80,w_620/MTIyMzAwODIzOTYyMTU3Njcw.jpg" /></figure></div><p>I thought it would be interesting to lift the hood on Gilt.com to uncover the critical technology and architecture decisions Bowman had to make to enable that kind of scale and elasticity in the site. Bowman tapped into recent developments in Reactive programming, something <a href="http://readwrite.com/2014/04/17/real-time-data-streaming-viktor-klang-typesafe-reactivestreams-jvm#feed=/tag/reactivestreams&amp;awesm=~oDCNBasV7eNSAp">ReadWrite reported on recently</a>, as well as relying on open source, to build a beautiful site with a great user experience that could scale to meet the demands of tens of thousands of customers hitting the site all at once.</p><p>In the process he moved Gilt.com away from Ruby on Rails. But I needed to know: Was that a one-off, or part of a trend?</p><h2>Gilt Dumps Java For Scala</h2><p><strong>ReadWrite</strong>:&nbsp;<em>How did you arrive at Gilt?</em></p><p><strong>Eric Bowman</strong>:&nbsp;I’ve spent my entire career in software development and technology. My first “real” job involved leading technical development for The Sims 1.0 franchise. 
In more recent years I’ve been involved in the architecture and systems implementation side of software development.</p><p>Before joining Gilt I was Principal Architect at TomTom, a location and navigation solutions provider based in Amsterdam. In August 2011 I became Principal Architect at Gilt, then a few months later was named vice president of Architecture. As you noted, I work from Gilt’s Dublin office alongside an incredibly talented team of engineers who focus on some of the company’s most ambitious and exciting initiatives: continuous delivery, infrastructure, operations and search, just to name a few. We’re continuously working to evolve Gilt’s architecture to handle the next order of magnitude in scale and create shopping experiences that keep our millions of members coming back every day.</p><p><strong>RW</strong>:&nbsp;<em>When did Gilt transition to Scala?</em></p><p><strong>EB</strong>:&nbsp;Right around the time I arrived. When I joined Gilt we were programming in Java, after originally creating Gilt on Rails.&nbsp;</p><p><strong>RW</strong>:&nbsp;<em>Why did Gilt make the transition from Java to Scala?</em></p><p><strong>EB</strong>:&nbsp;From 2009 to 2011 Gilt grew very quickly in terms of membership, and because of this growth we had to scale our tech organization accordingly. With Java, it was becoming harder and harder for teams to contribute code--our code base was becoming monolithic. For performance and scalability reasons we wanted to continue working on the JVM, and started looking at different JVM languages.</p><p>In 2011 some of our engineers became really excited about <a href="https://typesafe.com/platform/tools/scala">Scala</a>, a programming language commercially backed by Typesafe. Scala required writing much less code than Java, and it was easy to integrate with other JVM services. 
We adopted it, and we continue to reap the benefits of the language’s elegance and simplicity.&nbsp;</p><p>As part of this shift, we've also started to use other parts of Typesafe's Reactive platform like the Play Framework, Akka and sbt.&nbsp;</p><h2>Reacting To Traffic Spikes</h2><p><strong>RW</strong>:&nbsp;<em>From a technical perspective, what makes Gilt unique? Why does Scala matter?</em></p><p><strong>EB</strong>:&nbsp;One of the things that makes us unique is our flash-sales model, which produces some very exciting technical challenges. Every day we experience intense traffic spikes at noon US Eastern Time as the day’s batch of new sales go live. Our members rush to view all the different products, and on most days, our traffic increases by 100 times in just a few seconds. So we have to build systems that can support these sudden bursts of activity.&nbsp;</p><p>Our solution has been to develop a distributed architecture based upon hundreds of microservices built in Scala.&nbsp;</p><p><strong>RW</strong>:&nbsp;<em>Why microservices?</em></p><p><strong>EB</strong>:&nbsp;Microservices make sense for Gilt in a number of ways. Gilt started out as a monolithic Rails application, but as we grew quickly we soon found out that this model wasn’t well suited to handling our traffic spikes. A monolithic architecture makes it challenging to identify who owns what code, and introduces complex dependencies. It also tends to lengthen test cycles and can have unexpected performance impacts.</p><p>With microservices, we can maintain isolation between unrelated services, which keeps our development process as friction-free as possible and reduces complexity. 
It also enables us to establish team ownership of end-to-end quality, which not only makes us all more accountable but also contributes to developer happiness--our engineers know they can have an impact and see the results of their work more readily.</p><h2>Lessons Learned About Ruby On Rails...And Open Source</h2><p><strong>RW</strong>:&nbsp;<em>What are some lessons for others that you’ve learned while scaling Gilt?</em></p><p><strong>EB</strong>:&nbsp;Keep your stack and your architecture as simple as possible, so you can adapt quickly. This doesn’t mean you should cut corners, however. Not paying enough attention to architecture will cause you problems down the line as you grow. We wouldn’t consider starting out with Rails to be a mistake on our part, because it was great for enabling us to move nimbly. But if we had to do it all over again, we’d likely start out with Play Framework, which is much easier to scale.</p><p>Another bit of advice is to take advantage of all the great open-source software available--not only for cost-related reasons, but for quality-related reasons as well. For example, we chose PostgreSQL as our launch database and it has remained the core of our relational strategy. It’s just really well-maintained, with an active community constantly making bug fixes and adding new valuable features.</p><p><strong>RW</strong>:&nbsp;<em>How important is open source to Gilt?</em></p><p><strong>EB</strong>:&nbsp;It’s very important to us. We’ve created Gilt almost entirely by using open-source software--from PostgreSQL to sbt (the build tool for Scala, as well as Java) to many of the Apache projects: ZooKeeper, Kafka, Avro and ant, just to name a few. 
In our earlier days, when we had a single PostgreSQL database and needed replication capabilities, we joined other companies in sponsoring Hot Standby, a key feature (first introduced in Postgres 9.0) that enables true replication by making it possible to read from multiple slave servers.&nbsp;</p><p>We encourage our engineers to participate in open-source projects and create as many of their own projects as they like. Our primary requirements are that their code is of high quality (documented, with automated tests, and useful) and not core to Gilt’s business. Our tech evangelist promotes all of these open-source projects on our tech blog, through social media, etc., whether the project is Gilt-related or not. We also maintain an active stable of projects on our GitHub repo--which I encourage everyone to check out--and contribute pull requests to the projects we use the most.&nbsp;</p>Gilt Group's daily traffic spikes were a problem, but it found a solution: Scala.http://readwrite.com/2014/05/08/gilt-eric-bowman-interview-scala-rails-jvm-reactive-platform
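One common way to survive the kind of 100x noon spike Bowman describes is to bound the work a service will accept and fail fast beyond that, rather than letting queues grow until the process dies. The sketch below uses plain `java.util.concurrent`; it is a generic illustration of the load-shedding idea, not Gilt's actual (Scala, microservice-based) implementation, and the `BoundedService` name is invented for the example.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// A front door sized for steady-state load, with a bounded backlog.
// When a spike exceeds pool-plus-queue capacity, work is rejected immediately
// so the caller can degrade gracefully instead of the process falling over.
class BoundedService {
    private final ThreadPoolExecutor pool;

    BoundedService(int workers, int backlog) {
        pool = new ThreadPoolExecutor(
                workers, workers,                       // fixed worker count
                0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(backlog),      // bounded queue of pending requests
                new ThreadPoolExecutor.AbortPolicy());  // reject rather than buffer forever
    }

    // Returns true if the request was accepted, false if shed due to overload.
    boolean trySubmit(Runnable request) {
        try {
            pool.execute(request);
            return true;
        } catch (RejectedExecutionException overloaded) {
            return false;   // e.g., serve a cached page or return 503 here
        }
    }

    void shutdown() { pool.shutdown(); }
}
```

The design choice is the `ArrayBlockingQueue` bound: an unbounded queue would accept every request during the spike and quietly trade an outage for unbounded latency and memory growth.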
Hack | Thu, 08 May 2014 06:16:25 -0700 | Matt Asay
Real-Time Data Streaming Gets Standardized<!-- tml-version="2" --><p>One of the advantages of open source is that it can accelerate standards adoption on a level playing field. If there is a big enough problem to solve, it can attract the best minds to work together, investigate and share the solution.</p><p>That said, standards bodies often become little more than a parlor game for incumbent vendors seeking to position the standard to their market advantage.</p><p></p><div tml-image="ci01b28119a0016d19" tml-render-position="right" tml-render-size="medium"><figure><img src="http://a2.files.readwrite.com/image/upload/c_fill,cs_srgb,dpr_1.0,q_80,w_620/MTIyMzAyMDg1ODc3MjMwODcz.jpg" /></figure></div><p>In other words, there's lots of talk, but not much code.</p><p>In such a scenario, it's easy to end up with implementations of a standard that each work differently due to unclear or ambiguous specifications.&nbsp;I recently sat down with <a href="https://twitter.com/viktorklang">Viktor Klang</a>, Chief Architect at <a href="https://typesafe.com/">Typesafe</a>, one of the lead organizers of <a href="http://www.reactive-streams.org/">reactivestreams.org</a>, an open source attempt to standardize asynchronous stream-based processing on the Java Virtual Machine (JVM).&nbsp;</p><p>Klang and his group—along with developers from Twitter, Oracle, Pivotal, Red Hat, Applied Duality, Typesafe, Netflix, the spray.io team and Doug Lea—saw that the future of computing was increasingly about stream-based processing for real-time, data-intensive applications like those that stream video or handle transactions for millions of concurrent users, along with a range of other scenarios with large-scale usage and low-latency requirements.</p><p>The problem? 
Lack of backpressure for streaming data means that if one step produces data faster than the next step can consume it, the entire system will eventually crash.</p><p><strong>ReadWrite</strong>: <em>What is driving this shift in computing to reactive streams today?</em></p><p><strong>Viktor Klang</strong>: It’s not a new thing. Rather, it reached critical mass as more people started using Hadoop and other batch-based frameworks. They needed real-time online streaming. Once you need that, then you don’t know up front how big your input is because it’s continuous. With batch, you know up front how big your batch is.</p><p>Once you have potentially infinite streams of data flowing through your systems, then you need a means to control the rate at which you consume that data. You need to have this back pressure in your system to make sure the producer of data doesn’t overwhelm the consumer of data. It’s a problem that becomes visible once you move from batch processing to real-time streaming.</p><p>Users have been asking for more “reactive” streams for a long time, for building their own network protocols or for their specific application needs. Any time you need to talk to a network device, you want to use this abstraction. Anything that has an IP address.</p><p>With reactivestreams.org, we’re trying to address a fundamental issue in a compatible way to hook all these different things together to work while being inclusive. Long-term, the plan is to build an ecosystem of implementations that can be connected to other implementations and then have developers building more things on top of it. 
For example, connect Twitter’s streaming libraries with RxJava streaming libraries, and pipe into Reactor, Akka Streams, or other implementations on the <a href="http://en.wikipedia.org/wiki/Java_virtual_machine">JVM</a>.</p><p><strong>RW</strong>:&nbsp;<em>Who are key members today?</em></p><p><strong>VK</strong>:&nbsp;Certainly Typesafe jumped in early, since we have an open-source software platform that deals with a lot of what the industry calls "reactive application challenges." We were thrilled to have Twitter join, along with the <a href="http://spring.io/blog/2013/05/13/reactor-a-foundation-for-asynchronous-applications-on-the-jvm">Reactor guys from Pivotal</a> and <a href="https://www.linkedin.com/pub/erik-meijer/0/5ba/924">Erik Meijer</a> from Applied Duality, as well as <a href="https://twitter.com/benjchristensen">Ben Christensen</a> and <a href="https://www.linkedin.com/in/georgecampbell">George Campbell</a>, who work at Netflix. Red Hat’s in there with Oracle, and we also have some critical individuals like <a href="http://en.wikipedia.org/wiki/Doug_Lea">Doug Lea</a>, creator of “<a href="http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/package-summary.html">java.util.concurrent</a>,” who drives a lot of the concurrency work in the JVM. One of the goals of the project is to create a JSR for a future Java version.</p><p>Everyone pulls their weight. It’s just really hard to get engineering time from people at this level.</p><p><strong>RW</strong>:&nbsp;<em>Standards don’t tend to be very popular with developers. How are you trying to approach this to attract more key people?</em></p><p><strong>VK</strong>: You’re right, the average developer is about as interested in standards as cats are in water. Jokes aside, however, we start with open source. I think of this project as a non-standard standards thing. We are inverting the usual process. 
We have created a spec, a test suite that verifies the spec, and a description of why the spec is what it is and why it isn’t what it isn’t. We’re really creating solutions, picking them apart, confirming they do what they say they do, and using this process to create the best specification.</p><p><strong>RW</strong>: <em>It sounds like developers in this case are also addressing an ops or a DevOps problem?</em></p><p><strong>VK</strong>:&nbsp;As a developer, you can make life really difficult for your ops guys. This is about getting it right so your ops guys don’t come over and mess you up. Previously they’d have to make sure you don’t feed the system more information than it can process, so you’re not blowing up resources, making sure the processing is always faster than the input. It’s really tricky to do that for variable loads.</p><p><strong>RW</strong>:&nbsp;<em>What are some examples that might inspire your core audience of Java developers?</em></p><p><strong>VK</strong>:&nbsp;What’s a hard case for an enterprise Java developer? If you have a TCP connection with orders coming in and you need to perform some processing on it before passing it on to another connection, you need to make sure you aren't pulling things off the inbound connection faster than you are able to send to the outbound connection. If you don't, then you'll risk blowing the JVM up with an OutOfMemoryError.</p><p>For web developers, it could be streaming some input from a user and storing it on Amazon S3 without overloading the server, and without having to be aware of how many concurrent users you can have. That’s a challenging problem to solve now.</p><p><em>Image courtesy of <a href="http://www.shutterstock.com">Shutterstock</a></em></p>Data is increasingly streamed, but now there's a standards body to coordinate this kind of reactive streaming.http://readwrite.com/2014/04/17/real-time-data-streaming-viktor-klang-typesafe-reactivestreams-jvm
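The heart of the spec Klang describes is demand signaling: data only flows after the consumer asks for it via `request(n)`. The sketch below inlines toy, single-threaded versions of the interfaces to show that handshake; the real ones live in the `org.reactivestreams` package (`Publisher`, `Subscriber`, `Subscription`, `Processor`), and the `RangePublisher`/`OneAtATime` names here are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Toy version of the Reactive Streams handshake: nothing is emitted until the
// subscriber signals demand with request(n), which is what keeps a fast
// producer from overwhelming a slow consumer.
interface Subscription { void request(long n); }

interface Subscriber<T> {
    void onSubscribe(Subscription s);
    void onNext(T element);
    void onComplete();
}

// Emits the integers 0..count-1, but only as many as were requested.
class RangePublisher {
    private final int count;
    RangePublisher(int count) { this.count = count; }

    void subscribe(Subscriber<Integer> sub) {
        sub.onSubscribe(new Subscription() {
            private int next = 0;
            private boolean completed = false;
            public void request(long n) {
                // Production is bounded by demand: emit at most n elements.
                for (long i = 0; i < n && next < count; i++) sub.onNext(next++);
                if (next == count && !completed) { completed = true; sub.onComplete(); }
            }
        });
    }
}

// A subscriber that paces the stream: ask for one element, process it,
// then ask for the next -- back-pressure in its simplest form.
class OneAtATime implements Subscriber<Integer> {
    final List<Integer> received = new ArrayList<>();
    boolean done = false;
    private Subscription sub;

    public void onSubscribe(Subscription s) { sub = s; sub.request(1); }
    public void onNext(Integer element) { received.add(element); sub.request(1); }
    public void onComplete() { done = true; }
}
```

This inversion (the consumer pulls demand, the producer pushes at most that much) is exactly the fix for Klang's TCP example: the inbound side is only read as fast as the outbound side requests, so the JVM never buffers itself into an `OutOfMemoryError`.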
Hack | Thu, 17 Apr 2014 10:51:13 -0700 | Matt Asay