WHAT IS THE Isis2 LIBRARY?

Ken Birman, Cornell University.


Isis2 is a technology for replication • Solves the coherent replication/caching problem • Like MapReduce, intended for use by a programmer • Like Spark, fairly complex because the issues are subtle when you look at them carefully

Isis2 library • A large, multithreaded service that runs inside your program using its own threads and data structures • It implements a wide variety of cutting-edge distributed computing protocols and algorithms • The outcome of 25 years of research! • A very complex, sophisticated “operating system” for distributed and cloud computing • But you access it via a very simple library interface that hides as much of this complexity as feasible

Suppose you want to build a cloud service • It will run on many nodes on a cloud platform • It will hold data collected as your system is running • Perhaps it keeps track of where people are • Then it can answer questions about location, such as “where are my friends right now?” • Web applications send location updates and perform queries

Determinism • Many programs are deterministic • Program state is determined by some sequence of events that mutate the state • Given the sequence, the state is easily computed • When building distributed services • Some people don’t assume determinism • But it is easier to work with the “state machine” approach, in which programs are identical and fully deterministic

State Machine Approach (Lamport) • You take some deterministic event-driven program • Replicate it by making N identical copies • Apply the same sequence of events to all N copies • ... they will stay in the identical states! • ... and you can spread read-only operations evenly. Any copy can respond identically to any other copy!
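A minimal Python sketch of the state machine approach follows. The Counter class and its event format are hypothetical, chosen only for illustration; nothing here is Isis2 API. It applies one event sequence to N copies and confirms they end in the same state, then shows that a different order gives a different state.

```python
class Counter:
    """A tiny deterministic, event-driven object (hypothetical example)."""

    def __init__(self):
        self.value = 0

    def apply(self, event):
        # Each event is an (operation, amount) pair.
        op, amount = event
        if op == "add":
            self.value += amount
        elif op == "mul":
            self.value *= amount
        else:
            raise ValueError(f"unknown event: {op!r}")


def run_replicas(n, events):
    """Create n identical copies and apply the same events in the same order."""
    replicas = [Counter() for _ in range(n)]
    for event in events:
        for replica in replicas:
            replica.apply(event)
    return [replica.value for replica in replicas]


events = [("add", 3), ("mul", 4), ("add", 1)]
states = run_replicas(3, events)

# Same events, different order: the final state differs.
reordered = run_replicas(1, [("mul", 4), ("add", 3), ("add", 1)])
print(states, reordered)
```

Because every copy holds the same value, a read-only query can be sent to any one of them.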

Is Determinism Feasible? • Modern programming languages make non-determinism “inevitable” • Threads, libraries that use them • Input that might arrive in unpredictable orders • Other events such as failures, exceptions... • But we can still build a deterministic “object” – a class that has update and query / lookup operations, and lives inside a program

Key idea in this short course? • Small, deterministic objects • We build normal programs, which might not be deterministic • But inside them are deterministic objects and those have identical replicated states over a set of copies • We arrange to deliver the identical events, in the identical order, and the copies advance through identical states

Events • They could be updates... but could also be • Group membership changes • Failures • In the version of state machine replication of interest to us, all of these are just “events” delivered to the deterministic objects that we replicate
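The idea that membership changes and failures are "just events" can be sketched in plain Python. The class name, event tuples, and member names below are assumptions made for this example and are not Isis2 API. As a simplification, both copies replay the whole event log from the start, whereas a real joining member would receive a state transfer.

```python
class ReplicatedObject:
    """Deterministic object whose only inputs are ordered events."""

    def __init__(self):
        self.members = []     # position in this list is the member's rank
        self.locations = {}   # person -> last reported place

    def deliver(self, event):
        kind = event[0]
        if kind == "join":
            self.members.append(event[1])
        elif kind == "fail":
            self.members.remove(event[1])
        elif kind == "update":
            _, person, place = event
            self.locations[person] = place
        else:
            raise ValueError(f"unknown event kind: {kind!r}")

    def snapshot(self):
        return (tuple(self.members), dict(self.locations))


# One ordered stream that mixes membership changes, a failure, and an update.
event_log = [
    ("join", "p0"),
    ("join", "p1"),
    ("update", "John Smith", "12 Main Street"),
    ("join", "p2"),
    ("fail", "p1"),
]

copy_a, copy_b = ReplicatedObject(), ReplicatedObject()
for event in event_log:
    copy_a.deliver(event)
    copy_b.deliver(event)

print(copy_a.snapshot() == copy_b.snapshot())
```

Since both copies see the failure of p1 at the same point in the stream, they agree that p2 now has rank 1.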

Using the membership • Suppose a system has N members, numbered 0 ... N-1 (“rank”) • We can replicate identical states • But at the same time, we can ask the different members to play distinct roles • Example: Search a database with K*N entries • Member 0 searches 0 ... K-1 • Member 1 searches K ... 2K-1 • Search is a form of “lookup”, and if the members search identical replicas, the N responses together cover each sub-result exactly once. They add up to “1 search of the database”
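The rank-based split can be sketched in a few lines of Python. The function names and the sample database are hypothetical. The sketch assumes that the member with rank r is responsible for entries r*K through (r+1)*K-1 of a replicated database holding K*N entries.

```python
def slice_for(rank, k):
    """Index range owned by the member with this rank."""
    return range(rank * k, (rank + 1) * k)


def member_search(database, rank, k, predicate):
    """What one member does: scan only its own slice of the replica."""
    return [database[i] for i in slice_for(rank, k) if predicate(database[i])]


K, N = 4, 3
database = list(range(100, 100 + K * N))   # every member holds this same copy


def is_even(x):
    return x % 2 == 0


# Each of the N members answers for its own slice.
partial_results = [member_search(database, rank, K, is_even) for rank in range(N)]

# Concatenating the N partial answers equals one full search.
combined = [hit for part in partial_results for hit in part]
print(combined)
```

The slices are disjoint and together cover every index, so no entry is searched twice and none is missed.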

Without Isis2 • With no help from a system such as Isis2, you will need to do many things by hand • Run your program on the cloud, with N copies • Track the status of each copy, adapt as they crash, or restart, or the cloud balances load • Send in updates and share them within the copies • Send in queries and compute the responses • Follow the various rules imposed by cloud operators • This makes your task very hard

With Isis2 • Today’s cloud developers often work with prebuilt technologies to make their lives simpler and easier • ZooKeeper: a special cloud computing file system from Yahoo, used for sharing files in a reliable way • Cassandra: A distributed key-value storage system • MapReduce and Hadoop: Computing tools to split jobs into small parts, combine results • Isis2 is a library for writing distributed programs, and we will focus on it • A short class like this won’t have time to look at the full range of cloud computing options. At Cornell we teach such a class but it takes us 14 weeks • By focusing on Isis2 we can be more thorough

Isis2 is for distributed computing • Rather than building one program that will run “by itself” we can use Isis2 to build a program that will • Talk to clients over a network (client-server model) • Collaborate with other servers (peer computing model) • Execute on a cluster or cloud or in the Internet WAN • In this class we will focus on applications running on a cloud platform such as Amazon EC2 • You can “simulate” such a setup on your own laptop! • So you can experiment with Isis2 while we talk about it

Isis2 System • [Figure: a process joins “myGroup” and receives a state transfer; updates are delivered to the group members] • Revisits an old model (a personal favorite!) • Core functionality: groups of objects • … fault-tolerance, speed (parallelism), coordination • Intended for use in very large-scale settings • The local object instance functions as a gateway • Read-only operations performed on local state • Update operations update all the replicas • DSN, June 2013 (Budapest)

Terminology we’ve used • Process group: A term for a collection of programs that are all running (perhaps on different machines, perhaps on the same machine) and that use Isis2 • Each process group has a name (you pick it) • You can have multiple groups in one application • Message: Data encoded to be sent between programs • State transfer: Data to initialize a new group member • Update: Any action that changes the shared data • Lookup: Any action that only queries the data • Multicast: A message sent to every group member
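To make the terminology concrete, here is a single-process Python simulation. It is only a sketch: the ProcessGroup and Member classes are invented for this example and do not reflect the real Isis2 interface, and the "network" is just a loop over objects in memory. The second person and address are made-up sample data.

```python
import copy


class Member:
    """One program in the group, holding its own replica of the data."""

    def __init__(self, name):
        self.name = name
        self.data = {}

    def on_update(self, key, value):
        # Update: an action that changes the shared data.
        self.data[key] = value

    def lookup(self, key):
        # Lookup: an action that only queries the local replica.
        return self.data.get(key)


class ProcessGroup:
    """Simulated process group (hypothetical, not the Isis2 API)."""

    def __init__(self, name):
        self.name = name
        self.members = []

    def join(self, member):
        # State transfer: initialize the newcomer from an existing member.
        if self.members:
            member.data = copy.deepcopy(self.members[0].data)
        self.members.append(member)

    def multicast(self, key, value):
        # Multicast: deliver the same message to every group member.
        for member in self.members:
            member.on_update(key, value)


group = ProcessGroup("myGroup")
a, b = Member("a"), Member("b")

group.join(a)
group.multicast("John Smith", "12 Main Street")
group.join(b)                                   # b catches up via state transfer
group.multicast("Jane Doe", "5 Oak Avenue")     # made-up sample data

print(a.data == b.data, b.lookup("John Smith"))
```

Member b joined after the first update, yet it can answer a lookup for John Smith because the state transfer copied the existing data.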

How would this map to our use case • We will use a process group to maintain the locations of our users • Updates will be done by multicast to the group • Queries will be done by asking a group member to look up the location data for a person • Scalability • We could replicate all the data at every member, which works until the number of users gets very large. • Once the data gets huge we will need to split it into subsets. The approach is called “sharding”
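One common way to split the data into subsets is to hash the user name and take the result modulo the number of shards. The sketch below assumes that scheme; the slides do not say which sharding rule is used, and the class and function names are hypothetical. It uses hashlib rather than Python's built-in hash() because the built-in string hash can differ from one process to the next.

```python
import hashlib


def shard_for(user, num_shards):
    """Map a user name to a shard index that is stable across processes."""
    digest = hashlib.sha256(user.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedLocations:
    """Location data split over several shards (each shard is a plain dict here)."""

    def __init__(self, num_shards):
        self.shards = [{} for _ in range(num_shards)]

    def update(self, user, place):
        self.shards[shard_for(user, len(self.shards))][user] = place

    def lookup(self, user):
        return self.shards[shard_for(user, len(self.shards))].get(user)


service = ShardedLocations(num_shards=4)
users = [f"user{i}" for i in range(20)]          # made-up sample users
for i, user in enumerate(users):
    service.update(user, f"{i} Main Street")

print(service.lookup("user7"), [len(s) for s in service.shards])
```

Each update and each lookup touches exactly one shard, so adding shards spreads both the data and the update load.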

Why would we use Isis2 • Prebuilt tools such as Isis2 simplify the mapping of a concept such as “location tracking service” to code that we can run on the actual cloud • It does many of the hard jobs for us • Our programming task becomes easier as a result • Developer still designs the solution and builds it, but many of the hardest tasks are “automated”

Modern computing: Many styles • Console application: • Program receives arguments on the command line and by interacting with the “console user” • Prints output to the console window • Often used when developing a program that might later run in different styles

GUI program • Application employs a GUI library • Lays out a windowed application with buttons, menus, visual regions that contain images, etc • Attaches handlers to perform actions, like responding to mouse clicks (“event” handlers) • When a GUI program is launched, it creates a window (a “frame”), sets it up and renders it • Then sits in a special method waiting for events

Client server program • Program must be registered as a server • Now client systems can connect to it, send it requests • These days the “Web Services” approach is common • Requests encoded as web pages, replies too • Clients connect via TCP and use the HTTP protocol to send the requests to the server and receive replies • Usually a library handles the connections and you only write the code to handle the requests
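As an illustration of "a library handles the connections and you only write the code to handle the requests", here is a sketch using Python's standard http.server module. The /lookup path, the name query parameter, the JSON reply format, and the in-memory LOCATIONS table are all assumptions made for this example.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs

# Hypothetical in-memory data for the example.
LOCATIONS = {"John Smith": "12 Main Street"}


def handle_lookup(path):
    """Request logic only: map a request path to (status, reply)."""
    parsed = urlparse(path)
    if parsed.path != "/lookup":
        return 404, {"error": "unknown path"}
    names = parse_qs(parsed.query).get("name", [])
    if not names or names[0] not in LOCATIONS:
        return 404, {"error": "unknown person"}
    return 200, {"name": names[0], "location": LOCATIONS[names[0]]}


class LookupHandler(BaseHTTPRequestHandler):
    """The library accepts the TCP connection and parses HTTP; we only answer."""

    def do_GET(self):
        status, reply = handle_lookup(self.path)
        payload = json.dumps(reply).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def make_server(port=8080):
    """Build the server object; call serve_forever() on it to start serving."""
    return HTTPServer(("127.0.0.1", port), LookupHandler)
```

Calling make_server().serve_forever() would start the server on the local machine. Keeping the request logic in handle_lookup means it can be exercised without opening a socket.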

Cloud computing program • Many servers to support lots of clients • The servers all share some form of database • The cloud load-balances work so that performance remains high even if the number of clients is huge

Cloud programs are like... • Servers, but they often only serve one client at a time • This is the easiest model to implement • Virtualization allows cloud computers to host many such servers and leverage multicore • One computer might run many copies • GUI programs • We favor an “event-oriented” style of computing • But there is no private console or terminal and no GUI. The similarity is because of the event computing model.

Sharing data • Normally, cloud servers share databases and files • For example, during the night Google computes big files with web index information, such as answers to queries with 1, 2, 3 ... n terms • These index files are stored in a big file system: GFS • The servers have access to the files as needed • Ideally, caching is used to avoid accessing the file or database servers heavily • Databases: Use a slightly fancier “snapshot isolation” approach

Updates • Updates are typically sent to “back end” servers for processing and applied to the system state there • The client systems have in-memory caches but they don’t do in-memory updates to the server state

Update example • UpdateLocation(“John Smith”, now at 12 Main Street, ...) • Web application sends data • Process group retains it and responds to queries • From this we can build apps: • “Who could join me for golf?”
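A rough Python sketch of how an app might sit on top of such updates follows. It assumes each update carries a town as well as a street address, and that "could join me" simply means "is currently in the same town". Both are simplifications invented for this example, as are the friend names and places other than John Smith's address.

```python
class LocationService:
    """Keeps the latest reported location for each person (in memory)."""

    def __init__(self):
        self.locations = {}   # person -> (address, town)

    def update_location(self, person, address, town):
        # A later update for the same person replaces the earlier one.
        self.locations[person] = (address, town)

    def nearby_friends(self, me, friends):
        """Friends whose latest reported town matches mine."""
        if me not in self.locations:
            return []
        my_town = self.locations[me][1]
        return sorted(
            friend for friend in friends
            if friend in self.locations and self.locations[friend][1] == my_town
        )


service = LocationService()
service.update_location("John Smith", "12 Main Street", "Ithaca")
service.update_location("Ann Lee", "3 Lake Road", "Ithaca")
service.update_location("Bob Ray", "9 Hill Street", "Syracuse")
service.update_location("Bob Ray", "40 State Street", "Ithaca")   # Bob moved

golf_partners = service.nearby_friends("John Smith", ["Ann Lee", "Bob Ray", "Cy Dune"])
print(golf_partners)
```

Cy Dune is left out because no location has been reported for that person.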

Most of the cases we’ll consider are like this example • The cloud gets a lot of “power” from simple ideas • Our focus in these lectures will mostly be on services that take updates and lookup requests and keep the data in memory, or in files associated with the server members • Even for this simple case many issues arise

Two kinds of questions • How hard is it to scale my solution up? • For example, to divide the data so that subsets of my group handle each subset of the data? • Otherwise, the update rate will limit scalability • How hard is it to guarantee properties? • Consistency • Security • Fault-tolerance

A process group • [Figure: a process joins “myGroup” and receives a state transfer; updates are delivered to the group members] • Update ordering is key to consistency • If we treat membership changes as a kind of update, we can address fault-tolerance • By allowing our groups to have a subgroup structure, we can subdivide data for scalability • E.g.: Red: Locations for people with names A-H, Green: I-Z
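The Red/Green split can be sketched as a routing function in Python. This assumes the subgroup is picked from the first letter of the name string as written; the slide does not say whether first or last name is meant. The names SUBGROUPS and subgroup_for are hypothetical, and apart from John Smith the people and addresses are made-up sample data.

```python
# Subgroup name -> inclusive range of first letters it is responsible for.
SUBGROUPS = {"Red": ("A", "H"), "Green": ("I", "Z")}


def subgroup_for(name):
    """Pick the subgroup that holds this person's location."""
    first = name[0].upper()
    for subgroup, (low, high) in SUBGROUPS.items():
        if low <= first <= high:
            return subgroup
    raise ValueError(f"no subgroup covers names starting with {first!r}")


# Each subgroup keeps only its own subset of the data.
stores = {subgroup: {} for subgroup in SUBGROUPS}


def update_location(name, place):
    stores[subgroup_for(name)][name] = place


update_location("John Smith", "12 Main Street")
update_location("Alice Brown", "7 Elm Street")
update_location("Hank Hill", "1 Oak Lane")

print(subgroup_for("John Smith"), sorted(stores["Red"]))
```

An update for John Smith is delivered only to the Green subgroup, so the Red members never see it and the update load is divided between the two.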

Our goals • Learn to use Isis2 to solve problems of this kind, and related problems that are based on the same ideas • A full-length class would look at many other cloud computing technologies • But in the time we have, our choice is to be very superficial for many technologies or reasonably detailed about just one • And so we’re focusing on just the one

What is the Isis2 Library? • Solving the kinds of problems we’ve looked at is hard • Nobody wants to invent the solutions and implement them “from scratch” each time they are needed • Isis2 packages common functions into a standard and easily used form • It can be used on many systems: Windows, Linux, Amazon EC2 or Eucalyptus, even Android • And in many settings: WAN, cloud, small clusters

Isis2 is a library • It provides pre-built methods you can call from your program in C#, C++ or Python • But there are some limitations • Isis2 itself was written in C# using .NET and so is not so easy to use from non-.NET languages • The version of C++ we support is “C++/CLI for .NET” • The version of Python is “IronPython for .NET” • On Windows there are 41 additional languages and in fact any of them would work. But only these three are available on Linux as well

How can you obtain this library? • You’ll visit Isis2.codeplex.com and (normally) will register for updates via email • Then download Isis.cs: source code for Isis2 • Also download Isis2.doc: a programming manual • And Documentation.chm: “compiled” html with per-API documentation • Working on Linux? • If so, you should also download and install Mono. Build it. • You’ll use the “dmcs” compiler, so read the small Isis document explaining how this is done • Now you are ready to use the system