About this blog

This blog features updates, opinions, and technical notes from Caucho engineers about Caucho products, the enterprise Java industry, and PHP.
Caucho Technology is the creator of the Resin Application Server and the Quercus PHP in Java engine. A leader in Java performance since 1998, Caucho is a Sun JavaEE licensee with over 9000 customers worldwide.

Posts Tagged ‘websockets’

In our last segment we introduced the concept of ProjectX, our next-generation model for rapidly developing fast, scalable services that are exposed as REST and WebSocket services. Please remember that ProjectX is a placeholder name; the final name is yet to be determined.

In the same way NoSQL is an alternative way of thinking about data storage, ProjectX is a different way to think about writing services.

When you develop with ProjectX, you develop in a service-first manner: you spend your time writing objects rather than on repeated cycles of complex schema design and schema migration. You should also see a significant decrease in cache coherency issues.

How does it work?

If you recall, objects are data and logic. Your data is in your objects, and the objects that you have in memory have the data that you need. ProjectX just makes sure that those objects are backed up to disk. This allows you to write services that are in Java and only Java.

ProjectX and Non-Blocking RPC Services

Often when people think about services, they think about blocking RPC services like REST. ProjectX allows you to easily develop non-blocking RPC services, as well as allowing you to register callbacks and/or use one-way method calls. ProjectX allows these services to be consumed over HTTP/REST/JSON or WebSocket/JSON.
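To make the difference between a blocking call, a callback and a one-way call concrete, here is a minimal sketch in plain Java. It is not ProjectX code: `CounterService` and its methods are hypothetical names chosen for illustration, and the single-threaded executor is just one assumed way to let a service own its state without locks.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical counter service; the names are illustrative, not ProjectX API.
final class CounterService {
    // One worker thread owns the state, so the counter needs no locks.
    private final ExecutorService inbox = Executors.newSingleThreadExecutor();
    private long count; // read and written only on the inbox thread

    // Non-blocking call: returns immediately; the result arrives via the future.
    CompletableFuture<Long> incrementAndGet() {
        return CompletableFuture.supplyAsync(() -> ++count, inbox);
    }

    // One-way call: the caller neither waits for nor receives a result.
    void reset() {
        inbox.execute(() -> count = 0);
    }

    void shutdown() {
        inbox.shutdown();
    }

    public static void main(String[] args) {
        CounterService service = new CounterService();
        service.incrementAndGet();
        // Register a callback instead of blocking on the answer.
        CompletableFuture<Void> done = service.incrementAndGet()
            .thenAccept(value -> System.out.println("count = " + value)); // count = 2
        done.join(); // only so the demo does not exit before the callback runs
        service.shutdown();
    }
}
```

Because every request is queued to the same worker thread, calls are applied in the order they were submitted, and the caller's thread is free as soon as the request is queued.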

ProjectX has an open wire protocol, based on JSON, called JAMP that is easy to implement. You can call into any ProjectX service from any language. All that is required is for that language to have HTTP support and JSON support (for example Ruby, Go, C#, Java, Python or JavaScript). If the language also has WebSocket support, then the conversation can be bi-directional and very efficient. (We also have HAMP, which is Hessian based. Hessian is a binary protocol which has been ported to many languages including ActionScript, Python, Java, C#, and others.)

Object-Oriented Development Is Logic and Data Together

One of the original complaints about J2EE was that it pulled developers away from the object-oriented model. You ended up writing procedural code and all of the data existed in the database. All services were stateless and had the logic in them. No service maintained its own state; instead, services relied on third-party frameworks to map objects to a relational database.

Many frameworks, like Spring, CDI, Hibernate and Guice, mitigated some of the early issues with J2EE and its lack of OO. Also, the NoSQL movement made the mapping of objects to a database easier. But the majority of modern service development still splits the data off into a database or some other data store, which separates the data from the service. In this common model, services do not own their data.

When you start using caches and DataGrids to speed up storage and retrieval to databases, you are trading the problem of latency for a new, more complex set of problems, such as cache coherency issues and split brain, to name just two. These are not easy issues to handle.

In the ProjectX model, services own their data and the objects are in memory. ProjectX enables service developers to back those objects to disk in the most efficient manner possible. There is no longer any need to use a database merely for data safety. You can still use a database for reporting, but now your operational data can exist purely in Java.

Did you know that a modern commodity hard disk can read/write up to 300 MB per second? If you are using SSD, the sequential reads are up to 500 MB per second. Phase-change memory and advances in Flash mean that this speed will increase. If you add RAID level 0 support, this speed can increase by several multiples. ProjectX’s journaling and data store take advantage of sequential writes to ensure data safety at top speeds. More details about this are in subsequent posts.

Using ProjectX is as easy as just using a few simple annotations. Your code will look like code written for a typical service in EJB 3, Spring or Guice. But with ProjectX you can avoid the common mistake of using the database as a synchronization mechanism.

Using the database as a synchronization mechanism is an anti-pattern that causes many performance and scalability problems in service development. Rest assured, ProjectX is a Java POJO approach to development. Your code can be completely annotation free, or, if you choose to use Java EE/CDI, you can use a few annotations for productivity. Your code base has very little to no direct tie to ProjectX. It is just Java. We don’t try to tie you to our platform.

The Real Expense of Abusing Caching

Using ProjectX also enables you to avoid the anti-pattern of duplicating all the data in the database and every possible query of that data in a data grid or data cache. By using a cache or adding a lot of complexity to your application, you may incur problems of cache coherency and split brain. If all you know is horizontal scaling and caching, every large-scale system looks like a nail. ProjectX can be the hammer.

It is very easy and, from my experience, very common to paint a project into a corner by abusing caching. Caching is the equivalent of applying a quick and dirty (as in dirty read) Band-Aid solution that can cause many operational and development issues down the road. Many have worked on projects that had 80 GB of data, but the same data existed in many cache layers to the tune of 12 TB of RAM. There are projects that solve all these issues with more horizontal scale out and more caching, and these projects can quickly become a vast waste of hardware and developer productivity, not to mention the near impossibility of properly invalidating a cache. Misapplying horizontal scale out and caching has wasted countless developer and operation-engineering years.

Using ProjectX does not preclude horizontal scale out and caching. But when you have services that are 10x, or as much as 100x, more efficient and don’t require a cache for all of their data, you reduce cache coherency issues and you need fewer server instances. It would not be uncommon to replace 10 to 100 hardware servers running services written the traditional way with six to 12 servers using ProjectX. The ProjectX approach should also be 2x to 10x faster than normal service development (database, cache, Java REST lib, JPA, local cache and distributed cache). Also, since you have fewer servers and fewer things to worry about (like cache coherency issues, which are some of the least fun things in the world to chase down and debug), your operations costs should be 2x to 10x lower as well.

ProjectX fully supports horizontal scaling. You can service many more requests/connections from your services. ProjectX is, in fact, a distributed system for service development. More about this will be covered in the next post.

Services Should Own Their Own Data

ProjectX allows the service to own its data, and ProjectX provides a fast storage mechanism for crash recovery. ProjectX allows your objects to be served out of memory.

In the ProjectX approach your operational data is your Java objects.

ProjectX provides journaling, replication and fast persistence. The emphasis is not on the persistence. The persistence is a foregone conclusion managed mostly by ProjectX for data safety. This feature allows you to focus on your business logic and derive real value from your services.

Do you want to focus on enhancing the business value of your service or on managing database mapping and cache coherency issues?

The Real Win: The Ability To Develop Faster and Streamline Your System

Just as NoSQL was built for horizontal scaling but found a home in the hearts of developers who wanted to avoid schema migration and wanted more productive, dynamic schema, ProjectX has big productivity wins as well. You don’t have to be the next Internet sensation to get benefits out of ProjectX. If you want to focus on providing business value instead of feeding complexity then ProjectX is a good fit for you.

We feel that once you start developing services with ProjectX, you will not want to develop them any other way. Instead of dumbing down distributed service development, we put the engineering rigor and computer science back into service development. You get to take full advantage of your distributed system. Ultimately, and most importantly, you get to focus on writing more collaborative, richer applications. Features that were once cost prohibitive, or could never be squeezed into the budget, are now easy to develop. ProjectX is a very practical, user-friendly way to create massively collaborative and rich applications. It makes nearly impossible development easy.

ProjectX makes sense for both enterprise applications and mobile applications that need to send six million requests per second. ProjectX is simply a more productive way to build services.

Tune in next time when we show you some basic code examples from ProjectX.

The industry is changing at a rapid clip. There is a lot of convergence and it’s a new dawn for software development.

The number of devices that developers have to support has grown enormously – from smart phones, to glasses, to virtual servers. What I want to describe is a way to drastically speed up development time, reduce complexity, and reduce hardware costs.

But first, let’s talk a little bit about the trends in the industry.

The idea of an application server is becoming a thing of the past. Today, most server-side developers develop services, not applications. This is the trend. The new Web is no longer just a servlet engine, a database and some JSP/HTML/CSS. Today, applications can range from mobile applications to rich HTML 5, and the presentation logic is expected to be in the client. Users have come to expect a rich user experience. HTML 5 promises and delivers a very rich environment for writing applications. Companies that embrace this will deliver user-centric GUIs and be more successful than companies that do not. User Experience (UX) is finally the mantra, as it should be.

The Rise of NoSQL

The rise of NoSQL is really about the rise of tools that focus on data safety in contrast to relational databases. The emphasis is on horizontal scaling (potentially millions of clients’ data) and on not forcing application data into a relational model. NoSQL, although originally built to support horizontal scaling, has found a home in the hearts of developers who just want to rapidly develop and rapidly iterate on their applications and not be forced to deal with the hassle of constant schema migration. Schema migration is a difficult process to manage and has historically slowed development down to a crawl.

While NoSQL’s claim to fame might be horizontal scaling, a larger selling point has been more dynamic schema. This has driven NoSQL from massively scaled uses to also being used for department-level applications that will never use the horizontal scaling features. It just works. It is easier than dragging a schema along, and it means fewer DBAs, less Ops work and less trouble. Many confuse NoSQL with BigData. There are NoSQL solutions that can be used in BigData, but NoSQL is more about scalable, operational data.

The Rise of REST Services

In days gone by, SOA was a way to break up an application into reusable services. With this backdrop of activity, we see the rise of REST development with Java. More people are writing services and using services-oriented development, and more people should be talking about it. SOAP and XML are used less while REST and JSON are used more. The days of SOA and belly button lint inspection are gone. The days of writing services have just begun. Service-oriented development is a foregone conclusion. It has become synonymous with software development.

In the era of HTML 5 and mobile applications, the weight of the presentation logic has shifted back to the client. The service-oriented approach has been reborn and repurposed. HTML 5 apps and mobile apps are calling REST services. REST, along with JSON, has become the conduit of communication for mobile applications. REST with JSON is the common language of the Web. If you are doing REST, you are five times more likely to write that REST service in Java than any other language.

WebSockets – The New Communication Backbone

WebSockets are showing up in more places as well. WebSocket is the next-generation way to develop services for mobile and HTML 5 applications. WebSocket is part of HTML 5 and provides faster bi-directional communication without the request/response latency of HTTP and REST. HTML 5 is synonymous with WebSocket and IndexedDB. WebSocket is just baked in. As with REST, Java will dominate this space as well.

In-Memory Data – The Golden Goose

To handle load and develop more interactive applications, there has been a trend toward non-blocking systems that use principles of mechanical sympathy to optimize applications. This is done by writing code that uses the hardware’s multiple cores effectively. Now, instead of spending millions on hardware and software that scales to tens of thousands of transactions per second, teams have developed software that scales to millions on commodity hardware that costs only thousands.

From LMAX Disruptor to Workday, companies are finding that in-memory data is the fastest way to develop, deliver and scale modern applications. The basic idea is that service requests go through a journal and are replicated before the service is called. The data that is in-memory is the actual operational data. Storage and replication are now background tasks that occur in parallel with the service as much as possible. Storage is simply crash recovery. In-memory data is the actual data. Unlike the NoSQL model, your objects are your data and there is no database per se. Combining logic and data has another name: Object-oriented development.
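The journal-first idea described above can be sketched in a few lines of Java. This is an illustration under stated assumptions, not how ProjectX, LMAX or Workday implement it: `JournaledAccounts` and `Deposit` are hypothetical names, and an in-memory list stands in for the append-only journal file (replication is omitted).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: an in-memory service whose state can be rebuilt from a journal.
final class JournaledAccounts {
    // A journal entry records the request itself, not the resulting state.
    static final class Deposit {
        final String account;
        final long cents;
        Deposit(String account, long cents) {
            this.account = account;
            this.cents = cents;
        }
    }

    private final List<Deposit> journal; // stand-in for an append-only file on disk
    private final Map<String, Long> balances = new HashMap<>(); // the operational data

    JournaledAccounts(List<Deposit> journal) {
        this.journal = journal;
        // Crash recovery: replay every journaled request in its original order.
        for (Deposit d : journal) {
            apply(d);
        }
    }

    long deposit(String account, long cents) {
        Deposit d = new Deposit(account, cents);
        journal.add(d);  // 1. journal the request first
        return apply(d); // 2. then run the service logic against memory
    }

    long balance(String account) {
        return balances.getOrDefault(account, 0L);
    }

    private long apply(Deposit d) {
        return balances.merge(d.account, d.cents, Long::sum);
    }

    public static void main(String[] args) {
        List<Deposit> disk = new ArrayList<>();
        JournaledAccounts live = new JournaledAccounts(disk);
        live.deposit("alice", 500);
        live.deposit("alice", 250);
        // Simulate a crash: drop the live object and rebuild from the journal.
        JournaledAccounts recovered = new JournaledAccounts(disk);
        System.out.println(recovered.balance("alice")); // 750
    }
}
```

Reads never touch the journal; it is consulted only on startup, which is what lets storage act purely as crash recovery.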

This allows developers to focus on writing code and not worry about persistence or mapping as much. This is the next logical step in the NoSQL trend. This goes beyond NoSQL to no database, or rather no databases in the operational path. Services own their operational data. Think: “No more mapping. No more cache coherency issues. No more schema migrations. Sounds pretty good, doesn’t it?”

This is not to say you don’t have databases. You just don’t need databases to ensure that your operational data is safe. You can use databases for what they were meant for – reporting and offline analytics. The database no longer needs to be in the operational path. You no longer have to use your database for synchronization and turn it into a performance choke point.

This approach allows faster development time, as no database mapping or schema migration is required. You get the same data safety as you would get from a NoSQL or RDBMS, perhaps even more since the cost of data safety is less. Also since traditional architecture usually requires a lot of caching, it must deal with cache coherency issues. This new approach avoids that by allowing the services to own the operational data.

This allows companies to rapidly iterate and come up with their minimal viable applications and focus on providing an awesome user experience rather than spending millions on infrastructure and slowing the development process to a crawl with schema migrations, cache coherency issues, and the like. This approach allows companies to adopt the lean startup philosophy by allowing simpler more rapid iterations. As far as scalability goes, the same hardware can handle 10x to 100x the number of requests, so you have less vertical scaling to manage. To put it simply: Do more with less.

A SERVICE ENGINE READY FOR THE MASSES!

Well, what about the programming model? Is this in reach of the everyday developer? How can I use this approach?

Enter stage left, ProjectX (our code name). ProjectX has its DNA in JAX-RS, EJB, Spring, etc. It is designed around the way that Java developers write services. It provides the benefits of this new model in a programming model that is familiar and friendly to developers. Instead of learning a new programming model or language, you program in Java.

The WebSockets protocol needs the concept of a sub-protocol to make sure the client and server are sending messages they both understand. A Quake client, for example, can only talk to another Quake client, not a chat client, and a quake/3.1 client might not be able to talk to a quake/5.3 client. To make sure the clients and servers are talking the same protocol, WebSockets introduces a sub-protocol validation.

Although “sub-protocol” might sound somewhat complicated, it’s just recognition that applications will define simple protocols on top of WebSockets like they define XML formats and schema using the XML syntax, or JSON applications define objects to pass back and forth. Like XML and JSON, WebSockets is a layer that applications build on.

Some examples that WebSockets applications will create are JSON packets over WebSockets, XML over WebSockets, XMPP over WebSockets, and Hessian packets over WebSockets, as well as custom protocols like Quake or a tic-tac-toe game.

The client and server will validate the protocol to make sure a Quake/2.0 client won’t get confused talking to a Quake/1.0 server. At the start of a WebSocket connection, the client’s HTTP handshake sends a Sec-WebSocket-Protocol header with the sub-protocol name, like quake.idsoftware.com/1.0. If the server understands that version, it will respond with a Sec-WebSocket-Protocol of quake.idsoftware.com/1.0. If not, it will close the connection.
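The server side of that check can be sketched as a small Java function. This is only an illustration: `SubProtocolCheck`, `SUPPORTED` and `select` are hypothetical names, and the code assumes the header may carry a comma-separated list of candidates, as the final WebSocket specification (RFC 6455) permits.

```java
import java.util.Arrays;
import java.util.Optional;
import java.util.Set;

// Illustrative server-side check; not taken from any particular WebSocket server.
final class SubProtocolCheck {
    // Sub-protocols this hypothetical server implements.
    private static final Set<String> SUPPORTED = Set.of("quake.idsoftware.com/1.0");

    // Takes the client's Sec-WebSocket-Protocol header value and returns the
    // sub-protocol to echo back, or empty if the server understands none of them.
    static Optional<String> select(String headerValue) {
        if (headerValue == null) {
            return Optional.empty();
        }
        return Arrays.stream(headerValue.split(","))
            .map(String::trim)
            .filter(SUPPORTED::contains)
            .findFirst();
    }

    public static void main(String[] args) {
        System.out.println(select("quake.idsoftware.com/1.0").isPresent()); // true
        System.out.println(select("quake.idsoftware.com/2.0").isPresent()); // false
    }
}
```

An empty result is the case where the handshake should not proceed, because client and server have no sub-protocol in common.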

Although the protocol string is arbitrary, it’s a good idea to use unique names like “quake.idsoftware.com” with a version “/1.0”.

Sub-protocols used from JavaScript in an HTML5 browser will always send and receive Unicode text, not binary. That text will always be encoded in UTF-8, a convention necessary for sanity: allowing multiple character encodings would be more trouble than the small benefits are worth, and would make implementation far more difficult.

I’ve also seen the term “real-time web” (RTW), which I like as well, but here I want to dive a bit into the underlying bytes that go back and forth to make the real-time web happen.

The binary “message” at the heart of the real-time web is a sequence of bytes controlled by the application: JMS-style messages, XMPP (Jabber) frames, a JSON object, serialized Java in Hessian, the packets for a Quake game, stock ticker updates, iPhone app messages, toll booth status control panel, on-demand music streaming, auto-manufacturing overview consoles.

Because the messages can vary in length from tiny, fast Quake messages where response time is critical, to larger packets like the music and video streaming, the underlying protocol must handle that range, but still be memory-efficient. It would be absurd to force a server to buffer an entire video before sending it, or even fully serialize an XML message just to find the length.

So a sane protocol needs binary-length encoded chunks (called “frames”) combined into messages. “Messages” are understood by the application; “frames” are invisible to the application but are used by clients, servers, and intermediaries to manage the messages.
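As a rough illustration of length-encoded frames combined into messages, here is a Java sketch. The layout is an assumption made up for this example (a one-byte "final" flag, a four-byte length, then the payload) and is not the WebSocket wire format; `ChunkedFraming` and its methods are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Illustrative framing only; this is not the WebSocket wire format.
final class ChunkedFraming {
    // Writes one message as frames of at most maxFrame payload bytes.
    // Each frame: 1-byte "final" flag, 4-byte big-endian length, then the payload.
    static void writeMessage(DataOutputStream out, byte[] message, int maxFrame) throws IOException {
        if (maxFrame <= 0) {
            throw new IllegalArgumentException("maxFrame must be positive");
        }
        int offset = 0;
        do {
            int len = Math.min(maxFrame, message.length - offset);
            boolean last = offset + len == message.length;
            out.writeBoolean(last);
            out.writeInt(len);
            out.write(message, offset, len);
            offset += len;
        } while (offset < message.length);
    }

    // Reads frames until the final flag is seen and returns the reassembled message.
    static byte[] readMessage(DataInputStream in) throws IOException {
        ByteArrayOutputStream message = new ByteArrayOutputStream();
        boolean last;
        do {
            last = in.readBoolean();
            byte[] payload = new byte[in.readInt()];
            in.readFully(payload);
            message.write(payload);
        } while (!last);
        return message.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        byte[] text = "hello real-time web".getBytes(StandardCharsets.UTF_8);
        writeMessage(new DataOutputStream(wire), text, 8); // split into 8-byte frames
        byte[] back = readMessage(new DataInputStream(new ByteArrayInputStream(wire.toByteArray())));
        System.out.println(new String(back, StandardCharsets.UTF_8)); // hello real-time web
    }
}
```

This writer takes the whole message as an array only to keep the example short. Because each frame carries its own length and the last one is flagged, a streaming sender could emit frames as data becomes available without knowing the total message length in advance.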

Bringing those requirements together, the minimal protocol looks like the following: