Archive

With Hadoop/HBase/Hive, Cassandra, etc. you can store and manipulate petabytes of data. But what if you want to get nice-looking reports or compare data held in a NoSQL solution with data held elsewhere? The two market leaders in the Open Source business intelligence space are now putting all their firepower onto Big Data.

Pentaho Big Data seems to be a bit further ahead. They offer a graphical ETL tool, a report designer and a business intelligence server. These are existing tools, but support for Hadoop HDFS, MapReduce, HBase, Hive, Pig, Cassandra, etc. has been added.

Both companies will accelerate the adoption of Big Data, since the main problem with Big Data today is the lack of easy reporting. Unstructured data is harder to shape into a very structured report than structured data. Any solution that makes this possible, and is Open Source on top of that, is very welcome in times of cost cutting…

Hadoop has run into architectural limitations and the community has started working on Next Generation Hadoop [NGN Hadoop]. NGN Hadoop brings new management features, of which multi-tenant application management is the major one. The key change, however, is that MapReduce is no longer entangled with the rest of Hadoop. This will allow Hadoop to be used for MPI, machine learning, master-worker, iterative processing, graph processing, etc. New tools to better manage Hadoop are also being incubated, e.g. Ambari and HCatalog.

Why is this important for telecom? Having one platform that allows massive data storage, petabyte data analytics, complex parallel computations, large-scale machine learning, Big Data map-reduce processing, etc. all in one multi-tenant set-up means that telecom operators could see massive reductions in their architecture costs, together with faster go-to-market, better data intelligence, etc.

Telecom applications that are redesigned around this new paradigm can all use one shared back-office architecture. Having data centralized in one large Hadoop cluster, instead of tens or hundreds of application-specific databases, will enable unseen data analytics possibilities and bring much-needed efficiencies.

Is this shared-architecture paradigm new? Not at all. Google has been using it since at least 2004, when the MapReduce paper was published, with BigTable following.

What is needed is for several large operators to define this approach as their standard architecture, so that telecom solution providers start incorporating it into their solutions. Commercial support can easily be acquired from companies like Hortonworks, Cloudera, etc.

Having one shared data architecture and multi-tenant application virtualization in the form of a Telco PaaS would allow third parties to launch new services quickly and cheaply; think days instead of years…

Most telecom operators still believe that software should be upgraded at most twice a year, that Oracle RAC is the only valid database solution, that RFQs bring innovation, and that if you pay higher software licenses, the software will have more features and as such will be better.

All of these myths will have to be abandoned in the coming 12 months if operators want to stay on top of the game.

Upgrade twice a year

For telecom network equipment, two upgrades a year are fine. However, for everything related to services offered to consumers or businesses, it means that operators are 180 times less competitive than their direct competition. The large dotcoms like Facebook and Google upgrade their software on a daily basis; 50% of all the files that contain Google software code change every month. Even if “a revolution” happened and software upgrades came every month, it would still mean a 30-times lag.

Operators need to start using cloud computing, even if only private clouds, to deploy their back-office systems. The business needs software solutions that move at market speed. That means that if a new social networking site is hot, it should be integrated into the telecom solution offering in days. Not in months or a year.

There are many techniques to make deployments more predictable, more frequent and more reliable. Extra features or integrations can be offered quickly via plugins. Give them to a group of early adopters and gather feedback. If a feature does not survive this feedback, kill it. If it does, scale it up quickly.

Oracle RAC

There is nothing wrong with the quality of Oracle RAC, but it is a very expensive solution that needs a lot of manpower to keep running smoothly. Operators often pay a premium for services that could run equally well on cheaper or Open Source alternatives. NoSQL should also be embraced.

If the cost of deploying a new service is millions, then only a couple of them will be deployed. By lowering hardware and software costs, innovative projects are more likely to see daylight.

RFQs and Innovation

It takes 3 months from idea to a finalized RFQ document, 1.5 months to get a reply, and another 1.5 months for procurement: half a year in total, not counting the deployment time, which is likely to be another 6 months. The result is that the operator takes 12 months for any “new” system.

Now the question is whether that system is really new, because if an operator was able to define in detail what they want and how they want it, then the technology was probably quite mature to begin with. So operators spend fortunes installing yesterday’s technology 12 months late. Can anybody explain what innovation this is going to bring?

First of all, operators should not organize multi-million RFQs for business or end-user solutions. These are likely to come late to market and can only be focused on mass markets.

Instead, operators should let the customer decide what they want by offering a large, open eco-system of partners the possibility to offer a very large list of competing services to their customers. The operator should offer open APIs to key assets (charging, numbering plans, call control, network QoS, etc.), as well as revenue share and extra services like common marketplaces and support 2.0 (social CRM, helpdesk as a service, etc.). This is called Telecom Platform-as-a-Service, or Telco PaaS.
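To make the idea concrete, here is a minimal sketch of what calling such an open operator API could look like from a partner’s service. The endpoint, payload fields and authentication scheme are purely hypothetical; no specific operator API is implied.

```python
import requests

# Hypothetical Telco PaaS endpoint and fields; illustrative only, not a real operator API.
OPERATOR_API = "https://api.example-operator.com/v1"

def charge_subscriber(msisdn: str, amount_cents: int, service_id: str, api_key: str) -> dict:
    """Ask the operator's (hypothetical) charging API to bill a subscriber for a partner service."""
    response = requests.post(
        f"{OPERATOR_API}/charging/requests",
        headers={"Authorization": f"Bearer {api_key}"},
        json={"msisdn": msisdn, "amount_cents": amount_cents, "service": service_id},
        timeout=10,
    )
    response.raise_for_status()
    return response.json()  # e.g. {"status": "CHARGED", "transaction_id": "..."}
```

A partner that can bill through such an API, on a revenue-share basis, can launch a paid service without ever building its own billing stack.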

High licenses, more features, better

More features do not mean better. Most people want simplicity, not a long list of features. Ease of use comes at a premium price. Look at Apple’s stock price if you don’t believe it.

It is better to have basic systems that are extremely easy to use, with open APIs and plug-ins. A feature-by-feature comparison will make you choose the most expensive one, but “the system must be easy to use” is hard to put on a feature list.

In telecom there is a natural tendency to make things hard; in Web 2.0 the tendency is the opposite. You can see the difference between Nokia and Apple: the Nokia phone would win every feature-by-feature comparison, but the iPhone is winning the market battle…

Instead of organizing an RFP, let end-users and employees play around with early betas or proofs-of-concept. No training, no documentation. See which solution makes them more productive: the feature-rich one or the more straightforward one. Just ask for open APIs and a plug-in mechanism and you will be set…

If you are trying to find out what the right hypervisor is for your private cloud or IaaS then you might be asking the wrong question…

A better question is: do most applications really need an OS and a hypervisor?

One of the companies exploring this area is Joyent. Their SmartOS is a mix between a virtual machine and a combined OS + hypervisor. Instead of installing a hypervisor, on top of that an operating system, and on top of that an application server or database, the Joyent team thought it would be more efficient to remove as many layers as possible between the application/data and the hardware.

According to publicly available videos and material, their SmartOS is based on a telecom technology for highly scalable, low-latency application operations. Unfortunately Google does not seem to be able to answer which telecom technology it is. So if you know the answer, please leave a comment.

The idea of running applications as close to the hardware as possible and being able to scale an application over multiple servers is the ultimate goal of many cloud architects. Joyent claims that their SmartOS runs directly on the hardware. On top of SmartOS you are able to install virtualization but ideally you run applications and data stores directly.

The next step would be to combine the operating system with the virtual machine/application server or database server into one. Removing more layers will greatly improve performance as can be seen by Joyent’s performance tests.

So the real question is: do we need so many extra layers?

A distributed storage system, a virtualized web server, a virtualized app server, and a distributed SQL-accessible database or NoSQL solution that run straight on the hardware, with a minimal extension to distribute load over multiple machines, would be the ideal IaaS/PaaS architecture. It would give customers what they really need: performance, scalability, low latency, etc. Why add a large set of OS and hypervisor functions that in the end are not strictly necessary?

I have been looking into virtualization, but what I find are mainly operating-system-based virtualizations. What I am looking for are application, integration and datastore virtualization solutions. Google’s App Engine and Oracle’s JRockit Virtual Edition come closest to what I am looking for in application virtualization. Why do you need an operating system if you could virtualize your application directly? It would save resources and would be more secure.

My ideal solution allows developers to write applications and run them on a virtual application server. This virtual app server can scale applications horizontally over multiple machines. Each application runs in a sandbox, hence badly written or insecure applications will run out of resources and are not able to impact other applications. We would need a similar solution for integration. Both would need out-of-the-box support for multi-tenancy, in which either each tenant gets a separate instance or multiple tenants share one instance if the software supports it. Integration should be separated from the application logic, and so should data storage.
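As a rough illustration of the sandbox idea, assuming a Unix host, a supervisor could cap the CPU time and memory of each tenant’s application process so that a runaway application exhausts its own quota rather than the machine:

```python
import resource
import subprocess

def limit_resources():
    # Illustrative quotas: 5 CPU-seconds and 256 MB of address space per tenant process.
    resource.setrlimit(resource.RLIMIT_CPU, (5, 5))
    resource.setrlimit(resource.RLIMIT_AS, (256 * 2**20, 256 * 2**20))

def run_tenant_app(command):
    """Run one tenant's application in a resource-capped child process (Unix only)."""
    return subprocess.run(command, preexec_fn=limit_resources, timeout=30)

# Hypothetical usage: each tenant's application is an ordinary script.
# run_tenant_app(["python", "tenant_app.py"])
```

A real virtual application server would add scheduling, horizontal scaling and per-tenant isolation of data and network, but the principle is the same: the platform enforces the limits, not an operating system the developer has to ship.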

Integration is key because the virtual applications could be running on a public cloud but would have to be able to interact with on-site systems. Extremely high throughput, security, multi-tenancy and resistance to failure are key. One API can be linked to multiple back-office systems or different versions of one. Different versions of an API can be linked to the same back-office system to prepare applications before a major back-office upgrade.

A distributed multi-tenant data store should hold all the end-user and application data, ideally in a schema-less manner that avoids having to migrate data for every schema change.

All these virtual elements should be managed by an automated-scaling, highly distributed administration layer that lets applications grow or shrink based on demand, assures integration links are always up and re-establishes them if they fail, stores data in a limitless way, etc. But there is more: the administration layer should allow deploying different versions of the same application or integration, step-wise migration to new versions, and fast roll-backs.

Why do we need all this?

The first company to have such elements at its disposal will have enormous competitive advantages in delivering innovative services quickly. It can launch new applications quickly and scale them to millions of users in hours. It can integrate diverse sources and make them universally available for re-use by multiple applications. It can store data without needing an army of DBAs for every application. It can try out new features and quickly scale them up or kill them. In short, it can innovate on a daily basis.

The Googles of this world understood years ago that a good architecture is a very powerful competitive weapon. There is a valid trend to offshore technical work. However, technical work should be separated into extremely high-value and routine work. Never offshore high-value work, and never assume that because the resources are expensive, the work must be high-value. Defining and implementing this innovation architecture is extremely high-value. Writing applications on top of it is routine, at least from application number five onwards.

With the world looking more at XML, SOAP and REST these days, it perhaps feels unnatural to think binary again. However, with Protocol Buffers [Protobuf], Thrift, Avro and BSON being used by the large dotcoms, thinking binary feels modern again…

How can we apply binary to telecom? Binary SIP?

SIP is a protocol for handling sessions for voice, video and instant messaging. It is a text-based protocol, similar in style to HTTP. For a SIP session to be set up, a lot of communication is required between the different parties. What if that communication were substituted by a binary protocol based, for instance, on Protocol Buffers? Google’s Protocol Buffers can dramatically reduce network load and parsing time, reportedly by a factor of 10 to 100 compared to regular XML.

Advantages:

Performance – faster parsing and lower load mean that more can be done with less; one server can handle more clients.

Scalability – distributing the handling of SIP sessions over more machines becomes easier if each transaction can be handled faster.

Disadvantages:

No easy debugging – SIP is human-readable, hence debugging is “easier”. In practice, however, tools could be written that allow binary debugging.

Syncing client & server – client and server libraries need to be in sync, otherwise parsing cannot be handled. Protocol Buffers ignores unknown fields, so there is some freedom for an old client to connect to a newer server or vice-versa.

Firewalls/existing equipment – a new binary protocol cannot interoperate with existing equipment; a SIP-to-binary-SIP proxy would be necessary.

It would be interesting to see if a binary SIP prototype, combined with the latest NoSQL data stores, could compete with commercial SIP/IMS equipment in scalability, latency and performance.
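To give a feel for the potential gain, here is a toy sketch comparing a textual SIP INVITE with a hand-rolled binary framing. The binary layout below is made up purely for illustration (it is not Protocol Buffers and not a real protocol); an actual prototype would use generated Protobuf or Thrift messages.

```python
import struct

# A textual SIP INVITE start-line plus a few headers (simplified).
TEXT_INVITE = (
    "INVITE sip:bob@example.com SIP/2.0\r\n"
    "From: <sip:alice@example.com>;tag=1928301774\r\n"
    "To: <sip:bob@example.com>\r\n"
    "Call-ID: a84b4c76e66710\r\n"
    "CSeq: 314159 INVITE\r\n\r\n"
)

# Made-up binary framing: 1-byte method code, 4-byte CSeq, then
# length-prefixed UTF-8 fields for From, To and Call-ID.
METHOD_INVITE = 0x01

def encode_invite(from_uri, to_uri, call_id, cseq):
    def field(s):
        data = s.encode("utf-8")
        return struct.pack("!H", len(data)) + data
    return (struct.pack("!BI", METHOD_INVITE, cseq)
            + field(from_uri) + field(to_uri) + field(call_id))

binary_invite = encode_invite("sip:alice@example.com", "sip:bob@example.com",
                              "a84b4c76e66710", 314159)
print(len(TEXT_INVITE.encode()), "bytes as text vs", len(binary_invite), "bytes as binary")
```

Even this naive framing cuts the message size by more than half; a real schema-based encoding would also speed up parsing, which is where most of the per-session CPU goes.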

Any operator that has not started a project on Cloud Computing is late. The typical data center at an operator is filled with servers that are underutilized, e.g. application servers and database servers running at 30% of memory, disk and CPU. Just by taking the first step towards Cloud Computing, virtualization, operators can save substantially on hardware, electricity, maintenance, etc. Virtualization means decoupling software from hardware, which allows multiple operating systems to run on one server.

However this would only be focusing on the tip of the iceberg. Cloud Computing is so much more…

Private Clouds

Automatic Scaling

Let’s first focus on the internal systems of an operator. Once solutions have been virtualized, you are able to scale them to more or fewer servers. The first step is to automate this process. If you have an application server cluster, do you need 8 nodes all the time? You probably only need them the week before Christmas or during some other peak period. So the ideal is to measure the load and to automate the deployment of more or fewer cluster nodes based on that load. The same can be done with the database: during the night you have 2 nodes, in the morning 3, during the day 4, during peak moments 8, and in the evening 3 again. You could save massive amounts of money if application servers and databases can be scaled in this way. Ideally you also pay licenses based on what you really use and not on your maximum number of nodes during a yearly peak.
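A minimal sketch of such a control loop, with made-up thresholds and placeholder hooks into the monitoring and virtualization layers, could look like this:

```python
import time

# Illustrative thresholds and node limits; real values depend on the workload.
SCALE_UP_AT, SCALE_DOWN_AT = 0.75, 0.30
MIN_NODES, MAX_NODES = 2, 8

def get_cluster_load() -> float:
    """Placeholder: return average CPU/memory utilisation between 0.0 and 1.0."""
    raise NotImplementedError

def set_node_count(n: int) -> None:
    """Placeholder: ask the virtualisation layer to run n cluster nodes."""
    raise NotImplementedError

def autoscale(current_nodes: int) -> None:
    while True:
        load = get_cluster_load()
        if load > SCALE_UP_AT and current_nodes < MAX_NODES:
            current_nodes += 1
            set_node_count(current_nodes)
        elif load < SCALE_DOWN_AT and current_nodes > MIN_NODES:
            current_nodes -= 1
            set_node_count(current_nodes)
        time.sleep(60)  # re-evaluate once a minute
```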

Redesigning Applications and Data

Both Amazon and Google found out that if they redesign their applications, they can get even more gains than from pure virtualization. Amazon’s S3 service is a clear example, although internally they started with services like Dynamo, on which S3 is built. The first step is to build general data stores: multiple applications should use a common data store instead of each needing a separate database cluster.

Contrary to popular belief in the IT world, the dotcoms are not filling their data centers with Oracle RAC clusters. The dotcoms are designing special-purpose data stores. The data volumes any market-leading dotcom has to deal with are so massive that a SQL database cannot keep up. SQL databases are very good at running efficient queries on structured data and making sure transactions are consistent. However, they fail when data is unstructured, write operations are massive or data volumes grow by terabytes every day.

Relational Data

So for all low-volume applications that need transactional data and read more than they write, you could still use a unified Oracle RAC cluster to serve multiple applications. An alternative approach are the data stores built by Amazon (Relational Database Service or SimpleDB) or Google’s App Engine (Datastore with JDO).

What other alternatives are there?

Read Mostly Data

Data that is read a lot and not updated frequently can get an enormous performance and scalability boost from an in-memory data store. The dotcom standard is memcached. Facebook (800 servers holding 28 TB of cache) and Twitter are addicted to memcached.
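The usual pattern is cache-aside: read from memcached first and only hit the database on a miss. A minimal sketch, assuming a local memcached instance and the pymemcache client, with a hypothetical load_profile_from_db standing in for the real query:

```python
from pymemcache.client.base import Client

cache = Client(("localhost", 11211))  # assumed local memcached instance

def load_profile_from_db(user_id: str) -> str:
    raise NotImplementedError  # stands in for the real (slow) database query

def get_user_profile(user_id: str, ttl: int = 300) -> str:
    """Cache-aside read: try memcached first, fall back to the database on a miss."""
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached.decode("utf-8")
    profile = load_profile_from_db(user_id)
    cache.set(key, profile, expire=ttl)  # keep the value hot for 5 minutes
    return profile
```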

Documents, Images & Videos

Binary and media files are best stored outside of a database. In small numbers they are often stored on a file system, but they occupy a lot of disk as well as network bandwidth when moved around. The ideal is a document store with a content-delivery network (CDN) as a front-end; Amazon’s S3 and CloudFront are examples. Storing the files in a compressed format, e.g. LZO, can save valuable space, and transcoding into different formats, e.g. thumbnails or previews, can help save network bandwidth.
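A small sketch of that idea using boto3 against S3 (gzip is used here instead of LZO simply because it ships with Python; the bucket name and key layout are illustrative):

```python
import gzip
import boto3

s3 = boto3.client("s3")  # assumes AWS credentials are already configured

def store_document(bucket: str, key: str, data: bytes) -> None:
    """Compress a document and push it to S3; a CDN such as CloudFront then serves it to users."""
    s3.put_object(Bucket=bucket, Key=key, Body=gzip.compress(data), ContentEncoding="gzip")

# Hypothetical usage:
# store_document("operator-media", "invoices/2011/10/12345.pdf.gz", pdf_bytes)
```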

Unstructured Real-Time Data

Data that is unstructured and needs to be stored and accessed in real-time in high volumes is best kept in special-purpose data stores. You can write a book about the latest NoSQL solutions. Write an email to maarten at telruptive dot com if you are interested.

Analytics Data

Twitter has described most extensively how it uses all the unstructured data it gets from its logs and other sources. It uses technology from Facebook (Scribe) to stream the data into a highly available file system from Yahoo (Hadoop’s HDFS), where it runs massively parallel map-reduce operations to learn a lot more about what users are doing, who is influencing whom, etc.
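Conceptually the analytics step is plain map-reduce. The toy sketch below simulates it locally on two fake log lines, counting who gets mentioned as a stand-in for “who is influencing whom”; a real job would run the same map and reduce logic over terabytes in Hadoop.

```python
from collections import defaultdict
import re

# Toy stand-in for log lines streamed into the cluster.
LOG_LINES = [
    "2011-10-01 alice: reading @bob's post",
    "2011-10-01 carol: replying to @bob and @alice",
]

# Map phase: emit (mentioned_user, 1) pairs.
pairs = [(user, 1) for line in LOG_LINES for user in re.findall(r"@(\w+)", line)]

# Shuffle + reduce phase: sum the counts per user.
influence = defaultdict(int)
for user, count in pairs:
    influence[user] += count

print(dict(influence))  # {'bob': 2, 'alice': 1} - who gets mentioned most
```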

Social Graph

The social graph is about who knows whom and what kind of relationship they have. This data is best stored in graph data stores.
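As a toy illustration of the kind of question a graph store answers efficiently, here is a friends-of-friends query over an in-memory adjacency map; real stores such as Neo4j or Twitter’s FlockDB do this over billions of edges.

```python
# Toy in-memory adjacency sets standing in for a real graph data store.
FOLLOWS = {
    "alice": {"bob", "carol"},
    "bob": {"carol"},
    "carol": {"dave"},
}

def friends_of_friends(user: str) -> set:
    """People reachable in exactly two hops but not followed directly."""
    direct = FOLLOWS.get(user, set())
    two_hops = set().union(*(FOLLOWS.get(f, set()) for f in direct))
    return two_hops - direct - {user}

print(friends_of_friends("alice"))  # {'dave'}
```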

Collective Intelligence

Again a chapter by itself, but dotcoms are also heavy users of collective intelligence, which often means dedicated systems.

Accessing Data

Instead of stove-piped data silos, the dotcoms make data accessible to all their applications, either via search interfaces, web technology (e.g. REST and JSON) or efficient binary interfaces (Thrift and Protocol Buffers).

Messaging and Notification

Applications

If applications have access to all the above services, the architecture of an application is simplified enormously. Most of the famous dotcoms don’t use middleware; they prefer the SOA principle. However, unlike IT SOA solutions, a dotcom takes an application and turns it into a chain of reusable services. Let’s take an IVR application as an example. There would be a service for voice recognition, another for voice transcription, another for text-to-speech, and a transcoding service to convert between different media formats (e.g. high-quality voice and low-phone-quality voice). And so on. Each service has independent load-balancing and can be scaled separately. Services can be re-used between applications. The application itself is very short, because it just needs to define which services work together and how.
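A sketch of what such an application looks like once the services exist: the functions below are placeholders for independently deployed and independently scaled services, and the “application” reduces to the chain that wires them together.

```python
# Placeholder stubs for services that would each be deployed and scaled independently.
def transcode(audio: bytes, target_format: str) -> bytes: ...
def speech_to_text(audio: bytes) -> str: ...
def lookup_answer(question: str) -> str: ...
def text_to_speech(text: str) -> bytes: ...

def ivr_application(caller_audio: bytes) -> bytes:
    """The whole IVR 'application': just the chain of reusable services."""
    pcm = transcode(caller_audio, "pcm16k")   # normalise phone-quality audio
    question = speech_to_text(pcm)            # voice recognition service
    answer = lookup_answer(question)          # business-logic service
    return text_to_speech(answer)             # reply as audio
```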

Application Deployment

The dotcoms deploy new features on a daily and even hourly basis, which means that all application deployment is fully automated. When a new feature is deployed, it does not necessarily overwrite an existing feature. It is possible that a new piece of functionality has been solved in 5 different ways. Dotcoms split the total user base and let small groups of users try out the different approaches. Depending on the users’ feedback, they take the preferred approach and slowly scale it up from 1% to 100%. If they detect that the feature has a performance problem or a bug, they can roll back or decrease the load, fix it and deploy gradually again.
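The mechanics behind such a gradual rollout are simple: hash each user into a stable bucket and map buckets onto variants according to the current rollout percentages. A minimal sketch with hypothetical variant names:

```python
import hashlib

# Illustrative rollout table: variant -> percentage of users who currently see it.
ROLLOUT = [("new_flow_a", 1), ("new_flow_b", 1), ("current_flow", 98)]

def bucket(user_id: str) -> int:
    """Deterministically map a user onto 0-99 so the split is stable across requests."""
    return int(hashlib.md5(user_id.encode()).hexdigest(), 16) % 100

def variant_for(user_id: str) -> str:
    b, cumulative = bucket(user_id), 0
    for name, percent in ROLLOUT:
        cumulative += percent
        if b < cumulative:
            return name
    return ROLLOUT[-1][0]  # fall back to the default variant
```

Scaling a variant up or rolling it back is then just a change to the rollout table; no redeployment of the application is needed.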

The Network, OSS and BSS

There is a substantial effort needed to redesign a network to be cloud-aware. Some components need latencies lower than 10 milliseconds (e.g. antennas), hence most of this logic will have to be processed locally. However, all systems that can live with 100-millisecond latencies benefit from a cloud make-over.

Especially in the area of OSS and BSS there is room for optimizing applications and making them cloud-aware. Global services like a network inventory service, a user profile service, a device profile service, etc. would mean simpler applications and less data duplication.

Opening the Cloud

So the network and IT infrastructure are being redesigned to allow for faster innovation and lower costs. However, Cloud Computing can also be used to increase revenues.

Being a Cloud Infrastructure Provider

Many IT consultancies and software/hardware vendors will tell an operator that it could be a Cloud infrastructure provider. On slides this looks really nice. However, unless an operator is using cloud computing principles for its own systems, as described in the first part, it lacks substantial knowledge about how to manage such an infrastructure. Without this knowledge it is hard to run a very optimized operation and hence to be price-competitive with the existing players.

Being a Cloud Platform Provider

Although closer to an operator’s core competencies, being a cloud platform provider is still only for those operators that are Cloud experts. A Cloud platform provider allows others to use its infrastructure services to create applications on top. The complexity lies in the fact that malicious users will try to break the platform, which could have a very negative effect on the infrastructure if not handled correctly.

Being a Cloud Service Provider

This is the default option most operators should explore first before moving into the other areas. Being a service provider also has a roadmap:

Reselling SaaS

The easiest step is to be the storefront and to resell IT applications from others, e.g. cloud backup storage, security solutions, etc.

Offering Telco SaaS

The next step is to offer specific telecom applications: applications that are built for the operator or, even better, applications that can be built by others based on the operator’s assets. An example would be a PBX in the Cloud.

Open Market for SaaS

Building all telecom applications yourself is hard. Attracting others to do it for you is easier. However, just putting a “Net App Store” and an SDK on the web will not make you dominate the market. Only an open market with a large eco-system of companies and developers can generate large quantities of “Net Apps”. If you are thinking about building an open market, why don’t we talk first? Send an email to maarten at telruptive dot com.

Disclaimer

All the contents of the Blog, EXCEPT FOR COMMENTS AND QUOTED MATERIAL, constitute the opinion of the Author, and the Author alone; they do not represent the views and opinions of the Author’s employers, supervisors, nor do they represent the view of organizations, businesses or institutions the Author is a part of.

The Author is not responsible for the content of any comments made by the Commenter(s).

While we have made every attempt to ensure that the information contained in this Blog has been obtained from reliable sources, the Author is not responsible for any errors or omissions, or for the results obtained from the use of this information. All information in this Blog is provided "as is", with no guarantee of completeness, accuracy, timeliness or of the results obtained from the use of this information, and without warranty of any kind.