An Introduction to Accumulo

This was presented for an O'Reilly Media webcast. http://www.oreilly.com/pub/e/3152?cmp=tw-na-webcast-product-webcast_an_introduction_to_apache_accumulo

This webcast covers the basics of the Apache Accumulo architecture and how it works, along with examples of how it is used. We'll also talk about some interesting use cases, such as text indexing, fine-grained multi-level access controls, and storing large-scale graphs. Finally, we'll briefly touch on what sets Accumulo apart from other similar and not-so-similar systems, and where we think the Accumulo project is headed technically.

A description of Accumulo from the Apache Accumulo website:
The Apache Accumulo sorted, distributed key/value store is a robust, scalable, high performance data storage and retrieval system. Apache Accumulo is based on Google's BigTable design and is built on top of Apache Hadoop, Zookeeper, and Thrift. Apache Accumulo features a few novel improvements on the BigTable design in the form of cell-based access control and a server-side programming mechanism that can modify key/value pairs at various points in the data management process. Other notable improvements and features are outlined here. Google published the design of BigTable in 2006. Several other open source projects have implemented aspects of this design including HBase, Hypertable, and Cassandra. Accumulo began its development in 2008 and joined the Apache community in 2011.

There are two basic operators: the AND operator, represented by &, and the OR operator, represented by |. In the examples, A, B, C, and D are security tokens. Security tokens are user-defined strings of alphanumeric characters. Parentheses are required to use nested logic.
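To make the operator rules concrete, here is a minimal Python sketch of how such an expression could be evaluated against a user's set of tokens. It is a toy model, not Accumulo's implementation: the names `tokenize` and `is_visible` are made up for illustration, and treating a mix of & and | at the same nesting level as an error is this sketch's reading of the "parentheses are required" rule.

```python
import re

# A token is a run of alphanumerics; the operators are &, |, and parentheses.
TOKEN_RE = re.compile(r"\s*([A-Za-z0-9]+|[&|()])")


def tokenize(expression):
    """Split a visibility expression into tokens, operators, and parentheses."""
    expression = expression.rstrip()
    tokens, pos = [], 0
    while pos < len(expression):
        match = TOKEN_RE.match(expression, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def is_visible(expression, authorizations):
    """Return True if the given set of tokens satisfies the expression."""
    tokens = tokenize(expression)
    pos = 0

    def parse_term():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            value = parse_expression()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return value
        if tok in ("&", "|", ")"):
            raise ValueError(f"unexpected {tok!r}")
        return tok in authorizations

    def parse_expression():
        nonlocal pos
        value = parse_term()
        operator = None
        while pos < len(tokens) and tokens[pos] in ("&", "|"):
            if operator is None:
                operator = tokens[pos]
            elif tokens[pos] != operator:
                # Assumption of this sketch: nested logic must be parenthesized.
                raise ValueError("mixing & and | requires parentheses")
            pos += 1
            right = parse_term()
            value = (value and right) if operator == "&" else (value or right)
        return value

    result = parse_expression()
    if pos != len(tokens):
        raise ValueError("unexpected trailing input")
    return result
```

With the expression `(admin & developer) | analyst`, a user holding only `analyst` can see the cell, while a user holding only `admin` cannot.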

A Minor Compaction is triggered when the Tablet's MemTable reaches its maximum size, at which point the MemTable is flushed. A Minor Compaction iterator is applied during the stage when the MemTable is flushed and a new RFile is created. Since the iterator is applied during a Minor Compaction, the iterator does affect the persistence of the data.
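The flush-and-filter flow can be sketched as a toy model in Python. Everything here is hypothetical: the `Tablet` class, the `drop_empty_values` filter, and measuring the MemTable limit in entries rather than bytes are simplifications for illustration, not Accumulo's actual classes or behavior.

```python
class Tablet:
    """Toy tablet: an in-memory MemTable plus a list of immutable sorted files."""

    def __init__(self, max_memtable_entries, minc_iterator=None):
        self.max_memtable_entries = max_memtable_entries
        # The iterator is a function from sorted entries to (filtered) entries.
        self.minc_iterator = minc_iterator or (lambda entries: entries)
        self.memtable = {}
        self.rfiles = []  # each "RFile" is a tuple of sorted (key, value) pairs

    def write(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.max_memtable_entries:
            self.minor_compact()

    def minor_compact(self):
        """Flush the MemTable through the iterator into a new sorted file."""
        sorted_entries = sorted(self.memtable.items())
        rfile = tuple(self.minc_iterator(sorted_entries))
        if rfile:
            self.rfiles.append(rfile)
        self.memtable = {}


def drop_empty_values(entries):
    """Example filter: entries with an empty value never reach the file."""
    return ((key, value) for key, value in entries if value != "")


tablet = Tablet(max_memtable_entries=3, minc_iterator=drop_empty_values)
tablet.write("b", "2")
tablet.write("a", "")
tablet.write("c", "3")  # third write fills the MemTable and triggers the flush
```

After the third write, the only file holds the entries for `b` and `c`. The entry for `a` was dropped during the flush, so it is gone for good, which is the sense in which the iterator affects persistence.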

A Major Compaction periodically merges a set of RFiles into one. If a Major Compaction iterator is enabled, the iterator runs after the merge to filter data before writing the new RFile. Since the iterator is applied during a Major Compaction, the iterator does affect the persistence of the data.
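As a rough illustration of the merge-then-filter order, the following Python sketch merges several already-sorted files and runs an optional filter over the merged stream. The function names, the `(row, timestamp, value)` entry layout, and the timestamp cutoff filter are assumptions made for this example only.

```python
import heapq


def major_compact(rfiles, majc_iterator=None):
    """Merge sorted files into one, filtering the merged stream if asked."""
    merged = heapq.merge(*rfiles)  # k-way merge of already-sorted inputs
    if majc_iterator is not None:
        merged = majc_iterator(merged)
    return tuple(merged)


def age_off(cutoff):
    """Hypothetical filter: keep only entries at or after a timestamp cutoff."""
    def apply(entries):
        return (entry for entry in entries if entry[1] >= cutoff)
    return apply


# Entries are (row, timestamp, value) tuples, sorted within each file.
rfile_a = (("row1", 5, "x"), ("row3", 20, "z"))
rfile_b = (("row2", 15, "y"), ("row4", 1, "w"))

compacted = major_compact([rfile_a, rfile_b], age_off(cutoff=10))
```

Here `compacted` contains only the entries for `row2` and `row3`. The filtered entries are not written to the new file, so once the old files are discarded that data no longer exists.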

1.
AN INTRODUCTION TO
APACHE ACCUMULO
HOW IT WORKS, WHY IT EXISTS, AND HOW IT IS USED
Donald Miner
CTO, ClearEdge IT Solutions
@donaldpminer
August 5th, 2014

13.
Apache Accumulo is based on Google's BigTable design
and is built on top of Apache Hadoop, Zookeeper, and
Thrift. Apache Accumulo features a few novel
improvements on the BigTable design in the form of cell-
based access control and a server-side programming
mechanism that can modify key/value pairs at various
points in the data management process. Other notable
improvements and features are outlined here.
Google published the design of BigTable in 2006. Several
other open source projects have implemented aspects of
this design including HBase, Hypertable, and Cassandra.
Accumulo began its development in 2008 and joined the
Apache community in 2011.

17.
HBase vs. Accumulo
• Slight differences in visibility labels
• Coprocessors vs. Iterators
• Accumulo has faster write throughput*
• HBase’s reads are faster*
• HBase has more ecosystem integration
• BatchScanner
• Accumulo can shift around locality groups after the fact
• Accumulo has been shown to work with no problems at 1,000
nodes (BAH paper). Facebook and others run a “cell”
design for HBase. Largest clusters in the hundreds*.
* We believe
Disclaimer: I am biased

19.
(admin & developer) | analyst

56.
Iterators
• Iterators run tablet server side at these times:
1. Scan Time
2. Minor Compaction
3. Major Compaction
• Multiple iterators are included with Accumulo
• Custom iterators can be created using the Iterator API
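
The idea of stacking iterators over a sorted stream can be sketched in a few lines of Python. This is only an analogy using generators: Accumulo's real Iterator API is a Java interface, and the names `filter_iterator`, `summing_iterator`, and `scan` below are invented for this example.

```python
from itertools import groupby


def filter_iterator(predicate):
    """Build an iterator that passes through only entries matching predicate."""
    def apply(source):
        return (entry for entry in source if predicate(entry))
    return apply


def summing_iterator(source):
    """Collapse consecutive entries that share a key into one summed entry."""
    for key, group in groupby(source, key=lambda entry: entry[0]):
        yield key, sum(value for _, value in group)


def scan(sorted_entries, iterators):
    """Run sorted (key, value) entries through a stack of iterators, in order."""
    stream = iter(sorted_entries)
    for iterator in iterators:
        stream = iterator(stream)
    return list(stream)


entries = [("a", 1), ("a", 2), ("b", -5), ("b", 7), ("c", 4)]
positive_only = filter_iterator(lambda entry: entry[1] > 0)
result = scan(entries, [positive_only, summing_iterator])
```

In this sketch `result` is `[("a", 3), ("b", 7), ("c", 4)]`: the negative entry is filtered out first, then the remaining values are summed per key. At scan time the underlying data is left unchanged; the same stack applied during a compaction would change what gets written.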