The following page has been changed by udanax:
http://wiki.apache.org/lucene-hadoop/Hbase/HbaseArchitecture
------------------------------------------------------------------------------
by [wiki:udanax Udanax] [[MailTo(webmaster AT SPAMFREE udanax DOT org)]]
- It's need to be much smaller, much faster, managed for high-demand analytics and can be sparse.
- So, BigTable(Hbase) must Column storing like C-Store for wide and sparse data.
- In a column oriented, NULLs are much easier to handle, and impose a significantly smaller performance overhead.
+ I think Hbase should be compact (space-efficient) and fast, and it should be able to handle high-demand load. It should also handle sparse tables efficiently.
+ So, for wide and sparse data, Hbase must store data by column, as C-Store does.
+ A column-oriented system handles NULLs more easily, with significantly smaller performance overhead,
- And supports both Horizontal/Vertical Parallel Processing.
+ and supports both Horizontal and Vertical Parallel Processing.
- Do you know RDF(Resource Description Framework) Storage?
- We Can put it.
+ Let's consider the following case:
+ You may be familiar with RDF (Resource Description Framework) storage from the W3C, which has these characteristics:
* Storing and managing very large amounts of structured data
* Row/column space can be sparse
@@ -286, +286 @@
* Because of the design of the system, columns are easy to create (and are created implicitly)
* Column families can be split into locality groups (Ontologies!)
- And then, assume some job.
- I wanna get clustered document set by one of RDF Properties.
- It can be Readed only vertical(Column) Data from Table, because Column-stored.
- if you are not in agreement on this point, let me show your ideas via attach me through MSN Messenger(webmaster@udanax.org)
+ Let's assume a large number of RDF documents are stored in the system.
+ Then the vertical (column) data for any one RDF property can be read quickly from the table, because it is stored by column.
+ Please let me know if you don't agree with me.
+
----
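The RDF case above can be illustrated with a small sketch. It is not Hbase code: the `ColumnStore` class, its method names, and the `dc:creator` / `dc:title` property names are all hypothetical, chosen only to show why reading one property from a column-stored sparse table does not touch the other columns, and why missing cells (NULLs) cost nothing.

```python
from collections import defaultdict


class ColumnStore:
    """Toy column-oriented table (hypothetical, not the Hbase API)."""

    def __init__(self):
        # column name -> {row key -> value}; absent cells are simply not stored
        self.columns = defaultdict(dict)

    def put(self, row, column, value):
        # Columns are created implicitly on first write.
        self.columns[column][row] = value

    def scan_column(self, column):
        """Read a single column without touching any other column's data."""
        return dict(self.columns.get(column, {}))

    def get_row(self, row):
        """Reassemble one row; unlike a column scan, this probes every column."""
        return {c: cells[row] for c, cells in self.columns.items() if row in cells}


store = ColumnStore()
store.put("doc1", "dc:creator", "alice")
store.put("doc1", "dc:title", "Intro")
store.put("doc2", "dc:creator", "bob")
store.put("doc3", "dc:title", "Notes")

# Cluster documents by one RDF property, reading only that column.
by_creator = defaultdict(list)
for row, value in store.scan_column("dc:creator").items():
    by_creator[value].append(row)
```

Here `doc3` has no `dc:creator` cell, so it never appears in the scan and takes no space in that column. A real store would keep each column (or column family) in its own files on disk, which this in-memory sketch only imitates.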