Hyunsik Choi
added a comment - 13/Apr/10 06:39 I think that interactive queries may be impossible, because RDF queries commonly perform joins, and joins against large-scale RDF data sets need MapReduce jobs. But MR is batch processing, and its response time is too slow for interactive queries. It would be better to target analytical processing on large-scale RDF data.

Amandeep Khurana
added a comment - 13/Apr/10 06:48 We should be able to answer small queries with low latency by using the right kind of indexing. Here are some papers that do it:
http://people.csail.mit.edu/tdanford/6830papers/weiss-hexastore.pdf
http://portal.acm.org/citation.cfm?id=1114857
However, they do everything in memory. We can store these indexes in HBase and allow for fast querying, but we can't guarantee performance as good as what these papers report. It'll still be much better than an MR job though.
Batch processes can also use these indexes for getting results out faster. This is yet to be explored.
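To make the indexing idea concrete, here is a rough sketch of how a small query could be answered by a prefix scan over a sorted index instead of an MR job. It is only an illustration: a sorted Python list stands in for an HBase table (whose rows are kept sorted by row key), and the `predicate|object|subject` key layout, the `prefix_scan` helper, and the sample triples are all made up for this example.

```python
import bisect

def prefix_scan(sorted_keys, prefix):
    """Return every key in a sorted list that starts with `prefix`.

    This mimics a range scan over row keys in a sorted key-value store.
    """
    start = bisect.bisect_left(sorted_keys, prefix)  # first key >= prefix
    matches = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break  # keys are sorted, so no later key can match
        matches.append(key)
    return matches

# Hypothetical "pos" index: each key is predicate|object|subject.
pos_index = sorted([
    "rdf:type|ex:Person|ex:alice",
    "rdf:type|ex:Person|ex:bob",
    "rdf:type|ex:City|ex:seoul",
    "ex:knows|ex:bob|ex:alice",
])

# Pattern (?x rdf:type ex:Person): predicate and object are bound.
people = [k.split("|")[2] for k in prefix_scan(pos_index, "rdf:type|ex:Person|")]
print(people)
```

The lookup touches only the matching key range, which is why small queries could stay low latency even if the full data set is large.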

Hyunsik Choi
added a comment - 13/Apr/10 07:19 Both papers aim at reducing the number of joins. Is that right? They do not eliminate joins, and the joins that neither paper can eliminate are common, as you can see in the Berlin benchmark ( http://www4.wiwiss.fu-berlin.de/bizer/BerlinSPARQLBenchmark/spec/index.html ). In such cases, it may be inevitable to use MapReduce for join processing on large-scale RDF data sets. If you use another distributed computing model (i.e., instead of MapReduce) specialized for RDF query processing, I could understand.
Besides, Hexastore uses six indices, one for each of the six possible orderings of an RDF triple. Is that right? I wonder how it would be implemented on HBase.
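For reference, one possible way to picture the six orderings as row keys in a sorted store is sketched below. This is not taken from the Hexastore paper or from any HBase design in this thread: the `|` separator, the `index_keys` and `pick_ordering` helpers, and the sample triple are assumptions for illustration, and real terms would need escaping if they can contain the separator.

```python
from itertools import permutations

# The six orderings of subject (s), predicate (p), object (o):
# spo, sop, pso, pos, osp, ops
ORDERINGS = ["".join(p) for p in permutations("spo")]

def index_keys(s, p, o, sep="|"):
    """Build one row key per ordering for a single triple."""
    triple = {"s": s, "p": p, "o": o}
    return {order: sep.join(triple[c] for c in order) for order in ORDERINGS}

def pick_ordering(bound):
    """Pick an ordering whose leading positions are exactly the bound ones.

    `bound` is the set of positions known in a triple pattern, e.g. {"p", "o"},
    so that the pattern can be answered with a single prefix scan.
    """
    for order in ORDERINGS:
        if set(order[:len(bound)]) == set(bound):
            return order
    raise ValueError("bound positions must be a subset of s, p, o")

keys = index_keys("ex:alice", "ex:knows", "ex:bob")
print(keys["pos"])                 # key stored in the hypothetical pos index
print(pick_ordering({"p", "o"}))   # ordering to scan for (?x ex:knows ex:bob)
```

Each triple is written six times, so the trade-off is storage and write cost in exchange for answering any single triple pattern with one prefix scan. Joins across patterns would still have to be done on top of these lookups.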