I have tried to write some code, but without any documentation I have not been able to achieve my goal. In particular, I have not been able to build the CouchbaseInputDStream correctly because I don't know what to pass to the constructor for the streamFrom and streamTo parameters. In addition, I have no idea how to retrieve the changed Couchbase documents from the notifications.

You can use Spark Streaming for the incremental changes, but this of course requires more work to “ingest” properly, and streaming support is also experimental right now. A different option would be to “poll for changes” with a N1QL query that fits your criteria, like “where updated_at > …” and so on.
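To make the “poll for changes” idea concrete, here is a minimal sketch in Python. It is only an illustration under assumptions: the bucket name `reports` is hypothetical, `updated_at` is assumed to be a field your application writes on every mutation as an ISO-8601 UTC string, and `run_query` is a placeholder for whatever function executes a parameterized N1QL statement in your SDK (it is not a Couchbase API).

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical bucket name; `updated_at` is assumed to be maintained by the
# application on every write, in a format that sorts chronologically.
STATEMENT = (
    "SELECT META(d).id AS id, d.updated_at AS updated_at "
    "FROM `reports` AS d "
    "WHERE d.updated_at > $since "
    "ORDER BY d.updated_at"
)


def poll_changes(
    run_query: Callable[[str, dict], Iterable[dict]],
    since: str,
) -> Tuple[List[dict], str]:
    """Return the rows changed after `since` and the new high-water mark.

    `run_query` is a placeholder: any callable that takes a statement and a
    dict of named parameters and yields result rows as dicts.
    """
    rows = list(run_query(STATEMENT, {"since": since}))
    # Advance the watermark only if something came back; otherwise keep it.
    new_since = max((row["updated_at"] for row in rows), default=since)
    return rows, new_since
```

You would call `poll_changes` on a timer, store the returned watermark, and pass it back in on the next call. Note that a strict `>` comparison can miss a document that is written later with the same timestamp as the current watermark, so the timestamp resolution matters.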

@ldoguin
The problem is that to create reports I have to do a select-all over a lot of documents, and I think that RDBMS databases perform better than Couchbase on this kind of operation. This is why I had the idea of using an RDBMS with a schema already optimized for reports.

@giovanni.casella I don’t know if that’s really the case. Did you benchmark it? N1QL is pretty good at scaling out with the new GSI, especially when in memory. Combined with KV fetches you can get awesome performance.
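One common reading of “N1QL combined with KV fetches” is to let a small index answer only “which document IDs match?” and then load the full documents by key. A rough sketch in Python, under assumptions: the bucket `reports` and the `type` field are hypothetical, and `run_query` and `kv_get` are placeholders for your SDK’s query and get-by-key calls, not real Couchbase function names.

```python
from typing import Any, Callable, Dict, Iterable

# Only the document ID is selected, so an index on the filtered field
# (here the hypothetical `type`) can answer the query by itself.
KEYS_STATEMENT = (
    "SELECT META(d).id AS id "
    "FROM `reports` AS d "
    "WHERE d.type = $type"
)


def fetch_matching(
    run_query: Callable[[str, dict], Iterable[dict]],
    kv_get: Callable[[str], Any],
    doc_type: str,
) -> Dict[str, Any]:
    """Find matching IDs with N1QL, then load each document by key.

    `run_query` and `kv_get` are placeholders for the SDK calls.
    """
    ids = [row["id"] for row in run_query(KEYS_STATEMENT, {"type": doc_type})]
    # Key-value lookups go straight to the data service by document ID.
    return {doc_id: kv_get(doc_id) for doc_id in ids}
```

The point of the split is that the index stays narrow (one field) instead of covering every field a report needs, while the document bodies come from plain key lookups.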

And you could do the analysis directly in spark? I’m not sure you really need to go back into an RDBMS at all, can you tell us more about your use case?

@daschl How can I combine N1QL with KV fetches? With Couchbase 4.2 the only way to achieve fast N1QL queries was to add covering indexes, but a Couchbase engineer told us that it is better to avoid more than 5 indexes per bucket, so we moved from N1QL to KV fetches in almost all cases (and it was painful). Now I would like to avoid adding indexes for reports.

Regarding the reports with Spark, I must admit that I am a newbie: I met Spark this morning for the first time, whereas I do have some experience with tools that retrieve data from an RDBMS.