On the SQL front, the entire language implementation is going to be stream-based, including multi-journal joins and aggregation (more specifically, re-sampling). It is going to provide full ANSI SQL-92 support with extensions for handling time series. Unlike Java 8 streams, nfsdb streams are reusable, which will keep memory allocation to a bare minimum.
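
To illustrate the reusability point: a `java.util.stream.Stream` can be consumed only once, and a second terminal operation throws `IllegalStateException`. The sketch below contrasts that with a cursor that can be rewound and traversed again without allocating. `ReusableIntCursor` and `StreamReuseDemo` are hypothetical names made up for this example; they are not nfsdb classes, and nfsdb's actual stream design may differ.

```java
import java.util.Arrays;
import java.util.stream.Stream;

// Hypothetical reusable cursor, for illustration only (not nfsdb's API).
final class ReusableIntCursor {
    private final int[] data;
    private int pos;

    ReusableIntCursor(int[] data) {
        this.data = data;
    }

    boolean hasNext() {
        return pos < data.length;
    }

    int next() {
        return data[pos++];
    }

    // Rewind so the same instance can be traversed again without new allocation.
    ReusableIntCursor reset() {
        pos = 0;
        return this;
    }

    // Sums all values; rewinds first, so it can be called repeatedly.
    long sum() {
        long total = 0;
        reset();
        while (hasNext()) {
            total += next();
        }
        return total;
    }
}

final class StreamReuseDemo {
    // Returns true if running a second terminal operation on a Java 8 stream fails.
    static boolean javaStreamFailsOnReuse() {
        Stream<Integer> s = Arrays.asList(1, 2, 3).stream();
        s.count();
        try {
            s.count(); // the stream has already been operated upon
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(javaStreamFailsOnReuse());
        ReusableIntCursor cursor = new ReusableIntCursor(new int[]{1, 2, 3});
        System.out.println(cursor.sum());
        System.out.println(cursor.sum()); // same instance, second pass
    }
}
```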

Suminda Dharmasena

@sirinath

How about some abstraction like Schema which is a collection of channels.

Also, even Journal can be an abstraction that need not map one-to-one to the file system.

NFSdb looks great! I’m not clear on indexing though. How would I produce a timestamp ordered read after writing unordered? Does query include a sort order?

Vlad Ilyushchenko

@bluestreak01

Thanks, it's not quite finished yet. If you can insert in batches, there is JournalWriter.mergeAppend(List) to merge an ordered List with an ordered journal.

Once the query language is complete, you'd be able to do "select * from x order by timestamp".
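
The idea of merging an ordered batch into already ordered data can be sketched as a plain two-way merge. This is not the implementation of `JournalWriter.mergeAppend`; `OrderedMerge` is a hypothetical helper, and timestamps are assumed to be `Long` values sorted in ascending order.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only (not part of nfsdb).
final class OrderedMerge {
    // Merges two ascending timestamp lists into one ascending list.
    static List<Long> merge(List<Long> existing, List<Long> batch) {
        List<Long> out = new ArrayList<>(existing.size() + batch.size());
        int i = 0;
        int j = 0;
        while (i < existing.size() && j < batch.size()) {
            // "<=" keeps existing entries ahead of batch entries with equal timestamps.
            if (existing.get(i) <= batch.get(j)) {
                out.add(existing.get(i++));
            } else {
                out.add(batch.get(j++));
            }
        }
        // Copy whatever is left on either side.
        while (i < existing.size()) {
            out.add(existing.get(i++));
        }
        while (j < batch.size()) {
            out.add(batch.get(j++));
        }
        return out;
    }
}
```

Both inputs must already be sorted; a batch written out of order would need sorting before the merge.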

Vlad Ilyushchenko

@bluestreak01

@sirinath I tried to make an abstraction out of Journal with pure in-memory storage, but it is a lot of work, mainly because you cannot have overlapped memory in Java. So it is a major restructuring. I think what's crying out for attention now is a tool set for data access, which is where my focus is.

Suminda Dharmasena

@sirinath

Sorry, what I meant is to get rid of the writer and reader abstractions. All this can be abstracted as a journal with a schema. Physically they can be in separate files, but logically this is like a normal DB with a Schema that contains many tables and a DB that contains many Schemas. The current journal can be the DB abstraction.

This is purely renaming to match familiar concepts, plus some fluent APIs to abstract away the reader and writer.

Vlad Ilyushchenko

@bluestreak01

@sirinath I get it now, thanks. The initial version was the way you suggested. There can only be a single instance of a writer for the same journal at any given point in time. This is the case even across processes. An attempt to create a second writer instance will result in an exception. At the same time, there can be multiple simultaneous readers against the same journal. If both reader and writer functions are wrapped by the same interface, single-writer enforcement will be deferred and less clear, as some methods will work and some will not. Also, having a single interface hides the intent of passing around an instance of the Journal class.
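
A minimal sketch of the single-writer, many-readers rule with separate reader and writer types follows. All names here (`SingleWriterRegistry`, `Reader`, `Writer`) are hypothetical and are not nfsdb classes. The sketch only enforces the rule inside one JVM; the cross-process enforcement mentioned above would need something else, such as a file lock, which is not shown.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical registry, for illustration only (not nfsdb's API).
final class SingleWriterRegistry {
    private final Set<String> openWriters = new HashSet<>();

    // Read-only handle; any number may coexist for the same journal.
    static final class Reader {
        final String journal;

        Reader(String journal) {
            this.journal = journal;
        }
    }

    // Exclusive handle; closing it frees the journal for another writer.
    final class Writer implements AutoCloseable {
        final String journal;

        Writer(String journal) {
            this.journal = journal;
        }

        @Override
        public void close() {
            synchronized (SingleWriterRegistry.this) {
                openWriters.remove(journal);
            }
        }
    }

    Reader reader(String journal) {
        return new Reader(journal);
    }

    // Fails immediately if a writer for this journal is already open.
    synchronized Writer writer(String journal) {
        if (!openWriters.add(journal)) {
            throw new IllegalStateException("writer already open for " + journal);
        }
        return new Writer(journal);
    }
}
```

Because the failure happens when the writer is requested, a caller holding a `Reader` never sees write methods that might or might not work.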

Suminda Dharmasena

@sirinath

If you want to run SQL over many streams, how do you handle it?

Vlad Ilyushchenko

@bluestreak01

Do you mean a join?

Suminda Dharmasena

@sirinath

Yes

You need multiple streams hence multiple readers.

Vlad Ilyushchenko

@bluestreak01

It isn't a problem to have multiple readers. For the SQL implementation and any other concurrent access there is a class JournalPool (which should be renamed to JournalFactoryPool), which gives out JournalReaderFactory instances via get/release methods. It caches factories and readers to avoid opening and closing readers often. You can of course use a normal JournalFactory to do the same if performance is not a concern.
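
The get/release caching pattern described here can be sketched as a small generic pool. `SimplePool` is a hypothetical class written for this example; it does not reproduce JournalPool's implementation, and the `maxIdle` limit is an assumption.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

// Hypothetical generic pool, for illustration only (JournalPool may work differently).
final class SimplePool<T> {
    private final Deque<T> idle = new ArrayDeque<>();
    private final Supplier<T> factory;
    private final int maxIdle;

    SimplePool(Supplier<T> factory, int maxIdle) {
        this.factory = factory;
        this.maxIdle = maxIdle;
    }

    // Hands out a cached instance if one is idle, otherwise creates a new one.
    synchronized T get() {
        T cached = idle.pollFirst();
        return cached != null ? cached : factory.get();
    }

    // Returns an instance to the cache; instances beyond maxIdle are dropped.
    // A real pool would also close the dropped resource.
    synchronized void release(T item) {
        if (idle.size() < maxIdle) {
            idle.addFirst(item);
        }
    }
}
```

A caller would pair every `get()` with a `release()` in a `finally` block, so that an expensive reader is reopened only when the cache is empty.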

Suminda Dharmasena

@sirinath

OK

This pooling is what I was thinking

The pooling can be abstracted as well

Vlad Ilyushchenko

@bluestreak01

How abstract are you thinking?

Suminda Dharmasena

@sirinath

Like in a DB

Vlad Ilyushchenko

@bluestreak01

I don't think I understand. Do you mind giving me an example of how abstract the pool should be?

Suminda Dharmasena

@sirinath

A schema

A collection of tables

Which is also a table

You can have views defined as queries

And a DB which has many schemas

Under the hood they are a collection of pools, pools and journals

Vlad Ilyushchenko

@bluestreak01

OK, got it. This is further down the line, when there is a "query service", either local or over the network. Browsing database content is definitely essential.

This is a very useful idea; in fact, my friend is doing a very similar project for a bank. It is very useful to integrate legacy data sources under a single query system. That said, what I'm doing is slightly different. The Calcite query system simply would not do for my project, for three reasons. First, its query system does not offer functionality beyond what you get from individual databases; it looks more like an overlap between the functionality of the data sources it supports (check what kind of query functionality Splunk provides vs. Calcite). Second, pick a source file on the Calcite GitHub and search for "new " operator usage: there are far too many for what I'm building. Third, the name sounds strange (https://en.wikipedia.org/wiki/Calcite): what does it have to do with either querying or integration? ;)