Darnit, acquired by Apple and shut down!? I didn't even know! Thanks for the link. His critique is actually the first I've seen on their tech. He speculates quite a bit despite them giving plenty of detail and claiming to use many of the same choices as Google's F1. If they failed, I'd love to know in detail how to improve on their successes. Some of his gripes have obvious work-arounds, and the others probably do too.

That said, I think props still go to Google's teams on GFS, Spanner, and F1 as the best techs out there. An F1 variant, which two teams are trying, is the best approach, given that it's proven and has much published detail.

I'm just a little sad that it's all MySQL again, now that we've bet on PostgreSQL and aren't really looking back.

On the performance side I'm just a little worried, however. While Go is an excellent choice for performant and concurrent app-level code, I'm not too bullish on the language at the database level. InfluxDB sucked hard when we put it under some more load, but let's see what comes out of it.

Can you detail, or point me to a blog post which details, this stuff? As someone who is about to invest a lot of time in researching use of InfluxDB at scale, I'm interested in documentation of the performance problems, and even more so in whether the performance is troubling in the long term due to the architecture or the team. Anything you can point me to will save me a lot of time.

We chose the MySQL protocol first because it's widely used and we're more familiar with its tool chain and protocol than with PostgreSQL's.
As for performance: Go was designed at Google for building distributed systems, and its development productivity is excellent. InfluxDB's performance problems are partly caused by other factors.

Wouldn't the classic sharding approach at that point simply be to use a second instance? I mean, it'd be great if they handled that for you but I'd tend to assume most apps would have hit the wall on the approach of shoving everything into a single database long before maxing out a 32-core/244GB system with 64TB on SSD.

I'm assuming [meme] is shorthand for “overly-broad assertion”? It's nice if your database has horizontal scaling built in, but it's not like we don't have an entire generation of successful companies who had application-level sharding logic, either by necessity or because they found the control it offered was valuable compared to the built-in generic logic.
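For concreteness, application-level sharding can be as simple as hashing a routing key to pick a database instance. Here's a minimal sketch in Go; the shard DSNs and the choice of FNV hashing are made up for illustration:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// shardDSNs lists connection strings for the per-shard database
// instances. These names are hypothetical.
var shardDSNs = []string{
	"app:secret@tcp(db-shard-0:3306)/app",
	"app:secret@tcp(db-shard-1:3306)/app",
}

// shardFor hashes a routing key (e.g. a customer ID) and maps it to
// one shard DSN, so all queries for that key hit the same instance.
func shardFor(key string) string {
	h := fnv.New32a()
	h.Write([]byte(key))
	return shardDSNs[h.Sum32()%uint32(len(shardDSNs))]
}

func main() {
	fmt.Println(shardFor("customer-42"))
}
```

The catch, of course, is everything this sketch omits: resharding when you add instances, cross-shard queries, and cross-shard transactions — which is exactly what the built-in approaches try to handle for you.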

> While it's true that most companies will be fine in 256GB ram, we were talking about sharding, which it doesn't have.

You still haven't supported the assertion that it's common for places to have massive, heavily-queried databases like this which would not be better split following natural application-level boundaries. This is particularly relevant when discussing AWS as some of the common reasons for keeping everything in one database are sweet-spots for other services (e.g. migrating data for long-term historical reporting over to RedShift).

Again, I'm not questioning that integrated sharding would have its uses – only your sweeping assertion that this is a likely problem for most people and that it's a dead-end (“you're stuck”) rather than merely one of many growing pains which you'll deal with on a successful product. In particular, it's unlikely that everyone will have the same right answer at that scale since access patterns vary widely.

I'm curious to know the motivation behind publicizing the project at this stage in development, as it seems like the key feature (distributed transactional storage engine) is quite far away on the road map.

Are there any design documents detailing its implementation? I checked the wiki but it didn't look like there was anything there. What alternatives were considered, and why were they abandoned?

Also, is there a concrete use case for which this system is being built? If so, what are some (publicly releasable) details about the use case, e.g. access patterns, data volume, etc.?

Could one of the authors provide some more information about the goals and architecture? Do I understand correctly that the goal is to implement a relational database on top of one of a couple of different key-value stores, aiming, among other things, for drop-in compatibility with MySQL? What are the expected benefits?

EDIT: The architecture diagram wasn't visible to me before; now that it has suddenly appeared, things are much clearer.
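To make the SQL-on-KV idea concrete, here's a toy sketch of how a relational row might be flattened into a key-value pair. The `t/<table>/r/<pk>` key layout and the JSON value encoding are invented for illustration; they are not the project's actual encoding:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// encodeRow maps a relational row to a single key-value pair.
// The key "t/<table>/r/<primary key>" keeps all rows of a table in a
// contiguous key range, so a table scan becomes a KV range scan.
// This layout is hypothetical, not TiDB's real format.
func encodeRow(table string, pk int64, row map[string]any) (string, []byte) {
	key := fmt.Sprintf("t/%s/r/%d", table, pk)
	val, _ := json.Marshal(row)
	return key, val
}

func main() {
	k, v := encodeRow("users", 42, map[string]any{"name": "ada", "age": 36})
	fmt.Println(k, string(v))
}
```

The SQL layer then only needs point gets, range scans, and (for correctness under concurrency) transactions from the KV store underneath — which is why the distributed-transaction items on the roadmap matter so much.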

Please correct me if I'm wrong, but it looks like it is not distributed yet.
From the roadmap: https://github.com/pingcap/tidb/blob/master/ROADMAP.md
Distributed KV and distributed transactions have no checkmark!
So in its current state, how does it differ from using SQLite?

Cockroach dev here.
We're actively working on a SQL layer, actually.
We really need to update our GitHub README to contain info about it. It's under active development and, of course, it's inspired by F1.