Also, I have been mulling over DB performance. If we want to go the relational route and enjoy very good speeds, we could write the JSON data into the DB in multiple chunks. Thus, for Herodotus, we could have tables with the text pre-chunked into its various parts, e.g. herodotus_histories_work, herodotus_histories_book, herodotus_histories_book_chapter, and herodotus_histories_book_chapter_section. The API would route each request to the appropriate query, for maximum speed. Not for this first or second or even perhaps third iteration, but it could solve speed issues if Mongo won't scale
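The routing idea above can be sketched in a few lines of Python. This is purely illustrative: the function name, signature, and citation format are invented here, not part of any existing API; only the table-naming scheme comes from the message above.

```python
# Hypothetical sketch: map the depth of a requested citation
# (work / book / chapter / section) to the matching pre-chunked table.
LEVELS = ["book", "chapter", "section"]

def route_table(author: str, work: str, citation: list) -> str:
    """Return the pre-chunked table for a citation like
    ['1', '2', '3'] (book 1, chapter 2, section 3).
    An empty citation means the whole work."""
    depth = len(citation)
    if depth > len(LEVELS):
        raise ValueError("citation deeper than available chunk levels")
    suffix = "work" if depth == 0 else "_".join(LEVELS[:depth])
    return f"{author}_{work}_{suffix}"
```

For example, `route_table("herodotus", "histories", ["1", "2", "3"])` yields `"herodotus_histories_book_chapter_section"`, matching the naming scheme above.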

Luke Hollis

@lukehollis

Yeah, that sounds like a really good idea

I was thinking about Postgres maybe replacing Mongo, due to its native JSON datatype?

Seems like building with postgres might be a little bit of having our cake and eating it too?

Kyle P. Johnson

@kylepjohnson

I have not looked into storing JSON with Postgres, but this is compelling.

For the first iteration of the frontend, @lukehollis , do you need a db? (Such as for user, configs, etc.)

Rob Jenson

@ferthalangur

Keep in mind two things about that article: (1) it is written, and the benchmarks are run, by a company that makes its money converting people to their flavor of PostgreSQL;


Rob Jenson

@ferthalangur

(2) Based on that article, and I tend to agree with this part, NoSQL databases came into existence and became popular in part because modeling data onto structured databases has a high learning curve to do it right, carries a lot of developer overhead to do it well, and often requires the cooperation of other professionals (e.g., DBAs) to make it happen. This interfered with the goals of agile software development (little a and big A), so NoSQL became very sexy.

We don't use PostgreSQL at CHS because we are comfortable with MySQL. Each has its advantages. At the end of the day

Rob Jenson

@ferthalangur

You have to ask what you are trying to accomplish right now. It might be optimal, in terms of resources available, to build at the relatively unstructured level now (i.e. use MongoDB) with an eye towards an optimizing filter later that can generate SQL database schemas and procedures for storing, indexing and retrieving the data from a more efficient database engine (but don't put all your eggs in the Postgres basket ... Abstract away from what the backend database engine is).
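Rob's "abstract away from the backend engine" suggestion could look something like the following sketch. The class and method names here are invented for illustration; the point is only that callers code against an interface, so Mongo or Postgres can be swapped in behind it later.

```python
# Minimal sketch of a storage abstraction (names are hypothetical).
from abc import ABC, abstractmethod

class TextStore(ABC):
    """Interface the API codes against; the engine behind it can change."""

    @abstractmethod
    def get_chunk(self, work: str, citation: tuple) -> str:
        """Return the text chunk for a citation, e.g. ('1', '2')."""

class InMemoryStore(TextStore):
    """Stand-in backend; a MongoStore or PostgresStore would implement
    the same interface, so callers never see the engine."""

    def __init__(self, data):
        self._data = data

    def get_chunk(self, work, citation):
        return self._data[(work, tuple(citation))]
```

A `MongoStore` and a later `PostgresStore` would each subclass `TextStore`, and the API layer would depend only on `get_chunk`.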

Rob Jenson

@ferthalangur

The other thing to consider, if your JSON can handle it, is using JSONB in a PostgreSQL database. You get much more efficient retrieval.
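For a rough idea of what that looks like, here is a hedged SQL sketch. The table and column names are made up for illustration; the `jsonb` type, the GIN index, and the `@>` / `->>` operators are standard PostgreSQL (9.4+) features.

```sql
-- Hypothetical schema; names are illustrative.
CREATE TABLE texts (
    id  serial PRIMARY KEY,
    doc jsonb NOT NULL
);

-- A GIN index makes containment (@>) queries on jsonb fast.
CREATE INDEX texts_doc_idx ON texts USING GIN (doc);

-- Fetch the text field of a matching document.
SELECT doc ->> 'text'
FROM texts
WHERE doc @> '{"author": "herodotus", "work": "histories"}';
```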

Rob Jenson

@ferthalangur

Ahhhh ... they have added JSON support to MySQL 5.7, which will take a while to get real production acceptance, but it is an option.

Conclusion: One of the big reasons that people are interested in JSON support in databases is that they want to use fewer types of databases. The modern technology stack is beginning to introduce significant sprawl as people use different databases in particular areas, taking advantage of their strengths to gain efficiency. However, this polyglot persistence increases the technology surface area enough that it can become quite difficult to monitor, manage, develop, and operate such a diverse set of databases.

One potential answer to this problem is to continue using work horses such as MySQL and Postgres, replacing specialized JSON native databases with the new JSON functionality. For the record, MongoDB is not the only JSON native database. There are many databases, including RethinkDB, that deal with JSON natively.

A big difference between the JSON support in MySQL, Postgres, and MongoDB is that in MongoDB, this is the native transport across the wire as well. JSON is native end to end in this database, whereas in other databases it is typically shoehorned into row and column storage in the client and on the wire, as well as the programming API.

Still, keeping technology diversity down can be a big enough reason to continue using the reliable and trusted databases of yore.

but now that the API supports querying individual chunks, it could go either way

I guess conceptually it helps me to think about a portion of the API as existing for the sole purpose of feeding data to the frontend

Luke Hollis

@lukehollis

as something coming from a mongo database

but for our deadline by the 13th, I'll try to make everything work just from the files first

Kyle P. Johnson

@kylepjohnson

@lukehollis and @ferthalangur -- really interesting conversation for me to follow. I am fairly agnostic about backend … if optimization is a chief goal, from early on, then the Postgres-JSON way is intriguing.

In my past experience writing an API, the chief struggle is hammering out the ugly details surrounding the URL + data objects. If we can do this well, I think we'll set up for success regardless

Luke Hollis

@lukehollis

yeah! I think that's the truth!

Luke Hollis

@lukehollis

Any thoughts on where the converter for Scansion information to HTML might go?

Hey Luke! Thoughts about scansion info: How long will this take to run on the entire corpus? If fast, then we can do it dynamically each time the corpus is loaded. If slow, then I am good with adding it to its respective Perseus dir. (Note: the cleaned-up JSON that I am adding to the cltk_api repo will soon be moved into either the Greek or Latin Perseus dir. But don't worry, I will take care of this.)

That's a good question about chunking. For book-line, this is easy: each line will always be wrapped in a <p> tag. For book-chapter-section and book-chapter, I believe we will be following Perseus's website and wrapping chapters in <p> tags. In addition to this, I retained newline markup \n when it occurs within a chapter (i.e., this can happen for a very long chapter that some editor decided to break in two to improve reading).
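The chapter-to-<p> conversion described above could be sketched like this. The function name is invented, and treating each retained \n as a paragraph break (rather than, say, a <br/>) is an assumption, not a decision from the conversation.

```python
# Hypothetical sketch: wrap a chapter's text in <p> tags, splitting on
# the editorial newlines (\n) retained in the JSON. Whether a retained
# newline becomes a new <p> or a <br/> is an open design choice; this
# sketch assumes a new <p>.
def chapter_to_html(text: str) -> str:
    parts = [p.strip() for p in text.split("\n") if p.strip()]
    return "".join(f"<p>{p}</p>" for p in parts)
```

So a chapter with no internal newline yields one <p> block, and a long, editor-split chapter yields one <p> per retained segment.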