Wave

@wavy

Posts made by wavy

One thing to note is that when you set up a db for auto increment and if you make it distributed - then if you have to add a node or remove a node and rebalance the data, there will ID collision since each node has auto increment.

db1-123, 456, 789
db2-123, 456, 789
Even though content is differnet (original and short url), all ids in nodes will be same.
You're doing correct routing but ID has to be unique. If you want to rebalance data, add or delete node, you get id collisions.

To fix this - you can use two (or more) db just to create ids - ID_db1 creates odd id, ID_db2 creates even number ids. (basically just return id if (id % (1 for ID_db1 or 2 for ID_db2) == 0) else auto_increment). So even if one db goes down, we still proceed and fix and resume the process.

And now this id can be used in whatever nodes you wish by directly apply consistent hash on this id. no collision problem.

Data model looks okay but with one correction
Comment can have nested comment so to identify them you have parent_comment_id
Comment (id, text, post_id, user_id, parent_comment_id)
And to keep simple for vote, it can be
Vote(comment_or_post_id, count)

Another suggest i want to make is always have time_created and time_updated row whenever you are updating. it helps for various purpose - mainly analytics and timeline

Few more things to consider here.
Vote is updated very frequently, so we can not update table for every vote - very expensive to update and reads are locked. So we should make it "eventual consistent" - meaning every vote from each server can be updated into a cache (LRU), then we aggregate total and update table every few minutes, and another cache "read comment cache" which keeps total votes for a comment.

For reading vote for a comment, read from comment cache or read from db.

To load all comments for a post - load parent comments, then lazy load child comments if required.

To quickly calculate the best comment/top comment (sum of all upvotes of nested comment) - you can add another field into comment table - root_comment_id
Comment (id, text, post_id, user_id, parent_comment_id, root_comment_id)
Now you can compute total votes for all comments for a root_comment and display the top result. And then lazy load child comment if needed

SQL vs No SQL - I think no sql seems better choice since schema is simple and not require transactional, and mainly we need read focused

Yes I realize after posting that knapsack is differnt - since it is to "maximize" the value within weight bound.
here it is "exact" value. It is like coin problem - give infinite coins, check if coins add up to certain value. or Give all possible combination that add to the given value.