For the purpose of illustration, say I would like to create a decentralized version of Yelp.

A centralized approach would be to have a restaurants table and a reviews table in a MySQL database. In the case of a DApp, what is the best practice for storing non-transactional data?

My current thinking is to have a Restaurant.sol contract and a Review.sol contract, each holding a mapping from the record ID (key) to the record object (value). Every time a restaurant is added, we invoke an addRecord() method in Restaurant, which adds the new restaurant data to its mapping. (The flow for Review is similar.)

Is this approach of treating a contract like a RDBMS table robust? Or am I missing something?

What's the limit to the amount of data that can be stored in the mapping of a contract?

EDIT: from my understanding changing the mapping of the Restaurant contract would cost ether. So that means every new record that is added to the "database" costs money? What's an economically viable way to run an app without requiring users to pay?

2 Answers

First, you don't want to think of contracts as analogous to tables. Each contract can hold multiple mappings of information. You could have a single ReviewSystem.sol that has mappings for both Restaurants and Reviews, and in line with what you suggested earlier you could have addRestaurant() and addReview() methods that would add records to the mappings stored on the contract.

That said, contracts don't have very good mechanisms for normalized relational data. In a relational database, if you had a "Restaurant" record you could run a query for all reviews relating to a certain restaurant, and with the power of indexes and query planners get that information back very quickly. In Ethereum your options are more limited. If you don't have an index you'll have to scan over every review in your system to see if it relates to the restaurant in question. If you do have an index, you increase the costs of writing records to your contract. You might also denormalize the data for faster lookups, again at a cost of increased contract storage.
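The trade-off between scanning and indexing can be sketched with a small Python model of such a contract. This is only an illustration, not Solidity: the class name ReviewSystem comes from the answer above, while the field names, the use_index flag, and the storage_writes counter (a rough stand-in for gas spent on storage) are assumptions made for the example. Note also that a real Solidity mapping cannot be iterated, so an on-chain scan would need sequential IDs or a separate array of keys.

```python
from dataclasses import dataclass, field


@dataclass
class ReviewSystem:
    """Python stand-in for one contract holding several mappings."""
    use_index: bool = False
    restaurants: dict = field(default_factory=dict)   # restaurant_id -> name
    reviews: dict = field(default_factory=dict)       # review_id -> (restaurant_id, stars)
    reviews_by_restaurant: dict = field(default_factory=dict)  # optional index
    storage_writes: int = 0  # rough proxy for storage gas, not a real gas figure

    def add_restaurant(self, restaurant_id, name):
        self.restaurants[restaurant_id] = name
        self.storage_writes += 1

    def add_review(self, review_id, restaurant_id, stars):
        if restaurant_id not in self.restaurants:
            raise KeyError(f"unknown restaurant {restaurant_id}")
        self.reviews[review_id] = (restaurant_id, stars)
        self.storage_writes += 1
        if self.use_index:
            # Maintaining the index costs one extra write per review.
            self.reviews_by_restaurant.setdefault(restaurant_id, []).append(review_id)
            self.storage_writes += 1

    def reviews_for(self, restaurant_id):
        if self.use_index:
            # Direct lookup through the index.
            return list(self.reviews_by_restaurant.get(restaurant_id, []))
        # No index: every stored review has to be inspected.
        return [rid for rid, (rest, _) in self.reviews.items() if rest == restaurant_id]


plain = ReviewSystem(use_index=False)
indexed = ReviewSystem(use_index=True)
for system in (plain, indexed):
    system.add_restaurant(1, "Noodle Bar")
    system.add_restaurant(2, "Taqueria")
    system.add_review(10, 1, 5)
    system.add_review(11, 2, 3)
    system.add_review(12, 1, 4)

print(plain.reviews_for(1), plain.storage_writes)
print(indexed.reviews_for(1), indexed.storage_writes)
```

Both variants return the same review IDs; the indexed one answers the lookup without a scan but performs more writes, which is where the extra cost comes from.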

One option is to keep some of your information off-chain. For example you might keep your restaurants and reviews on your contract with a pointer from Review to Restaurant. Separately you might have a traditional database that indexes the relationship from Reviews to Restaurants. When someone adds a review via your DApp the contract stores the canonical information, and the separate database keeps a copy of the information. When someone wants to look up the reviews relating to a certain restaurant, they might query an API backed by the database which can quickly return the IDs for the reviews, and they can get the reviews themselves from the blockchain. In this case, all of the critical information is available on-chain, and the off-chain index can be reconstructed from the contract state. To learn about tracking contract events off chain, read about Events and Logs.
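As a rough sketch of why the off-chain index is disposable, here is how it could be rebuilt from a replay of the contract's event log. The event names (RestaurantAdded, ReviewAdded) and their fields are hypothetical, and the events are plain dictionaries rather than anything fetched from a real node.

```python
from collections import defaultdict


def build_index(events):
    """Rebuild restaurant_id -> [review_id, ...] from a list of logged events."""
    index = defaultdict(list)
    for event in events:
        # Only review events matter for this index; everything else is skipped.
        if event["name"] == "ReviewAdded":
            index[event["restaurant_id"]].append(event["review_id"])
    return dict(index)


# Hypothetical log, in the order the contract would have emitted it.
events = [
    {"name": "RestaurantAdded", "restaurant_id": 1},
    {"name": "RestaurantAdded", "restaurant_id": 2},
    {"name": "ReviewAdded", "review_id": 10, "restaurant_id": 1},
    {"name": "ReviewAdded", "review_id": 11, "restaurant_id": 2},
    {"name": "ReviewAdded", "review_id": 12, "restaurant_id": 1},
]

index = build_index(events)
print(index[1])
```

If the database is lost or distrusted, anyone can replay the log and arrive at the same index, so the database never becomes the source of truth.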

As you noted in your edit, every piece of information you save to a contract costs gas, and that gas in turn costs Ether. Running a service like Yelp with hundreds of thousands of businesses and thousand-word reviews could prove very expensive. You could mitigate some of the costs by storing more information off-chain. Perhaps on-chain a review consists of a star rating and the hash of the written review, and the review text is then retrieved from an off-chain source such as the database we discussed earlier, or perhaps another decentralized system like IPFS.
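A minimal sketch of the "star rating plus hash" idea, using two dictionaries to stand in for contract storage and for the off-chain store. The function names are invented for the example, and SHA-256 is an assumption: a real contract would more likely use keccak256, which is not in the Python standard library.

```python
import hashlib

on_chain = {}   # review_id -> (stars, 32-byte digest); stands in for contract storage
off_chain = {}  # review_id -> full review text; stands in for a database or IPFS


def digest(text):
    # SHA-256 is assumed here purely for illustration.
    return hashlib.sha256(text.encode("utf-8")).digest()


def add_review(review_id, stars, text):
    # Only the rating and a fixed-size digest go "on-chain".
    on_chain[review_id] = (stars, digest(text))
    off_chain[review_id] = text


def fetch_verified(review_id):
    stars, expected = on_chain[review_id]
    text = off_chain[review_id]
    # Re-hash what the off-chain source returned and compare with the stored digest.
    if digest(text) != expected:
        raise ValueError("off-chain review text does not match on-chain hash")
    return stars, text


add_review(1, 5, "Great noodles, slow service.")
print(fetch_verified(1))
```

The on-chain footprint stays the same size however long the review is, and a reader can still detect whether the off-chain copy was altered.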

In general though, I see the Ethereum blockchain as the place where you store information that you don't want to trust third parties to manage. Things like ERC20 tokens, ENS, and distributed exchanges are great use cases because it would otherwise be difficult to establish trust in a centralized entity. While it would be neat to have something like a decentralized review service, the cost of contract storage combined with the relatively low risk of trusting a third party to manage that information makes it a less attractive use case for the blockchain.

"Is this approach of treating a contract like a RDBMS table robust? Or am I missing something?"

Yes, this solution is robust, assuming you mean reliable or "will it work every time". In fact robustness is one of the most attractive traits of writing contracts that run on the Ethereum blockchain: you have more redundancy than anyone would ever need. There are 25,5xx nodes [1] right now. Each one of those nodes stores a copy of your data, your code, and executes each line of code in your contract.

"What's the limit to the amount of data that can be stored in the mapping of a contract?"

This question is answered here by an authoritative figure in the Ethereum community, so I will refrain from addressing it except to say that you either probably don't have that much data or certainly couldn't afford to store as much as the blockchain could "take". That said, since each node stores a copy of the blockchain's data, if you could afford it you would burden every node and possibly reduce the number of nodes, because the sheer size of the data would make running a node more expensive.

"From my understanding changing the mapping of the Restaurant contract would cost ether. So that means every new record that is added to the "database" costs money?"

Yes, changing the mapping changes the state of the blockchain, so it will cost gas.

"What's an economically viable way to run an app without requiring users to pay?"

You have a few options:

~~Charge your users gas~~

Compensate your users with additional units of your native app token, if you have a flexible supply of your token

At the org I contribute to, we store the data itself off-chain and store just its IPFS hash, which serves as a cryptographic proof of the data, on-chain. IPFS hashes are unfortunately longer than the 32-byte EVM word (see this highly informative answer about storing IPFS hashes on chain in a struct).
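To make the size problem concrete: a base58 IPFS hash of the "Qm..." kind decodes to a 34-byte multihash, that is, one byte for the hash function code, one byte for the digest length, and then the 32-byte digest. One common workaround is to store those three parts as separate fields. The sketch below shows the split in Python; the helper names are invented for the example, and the sample value is just a well-formed multihash built from SHA-256 of arbitrary bytes, not the identifier IPFS would assign to that content.

```python
import hashlib

ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def b58encode(data):
    n = int.from_bytes(data, "big")
    out = ""
    while n:
        n, rem = divmod(n, 58)
        out = ALPHABET[rem] + out
    # Each leading zero byte is written as a leading '1'.
    pad = len(data) - len(data.lstrip(b"\x00"))
    return "1" * pad + out


def b58decode(text):
    n = 0
    for ch in text:
        n = n * 58 + ALPHABET.index(ch)
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    pad = len(text) - len(text.lstrip("1"))
    return b"\x00" * pad + body


def split_multihash(encoded):
    """Return (hash_function_code, digest_size, digest) from a base58 multihash."""
    raw = b58decode(encoded)
    hash_function, size, body = raw[0], raw[1], raw[2:]
    if len(body) != size:
        raise ValueError("digest length does not match the size byte")
    return hash_function, size, body


# Sample multihash: 0x12 = sha2-256, 0x20 = 32-byte digest (illustrative input only).
sample = b58encode(bytes([0x12, 0x20]) + hashlib.sha256(b"example review").digest())
hash_function, size, body = split_multihash(sample)
print(sample, hash_function, size, len(body))
```

The 32-byte digest fits in a single EVM word, and the two one-byte fields can sit alongside it, so nothing is lost compared with storing the full string.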

What's interesting about this solution is that users can later independently verify that the data the hash points to is what was actually committed to on-chain, by running the same hash function on the returned data and comparing the result with the stored hash.

Ethereum/Dapp Data Storage Takeaways

Storing data on-chain is very expensive relative to centralized solutions.

Storing 1 kB costs 640,000 gas

640,000 gas costs $0.08 to $0.90 at current ether prices, depending on how quickly you would like your transaction to be mined/confirmed

If the price of ether continues to rise, the price of storing your data will rise with it, unless the gas price that miners are willing to accept falls commensurately.

Every change to the blockchain, no matter how small, costs gas (or "money" in legacy parlance)
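The arithmetic behind these figures can be written out as follows. The 20,000 gas per 32-byte word is the rate implied by the 640,000 gas figure above (1 kB taken as 1,024 bytes); the gas price and ether price passed in at the end are placeholders to be replaced with current values, not quotes.

```python
GAS_PER_WORD = 20_000  # gas to write one new 32-byte storage word (rate implied above)
WORD_BYTES = 32


def storage_gas(num_bytes):
    """Gas to store num_bytes, rounded up to whole 32-byte words."""
    words = -(-num_bytes // WORD_BYTES)  # ceiling division
    return words * GAS_PER_WORD


def cost_usd(gas, gas_price_gwei, eth_price_usd):
    """Convert gas to dollars: gas * gas price (in ether) * ether price."""
    return gas * gas_price_gwei * 1e-9 * eth_price_usd


gas = storage_gas(1024)
print(gas)  # 640000

# Placeholder prices, chosen only to show how the formula is used.
print(cost_usd(gas, gas_price_gwei=1, eth_price_usd=300))
```

Because the dollar cost is a product of three numbers, a higher ether price raises it only if the gas price does not drop by the same proportion.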

I will share more about how we have approached storing our data in a near-fully decentralized application at a later date. And thank you for teaching me what I learned from writing this answer (for example, that constant on functions is now an alias for view).