Strategy Template

Post-study Summary

Question 1: How many unique identifiers are possible? Will you run out of unique URLs?

62^6 ≈ 57 billion (6 characters drawn from [0-9a-zA-Z]), so we will not run out for a very long time.
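
The arithmetic can be sanity-checked with a short base-62 encoder. The 6-character length and the [0-9a-zA-Z] alphabet are assumptions implied by the 62^6 figure:

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def encode62(num: int, length: int = 6) -> str:
    """Convert a non-negative integer to a fixed-length base-62 string."""
    chars = []
    for _ in range(length):
        num, rem = divmod(num, 62)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))

print(62 ** 6)        # 56800235584, roughly 57 billion
print(encode62(125))  # 000021
```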

Question 2: Should the identifier be auto-incrementing or not? Which is easier to design? Pros and cons?

No. An auto-increment id is easier to design, but it becomes a problem as soon as we need more than one MySQL machine.

Issues: running out of cache, more and more write requests, more and more cache misses.

Solutions:

Database Partitioning

1. Vertical sharding. 2. Horizontal sharding.

The best way is horizontal sharding.

The current table structure is (id, long_url). So, which column should be the sharding key?

An easy way is id modulo sharding.
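A minimal sketch of id-modulo sharding, assuming a cluster of 4 shards (the count is illustrative only). Note the drawback: changing the shard count re-homes most ids, which is why the discussion moves on to other schemes.

```python
# id-modulo sharding: route each row to one of N shards by its id.
# N_SHARDS = 4 is an assumed cluster size for illustration.
N_SHARDS = 4

def shard_for(row_id: int) -> int:
    """Return the shard that owns this id."""
    return row_id % N_SHARDS

print(shard_for(1234567))  # 3
```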

Here comes another question: How could multiple machines share a global auto_increment_id?

Two ways: 1. Use a dedicated machine to maintain the id. 2. Use ZooKeeper. Both are poor options: each adds a single point of failure and extra latency on every write.

So, we do not use global auto_increment_id.

Solution 1: Put the sharding key as the first character of the short_url.
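
A sketch of Solution 1: each shard keeps its own local counter, and the first character of the short key records which shard generated it, so decode requests can be routed from the key alone. All names here are illustrative assumptions.

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def make_short_key(shard_id: int, local_key: str) -> str:
    """Prefix the key with its shard's character (supports up to 62 shards)."""
    return ALPHABET[shard_id] + local_key

def shard_of(short_key: str) -> int:
    """Route a decode request using only the key itself, no global lookup."""
    return ALPHABET.index(short_key[0])

print(shard_of(make_short_key(3, "a9Xk2")))  # 3
```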

Solution 2: Use consistent hashing to break the ring into 62 pieces. The exact number does not matter (it could be 360 or anything else); there will probably never be more than 62 machines anyway. Each machine is responsible for the keys in its portion of the ring.
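
A sketch of Solution 2, assuming 4 machines splitting the 62 slots into contiguous ranges (machine names and boundaries are invented for illustration). Adding a machine only splits one range instead of re-homing most keys:

```python
import bisect

ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

# Each machine owns a contiguous range of the 62 slots, up to and
# including its boundary value.
BOUNDARIES = [15, 31, 47, 61]
MACHINES = ["db-0", "db-1", "db-2", "db-3"]

def machine_for(short_key: str) -> str:
    """Map a short key to the machine owning its slot on the ring."""
    slot = ALPHABET.index(short_key[0])
    return MACHINES[bisect.bisect_left(BOUNDARIES, slot)]

print(machine_for("a9Xk2"))  # db-0  ('a' is slot 10)
```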

Question 7: Estimate the maximum number of queries per second (QPS) for decoding a shortened URL on a single machine.

Let’s assume that:

Daily users: 1,000,000

Daily usage per person: encode (write): 0.1, decode (read): 1.

Daily requests: write 0.1M, read 1M.

QPS: write 100,000 / (24 × 60 × 60 s) ≈ 1.2, read 1,000,000 / (24 × 60 × 60 s) ≈ 12.

Peak QPS: write 1.2 × 5 = 6, read 12 × 5 = 60.
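
The estimate above as a back-of-the-envelope check. The peak factor of 5 is the assumption already used in the text:

```python
SECONDS_PER_DAY = 24 * 60 * 60          # 86,400

daily_writes = 100_000                  # 1M users × 0.1 encode/day
daily_reads = 1_000_000                 # 1M users × 1 decode/day
PEAK_FACTOR = 5                         # assumed peak-to-average ratio

write_qps = round(daily_writes / SECONDS_PER_DAY, 1)   # 1.2
read_qps = round(daily_reads / SECONDS_PER_DAY)        # 12

print(write_qps, read_qps)                              # 1.2 12
print(write_qps * PEAK_FACTOR, read_qps * PEAK_FACTOR)  # 6.0 60
```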

Question 8: How would you scale the service? For example, a viral link shared on social media could spike the QPS at a moment's notice.

See Question 5.

Question 9: How would you handle redundancy? I.e., if a server goes down, how do you keep the service operational?

Question 10: Keep URLs forever or prune them? Pros and cons? How would we do the pruning?

Daily new URLs: 0.1M. Assuming storage for 100M URLs, it takes 100M / 0.1M = 1,000 days to fill up. Keeping URLs forever is clearly preferable, and storage is cheap. If we do run out of storage, we can delete the inactive URLs.

Solution 1: Prune URLs whose last-accessed timestamp is older than 300 days. Querying this in a database with 100 million rows should be quick, especially with an index on the timestamp column.
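
A runnable sketch of Solution 1 as a pruning job, using sqlite3 as a stand-in for MySQL. The table and column names are assumptions; the original schema only mentions (id, long_url):

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE urls (id INTEGER, long_url TEXT, last_accessed REAL)")

now = time.time()
conn.execute("INSERT INTO urls VALUES (1, 'http://a', ?)", (now,))
conn.execute("INSERT INTO urls VALUES (2, 'http://b', ?)", (now - 400 * 86400,))

# Delete URLs not accessed within the last 300 days.
cutoff = now - 300 * 86400
conn.execute("DELETE FROM urls WHERE last_accessed < ?", (cutoff,))

remaining = conn.execute("SELECT id FROM urls").fetchall()
print(remaining)  # [(1,)] -- only the recently accessed row survives
```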

Solution 2: Use an in-memory database such as Redis or Memcached to store the URLs; inactive URLs get pruned automatically by the LRU eviction mechanism. But this is unrealistic in the real world, because memory is far more expensive than disk storage.
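
A minimal sketch of the LRU idea behind Solution 2 (Redis provides this natively via its LRU eviction policies; the tiny capacity here is only for demonstration):

```python
from __future__ import annotations

from collections import OrderedDict

class LRUStore:
    """In-memory URL store that evicts the least-recently-used entry."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data: OrderedDict[str, str] = OrderedDict()

    def get(self, short_key: str) -> str | None:
        if short_key not in self.data:
            return None
        self.data.move_to_end(short_key)   # mark as recently used
        return self.data[short_key]

    def put(self, short_key: str, long_url: str) -> None:
        if short_key in self.data:
            self.data.move_to_end(short_key)
        self.data[short_key] = long_url
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict least-recently-used
```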

Question 11: What API would you provide to a third-party developer?

A stats API for the URL service: get a URL's status, such as total hits, referrals, and last-accessed time.
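
A hypothetical shape for such an endpoint's response. The route and every field name are assumptions suggested by the answer above, not from any real API:

```python
import json

def url_stats_response(short_key: str, total_hits: int,
                       referrers: dict, last_accessed: str) -> str:
    """GET /api/v1/urls/<short_key>/stats (hypothetical route)."""
    return json.dumps({
        "short_key": short_key,
        "total_hits": total_hits,
        "referrers": referrers,         # e.g. {"twitter.com": 4120}
        "last_accessed": last_accessed, # ISO-8601 timestamp
    })

print(url_stats_response("3a9Xk2", 5000, {"twitter.com": 4120},
                         "2024-01-01T12:00:00Z"))
```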

Question 12: If you can enable caching, what would you cache and what's the expiry time?

We can use an in-memory database such as Redis or Memcached as a cache layer. By the 80/20 rule, roughly 80% of the total hits come from 20% of the URLs. The expiry time could be short (e.g., 10 minutes), depending on the QPS.
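
A minimal sketch of that cache layer: decode results are cached with a short TTL so hot (viral) URLs are served from memory. The 600-second TTL mirrors the "10 minutes" suggestion above, and `db_lookup` is a stand-in for the real datastore:

```python
import time

CACHE: dict[str, tuple[str, float]] = {}  # short_key -> (long_url, expires_at)
TTL_SECONDS = 600                         # assumed 10-minute expiry

def decode(short_key: str, db_lookup) -> str:
    """Resolve a short key, serving from cache while the entry is fresh."""
    entry = CACHE.get(short_key)
    if entry is not None:
        long_url, expires_at = entry
        if time.monotonic() < expires_at:
            return long_url                # cache hit
    long_url = db_lookup(short_key)        # cache miss: go to the database
    CACHE[short_key] = (long_url, time.monotonic() + TTL_SECONDS)
    return long_url
```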