Labels

Saturday, July 23, 2016

Misc System Design Questions

Question 1:

Given a hard drive with one terabyte of data, arranged in 2^32 key/value pairs, where the keys and values each have lengths of 128 bytes, you need to design, build, and deploy (by yourself) a system that lets you look up the value for a given key, over the internet, at a peak rate of 5000 lookups per second. The data never changes. Let’s design that system.

Given a large network of computers, each keeping log files of visited urls, find the top ten of the most visited urls.

(i.e. have many large <string (url) -> int (visits)> maps, calculate implicitly <string (url) -> int (sum of visits among all distributed maps), and get the top ten in the combined map) The result list must be exact, and the maps are too large to transmit over the network (especially sending all of them to a central server or using MapReduce directly, is not allowed)