33.
However... • THE PROBLEM: THE REAL WORLD • The schema changes once a week. • Real data lacks most columns. • This is especially true when building vertical search over many sites (each has its own schema). • High availability is required in some cases.

38.
Pluggable Storage Strategy • Important: we want to focus on developing application servers • We're a search engine company, not a database company • DocumentRepository and DistributedFileSystem are pluggable! • Many, many NoSQL stores are emerging • Provide a simple interface on top of them • You can select the underlying storage technology according to the requirements of the system itself • by document volume, availability, consistency, etc.
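
The "simple interface on top of them" idea can be sketched as follows. This is only an illustration, not the talk's actual code: the slide names `DocumentRepository` but shows no methods, so `put`, `get`, `InMemoryRepository`, and `make_repository` are all hypothetical names chosen here.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional


class DocumentRepository(ABC):
    """Hypothetical minimal interface the application servers code against."""

    @abstractmethod
    def put(self, doc_id: str, document: dict) -> None:
        """Store a document under doc_id, replacing any previous version."""

    @abstractmethod
    def get(self, doc_id: str) -> Optional[dict]:
        """Return the stored document, or None if doc_id is unknown."""


class InMemoryRepository(DocumentRepository):
    """Stand-in backend; a real one would wrap a NoSQL store's client."""

    def __init__(self) -> None:
        self._docs: Dict[str, dict] = {}

    def put(self, doc_id: str, document: dict) -> None:
        # Copy so later changes by the caller do not leak into the store.
        self._docs[doc_id] = dict(document)

    def get(self, doc_id: str) -> Optional[dict]:
        doc = self._docs.get(doc_id)
        return dict(doc) if doc is not None else None


def make_repository(backend: str) -> DocumentRepository:
    """Pick a backend by name, e.g. from a deployment's configuration."""
    registry = {"memory": InMemoryRepository}
    if backend not in registry:
        raise ValueError(f"unknown storage backend: {backend!r}")
    return registry[backend]()
```

Because callers only see `DocumentRepository`, swapping the store for one with different volume, availability, or consistency trade-offs would mean registering another subclass rather than touching the application servers.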

44.
GridFS • MongoDB as blob storage • The content is split into 256 KB chunks, plus some metadata. • Performance is not as high as HDFS, but it is still useful in mid-scale deployments. [Diagram: Large Blob → Metadata, Chunk0, Chunk1]
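
The chunking scheme described on this slide can be sketched in plain Python. This is a simplified model of the idea, not GridFS itself: it does not talk to MongoDB, and the record field names (`filename`, `length`, `chunk_size`, `n`, `data`) and the helper names `split_blob` and `join_blob` are assumptions made for illustration.

```python
from typing import List, Tuple

CHUNK_SIZE = 256 * 1024  # 256 KB, the chunk size quoted on the slide


def split_blob(name: str, data: bytes,
               chunk_size: int = CHUNK_SIZE) -> Tuple[dict, List[dict]]:
    """Split a blob into one metadata record plus numbered chunk records."""
    metadata = {"filename": name, "length": len(data), "chunk_size": chunk_size}
    chunks = [
        {"filename": name, "n": index, "data": data[offset:offset + chunk_size]}
        for index, offset in enumerate(range(0, len(data), chunk_size))
    ]
    return metadata, chunks


def join_blob(metadata: dict, chunks: List[dict]) -> bytes:
    """Reassemble a blob from its chunks, in chunk-number order."""
    ordered = sorted(chunks, key=lambda chunk: chunk["n"])
    data = b"".join(chunk["data"] for chunk in ordered)
    if len(data) != metadata["length"]:
        raise ValueError("chunk data does not match the recorded length")
    return data


if __name__ == "__main__":
    blob = bytes(600 * 1024)  # a 600 KB blob of zero bytes
    meta, parts = split_blob("example.bin", blob)
    print(len(parts), [len(part["data"]) for part in parts])
    assert join_blob(meta, parts) == blob
```

Keeping the length in the metadata record lets a reader detect a missing or truncated chunk before handing the blob back to the caller.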

49.
How Long? • Prototype version in one week • using the C++ client API • about 500 lines • Production release in about 2 months • including bugfixes • The mongo-user ML is really responsive • Eliot Horowitz merged my patch as quickly as possible • The product itself is much more stable than I expected (sorry)