We introduce the participant to modern distributed file systems and MapReduce, including what distinguishes good MapReduce algorithms from good algorithms in general. The rest of the course is devoted to algorithms for extracting models and information from large datasets. Participants will learn how Google's PageRank algorithm models the importance of Web pages, along with some of the many extensions that have been used for a variety of purposes. We'll cover locality-sensitive hashing, a bit of magic that allows you to find similar items in a set of items so large you cannot possibly compare each pair. When data is stored as a very large, sparse matrix, dimensionality reduction is often a good way to model the data, but standard approaches do not scale well; we'll talk about efficient approaches. Many other large-scale algorithms are covered as well, as outlined in the course syllabus.

If you’re interested in Machine Learning and Data Mining and want to learn what kinds of challenges huge datasets pose when applying standard algorithms, then you’ll find this course extremely valuable.

MOOC stands for Massive Open Online Course. These are free online courses from universities around the world (e.g., Stanford, Harvard, MIT) offered to anyone with an internet connection.

How do I register?

To register for a course, click on the "Go to Class" button on the course page. This will take you to the provider's website, where you can register for the course.

How do these MOOCs or free online courses work?

MOOCs are designed for an online audience, teaching primarily through short (5-20 min.) pre-recorded video lectures that you watch on a weekly schedule when convenient for you. They also have student discussion forums, homework/assignments, and online quizzes or exams.

24 reviews for Stanford OpenEdx's Mining Massive Datasets

This is a course with interesting content, but it is somewhat lacking in pedagogy.

The course has a lot of good content, notably from J. Ullman, but the course sessions are very long and the pedagogy is not optimal.

The course is a huge time investment, with dense content all along the 7 weeks or so. If you can get over this, it will be very rewarding, but not everyone has that kind of time available.

The course would probably be better off cut into smaller chunks or offered as a self-paced course.

Also, the fact that the course doesn't offer a verified certificate will make you think twice before investing so much time in it.

I found the course to be of medium difficulty for a post-grad student, and I would expect it to be rather hard for an undergrad.

The content is offered at two paces: the lectures of Prof. Ullman are hard to follow, as he moves quickly through many of the notions of the course and does not use enough examples or explain them in enough detail. Jure, on the other hand, uses a lot of examples and is easy to follow even for an undergrad.

Overall it is a time-consuming course; expect to need around 6-8 hours per week. In the end, you do learn quite a lot, and it is a good course to take. I am in favor of the instructors' choice of offering it as it is taught at Stanford.

Something that could help the course is to spread the content over 10 weeks instead of 7 and add mandatory programming exercises. They help a lot in learning things and remembering them for a long time.

Hchan completed this course, spending 10 hours a week on it and found the course difficulty to be hard.

Excellent course by the authors, covering the content of the book of the same name http://www.amazon.com/gp/product/1107077230. It is the MOOC version of http://cs246.stanford.edu. Many useful topics in large-scale data processing algorithms are covered, including MapReduce, PageRank, networks and graph analysis, and streaming algorithms, just to mention a few. The level is advanced undergrad or postgrad, with some chapters covering topics from research papers published within the last decade.

Pacing is faster than most other MOOCs (I estimate about 2x the workload of a typical MOOC). But the material is very useful and rewarding. Exercises are comprehensive and the forums are very useful for checking your understanding.

Aliaksandr Bely completed this course, spending 7 hours a week on it and found the course difficulty to be hard.

Very interesting course that covers a lot of topics. It is rather difficult and takes a lot of time (the lectures alone usually take around 3 hours/week, and it's hard to watch them faster than 1.25x). The only disappointment for me was the lectures taught by Prof. Ullman: it was very hard to follow his monotonous reading. The other two lecturers have strong accents but were much livelier and easier to understand.