Data Analytics

08-741 Very Large Information Systems

This course studies the theory, design, and implementation of text-based information systems. The IR core components of the course include important retrieval models (Boolean, vector space, probabilistic, inference net, language modeling), clustering algorithms, automatic text categorization, and experimental evaluation. The course covers a variety of current research topics, including cross-lingual retrieval, document summarization, machine learning, and topic detection and tracking.

Prerequisites: None
Units: 12
Schedule: Fall semester

11-441 Search Engines and Web Mining

This course provides a comprehensive introduction to the theory and implementation of algorithms for organizing and searching large text collections. The first half of the course examines text search engines for enterprise and web environments; the open-source Indri search engine is used as a working example. The second half of the course explores text mining techniques such as recommender systems, clustering, and categorization. Programming assignments provide hands-on experience with document ranking, evaluation, classification into browsing hierarchies, and related topics.

Prerequisites: Programming and data-structures proficiency at the CMU 15-211 course level or higher. An understanding of algorithms comparable to the CMU 15-451 course level or higher. An understanding of basic linear algebra, comparable to the CMU 21-241/21-341 course level. An understanding of basic statistics, comparable to the CMU 36-202 course level or higher.
Units: 12
Schedule: Fall semester

11-741 Information Retrieval

This course studies the theory, design, and implementation of text-based information systems. The IR core components of the course include statistical characteristics of text, representation of information needs and documents, several important retrieval models (Boolean, vector space, probabilistic, inference net, language modeling), clustering algorithms, automatic text categorization, and experimental evaluation. The software architecture components include the design and implementation of high-capacity text retrieval and text filtering systems. The course also covers a variety of current research topics, including cross-lingual retrieval, document summarization, machine learning, topic detection and tracking, and multimedia retrieval.