Book Description

Cascading is open source software for creating and executing complex data processing workflows on big data clusters. The book starts by explaining how Cascading relates to core big data technologies such as Hadoop MapReduce. With that foundation in place, the book provides a comprehensive introduction to the Cascading paradigm and its components using code examples. You will not only learn the more advanced Cascading features but also write code that uses them. Furthermore, you will gain in-depth knowledge of how to optimize a Cascading application. To deepen your knowledge and experience with Cascading, you will work through a real-life case study that uses Natural Language Processing to perform text analysis and search on large volumes of unstructured text. Throughout the book, you will receive expert advice on how to use the portions of the product that are undocumented or have limited documentation. By the end of the book, you will be able to build practical Cascading applications.

Discover the future of big data frameworks and understand how Cascading can help your software to evolve with it

Uncover sources of additional information and other tools that can make development tasks a lot easier

Downloading the example code for this book

You can download the example code files for all Packt books you have purchased from your account at http://www.PacktPub.com. If you purchased this book elsewhere, you can visit http://www.PacktPub.com/support and register to have the files e-mailed directly to you.