Tag Archives: hadoop

A few days ago I was contacted by Packt Publishing, and we talked about the good traffic generated from this blog. "Hey, are you saying that people are interested in my reviews?" was my reply. 🙂

Because of this nice result, Packt Publishing sent us a gift to share with all of you, as thanks for the good feedback on fcorti.com: a special 50% discount that can be applied to one or more of the books listed below. The promotion starts today and will be valid until the 30th of September.

In this review I would like to share another interesting read about Hadoop from Packt Publishing. As you can easily guess, this time the focus is on an interesting and uncommon topic: Hadoop backup and recovery strategies.

During the past few months, many books have been published about Hadoop technology, and it is certainly one of the most popular IT buzzwords. Of course, the majority of these books are for neophytes, or for people who want to get to know the MapR solution and the whole Hadoop ecosystem better. From this point of view, this book could be classified as an introduction for solution architects and IT developers. This classification is clearly represented in the first part of the book (in particular in chapter 1 – Big data overview). Read More →

Do you really know what Apache Hadoop is?
Are you sure you understand the meaning of "big data" in real-world scenarios?
How do big data storage issues and data warehouse issues meet in a Hadoop implementation?
What are the main tools Apache Hadoop is based on?

If you don't know these topics at all (but are interested in them), or if you want a clear and complete picture of them (and probably much more), you should read this book. Read More →

The goal of the book is quite clear from the title, too: to describe in practice how Apache Hadoop and Apache Solr help organizations solve the problem of information extraction from big data. Don't you think this is a very interesting problem to face? I think so.

If you are interested in Hadoop technology, this is probably an interesting video course you should evaluate. As you probably know, Apache Hadoop is an open-source software framework for the storage and large-scale processing of data sets on clusters of commodity hardware. All the modules in Hadoop are designed with the assumption that hardware failures are common and should therefore be handled automatically in software by the framework.

Talking about the video course, we can divide the content into three main macro-sections:
1. how to create and set up a three-machine cluster using Amazon EC2,
2. how to install a Hadoop cluster using Apache Ambari,
3. how to start using the Hadoop cluster, in particular through Hue (Hadoop User Experience).
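For orientation, the three macro-sections above can be sketched as a handful of commands. This is only a rough outline under my own assumptions, not the course's exact procedure: the AMI ID, key-pair name, and instance type are hypothetical placeholders, and I assume an Amazon Linux master node with the AWS CLI available.

```shell
# 1. Provision three EC2 instances for the cluster
#    (hypothetical AMI ID and key-pair name).
aws ec2 run-instances --image-id ami-xxxxxxxx --count 3 \
    --instance-type m1.large --key-name my-hadoop-key

# 2. On the master node, install and start Apache Ambari, then drive the
#    Hadoop cluster installation from its web UI (port 8080 by default).
sudo yum install -y ambari-server
sudo ambari-server setup -s    # -s runs the setup silently with defaults
sudo ambari-server start

# 3. Once the cluster is up, Hue exposes a web UI (port 8888 by default)
#    for browsing HDFS and submitting jobs from the browser.
```

The course covers the equivalent steps interactively, with Ambari's wizard doing most of the heavy lifting in step 2.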

The description of all the topics is clear and well done (Sean Mikha, the author, did a good job). Every relevant topic is detailed first with an explanation of the logical structure and approach, and only afterwards with a demonstration of how to do it in practice.

The creation of the virtual machines on Amazon EC2 is useful for other purposes as well. The practical, step-by-step description is not limited to the creation of the servers, but also covers security and how to connect, for example using the PuTTY SSH client.

In my opinion, the most relevant value of this video course lies in the hidden details of the Hadoop cluster installation process. As you will see if you decide to follow it, the tasks are quite easy to do (probably to Sean's merit), but the configuration details and settings are very important if you want to make everything work in practice. By following these hints, I'm sure every neophyte will save days of work and many nights of googling. 😉