
I understand that MySQL and other databases can store and retrieve reasonably large amounts of data. I am also aware of open source projects such as Hadoop and MapReduce (only their purpose and what they do).

EDIT:
When do you bring Hadoop, Pig, MapReduce, etc. into your application? Should we use this software from the beginning of the project, or can it be introduced at a later stage, after the database has grown large? Any links will be appreciated.

1 Answer

A Google-style web search is the kind of problem Hadoop is suited to. Think about the characteristics of a large search engine:

Large amounts of data

Distributed data

Extreme parallelism

Scalability was mentioned in the comments: With Hadoop, it is not hard to throw additional (commodity) servers into the mix.

On to your question. If your project relies heavily on SQL and its bottlenecks are sequential (von Neumann) ones, then Hadoop makes little sense. If, however, your data is "Big Data," is less structured, and can be processed in parallel, then Hadoop will make more sense.
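
To give a feel for what "can be processed in parallel" means, here is a minimal single-process sketch of the MapReduce model in plain Python. It does not use Hadoop's API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for illustration. The point is that each record is mapped independently and each key is reduced independently, which is what lets a framework spread the work across many servers.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Group values by key, as a framework would between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single count."""
    return key, sum(values)


def word_count(records):
    # Each record could be mapped on a different machine; here it runs serially.
    mapped = chain.from_iterable(map_phase(r) for r in records)
    # Each key could likewise be reduced on a different machine.
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


if __name__ == "__main__":
    print(word_count(["big data", "Big cluster"]))
```

If your workload cannot be split into independent pieces like this (for example, each step depends on the result of the previous one), that is a hint it belongs to the first case above rather than the second.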