BigData-Collecting Data

How you get data into a BigData system depends on the source you are collecting it from. Several options are available today.

First, let us look at the sources from which data can be collected:

Data collected by different sensors

Log files generated by web applications

Structured / Unstructured data in different external databases

Data generated by mobile devices

Data generated by social media (Facebook, Twitter, Instagram, etc.)

And so on. The list is long, and only a few sources are mentioned here.

As we can see, the list of sources is long and no single solution fits them all. Depending on the source of the data, we can use different tools to get that data into BigData servers. Here is a list of some of those options:

Flume: Can be used to collect log file data from different servers

SQL Query: Can be used to pull selected data from external databases by querying them directly

Sqoop: Can be used to bulk-transfer data from external relational databases into Hadoop

Files: Data can also be collected with basic OS file copy operations. This is a time-consuming process and has other disadvantages as well, but it can work for very small-scale operations.

REST APIs: Can be used to collect data from mobile devices, social media, etc.

Streaming: Can be used to collect real-time data from social media, server logs, credit card transactions, etc. One common way to achieve this is with Apache Kafka.
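
To make the SQL Query option a little more concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: an in-memory SQLite database stands in for the external database, the orders table and its rows are made up, and export_query_to_csv is a hypothetical helper name, not part of any tool listed above. The idea is simply to run a query against the source and write the result to a CSV file that could then be copied to the BigData servers.

```python
import csv
import io
import sqlite3


def export_query_to_csv(conn, query, out):
    """Run `query` on `conn`, write the result to `out` as CSV, return the row count."""
    cursor = conn.execute(query)
    writer = csv.writer(out)
    # The header row comes from the cursor's column metadata.
    writer.writerow([col[0] for col in cursor.description])
    count = 0
    for row in cursor:
        writer.writerow(row)
        count += 1
    return count


# Assumption: an in-memory SQLite table stands in for the external database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "alice", 25.0), (2, "bob", 40.5), (3, "carol", 12.25)],
)

# Assumption: a StringIO buffer stands in for a file in a landing directory.
buffer = io.StringIO()
rows_exported = export_query_to_csv(
    conn, "SELECT id, customer, amount FROM orders ORDER BY id", buffer
)
print(f"exported {rows_exported} rows")
print(buffer.getvalue())
```

In a real setup the connection would point at the actual source database through its own driver, and the output would be a real file rather than an in-memory buffer. For large tables, a bulk-transfer tool such as Sqoop is usually a better fit than a hand-written export like this one.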