How can I tell if data is being ingested?

Question

Summary

This "How to" article explains how to tell whether data is being ingested. The high-level steps are listed below:

Step 1: Check ingest directory

Step 2: Check Flume logs

Step 3: Check Elasticsearch

Step 4: Check HBase

NOTE: It is only possible to confirm that a historical (or static) dataset has been fully ingested. For streaming data, ingestion is continual and therefore has no end. You must also know the data type being ingested and the number of events in the dataset.
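
If the number of events in a CSV dataset is not already known, it can be counted before ingest. The following is a minimal sketch, not part of the product tooling. It assumes that each CSV record corresponds to one event and that the file begins with a single header row. The function name count_csv_events and the has_header parameter are illustrative only.

```python
import csv


def count_csv_events(path, has_header=True):
    """Count CSV records in a file, treating each record as one event.

    Assumption: one record equals one event. Blank lines are ignored.
    """
    with open(path, newline="") as handle:
        # csv.reader handles quoted fields that contain line breaks,
        # so a multi-line field still counts as a single record.
        records = sum(1 for row in csv.reader(handle) if row)
    if has_header and records > 0:
        records -= 1  # do not count the header row as an event
    return records


# Example call (the path is a placeholder, not a real location):
# print(count_csv_events("/path/to/dataset.csv"))
```

The resulting count can then be compared with the event counts reported in the later steps. If the dataset has no header row, pass has_header=False.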

The following nodes will be accessed (via web or SSH):

REPORTING (web)

STREAM (SSH)

ANALYTICS (SSH)

Steps

NOTE: This information applies only to CSV data ingested using Flume.

Step 1: Check ingest directory

The first step is to confirm whether the dataset has been read from the ingest directory. Flume marks a CSV file with the suffix “.COMPLETED” once the file has been read. Follow the steps below to confirm that this has occurred: