Twitter uses Cassandra to store its data in key-value format. For simplicity, let's assume the key-value pair for tweet data looks something like this: <twitterHandle, Tweet>.

So, in order to find the top n trends in a given snapshot, we would need to:
1. Process all Tweets and parse out the hashtag tokens.
2. Count the occurrences of each hashtag.
3. Sort the hashtags by count and take the top n.
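The three steps above can be sketched end-to-end in plain Java. This is a minimal in-memory version of the logic (class and method names are illustrative, not from the original pipeline); the real system would run these steps as MapReduce jobs over the data exported from Cassandra.

```java
import java.util.*;
import java.util.stream.*;

public class TopTrends {
    // Returns the top n hashtags by count, given raw tweet texts.
    public static List<String> topN(List<String> tweets, int n) {
        Map<String, Long> counts = new HashMap<>();
        for (String tweet : tweets) {
            // Step 1: parse out tokens that are hashtags
            for (String token : tweet.split("\\s+")) {
                if (token.startsWith("#")) {
                    // Step 2: count each hashtag
                    counts.merge(token, 1L, Long::sum);
                }
            }
        }
        // Step 3: sort by count (descending) and take the top n
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .limit(n)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> tweets = Arrays.asList(
                "loving the #hadoop course", "#hadoop at scale",
                "#bigdata and #hadoop", "just #bigdata things");
        System.out.println(topN(tweets, 2));
    }
}
```

This fits in memory only for toy data; the point of the MapReduce version described next is to do the same counting and sorting at Twitter scale.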

So, the input data for our Mapper could be a flat file generated from the Tweet values of the <twitterHandle, Tweet> pairs, one tweet per line.

The problem with this output is that it is sorted by key: between the map and reduce phases, the shuffle-and-sort step orders records alphabetically by key (the hashtag), not by count.
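To see what "sorted by key" means here, a `TreeMap` can stand in for the shuffle-and-sort ordering of the first job's (hashtag, count) output. The hashtags and counts below are made up for illustration.

```java
import java.util.*;

public class SortedByKey {
    // Hashtag counts as the first job might emit them (made-up numbers).
    // A TreeMap mimics shuffle-and-sort: entries come out in alphabetical
    // key order, regardless of how large each count is.
    public static TreeMap<String, Long> counts() {
        TreeMap<String, Long> counts = new TreeMap<>();
        counts.put("#worldcup", 42L);
        counts.put("#hadoop", 7L);
        counts.put("#bigdata", 19L);
        return counts;
    }

    public static void main(String[] args) {
        // Prints keys alphabetically (#bigdata first), not by count
        System.out.println(counts());
    }
}
```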

To get the desired output, sorted by the number of occurrences of each hashtag, we would need the records sorted by value.
So we pass this output to a second MapReduce job, which swaps the key and the value and then performs the sort.
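The swap-then-sort idea can be sketched in plain Java (names are illustrative; real Hadoop code would emit (LongWritable, Text) pairs from a Mapper subclass and let the framework sort them):

```java
import java.util.*;

public class SwapJob {
    // Mimics Mapper 2 plus the shuffle-and-sort of the second job:
    // swap each (hashtag, count) into (count, hashtag), then sort by
    // the new key.
    public static List<Map.Entry<Long, String>> swapAndSort(Map<String, Long> counts) {
        List<Map.Entry<Long, String>> swapped = new ArrayList<>();
        for (Map.Entry<String, Long> e : counts.entrySet()) {
            swapped.add(new AbstractMap.SimpleEntry<>(e.getValue(), e.getKey()));
        }
        // Default sort order is ascending by key, as in Hadoop's shuffle
        swapped.sort(Map.Entry.comparingByKey());
        return swapped;
    }

    public static void main(String[] args) {
        Map<String, Long> counts = Map.of("#hadoop", 7L, "#worldcup", 42L, "#bigdata", 19L);
        // Ascending by count: the biggest trend comes out LAST, which is
        // exactly the default-ordering problem discussed next.
        System.out.println(swapAndSort(counts));
    }
}
```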

Hence, Step 4: Mapper 2

Tokenise the input and emit the 2nd token (the count) as the key and the 1st token (the hashtag) as the value.
The framework will then shuffle and sort on the basis of this new key.
However, keys are sorted in ascending order by default, so we would still not get our desired list, with the biggest trends first.
So, we would need to use a custom comparator.
We would need to use LongWritable.DecreasingComparator, which sorts LongWritable keys in descending order.
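In a Hadoop driver this would be wired up with something like `job.setSortComparatorClass(LongWritable.DecreasingComparator.class)` on the second job. The effect of that comparator can be shown with a plain-Java stand-in (a sketch, not actual Hadoop code):

```java
import java.util.*;

public class DescendingSort {
    // Mimics what a decreasing comparator does in the second job: sort the
    // swapped (count, hashtag) records by count in descending order, so
    // the top trends come first.
    public static List<Map.Entry<Long, String>> topFirst(List<Map.Entry<Long, String>> swapped) {
        List<Map.Entry<Long, String>> out = new ArrayList<>(swapped);
        // Reverse the natural (ascending) key order
        out.sort(Map.Entry.<Long, String>comparingByKey(Comparator.reverseOrder()));
        return out;
    }

    public static void main(String[] args) {
        List<Map.Entry<Long, String>> swapped = new ArrayList<>();
        swapped.add(new AbstractMap.SimpleEntry<>(7L, "#hadoop"));
        swapped.add(new AbstractMap.SimpleEntry<>(42L, "#worldcup"));
        swapped.add(new AbstractMap.SimpleEntry<>(19L, "#bigdata"));
        // Highest count first: #worldcup leads the trends list
        System.out.println(topFirst(swapped));
    }
}
```

With the keys sorted in descending order, the first n records of the second job's output are exactly the top n trending hashtags.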