Have you considered using something higher-level like Pig or Hive? Are
there reasons why you need to process at this low level?
-----Original Message-----
From: Aaron Baff [mailto:Aaron.Baff@telescope.tv]
Sent: Friday, September 10, 2010 11:50 PM
To: common-user@hadoop.apache.org
Subject: Custom Key class not working correctly
So I'm pretty new to Hadoop, just learning it for work, and starting to
play with some of our data on a VM cluster to see it work and to make
sure it can do what we need. By and large it's very cool, and I think I'm
getting the hang of it, but when I try to make a custom composite key
class, it doesn't seem to group the data correctly.
The data is a bunch of phone numbers with various transactional data
(timestamp, phone type, other call data). My Mapper pretty much just
takes the data and splits it out into a custom Key (or a Text holding
just the phone number) and a custom Value holding the rest of the data.
In my reducer, I'm counting the number of unique phone numbers, among
other things, using a Reporter counter. Using my key class (code below),
I get a total of 56,404 unique numbers, which is way too low. When I use
just the phone number (as a Text) as the key, it gives me 1,159,558,
which is correct. In my custom class's hashCode() method I'm just using
String.hashCode() on the String holding the phone number.
That seemed reasonable to me, since I wanted it to group the values by
the phone number and then order them by the timestamp, which is what I'm
doing in the compareTo() method.
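For reference, the grouping behaviour in question can be sketched in plain Java. With the default setup, hashCode() only picks the reducer (the default HashPartitioner uses it); inside each reducer the framework sorts keys with the key comparator (for a WritableComparable with no registered raw comparator, that ends up calling compareTo()) and starts a new reduce() call whenever two neighbouring keys compare as unequal. Unless a separate grouping comparator is configured, that same comparator decides the groups. The sketch below simulates only that sort-then-group step for a single reducer, using made-up sample keys; the names GroupingDemo, Key, sampleKeys and countReduceGroups are hypothetical, and it uses Java 8 Comparator helpers rather than Hadoop classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class GroupingDemo {

    // Hypothetical stand-in for the composite key: phone number (mdn) plus timestamp.
    static final class Key {
        final String mdn;
        final long timestamp;

        Key(String mdn, long timestamp) {
            this.mdn = mdn;
            this.timestamp = timestamp;
        }
    }

    // Like the original compareTo(): timestamp only, mdn ignored.
    static final Comparator<Key> BY_TIMESTAMP = Comparator.comparingLong(k -> k.timestamp);

    // Phone number first, then timestamp within a phone number.
    static final Comparator<Key> BY_MDN_THEN_TIME =
            Comparator.<Key, String>comparing(k -> k.mdn).thenComparingLong(k -> k.timestamp);

    // Phone number alone (what a grouping comparator would look at).
    static final Comparator<Key> BY_MDN = Comparator.comparing(k -> k.mdn);

    // Made-up sample data: 4 distinct phone numbers, 3 distinct timestamps.
    static List<Key> sampleKeys() {
        return Arrays.asList(
                new Key("5550001", 100L), new Key("5550002", 100L),
                new Key("5550001", 200L), new Key("5550003", 200L),
                new Key("5550002", 300L), new Key("5550004", 300L));
    }

    // Simulates one reducer: sort with sortCmp, then open a new group (one
    // reduce() call) whenever groupCmp says neighbouring keys differ.
    static int countReduceGroups(List<Key> keys, Comparator<Key> sortCmp, Comparator<Key> groupCmp) {
        List<Key> sorted = new ArrayList<>(keys);
        sorted.sort(sortCmp);
        int groups = 0;
        Key previous = null;
        for (Key k : sorted) {
            if (previous == null || groupCmp.compare(previous, k) != 0) {
                groups++;
            }
            previous = k;
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Key> keys = sampleKeys();
        System.out.println(countReduceGroups(keys, BY_TIMESTAMP, BY_TIMESTAMP));         // 3: distinct timestamps
        System.out.println(countReduceGroups(keys, BY_MDN_THEN_TIME, BY_MDN_THEN_TIME)); // 6: distinct (mdn, timestamp) pairs
        System.out.println(countReduceGroups(keys, BY_MDN_THEN_TIME, BY_MDN));           // 4: distinct phone numbers
    }
}
```

With the timestamp-only ordering, the group count follows the number of distinct timestamps rather than phone numbers, which would be consistent with a "unique numbers" counter that comes out wrong. Sorting on (mdn, timestamp) but grouping on mdn alone gives one reduce call per phone number with its values in timestamp order, which is the usual secondary-sort pattern.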
============================================================================================
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;

public class AIMdnTimeKey implements WritableComparable {
    String mdn = "";
    long timestamp = -1L;
    private byte oli = 0;

    public AIMdnTimeKey() {
    }

    public AIMdnTimeKey(String initMdn, long initTimestamp) {
        mdn = initMdn;
        timestamp = initTimestamp;
    }

    public void setMdn(String newMdn) {
        mdn = newMdn;
    }

    public String getMdn() {
        return mdn;
    }

    public void setTimestamp(long newTimestamp) {
        timestamp = newTimestamp;
    }

    public long getTimestamp() {
        return timestamp;
    }

    public void write(DataOutput out) throws IOException {
        out.writeUTF(mdn);
        out.writeByte(oli);
        out.writeLong(timestamp);
    }

    public void readFields(DataInput in) throws IOException {
        mdn = in.readUTF();
        oli = in.readByte();
        timestamp = in.readLong();
    }

    public int compareTo(Object obj) throws ClassCastException {
        if (obj == null) {
            throw new ClassCastException("Object is NULL and so cannot be compared!");
        }
        if (getClass() != obj.getClass()) {
            throw new ClassCastException("Object is of type "
                    + obj.getClass().getName()
                    + " which cannot be compared to this class of type "
                    + getClass().getName());
        }
        final AIMdnTimeKey other = (AIMdnTimeKey) obj;
        // Orders by timestamp only; mdn is not consulted here.
        return (int) (this.timestamp - other.timestamp);
    }

    @Override
    public int hashCode() {
        return mdn.hashCode();
    }

    @Override
    public boolean equals(Object obj) {
        if (obj == null) {
            return false;
        }
        if (getClass() != obj.getClass()) {
            return false;
        }
        final AIMdnTimeKey other = (AIMdnTimeKey) obj;
        if ((this.mdn == null) ? (other.mdn != null) : !this.mdn.equals(other.mdn)) {
            return false;
        }
        return true;
    }

    @Override
    public String toString() {
        return mdn + " " + timestamp;
    }

    /**
     * @return the oli
     */
    public byte getOli() {
        return oli;
    }

    /**
     * @param oli the oli to set
     */
    public void setOli(byte oli) {
        this.oli = oli;
    }
}
============================================================================================
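A key whose ordering matches "group by phone number, then order by timestamp" would compare mdn first and timestamp second, and would avoid the (int)(a - b) cast, which truncates the long difference and can flip its sign (or yield 0) once two timestamps differ by more than Integer.MAX_VALUE. The sketch below is a hedged illustration, not a drop-in replacement: to compile without Hadoop on the classpath it implements java.lang.Comparable, and the class name MdnTimeKeySketch and the helper compareMdnOnly are hypothetical. In the actual job the same bodies would sit in AIMdnTimeKey declared as implementing WritableComparable<AIMdnTimeKey>.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

// Hypothetical sketch: mirrors AIMdnTimeKey's fields and serialization, but
// implements java.lang.Comparable so it compiles without Hadoop. In the job,
// declare it as "implements WritableComparable<AIMdnTimeKey>" instead.
public class MdnTimeKeySketch implements Comparable<MdnTimeKeySketch> {
    String mdn = "";
    long timestamp = -1L;
    byte oli = 0;

    public MdnTimeKeySketch() {
    }

    public MdnTimeKeySketch(String mdn, long timestamp) {
        this.mdn = mdn;
        this.timestamp = timestamp;
    }

    public void write(DataOutput out) throws IOException {
        out.writeUTF(mdn);
        out.writeByte(oli);
        out.writeLong(timestamp);
    }

    public void readFields(DataInput in) throws IOException {
        mdn = in.readUTF();
        oli = in.readByte();
        timestamp = in.readLong();
    }

    // Sort order: phone number first, then timestamp within a phone number.
    @Override
    public int compareTo(MdnTimeKeySketch other) {
        int byMdn = mdn.compareTo(other.mdn);
        if (byMdn != 0) {
            return byMdn;
        }
        // Explicit comparison instead of (int) (a - b), which truncates the
        // long difference and can return the wrong sign or 0.
        if (timestamp < other.timestamp) {
            return -1;
        }
        return (timestamp == other.timestamp) ? 0 : 1;
    }

    // Hypothetical grouping comparison: phone number only, timestamp ignored.
    public static int compareMdnOnly(MdnTimeKeySketch a, MdnTimeKeySketch b) {
        return a.mdn.compareTo(b.mdn);
    }

    // equals() agrees with compareTo(): same mdn AND same timestamp.
    @Override
    public boolean equals(Object obj) {
        if (this == obj) {
            return true;
        }
        if (obj == null || getClass() != obj.getClass()) {
            return false;
        }
        MdnTimeKeySketch other = (MdnTimeKeySketch) obj;
        return mdn.equals(other.mdn) && timestamp == other.timestamp;
    }

    // Hashing on mdn only keeps every record for a phone number on the same
    // reducer under a hash partitioner; still consistent with equals().
    @Override
    public int hashCode() {
        return mdn.hashCode();
    }

    @Override
    public String toString() {
        return mdn + " " + timestamp;
    }
}
```

With this ordering alone, each distinct (mdn, timestamp) pair would still get its own reduce() call. To get one call per phone number, the mdn-only comparison would typically be wrapped in a WritableComparator subclass (constructed with super(AIMdnTimeKey.class, true)) and registered via JobConf.setOutputValueGroupingComparator(...) in the old mapred API, or Job.setGroupingComparatorClass(...) in the newer mapreduce API.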
Aaron Baff | Developer | Telescope, Inc.
email: aaron.baff@telescope.tv | office: 424 270 2913 | www.telescope.tv