hadoop-common-user mailing list archives

Re: When copying a file to HDFS, how to control what nodes that file will reside on?

Date

Wed, 10 Apr 2013 06:11:27 GMT

Which Java file is responsible for replication?
Which file chooses a random data node from the same rack, and which chooses a
random rack?
On Wed, Apr 10, 2013 at 3:26 AM, Raj Vishwanathan <rajvish@yahoo.com> wrote:
> You could use the following facts:
> 1. Files are stored in blocks, so make your block size bigger than the
> largest file.
> 2. The first replica of a block is stored on the local node (when the
> writer runs on a DataNode).
>
> Raj
>
> ------------------------------
> *From:* jeremy p <athomewithagroovebox@gmail.com>
> *To:* user@hadoop.apache.org
> *Sent:* Tuesday, April 9, 2013 1:49 PM
> *Subject:* When copying a file to HDFS, how to control what nodes that
> file will reside on?
>
> Hey all,
>
> I'm dealing with kind of a bizarre use case where I need to make sure that
> File A is local to Machine A, File B is local to Machine B, etc. When
> copying a file to HDFS, is there a way to control which machines that file
> will reside on? I know that any given file will be replicated across three
> machines, but I need to be able to say "File A will DEFINITELY exist on
> Machine A". I don't really care about the other two machines -- they could
> be any machines on my cluster.
>
> Thank you.
>
>
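Combining Raj's two facts: if you run the copy from Machine A itself (and Machine A runs a DataNode), a block size larger than the file keeps it in a single block whose first replica is written locally. A sketch, assuming the `dfs.blocksize` property name used in Hadoop 2.x and a hypothetical target path:

```shell
# Run this on Machine A itself; Machine A must host a DataNode.
# 256 MB (268435456 bytes) is an assumed block size larger than fileA,
# so the whole file fits in one block, and HDFS writes that block's
# first replica to the local DataNode.
hadoop fs -D dfs.blocksize=268435456 -put fileA /data/fileA
```

The other two replicas still go wherever the default placement policy puts them, which matches the "I don't really care about the other two machines" requirement.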
>
--
*With regards ---*
*Mohammad Mustaqeem*,
M.Tech (CSE)
MNNIT Allahabad
9026604270