hadoop-common-dev mailing list archives

Eric Baldeschwieler wrote:
> You might try setting the block size for these files to be "very
> large". This should guarantee that the entire file ends up on one node.
>
> If an index is composed of many files, you could "tar" them together
> so each index is exactly one file.
>
> Might work... Of course as indexes get really large, this approach
> might have side effects.
Sorry to be so obstinate, but this won't work either. First, when
segments are created they use whatever default block size is configured
(64 MB?). Is there a per-file setBlockSize in the API? I couldn't find
it - and if there isn't, then the cluster would have to be shut down,
reconfigured, and restarted, and the segment data would have to be
copied over to change its block size ... yuck.
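For illustration, the reasoning behind the "very large block size" suggestion can be sketched in plain Python (this is just arithmetic over block counts, not the Hadoop API; the byte sizes are hypothetical examples):

```python
def split_into_blocks(file_size, block_size):
    """Return the sizes of the blocks a file of file_size bytes
    would occupy under a given per-file block size, HDFS-style:
    full blocks followed by one possibly-short final block."""
    blocks = []
    remaining = file_size
    while remaining > 0:
        chunk = min(block_size, remaining)
        blocks.append(chunk)
        remaining -= chunk
    return blocks

# A 10 GB file with the default 64 MB block size is split into many
# blocks, each of which HDFS may place on a different datanode:
many = split_into_blocks(10 * 2**30, 64 * 2**20)   # 160 blocks

# With a block size larger than the file, the whole file is a single
# block, so it necessarily lives on one node (plus replicas):
one = split_into_blocks(10 * 2**30, 16 * 2**30)    # 1 block
```

This is why the suggestion only helps if the block size can be set per file at creation time; a cluster-wide default applies to segments as they are written.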
The index cannot be tarred, because Lucene needs direct access to
several of the files inside it.
Index sizes are several gigabytes, with ~30 files per segment. Segment
data is several tens of gigabytes, stored in 4 MapFiles per segment.
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com