hadoop-hdfs-issues mailing list archives

[jira] [Commented] (HDFS-4461) DirectoryScanner: volume path prefix takes up memory for every block that is scanned

Date: Fri, 01 Feb 2013 19:10:12 GMT

[ https://issues.apache.org/jira/browse/HDFS-4461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13568979#comment-13568979
]
Suresh Srinivas commented on HDFS-4461:
---------------------------------------
bq. If someone is running with around 200,000 blocks (a reasonable number), and a 50 to 80
character path, this change saves between 50 and 100 MB of heap space during the DirectoryScanner
run. That's what we should be focusing on here-- the efficiency improvement. After all, that
is why I marked this JIRA as "improvement" rather than "bug"
I think you are missing the point I made earlier. In the description you say:
bq. This has been causing out-of-memory conditions for users who pick such long volume paths.
It is not correct to attribute the OOM to the memory inefficiency of DirectoryScanner. Please
update the description to say that DirectoryScanner can be made more memory-efficient.
bq. I saw more than 1 million ScanInfo objects
I would like to see the number of blocks in this particular setup, and whether we are leaking
these objects.
I am leaning more towards an incorrect datanode configuration in the setup where you saw the OOM.
Can you provide details such as the datanode's heap size and the number of blocks on the
datanode?
> DirectoryScanner: volume path prefix takes up memory for every block that is scanned
> -------------------------------------------------------------------------------------
>
> Key: HDFS-4461
> URL: https://issues.apache.org/jira/browse/HDFS-4461
> Project: Hadoop HDFS
> Issue Type: Improvement
> Affects Versions: 2.0.3-alpha
> Reporter: Colin Patrick McCabe
> Assignee: Colin Patrick McCabe
> Priority: Minor
> Attachments: HDFS-4461.002.patch, HDFS-4461.003.patch, memory-analysis.png
>
>
> In the {{DirectoryScanner}}, we create a {{ScanInfo}} object for every block. This object
contains two File objects: one for the metadata file, and one for the block file. Since
those File objects contain full paths, users who pick a lengthy path for their volume roots
will end up storing that prefix again for every block scanned, on the order of
N_blocks * path_prefix_length extra bytes in total. We also don't
really need to store File objects; storing strings and then creating File objects as needed
would be cheaper. This has been causing out-of-memory conditions for users who pick such
long volume paths.
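
To illustrate the idea in the description above, here is a minimal Java sketch that keeps one shared reference to the volume root and stores only relative path strings per block, building File objects on demand. The class name {{CompactScanInfo}}, its fields, and the helper {{prefixCharsSaved}} are hypothetical names chosen for this example; this is not the actual HDFS {{ScanInfo}} class or the attached patch.

```java
import java.io.File;

/**
 * Hypothetical sketch only; not the real HDFS ScanInfo.
 * The volume root is shared by every block on that volume, so the long
 * prefix is stored once per volume instead of twice per block.
 */
final class CompactScanInfo {
    private final long blockId;
    private final File volumeRoot;     // shared reference, one File per volume
    private final String blockSuffix;  // block file path relative to volumeRoot, may be null
    private final String metaSuffix;   // metadata file path relative to volumeRoot, may be null

    CompactScanInfo(long blockId, File volumeRoot, String blockSuffix, String metaSuffix) {
        this.blockId = blockId;
        this.volumeRoot = volumeRoot;
        this.blockSuffix = blockSuffix;
        this.metaSuffix = metaSuffix;
    }

    long getBlockId() {
        return blockId;
    }

    /** Builds the block File only when a caller asks for it. */
    File getBlockFile() {
        return blockSuffix == null ? null : new File(volumeRoot, blockSuffix);
    }

    /** Builds the metadata File only when a caller asks for it. */
    File getMetaFile() {
        return metaSuffix == null ? null : new File(volumeRoot, metaSuffix);
    }

    /**
     * Rough count of path characters no longer stored per scan:
     * two paths per block, each of which used to repeat the prefix.
     * This counts characters only; it ignores object and string overhead.
     */
    static long prefixCharsSaved(long numBlocks, int prefixLength) {
        return 2L * numBlocks * prefixLength;
    }
}
```

The trade-off in this sketch is a short-lived File allocation on each accessor call in exchange for a smaller resident footprint while the scan results are held in memory.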