YARN local and log storage discussion

We currently use a dedicated large disk on each DataNode/NodeManager to host the log and local files of running containers.

I have read that it is recommended to put the YARN local and log directories on multiple mount points, and more precisely on all HDFS disks, both to avoid an I/O bottleneck and to avoid losing the whole NodeManager if a single disk fails.

I wonder whether this is dangerous for HDFS: if application logs fill up several HDFS mount points, what is the expected behavior of the HDFS service?

1 Reply

Yes, it is definitely recommended to put the YARN local and log directories on multiple disks for resiliency. Putting them all on a single disk means that when that disk fails, the whole node becomes unusable for scheduling any more containers.
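
As a rough illustration of the layout described above, here is a minimal Python sketch that builds the comma-separated values for `yarn.nodemanager.local-dirs` and `yarn.nodemanager.log-dirs`, with one directory per disk. The mount points (`/data/1` and so on), the `yarn/local` and `yarn/logs` subdirectory names, and the helper function names are assumptions made up for the example; only the two property names come from YARN itself.

```python
# Hypothetical mount points: replace with the disks your DataNode actually uses.
MOUNT_POINTS = ["/data/1", "/data/2", "/data/3"]


def yarn_dirs(mount_points, subdir):
    """Return a comma-separated list with one directory per mount point."""
    return ",".join(f"{mp.rstrip('/')}/{subdir}" for mp in mount_points)


def yarn_site_properties(mount_points):
    """Build the two NodeManager directory properties.

    The subdirectory names are illustrative assumptions, not required values.
    """
    return {
        "yarn.nodemanager.local-dirs": yarn_dirs(mount_points, "yarn/local"),
        "yarn.nodemanager.log-dirs": yarn_dirs(mount_points, "yarn/logs"),
    }


def to_xml(props):
    """Render the properties as <property> fragments for yarn-site.xml."""
    lines = []
    for name, value in props.items():
        lines.append("<property>")
        lines.append(f"  <name>{name}</name>")
        lines.append(f"  <value>{value}</value>")
        lines.append("</property>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_xml(yarn_site_properties(MOUNT_POINTS)))
```

The printed fragments would go inside the `<configuration>` element of `yarn-site.xml` on each NodeManager. Using the same mount-point list for both properties keeps the local and log directories spread across the same disks.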

While you are right in general about the potential impact of container local/log data on HDFS reads and writes, it tends to be minimal in practice because the container local/log data is usually very small compared to the HDFS data being read and written.
