Hadoop Security Preview

2.
Problem <ul><li>Primary Goal: Keep Data in HDFS Secure from unauthorized access! </li></ul><ul><li>Corollary: All HDFS clients must be authenticated to ensure they are the user they claim to be. </li></ul><ul><li>Since Map/Reduce runs applications as the user, it must authenticate users. </li></ul><ul><li>Since servers (HDFS, Map/Reduce) are entrusted with user credentials, they must also be authenticated. </li></ul><ul><li>Kerberos will be the underlying authentication system. </li></ul><ul><li>Must be able to configure security on or off. </li></ul>

5.
Security Threats in Hadoop <ul><li>User to Service Authentication </li></ul><ul><ul><li>No User Authentication on NameNode or JobTracker </li></ul></ul><ul><ul><ul><li>Client code supplies user and group names </li></ul></ul></ul><ul><ul><li>No User Authorization on DataNode – Fixed in 0.21 </li></ul></ul><ul><ul><ul><li>Users can read/write any block </li></ul></ul></ul><ul><ul><li>No User Authorization on JobTracker </li></ul></ul><ul><ul><ul><li>Users can modify or kill other users’ jobs </li></ul></ul></ul><ul><ul><ul><li>Users can modify the persistent state of JobTracker </li></ul></ul></ul><ul><li>Service to Service Authentication </li></ul><ul><ul><li>No Authentication of DataNodes and TaskTrackers </li></ul></ul><ul><ul><ul><li>Users can start fake DataNodes and TaskTrackers </li></ul></ul></ul><ul><li>No Encryption on Wire or Disk </li></ul>

6.
Definitions <ul><li>Authentication – Ensuring the user is who they claim to be. </li></ul><ul><ul><li>We do a very poor job of this currently </li></ul></ul><ul><ul><li>We need it on both RPC and Web UI. </li></ul></ul><ul><li>Authorization – Ensuring the user can only do things that they are allowed to do. </li></ul><ul><ul><li>HDFS does this already via owners, groups and permissions </li></ul></ul><ul><ul><li>Map/Reduce does not do this </li></ul></ul>
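The owner, group and permission check mentioned for HDFS can be illustrated with a small sketch. This is a simplified POSIX-style model written for illustration only, not HDFS code: the function name `may_access` and its parameters are hypothetical, and superuser handling is left out.

```python
def may_access(mode, owner, group, user, user_groups, want):
    """Return True if `user` may perform `want` on a file.

    mode        -- permission bits, e.g. 0o640
    owner/group -- the file's owner and group
    user_groups -- set of groups the requesting user belongs to
    want        -- bit mask: 4 = read, 2 = write, 1 = execute
    """
    if user == owner:
        bits = (mode >> 6) & 7   # owner class
    elif group in user_groups:
        bits = (mode >> 3) & 7   # group class
    else:
        bits = mode & 7          # everyone else
    return (bits & want) == want


# Example: a file owned by alice:staff with mode 640.
print(may_access(0o640, "alice", "staff", "bob", {"staff"}, 4))
```

Note that such a check is only meaningful once authentication is in place: if the client can supply any user name, as the threat list describes, the `user` argument cannot be trusted.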

7.
Using Kerberos and Single Sign-On <ul><li>Kerberos allows a user to sign in once to obtain a Ticket Granting Ticket (TGT) </li></ul><ul><ul><ul><li>kinit – get a new Kerberos ticket </li></ul></ul></ul><ul><ul><ul><li>klist – list your Kerberos tickets </li></ul></ul></ul><ul><ul><ul><li>kdestroy – destroy your Kerberos ticket </li></ul></ul></ul><ul><ul><ul><li>TGTs last for 10 hours, renewable for 7 days by default </li></ul></ul></ul><ul><ul><li>PAM on Linux and Solaris can automatically do kinit for you </li></ul></ul><ul><ul><ul><li>Still needs your password </li></ul></ul></ul><ul><ul><li>Once you have a TGT, Hadoop commands work as before </li></ul></ul><ul><ul><ul><li>hadoop fs -ls / </li></ul></ul></ul><ul><ul><ul><li>hadoop jar wordcount.jar in-dir out-dir </li></ul></ul></ul>
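How the 10-hour lifetime and the 7-day renewable window interact can be sketched as follows. This is only a model of the arithmetic, assuming the default values quoted on the slide; the names `Tgt`, `issue` and `renew` are hypothetical and no Kerberos library is involved.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

TICKET_LIFETIME = timedelta(hours=10)    # default quoted on the slide
RENEWABLE_LIFETIME = timedelta(days=7)   # default quoted on the slide


@dataclass(frozen=True)
class Tgt:
    expires: datetime      # ticket is usable until this time
    renew_until: datetime  # renewals cannot extend past this time


def issue(now):
    """Model of kinit: a fresh ticket obtained with the password."""
    return Tgt(expires=now + TICKET_LIFETIME,
               renew_until=now + RENEWABLE_LIFETIME)


def renew(tgt, now):
    """Model of a renewal: only a still-valid ticket can be renewed."""
    if now >= tgt.expires:
        raise ValueError("ticket expired; a fresh kinit is needed")
    # The new expiry is capped by the renewable window.
    return Tgt(expires=min(now + TICKET_LIFETIME, tgt.renew_until),
               renew_until=tgt.renew_until)
```

A long-running client would therefore need to renew before each 10-hour expiry, and obtain a new ticket with the password once the 7-day window closes.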

10.
Other MapReduce Security Changes <ul><li>MapReduce System directory was 777 but is now 700. </li></ul><ul><li>Tasks run as the user instead of the TaskTracker user. </li></ul><ul><li>Task directories were globally visible and are now 700. </li></ul><ul><li>Distributed Cache is now secure </li></ul><ul><li>Shared entries (original is world readable) are shared by everyone’s jobs. </li></ul><ul><li>Private entries (original is not world readable) are shared only by that user’s jobs. </li></ul>
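The mode changes and the shared/private split above both come down to reading permission bits. The sketch below is an illustration only: `cache_visibility` and `open_to_others` are hypothetical helpers, and the rule is reduced to the single condition stated on the slide (whether the original file is world readable).

```python
import stat


def cache_visibility(mode):
    """Classify a distributed cache entry from the original file's mode."""
    # World-readable originals can be shared by everyone's jobs;
    # anything else stays private to the owning user's jobs.
    return "shared" if mode & stat.S_IROTH else "private"


def open_to_others(mode):
    """True if group or other have any access at all (777 yes, 700 no)."""
    return bool(mode & (stat.S_IRWXG | stat.S_IRWXO))


print(cache_visibility(0o644), cache_visibility(0o600))
print(open_to_others(0o777), open_to_others(0o700))
```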

11.
Web UIs <ul><li>Hadoop and especially MapReduce make heavy use of the Web UIs. </li></ul><ul><li>These need to be authenticated also… </li></ul><ul><li>We will make it pluggable, but include a login module that uses the Kerberos username and password. </li></ul><ul><li>Even better is if someone makes a SPNEGO filter for Jetty that uses the Kerberos tickets from the browser. </li></ul><ul><li>All of the servlets will use the authenticated username and enforce permissions appropriately. </li></ul>

12.
Proxy-Users <ul><li>Some services must access HDFS and MapReduce as other users </li></ul><ul><li>HDFS and MapReduce allow users to create configuration entries to define: </li></ul><ul><li>Who the proxy service can impersonate </li></ul><ul><li>Which hosts they can impersonate from </li></ul><ul><li>hadoop.proxyuser.superguy.groups=goodguys </li></ul><ul><li>hadoop.proxyuser.superguy.hosts=secretbase </li></ul>
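The two configuration entries above can be read as a rule: `superguy` may act on behalf of users in group `goodguys`, but only when connecting from host `secretbase`. A minimal sketch of such a check follows. It is not Hadoop's implementation: `may_impersonate` is a hypothetical helper, and the comma-separated lists and `*` wildcard it accepts are assumptions not stated on the slide.

```python
def _values(conf, key):
    """Split a comma-separated config value into a set (assumed format)."""
    return {v.strip() for v in conf.get(key, "").split(",") if v.strip()}


def may_impersonate(conf, proxy_user, target_groups, client_host):
    """Return True if proxy_user may act as a user in target_groups
    when the request arrives from client_host."""
    prefix = "hadoop.proxyuser." + proxy_user + "."
    groups = _values(conf, prefix + "groups")
    hosts = _values(conf, prefix + "hosts")
    group_ok = "*" in groups or bool(groups & set(target_groups))
    host_ok = "*" in hosts or client_host in hosts
    # Both conditions must hold; a missing entry denies by default.
    return group_ok and host_ok


conf = {
    "hadoop.proxyuser.superguy.groups": "goodguys",
    "hadoop.proxyuser.superguy.hosts": "secretbase",
}
print(may_impersonate(conf, "superguy", {"goodguys"}, "secretbase"))
```

Denying by default when no entry exists keeps an unconfigured service from impersonating anyone.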

13.
Remaining Security Issues <ul><li>We are not encrypting on the wire. </li></ul><ul><ul><ul><li>It will be possible within the framework, but not in 0.22. </li></ul></ul></ul><ul><li>We are not encrypting on disk. </li></ul><ul><ul><ul><li>For either HDFS or MapReduce. </li></ul></ul></ul><ul><li>Encryption is expensive in terms of CPU and IO speed. </li></ul><ul><li>Our current threat model is that the attacker has access to a user account, but not root or physical access. </li></ul><ul><ul><ul><li>They can’t sniff the packets on the network. </li></ul></ul></ul>