Connecting to Kerberized HDFS from Java

The following sample Java class lets you connect to a secure (Kerberized) Hadoop file system. I will be using the Cloudera Maven repository for the Hadoop dependencies.
First things first.
You need the hdfs-site.xml of your HDFS cluster and a keytab file for your Kerberos principal.
You can find instructions to generate keytab files here.
Define the Cloudera repository in your POM file for the Hadoop dependencies.
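Since a wrong path to either file tends to surface later as a confusing authentication error, it can help to check both files up front. The following is a minimal sketch using only the JDK; the class name `KerberosPreflight` and the method `missingFiles` are hypothetical helpers, not part of Hadoop, and the paths are the same placeholders used in the client code.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not a Hadoop API): reports which required files are missing or unreadable.
final class KerberosPreflight {

  // Returns the subset of the given paths that are not readable regular files.
  static List<String> missingFiles(String... paths) {
    List<String> missing = new ArrayList<>();
    for (String p : paths) {
      Path file = Paths.get(p);
      if (!Files.isRegularFile(file) || !Files.isReadable(file)) {
        missing.add(p);
      }
    }
    return missing;
  }

  public static void main(String[] args) {
    // Placeholder paths; replace them with your own hdfs-site.xml and keytab locations.
    List<String> missing = missingFiles("path-to-xml/hdfs-site.xml", "path-to-keytab/your.keytab");
    if (!missing.isEmpty()) {
      System.err.println("Missing or unreadable: " + missing);
      System.exit(1);
    }
    System.out.println("All required files are present.");
  }
}
```

This check only confirms that the files exist and can be read. It does not verify that the keytab actually contains the principal you intend to use.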

<repository>
    <id>cloudera</id>
    <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
</repository>
Now define the following dependency in your POM file, which provides the Hadoop common capabilities.
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>2.6.0</version>
</dependency>
If the client later fails with "No FileSystem for scheme: hdfs", the HDFS client classes are missing from the classpath; adding the hadoop-hdfs (or hadoop-client) artifact in the same version is the usual fix.
Okay, it's time for the real deal now. I am going to copy all the contents of an HDFS directory to my local file system.

import java.io.FileInputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.RemoteIterator;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

public class HDFSClient {

  public static void main(String[] args) {

    try {
      // Your HDFS endpoint and basepath
      String hdfsBase = "hdfs://yournamenode:port/basepath";
      // Your local directory path to hdfs-site.xml of your cluster
      String hdfsXML = "path-to-xml/hdfs-site.xml";
      // Your keytab file for authenticating with HDFS.
      String keyTab = "path-to-keytab/your.keytab";
      // principal present on above keytab
      String principal = "you@your-domain.COM";
      // HDFS relative path which you want to access
      String hdfsPath = "data";

      Configuration hdfsConf = new Configuration();
      hdfsConf.addResource(new FileInputStream(hdfsXML));
      hdfsConf.set("fs.defaultFS", hdfsBase);

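      // Log in from the keytab. This only takes effect when the loaded configuration
      // sets hadoop.security.authentication to "kerberos" (normally done in core-site.xml).
      UserGroupInformation.setConfiguration(hdfsConf);
      UserGroupInformation.loginUserFromKeytab(principal, keyTab);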

      FileSystem hdfsFS = FileSystem.get(hdfsConf);

      // Create an iterator by recursively listing the files under the HDFS path.
      RemoteIterator<LocatedFileStatus> ri = hdfsFS.listFiles(new Path(hdfsBase + "/" + hdfsPath), true);

      // Iterate through each file
      while (ri.hasNext()) {

        LocatedFileStatus lfs = ri.next();
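
        // Do your own work here.
        // This example copies each HDFS file to the local file system.
        // Only the file name is kept, so same-named files from different
        // subdirectories can overwrite each other in workdir.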
        hdfsFS.copyToLocalFile(false, lfs.getPath(), new Path("workdir/" + lfs.getPath().getName()), false);

      }

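    } catch (Exception e) {
      // Print the full stack trace; Kerberos and HDFS failures are hard to
      // diagnose from the exception message alone.
      e.printStackTrace();
    }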
  }
}
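Because the listing is recursive while the copy target uses only `getName()`, the local copy is flattened into a single directory. If you would rather mirror the HDFS sub-directory layout, one option is to derive the local target from the file's position below the listed root. The sketch below uses only `java.net.URI`; the class `LocalTargets` and the method `localTarget` are hypothetical names, and it assumes the root and file URIs share the same scheme and authority (host and port).

```java
import java.net.URI;

// Hypothetical helper (not a Hadoop API): maps an HDFS file URI to a local target path.
final class LocalTargets {

  // Returns localRoot plus the file's path relative to hdfsRoot.
  // Falls back to the bare file name when the file is not located under hdfsRoot.
  static String localTarget(URI hdfsRoot, URI hdfsFile, String localRoot) {
    URI relative = hdfsRoot.relativize(hdfsFile);
    String relPath;
    if (relative.isAbsolute() || relative.getPath().startsWith("/")) {
      // relativize() returned the file URI unchanged, so it is not under the root.
      String path = hdfsFile.getPath();
      relPath = path.substring(path.lastIndexOf('/') + 1);
    } else {
      relPath = relative.getPath();
    }
    return localRoot + "/" + relPath;
  }

  public static void main(String[] args) {
    // Example host, port, and paths are made up for illustration.
    URI root = URI.create("hdfs://namenode:8020/basepath/data");
    URI file = URI.create("hdfs://namenode:8020/basepath/data/2020/07/part-0000");
    System.out.println(localTarget(root, file, "workdir"));
  }
}
```

Inside the loop, the destination could then be built as `new Path(LocalTargets.localTarget(root.toUri(), lfs.getPath().toUri(), "workdir"))`, where `root` is the `Path` passed to `listFiles`. Depending on your setup, you may need to create the local parent directories before copying.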

              

Running your code: you need to tell the JRE where to find the Kerberos configuration file.

java -Djava.security.krb5.conf=/etc/krb5.conf HDFSClient
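If you cannot control the JVM command line, the same system property, `java.security.krb5.conf`, can also be set from code, as long as that happens before the first Kerberos login. Here is a minimal sketch; the class `Krb5Config` and the method `ensureKrb5Conf` are hypothetical names, and `/etc/krb5.conf` is only an assumed default location.

```java
// Hypothetical helper (not a Hadoop or JDK API): sets a default for the Kerberos
// configuration path without overriding a value given on the command line.
final class Krb5Config {

  static final String KRB5_CONF_PROPERTY = "java.security.krb5.conf";

  // Sets the property to fallbackPath only when it is not already set.
  // Returns the value that is in effect afterwards.
  static String ensureKrb5Conf(String fallbackPath) {
    String current = System.getProperty(KRB5_CONF_PROPERTY);
    if (current == null || current.isEmpty()) {
      System.setProperty(KRB5_CONF_PROPERTY, fallbackPath);
      return fallbackPath;
    }
    return current;
  }

  public static void main(String[] args) {
    // Assumed default location; adjust it for your system.
    System.out.println("Using Kerberos config: " + ensureKrb5Conf("/etc/krb5.conf"));
  }
}
```

In the client above, the call would go at the very start of `main`, before `UserGroupInformation.loginUserFromKeytab` is invoked.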
Tags Kerberos, Java, HDFS, Hadoop, Cloudera, UserGroupInformation, Keytab
