The following sample Java class lets you connect to a secure (Kerberized) Hadoop file system.
I will be using the Cloudera Maven repository for the Hadoop dependencies.
First things first.
You need the hdfs-site.xml of your HDFS cluster and a keytab file for your Kerberos principal.
You can find instructions to generate keytab files here.
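Before wiring the keytab into Java, it can help to sanity-check it from the shell. This is a sketch assuming the MIT Kerberos client tools are installed; the file name and principal are placeholders matching the code below:

```shell
# List the principals stored in the keytab, with timestamps
klist -kt path-to-keytab/your.keytab

# Obtain a ticket non-interactively using the keytab
kinit -kt path-to-keytab/your.keytab you@your-domain.COM

# Confirm a ticket was granted
klist
```

If `kinit` fails here, the Java login below will fail the same way, so this is the cheapest place to debug.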
Define the Cloudera repository in your POM file for the Hadoop dependencies:

```xml
<repository>
  <id>cloudera</id>
  <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
</repository>
```

Now add the following dependency to your POM file, which provides the Hadoop common capabilities:
```xml
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-common</artifactId>
  <version>2.6.0</version>
</dependency>
```

Okay, it's time for the real deal now. I am going to copy all the contents of an HDFS directory to my local file system.
```java
import java.io.FileInputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;
import org.apache.hadoop.security.UserGroupInformation;

public class HDFSClient {
    public static void main(String[] args) {
        try {
            // Your HDFS endpoint and base path
            String hdfsBase = "hdfs://yournamenode:port/basepath";
            // Local path to the hdfs-site.xml of your cluster
            String hdfsXML = "path-to-xml/hdfs-site.xml";
            // Your keytab file for authenticating with HDFS
            String keyTab = "path-to-keytab/your.keytab";
            // Principal present in the above keytab
            String principal = "you@your-domain.COM";
            // HDFS relative path you want to access
            String hdfsPath = "data";

            Configuration hdfsConf = new Configuration();
            hdfsConf.addResource(new FileInputStream(hdfsXML));
            hdfsConf.set("fs.defaultFS", hdfsBase);

            // Log in to Kerberos with the keytab before touching the file system
            UserGroupInformation.setConfiguration(hdfsConf);
            UserGroupInformation.loginUserFromKeytab(principal, keyTab);

            FileSystem hdfsFS = FileSystem.get(hdfsConf);

            // Create an iterator by recursively listing the files at the HDFS path
            RemoteIterator<LocatedFileStatus> ri =
                    hdfsFS.listFiles(new Path(hdfsBase + "/" + hdfsPath), true);

            // Iterate through each file
            while (ri.hasNext()) {
                LocatedFileStatus lfs = ri.next();
                // DO YOUR STUFF HERE.
                // I am copying the HDFS files to the local file system.
                hdfsFS.copyToLocalFile(false, lfs.getPath(),
                        new Path("workdir/" + lfs.getPath().getName()), false);
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
```
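One caveat: the recursive listing copies everything into a flat `workdir/`, so files with the same name in different HDFS subdirectories would overwrite each other. A small helper that preserves the relative directory structure can avoid that. The class and method names here are hypothetical, and the sketch uses plain strings so it runs standalone:

```java
public class RelativePathMapper {

    // Map an absolute HDFS file path to a local destination path,
    // preserving the directory structure under the HDFS base path.
    static String toLocal(String hdfsBase, String filePath, String localRoot) {
        if (!filePath.startsWith(hdfsBase)) {
            throw new IllegalArgumentException("file is outside the base path: " + filePath);
        }
        String relative = filePath.substring(hdfsBase.length());
        // Strip a leading slash so we don't produce an absolute local path
        if (relative.startsWith("/")) {
            relative = relative.substring(1);
        }
        return localRoot + "/" + relative;
    }

    public static void main(String[] args) {
        String base = "hdfs://yournamenode:8020/basepath";
        // Two files with the same name no longer collide locally:
        System.out.println(toLocal(base, base + "/data/a/x.csv", "workdir")); // → workdir/data/a/x.csv
        System.out.println(toLocal(base, base + "/data/b/x.csv", "workdir")); // → workdir/data/b/x.csv
    }
}
```

In the loop above you would then pass `new Path(toLocal(hdfsBase, lfs.getPath().toString(), "workdir"))` as the destination instead of `"workdir/" + lfs.getPath().getName()`.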
Running your code: you need to provide the path to the Kerberos configuration to the JRE:

java -Djava.security.krb5.conf=/etc/krb5.conf HDFSClient
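The bare `java` invocation assumes the compiled class and all Hadoop jars are already on the classpath. With a Hadoop client installation on the machine, one way to supply them is via the `hadoop classpath` command; this is a sketch, and the file locations are assumptions:

```shell
# Compile against the cluster's Hadoop jars
javac -cp "$(hadoop classpath)" HDFSClient.java

# Run with the same jars plus the current directory on the classpath
java -cp ".:$(hadoop classpath)" \
     -Djava.security.krb5.conf=/etc/krb5.conf \
     HDFSClient
```

If you build with Maven instead, a shaded jar or `mvn exec:java` avoids assembling the classpath by hand.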