The following sample Java class lets you connect to a secure (Kerberized) Hadoop file system.
I will be using the Cloudera Maven repository for the Hadoop dependencies.
First things first.
You need the hdfs-site.xml of your HDFS cluster and a keytab file for your Kerberos principal.
You can find instructions to generate keytab files here.
Define the Cloudera repository in your POM file for the Hadoop dependencies.
<repository>
  <id>cloudera</id>
  <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
</repository>
Now add the following dependency to your POM file, which provides the Hadoop common capabilities.
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-common</artifactId>
  <version>2.6.0</version>
</dependency>
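Since we are pulling from the Cloudera repository, you may prefer the CDH-flavored build of the same artifact so that your client matches your cluster. The exact version suffix below is an assumption; match it to your own CDH release:

```xml
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-common</artifactId>
  <!-- CDH builds use versions like 2.6.0-cdhX.Y.Z; pick the one matching your cluster -->
  <version>2.6.0-cdh5.4.2</version>
</dependency>
```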
Okay, time for the real deal now. I am going to copy all the contents of an HDFS directory to my local file system.
import java.io.FileInputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.RemoteIterator;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

public class HDFSClient {
    public static void main(String[] args) {
        try {
            // Your HDFS endpoint and base path
            String hdfsBase = "hdfs://yournamenode:port/basepath";
            // Local path to the hdfs-site.xml of your cluster
            String hdfsXML = "path-to-xml/hdfs-site.xml";
            // Your keytab file for authenticating with HDFS
            String keyTab = "path-to-keytab/your.keytab";
            // Principal present in the above keytab
            String principal = "you@your-domain.COM";
            // HDFS relative path which you want to access
            String hdfsPath = "data";

            Configuration hdfsConf = new Configuration();
            hdfsConf.addResource(new FileInputStream(hdfsXML));
            hdfsConf.set("fs.defaultFS", hdfsBase);

            // Log in to Kerberos before creating the FileSystem
            UserGroupInformation.setConfiguration(hdfsConf);
            UserGroupInformation.loginUserFromKeytab(principal, keyTab);

            FileSystem hdfsFS = FileSystem.get(hdfsConf);
            // Create an iterator by listing the files on the HDFS path (recursively)
            RemoteIterator<LocatedFileStatus> ri = hdfsFS.listFiles(new Path(hdfsBase + "/" + hdfsPath), true);
            // Iterate through each file
            while (ri.hasNext()) {
                LocatedFileStatus lfs = ri.next();
                // DO YOUR STUFF HERE.
                // I am copying the HDFS files to the local file system.
                hdfsFS.copyToLocalFile(false, lfs.getPath(), new Path("workdir/" + lfs.getPath().getName()), false);
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
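For a long-running client, the Kerberos ticket obtained from the keytab eventually expires. One way to handle that (a minimal sketch, not from the original post; the principal, keytab path, and loop interval are assumptions) is to periodically call UserGroupInformation.checkTGTAndReloginFromKeytab(), which renews the TGT from the keytab when it is close to expiring and is a no-op otherwise:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;

public class LongRunningHDFSClient {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        UserGroupInformation.setConfiguration(conf);
        // Hypothetical principal and keytab path; replace with your own
        UserGroupInformation.loginUserFromKeytab("you@your-domain.COM", "path-to-keytab/your.keytab");
        while (true) {
            // Renews the TGT from the keytab if needed; safe to call on every iteration
            UserGroupInformation.getLoginUser().checkTGTAndReloginFromKeytab();
            // ... do your periodic HDFS work here ...
            Thread.sleep(60_000);
        }
    }
}
```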
Running your code: you need to provide the path to the Kerberos configuration to the JRE (and have the Hadoop dependencies on your classpath):
java -Djava.security.krb5.conf=/etc/krb5.conf HDFSClient