In my previous post I explained how to develop a custom Flume interceptor that can be used to create an HDFS folder structure which supports Hive partitioning. Having the folder structure on HDFS alone is not enough for Hive to identify the partitions. We need to update the Hive metastore so that it detects these HDFS folders as partitions.
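For example, once the folder structure exists, the partitions can be registered through HiveServer2. Below is a minimal sketch using the Hive JDBC driver; the connection URL, table name, and partition columns are placeholders, not values from the original post:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class HivePartitionSync {

    public static void main(String[] args) throws Exception {
        // HiveServer2 JDBC driver; URL and credentials are placeholders.
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:hive2://hiveserver.example.com:10000/default", "hive", "");
             Statement stmt = conn.createStatement()) {

            // Option 1: let Hive scan the table location and register any
            // partition directories it finds on HDFS.
            stmt.execute("MSCK REPAIR TABLE web_logs");

            // Option 2: register a single partition explicitly.
            stmt.execute("ALTER TABLE web_logs "
                    + "ADD IF NOT EXISTS PARTITION (year='2016', month='01', day='15') "
                    + "LOCATION '/data/web_logs/year=2016/month=01/day=15'");
        }
    }
}
```

MSCK REPAIR TABLE is convenient when many partition directories already exist, while ALTER TABLE ... ADD PARTITION gives explicit control, for example from the same job that lands the data.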
A Flume interceptor is a powerful way of modifying Flume events in flight. As the name suggests, interceptors intercept events on their way from the source to the channel. You can also feed inputs to your interceptor through configuration. Flume provides several out-of-the-box interceptors, such as the Timestamp Interceptor, but when you need further capabilities, you can develop your own custom interceptors.
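A bare-bones custom interceptor looks roughly like the sketch below. The header name/value logic and the property names are purely illustrative; only the Interceptor and Interceptor.Builder contracts come from Flume itself:

```java
import java.util.List;
import java.util.Map;

import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.interceptor.Interceptor;

public class HeaderStampInterceptor implements Interceptor {

    private final String headerName;
    private final String headerValue;

    private HeaderStampInterceptor(String headerName, String headerValue) {
        this.headerName = headerName;
        this.headerValue = headerValue;
    }

    @Override
    public void initialize() {
        // No resources to set up in this example.
    }

    @Override
    public Event intercept(Event event) {
        // Add or overwrite a header on every event passing through.
        Map<String, String> headers = event.getHeaders();
        headers.put(headerName, headerValue);
        return event;
    }

    @Override
    public List<Event> intercept(List<Event> events) {
        for (Event event : events) {
            intercept(event);
        }
        return events;
    }

    @Override
    public void close() {
        // Nothing to clean up.
    }

    /** Builder referenced from the Flume agent configuration. */
    public static class Builder implements Interceptor.Builder {
        private String headerName;
        private String headerValue;

        @Override
        public void configure(Context context) {
            // Values come from the agent config, e.g.
            // a1.sources.r1.interceptors.i1.headerName = source
            headerName = context.getString("headerName", "source");
            headerValue = context.getString("headerValue", "unknown");
        }

        @Override
        public Interceptor build() {
            return new HeaderStampInterceptor(headerName, headerValue);
        }
    }
}
```

The nested Builder is what you point the agent configuration at via the interceptor's type property; Flume calls configure() with your settings and then build() to create the interceptor instance.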
The following simple Ansible playbook lets you install PostgreSQL on CentOS/RHEL, clean up the data directory, initialize the database, add PostgreSQL as a service, enable local login, and create a SQL user, schema, and database.
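Once the playbook has run, a quick JDBC check against the new user and database confirms everything is wired up. This is just a sketch: the database, user, and password below are placeholders, and the PostgreSQL JDBC driver must be on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class PostgresSmokeTest {

    public static void main(String[] args) throws Exception {
        // Connect as the user/database the playbook created (placeholder names).
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:postgresql://localhost:5432/appdb", "appuser", "changeme");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT version()")) {
            if (rs.next()) {
                // Prints the server version string if the login and DB are OK.
                System.out.println(rs.getString(1));
            }
        }
    }
}
```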
Before you begin, obtain the Kerberos principal name and password from the cluster administrator.
The following sample Java class lets you connect to a secure (Kerberized) Hadoop file system.
I will be using the Cloudera Maven repositories for the Hadoop dependencies.
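A minimal sketch of such a client class might look like this. The NameNode address and principal are placeholders; in a real setup the security settings usually come from the cluster's core-site.xml and hdfs-site.xml on the classpath.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

public class SecureHdfsClient {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Placeholder NameNode address; normally taken from core-site.xml.
        conf.set("fs.defaultFS", "hdfs://namenode.example.com:8020");
        conf.set("hadoop.security.authentication", "kerberos");

        UserGroupInformation.setConfiguration(conf);
        // Assumes you have already obtained a ticket with the principal/password
        // from the cluster admin, e.g. by running: kinit etluser@EXAMPLE.COM
        // Alternatively, with a keytab:
        // UserGroupInformation.loginUserFromKeytab("etluser@EXAMPLE.COM",
        //         "/etc/security/keytabs/etluser.keytab");

        // List the HDFS root to verify the secure connection works.
        try (FileSystem fs = FileSystem.get(conf)) {
            for (FileStatus status : fs.listStatus(new Path("/"))) {
                System.out.println(status.getPath());
            }
        }
    }
}
```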