Overview:
This blog post describes how to Kerberize HDFS (and, for that matter, all other modules)
in Cloudera Manager and how to access it from InfoSphere Information Server for
profiling, data quality analysis, data integration, and so on, using the data stored
in HDFS.
Prerequisites:
- Install Cloudera Manager on one server (say, “blr01.ibm.com”).
- Install InfoSphere Information Server on a different server (say, “blr02.ibm.com”).
- Install and set up the Kerberos server on “blr02.ibm.com”. (This can also be done on any other server, as long as all Kerberos clients refer to it in their configurations.)
- Install and set up the Kerberos client on “blr01.ibm.com” (where Cloudera Manager is installed) and on “blr02.ibm.com” (if the Kerberos server is not set up on that machine).
KDC Infrastructure setup:
The following are some of the steps to follow while installing and setting up the KDC:
1. Install Kerberos V5 server/client libraries
2. Install Master KDC
- Edit Configuration Files
- Create Database
- Add Administrators to the ACL File
- Add Administrators to the Kerberos Database
- Create a kadmind Keytab
- Start the Kerberos Daemons on the Master KDC
3. Install Slave KDCs (optional but highly recommended)
4. Propagate Database to Slave KDCs
5. Create Stash Files and Start krb5kdc Daemons on Slave KDCs
6. Configure Kerberos Client machines
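As a rough illustration of step 2, the master KDC setup could be scripted along the following lines. This is a minimal sketch, assuming MIT Kerberos V5, the IPS.COM realm and keytab path used elsewhere in this post, and a hypothetical admin/admin administrator principal. The `run` helper and the `DRY_RUN` switch are illustration-only names; by default the commands are only printed, not executed.

```shell
#!/bin/sh
# Sketch only: prints the master KDC setup commands unless DRY_RUN=0 is set.
REALM="IPS.COM"                                    # realm used throughout this post
KADM5_KEYTAB="/var/kerberos/krb5kdc/kadm5.keytab"  # admin_keytab value from kdc.conf
ADMIN_PRINC="admin/admin"                          # hypothetical administrator principal

# Hypothetical helper: echo the command in dry-run mode, otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

setup_master_kdc() {
    # Create the Kerberos database and a stash file (-s) for the realm.
    run kdb5_util create -r "$REALM" -s
    # Add an administrator to the Kerberos database (prompts for a password).
    run kadmin.local -q "addprinc ${ADMIN_PRINC}@${REALM}"
    # Create the kadmind keytab.
    run kadmin.local -q "ktadd -k ${KADM5_KEYTAB} kadmin/admin kadmin/changepw"
    # Start the Kerberos daemons on the master KDC.
    run krb5kdc
    run kadmind
}
```

Calling `setup_master_kdc` prints the five commands in order. The ACL file named in kdc.conf would typically also need a line such as `*/admin@IPS.COM *` before kadmind is started.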
Sample kdc.conf
[kdcdefaults]
kdc_ports = 88
kdc_tcp_ports = 88
[realms]
IPS.COM = {
master_key_type = aes128-cts
max_life = 2d
max_renewable_life = 2w
acl_file = /var/kerberos/krb5kdc/kadm5.acl
dict_file = /usr/share/dict/words
admin_keytab = /var/kerberos/krb5kdc/kadm5.keytab
supported_enctypes = aes128-cts:normal des3-hmac-sha1:normal arcfour-hmac:normal des-hmac-sha1:normal des-cbc-md5:normal des-cbc-crc:normal
}
Sample krb5.conf
[logging]
default = FILE:/var/log/krb5libs.log
kdc = FILE:/var/log/krb5kdc.log
admin_server = FILE:/var/log/kadmind.log
[libdefaults]
default_realm = IPS.COM
dns_lookup_realm = false
dns_lookup_kdc = false
ticket_lifetime = 2d
renew_lifetime = 2w
kdc_timeout = 10s
forwardable = true
renewable = true
[realms]
IPS.COM = {
kdc = blr02.ibm.com:88
admin_server = blr02.ibm.com:749
}
[domain_realm]
.ibm.com = IPS.COM
ibm.com = IPS.COM
[kdc]
profile=/var/kerberos/krb5kdc/kdc.conf
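Because every Kerberos client has to point at the same realm, it can be worth checking the default_realm on each machine after editing krb5.conf. A minimal sketch, assuming a POSIX shell with sed; `krb5_default_realm` is a hypothetical helper name, not part of Kerberos.

```shell
#!/bin/sh
# Hypothetical helper: print the default_realm value found in a krb5.conf file.
krb5_default_realm() {
    # $1: path to the krb5.conf file to inspect
    sed -n 's/^[[:space:]]*default_realm[[:space:]]*=[[:space:]]*\([^[:space:]]*\).*$/\1/p' "$1" | head -n 1
}

# Example usage (path as used in this post):
#   krb5_default_realm /etc/krb5.conf
```

Running it on both blr01.ibm.com and blr02.ibm.com and comparing the results is a quick way to catch a mistyped realm.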
The following links give an overview of the KDC infrastructure, along with its installation and setup steps.
Enable Kerberos Security in Cloudera Manager:
Log on to the Cloudera Manager admin console (for example: http://blr01.ibm.com:7180/).
Follow the steps described in http://www.cloudera.com/content/cloudera-content/cloudera-docs/CM4Ent/latest/Configuring-Hadoop-Security-with-Cloudera-Manager/Configuring-Hadoop-Security-with-Cloudera-Manager.html
to enable Hadoop Security.
These steps create the required keytab files on the server
where Cloudera Manager is installed. One
of these keytab files needs to be moved to the client (blr02.ibm.com, where
Information Server is installed).
Drivers used:
Cloudera ODBC Driver for Apache Hive
This driver needs to be installed on the server where
Information Server is installed (blr02.ibm.com).
P.S. Always use the latest version of the Cloudera ODBC Driver for Apache
Hive to avoid performance issues.
Mandatory steps to be followed on the server where the Information Server (IS) Engine is installed:
- DSN Entry in .odbc.ini file
[Hive_Cloudera]
Driver=/opt/cloudera/hiveodbc/lib/64/libclouderahiveodbc64.so
Host=blr01.ibm.com
Port=10000
Schema=default
DefaultStringColumnLength=255
FastSQLPrepare=0
UseNativeQuery=0
HiveServerType=2
AuthMech=1
HS2AuthMech=1
HS2HostFQDN=blr01.ibm.com
HS2KrbServiceName=hive
HS2KrbRealm=IPS.COM
- Create /opt/IBM/InformationServer/Server/DSEngine/.cloudera.hiveodbc.ini file with the below contents
[Driver]
## - Note that this default DriverManagerEncoding of UTF-32 is for iODBC.
## - unixODBC uses UTF-16 by default.
## - If unixODBC was compiled with -DSQL_WCHART_CONVERT, then UTF-32 is the correct value.
## Execute 'odbc_config --cflags' to determine if you need UTF-32 or UTF-16 on unixODBC
## - SimbaDM can be used with UTF-8 or UTF-16.
## The DriverUnicodeEncoding setting will cause SimbaDM to run in UTF-8 when set to 2 or UTF-16 when set to 1.
DriverManagerEncoding=UTF-8
ErrorMessagesPath=/opt/cloudera/hiveodbc/ErrorMessages/
LogLevel=0
LogPath=/tmp
## - Uncomment the ODBCInstLib corresponding to the Driver Manager being used.
## - Note that the path to your ODBC Driver Manager must be specified in LD_LIBRARY_PATH (LIBPATH for AIX).
## - Note that AIX has a different format for specifying its shared libraries.
# Generic ODBCInstLib
# iODBC
ODBCInstLib=/opt/IBM/InformationServer/Server/branded_odbc/lib/libodbcinst.so
# SimbaDM / unixODBC
#ODBCInstLib=libodbcinst.so
# AIX specific ODBCInstLib
# iODBC
#ODBCInstLib=libiodbcinst.a(libiodbcinst.so.2)
# SimbaDM
#ODBCInstLib=libodbcinst.a(odbcinst.so)
# unixODBC
#ODBCInstLib=libodbcinst.a(libodbcinst.so.1)
- Add the below entry in /opt/IBM/InformationServer/Server/DSEngine/dsenv file
export SIMBAINI=/opt/IBM/InformationServer/Server/DSEngine/.cloudera.hiveodbc.ini
- Locate the hive.keytab file for HiveServer2 on the server where Cloudera Manager is installed (here, blr01.ibm.com) and transfer it to the machine where the IS Engine is installed (that is, blr02.ibm.com).
ls -alt `find . -name hive.keytab`
Pick the first file listed (ls -t sorts by modification time, newest first) and transfer it to the /opt/cloudera/ folder
on the IS Engine machine.
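- Restrict the file permissions so that only the dsadm user, who runs kinit in the next step, can read the keytab (a world-readable keytab lets any local user authenticate as the hive principal)
chown dsadm /opt/cloudera/hive.keytab
chmod 600 /opt/cloudera/hive.keytab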
- Log on as the dsadm user and run the kinit command
kinit -k -t /opt/cloudera/hive.keytab hive/blr01.ibm.com@IPS.COM
- Verify the ticket information by issuing the klist -e command
- Log on to the Administrator client and define the two environment variables (KRB5CCNAME and KRB5_CONFIG)
P.S. Alternatively, you can add these two environment
variables to the /opt/IBM/InformationServer/Server/DSEngine/dsenv
file.
- Export the following two environment variables and Test the connection from DataDirect example program
[root@blr02 example]# export KRB5CCNAME=/tmp/krb5cc_0
[root@blr02 example]# export KRB5_CONFIG=/etc/krb5.conf
[root@blr02 example]# . /opt/IBM/InformationServer/Server/DSEngine/dsenv
[root@blr02 example]# cd /opt/IBM/InformationServer/Server/branded_odbc/samples/example
[root@blr02 example]# ./example
./example DataDirect Technologies, Inc. ODBC Example Application.
Enter the data source name : Hive_Cloudera
Enter the user name : <leave blank>
Enter the password : <leave blank>
Connecting...
JDK_Connecting..
JDK_Connected..
Connected
Enter SQL statements (Press ENTER to QUIT)
SQL> show databases
database_name
default
test
ued_qbo
Enter SQL statements (Press ENTER to QUIT)
SQL> use default
Enter SQL statements (Press ENTER to QUIT)
SQL> show tables
tab_name
inttypes
jdk
tab2
Enter SQL statements (Press ENTER to QUIT)
SQL> select * from tab2
col1 col2
1 ABCD
2 EFG
3 HIJK
4 LMNOP
5 QRST
6 UVWX
7 YZabc
8 defghi
9 klmnop
10 qrstuvwxyz
Enter SQL statements (Press ENTER to QUIT)
SQL>
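The ticket-related steps above can be collected into a small script that is sourced by dsadm before testing the connection. This is a sketch under stated assumptions: the keytab path, principal, and krb5.conf location are the ones used in this post, the credential cache path follows the default /tmp/krb5cc_<uid> naming, and `hive_principal` and `obtain_hive_ticket` are hypothetical helper names.

```shell
#!/bin/sh
# Build a service principal name of the form hive/<host>@<REALM>.
hive_principal() {
    # $1: HiveServer2 host, $2: Kerberos realm
    printf 'hive/%s@%s\n' "$1" "$2"
}

# Environment variables read by the Kerberos libraries.
# Assumption: the default MIT credential cache location for the current user.
KRB5CCNAME="/tmp/krb5cc_$(id -u)"
KRB5_CONFIG="/etc/krb5.conf"
export KRB5CCNAME KRB5_CONFIG

# Obtain a ticket from the keytab and confirm that the cache holds a valid one.
obtain_hive_ticket() {
    keytab="/opt/cloudera/hive.keytab"
    kinit -k -t "$keytab" "$(hive_principal blr01.ibm.com IPS.COM)" || return 1
    # klist -s prints nothing; its exit status reports whether valid credentials exist.
    klist -s
}
```

After sourcing the script, call `obtain_hive_ticket`; a non-zero exit status means that no valid ticket is available and the ODBC connection is likely to fail.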
Using beeline to connect to Hive:
Alternatively, users can use beeline to connect to Hive and
check the contents as follows:
[hive@blr01 run]$ beeline
Beeline version 0.10.0-cdh4.4.0 by Apache Hive
beeline> !connect jdbc:hive2://blr01.ibm.com:10000/default;principal=hive/blr01.ibm.com@IPS.COM
scan complete in 6ms
Connecting to jdbc:hive2://blr01.ibm.com:10000/default;principal=hive/blr01.ibm.com@IPS.COM
Enter username for jdbc:hive2://blr01.ibm.com:10000/default;principal=hive/blr01.ibm.com@IPS.COM:
Enter password for jdbc:hive2://blr01.ibm.com:10000/default;principal=hive/blr01.ibm.com@IPS.COM:
Connected to: Hive (version 0.10.0)
Driver: Hive (version 0.10.0-cdh4.4.0)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://blr01.ibm.com:10000/def> use default;
No rows affected (1.168 seconds)
0: jdbc:hive2://blr01.ibm.com:10000/def> select col1, col2 from tab2;
+-------+-------------+
| col1  | col2        |
+-------+-------------+
| 1     | ABCD        |
| 2     | EFG         |
| 3     | HIJK        |
| 4     | LMNOP       |
| 5     | QRST        |
| 6     | UVWX        |
| 7     | YZabc       |
| 8     | defghi      |
| 9     | klmnop      |
| 10    | qrstuvwxyz  |
+-------+-------------+
10 rows selected (26.746 seconds)
0: jdbc:hive2://blr01.ibm.com:10000/def>
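The same check can be run non-interactively, which is convenient for scripting. A minimal sketch, assuming a valid Kerberos ticket already exists for the calling user and that the installed beeline accepts the -u (JDBC URL) and -e (query) options; `hive_jdbc_url` and `run_hive_query` are hypothetical helper names.

```shell
#!/bin/sh
# Build the HiveServer2 JDBC URL including the Kerberos service principal.
hive_jdbc_url() {
    # $1: HiveServer2 host, $2: database, $3: Kerberos realm
    printf 'jdbc:hive2://%s:10000/%s;principal=hive/%s@%s\n' "$1" "$2" "$1" "$3"
}

# Run a single query through beeline against the host and realm used in this post.
run_hive_query() {
    # $1: the SQL statement to execute
    beeline -u "$(hive_jdbc_url blr01.ibm.com default IPS.COM)" -e "$1"
}

# Example usage:
#   run_hive_query "select col1, col2 from tab2"
```

Keeping the URL construction in its own function makes it easy to point the same check at another HiveServer2 host or database.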
Usage of this data in Information Server:
- Create an import area and a data connection to this data source in InfoSphere Metadata Asset Manager (IMAM), and import the metadata.
- Register the metadata created in the previous step in an Information Analyzer project; you can then perform data profiling and data quality analysis.
- The same metadata can be used for data integration in DataStage, or across any other component of InfoSphere Information Server.
Forums:
You can also register in the Google forum at https://groups.google.com/a/cloudera.org/forum/#!forum/scm-users
and seek help with specific questions related to Cloudera Manager.
Disclaimer: The postings on this site are those of the authors and don’t necessarily represent
IBM’s positions, strategies or opinions.