Greenplum Connector is a generic connector which comes under
Connectivity component category in IBM InfoSphere Information Server. It is
used by IBM InfoSphere DataStage to perform extract, load, lookup and metadata
operations (by using InfoSphere Metadata Asset Manager (IMAM)) on Greenplum
Database. Greenplum Connector is supported on Parallel Canvas in InfoSphere
DataStage. This blog provides an overview of how to read and write data from Pivotal
Greenplum database using Greenplum Connector through DataStage.
To connect to Pivotal Greenplum database through Greenplum
Connector, it must have the proper driver to be loaded to connect to a particular
database in order to extract, load or lookup the data.
Greenplum CC Configuration:
To connect to a Greenplum database, you must first configure
an ODBC data source definition (DSN) for the database by using the IBM
Greenplum Wire Protocol ODBC driver. Before you begin ensure that the ODBC
driver for Greenplum libraries is installed.
a) Windows OS Procedure:
1. Start the Microsoft ODBC Data Source Administrator.
On a 32-bit Windows
computer, click Start > Control panel > Administrative Tools > Data
Sources (ODBC)
On a 64-bit Windows computer, navigate to
C:\Windows\SysWOW64\odbcad32.exe.
Note: On Windows, InfoSphere® Information Server is a 32-bit
application. Even on a 64-bit Windows computer, the connector is running as a
32-bit application. Therefore you must use the 32-bit version of the ODBC Data
Source Administrator as the Greenplum connector will not be able to locate DSN
definitions created in the 64-bit ODBC Data Source Administrator.
2. On the System DSN page, click Add.
3. On the Create New Data Source page, select the IBM Greenplum
Wire Protocol driver and click Finish. For information about configuring the
driver options, see the Greenplum Wire Protocol Driver chapter in the
DataDirect Connect Series for ODBC User's Guide.
b) UNIX OS Procedure:
1. Add a new ODBC DSN definition for the Greenplum Wire
Protocol to the .odbc.ini file.
2. Specify the Hostname and PortNumber to the host and port
where Greenplum database server is running.
3. Specify the Database name to default database to use for
connections using the new ODBC DSN.
4. Save the .odbc.ini file.
Example
[Greenplum_DEV_SERVER]
Driver=/opt/IBM/InformationServer/Server/branded_odbc/lib/VMgplm00.so
Description=DataDirect 7.0 Greenplum Wire Protocol
Database=gp_dev
HostName=gp_host
PortNumber=5432
where, gp_dev is the name of the Greenplum database the DSN
connects to and gp_host is the host name where the Greenplum server resides.
*
For more information about configuring and other driver options, see the
Greenplum Wire Protocol Driver chapter in the DataDirect Connect Series for
ODBC User's Guide.
c) Greenplum parallel
file distribution program (gpfdist)
The Greenplum Connector stage exchanges data with the
Greenplum server by using the Greenplum file distribution program, which is
called gpfdist. The gpfdist program runs
on the database client, and it must be installed on the InfoSphere Information
Server engine tier computer. For data to be transferred by using the gpfdist
protocol, a network route must be present to enable bidirectional access by
using an IP address and optionally the presence of a DNS server to facilitate
the name resolution. The connector invokes a gpfdist process on every physical
computer node and creates the external table. The host of the external table
data is identified by the fastname entry in the parallel engine configuration
file ($APT_CONFIG_FILE).
In order for the connector to
invoke gpfdist on each engine tier, the location of
gpfdist(%GPHOME_LOADERS%\bin) must be in the system path. In addition, the
location of gpfdist dependent libraries (%GPHOME_LOADERS%\lib) must be in the
system library path. On Windows, the system environment variable PATH is
updated in the Advanced system settings. On Linux, the PATH environment
variable is updated in the dsenv script.
Note: On Windows, the Greenplum installer adds
%GPHOME_LOADERS%\bin and %GPHOME_LOADERS%\lib to the PATH system environment
variable. Verify that these directories are in the PATH.
Greenplum Connector as a Source:
The following figure shows a sample job which can extract the data from Greenplum database and put it into a Sequential File.
The following figure shows a sample job which can extract the data from Greenplum database and put it into a Sequential File.
From the Job Canvas, Double click on Greenplum Connector
Stage and go to the Output tab.
Connection:
Data source (Required) - Data source name (DSN) as defined
in the ODBC driver manager or the odbc.ini file for a Greenplum database
Database (Optional) - Name of the Greenplum database on the
server defined in the ODBC Data source. If not specified, the database
specified in the Data source will be used.
Username (Required) - The user name for the ODBC connection
Password (Optional)* - The password for the ODBC connection
* Greenplum Database allows
password less authentication, and can potentially support other forms of
authentication (SSL, LDAP) through the ODBC driver. If the user would like to
use these types of authentication methods then the connector should allow the
Password property to be optional.
Usage : Provide the source table name in Table Name option.
Usage : Provide the source table name in Table Name option.
For Column Definition:
Go to the Columns tab under Outputs tab and specify the columns of the table which you want to extract from Greenplum Database.
Greenplum Connector as a target:
Go to the Columns tab under Outputs tab and specify the columns of the table which you want to extract from Greenplum Database.
Greenplum Connector as a target:
The following figure shows a sample job which can load the
data from a sequential file into Greenplum database.
From the Job Canvas, Double click on Greenplum Connector
Stage and go to the Input tab.
Connection:
For Connection, please follow the same steps above which
were performed when Greenplum CC
Usage:
When the user selects the insert mode, then the user should be able to choose the direct insert mode. Select the below options
When the user selects the insert mode, then the user should be able to choose the direct insert mode. Select the below options
Write Mode=Insert
Table Action= Target table name.
For Column Definition:
Go to the Columns tab under Inputs tab and specify the columns of the table in which you want to load data.
For Column Definition:
Go to the Columns tab under Inputs tab and specify the columns of the table in which you want to load data.
Conclusion
Greenplum connector can be use to perform operations like read
data from Greenplum databases or write data to Greenplum databases or look up
data in the contexts of those jobs. Also by using InfoSphere Metadata Asset
Manager (IMAM), it can be used to import metadata from Greenplum database.
Disclaimer: “The postings on this site are my own and don’t necessarily
represent IBM’s positions, strategies or opinions.”
Nice information its really help me out your content and also Visit:www.datastage.in
ReplyDelete