Friday 13 June 2014

Extract and Load data from Netezza database using InfoSphere DataStage

Netezza Connector Stage is a connectivity module, which enables IBM Infosphere Information Server to connect to Netezza database, import database metadata into the Information Server metadata repository and access data in the database. Netezza Connector Stage supports various operations such as read, write, upsert, Update and delete. This blog provides an overview of  how to extract and load data from Netezza database using the Netezza Connector Stage available in Datastage.

Netezza Connector is easy to use with minimum configuration. Client applications communicate with Netezza Performance Server using Netezza native connectivity via ODBC. Hence the connector requires Netezza native ODBC driver to connect to Netezza database.  Clients corresponding to Netezza server supports all platforms such as AIX, Red Hat Linux, Solaris, Suse Linux ,HP Itanium and Zlinux.

For more information please refer to link.

Advantages of using Netezza Connector: 

Netezza Connector has many features which are different from other Connectors in Infosphere datastage. 
  • Supports Row-selected SQL operation via an auto-generated SQL statement: This feature helps user to create a single job for multiple operations. It requires a action column char(1) in the input data which determines the SQL statement to be executed for that row. The action column can have values like "I" for Insert, "U" for Update,"D" Delete, "R" Replace and "M" merge.
  • Writes data records to Netezza tables in sequential and parallel mode using external tables and named pipes. Using this feature user can execute the job in single node or in multiple nodes.
  • Inserts via temporary work tables by default: Advantage of this feature is that the connector will create temporary work tables in database for insertion and those temporary work tables can be used in other jobs as well.
  • Temporary work tables in a separate database: If this option is selected in connector then the temporary work tables will be created in a separate database  which minimizes the load on database.
  • Automatically generate statistics : If this option is selected, the connector will generate statistics after the target table is updated, it generate statistics process to efficiently perform calculations involving numeric data types, which helps connector determine the most efficient way to execute a query.

Netezza Connector Configuration:

Netezza Connector configuration is easy and needs Netezza native ODBC driver which should be present in the same system where we have Information Server. It requires a Data source name to be created using Netezza native ODBC driver for connection.
Prerequisites

  1. Install Information Server: Netezza Connector is introduced on IS8.7 onwards.
  2. Netezza native ODBC driver
  3. Create Data Source Name using Netezza native driver 
a. For Unix Platforms add DSN in .odbc.ini file under Information Server Home path $DSHOME/Server/DSEngine/.odbc.ini  and then link the .odbc.ini file with odbc.ini file.
ln -s .odbc.ini odbc.ini.
b.For windows platform create the DSN in ODBC Administrator.
  1. (Only for Unix platforms)In Administrator client of Information server, set LIB path and BIN path of netezza driver and set NZ_ODBC_INI_PATH 
ex: LD_LIBRARY_PATH=/usr/nz/lib 
                    PATH=/usr/nz/bin
NZ_ODBC_INI_PATH=/opt/IBM/InformationServer/Server/DSEngine
For more information please refer to below link:

Security and access Privileges :

Following permissions are required in Netezza database
•To load data into a table, the user should have access privilege to that table or database.
•User should have Load privileges and Create Table, Select, Insert privileges and Create External Table privilege
•To Generate Statistics user should have privilege for Generating Statistics.

Extraction of Data using Netezza Connector( as a Source):

The following figure shows a sample read job which can extract the data from Netezza  database and put it into a Sequential File. 
Create a job in Designer client. In a parallel canvas, drag the Netezza Connector Stage.
Add an output link to the stage and add a Sequential file to the output link as shown in figure.





Double click on Netezza Connector Stage and go to the Output tab.



Connection properties in Output tab:
  1. Specify Data Source Name(DSN) which was created  using Netezza native ODBC Driver.
  2. Specify Database Name which has to be accessed.
  3. Specify Username to access the DSN.
  4. Specify Password to access the DSN.
  5. Specify table name from where data to be fetched.

Column Definition:
Go to the Columns tab under Outputs tab and define the columns of the table which you want to extract from Database.



Loading data using Netezza Connector(as a target):
The following figure shows a sample job which can load the data from a sequential file into Netezza database.
Create a job in Designer client. In a parallel canvas, drag the Netezza Connector Stage.
Add an Input  link to the stage and add a Sequential file to the Input  link as shown in figure.

Double click on Netezza Connector Stage and go to the Input tab.
 

Connection properties in Input tab: 
1.    Specify Data Source Name(DSN) which was created  using Netezza native ODBC Driver.
2.    Specify Database Name which has to be accessed.
3.    Specify Username to access the DSN.
4.    Specify Password to access the DSN.
5.    Use Seperate Connection for TWT.
i.      Set to Yes if the Temporary work table(TWT) and the external tables(ET) tables to be created/dropped/Accessed in Seperate database. Once set to Yes, specify the Database Name, and username and password to access the seperate database connection. This helps for easier maintenance
ii.    Set to No if main connection to be used for TWT and ET tables
6.    Select write mode and specify the table name.

Column Definition: 
Go to the Columns tab under Inputs tab and define the columns of the table in which you want to load data.


Conclusion: 
Using Netezza Connector one can Extract, Load and Transform data by creating jobs and perform data transaction effectively by enabling ad hoc queries and complex analytics with terabytes of data.

Disclaimer: “The postings on this site are my own and don’t necessarily represent IBM’s positions, strategies or opinions.”
 

3 comments:

  1. NETEZZA Online Training, ONLINE TRAINING – IT SUPPORT – CORPORATE TRAINING http://www.21cssindia.com/courses/netezza-developer-online-training-21.html The 21st Century Software Solutions of India offers one of the Largest conglomerations of Software Training, If you’re serious about a career in IT, 21st Century would like to provide you a guidance don’t hesitate to organize a free demo session. For any further information regarding the courses once go through our website Visit: http://www.21cssindia.com | Call Us +917386622889 - +919000444287 - contact@21cssindia.com

    ReplyDelete
  2. Very useful article.. appreciate it

    ReplyDelete