Abstract
Unstructured
data is information that does not have a predefined data model or
does not fit well into relational tables. Unstructured data can be
text from books, journals, metadata, audio, video files, the body of
word processor documents, web pages, and presentation charts. This
blog describes about the stage that handles the unstructured data.
Introduction:
With
the release of Information Server 9.1, it supports the Unstructured
Data with Microsoft Excel files as data source.In this blog,we
are going to create a job to extract the unstructured data and
explain about the properties with an example.
Configure to extract data
Create a stage in information server with the following two components:
- Unstructured data stage
- Output sequential file
Following figure illustrates how to extract data by using unstructured data stage. Double click the unstructured data stage icon to enter or modify the options. Options for the respective stage need to be specified at stage level appropriately and in configure window to make the job run.
Configure
Window
Following
figure describes the Configure window of unstructured stage which is
displayed when configure button in the Stage properties is clicked.
Configure data source,import,map sections appropriately and click OK
.Respective stage properties and column schema is populated.
Configure
Window Properties
Specify
filename,Password if file is password protected in Data source
Section and specify template file for design time when wild cards
specified in the input excel files. Select Use template
file for design time and specify a template file name. Template file
is used for subsequent configuration steps, and not used at
runtime.Specify a value for Template password if the specified
template file is password protected.
Specify range
expression and click load button. Appropriate excel columns will be
displayed in Import Section.
From the Column
header field, select First row of data ranges. When First row of data
ranges is selected, first row is regarded as the header and the
Unstructured Data stage starts extracting from the second row.
Different types of
data like File/Document properties , Sheet/Row/Cell Information
values can be extracted using property tab and Custom properties can
be specified in Advanced tab in Import Section.
Click Import
button to get the data stage columns. Other
cell information like formula,comment,author of the comment,Hyperlink
label,type,address can be extracted by specifying the appropriate
value for Import option in Map section. Click OK
Stage
Level Properties
Unstructured
data stage can be configured to skip null values in the cells that
are being extracted by specifying skip null rows option in stage
properties
Run
time column propagation should be enabled and appropriate property
should be set for hidden columns option to extract hidden column data
When
extracting data from the file, specify how you want to handle errors.
Select Fail to abort the job. Select Skip to skip the data and
continue to run the job
Click OK , compile and execute the job to get the required output.
Example:
In
this example there are two input excel file names (Employee1.xls and
Employee2.xls). To retrieve the data from both the files mention the
File name as Employee*.xls. To load the excel columns in Import
section template file ie., original file name either employee1.xls or
employee2.xls should be specified in the Template File option in
Configure window.
Input
Excel Sheet Data (Employee1.xls)
Input
Excel Sheet Data (Employee2.xls)
Output Data (sequential File)
Column_A
– Column_G contains actual data.RowNumber Column the details of the
row from where the data is extracted. Sheet Name columns provides the
details from which sheet the data is extracted. Cell_D7 is the value
extracted for the property specified in the particular cell property
in Advanced tab of Import section.
Conclusion
Information Server 9.1
release supports single output link which has single schema. Upcoming
release supports multiple output links to extract multiple data
structures.
Disclaimer:
“The postings on this site are my own and don’t necessarily
represent IBM’s positions, strategies or opinions.”
Hi, I need to know how to get Reject data from this Stage.
ReplyDelete