IDQ Parser Transformation
In this article we cover the Parser transformation, one of the most important transformations used in IDQ. Parsing is a core function of any data quality tool, and IDQ provides rich parsing functionality to handle complex patterns.
A Parser transformation can be created in two modes:
- Token Based Parsing
- Pattern Based Parsing
Token Based Parsing : It is used to parse strings that match token sets, regular expressions or reference table entries.
We will use a simple example to create a token based Parser transformation. Suppose an email id arrives in a field in the format "Name@company.domain" and we want to parse it and store it in multiple fields:
NAME
COMPANY_NAME
DOMAIN
Suppose the input data is as follows:
Rahul@gmail.com
Sachin@yahoo.com
Stuart@yahoo.co.uk
Step1 : Create a token based Parser transformation with the email id as input. After creating the transformation, go to Properties, open the Strategies tab and click New.
Step2 : Click on Token Based
Step3 : Select Regular expression (as we want to have multiple output ports)
Step4 : Select the email parser, or create your own regular expression to parse other types of data
Step5 : Create three output ports, click OK and then Finish
Below is the output from the Parser transformation: the name, company and domain are parsed into separate fields.
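The regular expression strategy can be illustrated outside IDQ with a minimal Python sketch. The pattern below is a hypothetical one written for this example, not the built-in IDQ email parser; it assumes one capture group per output port and puts everything after the first dot of the company part into DOMAIN.

```python
import re

# Hypothetical pattern: three capture groups, one per output port.
# Everything after the first dot that follows the company goes into DOMAIN.
EMAIL_PATTERN = re.compile(r"^([^@]+)@([^.@]+)\.(.+)$")

def parse_email(value):
    """Split an email id into NAME, COMPANY_NAME and DOMAIN, or return None."""
    match = EMAIL_PATTERN.match(value)
    if match is None:
        return None  # the value does not look like Name@company.domain
    name, company, domain = match.groups()
    return {"NAME": name, "COMPANY_NAME": company, "DOMAIN": domain}

for email in ["Rahul@gmail.com", "Sachin@yahoo.com", "Stuart@yahoo.co.uk"]:
    print(parse_email(email))
```

With this particular pattern, "Stuart@yahoo.co.uk" yields "co.uk" as DOMAIN; a different regular expression in the strategy could split it differently.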
Pattern Based Parsing : Pattern based parsers are useful when the data needs to be parsed apart or sorted and contains a moderately high number of easily recognized patterns.
A pattern based Parser transformation needs the output of a Labeler transformation, which provides two outputs: LabeledOutput and Tokenized data.
Suppose the source has a field named PATTERN_DATA that contains a name, an empno and a date, and we need to parse it into three separate fields.
Step1 : First create a Labeler transformation with a comma (,) as the delimiter and add a new strategy with the required properties. In the second tab, choose the execution order and assign the labels. The Labeler transformation then produces the labeled output and the tokenized data for each row.
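As a rough illustration of what the labeling step produces, here is a minimal Python sketch. The label names, the regular expressions and the sample row are all assumptions made for this example; in IDQ the labels come from the strategy configured in the transformation.

```python
import re

# Hypothetical label rules, checked in order; the real labels come from
# the strategy configured in the transformation.
LABEL_RULES = [
    ("DATE", re.compile(r"\d{1,2}[/-]\d{1,2}[/-]\d{4}")),
    ("EMPNO", re.compile(r"\d+")),
    ("NAME", re.compile(r"[A-Za-z ]+")),
]

def label_tokens(value, delimiter=","):
    """Return (labels, tokens) for one delimited input value."""
    tokens = [token.strip() for token in value.split(delimiter)]
    labels = []
    for token in tokens:
        for label, rule in LABEL_RULES:
            if rule.fullmatch(token):
                labels.append(label)
                break
        else:
            labels.append("UNKNOWN")  # no rule matched this token
    return labels, tokens

labels, tokens = label_tokens("Rahul,1001,15/08/1990")  # made-up sample row
print(labels)
print(tokens)
```

The two returned lists play the role of the labeled output and the tokenized data: the first describes the shape of the row, the second carries the actual values.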
Step2 : Connect both LabeledOutput and Tokenized data to the pattern based Parser transformation and create three new output ports in the Ports tab.
Step3 : In the Patterns tab, define the patterns using the labels defined in the Labeler transformation.
You can now preview the Parser output, with the data broken into three fields: NAME, EMPNO and DOB.
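The idea behind the pattern step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the label names, the single pattern and the sample values are hypothetical, and the labeled output and tokenized data are assumed to arrive as two parallel lists.

```python
# Hypothetical pattern definition: the label sequence to match and the
# output port that receives the token at each position.
PATTERN = ["NAME", "EMPNO", "DATE"]
OUTPUT_PORTS = ["NAME", "EMPNO", "DOB"]

def parse_by_pattern(labels, tokens):
    """Map tokens to output ports when the labels match PATTERN, else None."""
    if labels != PATTERN or len(tokens) != len(PATTERN):
        return None
    return dict(zip(OUTPUT_PORTS, tokens))

# Made-up labeled and tokenized values for one row.
row = parse_by_pattern(["NAME", "EMPNO", "DATE"], ["Rahul", "1001", "15/08/1990"])
print(row)
```

A row whose labels arrive in another order does not match the pattern and returns None here; handling it would need an additional pattern for that label sequence.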
Hope this post makes the Parser transformation clearer. In case of any questions, please send mail to support@ITNirvanas.com or leave your comment here.