Tuesday, 29 October 2013

IDQ Parser Transformation

In this article we are going to cover the Parser transformation. It is one of the most important transformations used in IDQ. Parsing is a core function of any data quality tool, and IDQ provides rich parsing functionality to handle complex patterns.

The Parser transformation can be created in two modes:


  • Token Based Parsing
  • Pattern Based Parsing


Token Based Parsing: It is used to parse strings that match token sets, regular expressions, or reference table entries.
We will use a simple example to create a token-based Parser transformation. Suppose an email ID arrives in a field in the format "Name@company.domain" and we want to parse it and store it in multiple fields:
NAME
COMPANY_NAME
DOMAIN
Suppose the input data is as below:

Rahul@gmail.com
Sachin@yahoo.com
Stuart@yahoo.co.uk

Step 1: Create a token-based Parser transformation with the email ID as input. After creating the transformation, go to Properties, open the Strategies tab, and click New.



Step 2: Click Token Based.

Step 3: Select Regular Expression (as we want multiple output ports).

Step 4: Select the email parser, or create your own regular expression to parse other types of data.


Step 5: Create three output ports, click OK, and then click Finish.


Below is the output from the Parser transformation, with the name, company, and domain parsed into separate fields.
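
To make the regular-expression strategy more concrete, here is a minimal Python sketch of the same split done outside IDQ. It is only an illustration: the pattern and the function name parse_email are assumptions made for this example, not the expression behind IDQ's built-in email parser.

```python
import re

# Hypothetical pattern for illustration: three capture groups, one per output port.
# NAME = text before "@", COMPANY_NAME = text up to the first ".", DOMAIN = the rest.
EMAIL_PATTERN = re.compile(r"^([^@\s]+)@([^.@\s]+)\.(\S+)$")


def parse_email(value):
    """Return (NAME, COMPANY_NAME, DOMAIN), or None when the value does not match."""
    match = EMAIL_PATTERN.match(value.strip())
    return match.groups() if match else None


if __name__ == "__main__":
    for email in ["Rahul@gmail.com", "Sachin@yahoo.com", "Stuart@yahoo.co.uk"]:
        print(email, "->", parse_email(email))
```

With this pattern everything after the first dot goes to DOMAIN, so Stuart@yahoo.co.uk gives co.uk as the domain. A value with no "@" returns None, which is roughly how an unparsed row would need to be handled separately.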


Pattern Based Parsing: Pattern-based parsers are useful when the data needs to be split apart or sorted and it contains a moderately high number of easily recognized patterns.
A pattern-based Parser transformation needs input from a Labeler transformation, which provides two outputs: labeled data and tokenized data.
Suppose the source has a field named PATTERN_DATA that contains a name, an employee number (empno), and a date, and we need to parse it into three separate fields.
Step 1: Create a Labeler transformation with a comma (,) as the delimiter, and create a new strategy with the properties below.


In the second tab, choose the execution order and assign the labels.


The output of the Labeler transformation will be:

Step 2: Connect both the LabeledOutput and tokenized data outputs to the pattern-based Parser transformation, and create three new output ports in the Ports tab as shown below.

  
Step 3: In the Patterns tab, define the patterns below (matching the labels defined in the Labeler transformation).


You can now preview the parsed data split into three fields: NAME, EMPNO, and DOB.
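
The label-then-pattern idea can also be sketched in a few lines of Python. This is not how IDQ implements it. The label names (WORD, NUMBER, DATE), the date format, the sample row, and the function names are all assumptions chosen for this illustration, since the actual labels depend on the strategy you define in the Labeler transformation.

```python
import re

# Hypothetical label rules, checked in this order (similar in spirit to an execution order).
LABEL_RULES = [
    ("DATE", re.compile(r"^\d{2}/\d{2}/\d{4}$")),
    ("NUMBER", re.compile(r"^\d+$")),
    ("WORD", re.compile(r"^[A-Za-z]+$")),
]

# Hypothetical pattern table: a sequence of labels mapped to output port names.
PATTERNS = {
    ("WORD", "NUMBER", "DATE"): ("NAME", "EMPNO", "DOB"),
}


def label_tokens(value, delimiter=","):
    """Mimic the labeling step: return (labels, tokens) for one input value."""
    tokens = [token.strip() for token in value.split(delimiter)]
    labels = []
    for token in tokens:
        label = next((name for name, rx in LABEL_RULES if rx.match(token)), "UNKNOWN")
        labels.append(label)
    return labels, tokens


def pattern_parse(value):
    """Mimic the pattern step: assign tokens to ports when the label sequence is known."""
    labels, tokens = label_tokens(value)
    ports = PATTERNS.get(tuple(labels))
    return dict(zip(ports, tokens)) if ports else None


if __name__ == "__main__":
    # Made-up sample row in the shape described above: name, empno, date.
    print(pattern_parse("Rahul,1001,12/05/1985"))
```

The two values returned by label_tokens play the role of the labeled and tokenized outputs. The pattern table then decides which token goes to which port, and a row whose label sequence is not listed comes back as None.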

I hope this post makes the Parser transformation clearer. In case of any questions, please send a mail to support@ITNirvanas.com or leave a comment here.


