Creating Advanced Text Schema Activity
You can use this schema activity to define how to read data from and write data to an advance text file. An advance text file refers to a text file which can have multiple record formats and field separators.
Figure displays a sample Advanced Text file.
12345678D 07/12/2007 XYZ SYSTEMS, INC. 443 NORTH CLARK AVE, SUITE 350 CHICAGO, IL 60610 Agent: JOHN SMITH 443 NORTH CLARK AVE, SUITE 350 CHICAGO, IL 60610 Officer: JOHN SMITH 443 NORTH CLARK AVE, SUITE 350 CHICAGO, IL 60610 56789101D 07/16/2007 XYZ INTERNATIONAL INC 443 NORTH CLARK AVE, SUITE 350 CHICAGO, IL 60610 Agent: NICK MASSA 443 NORTH CLARK AVE, SUITE 350 CHICAGO, IL 60610 Officer: NICK MASSA 443 NORTH CLARK AVE, SUITE 350 CHICAGO, IL 60610
In the Figure there are three types of records. The first field of all record is considered as the Record Identifier. In this example, the record identifier of the first record is an eight-digit number followed by a character 'D' (e.g. 12345678D). It could be any number followed by 'D'. The second and third record identifiers are for Agent and Officer respectively. In this text file, there are two field separators ':' colon and space.
Steps to create Advance Text Schema
On the Adeptia Suite homepage, go to Configure > Services > Schema and then click Adv. Text.
This action will display you the Manage Adv. Text Schema screen.
Click the Create New link. This action will display you the Create Advance Text Schema screen.
- Enter the name and description of the new Advance Text schema activity in the Name and Description text boxes respectively.
- To separate the different records of an advance text file, you need to enter a value in the record separator. For example, \n for new line and \s for space.
To separate the different fields of a text file, you need to enter a value in the Field Separator. For example, \t for Tab and \s for space.
You can specify multiple record and field separators. You can also use regular expressions to specify field separators. For example, you can use "\s:" to specify space or colon ( : as a field separator.
- You cannot directly use special characters such as '', '', or '|' in your regular expression as field separators. To use them you need to put these characters within square brackets in the *Field Separator field. For example, [], [*] or [|].
- However, if you use the same schema at the target end, in a target data file then the schema will print the field separator along with the square brackets. To avoid this, enter '+', '', or '|' in the *Target Field Separator field in Advanced Properties.
- You can also use hex values in record and field separators.
- As field separator at target end, use 0x before the value. For example, for space as field separator use 0x20.
- As field separator at source end, use as regular expression. For example, for space as field separate or use \x20 .
- Record separator uses regular expression to support hex values at both source and target end. For example, for space as a record separator, use 0x20 .
- You can also specify two hex values together in record and field separator. For example, for two spaces you need to use 0x200x20.
- To define schema definition, select one of the following options:
- Use Definition File
- Enter the Fields Sequentially
- To define the schema using definition file, select the Use Definition File radio button, select the type of file from the dropdown list, and click the Browse button to select a file that you want.
- To enter the fields sequentially, select the Enter the Fields Sequentially radio button and click Record Definition. This will display you the Record Definition fields.
- Enter the record identifier in the Record Identifier textbox. You can also use a regular expression in this textbox.
Enter a name in the FieldName textbox.
In the Match Pattern textbox, enter the pattern against which you want to match the record.
You can use regular expression in the Match Pattern textbox. For example, if in the data file there is a Company Name field that can have value like XYZ SYSTEMS, INC. Since you can use \S as a field separator, the schema will consider XYZ, SYSTEM, and INC as separate fields. But they need to be a part of the same field. To parse this type of data, you can enter \S+ .* (INC\.|INC(ORPORATED)?) in the Match Pattern textbox. This pattern matches the fields that have spaces and end with INC. or INCORPORATED.
- Enter a particular pattern that you want to skip in the Skip Pattern textbox. You can also use regular expression in this textbox.
If you want to parse the data even if the number of fields in the data file are less than the number of fields specified in the schema then check the Allow Less Fields checkbox.
You should use the Allow Less Fields feature only in a source-end schema as it will not work in a target-end schema.
- Enter the minimum and maximum size of the field in the Min Size and Max Size textboxes respectively.
- Enter the position in the Position textbox.
Click the Add Record button to define another type of record format and follow the steps from 10 to 15.
For adding more fields in the same record format, click the Add Row button.
Specify the number and position of the rows that you want to add, in the Number of Rows and Position fields respectively and click the Add Row button to insert rows. You can add a maximum of 99 rows at a time.Advance text schema supports only String data type.
To learn about Advanced Properties refer to the Changing Advanced Properties section.
You can enable quotes handling, by checking the Quotes Handling on checkbox in the Advanced Properties.
For example, if there is a field data record, "Chocolate$20$perpack" and you set a field separator as $ then the schema will ignore the $ even though it's a part of the data.
To avoid such situations you need to put that field within double quote like this, (Chocolate"$"20$perpack").
When you use Quotes Handling on in a schema then regular expression does not work on any field.
The header contains the information about the different record structure of the text file. You may not want to send the header information to the target. To do this, you need to check the Ignore Header checkbox in Advanced Properties. You can use this property only when you use a schema at the source end. This property is not applicable while use it at a target schema.
If there are multiple record separators in an input file, then to parse this input file, you need to check the Look Ahead checkbox in the Advanced Properties.
At times, the input data may contain some characters that are invalid in XML, this results in the abortion of the mapping process. You can filter these invalid XML characters by checking the Filter Invalid XML Characters checkbox in Advanced Properties.- After defining records you need to define their hierarchy. This is mandatory for creating an Advance Text schema.
Once you finish adding the records and defining their record identifiers, click the Refresh button under the Hierarchy Definition. This action will populate the record identifiers in the Record ID field.
- Select a record from the Record ID list box.
- Select Y or N from the required list box, to indicate if you want the current record to be present in the source file.
- In the minoccur field, enter the minimum number of occurrences of the current record that you want in the source file. The minimum allowable value is 1.
- In the maxoccur field, enter the maximum number of occurrences of the current record that you want in the source file. The maximum allowable value is 2147483647.
You can now create a record either at the root level or at the child level.
Using Record at Root Level
Steps to use a Record at the Root Level
- Click the Add Root Record button. This action will create a record at the same level as that of the current record.
- Select a record from the Record ID field of the current record and enter all the information.
Select a record from the Record ID list box of the root record and repeat step 1-2 to create another root Record.
Each root Record must have a unique Record ID. For example, if you select [0-9][8,8]+[D] as the first root record, then you need to select Agent in the next root record.
Using Record at Child Level
To create a record at the Child Level
Click the Add Child button. This creates a record at a level below that of the current record.
- Select a record in the Record ID list box of the current record and enter all the information.
Select a record from the Record ID list box of the root record and repeat step 1-2 to create a child record.
A parent and child record must have a unique Record ID in one hierarchy. For example, if you select [0-9][8,8]+[D] as the parent record, then you need to select Agent as the child record. Similarly, if you create another child record under Agent, then you need to select Officer as its Record ID.
Select a record and click the Remove button to remove it. This will display a confirmation delete message, click OK to delete the record.
Alternately, if you delete the last field name of a record and then save the schema, this action will delete the entire field. For example, a record has three fields – Name, Description, and Age. If you delete Age, save the schema, this action will delete the entire field.
If a record has one or more child record, deleting the parent record will delete all its child record too.- Click the Save button.