How do I create an import filter?

A 'filter' is a script file that contains a description of how to translate a CSV file, or any text file, into an equivalent import package.

Some example filter scripts are installed with Match-IT. They can be found in your ...\match_it\configs folder.

One example has the name sagestok.fil. This example will translate the standard Sage Sterling stock file CSV report (stock.csv) into a form acceptable to Match-IT.

Another example is sagebom.fil. This will take the Sage Stock Explosion report and translate it into a form that adds the bill of materials information to the stock records created by the first example.

Note: The transformations possible using these import filters are very simple. When complex transformations are required, use a Lua script.

An import filter is a text file. You can create it using a text editor (such as Notepad or WordPad) or a word processor, providing you save the document in text form. When Match-IT is installed a suitable text editor, called FoldIt, is placed in your ...\match_it\sys directory.

To be understood by the import filter the content of the filter file must conform to a syntax.

The required syntax is defined here in BNF (Backus-Naur Form):

CommentLine ::= comment lines begin with '!' and can appear anywhere and are ignored
PackageGenerationScript ::=IssueLine [ NamesSpec ] { PacketSpec }
IssueLine ::='ISSUE:' SystemIssueId (just passed through to the package script)
NamesSpec ::= 'NAMES:'
       { 'F#' Number '=' Name [ '!' description ]}
'ENDNAMES:'
Number is any positive integer. Name is an arbitrary string provided it begins with a letter (a-z) and does not contain spaces or any of the other special characters (?, !, =, [, ~, :). Description is ignored and extends to the end of the line. The NAMES: section is optional, if present it provides a means to give field numbers a meaningful name. The use of F#Name or F#Number anywhere in the script becomes synonymous.
PacketSpec ::='PACKET:' PacketName
       [ StreamSpec ]
       { LookupSpec }
       { RecordSpec }
'ENDPACKET:'
PacketName ::=an identifying name for the packet, eg. Sage Stock File
StreamSpec ::='STREAM:' SNumber
       { IgnoreSpec }
       { FieldSpec }
       [ GroupSpec ]
'ENDSTREAM:'
IgnoreSpec ::='IGNORE:' Condition
GroupSpec ::='GROUP:' GNumber 'WHEN:' Condition {‘OR:’ Condition | ‘AND:’ Condition}
       { CommentLine | IgnoreSpec}
       { FieldSpec }
'ENDGROUP:' Condition {‘OR:’ Condition | ‘AND:’ Condition}
FieldSpec ::=FixedField | CSVField
FixedField ::='F#' FNumber '[' FieldSize ']'
CSVField ::='F#' FNumber [',']
LookupSpec ::='LOOKUP:' LNumber 'IN:' (FileId|'SCRIPT') ['FIX:' FieldName] ['WHEN:' Condition] {‘OR:’ Condition | ‘AND:’ Condition}
       { FieldNameSpec }
       { RecordImage }
'ENDLOOKUP:'
RecordSpec ::='RECORD:' (FileId|DropId) 'SEGMENT:' SegmentName [Options] ['WHEN:' Condition] {‘OR:’ Condition | ‘AND:’ Condition}
       { FieldNameSpec }
       { RecordImage }
'ENDRECORD:'
FieldNameSpec ::='FIELDS:' FieldName { ',' FieldName }
FieldName ::=the name of a field in the associated file. The order of the field names given defines the order they are being given in the RecordImage. This need not be the same as the physical file order. See 'importing files'. In the context of a 'SCRIPT' LOOKUP: the field name list is ignored.
FileId ::=one of the system file TLAs
DropId ::=one of the system DO operation names
SegmentName ::=the name of the segment in the package script that the records are to be generated into. If the segment does not exist, it is created. The records generated are added to the end of the segment. A 'segment' refers to the records between a start and end pattern. These are the patterns that will be passed to gxImportPackage. The filter function creates start and end patterns and writes the appropriate IMPORT script lines when it generates the translated records. The start pattern is:
'{{{  (FileId|DropId):SegmentName:Part#'
and the end pattern is always '}}}'. I.e. plain-text FoldMarks.
(Part# is 1..N and is used to break up large segments)
Options ::=options as required on the IMPORT line in an import package
Condition ::='[' FieldValue ']' WhenOpCode [ '[' MatchPattern ']' ]
FieldValue ::=[FieldContext ':'] FieldRef
FieldRef ::='F#' FNumber [',' FieldSymbol] (the value of FieldSymbol in field FNumber)
'R#' GNumber (the record number within the Group defined by GNumber)
(if no group is defined it means the overall record number)
'F#0' (the whole input record)
FieldContext ::='S#' SNumber  (field is in stream number SNumber)
'L#' LNumber  (field is in lookup number LNumber)
'L#0'         (field is a pre-defined value)
Omitted implies S#1.
FieldSymbol ::=[ WordSeparator ] FieldSymbolNo
WordSeparator ::=The character to consider as the element separator, omitted=comma (,)
The ? ! ] - and space characters *cannot* be used. To use one of these use Q for ?, E for !, C for ], M for - and S for space.
FieldSymbolNo ::=MakePacked field element, first is 1. Omitted means all symbols.
A positive number means extract that symbol.
Eg. "Sterling,1.54" symbol 1 is "Sterling", symbol 2 is "1.54"
A negative number means strip up to and including that symbol and return all remaining symbols.  Eg. For "X,Y,Z" symbol -1 is "Y,Z"
WhenOpCode ::='==' | '<>'  (equal, not equal)
'<<' | '>>'  (contains, does not contain)
'!<' | '>!'  (starts with, ends with)
'++'         (changed, the field value has changed from the previous record, MatchPattern must be [])
MatchPattern ::=Any string, case is not significant, Clarion '<##>' notation is allowed.
Leading and trailing single quotes are ignored. This is useful for patterns that        require leading or trailing spaces. E.g. ['  '] is a string of 2 spaces, but [  ] is an empty string.
RecordImage ::={ Text | FieldText }
FieldText ::='[' FieldValue FieldOpCode [ Text | '[' FieldValue ']' ] ']'
FieldOpCode ::='?' | '!' | '=[' text ']' | '~[' text ']'
(not blank and blank and equal a constant and not equal a constant)
(? = Text or FieldValue if FNumber has a non-blank and non-0 value, else blank)
(! = Text or FieldValue if FNumber has a blank or 0 value, else blank)
(=[text] = Text or FieldValue if FNumber has a value of 'text', else blank)
(~[text] = Text or FieldValue if FNumber has a value that is NOT 'text', else blank)
(Note: [F#nnn?F#nnn] is equiv to [F#nnn] and F#nnn=[] is equiv to F#nnn! And F#nnn~[] is equiv to F#nnn?)
(In a FieldText element, spaces are only allowed within the Text portion)

The meaning of each of these syntax elements is:

PackageGeneration Script:

A package generation script consists of an ISSUE line followed by any number of PacketSpecs.

CommentLine:

Comments can be placed almost anywhere to aid the readability. They are ignored by the filter function.

IssueLine:

The ISSUE line defines which issue of the Match-IT system the filter script is intended for. This is just passed through to the package script created by the filter.

PacketSpec:

A packet spec consists of a PACKET line followed by a StreamSpec, followed by any number of record or lookup specs, followed by an ENDPACKET line. The entire packet spec is processed for each CSV record. Each record spec represents a potential Match-IT record that can be created from a CSV record.

PacketName:

The packet name is used to identify which packet is to be processed when the filter script is run.

StreamSpec:

A StreamSpec consists of a STREAM line followed by up to 16 IgnoreSpec lines, followed by a FieldSpec line, followed by an optional GroupSpec, followed by an ENDSTREAM line. A stream spec defines how to interpret the CSV Records and Text Records presented to the filter. Each IgnoreSpec identifies lines that should be ignored. The FieldSpec identifies the field boundaries within the record. The GroupSpec defines the Group structure (if present). The fields defined in the FieldSpec at this level define the group header fields. These are given field numbers from 1. These are prepended to each group line detected.

IgnoreSpec:

An ignore spec defines a Condition for those records that should be ignored. If the condition is met the record is discarded and no information is extracted from it. There can be up to 16 ignore specs in total. Those defined before the GroupSpec apply only to the group header lines. Those defined within the group apply only to the group detail lines.

GroupSpec:

A group spec consists of a GROUP line followed by up to 16 IgnoreSpec lines followed by a FieldSpec line, followed by an ENDGROUP line. A group spec defines how to interpret the lines within a Group. A Group is all the lines between, and including, the lines that meet the Condition on the GROUP line and the Condition on the ENDGROUP line. The fields defined in the FieldSpec at this level define the group detail fields. These are given field numbers that follow on from the group header fields defined in the StreamSpec. For each group line, the header fields are prepended. A typical use for this is to break up a file that contains sub-sections, e.g.

 SubSection Header Line
         SubSection Detail Line 1
         SubSection Detail Line 2

         ...
         SubSection Detail Line N

Can be broken up into:

 SubSection Header Line        SubSection Detail Line 1
 SubSection Header Line        SubSection Detail Line 2

 ...
 SubSection Header Line        SubSection Detail Line N

Note: The 'WHEN' condition of a group start is referencing stream fields, but the 'WHEN' condition of a group end is referencing group fields.

FieldSpec:

A field spec defines the boundaries of a field in an input record. A field can be either a FixedField or a CSVField. The reference point for the start of each field is the end of the previous field. Thus: F#1,F#2[16] means field 1 extends to a separator, and field 2 is the next 16 characters. They must be defined in numerical order.

FixedField:

A fixed field is a field that is defined by a fixed number of characters. The number between the square brackets [] defines the number of characters occupied by the field. The start of the field is defined as the end of the preceding field, irrespective of the type of the previous field. Thus: F#1,F#2[22] defines that field 2 is 22 characters following on from wherever field 1 stopped. Similarly, F#1[10]F#2[22] defines that field 2 starts at position 11 and occupies the next 22 positions.

CSVField:

A CSV field is defined to extend from the end of the previous field to the next separator. Thus: F#1[10]F#2,F#3 defines that field 2 starts at position 11 and extends to the next separator, and field 3 starts from there, and so on.

LookupSpec:

A LookupSpec consists of a LOOKUP line, followed by any number of FieldNameSpec lines, followed by any number of RecordImage lines, followed by an ENDLOOKUP line. A LookupSpec where a 'FileId' is given defines a record to be looked up in the Match-IT database. The RecordImage must load the appropriate key fields. There may be multiple lookups on the same file. Fields looked up may be referenced in the RECORD: clause using the 'L#' context notation. Fields are referenced in the order specified in the field name list, e.g. L#?:F#1 is the first L#?:F#2 is the second, etc. The first fields in the list must match some key in the file. If no FIX: clause is present, the LOOKUP is exact, i.e. an exact match must be found. If FIX: is present, the lookup will find an exact match for the fields up to and including the FIX: field, then the nearest match for the rest. (It uses a zfSet/zfNext pair.) Lookups are performed in the order they are defined. A failed lookup leaves all fields blank when referenced. All lookups are done in lookup number order and before any record image is created. The lookup is only performed if the WHEN: conditions are met. If the conditions are not met, all lookup fields are blank when referenced.

The lookup number must be 1 or greater. The lookup number 0 is a special case used to access pre-defined values. There is a fixed reportoire of pre-defined values. As of 30/01/03 this reportoire is:

 F#1 = today's date
 F#2 = the time now
 F#3 = logged in user

A LookupSpec where the 'SCRIPT' option is given just evaluates fields for easier reference elsewhere. This is useful to minimise repetition and centralise 'knowledge'.

An example use might be to encapsulate things like: '[F#3][F#3![F#4]]'. These evaluations can also be cascaded, so they can be used to evaluate multiple choices.

Example:

LOOKUP:1 IN: SCRIPT
   [F#1=[X]Yes][F#1=[Y]No]
ENDLOOKUP:

!L#1:F#1 is now either 'Yes' or 'No' or blank.

LOOKUP:2 IN: SCRIPT
   [L#1:F#1][L#1:F#1!Illegal]
ENDLOOKUP:

!L#2:F#1 is now either 'Yes' or 'No' or 'Illegal'

RecordSpec:

A record spec consists of a RECORD line, followed by any number of FieldNameSpec lines, followed by any number of record images followed by an ENDRECORD line. Each record spec represents a record in one of the Match-IT files. A record spec is only considered if its 'WHEN:' conditions are TRUE. If no conditions are given then TRUE is assumed. The conditions are considered strictly in sequence. OR:'s are considered in turn, AND:s are considered until FALSE is detected. An FNumber is referring to the number of the field in the source record. The first is field 1. The whole input record is field 0.

FieldNameSpec:

A FieldNameSpec defines the logical order of the fields in a Match-IT database for the purposes of the import filter. It need not be the same as the physical order. Each name given must correspond to a name of a field in the associated file. The order the names are given defines the order they are being created in the RecordImage. See below for a full description of a field name spec. When the record is constructing a DO operation parameter list, the FieldNameSpec must still be present, with one name per parameter, but the actual names are arbitrary.

FileId:

A file ID is a short hand name for one of the Match-IT files. They consist of three letters. The list of valid file IDs is available within the on-line help system.

DropId

A drop Id is the name of a DO operation. The list of valid DO operations is available within the on-line help system.

SegmentName:

This is a name, for reference purposes within the script, of a group of related records. Every record created as a result of the associated record spec will be placed in the same segment. The import system will treat each segment as an indivisible unit for importing purposes. The whole segment or none of it will be imported.

Options:

These are just passed through to the package script. However, if the [DUPS] option is set, the filter will only add records to the associated segment if there isn’t  an identical record there already. The only significance of this is to save memory space while the filter script is being processed.

Condition:

A condition consists of a Field Value, an operation and a MatchPattern. If the match pattern matches the field value referenced according to the operation, the condition is TRUE. Otherwise it is FALSE. This is used to control records that are ignored, group boundaries and records created. The operation defines the test to perform.

FieldValue:

A field value is a reference to a field from the current input or a lookup, or a special value. The FieldContext defines where the field value is to come from. The FieldRef defines which field is being referenced. If the context is omitted, S#1 is assumed.

FieldRef:

A field ref. identifies which field in the context is being referenced. The first field in a record is field 1, the next field 2 and so on. The Fnumber indicates which field is being referenced. The [F#nnn] string is replaced by the current value of the field. If a FieldSymbol is present then only that element of the field is being referenced.

FieldContext:

This defines where the field is to come from. It can be either from the input stream, designated by ‘S#’, or a lookup, designated by ‘L#’.

FieldSymbol:

A field symbol identifies an element within the field. These elements correspond to the Match-IT LType elements. It is most often useful when taking looked-up fields apart, for example, extracting the units from a measure.

WhenOpCode:

This defines the type of match to perform on the match pattern. The possibilities are:
 ‘==‘        the field value must be the same as the pattern
 ‘<>‘        the field value must be different to the pattern
 ‘<<‘        the field value must contain the pattern somewhere
 ‘>>‘        the field value must not contain the pattern anywhere
 ‘!<‘        the field value must start with the pattern
 ‘>!’        the field value must end with the pattern.
 ‘++’        the field value must have changed form the last record

MatchPattern:

This is the pattern to match against the associated field value. The case of letters is ignored. Any leading and trailing single quote characters (‘) are removed. The MatchPattern for the ‘++’ operation must be [].

RecordImage:

The record image contains the fields of the associated record that are to be used to create, or lookup, a Match-IT file record. It consists of any number of Text strings or FieldText values. Each can either be placed in the image unconditionally or only when a specified field has a value or does not have a value. An implied package separator exists between each line of a record image. Thus:
 [F#1]
 [F#2]
is equivalent to:
 [F#1] | [F#2]

The changes made when constructing the image are: to replace FieldNumber strings with the current value of the field; to remove leading spaces from the line; and to translate sub-strings of the form '<##>' into the ASCII char that has the code '##'.

FieldText:

This represents a ‘conditional’ field. The first FieldValue defines the field whose value is to be checked. The FieldOpCode defines the type of check to apply. If the condition is ‘TRUE’ then the second FieldValue is placed in the record image, otherwise nothing is placed in the image.

FieldOpCode:

This defines the type of check to perform for a conditional field. A ‘?’ means the field condition is only ‘TRUE’ when the preceding FieldValue has a value. A ‘!’ means the opposite. The condition is ‘TRUE’ only when the associated FieldValue does not have a value. An ‘=[text]’ means the field condition is only TRUE when the preceding FieldValue has the value ‘text’, A ‘~[text]’ means the opposite to ‘=’, i.e. it’s TRUE when the field is not equal to ‘text’.

Refer to the examples, given later, to see how this syntax is used.

The overall processing loop of a filter is:

FOR each input record

   EXTRACT 'header' fields

   FOR each record that passes the stream ignore specs

     IF a group is defined

       SEARCH for group 'when' record

       EXTRACT 'detail' fields and append to the 'header' fields

       FOR each record that passes the group ignore specs and is not the endgroup 'when'

         FOR each lookup defined

           BUILD record image

           LOOKUP record

         ENDFOR

         FOR each record defined

           BUILD record image

           ADD record

         ENDFOR

       ENDFOR

     ELSE no group is defined

       FOR each lookup defined

         BUILD record image

         LOOKUP record

       ENDFOR

       FOR each record defined

         BUILD record image

         ADD record

       ENDFOR

     ENDIF

   ENDFOR

 ENDFOR