A regular expression, or regex (sometimes called a
rational expression), is a sequence of characters that defines a search pattern,
mainly for use in pattern matching with strings. Regular
expressions are a generalized way to match patterns in sequences of
characters. Dates, phone numbers, and Social Security numbers are examples of
data that follow a pattern a regular expression can match. WIB™ has a library of
Regular Expressions that can be ingested into a Taxonomy. We suggest you have a
general knowledge of Regular Expressions before using this feature of WIB™, or contact
us for professional services to set up data extraction using Regular Expressions.
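As a rough illustration of what such patterns look like, the sketch below uses Python's re module rather than WIB™ itself. The pattern names, the patterns, and the sample text are assumptions chosen for illustration; they are not entries from the WIB™ library, and real-world dates and phone numbers come in more formats than these patterns cover.

```python
import re

# Hypothetical patterns for illustration only; not taken from the WIB library.
PATTERNS = {
    "date": r"\b\d{1,2}/\d{1,2}/\d{4}\b",   # e.g. 03/15/2021
    "phone": r"\(\d{3}\)\s?\d{3}-\d{4}",    # e.g. (123)456-7890
    "ssn": r"\b\d{3}-\d{2}-\d{4}\b",        # e.g. 123-45-6789
}

def find_all(text):
    """Return every match for each named pattern in the text."""
    return {name: re.findall(p, text) for name, p in PATTERNS.items()}

sample = "Invoice dated 03/15/2021. Call (123)456-7890. SSN 123-45-6789."
results = find_all(sample)
print(results)
```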
Section 10.2.2.1 Overview
The Overview explains Regular Expressions (Regex) and how they are created in WIB Review.
Section 10.2.2.2 New
Allows you to create a new regex from the navigation tree.
Section 10.2.2.2.1 Regex Name
Name the regex. Make sure the name is short but descriptive.
Section 10.2.2.2.2 Regex Description
Use the description to supply more insight about the regular expression.
Section 10.2.2.2.3 Text Type
The OCR engine produces OCR results in two (2) formats: Standard and Line Recognition. The end user can choose which OCR results are displayed in the Text tab of the Image Viewer.
Section 10.2.2.2.3.1 Standard OCR Results
The standard/default algorithm of the OCR Engine is used to produce the OCR results.
Section 10.2.2.2.3.2 Line Recognition Results
An algorithm that attempts to detect each line of text and present the results as such.
Section 10.2.2.2.4 Regex Pattern
Enter the Regex Pattern for WIB Review to match in the OCR. If you are not familiar with Regular Expressions, contact us at support@radixdata.com and we can build a regex for you. You can also ask your IT department for help if it has staff familiar with regex.
Section 10.2.2.2.5 Regex 101 Link
We have provided a link to Regex 101 so you can create and debug your regular expressions (patterns) before adding them to your project. Alternatively, you can use the Regex Tester in WIB™ Review.
Section 10.2.2.2.5.1 Flavor
Select PCRE (PHP <7.3), the Perl Compatible Regular Expressions flavor, to build and test your regular expression. This is the flavor recognized by WIB Review. Refer to the PCRE Cheat Sheet in the Appendix for a quick reference to the meta-characters and a description of each.
Section 10.2.2.2.6 Capture Group(s) Replacement
In some circumstances a regular expression must include a character or word to make the pattern unique, even though that character or word is not wanted in the attribute value. Capture Group Replacement lets you write the regular expression with capture groups and use the replace feature to keep only the parts you want.
Example: Phone number containing parentheses and dashes.
(123)456-7890
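The idea can be sketched in Python as follows. This is an illustration only: it assumes the desired attribute value is the ten digits alone, and it uses Python's replacement syntax (\1, \2, \3), which may differ from the syntax WIB Review expects. The function name digits_only is hypothetical.

```python
import re

# Three capture groups pick out the digit runs. The parentheses and dash
# are matched (so the pattern stays specific) but sit outside the groups.
PHONE = re.compile(r"\((\d{3})\)(\d{3})-(\d{4})")

def digits_only(text):
    """Replace each matched phone number with its three groups joined."""
    return PHONE.sub(r"\1\2\3", text)

print(digits_only("(123)456-7890"))
```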
Section 10.2.2.2.7 Attribute Assignment
Assign the extracted value to an attribute using the drop-down menu. You must assign a capture group for the extracted value. The default value is {0} if you are not using capture groups. If using capture groups, specify the capture group needed for the extraction. Capture Groups must be enclosed by curly brackets {}.
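A minimal Python sketch of how such a format string could be resolved is shown below. It assumes that {0} stands for the entire match and {1}, {2}, and so on stand for the numbered capture groups, which is the usual regex convention; how WIB Review resolves placeholders internally is not documented here. The function fill_format, the pattern, and the sample OCR text are hypothetical.

```python
import re

def fill_format(pattern, text, format_string):
    """Substitute {n} placeholders with capture group n of the first match.

    Returns None when the pattern does not match.
    """
    match = re.search(pattern, text)
    if match is None:
        return None
    # Replace each {n} with the text of group n; group 0 is the whole match.
    return re.sub(r"\{(\d+)\}",
                  lambda m: match.group(int(m.group(1))),
                  format_string)

ocr_text = "Invoice No: INV-20417 Date: 03/15/2021"
pattern = r"Invoice No:\s*(INV-\d+)"

print(fill_format(pattern, ocr_text, "{0}"))  # whole match
print(fill_format(pattern, ocr_text, "{1}"))  # first capture group only
```

In this sketch, {0} would carry the label "Invoice No:" into the attribute, while {1} keeps only the invoice number.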
Section 10.2.2.2.8 How to create Match-only value
Set a value for an attribute if a pattern is found. The system does not extract the value from the OCR; it matches the pattern and places the Match-Only value in the attribute. For example, if the regex finds an email address pattern, it can set the document type to ‘Email’ without extracting the email address. To do this, enter the text ‘Email’ instead of the capture group in the format string field for the Attribute Assignment.
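The match-only behavior can be sketched in Python as follows. The email pattern is deliberately simplified, and the function name match_only is hypothetical; this shows the logic, not WIB Review's implementation.

```python
import re

# Simplified email pattern, for illustration only.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

def match_only(text, pattern, value):
    """Return the fixed value if the pattern is found, otherwise None."""
    return value if pattern.search(text) else None

doc_type = match_only("From: jane.doe@example.com", EMAIL, "Email")
print(doc_type)
```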
Section 10.2.2.2.9 Custom Formatter
The custom formatter lets you define the output format for a capture group.
Section 10.2.2.2.9.1 Capitalization Formatter
You can set the formatter to write the capture group results into specific capitalization formats.
Upper - all alphabetic characters are set to upper case. To set the capture group format to upper case, use the syntax {0:U}: zero (0) is the capture group number, followed by a colon and a capital U signifying upper case.
Lower - all alphabetic characters are set to lower case. To set the capture group format to lower case, use the syntax {0:L}: zero (0) is the capture group number, followed by a colon and a capital L signifying lower case.
Title - all alphabetic characters are set to title case, in which the first alphabetic character of each word is capitalized. To set the capture group format to title case, use the syntax {0:T}: zero (0) is the capture group number, followed by a colon and a capital T signifying title case.
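The three formatter codes can be sketched in Python as shown below. This is an assumption-laden illustration: the function fill_format is hypothetical, and Python's str.title is used as a stand-in for title case, so edge cases (for example, words containing apostrophes) may not match what WIB Review produces.

```python
import re

# Map each formatter code from the text above to a Python string method.
CASE_FUNCS = {"U": str.upper, "L": str.lower, "T": str.title}

def fill_format(match, format_string):
    """Expand {n} and {n:X} placeholders, X being U, L, or T."""
    def expand(placeholder):
        value = match.group(int(placeholder.group(1)))
        code = placeholder.group(2)  # None when no formatter is given
        return CASE_FUNCS[code](value) if code else value
    return re.sub(r"\{(\d+)(?::([ULT]))?\}", expand, format_string)

m = re.search(r"[A-Za-z]+ [A-Za-z]+", "name: jOHN smith")
print(fill_format(m, "{0:U}"))
print(fill_format(m, "{0:L}"))
print(fill_format(m, "{0:T}"))
```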
Section 10.2.2.3 Regular Expression Test Area
A user can test the Regex prior to implementing it in a Taxonomy. There are two test features: 1) sample text and 2) live data.
Section 10.2.2.3.1 Sample Text Test
Type or paste the text you want to test the regular expression against into the test area and select Test. The results are shown below the sample text area.
Section 10.2.2.3.2 Live Data Test
Before using the Live Data Test, it is best practice to identify a Project, Collection, and Session that contain data matching the pattern.
The following figures show the live data test run with the same regular expression as the sample text test above. The highlighted section shows the same result as the sample test.
Section 10.2.2.4 Catalog
You can refer to the Catalog to see a list of regular expressions, their properties, and how they are used to automate data extraction.
Section 10.2.2.5 My Regexes
The navigation tree lists each of the regexes used in your workspace. You can edit a regex by selecting it from the navigation tree.
Section 10.2.2.6 WIB Library – Regex
The library holds commonly used regular expressions. You can add regexes from the library to your projects.
Section 10.2.2.6.1 Overview
Explanation of regexes and how they are used to extract data from the OCR.
Section 10.2.2.6.2 Catalog
A grid view of all regular expressions in the library and the properties that define each one.
Section 10.2.2.6.3 Save to Library
This feature is only available to Radix Data System Administrators. Please send an email to support@radixdata.com if you would like us to curate regular expressions for your industry/department and add them to the library.