Section 10.2.2 Regex


A regex, or regular expression (sometimes called a rational expression), is a sequence of characters that defines a search pattern, mainly for use in string matching. Regular expressions are a generalized way to match patterns in sequences of characters. Data that commonly follows such a pattern includes dates, phone numbers, and Social Security numbers. WIB™ has a library of regular expressions that can be ingested into a Taxonomy. We suggest you have a general knowledge of regular expressions before using this feature of WIB™, or contact us for professional services to set up data extraction using regular expressions.
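To make the idea of a pattern concrete, here is a minimal sketch in Python. The three patterns are illustrative assumptions and are not taken from the WIB library. Python's re module is not PCRE, but the syntax used here is common to both.

```python
import re

# Illustrative patterns only; these are NOT the patterns shipped in the WIB library.
PATTERNS = {
    "date": r"\b\d{2}/\d{2}/\d{4}\b",    # e.g. 01/31/2024
    "phone": r"\(\d{3}\)\s?\d{3}-\d{4}",  # e.g. (123)456-7890
    "ssn": r"\b\d{3}-\d{2}-\d{4}\b",      # e.g. 123-45-6789
}

def find_all(text):
    """Return every match of each named pattern found in the text."""
    return {name: re.findall(pattern, text) for name, pattern in PATTERNS.items()}

sample = "Call (123)456-7890 by 01/31/2024. SSN on file: 123-45-6789."
results = find_all(sample)
```

Real documents usually need looser patterns than these (for example, dates with single-digit months), so treat the patterns as a starting point for testing.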

Section 10.2.2.1             Overview

The Overview explains Regular Expressions (Regex) and how they are created in WIB Review.

Section 10.2.2.2            New

Allows you to create a new regex from the navigation tree.

Section 10.2.2.2.1                 Regex Name

Name the regex. Make sure the name is short but descriptive.

Section 10.2.2.2.2                Regex Description

Use the description to supply more insight about the regular expression.

Section 10.2.2.2.3                Text Type

The OCR engine produces OCR results in two (2) formats: Standard and Line Recognition. The end user can choose which OCR results are displayed in the Text tab of the Image Viewer.


 

Section 10.2.2.2.3.1                      Standard OCR Results

    The standard/default algorithm of the OCR Engine is used to produce the OCR results.

Section 10.2.2.2.3.2                      Line Recognition Results

    An algorithm that attempts to detect each line of text and present the results as such.

Section 10.2.2.2.4                Regex Pattern

Enter the regex pattern for WIB Review to match in the OCR. If you are not familiar with regular expressions, contact us at support@radixdata.com and we can build a regex for you. If your IT department has staff familiar with regex, you can also ask them for help.

We have provided a link to Regex 101 so you can create and debug your regular expressions (patterns) before adding them to your project, or you can use the Regex Tester in WIB™ Review.

Section 10.2.2.2.5.1     Flavor

Select Perl Compatible Regular Expressions (PCRE (PHP <7.3)) as the flavor to build and test your regular expression. This is the flavor recognized by WIB Review. Refer to the PCRE Cheat Sheet in the Appendix for a quick reference to metacharacters and a description of what each one means.

Section 10.2.2.2.6                Capture Group(s) Replacement

In some cases a regular expression must include a character or word to make the pattern unique, even though that character or word is not wanted in the attribute value. Capture Group Replacement lets you write the regular expression with the replace feature so that the unwanted characters are left out of the result.

Example: Phone number containing parentheses and dashes.

(123)456-7890
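The source does not show how the replacement is entered in WIB Review, so the following is only a sketch of the underlying idea in plain Python. It assumes the desired result is the ten digits without punctuation. The pattern and the function name digits_only are hypothetical.

```python
import re

# Three capture groups isolate the digits. The literal parentheses and the
# dash are required for the match but sit outside the groups.
PHONE = re.compile(r"\((\d{3})\)(\d{3})-(\d{4})")

def digits_only(text):
    """Rewrite each matched phone number using only its capture groups."""
    return PHONE.sub(r"\1\2\3", text)

cleaned = digits_only("Contact: (123)456-7890")
```

The punctuation keeps the pattern specific enough to avoid matching arbitrary ten-digit runs, while the capture groups keep it out of the extracted value.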


Section 10.2.2.2.7                Attribute Assignment

Assign the extracted value to an attribute using the drop-down menu. You must specify a capture group for the extracted value. If you are not using capture groups, the default value is {0}. If you are using capture groups, specify the capture group needed for the extraction. Capture groups must be enclosed in curly brackets {}.
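The sketch below shows one way a {n} format string could be resolved against a match. It is not WIB Review's implementation. It assumes {0} refers to the whole match, as group 0 does in PCRE, and the render helper and the invoice pattern are hypothetical.

```python
import re

# Matches placeholders such as {0} or {1} in a format string.
PLACEHOLDER = re.compile(r"\{(\d+)\}")

def render(format_string, match):
    """Replace each {n} with the text of capture group n from the match."""
    return PLACEHOLDER.sub(lambda p: match.group(int(p.group(1))), format_string)

m = re.search(r"Invoice\s+No\.\s*(\d+)", "Invoice No. 48213 dated 01/31/2024")
whole = render("{0}", m)   # group 0: the entire matched text
number = render("{1}", m)  # group 1: only the digits inside the parentheses
```

With this pattern, {0} would place the label and the number in the attribute, while {1} would place only the number.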


Section 10.2.2.2.8                How to Create a Match-Only Value

Set a fixed value for an attribute when a pattern is found. The system does not extract the value from the OCR. Instead, it matches the pattern and places the Match-Only value in the attribute. For example, if the regex finds an email address pattern, it can set the document type to ‘Email’ without extracting the email address. Enter the text “email” instead of the capture group in the format string field for the Attribute Assignment.
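As a rough illustration of match-only behavior, the sketch below returns a fixed label when a pattern is found and nothing otherwise. The email pattern is deliberately simplified, and the match_only helper is a hypothetical name, not part of WIB Review.

```python
import re

# Simplified email pattern for illustration; real addresses can be more varied.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

def match_only(text, pattern, value):
    """Return the fixed value if the pattern occurs in the text, else None."""
    return value if pattern.search(text) else None

doc_type = match_only("From: jane.doe@example.com", EMAIL, "Email")
no_type = match_only("No address in this text", EMAIL, "Email")
```

The matched address itself is never returned. Only the presence of the pattern decides whether the attribute receives the fixed value.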


Section 10.2.2.2.9                   Custom Formatter

The Custom Formatter lets you define the output format for a capture group.

Section 10.2.2.2.9.1                         Capitalization Formatter

You can set the formatter to write the capture group results into specific capitalization formats.

Upper: all alphabetic characters are set to upper case. To format a capture group as upper case, use the syntax {0:U}, where zero (0) is the capture group number, followed by a colon and a capital U for upper case.

Lower: all alphabetic characters are set to lower case. To format a capture group as lower case, use the syntax {0:L}, where zero (0) is the capture group number, followed by a colon and a capital L for lower case.

Title: the first alphabetic character of each word is set to a capital letter. To format a capture group as title case, use the syntax {0:T}, where zero (0) is the capture group number, followed by a colon and a capital T for title case.
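The following sketch shows how the {n:U}, {n:L}, and {n:T} syntax could be interpreted. It is an assumption-laden illustration, not WIB Review's code. The render and title_case helpers are hypothetical, and the exact title-case rules WIB applies (for example, around apostrophes) are not stated in the source.

```python
import re

# Matches {n} with an optional :U, :L, or :T suffix.
PLACEHOLDER = re.compile(r"\{(\d+)(?::([ULT]))?\}")

def title_case(text):
    """Capitalize the first character of each whitespace-separated word."""
    return re.sub(r"\S+", lambda w: w.group(0).capitalize(), text)

# Maps the formatter letter to a function; None means no formatter was given.
CASE = {"U": str.upper, "L": str.lower, "T": title_case, None: lambda s: s}

def render(format_string, match):
    """Replace each placeholder with its capture group, applying any formatter."""
    def substitute(p):
        value = match.group(int(p.group(1)))
        return CASE[p.group(2)](value)
    return PLACEHOLDER.sub(substitute, format_string)

m = re.search(r"Name:\s*(.+)", "Name: jOHN q. smith")
upper = render("{1:U}", m)
lower = render("{1:L}", m)
title = render("{1:T}", m)
plain = render("{1}", m)
```

Here the three formatted results would be JOHN Q. SMITH, john q. smith, and John Q. Smith, while the unformatted placeholder leaves the captured text unchanged.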

Section 10.2.2.3            Regular Expression Test Area

You can test a regex before implementing it in a Taxonomy. There are two test features: 1) testing with sample text and 2) testing with live data.

Section 10.2.2.3.1                  Sample Text Test

Type or paste the text you want to test the regular expression against into the test area, then select Test. The results of the test are shown below the sample text area.

Section 10.2.2.3.2                 Live Data Test

Before using the Live Data Test, it is best practice to identify a Project, Collection, and Session that contain data matching the pattern.

The following figures run the same regular expression test as above against live data. The highlighted section shows the same result as the sample text test.

 

Section 10.2.2.4            Catalog

You can refer to the Catalog to see a list of regular expressions, their properties, and how they are used to automate data extraction.

Section 10.2.2.5            My Regexes

A navigation tree that lists each of the regexes used in your workspace. You can edit a regex by selecting it from the navigation tree.

Section 10.2.2.6            WIB Library – Regex

The library holds commonly used regular expressions. You can add regexes from the library to your projects.

Section 10.2.2.6.1                Overview

Explanation of regexes and how they are used to extract data from the OCR.

Section 10.2.2.6.2                Catalog

A grid view of all lists in the library and the properties that define each list.

Section 10.2.2.6.3                Save to Library

This feature is only available to Radix Data System Administrators. Please send an email to support@radixdata.com if you would like us to curate a list for your industry/department and add it to the library.

 

