Appendix F: Regular Expressions

A regular expression (regex) is a pattern of characters that describes a set of strings. The simplest form of a regular expression is a literal string, such as 'Adeptia' or 'America'. You can use regular expression matching to test whether a string follows a specific syntax, such as a house number. One possible regular expression for a house number is:

[1-9][a-zA-Z0-9]

The first character must be a digit from 1 to 9. The second character can be an uppercase letter, a lowercase letter, or a digit from 0 to 9.
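
The house number pattern above can be tried out with a short script. This is a minimal sketch that assumes Python's standard `re` module as the regex engine; the document does not say which engine is used, and the helper name `is_house_number` is hypothetical. It uses `fullmatch` so that the whole string, not just a part of it, must fit the pattern.

```python
import re

# Pattern from the text: one digit from 1 to 9, then one letter or digit.
HOUSE_NUMBER = re.compile(r"[1-9][a-zA-Z0-9]")


def is_house_number(text: str) -> bool:
    """Return True only if the entire string fits the pattern."""
    # fullmatch requires the whole string to match, not just a prefix.
    return HOUSE_NUMBER.fullmatch(text) is not None


print(is_house_number("4B"))   # True
print(is_house_number("12"))   # True
print(is_house_number("0A"))   # False: the first character is 0
print(is_house_number("4B7"))  # False: three characters, pattern allows two
```

Note that this pattern accepts exactly two characters, so it is an illustration of the syntax rather than a complete house number validator.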

The following characters have special meanings in a regex:

Character                       Description
caret ^                         The caret is the anchor for the start of the string. As the first character inside a character class, it negates the class. For example, '^a' matches 'a' at the start of the string, and '[^a]' matches any single character except 'a'.
dollar sign $                   The dollar sign is the anchor for the end of the string. For example, 'b$' matches 'b' at the end of the string.
period or dot .                 The dot matches any single character (in most regex engines, any character except a line break). For example, '.a' matches two consecutive characters where the last one is 'a'.
vertical bar or pipe symbol |   The vertical bar separates a series of alternatives. For example, '(a|b|c)a' matches 'aa' or 'ba' or 'ca'.
question mark ?                 The question mark makes the preceding token in the regular expression optional. For example, 'colou?r' matches both 'colour' and 'color'.
asterisk or star *              The asterisk is the match-zero-or-more quantifier. For example, '^.*$' matches an entire line.
plus sign +                     The plus sign matches one or more occurrences of the preceding token. For example, 'a+b' matches 'ab' and 'aaab'.
parentheses ( )                 The opening and closing parentheses are used for grouping characters (or other regexes).
square brackets [ ]             The opening and closing square brackets define a character class that matches a single character. For example, '[a-z]' matches any lowercase letter of the alphabet.
curly braces { }                The opening and closing curly braces are used as range quantifiers. For example, 'a{2,3}' matches 'aa' or 'aaa'.
backslash \                     The backslash gives special meaning to the character following it. For example, '\n' stands for the newline. When the backslash precedes a special character, it escapes that character so that it is treated as a literal. For example, 'a\+' matches 'a+' and not a series of one or more 'a's.
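
The examples in the table can be checked with a small script. This is a sketch under the assumption that Python's standard `re` module is the regex engine (the document does not name one); the `examples` list and the `matches` helper are hypothetical names introduced for illustration. Each entry pairs a pattern with a test string and the result expected from the descriptions above.

```python
import re

# (pattern, text, whether the pattern is expected to be found in the text)
examples = [
    (r"^a", "abc", True),        # caret: 'a' at the start of the string
    (r"^a", "bac", False),       # 'a' is present but not at the start
    (r"[^0-9]", "7", False),     # caret inside a class negates it
    (r"b$", "ab", True),         # dollar: 'b' at the end of the string
    (r".a", "ba", True),         # dot: any character followed by 'a'
    (r"(a|b|c)a", "ca", True),   # pipe: one of the alternatives, then 'a'
    (r"colou?r", "color", True),   # question mark: 'u' is optional
    (r"colou?r", "colour", True),
    (r"a+b", "aaab", True),      # plus: one or more 'a', then 'b'
    (r"a+b", "b", False),        # at least one 'a' is required
    (r"a{2,3}", "aa", True),     # braces: two or three 'a's
    (r"[a-z]", "A", False),      # class: lowercase letters only
    (r"a\+", "a+", True),        # backslash: '+' is matched literally
    (r"a\+", "aaa", False),      # no literal '+' in the text
]


def matches(pattern: str, text: str) -> bool:
    """Return True if the pattern is found anywhere in the text."""
    return re.search(pattern, text) is not None


for pattern, text, expected in examples:
    result = matches(pattern, text)
    status = "ok" if result == expected else "UNEXPECTED"
    print(f"{pattern!r} on {text!r}: {result} ({status})")
```

The script uses `re.search`, which looks for the pattern anywhere in the text, so the anchors '^' and '$' are what tie a match to the start or end of the string.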

Note

To use any of the above characters as a literal in a regex, you need to escape it with a backslash.

For example, if you want to match the text p[ai]nt (the letter p, followed by ai enclosed in square brackets, followed by the letters nt), the correct regex is p\[ai\]nt. Otherwise, the square brackets keep their special meaning and define a character class.
Note that p[ai]nt, with the backslashes omitted, is still a valid regex, so you won't get an error message. However, it doesn't match p[ai]nt. It matches pant or pint instead.
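
The difference between the escaped and unescaped patterns can be seen in a short script. This is a sketch that again assumes Python's standard `re` module as the engine; the variable names are illustrative. It also shows `re.escape`, a helper in that module that adds the backslashes automatically, which may be convenient when the literal text is not known in advance.

```python
import re

literal = "p[ai]nt"

escaped = re.compile(r"p\[ai\]nt")   # brackets are literal characters
unescaped = re.compile(r"p[ai]nt")   # brackets define a character class

print(bool(escaped.fullmatch(literal)))    # True
print(bool(escaped.fullmatch("pant")))     # False
print(bool(unescaped.fullmatch(literal)))  # False: no error, just no match
print(bool(unescaped.fullmatch("pant")))   # True
print(bool(unescaped.fullmatch("pint")))   # True

# re.escape inserts the backslashes needed to match the text literally.
auto_pattern = re.escape(literal)
print(bool(re.fullmatch(auto_pattern, literal)))  # True
```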