Appendix F: Regular Expressions
A regular expression (regex) is a pattern of characters that describes a set of strings. The simplest form of a regular expression is a literal string, such as 'Adeptia' or 'America'. You can use regular expression matching to test whether a string conforms to a specific format, for example, a house number. A regular expression for a two-character house number can be:
[1-9][a-zA-Z0-9]
The first character must be a digit from 1 to 9. The second character can be an uppercase letter, a lowercase letter, or a digit from 0 to 9.
The following characters have special meanings in a regex:
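As a minimal sketch of how this pattern could be tested, the example below uses Python's `re` module. The choice of Python and the helper name `is_house_number` are assumptions made for illustration; they are not part of the product.

```python
import re

# Pattern from the text: one digit 1-9, then one letter or digit.
HOUSE_NUMBER = re.compile(r"[1-9][a-zA-Z0-9]")


def is_house_number(text: str) -> bool:
    """Hypothetical helper: True only if the whole string fits the pattern."""
    # fullmatch() requires the entire string to match, not just a part of it.
    return HOUSE_NUMBER.fullmatch(text) is not None


print(is_house_number("4B"))   # True: digit 1-9, then a letter
print(is_house_number("12"))   # True: digit 1-9, then a digit
print(is_house_number("0A"))   # False: first character may not be 0
print(is_house_number("123"))  # False: the pattern allows exactly two characters
```

Note that the pattern as written accepts exactly two characters; longer house numbers would need a quantifier on the second character class.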
Character | Description |
---|---|
caret ^ |  The caret is the anchor for the start of the string. As the first character inside square brackets, it negates the character class. For example, '^a' matches 'a' at the start of the string, and '[^a]' matches any character other than 'a'. |
dollar sign $ | The dollar sign is the anchor for the end of the string. For example, 'b$' matches 'b' at the end of the string. |
period or dot . | The dot matches any single character (in most regex engines, any character except a newline). For example, '.a' matches two consecutive characters where the second one is 'a'. |
vertical bar or pipe symbol | | The vertical bar separates a series of alternatives. For example, '(a|b|c)a' matches 'aa' or 'ba' or 'ca'. |
question mark ? | The question mark makes the preceding token in the regular expression optional. For example, colou?r matches both colour and color. |
asterisk or star * | The asterisk is the match-zero-or-more quantifier. For example, ^.*$ matches an entire line. |
plus sign + | The plus sign is the match-one-or-more quantifier for the preceding token. For example, 'a+b' matches 'ab' and 'aaab'. |
opening parenthesis ( | The opening and closing parentheses are used to group characters or sub-expressions. |
closing parenthesis ) | |
opening square bracket [ | The opening and closing square brackets define a character class to match a single character. For example, '[a-z]' matches any single lowercase letter. |
closing square bracket ] | |
opening curly brace { | The opening and closing curly braces are used as range quantifiers for the preceding token. For example, 'a{2,3}' matches 'aa' or 'aaa'. |
closing curly brace } | |
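backslash \ | The backslash gives special meaning to the character following it, or removes the special meaning of one of the characters listed above. For example, '\n' stands for the newline, and '\.' matches a literal dot. |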
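The examples in the table can be checked with a short script. The sketch below assumes Python's `re` module, which treats these basic cases the same way as most regex engines; the helper name `found` and the sample texts are chosen for illustration only.

```python
import re


def found(pattern: str, text: str) -> bool:
    """Hypothetical helper: True if the pattern matches anywhere in the text."""
    return re.search(pattern, text) is not None


# (pattern, sample text, expected result), based on the table's examples
examples = [
    (r"^a", "apple", True),        # 'a' at the start of the string
    (r"^a", "banana", False),      # 'a' is present, but not at the start
    (r"b$", "crab", True),         # 'b' at the end of the string
    (r".a", "ba", True),           # any character followed by 'a'
    (r"(a|b|c)a", "ca", True),     # one of the alternatives, then 'a'
    (r"colou?r", "color", True),   # the 'u' is optional
    (r"colou?r", "colour", True),
    (r"^.*$", "any line", True),   # zero or more of any character
    (r"a+b", "aaab", True),        # one or more 'a', then 'b'
    (r"a+b", "b", False),          # at least one 'a' is required
    (r"[a-z]", "q", True),         # a single lowercase letter
    (r"a{2,3}", "aa", True),       # two or three 'a' characters
]

for pattern, text, expected in examples:
    result = found(pattern, text)
    print(f"{pattern!r:14} {text!r:12} -> {result}")
    assert result == expected
```

The script uses `re.search`, which looks for a match anywhere in the text, so the anchors `^` and `$` are what tie a pattern to the start or end of the string.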
Note
To use any of the above characters as a literal in a regex, you need to escape it with a backslash.
For example, to match the text p[ai]nt (the letter p, followed by ai enclosed in square brackets, followed by the letters nt), the correct regex is p\[ai\]nt. Otherwise, the square brackets are interpreted as a character class.
Note that p[ai]nt, with the backslashes omitted, is still a valid regex, so you won't get an error message. However, it doesn't match the text p[ai]nt; it matches pant or pint.
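The difference between the escaped and unescaped patterns can be seen in a short sketch, again assuming Python's `re` module. The variable names are illustrative. `re.escape` is a standard-library function that adds the backslashes for you; other regex engines may offer a similar facility under a different name.

```python
import re

unescaped = re.compile(r"p[ai]nt")    # [ai] is a character class: 'a' or 'i'
escaped = re.compile(r"p\[ai\]nt")    # \[ and \] are literal square brackets

# Without the backslashes, the pattern matches 'pant' and 'pint' only.
print(unescaped.fullmatch("pant") is not None)      # True
print(unescaped.fullmatch("pint") is not None)      # True
print(unescaped.fullmatch("p[ai]nt") is not None)   # False

# With the backslashes, it matches the literal text 'p[ai]nt' only.
print(escaped.fullmatch("p[ai]nt") is not None)     # True
print(escaped.fullmatch("pant") is not None)        # False

# re.escape() builds the escaped pattern from the literal text.
auto_escaped = re.escape("p[ai]nt")
print(auto_escaped)                                 # p\[ai\]nt
```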