Regular Expressions

A regular expression (regex) is a pattern of characters that describes a set of strings. The simplest form of a regular expression is a literal string, such as 'Adeptia' or 'America'. You can use regular expression matching to test whether a string follows a specific syntax, for example, a house number. A regular expression for a two-character house number can be:

[1-9][a-zA-Z0-9]

The first character must be a digit from 1 to 9. The second character can be an uppercase letter, a lowercase letter, or a digit from 0 to 9.
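As a minimal sketch of how this pattern could be tested, the following uses Python's `re` module purely for illustration. The regex engine in your environment may differ in minor details, and the helper name `is_house_number` is hypothetical.

```python
import re

# Pattern from the text: one digit 1-9, then one letter or digit.
HOUSE_NUMBER = re.compile(r"[1-9][a-zA-Z0-9]")


def is_house_number(text: str) -> bool:
    """Return True only if the whole string fits the pattern."""
    # fullmatch() requires the entire string to match, not just a part of it.
    return HOUSE_NUMBER.fullmatch(text) is not None


print(is_house_number("4B"))   # True: digit 4, then an uppercase letter
print(is_house_number("12"))   # True: digit 1, then digit 2
print(is_house_number("0A"))   # False: first character must be 1-9
print(is_house_number("4BC"))  # False: the pattern covers exactly two characters
```

Note that the pattern describes exactly two characters, so a longer value such as 4BC is rejected when the whole string is tested.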

The following characters have special meanings in a regex:

Character

Description

caret ^

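The caret is the anchor for the start of the string. As the first character inside a character class, it is the negation symbol. For example, '^a' matches 'a' at the start of the string, and '[^a]' matches any single character other than 'a'.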

dollar sign $

The dollar sign is the anchor for the end of the string. For example, 'b$' matches 'b' at the end of the string.

period or dot .

The dot matches any single character (in most regex engines, any character except a line break by default). For example, '.a' matches two consecutive characters where the second one is 'a'.

vertical bar or pipe symbol |

The vertical bar separates a series of alternatives. For example, '(a|b|c)a' matches 'aa' or 'ba' or 'ca'.

question mark ?

The question mark makes the preceding token in the regular expression optional. For example, 'colou?r' matches both 'colour' and 'color'.

asterisk or star *

The asterisk is the zero-or-more quantifier: it matches the preceding token zero or more times. For example, '^.*$' matches an entire line.

plus sign +

The plus sign matches one or more occurrences of the preceding token. For example, 'a+b' matches 'ab' and 'aaab'.

opening parenthesis ( and closing parenthesis )

The opening and closing parentheses are used for grouping characters (or other regexes).

opening square bracket [ and closing square bracket ]

The opening and closing square brackets define a character class to match a single character. For example, '[a-z]' matches any single lowercase letter.

opening curly brace { and closing curly brace }

The opening and closing curly braces are used as range quantifiers. For example, 'a{2,3}' matches 'aa' or 'aaa'.

backslash \

The backslash gives special meaning to the character following it. For example, '\n' stands for the newline character.
If a backslash precedes a special character, it acts as an escape character. For example, 'a\+' matches 'a+' and not a series of one or more 'a's.
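The examples in the table above can be checked with a short script. This is only a sketch using Python's `re` module for illustration; the helper `found` and the sample strings are hypothetical, and other regex engines may behave slightly differently (for example, in how the dot treats line breaks).

```python
import re

# (pattern, sample text, whether a match is expected somewhere in the text)
CASES = [
    (r"^a", "apple", True),           # caret: 'a' at the start of the string
    (r"^a", "banana", False),
    (r"[^a]", "a", False),            # caret inside a class: negation
    (r"b$", "crab", True),            # dollar: 'b' at the end of the string
    (r".a", "ba", True),              # dot: any character, then 'a'
    (r".a", "a", False),
    (r"(a|b|c)a", "ca", True),        # pipe inside a group: alternatives
    (r"(a|b|c)a", "da", False),
    (r"colou?r", "color", True),      # question mark: optional 'u'
    (r"colou?r", "colour", True),
    (r"^.*$", "an entire line", True),
    (r"a+b", "aaab", True),           # plus: one or more 'a'
    (r"a+b", "b", False),
    (r"[a-z]", "Q", False),           # character class: lowercase letters only
    (r"a{2,3}", "a", False),          # braces: two or three 'a'
    (r"a{2,3}", "aaa", True),
    (r"a\+", "a+", True),             # escaped plus: a literal '+'
    (r"a\+", "aa", False),
]


def found(pattern: str, text: str) -> bool:
    """Return True if the pattern matches anywhere in the text."""
    return re.search(pattern, text) is not None


for pattern, text, expected in CASES:
    status = "ok" if found(pattern, text) == expected else "UNEXPECTED"
    print(f"{pattern!r:14} {text!r:18} -> {found(pattern, text)} ({status})")
```

The script searches for a match anywhere in the sample text, so the anchors ^ and $ are what tie a pattern to the start or end of the string.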



Note

If you want to use any of the above characters as a literal in a regex, you need to escape it with a backslash.

For example, if you want to match the text p[ai]nt (the letter p, followed by ai enclosed in square brackets, followed by the letters nt), the correct regex is p\[ai\]nt. Otherwise, the square brackets keep their special meaning and define a character class.
Note that p[ai]nt, with the backslashes omitted, is still a valid regex, so you won't get an error message. However, it doesn't match the text p[ai]nt; it matches pant or pint.
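The difference between the escaped and unescaped forms can be sketched as follows, again using Python's `re` module only as an illustration. `re.escape` is a Python standard-library helper that adds the backslashes for you; whether your own tooling offers an equivalent is an assumption to verify.

```python
import re

literal_text = "p[ai]nt"

# Unescaped: the square brackets define a character class (a or i).
unescaped = re.compile(r"p[ai]nt")
# Escaped: the square brackets are matched literally.
escaped = re.compile(r"p\[ai\]nt")

print(unescaped.fullmatch("pant") is not None)        # True
print(unescaped.fullmatch("pint") is not None)        # True
print(unescaped.fullmatch(literal_text) is not None)  # False
print(escaped.fullmatch(literal_text) is not None)    # True
print(escaped.fullmatch("pant") is not None)          # False

# re.escape() builds a pattern that matches the given text literally.
auto_escaped = re.compile(re.escape(literal_text))
print(auto_escaped.fullmatch(literal_text) is not None)  # True
```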


