A regular expression is zero or more branches,
separated by the "|" symbol. A regular expression matches anything that
matches one of its branches, and a branch is zero or more concatenated
pieces. The following rules describe the regular expression syntax:
- A piece is an atom possibly followed by *, +,
or ?.
- An atom followed by * matches
a sequence of 0 or more matches of the atom.
For example, the piece .* matches zero or more instances
of any character (a period matches any single character).
- An atom followed by + matches
a sequence of 1 or more matches of the atom.
For example, the piece .+ matches one or more instances
of any character.
- An atom followed by ? matches
either one match of the atom or the null string.
For example, the piece .? matches a single character
or the null string, such as at the end of an input string.
- An atom is a regular expression in parentheses
(matching a match for the regular expression), a range, or
one of the following:
  - . (matching any single character)
  - ^ (matching the beginning of the input string)
  - $ (matching the end of the input string)
  - A \ followed by a single character (matching that character)
  - A single character with no other significance (matching
    that character)
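
The rules above can be tried out with a short sketch. It assumes that
Python's re module is an acceptable stand-in for the regular expression
engine described here; Python accepts a larger syntax, so only the
constructs listed above are used. The helper name matches is
hypothetical and exists only for this illustration.

```python
import re


def matches(pattern: str, text: str) -> bool:
    """Return True if the pattern matches anywhere in the text."""
    return re.search(pattern, text) is not None


# Branches: "cat|dog" matches anything that matches either branch.
branch_results = [matches("cat|dog", s) for s in ("cat", "hotdog", "bird")]

# Pieces: an atom followed by *, +, or ?.
# fullmatch requires the whole string to match, which makes the
# difference between the three suffixes easy to see.
star = re.fullmatch("ab*", "a") is not None      # b* allows zero b's
plus = re.fullmatch("ab+", "a") is not None      # b+ needs at least one b
optional = re.fullmatch("ab?", "a") is not None  # b? allows the null string

# Atoms: ^ and $ anchor the match to the start and end of the input.
anchored = matches("^abc$", "abc")
not_anchored = matches("^abc$", "xabc")

# Atoms: a backslash makes the following character literal.
# Note: in Python, an unescaped . matches any character except a newline.
literal_dot = matches(r"a\.b", "a.b")
escaped_dot_vs_other = matches(r"a\.b", "axb")

print(branch_results, star, plus, optional)
print(anchored, not_anchored, literal_dot, escaped_dot_vs_other)
```

The choice of re.search for the general helper reflects that an
unanchored expression may match anywhere in the input; the ^ and $
atoms are what restrict it to the beginning or end.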
Consider the following regular expression:
([a-zA-Z_][a-zA-Z0-9_]*)
This regular expression, enclosed in parentheses,
matches a sequence of two ranges. The first range, [a-zA-Z_], matches
any single uppercase or lowercase letter or the underscore character.
The second range, [a-zA-Z0-9_], is followed by * and so matches zero
or more uppercase or lowercase letters, digits 0-9, or underscore
characters.
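
As a minimal sketch of how this expression behaves, the following
again assumes Python's re module as a stand-in engine. The names
IDENTIFIER and first_identifier are hypothetical and used only for
illustration.

```python
import re

# The expression from the text; the parentheses form group 1.
IDENTIFIER = re.compile(r"([a-zA-Z_][a-zA-Z0-9_]*)")


def first_identifier(text: str):
    """Return the first substring matched by the expression, or None."""
    found = IDENTIFIER.search(text)
    return found.group(1) if found else None


print(first_identifier("_tmp1 = 3"))  # starts with an underscore
print(first_identifier("9lives"))     # the leading digit is skipped
print(first_identifier("123"))        # no letter or underscore at all
```

Because the expression is not anchored with ^ and $, it can match in
the middle of the input. To require that an entire string has this
form, the expression would need anchors (or, in Python,
IDENTIFIER.fullmatch).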