Regular Expression

A regular expression is a sequence of characters and symbols that describes a pattern to search for in text.

regex

  • A string literal is the simplest possible regular expression. "str" matches the text str (s, followed by t, followed by r).
  • Regular expressions are generally case sensitive, so "str" does not match "Str".
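
A minimal sketch of both points, assuming Python's built-in re module (the notes do not name an engine, and the sample text is made up):

```python
import re

text = "A string is a str of characters"  # made-up sample text

# The literal pattern "str" matches wherever s, t, r appear in that order.
literal = re.search(r"str", text)
print(literal.start())  # index of the first match, inside "string"

# Matching is case sensitive by default, so "STR" finds nothing.
upper = re.search(r"STR", text)
print(upper)

# Python's re.IGNORECASE flag switches case sensitivity off.
relaxed = re.search(r"STR", text, re.IGNORECASE)
print(relaxed.group(0))
```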

Meta Characters

  • Meta characters are special characters. They do not stand for themselves; they are interpreted in a special way.
  • Some meta characters have a different meaning when written inside square brackets.

meta char   desc
. (dot)     matches any single character except a line break
[ ]         character class: matches any character contained between the brackets
[^ ]        negated character class: matches any character NOT contained between the brackets
*           matches 0 or more repetitions of the preceding symbol
+           matches 1 or more repetitions of the preceding symbol
?           makes the preceding symbol optional
{n,m}       matches at least n but not more than m repetitions of the preceding symbol
(abc)       character group: matches the characters abc in that exact order
|           alternation: matches either the characters before or the characters after the symbol
\           escapes the next character, so a reserved character can be matched literally
^           matches the beginning of the input
$           matches the end of the input
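
The meta characters above can be tried out with a short script. This is only a sketch, assuming Python's re module; the sample strings and the results dictionary are made up for illustration.

```python
import re

sentence = "The car parked in the garage"  # made-up sample text

# One small experiment per meta character from the table.
results = {
    "dot": re.findall(r"c.t", "cat cot cut"),
    "class": re.findall(r"[Tt]he", sentence),
    "negated": re.findall(r"[^c]ar", sentence),
    "star": re.fullmatch(r"ab*", "a") is not None,   # zero b's is allowed
    "plus": re.fullmatch(r"ab+", "a") is not None,   # at least one b is required
    "optional": re.findall(r"colou?r", "color colour"),
    "braces": re.findall(r"[0-9]{2,3}", "1 22 333"),
    "group": re.search(r"(ab)+", "ababab").group(0),
    "alternation": re.findall(r"cat|dog", "cat dog bird"),
    "escape": re.findall(r"\.", "a.b.c"),
    "start": re.search(r"^The", sentence) is not None,
    "end": re.search(r"garage$", sentence) is not None,
}

for name, value in results.items():
    print(name, value)
```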

Things to Remember

Set a character range by using a hyphen inside a character class. Example: /[a-z0-9]/ matches any single lowercase letter or digit.
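
A quick check of the range example, sketched with Python's re module. The surrounding slashes are pattern delimiters in some other languages and are not part of the pattern, so they are dropped here; the input string is made up.

```python
import re

# [a-z0-9] accepts one lowercase letter or one digit per match.
ranged = re.findall(r"[a-z0-9]", "aZ9-_b")
print(ranged)  # uppercase Z, the hyphen and the underscore are skipped
```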

A period inside a character class means a literal period.
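
To see the difference, compare the same pattern with the period outside and inside square brackets (a sketch assuming Python's re module and a made-up input):

```python
import re

sample = "a.b axb"  # made-up sample text

# Outside brackets the dot is a meta character: any character fits.
any_char = re.findall(r"a.b", sample)

# Inside brackets the dot is just a period.
literal_dot = re.findall(r"a[.]b", sample)

print(any_char)
print(literal_dot)
```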

The * (star) combined with a . (dot) matches any string of characters, including an empty one, up to a line break. Example: .*
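
A small sketch of how .* behaves, assuming Python's re module with default flags; the inputs are made up. Note that the star is greedy, so it takes as many characters as it can.

```python
import re

# .* is greedy: it runs to the last ">" rather than stopping at the first.
greedy = re.search(r"<.*>", "<a><b>").group(0)

# .* also accepts zero characters, so it matches an empty string.
matches_empty = re.fullmatch(r".*", "") is not None

# By default the dot does not match a line break, so .* stops at the end of the line.
first_line = re.match(r".*", "line one\nline two").group(0)

print(greedy, matches_empty, first_line)
```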

Braces (also called quantifiers) specify how many times a character or a group of characters can be repeated. Examples:

  • [0-9]{3} - matches exactly 3 digits.
  • [a-z]{2,} - matches 2 or more lowercase letters.
  • [A-Z]{2,5} - matches between 2 and 5 uppercase letters. Do not put a space after the comma.
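
The quantifier examples can be checked against whole strings. This is a sketch assuming Python's re module; fully_matches is a hypothetical helper written for these notes, not a library function. The last lines show why the space matters: in Python, a brace expression with a space after the comma is not read as a quantifier at all.

```python
import re

def fully_matches(pattern: str, candidate: str) -> bool:
    """Hypothetical helper: True if the whole candidate matches the pattern."""
    return re.fullmatch(pattern, candidate) is not None

# {3}: exactly three digits
print(fully_matches(r"[0-9]{3}", "123"), fully_matches(r"[0-9]{3}", "12"))

# {2,}: two or more lowercase letters, no upper limit
print(fully_matches(r"[a-z]{2,}", "a"), fully_matches(r"[a-z]{2,}", "abcdefgh"))

# {2,5}: between two and five uppercase letters
print(fully_matches(r"[A-Z]{2,5}", "ABCDE"), fully_matches(r"[A-Z]{2,5}", "ABCDEF"))

# With a space after the comma, Python treats the braces as literal text.
print(fully_matches(r"[A-Z]{2, 5}", "AB"), fully_matches(r"[A-Z]{2, 5}", "A{2, 5}"))
```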