# PCRE (Perl Compatible Regular Expressions)


General Tokens

\n Newline
\r Carriage return
\t Tab
\0 Null character

Anchors

\G Start of match. Matches at the position where the previous successful match ended
^ Start of string; in multiline mode, also matches after each newline character
$ End of string (or before a final newline); in multiline mode, also matches before each newline character
\A Start of string
\Z End of string, or before a newline that ends the string
\z End of string. Matches only at the very end of the string
\b A word boundary. Matches between \w and \W, or between \w and the start or end of the string
\B Non-word boundary. Matches wherever \b does not, e.g. between two \w characters
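To see how the anchors differ, here is a minimal sketch using Python's `re` module instead of PCRE itself. That choice is an assumption for portability: `re` shares the syntax of `^`, `$`, `\A`, `\b` and `\B` with PCRE, but its `\Z` behaves like PCRE's `\z`, so `\Z` is left out. The helper `starts` is hypothetical and simply lists the offset of each match.

```python
import re

TEXT = "cat\ncatalog\nbobcat\n"


def starts(pattern: str, flags: int = 0) -> list[int]:
    """Return the start offset of every non-overlapping match in TEXT."""
    return [m.start() for m in re.finditer(pattern, TEXT, flags)]


print(starts(r"^cat"))                 # [0]: ^ is start of string only
print(starts(r"^cat", re.MULTILINE))   # [0, 4]: ^ also after each newline
print(starts(r"\Acat", re.MULTILINE))  # [0]: \A ignores multiline mode
print(starts(r"cat$"))                 # [15]: $ matches before the final newline
print(starts(r"cat$", re.MULTILINE))   # [0, 15]: $ before each newline
print(starts(r"\bcat\b"))              # [0]: "cat" as a whole word
print(starts(r"\Bcat"))                # [15]: "cat" inside "bobcat"
```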

Meta Sequences

. Any single character except newline (unless the s flag is set)
\s Any whitespace
\S Any non-whitespace
\d Any digit
\D Any non-digit
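\w Any word character (letter, digit or underscore)
\W Any non-word character
\X An extended Unicode grapheme cluster
\C Match one data unit, even in UTF mode
\R Any Unicode newline sequence (e.g. \r\n, \n or \r)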
\v Vertical whitespace
\h Horizontal whitespace
\H Non-horizontal whitespace
\K Reset match start: text matched before \K is left out of the reported match
\1, \2, ... Backreference: match the text captured by the nth subpattern
\pX Unicode property X
\PX Not Unicode property X
\p{...} Unicode properties
\P{...} Negated Unicode properties
\Q...\E Characters between \Q and \E are treated as literals
\k<name> Backreference to named subpattern 'name'
\k'name' Backreference to named subpattern 'name'
\k{name} Backreference to named subpattern 'name'
\gn Backreference to the nth subpattern
\g{n} Backreference to the nth subpattern
\g{-n} Relative backreference to the nth group before the current position
\g'name' Recurse subpattern 'name'
\g<n> Recurse nth subpattern
\g'n' Recurse nth subpattern
\g<+n> Recurse nth relative subpattern
\g'+n' Recurse nth relative subpattern
\xYY Hex character YY
\x{YYYY} Hex character YYYY
\ddd Octal character ddd
\cY Control character Y
[\b] Backspace character (only inside a character class)
\ Escapes the next character, making metacharacters literal (e.g. \. matches a dot)
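A backreference such as \1 re-matches whatever text its group captured. As a sketch, again assuming Python's `re` as a stand-in for PCRE (it supports \1, but spells the named form `(?P=name)` rather than `\k<name>`), this finds words typed twice in a row. The function name `doubled_words` is hypothetical.

```python
import re

# \b(\w+) captures a whole word, \s+ requires whitespace after it,
# and \1 must repeat exactly the text that group 1 captured.
DOUBLED = re.compile(r"\b(\w+)\s+\1\b")


def doubled_words(text: str) -> list[str]:
    """Return each word that appears twice in a row."""
    return [m.group(1) for m in DOUBLED.finditer(text)]


print(doubled_words("this is is a test test of the thesis"))  # ['is', 'test']
```

The trailing \b is what stops "the thesis" from counting as a repeat.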

Quantifiers

a? Zero or one of a
a* Zero or more of a
a+ One or more of a
a{3} Exactly 3 of a
a{3,} 3 or more of a
a{3,6} Between 3 and 6 of a
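a* Greedy quantifier: matches as many as possible, then backtracks
a*? Lazy/reluctant quantifier: matches as few as possible
a*+ Possessive quantifier: matches as many as possible and never backtracks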
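The difference between greedy and lazy matching shows up clearly on markup. This is a sketch in Python's `re` (assumed as a PCRE stand-in; possessive quantifiers such as a*+ only exist in Python 3.11 and later, so they are omitted).

```python
import re

HTML = "<b>bold</b> and <i>italic</i>"

greedy = re.findall(r"<.+>", HTML)   # .+ runs to the last '>' it can reach
lazy = re.findall(r"<.+?>", HTML)    # .+? stops at the first '>'
bounded = re.findall(r"\d{3,6}", "12 1234 12345678")  # 3 to 6 digits, greedy

print(greedy)   # ['<b>bold</b> and <i>italic</i>']
print(lazy)     # ['<b>', '</b>', '<i>', '</i>']
print(bounded)  # ['1234', '123456']
```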

Group Constructs

(...) Capture everything enclosed
(a|b) a or b
(?:...) Match everything enclosed without creating a capture group
(?>...) Atomic group
(?|...) Duplicate subpattern group numbers (branch reset): each alternative reuses the same group numbers
(?#...) Comment
(?'name'...) Named capturing group
(?<name>...) Named capturing group
(?P<name>...) Named capturing group
(?imsxXU) Inline modifiers
(?(condition)yes|no) Conditional subpattern: match yes if the condition holds, otherwise no
(?R) Recurse entire pattern
(?1) Recurse first subpattern
(?+1) Recurse first relative subpattern
(?&name) Recurse subpattern 'name' (subroutine call)
(?P>name) Recurse subpattern 'name' (subroutine call)
(?=...) Positive lookahead
(?!...) Negative lookahead
(?<=...) Positive lookbehind
(?<!...) Negative lookbehind
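Named groups, non-capturing groups and lookarounds are easiest to compare side by side. A minimal sketch in Python's `re` (an assumed stand-in for PCRE) uses the `(?P<name>...)` spelling, which both engines accept; the pattern names `DATE`, `amounts` and `plain` are illustrative only.

```python
import re

# Named groups use the (?P<name>...) spelling; the optional day is wrapped
# in (?:...) so the wrapper itself does not become a capture group.
DATE = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})(?:-(?P<day>\d{2}))?")

m = DATE.search("released 2023-07-14")
print(m.group("year"), m.group("month"), m.group("day"))  # 2023 07 14
print(len(m.groups()))  # 3: year, month and day, but not the (?:...) wrapper

# Lookarounds check context without consuming it.
amounts = re.findall(r"(?<=\$)\d+", "cost $40, tip $7, 15 items")  # ['40', '7']
plain = re.findall(r"\b\d+\b(?!%)", "5% of 200 is 10")          # ['200', '10']
```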

Character classes

[abc] A character: a, b or c
[^abc] A character except: a, b or c
[a-z] A character in the range: a-z
[^a-z] A character not in the range: a-z
[a-zA-Z] A character in the range: a-z or A-Z
[[:alnum:]] Letter or digit
[[:alpha:]] Letter
[[:ascii:]] ASCII code in the range: 0-127
[[:blank:]] Space or tab
[[:cntrl:]] Control character
[[:digit:]] Digit
[[:graph:]] Visible character (not space)
[[:lower:]] Lowercase letter
[[:print:]] Printable character (visible character or space)
[[:punct:]] Visible punctuation character
[[:space:]] Whitespace
[[:upper:]] Uppercase letter
[[:word:]] Word character (letter, digit or underscore)
[[:xdigit:]] Hexadecimal digit
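Python's `re` module, used here as an assumed stand-in for PCRE, does not support POSIX bracket classes like [[:alpha:]], so this sketch spells out the ASCII ranges they roughly correspond to, plus a negated range.

```python
import re

# Explicit ASCII ranges standing in for POSIX classes (Python's re has no [[:...:]]).
letters = re.findall(r"[a-zA-Z]+", "Hello, World 42!")    # like [[:alpha:]]+
not_lower = re.findall(r"[^a-z]", "abC1-")               # negated range
hex_runs = re.findall(r"[0-9A-Fa-f]+", "ff 0A zz 9")     # like [[:xdigit:]]+

print(letters)    # ['Hello', 'World']
print(not_lower)  # ['C', '1', '-']
print(hex_runs)   # ['ff', '0A', '9']
```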

Flags/Modifiers

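g Global: find all matches, not just the first
m Multiline: ^ and $ match at line breaks
i Case insensitive
x Extended: ignore whitespace and allow # comments in the pattern
s Single line: . also matches newline
u Unicode: enable UTF/Unicode matching
X Extra: an unrecognized backslash escape of a letter is an error
U Ungreedy: quantifiers are lazy by default, and a trailing ? makes them greedy
A Anchored: match only at the start of the subject (or the given start offset)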
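Several of these flags have direct counterparts in Python's `re` (assumed here as a PCRE stand-in): i, m, s and x map to `re.IGNORECASE`, `re.MULTILINE`, `re.DOTALL` and `re.VERBOSE`, and inline modifiers like (?i) also work. Python has no g flag; `findall` and `finditer` return every match instead. The names `SAMPLE` and `PHONE` are illustrative.

```python
import re

SAMPLE = "First line\nsecond LINE"

ci = re.findall(r"line", SAMPLE, re.IGNORECASE)         # i
words = re.findall(r"^\w+", SAMPLE, re.MULTILINE)       # m
dot_all = re.search(r"line.second", SAMPLE, re.DOTALL)  # s: '.' matches '\n'
no_dot_all = re.search(r"line.second", SAMPLE)          # without s: no match
inline = re.findall(r"(?i)line", SAMPLE)                # inline (?i) modifier

# x: whitespace and # comments inside the pattern are ignored.
PHONE = re.compile(r"""
    (\d{3})   # exchange
    [-.\s]?   # optional separator
    (\d{4})   # line number
""", re.VERBOSE)

print(ci, words, dot_all is not None, no_dot_all is None, inline)
print(PHONE.findall("call 555-1234"))  # [('555', '1234')]
```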

Substitution

\0 Complete match contents
\1 Contents in capture group 1
\g<1> Contents in capture group 1
$1 Contents in capture group 1
${foo} Contents in capture group 'foo'
\{foo} Contents in capture group 'foo'
\g{foo} Contents in capture group 'foo'
\xYY Character with hexadecimal code YY
\x{YYZZ} Character with hexadecimal code YYZZ
\t Tab
\r Carriage return
\n Newline
\f Form-feed
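Replacement syntax varies by host language. As a sketch in Python's `re.sub` (an assumed stand-in for a PCRE-based substitution), \1, \g<1> and \g<name> work, but $1 and ${foo} do not, and the whole match is written \g<0> because Python reads \0 as a null character.

```python
import re

DATE = "2023-07-14"

# Numbered backreferences: \1 and \g<1> both refer to group 1.
dmy = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3/\2/\g<1>", DATE)  # '14/07/2023'

# Named groups are referenced with \g<name>.
dotted = re.sub(r"(?P<y>\d{4})-(?P<m>\d{2})-(?P<d>\d{2})",
                r"\g<d>.\g<m>.\g<y>", DATE)                     # '14.07.2023'

# Whole match: \g<0> (Python reads \0 as a null character, not the match).
bracketed = re.sub(r"\w+", r"[\g<0>]", "a bc")                  # '[a] [bc]'

# \t, \n, \r and \f in the template become the control characters.
tabbed = re.sub(r",", r"\t", "a,b")                             # 'a\tb'
```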

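PCRE tester

The sessions below use Perl's re pragma (-Mre=debugcolor) to print how Perl's own regex engine, which PCRE is modeled on, compiles and runs /(^|\s)eval\(/. The pattern requires "eval(" at the start of the string or right after a whitespace character, so it fails against "preval(" (where "eval(" follows an "r") and succeeds against "eval(" (where BOL matches at offset 0).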

```
# perl -Mre=debugcolor -e '"preval(" =~ /(^|\s)eval\(/'
Compiling REx "(^|\s)eval\("
Final program:
   1: OPEN1 (3)
   3:   BRANCH (5)
   4:     BOL (7)
   5:   BRANCH (FAIL)
   6:     POSIXD[\s] (7)
   7: CLOSE1 (9)
   9: EXACT <eval(> (12)
  12: END (0)
floating "eval(" at 0..1 (checking floating) minlen 5
Guessing start of match in sv for REx "(^|\s)eval\(" against "preval("
Found floating substr "eval(" at offset 2...
Starting position does not contradict /^/m...
Guessed: match at offset 1
Matching REx "(^|\s)eval\(" against "reval("
   1 <preval(>|  1:OPEN1(3)
   1 <preval(>|  3:BRANCH(5)
   1 <preval(>|  4:  BOL(7)
                                    failed...
   1 <preval(>|  5:BRANCH(7)
   1 <preval(>|  6:  POSIXD[\s](7)
                                    failed...
                                  BRANCH failed...
   2 <preval(>|  1:OPEN1(3)
   2 <preval(>|  3:BRANCH(5)
   2 <preval(>|  4:  BOL(7)
                                    failed...
   2 <preval(>|  5:BRANCH(7)
   2 <preval(>|  6:  POSIXD[\s](7)
                                    failed...
                                  BRANCH failed...
Match failed
Freeing REx: "(^|\s)eval\("
```

```
# perl -Mre=debugcolor -e '"eval(" =~ /(^|\s)eval\(/'
Compiling REx "(^|\s)eval\("
Final program:
   1: OPEN1 (3)
   3:   BRANCH (5)
   4:     BOL (7)
   5:   BRANCH (FAIL)
   6:     POSIXD[\s] (7)
   7: CLOSE1 (9)
   9: EXACT <eval(> (12)
  12: END (0)
floating "eval(" at 0..1 (checking floating) minlen 5
Guessing start of match in sv for REx "(^|\s)eval\(" against "eval("
Found floating substr "eval(" at offset 0...
Guessed: match at offset 0
Matching REx "(^|\s)eval\(" against "eval("
   0 <eval(>|  1:OPEN1(3)
   0 <eval(>|  3:BRANCH(5)
   0 <eval(>|  4:  BOL(7)
   0 <eval(>|  7:  CLOSE1(9)
   0 <eval(>|  9:  EXACT <eval(>(12)
   5 <eval(>| 12:  END(0)
Match successful!
Freeing REx: "(^|\s)eval\("
```
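The same two outcomes can be checked outside Perl. This is a sketch assuming Python's `re` as a stand-in engine; it does not reproduce Perl's debug trace, only the match results. The helper `has_eval_call` is hypothetical.

```python
import re

# Same pattern as the Perl sessions: "eval(" at the start of the string
# or right after a whitespace character.
EVAL_CALL = re.compile(r"(^|\s)eval\(")


def has_eval_call(code: str) -> bool:
    """Return True if code contains an eval( call as the pattern defines it."""
    return EVAL_CALL.search(code) is not None


print(has_eval_call("preval("))         # False: 'r' precedes "eval("
print(has_eval_call("eval("))           # True: ^ matches at offset 0
print(has_eval_call("x = 1; eval(y)"))  # True: a space precedes "eval("
```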


