Synopsis 5
Regular expressions
by Allison Randal, Damian Conway
June 26, 2002
A summary of the changes in Apocalypse 5:
Unchanged features
- Capturing: (...)
- Repetition quantifiers: *, +, and ?
- Alternatives: |
- Backslash escape: \
- Minimal matching suffix: ??, *?, +?
Modifiers
- The extended syntax (/x) is no longer required...it's the default.
- There are no /s or /m modifiers (changes to the meta-characters replace them - see below).
- There is no /e evaluation modifier on substitutions; use s/pattern/$( code() )/ instead.
- The /g modifier has been renamed to e (for each).
- Modifiers are now placed as adverbs at the start of a match/substitution:
    @matches = m:ei/\s* (\w*) \s* ,?/;
- The single-character modifiers also have longer versions:
    :i    :ignorecase
    :e    :each
- The :c (or :cont) modifier causes the match to continue from the string's current .pos:
    m:c/ pattern /    # start at end of
                      # previous match on $_
- The new :o (:once) modifier replaces the Perl 5 ?...? syntax:
    m:once/ pattern /    # only matches first time
- The new :w (:word) modifier causes each whitespace sequence in the pattern to be replaced by a \s* or \s+ subpattern:
    m:w/ next cmd = <condition>/
  Same as:
    m/ \s* next \s+ cmd \s* = \s* <condition>/
- The new :uN modifier specifies Unicode level:
    m:u0/ .<2> /    # match two bytes
    m:u1/ .<2> /    # match two codepoints
    m:u2/ .<2> /    # match two graphemes
    m:u3/ .<2> /    # match language dependently
- The new :p5 modifier allows Perl 5 regex syntax to be used instead:
    m:p5/(?mi)^[a-z]{1,2}(?=\s)/
- Any integer modifier specifies a count. What kind of count is determined by the character that follows.
  - If followed by an x, it means repetition:
      s:4x{ (<ident>) = (\N+) $$}{$1 => $2};
      # same as:
      s{ (<ident>) = (\N+) $$}{$1 => $2} for 1..4;
  - If followed by st, nd, rd, or th, it means find the Nth occurrence:
      s:3rd/(\d+)/@data[$1]/;
      # same as:
      m/(\d+)/ && m:c/(\d+)/ && s:c/(\d+)/@data[$1]/;
- With the new :any modifier, the regex will match every possible way (including overlapping) and return all matches.
    $str = "abracadabra";
    @substrings = $str =~ m:any/ a (.*) a /;
    # br brac bracad bracadabr c cad cadabr d dabr br
- The :i, :w, :c, :uN, and :p5 modifiers can be placed inside the regex (and are lexically scoped):
    m/:c alignment = [:i left|right|cent[er|re]] /
- User-defined modifiers will be possible:
    m:fuzzy/pattern/;
- Single letter flags can be "chained":
    s:ewi/cat/feline/
- User-defined modifiers can also take arguments:
    m:fuzzy('bare')/pattern/;
- Hence parentheses are no longer valid regex delimiters.
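
The :any, Nth, and Nx modifiers in the list above describe behaviour that can be approximated in other regex engines. The following is a minimal Python sketch of those semantics, not Perl 6 code and not the actual implementation. The function names match_any, sub_nth, and sub_times are hypothetical, and match_any assumes that one capture per (start, end) span is enough, which holds for the "abracadabra" example.

```python
import re

def match_any(pattern, text):
    """Rough analogue of :any. Try every (start, end) span of the text and
    keep group 1 of each span the whole pattern matches, overlaps included."""
    compiled = re.compile(pattern)
    results = []
    for start in range(len(text) + 1):
        for end in range(start, len(text) + 1):
            m = compiled.fullmatch(text, start, end)
            if m:
                results.append(m.group(1))
    return results

def sub_nth(pattern, repl, text, n):
    """Rough analogue of s:3rd/.../.../. Replace only the Nth match
    (1-based); repl is a function that receives the match object."""
    for count, m in enumerate(re.finditer(pattern, text), start=1):
        if count == n:
            return text[:m.start()] + repl(m) + text[m.end():]
    return text  # fewer than n matches: leave the text unchanged

def sub_times(pattern, repl, text, n):
    """Rough analogue of s:4x{...}{...}. Run a single-match substitution
    n times, mirroring the "for 1..4" expansion."""
    for _ in range(n):
        text = re.sub(pattern, repl, text, count=1)
    return text

if __name__ == "__main__":
    print(" ".join(match_any(r"a(.*)a", "abracadabra")))
    # br brac bracad bracadabr c cad cadabr d dabr br
```

The span-by-span search in match_any is quadratic in the length of the string; it is meant only to make the "every possible way" wording concrete. sub_times loops with count=1 rather than passing count=n to re.sub, because the source defines the Nx form as the substitution repeated N times, each pass starting again from the beginning of the string.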

