
Synopsis 5

Regular expressions

by Allison Randal, Damian Conway
June 26, 2002

Editor's note: this document is out of date and remains here for historic interest. See Synopsis 5 for the current design information.

A summary of the changes in Apocalypse 5:

Unchanged features

  • Capturing: (...)
  • Repetition quantifiers: *, +, and ?
  • Alternatives: |
  • Backslash escape: \
  • Minimal matching suffix: ??, *?, +?

Modifiers

  • The extended syntax (/x) is no longer required; it's the default.

  • There are no /s or /m modifiers (changes to the meta-characters replace them; see below).

  • There is no /e evaluation modifier on substitutions; use s/pattern/$( code() )/ instead.

  • The /g modifier has been renamed to :e (for "each").

  • Modifiers are now placed as adverbs at the start of a match/substitution:
        @matches = m:ei/\s* (\w*) \s* ,?/;

  • The single-character modifiers also have longer versions:
            :i        :ignorecase
            :e        :each

  • The :c (or :cont) modifier causes the match to continue from the string's current .pos:
        m:c/ pattern /        # start at end of
                              # previous match on $_

  • The new :o (:once) modifier replaces the Perl 5 ?...? syntax:
        m:once/ pattern /    # only matches first time

  • The new :w (:word) modifier causes whitespace sequences to be replaced by \s* or \s+ subpatterns:
        m:w/ next cmd =   <condition>/

  • Same as:
        m/ \s* next \s+ cmd \s* = \s* <condition>/

  • The new :uN modifier specifies Unicode level:
        m:u0/ .<2> /        # match two bytes
        m:u1/ .<2> /        # match two codepoints
        m:u2/ .<2> /        # match two graphemes
        m:u3/ .<2> /        # match language dependently

  • The new :p5 modifier allows Perl 5 regex syntax to be used instead:
        m:p5/(?mi)^[a-z]{1,2}(?=\s)/

  • Any integer modifier specifies a count. What kind of count is determined by the character that follows.

  • If followed by an x, it means repetition:
        s:4x{ (<ident>) = (\N+) $$}{$1 => $2};
        # same as:
        s{ (<ident>) = (\N+) $$}{$1 => $2} for 1..4;

  • If followed by st, nd, rd, or th, it means find the Nth occurrence:
        s:3rd/(\d+)/@data[$1]/;
        # same as:
        m/(\d+)/ && m:c/(\d+)/ && s:c/(\d+)/@data[$1]/;

  • With the new :any modifier, the regex will match in every possible way (including overlapping matches) and return all matches.
        $str = "abracadabra";
        @substrings = $str =~ m:any/ a (.*) a /;
        # br brac bracad bracadabr c cad cadabr d dabr br

  • The :i, :w, :c, :uN, and :p5 modifiers can be placed inside the regex (and are lexically scoped):
        m/:c alignment = [:i left|right|cent[er|re]] /

  • User-defined modifiers will be possible:
            m:fuzzy/pattern/;

  • Single-letter flags can be "chained":
            s:ewi/cat/feline/

  • User-defined modifiers can also take arguments:
            m:fuzzy('bare')/pattern/;

  • Hence, parentheses are no longer valid regex delimiters.
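The /e replacement (s/pattern/$( code() )/) runs code for each match and interpolates its result. Python's `re.sub` accepts a function as the replacement, which gives a rough analogy. This is Python, not Perl 6, and the number-doubling function is a made-up example.

```python
import re

def double_numbers(text, each=False):
    # The replacement is computed by running code on each match, much as
    # s/pattern/$( code() )/ interpolates the result of code().
    # count=1 replaces only the first match (a plain s///);
    # count=0 replaces every match (roughly like adding :e).
    return re.sub(r"\d+", lambda m: str(int(m.group()) * 2), text,
                  count=0 if each else 1)

print(double_numbers("3 apples and 14 pears"))             # 6 apples and 14 pears
print(double_numbers("3 apples and 14 pears", each=True))  # 6 apples and 28 pears
```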
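The :c (:cont) modifier resumes matching at the string's current .pos. In Python, `Pattern.search(string, pos)` can emulate this if the caller remembers where the last match ended. The `Cursor` class below is a hypothetical stand-in for $_ and its .pos. Leaving pos unchanged after a failed match is an assumption of this sketch.

```python
import re

class Cursor:
    """Hypothetical helper: a string plus a remembered position (like .pos)."""

    def __init__(self, text):
        self.text = text
        self.pos = 0

    def match_cont(self, pattern):
        # Start searching at the remembered position, like m:c/ pattern /.
        m = re.compile(pattern).search(self.text, self.pos)
        if m:
            self.pos = m.end()  # the next m:c starts where this match ended
        return m                # on failure, pos is left as it was (assumption)

c = Cursor("alpha beta gamma")
print(c.match_cont(r"\w+").group())  # alpha
print(c.match_cont(r"\w+").group())  # beta
```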
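The :once modifier makes a match succeed only the first time, as Perl 5's ?...? did. A Python closure can sketch that behaviour. `make_once` is a hypothetical name, and this sketch offers no equivalent of Perl 5's reset().

```python
import re

def make_once(pattern):
    """Return a hypothetical matcher that succeeds at most once (like m:once/.../)."""
    compiled = re.compile(pattern)
    done = False

    def match(text):
        nonlocal done
        if done:
            return None      # already matched once: always fail from now on
        m = compiled.search(text)
        if m:
            done = True      # record that the single allowed match has happened
        return m

    return match

first_header = make_once(r"^=head")
lines = ["text", "=head1 NAME", "more", "=head1 SYNOPSIS"]
print([line for line in lines if first_header(line)])  # ['=head1 NAME']
```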
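The :w example suggests a rule for choosing between \s+ and \s*. The rule appears to be: \s+ where whitespace separates two word characters, and \s* elsewhere (at the ends, or next to punctuation). That rule is inferred from the example and is an assumption here. The Python sketch below applies it to a pattern string. It uses (\w+) as a stand-in for <condition> and ignores escaped whitespace and character classes.

```python
import re

def word_mode(pattern):
    r"""Rewrite each whitespace run in `pattern`: \s+ between two word
    characters, \s* everywhere else (assumed reading of :w)."""

    def replace(ws):
        s = ws.string
        before = s[ws.start() - 1] if ws.start() > 0 else ""
        after = s[ws.end()] if ws.end() < len(s) else ""
        if re.match(r"\w", before) and re.match(r"\w", after):
            return r"\s+"   # e.g. between "next" and "cmd": a separator is required
        return r"\s*"       # at either end or beside punctuation: optional

    return re.sub(r"\s+", replace, pattern)

pat = word_mode(r" next cmd =   (\w+)")
print(pat)  # \s*next\s+cmd\s*=\s*(\w+)
```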
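The :uN levels change what a single `.` consumes (in this design, `.<2>` means two of whatever `.` matches). Python's standard `re` module can illustrate the first two levels: a bytes pattern sees bytes (like :u0), and a str pattern sees codepoints (like :u1). It has no grapheme (:u2) or language-dependent (:u3) mode; the third-party `regex` module's \X matches grapheme clusters.

```python
import re

text = "\u00e9!"               # "é!" with a precomposed é (one codepoint)
raw = text.encode("utf-8")     # b'\xc3\xa9!' (é is two bytes in UTF-8)

# Like :u0: a bytes pattern counts bytes, so .{2} takes just the two bytes of é.
print(re.match(rb".{2}", raw).group())           # b'\xc3\xa9'

# Like :u1: a str pattern counts codepoints, so .{2} takes é and "!".
print(re.match(r".{2}", text).group() == text)  # True

# A decomposed é (e + COMBINING ACUTE ACCENT) is one grapheme but two
# codepoints, so a codepoint-level .{2} stops after the accent.
decomposed = "e\u0301!"
print(re.match(r".{2}", decomposed).group() == "e\u0301")  # True
```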
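Python's `re` syntax largely follows Perl 5's, so the :p5 example pattern is accepted by Python unchanged. The sketch below shows what that pattern matches on a made-up sample string.

```python
import re

# (?mi): ^ matches at every line start, and letters match case-insensitively.
# [a-z]{1,2}(?=\s): one or two letters, but only if whitespace follows.
p5 = re.compile(r"(?mi)^[a-z]{1,2}(?=\s)")

text = "Ab cd\nxy z\nlong word"
print(p5.findall(text))  # ['Ab', 'xy']  ("long" fails: no whitespace after "lo" or "l")
```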
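The :4x example is described as the plain substitution run four times. The Python sketch below follows that "same as" form literally, with each run replacing only the first remaining match. The translations of <ident>, \N, and $$ are assumptions. Whitespace in the Perl 6 pattern is insignificant, so the Python pattern has no literal spaces.

```python
import re

# Assumed translations: <ident> as [A-Za-z_]\w*, \N (non-newline) as ".",
# and $$ (end of line) as "$" under re.MULTILINE.
PAIR = re.compile(r"([A-Za-z_]\w*)=(.+)$", re.MULTILINE)

def sub_4x(text):
    # s{...}{...} for 1..4: four separate single substitutions,
    # each one searching again from the start of the string.
    for _ in range(4):
        text = PAIR.sub(r"\1 => \2", text, count=1)
    return text

config = "a=1\nb=2\nc=3\nd=4\ne=5"
print(sub_4x(config))  # lines a..d become "a => 1" etc.; "e=5" is untouched
```

Running the single substitution four times differs from "replace the first four matches" only when a replacement can itself match again. Here "a => 1" cannot, so the two readings agree.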
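The :3rd example is described as two continued matches followed by a continued substitution. That amounts to replacing the third non-overlapping match, scanning left to right. `re.finditer` yields exactly that sequence of matches. In the sketch below, `sub_nth` and the `data` array are hypothetical, and leaving the string alone when there are fewer than N matches is an assumption.

```python
import re

def sub_nth(pattern, repl, text, n):
    """Hypothetical helper: replace only the n-th non-overlapping match.
    `repl` is called with the match object, like a code replacement."""
    for i, m in enumerate(re.finditer(pattern, text), start=1):
        if i == n:
            return text[:m.start()] + repl(m) + text[m.end():]
    return text  # fewer than n matches: string left unchanged (assumption)

data = ["zero", "one", "two", "three"]
print(sub_nth(r"(\d+)", lambda m: data[int(m.group(1))], "7 5 2 9", 3))  # 7 5 two 9
```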
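Python's `re` has no :any equivalent, but brute force can reproduce the "abracadabra" result: try the pattern against every (start, end) span and keep the spans it matches completely. This sketch finds at most one match per span, so it would miss alternative group assignments within a single span. For `a(.*)a` there is only one way per span, so it reproduces the list given.

```python
import re

def match_any(pattern, text):
    """Collect group 1 of every span of `text` that the pattern matches
    completely, so overlapping matches are included."""
    compiled = re.compile(pattern)
    found = []
    for start in range(len(text) + 1):
        for end in range(start, len(text) + 1):
            m = compiled.fullmatch(text, start, end)
            if m:
                found.append(m.group(1))
    return found

print(match_any(r"a(.*)a", "abracadabra"))
# ['br', 'brac', 'bracad', 'bracadabr', 'c', 'cad', 'cadabr', 'd', 'dabr', 'br']
```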
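Python has a close counterpart to the lexically scoped `:i` inside `[:i left|right|cent[er|re]]`: its scoped inline flag `(?i:...)`. Perl 6's non-capturing `[...]` corresponds to Python's `(?:...)`. The sketch below translates only the :i part of the example. The leading :c has no counterpart here, and the spaces around "=" are dropped because whitespace in the pattern is insignificant.

```python
import re

# Case-insensitivity applies only inside (?i:...); "alignment=" stays case-sensitive.
ALIGN = re.compile(r"alignment=(?i:left|right|cent(?:er|re))")

for s in ["alignment=CENTRE", "alignment=Left", "ALIGNMENT=left"]:
    print(s, bool(ALIGN.search(s)))
# alignment=CENTRE True
# alignment=Left True
# ALIGNMENT=left False
```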

Copyright © 2000-2008 O’Reilly Media, Inc. All Rights Reserved. | (707) 827-7000 / (800) 998-9938