Apocalypse 5

Regular Expressions

by Larry Wall
June 04, 2002

Editor's Note: this Apocalypse is out of date and remains here for historic reasons. See Synopsis 05 for the latest information.

This is the Apocalypse on Pattern Matching, generally having to do with what we call "regular expressions", which are only marginally related to real regular expressions. Nevertheless, the term has grown with the capabilities of our pattern matching engines, so I'm not going to try to fight linguistic necessity here. I will, however, generally call them "regexes" (or "regexen", when I'm in an Anglo-Saxon mood).

Here are the RFCs covered in this Apocalypse. PSA stands for "problem, solution, acceptance", my private rating of how this RFC will fit into Perl 6. Doubtless I have misclassified your RFC, though the other ratings are pretty accurate. :-)

    RFC   PSA   Title
    ---   ---   -----
    072   aaa   Variable-length lookbehind. 
    093   abb   Regex: Support for incremental pattern matching
    110   bbb   counting matches
    112   acc   Assignment within a regex
    135   acr   Require explicit m on matches, even with ?? and // as delimiters.
    144   aaa   Behavior of empty regex should be simple
    145   acr   Brace-matching for Perl Regular Expressions
    150   acc   Extend regex syntax to provide for return of a hash of matched subpatterns
    156   aaa   Replace first match function (C<?...?>) with a flag to the match command.
    164   ccr   Replace =~, !~, m//, s///, and tr// with match(), subst(), and trade()
    165   acc   Allow Variables in tr///
    166   abc   Alternative lists and quoting of things
    191   bbc   smart container slicing
    197   cdr   Numeric Value Ranges In Regular Expressions
    198   adr   Boolean Regexes
    261   dbr   Pattern matching on perl values
    274   acc   Generalised Additions to Regexs
    276   aaa   Localising Paren Counts in qr()s.
    308   dar   Ban Perl hooks into regexes
    316   bcr   Regex modifier for support of chunk processing and prefix matching
    317   aaa   Access to optimisation information for regular expressions
    331   acc   Consolidate the $1 and \1 notations
    332   abc   Regex: Make /$/ equivalent to /\z/ under the '/s' modifier
    348   bcc   Regex assertions in plain Perl code
    360   acb   Allow multiply matched groups in regexes to return a listref of all matches
    361   abb   Simplifying split()

Interestingly, there were no withdrawn RFCs for pattern matching. That means either that there were no cork-brained ideas proposed, or that regex culture is so cork-brained already that the cork-brained ideas blend right in. I know where my money is... :-)

In fact, regular expression culture is a mess, and I share some of the blame for making it that way. Since my mother always told me to clean up my own messes, I suppose I'll have to do just that.

For prior Apocalypses, I've used the RFCs as a springboard for discussion of my thinking, but this one is special, because none of the RFCs were courageous enough (or foolhardy enough) to look at the big picture and propose radical change where it's needed. But Perl has often been tagged as a language in which it's easy to write programs that are difficult to read, and it's no secret that regular expression syntax has been the chief culprit. Funny that other languages have been borrowing Perl's regular expressions as fast as they can...

That's primarily because we took several large steps in Perl 5 to enhance regex capabilities. We took one large step forwards with the /x option, which allowed whitespace between regex tokens. But we also took several large steps sideways with the (?...) extension syntax. I call them steps sideways, but they were simultaneously steps forward in terms of functionality and steps backwards in terms of readability. At the time, I rationalized it all in the name of backward compatibility, and perhaps that approach was correct for that time and place. It's not correct now, since the Perl 6 approach is to break everything that needs breaking all at once.
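To make the forward and sideways steps concrete, here is a small Perl 5 sketch of the same pattern written twice: once densely, and once with /x so that whitespace and comments can separate the tokens. Both use the (?:...) non-capturing group, one of the (?...) extensions. The sample string and the variable names are invented for illustration only.

```perl
use strict;
use warnings;

# Hypothetical input, made up for this example.
my $line = "width = 42px";

# Without /x, every token runs into the next.
my $dense = qr/^(\w+)\s*=\s*(\d+)(?:px|em)?$/;

# With /x, whitespace and comments inside the pattern are ignored,
# so each piece can be laid out and labelled.
my $spaced = qr/
    ^ (\w+)          # key
    \s* = \s*        # equals sign, optional spaces
    (\d+)            # numeric value
    (?: px | em )?   # optional unit, grouped but not captured
    $
/x;

# Both patterns capture the same two fields.
for my $re ($dense, $spaced) {
    if (my ($key, $value) = $line =~ $re) {
        print "$key => $value\n";
    }
}
```

Both loop iterations print `width => 42`. The /x layout is easier to follow, but the (?: ... ) group is no prettier for the extra spaces around it, which is roughly the readability cost described above.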

And unfortunately, there's a lot of regex culture that needs breaking.


Copyright © 2000-2008 O'Reilly Media, Inc. All Rights Reserved.