Apocalypse 5
Regular Expressions
by Larry Wall, June 04, 2002
This is the Apocalypse on Pattern Matching, generally having to do with what we call "regular expressions", which are only marginally related to real regular expressions. Nevertheless, the term has grown with the capabilities of our pattern matching engines, so I'm not going to try to fight linguistic necessity here. I will, however, generally call them "regexes" (or "regexen", when I'm in an Anglo-Saxon mood).
Here are the RFCs covered in this Apocalypse. PSA stands for "problem,
solution, acceptance", my private rating of how this RFC will fit
into Perl 6. Doubtless I have misclassified your RFC, though
the other ratings are pretty accurate. :-)
RFC PSA Title
--- --- -----
072 aaa Variable-length lookbehind.
093 abb Regex: Support for incremental pattern matching
110 bbb counting matches
112 acc Assignment within a regex
135 acr Require explicit m on matches, even with ?? and // as delimiters.
144 aaa Behavior of empty regex should be simple
145 acr Brace-matching for Perl Regular Expressions
150 acc Extend regex syntax to provide for return of a hash of matched subpatterns
156 aaa Replace first match function (C<?...?>) with a flag to the match command.
164 ccr Replace =~, !~, m//, s///, and tr// with match(), subst(), and trade()
165 acc Allow Variables in tr///
166 abc Alternative lists and quoting of things
191 bbc smart container slicing
197 cdr Numeric Value Ranges In Regular Expressions
198 adr Boolean Regexes
261 dbr Pattern matching on perl values
274 acc Generalised Additions to Regexs
276 aaa Localising Paren Counts in qr()s.
308 dar Ban Perl hooks into regexes
316 bcr Regex modifier for support of chunk processing and prefix matching
317 aaa Access to optimisation information for regular expressions
331 acc Consolidate the $1 and \1 notations
332 abc Regex: Make /$/ equivalent to /\z/ under the '/s' modifier
348 bcc Regex assertions in plain Perl code
360 acb Allow multiply matched groups in regexes to return a listref of all matches
361 abb Simplifying split()
Interestingly, there were no withdrawn RFCs for pattern
matching. That means either that there were no cork-brained ideas
proposed, or that regex culture is so cork-brained already that the
cork-brained ideas blend right in. I know where my money is... :-)
In fact, regular expression culture is a mess, and I share some of the blame for making it that way. Since my mother always told me to clean up my own messes, I suppose I'll have to do just that.
For prior Apocalypses, I've used the RFCs as a springboard for discussion of my thinking, but this one is special, because none of the RFCs were courageous enough (or foolhardy enough) to look at the big picture and propose radical change where it's needed. But Perl has often been tagged as a language in which it's easy to write programs that are difficult to read, and it's no secret that regular expression syntax has been the chief culprit. Funny that other languages have been borrowing Perl's regular expressions as fast as they can...
That's primarily because we took several large steps in Perl 5 to
enhance regex capabilities. We took one large step forwards with the
/x option, which allowed whitespace between regex tokens. But we
also took several large steps sideways with the (?...) extension
syntax. I call them steps sideways, but they were simultaneously
steps forward in terms of functionality and steps backwards in terms
of readability. At the time, I rationalized it all in the name of
backward compatibility, and perhaps that approach was correct for that
time and place. It's not correct now, since the Perl 6 approach is
to break everything that needs breaking all at once.
And unfortunately, there's a lot of regex culture that needs breaking.