
Apocalypse 3

Operators

by Larry Wall
October 02, 2001

Editor's Note: this Apocalypse is out of date and remains here for historic reasons. See Synopsis 03 for the latest information.

Table of Contents

RFC 025: Operators: Multiway comparisons

RFC 320: Allow grouping of -X file tests and add filetest builtin

RFC 290: Better english names for -X

RFC 283: tr/// in array context should return a histogram

RFC 084: Replace => (stringifying comma) with => (pair constructor)

RFC 081: Lazily evaluated list generation functions

RFC 285: Lazy Input / Context-sensitive Input

RFC 082: Arrays: Apply operators element-wise in a list context

RFC 045: || and && should propagate result context to both sides

RFC 054: Operators: Polymorphic comparisons

RFC 104: Backtracking

RFC 143: Case ignoring eq and cmp operators

RFC 170: Generalize =~ to a special ``apply-to'' assignment operator

Non-RFC considerations

Binary . (dot)
Unary . (dot)
Binary _
Unary _
Unary +
Binary :=
Unary *
List context
Binary :
Trinary ??::
Binary //
Binary ;
Unary ^
Unary ?
Binary ?
Binary ~
Binary ~~
User defined operators
Unicode operators
Precedence

To me, one of the most agonizing aspects of language design is coming up with a useful system of operators. To other language designers, this may seem like a silly thing to agonize over. After all, you can view all operators as mere syntactic sugar -- operators are just funny looking function calls. Some languages make a feature of leveling all function calls into one syntax. As a result, the so-called functional languages tend to wear out your parenthesis keys, while OO languages tend to wear out your dot key.

But while your computer really likes it when everything looks the same, most people don't think like computers. People prefer different things to look different. They also prefer to have shortcuts for common tasks. (Even the mathematicians don't go for complete orthogonality. Many of the shortcuts we typically use for operators were, in fact, invented by mathematicians in the first place.)

So let me enumerate some of the principles that I weigh against each other when designing a system of operators.

  • Different classes of operators should look different. That's why filetest operators look different from string or numeric operators.
  • Similar classes of operators should look similar. That's why the filetest operators look like each other.
  • Common operations should be ``Huffman coded.'' That is, frequently used operators should be shorter than infrequently used ones. Given how often it's used, Perl 5's scalar operator is too long, in my estimation.
  • Preserving your culture is important. So Perl borrowed many of its operators from other familiar languages. For instance, we used Fortran's ** operator for exponentiation. As we go on to Perl 6, most of the operators will be ``borrowed'' directly from Perl 5.
  • Breaking out of your culture is also important, because that is how we understand other cultures. As an explicitly multicultural language, Perl has generally done OK in this area, though we can always do better. Examples of cross-cultural exchange among computer cultures include XML and Unicode. (Not surprisingly, these features also enable better cross-cultural exchange among human cultures -- we sincerely hope.)
  • Sometimes operators should respond to their context. Perl has many operators that do different but related things in scalar versus list context.
  • Sometimes operators should propagate context to their arguments. The x operator currently does this for its left argument, while the short-circuit operators do this for their right argument.
  • Sometimes operators should force context on their arguments. Historically, the scalar mathematical operators of Perl have forced scalar context on their arguments. One of the RFCs discussed below proposes to revise this.
  • Sometimes operators should respond polymorphically to the types of their arguments. Method calls and overloading work this way.
  • Operator precedence should be designed to minimize the need for parentheses. You can think of the precedence of operators as a partial ordering of the operators such that it minimizes the number of ``unnatural'' pairings that require parentheses in typical code.
  • Operator precedence should be as simple as possible. Perl's precedence table currently has 24 levels in it. This might or might not be too many. We could probably reduce it to about 18 levels, if we abandon strict C compatibility of the C-like operators.
  • People don't actually want to think about precedence much, so precedence should be designed to match expectations. Unfortunately, the expectations of someone who knows the precedence table won't match the expectations of someone who doesn't. And Perl has always catered to the expectations of C programmers, at least up till now. There's not much one can do up front about differing cultural expectations.
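The precedence bullets above can be made concrete with a small precedence-climbing parser. This is a hedged Python sketch: the three-level table, the function name parse, and the fully parenthesized output are illustrative assumptions, not Perl's real grammar, which has many more levels and also handles unary operators. It shows how a precedence table alone decides that 1 + 2 * 3 groups as 1 + (2 * 3), and how right associativity makes ** group from the right.

```python
import re

# Illustrative table (an assumption, far smaller than Perl's):
# a larger number binds more tightly.
PREC = {"+": 1, "-": 1, "*": 2, "/": 2, "**": 3}
RIGHT_ASSOC = {"**"}  # exponentiation groups right to left, as in Perl


def parse(src):
    """Parse integers, binary operators and parentheses by precedence
    climbing, returning a fully parenthesized string."""
    tokens = re.findall(r"\d+|\*\*|[-+*/()]", src)  # toy lexer: skips anything else
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        pos += 1
        return tokens[pos - 1]

    def primary():
        tok = advance()
        if tok == "(":
            inner = expr(1)
            if advance() != ")":
                raise SyntaxError("expected ')'")
            return inner
        if not tok.isdigit():
            raise SyntaxError(f"expected a term, got {tok!r}")
        return tok

    def expr(min_prec):
        lhs = primary()
        # Keep absorbing operators that bind at least as tightly as min_prec.
        while peek() in PREC and PREC[peek()] >= min_prec:
            op = advance()
            # Left-associative: the right operand may only hold tighter operators.
            # Right-associative: it may also hold this same operator.
            next_min = PREC[op] if op in RIGHT_ASSOC else PREC[op] + 1
            rhs = expr(next_min)
            lhs = f"({lhs} {op} {rhs})"
        return lhs

    result = expr(1)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return result


print(parse("1 + 2 * 3"))    # (1 + (2 * 3))
print(parse("8 - 3 - 2"))    # ((8 - 3) - 2)
print(parse("2 ** 3 ** 2"))  # (2 ** (3 ** 2))
```

None of the three inputs needs parentheses; the table supplies the grouping, which is the sense in which a well-chosen precedence ordering minimizes the parentheses people have to write.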

It would be easy to drive any one of these principles into the ground, at the expense of other principles. In fact, various languages have done precisely that.

My overriding design principle has always been that the complexity of the solution space should map well onto the complexity of the problem space. Simplification good! Oversimplification bad! Placing artificial constraints on the solution space produces an impedance mismatch with the problem space, with the result that using a language that is artificially simple induces artificial complexity in all solutions written in that language.

One artificial constraint that all computer languages must deal with is the number of symbols available on the keyboard, corresponding roughly to the number of symbols in ASCII. Most computer languages have compensated by defining systems of operators that include digraphs, trigraphs, and worse. This works pretty well, up to a point. But it means that certain common unary operators cannot be used as the last character of a digraph operator. Early versions of C had assignment operators in the wrong order. For instance, there used to be a =- operator. Nowadays that's spelled -=, to avoid conflict with unary minus.

By the same token (no pun intended), you can't easily define a unary = operator without requiring a space before it most of the time, since so many binary operators end with the = character.
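The clash described in the last two paragraphs comes from maximal munch, the common lexer rule of always taking the longest operator that matches at the current position. The Python sketch below is a toy: the function tokenize and the operator tables OLD_C, MODERN, and WITH_PLUS_EQ are hypothetical, chosen only to reproduce the two situations in the text, and no real C compiler is written this way.

```python
import re

WORD = re.compile(r"[A-Za-z_]\w*|\d+")  # identifiers and integers


def tokenize(src, operators):
    """Toy maximal-munch lexer: at each position take the longest
    operator from `operators` that matches."""
    ops = sorted(operators, key=len, reverse=True)  # try longest first
    tokens, i = [], 0
    while i < len(src):
        if src[i].isspace():
            i += 1
            continue
        m = WORD.match(src, i)
        if m:
            tokens.append(m.group())
            i = m.end()
            continue
        for op in ops:
            if src.startswith(op, i):
                tokens.append(op)
                i += len(op)
                break
        else:
            raise SyntaxError(f"unexpected character {src[i]!r} at offset {i}")
    return tokens


OLD_C = ["=-", "=", "-"]          # hypothetical: "=-" is subtract-assign
MODERN = ["-=", "=", "-"]         # subtract-assign spelled "-="
WITH_PLUS_EQ = ["+=", "+", "="]   # hypothetical: "=" could also be unary

print(tokenize("x=-1", OLD_C))          # ['x', '=-', '1']  (subtract 1 from x)
print(tokenize("x = -1", OLD_C))        # ['x', '=', '-', '1']  (space needed for x = -1)
print(tokenize("x=-1", MODERN))         # ['x', '=', '-', '1']
print(tokenize("x+ =y", WITH_PLUS_EQ))  # ['x', '+', '=', 'y']
```

The last case is the unary = problem in miniature: once += is a single token, a unary = following a binary + can only be reached by inserting a space.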

Perl gets around some of these problems by keeping track of whether it is expecting an operator or a term. As it happens, a unary operator is simply one that occurs when Perl is expecting a term. So Perl could keep track of a unary = operator, even if the human programmer might be confused. That's why I'd place a unary = operator in the category of ``OK, but don't use it for anything that will cause widespread confusion.'' Mind you, I'm not proposing a specific use for a unary = at this point. I'm just telling you how I think. If we ever do get a unary = operator, we will hopefully have taken these issues into account.

While we can disambiguate operators based on whether an operator or a term is expected, this implies some syntactic constraints as well. For instance, you can't use the same symbol for both a postfix operator and a binary operator. So you'll never see a binary ++ operator in Perl, because Perl wouldn't know whether to expect a term or an operator after that. It also implies that we can't use the ``juxtaposition'' operator. That is, you can't just put two terms next to each other and expect something to happen (such as string concatenation, as in awk). What if the second term started with something that looked like an operator? It would be misconstrued as a binary operator.
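The term-versus-operator bookkeeping described in the last two paragraphs can be modeled with a single flag. The Python sketch below is hypothetical: the function name classify, its tiny token set, and the role labels are assumptions for illustration, and Perl's real lexer tracks far more state. It shows the same character being read as unary or binary depending on what was expected, why ++ after a term has to be postfix, and why two juxtaposed terms are rejected.

```python
import re

# One token: integer, identifier, "++", or a single-character operator.
TOKEN = re.compile(r"\s*(\d+|[A-Za-z_]\w*|\+\+|[-+*/=])")


def classify(src):
    """Label each token with the role implied by what the parser expects.

    Toy model (hypothetical, not Perl's lexer): the only state is whether
    a term or an operator is expected next.
    """
    roles = []
    expect_term = True  # an expression starts with a term (or a prefix operator)
    src = src.rstrip()
    pos = 0
    while pos < len(src):
        m = TOKEN.match(src, pos)
        if m is None:
            raise SyntaxError(f"unrecognized input at offset {pos}")
        tok, pos = m.group(1), m.end()
        is_term = tok[0].isalnum() or tok[0] == "_"
        if expect_term:
            if is_term:
                roles.append(("term", tok))
                expect_term = False  # after a term, an operator must follow
            else:
                roles.append(("prefix", tok))  # operator where a term belongs: unary
        elif is_term:
            # No juxtaposition operator: two adjacent terms are an error.
            raise SyntaxError(f"term {tok!r} found where an operator was expected")
        elif tok == "++":
            # Postfix only, so an operator is still expected next. If "++" could
            # also be binary, we would not know what to expect after it.
            roles.append(("postfix", tok))
        else:
            roles.append(("binary", tok))
            expect_term = True
    if expect_term:
        raise SyntaxError("input ends where a term was expected")
    return roles


print(classify("a=-1"))
# [('term', 'a'), ('binary', '='), ('prefix', '-'), ('term', '1')]
print(classify("i++ + 1"))
# [('term', 'i'), ('postfix', '++'), ('binary', '+'), ('term', '1')]
```

Calling classify("a b") raises SyntaxError, which is the toy version of the juxtaposition problem: after the term a, the lexer has nowhere to put a second term.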

Well, enough of these vague generalities. On to the vague specifics.

The RFCs for this apocalypse are (as usual) all over the map, but don't cover the map. I'll talk first about what the RFCs do cover, and then about what they don't. Here are the RFCs that happened to get themselves classified into chapter 3:

    RFC   PSA    Title
    ---   ---    -----
    024   rr     Data types: Semi-finite (lazy) lists
    025   dba    Operators: Multiway comparisons
    039   rr     Perl should have a print operator 
    045   bbb    C<||> and C<&&> should propagate result context to both sides
    054   cdr    Operators: Polymorphic comparisons
    081   abc    Lazily evaluated list generation functions
    082   abc    Arrays: Apply operators element-wise in a list context
    084   abb    Replace => (stringifying comma) with => (pair constructor)
    104   ccr    Backtracking
    138   rr     Eliminate =~ operator.
    143   dcr    Case ignoring eq and cmp operators
    170   ccr    Generalize =~ to a special "apply-to" assignment operator
    283   ccc    C<tr///> in array context should return a histogram
    285   acb    Lazy Input / Context-sensitive Input
    290   bbc    Better english names for -X
    320   ccc    Allow grouping of -X file tests and add C<filetest> builtin

Note that you can click on the following RFC titles to view a copy of the RFC in question. The discussion sometimes assumes that you've read the RFC.

Copyright © 2000-2008 O’Reilly Media, Inc. All Rights Reserved.