Tuesday, 18 June 2013

Weird properties of PHP's lexer and parser

There are (as of PHP 5.3.0) only two tokens which represent a single character:
    • T_CURLY_OPEN only occurs inside of interpolated strings, e.g. "{$foo}" lexes to: '"' T_CURLY_OPEN T_VARIABLE '}' '"'
  • Technically, there is a third, T_BAD_CHARACTER, but it is non-specific. (Edit: according to one of the PHP devs, this is no longer true.)
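To see the token stream for yourself, here is a minimal sketch using the userland tokenizer, token_get_all(), together with token_name(). The helper name token_names() is made up for illustration and is not part of PHP or of the original post.

```php
<?php
// Hypothetical helper (illustrative name): turn a source string into a list
// of readable token names.
function token_names($source)
{
    $names = array();
    foreach (token_get_all($source) as $token) {
        // Named tokens come back as arrays: array(id, text, line).
        // Single-character tokens come back as plain one-character strings.
        $names[] = is_array($token) ? token_name($token[0]) : "'" . $token . "'";
    }
    return $names;
}

// Single quotes keep PHP from interpolating $foo before the tokenizer sees it.
echo implode(' ', token_names('<?php "{$foo}";')), "\n";
// prints: T_OPEN_TAG '"' T_CURLY_OPEN T_VARIABLE '}' '"' ';'
```

The leading T_OPEN_TAG and trailing ';' are only there because token_get_all() needs a complete snippet to start from.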
There are two items in the parser which, instead of being unspecified and generating a generic parse error, exist only to throw a special parse error:
  • using isset() with something other than a variable
  • using __halt_compiler() anywhere other than the global scope (e.g., inside a function, conditional or loop)
(Shameless blog plug on this one) The closing tag ?> is implicitly converted to a semicolon. The opening tag consumes one character of whitespace (or two in the case of Windows newlines) after the literal tag, but is otherwise completely ignored by the parser. Thus, the following code is syntactically correct:
for ( $i = 0 ?><?php $i < 10 ?><?php ++$i ) echo "$i\n" ?>
And it lexes (after the first round transform) to
              T_ENCAPSED_AND_WHITESPACE '"' ';'
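One way to check that the loop really runs is to hand it to eval() and capture what it prints. This is only a sketch: the variable names $snippet and $output are illustrative, and it assumes the interpreter still turns ?> into an implicit semicolon and skips the open tag, as described above.

```php
<?php
// The loop from the post, kept in single quotes so that $i and \n reach
// eval() untouched. eval() starts in PHP mode, so no leading open tag is needed.
$snippet = 'for ( $i = 0 ?><?php $i < 10 ?><?php ++$i ) echo "$i\n" ?>';

// Capture whatever the evaluated code echoes.
ob_start();
eval($snippet);
$output = ob_get_clean();

// If the closing tags act as the two semicolons of the for header,
// this prints the numbers 0 through 9, one per line.
echo $output;
```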
The next several relate to variable interpolation syntax. For these, it helps to know the difference between a statement (if, for, while, etc.) and an expression (something with a value, like a variable, object lookup, function call, etc.).
  1. If you interpolate an array with a single element lookup and no braces, non-identifier-non-whitespace chars will be parsed as single-character tokens until either a whitespace character or closing bracket is encountered.
    • e.g., "$foo["bar$$foo]" lexes to '"' T_VARIABLE '[' '"' T_STRING '$' '$' T_STRING ']' '"'
  2. In a similar scenario to the above, if you do use a space inside the brackets, you will get an extra, empty T_ENCAPSED_AND_WHITESPACE token.
    • e.g., "$foo[ whatever here" lexes to '"' T_VARIABLE '[' T_ENCAPSED_AND_WHITESPACE T_ENCAPSED_AND_WHITESPACE '"'
  3. In the midst of complex interpolation, if you are in one of the constructs that allows full expressions, you can insert a closing tag, and it will be lexed as one (PHP treats it the same as a ';', so it is bad syntax there, but nevertheless). Furthermore, if you then use an open tag, the lexer will remember that you were in the middle of an expression inside a string interpolation, which seems like a moment of good design and implementation (or something like it).
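For comparison with the edge cases above, this sketch dumps the tokens for the ordinary, well-formed interpolation forms: a brace-less array lookup, a numeric lookup, and the braced "complex" syntax. The helper name describe_tokens() is hypothetical, and exact token streams for the malformed cases above may vary between PHP versions, so only the well-formed ones are shown.

```php
<?php
// Hypothetical helper (illustrative name): list the token names for a snippet.
function describe_tokens($source)
{
    $names = array();
    foreach (token_get_all($source) as $token) {
        // Arrays are named tokens; plain strings are single-character tokens.
        $names[] = is_array($token) ? token_name($token[0]) : "'" . $token . "'";
    }
    return $names;
}

// Single-quoted so the tokenizer, not this script, sees the interpolation.
$snippets = array(
    '"$foo[bar]"',       // simple syntax, bare key lexes as T_STRING
    '"$foo[0]"',         // simple syntax, numeric key lexes as T_NUM_STRING
    '"{$foo[\'bar\']}"', // complex syntax, full expression inside the braces
);

foreach ($snippets as $snippet) {
    echo $snippet, ' => ', implode(' ', describe_tokens('<?php ' . $snippet . ';')), "\n";
}
```

Inside the brace-less form the lexer is in a restricted mode, which is why the key is a bare T_STRING there but a quoted T_CONSTANT_ENCAPSED_STRING in the braced form.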
You can nest heredocs. Seriously. Consider the following:
echo <<<THONE
You can nest it as deep as you want, which is terrible (edit: a terrible thing to do), but what is hilarious is that, while the actual PHP interpreter handles this scenario correctly, the PHP userland tokenizer, token_get_all(), cannot handle it, and parses the remainder of the source after the innermost heredoc to be one long interpolated string (edit: according to a person on the php dev team, this is fixed in 5.5).
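As a minimal sketch of one way to nest a heredoc (not necessarily the construction used in the original example), the inner heredoc below is passed as an argument to a function call inside the outer heredoc's {$...} interpolation. The names $upper, $text, OUTER and INNER are all illustrative.

```php
<?php
// "{$upper(...)}" calls the function named by $upper inside the interpolation.
$upper = 'strtoupper';

$text = <<<OUTER
outer start {$upper(<<<INNER
inner body
INNER
)} outer end
OUTER;

echo $text, "\n";
// prints: outer start INNER BODY outer end
```

The inner closing label sits on its own line, which keeps the sketch valid both before and after the heredoc syntax was relaxed in later PHP versions.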
I hope these oddities have been as amusing for you to read about here as they have been for me to discover.

