How about this? It doesn't account for nested comments, which means it'll still break on your haystack, Geert, but it doesn't use any (I repeat, any!) lazy operators. It's based on an example I posted elsewhere yesterday, which I've updated: we're still looking for an opening and a closing tag, but the part in between is a little smarter than before, when it just looked for anything and everything. Now it looks for any character except a star, or a run of stars as long as it isn't followed by another star or a slash (i.e. it isn't part of the closing tag), or a new line.
PHP Code:
$szTest = <<<EOF
/*** this
is*** the
test **/
/* test*/
booga!
/* more
new lines
in this
comment
*/
EOF;
$szTest = preg_replace("~/\*([^*]|\*+[^*/]|[\r\n])*\*+/~", 'booga!', $szTest);
//$szTest = preg_replace("~/\*(.|[\r\n])*?\*/~", 'booga!', $szTest);
echo $szTest;
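For anyone who finds the first pattern hard to read, here is a rough sketch of the same idea spread out with the x modifier so each branch can carry a comment. It is not the exact pattern from the post: it assumes the [\r\n] branch can be dropped (a negated class like [^*] already matches newlines) and it uses a non-capturing group. The variable names are made up for the illustration.

PHP Code:
```php
<?php
$szTest = <<<EOF
/*** this
is*** the
test **/
/* test*/
booga!
/* more
new lines
in this
comment
*/
EOF;

// The pattern from the post, only moved into single quotes.
$szOriginal = '~/\*([^*]|\*+[^*/]|[\r\n])*\*+/~';

// Assumed-equivalent spelling. Under the x modifier, whitespace in the
// pattern is ignored and everything from # to the end of the line is a comment.
$szAnnotated = '~
    /\*             # opening tag: a slash, then a star
    (?:
        [^*]        # any character except a star (newlines included)
      | \*+ [^*/]   # a run of stars, then something that is neither star nor slash
    )*
    \*+ /           # closing tag: one or more stars, then a slash
~x';

$szFromOriginal  = preg_replace($szOriginal, 'booga!', $szTest);
$szFromAnnotated = preg_replace($szAnnotated, 'booga!', $szTest);

// Both spellings should replace the same three comments.
var_dump($szFromOriginal === $szFromAnnotated);
```

One thing to watch with the x modifier: the comments inside the pattern must not contain the delimiter (here the tilde) or a single quote, since PHP looks for those before the regex engine ever sees the comments.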
- /\*([^*]|\*+[^*/]|[\r\n])*\*+/
- /\*(.|[\r\n])*?\*/
...but which one is more efficient? The second one has a lazy operator, but it matches the same text as the first one. I like the second one because it's more readable, but I'm prone to using things like the first one because it's less readable and people looking at my work may think I'm actually smarter than I really am.
Ego aside, I also like the first one because its job description is much better defined. Both accomplish the same task, but the first does exactly what it's meant to do, whereas the second, although it does the job for now, leaves more room to let something slide through that it shouldn't in the future. Or will it... hmmmmmmmmmmmmm.
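One way to get a feel for the efficiency question is simply to time both patterns on the same input. The following is only a rough sketch: the haystack, the repeat counts and the labels are made up for the test, and the numbers will depend on the PHP/PCRE version, whether the JIT is enabled, and what the comments actually look like.

PHP Code:
```php
<?php
// Hypothetical haystack: many short comments with a line of filler in between.
$szHaystack = str_repeat("/*** a short\ncomment **/\nbooga();\n", 2000);

$aPatterns = array(
    'no lazy operator' => '~/\*([^*]|\*+[^*/]|[\r\n])*\*+/~',
    'lazy operator'    => '~/\*(.|[\r\n])*?\*/~',
);

$aResults = array();
foreach ($aPatterns as $szLabel => $szPattern) {
    $fStart = microtime(true);
    // Repeat the replacement a few times so the timing is not pure noise.
    for ($i = 0; $i < 20; $i++) {
        $aResults[$szLabel] = preg_replace($szPattern, 'booga!', $szHaystack);
    }
    $fElapsed = microtime(true) - $fStart;
    printf("%-18s %.4f s\n", $szLabel, $fElapsed);
}

// Sanity check: a timing only means something if both patterns did the same work.
var_dump($aResults['no lazy operator'] === $aResults['lazy operator']);
```

If either call returns NULL instead of a string, preg_last_error() should say why (for instance a backtrack limit being hit), which would be worth knowing before trusting the timings.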
-m
edit: Here's a hackish version that matches newlines without mentioning newlines: /\*[\w\W]*?\*/
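For comparison, a less hackish way to get the same effect is the s modifier, which makes the dot match newlines as well. A small sketch, with a made-up array name and a sample comment borrowed from the test above, that runs the three lazy spellings side by side:

PHP Code:
```php
<?php
$szTest = "/* more\nnew lines\nin this\ncomment\n*/";

$aLazyPatterns = array(
    '~/\*(.|[\r\n])*?\*/~',  // dot, or an explicit newline
    '~/\*[\w\W]*?\*/~',      // word or non-word character, i.e. anything at all
    '~/\*.*?\*/~s',          // s modifier: the dot matches newlines too
);

$aOutputs = array();
foreach ($aLazyPatterns as $szPattern) {
    $aOutputs[] = preg_replace($szPattern, 'booga!', $szTest);
}

// Each entry should be the whole comment collapsed to the replacement text.
print_r($aOutputs);
```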