12-12-2007, 01:28 AM   #2
Salathe
I've arrived at a pattern (it's not as advanced as it looks), starting from one offered in another thread where the same question was asked.
#<.*?(?:\s+[\w\W]+?(?:\s*=\s*([\'"]?).*?(?<!\\\\)\\1))*?\>#s
It basically looks for tags (anything enclosed in < and >). The bulk of the pattern caters for optional content within the tag (attributes or other random junk), so that a > inside a quoted attribute value doesn't end the match early.
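As a reading aid, here is a sketch of the same expression spelled out in PCRE's extended (`x`) mode, with a comment on each piece. The `$verbose` and `$html` names and the sample input are made up for illustration. The delimiter is changed from `#` to `~` because `#` starts a comment in extended mode, and a nowdoc (PHP 5.3+) is used so that no backslashes need doubling.

```php
<?php
// Nowdoc syntax: PHP does no escape processing inside it, so every backslash
// below reaches the regex engine exactly as written.
$verbose = <<<'REGEX'
~
<                 # opening angle bracket
.*?               # tag name, or whatever follows the bracket, lazily
(?:               # an attribute-like chunk, repeated as few times as possible
    \s+           #   whitespace before the attribute
    [\w\W]+?      #   attribute name: any characters, as few as possible
    (?:
        \s*=\s*   #   equals sign, optional whitespace either side
        (['"]?)   #   optional opening quote, captured as group 1
        .*?       #   attribute value, lazily
        (?<!\\)\1 #   the same quote again, unless preceded by a backslash
    )
)*?
>                 # closing angle bracket
~sx
REGEX;

// Hypothetical sample input: the title attribute contains a literal ">".
$html = '<a href="x.html" title="1 > 0">link</a>';

echo preg_replace($verbose, '', $html), "\n";     // prints: link
echo preg_replace('~<[^>]*>~', '', $html), "\n";  // prints:  0">link (attribute debris left behind)
```

The second call uses the naive "everything up to the next >" approach for comparison. It stops at the > inside the title attribute, which is the case the attribute-handling part of the pattern is there to cover.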

While tuning the pattern, I also wrote a quick series of tests (something I generally do with any code snippet), which you can try out yourself.

php Code:
<?php

$tests = array(
    '<aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa>moo',
    '<a onclick="javascript.writeln(\">these angl>>>>>>e brackets are smelly\');">Boo Boo</a>',
    '<img src="image.gif" onload="if (this.width<50) {this.src=\'image2.gif\'; this.width=\'120\'; this.height=\'90\'}">
<p>This is some text</p>',
    '<TD WIDTH="14%" BACKGROUND="images.jpg"><A HREF="http://something.xxx">
<IMG SRC="image.gif" BORDER="0" ONLOAD="if (this.width>50) this.border=1" ALT="Preview by Thumbshots"
WIDTH="45">testestets>blah</A></TD>',
    file_get_contents('http://example.org/')
);

// Tweaked from http://forums.devnetwork.net/viewtopic.php?t=25494
$pattern = '#<.*?(?:\s+[\w\W]+?(?:\s*=\s*([\'"]?).*?(?<!\\\\)\\1))*?\>#s';

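foreach ($tests as $id => $test)
{
    // Time a single preg_replace() call, in milliseconds.
    $start  = microtime(true);
    $result = preg_replace($pattern, '', $test);
    $time   = round((microtime(true) - $start) * 1000, 6);
    // Escape the result so any tag the pattern missed shows up as text in the browser.
    printf('<h4>TEST %d (%s ms)</h4><pre>%s</pre>', $id + 1, $time, htmlspecialchars((string) $result));
    echo "\n";
}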
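
One thing the loop above does not check: `preg_replace()` returns `NULL` when the regex engine gives up (for example when `pcre.backtrack_limit` is hit), and `printf` would then quietly print an empty result. Below is a minimal sketch of a guard. The `strip_or_fail()` helper name is hypothetical and not part of the test script above.

```php
<?php
// Hypothetical helper: run preg_replace() and fail loudly on a PCRE error
// instead of passing NULL along.
function strip_or_fail($pattern, $subject)
{
    $result = preg_replace($pattern, '', $subject);
    if ($result === null) {
        // preg_last_error() returns one of the PREG_*_ERROR constants,
        // e.g. PREG_BACKTRACK_LIMIT_ERROR.
        throw new RuntimeException('PCRE error code ' . preg_last_error());
    }
    return $result;
}

$pattern = '#<.*?(?:\s+[\w\W]+?(?:\s*=\s*([\'"]?).*?(?<!\\\\)\\1))*?\>#s';
echo strip_or_fail($pattern, '<p>This is some text</p>'), "\n"; // prints: This is some text
```

Whether any of the test strings actually trips a limit depends on the PHP build and its PCRE settings, so treat this as a safety net rather than a prediction.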

It's only a very quick solution, so there could well be huge flaws! I'll look it over for potential problems when it's not so late and my eyes aren't struggling to focus on the screen.
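
One limitation that is easy to confirm: the pattern cannot tell a tag from a stray < ... > pair in ordinary text, so comparison operators outside any tag get removed too. A small sketch (the sample string is made up for illustration):

```php
<?php
$pattern = '#<.*?(?:\s+[\w\W]+?(?:\s*=\s*([\'"]?).*?(?<!\\\\)\\1))*?\>#s';

// Made-up input with no tags at all, only "<" and ">" used as operators.
$text = 'if a < b and c > d';

// Everything from "<" to ">" is treated as a tag and removed.
var_dump(preg_replace($pattern, '', $text)); // string(7) "if a  d"
```

This only matters when the input can contain bare angle brackets outside of markup, which may or may not apply to the original question.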