TalkPHP
 
 
12-12-2007, 12:16 AM   #1
bluesaga (Super Moderator; Advanced Programmer)
Join Date: Sep 2007
Posts: 165

Majorly Advanced Regex

Looking for some clever clogs to figure out something for me:

Currently the strip_tags PHP function is rather rubbish, and simple folk must have written it! It doesn't check for angle brackets inside tag attributes.

So, for example, say you have the HTML code:
Code:
<a onclick="javascript.writeln(\">these angl>>>>>>e brackets are smelly');">Boo Boo</a>
and you run it through strip_tags: PHP will return '>>>>>e brackets are smelly');">Boo Boo'

What I am requesting is some regex that will handle it as it should, returning 'Boo Boo'. I've been fiddling with lookaheads, lookbehinds and lookarounds and just can't get it to match the whole tag!
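
To reproduce the behaviour described above, a minimal sketch like the following can be run. The variable names are only for illustration, and the exact leftover text may depend on the PHP version, so no expected output is shown.

```php
<?php
// The sample markup from the post. Inside a single-quoted PHP string, \"
// stays a literal backslash plus quote, while \' becomes a plain apostrophe.
$html = '<a onclick="javascript.writeln(\">these angl>>>>>>e brackets are smelly\');">Boo Boo</a>';

// strip_tags() is the built-in function under discussion.
$stripped = strip_tags($html);

// Dump what survived, then whether it is exactly the wanted link text.
var_dump($stripped);
var_dump($stripped === 'Boo Boo');
```

If the second dump prints false, the first one shows which part of the attribute leaked into the result.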
12-12-2007, 01:28 AM   #2
Salathe (Moderateur; RegEx Guru, PHP Guru, Top Contributor, Advanced Programmer)
Join Date: Apr 2007
Posts: 1,324

I've arrived at a pattern (it's not majorly advanced, even if it has that appearance) from a starting point offered elsewhere where the same question was asked.
#<.*?(?:\s+[\w\W]+?(?:\s*=\s*([\'"]?).*?(?<!\\\\)\\1))*?\>#s
It basically looks for tags (items surrounded by <>), with the bulk of the pattern catering for optional content within the tag (attributes or other random junk).
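
For anyone who finds the one-liner hard to read, here is the same pattern spelled out in PCRE extended mode (the x flag) with a comment on each piece. This is only a sketch: the ~ delimiter and the nowdoc are choices made here (# starts a comment in extended mode, so it cannot double as the delimiter), and the comments are one reading of what each part does.

```php
<?php
// The pattern from this post in extended mode; whitespace and # comments
// inside the pattern are ignored by PCRE. The nowdoc avoids PHP escaping.
$pattern = <<<'REGEX'
~
<                   # opening angle bracket
.*?                 # tag name or any other leading text, as short as possible
(?:                 # an attribute-like chunk, repeated as few times as possible
    \s+ [\w\W]+?    # whitespace, then the attribute name (any characters)
    (?:
        \s* = \s*   # equals sign, optionally padded with whitespace
        (['"]?)     # optional opening quote, captured as group 1
        .*?         # attribute value, as short as possible
        (?<!\\) \1  # the same quote again, not preceded by a backslash
    )
)*?
>                   # closing angle bracket
~sx
REGEX;

// The sample tag from the first post.
$html = '<a onclick="javascript.writeln(\">these angl>>>>>>e brackets are smelly\');">Boo Boo</a>';

echo preg_replace($pattern, '', $html), "\n";
```

The lookbehind is what lets the match skip the \" in the sample and carry on to the next unescaped quote, so the whole opening tag is removed and only the link text should be left.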

In tuning the pattern, I also wrote up a quick series of tests (generally something I do with all code snippets) which you can try out yourself.

php Code:
<?php

$tests = array(
    '<aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa>moo',
    '<a onclick="javascript.writeln(\">these angl>>>>>>e brackets are smelly\');">Boo Boo</a>',
    '<img src="image.gif" onload="if (this.width<50) {this.src=\'image2.gif\'; this.width=\'120\'; this.height=\'90\'}">
<p>This is some text</p>'
,
    '<TD WIDTH="14%" BACKGROUND="images.jpg"><A HREF="http://something.xxx">
<IMG SRC="image.gif" BORDER="0" ONLOAD="if (this.width>50) this.border=1" ALT="Preview by Thumbshots"
WIDTH="45">testestets>blah</A></TD>'
,
    file_get_contents('http://example.org/')
);

// Tweaked from http://forums.devnetwork.net/viewtopic.php?t=25494
$pattern = '#<.*?(?:\s+[\w\W]+?(?:\s*=\s*([\'"]?).*?(?<!\\\\)\\1))*?\>#s';

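// Strip the tags from each test string and report how long the replacement took.
foreach ($tests as $id => $test)
{
    $start  = microtime(true);
    $result = preg_replace($pattern, '', $test);
    $time   = round((microtime(true) - $start) * 1000, 6);
    // Escape the result so any leftover markup is displayed rather than rendered.
    printf('<h4>TEST %d (%s ms)</h4><pre>%s</pre>', $id + 1, $time, htmlspecialchars((string) $result));
    echo "\n";
}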

It's only a very quick solution so there could well be huge flaws!! I'll look over it for potential problems when it's not so late and my eyes aren't struggling to focus on the screen.
12-12-2007, 05:17 PM   #3
Geert (The Contributor; RegEx Guru)
Join Date: Dec 2007
Location: Belgium
Posts: 60

Quote:
Originally Posted by bluesaga
So for example you have the html code:
Code:
<a onclick="javascript.writeln(\">these angl>>>>>>e brackets are smelly');">Boo Boo</a>
Are you aware that this is actually invalid HTML? As far as I know, HTML does not allow embedded quotes to be escaped with a backslash. Put that link in a file and open it in a browser: the JavaScript won't work and you'll only see the colored part:
Code:
<a onclick="javascript.writeln(\">these angl>>>>>>e brackets are smelly');">Boo Boo</a>
So the question is whether you really want to match HTML strings like this, because by matching your opening tag beyond the \" you're carrying on past the point where normal HTML browsers stop.
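
To illustrate: a quoted attribute value in HTML ends at the next matching quote, backslash or not, and a literal quote inside a value is normally written as a character reference such as &quot;. The sketch below is an assumption-laden illustration, not part of the original posts: $js is a hypothetical script string, and the small regex only mimics how an HTML parser would read the attribute.

```php
<?php
// The markup from the first post, with its backslash-"escaped" quote.
$original = '<a onclick="javascript.writeln(\">these angl>>>>>>e brackets are smelly\');">Boo Boo</a>';

// Read the attribute the way HTML does: up to the next double quote.
preg_match('/onclick="([^"]*)"/', $original, $m);
echo $m[1], "\n"; // the value stops right after the backslash

// Hypothetical script text containing quotes and angle brackets.
$js = 'javascript.writeln(">these angl>>>>>>e brackets are smelly");';

// ENT_QUOTES encodes both quote styles; < and > are encoded as well.
$attr = htmlspecialchars($js, ENT_QUOTES);
$html = '<a onclick="' . $attr . '">Boo Boo</a>';

echo $html, "\n";
echo strip_tags($html), "\n";
```

With the value encoded this way there are no raw quotes or angle brackets left inside the tag, so plain strip_tags has nothing to trip over.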