TalkPHP


delayedinsanity 03-21-2008 04:29 AM

Building an array of matches...
 
I want to take a page of HTML and put everything in paragraph tags into an array. So far I've failed, obviously... can this be done easily enough using a regular expression match, or is there a better way? This is the short piece of code I've been testing with so far:

PHP Code:

$match = "<p>test1</p> <p>test2</p>";
$worked = preg_match_all("/(\<p\>.*\<\/p\>)/", $match, $matches);

The problem is that it returns the whole string as a single match. Augh!

delayedinsanity 03-21-2008 05:07 AM

Hmm, feeling a little silly now, I changed it to

PHP Code:

$worked = preg_match_all("/(\<p\>[a-z0-9_]*\<\/p\>)/i", $match, $matches);

Now I just have to figure out how to allow other tags inside the P tags... question still stands though: is there a better method for doing this, or should I just stick with the regular expression till I get it?

Wildhoney 03-21-2008 01:17 PM

You're definitely going down the right path with the regular expressions -- but what precisely are you trying to do, just get everything between the 2 P tags? How about something like the following:

PHP Code:

preg_match_all('~<p>(.+?)</p>~i', $match, $matches);
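
For comparison, here is a rough sketch of why the first attempt returned one match and the lazy version returns two. It reuses the test string from the first post; the variable names $greedy, $lazy, $greedyCount and $lazyCount are made up for this illustration.

```php
<?php
// Test string from the first post in this thread.
$match = "<p>test1</p> <p>test2</p>";

// Greedy: .* runs on to the LAST </p>, so the whole string is one match.
$greedyCount = preg_match_all('~<p>(.*)</p>~i', $match, $greedy);

// Lazy: .+? stops at the first </p> it can reach, one match per paragraph.
$lazyCount = preg_match_all('~<p>(.+?)</p>~i', $match, $lazy);

var_dump($greedyCount, $lazyCount);
print_r($lazy[1]); // captured contents of each paragraph
```

With this input the greedy pattern reports a single match, while $lazy[1] holds test1 and test2 as separate entries.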

Salathe 03-21-2008 03:58 PM

Why not use DOM in this instance? It will provide a far more reliable means of grabbing the paragraph elements than trying to delve into the intricacies of a suitable regular expression.

For example:

PHP Code:

<?php

/*
    Load the HTML document. It is a good idea
    to cache the remote document rather than
    load it from the remote server every time
    the script is called
*/
$dom = new DOMDocument();
// @ suppresses the warnings libxml raises on imperfect markup
@$dom->loadHTMLFile('http://lipsum.com/feed/html');

/*
    Grab all paragraph elements in the document.
    $nodes is a DOMNodeList object
*/
$nodes = $dom->getElementsByTagName('p');

/*
    Quick debugging to see what we've got
*/
header('Content-Type: text/plain; charset=utf-8');
foreach ($nodes as $p)
{
    // Could use $p->textContent if we only wanted
    // the text content (no HTML tags)
    var_dump($dom->saveXML($p));
}


delayedinsanity 03-21-2008 04:01 PM

Yeah, everything between an opening and closing P including other tags, etc. So that the following,

HTML Code:

<p>Fusce porta pede nec eros. Maecenas ipsum sem, interdum non, aliquam vitae, interdum nec, metus. Maecenas ornare lobortis risus. Etiam placerat varius mauris.</p>

<p>Maecenas viverra. <a href="">Sed feugiat.</a> Donec mattis quam aliquam risus. Proin quis massa semper felis euismod ultricies.</p>

...for example, would return two matches.
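
A minimal sketch of how the DOM route might handle markup like the above, assuming the HTML is already in a string. The sample text is shortened here, and $html and $paragraphs are names invented for the illustration. Each array entry is one paragraph, serialised with its inner tags intact:

```php
<?php
// Shortened version of the two sample paragraphs (assumed input).
$html = '<p>Fusce porta pede nec eros.</p>'
      . '<p>Maecenas viverra. <a href="">Sed feugiat.</a> Donec mattis quam aliquam risus.</p>';

$dom = new DOMDocument();

// Collect libxml parse warnings internally instead of printing them.
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

// One array entry per <p> element, inner markup included.
$paragraphs = [];
foreach ($dom->getElementsByTagName('p') as $p) {
    $paragraphs[] = $dom->saveHTML($p);
}

print_r($paragraphs);
```

If only the text is wanted, $p->textContent could be pushed onto the array instead of the serialised element.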

Geert 03-23-2008 03:15 PM

Worked out Wildhoney's regex a bit further. It now also allows newlines inside p elements, as well as HTML attributes on the opening p tag.

Code:

#<p\b[^>]*+>(.+?)</p>#is
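
A quick way to try that pattern out, as a sketch only: the input string below is invented for the test, and it includes a pre element to show that the \b keeps tags that merely start with "p" from matching.

```php
<?php
// Made-up input: a <pre> block, a <p> with an attribute and a newline,
// and a <p> containing another tag.
$html = "<pre>skip</pre>\n"
      . "<p class=\"intro\">First\nparagraph</p>\n"
      . "<p>Second <a href=\"\">link</a></p>";

// s lets . match newlines, i ignores case, \b stops <pre> matching as <p.
$count = preg_match_all('#<p\b[^>]*+>(.+?)</p>#is', $html, $matches);

var_dump($count);
print_r($matches[1]); // contents of each paragraph, inner tags included
```

$matches[0] holds the full elements including the p tags, and $matches[1] holds just what sits between them.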

delayedinsanity 03-24-2008 07:50 PM

Salathe: I'll look more into that. Does it grab the elements and everything in between them, though, or does it just go through and match the elements themselves?

Geert: Thank you, I've actually gotten slowed down working on the design again and less on the coding, but I should get back into it in the next day or two here and I'll give that a try.


All times are GMT.
