03-21-2008, 03:58 PM   #4
Salathe

Why not use DOM in this instance? It provides a far more reliable means of grabbing the paragraph elements than trying to delve into the intricacies of a suitable regular expression.

For example:

PHP Code:
<?php

/*
    Load the HTML document. It is a good idea
    to cache the remote document rather than
    load it from the remote server every time
    the script is called.

    loadHTMLFile() is called on an instance here,
    because calling it statically fails in PHP 8.
    The @ suppresses warnings about malformed HTML.
*/
$dom = new DOMDocument();
@$dom->loadHTMLFile('http://lipsum.com/feed/html');

/*
    Grab all paragraph elements in the document.
    $nodes is a DOMNodeList object
*/
$nodes = $dom->getElementsByTagName('p');

/*
    Quick debugging to see what we've got
*/
header('Content-Type: text/plain; charset=utf-8');
foreach ($nodes as $p)
{
    // Could use $p->textContent if we only wanted
    // the text content (no HTML tags)
    var_dump($dom->saveXML($p));
}
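
If you only want the text of each paragraph, something along these lines might do. It is only a rough sketch: the $html string and the $texts array are made up for illustration, and it parses an inline string with loadHTML() instead of fetching the remote feed, so it runs without network access.

```php
<?php

// Hypothetical sample markup standing in for the remote document
$html = '<html><body><p>First <b>paragraph</b></p><p>Second paragraph</p></body></html>';

$dom = new DOMDocument();
// The @ suppresses warnings that sloppy real-world HTML can trigger
@$dom->loadHTML($html);

$texts = array();
foreach ($dom->getElementsByTagName('p') as $p) {
    // textContent drops the tags and keeps only the text
    $texts[] = $p->textContent;
}

print_r($texts);
```

Swapping loadHTML($html) back for loadHTMLFile() with the feed URL should give the same kind of result for the real document.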