TalkPHP


crazyryan 04-04-2008 10:46 PM

preg_match_all
 
Hey, I'm trying to get data from retailmenot.com

This is what I have:

PHP Code:

<?php
if(isset($_GET['domain'])) {
    $page = file_get_contents("http://www.retailmenot.com/view/" . $_GET['domain']);

    preg_match_all('/<a name="(.*)"><\/a><div class="coupon" id="(.*)"><table class="details"><tr><th class="code">Code:<\/th><td class="code" id="(.*)"><strong>(.*)<\/strong> <a class="useButton" href="\/out\/(.*)" target="_blank" onclick="javascript:pageTracker._trackPageview(\'\/outgoing\/aff\/(.*)\');" rel="nofollow"><span>Use coupon &raquo;<\/span><\/a><div class="affTracker"><\/div><\/td><\/tr><tr><th>Discount:<\/th><td class="discount">(.*)<\/td><\/tr><tr><th>Stats:<\/th><td class="stats"><span class="good">(.*)<\/span>/', $page, $output);

    print_r($output);
}
?>

But when $_GET['domain'] is namecheap.com, or any other domain, nothing gets scraped.

Any ideas?

TlcAndres 04-04-2008 10:51 PM

Correct me if I'm wrong, but I always thought that to get data from the regular expression into the resulting matches you had to use (.*?). Other than that, I don't have the time right now to sit down and look through your expression.
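For reference, both (.*) and (.*?) capture into the matches array; the difference is how far they reach. A minimal sketch of greedy versus lazy matching, using a made-up one-line sample rather than RetailMeNot's real markup:

```php
<?php
// Hypothetical sample; not taken from the real page.
$html = '<strong>SAVE10</strong> and <strong>FREESHIP</strong>';

// Greedy: .* runs on to the LAST </strong> on the line.
preg_match_all('/<strong>(.*)<\/strong>/', $html, $greedy);

// Lazy: .*? stops at the FIRST </strong> it can reach.
preg_match_all('/<strong>(.*?)<\/strong>/', $html, $lazy);

print_r($greedy[1]); // one element: SAVE10</strong> and <strong>FREESHIP
print_r($lazy[1]);   // two elements: SAVE10 and FREESHIP
```

So the lazy form is usually what you want when the same tag repeats on a line, but switching to it will not turn a pattern that matches nothing into one that matches.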

crazyryan 04-04-2008 10:52 PM

Quote:

Originally Posted by TlcAndres (Post 13180)
Correct me if I'm wrong, but I always thought that to get data from the regular expression into the resulting matches you had to use (.*?). Other than that, I don't have the time right now to sit down and look through your expression.

Just tried that but it still didn't work. Exactly the same as before, I get this:

Array ( [0] => Array ( ) [1] => Array ( ) [2] => Array ( ) [3] => Array ( ) [4] => Array ( ) [5] => Array ( ) [6] => Array ( ) [7] => Array ( ) [8] => Array ( ) [9] => Array ( ) )
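One thing the empty result hints at: it has ten entries, [0] through [9], although the original pattern only contains eight (.*) groups. The likely cause is the unescaped parentheses around the _trackPageview argument, which the regex engine reads as a ninth capture group instead of literal characters, so the literal "(" and ")" in the HTML are never matched. A small sketch of that effect, assuming a fragment shaped like the onclick attribute in the pattern (the "123" value is made up):

```php
<?php
// Hypothetical fragment modelled on the onclick attribute in the pattern.
$html = "onclick=\"javascript:pageTracker._trackPageview('/outgoing/aff/123');\"";

// Unescaped: ( and ) open and close a capture group, so the literal
// parentheses in the HTML are never matched.
$unescaped = '/_trackPageview(\'\/outgoing\/aff\/(.*?)\');/';

// Escaped: \( and \) match the literal characters.
$escaped = '/_trackPageview\(\'\/outgoing\/aff\/(.*?)\'\);/';

var_dump(preg_match($unescaped, $html, $m1)); // int(0)
var_dump(preg_match($escaped, $html, $m2));   // int(1)
echo $m2[1], "\n";                            // 123
```

When a long literal chunk of HTML goes into a pattern, preg_quote() can do this escaping for you.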

TlcAndres 04-04-2008 11:09 PM

Oh, and I forgot to ask: is all this HTML really on the same line? If it's not, then you need to add the modifier to ignore new lines... which is... god, I can't remember for the life of me.
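The modifier in question is "s" (PCRE_DOTALL), which makes the dot match newlines as well. The "m" modifier is a different thing: it only changes how ^ and $ behave. A short sketch, assuming a made-up table cell that is split across lines:

```php
<?php
// Hypothetical markup split across lines.
$html = "<td class=\"discount\">\n10% off hosting\n</td>";

// Without /s the dot stops at each newline, so nothing matches.
$without = preg_match('/<td class="discount">(.*?)<\/td>/', $html, $m1);

// With /s the dot also matches newlines.
$with = preg_match('/<td class="discount">(.*?)<\/td>/s', $html, $m2);

var_dump($without);        // int(0)
var_dump($with);           // int(1)
echo trim($m2[1]), "\n";   // 10% off hosting
```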

crazyryan 04-04-2008 11:11 PM

I think that may be why. I just used this:
preg_match_all('/<td class="code" id="(.*?)"><strong>(.*?)<\/strong>/', $page, $output);

instead of the whole block I need, and it brought back the coupon codes, but I need the other fields too.

Wildhoney 04-04-2008 11:36 PM

I'm sure I wrote a class for this once, but I've written it again nevertheless as it's easy enough to write. It can be used as simply as this:

php Code:
$pRetail = new TalkPHP_RetailMeNot('namecheap.com');
$pData = $pRetail->getData();

You would then loop the results like so:

html4strict Code:
<table>
    <tr>
        <th>Code</th>
        <th>Description</th>
        <th>Success Rate</th>
    </tr>
   
    <?php foreach($pData as $pItem): ?>
    <tr>
        <td><?php echo $pItem->code; ?></td>
        <td><?php echo $pItem->description; ?></td>
        <td><?php echo $pItem->success; ?></td>
    </tr>
    <?php endforeach; ?>
   
</table>

...And finally the class itself :-) Enjoy it! Just give TalkPHP.com some credit when the opportunity arises. Please!

php Code:
class TalkPHP_RetailMeNot
{
    private $m_szAddress;
    private $m_aData;
   
    public function __construct($szAddress)
    {
        $this->m_aData = array();
        $this->m_szAddress = sprintf('http://www.retailmenot.com/view/%s', $szAddress);
        $this->parse();
    }
   
    public function getData()
    {
        return (object) $this->m_aData;
    }
   
    public function parse()
    {
        // Fetch the page; file_get_contents() returns false on failure.
        $szContents = file_get_contents($this->m_szAddress);

        if ($szContents === false)
        {
            return;
        }

        // Captures: 1 = coupon code, 2 = discount text, 3 = success rate.
        preg_match_all
        (
            '~<td class="code" id="code.+?"><strong>(.+?)</strong>.*<td class="discount">(.+?)</td>.*<td class="stats"><span class=".+?">(.+?) success rate</span>~im',
            $szContents,
            $aMatches
        );

        $iCount = count($aMatches[1]);

        // Build one object per coupon from the parallel capture arrays.
        for($iIndex = 0; $iIndex < $iCount; $iIndex++)
        {
            $this->m_aData[] = (object) array
            (
                'code'        => $aMatches[1][$iIndex],
                'description' => $aMatches[2][$iIndex],
                'success'     => $aMatches[3][$iIndex]
            );
        }
    }
}
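As a side note, the index-based loop in parse() can also be written with the PREG_SET_ORDER flag, which groups the captures per match instead of per group. The sketch below reuses the pattern from the class without the "m" modifier (there are no ^ or $ anchors for it to affect) and runs it on made-up markup with one coupon per line; the real page layout is an assumption here.

```php
<?php
// Hypothetical markup: one coupon per line, shaped to fit the class's pattern.
$szContents =
    '<td class="code" id="code1"><strong>SAVE10</strong></td><td class="discount">10% off</td><td class="stats"><span class="good">80% success rate</span></td>' . "\n" .
    '<td class="code" id="code2"><strong>FREESHIP</strong></td><td class="discount">Free shipping</td><td class="stats"><span class="bad">35% success rate</span></td>';

$szPattern = '~<td class="code" id="code.+?"><strong>(.+?)</strong>.*<td class="discount">(.+?)</td>.*<td class="stats"><span class=".+?">(.+?) success rate</span>~i';

// PREG_SET_ORDER: $aMatches[0] is the first coupon, $aMatches[1] the second, ...
preg_match_all($szPattern, $szContents, $aMatches, PREG_SET_ORDER);

$aData = array();

foreach ($aMatches as $aMatch)
{
    $aData[] = (object) array
    (
        'code'        => $aMatch[1],
        'description' => $aMatch[2],
        'success'     => $aMatch[3]
    );
}

foreach ($aData as $pItem)
{
    echo $pItem->code, ' | ', $pItem->description, ' | ', $pItem->success, "\n";
}
```

Because the dot does not cross newlines without the "s" modifier, the greedy .* parts stay inside a single line, which is why this only works while each coupon sits on its own line.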

crazyryan 04-05-2008 11:55 AM

Thanks, appreciated :)

crazyryan 04-21-2008 01:16 PM

Hey, me again. For some reason the script stopped working. I'm guessing it's the regex; I tried fixing it but couldn't get it working again. Any chance someone could look into it? Thanks

Salathe 04-21-2008 01:25 PM

Their HTML has changed, which means that the regex will need updating. It's a simple fix and one I'd encourage you to have a go at yourself.

crazyryan 04-21-2008 01:34 PM

Quote:

Originally Posted by Salathe (Post 13709)
Their HTML has changed, which means that the regex will need updating. It's a simple fix and one I'd encourage you to have a go at yourself.

I'm lazy :-P

<td class="code" id="code.+?">
to
<td class="code.+?" id="code.+?">

did the job, thanks :D
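A note on that fix: .+? needs at least one character, so class="code.+?" only fits when something follows "code" in the class attribute. If the markup might flip between the two forms, [^"]* (zero or more non-quote characters) covers both. A rough sketch, with made-up before/after markup since the actual change on the site isn't shown in the thread:

```php
<?php
// Hypothetical before/after markup; the extra class name is invented.
$old = '<td class="code" id="code1"><strong>SAVE10</strong>';
$new = '<td class="code highlight" id="code1"><strong>SAVE10</strong>';

// [^"]* accepts both class="code" and class="code something-else",
// and cannot run past the closing quote of the attribute.
$pattern = '~<td class="code[^"]*" id="code[^"]*"><strong>(.+?)</strong>~i';

foreach (array($old, $new) as $html)
{
    preg_match($pattern, $html, $m);
    echo $m[1], "\n"; // SAVE10 both times
}
```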

kelkadir 06-12-2008 02:00 AM

I thought we could continue in the same thread...
I have the following HTML in a string $str:


<div class="MutiStuffe">
<p><strong>Decription: </strong><br />
extra1, extra2, extra3, ..........extras50</p>
</div>


and I would like to match "extra1, extra2, extra3, ..........extras50"

preg_match_all("/\,(.*?)\</", $str, $array); matches from extra2 to extras50<, including the "<".

But that's still not what I need.
Thanks in advance for your suggestions.
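One possible approach, as a sketch only: the pattern above starts at the first comma, so extra1 can never be part of the match, and the "<" shows up in $array[0] (the full match) rather than in $array[1]. Capturing everything between the <br /> and the closing </p> and then splitting on commas avoids both problems. The list is shortened to three items here for illustration.

```php
<?php
// Sample string from the post above, with the list shortened.
$str = <<<HTML
<div class="MutiStuffe">
<p><strong>Decription: </strong><br />
extra1, extra2, extra3</p>
</div>
HTML;

$aExtras = array();

// Capture the text between <br /> and </p>.
// /s lets the dot cross newlines; \s* trims whitespace around the capture.
if (preg_match('~<br\s*/?>\s*(.*?)\s*</p>~s', $str, $aMatch))
{
    echo $aMatch[1], "\n"; // extra1, extra2, extra3

    // Split on commas plus any whitespace around them.
    $aExtras = preg_split('/\s*,\s*/', $aMatch[1]);
}

print_r($aExtras); // three elements: extra1, extra2, extra3
```

If only the whole string is needed, $aMatch[1] is already the answer and the preg_split() step can be dropped.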


All times are GMT.
