TalkPHP
 
 
04-04-2008, 10:46 PM   #1
crazyryan
Subject: preg_match_all

Hey, I'm trying to get data from retailmenot.com

This is what I have:

PHP Code:
<?php
if (isset($_GET['domain'])) {
    // Fetch the coupon page for the requested domain.
    $page = file_get_contents("http://www.retailmenot.com/view/" . $_GET['domain']);

    // Try to capture every field of a coupon block in a single pattern.
    preg_match_all('/<a name="(.*)"><\/a><div class="coupon" id="(.*)"><table class="details"><tr><th class="code">Code:<\/th><td class="code" id="(.*)"><strong>(.*)<\/strong> <a class="useButton" href="\/out\/(.*)" target="_blank" onclick="javascript:pageTracker._trackPageview(\'\/outgoing\/aff\/(.*)\');" rel="nofollow"><span>Use coupon &raquo;<\/span><\/a><div class="affTracker"><\/div><\/td><\/tr><tr><th>Discount:<\/th><td class="discount">(.*)<\/td><\/tr><tr><th>Stats:<\/th><td class="stats"><span class="good">(.*)<\/span>/', $page, $output);

    print_r($output);
}
?>
But when $_GET['domain'] is namecheap.com, or any other domain, nothing gets scraped.

Any ideas?

04-04-2008, 10:51 PM   #2
TlcAndres

Correct me if I'm wrong, but I always thought that to get data from the regular expression into the resulting matches you had to use (.*?). Other than that, I don't have the time right now to sit down and look through your expression.
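
For reference, both (.*) and (.*?) capture into the matches array; what changes is how much each one grabs. A minimal sketch, assuming an invented $html string (it is not RetailMeNot's real markup):

PHP Code:
```php
<?php
// Hypothetical sample markup, not taken from retailmenot.com.
$html = '<strong>SAVE10</strong> and <strong>FREESHIP</strong>';

// Greedy: (.*) runs on to the last </strong> it can reach.
preg_match_all('~<strong>(.*)</strong>~', $html, $greedy);

// Lazy: (.*?) stops at the first </strong> after each opening tag.
preg_match_all('~<strong>(.*?)</strong>~', $html, $lazy);

print_r($greedy[1]); // a single capture spanning both tags
print_r($lazy[1]);   // two captures, one per tag
```

So on a page with several coupons, the lazy form is usually the one you want, but either form does capture.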
__________________
"What everyone seems to forget is that while knowledge certainly is something - it's the implementation of knowledge that brings power" - Andres Galindo.

04-04-2008, 10:52 PM   #3
crazyryan

Quote:
Originally Posted by TlcAndres View Post
Correct me if I'm wrong but I always thought to get data from the regular expression into the resulting matches you had to use (.*?) - other than that don't have the time right now to sit down and look through your expression.
Just tried that, but it still didn't work. The result is exactly the same as before:

Array ( [0] => Array ( ) [1] => Array ( ) [2] => Array ( ) [3] => Array ( ) [4] => Array ( ) [5] => Array ( ) [6] => Array ( ) [7] => Array ( ) [8] => Array ( ) [9] => Array ( ) )

04-04-2008, 11:09 PM   #4
TlcAndres

Oh, and I forgot to ask: is all this HTML really on the same line? If it's not, then you need to add the modifier to ignore new lines... which is... god, I can't remember for the life of me.
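
The modifier in question is most likely s (PCRE_DOTALL), which lets . match newlines as well. A small sketch on an invented two-line snippet (the $html string and variable names are assumptions for illustration):

PHP Code:
```php
<?php
// Hypothetical markup with a line break between the two cells.
$html = "<th>Code:</th>\n<td class=\"code\"><strong>SAVE10</strong></td>";

$pattern = '<th>Code:</th>.*?<strong>(.*?)</strong>';

// Without "s", "." cannot cross the newline, so nothing matches.
$withoutS = preg_match_all('~' . $pattern . '~', $html, $a);

// With "s", "." also matches "\n" and the code is captured.
$withS = preg_match_all('~' . $pattern . '~s', $html, $b);

var_dump($withoutS, $withS, $b[1]);
```

Note that m is a different modifier: it only changes where ^ and $ anchor and does not affect the dot.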

04-04-2008, 11:11 PM   #5
crazyryan

I think that may be why. I just tried this:
preg_match_all('/<td class="code" id="(.*?)"><strong>(.*?)<\/strong>/', $page, $output);

It covers only part of the area I need. It brought back the coupon codes, but I need the other fields too.

04-04-2008, 11:36 PM   #6
Wildhoney

I'm sure I wrote a class for this once, but I've written it again nevertheless, as it's easy enough to write. It can be used as simply as this:

php Code:
$pRetail = new TalkPHP_RetailMeNot('namecheap.com');
$pData = $pRetail->getData();

You would then loop the results like so:

html4strict Code:
<table>
    <tr>
        <th>Code</th>
        <th>Description</th>
        <th>Success Rate</th>
    </tr>
   
    <?php foreach($pData as $pItem): ?>
    <tr>
        <td><?php echo $pItem->code; ?></td>
        <td><?php echo $pItem->description; ?></td>
        <td><?php echo $pItem->success; ?></td>
    </tr>
    <?php endforeach; ?>
   
</table>

...And finally, the class itself. Enjoy it! Just give TalkPHP.com some credit when the opportunity arises, please!

php Code:
class TalkPHP_RetailMeNot
{
    private $m_szAddress;
    private $m_aData;
   
    public function __construct($szAddress)
    {
        $this->m_aData = array();
        $this->m_szAddress = sprintf('http://www.retailmenot.com/view/%s', $szAddress);
        $this->parse();
    }
   
    public function getData()
    {
        return (object) $this->m_aData;
    }
   
    public function parse()
    {
        $szContents = file_get_contents($this->m_szAddress);

        // Nothing to parse if the page could not be fetched.
        if ($szContents === false)
        {
            return;
        }

        // Capture the coupon code, the discount text and the success rate.
        preg_match_all
        (
            '~<td class="code" id="code.+?"><strong>(.+?)</strong>.*<td class="discount">(.+?)</td>.*<td class="stats"><span class=".+?">(.+?) success rate</span>~im',
            $szContents,
            $aMatches
        );

        $iCount = count($aMatches[1]);

        // Build one object per coupon from the parallel capture groups.
        for($iIndex = 0; $iIndex < $iCount; $iIndex++)
        {
            $this->m_aData[] = (object) array
            (
                'code'        => $aMatches[1][$iIndex],
                'description' => $aMatches[2][$iIndex],
                'success'     => $aMatches[3][$iIndex]
            );
        }
    }
}
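
A side note on the pattern: the m modifier only affects ^ and $, so the regex above depends on each coupon's cells sitting on a single line. If the markup is spread over several lines, one option is a lazy .*? between the cells plus the s modifier. This is only a sketch against invented markup (the $html sample, its layout and the $aCoupons name are assumptions; the real page may well differ):

php Code:
```php
<?php
// Hypothetical markup for two coupons, spread across several lines.
$html = <<<'HTML'
<td class="code" id="code1"><strong>SAVE10</strong></td>
<td class="discount">10% off</td>
<td class="stats"><span class="good">80% success rate</span></td>
<td class="code" id="code2"><strong>FREESHIP</strong></td>
<td class="discount">Free shipping</td>
<td class="stats"><span class="bad">20% success rate</span></td>
HTML;

// Lazy gaps (.*?) keep each match inside one coupon; "s" lets them cross newlines.
preg_match_all(
    '~<td class="code" id="code.+?"><strong>(.+?)</strong>.*?'
    . '<td class="discount">(.+?)</td>.*?'
    . '<td class="stats"><span class=".+?">(.+?) success rate</span>~is',
    $html,
    $aMatches,
    PREG_SET_ORDER
);

// PREG_SET_ORDER yields one array per coupon instead of parallel columns.
$aCoupons = array();
foreach ($aMatches as $aMatch) {
    $aCoupons[] = array(
        'code'        => $aMatch[1],
        'description' => $aMatch[2],
        'success'     => $aMatch[3],
    );
}

print_r($aCoupons);
```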
__________________
The man who comes back through the Door in the Wall will never be quite the same as the man who went out.

04-05-2008, 11:55 AM   #7
crazyryan

Thanks, appreciated :)

04-21-2008, 01:16 PM   #8
crazyryan

Hey, me again. For some reason the script has stopped working. I'm guessing it's the regex; I tried fixing it but couldn't get it working again. Any chance someone could look into it? Thanks

04-21-2008, 01:25 PM   #9
Salathe

Their HTML has changed, which means that the regex will need updating. It's a simple fix and one I'd encourage you to have a go at yourself.

04-21-2008, 01:34 PM   #10
crazyryan

Quote:
Originally Posted by Salathe View Post
Their HTML has changed which means that the regex will need updating, it's a simple fix and one I'd encourage you to have a go at yourself.
I'm lazy

<td class="code" id="code.+?">
to
<td class="code.+?" id="code.+?">

did the job, thanks :D
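
One way to make that part of the pattern less brittle is to stop depending on the exact attribute list. A sketch, assuming only that the cell keeps an id starting with "code" (the sample markup, the "featured" class and the variable names are invented for illustration):

php Code:
```php
<?php
// Hypothetical old and new versions of the cell; "featured" is made up.
$aSamples = array(
    '<td class="code" id="code123"><strong>SAVE10</strong>',
    '<td class="code featured" id="code123"><strong>SAVE10</strong>',
);

// [^>]* skips any other attributes, so a changed class no longer breaks the match.
$szPattern = '~<td\b[^>]*\bid="code[^"]*"[^>]*><strong>(.+?)</strong>~i';

$aCodes = array();
foreach ($aSamples as $szSample) {
    if (preg_match($szPattern, $szSample, $aMatch)) {
        $aCodes[] = $aMatch[1];
    }
}

print_r($aCodes);
```

It will still break if the id goes away, but it survives class changes like the one above.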

06-12-2008, 02:00 AM   #11
kelkadir

I thought we could continue in the same thread...
I have the following HTML in a string $str:


<div class="MutiStuffe">
<p><strong>Decription: </strong><br />
extra1, extra2, extra3, ..........extras50</p>
</div>


and I would like to match "extra1, extra2, extra3, ..........extras50"

preg_match_all("/\,(.*?)\</", $str, $array); matches from extra2 up to extras50<, including the "<".

But that's still not what I need.
Thanks in advance for your suggestions.
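
One possible approach is to capture everything between the <br /> and the closing </p> in one go, then split that on commas. A sketch only: $str here is the snippet above with the list shortened to three items, and $aItems / $aMatch are names picked for illustration:

PHP Code:
```php
<?php
// The markup from the post, with the item list shortened for the example.
$str = '<div class="MutiStuffe">
<p><strong>Decription: </strong><br />
extra1, extra2, extra3</p>
</div>';

$aItems = array();

// "s" lets "." cross line breaks; \s* swallows the newline after <br />.
if (preg_match('~<br\s*/?>\s*(.*?)\s*</p>~s', $str, $aMatch)) {
    // Split the captured list on commas and trim each item.
    $aItems = array_map('trim', explode(',', $aMatch[1]));
}

print_r($aItems);
```

If you want the list as one string instead, $aMatch[1] already holds it without the surrounding tags.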