TalkPHP

russellharrower 01-05-2010 03:03 PM

get part of html file
 
Hi guys, I am trying to get a section of HTML from a website that contains the following:

Code:

<OBJECT classid="clsid:D27CDB6E-AE6D-11cf-96B8-444553540000"
                                          codebase="http://download.macromedia.com/"
                                          WIDTH="300" HEIGHT="300"> <PARAM NAME=movie VALUE="/graph-mod.swf?day=3&amht1=1.50&pmht1=2&amwind1=ssw&pmwind1=ssw&am_wind_strength1=10&pm_wind_strength1=25&amht2=2&pmht2=3&amwind2=se&pmwind2=e&am_wind_strength2=15&pm_wind_strength2=25&amht3=2.50&pmht3=1.50&amwind3=n&pmwind3=n&am_wind_strength3=20&pm_wind_strength3=20&amht4=1.50&pmht4=2.50&amwind4=nw&pmwind4=sw&am_wind_strength4=0&pm_wind_strength4=20&amht5=2.50&pmht5=3&amwind5=n&pmwind5=sw&am_wind_strength5=0&pm_wind_strength5=20&amht6=2.50&pmht6=1.50&amwind6=n&pmwind6=sw&am_wind_strength6=0&pm_wind_strength6=15">
<PARAM NAME=quality VALUE=high><param name="wmode" value="transparent"><PARAM NAME=bgcolor VALUE=#FFFFFF> <EMBED src="/graph-mod.swf?day=3&amht1=1.50&pmht1=2&amwind1=ssw&pmwind1=ssw&am_wind_strength1=10&pm_wind_strength1=25&amht2=2&pmht2=3&amwind2=se&pmwind2=e&am_wind_strength2=15&pm_wind_strength2=25&amht3=2.50&pmht3=1.50&amwind3=n&pmwind3=n&am_wind_strength3=20&pm_wind_strength3=20&amht4=1.50&pmht4=2.50&amwind4=nw&pmwind4=sw&am_wind_strength4=0&pm_wind_strength4=20&amht5=2.50&pmht5=3&amwind5=n&pmwind5=sw&am_wind_strength5=0&pm_wind_strength5=20&amht6=2.50&pmht6=1.50&amwind6=n&pmwind6=sw&am_wind_strength6=0&pm_wind_strength6=15"
                                                    quality=high wmode=transparent bgcolor=#FFFFFF WIDTH="258" HEIGHT="300"
                                                            TYPE="application/x-shockwave-flash"
                                                    PLUGINSPAGE="http://www.macromedia.com/go/getflashplayer">
</EMBED> </OBJECT>

The part I want for my script is the following.
Code:

/graph-mod.swf?day=3&amht1=1.50&pmht1=2&amwind1=ssw&pmwind1=ssw&am_wind_strength1=10&pm_wind_strength1=25&amht2=2&pmht2=3&amwind2=se&pmwind2=e&am_wind_strength2=15&pm_wind_strength2=25&amht3=2.50&pmht3=1.50&amwind3=n&pmwind3=n&am_wind_strength3=20&pm_wind_strength3=20&amht4=1.50&pmht4=2.50&amwind4=nw&pmwind4=sw&am_wind_strength4=0&pm_wind_strength4=20&amht5=2.50&pmht5=3&amwind5=n&pmwind5=sw&am_wind_strength5=0&pm_wind_strength5=20&amht6=2.50&pmht6=1.50&amwind6=n&pmwind6=sw&am_wind_strength6=0&pm_wind_strength6=15

If someone could help, that would be great. The website is at http://bit.ly/7xNBoa

delayedinsanity 01-05-2010 06:47 PM

Can you expand a little more on the purpose of this script? Is it only grabbing data from *this* html file, or is it looking in a variety of html files for the SRC attribute of the EMBED element? Do you have local access to the file or are you grabbing it with cURL? Are you vegetarian or meatatarian?

russellharrower 01-06-2010 01:55 AM

cURL. What I want is to get the information after /graph-mod.swf?, taking the data from day= onwards.
Then I want to take that data and split it at each & sign.
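A minimal sketch of those two steps, assuming PHP with the cURL extension enabled. `fetchPage()` is a hypothetical helper name, and the query string is a shortened copy of the one quoted above rather than live data:

```php
<?php
// Hypothetical helper: fetch a page with PHP's cURL extension (ext-curl).
// Returns the response body as a string, or false on failure.
function fetchPage($url)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // follow redirects, e.g. from a short link
    return curl_exec($ch);
}

// Everything after "/graph-mod.swf?" (shortened copy of the string above).
$query = 'day=3&amht1=1.50&pmht1=2&amwind1=ssw&pmwind1=ssw';

$pairs = [];
foreach (explode('&', $query) as $piece) {
    // Limit of 2 so a value that itself contains "=" stays in one piece.
    $parts = explode('=', $piece, 2);
    $pairs[$parts[0]] = isset($parts[1]) ? $parts[1] : '';
}

print_r($pairs);
```

PHP's built-in `parse_str($query, $pairs)` does the same split in one call and also URL-decodes the values.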

delayedinsanity 01-06-2010 02:28 AM

You could probably do one of two things. Use a regular expression to find graph-mod.swf and grab everything after it up to the first double quote. Alternatively, drop the entire file into a string, find the first (or second) occurrence of graph-mod.swf with strpos, use that position as an offset to find the next double quote, then substr out what's in between. I'd go with the regular expression, though.
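Both approaches might look roughly like this. It is a sketch run against a shortened, made-up snippet in the same format as the page above; the function names are hypothetical:

```php
<?php
// Shortened snippet in the same format as the page (assumption: the live
// page keeps this markup).
$html = '<PARAM NAME=movie VALUE="/graph-mod.swf?day=3&amht1=1.50">'
      . '<EMBED src="/graph-mod.swf?day=3&amht1=1.50" quality=high>';

// Approach 1: strpos/substr.
function extractWithStrpos($html, $needle = 'graph-mod.swf?')
{
    $start = strpos($html, $needle);
    if ($start === false) {
        return null;                     // marker not found
    }
    $start += strlen($needle);           // skip past the marker itself
    $end = strpos($html, '"', $start);   // first double quote after it
    if ($end === false) {
        return null;
    }
    return substr($html, $start, $end - $start);
}

// Approach 2: regular expression, capturing up to the first double quote.
function extractWithRegex($html)
{
    if (preg_match('/graph-mod\.swf\?([^"]*)"/i', $html, $m)) {
        return $m[1];
    }
    return null;
}

var_dump(extractWithStrpos($html));  // string(16) "day=3&amht1=1.50"
var_dump(extractWithRegex($html));   // string(16) "day=3&amht1=1.50"
```

The regex version is shorter; the strpos/substr version avoids regular expressions entirely but has to handle the "not found" cases by hand.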

russellharrower 01-06-2010 09:10 AM

I think I know what you mean, but could you give an example of the first approach?

Thanks

Cypher 01-11-2010 03:10 PM

If you still need it, here's a small example that will get you an array with all query parameters:

PHP Code:

<?php

$s = '<object classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" codebase="http://download.macromedia.com/" width="300" height="300">
        <param name="movie" value="/graph-mod.swf?day=3&amht1=1.50&pmht1=2&amwind1=ssw&pmwind1=ssw&am_wind_strength1=10&pm_wind_strength1=25&amht2=2&pmht2=3&amwind2=se&pmwind2=e&am_wind_strength2=15&pm_wind_strength2=25&amht3=2.50&pmht3=1.50&amwind3=n&pmwind3=n&am_wind_strength3=20&pm_wind_strength3=20&amht4=1.50&pmht4=2.50&amwind4=nw&pmwind4=sw&am_wind_strength4=0&pm_wind_strength4=20&amht5=2.50&pmht5=3&amwind5=n&pmwind5=sw&am_wind_strength5=0&pm_wind_strength5=20&amht6=2.50&pmht6=1.50&amwind6=n&pmwind6=sw&am_wind_strength6=0&pm_wind_strength6=15">
        <param name="quality" value="high">
        <param name="wmode" value="transparent">
        <param name="bgcolor" value="#ffffff">
        <embed src="/graph-mod.swf?day=3&amht1=1.50&pmht1=2&amwind1=ssw&pmwind1=ssw&am_wind_strength1=10&pm_wind_strength1=25&amht2=2&pmht2=3&amwind2=se&pmwind2=e&am_wind_strength2=15&pm_wind_strength2=25&amht3=2.50&pmht3=1.50&amwind3=n&pmwind3=n&am_wind_strength3=20&pm_wind_strength3=20&amht4=1.50&pmht4=2.50&amwind4=nw&pmwind4=sw&am_wind_strength4=0&pm_wind_strength4=20&amht5=2.50&pmht5=3&amwind5=n&pmwind5=sw&am_wind_strength5=0&pm_wind_strength5=20&amht6=2.50&pmht6=1.50&amwind6=n&pmwind6=sw&am_wind_strength6=0&pm_wind_strength6=15"
            quality="high"
            wmode="transparent"
            bgcolor="#ffffff"
            width="258"
            height="300"
            type="application/x-shockwave-flash"
            pluginspage="http://www.macromedia.com/go/getflashplayer">
        </embed>
    </object>';

preg_match('/\<param name=\"movie\" value=\"(.*?)\"/sim', $s, $matches);

$url = parse_url($matches[1]);
parse_str($url['query'], $params);

var_dump($params); 

In my case the output is:
Code:

array(37) {
  ["day"]              => string(1) "3"
  ["amht1"]            => string(4) "1.50"
  ["pmht1"]            => string(1) "2"
  ["amwind1"]          => string(3) "ssw"
  ["pmwind1"]          => string(3) "ssw"
  ["am_wind_strength1"] => string(2) "10"
  ["pm_wind_strength1"] => string(2) "25"
  ["amht2"]            => string(1) "2"
  ["pmht2"]            => string(1) "3"
  ["amwind2"]          => string(2) "se"
  ["pmwind2"]          => string(1) "e"
  ["am_wind_strength2"] => string(2) "15"
  ["pm_wind_strength2"] => string(2) "25"
  ["amht3"]            => string(4) "2.50"
  ["pmht3"]            => string(4) "1.50"
  ["amwind3"]          => string(1) "n"
  ["pmwind3"]          => string(1) "n"
  ["am_wind_strength3"] => string(2) "20"
  ["pm_wind_strength3"] => string(2) "20"
  ["amht4"]            => string(4) "1.50"
  ["pmht4"]            => string(4) "2.50"
  ["amwind4"]          => string(2) "nw"
  ["pmwind4"]          => string(2) "sw"
  ["am_wind_strength4"] => string(1) "0"
  ["pm_wind_strength4"] => string(2) "20"
  ["amht5"]            => string(4) "2.50"
  ["pmht5"]            => string(1) "3"
  ["amwind5"]          => string(1) "n"
  ["pmwind5"]          => string(2) "sw"
  ["am_wind_strength5"] => string(1) "0"
  ["pm_wind_strength5"] => string(2) "20"
  ["amht6"]            => string(4) "2.50"
  ["pmht6"]            => string(4) "1.50"
  ["amwind6"]          => string(1) "n"
  ["pmwind6"]          => string(2) "sw"
  ["am_wind_strength6"] => string(1) "0"
  ["pm_wind_strength6"] => string(2) "15"
}

Will that do? :)
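One caveat worth knowing, sketched under an assumption: the page quoted above writes raw `&` characters inside the attribute, but valid HTML escapes them as `&amp;`. If the page ever does that, `parse_str()` will produce keys such as `amp;amht1`, so decoding the entities with `html_entity_decode()` first is a cheap safeguard. The `&amp;` sample string below is made up for illustration:

```php
<?php
// Made-up sample: the same kind of src value, but with "&" escaped as "&amp;".
$src = '/graph-mod.swf?day=3&amp;amht1=1.50&amp;amwind1=ssw';

// PHP_URL_QUERY returns just the part after "?".
$query = parse_url($src, PHP_URL_QUERY);

// Without decoding, the separators leave "amp;" glued to the keys.
parse_str($query, $raw);                         // keys: day, amp;amht1, amp;amwind1

// Decoding the entities first restores plain "&" separators.
parse_str(html_entity_decode($query), $params);  // keys: day, amht1, amwind1

var_dump($raw, $params);
```

Also note that current PHP versions require the second argument to `parse_str()`; the code above already passes `$params`, so it is unaffected.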

russellharrower 01-11-2010 04:25 PM

@Cypher I think you hit it on the head. Just testing it and adding a few things...

russellharrower 01-11-2010 04:54 PM

However, it did not go to plan. Here is the code:
Code:

<?php

$config['url']      = "http://swellnet.com.au/loc_report.php?region_id=27&state_id=3"; // url of html to grab
$config['start_tag'] = '5 Day Swell Graph </td>'; // where you want to start grabbing
$config['end_tag']  = '</body>'; // where you want to stop grabbing
$config['show_tags'] = 0; // do you want the tags to be shown when you show the html? 1 = yes, 0 = no

class grabber
{
        var $error = '';
        var $html  = '';
       
        function grabhtml( $url, $start, $end )
        {
                $file = file_get_contents( $url );
               
                if( $file )
                {
                        if( preg_match_all( "#$start(.*?)$end#s", $file, $match ) )
                        {                               
                                $this->html = $match;
                        }
                        else
                        {
                                $this->error = "Tags cannot be found.";
                        }
                }
                else
                {
                        $this->error = "Site cannot be found!";
                }
        }
       
        function strip( $html, $show, $start, $end )
        {
                if( !$show )
                {
                        $html = str_replace( $start, "", $html );
                        $html = str_replace( $end, "", $html );
                       
                        return $html;
                }
                else
                {
                        return $html;
                }
        }
}

$grab = new grabber;
$grab->grabhtml( $config['url'], $config['start_tag'], $config['end_tag'] );

echo $grab->error;
$i = 0;
foreach( $grab->html[0] as $html )
{
        $s[$i++] = htmlspecialchars( $grab->strip( $html, $config['show_tags'], $config['start_tag'], $config['end_tag'] ) );
}

$s = $s[0];

print $s;
preg_match('/\WIDTH=\"300\" HEIGHT=\"300\"> <PARAM NAME=movie VALUE=\"(.*?)\"/sim', $s, $matches);

$url = parse_url($matches[1]);
parse_str($url['query'], $params);

var_dump($params);

?>

As you can see, it prints $s but then does not do the next step.

Cypher 01-12-2010 10:12 AM

It works if you remove the htmlspecialchars() call. That said, I don't quite understand what you want to do with that class. Do you want a generic class where you can pass start and end text, extract everything between them and, in addition, choose whether to encode all HTML entities?
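A small sketch of why that happens, using a shortened copy of the markup. `htmlspecialchars()` turns double quotes into `&quot;` and `<` into `&lt;`, so a pattern that expects the literal characters finds nothing. Separately, in a pattern starting `'/\WIDTH=...'` the `\W` is PCRE's "non-word character" class rather than an escaped `W`, which also prevents a match:

```php
<?php
// Shortened copy of the markup (assumption: the real page matches it).
$raw     = '<OBJECT WIDTH="300" HEIGHT="300"> <PARAM NAME=movie VALUE="/graph-mod.swf?day=3&amht1=1.50">';
$escaped = htmlspecialchars($raw);  // " -> &quot;, < -> &lt;, & -> &amp;

$pattern = '/<PARAM NAME=movie VALUE="(.*?)"/i';

$inRaw     = preg_match($pattern, $raw, $rawMatch);      // 1: quotes are still there
$inEscaped = preg_match($pattern, $escaped, $escMatch);  // 0: no literal " or < left

// "\W" needs a non-word character right before "IDTH", but that character is "W".
$withBackslashW = preg_match('/\WIDTH="300"/', $raw);    // 0
$withoutIt      = preg_match('/WIDTH="300"/', $raw);     // 1

// Match against the raw HTML and escape only what you print.
echo htmlspecialchars($rawMatch[1]), "\n";
```

So the usual approach is to run the regex on the raw HTML and call `htmlspecialchars()` only on the text you output.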

Cypher 01-12-2010 10:49 AM

Having nothing better to do, I have created two classes for you to use. I hope you can use PHP 5, as I noticed you are doing everything in PHP 4.

Can't help you with anything else. Learn from what I've given you and let me know if something is not clear.

Here's the code:

PHP Code:

<?php

class Swellnet_Locations
{

    // states
    const STATE_QUEENSLAND        = 1;
    const STATE_NEW_SOUTH_WALES   = 2;
    const STATE_VICTORIA          = 3;
    const STATE_SOUTH_AUSTRALIA   = 4;
    const STATE_WESTERN_AUSTRALIA = 5;
    const STATE_TASMANIA          = 6;

    // Queensland
    const REGION_GOLD_COAST       = 17;
    const REGION_SUNSHINE_COAST   = 18;
    const REGION_AGNES_WATER      = 1;
    const REGION_BALLINA          = 3;
    const REGION_YAMBA            = 16;

    // New South Wales
    const REGION_COFFS_HARBOUR    = 7;
    const REGION_PT_MACQUARIE     = 11;
    const REGION_NEWCASTLE        = 10;
    const REGION_CENTRAL_COAST    = 6;
    const REGION_NARRABEEN        = 13;
    const REGION_CURL_CURL        = 12;
    const REGION_BONDI            = 4;
    const REGION_MAROUBRA         = 9;
    const REGION_CRONULLA         = 8;
    const REGION_WOLLONGONG       = 15;

    // Victoria
    const REGION_WARRNAMBOOL      = 34;
    const REGION_TORQUAY          = 28;
    const REGION_13TH_BEACH       = 27;
    const REGION_MORNINGTON_PEN   = 31;
    const REGION_WESTERN_PORT     = 32;
    const REGION_PHILLIP_ISLAND   = 35;
    const REGION_WOOLAMAI         = 33;

    // South Australia
    const REGION_MID_COAST        = 19;
    const REGION_VICTOR_HARBOR    = 22;

    // Tasmania
    const REGION_NORTH_EAST       = 25;
    const REGION_HOBART           = 26;

    // Western Australia
    const REGION_MARGARET_RIVER   = 38;
    const REGION_PERTH            = 39;
    const REGION_GERALDTON        = 37;
}


class Swellnet_Graph
{

    // swellnet url mask
    const URL_MASK = 'http://swellnet.com.au/loc_report.php?state_id=%d&region_id=%d';

    // class variables
    private $_html   = null;
    private $_url    = null;
    private $_params = null;


    /**
     * Class constructor
     *
     * @param integer $state  State id
     * @param integer $region Region id
     * @throws Exception if the page cannot be fetched or the graph source is not found
     */
    public function __construct($state, $region)
    {
        // generate the url
        $this->_url = sprintf(self::URL_MASK, $state, $region);

        // get html of the target url
        $this->_html = file_get_contents($this->_url);

        // fail if couldn't open the url
        if ($this->_html === false) {
            throw new Exception('Could not fetch contents of ' . $this->_url);
        }

        // attempt to extract the graph source url
        preg_match('/value=\"(\/graph-mod\.swf\?.*?)\"/si', $this->_html, $matches);

        // fail if the number of matches is incorrect
        if (count($matches) != 2) {
            throw new Exception('Error extracting the graph source');
        }

        // extract url parameters
        $url   = $matches[1];
        $parse = parse_url($url);
        parse_str($parse['query'], $this->_params);
    }


    /**
     * Return request url
     *
     * @return string
     */
    public function getUrl()
    {
        return $this->_url;
    }


    /**
     * Return request html
     *
     * @return string
     */
    public function getHtml()
    {
        return $this->_html;
    }


    /**
     * Return resulting parameters
     *
     * @return array
     */
    public function getParams()
    {
        return $this->_params;
    }

}


// fetch parameters of the graph
$sg = new Swellnet_Graph(Swellnet_Locations::STATE_QUEENSLAND, Swellnet_Locations::REGION_GOLD_COAST);

var_dump($sg->getUrl());
var_dump($sg->getParams());

Hope that helps.
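Because the constructor both downloads the page and parses it, it can only be tested against the live site. A sketch of pulling the parsing step into a standalone function so it can be exercised against a saved or made-up snippet; the name `extractGraphParams()` is hypothetical and not part of the classes above:

```php
<?php
// Hypothetical helper: the same extraction the Swellnet_Graph constructor
// performs, but taking the HTML as an argument instead of fetching it.
function extractGraphParams($html)
{
    // Same pattern as the constructor: a value="" attribute that starts
    // with /graph-mod.swf?
    if (!preg_match('/value="(\/graph-mod\.swf\?.*?)"/si', $html, $matches)) {
        throw new Exception('Error extracting the graph source');
    }

    // Keep only the query string, then split it into key => value pairs.
    $query = parse_url($matches[1], PHP_URL_QUERY);
    parse_str((string) $query, $params);
    return $params;
}

// Made-up snippet in the page's format.
$sample = '<PARAM NAME=movie VALUE="/graph-mod.swf?day=3&amht1=1.50&amwind1=ssw">';
$params = extractGraphParams($sample);
var_dump($params);
```

The constructor could then call this helper on `$this->_html`, keeping the network access and the parsing separate.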


All times are GMT.