TalkPHP
 
 
Post #1 by russellharrower (01-05-2010, 03:03 PM)
Subject: get part of html file

Hi guys, I am trying to extract a section of HTML from a website that contains the following:

Code:
<OBJECT classid="clsid:D27CDB6E-AE6D-11cf-96B8-444553540000"
  					codebase="http://download.macromedia.com/"
 					 WIDTH="300" HEIGHT="300"> <PARAM NAME=movie VALUE="/graph-mod.swf?day=3&amht1=1.50&pmht1=2&amwind1=ssw&pmwind1=ssw&am_wind_strength1=10&pm_wind_strength1=25&amht2=2&pmht2=3&amwind2=se&pmwind2=e&am_wind_strength2=15&pm_wind_strength2=25&amht3=2.50&pmht3=1.50&amwind3=n&pmwind3=n&am_wind_strength3=20&pm_wind_strength3=20&amht4=1.50&pmht4=2.50&amwind4=nw&pmwind4=sw&am_wind_strength4=0&pm_wind_strength4=20&amht5=2.50&pmht5=3&amwind5=n&pmwind5=sw&am_wind_strength5=0&pm_wind_strength5=20&amht6=2.50&pmht6=1.50&amwind6=n&pmwind6=sw&am_wind_strength6=0&pm_wind_strength6=15"> 
<PARAM NAME=quality VALUE=high><param name="wmode" value="transparent"><PARAM NAME=bgcolor VALUE=#FFFFFF> <EMBED src="/graph-mod.swf?day=3&amht1=1.50&pmht1=2&amwind1=ssw&pmwind1=ssw&am_wind_strength1=10&pm_wind_strength1=25&amht2=2&pmht2=3&amwind2=se&pmwind2=e&am_wind_strength2=15&pm_wind_strength2=25&amht3=2.50&pmht3=1.50&amwind3=n&pmwind3=n&am_wind_strength3=20&pm_wind_strength3=20&amht4=1.50&pmht4=2.50&amwind4=nw&pmwind4=sw&am_wind_strength4=0&pm_wind_strength4=20&amht5=2.50&pmht5=3&amwind5=n&pmwind5=sw&am_wind_strength5=0&pm_wind_strength5=20&amht6=2.50&pmht6=1.50&amwind6=n&pmwind6=sw&am_wind_strength6=0&pm_wind_strength6=15"
    						quality=high wmode=transparent bgcolor=#FFFFFF WIDTH="258" HEIGHT="300"
   							 TYPE="application/x-shockwave-flash"
    						PLUGINSPAGE="http://www.macromedia.com/go/getflashplayer"> 
</EMBED> </OBJECT>
The part I want for my script is the following.
Code:
/graph-mod.swf?day=3&amht1=1.50&pmht1=2&amwind1=ssw&pmwind1=ssw&am_wind_strength1=10&pm_wind_strength1=25&amht2=2&pmht2=3&amwind2=se&pmwind2=e&am_wind_strength2=15&pm_wind_strength2=25&amht3=2.50&pmht3=1.50&amwind3=n&pmwind3=n&am_wind_strength3=20&pm_wind_strength3=20&amht4=1.50&pmht4=2.50&amwind4=nw&pmwind4=sw&am_wind_strength4=0&pm_wind_strength4=20&amht5=2.50&pmht5=3&amwind5=n&pmwind5=sw&am_wind_strength5=0&pm_wind_strength5=20&amht6=2.50&pmht6=1.50&amwind6=n&pmwind6=sw&am_wind_strength6=0&pm_wind_strength6=15
If anyone is able to help, that would be great. The website is at http://bit.ly/7xNBoa
Post #2 by delayedinsanity (01-05-2010, 06:47 PM)

Can you expand a little more on the purpose of this script? Is it only grabbing data from *this* html file, or is it looking in a variety of html files for the SRC attribute of the EMBED element? Do you have local access to the file or are you grabbing it with cURL? Are you vegetarian or meatatarian?
Post #3 by russellharrower (01-06-2010, 01:55 AM)

cURL. What I want is to get the information after /graph-mod.swf?, i.e. the data from day= onwards. Then I want to take that data and split it wherever there is an & sign.
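Taken literally, the two steps described above (keep everything after the "?", then split at each "&") could look like the sketch below. The function name splitGraphQuery is hypothetical, the sketch assumes the graph URL has already been pulled out of the page, and it uses PHP 7 syntax (type hints, ??). PHP's built-in parse_str, used later in this thread, does the same job and additionally URL-decodes the values.

```php
<?php
// Hypothetical helper: split the query part of a graph URL into name => value pairs.
// Assumes $url looks like "/graph-mod.swf?day=3&amht1=1.50&..."
function splitGraphQuery(string $url): array
{
    $pos = strpos($url, '?');
    if ($pos === false) {
        return [];                       // no query string at all
    }
    $query = substr($url, $pos + 1);     // "day=3&amht1=1.50&..."

    $pairs = [];
    foreach (explode('&', $query) as $chunk) {
        if ($chunk === '') {
            continue;                    // tolerate "&&" or a trailing "&"
        }
        // Split at the first "=" only; a chunk without "=" gets an empty value.
        $parts = explode('=', $chunk, 2);
        $pairs[$parts[0]] = $parts[1] ?? '';
    }
    return $pairs;
}

$params = splitGraphQuery('/graph-mod.swf?day=3&amht1=1.50&pmht1=2&amwind1=ssw');
print_r($params);
```

Unlike parse_str, this keeps values exactly as they appear in the URL (no %xx decoding), which is enough for the plain numbers and compass directions in this page.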
Post #4 by delayedinsanity (01-06-2010, 02:28 AM)

You could probably do one of two things. The first is to use a regular expression to find graph-mod.swf and grab everything after it up to the first double quote. The other is to read the entire file into a string, find the position of the first (or second) occurrence with strpos, use that as an offset to find the next double quote, then substr out what's in between. I'd go with the regular expression, though.
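The second (strpos/substr) approach might be sketched like this. It is only an illustration: extractGraphUrl is a hypothetical name, $html is assumed to already hold the page source (from cURL or file_get_contents), and the code targets PHP 7.1 or later because of the ?string return type.

```php
<?php
// Sketch of the strpos/substr approach: find "/graph-mod.swf" starting at $offset,
// then cut everything up to the next double quote.
function extractGraphUrl(string $html, int $offset = 0): ?string
{
    $start = strpos($html, '/graph-mod.swf', $offset);
    if ($start === false) {
        return null;                       // marker not found
    }
    $end = strpos($html, '"', $start);     // next double quote after the marker
    if ($end === false) {
        return null;                       // attribute never closed
    }
    return substr($html, $start, $end - $start);
}

// Shortened copy of the markup from the first post.
$html = '<PARAM NAME=movie VALUE="/graph-mod.swf?day=3&amht1=1.50"> '
      . '<EMBED src="/graph-mod.swf?day=3&amht1=1.50" quality=high>';

echo extractGraphUrl($html), "\n";
```

To get the second occurrence (the EMBED src instead of the PARAM value), call it again with an offset just past the first match, e.g. strpos($html, '/graph-mod.swf') + 1.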
Post #5 by russellharrower (01-06-2010, 09:10 AM)

I think I know what you mean, but could you give an example of the first approach?

Thanks
Post #6 by Cypher (01-11-2010, 03:10 PM)

If you still need it, here's a small example that will get you an array with all query parameters:

PHP Code:
<?php

$s = '<object classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" codebase="http://download.macromedia.com/" width="300" height="300">
        <param name="movie" value="/graph-mod.swf?day=3&amht1=1.50&pmht1=2&amwind1=ssw&pmwind1=ssw&am_wind_strength1=10&pm_wind_strength1=25&amht2=2&pmht2=3&amwind2=se&pmwind2=e&am_wind_strength2=15&pm_wind_strength2=25&amht3=2.50&pmht3=1.50&amwind3=n&pmwind3=n&am_wind_strength3=20&pm_wind_strength3=20&amht4=1.50&pmht4=2.50&amwind4=nw&pmwind4=sw&am_wind_strength4=0&pm_wind_strength4=20&amht5=2.50&pmht5=3&amwind5=n&pmwind5=sw&am_wind_strength5=0&pm_wind_strength5=20&amht6=2.50&pmht6=1.50&amwind6=n&pmwind6=sw&am_wind_strength6=0&pm_wind_strength6=15">
        <param name="quality" value="high">
        <param name="wmode" value="transparent">
        <param name="bgcolor" value="#ffffff">
        <embed src="/graph-mod.swf?day=3&amht1=1.50&pmht1=2&amwind1=ssw&pmwind1=ssw&am_wind_strength1=10&pm_wind_strength1=25&amht2=2&pmht2=3&amwind2=se&pmwind2=e&am_wind_strength2=15&pm_wind_strength2=25&amht3=2.50&pmht3=1.50&amwind3=n&pmwind3=n&am_wind_strength3=20&pm_wind_strength3=20&amht4=1.50&pmht4=2.50&amwind4=nw&pmwind4=sw&am_wind_strength4=0&pm_wind_strength4=20&amht5=2.50&pmht5=3&amwind5=n&pmwind5=sw&am_wind_strength5=0&pm_wind_strength5=20&amht6=2.50&pmht6=1.50&amwind6=n&pmwind6=sw&am_wind_strength6=0&pm_wind_strength6=15"
            quality="high"
            wmode="transparent"
            bgcolor="#ffffff"
            width="258"
            height="300"
            type="application/x-shockwave-flash"
            pluginspage="http://www.macromedia.com/go/getflashplayer">
        </embed>
    </object>';

// Capture the value of the "movie" param (non-greedy, up to the closing quote).
preg_match('/<param name="movie" value="(.*?)"/si', $s, $matches);

// Split the captured URL and turn its query string into an array.
$url = parse_url($matches[1]);
parse_str($url['query'], $params);

var_dump($params);
In my case the output is:
Code:
array(37) {
  ["day"]               => string(1) "3"
  ["amht1"]             => string(4) "1.50"
  ["pmht1"]             => string(1) "2"
  ["amwind1"]           => string(3) "ssw"
  ["pmwind1"]           => string(3) "ssw"
  ["am_wind_strength1"] => string(2) "10"
  ["pm_wind_strength1"] => string(2) "25"
  ["amht2"]             => string(1) "2"
  ["pmht2"]             => string(1) "3"
  ["amwind2"]           => string(2) "se"
  ["pmwind2"]           => string(1) "e"
  ["am_wind_strength2"] => string(2) "15"
  ["pm_wind_strength2"] => string(2) "25"
  ["amht3"]             => string(4) "2.50"
  ["pmht3"]             => string(4) "1.50"
  ["amwind3"]           => string(1) "n"
  ["pmwind3"]           => string(1) "n"
  ["am_wind_strength3"] => string(2) "20"
  ["pm_wind_strength3"] => string(2) "20"
  ["amht4"]             => string(4) "1.50"
  ["pmht4"]             => string(4) "2.50"
  ["amwind4"]           => string(2) "nw"
  ["pmwind4"]           => string(2) "sw"
  ["am_wind_strength4"] => string(1) "0"
  ["pm_wind_strength4"] => string(2) "20"
  ["amht5"]             => string(4) "2.50"
  ["pmht5"]             => string(1) "3"
  ["amwind5"]           => string(1) "n"
  ["pmwind5"]           => string(2) "sw"
  ["am_wind_strength5"] => string(1) "0"
  ["pm_wind_strength5"] => string(2) "20"
  ["amht6"]             => string(4) "2.50"
  ["pmht6"]             => string(4) "1.50"
  ["amwind6"]           => string(1) "n"
  ["pmwind6"]           => string(2) "sw"
  ["am_wind_strength6"] => string(1) "0"
  ["pm_wind_strength6"] => string(2) "15"
}
Will that do? :)
The Following User Says Thank You to Cypher For This Useful Post:
russellharrower (01-11-2010)
Post #7 by russellharrower (01-11-2010, 04:25 PM)

@Cypher I think you hit the nail on the head. Just testing it and adding a few things...
Post #8 by russellharrower (01-11-2010, 04:54 PM)

@Cypher It did not go to plan, though. Here is the code:
Code:
<?php

$config['url']       = "http://swellnet.com.au/loc_report.php?region_id=27&state_id=3"; // url of html to grab
$config['start_tag'] = '5 Day Swell Graph </td>'; // where you want to start grabbing
$config['end_tag']   = '</body>'; // where you want to stop grabbing
$config['show_tags'] = 0; // do you want the tags to be shown when you show the html? 1 = yes, 0 = no

class grabber
{
	var $error = '';
	var $html  = '';
	
	function grabhtml( $url, $start, $end )
	{
		$file = file_get_contents( $url );
		
		if( $file )
		{
			if( preg_match_all( "#$start(.*?)$end#s", $file, $match ) )
			{				
				$this->html = $match;
			}
			else
			{
				$this->error = "Tags cannot be found.";
			}
		}
		else
		{
			$this->error = "Site cannot be found!";
		}
	}
	
	function strip( $html, $show, $start, $end )
	{
		if( !$show )
		{
			$html = str_replace( $start, "", $html );
			$html = str_replace( $end, "", $html );
			
			return $html;
		}
		else
		{
			return $html;
		}
	}
}

$grab = new grabber;
$grab->grabhtml( $config['url'], $config['start_tag'], $config['end_tag'] );

echo $grab->error;
$i = 0;
foreach( $grab->html[0] as $html )
{
	$s[$i++] = htmlspecialchars( $grab->strip( $html, $config['show_tags'], $config['start_tag'], $config['end_tag'] ) );
}

$s = $s[0];

print $s;
preg_match('/\WIDTH=\"300\" HEIGHT=\"300\"> <PARAM NAME=movie VALUE=\"(.*?)\"/sim', $s, $matches); 

$url = parse_url($matches[1]); 
parse_str($url['query'], $params); 

var_dump($params); 

?>
As you can see, it prints $s but then does not do the next step.
Post #9 by Cypher (01-12-2010, 10:12 AM)

It works if you remove htmlspecialchars. That said, I don't quite understand what you want to do with that class. Do you want a generic class where you pass start and end text, extract everything between them, and optionally encode all HTML entities?
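Why removing htmlspecialchars helps: with its default flags, htmlspecialchars() turns &, ", < and > into &amp;, &quot;, &lt; and &gt;, so a pattern that looks for literal double quotes and ">" can no longer match the encoded string. Separately, in PCRE \W is the class "any non-word character", so the leading \WIDTH in the posted pattern means "a non-word character followed by IDTH" rather than the literal text WIDTH; the sketch below drops that backslash. It uses a shortened copy of the fragment from the first post, not the live page.

```php
<?php
// Shortened copy of the page fragment from the first post.
$fragment = 'WIDTH="300" HEIGHT="300"> <PARAM NAME=movie VALUE="/graph-mod.swf?day=3&amht1=1.50">';

// Pattern without the stray "\W": match the literal attribute text.
$pattern = '/WIDTH="300" HEIGHT="300"> <PARAM NAME=movie VALUE="(.*?)"/si';

// On the raw HTML the pattern matches and captures the graph URL.
$rawOk = preg_match($pattern, $fragment, $m) === 1;
echo $rawOk ? "raw: {$m[1]}\n" : "raw: no match\n";

// After htmlspecialchars(), '"' becomes '&quot;' and '>' becomes '&gt;',
// so the same pattern finds nothing.
$encoded   = htmlspecialchars($fragment);
$encodedOk = preg_match($pattern, $encoded) === 1;
echo $encodedOk ? "encoded: match\n" : "encoded: no match\n";

// Parse the captured query string, as in the earlier example.
parse_str(parse_url($m[1], PHP_URL_QUERY), $params);
var_dump($params);
```

If the encoded text is still wanted for display, one option is to run the regex on the raw HTML first and call htmlspecialchars() only on what you echo.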
Post #10 by Cypher (01-12-2010, 10:49 AM)

Having nothing better to do, I have written two classes for you to use. I hope you can use PHP 5, as I noticed you are writing everything in PHP 4 style.

Can't help you with anything else. Learn from what I've given you and let me know if something is not clear.

Here's the code:

PHP Code:
<?php

class Swellnet_Locations
{
    // states
    const STATE_QUEENSLAND        = 1;
    const STATE_NEW_SOUTH_WALES   = 2;
    const STATE_VICTORIA          = 3;
    const STATE_SOUTH_AUSTRALIA   = 4;
    const STATE_WESTERN_AUSTRALIA = 5;
    const STATE_TASMANIA          = 6;

    // Queensland
    const REGION_GOLD_COAST     = 17;
    const REGION_SUNSHINE_COAST = 18;
    const REGION_AGNES_WATER    = 1;
    const REGION_BALLINA        = 3;
    const REGION_YAMBA          = 16;

    // New South Wales
    const REGION_COFFS_HARBOUR = 7;
    const REGION_PT_MACQUARIE  = 11;
    const REGION_NEWCASTLE     = 10;
    const REGION_CENTRAL_COAST = 6;
    const REGION_NARRABEEN     = 13;
    const REGION_CURL_CURL     = 12;
    const REGION_BONDI         = 4;
    const REGION_MAROUBRA      = 9;
    const REGION_CRONULLA      = 8;
    const REGION_WOLLONGONG    = 15;

    // Victoria
    const REGION_WARRNAMBOOL    = 34;
    const REGION_TORQUAY        = 28;
    const REGION_13TH_BEACH     = 27;
    const REGION_MORNINGTON_PEN = 31;
    const REGION_WESTERN_PORT   = 32;
    const REGION_PHILLIP_ISLAND = 35;
    const REGION_WOOLAMAI       = 33;

    // South Australia
    const REGION_MID_COAST     = 19;
    const REGION_VICTOR_HARBOR = 22;

    // Tasmania
    const REGION_NORTH_EAST = 25;
    const REGION_HOBART     = 26;

    // Western Australia
    const REGION_MARGARET_RIVER = 38;
    const REGION_PERTH          = 39;
    const REGION_GERALDTON      = 37;
}


class Swellnet_Graph
{
    // swellnet url mask
    const URL_MASK = 'http://swellnet.com.au/loc_report.php?state_id=%d&region_id=%d';

    // class variables
    private $_html   = null;
    private $_url    = null;
    private $_params = null;

    /**
     * Class constructor: fetches the page and extracts the graph parameters
     *
     * @param integer $state  State id
     * @param integer $region Region id
     * @throws Exception if the page cannot be fetched or the graph is not found
     */
    public function __construct($state, $region)
    {
        // generate the url
        $this->_url = sprintf(self::URL_MASK, $state, $region);

        // get html of the target url
        $this->_html = file_get_contents($this->_url);

        // fail if couldn't open the url
        if ($this->_html === false) {
            throw new Exception('Could not fetch contents of ' . $this->_url);
        }

        // attempt to extract the graph source url
        preg_match('/value="(\/graph-mod\.swf\?.*?)"/si', $this->_html, $matches);

        // fail if the number of matches is incorrect
        if (count($matches) != 2) {
            throw new Exception('Error extracting the graph source');
        }

        // extract url parameters
        $url   = $matches[1];
        $parse = parse_url($url);
        parse_str($parse['query'], $this->_params);
    }

    /**
     * Return request url
     *
     * @return string
     */
    public function getUrl()
    {
        return $this->_url;
    }

    /**
     * Return request html
     *
     * @return string
     */
    public function getHtml()
    {
        return $this->_html;
    }

    /**
     * Return resulting parameters
     *
     * @return array
     */
    public function getParams()
    {
        return $this->_params;
    }
}


// fetch parameters of the graph
$sg = new Swellnet_Graph(Swellnet_Locations::STATE_QUEENSLAND, Swellnet_Locations::REGION_GOLD_COAST);

var_dump($sg->getUrl());
var_dump($sg->getParams());
Hope that helps.
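Once getParams() returns the flat array, it may be handier to group the values by their numeric suffix (1 to 6 in the sample output earlier in the thread). The sketch below is an assumption-laden illustration: groupBySuffix is a hypothetical name, it assumes every key except day ends in a number, and what each suffix stands for on the graph (probably a time slot) is a guess, not something the page documents.

```php
<?php
// Hypothetical helper: turn ['amht1' => '1.50', 'pmht1' => '2', ...]
// into [1 => ['amht' => '1.50', 'pmht' => '2', ...], ...].
// Keys without a trailing number (e.g. "day") are collected under 'meta'.
function groupBySuffix(array $params): array
{
    $grouped = ['meta' => []];
    foreach ($params as $key => $value) {
        // Lazy prefix + trailing digits: "am_wind_strength1" -> "am_wind_strength", "1"
        if (preg_match('/^(.*?)(\d+)$/', $key, $m)) {
            $grouped[(int) $m[2]][$m[1]] = $value;
        } else {
            $grouped['meta'][$key] = $value;
        }
    }
    return $grouped;
}

// In practice $params would come from $sg->getParams(); a short sample is used here.
$params = ['day' => '3', 'amht1' => '1.50', 'pmht1' => '2', 'amwind1' => 'ssw',
           'am_wind_strength1' => '10', 'amht2' => '2'];
$days = groupBySuffix($params);
print_r($days[1]);
```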