#1, 01-05-2010, 03:03 PM
The Contributor (Join Date: Jul 2009, Posts: 80, Thanks: 13)
Get part of an HTML file
Hi guys, I am trying to extract a section of HTML from a website that contains the following:
Code:
<OBJECT classid="clsid:D27CDB6E-AE6D-11cf-96B8-444553540000"
codebase="http://download.macromedia.com/"
WIDTH="300" HEIGHT="300"> <PARAM NAME=movie VALUE="/graph-mod.swf?day=3&amht1=1.50&pmht1=2&amwind1=ssw&pmwind1=ssw&am_wind_strength1=10&pm_wind_strength1=25&amht2=2&pmht2=3&amwind2=se&pmwind2=e&am_wind_strength2=15&pm_wind_strength2=25&amht3=2.50&pmht3=1.50&amwind3=n&pmwind3=n&am_wind_strength3=20&pm_wind_strength3=20&amht4=1.50&pmht4=2.50&amwind4=nw&pmwind4=sw&am_wind_strength4=0&pm_wind_strength4=20&amht5=2.50&pmht5=3&amwind5=n&pmwind5=sw&am_wind_strength5=0&pm_wind_strength5=20&amht6=2.50&pmht6=1.50&amwind6=n&pmwind6=sw&am_wind_strength6=0&pm_wind_strength6=15">
<PARAM NAME=quality VALUE=high><param name="wmode" value="transparent"><PARAM NAME=bgcolor VALUE=#FFFFFF> <EMBED src="/graph-mod.swf?day=3&amht1=1.50&pmht1=2&amwind1=ssw&pmwind1=ssw&am_wind_strength1=10&pm_wind_strength1=25&amht2=2&pmht2=3&amwind2=se&pmwind2=e&am_wind_strength2=15&pm_wind_strength2=25&amht3=2.50&pmht3=1.50&amwind3=n&pmwind3=n&am_wind_strength3=20&pm_wind_strength3=20&amht4=1.50&pmht4=2.50&amwind4=nw&pmwind4=sw&am_wind_strength4=0&pm_wind_strength4=20&amht5=2.50&pmht5=3&amwind5=n&pmwind5=sw&am_wind_strength5=0&pm_wind_strength5=20&amht6=2.50&pmht6=1.50&amwind6=n&pmwind6=sw&am_wind_strength6=0&pm_wind_strength6=15"
quality=high wmode=transparent bgcolor=#FFFFFF WIDTH="258" HEIGHT="300"
TYPE="application/x-shockwave-flash"
PLUGINSPAGE="http://www.macromedia.com/go/getflashplayer">
</EMBED> </OBJECT>
The part I want for my script is the following.
Code:
/graph-mod.swf?day=3&amht1=1.50&pmht1=2&amwind1=ssw&pmwind1=ssw&am_wind_strength1=10&pm_wind_strength1=25&amht2=2&pmht2=3&amwind2=se&pmwind2=e&am_wind_strength2=15&pm_wind_strength2=25&amht3=2.50&pmht3=1.50&amwind3=n&pmwind3=n&am_wind_strength3=20&pm_wind_strength3=20&amht4=1.50&pmht4=2.50&amwind4=nw&pmwind4=sw&am_wind_strength4=0&pm_wind_strength4=20&amht5=2.50&pmht5=3&amwind5=n&pmwind5=sw&am_wind_strength5=0&pm_wind_strength5=20&amht6=2.50&pmht6=1.50&amwind6=n&pmwind6=sw&am_wind_strength6=0&pm_wind_strength6=15
If someone could help, that would be great. The website is at http://bit.ly/7xNBoa

#2, 01-05-2010, 06:47 PM
is cute and cuddly (Join Date: Mar 2008, Location: Vegas, Baby, Posts: 963, Thanks: 31)
Can you expand a little more on the purpose of this script? Is it only grabbing data from *this* html file, or is it looking in a variety of html files for the SRC attribute of the EMBED element? Do you have local access to the file or are you grabbing it with cURL? Are you vegetarian or meatatarian?

#3, 01-06-2010, 01:55 AM
The Contributor (Join Date: Jul 2009, Posts: 80, Thanks: 13)
cURL. What I want is to get the information after /graph-mod.swf?, that is, the data from day= onwards.
Then I want to take that data and split it wherever the & sign is.
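A minimal sketch of the two steps described above, assuming the src value has already been pulled out of the page (the string below is a shortened copy of the one in the first post): take everything after the ?, split it on &, then split each pair on the first =.

```php
<?php
// Shortened copy of the src value from the first post (assumption: the
// real value has already been extracted from the fetched HTML).
$src = '/graph-mod.swf?day=3&amht1=1.50&pmht1=2&amwind1=ssw&pmwind1=ssw';

// Everything after the "?" is the query string ("day=3&amht1=...").
$query = substr($src, strpos($src, '?') + 1);

// Split the query string wherever the & sign is.
$pairs = explode('&', $query);

// Split each "name=value" pair on the first "=" only.
$params = array();
foreach ($pairs as $pair) {
    list($name, $value) = array_pad(explode('=', $pair, 2), 2, '');
    $params[$name] = $value;
}

print_r($params);
```

PHP's built-in parse_str() performs the same splitting (and also URL-decodes the values), so in practice it can replace the explode() loop.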

#4, 01-06-2010, 02:28 AM
is cute and cuddly (Join Date: Mar 2008, Location: Vegas, Baby, Posts: 963, Thanks: 31)
You could probably do one of two things. The first is to use a regular expression to find graph-mod.swf and grab everything after it up to the first double quote. The other is to drop the entire file into a string, find the position of the first or second occurrence of graph-mod.swf using strpos, use that as an offset to find the next double quote, then substr out what's in between. I'd go with the regular expression, though.
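The second (strpos/substr) approach could look roughly like this. It is a sketch, assuming the page HTML is already in a string; the excerpt below is shortened from the markup in the first post.

```php
<?php
// Shortened excerpt of the page markup (assumption: in the real script this
// string would come from cURL or file_get_contents()).
$html = '<PARAM NAME=movie VALUE="/graph-mod.swf?day=3&amht1=1.50&pmht1=2">'
      . '<EMBED src="/graph-mod.swf?day=3&amht1=1.50&pmht1=2" quality=high>';

$needle = '/graph-mod.swf';

// Position of the first occurrence of the SWF path.
$start = strpos($html, $needle);
if ($start === false) {
    exit("graph-mod.swf not found\n");
}

// Use that position as the offset when searching for the closing double quote.
$end = strpos($html, '"', $start);
if ($end === false) {
    exit("No closing quote found\n");
}

// Cut out everything between the two positions.
$src = substr($html, $start, $end - $start);
echo $src, "\n";
```

To reach the second occurrence (the EMBED src) instead, call strpos($html, $needle, $start + 1) before searching for the quote.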

#5, 01-06-2010, 09:10 AM
The Contributor (Join Date: Jul 2009, Posts: 80, Thanks: 13)
I think I know what you mean, but could you give an example of how to do the first approach you described?
Thanks

#6, 01-11-2010, 03:10 PM
The Wanderer (Join Date: Jan 2010, Posts: 7, Thanks: 0)
If you still need it, here's a small example that will get you an array with all query parameters:
PHP Code:
$s = '<object classid="clsid:d27cdb6e-ae6d-11cf-96b8-444553540000" codebase="http://download.macromedia.com/" width="300" height="300">
<param name="movie" value="/graph-mod.swf?day=3&amht1=1.50&pmht1=2&amwind1=ssw&pmwind1=ssw&am_wind_strength1=10&pm_wind_strength1=25&amht2=2&pmht2=3&amwind2=se&pmwind2=e&am_wind_strength2=15&pm_wind_strength2=25&amht3=2.50&pmht3=1.50&amwind3=n&pmwind3=n&am_wind_strength3=20&pm_wind_strength3=20&amht4=1.50&pmht4=2.50&amwind4=nw&pmwind4=sw&am_wind_strength4=0&pm_wind_strength4=20&amht5=2.50&pmht5=3&amwind5=n&pmwind5=sw&am_wind_strength5=0&pm_wind_strength5=20&amht6=2.50&pmht6=1.50&amwind6=n&pmwind6=sw&am_wind_strength6=0&pm_wind_strength6=15">
<param name="quality" value="high">
<param name="wmode" value="transparent">
<param name="bgcolor" value="#ffffff">
<embed src="/graph-mod.swf?day=3&amht1=1.50&pmht1=2&amwind1=ssw&pmwind1=ssw&am_wind_strength1=10&pm_wind_strength1=25&amht2=2&pmht2=3&amwind2=se&pmwind2=e&am_wind_strength2=15&pm_wind_strength2=25&amht3=2.50&pmht3=1.50&amwind3=n&pmwind3=n&am_wind_strength3=20&pm_wind_strength3=20&amht4=1.50&pmht4=2.50&amwind4=nw&pmwind4=sw&am_wind_strength4=0&pm_wind_strength4=20&amht5=2.50&pmht5=3&amwind5=n&pmwind5=sw&am_wind_strength5=0&pm_wind_strength5=20&amht6=2.50&pmht6=1.50&amwind6=n&pmwind6=sw&am_wind_strength6=0&pm_wind_strength6=15"
quality="high"
wmode="transparent"
bgcolor="#ffffff"
width="258"
height="300"
type="application/x-shockwave-flash"
pluginspage="http://www.macromedia.com/go/getflashplayer">
</embed>
</object>';
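// capture the value attribute of <param name="movie">
if (preg_match('/\<param name=\"movie\" value=\"(.*?)\"/sim', $s, $matches)) {
    // split the captured URL into its parts, then parse the query string into an array
    $url = parse_url($matches[1]);
    parse_str($url['query'], $params);
    var_dump($params);
}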
In my case the output is:
Code:
array(37) {
["day"] => string(1) "3"
["amht1"] => string(4) "1.50"
["pmht1"] => string(1) "2"
["amwind1"] => string(3) "ssw"
["pmwind1"] => string(3) "ssw"
["am_wind_strength1"] => string(2) "10"
["pm_wind_strength1"] => string(2) "25"
["amht2"] => string(1) "2"
["pmht2"] => string(1) "3"
["amwind2"] => string(2) "se"
["pmwind2"] => string(1) "e"
["am_wind_strength2"] => string(2) "15"
["pm_wind_strength2"] => string(2) "25"
["amht3"] => string(4) "2.50"
["pmht3"] => string(4) "1.50"
["amwind3"] => string(1) "n"
["pmwind3"] => string(1) "n"
["am_wind_strength3"] => string(2) "20"
["pm_wind_strength3"] => string(2) "20"
["amht4"] => string(4) "1.50"
["pmht4"] => string(4) "2.50"
["amwind4"] => string(2) "nw"
["pmwind4"] => string(2) "sw"
["am_wind_strength4"] => string(1) "0"
["pm_wind_strength4"] => string(2) "20"
["amht5"] => string(4) "2.50"
["pmht5"] => string(1) "3"
["amwind5"] => string(1) "n"
["pmwind5"] => string(2) "sw"
["am_wind_strength5"] => string(1) "0"
["pm_wind_strength5"] => string(2) "20"
["amht6"] => string(4) "2.50"
["pmht6"] => string(4) "1.50"
["amwind6"] => string(1) "n"
["pmwind6"] => string(2) "sw"
["am_wind_strength6"] => string(1) "0"
["pm_wind_strength6"] => string(2) "15"
}
Will that do? :)
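One caveat, hedged because this particular page writes bare & characters inside the attribute: standards-conforming HTML usually writes the same attribute with &amp;. Fed that text directly, parse_str() produces keys such as amp;amht1, so decoding the entities first with html_entity_decode() is a cheap safeguard. The markup below is a hypothetical example of the escaped form.

```php
<?php
// Hypothetical markup using the escaped form (&amp;) that many pages emit.
$s = '<param name="movie" value="/graph-mod.swf?day=3&amp;amht1=1.50&amp;pmht1=2">';

if (preg_match('/<param name="movie" value="(.*?)"/si', $s, $matches)) {
    // Without decoding, parse_str() treats "amp;amht1" as a key name.
    parse_str(parse_url($matches[1], PHP_URL_QUERY), $raw);

    // Decode &amp; back to & before parsing the query string.
    $decoded = html_entity_decode($matches[1], ENT_QUOTES);
    parse_str(parse_url($decoded, PHP_URL_QUERY), $params);

    var_dump(array_keys($raw)); // day, amp;amht1, amp;pmht1
    var_dump($params);          // day => "3", amht1 => "1.50", pmht1 => "2"
}
```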

One user said thank you to Cypher for this useful post.

#7, 01-11-2010, 04:25 PM
The Contributor (Join Date: Jul 2009, Posts: 80, Thanks: 13)
@Cypher I think you hit the nail on the head. Just testing it and adding a few things...

#8, 01-11-2010, 04:54 PM
The Contributor (Join Date: Jul 2009, Posts: 80, Thanks: 13)
@Cypher I think you hit the nail on the head. Just testing it and adding a few things...
However, it did not go to plan. Here is the code:
Code:
<?php
$config['url'] = "http://swellnet.com.au/loc_report.php?region_id=27&state_id=3"; // url of html to grab
$config['start_tag'] = '5 Day Swell Graph </td>'; // where you want to start grabbing
$config['end_tag'] = '</body>'; // where you want to stop grabbing
$config['show_tags'] = 0; // do you want the tags to be shown when you show the html? 1 = yes, 0 = no
class grabber
{
    var $error = '';
    var $html = '';
    function grabhtml( $url, $start, $end )
    {
        $file = file_get_contents( $url );
        if( $file )
        {
            if( preg_match_all( "#$start(.*?)$end#s", $file, $match ) )
            {
                $this->html = $match;
            }
            else
            {
                $this->error = "Tags cannot be found.";
            }
        }
        else
        {
            $this->error = "Site cannot be found!";
        }
    }
    function strip( $html, $show, $start, $end )
    {
        if( !$show )
        {
            $html = str_replace( $start, "", $html );
            $html = str_replace( $end, "", $html );
            return $html;
        }
        else
        {
            return $html;
        }
    }
}
$grab = new grabber;
$grab->grabhtml( $config['url'], $config['start_tag'], $config['end_tag'] );
echo $grab->error;
$i = 0;
foreach( $grab->html[0] as $html )
{
    $s[$i++] = htmlspecialchars( $grab->strip( $html, $config['show_tags'], $config['start_tag'], $config['end_tag'] ) );
}
$s = $s[0];
print $s;
preg_match('/\WIDTH=\"300\" HEIGHT=\"300\"> <PARAM NAME=movie VALUE=\"(.*?)\"/sim', $s, $matches);
$url = parse_url($matches[1]);
parse_str($url['query'], $params);
var_dump($params);
?>
As you can see, it prints $s but then does not do the next step.

#9, 01-12-2010, 10:12 AM
The Wanderer (Join Date: Jan 2010, Posts: 7, Thanks: 0)
It works if you remove htmlspecialchars(): it converts < and " into &lt; and &quot;, so the regular expression can no longer find the markup it is looking for. That said, I don't quite understand what you want to do with that class. Do you want a generic class where you can pass start and end text, extract everything between them, and optionally encode all HTML entities?
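A rough sketch of that fix, assuming the goal is the generic "text between two markers" helper: run the regular expression against the raw HTML and apply htmlspecialchars() only when printing. Two further details in the pattern from post #8 are worth checking. The start and end markers are interpolated into the regex without preg_quote(), which breaks if they contain characters such as ( or ?. And the leading \W is PCRE's non-word-character class rather than an escaped W, so \WIDTH matches a non-word character followed by IDTH, not the literal text WIDTH. The helper name extract_between() and the sample page string are hypothetical.

```php
<?php
/**
 * Return every piece of text found between $start and $end.
 * Hypothetical helper; the markers are escaped with preg_quote() so they
 * are matched literally rather than as regex syntax.
 */
function extract_between($text, $start, $end)
{
    $pattern = '#' . preg_quote($start, '#') . '(.*?)' . preg_quote($end, '#') . '#s';
    preg_match_all($pattern, $text, $matches);
    return $matches[1]; // captured groups only, markers excluded
}

// Shortened stand-in for the fetched page (assumption: in the real script
// this comes from file_get_contents() or cURL).
$page = '<td>5 Day Swell Graph </td><OBJECT WIDTH="300" HEIGHT="300"> '
      . '<PARAM NAME=movie VALUE="/graph-mod.swf?day=3&amht1=1.50"></OBJECT></body>';

$sections = extract_between($page, '5 Day Swell Graph </td>', '</body>');
$s = $sections[0];

// Escape only for display; $s itself stays raw HTML.
echo htmlspecialchars($s), "\n";

// Search the raw HTML; WIDTH is written without a leading backslash.
if (preg_match('/WIDTH="300" HEIGHT="300">\s*<PARAM NAME=movie VALUE="(.*?)"/si', $s, $matches)) {
    parse_str(parse_url($matches[1], PHP_URL_QUERY), $params);
    var_dump($params);
}
```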

#10, 01-12-2010, 10:49 AM
The Wanderer (Join Date: Jan 2010, Posts: 7, Thanks: 0)
Having nothing else to do, I have created two classes that you can use. I hope you can use PHP 5, as I noticed that you are doing everything in PHP 4.
I can't help you with anything else. Learn from what I've given you and let me know if something is not clear.
Here's the code:
PHP Code:
<?php
class Swellnet_Locations
{
    // states
    const STATE_QUEENSLAND = 1;
    const STATE_NEW_SOUTH_WALES = 2;
    const STATE_VICTORIA = 3;
    const STATE_SOUTH_AUSTRALIA = 4;
    const STATE_WESTERN_AUSTRALIA = 5;
    const STATE_TASMANIA = 6;
    // Queensland
    const REGION_GOLD_COAST = 17;
    const REGION_SUNSHINE_COAST = 18;
    const REGION_AGNES_WATER = 1;
    const REGION_BALLINA = 3;
    const REGION_YAMBA = 16;
    // New South Wales
    const REGION_COFFS_HARBOUR = 7;
    const REGION_PT_MACQUARIE = 11;
    const REGION_NEWCASTLE = 10;
    const REGION_CENTRAL_COAST = 6;
    const REGION_NARRABEEN = 13;
    const REGION_CURL_CURL = 12;
    const REGION_BONDI = 4;
    const REGION_MAROUBRA = 9;
    const REGION_CRONULLA = 8;
    const REGION_WOLLONGONG = 15;
    // Victoria
    const REGION_WARRNAMBOOL = 34;
    const REGION_TORQUAY = 28;
    const REGION_13TH_BEACH = 27;
    const REGION_MORNINGTON_PEN = 31;
    const REGION_WESTERN_PORT = 32;
    const REGION_PHILLIP_ISLAND = 35;
    const REGION_WOOLAMAI = 33;
    // South Australia
    const REGION_MID_COAST = 19;
    const REGION_VICTOR_HARBOR = 22;
    // Tasmania
    const REGION_NORTH_EAST = 25;
    const REGION_HOBART = 26;
    // Western Australia
    const REGION_MARGARET_RIVER = 38;
    const REGION_PERTH = 39;
    const REGION_GERALDTON = 37;
}
class Swellnet_Graph
{
    // swellnet url mask
    const URL_MASK = 'http://swellnet.com.au/loc_report.php?state_id=%d&region_id=%d';
    // class variables
    private $_html = null;
    private $_url = null;
    private $_params = null;
    /**
     * Class constructor
     *
     * @param integer $state State id
     * @param integer $region Region id
     * @throws Exception if the page cannot be fetched or the graph URL is not found
     */
    public function __construct($state, $region)
    {
        // generate the url
        $this->_url = sprintf(self::URL_MASK, $state, $region);
        // get html of the target url
        $this->_html = file_get_contents($this->_url);
        // fail if couldn't open the url
        if ($this->_html === false) {
            throw new Exception('Could not fetch contents of ' . $this->_url);
        }
        // attempt to extract the graph source url
        preg_match('/value=\"(\/graph-mod\.swf\?.*?)\"/si', $this->_html, $matches);
        // fail if the number of matches is incorrect
        if (count($matches) != 2) {
            throw new Exception('Error extracting the graph source');
        }
        // extract url parameters
        $url = $matches[1];
        $parse = parse_url($url);
        parse_str($parse['query'], $this->_params);
    }
    /**
     * Return request url
     *
     * @return string
     */
    public function getUrl()
    {
        return $this->_url;
    }
    /**
     * Return request html
     *
     * @return string
     */
    public function getHtml()
    {
        return $this->_html;
    }
    /**
     * Return resulting parameters
     *
     * @return array
     */
    public function getParams()
    {
        return $this->_params;
    }
}
// fetch parameters of the graph
$sg = new Swellnet_Graph(Swellnet_Locations::STATE_QUEENSLAND, Swellnet_Locations::REGION_GOLD_COAST);
var_dump($sg->getUrl());
var_dump($sg->getParams());
Hope that helps.
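One possible refinement, sketched here as an assumption rather than as part of Cypher's classes: move the parsing step out of the constructor into a function that takes an HTML string. That makes the regex and parse_str() logic testable without fetching swellnet.com.au, while the class keeps throwing the same exception. The function name swellnet_graph_params() is hypothetical.

```php
<?php
/**
 * Extract the graph parameters from a page's HTML.
 * Hypothetical helper mirroring the parsing in Swellnet_Graph::__construct().
 *
 * @param string $html Page HTML
 * @return array Query parameters of the graph-mod.swf URL
 * @throws Exception if no graph URL is found
 */
function swellnet_graph_params($html)
{
    // Same pattern as the constructor: the value="..." attribute holding the SWF URL.
    if (!preg_match('/value=\"(\/graph-mod\.swf\?.*?)\"/si', $html, $matches)) {
        throw new Exception('Error extracting the graph source');
    }
    $params = array();
    parse_str(parse_url($matches[1], PHP_URL_QUERY), $params);
    return $params;
}

// Works on any string, so no network access is needed to try it.
$params = swellnet_graph_params('<param name="movie" value="/graph-mod.swf?day=3&amht1=1.50">');
var_dump($params);

// A page without the graph raises the same exception the class uses.
try {
    swellnet_graph_params('<html></html>');
} catch (Exception $e) {
    echo 'Caught: ', $e->getMessage(), "\n";
}
```

The constructor could then call this helper on the fetched HTML, keeping network access and parsing separate.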