TalkPHP
07-10-2009, 02:51 PM   #1
webtuto
help on making my own crawler

Hi, I want to make a crawler that grabs image links from another website, so I started like this:
PHP Code:
$site = "http://www.zik4.com/";
$file = file_get_contents($site);
I don't know how to extract just the image URLs (using regex, but...) and echo them on my page.
Any idea how to search a page's source code for a word and echo it?
Thanks in advance.
07-10-2009, 03:46 PM   #2
sketchMedia

You can do this quite easily with PHP's DOM extension and XPath:

PHP Code:
<?php
$site = "http://www.zik4.com/";
$html = file_get_contents($site);

$dom = new DOMDocument();
$dom->loadHTML($html);

$xpath = new DOMXPath($dom);
$imgs = $xpath->evaluate("/html/body//img");

foreach ($imgs as $img)
{
    echo '<br />' . $img->getAttribute('src');
}
Not tested; there is probably a quicker way.
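One possibly shorter route, sketched here as an assumption rather than something tested in the thread, is to skip XPath and ask the document for its img elements directly with getElementsByTagName(). The inline sample markup and the $sources variable are illustrative stand-ins for the fetched page.

```php
<?php
// Inline sample markup stands in for the fetched page (an assumption for this sketch).
$html = '<html><body><img src="img/logo.gif"><p><img src="http://zik4.com/images/pink.jpg"></p></body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);

// Collect the src attribute of every <img> element, in document order.
$sources = [];
foreach ($dom->getElementsByTagName('img') as $img) {
    $sources[] = $img->getAttribute('src');
}

echo implode("\n", $sources), "\n";
```

XPath is still the better fit once the selection gets more specific, for example only images inside a particular table.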
07-10-2009, 03:54 PM   #3
adamdecaf

Why don't you extract every hyperlink and then display only those whose file extension is an image type?
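A minimal sketch of that idea, assuming the URLs have already been collected into an array. The helper name filterImageUrls() and the extension list are illustrative assumptions, not something from the thread.

```php
<?php
// Hypothetical helper: keep only URLs whose path ends in a known image extension.
function filterImageUrls(array $urls, array $extensions = ['jpg', 'jpeg', 'png', 'gif']): array
{
    $images = [];
    foreach ($urls as $url) {
        // Look at the path only, so a query string after the file name is ignored.
        $path = parse_url($url, PHP_URL_PATH);
        if (!is_string($path)) {
            continue; // no usable path in this URL
        }
        $ext = strtolower(pathinfo($path, PATHINFO_EXTENSION));
        if (in_array($ext, $extensions, true)) {
            $images[] = $url;
        }
    }
    return $images;
}

$urls = [
    'http://zik4.com/images/pink.jpg',
    'http://zik4.com/index.php',
    'img/logo.gif',
];
print_r(filterImageUrls($urls));
```

The caveat with this approach is that images served from URLs without a file extension, such as thumbnails generated by a script, would be skipped.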
07-10-2009, 04:17 PM   #4
Orc

Quote:
Originally Posted by sketchMedia
You can do this quite easily with PHP's DOM extension and XPath:

PHP Code:
<?php
$site = "http://www.zik4.com/";
$html = file_get_contents($site);

$dom = new DOMDocument();
$dom->loadHTML($html);

$xpath = new DOMXPath($dom);
$imgs = $xpath->evaluate("/html/body//img");

foreach ($imgs as $img)
{
    echo '<br />' . $img->getAttribute('src');
}
Not tested; there is probably a quicker way.
Oh you, that's exactly what I was going to post. Ever since Salathe showed me some DOMDocument code, I've been using this method.
07-10-2009, 07:22 PM   #5
webtuto

Thanks, but it gives me MANY errors:
PHP Code:

Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: Tag marquee invalid in Entity, line: 58 in C:\wamp\www\bot\index.php on line 6
Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: Opening and ending tag mismatch: td and tr in Entity, line: 82 in C:\wamp\www\bot\index.php on line 6
Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: ID page already defined in Entity, line: 82 in C:\wamp\www\bot\index.php on line 6
Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: Unexpected end tag : tr in Entity, line: 82 in C:\wamp\www\bot\index.php on line 6
Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: ID page already defined in Entity, line: 84 in C:\wamp\www\bot\index.php on line 6
Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: Opening and ending tag mismatch: td and tr in Entity, line: 84 in C:\wamp\www\bot\index.php on line 6
Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: Unexpected end tag : tr in Entity, line: 84 in C:\wamp\www\bot\index.php on line 6
Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: error parsing attribute name in Entity, line: 84 in C:\wamp\www\bot\index.php on line 6
Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: ID page already defined in Entity, line: 86 in C:\wamp\www\bot\index.php on line 6
Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: Opening and ending tag mismatch: td and tr in Entity, line: 86 in C:\wamp\www\bot\index.php on line 6
Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: Unexpected end tag : tr in Entity, line: 86 in C:\wamp\www\bot\index.php on line 6
Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: ID page already defined in Entity, line: 88 in C:\wamp\www\bot\index.php on line 6
Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: Opening and ending tag mismatch: td and tr in Entity, line: 88 in C:\wamp\www\bot\index.php on line 6
Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: Opening and ending tag mismatch: td and tr in Entity, line: 89 in C:\wamp\www\bot\index.php on line 6
Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: Unexpected end tag : tr in Entity, line: 89 in C:\wamp\www\bot\index.php on line 6
Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: ID page already defined in Entity, line: 89 in C:\wamp\www\bot\index.php on line 6
Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: ID page already defined in Entity, line: 91 in C:\wamp\www\bot\index.php on line 6
Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: Opening and ending tag mismatch: td and tr in Entity, line: 91 in C:\wamp\www\bot\index.php on line 6
Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: ID page already defined in Entity, line: 93 in C:\wamp\www\bot\index.php on line 6
Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: Opening and ending tag mismatch: td and tr in Entity, line: 93 in C:\wamp\www\bot\index.php on line 6
Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: Unexpected end tag : tr in Entity, line: 93 in C:\wamp\www\bot\index.php on line 6
Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: ID page already defined in Entity, line: 95 in C:\wamp\www\bot\index.php on line 6
Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: Opening and ending tag mismatch: td and tr in Entity, line: 95 in C:\wamp\www\bot\index.php on line 6
Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: Unexpected end tag : tr in Entity, line: 95 in C:\wamp\www\bot\index.php on line 6
Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: ID page already defined in Entity, line: 97 in C:\wamp\www\bot\index.php on line 6
Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: Opening and ending tag mismatch: td and tr in Entity, line: 97 in C:\wamp\www\bot\index.php on line 6
Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: Unexpected end tag : tr in Entity, line: 97 in C:\wamp\www\bot\index.php on line 6
Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: htmlParseEntityRef: expecting ';' in Entity, line: 101 in C:\wamp\www\bot\index.php on line 6
Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: htmlParseEntityRef: expecting ';' in Entity, line: 103 in C:\wamp\www\bot\index.php on line 6
Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: htmlParseEntityRef: expecting ';' in Entity, line: 104 in C:\wamp\www\bot\index.php on line 6
Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: htmlParseEntityRef: no name in Entity, line: 155 in C:\wamp\www\bot\index.php on line 6
Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: Unexpected end tag : p in Entity, line: 172 in C:\wamp\www\bot\index.php on line 6
Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: ID lis-chois already defined in Entity, line: 174 in C:\wamp\www\bot\index.php on line 6
Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: Unexpected end tag : p in Entity, line: 211 in C:\wamp\www\bot\index.php on line 6
Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: Unexpected end tag : li in Entity, line: 276 in C:\wamp\www\bot\index.php on line 6
Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: htmlParseEntityRef: expecting ';' in Entity, line: 281 in C:\wamp\www\bot\index.php on line 6
Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: htmlParseEntityRef: expecting ';' in Entity, line: 283 in C:\wamp\www\bot\index.php on line 6
Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: htmlParseEntityRef: expecting ';' in Entity, line: 284 in C:\wamp\www\bot\index.php on line 6

img/logo.gif
jimg/phone.gif
jimg/phone.gif
http://zik4.com/images/tingtings.jpg
http://zik4.com/images/martin.jpg
http://zik4.com/images/dolls.jpg
http://zik4.com/images/miley.jpg
http://zik4.com/images/katy.jpg
http://zik4.com/images/jenifer.jpg
http://zik4.com/images/nelly.jpg
http://zik4.com/images/maria.jpg
http://zik4.com/images/james.jpg
http://zik4.com/images/street.jpg
http://zik4.com/images/coldplay.jpg
http://zik4.com/images/Hoobastank.jpg
http://zik4.com/images/3-doors-down.jpg
http://zik4.com/images/pink.jpg
http://clubzik4.com/mzik/images/images/sum41.jpg
http://clubzik4.com/mzik/images/images/simple-plan.jpg
http://clubzik4.com/mzik/images/images/scorpions.jpg
http://clubzik4.com/mzik/images/images/metallica.jpg
http://clubzik4.com/mzik/images/images/limp-bizkit.jpg
http://clubzik4.com/mzik/images/images/guns-n-roses.jpg
http://zik4.com/music-rai/images/mimoun.jpg
http://zik4.com/images/sk.jpg
http://zik4.com/images/taha.jpg
http://zik4.com/images/hwari.jpg
http://zik4.com/images/kheira.jpg
http://zik4.com/music-rai/images/maria.jpg
http://zik4.com/music-rai/images/najim.jpg
http://zik4.com/imags/kasmi.jpg
http://zik4.com/imags/hasinou.jpg
http://zik4.com/imags/hamid.jpg
http://zik4.com/images/Abdel%20Fatah%20El%20Greeny.jpg
http://zik4.com/images/Rola%20Sa3d.jpg
http://zik4.com/images/Jawad%20Al%20Ali.jpg
http://zik4.com/images/Mohammad%20Abdo.jpg
http://zik4.com/images/George%20Wassouf.jpg
http://zik4.com/images/Abd%20Elbaset%20Hamoda.jpg
http://zik4.com/images/Samer.jpg
http://zik4.com/images/player/Ahlam%20Ali%20Al%20Shamsi.jpg
http://www.zik4.com/images/fadel-chaker.jpg
http://www.zik4.com/images/sa3d el-so3gayae.jpg
http://zik4.com/images/pinhas.jpg
http://zik4.com/images/jerra.jpg
http://zik4.com/images/tagada.jpg
http://zik4.com/images/lamrini.jpg
http://zik4.com/images/wlad.jpg
http://zik4.com/images/fatna.jpg
http://zik4.com/images/mardia.jpg
http://www.zik4.com/images/asri.jpg
http://www.zik4.com/images/borgone.jpg
http://zik4.com/list/chaabi/tahour.jpg
http://zik4.com/images/tagada.jpg
http://zik4.com/images/fikri.jpg
http://zik4.com/images/elghiwan.jpg
http://zik4.com/images/latifa.jpg
http://zik4.com/images/lmchahb.jpg
http://www.zik4.com/images/jiljilala.jpg
http://zik4.com/images/topic_brahim_laalami.jpg
http://tbn0.google.com/images?q=tbn:FSIBr5pHU2jcoM:http://www.fesfestival.com/2008/upload/artiste/grand/Ving_Abdelwahab-doukali.jpg
http://tbn0.google.com/images?q=tbn:3gfH-Yw6TNvYHM:http://www.ournia.com/thumbnail.php%3Ffile%3DAbdelhadi_Belkhayat_318595292.jpg%26size%3Darticle_medium
http://zik4.com/images/gloria-estefan.jpg
http://zik4.com/images/jarabe-de-palo.jpg
http://zik4.com/images/la-hungara.jpg
http://zik4.com/images/fangoria.jpg
http://zik4.com/images/gipsykings.jpg
http://zik4.com/images/kiko-y-shara.jpg
http://zik4.com/images/la-quinta-estacion.jpg
http://zik4.com/images/los-chichos.jpg
http://zik4.com/images/los-rebujitos.jpg
http://zik4.com/images/luis-fonsi.jpg
http://zik4.com/images/Skazi.jpg
http://zik4.com/images/Armin%20Van%20Buuren.jpg
http://zik4.com/images/David%20Tavare.jpg
http://zik4.com/images/David%20Vendetta.jpg
http://zik4.com/images/Eric%20Prydz.jpg
http://zik4.com/images/Benny%20Benassi.jpg
http://zik4.com/images/martinsolveig.jpg
http://www.zik4.com/images/david-guetta.jpg
http://www.zik4.com/images/laurent-wolf.jpg
http://www.zik4.com/images/basshunter.jpg
http://ad.advertstream.com/ads.php?what=zone:17854&inf=no&n=ad9cb868
http://s4.histats.com/stats/0.gif?471132&1
images/email.gif
http://ad.advertstream.com/ads.php?what=zone:19728&inf=no&n=a94da34b
Old 07-10-2009, 08:15 PM   #6 (permalink)
The Prestige
Advanced Programmer Top Contributor Good Samaritan 
 
sketchMedia's Avatar
 
Join Date: Oct 2007
Location: Manchester, UK
Posts: 854
Thanks: 32
sketchMedia is on a distinguished road
Default

Quote:
thanks, but it gave me MANY errors
It does that. It seems to try and validate the HTML, and as a result we can see it has many problems with your source.

Just whack:
PHP Code:
error_reporting(E_ALL & ~E_WARNING);
at the top. Alternatively, you could use the pesky '@' error suppression operator, but I wouldn't condone it!
P.S. They are warnings, not errors. Sorry, I'm a bit pedantic like that.



Quote:
Oh you, that's exactly what I was going to post. Ever since Salathe showed me some DOMDocument code, I've been using this method.
Hehe, you snooze, you lose.
Old 07-10-2009, 10:20 PM   #7 (permalink)
Moderateur
RegEx Guru PHP Guru Top Contributor Advanced Programmer 
 
Salathe's Avatar
 
Join Date: Apr 2007
Posts: 1,393
Thanks: 5
Salathe is on a distinguished road
Default

Quote:
Originally Posted by sketchMedia View Post
Just whack:
PHP Code:
error_reporting(E_ALL & ~E_WARNING);
at the top. Alternatively, you could use the pesky '@' error suppression operator, but I wouldn't condone it!
P.S. They are warnings, not errors. Sorry, I'm a bit pedantic like that.
No, don't do that. They'll silence more than you want to keep quiet. If you want the XML parser to stay shushed, you can turn on user-error-handling for libxml temporarily whilst you parse the markup, then turn it off again to handle errors normally after you're finished.

Quicky example:
PHP Code:
libxml_use_internal_errors(TRUE); // Shhhut up!
$dom->loadHTML($html_string);
libxml_use_internal_errors(FALSE); // Ok, you can complain now. 
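
A slightly fuller sketch of the same idea, in case you also want to see what libxml collected and to put back whatever setting was active before. `load_html_quietly` is a made-up helper name, not something from this thread, and the sample markup is only there so the snippet runs on its own.

```php
<?php
// Hypothetical helper: parse messy HTML without emitting warnings,
// and hand back the problems libxml noticed instead of printing them.
function load_html_quietly(string $html): array
{
    $dom = new DOMDocument();

    // Remember the previous setting so it can be restored afterwards.
    $previous = libxml_use_internal_errors(true);

    $dom->loadHTML($html);

    // Collect whatever the parser complained about, then empty the buffer.
    $errors = libxml_get_errors();
    libxml_clear_errors();

    libxml_use_internal_errors($previous);

    return [$dom, $errors];
}

[$dom, $errors] = load_html_quietly('<html><body><img src="a.jpg"><p>unclosed</body></html>');
echo count($errors) . " parser messages collected\n";
echo $dom->getElementsByTagName('img')->item(0)->getAttribute('src') . "\n";
```

Restoring the previous value rather than hard-coding FALSE means the helper does not undo a setting that some other part of the script relies on.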
The Following 3 Users Say Thank You to Salathe For This Useful Post:
codefreek (07-11-2009), ryanmr (07-10-2009), sketchMedia (07-10-2009)
Old 07-10-2009, 10:48 PM   #8 (permalink)
The Contributor
 
ryanmr's Avatar
 
Join Date: Jun 2008
Location: Twin Cities, Minnesota, USA
Posts: 44
Thanks: 3
ryanmr is on a distinguished road
Default

Quote:
Quicky example:
PHP Code:
libxml_use_internal_errors(TRUE); // Shhhut up!
$dom->loadHTML($html_string);
libxml_use_internal_errors(FALSE); // Ok, you can complain now. 
Now that's useful. Thanks for pointing that out. I love PHPDOM for my crawling needs.
Old 07-10-2009, 11:03 PM   #9 (permalink)
The Prestige
Advanced Programmer Top Contributor Good Samaritan 
 
sketchMedia's Avatar
 
Join Date: Oct 2007
Location: Manchester, UK
Posts: 854
Thanks: 32
sketchMedia is on a distinguished road
Default

Thanks Salathe, didn't know about that function (well, that really should be 'I couldn't be arsed researching it myself').
Old 07-12-2009, 03:17 PM   #10 (permalink)
The Addict
 
webtuto's Avatar
 
Join Date: Dec 2007
Location: morocco
Posts: 221
Thanks: 19
webtuto is on a distinguished road
Default

Thanks for the help,
but when I use that code I only get the images from the page I give it. I want the code to spread across the website and get all the images on the whole site, not just on that one page.
I hope you understand what I mean.
Old 07-12-2009, 04:35 PM   #11 (permalink)
Moderateur
RegEx Guru PHP Guru Top Contributor Advanced Programmer 
 
Salathe's Avatar
 
Join Date: Apr 2007
Posts: 1,393
Thanks: 5
Salathe is on a distinguished road
Default

So what's the problem? If you can crawl one page, you can crawl a million.
Old 07-12-2009, 04:39 PM   #12 (permalink)
The Acquainted
 
Hightower's Avatar
 
Join Date: May 2009
Location: Durham, UK
Posts: 134
Thanks: 9
Hightower is on a distinguished road
Default

I have a script that uses cURL to get the links on a given website. Not sure if it would be of use to you, but if you want to give it a go, let me know and I'll post it up.
Old 07-12-2009, 07:32 PM   #13 (permalink)
The Addict
 
webtuto's Avatar
 
Join Date: Dec 2007
Location: morocco
Posts: 221
Thanks: 19
webtuto is on a distinguished road
Default

@Hightower: YES, thank you, can you post it?
I appreciate it.

@Salathe: I want the crawl to spread across the same website and get all links on all pages of that website (I don't think it's a good idea to crawl every page manually).
Old 07-12-2009, 10:01 PM   #14 (permalink)
Moderateur
RegEx Guru PHP Guru Top Contributor Advanced Programmer 
 
Salathe's Avatar
 
Join Date: Apr 2007
Posts: 1,393
Thanks: 5
Salathe is on a distinguished road
Default

Break the process down into easy steps; we're not here to write all of your code for you! If you have a specific issue, please feel free to ask, but I for one would rather not give you everything on a plate.

You already know a basic requirement ("get all links in all pages on that website") so have a think about how you'd do that.
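
For later readers, one possible way to break it down, as a rough sketch and not anyone's finished crawler: pull the links out of a page, keep a queue of pages still to visit, and only follow links on the same host. The names `extract_links` and `crawl_site`, the `$maxPages` limit and the `example.test` URLs are all made up for illustration. Relative links (like `images/email.gif` in the output above) are not resolved here and would need extra work.

```php
<?php
// Hypothetical helper: pull the href values out of one page's HTML.
function extract_links(string $html): array
{
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($dom->getElementsByTagName('a') as $a) {
        $links[] = $a->getAttribute('href');
    }
    return $links;
}

// Breadth-first crawl that stays on the starting host.
// $fetch is any callable that takes a URL and returns HTML or false.
function crawl_site(string $start, callable $fetch, int $maxPages = 50): array
{
    $host    = parse_url($start, PHP_URL_HOST);
    $queue   = [$start];
    $visited = [];

    while ($queue && count($visited) < $maxPages) {
        $url = array_shift($queue);
        if (isset($visited[$url])) {
            continue;
        }
        $visited[$url] = true;

        $html = $fetch($url);
        if ($html === false || $html === '') {
            continue;
        }

        foreach (extract_links($html) as $href) {
            // Only absolute links on the same host are followed;
            // relative links would need resolving against $url first.
            if (parse_url($href, PHP_URL_HOST) === $host && !isset($visited[$href])) {
                $queue[] = $href;
            }
        }
    }
    return array_keys($visited);
}

// An in-memory "site" so the sketch runs without network access.
$pages = [
    'http://example.test/'  => '<a href="http://example.test/a">A</a> <a href="http://other.test/">X</a>',
    'http://example.test/a' => '<a href="http://example.test/">home</a> <a href="#top">top</a>',
];
$fetch = function (string $url) use ($pages) {
    return $pages[$url] ?? false;
};
print_r(crawl_site('http://example.test/', $fetch));
```

Against a real site you could pass `'file_get_contents'` as `$fetch`. The page limit is there so a mistake does not turn into an endless crawl; be polite and add a delay between requests as well.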
Old 07-14-2009, 04:27 PM   #15 (permalink)
The Contributor
 
russellharrower's Avatar
 
Join Date: Jul 2009
Posts: 80
Thanks: 13
russellharrower is on a distinguished road
Default

Hi, just wondering how I would use this code to get the following:
site title
meta tags
meta description

and all links on the site.
Old 07-14-2009, 07:06 PM   #16 (permalink)
The Addict
 
Join Date: May 2009
Posts: 287
Thanks: 5
adamdecaf is on a distinguished road
Default

Quote:
Originally Posted by russellharrower View Post
Hi, just wondering how I would use this code to get the following:
site title
meta tags
meta description

and all links on the site.
You would run print_r() or var_dump() on the result, look at the output, and pick out the content you want.
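
If you would rather ask for those pieces directly, something along these lines might do it. It is only a sketch: `page_info` is a made-up helper name, the sample HTML is invented, and it assumes the meta tags use the usual `name`/`content` attributes.

```php
<?php
// Hypothetical helper: collect title, meta tags and links from one page.
function page_info(string $html): array
{
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);

    // string() gives '' when the page has no <title>.
    $title = trim($xpath->evaluate('string(/html/head/title)'));

    // name => content for every <meta name="..."> tag.
    $meta = [];
    foreach ($xpath->query('//meta[@name]') as $tag) {
        $meta[strtolower($tag->getAttribute('name'))] = $tag->getAttribute('content');
    }

    // Every link target on the page, in document order.
    $links = [];
    foreach ($xpath->query('//a[@href]') as $a) {
        $links[] = $a->getAttribute('href');
    }

    return ['title' => $title, 'meta' => $meta, 'links' => $links];
}

$html = '<html><head><title>Demo page</title>'
      . '<meta name="description" content="A test page">'
      . '<meta name="keywords" content="php, dom"></head>'
      . '<body><a href="/about">About</a></body></html>';

$info = page_info($html);
echo $info['title'] . "\n";
echo $info['meta']['description'] . "\n";
print_r($info['links']);
```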
Old 08-08-2009, 08:55 AM   #17 (permalink)
The Contributor
 
russellharrower's Avatar
 
Join Date: Jul 2009
Posts: 80
Thanks: 13
russellharrower is on a distinguished road
Default

Hi guys, I have the following code working:
Code:
<?php

$site = "http://www.techcrunch.com/2009/08/07/geopolitical-attacks-on-twitter-intensified-almost-tenfold-last-night/";
$html = file_get_contents($site);

$dom = new DOMDocument();
libxml_use_internal_errors(TRUE); // Shhhut up! 
$dom->loadHTML($html); 
libxml_use_internal_errors(FALSE); // Ok, you can complain now.  

$xpath = new DOMXPath($dom);
$as = $xpath->evaluate("/html/body//a");


foreach($as as $a)
{
    echo '<br />' . $a->getAttribute('href');
}
?>
As you will see when you run this script, there are links like #comments.
What I would like to do is drop any links that start with #.

Also, is there any way to identify my script to the site? When Google spiders your site it shows up as GoogleSpider; is there any way to show up as mySpider?
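
A possible sketch for both questions, with the caveat that the helper names and the "mySpider/0.1" string are invented for illustration. Fragment-only links can be filtered out after collecting them, and `file_get_contents()` accepts a stream context whose `http` options include `user_agent`.

```php
<?php
// Drop links that are only fragments (they start with "#").
function without_fragment_links(array $hrefs): array
{
    return array_values(array_filter($hrefs, function (string $href): bool {
        return $href !== '' && $href[0] !== '#';
    }));
}

// Build a stream context that announces a custom User-Agent.
// "mySpider/0.1" is a made-up name; pick your own.
function spider_context(string $userAgent = 'mySpider/0.1')
{
    return stream_context_create([
        'http' => ['user_agent' => $userAgent],
    ]);
}

print_r(without_fragment_links(['#comments', 'http://example.test/post', '#respond', '/about']));

// Pass the context as the third argument when fetching:
// $html = file_get_contents($site, false, spider_context());
```

You could also do the filtering in the XPath itself with something like `/html/body//a[not(starts-with(@href, '#'))]`, which skips those anchors before the loop ever sees them.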