07-10-2009, 02:51 PM
#1
The Addict
Join Date: Dec 2007
Location: morocco
Posts: 221
Thanks: 19
Help making my own crawler
Hi, I want to make a crawler that grabs image links from another website,
so I started like this:
PHP Code:
<?php
$site = "http://www.zik4.com/";
$file = file_get_contents($site); // the page's raw HTML as one string
...and I don't know how to extract just the image URLs (I tried regex, but...) and echo them on my page.
Any idea how to search the source code for a word and echo it?
Thanks in advance.
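On the regex route mentioned above: here is a minimal sketch, assuming the `src` values are wrapped in single or double quotes. The function name `imageUrlsByRegex` and the sample markup are made up for illustration. It copes with simple pages, but it misses unquoted attributes and will happily match `<img>` tags inside comments or scripts. That is why a proper HTML parser is usually the safer choice.

```php
<?php
// Sketch only: pulls src values out of <img> tags with a regular expression.
// Assumes quoted attributes; unquoted src values are not matched.
function imageUrlsByRegex(string $html): array
{
    // Group 1 captures the opening quote, group 2 the URL up to the matching quote.
    preg_match_all('/<img\b[^>]*\bsrc\s*=\s*(["\'])(.*?)\1/i', $html, $matches);
    return $matches[2];
}

$html = '<p><img src="img/logo.gif" alt="logo"><IMG class="x" SRC=\'http://zik4.com/images/pink.jpg\'></p>';
foreach (imageUrlsByRegex($html) as $url) {
    echo $url, "\n";
}
```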
07-10-2009, 03:46 PM
#2
The Prestige
Join Date: Oct 2007
Location: Manchester, UK
Posts: 854
Thanks: 32
You can do this quite easily with PHP's DOM and XPath:
PHP Code:
<?php
$site = "http://www.zik4.com/";
$html = file_get_contents($site);            // fetch the page
$dom = new DOMDocument();
$dom->loadHTML($html);                       // parse the HTML into a DOM tree
$xpath = new DOMXPath($dom);
$imgs = $xpath->evaluate("/html/body//img"); // every <img> inside <body>
foreach ($imgs as $img) {
    echo '<br />' . $img->getAttribute('src');
}
Not tested; there is probably a quicker way.
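Here is the same DOM + XPath idea sketched as a function that takes an HTML string, so you can try it without fetching a live page. The name `imageSrcs` and the sample markup are hypothetical. The function also buffers libxml's parse warnings instead of printing them, which assumes that is what you want for messy real-world pages. It uses `//img[@src]` so that `<img>` tags without a `src` are skipped.

```php
<?php
// Sketch: DOM + XPath image extraction on an HTML string (no network access).
function imageSrcs(string $html): array
{
    $dom = new DOMDocument();
    // Assumption: malformed markup is expected, so buffer libxml warnings
    // instead of printing them, then restore the previous setting.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $srcs = [];
    // //img[@src] selects every <img> element that has a src attribute.
    foreach ($xpath->query('//img[@src]') as $img) {
        $srcs[] = $img->getAttribute('src');
    }
    return $srcs;
}

$html = '<html><body><td><img src="img/logo.gif"></tr><img alt="no src"><img src="images/email.gif"></body></html>';
foreach (imageSrcs($html) as $src) {
    echo htmlspecialchars($src), "<br />\n";
}
```

To use it on the site from the first post, pass it the string returned by `file_get_contents($site)`.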
__________________
mysql> SELECT * FROM `users` WHERE `users`.`clue` > 0;
Empty set (0.00 sec)
07-10-2009, 03:54 PM
#3
The Addict
Join Date: May 2009
Posts: 287
Thanks: 5
Why don't you extract every hyperlink and then only display those with an image file extension?
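Here is a rough sketch of that suggestion. It assumes you have already collected the link URLs into an array, for example from `href` attributes. The function name and the extension list are assumptions. This approach would filter out image URLs that have no extension, such as the `tbn0.google.com/images?q=` thumbnails in the output later in the thread.

```php
<?php
// Sketch of "collect every link, keep only image extensions".
// The default extension list is an assumption; extend it as needed.
function filterImageLinks(array $urls, array $extensions = ['jpg', 'jpeg', 'gif', 'png', 'bmp', 'webp']): array
{
    $images = [];
    foreach ($urls as $url) {
        // Look only at the path, so query strings such as "?page=2" are ignored.
        $path = parse_url($url, PHP_URL_PATH);
        if (!is_string($path)) {
            continue; // malformed URL or no path component
        }
        $ext = strtolower(pathinfo($path, PATHINFO_EXTENSION));
        if (in_array($ext, $extensions, true)) {
            $images[] = $url;
        }
    }
    return $images;
}

$links = [
    'http://zik4.com/images/pink.jpg',
    'http://zik4.com/index.php?page=2',
    'images/email.gif',
    'http://s4.histats.com/stats/0.gif?471132&1',
];
print_r(filterImageLinks($links));
```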
07-10-2009, 04:17 PM
#4
The Prestige
Join Date: Dec 2007
Posts: 1,044
Thanks: 193
Quote:
Originally Posted by sketchMedia
You can do this quite easily with php DOM and Xpath:
PHP Code:
<?php
$site = "http://www.zik4.com/";
$html = file_get_contents($site);
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$imgs = $xpath->evaluate("/html/body//img");
foreach($imgs as $img)
{
echo '<br />' . $img->getAttribute('src');
}
Not tested, there is prolly a quicker way.
|
Oh you, that's exactly what I was going to post! Ever since Salathe showed me some DOMDocument code, I've been using this method.
__________________
VillageIdiot can have my babbies ;d
07-10-2009, 07:22 PM
#5
The Addict
Join Date: Dec 2007
Location: morocco
Posts: 221
Thanks: 19
Thanks, but it gives me MANY errors:
PHP Code:
Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: Tag marquee invalid in Entity, line: 58 in C:\wamp\www\bot\index.php on line 6
Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: Opening and ending tag mismatch: td and tr in Entity, line: 82 in C:\wamp\www\bot\index.php on line 6
Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: ID page already defined in Entity, line: 82 in C:\wamp\www\bot\index.php on line 6
Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: Unexpected end tag : tr in Entity, line: 82 in C:\wamp\www\bot\index.php on line 6
Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: error parsing attribute name in Entity, line: 84 in C:\wamp\www\bot\index.php on line 6
(the "td and tr", "ID page already defined" and "Unexpected end tag : tr" warnings repeat dozens of times for Entity lines 82 to 97)
Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: htmlParseEntityRef: expecting ';' in Entity, line: 101 in C:\wamp\www\bot\index.php on line 6
(the same "expecting ';'" warning also appears for Entity lines 103, 104, 281, 283 and 284)
Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: htmlParseEntityRef: no name in Entity, line: 155 in C:\wamp\www\bot\index.php on line 6
Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: Unexpected end tag : p in Entity, line: 172 in C:\wamp\www\bot\index.php on line 6
Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: ID lis-chois already defined in Entity, line: 174 in C:\wamp\www\bot\index.php on line 6
Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: Unexpected end tag : p in Entity, line: 211 in C:\wamp\www\bot\index.php on line 6
Warning: DOMDocument::loadHTML() [domdocument.loadhtml]: Unexpected end tag : li in Entity, line: 276 in C:\wamp\www\bot\index.php on line 6
img/logo.gif jimg/phone.gif jimg/phone.gif http://zik4.com/images/tingtings.jpg http://zik4.com/images/martin.jpg http://zik4.com/images/dolls.jpg http://zik4.com/images/miley.jpg http://zik4.com/images/katy.jpg http://zik4.com/images/jenifer.jpg http://zik4.com/images/nelly.jpg http://zik4.com/images/maria.jpg http://zik4.com/images/james.jpg http://zik4.com/images/street.jpg http://zik4.com/images/coldplay.jpg http://zik4.com/images/Hoobastank.jpg http://zik4.com/images/3-doors-down.jpg http://zik4.com/images/pink.jpg http://clubzik4.com/mzik/images/images/sum41.jpg http://clubzik4.com/mzik/images/images/simple-plan.jpg http://clubzik4.com/mzik/images/images/scorpions.jpg http://clubzik4.com/mzik/images/images/metallica.jpg http://clubzik4.com/mzik/images/images/limp-bizkit.jpg http://clubzik4.com/mzik/images/images/guns-n-roses.jpg http://zik4.com/music-rai/images/mimoun.jpg http://zik4.com/images/sk.jpg http://zik4.com/images/taha.jpg http://zik4.com/images/hwari.jpg http://zik4.com/images/kheira.jpg http://zik4.com/music-rai/images/maria.jpg http://zik4.com/music-rai/images/najim.jpg http://zik4.com/imags/kasmi.jpg http://zik4.com/imags/hasinou.jpg http://zik4.com/imags/hamid.jpg http://zik4.com/images/Abdel%20Fatah%20El%20Greeny.jpg http://zik4.com/images/Rola%20Sa3d.jpg http://zik4.com/images/Jawad%20Al%20Ali.jpg http://zik4.com/images/Mohammad%20Abdo.jpg http://zik4.com/images/George%20Wassouf.jpg http://zik4.com/images/Abd%20Elbaset%20Hamoda.jpg http://zik4.com/images/Samer.jpg http://zik4.com/images/player/Ahlam%20Ali%20Al%20Shamsi.jpg http://www.zik4.com/images/fadel-chaker.jpg http://www.zik4.com/images/sa3d el-so3gayae.jpg http://zik4.com/images/pinhas.jpg http://zik4.com/images/jerra.jpg http://zik4.com/images/tagada.jpg http://zik4.com/images/lamrini.jpg http://zik4.com/images/wlad.jpg http://zik4.com/images/fatna.jpg http://zik4.com/images/mardia.jpg http://www.zik4.com/images/asri.jpg http://www.zik4.com/images/borgone.jpg 
http://zik4.com/list/chaabi/tahour.jpg http://zik4.com/images/tagada.jpg http://zik4.com/images/fikri.jpg http://zik4.com/images/elghiwan.jpg http://zik4.com/images/latifa.jpg http://zik4.com/images/lmchahb.jpg http://www.zik4.com/images/jiljilala.jpg http://zik4.com/images/topic_brahim_laalami.jpg http://tbn0.google.com/images?q=tbn:FSIBr5pHU2jcoM:http://www.fesfestival.com/2008/upload/artiste/grand/Ving_Abdelwahab-doukali.jpg http://tbn0.google.com/images?q=tbn:3gfH-Yw6TNvYHM:http://www.ournia.com/thumbnail.php%3Ffile%3DAbdelhadi_Belkhayat_318595292.jpg%26size%3Darticle_medium http://zik4.com/images/gloria-estefan.jpg http://zik4.com/images/jarabe-de-palo.jpg http://zik4.com/images/la-hungara.jpg http://zik4.com/images/fangoria.jpg http://zik4.com/images/gipsykings.jpg http://zik4.com/images/kiko-y-shara.jpg http://zik4.com/images/la-quinta-estacion.jpg http://zik4.com/images/los-chichos.jpg http://zik4.com/images/los-rebujitos.jpg http://zik4.com/images/luis-fonsi.jpg http://zik4.com/images/Skazi.jpg http://zik4.com/images/Armin%20Van%20Buuren.jpg http://zik4.com/images/David%20Tavare.jpg http://zik4.com/images/David%20Vendetta.jpg http://zik4.com/images/Eric%20Prydz.jpg http://zik4.com/images/Benny%20Benassi.jpg http://zik4.com/images/martinsolveig.jpg http://www.zik4.com/images/david-guetta.jpg http://www.zik4.com/images/laurent-wolf.jpg http://www.zik4.com/images/basshunter.jpg http://ad.advertstream.com/ads.php?what=zone:17854&inf=no&n=ad9cb868 http://s4.histats.com/stats/0.gif?471132&1 images/email.gif http://ad.advertstream.com/ads.php?what=zone:19728&inf=no&n=a94da34b
07-10-2009, 08:15 PM
#6
The Prestige
Join Date: Oct 2007
Location: Manchester, UK
Posts: 854
Thanks: 32
Quote:
|
thanks , but it gaves me MANY errors
|
It does that: loadHTML() tries to parse (and effectively validate) the HTML, and as a result it is reporting the many problems in your source.
Just whack:
PHP Code:
error_reporting(E_ALL ^ E_WARNING);
at the top. Alternatively, you could use the pesky '@' error-suppression operator, but I wouldn't condone it!
P.S. They are warnings, not errors; sorry, I'm a bit pedantic like that.
Quote:
|
Oh you, that's what I was going to post, exactly that, ever since Salathe showed me some DOMDocument code I've been using this method ever since
|
hehe, snooze you lose 
07-10-2009, 10:20 PM
#7
Moderateur
Join Date: Apr 2007
Posts: 1,393
Thanks: 5
Quote:
Originally Posted by sketchMedia
Just whack:
PHP Code:
error_reporting(E_ALL ^ E_WARNING);
at the top, alternatively you could use the pesky '@' error suppression operator, but I wouldn't condone it!
p.s. They are warnings not errors  , sorry i'm a bit pedantic like that.
|
No, don't do that. It will silence far more than the warnings you want to keep quiet. If you just want the XML parser to stay shushed, you can temporarily turn on libxml's internal (user) error handling while you parse the markup, then turn it off again afterwards so errors are handled normally.
Quicky example:
PHP Code:
libxml_use_internal_errors(TRUE); // Shhhut up!
$dom->loadHTML($html_string);
libxml_use_internal_errors(FALSE); // Ok, you can complain now.
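Putting that tip together with the DOM/XPath code from earlier in the thread, an image-URL extractor might look like the sketch below. The function name `extract_image_srcs` and the optional warning counter are illustrative additions, not anything posted here. One small design choice: it restores whatever libxml error setting was active before (using the return value of `libxml_use_internal_errors()`) instead of forcing it back to FALSE.

```php
<?php
// Sketch: collect <img src> values from possibly malformed HTML without
// letting libxml print its parse warnings.
function extract_image_srcs(string $html, ?int &$warningCount = null): array
{
    $previous = libxml_use_internal_errors(true); // buffer libxml errors instead of emitting them
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    $warningCount = count(libxml_get_errors());   // how many messages were swallowed
    libxml_clear_errors();                        // empty the buffer
    libxml_use_internal_errors($previous);        // restore the previous setting

    $srcs = [];
    $xpath = new DOMXPath($dom);
    foreach ($xpath->query('//img[@src]') as $img) {
        $srcs[] = $img->getAttribute('src');
    }
    return $srcs;
}

$html = '<html><body><p><img src="img/logo.gif"><img src="images/pink.jpg"></body></html>';
print_r(extract_image_srcs($html, $n));
echo $n, " parse message(s) suppressed\n";
```

While internal errors are on, the buffered messages stay available through `libxml_get_errors()` until cleared, so you can still log what libxml disliked about a page if you want to.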
The Following 3 Users Say Thank You to Salathe For This Useful Post:
07-10-2009, 10:48 PM
#8
The Contributor
Join Date: Jun 2008
Location: Twin Cities, Minnesota, USA
Posts: 44
Thanks: 3
Quote:
Quicky example:
PHP Code:
libxml_use_internal_errors(TRUE); // Shhhut up!
$dom->loadHTML($html_string);
libxml_use_internal_errors(FALSE); // Ok, you can complain now.
|
Now that's useful. Thanks for pointing that out. I love PHP's DOM extension for my crawling needs.
07-10-2009, 11:03 PM
#9
The Prestige
Join Date: Oct 2007
Location: Manchester, UK
Posts: 854
Thanks: 32
Thanks Salathe, I didn't know about that function (well, that really should be 'I couldn't be arsed researching it myself').
07-12-2009, 03:17 PM
#10
The Addict
Join Date: Dec 2007
Location: morocco
Posts: 221
Thanks: 19
Thanks for the help.
But when I use that code, I only get the images from the page whose URL I give it. I want the code to spread across the website and get all the images on the whole site, not just that one page.
I hope you understand what I mean.
07-12-2009, 04:35 PM
#11
Moderateur
Join Date: Apr 2007
Posts: 1,393
Thanks: 5
So what's the problem? If you can crawl one page, you can crawl a million.
07-12-2009, 04:39 PM
#12
The Acquainted
Join Date: May 2009
Location: Durham, UK
Posts: 134
Thanks: 9
I have a script that uses cURL to get the links on a given website. Not sure if it could be used for you, but if you want to give it a go, let me know and I'll post it up.
07-12-2009, 07:32 PM
#13
The Addict
Join Date: Dec 2007
Location: morocco
Posts: 221
Thanks: 19
@hightower: Yes, thank you, can you post it?
I'd appreciate it.
@Salathe: I want the crawler to spread across the same website and get all the links on all of its pages (I don't think it's a good idea to crawl every page manually).
07-12-2009, 10:01 PM
#14
Moderateur
Join Date: Apr 2007
Posts: 1,393
Thanks: 5
Break the process down into easy steps; we're not here to write all of your code for you! If you have a specific issue, please feel free to ask, but I for one would rather not hand you everything on a plate.
You already know a basic requirement ("get all links in all pages on that website") so have a think about how you'd do that.
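Broken into steps, a whole-site crawl is roughly: keep a queue of pages to visit and a set of pages already seen, fetch and parse each page, record its images, and queue any links that point at the same host. The sketch below follows those steps. `crawl_images`, `resolve_url` and the `$maxPages` limit are illustrative names, not from the thread, and the URL resolution only covers common cases (it does not normalise `../` segments or keep ports). The fetcher is passed in as a callable so the network part can be swapped out.

```php
<?php
// Hypothetical helper: turn an href/src into an absolute http(s) URL,
// or null for in-page anchors, mailto:, javascript: and similar.
function resolve_url(string $base, string $href): ?string
{
    $href = trim($href);
    if ($href === '' || $href[0] === '#') {
        return null;                                   // empty or in-page anchor
    }
    $href = preg_replace('/#.*$/', '', $href);         // drop any "#fragment" part
    if (preg_match('#^[a-z][a-z0-9+.-]*:#i', $href)) { // already has a scheme
        return preg_match('#^https?://#i', $href) ? $href : null;
    }
    $parts = parse_url($base);
    if (strpos($href, '//') === 0) {
        return $parts['scheme'] . ':' . $href;         // protocol-relative
    }
    $root = $parts['scheme'] . '://' . $parts['host'];
    if ($href[0] === '/') {
        return $root . $href;                          // root-relative
    }
    // Plain relative path: append to the base URL's directory.
    $path = $parts['path'] ?? '/';
    return $root . substr($path, 0, strrpos($path, '/') + 1) . $href;
}

// Breadth-first crawl that stays on the start URL's host and collects images.
function crawl_images(string $start, callable $fetch, int $maxPages = 50): array
{
    $host = parse_url($start, PHP_URL_HOST);
    $queue = [$start];                 // pages still to visit
    $seen = [$start => true];          // pages already queued, so none repeats
    $images = [];                      // keys used as a set of image URLs
    $visited = 0;

    while ($queue && $visited < $maxPages) {
        $url = array_shift($queue);
        $visited++;
        $html = $fetch($url);
        if ($html === false || $html === '') {
            continue;                  // fetch failed or empty page
        }
        libxml_use_internal_errors(true);
        $dom = new DOMDocument();
        $dom->loadHTML($html);
        libxml_use_internal_errors(false);
        $xpath = new DOMXPath($dom);

        foreach ($xpath->query('//img[@src]') as $img) {
            $src = resolve_url($url, $img->getAttribute('src'));
            if ($src !== null) {
                $images[$src] = true;
            }
        }
        foreach ($xpath->query('//a[@href]') as $a) {
            $link = resolve_url($url, $a->getAttribute('href'));
            if ($link !== null && parse_url($link, PHP_URL_HOST) === $host && !isset($seen[$link])) {
                $seen[$link] = true;
                $queue[] = $link;
            }
        }
    }
    return array_keys($images);
}
```

For a real run you might call `crawl_images('http://www.zik4.com/', 'file_get_contents')`. Images from any host are collected, but only pages on exactly the start host are followed, so `zik4.com` and `www.zik4.com` count as different sites here.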
07-14-2009, 04:27 PM
#15
The Contributor
Join Date: Jul 2009
Posts: 80
Thanks: 13
Hi, just wondering how I would use this code to get the following:
site title
Meta tags
Meta Description
and all Links on the site.
07-14-2009, 07:06 PM
#16
The Addict
Join Date: May 2009
Posts: 287
Thanks: 5
Quote:
Originally Posted by russellharrower
Hi just wondering how would i use this code to get the following.
site title
Meta tags
Meta Description
and all Links on the site.
|
You would look at the print_r() / var_dump() output for the result and pick out the content you want.
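Instead of digging through dumped output, XPath can also target each of those pieces directly. A sketch, with `page_info` as a made-up name; it assumes the meta tags use lowercase `name="description"` / `name="keywords"` attributes, since XPath 1.0 string comparisons are case-sensitive.

```php
<?php
// Sketch: title, meta description, meta keywords and all link hrefs of one page.
function page_info(string $html): array
{
    libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_use_internal_errors(false);
    $xpath = new DOMXPath($dom);

    $links = [];
    foreach ($xpath->query('//a[@href]') as $a) {
        $links[] = $a->getAttribute('href');
    }

    // evaluate('string(...)') returns a plain PHP string ('' when nothing matches)
    return [
        'title'       => trim($xpath->evaluate('string(//title)')),
        'description' => $xpath->evaluate('string(//meta[@name="description"]/@content)'),
        'keywords'    => $xpath->evaluate('string(//meta[@name="keywords"]/@content)'),
        'links'       => $links,
    ];
}
```

This covers a single page; "all links on the site" means running it on every page a crawler visits.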
08-08-2009, 08:55 AM
#17
The Contributor
Join Date: Jul 2009
Posts: 80
Thanks: 13
Hi guys, I have the following code working:
Code:
<?php
$site = "http://www.techcrunch.com/2009/08/07/geopolitical-attacks-on-twitter-intensified-almost-tenfold-last-night/";
$html = file_get_contents($site);
$dom = new DOMDocument();
libxml_use_internal_errors(TRUE); // Shhhut up!
$dom->loadHTML($html);
libxml_use_internal_errors(FALSE); // Ok, you can complain now.
$xpath = new DOMXPath($dom);
$as = $xpath->evaluate("/html/body//a");
foreach($as as $a)
{
echo '<br />' . $a->getAttribute('href');
}
?>
As you will see when you run this script, there are links like #comments.
What I would like to do is remove any links that start with #.
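One way to do that is to let the XPath query skip them, using the standard XPath 1.0 `starts-with()` function. A sketch (the function name is hypothetical); only hrefs that begin with `#` are dropped, so something like `page.html#comments` is kept.

```php
<?php
// Sketch: the same DOM/XPath approach, minus in-page anchor links such as "#comments".
function non_anchor_links(string $html): array
{
    libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_use_internal_errors(false);

    $xpath = new DOMXPath($dom);
    $links = [];
    foreach ($xpath->query('/html/body//a[@href and not(starts-with(@href, "#"))]') as $a) {
        $links[] = $a->getAttribute('href');
    }
    return $links;
}
```

Equivalently, inside the original foreach loop you could skip any `$a->getAttribute('href')` value for which `strpos(..., '#') === 0`.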
Also, is there any way to identify my crawler to the sites it visits? When Google spiders your site it shows up as Googlebot; is there any way to make mine show up as mySpider?
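Crawlers usually identify themselves through the HTTP `User-Agent` request header. With `file_get_contents()` it can be set through a stream context's `user_agent` option, as sketched below; the `mySpider/1.0` string and the example.com info URL are placeholders, and `spider_context` is a hypothetical helper name.

```php
<?php
// Sketch: build a stream context that sends a custom User-Agent header.
function spider_context(string $userAgent)
{
    return stream_context_create([
        'http' => [
            'user_agent' => $userAgent, // sent as the User-Agent request header
            'timeout'    => 10,         // seconds; avoids hanging on slow hosts
        ],
    ]);
}

$ctx = spider_context('mySpider/1.0 (+http://www.example.com/spider.html)');
// $html = file_get_contents($site, false, $ctx);
```

Alternatively, `ini_set('user_agent', '...')` changes the default for all HTTP stream requests, and with cURL the equivalent setting is the `CURLOPT_USERAGENT` option.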