06-20-2009, 12:36 PM
The Contributor
Join Date: Jun 2008
Location: Twin Cities, Minnesota, USA
Posts: 44
Thanks: 3
How about some pseudocode?
- Fetch the page with cURL or file_get_contents.
- Use PHP's DOM extension (DOMDocument), regular expressions, or a simple explode to get the title of the page.
- Save the title you found in the previous step in a database, checking for duplicates first, of course.
- Use cURL or file_get_contents again to get the favicon (favicon.ico, or sometimes a .png or .gif), which is normally located at domain.tld/favicon.ico.
- Once you have that, save it in some directory and store its file path, relative to the webroot, in your database, again checking for duplicates.
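To make the title and favicon steps a bit more concrete, here is a minimal sketch, not a finished implementation. The function names extract_title_dom and default_favicon_url are made up for illustration, and it assumes the DOM extension is enabled and that the favicon sits at the conventional /favicon.ico path, which not every site follows.

```php
<?php
// Hypothetical helpers for illustration; the names are not from any library.

// Return the text of the first <title> element, or null if there is none.
function extract_title_dom(string $html): ?string
{
    if ($html === '') {
        return null; // loadHTML() rejects an empty string
    }

    $doc = new DOMDocument();
    // Real-world HTML is often malformed, so collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $nodes = $doc->getElementsByTagName('title');
    if ($nodes->length === 0) {
        return null;
    }
    return trim($nodes->item(0)->textContent);
}

// Build the conventional favicon URL for the site a page belongs to.
// Assumption: the icon lives at /favicon.ico on the same scheme and host.
function default_favicon_url(string $pageUrl): ?string
{
    $parts = parse_url($pageUrl);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        return null;
    }
    return $parts['scheme'] . '://' . $parts['host'] . '/favicon.ico';
}
```

You would feed extract_title_dom the string returned by file_get_contents($url), after checking that the fetch did not return false, and pass the result of default_favicon_url to a second fetch. The database and duplicate checks are left out here since they depend on your schema.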
About step 2: I have almost no regular expression experience, so I've always relied on a rather slow use of the explode function when looking for tags. You'd explode on the opening title tag and take the [1] index of the result (everything after the tag), then explode that on the closing title tag and take the [0] index of the resulting array, I think.
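Roughly, the explode approach could look like the sketch below. The function name extract_title_explode is hypothetical, and it assumes the page uses a plain lowercase <title> tag with no attributes; anything else will simply not match.

```php
<?php
// Hypothetical helper illustrating the two-step explode() approach.
function extract_title_explode(string $html): ?string
{
    // Split once on the opening tag; index 1 holds everything after it.
    $afterOpen = explode('<title>', $html, 2);
    if (count($afterOpen) < 2) {
        return null; // no literal "<title>" in the page
    }

    // Split once on the closing tag; index 0 holds the title text.
    $beforeClose = explode('</title>', $afterOpen[1], 2);
    if (count($beforeClose) < 2) {
        return null; // opening tag without a matching close
    }

    return trim($beforeClose[0]);
}
```

The limit of 2 passed to explode stops it from splitting the rest of the page needlessly, since only the first occurrence of each tag matters.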
Good luck!