06-20-2009, 12:36 PM   #2
ryanmr

How about some pseudocode?
  1. Fetch the page with cURL or file_get_contents.
  2. Use PHP's DOM extension (DOMDocument), a regular expression, or a simple explode to pull out the title of the page.
  3. Save the title in a database, checking for duplicates first.
  4. Use cURL or file_get_contents again to fetch the favicon (.ico, .png or .gif), which is normally located at domain.tld/favicon.ico.
  5. Save the favicon in some directory and store its path, relative to the webroot, in your database. Again, check for duplicates.
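The steps above could be sketched roughly like this. It is only a sketch: the function names are made up for illustration, a plain array stands in for the database table, and it assumes PHP 7.1 or later and that allow_url_fopen is enabled for file_get_contents.

```php
<?php
// Hypothetical helpers illustrating steps 1, 3, 4 and 5.
// An in-memory array stands in for the database (an assumption for the sketch).

// Steps 1 and 4: fetch a URL, returning null on failure.
function fetch_url(string $url): ?string
{
    $context = stream_context_create(['http' => ['timeout' => 10]]);
    $body = @file_get_contents($url, false, $context);
    return $body === false ? null : $body;
}

// Step 4: build the conventional favicon location for a page URL.
function favicon_url(string $pageUrl): ?string
{
    $parts = parse_url($pageUrl);
    if ($parts === false || empty($parts['scheme']) || empty($parts['host'])) {
        return null; // not an absolute URL
    }
    $port = isset($parts['port']) ? ':' . $parts['port'] : '';
    return $parts['scheme'] . '://' . $parts['host'] . $port . '/favicon.ico';
}

// Steps 3 and 5: store a record keyed by URL, skipping duplicates.
// Returns true if the record was added, false if the URL was already stored.
function save_bookmark(array &$store, string $url, string $title, ?string $iconPath): bool
{
    if (isset($store[$url])) {
        return false;
    }
    $store[$url] = ['title' => $title, 'icon' => $iconPath];
    return true;
}
```

In real use you would call fetch_url() on the page, then on whatever favicon_url() returns, write the icon to disk with file_put_contents, and replace the array with a query against a table that has a unique index on the URL.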

About step 2: I have almost no regular expression experience, so I've always relied on a rather crude use of the explode function when looking for tags. You explode on the opening title tag and take the [1] index of the resulting array, which is everything after that tag. Then you explode that piece on the closing title tag and take the [0] index, which is the title itself.
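Here is roughly what the explode approach looks like, with a regular expression version next to it for comparison. Both function names are invented for this example. The explode version assumes a plain lowercase title tag with no attributes, while the regex version is a bit more forgiving.

```php
<?php
// Hypothetical helper: get the title text using two explode calls.
// Only matches an exact, lowercase <title> tag (explode is case-sensitive).
function title_by_explode(string $html): ?string
{
    $afterOpen = explode('<title>', $html, 2);
    if (count($afterOpen) < 2) {
        return null; // no opening tag found
    }
    $beforeClose = explode('</title>', $afterOpen[1], 2);
    if (count($beforeClose) < 2) {
        return null; // no closing tag found
    }
    return trim($beforeClose[0]);
}

// Hypothetical helper: same idea with preg_match.
// The i flag ignores case, the s flag lets the title span several lines.
function title_by_regex(string $html): ?string
{
    if (preg_match('/<title\b[^>]*>(.*?)<\/title>/is', $html, $m) === 1) {
        return trim($m[1]);
    }
    return null;
}
```

Neither version decodes HTML entities or copes with badly broken markup, so for anything beyond a quick script the DOM route from step 2 is probably the safer choice.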

Good luck!