TalkPHP


Sirupsen 06-20-2009 08:24 AM

Fetch Favicon and page title
 
How could I make a script where a user types in a URL, for example to his YouTube profile? When the user clicks "OK", the script fetches the favicon and the page title and inserts them into the database.

I'm pretty lost on how to fetch those favicons and page titles. :/

Hope someone can give a hint, or two! :D

ryanmr 06-20-2009 12:36 PM

How about some pseudocode?
  1. Fetch the page with cURL or file_get_contents.
  2. Use PHP's DOM extension, regular expressions or a simple explode to get the title of the page.
  3. Save what you found in step 2 in a database; make sure you check for duplicates, of course.
  4. Use cURL or file_get_contents again to get the favicon.ico/.png/.gif, which is normally located at domain.tld/favicon.ico.
  5. Once you have that, save it in some directory and store its path relative to the webroot in your database, again checking for duplicates.

About step 2: I have almost no regular expression experience, so I've always relied on the really slow approach of using the explode function when looking for tags. You'd explode on the opening title tag and take the [1] index of the result, then explode that on the closing title tag and take the [0] index of the resulting array, I think.
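As an alternative to explode for step 2, here is a minimal sketch using PHP's DOM extension. The helper name extract_title is made up for illustration, and the HTML is a literal string here; in the real script it would come from file_get_contents or cURL.

```php
<?php
// Hypothetical helper: returns the trimmed <title> text of an HTML string,
// or null when the string is empty or has no title element.
function extract_title(string $html): ?string
{
    if ($html === '') {
        return null;
    }

    $doc = new DOMDocument();
    // Real-world pages are rarely valid HTML, so collect parser warnings
    // internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $titles = $doc->getElementsByTagName('title');
    if ($titles->length === 0) {
        return null;
    }
    return trim($titles->item(0)->textContent);
}

// Usage with a literal string.
$html = '<html><head><title> Example Page </title></head><body></body></html>';
echo extract_title($html), "\n"; // prints "Example Page"
```

The DOM parser copes with attributes on the title tag and with odd whitespace, which a plain explode does not.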

Good luck!

Sirupsen 06-20-2009 12:55 PM

Hmm, I actually found this function on the intrawebs to get the URL of the favicon (as I'm pretty bad with regular expressions as well):
PHP Code:

    function get_ico_url()
    {
        if ($this->ico_url == '')
        {
            $this->ico_url = $this->site_url . 'favicon.ico';

            # get html of page
            $h = @fopen($this->site_url, 'r');
            if ($h)
            {
                $html = '';
                while (!feof($h) and !preg_match('/<([\s]*)body([^>]*)>/i', $html))
                {
                    $html .= fread($h, 200);
                }
                fclose($h);

                # search for the needed <link> tag
                if (preg_match('/<([^>]*)link([^>]*)(rel="icon"|rel="shortcut icon")([^>]*)>/iU', $html, $out))
                {
                    $link_tag = $out[0];
                    if (preg_match('/href([\s]*)=([\s]*)"([^"]*)"/iU', $link_tag, $out))
                    {
                        $this->ico_type = (!(strpos($link_tag, 'png') === false)) ? 'png' : 'ico';
                        $ico_href = trim($out[3]);
                        if (strpos($ico_href, 'http://') === false)
                        {
                            $ico_href = rtrim($this->site_url, '/') . '/' . ltrim($ico_href, '/');
                        }
                        $this->ico_url = $ico_href;
                    }
                }
            }
        }
        return $this->ico_url;
    }

I guess I'd somehow get PHP to save this .ico file to a directory as you said Ryan, and check for duplicates.
Should be pretty easy to make a remote upload script which could handle that.

Edit:

Made a rather simple remote upload script:
PHP Code:

<?php
    include('../_class/favicon.class.php');

    $url = 'http://twitter.com/';

    $favicon = new favicon('http://twitter.com/', 0);
    $fv = $favicon->get_ico_url();

    echo $fv;

    $remote_file = $fv;

    preg_match('#^(?:http://)?([^/]+)#i', $url, $matches);

    $name = $matches[1];

    $file_name = $name . ".ico";
    $putdata = fopen($remote_file, "r");
    $fp = fopen($file_name, "w");
    while ($data = fread($putdata, 102400))
        fwrite($fp, $data);
    fclose($fp);
    fclose($putdata);
?>

Right now the name of this specific file would be "twitter.com.ico". Can someone help me fix the regular expression so it'll be "twitter.ico" only? Thanks!

Wildhoney 06-20-2009 02:03 PM

Won't it also be www.twitter.com.ico when you add the www.?

Also, are you sure you want the .com removed? It identifies the specific domain; there may be a twitter.net.
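One way to address both points, sketched under assumptions: keep the TLD in the file name, but strip a leading "www." so that both spellings of a host map to the same file. The helper name favicon_file_name is hypothetical, and parse_url is used here in place of the regular expression from the script.

```php
<?php
// Hypothetical helper: builds a cache file name such as "twitter.com.ico"
// from a user-supplied URL, treating "www.twitter.com" and "twitter.com" alike.
function favicon_file_name(string $url): ?string
{
    // parse_url() only reports a host when a scheme is present, so add one if missing.
    if (!preg_match('~^[a-z][a-z0-9+.-]*://~i', $url)) {
        $url = 'http://' . $url;
    }

    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return null;
    }

    $host = strtolower($host);
    // Drop a single leading "www." so both spellings share one file.
    if (strpos($host, 'www.') === 0) {
        $host = substr($host, 4);
    }
    return $host . '.ico';
}

echo favicon_file_name('http://www.twitter.com/some/page'), "\n"; // twitter.com.ico
echo favicon_file_name('twitter.com'), "\n";                       // twitter.com.ico
```

Keeping the TLD means twitter.com and twitter.net never collide, while www.twitter.com no longer produces a second copy of the same icon.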

Sirupsen 06-20-2009 02:12 PM

True actually! Thanks Wildhoney. :)
It'll be a little harder when someone requests a new twitter.com favicon: I should somehow check whether there's already a twitter.com.ico file in that directory, instead of leeching Twitter's bandwidth each time someone wants to upload this ICO. I'm super new to working like this in PHP, so some help with how to do that would be awesome!

Thanks. :)

Wildhoney 06-20-2009 04:52 PM

What is the script all about? I don't think I understand what it is you're doing.

You can check for the presence of the ICO file by doing the following:

php Code:
$szIco = 'twitter.com.ico';

if (!file_exists($szIco))
{
    /* Then make one... */
}

Sirupsen 06-20-2009 05:48 PM

It's for getting the favicon from a webpage (no matter where it's located and no matter which page of the site is given, so www.youtube.com/d8Ud2 works as well as youtube.com) and remote-uploading this favicon to my own server so I won't leech their bandwidth. If the file is more than 7 days old, it deletes the file and uploads a new one. Basically, I got my script looking like this about an hour ago:

PHP Code:

<?php
include('../_class/favicon.class.php');

if (!empty($_POST['url'])) {
    $url = $_POST['url'];

    $favicon = new favicon("$url", 0);
    $fv = $favicon->get_ico_url();

    $remote_file = $fv;

    preg_match('#^(?:http://)?([^/]+)#i', $url, $matches);

    $name = $matches[1];

    $file_name = $name . ".ico";
    // Age of the cached copy in seconds (0 if it has not been downloaded yet)
    $file_age = file_exists($file_name) ? filectime($file_name) : time();
    $file_difference = time() - $file_age;

    // The page title
    $file = file($url);
    $file = implode("", $file);

    if (!file_exists($file_name)) {
        $putdata = fopen($remote_file, "r");
        $fp = fopen($file_name, "w");
        while ($data = fread($putdata, 102400))
            fwrite($fp, $data);
        fclose($fp);
        fclose($putdata);

        echo "Wow, cool! You're the first one to use that website.<br>";

    } elseif ($file_difference > 604800) {
        // Older than 7 days: delete and download again
        unlink($file_name);

        $putdata = fopen($remote_file, "r");
        $fp = fopen($file_name, "w");
        while ($data = fread($putdata, 102400))
            fwrite($fp, $data);
        fclose($fp);
        fclose($putdata);
    }

    echo '<br/><img src="http://slimpl.com/beta/user/' . $file_name . '" />';
    if (preg_match('#^(?:http://)?([^/]+)#i', $url, $m)) {
        echo " <b>";
        echo ucfirst($m[1]);
        echo "</b>\n";
    }
    echo "<br><br>";
}
?>
Add a new website about you!<br>
<form method="post" action="">
<input type="text" name="url">
<input type="submit">
</form>
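The cache rule described above (download when the file is missing, refresh when it is more than 7 days old) could also be pulled into two small functions so the download loop is not written twice. This is only a sketch: needs_refresh, download_icon and MAX_AGE_SECONDS are hypothetical names, and filemtime is used rather than filectime on the assumption that the time of the last write is what matters.

```php
<?php
// Assumed cache lifetime: 7 days, as described in the post.
const MAX_AGE_SECONDS = 7 * 24 * 60 * 60; // 604800

// Hypothetical helper: true when the cached file is missing or older than $maxAge seconds.
function needs_refresh(string $fileName, int $maxAge = MAX_AGE_SECONDS): bool
{
    // PHP caches stat results; clear them so a fresh download is noticed.
    clearstatcache(true, $fileName);
    if (!file_exists($fileName)) {
        return true;
    }
    // filemtime() is the last time the contents were written, i.e. the last download.
    $age = time() - filemtime($fileName);
    return $age > $maxAge;
}

// Hypothetical helper: copies the remote icon into the local cache file.
function download_icon(string $remoteUrl, string $fileName): bool
{
    $data = @file_get_contents($remoteUrl);
    if ($data === false) {
        return false;
    }
    return file_put_contents($fileName, $data) !== false;
}

// Usage, with both values coming from the script:
// if (needs_refresh($file_name)) { download_icon($remote_file, $file_name); }
```

Overwriting the file in place makes the separate unlink step unnecessary, and a failed download leaves the old icon untouched.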

One thing I still need some help with right now is:

PHP Code:

    if (preg_match('#^(?:http://)?([^/]+)#i', $url, $m)) {
        echo " <b>";
        echo ucfirst($m[1]);
        echo "</b>\n";
    }

I'd like this to return only the name of the domain, without the trailing ".com", ".org" or dot-whatever. So for example, if $url = 'http://twitter.com'; it'll output: Twitter.

Another thing: if the user enters http://www.twitter.com it'll create www.twitter.com.ico, but if he enters http://twitter.com it'll create twitter.com.ico. How would I fix it so that it turns the http://twitter.com URL into http://www.twitter.com, or just require the user to use www addresses, as most websites support this?

This is all regular expressions, which I'm still in the process of learning, so I'd love some help on those topics.
Hope this was all understandable!
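For the display name, a minimal sketch of one possible approach: let the regular expression skip an optional scheme and an optional "www.", then take the label just before the last dot. The function name site_display_name is hypothetical, and the "label before the last dot" rule is an assumption that breaks for multi-part suffixes such as example.co.uk.

```php
<?php
// Hypothetical helper: "http://www.twitter.com/foo" -> "Twitter".
// Assumption: the site name is the label right before the last dot, which is
// wrong for multi-part suffixes such as "example.co.uk" (that yields "Co").
function site_display_name(string $url): ?string
{
    // Skip an optional scheme and an optional "www.", then capture the host.
    if (!preg_match('~^(?:https?://)?(?:www\.)?([^/:?#]+)~i', $url, $m)) {
        return null;
    }

    $labels = explode('.', strtolower($m[1]));
    $count = count($labels);
    // With a TLD present, take the label before it; otherwise the only label.
    $name = $count >= 2 ? $labels[$count - 2] : $labels[0];
    return ucfirst($name);
}

echo site_display_name('http://twitter.com'), "\n";           // Twitter
echo site_display_name('http://www.youtube.com/d8Ud2'), "\n"; // Youtube
```

Because the "www." is skipped before the host is captured, http://twitter.com and http://www.twitter.com give the same result without forcing users to type one form or the other.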

Edit: It's all fixed thanks to the awesome Bruja and Salethe on IRC! Here's the expression: http://slexy.org/view/s20e2YvBPt

