TalkPHP

Haris 11-12-2007 06:13 AM

Limiting the number of KBs per fetch?
 
I'm using cURL to fetch archive.org search results. Some pages are above 500 KB, which increases the page loading time. I want to limit how much of each page is fetched, i.e. to around 20-50 KB.

Wildhoney 11-12-2007 01:06 PM

From my experience, I would be inclined to say no outright because of the way TCP works. Although large web pages will be returned to you in multiple packets, TCP expects them all to be delivered no matter what. If packets aren't received, they'll just be sent again, and I'm not sure cURL has the power to terminate the session halfway through.

Salathe 11-12-2007 02:07 PM

There is a cURL option called CURLOPT_RANGE whose value (in bytes) can be:
Range(s) of data to retrieve in the format "X-Y" where X or Y are optional. HTTP transfers also support several intervals, separated with commas in the format "X-Y,N-M".
That might be of use to you.
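
As a rough illustration of the option described above, here is a minimal sketch. The function names `byte_range` and `fetch_range` are made up for this example and are not from the thread, and the server is free to ignore the range and send the whole page anyway.

```php
<?php
// Build a CURLOPT_RANGE value covering $length bytes starting at $start,
// e.g. byte_range(0, 51200) gives "0-51199".
// (Hypothetical helper, named for this example only.)
function byte_range(int $start, int $length): string
{
    return $start . '-' . ($start + $length - 1);
}

// Ask the server for only part of the resource.
// Returns the body as a string, or false on failure.
function fetch_range(string $url, string $range)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_RANGE, $range);        // sent to the server as a Range header
    return curl_exec($ch);
}
```

With a `$url` of your own, `fetch_range($url, byte_range(0, 50 * 1024))` would request roughly the first 50 KB.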

Haris 11-12-2007 02:15 PM

I tried that, but it didn't work.

Salathe 11-12-2007 02:52 PM

From experimenting, it seems to work on some sites but not others. My guess is that some sites respect the HTTP Range header being sent, and others don't.
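
One way to tell the two cases apart is the status code: a server that honours the range normally answers 206 Partial Content, while one that ignores it answers 200 with the full body. A small sketch, assuming made-up names `clip_body` and `fetch_at_most`; note that the local truncation is only a fallback, since in that case the full page has still been downloaded.

```php
<?php
// Trim $body to at most $maxBytes (hypothetical helper for this example).
function clip_body(string $body, int $maxBytes): string
{
    return strlen($body) > $maxBytes ? substr($body, 0, $maxBytes) : $body;
}

// Request the first $maxBytes bytes and report whether the range was honoured.
// Returns array('body' => string, 'partial' => bool), or false on failure.
function fetch_at_most(string $url, int $maxBytes)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_RANGE, '0-' . ($maxBytes - 1));
    $body = curl_exec($ch);
    if ($body === false) {
        return false;
    }
    // 206 = Partial Content (range honoured); 200 = range ignored, full body sent.
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    return array(
        'body'    => clip_body($body, $maxBytes),
        'partial' => ($status === 206),
    );
}
```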

Another option would be to fopen/fread the required amount.
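
A minimal sketch of the fopen/fread idea, assuming `allow_url_fopen` is enabled when the source is an http:// URL. The name `read_first_bytes` is invented for this example.

```php
<?php
// Read at most $maxBytes from $source, which may be a local path or,
// if allow_url_fopen is enabled, an http:// URL.
// Returns the data read, or false if the source could not be opened.
function read_first_bytes(string $source, int $maxBytes)
{
    $fp = @fopen($source, 'rb');
    if ($fp === false) {
        return false;
    }
    $data = '';
    // fread() on a network stream may return fewer bytes than requested,
    // so keep reading until we have enough or hit end of file.
    while (strlen($data) < $maxBytes && !feof($fp)) {
        $chunk = fread($fp, $maxBytes - strlen($data));
        if ($chunk === false || $chunk === '') {
            break;
        }
        $data .= $chunk;
    }
    fclose($fp); // closing early stops the rest of the download
    return $data;
}
```

Because the stream is closed as soon as enough has been read, this does not depend on the server supporting byte ranges.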

bluesaga 11-13-2007 09:50 AM

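This can still be done using cURL, although it is a bit of a pain: you can use the built-in callback functions to store the content yourself and, once you have reached the required amount, return a value other than the length of the chunk you were given (0, for example), which makes cURL abort the transfer.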

Salathe's suggestion will work when a website correctly supports HTTP/1.1 byte ranges, which it typically advertises with an Accept-Ranges: bytes header.

A snippet from a class I created a while back to work with this:
PHP Code:

    // Header callback: cURL calls this once per response header line.
    function _header_callback($ch, $string)
    {
        $this->currentHeaders[] = $string;
        $count = strlen($string);
        if (preg_match("#Content-Type:#is", $string, $match))
        {
            if (!preg_match("#Content-Type: text#is", $string, $match))
            {
                // Not a text document: dump the headers and abort the transfer.
                print_r($this->currentHeaders);
                return 0;
            }
        }
        return $count;
    }

    // Body callback: cURL calls this for each chunk of content received.
    function _content_callback($ch, $string)
    {
        $this->currentDownload .= $string;
        $length = strlen($string);
        // Returning anything other than $length aborts the transfer.
        if (strlen($this->currentDownload) > $this->maxDownload)
            return 0;
        return $length;
    }

    function _open($url)
    {
        $this->currentHeaders = array();
        $this->currentDownload = "";
        $this->currentHeaders[] = $url;
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_HEADER, 0);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
        curl_setopt($ch, CURLOPT_USERAGENT, "Recipricol Backlink Checker");

        curl_setopt($ch, CURLOPT_HEADERFUNCTION, array($this, '_header_callback')); //Callback for header, odd but works
        curl_setopt($ch, CURLOPT_WRITEFUNCTION, array($this, '_content_callback')); //Need a write function to READ? Stupid cURL

        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);
        curl_setopt($ch, CURLOPT_MAXREDIRS, 5);

        curl_exec($ch);
        curl_close($ch);

        // Whatever was collected before the transfer finished or was aborted.
        $data = $this->currentDownload;
        return $data;
    }
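
For anyone who wants the same write-callback idea without the surrounding class, here is a self-contained sketch. The names `make_limited_writer` and `fetch_limited` are invented for this example, and unlike the snippet above it trims the last chunk so the result never exceeds the limit.

```php
<?php
// Build a CURLOPT_WRITEFUNCTION callback that collects the body into $buffer
// and aborts the transfer once $maxBytes have been stored.
function make_limited_writer(int $maxBytes, string &$buffer): callable
{
    return function ($ch, $chunk) use ($maxBytes, &$buffer) {
        $room = max(0, $maxBytes - strlen($buffer));
        if (strlen($chunk) <= $room) {
            $buffer .= $chunk;
            return strlen($chunk); // whole chunk accepted, keep going
        }
        // Keep only what still fits, then return a length that differs
        // from strlen($chunk) so cURL stops the transfer.
        $buffer .= substr($chunk, 0, $room);
        return 0;
    };
}

// Download at most $maxBytes of $url and return what was collected.
function fetch_limited(string $url, int $maxBytes): string
{
    $buffer = '';
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_WRITEFUNCTION, make_limited_writer($maxBytes, $buffer));
    // curl_exec() returns false when the callback aborts the transfer;
    // that is expected here, and $buffer still holds the collected data.
    curl_exec($ch);
    return $buffer;
}
```

Since the abort is reported by cURL as an error, a false return from `curl_exec()` cannot on its own be treated as a failed download with this approach.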


