TalkPHP
 
 
Old 11-12-2007, 06:13 AM   #1 (permalink)
The Frequenter
Prolific Welcomer Upcoming Programmer 
 
Join Date: Sep 2007
Posts: 360
Thanks: 24
Haris is on a distinguished road
Limiting the number of KBs per fetch?

I'm using cURL to fetch archive.org search results. Some pages are over 500 KB, which increases the page loading time. I want to limit how much of each page is fetched, i.e. to around 20-50 KB.
Old 11-12-2007, 01:06 PM   #2 (permalink)
La Vida es Sueño
Advanced Programmer Top Contributor 
 
Join Date: Sep 2007
Location: Oldham
Posts: 2,280
Thanks: 90
Wildhoney is on a distinguished road

From my experience, I would be inclined to say no outright because of the way the TCP protocol suite works. Although large web pages are returned to you in multiple packets, TCP expects all of them to be delivered no matter what. If packets aren't received, it will just send them again, and I'm not sure cURL has the power to terminate the session halfway through.
Old 11-12-2007, 02:07 PM   #3 (permalink)
Moderateur
RegEx Guru PHP Guru Top Contributor Advanced Programmer 
 
Join Date: Apr 2007
Posts: 1,393
Thanks: 5
Salathe is on a distinguished road

There is a cURL option called CURLOPT_RANGE whose value (in bytes) can be:
Range(s) of data to retrieve in the format "X-Y" where X or Y are optional. HTTP transfers also support several intervals, separated with commas in the format "X-Y,N-M".
That might be of use to you.
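For illustration, here is a minimal sketch of how CURLOPT_RANGE might be set to ask for only the first part of a page. The helper names buildRange and fetchRange are hypothetical (they are not from this thread), and the server is free to ignore the range and send the whole page.

```php
<?php
// Hypothetical helper: build an inclusive byte range such as "0-51199".
function buildRange(int $firstByte, int $lastByte): string
{
    return $firstByte . '-' . $lastByte;
}

// Hypothetical helper: request only part of a URL via CURLOPT_RANGE.
// Returns the response body as a string, or false on failure.
// Assumption: the server honours the Range header; many do not.
function fetchRange(string $url, int $firstByte, int $lastByte)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_RANGE, buildRange($firstByte, $lastByte));
    return curl_exec($ch);
}
```

Something like fetchRange($url, 0, 51199) would ask for the first 50 KB. Checking curl_getinfo($ch, CURLINFO_HTTP_CODE) for a 206 status is one way to tell whether the range was actually honoured; a 200 usually means the full page was sent anyway.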
Old 11-12-2007, 02:15 PM   #4 (permalink)
The Frequenter
Prolific Welcomer Upcoming Programmer 
 
Join Date: Sep 2007
Posts: 360
Thanks: 24
Haris is on a distinguished road

Quote:
Originally Posted by Salathe
There is a cURL option called CURLOPT_RANGE whose value (in bytes) can be:
Range(s) of data to retrieve in the format "X-Y" where X or Y are optional. HTTP transfers also support several intervals, separated with commas in the format "X-Y,N-M".
That might be of use to you.
I tried that, but it didn't work.
Old 11-12-2007, 02:52 PM   #5 (permalink)
Moderateur
RegEx Guru PHP Guru Top Contributor Advanced Programmer 
 
Join Date: Apr 2007
Posts: 1,393
Thanks: 5
Salathe is on a distinguished road

From experimenting, it seems to work on some sites but not others. My guess is that some sites respect the HTTP Range header being sent, and others don't.

Another option would be to fopen/fread the required amount.
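As a rough sketch of the fopen/fread idea, something like the following could read only the first part of a page. The function name readPrefix is hypothetical, and for http:// URLs it assumes allow_url_fopen is enabled.

```php
<?php
// Hypothetical helper: read at most $maxBytes from a URL or local path.
// Returns the data read, or false if the source could not be opened.
function readPrefix(string $source, int $maxBytes)
{
    $handle = fopen($source, 'rb');
    if ($handle === false) {
        return false;
    }

    $data = '';
    // fread() on a network stream can return fewer bytes than requested,
    // so keep reading until the limit or the end of the stream is reached.
    while (strlen($data) < $maxBytes && !feof($handle)) {
        $chunk = fread($handle, $maxBytes - strlen($data));
        if ($chunk === false || $chunk === '') {
            break;
        }
        $data .= $chunk;
    }

    // Closing the handle drops the connection without reading the rest.
    fclose($handle);
    return $data;
}
```

For example, readPrefix($url, 50 * 1024) would stop after roughly 50 KB regardless of whether the server supports range requests.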
Old 11-13-2007, 09:50 AM   #6 (permalink)
Super Moderator
Advanced Programmer 
 
Join Date: Sep 2007
Posts: 165
Thanks: 0
bluesaga is on a distinguished road

This can still be done using cURL, although it is a bit of a pain: you can use the built-in callback functions to store the content yourself and return 0 (instead of the chunk length) once the required amount has been reached, which makes cURL abort the transfer.

Salathe's suggestion will work when a website sends the correct HTTP/1.1 headers, including the one that advertises byte-range support (Accept-Ranges: bytes).
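One way to check for that header, as a sketch only: the function name advertisesByteRanges is hypothetical, and it assumes you already have the raw response header lines (for example, the lines a CURLOPT_HEADERFUNCTION callback receives). Some servers honour ranges without advertising them, so treat a false result as a hint rather than proof.

```php
<?php
// Hypothetical helper: given raw response header lines, report whether the
// server advertised byte-range support with "Accept-Ranges: bytes".
function advertisesByteRanges(array $headerLines): bool
{
    foreach ($headerLines as $line) {
        // Header names are case-insensitive; trim() drops the trailing CRLF.
        if (preg_match('/^Accept-Ranges:\s*(.*)$/i', trim($line), $m)) {
            return strcasecmp(trim($m[1]), 'bytes') === 0;
        }
    }
    return false;
}
```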

A snippet from a class I created a while back to work with this:
PHP Code:
    // Called by cURL once for each response header line.
    function _header_callback($ch, $string)
    {
        $this->currentHeaders[] = $string;
        $count = strlen($string);
        if (preg_match("#Content-Type:#is", $string, $match))
        {
            if (!preg_match("#Content-Type: text#is", $string, $match))
            {
                // Not a text response: returning anything other than the
                // header length makes cURL abort the transfer.
                print_r($this->currentHeaders);
                return 0;
            }
        }
        return $count;
    }

    // Called by cURL for each chunk of the response body.
    function _content_callback($ch, $string)
    {
        $this->currentDownload .= $string;
        $length = strlen($string);
        // Past the limit: return 0 instead of the chunk length to abort.
        if (strlen($this->currentDownload) > $this->maxDownload)
            return 0;
        return $length;
    }

    function _open($url)
    {
        $this->currentHeaders = array();
        $this->currentDownload = "";
        $this->currentHeaders[] = $url;
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_HEADER, 0);
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
        curl_setopt($ch, CURLOPT_USERAGENT, "Recipricol Backlink Checker");

        curl_setopt($ch, CURLOPT_HEADERFUNCTION, array($this, '_header_callback')); //Callback for header, odd but works
        curl_setopt($ch, CURLOPT_WRITEFUNCTION, array($this, '_content_callback')); //Need a write function to READ? Stupid cURL

        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);
        curl_setopt($ch, CURLOPT_MAXREDIRS, 5);

        // The result is collected by _content_callback, so the return
        // value of curl_exec() is not used here.
        curl_exec($ch);
        curl_close($ch);

        $data = $this->currentDownload;
        return $data;
    }