I can see that this thread is a few years old. However, it turned up as one of the only results for connecting a proxy to file_get_contents. I've been learning PHP for the last two weeks and I've been trying a few experiments. For simplicity's sake, here's a little program I put together that uses Yahoo: it cycles through a set number of Yahoo results pages and searches each page for a word.
PHP Code:
<?php
// Search query
$query = "The colors of Serps";
// Offset of the first result on the starting Yahoo search page
$page = 1;
// Pattern to search for (matches either word)
$pattern = "/(red|blue)/";
// Cycle through the results pages, 10 results at a time
while ($page <= 100) {
    // Build the Yahoo search URL for the current page
    $yahoo = "http://ca.search.yahoo.com/search?b=$page&p=";
    // Get the contents of the current results page
    $resultspage = file_get_contents($yahoo . urlencode($query));
    // Search for the pattern in the SERP
    if (!empty($resultspage)) {
        $res = preg_match_all($pattern, $resultspage, $matches);
        if ($res) {
            // Use $match here so $pattern is not overwritten
            foreach (array_unique($matches[0]) as $match) {
                echo $match . "<br />" . PHP_EOL;
                // Flush PHP's output buffer first, then the system buffer
                if (ob_get_level() > 0) {
                    ob_flush();
                }
                flush();
                usleep(50000);
            }
        }
    }
    // Always advance, even if the page came back empty, so the loop ends
    $page = $page + 10;
    // Wait one second between requests
    usleep(1000000);
}
echo "<br />";
echo "PROGRAM END<br /><br />";
exit;
?>
I put this together myself, and having only learned PHP over the last two weeks, I think I've come a long way. The problem is that Yahoo eventually returns a '999' error and temporarily blocks your IP when you make too many requests in a short time, and I can understand why. That being said, the only logical solution I can see is to have the file_get_contents function go through a proxy.
I have a subscription to a page that I log into, which gives me access to a simple page of proxies updated with new ones every second. The list conveniently comes out in the following format:
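From what I can tell, file_get_contents accepts a stream context as its third argument, and the http wrapper has "proxy" and "request_fulluri" options. Here is a minimal sketch of that, assuming a plain HTTP proxy with no authentication. The function name proxy_context and the address in the usage comment are placeholders of mine.

```php
<?php
// Build a stream context that routes HTTP requests through a proxy.
// $proxy is a "host:port" string (hypothetical helper, not a built-in).
function proxy_context(string $proxy, int $timeout = 10)
{
    return stream_context_create([
        'http' => [
            'proxy'           => "tcp://$proxy",
            // Send the full URI in the request line, which HTTP proxies generally expect
            'request_fulluri' => true,
            // Give up on slow proxies after $timeout seconds
            'timeout'         => $timeout,
        ],
    ]);
}

// Usage (placeholder address):
// $html = file_get_contents($url, false, proxy_context('203.0.113.10:8080'));
```

If this works, the only change to my script would be passing the context as the third argument of file_get_contents.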
proxy1:portA
proxy2:portB
proxy3:portC
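Since the list is just one host:port per line, I think parsing it could look something like the sketch below. The function names parse_proxy_list and pick_proxies are my own, and I'm assuming every port is numeric and that anything else on the page can be skipped.

```php
<?php
// Turn the "host:port" text list into an array of unique proxy strings.
function parse_proxy_list(string $text): array
{
    $proxies = [];
    // \R matches any line ending (\n, \r\n, \r)
    foreach (preg_split('/\R/', $text) as $line) {
        $line = trim($line);
        // Keep only lines that look like host:port with a numeric port
        if (preg_match('/^[A-Za-z0-9.\-]+:\d{1,5}$/', $line)) {
            $proxies[] = $line;
        }
    }
    return array_values(array_unique($proxies));
}

// Pick up to $count proxies at random from the parsed list.
function pick_proxies(array $proxies, int $count = 5): array
{
    shuffle($proxies);
    return array_slice($proxies, 0, $count);
}
```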
I'll be reviewing the earlier responses in this thread to see if I can figure out how to do this.
Essentially I want the script to:
1> Auto log into my proxy page
2> Grab five random proxies (or top proxies)
3> Use them to make requests as per the script above
4> Loop back to "2" and get more proxies
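Here is a rough sketch of how I imagine step 3 working: try each proxy in turn until one returns a page. The function name fetch_via_proxies and the optional $fetch callback are my own inventions; the callback is only there so the loop can be tried out without hitting the network. It does not cover step 1 (logging in), and I'm assuming the proxies need no authentication.

```php
<?php
// Try each proxy in turn until one returns a page; returns null if all fail.
// $fetch is optional and replaceable so the logic can be tested offline.
function fetch_via_proxies(string $url, array $proxies, ?callable $fetch = null): ?string
{
    $fetch = $fetch ?? function (string $url, string $proxy) {
        $context = stream_context_create([
            'http' => [
                'proxy'           => "tcp://$proxy",
                'request_fulluri' => true,
                'timeout'         => 10,
            ],
        ]);
        // @ hides the warning on failure; a false return signals the error
        return @file_get_contents($url, false, $context);
    };

    foreach ($proxies as $proxy) {
        $html = $fetch($url, $proxy);
        // Accept the first non-empty response
        if ($html !== false && $html !== '') {
            return $html;
        }
    }
    return null;
}
```

When this returns null, the script would go back to step 2 and grab a fresh batch of proxies before retrying the same page.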
I would think this would eliminate the block by Yahoo, as the requests would be coming from multiple IPs. Am I on the right path?