Crawling Web Pages
I'm trying to build a website caching system, but I only want to cache the HTML from specific websites that I target.
I'm not talking about anything shady, just backups and such. Is it possible in PHP to "suck" the HTML from a remote URL? Thanks
file_get_contents() would do the trick just fine.
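Something along these lines is roughly what I mean. It's only a sketch: fetch_html() is a made-up helper name, and fetching a remote URL this way assumes allow_url_fopen is enabled in php.ini.

```php
<?php
// Hypothetical helper (not a built-in): fetch a page and make failure explicit.
function fetch_html(string $url): ?string
{
    // file_get_contents() returns false on failure. The @ suppresses the
    // warning so the caller can decide how to report the error.
    $html = @file_get_contents($url);

    return $html === false ? null : $html;
}
```

If it gives you null, the request itself failed (or allow_url_fopen is off), which is different from the site sending back an empty page.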
Thanks for the tip, but that was the first thing I tried and it wasn't returning anything. Any ideas?
Thanks, jw
Some websites check that the User-Agent HTTP header is set. Every mainstream browser sends one, so a request without it is a telltale sign of a robot rather than a person using a browser. PHP's file functions send no User-Agent by default, which is probably why you got nothing back. To get around that, use cURL and set the user agent.
php Code:
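A minimal sketch of the cURL approach, assuming the cURL extension is installed. The function name fetch_with_curl() and the 30-second timeout are my own choices, and you should pass whatever User-Agent string suits your crawler.

```php
<?php
// Hypothetical helper: fetch a URL with cURL, sending an explicit User-Agent.
function fetch_with_curl(string $url, string $userAgent): ?string
{
    $ch = curl_init($url);

    // Return the response body as a string instead of printing it.
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    // Follow HTTP redirects (assumption: you want the final page).
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    // The header that some sites check for.
    curl_setopt($ch, CURLOPT_USERAGENT, $userAgent);
    // Give up after 30 seconds (arbitrary value).
    curl_setopt($ch, CURLOPT_TIMEOUT, 30);

    // curl_exec() returns false on failure.
    $html = curl_exec($ch);

    // The handle is released when $ch goes out of scope.
    return $html === false ? null : $html;
}
```

You would call it as fetch_with_curl($url, 'MyCacheBot/1.0'), where the agent string is just a placeholder, and then write the result to your cache if it isn't null.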
You can also set the User-Agent through the user_agent setting in php.ini (setting it at runtime with ini_set() works too). Alternatively, create a stream context with stream_context_create() and specify the User-Agent there. Either method lets the standard file functions (fopen/fread, file_get_contents, etc.) send the User-Agent without needing the cURL extension.
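Here is a rough sketch of both options. The function names are invented for illustration, and the timeout value is arbitrary; the user_agent ini setting and the 'user_agent' HTTP context option are the standard PHP ones.

```php
<?php
// Option 1: set the default User-Agent for all HTTP stream requests
// made by this script.
function set_default_user_agent(string $userAgent): void
{
    ini_set('user_agent', $userAgent);
}

// Option 2: build a stream context that carries the User-Agent
// for individual requests only.
function make_ua_context(string $userAgent)
{
    return stream_context_create([
        'http' => [
            'user_agent' => $userAgent,
            'timeout'    => 30, // seconds; arbitrary value
        ],
    ]);
}

// Hypothetical helper: file_get_contents() with the context above.
function fetch_with_context(string $url, string $userAgent): ?string
{
    $html = @file_get_contents($url, false, make_ua_context($userAgent));

    return $html === false ? null : $html;
}
```

The context version is handy when only some of your requests need the header, since it doesn't change anything globally.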