how to parse source code of a webpage
How do I grab the source code of a webpage and display it in, say, a textbox?
I'm trying to parse the source code of a webpage and, perhaps using preg_match, grab a portion of the code and display it. |
You're looking for what's known as "screen scraping" - essentially loading the webpage and "scraping" it for the data you want.
Take a look at the Google results for "php screen scrape"; there are many tutorials out there that should get you started :-)
Alan |
CURL is the way to go...here's some stuff to get you started:
Using curl to Query Remote Servers - PHP Tutorials
cURL and libcurl
You do need some patience to understand how this works... after that, it's a piece of cake :) Just ask if you don't understand something and I'll try to help you out, as I have a little experience with it. |
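To make the cURL suggestion concrete, here is a minimal sketch of fetching a page's source and showing it in a textbox, as the original question asked. The function names `fetch_page_source` and `source_as_textarea` and the `http://example.com/` URL are placeholders of mine, not something from the linked tutorials, and the type hints assume PHP 7 or later.

```php
<?php
// Fetch the raw HTML of a page with cURL and return it as a string.
// Hypothetical helper name; adjust the timeout and options to taste.
function fetch_page_source(string $url): string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow HTTP redirects
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);          // give up after 10 seconds
    $html = curl_exec($ch);
    if ($html === false) {
        throw new RuntimeException('cURL request failed: ' . curl_error($ch));
    }
    return $html;
}

// Escape the source so the browser shows the markup as text
// inside a <textarea> instead of rendering it.
function source_as_textarea(string $html): string
{
    return '<textarea rows="20" cols="80">'
        . htmlspecialchars($html, ENT_QUOTES, 'UTF-8')
        . '</textarea>';
}

// Usage (placeholder URL):
// echo source_as_textarea(fetch_page_source('http://example.com/'));
```

The `htmlspecialchars` call is the important part for display: without it, the fetched markup would be rendered by the browser rather than shown as source.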
I have searched the cURL info online, but I'm having trouble finding out how to single out the part you want to scrape. For example, I would like to get the 24-hour new snowfall amount from here:
Quote:
Thanks |
Hello,
I made an attempt to solve your "problem" and, thanks to the manual, came up with the following solution. I am sure there are better ways to do this, but this seems to work:
PHP Code:
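As a stand-in for this kind of solution, here is a minimal sketch of pulling a table row out of fetched HTML with `preg_match`. It is not Runar's actual code. The sample markup (apart from the `alternateRow` class), the cell contents, and the function name `extract_row_cells` are assumptions made for illustration; the real page will differ.

```php
<?php
// Hypothetical sample markup standing in for the fetched page.
$html = <<<HTML
<table>
<tr class="alternateRow">
    <td>24 Hour New Snow</td>
    <td>5 in</td>
</tr>
</table>
HTML;

// Return the text of every <td> inside the first <tr> with the given class,
// or an empty array when no such row is found.
function extract_row_cells(string $html, string $rowClass): array
{
    // The "s" modifier lets "." match line breaks, so the row may span lines.
    $pattern = '/<tr class="' . preg_quote($rowClass, '/') . '">(.*?)<\/tr>/s';
    if (!preg_match($pattern, $html, $row)) {
        return [];
    }
    // Collect the contents of each cell in the matched row.
    preg_match_all('/<td[^>]*>(.*?)<\/td>/s', $row[1], $cells);
    return array_map('trim', $cells[1]);
}

print_r(extract_row_cells($html, 'alternateRow'));
```

This only works while the row keeps exactly that `class` attribute; any change to the markup means the pattern has to be adjusted.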
Code:
<tr class="alternateRow">
Code:
Array
Yours, Runar |
As much as I love regular expressions, I think that using the DOM extension is more suited to screen scraping in general. For the HTML snippet that buildakicker provided, perhaps something along the lines of the following might be useful.
PHP Code:
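A minimal sketch of what a DOM-based approach could look like (this is not the code originally posted). It assumes the DOM extension is enabled and PHP 7 or later; the sample markup and the function name `cells_for_row_class` are hypothetical, with only the `alternateRow` class taken from the thread.

```php
<?php
// Hypothetical sample markup standing in for the fetched page.
$html = <<<HTML
<table>
<tr><td>Base Depth</td><td>40 in</td></tr>
<tr class="alternateRow"><td>24 Hour New Snow</td><td>5 in</td></tr>
</table>
HTML;

// Return the trimmed text of each <td> in rows with the given class.
// $rowClass is assumed to be a plain class name without quote characters.
function cells_for_row_class(string $html, string $rowClass): array
{
    $doc = new DOMDocument();
    // Real-world HTML is often invalid; collect parser warnings
    // internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $cells = [];
    foreach ($xpath->query("//tr[@class='" . $rowClass . "']/td") as $td) {
        $cells[] = trim($td->textContent);
    }
    return $cells;
}

print_r(cells_for_row_class($html, 'alternateRow'));
```

Compared with a regular expression, the XPath query does not care about whitespace or attribute order inside the tags, which tends to make it less brittle when the page changes slightly.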
|
I have tried the DOM before, but got these same errors. Is my server set up wrong or something?
Quote:
|
Runar - thanks for the reply. I haven't gotten regular expressions down very well, so I appreciate you showing some explanation!
|
So do you have to traverse all of the information in the code to get to the spot you want to scrape?
I.e., can I start anywhere with preg_match? What if the <tr> does not have a class or id? Do I start at the top and work down to it?
Quote:
|
This is why it is a very bad idea to build entire sites using tables, without any ids and classes.
Yes, you may start anywhere using preg_match, but if there are lots of unnamed table rows or cells (or divs, for that matter), preg_match is probably not the best solution. If you insist on using regular expressions, then you should know that [\s]+ matches one or more whitespace characters, including spaces, tabs, and line breaks, which is useful when the text you are searching for spans more than one line. |
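To illustrate starting "anywhere" in a table without ids or classes, here is a small sketch that anchors on a cell's label text and uses `\s+` to bridge the line breaks between cells. The markup, the label, and the function name `value_after_label` are assumptions for the example; the nullable return type needs PHP 7.1 or later.

```php
<?php
// Hypothetical row with no class or id; the cells sit on separate lines.
$html = "<tr>\n    <td>24 Hour New Snow</td>\n\n    <td>5 in</td>\n</tr>";

// Find the cell containing $label and return the text of the cell after it,
// or null when the label is not present.
function value_after_label(string $html, string $label): ?string
{
    // \s+ matches the spaces and line breaks between the two cells.
    $pattern = '/<td>' . preg_quote($label, '/') . '<\/td>\s+<td>([^<]*)<\/td>/';
    return preg_match($pattern, $html, $m) ? trim($m[1]) : null;
}

var_dump(value_after_label($html, '24 Hour New Snow'));
```

If the two cells might sit directly next to each other with no whitespace at all, `\s*` would be the safer choice than `\s+`.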
so dom?
So using the DOM would be better? Is there a way to do it with jQuery or another JS library that would be better than using PHP? I have not been able to get the PHP method to work correctly.
EDIT! Sweet... so DOMDocument... I am running this on my local server and didn't have domxml commented out in php.ini |
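For anyone hitting similar errors, a quick sketch for checking what the server actually has loaded. `extension_loaded` and `class_exists` are standard PHP functions; the function name `dom_support_report` is made up for this example, and reading a loaded `domxml` as the likely culprit is an assumption based on the post above.

```php
<?php
// Report whether the DOM extension and the older domxml extension are loaded.
function dom_support_report(): array
{
    return [
        'dom extension loaded'    => extension_loaded('dom'),
        'DOMDocument class found' => class_exists('DOMDocument'),
        'domxml extension loaded' => extension_loaded('domxml'),
    ];
}

print_r(dom_support_report());
```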