Tutorial on writing Website Scrapers
This article discusses how to write a website scraper in PHP for extracting data from web sites. The concepts taught here can be applied in Java, C#, and basically any other language that has powerful string processing capabilities. The article covers the basics of website scraping, then walks through a tutorial on finding a site's ranking in the Yahoo.com search engine.
Steps involved to write a scraping program
Full post: Writing Website Scrapers in PHP | Geek Files
If you're scraping content from websites (that is, HTML), I guess string processing via strpos() and regular expressions is a thing of the past.
If you're using PHP 5, it's very easy to scrape content using the DOM functions. All you need is a DOMDocument object; then you call DOMDocument->loadHTML() and you can navigate the DOM using functions like getElementById and getElementsByTagName, just like in JavaScript. :-)
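To make the DOM approach concrete, here is a minimal sketch. The HTML string and the example.com URLs are made-up stand-ins for a fetched page, not anything from the original tutorial; a real scraper would first download the page, for example with file_get_contents() or cURL.

```php
<?php
// Hypothetical markup standing in for a downloaded page.
$html = '<html><body><div id="results">'
      . '<a href="http://example.com/a">First</a>'
      . '<a href="http://example.com/b"> Second </a>'
      . '</div></body></html>';

// Real-world HTML is rarely valid, so collect parser warnings
// internally instead of printing them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// Walk every <a> element and keep its href and visible text.
$links = array();
foreach ($doc->getElementsByTagName('a') as $a) {
    $links[] = array(
        'href' => $a->getAttribute('href'),
        'text' => trim($a->textContent),
    );
}

print_r($links);
```

Compared with strpos() and regular expressions, this does not depend on attribute order, quoting style, or whitespace in the markup.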
A nice little start on the subject, and as DeMo said there is always more than one way to skin a cat (as the saying goes).
Using DOM & XPath, we could condense: PHP Code:
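For illustration only, and not the poster's original code, a DOM and XPath version might look roughly like the sketch below. The markup, the class name "result", and the target URL are all assumptions made up for the example; a real search results page would need its own XPath expression.

```php
<?php
// Hypothetical search-results markup; real pages will differ.
$html = '<ol>'
      . '<li><a class="result" href="http://example.com/">Example</a></li>'
      . '<li><a class="result" href="http://example.org/">Other</a></li>'
      . '<li><a href="/help">Help</a></li>'
      . '</ol>';

libxml_use_internal_errors(true); // tolerate sloppy HTML
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// A single XPath query selects the href attribute of every result link.
$xpath = new DOMXPath($doc);
$hrefs = array();
foreach ($xpath->query('//a[@class="result"]/@href') as $attr) {
    $hrefs[] = $attr->nodeValue;
}

// The position of the target URL in the list is its rank (1-based).
$target = 'http://example.org/'; // assumed site to look up
$index  = array_search($target, $hrefs, true);
$rank   = ($index === false) ? null : $index + 1;

echo ($rank === null) ? "not found\n" : "rank: $rank\n";
```

The query replaces the manual loop over getElementsByTagName plus attribute checks, which is where most of the condensing comes from.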