This article discusses how to write a website scraper in PHP for extracting data from websites. The concepts can be applied in Java, C#, or basically any language with powerful string-processing capabilities. It teaches the basics of website scraping and then walks through a tutorial on finding a site's ranking in the Yahoo.com search engine.
Steps involved in writing a scraping program
Visit the URL
Understand the pattern
Validate the structure of the pattern on different URLs
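The steps above can be sketched with plain string processing. This is a minimal sketch, not the tutorial's actual code: the sample markup, the regular expression, and the function name `findRank` are all assumptions made for illustration, and the page is hard-coded so the example runs without a network request.

```php
<?php
// Step 1 (visiting the URL) is skipped here: $html stands in for a fetched page.
// Hypothetical sample markup; a real results page has its own structure.
$html = '<ol><li><a href="http://example.com/a">First</a></li>'
      . '<li><a href="http://example.org/b">Second</a></li></ol>';

/**
 * Return the 1-based position of the first result link whose URL
 * contains $domain, or 0 if no such link is found.
 */
function findRank(string $html, string $domain): int
{
    // Step 2: the assumed "pattern" is an anchor directly inside a list item.
    if (!preg_match_all('/<li>\s*<a\s+href="([^"]+)"/i', $html, $matches)) {
        return 0; // no match (or a regex error): treat as "not ranked"
    }
    foreach ($matches[1] as $index => $url) {
        if (stripos($url, $domain) !== false) {
            return $index + 1;
        }
    }
    return 0;
}

$rank = findRank($html, 'example.org');
```

Step 3 then amounts to running the same function against pages fetched from several different URLs and checking that the pattern still matches.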
If you're scraping content from websites (that is, HTML), I'd say string processing via strpos() and regular expressions is a thing of the past.
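As a rough illustration of the alternative, PHP's built-in DOM extension can parse the page and answer XPath queries. The sample markup, the `results` id, and the helper name `extractLinks` are assumptions for this sketch and are not taken from the Yahoo page.

```php
<?php
// Hypothetical sample markup standing in for a downloaded page.
$html = '<div id="results"><ol>'
      . '<li><a href="http://example.com/a">First</a></li>'
      . '<li><a href="http://example.org/b">Second</a></li>'
      . '</ol></div>';

/**
 * Return the href attribute of every element matched by the XPath query.
 */
function extractLinks(string $html, string $query): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid, so buffer parser warnings
    // instead of printing them, then restore the previous setting.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query($query);
    if ($nodes === false) {
        return []; // malformed XPath expression
    }

    $links = [];
    foreach ($nodes as $node) {
        if ($node instanceof DOMElement && $node->hasAttribute('href')) {
            $links[] = $node->getAttribute('href');
        }
    }
    return $links;
}

$links = extractLinks($html, '//div[@id="results"]//li/a');
```

The query describes where the links sit in the document tree, so small changes in whitespace or attribute order that would break a regular expression do not affect it.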
Now, depending on your skill level or experience with XPath (and/or string functions!), an XPath-based version might be even scarier than Sunhil's version! One thing to note is that (at least for the Yahoo site in this tutorial) a User-Agent header is required; otherwise Yahoo sends back different HTML that does not contain the top searches. However, in the tutorial Sunhil sends along a UA string in the headers, so that's OK.
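One possible way to send a User-Agent without cURL is an HTTP stream context passed to `file_get_contents()`. This is only a sketch and not necessarily how the tutorial does it: the function names `buildContext` and `fetchPage`, the UA string, and the timeout are assumptions.

```php
<?php
/**
 * Build an HTTP stream context that sends the given User-Agent.
 */
function buildContext(string $userAgent)
{
    return stream_context_create([
        'http' => [
            'method'     => 'GET',
            'user_agent' => $userAgent, // sent as the User-Agent header
            'timeout'    => 10,         // seconds; an arbitrary choice
        ],
    ]);
}

/**
 * Fetch a page with the given User-Agent; returns null on failure.
 */
function fetchPage(string $url, string $userAgent): ?string
{
    $html = @file_get_contents($url, false, buildContext($userAgent));
    return $html === false ? null : $html;
}

// Example call (placeholder URL and UA string, not executed here):
// $html = fetchPage('https://example.com/', 'Mozilla/5.0 (compatible; ExampleScraper/1.0)');
```

Comparing the HTML returned with and without the header is a quick way to confirm whether a site varies its output by User-Agent.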