How would I go about making a PHP script that does the following: you give it various URLs to check, and the spider checks every link on that domain and page, except for a few links that are specified not to be searched. The spider looks for certain texts, e.g. http://www.fixeddomain.com/********/*****.***, and saves them all.
How would I go about doing something like this? Thanks :)
Well, I guess you should get the page contents, search for links (probably using a regex), put those links into an array, and recursively do the same for each one (get a link, search it for more links, and so on).
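A minimal sketch of the link-extraction step, assuming PHP 7+. DOMDocument tends to be more reliable than a regex against real-world HTML; the sample markup here is just for illustration:

```php
<?php
// Pull every non-empty href out of an HTML string.
function extractLinks(string $html): array
{
    $dom = new DOMDocument();
    // Suppress warnings about malformed real-world markup.
    @$dom->loadHTML($html);

    $links = [];
    foreach ($dom->getElementsByTagName('a') as $a) {
        $href = $a->getAttribute('href');
        if ($href !== '') {
            $links[] = $href;
        }
    }
    return array_unique($links);
}

$html = '<a href="/about">About</a> <a href="http://example.com/">Home</a>';
print_r(extractLinks($html));
```

You would feed this the result of file_get_contents() (or a cURL fetch) for each URL, then repeat for every link it returns.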
If you need to index pages, one possibility would be writing two scripts: one to extract links, and another that uses those links to visit and index each page. They would probably need a database or a file to store and read those links.
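A rough sketch of the crawl loop such a second script might run: a queue of URLs, a visited set, an exclusion list, and the matches to save. To keep it runnable offline, the $pages array stands in for file_get_contents(); the domain, exclusions, and target pattern are all placeholders you would swap for your own:

```php
<?php
// $pages maps URL => HTML, standing in for real fetches.
// $exclude holds links that are specified not to be searched.
// $pattern is the "certain text" to look for and save.
function crawl(array $pages, array $start, array $exclude, string $pattern): array
{
    $queue   = $start;
    $visited = [];
    $matches = [];

    while (($url = array_shift($queue)) !== null) {
        if (isset($visited[$url]) || in_array($url, $exclude, true)) {
            continue;               // already seen, or excluded from the crawl
        }
        $visited[$url] = true;

        $html = $pages[$url] ?? ''; // real script: @file_get_contents($url)
        preg_match_all('#href="([^"]+)"#', $html, $m);

        foreach ($m[1] as $link) {
            if (preg_match($pattern, $link)) {
                $matches[] = $link; // one of the texts we want to save
            } elseif (isset($pages[$link])) {
                $queue[] = $link;   // same-domain page: keep spidering
            }
        }
    }
    return array_unique($matches);
}

$pages = [
    'http://example.com/'      => '<a href="http://example.com/page2">2</a>'
                                . '<a href="http://www.fixeddomain.com/a/b.zip">hit</a>',
    'http://example.com/page2' => '<a href="http://www.fixeddomain.com/c/d.zip">hit</a>',
];
$found = crawl($pages, ['http://example.com/'], [], '#^http://www\.fixeddomain\.com/#');
print_r($found);
```

In the real script you would replace the $pages lookup with a fetch, restrict queued links to your domain with parse_url($link, PHP_URL_HOST), and write $matches out to a file or database at the end.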