TalkPHP
 
 
Old 12-12-2008, 12:42 AM   #1 (permalink)
The Addict
 
 
Join Date: Jan 2008
Location: los angeles
Posts: 309
Thanks: 44
scraping a webpage for text

Hello,

I'm trying to make an app where a user enters a domain name and file_get_contents() fetches the page, so that only the visible text on the page is scraped and displayed to the user.

This is what I've got so far:

Code:
<?php
$data = file_get_contents('http://www.gothostin.com/sarmenhb');
$regex = '/[a-zA-z]/';
preg_match($regex,$data,$match);
var_dump($match);
echo $match[1];
?>
But I keep getting something like array{0}.

What am I doing wrong?

Thanks
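
A likely explanation, sketched on a hypothetical sample string rather than the real page: preg_match() stops at the first match, and because the pattern has no capture group only index 0 of $match is filled, so $match[1] is never set. The range A-z (with a lowercase z) also spans the punctuation characters that sit between Z and a in ASCII. The sample markup and the $matches variable below are assumptions for illustration only.

```php
<?php
// Hypothetical sample markup standing in for the fetched page.
$data = '<p>Hello [world]</p>';

// preg_match() stops after the first match; with no capture group,
// only index 0 is filled, so $match[1] does not exist.
preg_match('/[a-zA-z]/', $data, $match);
var_dump($match); // a single one-character match at index 0

// preg_match_all() with [a-zA-Z]+ collects every run of letters.
// The corrected range A-Z no longer matches [ \ ] ^ _ and `.
preg_match_all('/[a-zA-Z]+/', $data, $matches);
print_r($matches[0]); // tag names such as "p" are still included
```

Even with the pattern corrected, a letters-only regex cannot tell tag names from page text, so it is probably not the right tool for extracting visible text.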
Old 12-12-2008, 02:49 AM   #2 (permalink)
La Vida es Sueño
Advanced Programmer Top Contributor 
 
 
Join Date: Sep 2007
Location: Oldham
Posts: 2,280
Thanks: 90

Only the visible text as opposed to...
Old 12-12-2008, 05:34 AM   #3 (permalink)
The Wanderer
Good Samaritan 
 
 
Join Date: Mar 2008
Posts: 18
Thanks: 0

Why can't you use strip_tags()?
<?php
echo strip_tags(file_get_contents('http://www.gothostin.com/sarmenhb'));
?>
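
One caveat worth noting: strip_tags() removes the tags themselves but keeps whatever sits between them, so the source of any script or style block ends up in the output. If only the visible text is wanted, one possible approach is to parse the page with PHP's DOM extension and drop those elements first. This is a minimal sketch, not a definitive implementation: the helper name visibleText() and the sample markup are hypothetical, and it assumes the dom extension is installed. DOMDocument::loadHTML() also treats input as ISO-8859-1 unless the page declares its charset, which matters for non-ASCII pages.

```php
<?php
// Hypothetical helper; the name visibleText() is not from the thread.
function visibleText(string $html): string
{
    if (trim($html) === '') {
        return '';
    }

    $doc = new DOMDocument();
    // Collect parser warnings internally; real-world HTML is rarely valid.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    // Remove elements whose contents are never rendered as text.
    $xpath = new DOMXPath($doc);
    foreach (iterator_to_array($xpath->query('//script | //style')) as $node) {
        $node->parentNode->removeChild($node);
    }

    $root = $doc->documentElement;
    if ($root === null) {
        return '';
    }

    // Collapse runs of whitespace left behind by the markup.
    return trim(preg_replace('/\s+/', ' ', $root->textContent));
}

// Hypothetical sample page standing in for the fetched HTML.
$html = '<html><head><style>p{color:red}</style></head>'
      . '<body><p>Hello world</p><script>var x = 1;</script></body></html>';

echo strip_tags($html), "\n";  // still contains the CSS and JavaScript source
echo visibleText($html), "\n"; // only the paragraph text remains
```

For a real page, the string returned by file_get_contents() would be passed to the helper in place of the sample markup.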
All times are GMT.

 
     
