TalkPHP

sarmenhb 12-03-2008 05:55 PM

parsing a page line by line
 
Hello,

I have a page which contains about 5000 lines.

Each line is in this format:

vocab n. definition

How can I have PHP read the first word and place it into a table column, then place the second word (the abbreviation, such as "n.") into another column, and the rest of the line (the definition) into a third column?

liveordie 12-03-2008 09:49 PM

PHP Code:

<?php

// Read the file line by line into an array
$lines = file( '/path/to/file.txt' );

// Loop over the array by line
foreach( $lines as $line )
{
    $parts = array();

    // Explode each line into at most 3 parts by a space.
    // Allows the definition to have multiple words separated by spaces.
    // rtrim() drops the newline that file() leaves on each line.
    $parts = explode( ' ', rtrim( $line ), 3 );

    // Print out the three resulting parts
    echo 'Word: ' . $parts[0] . '<br />';
    echo 'Type: ' . $parts[1] . '<br />';
    echo 'Definition: ' . $parts[2] . '<br />';
}

If the second part is an abbreviation like n., v., adj., or something of that sort, you can strip the period from the second part returned by the explode function.
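One way to strip that period is rtrim() with a character list. This is only a sketch: the sample line is made up for illustration and simply follows the "vocab n. definition" format from the question.

```php
<?php
// Hypothetical sample line in the "vocab n. definition" format.
$line = "apple n. a round fruit that grows on trees\n";

// Split into at most 3 parts so the definition keeps its spaces.
$parts = explode(' ', rtrim($line), 3);

// Remove a trailing period from the abbreviation ("n." becomes "n").
$type = rtrim($parts[1], '.');

echo 'Type: ' . $type . "\n";
```

rtrim() only removes the period when it is at the end, so an abbreviation without one is left unchanged.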

Salathe 12-03-2008 11:19 PM

The only trouble with using file() is that it loads the entire file into memory. It is much better to read the file one line at a time and do what you want with each line as you go.

Here are two approaches using this line of thought.

Plain old fopen/fgets
PHP Code:

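$file = fopen('dictionary.txt', 'r');
// Stop when fgets() returns false, so the loop never processes a failed read at end of file.
while (($line = fgets($file, 1024)) !== false)
{
    // Split into word, type and definition; the limit of 3 keeps spaces in the definition.
    list($word, $type, $definition) = explode(' ', rtrim($line), 3);
}
fclose($file);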
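With 5000 lines, a few are likely to be blank or incomplete, and list() raises notices when explode() returns fewer than three parts. The following is a rough sketch of a guard for that case, not a definitive implementation. parseLine() is a hypothetical helper name, and the php://memory stream with its sample lines is only a stand-in for dictionary.txt so the example runs on its own.

```php
<?php
// parseLine() is a hypothetical helper, not a PHP built-in.
// It returns an array of three fields, or null for a blank or incomplete line.
function parseLine($line)
{
    $parts = explode(' ', trim($line), 3);
    if (count($parts) < 3) {
        return null;
    }
    return array(
        'word'       => $parts[0],
        'type'       => rtrim($parts[1], '.'),
        'definition' => $parts[2],
    );
}

// Stand-in for dictionary.txt (assumed sample data).
$file = fopen('php://memory', 'w+');
fwrite($file, "apple n. a round fruit\n\nrun v. to move quickly on foot\nbroken\n");
rewind($file);

$rows = array();
while (($line = fgets($file)) !== false) {
    $row = parseLine($line);
    if ($row !== null) {
        $rows[] = $row; // keep only well-formed lines
    }
}
fclose($file);

foreach ($rows as $row) {
    echo $row['word'] . ' | ' . $row['type'] . ' | ' . $row['definition'] . "\n";
}
```

Each element of $rows then maps directly onto the three columns asked about in the first post.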


SPL File Object
PHP Code:

$file = new SplFileObject('dictionary.txt');
foreach ($file as $line)
{
    list($word, $type, $definition) = explode(' ', rtrim($line), 3);
}
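SplFileObject can also be asked to drop the newline and skip blank lines itself through setFlags(). This is a minimal sketch under assumptions: SplTempFileObject and the sample lines stand in for dictionary.txt so the example is self-contained, and every line that remains is assumed to have all three parts.

```php
<?php
// In-memory stand-in for dictionary.txt (assumed sample data).
$file = new SplTempFileObject();
$file->fwrite("apple n. a round fruit\n\nrun v. to move quickly\n");

// DROP_NEW_LINE strips the line ending, SKIP_EMPTY skips blank lines,
// and READ_AHEAD is needed for SKIP_EMPTY to work as expected.
$file->setFlags(
    SplFileObject::READ_AHEAD
    | SplFileObject::SKIP_EMPTY
    | SplFileObject::DROP_NEW_LINE
);

$words = array();
foreach ($file as $line) {
    list($word, $type, $definition) = explode(' ', $line, 3);
    $words[] = $word;
    echo $word . ' (' . rtrim($type, '.') . '): ' . $definition . "\n";
}
```

With a real file, new SplFileObject('dictionary.txt') would replace the temporary object and the fwrite() call.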



liveordie 12-04-2008 01:58 AM

I hadn't even considered the memory usage when I quickly typed that out.

I've never seen SplFileObject used before; that is a nice class.
Any idea why it hasn't made it into the official docs yet?

