03-02-2009, 12:18 AM
|
#1 (permalink)
|
|
how quixotic are you?
Join Date: Dec 2007
Location: Lapeer, MI
Posts: 445
Thanks: 37
|
help with fscanf
I have to read a very large text file with PHP (about 1 million lines and 45.4 MB) and speed is critical. The file is laid out like so.
Notice the extra space before and after each line
Code:
{"one":1727294,"two":2667541,"three":9998168}
{"one":7005310,"two":9377441,"three":4658508}
{"one":2638549,"two":7931823,"three":992431}
{"one":8817443,"two":1587524,"three":6495056}
{"one":5009765,"two":4831848,"three":2782592}
{"one":9882507,"two":4866943,"three":7389221}
{"one":7161254,"two":281677,"three":9001464}
{"one":6177062,"two":661010,"three":4880065}
{"one":850830,"two":5882873,"three":4219360}
{"one":5865173,"two":8852539,"three":6194152}
Obviously, the file is too large to just load the entire thing into PHP so I'm using fscanf to read the file line by line.
Right now I have it so I can select an entire line if I know the whole thing. Here is what I got for that:
PHP Code:
while($data = fscanf($fh,"\n{\"one\":1054687,\"two\":8728332,\"three\":2499389}%s\n")) { print_r($data); }
What I want to do is select one entire line while knowing only part of it (ex: "three":4219360). I'm not sure how to go about doing this because I have very little experience with fscanf and functions like it. I've tried something like this but it returns all of the rows:
PHP Code:
while($data = fscanf($fh,"\n%s\"three\":2499389%s\n")) { print_r($data); }
I do NOT want to have to load every line into PHP and check it that way as that could be very slow.
|
|
|
|
03-02-2009, 01:11 AM
|
#2 (permalink)
|
|
Moderateur
Join Date: Apr 2007
Posts: 1,393
Thanks: 5
|
If you're after just a simple search, you could use fgets to grab the line and strpos to search it.
PHP Code:
// line: {"one":8405219,"two":4552659,"three":6640965}
$search = '"three":6640965';
$fp = fopen('misc.txt', 'r');
$result = 'No result';
// Read one line at a time; fgets() returns false at end of file.
while (($line = fgets($fp)) !== false)
{
    // strpos() returns false when the search string is not in the line.
    if (strpos($line, $search) !== false)
    {
        $result = trim($line);
        break;
    }
}
fclose($fp);
echo $result;
For me, searching through a 46MB file with lines like yours took under two seconds to not find a match (worst case scenario).
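A minimal sketch of the same fgets and strpos search wrapped in a reusable function. The name `find_line` and its signature are hypothetical, not something from the thread, and the file is assumed to hold one record per line.

```php
<?php
// Hypothetical helper (not from the thread): returns the first line that
// contains $needle, trimmed, or null when no line matches.
function find_line(string $path, string $needle): ?string
{
    $fp = fopen($path, 'r');
    if ($fp === false) {
        return null;
    }
    $found = null;
    // fgets() returns false at end of file, which ends the loop.
    while (($line = fgets($fp)) !== false) {
        if (strpos($line, $needle) !== false) {
            $found = trim($line);
            break;
        }
    }
    fclose($fp);
    return $found;
}
```

Including the closing brace in the search string, as in `find_line('misc.txt', '"three":6640965}')`, keeps the search from also matching a longer number that merely starts with the same digits.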
|
|
|
|
03-02-2009, 05:05 PM
|
#3 (permalink)
|
|
The Gregarious
Join Date: Feb 2009
Location: New York
Posts: 645
Thanks: 64
|
Quote:
Originally Posted by Salathe
For me, searching through a 46MB file with lines like yours took under two seconds to not find a match (worst case scenario).
|
Wow that's pretty fast for a file with a million records...
|
|
|
|
03-02-2009, 06:13 PM
|
#4 (permalink)
|
|
how quixotic are you?
Join Date: Dec 2007
Location: Lapeer, MI
Posts: 445
Thanks: 37
|
@allworknoplay: That's because I don't have to load the data into PHP when just using fscanf.
@Salathe: The problem with your code is that it requires me to load every line into PHP and then check it. I need to be able to do this with only fscanf if at all possible.
|
|
|
|
03-02-2009, 06:51 PM
|
#5 (permalink)
|
|
The Gregarious
Join Date: Feb 2009
Location: New York
Posts: 645
Thanks: 64
|
Quote:
Originally Posted by ETbyrne
@allworknoplay: That's because I don't have to load the data into PHP when just using fscanf.
@Salathe: The problem with your code is that it requires me to load every line into PHP and then check it. I need to be able to do this with only fscanf if at all possible.
|
Thanks for bringing this functionality to light for me. I am going to check it out; I could definitely use this for my projects.
I don't necessarily trust databases for everything. Sometimes I think people go overboard with databases because of the ease of use for the SQL language, but it doesn't always make sense to use a database for everything.
|
|
|
|
03-02-2009, 06:56 PM
|
#6 (permalink)
|
|
The Gregarious
Join Date: Feb 2009
Location: New York
Posts: 645
Thanks: 64
|
Quote:
Originally Posted by ETbyrne
@allworknoplay: That's because I don't have to load the data into PHP when just using fscanf.
@Salathe: The problem with your code is that it requires me to load every line into PHP and then check it. I need to be able to do this with only fscanf if at all possible.
|
Ok I have an idea. You don't want to load the entire file into PHP, so I get that.
What if you were to return all of the lines you were searching for anyway, but then once you retrieve the line, you then parse the info?
This way you're only parsing what you need AFTER you get the full line, instead of trying to parse it ahead of time which would require much more resources?
Does that make sense?
|
|
|
|
03-02-2009, 08:15 PM
|
#7 (permalink)
|
|
Moderateur
Join Date: Apr 2007
Posts: 1,393
Thanks: 5
|
Quote:
Originally Posted by ETbyrne
@Salathe: The problem with your code is that it requires me to load every line into PHP and then check it. I need to be able to do this with only fscanf if at all possible.
|
You'd need to fscanf every line so whether you use that or fgets you're still reading every single line (up until the one that matches). fscanf just parses a line according to the formatting string which you don't need to do; you just need to see if the line is the one you want. Unless I'm mistaken. There's no faster way to do what you want without going through the file line-by-line.
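To illustrate what "parses a line according to the formatting string" means here, the following is a rough sketch that reads each line with fscanf and compares the parsed value. The function name `find_by_three` is hypothetical, and the format assumes every line has exactly the layout shown in the first post, with no padding around it.

```php
<?php
// Hypothetical helper (not from the thread): parses each line with fscanf()
// and returns the first row whose "three" value equals $wanted, else null.
function find_by_three(string $path, int $wanted): ?array
{
    $fh = fopen($path, 'r');
    if ($fh === false) {
        return null;
    }
    // Literal characters must match the input; each %d captures one integer.
    $format = '{"one":%d,"two":%d,"three":%d}';
    $match = null;
    // fscanf() consumes one line per call and returns false at end of file.
    while (($row = fscanf($fh, $format)) !== false) {
        // Lines that do not fit the format leave the third slot unset.
        if (is_array($row) && isset($row[2]) && $row[2] === $wanted) {
            $match = ['one' => $row[0], 'two' => $row[1], 'three' => $row[2]];
            break;
        }
    }
    fclose($fh);
    return $match;
}
```

Either way the loop still visits every line before the match, which is the point being made in this post: the format string changes how a line is interpreted, not how many lines are read.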
|
|
|
|
03-02-2009, 08:28 PM
|
#8 (permalink)
|
|
The Gregarious
Join Date: Feb 2009
Location: New York
Posts: 645
Thanks: 64
|
Quote:
Originally Posted by Salathe
You'd need to fscanf every line so whether you use that or fgets you're still reading every single line (up until the one that matches). fscanf just parses a line according to the formatting string which you don't need to do; you just need to see if the line is the one you want. Unless I'm mistaken. There's no faster way to do what you want without going through the file line-by-line.
|
hey sal:
You seem to really know your PHP. This is a bit OT from this thread, but would you happen to know of any good PHP/CSS way of creating really professional-looking charts/graphs?
I'm currently using Flash graphs, which look spectacular, but I really want to get away from Flash and just use charts that can be output to PNG or JPG format.
I've seen a couple of the popular PHP ways to generate graphs and they just don't look clean and sharp. What I mean is, they look pixelated.
I know this is kind of a loaded question, but if you are familiar with any ways to make really neat-looking charts in PHP, please let me know!
I can scour around and provide you with the "look and feel" of what I'm looking for and you can let me know if this is possible or not.
|
|
|
|
03-02-2009, 10:12 PM
|
#9 (permalink)
|
|
Moderateur
Join Date: Apr 2007
Posts: 1,393
Thanks: 5
|
allworknoplay, post a new topic for that.
|
|
|
|
03-02-2009, 10:18 PM
|
#10 (permalink)
|
|
how quixotic are you?
Join Date: Dec 2007
Location: Lapeer, MI
Posts: 445
Thanks: 37
|
Quote:
Originally Posted by Salathe
You'd need to fscanf every line so whether you use that or fgets you're still reading every single line (up until the one that matches). fscanf just parses a line according to the formatting string which you don't need to do; you just need to see if the line is the one you want. Unless I'm mistaken. There's no faster way to do what you want without going through the file line-by-line.
|
Well, all I know is that by using fscanf I can find all matching rows in a 45 MB file in 80 ms. I know because I've benchmarked it... I think that is reason to believe that fscanf does not load all that data into PHP but rather does it all in C++ or somehow in the file system.
The problem is I have to know what the entire line is in order to do that.
EDIT: By looking at the comments on http://us.php.net/fscanf I found you can use regex. That could very well solve my problem.
|
|
|
|
03-02-2009, 11:23 PM
|
#11 (permalink)
|
|
Moderateur
Join Date: Apr 2007
Posts: 1,393
Thanks: 5
|
All fscanf does is parse a single line according to the formatting string provided. Show me your code which finds all matching rows in 80ms and the benchmark you did as my own experiments were nothing like that amount of time.
|
|
|
|
03-03-2009, 01:31 AM
|
#12 (permalink)
|
|
how quixotic are you?
Join Date: Dec 2007
Location: Lapeer, MI
Posts: 445
Thanks: 37
|
Attached are the scripts I used to do the same test on a similar file. Takes me about 60 ms to run huge_select.php
Note that this is being run on a ~ 8.5 MB 1 million line file. I also ran these tests on the file listed above.
Run huge_create.php first, then huge_select.php. files.php is a file class I made, file::scan() is a wrapper for the fscanf function. The other php file is the class used for benchmarking.
It worked on both my server and my friend's, so I'm not crazy. If it doesn't work for you then you are doing something wrong... That or I'm doing something terribly right.
Obviously fscanf was not written in PHP and thus works a lot faster than comparing each string manually with PHP.
|
|
|
|
03-03-2009, 10:40 AM
|
#13 (permalink)
|
|
Moderateur
Join Date: Apr 2007
Posts: 1,393
Thanks: 5
|
I can only reiterate what was said before, fscanf is only parsing a line into your requested format (if it can). Whether you use fscanf to read a line or fgets, PHP is still reading the file line by line behind-the-scenes.
Your servers must have much faster disk IO than my laptop and cheap shared hosting, which run your tests at over 2 seconds: both with fscanf and fgets.
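One way to check this kind of claim is to time both approaches against the same generated file. The harness below is only a sketch: every function name in it is made up for illustration, the file is far smaller than the one in the thread, and both counters deliberately read the whole file so the comparison is worst case for each.

```php
<?php
// Illustrative harness; none of these function names come from the thread.

// Writes $lines rows in the thread's layout to a temp file and returns its path.
function make_file(int $lines): string
{
    $path = tempnam(sys_get_temp_dir(), 'scan');
    $fh = fopen($path, 'w');
    for ($i = 1; $i <= $lines; $i++) {
        fwrite($fh, sprintf("{\"one\":%d,\"two\":%d,\"three\":%d}\n", $i, $i * 2, $i * 3));
    }
    fclose($fh);
    return $path;
}

// Counts matching lines with fgets() + strpos(), reading the whole file.
function count_with_fgets(string $path, string $needle): int
{
    $count = 0;
    $fh = fopen($path, 'r');
    while (($line = fgets($fh)) !== false) {
        if (strpos($line, $needle) !== false) {
            $count++;
        }
    }
    fclose($fh);
    return $count;
}

// Counts matching lines by parsing each one with fscanf(), reading the whole file.
function count_with_fscanf(string $path, int $three): int
{
    $count = 0;
    $fh = fopen($path, 'r');
    while (($row = fscanf($fh, '{"one":%d,"two":%d,"three":%d}')) !== false) {
        if (is_array($row) && isset($row[2]) && $row[2] === $three) {
            $count++;
        }
    }
    fclose($fh);
    return $count;
}

// Runs $fn once and returns [result, elapsed milliseconds].
function time_ms(callable $fn): array
{
    $start = microtime(true);
    $result = $fn();
    return [$result, (microtime(true) - $start) * 1000.0];
}

$path = make_file(10000);
[$a, $msA] = time_ms(function () use ($path) {
    return count_with_fgets($path, '"three":30000}');
});
[$b, $msB] = time_ms(function () use ($path) {
    return count_with_fscanf($path, 30000);
});
printf("fgets+strpos: %d match(es) in %.1f ms\n", $a, $msA);
printf("fscanf:       %d match(es) in %.1f ms\n", $b, $msB);
unlink($path);
```

Checking that both counts agree before comparing the timings matters: a loop that exits early or matches the wrong lines will look fast for the wrong reason.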
|
|
|
|
03-03-2009, 09:57 PM
|
#14 (permalink)
|
|
how quixotic are you?
Join Date: Dec 2007
Location: Lapeer, MI
Posts: 445
Thanks: 37
|
I don't know what to tell ya man, but I'm just running on a cheap Dell... The program did take over 3 seconds when I used the wrong syntax for fscanf, and when I tried loading and checking every line. But when I got the syntax right it took a little less than 60ms for me and my friend. Every time, and it still does. And that is on completely different hardware and software too.
I highly doubt it is my hardware, or the fact that I'm running Vista, that is making it so fast. I'll have to run these tests on my web host and my old 2000 XP computer.
Could anybody else try running these tests? This is all very interesting indeed.
|
|
|
|
03-03-2009, 10:08 PM
|
#15 (permalink)
|
|
Moderateur
Join Date: Apr 2007
Posts: 1,393
Thanks: 5
|
What were you doing wrong initially, to take over 3 seconds, and what did you do to fix it? Do you now have a working script doing what you initially wanted (find a matching line)?
|
|
|
|
03-03-2009, 10:21 PM
|
#16 (permalink)
|
|
how quixotic are you?
Join Date: Dec 2007
Location: Lapeer, MI
Posts: 445
Thanks: 37
|
Initially, I had the format for the fscanf function messed up so it matched all of the lines in the table. It's pretty easy to mess up, like the second piece of code I posted waaay up at the top of this thread:
PHP Code:
while($data = fscanf($fh,"\n%s\"three\":2499389%s\n")) { print_r($data); }
The code above matched all the lines in the text file, so it would print out every line. All the stuff before \"three\" (the newline and the %s) was put there in a sad attempt to get the data for the rest of the line. But it messed it all up.
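A small sketch of why that format accepts every line, using sscanf on a single sample line so no file is needed. The leading and trailing newlines from the original format are omitted here for brevity, and the sample line is taken from the first post.

```php
<?php
// One sample line from the first post; it contains no whitespace.
$line = '{"one":850830,"two":5882873,"three":4219360}';

// The format from the thread: %s, a literal, then another %s.
$parsed = sscanf($line, '%s"three":2499389%s');

// %s reads up to the next whitespace, so the first slot swallows the whole
// line even though this line does not contain "three":2499389.
var_dump($parsed[0] === $line);

// A non-empty array is truthy, so a while ($data = fscanf(...)) loop keeps
// going and prints this line as if it had matched.
var_dump((bool) $parsed);
```

Because the first %s has already consumed everything, the literal that follows it never gets a chance to reject the line.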
|
|
|
|
03-04-2009, 01:01 AM
|
#17 (permalink)
|
|
The Gregarious
Join Date: Feb 2009
Location: New York
Posts: 645
Thanks: 64
|
Quote:
Originally Posted by ETbyrne
Could anybody else try running these tests? This is all very interesting indeed.
|
Yes this is a very interesting issue.
I'll test this out myself. I am running a 64-bit Vista laptop but that part actually doesn't matter. I have VMware running with CentOS 5.2.
I can definitely help verify how quickly the script runs, because if I can get it at around 60-80ms like you, then to me that would be a great benchmark speed since it's running on a virtualized OS...
I'll let you know what I discover...
|
|
|
|
03-04-2009, 04:25 PM
|
#18 (permalink)
|
|
The Contributor
Join Date: Feb 2009
Posts: 64
Thanks: 1
|
Tested on an XP machine (32-bit).
Hardware:
AMD Turion 64x2
Hard drive runs at 5400 rpm (can't remember more details)
(Hope I didn't do something wrong)

|
|
|
|
03-04-2009, 04:28 PM
|
#19 (permalink)
|
|
The Gregarious
Join Date: Feb 2009
Location: New York
Posts: 645
Thanks: 64
|
how many milliseconds equals 1 second?
|
|
|
|
03-04-2009, 04:30 PM
|
#20 (permalink)
|
|
how quixotic are you?
Join Date: Dec 2007
Location: Lapeer, MI
Posts: 445
Thanks: 37
|
Quote:
Originally Posted by Sakakuchi
Tested on an XP machine (32-bit).
Hardware:
AMD Turion 64x2
Hard drive runs at 5400 rpm (can't remember more details)
(Hope I didn't do something wrong)

|
Yes, it worked for you! Looks like you did everything right; it only took 120ms to find over 11 thousand matches!
@allworknoplay: A millisecond is one thousandth of a second.
|
|
|
|
|