TalkPHP


Village Idiot 05-30-2008 09:37 PM

regex problem
 
I am writing a regular expression for work to search through files and find SQL queries. It works perfectly, except that it cannot handle queries that span multiple lines. How do I make it indifferent to line breaks?
(((SELECT|DELETE) ([A-Za-z0-9-\*,`\._ ]+) FROM)|(INSERT INTO)|(UPDATE ([A-Za-z0-9-\*,`]+) SET))

Salathe 05-30-2008 10:51 PM

You can use \n to match newlines (e.g. within the character class definitions), or \v (PHP >= 5.2.4) to match any vertical whitespace.
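As a rough sketch of that suggestion (the sample query and variable names are made up for illustration), the class can be widened with \s, which covers \r and \n along with spaces and tabs. Note that the literal spaces around the class also have to become \s+, otherwise a newline directly before FROM still fails:

```php
<?php
// Hypothetical sample: the column list and FROM sit on separate lines.
$query = "SELECT `something`\nFROM `db`\nWHERE 1";

// Original-style pattern: only a literal space is allowed as whitespace.
$strict = '/SELECT [A-Za-z0-9*,`._ -]+ FROM/';
// Same class with \s added, and \s+ in place of the literal spaces.
$loose = '/SELECT\s+[A-Za-z0-9*,`._\s-]+\s+FROM/';

var_dump(preg_match($strict, $query)); // int(0)
var_dump(preg_match($loose, $query));  // int(1)
```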

Village Idiot 05-30-2008 11:02 PM

Thanks for the advice, but that's not the issue. I do not know where the line breaks will be. I am searching over more than 50,000 files for what I believe will amount to 1,000 queries, which were written by a very inconsistent programmer.


In other words, it currently will match
Code:

SELECT `something` FROM `db` WHERE 1
But it will not match
Code:

SELECT `something`
FROM `db`
WHERE 1

I need it to match both, no matter where the line break is.

delayedinsanity 05-30-2008 11:16 PM

Have you tried using the s modifier? Depending on how you're feeding the data into your regex, it may be a quick and dirty solution to your problem (the s modifier makes the dot match newlines as well, so a .* can run across line breaks as if the block were a single line).
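A small sketch of what the s modifier does and does not change (the sample string is invented for illustration). It only affects the dot, so a pattern built from character classes and literal spaces behaves the same with or without it:

```php
<?php
// Hypothetical two-line query.
$text = "SELECT `a`\nFROM `t`";

// Without s, the dot stops at the newline, so this fails.
var_dump(preg_match('/SELECT.+FROM/', $text));          // int(0)
// With s (PCRE_DOTALL), the dot also matches "\n".
var_dump(preg_match('/SELECT.+FROM/s', $text));         // int(1)
// s has no effect on a character class or a literal space.
var_dump(preg_match('/SELECT [a-z` ]+ FROM/s', $text)); // int(0)
```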

Outside of that, Salathe was sending you in the right direction. You can use \r, \n or \v to check for the possibility of newlines or vertical whitespace, just the same as you would use 0-9 to check for the digits 0-9, or _ to check for underscores, etc. You just have to incorporate them into the regex wherever you think there could eventually be one. Give it the option to find it. For example,

PHP Code:

$szTest = <<<EOF
/* this
is the
test */

booga!

/* more
new lines
in this
comment
*/
EOF;

// Replace every /* ... */ comment, including multi-line ones, with 'booga!'
$szTest = preg_replace("~/\*(.*?\r?\n?)+\*/~", 'booga!', $szTest);
echo $szTest;

Outputs:

booga! booga! booga!
-m

Salathe 05-30-2008 11:27 PM

Adding the ability to have whitespace in the two character classes, combined with using the x modifier, would help. In order to match line breaks you have to tell the regex engine where to expect them; there's no magic allow/disallow switch.

Code:

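/(
    # with the x modifier, literal spaces in the pattern are ignored, so whitespace is written as \s+
    ((SELECT|DELETE)\s+[A-Za-z0-9-\*,`._\s]+\s+FROM)
    |
    (INSERT\s+INTO)
    |
    (UPDATE\s+[A-Za-z0-9-\*,`._\s]+\s+SET)
)/x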
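
For anyone wanting to try a pattern along these lines, here is a minimal sketch. The helper name findQueryStarts() and the sample source text are hypothetical, not from the thread. It assumes the x modifier is in use, under which unescaped whitespace in the pattern is ignored, so required whitespace is spelled \s+.

```php
<?php
// Sketch only: findQueryStarts() is a hypothetical helper, not from the thread.
// Under the x modifier, unescaped whitespace in the pattern is ignored,
// so required whitespace is written as \s+ rather than a literal space.
const QUERY_PATTERN = '/
    (?:SELECT|DELETE) \s+ [A-Za-z0-9*,`._\s-]+ \s+ FROM
    |
    INSERT \s+ INTO
    |
    UPDATE \s+ [A-Za-z0-9*,`._\s-]+ \s+ SET
/x';

// Returns the matched leading part of each query found in $source.
function findQueryStarts(string $source): array
{
    preg_match_all(QUERY_PATTERN, $source, $matches);
    return $matches[0];
}

// Made-up file contents: one single-line query, one multi-line, one INSERT.
$source = <<<'EOT'
$a = "SELECT `something` FROM `db` WHERE 1";
$b = "SELECT `something`
FROM `db`
WHERE 1";
$c = "INSERT INTO `db` VALUES (1)";
EOT;

print_r(findQueryStarts($source));
```

Both the single-line and the multi-line SELECT should show up in the result, along with the INSERT. To search a source tree, the same helper could be run on the output of file_get_contents() for each file, though that part is left out here.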


Village Idiot 05-30-2008 11:31 PM

Thanks guys, I'll figure out a way when Monday comes around.

