Matt83 12-04-2007 10:45 PM

8 Practical PHP Regular Expressions
 
For all us security paranoids :-) Here are eight practical PHP regular expressions I found on the web which came in very handy for me:

Quote:

Note: Scroll down to get the latest/correct versions of these Regular expressions
Validating a Username:

Quote:

Something often overlooked, but simple to do with a regular expression would be username validation. For example, we may want our usernames to be between 4 and 28 characters in length, alpha-numeric, and allow underscores.
PHP Code:

$string = "userNaME4234432_";
if (preg_match('/^[a-z\d_]{4,28}$/i', $string)) {
    echo "example 1 successful.";
}


Telephone Numbers:

Quote:

Number in the following form: (###) ###-####
PHP Code:

$string = "(232) 555-5555";
if (preg_match('/^(\(?[0-9]{3,3}\)?|[0-9]{3,3}[-. ]?)[ ][0-9]{3,3}[-. ]?[0-9]{4,4}$/', $string)) {
    echo "example 2 successful.";
}


Emails:

PHP Code:

$string = "first.last@domain.co.uk";
if (preg_match(
    '/^[^0-9][a-zA-Z0-9_]+([.][a-zA-Z0-9_]+)*[@][a-zA-Z0-9_]+([.][a-zA-Z0-9_]+)*[.][a-zA-Z]{2,4}$/',
    $string)) {
    echo "example 3 successful.";
}


Postal Codes:

PHP Code:

$string = "55324-4324";
if (preg_match('/^[0-9]{5,5}([- ]?[0-9]{4,4})?$/', $string)) {
    echo "example 4 successful.";
}


Ip Address:

PHP Code:

$string = "255.255.255.0";
if (preg_match(
    '^(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]\d|\d)(?:[.](?:25[0-5]|2[0-4]\d|1\d\d|[1-9]\d|\d)){3}$',
    $string)) {
    echo "example 5 successful.";
}


Hexadecimal Colors:

PHP Code:

$string = "#666666";
if (preg_match('/^#(?:(?:[a-f\d]{3}){1,2})$/i', $string)) {
    echo "example 6 successful.";
}


Multi-line Comments:

PHP Code:

$string = "/* commmmment */";
if (preg_match('/^[(/*)+.+(*/)]$/', $string)) {
    echo "example 7 successful.";
}


Dates:

Quote:

MM/DD/YYYY format
PHP Code:

$string = "10/15/2007";
if (preg_match('/^\d{1,2}\/\d{1,2}\/\d{4}$/', $string)) {
    echo "example 8 successful.";
}


Some might be more/less useful than the others but that will depend on the project you are working on.

Hope you find them useful too. Source/credits: Devolio.org

Wildhoney 12-04-2007 11:26 PM

One question I have is isn't {3,3} exactly the same as {3}?

Salathe 12-05-2007 12:30 AM

Let me start with thanking you for supplying these regular expressions. I intended to comment on one or two things but got carried away so please accept my apologies for the length of this reply!

Some comments:
  1. Telephone Numbers
    There is no need for the parentheses to group results, since we're just testing for the match rather than bringing back the matched groups. If you need to group items (for whatever reason) but don't wish to keep them for back references, then use the (?:) pattern.

    As Wildhoney mentioned, why use {3,3} when you can state an explicit number of repetitions (rather than a range) using {3}.

    The [ ] is not necessary, just use a space.
  2. Emails
    Finally a RegExp which allows the plus symbol usage (e.g., salathe+spam@talkphp.com)! Your pattern states that the address must not start with a digit: that means any other character can be used, making @test@test.com (or even $_+@_.aa) valid addresses (correct me if I'm wrong, but I don't think they should be valid).

    Again there is no need to wrap the @ or . characters with brackets. To accept a literal period/fullstop/dot character, outside of character classes, simply escape it: \.
  3. Postal Codes
    See above, regarding {5,5} and {4,4}, and the same for the non-capturing groups.
  4. Ip Address:
    There are no delimiters around the pattern. Attempting to run this code will trigger a warning and the match will fail.
  5. Hexadecimal Colors
    For readability, I'd probably prefer to use 0-9 than \d. There's no need for the outer non-matching group that I can see.
  6. Multi-line Comments
    This code will not run as-is, resulting in a warning ("Unknown modifier '*'"). The square brackets are not wanted here -- they denote the beginning and end of a character class definition which isn't what we want.

    Next, the forward slashes for the beginning/end of the comments need to be escaped else they conflict with the pattern delimiters (another warning). The asterisks (*) will also need to be escaped.

    The + beside the opening comment characters is not necessary, and the .+ needs to be made ungreedy else you'll run into problems.

    Finally, this will not work if the comment is truly multi-line. You'll want to use the ms pattern modifiers to make the pattern multi-line and dot-all respectively (the latter meaning the . matches newlines as well).

    Suggestion: #/\*.*?\*/#ms
  7. Dates
    The "date" 99/99/9999 would be accepted. It's probably wiser to use one or more of PHP's date functions, or some other method, if you want to check that the date supplied is an actual, valid, date.
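
To make the point about greediness in the multi-line comment pattern concrete, here is a minimal sketch. The sample input is made up for illustration. Note that the m modifier only changes how ^ and $ behave, so for this particular pattern it is the s modifier that does the work; it is kept here to mirror the suggestion.

```php
<?php
// Hypothetical sample input: two block comments, the second spanning two lines.
$source = "/* first */ \$a = 1;\n/* second\n   comment */ \$b = 2;";

// Lazy quantifier (.*?): each comment is matched on its own.
preg_match_all('#/\*.*?\*/#ms', $source, $lazy);

// Greedy quantifier (.*): a single match from the first /* to the last */.
preg_match_all('#/\*.*\*/#ms', $source, $greedy);

echo count($lazy[0]), " lazy matches\n";   // 2 lazy matches
echo count($greedy[0]), " greedy match\n"; // 1 greedy match
```

With the greedy version, the code between the two comments is swallowed into the match, which is the problem the ungreedy form avoids.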

Is devolio.com your site?

Matt83 12-05-2007 12:30 AM

Quote:

Originally Posted by Wildhoney (Post 5315)
One question I have is isn't {3,3} exactly the same as {3}?

Good find Adam, I'm sure that it is the same, as there's no difference between saying:

{3} (exactly three of the preceding element) and {3,3} (three to three of the preceding element).
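
A quick way to convince yourself is to run both quantifiers against the same inputs. This is only a sketch; the sample inputs are made up.

```php
<?php
// Hypothetical test inputs: two, three and four digits.
$inputs = ['12', '123', '1234'];

foreach ($inputs as $input) {
    $exact = preg_match('/^\d{3}$/', $input);   // exactly three digits
    $range = preg_match('/^\d{3,3}$/', $input); // between three and three digits
    // The two results agree for every input: 0, 1 and 0 respectively.
    echo $input, ': {3} => ', $exact, ', {3,3} => ', $range, "\n";
}
```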

Will update the code above. ^^

Salathe, devolio is not my site. Anyway, big thanks for that great feedback. I really appreciate you pointing that stuff out, as I have relied on some of these for previous projects, and although I never got an error I see all the points you mention and I'm starting to panic now! I will try to update them so we can all use them. Thanks again.

Wildhoney 12-05-2007 12:52 AM

Quote:

Originally Posted by Salathe (Post 5321)
The "date" 99/99/9999 would be accepted. It's probably wiser to use one or more of PHP's date functions, or some other method, if you want to check that the date supplied is an actual, valid, date.

Top post. Thanks! For that kind of check, the checkdate function has never failed me. Simply pass it a month, day and year and it will tell you whether they form a valid date. It checks that the number of days is correct for the specified month, and even takes leap years into consideration! Can't beat that function, in my opinion, and I'm confident no single regular expression could ever match it.
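
As a rough sketch of how the two approaches combine: the regular expression checks the shape of the string and captures the parts, and checkdate() then checks the calendar. The helper name is_valid_us_date is hypothetical, and the type declarations assume PHP 7 or later.

```php
<?php
// Hypothetical helper: validates a MM/DD/YYYY string.
function is_valid_us_date(string $date): bool
{
    // Step 1: check the shape and capture month, day and year.
    if (!preg_match('/^(\d{1,2})\/(\d{1,2})\/(\d{4})$/', $date, $m)) {
        return false;
    }
    // Step 2: checkdate(month, day, year) validates the calendar date,
    // including the number of days per month and leap years.
    return checkdate((int) $m[1], (int) $m[2], (int) $m[3]);
}

var_dump(is_valid_us_date('10/15/2007')); // bool(true)
var_dump(is_valid_us_date('99/99/9999')); // bool(false)
var_dump(is_valid_us_date('02/29/2007')); // bool(false): 2007 is not a leap year
var_dump(is_valid_us_date('02/29/2008')); // bool(true)
```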

Matt83 12-05-2007 01:08 AM

Revised Regular expressions:

TELEPHONE

PHP Code:

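$string = "(232) 555-5555";
// The area-code alternatives stay in a non-capturing group so that ^ and $ anchor the whole pattern.
if (preg_match('/^(?:\(?[0-9]{3}\)?|[0-9]{3}[-. ]?) [0-9]{3}[-. ]?[0-9]{4}$/', $string)) {
    echo "example 2 successful.";
}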


EMAIL

PHP Code:

$string = "first.last@domain.co.uk";
if (preg_match(
    '/^[^\W][a-zA-Z0-9_]+(\.[a-zA-Z0-9_]+)*\@[a-zA-Z0-9_]+(\.[a-zA-Z0-9_]+)*\.[a-zA-Z]{2,4}$/',
    $string)) {
    echo "example 3 successful.";
}


POSTAL CODE

PHP Code:

$string = "55324-4324";
if (preg_match('/^[0-9]{5}([- ]?[0-9]{4})?$/', $string)) {
    echo "example 4 successful.";
}


IP ADDRESS

PHP Code:

$string = "255.255.255.0";
if (preg_match(
    '/^(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]\d|\d)(?:[.](?:25[0-5]|2[0-4]\d|1\d\d|[1-9]\d|\d)){3}$/',
    $string)) {
    echo "example 5 successful.";
}


HEXADECIMAL COLORS

PHP Code:

<?php
$string = "#666666";
if (preg_match('/^#(?:(?:[a-f0-9]{3}){1,2})$/i', $string)) {
    echo "example 6 successful.";
}
?>

MULTI-LINE COMMENTS

PHP Code:

<?php
$string = "/* commmmment */";
if (preg_match('#/\*.*?\*/#ms', $string)) {
    echo "example 7 successful.";
}
?>

Thanks guys for the feedback :-) I followed your suggestions and made the necessary adjustments. Let me know if these are OK or if you have anything else to share or add.

Thanks again!

Wildhoney 12-05-2007 02:48 AM

Much better :-) ! Does that mean we can now look down on the original website which published the regular expressions? Oh let's!

Honestly though, it's good to see progress like this. I posted a comment on the website referring the blog owner to this thread; whether he (or she) follows it up is another question. I just know I personally would, as I really appreciate it when people tell me I could do something better.

Salathe 12-05-2007 03:05 AM

Thanks for taking the time to look over my comments! I really hate long posts full of points like that -- it's all too easy to come across as condescending, something I don't want people to see in my posts.

So for the revised patterns, good job with them. There are still some points that I'd like to pick up on but, mercifully, far fewer than before. To make up for that, I'll go into mind-numbing detail! :-)
  1. TELEPHONE
    The part of the pattern taking care of the area code is seeming a little unwieldy at the moment. Essentially, "reading" the pattern, we are asking for:
    Either: Three digits optionally starting with an opening parenthesis and optionally ending with a closing parenthesis.
    Or: Three digits optionally followed by any one of -, . or a space.
    The main thing bothering me is, in the first instance, the optional opening and optional closing parentheses. Surely we want the pattern to read as:
    Either: Three digits wrapped by parentheses.
    Or: Three digits (optionally) followed by a dash or dot.
    If we didn't change anything, then telephone numbers like (232 555 1234 would be accepted. In production, you might want to accept phone numbers mis-typed like that, but for the sake of making a point let's assume that if one parenthesis is present then we must have the other.

    Also, at the moment, there must always be a space character between the area code and the rest of the number. Numbers like 232-555-5555 will not be accepted unless we make that space optional.

    I propose a pattern which reads as the following:
    Either: Three digits wrapped by parentheses, optionally followed by a space.
    Or: Three digits, optionally followed by -, . or a space.
    Followed by: Three digits, optionally followed by -, . or a space.
    Followed by: Four digits.
    As such, a further revised pattern:
    /^(?:\(\d{3}\) ?|\d{3}[-. ]?)\d{3}[-. ]?\d{4}$/

  2. EMAIL
    I'm particularly interested with the first character class definition: [^\W]

    It is basically asking for any character which is not a character which is not allowed in a 'word'. Confusing! Functionally identical is [\w] (or even just \w!) which is simply asking for any 'word' character.

    Reading the PHP manual on regex syntax will give you the whole story, but essentially a word character is alphanumeric or the underscore (with other characters available depending on the locale being used). So interestingly you use the character class [a-zA-Z0-9_] later in the pattern which just happens to be what we just described -- alphanumeric or underscore.

    Therefore, the start of the pattern is asking for 2 or more 'word' characters. These can then be followed by zero or more occurrences of 'a dot followed by one or more word characters', and then the rest matches the @ and the domain.

    A proposed additional revision would be to head for:
    /^\w[\w+-]*(?:\.[\w+-]+)*@[a-z0-9-]+(?:\.[a-z0-9-]+)*\.[a-z]{2,4}$/i

    Hopefully the above is clear in what it will allow (and not allow!). We need to draw the line somewhere with regard to which characters to allow (the actual RFC822 specification for email addresses is very wide ranging -- $@$ is a valid email address -- not that it will work for you!).

    Note that the above suggestion (and yours) would allow addresses like __.__@gmail.com which whilst valid, may not be something you're prepared to allow. Note to everyone: do not confuse pattern matching with making sure that the email exists.

  3. POSTAL CODE
    My only suggestion would be to make the optional group non-capturing.
    E.g., /^[0-9]{5}(?:[- ]?[0-9]{4})?$/

  4. IP ADDRESS
    I prefer not to put single characters into a character class definition, as you have with [.].
    E.g., /^(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]\d|\d)(?:\.(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]\d|\d)){3}$/

  5. HEXADECIMAL COLORS
    Just some tidying up really, removing a grouping.
    Suggestion: /^#(?:[a-f0-9]{3}){1,2}$/i
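
For anyone who wants to play with these, a small harness along the following lines makes it easy to see what each pattern accepts and rejects. It is only a sketch: the sample inputs are made up, and the e-mail pattern is written with a + inside the repeated domain-label group (the form assumed here so that multi-label domains such as domain.co.uk match).

```php
<?php
// Patterns from the list above; the accept/reject samples are illustrative only.
$checks = [
    'telephone' => [
        'pattern' => '/^(?:\(\d{3}\) ?|\d{3}[-. ]?)\d{3}[-. ]?\d{4}$/',
        'accept'  => ['(232) 555-5555', '232-555-5555'],
        'reject'  => ['(232 555 1234'],
    ],
    'email' => [
        'pattern' => '/^\w[\w+-]*(?:\.[\w+-]+)*@[a-z0-9-]+(?:\.[a-z0-9-]+)*\.[a-z]{2,4}$/i',
        'accept'  => ['first.last@domain.co.uk', 'salathe+spam@talkphp.com'],
        'reject'  => ['@test@test.com'],
    ],
    'postal code' => [
        'pattern' => '/^[0-9]{5}(?:[- ]?[0-9]{4})?$/',
        'accept'  => ['55324-4324', '55324'],
        'reject'  => ['5532-4324'],
    ],
    'hex color' => [
        'pattern' => '/^#(?:[a-f0-9]{3}){1,2}$/i',
        'accept'  => ['#666666', '#FFF'],
        'reject'  => ['#6666'],
    ],
];

$failures = 0;
foreach ($checks as $name => $check) {
    // preg_match() returns 1 on a match and 0 on no match.
    foreach (['accept' => 1, 'reject' => 0] as $list => $expected) {
        foreach ($check[$list] as $input) {
            $ok = preg_match($check['pattern'], $input) === $expected;
            $failures += $ok ? 0 : 1;
            printf("%-12s %-26s %s\n", $name, $input, $ok ? 'as expected' : 'UNEXPECTED');
        }
    }
}
echo $failures, " unexpected result(s)\n";
```

Adding your own edge cases to the accept and reject lists is a cheap way to find out where a pattern stops matching your requirements.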


Please note that all of the above points are just examples of what one could do and should by no means be seen as drop-in solutions for particular needs. Think of them more as handy starting points to refine for your exact needs. Go play!

devolio 12-05-2007 04:37 AM

Hey everyone,

Though I haven't had a chance to look through all of your responses, I've skimmed over most of them, and appreciate the feedback and improvements. I know, they aren't perfect, but (at one point or another,) they got the job done for me.

When I get a chance tomorrow, I'll read through all of this more thoroughly.

Tanax 12-05-2007 05:57 AM

Awesome thread :D

Matt83 12-05-2007 04:38 PM

Big thanks Salathe, I appreciate the attention to detail in your feedback. I don't have a big background in regular expressions, so these things are real eye-openers; I love to see how they can be improved.

Quote:

Originally Posted by Salathe (Post 5346)
Please note that all of the above points are just examples of what one could do and should by no means be seen as drop-in solutions for particular needs. Think of them more as handy starting points to refine for your exact needs. Go play!

I couldn't have said that better myself; I totally agree with that statement.

Quote:

Originally Posted by devolio (Post 5351)
Hey everyone,

Though I haven't had a chance to look through all of your responses, I've skimmed over most of them, and appreciate the feedback and improvements. I know, they aren't perfect, but (at one point or another,) they got the job done for me.

When I get a chance tomorrow, I'll read through all of this more thoroughly.

Hi devolio, thanks for sharing these regular expressions! Welcome to the forums.

Wildhoney 12-05-2007 05:06 PM

We're all here to learn, Devolio :-) To be fair, even if I had done those regular expressions, Salathe would have pulled them apart. He's just too damned good! Grr. Seriously though, I love threads like this. It feels really community-like, everybody pulling together to find the best solutions.

devolio 12-06-2007 05:50 PM

Finally got a chance to go through all of these (holidays are crazy,) and I couldn't agree more with most of your points.

And I more than appreciate all of the feedback, knowledge is power. :)

Would anyone who's contributed to the updates mind if I posted the updates on my site with credit to the site? (or to the individuals?)

Salathe 12-06-2007 05:57 PM

Thanks for taking the time to wade through the comments, devolio. I'm quite happy for anything that I've said to be posted up on your site, as I'm sure everyone else here is (but best let them say that explicitly). We're all about sharing and building on each other's strengths here. :-)

Wildhoney 12-06-2007 05:58 PM

I don't mind you crediting TalkPHP at all. I can't speak for the individuals though :-) !

Thank you, Devolio!

WinSrev 12-06-2007 06:01 PM

If you need one for an email address, here's something I came across in my functions file (not sure whose it is, but it's not mine):

PHP Code:

function check_email($email)
{
    if (preg_match('/^\w[-.\w]*@(\w[-._\w]*\.[a-zA-Z]{2,}.*)$/', $email, $matches))
    {
        if (function_exists('checkdnsrr'))
        {
            if (checkdnsrr($matches[1] . '.', 'MX')) return true;
            if (checkdnsrr($matches[1] . '.', 'A')) return true;
        } else {
            if (!empty($hostName))
            {
                if ($recType == '') $recType = "MX";
                exec("nslookup -type=$recType $hostName", $result);
                foreach ($result as $line)
                {
                    if (eregi("^$hostName", $line))
                    {
                        return true;
                    }
                }
                return false;
            }
            return false;
        }
    }
    return false;
}



devolio 12-06-2007 06:14 PM

As long as the individual contributors don't mind.

Also, I've (heard) that validating e-mail addresses with Regex is almost impossible, and have been pointed to this a few times.

http://www.iamcal.com/publish/articl...parsing_email/

Wildhoney 12-06-2007 06:15 PM

One or two questions, WinSrev.

php Code:
if(!empty($hostName))

Surely that variable will also be empty as it's not being set anywhere - unless I'm overlooking something.

Also, you may wish to use preg_match as opposed to the ereg_*/eregi_* function set as the former is faster, and they're removing the latter from PHP 6.
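
For reference, here is one way the function could be tidied up along those lines. It is a sketch under stated assumptions, not the original author's code: the nslookup fallback that relied on the unset $hostName is dropped, the trailing .* is removed from the pattern so the captured host ends at the top-level domain, and the $dnsCheck parameter is a hypothetical addition that lets the DNS lookup be swapped out. The ereg functions were deprecated in PHP 5.3 and removed in PHP 7.0, so only preg_match is used. The nullable type hint assumes PHP 7.1 or later.

```php
<?php
// Sketch of a tidied check_email(). $dnsCheck is a hypothetical parameter:
// any callable taking (host, record type) and returning a bool.
function check_email(string $email, ?callable $dnsCheck = null): bool
{
    // Syntax check; the capture group holds the host part after the @.
    if (!preg_match('/^\w[-.\w]*@(\w[-.\w]*\.[a-zA-Z]{2,})$/', $email, $matches)) {
        return false;
    }

    // Default to PHP's checkdnsrr(), which performs a real DNS query.
    $dnsCheck = $dnsCheck ?? 'checkdnsrr';
    $host = $matches[1] . '.'; // trailing dot marks the name as fully qualified

    // Accept the host if it has a mail exchanger or at least an address record.
    return $dnsCheck($host, 'MX') || $dnsCheck($host, 'A');
}

// Usage with a stub resolver, so this example does not touch the network.
$stub = function (string $host, string $type): bool {
    return $host === 'example.com.' && $type === 'MX';
};
var_dump(check_email('first.last@example.com', $stub)); // bool(true)
var_dump(check_email('not an address', $stub));         // bool(false)
```

Called without the second argument, the function falls back to a live DNS lookup, so the result then depends on the network.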

Matt83 12-06-2007 06:36 PM

I'm glad to see how much this thread is progressing. Devolio, I wouldn't mind you crediting TalkPHP either ;-)

Here is another one, to validate URLs, that I made yesterday. It's far from perfect, so maybe we can improve it:

/^(http|https|ftp):\/\/([\w]*)\.([\w]*)\.(com|net|org|biz|info|mobi|us|cc|bz|tv|ws|name|co|me)(\.[a-z]{1,3})?\z/i

PHP Code:

$szString = "http://www.talkPHP.com";
if (preg_match('/^(http|https|ftp):\/\/([\w]*)\.([\w]*)\.(com|net|org|biz|info|mobi|us|cc|bz|tv|ws|name|co|me)(\.[a-z]{1,3})?\z/i', $szString))
    echo "This is a valid URL";
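
As a starting point for improving it, a few probes show where this pattern draws its lines. The sample URLs are made up for illustration.

```php
<?php
$pattern = '/^(http|https|ftp):\/\/([\w]*)\.([\w]*)\.(com|net|org|biz|info|mobi|us|cc|bz|tv|ws|name|co|me)(\.[a-z]{1,3})?\z/i';

// Hypothetical sample URLs chosen to probe the pattern's limits.
$urls = [
    'http://www.talkPHP.com',        // match
    'https://www.example.co.uk',     // match: "co" plus the optional ".uk"
    'http://..com',                  // match: [\w]* also accepts empty labels
    'http://talkphp.com',            // no match: two labels are required before the TLD
    'http://www.talk-php.com',       // no match: \w does not include the hyphen
    'http://www.talkphp.com/forums', // no match: nothing may follow the host name
];

$results = [];
foreach ($urls as $url) {
    $results[$url] = preg_match($pattern, $url);
    echo str_pad($url, 32), $results[$url] ? 'match' : 'no match', "\n";
}
```

Changing [\w]* to a one-or-more quantifier and deciding whether hyphens, paths and bare domains should be allowed would be natural next steps.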


WinSrev 12-06-2007 07:38 PM

Quote:

Originally Posted by Wildhoney (Post 5616)
One or two questions, WinSrev.

php Code:
if(!empty($hostName))

Surely that variable will also be empty as it's not being set anywhere - unless I'm overlooking something.

Also, you may wish to use preg_match as opposed to the ereg_*/eregi_* function set as the former is faster, and they're removing the latter from PHP 6.

More than likely, although I may take another look at it and rewrite it. I'll be sure to post the updated version :-)

