12-04-2007, 10:45 PM
|
#1 (permalink)
|
|
The Contributor
Join Date: Oct 2007
Location: Argentina
Posts: 72
Thanks: 18
|
8 Practical PHP Regular Expressions
For all of us who are paranoid about security: here are eight practical PHP regular expressions I found on the web, which came in very handy for me:
Quote:
|
Note: Scroll down to get the latest/correct versions of these Regular expressions
|
Validating a Username:
Quote:
|
Something often overlooked, but simple to do with a regular expression would be username validation. For example, we may want our usernames to be between 4 and 28 characters in length, alpha-numeric, and allow underscores.
|
PHP Code:
$string = "userNaME4234432_";
if (preg_match('/^[a-z\d_]{4,28}$/i', $string)) {
    echo "example 1 successful.";
}
Telephone Numbers:
Quote:
|
Number in the following form: (###) ###-####
|
PHP Code:
$string = "(232) 555-5555";
if (preg_match('/^(\(?[0-9]{3,3}\)?|[0-9]{3,3}[-. ]?)[ ][0-9]{3,3}[-. ]?[0-9]{4,4}$/', $string)) {
    echo "example 2 successful.";
}
Emails:
PHP Code:
$string = "first.last@domain.co.uk";
if (preg_match(
    '/^[^0-9][a-zA-Z0-9_]+([.][a-zA-Z0-9_]+)*[@][a-zA-Z0-9_]+([.][a-zA-Z0-9_]+)*[.][a-zA-Z]{2,4}$/',
    $string)) {
    echo "example 3 successful.";
}
Postal Codes:
PHP Code:
$string = "55324-4324";
if (preg_match('/^[0-9]{5,5}([- ]?[0-9]{4,4})?$/', $string)) {
    echo "example 4 successful.";
}
Ip Address:
PHP Code:
$string = "255.255.255.0";
if (preg_match(
    '^(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]\d|\d)(?:[.](?:25[0-5]|2[0-4]\d|1\d\d|[1-9]\d|\d)){3}$',
    $string)) {
    echo "example 5 successful.";
}
Hexadecimal Colors:
PHP Code:
$string = "#666666";
if (preg_match('/^#(?:(?:[a-f\d]{3}){1,2})$/i', $string)) {
    echo "example 6 successful.";
}
Multi-line Comments:
PHP Code:
$string = "/* commmmment */";
if (preg_match('/^[(/*)+.+(*/)]$/', $string)) {
    echo "example 7 successful.";
}
Dates:
PHP Code:
$string = "10/15/2007";
if (preg_match('/^\d{1,2}\/\d{1,2}\/\d{4}$/', $string)) {
    echo "example 8 successful.";
}
Some might be more/less useful than the others but that will depend on the project you are working on.
Hope you find them useful too. Source/credits: Devolio.org
Last edited by Matt83 : 12-05-2007 at 01:09 AM.
|
|
|
|
|
12-04-2007, 11:26 PM
|
#2 (permalink)
|
|
La Vida es Sueño
Join Date: Sep 2007
Location: Oldham
Posts: 2,280
Thanks: 90
|
One question I have is isn't {3,3} exactly the same as {3}?
__________________
The man who comes back through the Door in the Wall will never be quite the same as the man who went out.
|
|
|
|
|
|
12-05-2007, 12:30 AM
|
#3 (permalink)
|
|
The Contributor
Join Date: Oct 2007
Location: Argentina
Posts: 72
Thanks: 18
|
Quote:
Originally Posted by Wildhoney
One question I have is isn't {3,3} exactly the same as {3}?
|
Good find, Adam. I'm sure it is the same, as there's no difference between saying:
{3}, exactly three of the preceding item, and {3,3}, between three and three of the preceding item.
I will update the code above.
Salathe, devolio is not my site. Anyway, big thanks for that great feedback. I really appreciate you pointing this out, as I have relied on some of these in previous projects, and although I never got an error I can see all the points you mention and I'm starting to panic now! I will try to update them so we can all use them. Thanks again.
|
|
|
|
12-05-2007, 12:30 AM
|
#4 (permalink)
|
|
Moderateur
Join Date: Apr 2007
Posts: 1,393
Thanks: 5
|
Let me start with thanking you for supplying these regular expressions. I intended to comment on one or two things but got carried away so please accept my apologies for the length of this reply!
Some comments:
- Telephone Numbers
There is no need for the parentheses to group results, since we're just testing for the match rather than bring back the matched groups. If you need to group items (for whatever reason) but don't wish to keep them for back references, then use the (?:) pattern.
As Wildhoney mentioned, why use {3,3} when you can state an explicit number of repetitions (rather than a range) using {3}.
The [ ] is not necessary, just use a space.
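To illustrate the point about grouping, here is a minimal sketch of how a capturing group and a (?:) group differ in what preg_match() hands back. The sample number and the variable names are only assumptions for the demonstration.

```php
<?php
// Sketch: capturing vs. non-capturing groups in preg_match().
// The sample number and variable names are illustrative only.
$number = '232-555-1234';

// Capturing group: the area code is stored in $captured[1].
preg_match('/^([0-9]{3})-[0-9]{3}-[0-9]{4}$/', $number, $captured);

// Non-capturing group: grouping only, nothing extra is stored.
preg_match('/^(?:[0-9]{3})-[0-9]{3}-[0-9]{4}$/', $number, $plain);

echo count($captured), "\n"; // 2: whole match plus the area code
echo count($plain), "\n";    // 1: whole match only
```

When the result is only used as a yes/no test, the second form avoids collecting back references that nobody reads.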
- Emails
Finally a RegExp which allows the plus symbol usage (e.g., salathe+spam@talkphp.com)! Your pattern states that the address must not start with a digit: that means any other character can be used, making @test@test.com (or even $_+@_.aa) valid addresses (correct me if I'm wrong, but I don't think they should be valid).
Again there is no need to wrap the @ or . characters with brackets. To accept a literal period/fullstop/dot character, outside of character classes, simply escape it: \.
- Postal Codes
See above, regarding {5,5} and {4,4}, and the same for the non-capturing groups.
- Ip Address:
There are no delimiters around the pattern. Attempting to run this code will throw an error.
- Hexadecimal Colors
For readability, I'd probably prefer to use 0-9 than \d. There's no need for the outer non-matching group that I can see.
- Multi-line Comments
This code will not run as-is, resulting in a warning ("Unknown modifier '*'"). The square brackets are not wanted here -- they denote the beginning and end of a character class definition which isn't what we want.
Next, the forward slashes for the beginning/end of the comments need to be escaped else they conflict with the pattern delimiters (another warning). The asterisks (*) will also need to be escaped.
The + beside the opening comment characters is not necessary, and the .+ needs to be made ungreedy else you'll run into problems.
Finally, this will not work if the comment is truly multi-line. You'll want the s (dot-all) pattern modifier so that the . matches newlines as well. The m (multi-line) modifier only changes how ^ and $ behave, so with no anchors in the pattern it is harmless but not required.
Suggestion: #/\*.*?\*/#ms
- Dates
The "date" 99/99/9999 would be accepted. It's probably wiser to use one or more of PHP's date functions, or some other method, if you want to check that the date supplied is an actual, valid, date.
Is devolio.com your site?
|
|
|
|
|
|
|
12-05-2007, 12:52 AM
|
#5 (permalink)
|
|
La Vida es Sueño
Join Date: Sep 2007
Location: Oldham
Posts: 2,280
Thanks: 90
|
Quote:
Originally Posted by Salathe
The "date" 99/99/9999 would be accepted. It's probably wiser to use one or more of PHP's date functions, or some other method, if you want to check that the date supplied is an actual, valid, date.
|
Top post. Thanks! For the above checking, the checkdate function has never failed me. Simply pass it a month, day and year, and it will tell you whether they form a valid date. It checks that the number of days is correct for the specified month, and even takes leap years into consideration! Can't beat that function, in my opinion, and I doubt any single regular expression could match it.
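As a rough sketch of how the two can be combined, the regular expression below only checks the MM/DD/YYYY shape and captures the parts, and checkdate() then decides whether they form a real calendar date. The helper name is_valid_us_date() is made up for this example, and the D modifier is used so that a trailing newline is not accepted.

```php
<?php
// Sketch: shape check with a regex, calendar check with checkdate().
// is_valid_us_date() is a hypothetical helper name.
function is_valid_us_date($text)
{
    // Capture month, day and year from MM/DD/YYYY (D = no trailing newline).
    if (!preg_match('#^(\d{1,2})/(\d{1,2})/(\d{4})$#D', $text, $m)) {
        return false;
    }
    // checkdate() expects month, day, year as integers.
    return checkdate((int) $m[1], (int) $m[2], (int) $m[3]);
}

var_dump(is_valid_us_date('10/15/2007')); // bool(true)
var_dump(is_valid_us_date('99/99/9999')); // bool(false)
var_dump(is_valid_us_date('02/29/2007')); // bool(false), 2007 is not a leap year
var_dump(is_valid_us_date('02/29/2008')); // bool(true)
```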
|
|
|
12-05-2007, 01:08 AM
|
#6 (permalink)
|
|
The Contributor
Join Date: Oct 2007
Location: Argentina
Posts: 72
Thanks: 18
|
Revised Regular expressions:
TELEPHONE
PHP Code:
$string = "(232) 555-5555";
if (preg_match('/^(?:\(?[0-9]{3}\)?|[0-9]{3}[-. ]?) [0-9]{3}[-. ]?[0-9]{4}$/', $string)) {
echo "example 2 successful.";
}
EMAIL
PHP Code:
$string = "first.last@domain.co.uk";
if (preg_match(
'/^[^\W][a-zA-Z0-9_]+(\.[a-zA-Z0-9_]+)*\@[a-zA-Z0-9_]+(\.[a-zA-Z0-9_]+)*\.[a-zA-Z]{2,4}$/',
$string)) {
echo "example 3 successful.";
}
POSTAL CODE
PHP Code:
$string = "55324-4324";
if (preg_match('/^[0-9]{5}([- ]?[0-9]{4})?$/', $string)) {
echo "example 4 successful.";
}
IP ADDRESS
PHP Code:
$string = "255.255.255.0";
if (preg_match(
'/^(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]\d|\d)(?:[.](?:25[0-5]|2[0-4]\d|1\d\d|[1-9]\d|\d)){3}$/',
$string)) {
echo "example 5 successful.";
}
HEXADECIMAL COLORS
PHP Code:
<?php
$string = "#666666";
if (preg_match('/^#(?:(?:[a-f0-9]{3}){1,2})$/i', $string)) {
echo "example 6 successful.";
}
?>
MULTI-LINE COMMENTS
PHP Code:
<?php $string = "/* commmmment */";
if (preg_match('#/\*.*?\*/#ms', $string)) {
echo "example 7 successful.";
} ?>
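To see how the comment pattern behaves on a comment that really spans several lines, here is a small sketch. The sample source string is made up, and only the s modifier is used because the pattern has no ^ or $ anchors for m to affect.

```php
<?php
// Sketch: the revised comment pattern against a comment spanning lines.
// The sample source string is illustrative only.
$source = "/* first line\n   second line */\n\$x = 1; /* short */";

// s (dot-all) lets . cross newlines; .*? stops at the first closing */.
$count = preg_match_all('#/\*.*?\*/#s', $source, $found);

echo $count, "\n";  // 2
print_r($found[0]); // both comments, each matched separately

// Without s, the dot stops at the newline, so only the one-line comment matches.
$countWithoutS = preg_match_all('#/\*.*?\*/#', $source, $ignored);
echo $countWithoutS, "\n"; // 1
```

The ungreedy .*? matters here: a greedy .* with the s modifier would run from the first /* to the last */ and swallow the code in between.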
Thanks for the feedback, guys. I followed your suggestions and made the necessary changes. Let me know if these are OK or if you have anything else to share or add.
Thanks again!
|
|
|
|
|
|
|
12-05-2007, 03:05 AM
|
#7 (permalink)
|
|
Moderateur
Join Date: Apr 2007
Posts: 1,393
Thanks: 5
|
Thanks for taking the time to look over my comments! I really hate long posts full of points like that -- it's all too easy to come across as condescending, something I don't want people to see in my posts.
So for the revised patterns, good job with them. There are still some points that I'd like to pick up on but, mercifully, far fewer than before. To make up for that, I'll go into mind-numbing detail!
- TELEPHONE
The part of the pattern that takes care of the area code seems a little unwieldy at the moment. Essentially, "reading" the pattern, we are asking for:
Either: Three digits optionally starting with an opening parenthesis and optionally ending with a closing parenthesis.
Or: Three digits optionally followed by any one of -, . or a space.
The main thing bothering me is, in the first instance, the optional opening and optional closing parentheses. Surely we want the pattern to read as:
Either: Three digits wrapped by parentheses.
Or: Three digits (optionally) followed by a dash or dot.
If we didn't change anything, then telephone numbers like (232 555 1234 would be accepted. In production, you might want to accept mistyped phone numbers like that, but for the sake of making a point let's assume that if one parenthesis is present then we must have the other.
Also, at the moment, there must always be a space character between the area code and the rest of the number. Numbers like 232-555-5555 will not be accepted unless we make that space optional.
I propose a pattern which reads as the following:
Either: Three digits wrapped by parentheses, optionally followed by a space.
Or: Three digits, optionally followed by -, . or a space.
Followed by: Three digits, optionally followed by -, . or a space.
Followed by: Four digits.
As such, a further revised pattern:
/^(?:\(\d{3}\) ?|\d{3}[-. ]?)\d{3}[-. ]?\d{4}$/
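A quick way to check that reading is to run the pattern over a handful of inputs. This is only a sketch: the sample numbers are invented, and the $results array is just a convenience for the demonstration.

```php
<?php
// Sketch: trying the further revised telephone pattern on sample input.
// The sample numbers are made up for illustration.
$pattern = '/^(?:\(\d{3}\) ?|\d{3}[-. ]?)\d{3}[-. ]?\d{4}$/';

$samples = array(
    '(232) 555-5555', // parentheses plus space
    '232-555-5555',   // dashes only
    '232.555.5555',   // dots
    '2325555555',     // no separators at all
    '(232 555 1234',  // unbalanced parenthesis
    '232) 555-1234',  // unbalanced parenthesis
);

$results = array();
foreach ($samples as $sample) {
    $results[$sample] = (preg_match($pattern, $sample) === 1);
    echo $sample, ' => ', $results[$sample] ? 'accepted' : 'rejected', "\n";
}
```

The first four samples are accepted and the two with a lone parenthesis are rejected, which is the behaviour described above.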
- EMAIL
I'm particularly interested with the first character class definition: [^\W]
It is basically asking for any character which is not a character which is not allowed in a 'word'. Confusing! Functionally identical is [\w] (or even just \w!) which is simply asking for any 'word' character.
Reading the PHP manual on regex syntax will give you the whole story, but essentially a word character is alphanumeric or the underscore (with other characters available depending on the locale being used). So interestingly you use the character class [a-zA-Z0-9_] later in the pattern which just happens to be what we just described -- alphanumeric or underscore.
Therefore, the start of the pattern is asking for two or more 'word' characters. These can then be followed by zero or more occurrences of "a dot followed by one or more 'word' characters", and then the rest matches the @ and the domain.
A proposed additional revision would be to head for:
/^\w[\w+-]*(?:\.[\w+-]+)*@[a-z0-9-]+(?:\.[a-z0-9-]+)*\.[a-z]{2,4}$/i
Hopefully the above is clear about what it will (and will not!) allow. We need to draw the line somewhere with regard to which characters to allow: the actual RFC 822 specification for email addresses is very wide-ranging ($@$ is a valid email address, not that it will work for you!).
Note that the above suggestion (and yours) would allow addresses like __.__@gmail.com which whilst valid, may not be something you're prepared to allow. Note to everyone: do not confuse pattern matching with making sure that the email exists.
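Here is a small sketch that runs the addresses mentioned in this thread through the proposed pattern. Note that the pattern below has a + after the domain-label class, which is an assumption about the intent: without it, labels longer than one character (such as "co" in domain.co.uk) would not match.

```php
<?php
// Sketch: the proposed email pattern, with a + after the domain-label class
// (an assumption about the intent, so multi-character labels such as "co" match).
$pattern = '/^\w[\w+-]*(?:\.[\w+-]+)*@[a-z0-9-]+(?:\.[a-z0-9-]+)*\.[a-z]{2,4}$/i';

// Addresses taken from the discussion in this thread.
$samples = array(
    'first.last@domain.co.uk',
    'salathe+spam@talkphp.com',
    '__.__@gmail.com',
    '@test@test.com',
    '$_+@_.aa',
);

$results = array();
foreach ($samples as $sample) {
    $results[$sample] = (preg_match($pattern, $sample) === 1);
    echo $sample, ' => ', $results[$sample] ? 'accepted' : 'rejected', "\n";
}
```

The first three are accepted and the last two are rejected. As noted, a match only says the address has a plausible shape, not that the mailbox exists.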
- POSTAL CODE
My only suggestion would be to make the optional group non-capturing.
E.g., /^[0-9]{5}(?:[- ]?[0-9]{4})?$/
- IP ADDRESS
I prefer not to put single characters into a character class definition, as you have with [.].
E.g., /^(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]\d|\d)(?:\.(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]\d|\d)){3}$/
- HEXADECIMAL COLORS
Just some tidying up really, removing a grouping.
Suggestion: /^#(?:[a-f0-9]{3}){1,2}$/i
Please note that all of the above points are just examples of what one could do and should by no means be seen as drop-in solutions for particular needs. Think of them more as handy starting points to refine for your exact needs. Go play! 
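For anyone who wants to play with the tidied postal code, IP address and hexadecimal colour patterns, a table-driven check is a convenient starting point. This is just a sketch: the sample values and the array layout are my own choices, not part of the original patterns.

```php
<?php
// Sketch: table-driven check of the tidied postal code, IP and hex colour patterns.
// Sample values are illustrative only.
$patterns = array(
    'postal' => '/^[0-9]{5}(?:[- ]?[0-9]{4})?$/',
    'ip'     => '/^(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]\d|\d)(?:\.(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]\d|\d)){3}$/',
    'hex'    => '/^#(?:[a-f0-9]{3}){1,2}$/i',
);

$samples = array(
    'postal' => array('55324-4324', '55324', '5532'),
    'ip'     => array('255.255.255.0', '256.1.1.1', '1.2.3'),
    'hex'    => array('#666666', '#FFF', '#12345'),
);

$results = array();
foreach ($samples as $name => $values) {
    foreach ($values as $value) {
        $results[$name][$value] = (preg_match($patterns[$name], $value) === 1);
        echo $name, ' ', $value, ' => ',
            $results[$name][$value] ? 'accepted' : 'rejected', "\n";
    }
}
```

In each group the last value (and, for the IP pattern, the out-of-range 256.1.1.1 as well) is rejected; adding your own edge cases to $samples is an easy way to refine a pattern.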
|
|
|
|
|
|
|
12-05-2007, 04:38 PM
|
#8 (permalink)
|
|
The Contributor
Join Date: Oct 2007
Location: Argentina
Posts: 72
Thanks: 18
|
Big thanks, Salathe. I appreciate the attention to detail in your feedback. I don't have much background in regular expressions, so these things are real eye-openers, and I love seeing how they can be improved.
Quote:
Originally Posted by Salathe
Please note that all of the above points are just examples of what one could do and should by no means be seen as drop-in solutions for particular needs. Think of them more as handy starting points to refine for your exact needs. Go play!
|
I couldn't have said that better myself; I totally agree with that statement.
Quote:
Originally Posted by devolio
Hey everyone,
Though I haven't had a chance to look through all of your responses, I've skimmed over most of them, and appreciate the feedback and improvements. I know, they aren't perfect, but (at one point or another,) they got the job done for me.
When I get a chance tomorrow, I'll read through all of this more thoroughly.
|
Hi devolio, thanks for sharing these regular expressions! Welcome to these forums.
|
|
|
|
12-05-2007, 02:48 AM
|
#9 (permalink)
|
|
La Vida es Sueño
Join Date: Sep 2007
Location: Oldham
Posts: 2,280
Thanks: 90
|
Much better  ! Does that mean we can now look down on the original website which published the regular expressions? Oh let's!
Honestly though, it's good to see progress like this. I posted a comment on the website referring the blog owner to this thread; whether or not he (or she) follows it up is another question. I just know I personally would, as I really appreciate it when people tell me I could do something better.
|
|
|
12-05-2007, 04:37 AM
|
#10 (permalink)
|
|
The Wanderer
Join Date: Dec 2007
Posts: 13
Thanks: 2
|
Hey everyone,
Though I haven't had a chance to look through all of your responses, I've skimmed over most of them, and appreciate the feedback and improvements. I know, they aren't perfect, but (at one point or another,) they got the job done for me.
When I get a chance tomorrow, I'll read through all of this more thoroughly.
|
|
|
|
12-05-2007, 05:57 AM
|
#11 (permalink)
|
|
The Prestige
Join Date: Sep 2007
Location: Sweden, Stockholm
Posts: 1,080
Thanks: 115
|
Awesome thread :D
|
|
|
|
12-05-2007, 05:06 PM
|
#12 (permalink)
|
|
La Vida es Sueño
Join Date: Sep 2007
Location: Oldham
Posts: 2,280
Thanks: 90
|
We're all here to learn, Devolio. To be fair, even if I had written those regular expressions, Salathe would have pulled them apart. He's just too damned good! Grr. Seriously though, I love threads like this: it feels like a real community, with everybody pulling together to find the best solutions.
|
|
|
12-06-2007, 05:50 PM
|
#13 (permalink)
|
|
The Wanderer
Join Date: Dec 2007
Posts: 13
Thanks: 2
|
Finally got a chance to go through all of these (holidays are crazy,) and I couldn't agree more with most of your points.
And I more than appreciate all of the feedback, knowledge is power. :)
Would anyone who's contributed to the updates mind if I posted the updates on my site with credit to the site? (or to the individuals?)
|
|
|
|
12-06-2007, 05:57 PM
|
#14 (permalink)
|
|
Moderateur
Join Date: Apr 2007
Posts: 1,393
Thanks: 5
|
Thanks for taking the time to wade through the comments, devolio. I'm quite happy for anything that I've said to be posted up on your site as I'm sure everyone else here is (but best let them say that explicitly). We're all about sharing and building on each others' strength here. 
|
|
|
|
12-06-2007, 05:58 PM
|
#15 (permalink)
|
|
La Vida es Sueño
Join Date: Sep 2007
Location: Oldham
Posts: 2,280
Thanks: 90
|
I don't mind you crediting TalkPHP at all. I can't speak for the individuals though  !
Thank you, Devolio!
|
|
|
12-06-2007, 06:01 PM
|
#16 (permalink)
|
|
The Acquainted
Join Date: Sep 2007
Posts: 133
Thanks: 6
|
If you need one for an email address, here's something I came across in my functions file (not sure whose it is, but it's not mine):
PHP Code:
function check_email($email) {
    if (preg_match('/^\w[-.\w]*@(\w[-._\w]*\.[a-zA-Z]{2,}.*)$/', $email, $matches)) {
        if (function_exists('checkdnsrr')) {
            if (checkdnsrr($matches[1] . '.', 'MX')) return true;
            if (checkdnsrr($matches[1] . '.', 'A')) return true;
        } else {
            if (!empty($hostName)) {
                if ($recType == '') $recType = "MX";
                exec("nslookup -type=$recType $hostName", $result);
                foreach ($result as $line) {
                    if (eregi("^$hostName", $line)) {
                        return true;
                    }
                }
                return false;
            }
            return false;
        }
    }
    return false;
}
|
|
|
|
|
|
12-06-2007, 06:14 PM
|
#17 (permalink)
|
|
The Wanderer
Join Date: Dec 2007
Posts: 13
Thanks: 2
|
As long as the individual contributors don't mind.
Also, I've (heard) that validating e-mail addresses with Regex is almost impossible, and have been pointed to this a few times.
http://www.iamcal.com/publish/articl...parsing_email/
|
|
|
|
12-06-2007, 06:15 PM
|
#18 (permalink)
|
|
La Vida es Sueño
Join Date: Sep 2007
Location: Oldham
Posts: 2,280
Thanks: 90
|
One or two questions, WinSrev.
Surely the $hostName variable will always be empty, as it's not being set anywhere, unless I'm overlooking something.
Also, you may wish to use preg_match as opposed to the ereg_*/ eregi_* function set as the former is faster, and they're removing the latter from PHP 6.
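As a rough sketch of that swap, the eregi("^$hostName", $line) call could become something like the following. The values of $hostName and $line are made-up samples, and preg_quote() is an addition of mine so that the dots in the host name are treated literally.

```php
<?php
// Sketch: a preg_match() counterpart of eregi("^$hostName", $line).
// $hostName and $line are sample values chosen for illustration.
$hostName = 'example.com';
$line     = 'EXAMPLE.COM mail exchanger = 10 mx.example.com.';

// preg_quote() escapes regex metacharacters (the dots) and the / delimiter;
// the i modifier gives the case-insensitive matching that eregi provided.
$pattern = '/^' . preg_quote($hostName, '/') . '/i';

$matchesHost  = (preg_match($pattern, $line) === 1);
$matchesOther = (preg_match($pattern, 'examplexcom something else') === 1);

var_dump($matchesHost);  // bool(true)
var_dump($matchesOther); // bool(false), the dot is matched literally
```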
|
|
|
12-06-2007, 07:38 PM
|
#19 (permalink)
|
|
The Acquainted
Join Date: Sep 2007
Posts: 133
Thanks: 6
|
Quote:
Originally Posted by Wildhoney
One or two questions, WinSrev.
Surely the $hostName variable will always be empty, as it's not being set anywhere, unless I'm overlooking something.
Also, you may wish to use preg_match as opposed to the ereg_*/ eregi_* function set as the former is faster, and they're removing the latter from PHP 6.
|
More than likely, although I may take another look at it and rewrite it. I'll be sure to post the updated version.
|
|
|
12-09-2007, 11:19 AM
|
#20 (permalink)
|
|
The Contributor
Join Date: Dec 2007
Location: Belgium
Posts: 60
Thanks: 6
|
Great topic. I love talking about all the tiny details of regular expressions. They do matter!
Nobody has made a comment on the very first regex though.
Quote:
Originally Posted by Matt83
Validating a Username:
PHP Code:
$string = "userNaME4234432_";
if (preg_match('/^[a-z\d_]{4,28}$/i', $string)) {
echo "example 1 successful.";
}
|
I'm telling you this username regex is flawed. Have a good look at it. Think. Test it. Then read on for the explanation.
The Problem
Okay, the often-made mistake is to think that $ matches only at the very end of a string. The truth is that, by default, $ also matches immediately before the final character if that character is a newline.
This means that usernames containing a newline character at the end would pass. You wouldn't want that, right?
The Solution
Simply alter the meaning of $ by applying the D modifier to your regex. This will make it match only at the real end of the string. Alternatively, you could use the \z metacharacter.
The example below should make it all crystal clear, I hope.
PHP Code:
$username = "mr_newline\n";
preg_match('/^[a-z\d_]{4,28}$/i', $username); // TRUE
preg_match('/^[a-z\d_]{4,28}$/iD', $username); // FALSE
preg_match('/^[a-z\d_]{4,28}\z/i', $username); // FALSE