Old 12-05-2007, 03:05 AM   #8 (permalink)
Salathe
Moderator

Thanks for taking the time to look over my comments! I really hate long posts full of points like that -- it's all too easy to come across as condescending, which is not how I want my posts to read.

So for the revised patterns, good job with them. There are still some points that I'd like to pick up on but, mercifully, far fewer than before. To make up for that, I'll go into mind-numbing detail!
  1. TELEPHONE
    The part of the pattern taking care of the area code seems a little unwieldy at the moment. Essentially, "reading" the pattern, we are asking for:
    Either: Three digits optionally starting with an opening parenthesis and optionally ending with a closing parenthesis.
    Or: Three digits optionally followed by any one of -, . or a space.
    The main thing bothering me is, in the first instance, the optional opening and optional closing parentheses. Surely we want the pattern to read as:
    Either: Three digits wrapped by parentheses.
    Or: Three digits (optionally) followed by a dash, dot or space.
    If we didn't change anything, then telephone numbers like (232 555 1234 would be accepted. In production, you might want to accept phone numbers mis-typed like that, but for the sake of making a point let's assume that if one parenthesis is present then we must have the other.

    Also, at the moment, there must always be a space character between the area code and the rest of the number. Numbers like 232-555-5555 will not be accepted unless we make that space optional.

    I propose a pattern which reads as the following:
    Either: Three digits wrapped by parentheses, optionally followed by a space.
    Or: Three digits, optionally followed by -, . or a space.
    Followed by: Three digits, optionally followed by -, . or a space.
    Followed by: Four digits.
    As such, a further revised pattern:
    /^(?:\(\d{3}\) ?|\d{3}[-. ]?)\d{3}[-. ]?\d{4}$/
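
A quick way to try the pattern above is sketched here in Python rather than PHP, which is purely an assumption made for illustration: the / delimiters are dropped, re.ASCII keeps \d to 0-9, and the helper name is_phone is hypothetical.

```python
import re

# The proposed telephone pattern, ported from PHP preg syntax (delimiters removed).
PHONE = re.compile(r"^(?:\(\d{3}\) ?|\d{3}[-. ]?)\d{3}[-. ]?\d{4}$", re.ASCII)


def is_phone(text: str) -> bool:
    """Hypothetical helper: True when the whole string matches the pattern."""
    return PHONE.fullmatch(text) is not None


if __name__ == "__main__":
    samples = [
        "(232) 555-1234",  # both parentheses present: accepted
        "232-555-5555",    # no space needed after the area code: accepted
        "2325551234",      # separators are optional: accepted
        "(232 555 1234",   # lone opening parenthesis: rejected
    ]
    for sample in samples:
        print(f"{sample!r}: {is_phone(sample)}")
```

The lone-parenthesis case is the one the rewrite was aiming at: neither alternative of the area-code group can match it.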

  2. EMAIL
    I'm particularly interested in the first character class definition: [^\W]

    It is basically asking for any character which is not a non-'word' character. Confusing! Functionally identical is [\w] (or even just \w!), which simply asks for any 'word' character.

    Reading the PHP manual on regex syntax will give you the whole story, but essentially a word character is alphanumeric or the underscore (with other characters available depending on the locale being used). Interestingly, you use the character class [a-zA-Z0-9_] later in the pattern, which is exactly what we just described -- alphanumeric or underscore.
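
To see that the three spellings really do agree, here is a small check sketched in Python (an assumption for illustration, not PHP). The re.ASCII flag stands in for a plain locale so that \w means only letters, digits and underscore; the names below are hypothetical.

```python
import re
import string

# Three ways of writing "one word character".
NEGATED = re.compile(r"[^\W]", re.ASCII)   # the double negative from the original pattern
SHORTHAND = re.compile(r"\w", re.ASCII)    # the plain shorthand
EXPLICIT = re.compile(r"[a-zA-Z0-9_]")     # the spelled-out class


def matched_chars(pattern: re.Pattern) -> set:
    """Hypothetical helper: every ASCII character the pattern accepts."""
    return {chr(i) for i in range(128) if pattern.fullmatch(chr(i))}


if __name__ == "__main__":
    expected = set(string.ascii_letters + string.digits + "_")
    print(matched_chars(NEGATED) == expected)
    print(matched_chars(SHORTHAND) == expected)
    print(matched_chars(EXPLICIT) == expected)
```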

    Therefore, the start of the pattern is asking for 2 or more 'word' characters. These can then be followed by zero or more occurrences of a dot followed by a 'word' character, and then the rest matches the @ and domain.

    A proposed additional revision would be to head for:
    /^\w[\w+-]*(?:\.[\w+-]+)*@[a-z0-9-]+(?:\.[a-z0-9-]+)*\.[a-z]{2,4}$/i

    Hopefully the above should be clear in what it'll allow (and not allow!). We need to draw the line somewhere with regard to which characters to allow: the actual RFC 822 specification for email addresses is very wide ranging ($@$ is a valid email address, not that it will work for you!).

    Note that the above suggestion (and yours) would allow addresses like __.__@gmail.com which, whilst valid, may not be something you're prepared to allow. Note to everyone: do not confuse pattern matching with making sure that the email address exists.
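
For anyone wanting to poke at the email pattern, here is a sketch in Python (an assumption for illustration; the /i modifier becomes re.IGNORECASE and the helper name is_email is hypothetical). Note the + quantifier inside the optional domain-label group: without it, each extra label could only be one character long and hosts such as mail.example.com would be refused.

```python
import re

# Email pattern ported from PHP preg syntax; re.ASCII keeps \w to [a-zA-Z0-9_].
EMAIL = re.compile(
    r"^\w[\w+-]*(?:\.[\w+-]+)*@[a-z0-9-]+(?:\.[a-z0-9-]+)*\.[a-z]{2,4}$",
    re.IGNORECASE | re.ASCII,
)


def is_email(text: str) -> bool:
    """Hypothetical helper: checks the shape only, never whether the mailbox exists."""
    return EMAIL.fullmatch(text) is not None


if __name__ == "__main__":
    samples = [
        "user@example.com",                   # accepted
        "first.last+tag@mail.example.co.uk",  # dots, plus sign, several labels: accepted
        "__.__@gmail.com",                    # accepted, as warned above
        "user..name@example.com",             # empty part between dots: rejected
        "$@$",                                # outside the line drawn here: rejected
    ]
    for sample in samples:
        print(f"{sample!r}: {is_email(sample)}")
```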

  3. POSTAL CODE
    My only suggestion would be to make the optional group non-capturing.
    E.g., /^[0-9]{5}(?:[- ]?[0-9]{4})?$/
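
The difference a non-capturing group makes can be seen in this Python sketch (an assumption for illustration, not PHP). The capturing variant is my guess at how the group looked before the change, and all names are hypothetical.

```python
import re

# Assumed "before" form: the optional +4 part is a capturing group.
CAPTURING = re.compile(r"^[0-9]{5}([- ]?[0-9]{4})?$")
# Suggested form: same matches, but nothing is stored for the group.
NON_CAPTURING = re.compile(r"^[0-9]{5}(?:[- ]?[0-9]{4})?$")


def is_postal_code(text: str) -> bool:
    """Hypothetical helper: True when the whole string matches the suggested pattern."""
    return NON_CAPTURING.fullmatch(text) is not None


if __name__ == "__main__":
    code = "90210-1234"
    print(CAPTURING.fullmatch(code).groups())      # one captured group: ('-1234',)
    print(NON_CAPTURING.fullmatch(code).groups())  # no groups at all: ()
    print(is_postal_code("90210"), is_postal_code("90210-123"))
```

Both patterns accept and reject exactly the same strings; the only change is that the match result no longer carries a group nobody asked for.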

  4. IP ADDRESS
    I prefer not to put single characters into a character class definition, as you have with [.].
    E.g., /^(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]\d|\d)(?:\.(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]\d|\d)){3}$/
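
Here is a Python sketch for trying the IP address pattern (an assumption for illustration; the delimiters are dropped, re.ASCII keeps \d to 0-9, and the helper name is_ipv4 is hypothetical). Each octet alternative covers one range: 250-255, 200-249, 100-199, 10-99 and 0-9.

```python
import re

# Dotted-quad pattern ported from PHP preg syntax, copied as suggested.
IPV4 = re.compile(
    r"^(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]\d|\d)"
    r"(?:\.(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]\d|\d)){3}$",
    re.ASCII,
)


def is_ipv4(text: str) -> bool:
    """Hypothetical helper: True when the whole string is a dotted quad of 0-255 octets."""
    return IPV4.fullmatch(text) is not None


if __name__ == "__main__":
    samples = [
        "192.168.0.1",      # accepted
        "255.255.255.255",  # upper bound of every octet: accepted
        "256.1.1.1",        # octet out of range: rejected
        "01.2.3.4",         # leading zero is not covered by any alternative: rejected
        "1.2.3",            # too few octets: rejected
    ]
    for sample in samples:
        print(f"{sample!r}: {is_ipv4(sample)}")
```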

  5. HEXADECIMAL COLORS
    Just some tidying up really, removing a grouping.
    Suggestion: /^#(?:[a-f0-9]{3}){1,2}$/i
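
And a Python sketch for the colour pattern (an assumption for illustration; the /i modifier becomes re.IGNORECASE and the helper name is_hex_color is hypothetical). Repeating a three-character group once or twice allows exactly 3 or 6 hex digits and nothing in between.

```python
import re

# Hex colour pattern ported from PHP preg syntax.
HEX_COLOR = re.compile(r"^#(?:[a-f0-9]{3}){1,2}$", re.IGNORECASE)


def is_hex_color(text: str) -> bool:
    """Hypothetical helper: True for #rgb or #rrggbb, in either letter case."""
    return HEX_COLOR.fullmatch(text) is not None


if __name__ == "__main__":
    samples = [
        "#fff",     # short form: accepted
        "#FFAA00",  # long form, upper case: accepted
        "#ffff",    # four digits: rejected
        "fff",      # missing the leading #: rejected
    ]
    for sample in samples:
        print(f"{sample!r}: {is_hex_color(sample)}")
```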


Please note that all of the above points are just examples of what one could do and should by no means be seen as drop-in solutions for particular needs. Think of them more as handy starting points to refine for your exact needs. Go play!