Can you explain why you chose to tokenise the string rather than use regular expressions to extract the email addresses? Also, the patterns used to remove "invalid" characters and determine valid addresses are far from ideal. For example, they would allow _@-.zz but disallow firstname.lastname@example.org (the first being an invalid address and the second a valid one).
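To illustrate the regular-expression approach, here is a rough sketch in Python. It is not the script under discussion and not a full RFC 5322 validator; the pattern, the name EMAIL_RE and the function extract_emails are all made up for this example. It assumes plain ASCII addresses, allows + in the local part, and requires each domain label to start and end with a letter or digit.

```python
import re

# Simplified, hypothetical pattern: good enough to pull likely addresses
# out of free text, but it does not cover every address the RFCs allow.
EMAIL_RE = re.compile(
    r"[A-Za-z0-9._%+-]+"                                # local part, '+' allowed
    r"@"
    r"(?:[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?\.)+"  # labels never start or end with '-'
    r"[A-Za-z]{2,}"                                     # top-level domain, letters only
)

def extract_emails(text):
    """Return every substring of text that matches the simplified pattern."""
    return EMAIL_RE.findall(text)

sample = "Write to firstname.lastname@example.org or user+tag@example.com, not _@-.zz"
print(extract_emails(sample))
```

With this pattern _@-.zz is skipped because the domain label begins with a hyphen, while the dotted and plus-tagged addresses are both picked up.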
You know, Salathe, even I forget from time to time that the + in an email address is valid. For example, if my email was email@example.com, I could specify it as firstname.lastname@example.org, and it would come right to my inbox at email@example.com. Many registration forms don't check for duplicates based on this either; if they allow the +, they will often treat it as a unique email address.
The man who comes back through the Door in the Wall will never be quite the same as the man who went out.
Just to clarify, your use of the + sign within an email address is incorrect, Wildhoney. Email sent to firstname.lastname@example.org would be routed through to email@example.com. I very often use it to label things that I think will be spammy, or to specify 'groups' that my mail software can then use to filter incoming mail. Sadly, all too many websites refuse to accept that the + symbol is a valid character when used like this. :(
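For anyone who wants to group or de-duplicate addresses by their + label, a minimal Python sketch follows. The function name split_plus_address and the "+shopping" tag are hypothetical, and it assumes the mail provider treats everything after the first + in the local part as a label, which many providers do but not all.

```python
def split_plus_address(address):
    """Split an address into (base address, label), assuming '+' marks a label."""
    # Split on the last '@' so the domain is taken from the end of the string.
    local, _, domain = address.rpartition("@")
    # Everything after the first '+' in the local part is treated as the label.
    base, _, tag = local.partition("+")
    return f"{base}@{domain}", tag

print(split_plus_address("email+shopping@example.com"))
print(split_plus_address("email@example.com"))
```

A registration form could compare the base addresses to spot duplicates, while a mail filter could sort on the label.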
Thanks for listening, Sunil, and good luck with your work on the script.