TalkPHP
 
 
Post #1 by Wildhoney, 12-06-2007, 06:38 PM
...Another regular expression problem!

I have the following code at the moment:

php Code:
$szChem = 'H2OCe';
// Split on each single uppercase letter; PREG_SPLIT_DELIM_CAPTURE keeps the letter
$aMatches = preg_split('~([^\P{Lu}])~', $szChem, -1, PREG_SPLIT_DELIM_CAPTURE);

print_r($aMatches);

Which outputs:

Code:
Array
(
    [0] => 
    [1] => H
    [2] => 2
    [3] => O
    [4] => 
    [5] => C
    [6] => e
)
I could remove the PREG_SPLIT_DELIM_CAPTURE flag, but I want to keep the uppercase characters I'm splitting on. If things were going right for me, the array would look like this:

Code:
Array
(
    [0] => H2
    [1] => O
    [2] => Ce
)
But I just can't seem to get it. Any ideas?
__________________
The man who comes back through the Door in the Wall will never be quite the same as the man who went out.

Last edited by Wildhoney: 12-06-2007 at 07:10 PM.
Post #2 by Wildhoney, 12-06-2007, 07:47 PM

I've sort of come up with a solution on my own, but it feels a little hackish with all those "OR"s (alternations) in it:

php Code:
$szChem = 'H2OCe5CO5L';
preg_match_all('~[^\P{Lu}]{1}[^\P{Ll}]+\d+|[^\P{Lu}]{1}\d+|[^\P{Lu}]{1}[^\P{Ll}]+|[^\P{Lu}]{1}~', $szChem, $aMatches);

print_r($aMatches);
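For comparison, the four alternatives in the pattern above (uppercase + lowercase + digits, uppercase + digits, uppercase + lowercase, uppercase alone) can be folded into a single pattern by making the lowercase and digit parts optional. This is a minimal sketch, not something posted in the thread, and it assumes every element looks like the test string's: one uppercase letter, then optional lowercase letters, then optional digits.

```php
<?php
// Sketch: U L+ D+ | U D+ | U L+ | U collapses to U L* D*.
// Assumes elements are one uppercase letter, optional lowercase, optional digits.
$szChem = 'H2OCe5CO5L';
preg_match_all('~\p{Lu}\p{Ll}*\d*~', $szChem, $aMatches);

// $aMatches[0] holds the full-pattern matches:
// array('H2', 'O', 'Ce5', 'C', 'O5', 'L')
print_r($aMatches[0]);
```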

It would be much easier if I could split on the uppercase letters but still keep each uppercase letter alongside the characters that follow it, as opposed to in a separate array index.
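One technique that does exactly that, which isn't used in this thread, is to split on a zero-width lookahead, (?=\p{Lu}), matching the empty position just before each uppercase letter. Because nothing is consumed, each uppercase letter stays at the start of its piece. A minimal sketch, assuming plain ASCII input like the test string:

```php
<?php
// Sketch: split at the empty position before each uppercase letter.
// The lookahead consumes nothing, so the letter stays with what follows it.
// PREG_SPLIT_NO_EMPTY drops the empty piece produced by the split at offset 0.
// (For UTF-8 input beyond ASCII, the u modifier would be needed.)
$szChem = 'H2OCe5CO5L';
$aParts = preg_split('~(?=\p{Lu})~', $szChem, -1, PREG_SPLIT_NO_EMPTY);

// $aParts is array('H2', 'O', 'Ce5', 'C', 'O5', 'L')
print_r($aParts);
```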
Post #3 by Salathe, 12-06-2007, 08:49 PM

If I've understood everything correctly, you'll want to take a simpler approach to this. We're going to use preg_split in an unconventional manner here; usually, one would use it to retrieve the items that sit between occurrences of a certain character or group of characters:
php Code:
$aWords = preg_split('/\s+/', "This is    an\texample");
// $aWords = array('This', 'is', 'an', 'example');
 

Instead, we're going to take a different route and capture the delimiter itself (in the example above, that would be the whitespace). We'll use a slightly more complicated regular expression that matches each complete piece we want, so the split strings between matches (what went into $aWords above) all come out empty. To return the captured delimiter matches we can use the PREG_SPLIT_DELIM_CAPTURE flag, which adds whatever the parenthesised group in the pattern matched to the result, and to ignore the empty split strings we'll use PREG_SPLIT_NO_EMPTY.
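To make the two flags concrete, here is a short sketch reusing the whitespace example from above (the variable names are hypothetical). The first call keeps the whitespace delimiters through a capturing group; the second captures each word and consumes the whitespace after it, so every split string is empty and PREG_SPLIT_NO_EMPTY discards them, which is the same trick the solution below relies on.

```php
<?php
$szText = "This is    an\texample";

// Capture the delimiter: the whitespace runs now appear between the words.
$aWithDelims = preg_split('/(\s+)/', $szText, -1, PREG_SPLIT_DELIM_CAPTURE);
// array('This', ' ', 'is', '    ', 'an', "\t", 'example')

// Capture each word and consume the whitespace after it: the split strings
// between matches are all empty, so PREG_SPLIT_NO_EMPTY removes them.
$aWordsOnly = preg_split('/(\S+)\s*/', $szText, -1,
                         PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
// array('This', 'is', 'an', 'example')
```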

To cut a long story short, I think the example below should work for what you need, Wildhoney:

php Code:
$szChem = 'H2OCe5CO5L';
// -1 means "no limit" (passing null here is deprecated as of PHP 8.1)
$aMatches = preg_split('/(\p{Lu}\P{Lu}*)/', $szChem, -1,
                       PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
print_r($aMatches);
/*
Array
(
    [0] => H2
    [1] => O
    [2] => Ce5
    [3] => C
    [4] => O5
    [5] => L
)
*/

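Where it looks like you've been going wrong is that you were only using the \P escape sequence. \P means 'any character which does not have' the property that follows: \P{Lu} matches any single character that is not an uppercase letter. Its counterpart, \p, does the inverse, matching 'any character which has' the property that follows: \p{Lu} matches a single uppercase letter. Your [^\P{Lu}] is a double negative ('not a non-uppercase character') that means the same as \p{Lu}, just less readably. In the pattern above, \p{Lu} matches the uppercase letter and \P{Lu}* then takes everything up to the next uppercase letter, so each captured piece is a complete element such as Ce5.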
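A quick way to check the escape sequences against the test string: the sketch below (variable names hypothetical) collects what \p{Lu}, [^\P{Lu}] and \P{Lu} each match.

```php
<?php
$szChem = 'H2OCe5CO5L';

preg_match_all('~\p{Lu}~', $szChem, $aUpper);        // uppercase letters
preg_match_all('~[^\P{Lu}]~', $szChem, $aDoubleNeg); // same set, via a double negative
preg_match_all('~\P{Lu}~', $szChem, $aNotUpper);     // everything that is not uppercase

echo implode('', $aUpper[0]), "\n";     // HOCCOL
echo implode('', $aDoubleNeg[0]), "\n"; // HOCCOL
echo implode('', $aNotUpper[0]), "\n";  // 2e55
```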
Post #4 by Wildhoney, 12-06-2007, 09:06 PM

Thanks, I forgot that the uppercase variant (\P) means the inverse of the lowercase one (\p). Perfect!
All times are GMT.