TalkPHP

Wildhoney 12-06-2007 06:38 PM

...Another regular expression problem!
 
I have the following code at the moment:

php Code:
$szChem = 'H2OCe';
$aMatches = preg_split('~([^\P{Lu}])~', $szChem, -1, PREG_SPLIT_DELIM_CAPTURE);

print_r($aMatches);

Which outputs:

Code:

Array
(
    [0] =>
    [1] => H
    [2] => 2
    [3] => O
    [4] =>
    [5] => C
    [6] => e
)

I could remove the PREG_SPLIT_DELIM_CAPTURE flag, but I want to keep the uppercase character I'm splitting on. So if things were going right for me, the array would look like so:

Code:

Array
(
    [0] => H2
    [1] => O
    [2] => Ce
)

But I just can't seem to get it. Any ideas?

Wildhoney 12-06-2007 07:47 PM

I've sort of come up with a solution on my own, but it feels a little hackish with all those "OR"s in:

php Code:
$szChem = 'H2OCe5CO5L';
preg_match_all('~[^\P{Lu}]{1}[^\P{Ll}]+\d+|[^\P{Lu}]{1}\d+|[^\P{Lu}]{1}[^\P{Ll}]+|[^\P{Lu}]{1}~', $szChem, $aMatches);

print_r($aMatches);
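
The four alternatives share one shape: an uppercase letter, then optional lowercase letters, then optional digits. As a rough sketch, assuming every symbol in the formula follows that shape, they can be folded into a single pattern with * quantifiers. It also swaps the double negative [^\P{Lu}] for the equivalent \p{Lu}:

```php
<?php
$szChem = 'H2OCe5CO5L';

// One uppercase letter, then zero or more lowercase letters, then zero or more digits.
preg_match_all('~\p{Lu}\p{Ll}*\d*~', $szChem, $aMatches);

// $aMatches[0] holds the full matches: H2, O, Ce5, C, O5, L
print_r($aMatches[0]);
```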

It would be much easier if I could split on the uppercase letters but still keep them, with each letter included alongside its match as opposed to in a separate array index.
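
One way to get that, sketched here as an assumption rather than anything suggested in the thread, is to split on a zero-width lookahead. The pattern (?=\p{Lu}) matches the empty position just before each uppercase letter, so nothing is consumed and each letter stays attached to the piece that follows it:

```php
<?php
$szChem = 'H2OCe';

// (?=\p{Lu}) is a lookahead: it matches a position, not a character.
// PREG_SPLIT_NO_EMPTY drops any empty piece at the start of the string.
$aParts = preg_split('/(?=\p{Lu})/', $szChem, -1, PREG_SPLIT_NO_EMPTY);

print_r($aParts); // H2, O, Ce
```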

Salathe 12-06-2007 08:49 PM

If I've understood everything correctly, a simpler approach will do. We'll be using preg_split in an unconventional manner. Usually, one splits a string on a certain character or group of characters and keeps the pieces in between:

php Code:
$aWords = preg_split('/\s+/', "This is    an\texample");
// $aWords = array('This', 'is', 'an', 'example');
 

Instead, we're going to take a different route and grab the delimiter itself (in the example above, that would be the whitespace). A slightly more complicated regular expression matches each chunk we want as a delimiter, which leaves the split strings (what went into $aWords above) empty. To return the delimiter matches we use the flag PREG_SPLIT_DELIM_CAPTURE, and to ignore the empty split strings we use PREG_SPLIT_NO_EMPTY.

To cut a long story short, I think the example below should work for what you need, Wildhoney:

php Code:
$szChem = 'H2OCe5CO5L';
$aMatches = preg_split('/(\p{Lu}\P{Lu}*)/', $szChem, -1,
                       PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
print_r($aMatches);
/*
Array
(
    [0] => H2
    [1] => O
    [2] => Ce5
    [3] => C
    [4] => O5
    [5] => L
)
*/
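
To see what each flag contributes, here is a small sketch that runs the same pattern on the shorter formula from the first post with no flags, with one flag, and with both. The variable names are only illustrative:

```php
<?php
$szChem    = 'H2OCe';
$szPattern = '/(\p{Lu}\P{Lu}*)/';

// No flags: every character is consumed as a delimiter, leaving only empty pieces.
$aPlain = preg_split($szPattern, $szChem);
// array('', '', '', '')

// PREG_SPLIT_DELIM_CAPTURE: the captured delimiters appear between the empty pieces.
$aCaptured = preg_split($szPattern, $szChem, -1, PREG_SPLIT_DELIM_CAPTURE);
// array('', 'H2', '', 'O', '', 'Ce', '')

// Both flags: the empty pieces are dropped and only the delimiters remain.
$aBoth = preg_split($szPattern, $szChem, -1,
                    PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
// array('H2', 'O', 'Ce')

print_r($aBoth);
```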

Where it looks like you've been going wrong is that you were only using the \P escape sequence. It means 'any character which is not' whatever follows: \P{Lu} matches any single character which is not an uppercase letter. Its counterpart, \p, does the inverse by matching 'any character which is' whatever follows: \p{Lu} matches a single uppercase letter. Your negated class [^\P{Lu}] is a double negative that matches exactly what \p{Lu} matches.
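
The difference is easy to check one character at a time. This sketch tests an uppercase letter, a lowercase letter and a digit against \p{Lu}, \P{Lu} and the negated class [^\P{Lu}] from the original attempt; the array layout and key names are just for illustration:

```php
<?php
$aResults = array();

foreach (array('H', 'e', '2') as $szChar) {
    $aResults[$szChar] = array(
        'p'     => preg_match('/^\p{Lu}$/', $szChar),     // is an uppercase letter
        'P'     => preg_match('/^\P{Lu}$/', $szChar),     // is not an uppercase letter
        'not_P' => preg_match('/^[^\P{Lu}]$/', $szChar),  // double negative, same as \p{Lu}
    );
}

print_r($aResults);
// H => p: 1, P: 0, not_P: 1
// e => p: 0, P: 1, not_P: 0
// 2 => p: 0, P: 1, not_P: 0
```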

Wildhoney 12-06-2007 09:06 PM

Thanks :-) I forgot that the upper-case variation means the inverse to the lower-case. Perfect!

