If I've understood everything correctly, you'll want to take a simpler approach to this. We're using preg_split here in an unconventional manner; usually, one would expect to retrieve repeated items split by a certain character or group of characters:
php Code:
$aWords = preg_split('/\s+/', "This is an\texample");
// $aWords = array('This', 'is', 'an', 'example');
Instead, we're going to take a different route and grab the delimiters themselves (in the example above, that would be the whitespace). A slightly more complicated regular expression matches every part of the string we care about, so the split strings between the matches (what went into $aWords above) end up empty. To return the delimiter matches we wrap the pattern in a capturing group and pass the flag PREG_SPLIT_DELIM_CAPTURE, and to discard the empty split strings we add PREG_SPLIT_NO_EMPTY.
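As a rough illustration of what the two flags do on their own, here is a minimal sketch that reuses the whitespace example from above. The variable names ($szText, $aParts, $aPlain, $aTrimmed) and the padded string are made up for the demonstration.

```php
<?php
$szText = "This is an\texample";

// With a capturing group and PREG_SPLIT_DELIM_CAPTURE, the whitespace
// delimiters are returned alongside the words.
$aParts = preg_split('/(\s+)/', $szText, -1, PREG_SPLIT_DELIM_CAPTURE);
// $aParts = array('This', ' ', 'is', ' ', 'an', "\t", 'example');

// Leading and trailing delimiters normally produce empty strings...
$aPlain = preg_split('/\s+/', ' a b ');
// $aPlain = array('', 'a', 'b', '');

// ...which PREG_SPLIT_NO_EMPTY drops.
$aTrimmed = preg_split('/\s+/', ' a b ', -1, PREG_SPLIT_NO_EMPTY);
// $aTrimmed = array('a', 'b');
```

Combining both flags is what lets us keep only the delimiters when the pattern covers the whole string.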
To cut a long story short, I think the example below should work for what you need, Wildhoney:
php Code:
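$szChem = 'H2OCe5CO5L';
// Each token: one uppercase letter followed by any run of non-uppercase characters.
$aMatches = preg_split(
    '/(\p{Lu}\P{Lu}*)/',
    $szChem,
    -1, // no limit; passing null for this int parameter is deprecated as of PHP 8.1
    PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
);
print_r($aMatches);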
/*
Array
(
[0] => H2
[1] => O
[2] => Ce5
[3] => C
[4] => O5
[5] => L
)
*/
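If it helps to see why PREG_SPLIT_NO_EMPTY is needed, here is a small sketch that runs the same pattern with only PREG_SPLIT_DELIM_CAPTURE. Because the pattern consumes the entire string, every "split string" between the matches is empty. The names $aRaw and $aTokens are just for illustration, and the manual filtering is only there to show what the flag saves you from doing.

```php
<?php
$szChem = 'H2OCe5CO5L';

// Only PREG_SPLIT_DELIM_CAPTURE: the captured delimiters are interleaved
// with the (empty) strings found before, between and after them.
$aRaw = preg_split('/(\p{Lu}\P{Lu}*)/', $szChem, -1, PREG_SPLIT_DELIM_CAPTURE);
// $aRaw = array('', 'H2', '', 'O', '', 'Ce5', '', 'C', '', 'O5', '', 'L', '');

// Removing the empty strings by hand gives the same list as PREG_SPLIT_NO_EMPTY.
$aTokens = array_values(array_filter($aRaw, static function ($szPart) {
    return $szPart !== '';
}));
// $aTokens = array('H2', 'O', 'Ce5', 'C', 'O5', 'L');
```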
Where it looks like you've been going wrong is that you were only using the \P escape sequence. This means 'any character which is not' whatever follows: \P{Lu} matches any single character which is not an uppercase letter. The lowercase \p does the inverse by matching 'any character which is' whatever follows: \p{Lu} matches a single uppercase letter.
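To make the difference between the two escapes concrete, here is a quick sketch that applies each one separately to the same string. It also shows preg_match_all as an alternative to the preg_split approach (a different function from the one used above, not a correction of it). The array names $aUpper, $aOther and $aTokens are made up for the example.

```php
<?php
$szChem = 'H2OCe5CO5L';

// \p{Lu}: every character that IS an uppercase letter.
preg_match_all('/\p{Lu}/', $szChem, $aUpper);
// $aUpper[0] = array('H', 'O', 'C', 'C', 'O', 'L');

// \P{Lu}: every character that is NOT an uppercase letter.
preg_match_all('/\P{Lu}/', $szChem, $aOther);
// $aOther[0] = array('2', 'e', '5', '5');

// Combined: one uppercase letter, then any run of non-uppercase characters.
preg_match_all('/\p{Lu}\P{Lu}*/', $szChem, $aTokens);
// $aTokens[0] = array('H2', 'O', 'Ce5', 'C', 'O5', 'L');
```

Whether you prefer preg_split with the two flags or preg_match_all is mostly a matter of taste here, since both return the same tokens for this input.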