Thread: RegEx
12-11-2007, 06:37 PM   #2
Geert

Alright, questions 1 and 2 about greediness are closely related. Regex quantifiers are greedy by default. This means that quantifiers like *, + and ? will always try to match as much as possible while still letting the rest of the pattern match.

Let's take this example string: John said: "I like octopuses." Jeff added: "Especially orange ones."

When you apply the regex ".*" to that string the following happens:
  1. The regex starts looking for the first ".
  2. As soon as it finds one, the .* part kicks in.
  3. . matches any character (except a newline, by default), so .* races through to the end of the string.
  4. Then it starts giving characters back, one at a time, until the closing " in the regex can match. This process is called backtracking.
  5. This is the part of the original string that gets matched: "I like octopuses." Jeff added: "Especially orange ones."
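
If you want to watch the greedy match happen, here is a minimal sketch in Python's re module. Python is just my choice for illustration (the same pattern behaves the same way in PCRE), and the variable names are made up for the example.

```python
import re

# The example string from the steps above.
text = 'John said: "I like octopuses." Jeff added: "Especially orange ones."'

# Greedy: .* runs to the end of the string, then backtracks
# until the closing quote in the pattern can match.
greedy = re.search(r'".*"', text)
greedy_match = greedy.group(0)

print(greedy_match)
# "I like octopuses." Jeff added: "Especially orange ones."
```

The match runs from the first quote to the very last one, swallowing both quoted sentences and everything in between.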

Now, when you add a question mark after the quantifier (change .* to .*?), you make it ungreedy (aka lazy).

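.*? won't race through to the end of the string. Instead, it matches as few characters as possible: before consuming each character, it first checks whether the next part of the regex (the closing ") can match, and it stops as soon as it encounters one. Thus it only matches "I like octopuses.".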
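
A quick way to check the lazy behaviour yourself, again sketched in Python's re module (my choice of language, with illustrative variable names):

```python
import re

text = 'John said: "I like octopuses." Jeff added: "Especially orange ones."'

# Lazy: .*? expands one character at a time and stops at the first closing quote.
lazy = re.search(r'".*?"', text)
lazy_match = lazy.group(0)
print(lazy_match)
# "I like octopuses."

# With the lazy quantifier, each quoted sentence is found as its own match.
all_quotes = re.findall(r'".*?"', text)
print(all_quotes)
# ['"I like octopuses."', '"Especially orange ones."']
```

Because the lazy version stops at the nearest closing quote, repeating the search picks up each quoted sentence separately instead of one big match.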


Back to your original target string: <p>some <span>text</span></p><span>another</span>. Basically what you need to do is replace the " from my example with <span> and </span>.

The regex becomes: <span>(.*?)</span>.
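
To see the difference on your target string, here is a small sketch comparing the lazy and greedy versions. It uses Python's re module as an assumption on my part; the pattern itself is unchanged and the variable names are only for illustration.

```python
import re

html = '<p>some <span>text</span></p><span>another</span>'

# Lazy: each capture stops at the nearest </span>.
lazy_spans = re.findall(r'<span>(.*?)</span>', html)
print(lazy_spans)
# ['text', 'another']

# Greedy, for comparison: the capture runs on to the last </span>.
greedy_spans = re.findall(r'<span>(.*)</span>', html)
print(greedy_spans)
# ['text</span></p><span>another']
```

The lazy pattern gives you the contents of each span separately, while the greedy one lumps everything between the first <span> and the last </span> into a single capture.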

Try it. Play with it. Experiment. I hope I did a somewhat decent job in explaining this stuff.

I'll leave the non-capturing parentheses stuff for Salathe.