Alright, questions 1 and 2 about greediness are closely related. Regex quantifiers are greedy by default. This means that quantifiers like *, + and ? will always try to match as much as possible.
Let's take this example string:
John said: "I like octopuses." Jeff added: "Especially orange ones."
When you apply the regex ".*" to that string, the following happens:
- The regex starts looking for the first ".
- As soon as it finds one, the .* part kicks in. The . matches any character (except a newline, by default), so .* races through to the end of the string.
- Then it starts looking back, one character at a time, until it encounters another ". This process is called backtracking.
- This is the part of the original string that gets matched:
"I like octopuses." Jeff added: "Especially orange ones."
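If you want to see the greedy behaviour for yourself, here is a minimal sketch. I'm using Python's re module purely as an assumption for illustration (your own language's regex functions should behave the same way for this pattern), and the variable names are made up:

```python
import re

text = 'John said: "I like octopuses." Jeff added: "Especially orange ones."'

# Greedy: .* runs to the end of the string, then backtracks
# until the final " in the pattern can match (the last quote).
greedy_match = re.search(r'".*"', text).group(0)
print(greedy_match)
```

The match starts at the first quote and ends at the very last one, swallowing everything in between.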
Now, when you add a question mark after the quantifier (change .* to .*?), you make it ungreedy (aka lazy).
.*? won't race through to the end of the string. Instead, it matches as few characters as possible: before consuming each character it checks whether the next one is ", and it stops as soon as it encounters one. Thus it only matches "I like octopuses.".
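Here is the lazy version as a small sketch, again assuming Python's re module just for demonstration (the variable names are mine):

```python
import re

text = 'John said: "I like octopuses." Jeff added: "Especially orange ones."'

# Lazy: .*? expands one character at a time and stops at the first closing quote.
lazy_first = re.search(r'".*?"', text).group(0)

# Because each match now stops early, findall can pick up both quotations.
lazy_all = re.findall(r'".*?"', text)

print(lazy_first)
print(lazy_all)
```

Note that the lazy pattern finds each quotation separately, which the greedy one cannot do.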
Back to your original target string:
<p>some <span>text</span></p><span>another</span>
Basically, what you need to do is replace the " from my example with <span> and </span>.
The regex becomes:
<span>(.*?)</span>
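To compare both variants on your target string, something like the following should do. It is only a sketch, assuming Python's re module, and the variable names are hypothetical:

```python
import re

html = '<p>some <span>text</span></p><span>another</span>'

# Lazy capture: each match ends at the nearest </span>.
lazy_spans = re.findall(r'<span>(.*?)</span>', html)

# Greedy capture, for comparison: one match that runs to the last </span>.
greedy_spans = re.findall(r'<span>(.*)</span>', html)

print(lazy_spans)
print(greedy_spans)
```

The lazy pattern gives you the contents of each span on its own, while the greedy one lumps everything between the first <span> and the last </span> into a single capture.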
Try it. Play with it. Experiment. I hope I did a somewhat decent job in explaining this stuff.
I'll leave the non-capturing parentheses stuff for Salathe.
