Hello, people. Over time, I've run into some issues with regex (among other things) that still aren't clear to me. Either it's advanced stuff, or I simply couldn't find an article that answers my questions. So, here they are:
1. I have some HTML-formatted text:
Code:
<p>some <span>text</span></p><span>another</span>
...and a rule that *should* match the text in the first span (between the opening and its corresponding closing tag):
Code:
/<span(.*)>(.*)<\/span>/
That should do (although it's crude, since only one pair of tags will be matched), but the question is: how do I know that $2 will match only "text" and not "text</span></p><span>another"? How can I tell the regex engine to "stop when you run into the first </span>"? I believe the answer is related to the next question...
2. I'd be very thankful to someone who knows regular expressions well and has a little time to explain to me (and others, of course) the difference between "greedy" and "non-greedy" matches (with some simple examples). I've read about these in some books and gone through them a couple of times, but I still can't fully understand the technique.
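For reference, here is a minimal sketch of what the engine does with the text from question 1. The post doesn't say which language is in use, so Python's re module is an assumption here, and so is reading the first group as a plain (.*). The variable names are made up for illustration. In Python, $2 corresponds to group(2).

```python
import re

html = "<p>some <span>text</span></p><span>another</span>"

# Both groups greedy (assumed reading of the rule in the post).
# Group 1 swallows everything up to the LAST opening tag's ">".
both_greedy = re.search(r"<span(.*)>(.*)</span>", html)
print(both_greedy.group(1))   # >text</span></p><span
print(both_greedy.group(2))   # another

# Attribute part limited to [^>]*, body still greedy:
# the body now runs to the LAST </span>.
greedy_body = re.search(r"<span[^>]*>(.*)</span>", html)
print(greedy_body.group(1))   # text</span></p><span>another

# Lazy body (.*?): stops at the FIRST </span> that lets the match succeed.
lazy_body = re.search(r"<span([^>]*)>(.*?)</span>", html)
print(lazy_body.group(2))     # text
```

The lazy quantifier only helps with flat markup like this. If spans are nested, the first </span> it finds is not necessarily the matching one.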
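A few simple cases may make the difference in question 2 easier to see. This is only a sketch, again assuming Python's re module. The sample strings and variable names are invented for illustration. A greedy quantifier takes as much as it can and gives characters back only if the rest of the pattern fails. A non-greedy (lazy) one takes as little as it can and grows only when it has to.

```python
import re

text = 'say "one" and "two"'

# Greedy: .* runs to the end, then backs off to the LAST closing quote.
greedy_quotes = re.findall(r'"(.*)"', text)
print(greedy_quotes)   # ['one" and "two']

# Non-greedy: .*? stops at the FIRST closing quote each time.
lazy_quotes = re.findall(r'"(.*?)"', text)
print(lazy_quotes)     # ['one', 'two']

# Adding ? after any quantifier makes it lazy: +?, *?, ??, {m,n}?
greedy_plus = re.match(r"a+", "aaaa").group()
lazy_plus = re.match(r"a+?", "aaaa").group()
lazy_range = re.match(r"a{2,3}?", "aaaa").group()
print(greedy_plus, lazy_plus, lazy_range)   # aaaa a aa
```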
3. What is ?: supposed to do? I know that ? will make the previous match optional, meaning that the following expression will match both "a-bc" and "abc".
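To illustrate question 3, here is a small sketch, assuming Python's re module. The expression the post refers to is not shown, so a-?bc below is only a guess at it. The foo|bar pattern and all variable names are invented for the example. Written as (?:...), the ?: turns a group into a non-capturing one: it still groups, but it gets no number ($1, $2, ...).

```python
import re

# ? makes the preceding item optional. The pattern a-?bc is an assumed
# stand-in for the expression the post mentions but does not show.
optional_dash = re.compile(r"a-?bc")
matches_both = [bool(optional_dash.fullmatch(s)) for s in ("a-bc", "abc")]
print(matches_both)             # [True, True]

# (...) groups AND captures, so the alternation becomes group 1.
capturing = re.search(r"(foo|bar)-(\d+)", "bar-42")
print(capturing.groups())       # ('bar', '42')

# (?:...) groups WITHOUT capturing, so the number becomes group 1.
non_capturing = re.search(r"(?:foo|bar)-(\d+)", "bar-42")
print(non_capturing.groups())   # ('42',)
```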
Thanks in advance.