Here’s some PHP code from last year that recently reared its ugly head. Basically, it was designed to take a line of contact info from a business directory and automatically convert any URLs or email addresses to clickable links by identifying the patterns and wrapping them in the appropriate HTML:
    function directory_linkify($contact) {
        $rx_url = "#(?i)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'\".,<>?«»“”‘’]))#";
        if (preg_match_all($rx_url, $contact, $matches) > 0) {
            foreach ($matches[0] as $match) {
                $contact = str_replace($match, "<a href = 'http://" . str_replace('http://', '', $match) . "' target = '_blank'>" . str_replace('http://', '', $match) . "</a>", $contact);
            }
        }
        $rx_email = "#[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,3})#";
        if (preg_match_all($rx_email, $contact, $matches) > 0) {
            foreach ($matches[0] as $match) {
                $contact = str_replace($match, "<a href = 'mailto:" . $match . "'>" . $match . "</a>", $contact);
            }
        }
        return $contact;
    }
It worked great until someone added two URLs to the same line, so given this:
Bob Smith, 1-800-888-8888, www.bobsmith.com, www.bobsmith.com/very_important_page.html
I was seeing this result:
Bob Smith, 1-800-888-8888, <a href = 'http://www.bobsmith.com' target = '_blank'>www.bobsmith.com</a>, <a href = 'http://www.bobsmith.com' target = '_blank'>www.bobsmith.com</a>/very_important_page.html
WTF? The second URL wasn’t catching the path. But other tests with full path URLs were passing elsewhere.
At this point the undisciplined whack-a-mole debugger in me came out in full force (it didn’t help that the bug appeared late at night, when I should have been going to bed). Maybe the order of the URLs mattered, maybe the regex I was using was faulty, maybe I needed to end that second URL with something different, maybe, maybe…
Of course, as with most bugs, it was pretty simple once I stepped back and actually looked at what was happening. Let’s look at that foreach loop and assume the regex is finding URLs just fine, since it’s been working for a year already.
We’ve got two URLs in $matches[0]: www.bobsmith.com and www.bobsmith.com/very_important_page.html. So we loop.
In step one, we change www.bobsmith.com to <a href = 'http://www.bobsmith.com' target = '_blank'>www.bobsmith.com</a>.
And in step two, we change www.bobsmith.com/very_important_page.html to <a href = 'http://www.bobsmith.com/very_important_page.html' target = '_blank'>www.bobsmith.com/very_important_page.html</a>.
Except we don’t! str_replace replaces every occurrence of its search string, so step one also rewrote the start of the second URL, turning it into <a href = 'http://www.bobsmith.com' target = '_blank'>www.bobsmith.com</a>/very_important_page.html. The full second URL no longer appears anywhere in the text, which means the str_replace call in the second round of the loop never finds anything to replace.
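If you want to watch this happen without the regex getting in the way, here’s a stripped-down sketch of the loop. The hard-coded $matches array is my stand-in for what preg_match_all would return, the link markup is shortened, and $replaced is just a name I made up to record how many replacements each round performs (str_replace reports that through its optional fourth argument).

```php
<?php
// Minimal reproduction of the overlap problem, with the regex left out.
// $matches is a hard-coded stand-in for the result of preg_match_all().
$contact = "www.bobsmith.com, www.bobsmith.com/very_important_page.html";
$matches = ["www.bobsmith.com", "www.bobsmith.com/very_important_page.html"];

$replaced = []; // hypothetical helper: replacement count per round
foreach ($matches as $match) {
    $link = "<a href = 'http://" . $match . "'>" . $match . "</a>";
    // str_replace() rewrites EVERY occurrence of $match and stores
    // the number of replacements it made in $count.
    $contact = str_replace($match, $link, $contact, $count);
    $replaced[] = $count;
}

print_r($replaced); // round one makes 2 replacements, round two makes 0
echo $contact, "\n";
```

Round one hits both places where www.bobsmith.com appears, so by round two the longer URL has a closing </a> jammed into the middle of it and can’t be found.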
This is a great case where you’ve got a simple process that happens to involve a complicated distraction like a regex. When you’re debugging, it’s always a great idea to step through the function with a pencil and paper, and ideally with a second set of eyes, especially for relatively self-contained methods like this (which are the kinds you should strive to write anyway – hmm, maybe I should have split the email and URL detection out into separate methods…)
Anyway, in case you’re curious or need such a thing, here’s the replacement code that links properly. I doubt it’s perfect (I’m doing more PHP lately but this was from a while ago) so let me know if you spot any other bugs or opportunities for improvement!
    function directory_linkify($contact) {
        $rx_url = "#(?i)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'\".,<>?«»“”‘’]))#";
        if (preg_match_all($rx_url, $contact, $matches) > 0) {
            // Offset in $contact where the search for the next match starts.
            $index = 0;
            foreach ($matches[0] as $match) {
                // Find this match at or after the end of the previous replacement.
                $index = strpos($contact, $match, $index);
                $link = "<a href = 'http://" . str_replace('http://', '', $match) . "' target = '_blank'>" . str_replace('http://', '', $match) . "</a>";
                // Replace only this one occurrence, in place.
                $contact = substr_replace($contact, $link, $index, strlen($match));
                // Skip past the inserted link so a repeated URL is not found inside the new href.
                $index += strlen($link);
            }
        }
        // and then similar changes for email...
        return $contact;
    }
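For what it’s worth, another way to sidestep the problem is to let the regex engine do the matching and the replacing in a single pass with preg_replace_callback, so there’s never a second search over text that has already been rewritten. This is only a sketch under a few assumptions, not the code I actually shipped: linkify_urls is a hypothetical name, the pattern is a deliberately simplified stand-in for the big regex above, and the markup is tidied up a little. It also leaves an existing https:// prefix alone instead of prepending http:// to it.

```php
<?php
// Single-pass alternative: each match is replaced exactly where it was
// found, so overlapping or repeated URLs cannot interfere with each other.
// linkify_urls() is a hypothetical helper, and the pattern is a simplified
// stand-in for the full URL regex (an assumption made for brevity).
function linkify_urls(string $text): string
{
    // A URL starts with http://, https:// or www. and may not end in
    // common trailing punctuation.
    $pattern = '~\b(?:https?://|www\.)[^\s<>]*[^\s<>.,;:!?\'")\]]~i';

    $result = preg_replace_callback($pattern, function (array $m): string {
        $url = $m[0];
        // Only add a scheme when the match does not already have one.
        $href = preg_match('~^https?://~i', $url) ? $url : 'http://' . $url;
        // Escape both pieces before putting them into HTML.
        $safe_href = htmlspecialchars($href, ENT_QUOTES);
        $safe_text = htmlspecialchars($url, ENT_QUOTES);
        return "<a href='" . $safe_href . "' target='_blank'>" . $safe_text . "</a>";
    }, $text);

    // preg_replace_callback() returns null on a regex error; fall back to the input.
    return $result ?? $text;
}

echo linkify_urls("Bob Smith, 1-800-888-8888, www.bobsmith.com, www.bobsmith.com/very_important_page.html"), "\n";
```

With the troublesome line from above, both URLs come out as complete links, path included. The same shape would work for a separate email function, which is probably the split I should have made in the first place.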