Bug Report #41

xss_clean not working with PHP 5/Poor regex needs improvement

Added by Tido - almost 7 years ago. Updated almost 7 years ago.

Status:ClosedStart date:
Priority:HighDue date:
Assignee:Jim Auldridge -% Done:


Target version:1.0
Resolution:fixed Points:


A bug and an optimization at the same time! Can't beat that! Check out http://codeigniter.com/forums/viewthread/50055/


#1 Updated by Jim Auldridge - almost 7 years ago

CI Rev566 has a different check and removal of JS event handlers than what is quoted from CI 1.5.3 in the linked thread.

I am assuming that we want to replace lines 571 and 572 of system/libraries/Input.php with Geert's one liner from the thread?

Code versions:
Official CI 1.5.3:

$str = preg_replace('#(<[^>]+.*?)(onblur|onchange|onclick|onfocus|onload|onmouseover|onmouseup|onmousedown|onselect|onsubmit|onunload|onkeypress|onkeydown|onkeyup|onresize)[^>]*>#iU',"\\1>",$str);

CI SVN Rev 566:

$event_handlers = array('onblur','onchange','onclick','onfocus','onload','onmouseover','onmouseup','onmousedown','onselect','onsubmit','onunload','onkeypress','onkeydown','onkeyup','onresize', 'xmlns');
$str = preg_replace("#<([^>]+)(".implode('|', $event_handlers).")([^>]*)>#iU", "&lt;\\1\\2\\3&gt;", $str);

Geert's replacement (he was replacing the official CI 1.5.3)

$str = preg_replace('#(<[^>]+?)(?:on(?:blur|change|click|focus|load|mouse(?:over|up|down)|select|submit|unload|key(?:press|down|up)|resize))[^>]*>#i', '\\1>', $str); 

#2 Updated by Jim Auldridge - almost 7 years ago

Ci SVN version has added checks that are not in official 1.5.3 or Geert's replacement. Asked Geert to mesh his with CI SVN:


#3 Updated by Tido - almost 7 years ago

From Geert's post @ CI forums

Okay, the latest CI code looks like this:

* Remove [[JavaScript]] Event Handlers
* Note: This code is a little blunt.  It removes
* the event handler and anything up to the closing >,
* but it's unlikely to be a problem.
$event_handlers = array('onblur','onchange','onclick','onfocus','onload','onmouseover','onmouseup','onmousedown','onselect','onsubmit','onunload','onkeypress','onkeydown','onkeyup','onresize', 'xmlns');
$str = preg_replace("#<([^>]+)(".implode('|', $event_handlers).")([^>]*)>#iU", "&lt;\\1\\2\\3&gt;", $str);

After imploding, the actual regex that gets applied looks like this:

$str = preg_replace("#<([^>]+)(onblur|onchange|onclick|onfocus|onload|onmouseover|onmouseup|onmousedown|onselect|onsubmit|onunload|onkeypress|onkeydown|onkeyup|onresize|xmlns)([^>]*)>#iU", "&lt;\\1\\2\\3&gt;", $str); 

What is in SVN right now is quite a change compared with the previous code. Even though, the comments in the code say so, this new regex will NOT remove the event handler and anything up to the closing >! Only the < and > are replaced by their html entities < and >.

So before I can even start optimize this stuff, we need to know what exactly we want to do. I don’t think this new regex works as it should.

#4 Updated by Woody Gilk almost 7 years ago

Nothing to add right now, but can you please link to forum post?

We should all remember to always link to external resources when we discuss them. :)

#5 Updated by Tido - almost 7 years ago

Geert replied to his post explaining the differences and proposing a solution: http://codeigniter.com/forums/viewreply/259372/

Let’s compare the regex from CI 1.5.3 (first line) with the one that is currently in SVN (second line). For the sake of readability, I added the x modifier in order to neglect whitespace in the regex. Now you can very clearly see the differences.

$str = preg_replace('#(<[^>]+.*?)(onblur|onchange|onclick|onfocus|onload|onmouseover|onmouseup|onmousedown|onselect|onsubmit|onunload|onkeypress|onkeydown|onkeyup|onresize      ) [^>]* >#iUx', "\\1>",                $str); // 1.5.3
$str = preg_replace("#<([^>]+   )(onblur|onchange|onclick|onfocus|onload|onmouseover|onmouseup|onmousedown|onselect|onsubmit|onunload|onkeypress|onkeydown|onkeyup|onresize|xmlns)([^>]*)>#iUx", "&lt;\\1\\2\\3&gt;", $str); // SVN

On the one hand the SVN regex is optimized by dropping the useless .*? part in the beginning, which is good. It also looks for xmlns. I don’t know why but that is actually the only change in functionality. Correction, I guess that should have been the only change in functionality. The main difference can be found in the replacement part, about which I talked before.

So the only thing I am going to change about my optimization is the extra check for xmlns:

$str = preg_replace('#(<[^>]+?)(?:xmlns|on(?:blur|change|click|focus|load|mouse(?:over|up|down)|select|submit|unload|key(?:press|down|up)|resize))[^>]*>#i', '\\1>', $str); 

#6 Updated by Tido - almost 7 years ago

  • Status changed from New to Closed
  • Resolution set to fixed

Tested and it seems to work fine. Committing.

Also available in: Atom PDF