The growing popularity of web applications has brought a matching rise in threats and security
vulnerabilities. One such vulnerability is XSS, or Cross-Site Scripting. XSS is an attack
technique that forces a web application to echo malicious executable code supplied by
the attacker, which then runs in the user's web browser. The supplied data is
usually HTML/JavaScript code that executes on the client side.
XSS vulnerabilities can be classified into three types. The first is DOM-based XSS, which
exists in the client's web page. The second is Non-Persistent or Reflected XSS, in which the
malicious input is displayed back on the screen after returning from the server; this is the
most common vulnerability found in web applications. The third, and the most dangerous, is
Persistent (also called Second Order or Stored) XSS, in which the malicious data is stored
in persistent storage such as a database.
Thus, there is a need to focus on handling these XSS vulnerabilities using frameworks or
open source solutions.
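To make the reflected case concrete, here is a minimal sketch of a page fragment that echoes a request parameter unchanged. The class and method names (ReflectedEchoSketch, renderGreeting) are hypothetical and stand in for whatever server-side code builds the response.

```java
final class ReflectedEchoSketch {

    /** Vulnerable on purpose: copies the user-supplied value into the page unchanged. */
    static String renderGreeting(String name) {
        return "<p>Hello, " + name + "</p>";
    }

    public static void main(String[] args) {
        // A harmless value produces the expected markup.
        System.out.println(renderGreeting("Alice"));

        // A script supplied as the "name" is echoed verbatim, so the browser
        // receiving this page would execute it.
        System.out.println(renderGreeting("<script>alert('xss')</script>"));
    }
}
```

The second call prints the script element intact inside the paragraph, which is exactly the echo that a reflected XSS attack relies on.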
Threats of Cross-Site Scripting
Some of the most common threats posed by cross-site scripting are –
- A script inserted into a web page can collect the user's cookies and send them to the attacker. This is particularly dangerous in single sign-on applications.
- A malicious link inserted into a well-known, trusted website can take the user who clicks it to a harmful site.
- XSS can cause sensitive data to be sent back to the attacker.
Solutions for handling cross-site scripting
Some of the possible solutions for handling XSS are –
- Encoding and Decoding – Encode all HTML input, then decode only the pre-defined tags. This ensures that all output is encoded and that only the specified tags are let through. However, it is difficult to decode attributes properly and to enumerate all the required tags.
- Use markup languages such as BBCode – Create markup tags that are decoded by a markup parser. This ensures that only the required, valid tags are inserted, but it forces developers to learn the language.
- Use XSD for validation – Create an XSD file that defines the allowable tags and HTML elements, convert the HTML content into XML, and verify the XML against the XSD. This ensures that only valid, allowable tags are inserted and the implementation is flexible, but an XSD has to be created for all HTML elements and no response or error message can be given to the user.
- Use open-source XSS solutions such as AntiSamy – These solutions provide APIs that web applications can call. The APIs are simple to use and also provide user-friendly error messages to the user.
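The first option in the list, encoding everything and then decoding a few pre-defined tags, can be sketched as follows. This is only an illustration under stated assumptions: the class name, the method names and the three-tag whitelist are hypothetical, and only attribute-free tags are decoded.

```java
import java.util.Arrays;
import java.util.List;

final class EncodeThenDecodeSketch {

    // Hypothetical whitelist: only these tags, without attributes, are decoded again.
    private static final List<String> ALLOWED_TAGS = Arrays.asList("b", "i", "u");

    /** Replaces the characters that are significant in HTML with entities. */
    static String encode(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#39;");  break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    /** Encodes the whole input, then restores the whitelisted tags only. */
    static String encodeAllowingTags(String input) {
        String result = encode(input);
        for (String tag : ALLOWED_TAGS) {
            result = result.replace("&lt;" + tag + "&gt;", "<" + tag + ">")
                           .replace("&lt;/" + tag + "&gt;", "</" + tag + ">");
        }
        return result;
    }

    public static void main(String[] args) {
        // prints <b>hi</b>&lt;script&gt;x&lt;/script&gt;
        System.out.println(encodeAllowingTags("<b>hi</b><script>x</script>"));
    }
}
```

A tag that carries any attribute, such as <b onclick="...">, no longer matches the whitelist pattern and stays encoded. That is safe, but it also shows the drawback noted in the list: supporting legitimate attributes means decoding and checking them individually.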
Introduction to AntiSamy
AntiSamy is a project of OWASP (the Open Web Application Security Project). It is an
enterprise web input validation and output encoding tool that provides a set of APIs
that can be invoked to filter and validate input against XSS and to ensure that the
user-supplied input complies with an application's rules.
AntiSamy has been released for Java and .NET applications. It is simple to use and
flexible, and it can be adapted to project needs. The tool uses NekoHTML and a
policy file to validate HTML and CSS input.
NekoHTML is a simple HTML scanner and parser that converts HTML into XML. The policy
file consists of directives, common attributes, regular expressions, general tag attributes,
CSS rules and tag rules. It can be modified to suit the project requirements.
The HTML input supplied by the user is first parsed using NekoHTML and then validated
against the policy file. User-friendly error messages are returned to the user.
Four standard policy files are available, each of which can be modified to meet project
requirements. These are –
- antisamy-slashdot.xml – This policy file allows only the <b>, <u>, <i>, <a> and <blockquote> HTML tags. No CSS is allowed.
- antisamy-ebay.xml – This policy file is geared toward input that contains richer content.
- antisamy-myspace.xml – This policy file prevents the user from submitting JavaScript.
- antisamy-anythinggoes.xml – This policy file allows every HTML and CSS element except JavaScript and CSS that could be used for phishing.
Integrating AntiSamy with the web application is very simple and can be done by
following the steps below –
- Download antisamy.jar, nekoHtml.jar and xercesImpl.jar and place them in the class-path
- Download the required policy file and place it in the class-path
- Call the API methods for scanning / filtering the user input
AntiSamy can be integrated with a Maven application. The pom.xml entry for the same is –
<repository>
    <id>wso2-maven2-repository</id>
    <url>http://dist.wso2.org/maven2</url>
</repository>
<dependency>
    <groupId>antisamy</groupId>
    <artifactId>antisamy-bin</artifactId>
    <version>1.2</version>
    <scope>compile</scope>
</dependency>
<dependency>
    <groupId>nekohtml</groupId>
    <artifactId>nekohtml</artifactId>
    <version>1.9.7</version>
    <scope>compile</scope>
</dependency>
<dependency>
    <groupId>xerces</groupId>
    <artifactId>xercesImpl</artifactId>
    <version>2.8.0</version>
    <scope>compile</scope>
</dependency>
How does AntiSamy work?
The figures below describe the workflow of AntiSamy for HTML and CSS input.
[Figure: HTML / XHTML Workflow]
[Figure: CSS Workflow]
For instance, consider the HTML input:
<body>
<p> This is a test <h1 onclick="alert('malicious code');"> for </h1>
AntiSamy </p>
</body>
The code above contains the malicious attribute onclick="alert('malicious code');"
embedded in the <h1> tag.
When this input is filtered with AntiSamy, it is first cleaned using NekoHTML.
The parsed input is then validated against the antisamy-slashdot.xml policy file, which
filters out the malicious code.
AntiSamy then returns the clean output along with a user-friendly message.
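The parse-then-validate idea can be imitated with the JDK alone. The sketch below is not AntiSamy's implementation: the JDK XML parser stands in for NekoHTML (so it accepts only well-formed markup), a hard-coded set stands in for the policy file, and every attribute on an allowed tag is simply dropped rather than validated. All names in it (WhitelistFilterSketch, ALLOWED_TAGS, filter) are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.Text;
import org.xml.sax.InputSource;

final class WhitelistFilterSketch {

    // Hypothetical stand-in for a policy file.
    private static final Set<String> ALLOWED_TAGS =
            new HashSet<>(Arrays.asList("b", "u", "i", "a", "blockquote"));

    /** Parses well-formed markup, filters it, and returns the cleaned markup. */
    static String filter(String markup, List<String> errors) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Refuse DOCTYPE declarations so untrusted input cannot declare entities.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        Element root = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader("<root>" + markup + "</root>")))
                .getDocumentElement();
        clean(root, errors);
        StringBuilder out = new StringBuilder();
        for (Node n = root.getFirstChild(); n != null; n = n.getNextSibling()) {
            write(n, out);
        }
        return out.toString();
    }

    /** Cleans the children of parent, deepest elements first. */
    private static void clean(Element parent, List<String> errors) {
        Node child = parent.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling();
            if (child instanceof Element) {
                Element el = (Element) child;
                clean(el, errors);
                if (!ALLOWED_TAGS.contains(el.getTagName().toLowerCase())) {
                    // Disallowed tag: remove it but keep its (already cleaned) contents.
                    while (el.getFirstChild() != null) {
                        parent.insertBefore(el.getFirstChild(), el);
                    }
                    parent.removeChild(el);
                    errors.add("The " + el.getTagName() + " tag was removed; its contents remain.");
                } else {
                    // Allowed tag: drop every attribute, recording each one.
                    NamedNodeMap attrs = el.getAttributes();
                    for (int i = attrs.getLength() - 1; i >= 0; i--) {
                        String name = attrs.item(i).getNodeName();
                        el.removeAttribute(name);
                        errors.add("The " + name + " attribute of the " + el.getTagName()
                                + " tag was removed.");
                    }
                }
            }
            child = next;
        }
    }

    /** Serializes elements and text; comments and other node types are dropped. */
    private static void write(Node n, StringBuilder out) {
        if (n instanceof Element) {
            String tag = ((Element) n).getTagName();
            out.append('<').append(tag).append('>');
            for (Node c = n.getFirstChild(); c != null; c = c.getNextSibling()) {
                write(c, out);
            }
            out.append("</").append(tag).append('>');
        } else if (n instanceof Text) {
            out.append(n.getNodeValue()
                    .replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;"));
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> errors = new ArrayList<>();
        String input = "<body><p>This is a test "
                + "<b onclick=\"alert('malicious code');\"> for </b> AntiSamy </p></body>";
        System.out.println(filter(input, errors));
        System.out.println(errors);
    }
}
```

The sample input in main uses a <b> tag instead of <h1> so that both behaviours are visible: the disallowed <body> and <p> tags are unwrapped, while the allowed <b> tag stays in place without its onclick attribute. A real policy would validate attribute values instead of discarding all of them.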
Features
Some of the key features of AntiSamy are listed below –
- Simple API
- Policy files can be customized, which gives flexibility in defining the desired behavior
- Support for Maven applications
- User-friendly error messages
- Works with broken HTML and CSS
- Support for other platforms such as ColdFusion and .NET
- Internationalized error messages for English, Italian, Portuguese, Russian and Chinese
Let us now create an application that uses AntiSamy to filter input data for XSS. We
create a wrapper class, AntiSamyWrapper, that interacts with the AntiSamy API to validate
and filter the input. Calling getInstance() on the class returns its singleton instance.
Its scan() method takes the input data and internally passes it, together with the policy
file, to the scan() method of the AntiSamy class, which returns an instance of CleanResults.
The CleanResults instance contains the clean input and any error messages produced while
filtering the input data.
Pre-Requisites
JDK 1.4 or above
antisamy.jar – (can be downloaded from http://code.google.com/p/owaspantisamy/downloads/list )
AntiSamy Policy Files - (can be downloaded from
NekoHTML.jar - (can be downloaded from http://sourceforge.net/projects/nekohtml/ )
XercesImpl-2.5.jar - (can be downloaded from http://www.ibiblio.org/maven/xerces/jars/)
Step-by-Step approach for implementing AntiSamy
- Download antisamy.jar, nekoHtml.jar and xercesImpl.jar and place them in the class-path.
- Download the example policy file (antisamy-slashdot-1.3.xml) and place it in the class-path. Customize the policy file if required.
- Create the Java classes shown below and adapt them as needed
- Run the class files and check the output.
AntiSamyWrapper.java
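// Add package statement
import org.owasp.validator.html.AntiSamy;
import org.owasp.validator.html.CleanResults;
import org.owasp.validator.html.Policy;
import org.owasp.validator.html.PolicyException;
import org.owasp.validator.html.ScanException;

public final class AntiSamyWrapper {

    private static AntiSamyWrapper antiSamyWrapper;

    public AntiSamy antiSamy;
    public Policy policy;
    public CleanResults cleanResults;

    // Assumed values: the example policy file from the steps above,
    // placed at the root of the class-path. Adjust to your project.
    String policyFilePath = "/";
    String policyFileName = "antisamy-slashdot-1.3.xml";

    private AntiSamyWrapper() {
        try {
            policy = Policy.getInstance(
                    this.getClass().getResourceAsStream(policyFilePath + policyFileName));
        } catch (PolicyException pe) {
            // Add code here to handle exceptions
        }
        antiSamy = new AntiSamy();
    }

    /* Returns the Singleton instance of the class */
    public static AntiSamyWrapper getInstance() {
        if (antiSamyWrapper == null) {
            antiSamyWrapper = new AntiSamyWrapper();
        }
        return antiSamyWrapper;
    }

    /* Filters the input against the policy and returns the clean HTML */
    public String scan(String inputString) {
        try {
            cleanResults = antiSamy.scan(inputString, policy);
            // Add code here to handle errors by accessing method
            // getNumberOfErrors()
            return cleanResults.getCleanHTML();
        } catch (ScanException sc) {
            // Add code here to handle exceptions
        } catch (PolicyException pe) {
            // Add code here to handle exceptions
        }
        // Scanning failed, so there is no clean output to return
        return null;
    }
}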
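One design point worth noting: the null check in getInstance() is not synchronized, so two threads calling it at the same time in a web application could each create an instance. A common alternative is the initialization-on-demand holder idiom, sketched below with a hypothetical class name. It is a general Java pattern, not something the AntiSamy API requires.

```java
final class HolderSingletonSketch {

    private HolderSingletonSketch() {
        // Expensive one-time setup (such as loading a policy file) would go here.
    }

    // The JVM initializes Holder, and therefore INSTANCE, exactly once,
    // the first time getInstance() is called.
    private static final class Holder {
        static final HolderSingletonSketch INSTANCE = new HolderSingletonSketch();
    }

    static HolderSingletonSketch getInstance() {
        return Holder.INSTANCE;
    }

    public static void main(String[] args) {
        // Both calls return the same object, so this prints true.
        System.out.println(getInstance() == getInstance());
    }
}
```

The same structure could be applied to the wrapper class if it is shared between request threads. Any per-call state, such as the last scan result, would then be better held in a local variable than in a shared field.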
AntiSamyTest.java
// Add package statement
// Import statement
public class AntiSamyTest {

    public static void main(String[] args) {
        String inputData = " This is a test for AntiSamy";
        String cleanInput = AntiSamyWrapper.getInstance().scan(inputData);
        System.out.println(" Clean Output from AntiSamy -- > " + cleanInput);
    }
}
Run the AntiSamyTest class. The value of cleanInput is the filtered output from the
AntiSamyWrapper class, and the program prints –
Clean Output from AntiSamy -- > this is a <b> for</b>
The error messages produced by invoking scan() of the AntiSamy class are –
The body tag has been filtered for security reasons. The contents of the tag
will remain in place., The b tag contained an attribute that we could not
process. The onclick attribute has been filtered out, but the tag is still
in place. The value of the attribute was "alert('hi')".
This message can be viewed by adding the following code in scan() of AntiSamyWrapper
class.
cleanResults = antiSamy.scan(inputString,policy); // Existing Code
ArrayList errorList = cleanResults.getErrorMessages();
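Once the list has been retrieved, it can be turned into a message for the user. The helper below is a minimal sketch that assumes the list holds plain strings, as the sample messages above suggest. The class and method names (ErrorReportSketch, describe) are hypothetical and not part of the AntiSamy API.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class ErrorReportSketch {

    /** Builds a plain-text report with one line per error message. */
    static String describe(List<String> errorMessages) {
        if (errorMessages.isEmpty()) {
            return "No problems were found in the input.";
        }
        StringBuilder report = new StringBuilder("The input was modified for security reasons:");
        for (String message : errorMessages) {
            report.append("\n - ").append(message);
        }
        return report.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe(Collections.<String>emptyList()));
        System.out.println(describe(Arrays.asList(
                "The body tag has been filtered for security reasons.",
                "The onclick attribute has been filtered out.")));
    }
}
```

If the report is rendered inside an HTML page rather than printed as plain text, each message should be HTML-encoded first, since it may quote parts of the original input.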