Getting Started With HTML Sanitizer API

Blessing Krofegha 21 April, 2022 | 4 min read

Introduction

Security is critical for any web application, and safely rendering untrusted data as HTML is especially challenging: without adequate precautions, an attacker can inject malicious code and hijack the application. The HTML Sanitizer API addresses this problem. This article explains what the API is and how to use it in web applications.

What is HTML Sanitization

HTML sanitization is the process of examining an HTML document and producing a new one that contains only the elements and attributes considered safe. This protects a web application from cross-site scripting (XSS) attacks by allowing basic HTML tags to be inserted into a page while disallowing riskier tags and attributes, such as onclick, that attackers can abuse.
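To make the idea concrete, here is a minimal conceptual sketch, not the browser's actual algorithm. It models an element as a plain object and keeps only the attributes on an allow-list, so an inline event handler such as onclick is discarded. The `tag`/`attrs` object shape, the `filterAttributes` helper, and the contents of `SAFE_ATTRIBUTES` are all hypothetical, chosen only for illustration.

```javascript
// Conceptual model only: a real sanitizer works on parsed DOM nodes.
// This allow-list is an illustrative assumption, not a browser default.
const SAFE_ATTRIBUTES = new Set(["href", "title", "class"]);

// Hypothetical helper: returns a copy of the element with only allowed attributes.
function filterAttributes(element) {
  const attrs = {};
  for (const [name, value] of Object.entries(element.attrs)) {
    // HTML attribute names are case-insensitive, so normalize before checking.
    const key = name.toLowerCase();
    if (SAFE_ATTRIBUTES.has(key)) {
      attrs[key] = value;
    }
  }
  return { tag: element.tag, attrs };
}

const untrusted = {
  tag: "a",
  attrs: { href: "/profile", Onclick: "stealCookies()", title: "Profile" },
};

console.log(JSON.stringify(filterAttributes(untrusted)));
// {"tag":"a","attrs":{"href":"/profile","title":"Profile"}}
```

The point of the allow-list design is that anything not explicitly permitted is removed, so an unfamiliar attribute is treated as unsafe by default.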

How to Use the HTML Sanitization API

To use this API, instantiate a Sanitizer object, which you can then use to sanitize strings of HTML so they can be safely inserted into the DOM. The constructor accepts an optional configuration object. By default, a Sanitizer removes XSS-relevant input, including script tags, that would otherwise expose the application to malicious code. Passing a configuration is only necessary for application-specific use cases. The code below instantiates a Sanitizer with the default configuration and with a custom one:

// Instantiating the object with the default configurations
let sanitizer = new Sanitizer();

// Instantiating the object with a custom configuration
const config = {
 allowElements: ["em", "b", "p"],
 blockElements: ["i", "span"],
 dropElements: ["h6"],
 // allow styles only on divs
 allowAttributes: { style: ["div"] }, // to allow styles on all elements, {"style": ["*"]}
 // drop the id attribute on span
 dropAttributes: { id: ["span"] }, // to drop the id attribute everywhere {"id": ["*"]}
 allowCustomElements: true,
 allowComments: true,
};
const configured_sanitizer = new Sanitizer(config);

The options passed in the config object in the preceding sample code are described below.

allowElements: An array of element names that the sanitizer should retain in the input.
blockElements: An array of element names that the sanitizer should remove from the input while retaining their children.
dropElements: An array of element names that the sanitizer should remove from the input together with their children.
allowAttributes: An attribute match list that determines whether an attribute (on a given element) should be allowed.
dropAttributes: An attribute match list that determines whether an attribute (on a given element) should be dropped.
allowCustomElements: Controls whether custom elements are considered; they are dropped by default. When this option is true, custom elements are still checked against all other built-in or configured checks.
allowComments: Determines whether HTML comments are allowed.
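The difference between allowing, blocking, and dropping an element is easiest to see on a small tree. The sketch below is a conceptual model that runs without a browser; it is not the Sanitizer implementation. The node shape (`{ tag, children }` or a plain string for text) and the `sanitizeNodes` and `render` helpers are hypothetical. Two behaviours are assumptions of this sketch rather than statements about the API: drop is checked before block, and an element missing from allowElements is removed together with its children.

```javascript
// Conceptual model only; a node is a text string or { tag, children }.
function sanitizeNodes(nodes, config) {
  const { allowElements, blockElements = [], dropElements = [] } = config;
  const result = [];
  for (const node of nodes) {
    if (typeof node === "string") {
      result.push(node); // text nodes are always kept in this sketch
    } else if (dropElements.includes(node.tag)) {
      continue; // drop: remove the element and everything inside it
    } else if (blockElements.includes(node.tag)) {
      // block: remove the element itself but keep its sanitized children
      result.push(...sanitizeNodes(node.children, config));
    } else if (!allowElements || allowElements.includes(node.tag)) {
      // allow: keep the element and sanitize its children
      result.push({ tag: node.tag, children: sanitizeNodes(node.children, config) });
    }
    // Assumption: anything else is removed along with its children.
  }
  return result;
}

// Serializes the model tree back to a string so the result is easy to read.
function render(nodes) {
  return nodes
    .map((n) => (typeof n === "string" ? n : `<${n.tag}>${render(n.children)}</${n.tag}>`))
    .join("");
}

const input = [
  { tag: "p", children: ["Hi ", { tag: "span", children: [{ tag: "b", children: ["there"] }] }] },
  { tag: "h6", children: ["small print"] },
];
const config = {
  allowElements: ["em", "b", "p"],
  blockElements: ["i", "span"],
  dropElements: ["h6"],
};

console.log(render(sanitizeNodes(input, config)));
// <p>Hi <b>there</b></p>
```

Here the span is blocked, so it disappears while its b child survives, and the h6 is dropped along with its text.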

The API exposes three core methods that developers can use to sanitize HTML:

Element.setHTML( )

setHTML(input, sanitizer) is the syntax for the setHTML method, where input is the string of HTML to be sanitized and sanitizer is an instance of the Sanitizer class.

The setHTML method is part of the Element interface and is used to parse and sanitize an HTML string. The parsing step removes any HTML elements from the input that are invalid in the context of the target element, and the sanitization step then removes any remaining dangerous or undesired elements and attributes. Use Element.setHTML instead of assigning to the Element.innerHTML property when injecting an untrusted string of HTML into an element.

const unsanitized_html_string = "hello <script>alert(123)</script> world"; // Unsanitized string of HTML

const sanitizer = new Sanitizer(); // Default sanitizer;

// Get the Element with id "target" and set it with the sanitized string.
const target = document.getElementById("target");
target.setHTML(unsanitized_html_string, sanitizer);

console.log(target.innerHTML);
// "hello  world" (the script element is removed; the spaces around it remain)
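Because the API is experimental, setHTML may not exist in a given browser. One possible way to cope, sketched below, is to check for the method and fall back to textContent, which displays the string as literal text rather than parsing it as markup. The `setUntrustedHTML` helper and the stand-in element object are hypothetical; the sketch calls setHTML with only the input argument so that the default sanitizer applies.

```javascript
// Hypothetical helper: prefers setHTML() and falls back to plain text.
function setUntrustedHTML(element, html) {
  if (typeof element.setHTML === "function") {
    // Second argument omitted, so the default sanitizer configuration is used.
    element.setHTML(html);
    return "sanitized";
  }
  // Fallback: textContent never parses markup, so tags are shown literally.
  element.textContent = html;
  return "text";
}

// Stand-in object so the sketch runs outside a browser; in a page you would
// pass a real element such as document.getElementById("target").
const legacyElement = { textContent: "" };
const mode = setUntrustedHTML(legacyElement, "hello <script>alert(123)</script> world");

console.log(mode);
// prints: text
```

The fallback loses formatting, but it never executes or renders the untrusted markup, which is the safer failure mode.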

Sanitizer.sanitizeFor( )

sanitizeFor(element, input) is the syntax for the sanitizeFor method. The element parameter is a string naming the element that the input will be inserted into, for example "div", "p", "section", or "article". The input parameter is the string of HTML to be sanitized.

The sanitizeFor method is part of the Sanitizer interface. It accepts the tag name of the destination element as its first parameter and the string to be sanitized as its second. It returns an HTML element of the given type that contains the sanitized subtree as its children. For example, if "div" is passed, the return value is an HTMLDivElement. Use this method when an untrusted HTML string is available now but will be inserted into the DOM later.

const unsanitized_html_string = "hello <script>alert(123)</script> world"; // Unsanitized string of HTML
const sanitizer = new Sanitizer(); // Default sanitizer;

// sanitizeFor used to sanitize the string
let sanitizedDiv = sanitizer.sanitizeFor("div", unsanitized_html_string);

//We can verify the returned element type, and view sanitized HTML in string form:
console.log( (sanitizedDiv instanceof HTMLDivElement) );
// true
console.log(sanitizedDiv.innerHTML);
// "hello  world" (the script element is removed; the spaces around it remain)

// At a later time ...
// Get the element to update. This must be a div to match our sanitizeFor() context.
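// Set its content to the child nodes of our sanitized element. The nodes are
// spread because replaceChildren() takes them as separate arguments, and
// childNodes (unlike children) also includes text nodes.
document.querySelector("div#target").replaceChildren(...sanitizedDiv.childNodes);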

Sanitizer.sanitize( )

The sanitize method is part of the Sanitizer interface; it sanitizes a tree of DOM nodes, removing every unwanted element and attribute. Use this method when the data to be sanitized is already available as DOM nodes. The syntax is sanitize(input), where input is a DocumentFragment or Document.

The sanitize method is used below to sanitize the content of an iframe with the id myFrame:

const sanitizer = new Sanitizer(); 

const frame_element = document.getElementById("myFrame")
const unsanitized_frame_tree = frame_element.contentWindow.document;

// Sanitize the document tree and update the frame.
const sanitized_frame_tree = sanitizer.sanitize(unsanitized_frame_tree);
frame_element.replaceChildren(sanitized_frame_tree);

Conclusion

The Sanitizer API aims to reduce a web application's exposure to attacks by cleaning untrusted HTML before it is injected into the DOM. As the number of applications on the internet grows, it offers developers a built-in way to improve the security of web apps. At the time of writing, however, the API is still considered experimental and should not be used in production.

Blessing Krofegha
Blessing Krofegha is a software engineer based in Lagos, Nigeria, with a burning desire to contribute to making the web awesome for all by writing and building solutions.
