Does WordPress sanitize arguments to WP_Query?

https://wordpress.stackexchange.com/questions/387658

20-05-2021

Question

This is a very straightforward question, but it's important and I can't find anything definitive in the docs.

A related question asks the same thing for the 's' parameter specifically. I want to know whether WordPress validates or sanitizes any of the other parameters. For example, do the terms in a tax_query get sanitized automatically?

Clarification: This is a technical / engineering question about specifically what the WP_Query class does with particular parameters. Several recent answers offer philosophical advice and general best practices regarding validation/sanitization, but that's not what this question is about. My goal is to compile facts rather than opinions.

Solution

It's actually a good question. Of course user input cannot be trusted, but also sanitizing the same value twice "just in case" isn't the best solution for a problem.

The only real answer to this question can be given by looking at the code and following what goes on.

And after reading it for a few hours, here is what I can say about it:

WP_Query does sanitize values, but not all in one place: each value is sanitized just before it is actually used. It's a very organic, on-the-go approach. Not every value is sanitized, but in those cases a prepared statement is used for the SQL query.

Let's get into detail

So, what happens when we do:

$query = new WP_Query($args);

The WP_Query class constructor checks whether $args is empty. If it isn't, it runs its query() method, passing along the $args array (which at this point is called $query).

$this->query($query);

The query() method calls the init() method which unsets possible previous values and sets new defaults.

Then it runs wp_parse_args() on the $args array. This function does not sanitize anything; it acts as a bridge between default data and input data.
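To make the "bridge" idea concrete, here is a minimal sketch in plain PHP. The helper merge_with_defaults() is a hypothetical stand-in for what wp_parse_args() does in general when it is given an array plus defaults (essentially an array_merge()); the default keys and input values are made up for the example and are not WordPress's real defaults.

```php
<?php
// Hypothetical stand-in for the array case of wp_parse_args():
// values from $args override $defaults, and nothing is cleaned.
function merge_with_defaults(array $args, array $defaults): array
{
    return array_merge($defaults, $args);
}

// Illustrative defaults only, not WordPress's real default list.
$defaults = array('p' => 0, 'name' => '', 'paged' => 0);
$input    = array('name' => "  o'hara  ", 'paged' => '3abc');

$merged = merge_with_defaults($input, $defaults);

// 'p' is filled in from the defaults; 'name' and 'paged' pass through untouched.
var_dump($merged);
```

The quote and the trailing letters survive the merge, which is why the later, per-key cleaning steps matter.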

The next call is for the get_posts() method, which is in charge of retrieving the posts based on the given query variables.

The first thing called inside get_posts() is the parse_query() method, which starts by calling fill_query_vars(). That method makes sure a list of default keys is set; any key that is missing is set to an empty string or an empty array, depending on the key.

Then, still inside the parse_query() method, the first sanitization takes place.

p is checked against is_scalar() and cleaned with intval()

Also absint() is used on the following values:

page_id
year
monthnum
day
w
paged
hour
minute
second
attachment_id
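The integer handling described above can be sketched in plain PHP. The absint() definition below is a stand-in that is only declared when WordPress is not loaded (it takes the absolute value of an integer cast), and clean_post_id() is a hypothetical helper that only approximates the is_scalar()/intval() treatment of p; core's actual branch may do more than this.

```php
<?php
// Stand-in for WordPress's absint(), declared only when WordPress is not loaded.
if (!function_exists('absint')) {
    function absint($maybeint)
    {
        return abs((int) $maybeint);
    }
}

// Hypothetical helper approximating the described handling of 'p':
// non-scalar values are rejected, scalars are forced to an integer.
function clean_post_id($value): int
{
    return is_scalar($value) ? intval($value) : 0;
}

// Made-up hostile input.
$raw = array(
    'p'     => '42; DROP TABLE wp_posts',
    'paged' => '-3',
    'year'  => '2021abc',
);

$clean = array(
    'p'     => clean_post_id($raw['p']),
    'paged' => absint($raw['paged']),
    'year'  => absint($raw['year']),
);

// Only the leading digits of each value survive the integer cast.
var_dump($clean);
```

Whatever follows the leading digits is discarded by the cast, so nothing but an integer can reach the SQL.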

Also, a preg_replace('|[^0-9]|'...) is run on m, cat and author so that they only allow a comma-separated list of positive or negative integers.
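A rough sketch of this kind of character filtering follows. The exact patterns are elided in the text above, so the character class used here (digits, commas and hyphens) is an assumption about what a comma-separated list of signed integers needs, and keep_signed_int_list() is a hypothetical helper name.

```php
<?php
// Assumed pattern: keep digits, commas and hyphens, drop every other character.
function keep_signed_int_list(string $value): string
{
    return preg_replace('|[^0-9,-]|', '', $value);
}

// Spaces, markup and letters are stripped; the signed list itself remains.
echo keep_signed_int_list('3, -12 ,7<script>'), "\n";

// An injection attempt collapses to a harmless run of digits.
echo keep_signed_int_list('1 OR 1=1'), "\n";
```

Note that this is filtering rather than validation: the result is safe to place in a query, but it may no longer mean what the caller intended.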

For other values at this point, only the trim() function is used. This is the case for:

pagename
name
title

After this, the method starts checking what type of query we are running. Is it a search? An attachment? A page? A single post? ...

If pagename is set, get_page_by_path($qv['pagename']) is called without sanitizing the value first. Checking that function's source, however, shows that the value is sanitized with esc_sql() before it is used in a database request.

After that, when the keys post_type or post_status are used, both are sanitized with sanitize_key() (only lowercase alphanumeric characters, dashes, and underscores are allowed).
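The effect of sanitize_key() can be approximated in plain PHP as below. The function body is a simplified stand-in, declared only when WordPress is not loaded: it lowercases the value and strips everything except a-z, 0-9, underscores and dashes, and it leaves out the filter hook that core applies. The sample inputs are made up.

```php
<?php
// Simplified stand-in for WordPress's sanitize_key(), without the filter hook.
if (!function_exists('sanitize_key')) {
    function sanitize_key($key)
    {
        if (!is_scalar($key)) {
            return '';
        }
        // Lowercase first, then drop anything outside a-z, 0-9, '_' and '-'.
        return preg_replace('/[^a-z0-9_\-]/', '', strtolower((string) $key));
    }
}

echo sanitize_key('My_Custom-Type!'), "\n";
echo sanitize_key('Post; DROP TABLE'), "\n";
```

Spaces, quotes and semicolons cannot survive this step, which is why post_type and post_status are safe by the time they are used.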

For the taxonomy related parameters, the parse_tax_query() method is called.

category__and, category__in, category__not_in, tag_id, tag__in, tag__not_in, tag__and are sanitized with absint()

tag_slug__in and tag_slug__and are sanitized with sanitize_title_for_query()
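For the list-style parameters, the absint() step amounts to mapping the function over every element. The sketch below assumes that shape, with a de-duplication step added for illustration; absint() is again a stand-in declared only when WordPress is not loaded, and the input list is made up. sanitize_title_for_query() is left out because it cannot be reproduced faithfully in a few lines.

```php
<?php
// Stand-in for WordPress's absint(), declared only when WordPress is not loaded.
if (!function_exists('absint')) {
    function absint($maybeint)
    {
        return abs((int) $maybeint);
    }
}

// Made-up value for something like category__in.
$raw = array('5', '5', '-7', '9 OR 1=1', 'abc');

// De-duplicate, force every element to a non-negative integer, reindex.
$clean = array_values(array_map('absint', array_unique($raw)));

var_dump($clean);
```

Elements that are not numeric at all collapse to 0 rather than being dropped, so a caller who cares about that distinction still needs their own validation.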

At this point the parse_query() method is over, but we are still inside the get_posts() method.

posts_per_page is sanitized by casting it to an integer.

title is used unsanitized, but inside a prepared statement. You may find this question interesting: Are prepared statements enough to prevent SQL injection?
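To show why an unsanitized value can still be safe when it is bound as a parameter, here is a generic sketch. WordPress itself goes through $wpdb->prepare(); this example swaps in PDO with an in-memory SQLite database purely so that it runs standalone, so it is not core's code path. The table, its rows and the helper find_by_title() are hypothetical.

```php
<?php
// Returns matching titles, or null when the SQLite PDO driver is unavailable.
function find_by_title(string $title): ?array
{
    if (!class_exists('PDO') || !in_array('sqlite', PDO::getAvailableDrivers(), true)) {
        return null;
    }

    $pdo = new PDO('sqlite::memory:');
    $pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    $pdo->exec('CREATE TABLE posts (post_title TEXT)');
    $pdo->exec("INSERT INTO posts (post_title) VALUES ('Hello'), ('World')");

    // The value is bound separately from the SQL text, so quotes in it stay data.
    $stmt = $pdo->prepare('SELECT post_title FROM posts WHERE post_title = ?');
    $stmt->execute(array($title));

    return $stmt->fetchAll(PDO::FETCH_COLUMN);
}

// A normal lookup finds the row.
var_dump(find_by_title('Hello'));

// The injection attempt is compared as a literal title and matches nothing.
var_dump(find_by_title("Hello' OR '1'='1"));
```

The second call would return every row if the value were concatenated into the SQL string; as a bound parameter it is just a title that does not exist.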

Then post__in and post__not_in are sanitized with absint().

If you keep reading the code carefully, you will see that every key is either sanitized before it touches a SQL statement or passed through a prepared statement instead.

So, to answer your original question:

Does WordPress sanitize arguments to WP_Query?

It sanitizes most of them, but not all. For example, pagename, name and title are only "cleaned" with trim(), which does not return SQL-safe values. For the values that are not sanitized, though, a prepared statement is used to perform the database request.

Should you trust this?

Well, in this specific case I would go for the possibly redundant "just pre-sanitize everything" approach before throwing anything into the query.

As an engineering student, I too would love a solid yes-or-no answer. But the WordPress codebase has evolved organically, so it's just like nature: messy. That doesn't mean it's bad. But messy means there could be an unseen edge case where somebody could sneak in with a bomb. You can prevent that by simply doubling your guards!
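One possible shape for that "double guard" is a small allow-list function that runs before the arguments reach WP_Query. This is only a sketch under assumptions: presanitize_query_args() is a hypothetical helper, the three keys it accepts are arbitrary examples, and absint() and sanitize_key() are simplified stand-ins declared only when WordPress is not loaded.

```php
<?php
// Stand-ins, declared only when WordPress is not loaded.
if (!function_exists('absint')) {
    function absint($maybeint)
    {
        return abs((int) $maybeint);
    }
}
if (!function_exists('sanitize_key')) {
    function sanitize_key($key)
    {
        return is_scalar($key)
            ? preg_replace('/[^a-z0-9_\-]/', '', strtolower((string) $key))
            : '';
    }
}

// Hypothetical helper: keep only expected keys and coerce each to a known type.
function presanitize_query_args(array $raw): array
{
    $clean = array();
    if (isset($raw['paged'])) {
        $clean['paged'] = absint($raw['paged']);
    }
    if (isset($raw['post_type'])) {
        $clean['post_type'] = sanitize_key($raw['post_type']);
    }
    if (isset($raw['post__in'])) {
        $clean['post__in'] = array_values(array_map('absint', (array) $raw['post__in']));
    }
    return $clean;
}

// Made-up request data; the 'unexpected' key is silently dropped.
$args = presanitize_query_args(array(
    'paged'      => '2abc',
    'post_type'  => 'Book<script>',
    'post__in'   => array('4', 'x'),
    'unexpected' => 'ignored',
));

var_dump($args);
// With WordPress loaded, $args could then be handed to new WP_Query($args).
```

The allow-list is the important design choice here: keys the code does not expect never reach the query at all, regardless of how core would have treated them.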

OTHER TIPS

I want to know if I have to sanitize user input for any of the other parameters.

You should never trust user input, and therefore always sanitize and/or validate it, regardless of whether it is already done in core. Your code, your responsibility.

As stated in the Theme Handbook, for example:

Don’t trust any data. Don’t trust user input, third-party APIs, or data in your database without verification. Protection of your WordPress themes begins with ensuring the data entering and leaving your theme is as intended. Always make sure to validate on input and sanitize (escape) data before use or on output.

You could also check the code of the WP_Query class for yourself. A quick search for sanitization-related code, e.g. WordPress's built-in sanitization functions, reveals that at least some sanitization takes place. But if I were you, I'd rather be safe than sorry, and write and test my own sanitization and/or validation logic.

I have a simple rule that I learned a long time ago when studying for fullstack development.

If it's data/code that you didn't write yourself, always always always sanitize and validate it.

Never trust data from anywhere, really, not even from established companies like Facebook, Google or GitHub.

Even if you think that WordPress will sanitize the information, you should always do your own sanitization and validation.

Never think, "It's just a simple newsletter registration form with only an email field, what's the worst that can happen?" STOP!

When you hear about data leaks that exposed hundreds or thousands (in the best case) of user records, it sometimes happened because some developer just said "what's the worst that can happen?" or, even worse, was not aware of the basic security measures they should have implemented to prevent such a leak.

Always sanitize and validate!

EDIT

Ok, so after reading your question again I think I know what you mean now.

Take tax_query, for example. After looking in the WordPress GitHub repo, I found that everything related to tax_query is handled by the WP_Tax_Query class; among other things, it handles the sanitization and validation of the data you pass into tax_query.

Now, if the WP_Tax_Query class handles tax_query, I'm sure that if I kept looking in the WordPress repo I would find more classes or methods that handle sanitization and validation of other properties.

I think your best bet is to go to WP_Query in the WordPress repo and do a deep dive into all the properties you use, to see how WP_Query handles each one of them.

Licensed under: CC-BY-SA with attribution
