Question

How do I set the character encoding in Apache HTTP Components?

I do something like this:

    import org.apache.http.client.fluent.Form;
    import org.apache.http.client.fluent.Request;
    ...
    Form form = Form.form();
    form = form.add("somekey", "somevalue");
    Request request = Request.Post("http://somehost/some-form")
                             .bodyForm(form.build());

"somekey" and "somevalue" are Unicode strings, because all Java strings are Unicode. When I tested, HttpComponents converted them to Latin-1. I want it to use a different encoding instead (e.g., UTF-8).
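The symptom can be reproduced with the JDK alone. The sketch below does not call HttpComponents at all; it just percent-encodes a value the way an application/x-www-form-urlencoded body is encoded, once with ISO-8859-1 and once with UTF-8. The class name FormEncodingDemo and the helper encode are hypothetical names chosen for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class FormEncodingDemo {

    // Percent-encodes a single form value using the given charset.
    // URLEncoder.encode(String, String) is used because the Charset
    // overload only exists from Java 10 onwards.
    public static String encode(String value, Charset charset) {
        try {
            return URLEncoder.encode(value, charset.name());
        } catch (UnsupportedEncodingException e) {
            // Not expected for standard charsets such as UTF-8 and ISO-8859-1.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Unicode escapes keep this source file independent of the compiler's default encoding.
        String[] samples = {"caf\u00e9", "\u20ac100"};
        for (String s : samples) {
            System.out.println("Latin-1: " + encode(s, StandardCharsets.ISO_8859_1)
                    + "  UTF-8: " + encode(s, StandardCharsets.UTF_8));
        }
    }
}
```

For "café" the Latin-1 version yields caf%E9 while UTF-8 yields caf%C3%A9, so a server decoding as UTF-8 will misread the Latin-1 bytes. The euro sign has no Latin-1 representation at all, so String.getBytes substitutes "?" and the Latin-1 result is %3F100 (the character is silently lost), whereas UTF-8 gives %E2%82%AC100.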


Solution

Going by what you've shown in your example, you seem to be using the fluent API.

According to the Javadocs, Request.bodyForm() has an overload that also accepts a charset:

    import org.apache.http.Consts;
    ...
    request = request.bodyForm(form.build(), Consts.UTF_8);

According to the source, the single-argument bodyForm() defaults to Consts.ISO_8859_1 (a.k.a. Latin-1), which matches what you observed.
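To make the effect of the charset argument concrete, here is a JDK-only sketch of roughly what a form request body and its Content-Type header should look like when UTF-8 is chosen: every name and value is percent-encoded with that charset, pairs are joined with "&", and the header names the same charset. This is an approximation for illustration, not HttpComponents' actual implementation; the class FormBody and its methods are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class FormBody {

    // Builds an application/x-www-form-urlencoded body, encoding every
    // name and value with the supplied charset.
    public static String encode(Map<String, String> fields, Charset charset) {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> field : fields.entrySet()) {
            if (body.length() > 0) {
                body.append('&'); // name=value pairs are separated by '&'
            }
            body.append(percentEncode(field.getKey(), charset))
                .append('=')
                .append(percentEncode(field.getValue(), charset));
        }
        return body.toString();
    }

    // The header should declare the same charset that was used for the body,
    // so the server knows how to decode the percent-encoded bytes.
    public static String contentType(Charset charset) {
        return "application/x-www-form-urlencoded; charset=" + charset.name();
    }

    private static String percentEncode(String s, Charset charset) {
        try {
            return URLEncoder.encode(s, charset.name());
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>(); // keeps insertion order
        fields.put("somekey", "somevalue");
        fields.put("city", "Z\u00fcrich");
        System.out.println(contentType(StandardCharsets.UTF_8));
        System.out.println(encode(fields, StandardCharsets.UTF_8));
    }
}
```

With UTF-8 this prints the header value application/x-www-form-urlencoded; charset=UTF-8 and the body somekey=somevalue&city=Z%C3%BCrich; passing ISO-8859-1 instead would produce city=Z%FCrich, which is the Latin-1 behaviour described in the question.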

Alternatives

  1. If that doesn't work, consider:

    import org.apache.http.Consts;
    ...
    request.elementCharset(Consts.UTF_8);
    
  2. As a last resort, it should be possible to set the content charset directly. Modelled on the source of elementCharset(), you could try the following:

    import org.apache.http.Consts;
    import org.apache.http.params.CoreProtocolPNames;
    ...
    // This parameter holds a charset name (a String), not a Charset object
    request.config(CoreProtocolPNames.HTTP_CONTENT_CHARSET, Consts.UTF_8.name());
    
Licensed under: CC-BY-SA with attribution