Question

I have a default setup of Tomcat 7, with everything Java-related configured to use UTF-8.

This does not work (utf-8 characters are mangled):

<%@ page language="java" pageEncoding="utf-8" contentType="text/html; charset=utf-8"%>
<%@ page import="java.net.*" %>
<%@ page import="java.io.*" %>
<%
    URL target = new URL("http://en.wikipedia.org/wiki/Main_Page");
    Reader input = new BufferedReader(new InputStreamReader(target.openStream()));
    StringWriter buffer = new StringWriter();
    char[] chrs = new char[1024 * 4];
    int n = 0;
    while (-1 != (n = input.read(chrs)))
    {
        buffer.write(chrs, 0, n);
    }
    StringReader reader = new StringReader(buffer.toString());
    n = 0;
    while (-1 != (n = reader.read(chrs)))
    {
        out.write(chrs, 0, n);
    } 
%>

This does, but logs IllegalStateExceptions:

<%@ page language="java" pageEncoding="utf-8" contentType="text/html; charset=utf-8"%>
<%@ page import="java.net.*" %>
<%@ page import="java.io.*" %>
<%
    URL target = new URL("http://en.wikipedia.org/wiki/Main_Page");
    Reader input = new BufferedReader(new InputStreamReader(target.openStream()));
    StringWriter buffer = new StringWriter();
    char[] chrs = new char[1024 * 4];
    int n = 0;
    while (-1 != (n = input.read(chrs)))
    {
        buffer.write(chrs, 0, n);
    }
    StringReader reader = new StringReader(buffer.toString());
    OutputStreamWriter output = new OutputStreamWriter(response.getOutputStream());
    n = 0;
    while (-1 != (n = reader.read(chrs)))
    {
        output.write(chrs, 0, n);
    }
%>

I've been searching but found no answers. Is this a bug in Tomcat, or is there something I'm missing?


Solution

When you construct an InputStreamReader without passing a charset as the second argument, the platform default encoding is used, which is often not UTF-8 (ISO-8859-1 or Windows-1252, for example). You need to pass the charset that the target URL declares in its response header, which in this case is UTF-8.

input = new BufferedReader(new InputStreamReader(target.openStream(), "UTF-8"));
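To see the effect of the charset argument in isolation, here is a minimal standalone sketch. The class name CharsetDemo and the sample text are made up for illustration and are not part of the original code. It decodes the same UTF-8 bytes twice through an InputStreamReader, once with the correct charset and once with ISO-8859-1.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original JSP.
public class CharsetDemo {

    // Decodes all bytes through an InputStreamReader using the given charset.
    public static String decode(byte[] bytes, Charset charset) {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(bytes), charset)) {
            char[] chunk = new char[1024];
            int n;
            while ((n = reader.read(chunk)) != -1) {
                sb.append(chunk, 0, n);
            }
        } catch (IOException e) {
            // Not expected for an in-memory stream; rethrow unchecked.
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "café": the last character takes two bytes in UTF-8.
        byte[] utf8Bytes = "caf\u00e9".getBytes(StandardCharsets.UTF_8);

        // Correct charset: the two bytes become one character again.
        System.out.println(decode(utf8Bytes, StandardCharsets.UTF_8).length());      // prints 4

        // Wrong charset: each byte becomes its own character (mangled text).
        System.out.println(decode(utf8Bytes, StandardCharsets.ISO_8859_1).length()); // prints 5
    }
}
```

The example prints lengths rather than the text itself so that the result does not depend on the console encoding.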

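The IllegalStateException occurs because this code runs in a JSP rather than a Servlet. A JSP writes its output through response.getWriter() internally, while your scriptlet calls response.getOutputStream(). As their javadocs explain, only one of the two may be called on the same response. That second snippet only appears to work because its OutputStreamWriter is likewise constructed without a charset, so the characters are encoded back with the same platform default they were decoded with. The double loop is also needlessly inefficient: write directly to out (the JSP's writer, backed by response.getWriter()) in the first loop instead of collecting everything in a buffer first.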
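The single-loop copy can be sketched as a small helper. The class name StreamCopy and the method name copy are hypothetical; the point is only that each chunk goes straight from the source Reader to the destination Writer with no intermediate StringWriter.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

// Hypothetical helper class, not part of the original JSP.
public class StreamCopy {

    // Copies characters from in to out chunk by chunk and returns
    // the number of characters copied. Neither stream is closed here.
    public static long copy(Reader in, Writer out) throws IOException {
        char[] chunk = new char[4 * 1024];
        long total = 0;
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-ins for the URL reader and the JSP writer.
        StringWriter sink = new StringWriter();
        long copied = copy(new StringReader("hello"), sink);
        System.out.println(copied + " " + sink); // prints "5 hello"
    }
}
```

In the JSP, the reader built from target.openStream() with an explicit "UTF-8" charset would take the place of the StringReader, and the implicit out object (a JspWriter, which is a Writer) would take the place of the StringWriter.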

Regardless, this is a terrible way of proxying. Use a Servlet instead, or use the JSTL <c:import> tag:

<%@ taglib prefix="c" uri="http://java.sun.com/jsp/jstl/core" %>
<c:import url="http://en.wikipedia.org/wiki/Main_Page" />
Licensed under: CC-BY-SA with attribution