1. JSP pages must include the header:
{{{ <%@ page contentType="text/html; charset=UTF-8" %> }}}
2. To translate inputs coming back from the browser, there must be a method that converts from the browser's ISO-8859-1 to UTF-8. ISO-8859-1 is the default character encoding for servers and browsers according to the HTTP specification, section 3.4.1.
{{{
// requires: import java.io.UnsupportedEncodingException;

/**
 * Convert an ISO-8859-1 format string (which is the default sent by IE)
 * to the UTF-8 format that the database is in.
 */
public String toUTF8(String isoString) {
    String utf8String = null;
    if (null != isoString && !isoString.equals("")) {
        try {
            byte[] stringBytesISO = isoString.getBytes("ISO-8859-1");
            utf8String = new String(stringBytesISO, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Both charsets are required on every JVM, so this is not expected
            throw new RuntimeException(e);
        }
    } else {
        // Null or empty input is passed through unchanged
        utf8String = isoString;
    }
    return utf8String;
}
}}}
I have found that these three steps are all that is necessary to make your site accept any language that UTF-8 can work with. I extend my thanks to those of you on the Tomcat users list who helped me find these little gems.
(from the tomcat-user mailing list)
Note: This method does not work in general for non-ASCII characters. "stringBytesISO" is an ISO-8859-1 byte stream, and it cannot be treated as a UTF-8 byte stream if it contains non-ASCII characters, unless those bytes were UTF-8 to begin with and had merely been decoded as ISO-8859-1.
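The two cases in the note can be checked with a small stand-alone program. This is only a sketch: the class and method names (`Latin1RoundTrip`, `misdecodeAsLatin1`, `reinterpretAsUtf8`) are made up for illustration, and `misdecodeAsLatin1` merely simulates a container that decodes UTF-8 request bytes as ISO-8859-1.

```java
import java.nio.charset.StandardCharsets;

public class Latin1RoundTrip {

    /** Same conversion as toUTF8: take the ISO-8859-1 bytes and read them as UTF-8. */
    public static String reinterpretAsUtf8(String s) {
        return new String(s.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    /** Simulates a container that decodes raw UTF-8 request bytes as ISO-8859-1. */
    public static String misdecodeAsLatin1(String original) {
        return new String(original.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String original = "caf\u00e9"; // "café"

        // Case 1: the browser sent UTF-8 and the bytes were mis-decoded.
        // The conversion restores the original text.
        String garbled = misdecodeAsLatin1(original);
        System.out.println(reinterpretAsUtf8(garbled).equals(original)); // true

        // Case 2: the string really is ISO-8859-1 text. The lone byte 0xE9 is
        // malformed UTF-8, so it is replaced with U+FFFD and the text is damaged.
        System.out.println(reinterpretAsUtf8(original).equals(original)); // false
    }
}
```

So the conversion is only safe when you know the browser sent UTF-8 and the container decoded it with the wrong charset.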
Alternative solution
The solution suggested above works, but from an architectural perspective the correct way is to add a filter to Tomcat that makes the necessary correction for the deployed application, without any additional changes to the rest of the code.
1. Make sure JSP header is set as suggested:
{{{ <%@ page contentType="text/html; charset=UTF-8" %> }}}
2. Example of filter:
{{{
import java.io.*;
import javax.servlet.*;

public class CharsetFilter implements Filter {
    private String encoding;

    public void init(FilterConfig config) throws ServletException {
        encoding = config.getInitParameter("requestEncoding");
        if (encoding == null) encoding = "UTF-8";
    }

    public void doFilter(ServletRequest request, ServletResponse response, FilterChain next)
            throws IOException, ServletException {
        // Respect the client-specified character encoding
        // (see HTTP specification section 3.4.1)
        if (null == request.getCharacterEncoding())
            request.setCharacterEncoding(encoding);
        // Hand the request on; without this call the filter chain stops here
        next.doFilter(request, response);
    }

    public void destroy() {}
}
}}}
The corresponding portion of the web.xml configuration looks like:
{{{
<!--CharsetFilter start-->
<filter>
    <filter-name>Charset Filter</filter-name>
    <filter-class>CharsetFilter</filter-class>
    <init-param>
        <param-name>requestEncoding</param-name>
        <param-value>UTF-8</param-value>
    </init-param>
</filter>
<filter-mapping>
    <filter-name>Charset Filter</filter-name>
    <url-pattern>/*</url-pattern>
</filter-mapping>
<!--CharsetFilter end-->
}}}
The suggested solution originates from Sergey Astakhov (all texts are in Russian) (sergeya@comita.spb.ru).
Important note: This filter should be as far towards the front of your filter chain as possible. If some other code calls request.getParameter (or a similar method) before this filter is invoked, the encoding will not be set in time, and your parameters will still be decoded incorrectly.
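The reason the order matters is that request parameters are decoded once, on first access, and setting the encoding afterwards has no effect. The sketch below is a deliberately simplified, hypothetical model of that behaviour (`LazyRequestModel` is an invented class, not Tomcat's implementation, and the ISO-8859-1 default is an assumption of the model), written only to make the ordering problem visible.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, simplified model of a request; NOT Tomcat's implementation. */
public class LazyRequestModel {
    private final String rawBody;            // e.g. "name=caf%C3%A9"
    private String encoding = "ISO-8859-1";  // assumed default until set
    private Map<String, String> parameters;  // null until first access

    public LazyRequestModel(String rawBody) {
        this.rawBody = rawBody;
    }

    public void setCharacterEncoding(String encoding) {
        this.encoding = encoding;
    }

    /** Parses the body on first call only; later calls reuse the cached result. */
    public String getParameter(String name) {
        if (parameters == null) {
            parameters = parse();
        }
        return parameters.get(name);
    }

    private Map<String, String> parse() {
        Map<String, String> result = new HashMap<>();
        try {
            for (String pair : rawBody.split("&")) {
                String[] kv = pair.split("=", 2);
                String value = kv.length > 1 ? kv[1] : "";
                result.put(URLDecoder.decode(kv[0], encoding),
                           URLDecoder.decode(value, encoding));
            }
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        // Encoding set before the first parameter access: decoded correctly.
        LazyRequestModel early = new LazyRequestModel("name=caf%C3%A9");
        early.setCharacterEncoding("UTF-8");
        System.out.println(early.getParameter("name").equals("caf\u00e9")); // true

        // Some other code reads a parameter first; the later call comes too late.
        LazyRequestModel late = new LazyRequestModel("name=caf%C3%A9");
        late.getParameter("name");
        late.setCharacterEncoding("UTF-8");
        System.out.println(late.getParameter("name").equals("caf\u00e9")); // false
    }
}
```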
- TIP -
Update the file $CATALINA_HOME/conf/server.xml so that the connectors support UTF-8. Example:
{{{<Connector port="8080"
URIEncoding="UTF-8"/>}}}
or
{{{<Connector port="8080"
useBodyEncodingForURI="true"/>}}}
URIEncoding specifies the character encoding used to decode the URI.
useBodyEncodingForURI indicates whether to use the encoding specified in contentType (or explicitly set using Request.setCharacterEncoding() method) to decode the URI query parameters. The default value is set to "false".
Note that this changes the behavior of reading GET parameters from the request URI and will not affect POST parameters at all.
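To see why the charset used for URI decoding matters, the following sketch percent-encodes a value as UTF-8, the way a browser might for a GET parameter, and then decodes it with two different charsets. It uses only the JDK's `URLEncoder` and `URLDecoder`; the class name `QueryDecodingDemo` and its helper methods are invented for illustration and say nothing about how Tomcat decodes URIs internally.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

public class QueryDecodingDemo {

    /** Percent-encodes a query value using the named charset. */
    public static String encode(String value, String charset) {
        try {
            return URLEncoder.encode(value, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** Percent-decodes a query value using the named charset. */
    public static String decode(String value, String charset) {
        try {
            return URLDecoder.decode(value, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String sent = encode("caf\u00e9", "UTF-8");
        System.out.println(sent); // caf%C3%A9

        // Decoding with the charset the client used restores the text.
        System.out.println(decode(sent, "UTF-8").equals("caf\u00e9")); // true

        // Decoding as ISO-8859-1 turns the two UTF-8 bytes into two characters.
        System.out.println(decode(sent, "ISO-8859-1").equals("caf\u00c3\u00a9")); // true
    }
}
```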