Character Encoding problem with IBM's JSF and Ajax

Filed under: Development, WebSphere Portal, Ajax — lars @ 04:26:23 am

I came across an irritating bug today - one to add to my list of gripes about IBM's JSF framework. All of our application's form submissions that were performed via hx:ajaxRefreshSubmit and contained a "£" (the British currency symbol) had the "£", as well as the character after it, dropped from the value that was bound to the backing bean.

I had a look at the headers with the Firefox extension LiveHttpHeaders, and found that the value "test£test" was being passed by the browser as:

viewPC_7_R78TBGL20OS9D02HPDKDFR1006_%3AdetailForm%3AcontentId=test%A3test

But when retrieving the value from the JSF backing bean, or calling renderRequest.getParameter(), I got "testest" rather than "test£test".

The resulting journey taught me a bit (too much!) about browser behaviour and character encoding, which I'll detail below. Those of you who just want to get straight to the point can skip to the Summary (at the bottom).

I'm also told that this problem will be fixed in the upcoming 7.0.0.3 release from IBM (of RAD7 I assume!), although other Ajax frameworks may suffer from this too.

The Web-App problem

My initial investigations pointed me towards a related problem that afflicts web applications running on Tomcat and WebSphere, and probably others (such as PHP apps). Take the following JSP:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<%@page language="java" contentType="text/html; charset=UTF-8" pageEncoding="UTF-8"%>
<html><head><title>default</title>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head><body>
<form method="post">
<input type="text" name="testInput" value="<%=request.getParameter("testInput")%>">
<input type="submit">
</form>
</body></html>

This simply submits a form value back to itself, but the key thing here is that the character encoding is set to UTF-8 in the @page directive. Having either the contentType or pageEncoding attribute set to UTF-8 causes the servlet container to send the following header in any HTTP response for this page:

...
Content-Type: text/html; charset=UTF-8
...

This tells the browser that the page is encoded as UTF-8, and when it submits the form back to the server, it encodes characters with the same encoding. Thus, a POST request from a form-submission with the value "test£test" would contain the following:

testInput=test%C2%A3test

This 2-byte encoding of "£" as %C2%A3 is correct: in UTF-8, "£" (U+00A3) is the byte pair 0xC2 0xA3. But the server seems to be unaware that this POST request is coming from a JSP that was encoded using UTF-8, and decodes the request as 2 single-byte characters (probably assuming a default of ISO-8859-1). This results in the "£" character being prefixed by an accented "A" character, like this: "testÂ£test".
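
The mismatch can be reproduced without a server. The following is only a sketch in JavaScript: unescape() (which treats every %XX as one ISO-8859-1 character) and decodeURIComponent() (which treats the bytes as UTF-8) stand in for whatever decoding the servlet container really performs, which is an assumption.

```javascript
// What the browser sends for "test" + pound sign + "test" when the page is UTF-8.
var posted = "test%C2%A3test";

// Decode each %XX as a single ISO-8859-1 character, as the server appears to do.
var asLatin1 = unescape(posted);

// Decode the same bytes as UTF-8, which is what the browser intended.
var asUtf8 = decodeURIComponent(posted);

console.log(asLatin1); // "test" + U+00C2 + U+00A3 + "test"
console.log(asUtf8);   // "test" + U+00A3 + "test"
```

The first result contains the stray A-circumflex; the second is the value that was typed into the form.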

Accept-charset workaround

One workaround to this is the "accept-charset" attribute of the <form> tag. You can use this to force the encoding of <form> submissions to a certain character set, for example by adding accept-charset="ISO-8859-1". However, IE6 seems to ignore this (it works in Firefox 2.0 though). Also, I notice that JSF's <h:form> tag doesn't allow you to do this properly. It has an attribute called "acceptcharset", which appears under that same name in the rendered HTML. The missing hyphen would appear to be incorrect (according to this http://msdn2.microsoft.com/en-us/library/ms533061.aspx), and it stops the fix from working even in Firefox 2.0.

In Firefox 2.0, where this fix does work, you can observe that the POST data changes from this:

testInput=test%C2%A3test

to this:

testInput=test%A3test

This allows the server to parse the value correctly. More info on Firefox and accept-charset can be found in this bug listing: https://bugzilla.mozilla.org/show_bug.cgi?id=241540
It seems part of the problem is current-day browsers' reluctance to append character-encoding information to the Content-Type header when they submit a POST request.

Content-Type workaround

The other workaround to this involves simply not using UTF-8 character encoding in your page, for example:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<%@page language="java" contentType="text/html; charset=ISO-8859-1" pageEncoding="ISO-8859-1"%>
<html><head><title>default</title>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
</head><body>
<form method="post">
<input type="text" name="testInput" value="<%=request.getParameter("testInput")%>">
<input type="submit">
</form>
</body></html>

This will result in the server sending a different Content-Type header in its response:

Content-Type: text/html; charset=ISO-8859-1

The browser will then use this character encoding for its form submissions, and the problem again goes away.

The WebSphere Portal (ajax) problem

The problem in Portal is a little different. Portal defaults to UTF-8 encoding for all its pages that are served in English, as per this table:

Language             Locale   Markup   Default charset
Japanese             ja                Shift-JIS
Simplified Chinese   zh                GBK
Traditional Chinese  zh_TW    HTML     Big5
Traditional Chinese  zh_TW    WML      UTF-8
Korean               ko                KSC5601
All others                             UTF-8

(from http://www-306.ibm.com/software/globalization/topics/wsportal/portlet.jsp)

Presumably because Portal is aware of this default configuration, it *does* parse the browser's normal form submissions correctly: it assumes all POSTs will be encoded by the browser in UTF-8, as per the Content-Type header it uses to serve up pages.

The problem arises when using Ajax as supplied by IBM's JSF implementation (and possibly other Ajax frameworks).

In a normal form submission, the browser looks at the page's Content-Type header and decides how to encode any POST request it sends back. However, with an Ajax request, the form is not actually submitted as far as the browser is concerned. Some JavaScript has to examine the DOM to retrieve all form inputs and then build a String that contains all the POST data for sending, for example:

xmlHttpRequest.setRequestHeader("Content-Type", "application/x-www-form-urlencoded");
xmlHttpRequest.send("testInput=test%C2%A3test");
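
As a rough sketch of that string-building step: buildPostBody and the {name, value} field shape below are made up for illustration and are not part of IBM's JSF or any other framework. In a page, the list would be gathered from the form's elements. The point is that encodeURIComponent() always percent-encodes the UTF-8 bytes of a character, whatever the page encoding.

```javascript
// Hypothetical helper: turns a list of {name, value} pairs into a POST body.
function buildPostBody(fields) {
    var pairs = [];
    for (var i = 0; i < fields.length; i++) {
        // encodeURIComponent() emits the UTF-8 bytes of each character as %XX.
        pairs.push(encodeURIComponent(fields[i].name) + "=" +
                   encodeURIComponent(fields[i].value));
    }
    return pairs.join("&");
}

// The pound sign is built from its character code to keep the script ASCII-only.
var body = buildPostBody([
    { name: "testInput", value: "test" + String.fromCharCode(163) + "test" }
]);
console.log(body); // testInput=test%C2%A3test
```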

The problem in the case of IBM JSF's hx:ajaxRefreshSubmit component is that it's not smart enough to look at the server's Content-Type header to deduce the correct encoding to use. It always encodes its characters with the default ISO encoding (ISO-8859-1), so "£" goes out as the single byte %A3.

When Portal tries to decode this as UTF-8, it expects the 2-byte sequence %C2%A3; the lone byte %A3 is not valid UTF-8 on its own. It therefore fails to decode this character properly, leaving you the "testest" result, where the pound/£ as well as the character after it are skipped.
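
The difference between the two encodings can be seen with standard JavaScript functions. This is only a sketch: how hx:ajaxRefreshSubmit encodes its data internally is not shown here, so using escape() to stand in for it is an assumption (it happens to produce exactly the %A3 seen in the captured request). Likewise, decodeURIComponent() is a stricter UTF-8 decoder than Portal's, which drops characters instead of throwing.

```javascript
var pound = String.fromCharCode(163);

// escape() emits one byte per character below U+0100, i.e. ISO-8859-1.
var isoEncoded = escape("test" + pound + "test");

// encodeURIComponent() emits the UTF-8 bytes instead.
var utf8Encoded = encodeURIComponent("test" + pound + "test");

// A strict UTF-8 decoder rejects the lone %A3 byte outright.
var decodeFailed = false;
try {
    decodeURIComponent(isoEncoded);
} catch (e) {
    decodeFailed = e instanceof URIError;
}

console.log(isoEncoded);   // test%A3test
console.log(utf8Encoded);  // test%C2%A3test
console.log(decodeFailed); // true
```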

The only fix I have found for this is to use JavaScript to modify the affected characters (in our application - a business app in the UK - only "£" seems to be affected). To force the browser to send the proper 2-byte UTF-8 encoded value in the submit, we replace instances of "£" with "Â£" on form submission: the Ajax code then sends those two characters as the bytes %C2 and %A3, which is exactly the UTF-8 encoding of "£".

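You can do this by calling the following code from the onClick event of your submit input, e.g. onClick="fixPoundEncoding(this)":

function fixPoundEncoding(oSubmit) {
    // Fall back to "this" when the function is assigned directly as the handler.
    var oForm = (oSubmit || this).form;
    fixPoundEncodingForInputs(oForm.getElementsByTagName("INPUT"), "text", false);
    fixPoundEncodingForInputs(oForm.getElementsByTagName("TEXTAREA"), "", true);
}

function fixPoundEncodingForInputs(oInputs, sInputType, bIgnoreType) {
    // Built from character codes so that no non-ASCII characters appear in the script.
    var sPound = String.fromCharCode(163);                            // pound sign, sent as %A3
    var sFixed = String.fromCharCode(194) + String.fromCharCode(163); // sent as %C2%A3
    for (var i = 0; i < oInputs.length; i++) {
        // Only touch the portlet's own fields, whose ids start with "viewPC_".
        if ((bIgnoreType || oInputs[i].type == sInputType) && oInputs[i].id.indexOf("viewPC_") == 0) {
            // Replace every pound (%A3 = 163) so that it is sent as %C2%A3.
            oInputs[i].value = oInputs[i].value.replace(new RegExp(sPound, "g"), sFixed);
        }
    }
}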
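
To see why the replacement works, here is a round trip sketched with standard functions. As an assumption, escape() stands in for a client that sends one ISO-8859-1 byte per character, and decodeURIComponent() stands in for Portal decoding the request as UTF-8; neither is the actual code involved.

```javascript
var pound = String.fromCharCode(163);
// What the workaround leaves in the field: A-circumflex followed by the pound sign.
var fixed = String.fromCharCode(194) + pound;

// Client side: one byte per character.
var sent = escape("test" + fixed + "test");

// Server side: the two bytes are read as a single UTF-8 character.
var received = decodeURIComponent(sent);

console.log(sent);                                 // test%C2%A3test
console.log(received === "test" + pound + "test"); // true
```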

This list of characters with their hex and decimal values for URL encoding may help: http://lab.artlung.com/urlencode/

Summary

After that not-so-thrilling ride through the mess that is character encoding, here's a quick summary of the above fixes/workarounds:

  • If your web application is using UTF-8 encoding and you find your "£" characters are arriving at the server as "Â£", you can fix this by either:
    1. Changing from UTF-8 to ISO-8859-1 in the @page directive's pageEncoding and/or contentType attributes (changing the Content-Type HTTP response header). Any Content-Type meta tags should be changed too.
    2. For Firefox 2.0 (and possibly others, but not IE6), adding the attribute accept-charset="ISO-8859-1" to your <form> tag.
    Both of these should coax the browser into encoding characters as ISO rather than UTF-8, and should resolve your problem.
  • If your Portal is skipping "£" characters, particularly with Ajax POSTs, it's possible that your browser is sending ISO-encoded characters that the server is trying to decode as UTF-8. One possible (but not so elegant) fix is replacing occurrences of "£" with "Â£" before submission. Note that some browsers don't like you having these non-ASCII characters in your JavaScript code, so I recommend you refer to "Â" and "£" as String.fromCharCode(194) and String.fromCharCode(163) respectively.

Comments

  • HP
    We have a similar problem in our IBM WS Portal/JSF based application. We have a file download utility on a portlet page: on click, it creates a PDF file using some parameters passed from that page and shows a popup to download the file. In a non-portal environment the same code shows the "GB Pound character" correctly, but in Portal it doesn't. Does it have to do with the charset selected while creating the PDF document, or with the charset of the page from which we download the file? Please help.

    Thanks - HP

    Comment by HP [Visitor] — 06/25/08 @ 09:47

  • lars
    Hi HP,
    The problem I'm describing above was specifically when submitting form details via Ajax. If this is not how you are submitting your form, then your problem is slightly different. It could still be related to WebSphere Portal's default encoding of UTF-8 rather than ISO for a web application though. I'd start by testing (with debug output?) whether the values are correct when you first retrieve them from the request, before they get to your PDF - to make sure the PDF generation library isn't causing your issue.

    Comment by lars [Member] — 06/25/08 @ 16:38

  • etahan
    I encountered the same problem with the IBM JSF extension. What I did to solve it: changed the pageEncoding to UTF-8,
    with Tomcat and JBoss, added the URIEncoding=UTF-8 parameter to the connectors in server.xml,
    and created a filter
    req.setCharacterEncoding("UTF-8");
    res.setCharacterEncoding("UTF-8");
    that sets the character encoding to UTF-8 for request and response.
    For Java to read JSP files in UTF-8 format correctly, I also added a startup parameter
    to the server's Java options:
    -Dfile.encoding=UTF-8
    After these configurations (for me, I had to do all of the above) everything worked well.

    Comment by etahan [Visitor] · http://www.etahan.com — 03/15/10 @ 06:49
