Double-Byte Characters problem still occurs

NakoRuru1982 · March 6, 2009, 6:42am

Hi Openfire team,

This problem has existed since Openfire 3.5.2. I found many articles online saying that the Openfire team has fixed this bug, but it still occurs.

I tested Openfire in many kinds of environments. Strangely, Spark works well with Openfire, but socket communication between XIFF and Openfire can trigger this bug with long messages.

From the output printed on the screen, I found that an XMPP message coming from Spark is displayed as a whole, while an XMPP message coming from a XIFF client is displayed in separate pieces; that is, with a long message, a double-byte character may be split into 2 parts. I don’t think this is a bug in XIFF/ActionScript, though; it is just how sockets behave.

When I read XMLLightweightParser#read(ByteBuffer) in detail, I saw that the Openfire team checks for a trailing invalid character and rewinds one byte. But **you neither keep a record of the first byte that was split off from a double-byte character, nor is the parameter the same across any 2 calls to XMLLightweightParser#read(ByteBuffer). As a result, the first byte of the double-byte character is lost, and the second byte produces 2 invalid UTF-8 characters ‘0xFFFD’.** This is my opinion.
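
To make the idea concrete, here is a minimal sketch of keeping the split-off bytes between reads. It is not the Openfire parser and not the patch attached to JIRA; the class `Utf8CarryOver` and its methods are hypothetical names made up for illustration, and it takes a plain `byte[]` instead of a `ByteBuffer` to stay short.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of Openfire: remembers the bytes of an
// incomplete trailing character so the next read can complete it.
final class Utf8CarryOver {

    // Bytes of an unfinished multi-byte character from the previous chunk.
    private byte[] carry = new byte[0];

    // Decodes one network chunk and returns only the complete characters.
    String read(byte[] chunk) {
        // Prepend whatever was left over from the previous call.
        byte[] data = new byte[carry.length + chunk.length];
        System.arraycopy(carry, 0, data, 0, carry.length);
        System.arraycopy(chunk, 0, data, carry.length, chunk.length);

        // Hold back the unfinished tail instead of decoding (and losing) it.
        int complete = data.length - incompleteTailLength(data);
        carry = Arrays.copyOfRange(data, complete, data.length);
        return new String(data, 0, complete, StandardCharsets.UTF_8);
    }

    // Number of trailing bytes that start a multi-byte character without
    // finishing it; 0 if the data ends on a character boundary.
    static int incompleteTailLength(byte[] data) {
        // A UTF-8 character is at most 4 bytes, so an unfinished one is at most 3.
        for (int back = 1; back <= 3 && back <= data.length; back++) {
            int b = data[data.length - back] & 0xFF;
            if ((b & 0xC0) == 0x80) {
                continue; // continuation byte, keep looking for the lead byte
            }
            int expected = (b >= 0xF0) ? 4 : (b >= 0xE0) ? 3 : (b >= 0xC0) ? 2 : 1;
            return expected > back ? back : 0;
        }
        return 0;
    }
}
```

With one `Utf8CarryOver` per connection, each chunk read from the socket goes through `read(...)`, and a character split across 2 chunks only comes out once its last byte has arrived. The sketch does not validate malformed input.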

Thanks,

Nako Ruru

LG1 · March 6, 2009, 11:07am

Hi Nako,

JM-1371 is still open. Your bold text describes the problem very well. I attached a modified parser to JIRA, which is quite long and complex. It’s not in SVN, so you need to download it from there if you want to compile Openfire with it.

LG

NakoRuru1982 · March 6, 2009, 1:09pm

The code attached to JIRA doesn’t work. When a XIFF client sends long messages, it gets disconnected from the server. I didn’t debug it in detail; I guess an error code is received from the server.

LG1 · March 6, 2009, 1:55pm

Hi,

Can you provide a test case for this problem? As you may have noticed, I also attached a test case which sends 1-, 2- and 3-byte UTF-8 characters in chunks of 1, 2, 3, …, 10 bytes, and they are parsed properly.
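
For readers without access to the JIRA attachment, a check along those lines could look roughly like the sketch below. It is not the attached test case and does not call XMLLightweightParser; `ChunkedDecodeCheck` is a hypothetical name, and the JDK's `CharsetDecoder` stands in for the parser under test.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-alone check, assuming valid UTF-8 input.
final class ChunkedDecodeCheck {

    // Feeds the bytes to a stateful decoder in pieces of chunkSize bytes.
    static String decodeInChunks(byte[] utf8, int chunkSize) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
        // Room for one chunk plus up to 3 carried-over bytes of a split character.
        ByteBuffer in = ByteBuffer.allocate(chunkSize + 4);
        CharBuffer out = CharBuffer.allocate(utf8.length);
        for (int offset = 0; offset < utf8.length; offset += chunkSize) {
            int len = Math.min(chunkSize, utf8.length - offset);
            in.put(utf8, offset, len);
            in.flip();
            decoder.decode(in, out, false); // false: more input may follow
            in.compact();                   // keep undecoded trailing bytes
        }
        in.flip();
        decoder.decode(in, out, true);      // true: no more input
        decoder.flush(out);
        out.flip();
        return out.toString();
    }

    public static void main(String[] args) {
        // One 1-byte, one 2-byte and one 3-byte character, repeated.
        StringBuilder sample = new StringBuilder();
        for (int i = 0; i < 20; i++) {
            sample.append("a\u00e9\u65e5");
        }
        String expected = sample.toString();
        byte[] bytes = expected.getBytes(StandardCharsets.UTF_8);
        for (int chunkSize = 1; chunkSize <= 10; chunkSize++) {
            if (!expected.equals(decodeInChunks(bytes, chunkSize))) {
                throw new IllegalStateException("Mismatch at chunk size " + chunkSize);
            }
        }
        System.out.println("All chunk sizes decoded without loss");
    }
}
```

To test a real parser, the `decoder.decode(...)` calls would be replaced by calls into that parser, with the chunking loop left as it is.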

LG

NakoRuru1982 · March 6, 2009, 2:04pm

Hi LG,

I have solved the problem with the code provided by Tuolin Chen.

Thanks,

Nako Ruru.