Thursday, April 22, 2010

Known performance impact of using the UTF-8 character set instead of ASCII within TIBCO BusinessWorks (BW)

BW is a Java-based product and uses Unicode internally.

Consider a simple BW process that reads a file in SJIS format (Shift JIS, a character encoding for the Japanese language) and writes a file in UTF-8 format (8-bit UCS Transformation Format, a variable-length character encoding for Unicode). In this case, BW converts the data from SJIS to its internal Unicode representation at the Read File activity and then converts it from Unicode to UTF-8 at the Write File activity.
If the data contains only ASCII characters, there is no performance penalty, because each ASCII character is still encoded as a single byte in UTF-8.
If the data contains characters beyond the ASCII range, each of those characters is encoded as multiple bytes, and this conversion does result in a performance penalty. Multi-byte encoding also increases the payload of the messages to be delivered. For example, most Japanese characters occupy 3 bytes each in UTF-8, compared with 2 bytes in Shift JIS.
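
The size difference described above can be checked with a few lines of plain Java. This is only a minimal sketch: the class name EncodingSizeDemo and its helper methods are made up for illustration and are not part of BW, and the sample strings are arbitrary. It mirrors the decode-then-encode step (source bytes to a Java String, then to UTF-8 bytes) and assumes the JDK in use ships the Shift_JIS charset, which is why that part is guarded.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only (not a BW API).
final class EncodingSizeDemo {

    // Number of bytes needed to store the text in the given charset.
    static int encodedLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    // Decode the input bytes into a Java String (Unicode internally),
    // then encode that String into the target charset.
    static byte[] transcode(byte[] input, Charset source, Charset target) {
        String unicode = new String(input, source);
        return unicode.getBytes(target);
    }

    public static void main(String[] args) {
        String ascii = "TIBCO";
        // Three kanji ("Nihongo"), written as escapes to keep the source ASCII-only.
        String japanese = "\u65E5\u672C\u8A9E";

        System.out.println("ASCII text in UTF-8:    "
                + encodedLength(ascii, StandardCharsets.UTF_8) + " bytes");
        System.out.println("Japanese text in UTF-8: "
                + encodedLength(japanese, StandardCharsets.UTF_8) + " bytes");

        // Shift_JIS is an extended charset; guard in case this runtime lacks it.
        if (Charset.isSupported("Shift_JIS")) {
            Charset sjis = Charset.forName("Shift_JIS");
            byte[] sjisBytes = japanese.getBytes(sjis);
            byte[] utf8Bytes = transcode(sjisBytes, sjis, StandardCharsets.UTF_8);
            System.out.println("SJIS input:  " + sjisBytes.length + " bytes");
            System.out.println("UTF-8 output: " + utf8Bytes.length + " bytes");
        }
    }
}
```

The five ASCII characters stay at one byte each in UTF-8, while the three kanji need three bytes each, so a payload made up mostly of Japanese text grows by roughly half when it is written as UTF-8 instead of SJIS.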
