Configuring the logchar type

One of the most difficult parts of the library is the support of internationalized text strings. Internally, the library can represent text strings either in UTF8 encoding or with the wchar_t type.

UTF8 has the advantage of being ASCII transparent, i.e. each character of the 7-bit ASCII charset is encoded identically in UTF8. The standard string functions of the C library can usually still be used, but it is important to understand that they operate with byte semantics. This is not a problem for pure 7-bit ASCII strings, since they are identical in ASCII and UTF8 and each byte represents one character. But as soon as characters outside 7-bit ASCII need to be encoded, such as German umlauts or Asian characters, each of those characters takes between two and four bytes. A word of 5 characters can therefore need between 5 and 20 bytes, depending on which characters it contains. This variable size makes UTF8 strings more complex to handle than ASCII strings, because a particular index within the byte array does not necessarily correspond to a character boundary.
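The difference between byte count and character count can be made concrete with a small sketch. The helper names utf8CharCount and isCharBoundary are made up for this illustration and are not part of log4cxx; the code assumes well-formed UTF8 input and does no validation.

```cpp
#include <cstddef>
#include <string>

// Count the characters (code points) in a UTF8 string by skipping
// continuation bytes, which always have the bit pattern 10xxxxxx.
// Assumption: the input is well-formed UTF8; nothing is validated.
std::size_t utf8CharCount(const std::string& s) {
    std::size_t count = 0;
    for (unsigned char byte : s) {
        if ((byte & 0xC0) != 0x80) {
            ++count;  // lead byte or plain ASCII byte starts a character
        }
    }
    return count;
}

// Return true if byte offset `index` is the start of a character (or the
// end of the string), i.e. it does not point into the middle of a
// multi-byte sequence.
bool isCharBoundary(const std::string& s, std::size_t index) {
    if (index >= s.size()) {
        return index == s.size();
    }
    return (static_cast<unsigned char>(s[index]) & 0xC0) != 0x80;
}
```

For the word "Grüße", written as the literal "Gr\xC3\xBC\xC3\x9F" "e", std::string::size() reports 7 bytes while utf8CharCount() reports 5 characters, and byte offset 3 is not a character boundary because it falls inside the two-byte encoding of "ü".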

wchar_t, on the other hand, uses more than 8 bits for each character. The advantage is that each wchar_t corresponds to exactly one character, at least where wchar_t is wide enough to hold every character (with a 16-bit wchar_t, as on Windows, characters outside the Basic Multilingual Plane need two wchar_t units). The disadvantage is that a string usually takes more space than in UTF8 encoding, because every character, including the 7-bit ASCII characters, needs a whole wchar_t. Furthermore, the actual width of wchar_t depends on the compiler and platform. The GNU C compiler, for example, allocates 32 bits for each wchar_t on Linux; a word of 5 characters then always needs 20 bytes, even if it contains only 7-bit ASCII characters.
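The storage cost can be checked on a given compiler with a few lines of code. This is only a sketch; the function names wideStorageBytes and narrowStorageBytes are invented for the illustration and do not exist in log4cxx.

```cpp
#include <cstddef>
#include <string>

// Bytes needed to store the characters of a wide string (excluding the
// terminating null). Every element occupies sizeof(wchar_t) bytes,
// regardless of which character it holds.
std::size_t wideStorageBytes(const std::wstring& s) {
    return s.size() * sizeof(wchar_t);
}

// Bytes needed for the same text held as a narrow (byte) string.
std::size_t narrowStorageBytes(const std::string& s) {
    return s.size();
}
```

With a 32-bit wchar_t, wideStorageBytes(L"hello") yields 20 while narrowStorageBytes("hello") yields 5. With a 16-bit wchar_t the wide figure is 10 instead.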

cfstring ... (PENDING SECTION)

The actual representation (UTF8, wchar_t or cfstring) is encapsulated by the logchar typedef for a single character and the LogString typedef for strings.

The interface to the log4cxx API (i.e. functions like Logger::info()) usually consists of char* or std::string& parameters. If the underlying system supports wchar_t, additional methods are enabled that take wchar_t* and std::wstring& parameters.

To properly configure the log4cxx library, the build system does the following:

  • The GNU autotools based build system checks whether the system supports wchar_t. If so, it enables the wchar_t API methods by defining the LOG4CXX_HAS_WCHAR_T macro.
  • The ant based build system assumes that wchar_t is supported.
  • The default for logchar is wchar_t on Mac and Windows, and UTF8 on POSIX systems. The user can change this default during the build with -Dlogchar=... for the ant build system or with --with-logchar=... for the GNU autotools based build system.
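
How a macro such as LOG4CXX_HAS_WCHAR_T can switch the wide-character methods on and off may be pictured roughly as follows. SketchLogger is a hypothetical stand-in and not the log4cxx Logger class. The macro is defined locally only to keep the example self-contained, and the way the real headers test it is an assumption.

```cpp
#include <string>
#include <vector>

// Assumption: in a real build this macro would be set by the build
// system; it is defined here only so the sketch compiles on its own.
#ifndef LOG4CXX_HAS_WCHAR_T
#define LOG4CXX_HAS_WCHAR_T 1
#endif

// Hypothetical stand-in for a logger class; not the log4cxx Logger.
class SketchLogger {
public:
    // Narrow-character overload, always available.
    void info(const std::string& msg) {
        calls.push_back("narrow:" + std::to_string(msg.size()));
    }

#ifdef LOG4CXX_HAS_WCHAR_T
    // Wide-character overload, compiled in only when the macro is defined.
    void info(const std::wstring& msg) {
        calls.push_back("wide:" + std::to_string(msg.size()));
    }
#endif

    // Record of which overload handled each call, for illustration only.
    std::vector<std::string> calls;
};
```

A call such as info("abc") selects the narrow overload and info(L"abc") the wide one. Without the macro, the second call would not compile, which is the effect the configure check is meant to control.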

Depending on which logchar type was chosen, one of the following defines is set to 1 in include/log4cxx/log4cxx.h (the others are always set to 0 by the build system):

  • LOGCHAR_IS_UTF8 if UTF8 encoding was chosen

  • LOGCHAR_IS_WCHAR if wchar_t encoding was chosen

  • LOGCHAR_IS_CFSTRING if cfstring encoding was chosen
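
Code that depends on the chosen representation can branch on these defines at compile time. The following is a minimal sketch of that idea, not the library's own header: the names sketch_logchar, SketchLogString and SKETCH_STR are hypothetical, the three defines are set locally to simulate a UTF8 build, and the cfstring case is left out because that representation is not described here.

```cpp
#include <cstddef>
#include <string>

// Assumption: these values stand in for what the build system would write
// into include/log4cxx/log4cxx.h; a UTF8 build is simulated here.
#ifndef LOGCHAR_IS_UTF8
#define LOGCHAR_IS_UTF8 1
#define LOGCHAR_IS_WCHAR 0
#define LOGCHAR_IS_CFSTRING 0
#endif

// Exactly one of the three defines is expected to be 1.
#if (LOGCHAR_IS_UTF8 + LOGCHAR_IS_WCHAR + LOGCHAR_IS_CFSTRING) != 1
#error "exactly one LOGCHAR_IS_* define must be set to 1"
#endif

#if LOGCHAR_IS_WCHAR
typedef wchar_t sketch_logchar;      // one wchar_t per storage unit
#define SKETCH_STR(s) L##s           // turn "text" into L"text"
#elif LOGCHAR_IS_UTF8
typedef char sketch_logchar;         // one byte per storage unit
#define SKETCH_STR(s) s              // leave the narrow literal as is
#else
#error "the cfstring representation is not covered by this sketch"
#endif

// String type built on whichever character type was selected.
typedef std::basic_string<sketch_logchar> SketchLogString;

// Number of storage units (bytes or wchar_t elements) in the string.
inline std::size_t unitCount(const SketchLogString& s) {
    return s.size();
}
```

Writing literals through a macro like SKETCH_STR keeps the same source line valid under either representation, for example SketchLogString s = SKETCH_STR("hello");.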

logcharConfig (last edited 2009-09-20 23:45:57 by localhost)