Subversion
svn_utf.h File Reference

UTF-8 conversion routines.

#include <apr_pools.h>
#include <apr_xlate.h>
#include "svn_types.h"
#include "svn_string.h"


Defines

#define SVN_APR_LOCALE_CHARSET   APR_LOCALE_CHARSET
#define SVN_APR_DEFAULT_CHARSET   APR_DEFAULT_CHARSET

Functions

void svn_utf_initialize (apr_pool_t *pool)
 Initialize the UTF-8 encoding/decoding routines.
svn_error_t * svn_utf_stringbuf_to_utf8 (svn_stringbuf_t **dest, const svn_stringbuf_t *src, apr_pool_t *pool)
 Set *dest to a utf8-encoded stringbuf from native stringbuf src; allocate *dest in pool.
svn_error_t * svn_utf_string_to_utf8 (const svn_string_t **dest, const svn_string_t *src, apr_pool_t *pool)
 Set *dest to a utf8-encoded string from native string src; allocate *dest in pool.
svn_error_t * svn_utf_cstring_to_utf8 (const char **dest, const char *src, apr_pool_t *pool)
 Set *dest to a utf8-encoded C string from native C string src; allocate *dest in pool.
svn_error_t * svn_utf_cstring_to_utf8_ex2 (const char **dest, const char *src, const char *frompage, apr_pool_t *pool)
 Set *dest to a utf8 encoded C string from frompage encoded C string src; allocate *dest in pool.
svn_error_t * svn_utf_cstring_to_utf8_ex (const char **dest, const char *src, const char *frompage, const char *convset_key, apr_pool_t *pool)
 Like svn_utf_cstring_to_utf8_ex2() but with convset_key which is ignored.
svn_error_t * svn_utf_stringbuf_from_utf8 (svn_stringbuf_t **dest, const svn_stringbuf_t *src, apr_pool_t *pool)
 Set *dest to a natively-encoded stringbuf from utf8 stringbuf src; allocate *dest in pool.
svn_error_t * svn_utf_string_from_utf8 (const svn_string_t **dest, const svn_string_t *src, apr_pool_t *pool)
 Set *dest to a natively-encoded string from utf8 string src; allocate *dest in pool.
svn_error_t * svn_utf_cstring_from_utf8 (const char **dest, const char *src, apr_pool_t *pool)
 Set *dest to a natively-encoded C string from utf8 C string src; allocate *dest in pool.
svn_error_t * svn_utf_cstring_from_utf8_ex2 (const char **dest, const char *src, const char *topage, apr_pool_t *pool)
 Set *dest to a topage encoded C string from utf8 encoded C string src; allocate *dest in pool.
svn_error_t * svn_utf_cstring_from_utf8_ex (const char **dest, const char *src, const char *topage, const char *convset_key, apr_pool_t *pool)
 Like svn_utf_cstring_from_utf8_ex2() but with convset_key which is ignored.
const char * svn_utf_cstring_from_utf8_fuzzy (const char *src, apr_pool_t *pool)
 Return a fuzzily native-encoded C string from utf8 C string src, allocated in pool.
svn_error_t * svn_utf_cstring_from_utf8_stringbuf (const char **dest, const svn_stringbuf_t *src, apr_pool_t *pool)
 Set *dest to a natively-encoded C string from utf8 stringbuf src; allocate *dest in pool.
svn_error_t * svn_utf_cstring_from_utf8_string (const char **dest, const svn_string_t *src, apr_pool_t *pool)
 Set *dest to a natively-encoded C string from utf8 string src; allocate *dest in pool.

Detailed Description

UTF-8 conversion routines.

Whenever a conversion routine cannot convert to or from UTF-8, the error returned has code APR_EINVAL.

Definition in file svn_utf.h.


Function Documentation

svn_error_t* svn_utf_cstring_from_utf8_ex (const char **dest, const char *src, const char *topage, const char *convset_key, apr_pool_t *pool)

Like svn_utf_cstring_from_utf8_ex2() but with convset_key which is ignored.

Deprecated:
Provided for backward compatibility with the 1.3 API.

svn_error_t* svn_utf_cstring_from_utf8_ex2 (const char **dest, const char *src, const char *topage, apr_pool_t *pool)

Set *dest to a topage encoded C string from utf8 encoded C string src; allocate *dest in pool.

Since:
New in 1.4.

const char* svn_utf_cstring_from_utf8_fuzzy (const char *src, apr_pool_t *pool)

Return a fuzzily native-encoded C string from utf8 C string src, allocated in pool.

A fuzzy recoding leaves all 7-bit ASCII characters the same, and substitutes "?\XXX" for others, where XXX is the unsigned decimal code for that character.

This function cannot error; it is guaranteed to return something. First it will recode as described above and then attempt to convert the (new) 7-bit UTF-8 string to native encoding. If that fails, it will return the raw fuzzily recoded string, which may or may not be meaningful in the client's locale, but is (presumably) better than nothing.
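
The recoding step described above can be sketched in plain C. This is only an illustration, not the library's implementation: `fuzzy_escape_sketch` is a hypothetical helper, it uses `malloc` rather than an APR pool, and it assumes that each non-ASCII byte is escaped separately with its decimal value zero-padded to three digits. It does not attempt the subsequent conversion to the native encoding.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of svn_utf.h.  Returns a malloc'd copy of
   SRC in which every byte above 0x7F is replaced by "?\" followed by the
   byte's decimal value as three zero-padded digits (an assumption of this
   sketch).  Returns NULL if allocation fails; the caller frees the result. */
static char *fuzzy_escape_sketch(const char *src)
{
  size_t len = strlen(src);
  size_t new_len = 0;
  size_t i;
  char *out;
  char *p;

  /* First pass: an escaped byte expands to 5 characters, others to 1. */
  for (i = 0; i < len; i++)
    new_len += ((unsigned char)src[i] > 0x7F) ? 5 : 1;

  out = malloc(new_len + 1);
  if (out == NULL)
    return NULL;

  /* Second pass: copy ASCII bytes unchanged, escape everything else. */
  p = out;
  for (i = 0; i < len; i++)
    {
      unsigned char c = (unsigned char)src[i];
      if (c > 0x7F)
        p += sprintf(p, "?\\%03u", (unsigned int)c);
      else
        *p++ = (char)c;
    }
  *p = '\0';

  return out;
}
```

Under these assumptions, the UTF-8 bytes of "café" (`"caf\xc3\xa9"`) come out as `caf?\195?\169`, which contains only 7-bit characters and can therefore be printed in any ASCII-compatible locale.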

Notes:

Improvement is possible, even imminent. The original problem was that if you converted a UTF-8 string (say, a log message) into a locale that couldn't represent all the characters, you'd just get a static placeholder saying "[unconvertible log message]". Then Justin Erenkrantz pointed out how on platforms that didn't support conversion at all, "svn log" would still fail completely when it encountered unconvertible data.

Now for both cases, the caller can at least fall back on this function, which converts the message as best it can, substituting "?\XXX" escape codes for the non-ASCII characters.

Ultimately, some callers may prefer the iconv "//TRANSLIT" option, so when we can detect that at configure time, things will change. Also, this should (?) be moved to apr/apu eventually.

See http://subversion.tigris.org/issues/show_bug.cgi?id=807 for details.

svn_error_t* svn_utf_cstring_to_utf8_ex (const char **dest, const char *src, const char *frompage, const char *convset_key, apr_pool_t *pool)

Like svn_utf_cstring_to_utf8_ex2() but with convset_key which is ignored.

Deprecated:
Provided for backward compatibility with the 1.3 API.

svn_error_t* svn_utf_cstring_to_utf8_ex2 (const char **dest, const char *src, const char *frompage, apr_pool_t *pool)

Set *dest to a utf8 encoded C string from frompage encoded C string src; allocate *dest in pool.

Since:
New in 1.4.

void svn_utf_initialize (apr_pool_t *pool)

Initialize the UTF-8 encoding/decoding routines.

Allocate cached translation handles in a subpool of pool.

Note:
Calling this function is optional, but doing so improves performance. If it is called, no other svn function may be in use in other threads while it runs, or while pool is cleared or destroyed.

Since:
New in 1.1.