The m17n Library 1.8.4
Charset objects and API for them.
Macros | |
#define | MCHAR_INVALID_CODE |
Invalid code-point. | |
Functions | |
MSymbol | mchar_define_charset (const char *name, MPlist *plist) |
MSymbol | mchar_resolve_charset (MSymbol symbol) |
Resolve charset name. | |
int | mchar_list_charset (MSymbol **symbols) |
List symbols representing charsets. | |
int | mchar_decode (MSymbol charset_name, unsigned code) |
Decode a code-point. | |
unsigned | mchar_encode (MSymbol charset_name, int c) |
Encode a character code. | |
int | mchar_map_charset (MSymbol charset_name, void(*func)(int from, int to, void *arg), void *func_arg) |
Call a function for all the characters in a specified charset. | |
Variables | |
MSymbol | Mcharset |
Variables: Symbols representing a charset. | |
Each of the following symbols represents a predefined charset. | |
MSymbol | Mcharset_ascii |
Symbol representing the charset ASCII. | |
MSymbol | Mcharset_iso_8859_1 |
Symbol representing the charset ISO/IEC 8859/1. | |
MSymbol | Mcharset_unicode |
Symbol representing the charset Unicode. | |
MSymbol | Mcharset_m17n |
Symbol representing the largest charset. | |
MSymbol | Mcharset_binary |
Symbol representing the charset for ill-decoded characters. | |
Variables: Parameter keys for mchar_define_charset(). | |
These are the predefined symbols to use as parameter keys for the function mchar_define_charset() (which see). | |
MSymbol | Mmethod |
MSymbol | Mdimension |
MSymbol | Mmin_range |
MSymbol | Mmax_range |
MSymbol | Mmin_code |
MSymbol | Mmax_code |
MSymbol | Mascii_compatible |
MSymbol | Mfinal_byte |
MSymbol | Mrevision |
MSymbol | Mmin_char |
MSymbol | Mmapfile |
MSymbol | Mparents |
MSymbol | Msubset_offset |
MSymbol | Mdefine_coding |
MSymbol | Maliases |
Variables: Symbols representing charset methods. | |
These are the predefined symbols that can be a value of the Mmethod parameter of a charset used in an argument to the mchar_define_charset() function. A method specifies how code-points and character codes are converted. See the documentation of the mchar_define_charset() function for the details. | |
MSymbol | Moffset |
MSymbol | Mmap |
Symbol for the map type method of charset. | |
MSymbol | Munify |
Symbol for the unify type method of charset. | |
MSymbol | Msubset |
MSymbol | Msuperset |
Symbol for the superset type method of charset. | |
Charset objects and API for them.
The symbol Mcharset.
The m17n library uses charset objects to represent coded character sets (CCSs). The library supports many predefined coded character sets, and application programs can add other charsets. A character can belong to multiple charsets.
The m17n library distinguishes the following three concepts:
The type unsigned is used to represent a code-point. An invalid code-point is represented by the macro MCHAR_INVALID_CODE.
Each charset object defines how characters are converted between code-points and character codes. To decode means converting a code-point to a character code, and to encode means converting a character code to a code-point.
Any decoded M-text has a text property whose key is the predefined symbol Mcharset. The name of Mcharset is "charset".
#define MCHAR_INVALID_CODE
Invalid code-point.
The macro MCHAR_INVALID_CODE gives the invalid code-point.
MSymbol mchar_define_charset (const char *name, MPlist *plist)
MSymbol mchar_resolve_charset (MSymbol symbol)
Resolve charset name.
The mchar_resolve_charset() function returns symbol if it represents a charset. Otherwise, it canonicalizes symbol as a charset name and, if the canonicalized name represents a charset, returns that name. Otherwise, it returns Mnil.
int mchar_list_charset (MSymbol **symbols)
List symbols representing charsets.
The mchar_list_charset() function makes an array of symbols representing charsets, stores a pointer to the array in the place pointed to by symbols, and returns the length of the array.
int mchar_decode (MSymbol charset_name, unsigned code)
Decode a code-point.
The mchar_decode() function decodes code-point code in the charset represented by the symbol charset_name to get a character code.
unsigned mchar_encode (MSymbol charset_name, int c)
Encode a character code.
The mchar_encode() function encodes character code c to get a code-point in the charset represented by the symbol charset_name.
int mchar_map_charset (MSymbol charset_name, void (*func)(int from, int to, void *arg), void *func_arg)
Call a function for all the characters in a specified charset.
The mchar_map_charset() function calls func for all the characters in the charset named charset_name. Each call covers a chunk of consecutive characters rather than a single character.
func receives three arguments: from, to, and arg. from and to specify the range of character codes in charset. arg is the same as func_arg.
Errors: MERROR_CHARSET
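The callback contract described above can be sketched without the library. The block below is a self-contained illustration, not m17n code: RangeStats, count_range() and the stand-in driver sketch_map_ranges() are hypothetical names invented for the example, and treating from and to as an inclusive range is an assumption of the sketch. Only the callback signature is taken from the prototype of mchar_map_charset().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical accumulator handed to the callback through its void *arg. */
typedef struct {
  int chunks;       /* number of times the callback ran */
  long characters;  /* total characters covered by all chunks */
} RangeStats;

/* Callback with the signature required of func:
   void (*)(int from, int to, void *arg). */
static void count_range(int from, int to, void *arg)
{
  RangeStats *stats = arg;
  assert(stats != NULL);
  stats->chunks += 1;
  /* Assumption of this sketch: the range from..to is inclusive. */
  stats->characters += (long)to - from + 1;
}

/* Stand-in driver for the sketch only. It calls func once per chunk of
   consecutive character codes, passing func_arg through unchanged. */
static void sketch_map_ranges(const int (*ranges)[2], size_t n,
                              void (*func)(int from, int to, void *arg),
                              void *func_arg)
{
  for (size_t i = 0; i < n; i++)
    func(ranges[i][0], ranges[i][1], func_arg);
}
```

With the real library, count_range and a pointer to a RangeStats object would be passed as func and func_arg to mchar_map_charset() in place of the stand-in driver.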
MSymbol Mcharset_ascii
Symbol representing the charset ASCII.
The symbol Mcharset_ascii has name "ascii" and represents the charset ISO 646, USA Version X3.4-1968 (ISO-IR-6).
MSymbol Mcharset_iso_8859_1
Symbol representing the charset ISO/IEC 8859/1.
The symbol Mcharset_iso_8859_1 has name "iso-8859-1" and represents the charset ISO/IEC 8859-1:1998.
MSymbol Mcharset_unicode
Symbol representing the charset Unicode.
The symbol Mcharset_unicode has name "unicode" and represents the charset Unicode.
MSymbol Mcharset_m17n
Symbol representing the largest charset.
The symbol Mcharset_m17n has name "m17n" and represents the charset that contains all characters supported by the m17n library.
MSymbol Mcharset_binary
Symbol representing the charset for ill-decoded characters.
The symbol Mcharset_binary has name "binary" and represents the fake charset that the decoding functions attach to an M-text as a text property when they encounter an invalid byte (sequence).
See Code Conversion for more details.
MSymbol Mmethod
MSymbol Mdimension
MSymbol Mmin_range
MSymbol Mmax_range
MSymbol Mmin_code
MSymbol Mmax_code
MSymbol Mascii_compatible
MSymbol Mfinal_byte
MSymbol Mrevision
MSymbol Mmin_char
MSymbol Mmapfile
MSymbol Mparents
MSymbol Msubset_offset
MSymbol Mdefine_coding
MSymbol Maliases
MSymbol Moffset
Symbol for the offset type method of charset.
The symbol Moffset has the name "offset" and, when used as a value of the Mmethod parameter of a charset, it means that the conversion between code-points and character codes of the charset is done by this calculation:
CHARACTER-CODE = CODE-POINT - MIN-CODE + MIN-CHAR
where MIN-CODE is the value of the Mmin_code parameter of the charset, and MIN-CHAR is the value of the Mmin_char parameter.
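As an illustration of the offset calculation, the following self-contained sketch models an offset-method charset in plain C. It does not call the m17n library: the struct OffsetCharset, the functions offset_decode() and offset_encode(), and the constant SKETCH_INVALID_CODE are hypothetical names invented for the example, and returning -1 for a code-point outside the range is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for MCHAR_INVALID_CODE; the library defines the
   real value. */
#define SKETCH_INVALID_CODE 0xFFFFFFFFu

/* Hypothetical model of the parameters an offset-method charset uses. */
typedef struct {
  unsigned min_code;  /* models Mmin_code */
  unsigned max_code;  /* models Mmax_code */
  int min_char;       /* models Mmin_char */
} OffsetCharset;

/* CHARACTER-CODE = CODE-POINT - MIN-CODE + MIN-CHAR.
   Returns -1 for an out-of-range code-point (assumption of the sketch). */
static int offset_decode(const OffsetCharset *cs, unsigned code)
{
  assert(cs != NULL);
  if (code < cs->min_code || code > cs->max_code)
    return -1;
  return (int)(code - cs->min_code) + cs->min_char;
}

/* Inverse direction: CODE-POINT = CHARACTER-CODE - MIN-CHAR + MIN-CODE. */
static unsigned offset_encode(const OffsetCharset *cs, int c)
{
  assert(cs != NULL);
  int max_char = cs->min_char + (int)(cs->max_code - cs->min_code);
  if (c < cs->min_char || c > max_char)
    return SKETCH_INVALID_CODE;
  return (unsigned)(c - cs->min_char) + cs->min_code;
}
```

For example, with min_code 0x20, max_code 0x7F and min_char 0xE000 (arbitrary values chosen for the sketch), code-point 0x41 decodes to character code 0xE021, and that character code encodes back to 0x41.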
MSymbol Mmap
Symbol for the map type method of charset.
The symbol Mmap has the name "map" and, when used as a value of the Mmethod parameter of a charset, it means that the conversion between code-points and character codes of the charset is done by map lookup. The map must be given by the Mmapfile parameter.
MSymbol Munify
Symbol for the unify type method of charset.
The symbol Munify has the name "unify" and, when used as a value of the Mmethod parameter of a charset, it means that the conversion between code-points and character codes of the charset is done by map lookup and offsetting. The map must be given by the Mmapfile parameter. For this kind of charset, a unique continuous character code space is assigned for all characters.
If the map has an entry for a code-point, the conversion is done by looking up the map. Otherwise, the conversion is done by this calculation:
CHARACTER-CODE = CODE-POINT - MIN-CODE + LOWEST-CHAR-CODE
where MIN-CODE is the value of the Mmin_code parameter of the charset, and LOWEST-CHAR-CODE is the lowest character code of the assigned code space.
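The two-step rule for the unify method (map lookup first, offset calculation as the fallback) can be modeled in a few lines of plain C. This is only a sketch under stated assumptions, not the library's implementation: MapEntry, UnifyCharset and unify_decode() are hypothetical names, the map is an in-memory array rather than a map file, and -1 is used here to mark a code-point outside the charset.

```c
#include <assert.h>
#include <stddef.h>

/* One hypothetical map entry: code-point -> character code. */
typedef struct {
  unsigned code;
  int character;
} MapEntry;

/* Hypothetical model of a unify-method charset. */
typedef struct {
  const MapEntry *map;   /* models the contents of the Mmapfile map */
  size_t map_len;
  unsigned min_code;     /* models Mmin_code */
  unsigned max_code;     /* models Mmax_code */
  int lowest_char_code;  /* lowest character code of the assigned space */
} UnifyCharset;

static int unify_decode(const UnifyCharset *cs, unsigned code)
{
  assert(cs != NULL);
  if (code < cs->min_code || code > cs->max_code)
    return -1;  /* assumption of the sketch: -1 marks an invalid code-point */
  for (size_t i = 0; i < cs->map_len; i++)
    if (cs->map[i].code == code)
      return cs->map[i].character;  /* the map has an entry: use it */
  /* No entry: CHARACTER-CODE = CODE-POINT - MIN-CODE + LOWEST-CHAR-CODE */
  return (int)(code - cs->min_code) + cs->lowest_char_code;
}
```

A linear search keeps the sketch short; the point is only the order of the two steps, so any lookup structure would serve equally well.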
MSymbol Msubset
Symbol for the subset type method of charset.
The symbol Msubset has the name "subset" and, when used as a value of the Mmethod parameter of a charset, it means that the charset is a subset of a parent charset. The parent charset must be given by the Mparents parameter. The conversion between code-points and character codes of the charset is done conceptually by this calculation:
CHARACTER-CODE = PARENT-CODE (CODE-POINT) + SUBSET-OFFSET
where PARENT-CODE is a pseudo function that returns the character code of CODE-POINT in the parent charset, and SUBSET-OFFSET is the value given by the Msubset_offset parameter.
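The conceptual subset calculation can be sketched as a wrapper around a parent decoder. The block below is a self-contained model rather than m17n code: ParentDecode, SubsetCharset, toy_parent_decode() and subset_decode() are hypothetical names, the toy parent is an identity mapping chosen only for the example, and limiting the subset to a min_code..max_code window is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical decoder type: maps a code-point to a character code, or -1. */
typedef int (*ParentDecode)(unsigned code);

/* Hypothetical model of a subset-method charset. */
typedef struct {
  ParentDecode parent_code;  /* models PARENT-CODE of the parent charset */
  unsigned min_code;         /* assumed lower bound of the subset */
  unsigned max_code;         /* assumed upper bound of the subset */
  int subset_offset;         /* models Msubset_offset */
} SubsetCharset;

/* Toy parent for the example: an identity mapping over 0x00..0xFF. */
static int toy_parent_decode(unsigned code)
{
  return code <= 0xFF ? (int)code : -1;
}

/* CHARACTER-CODE = PARENT-CODE (CODE-POINT) + SUBSET-OFFSET */
static int subset_decode(const SubsetCharset *cs, unsigned code)
{
  assert(cs != NULL && cs->parent_code != NULL);
  if (code < cs->min_code || code > cs->max_code)
    return -1;  /* outside the subset window (assumption of the sketch) */
  int parent_char = cs->parent_code(code);
  if (parent_char < 0)
    return -1;  /* the parent cannot decode this code-point */
  return parent_char + cs->subset_offset;
}
```

With a window of 0xA0..0xFF and a subset offset of 0x80, the model decodes code-point 0xA0 to character code 0x120 and rejects 0x41, which lies outside the window.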
MSymbol Msuperset
Symbol for the superset type method of charset.
The symbol Msuperset has the name "superset" and, when used as a value of the Mmethod parameter of a charset, it means that the charset is a superset of parent charsets. The parent charsets must be given by the Mparents parameter.
MSymbol Mcharset