diff --git a/v2_0/doc/utf8cpp.html b/v2_0/doc/utf8cpp.html index c528541..069c2be 100644 --- a/v2_0/doc/utf8cpp.html +++ b/v2_0/doc/utf8cpp.html @@ -101,14 +101,14 @@

Many C++ developers miss an easy and portable way of handling Unicode encoded - strings. C++ Standard is currently Unicode agnostic, and while some work is being - done to introduce Unicode to the next incarnation called C++0x, for the moment - nothing of the sort is available. In the meantime, developers use 3rd party - libraries like ICU, OS specific capabilities, or simply roll out their own - solutions. + strings. The original C++ Standard (known as C++98 or C++03) is Unicode agnostic, + and while some work is being done to introduce Unicode to the next incarnation + called C++0x, for the moment nothing of the sort is available. In the meantime, + developers use third-party libraries like ICU, OS-specific capabilities, or simply + roll out their own solutions.

- In order to easily handle UTF-8 encoded Unicode strings, I have come up with a small + In order to easily handle UTF-8 encoded Unicode strings, I came up with a small generic library. For anybody used to work with STL algorithms and iterators, it should be easy and natural to use. The code is freely available for any purpose - check out the license at the beginning of the utf8.h file. If you run into @@ -129,7 +129,7 @@ Introductionary Sample

- To illustrate the use of this utf8 library, let's start with a small but complete program + To illustrate the use of the library, let's start with a small but complete program that opens a file containing UTF-8 encoded text, reads it line by line, checks each line for invalid UTF-8 byte sequences, and converts it to UTF-16 encoding and back to UTF-8:

@@ -206,6 +206,10 @@ utf16to8.

Checking if a file contains valid UTF-8 text

+

+Here is a function that checks whether the content of a file is valid UTF-8 encoded text without +reading the content into memory: +

    
 bool valid_utf8_file(const char* file_name)
 {
@@ -218,8 +222,25 @@
 
     return utf8::is_valid(it, eos);
 }
+
+

+Because the function utf8::is_valid() works with input iterators, we were able +to pass an istreambuf_iterator to it and read the content of the file directly +without loading it into memory first.
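To see why input iterators are sufficient for validation, here is a minimal, self-contained sketch of a single-pass UTF-8 check. It is not the library's implementation: the names sketch_is_valid_utf8 and valid_utf8_stream are hypothetical and exist only for this illustration. Each byte is read exactly once and the iterator is never copied back or rewound, which is all an istreambuf_iterator allows.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <iterator>
#include <sstream>
#include <string>

// Hypothetical helper (not part of the utf8 library): validates UTF-8 in one
// forward pass, so plain input iterators are enough.
template <typename InputIt>
bool sketch_is_valid_utf8(InputIt it, InputIt end)
{
    while (it != end) {
        const std::uint8_t lead = static_cast<std::uint8_t>(*it);
        ++it;
        if (lead < 0x80) continue;                  // ASCII, one byte

        int trail = 0;                              // continuation bytes expected
        std::uint32_t cp = 0;                       // decoded code point
        std::uint32_t min_cp = 0;                   // smallest value for this length
        if ((lead & 0xE0) == 0xC0)      { trail = 1; cp = lead & 0x1F; min_cp = 0x80; }
        else if ((lead & 0xF0) == 0xE0) { trail = 2; cp = lead & 0x0F; min_cp = 0x800; }
        else if ((lead & 0xF8) == 0xF0) { trail = 3; cp = lead & 0x07; min_cp = 0x10000; }
        else return false;                          // stray continuation or bad lead byte

        for (; trail > 0; --trail) {
            if (it == end) return false;            // sequence cut short
            const std::uint8_t b = static_cast<std::uint8_t>(*it);
            if ((b & 0xC0) != 0x80) return false;   // not a continuation byte
            ++it;
            cp = (cp << 6) | (b & 0x3F);
        }
        if (cp < min_cp) return false;                  // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF) return false; // UTF-16 surrogate
        if (cp > 0x10FFFF) return false;                // beyond the Unicode range
    }
    return true;
}

// Hypothetical wrapper: validates a stream without buffering its content.
bool valid_utf8_stream(std::istream& in)
{
    std::istreambuf_iterator<char> it(in.rdbuf());
    std::istreambuf_iterator<char> eos;
    return sketch_is_valid_utf8(it, eos);
}
```

The same function template also accepts std::string iterators or raw pointers, since every forward or random access iterator satisfies the input iterator requirements.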

+

+Note that other functions that take input iterator arguments can be used in a similar way. For +instance, to read the content of a UTF-8 encoded text file and convert the text to UTF-16, just +do something like: +

+
+    utf8::utf8to16(it, eos, back_inserter(u16string));
 
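The pattern of reading through an input iterator and writing through a back_inserter can be sketched without the library as follows. This is an illustration only: sketch_utf8to16 and stream_to_utf16 are hypothetical names, the output container is assumed to be a std::vector of unsigned short, and the decoder assumes well-formed input and performs no validation, unlike a production converter.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for a UTF-8 to UTF-16 converter.
// Assumes the input is well-formed UTF-8; no validation is done here.
template <typename InputIt, typename OutputIt>
OutputIt sketch_utf8to16(InputIt it, InputIt end, OutputIt out)
{
    while (it != end) {
        const std::uint8_t lead = static_cast<std::uint8_t>(*it);
        ++it;
        std::uint32_t cp = lead;   // ASCII by default
        int trail = 0;             // continuation bytes to read
        if (lead >= 0xF0)      { cp = lead & 0x07; trail = 3; }
        else if (lead >= 0xE0) { cp = lead & 0x0F; trail = 2; }
        else if (lead >= 0xC0) { cp = lead & 0x1F; trail = 1; }
        for (; trail > 0 && it != end; --trail, ++it)
            cp = (cp << 6) | (static_cast<std::uint8_t>(*it) & 0x3F);

        if (cp > 0xFFFF) {
            // Outside the Basic Multilingual Plane: emit a surrogate pair.
            cp -= 0x10000;
            *out++ = static_cast<unsigned short>(0xD800 + (cp >> 10));
            *out++ = static_cast<unsigned short>(0xDC00 + (cp & 0x3FF));
        } else {
            *out++ = static_cast<unsigned short>(cp);
        }
    }
    return out;
}

// Hypothetical wrapper: converts a whole stream, reading it byte by byte.
std::vector<unsigned short> stream_to_utf16(std::istream& in)
{
    std::istreambuf_iterator<char> it(in.rdbuf());
    std::istreambuf_iterator<char> eos;
    std::vector<unsigned short> result;
    sketch_utf8to16(it, eos, std::back_inserter(result));
    return result;
}
```

Only the UTF-16 result is held in memory; the UTF-8 source is consumed as it is read from the stream buffer.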

Ensure that a string contains valid UTF-8 text

+

+If we have some text that "probably" contains UTF-8 encoded text and we want to +replace any invalid UTF-8 sequence with a replacement character, something like +the following function may be used: +

 void fix_utf8_string(std::string& str)
 {
@@ -228,6 +249,9 @@
     str = temp;
 }
 
+

The function will replace any invalid UTF-8 sequence with a Unicode replacement character. +There is an overloaded function that enables the caller to supply their own replacement character. +
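The idea of substituting a caller-supplied replacement can be illustrated with the self-contained sketch below. It does not use the library and may differ from it in detail: sketch_sequence_length and sketch_fix_utf8 are hypothetical names, the replacement is passed as an already encoded string for simplicity, and every invalid byte is replaced individually, so the number of replacement characters emitted for a run of bad bytes is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: returns the length (1 to 4) of the well-formed UTF-8
// sequence starting at s[i], or 0 if the bytes there are not valid UTF-8.
std::size_t sketch_sequence_length(const std::string& s, std::size_t i)
{
    const std::uint8_t lead = static_cast<std::uint8_t>(s[i]);
    if (lead < 0x80) return 1;                      // ASCII

    std::size_t len = 0;
    std::uint32_t cp = 0;
    std::uint32_t min_cp = 0;
    if ((lead & 0xE0) == 0xC0)      { len = 2; cp = lead & 0x1F; min_cp = 0x80; }
    else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; min_cp = 0x800; }
    else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; min_cp = 0x10000; }
    else return 0;                                  // invalid lead byte

    if (i + len > s.size()) return 0;               // sequence cut short
    for (std::size_t k = 1; k < len; ++k) {
        const std::uint8_t b = static_cast<std::uint8_t>(s[i + k]);
        if ((b & 0xC0) != 0x80) return 0;           // not a continuation byte
        cp = (cp << 6) | (b & 0x3F);
    }
    if (cp < min_cp) return 0;                      // overlong encoding
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0;     // UTF-16 surrogate
    if (cp > 0x10FFFF) return 0;                    // beyond the Unicode range
    return len;
}

// Hypothetical function: copies valid sequences and substitutes `replacement`
// for each invalid byte. The default is U+FFFD encoded as UTF-8.
std::string sketch_fix_utf8(const std::string& in,
                            const std::string& replacement = "\xEF\xBF\xBD")
{
    std::string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const std::size_t len = sketch_sequence_length(in, i);
        if (len == 0) {
            out += replacement;   // invalid byte: substitute and move on
            ++i;
        } else {
            out.append(in, i, len);
            i += len;
        }
    }
    return out;
}
```

Calling sketch_fix_utf8(text, "?") would substitute a question mark instead of the default replacement character.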

Reference