Another update of documentation

git-svn-id: http://svn.code.sf.net/p/utfcpp/code@93 a809a056-fc17-0410-9590-b4f493f8b08e
2009-07-07 00:46:34 +00:00 · 2009-07-07 00:46:34 +00:00 · 054defb568
commit 054defb568
parent 9d935b3c69
1 changed files with 31 additions and 7 deletions
--- a/v2_0/doc/utf8cpp.html
+++ b/v2_0/doc/utf8cpp.html
@@ -101,14 +101,14 @@
    </h2>
    <p>
      Many C++ developers miss an easy and portable way of handling Unicode encoded
-      strings. C++ Standard is currently Unicode agnostic, and while some work is being
-      done to introduce Unicode to the next incarnation called C++0x, for the moment
-      nothing of the sort is available. In the meantime, developers use 3rd party
-      libraries like ICU, OS specific capabilities, or simply roll out their own
-      solutions.
+      strings. The original C++ Standard (known as C++98 or C++03) is Unicode agnostic,
+      and while some work is being done to introduce Unicode to the next incarnation
+      called C++0x, for the moment nothing of the sort is available. In the meantime,
+      developers use third-party libraries like ICU, OS-specific capabilities, or simply
+      roll their own solutions.
    </p>
    <p>
-      In order to easily handle UTF-8 encoded Unicode strings, I have come up with a small
+      In order to easily handle UTF-8 encoded Unicode strings, I came up with a small
      generic library. For anybody used to working with STL algorithms and iterators, it should be
      easy and natural to use. The code is freely available for any purpose - check out
      the license at the beginning of the utf8.h file. If you run into
@@ -129,7 +129,7 @@
      Introductory Sample
    </h3>
    <p>
-      To illustrate the use of this utf8 library, let's start with a small but complete program 
+      To illustrate the use of the library, let's start with a small but complete program 
      that opens a file containing UTF-8 encoded text, reads it line by line, checks each line
      for invalid UTF-8 byte sequences, and converts it to UTF-16 encoding and back to UTF-8:
    </p>
@@ -206,6 +206,10 @@
      <code>utf16to8</code>.
    </p>
    <h3 id="validfile">Checking if a file contains valid UTF-8 text</h3>
+<p>
+Here is a function that checks whether the content of a file is valid UTF-8 encoded text without
+reading the content into memory:
+</p>
 <pre>    
 <span class="keyword">bool</span> valid_utf8_file(<span class="keyword">const char</span>* file_name)
 {
@@ -218,8 +222,25 @@

    <span class="keyword">return</span> utf8::is_valid(it, eos);
 }
+</pre>
+<p>
+Because <code>utf8::is_valid()</code> works with input iterators, we can pass
+<code>istreambuf_iterator</code>s to it and validate the content of the file as it is read,
+without loading it into memory first.</p>
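<p>
To make the input-iterator point concrete, here is a minimal, self-contained sketch of a
single-pass UTF-8 check. It is not the library's implementation: the names
<code>sketch_is_valid_utf8</code> and <code>sketch_valid_utf8_stream</code> are hypothetical,
and the checks follow the general UTF-8 rules (lead and trail bytes, overlong forms, surrogates,
the U+10FFFF limit) rather than the library's code. It only reads the current byte, advances,
and compares against the end iterator, so it can run over a stream just as well as over a string.
</p>

```cpp
#include <cassert>
#include <istream>
#include <iterator>
#include <sstream>
#include <string>

// Hypothetical helper, NOT part of the utf8 library: a single-pass check
// that uses only input-iterator operations (*it, ++it, it != end).
template <typename InputIt>
bool sketch_is_valid_utf8(InputIt it, InputIt end)
{
    while (it != end) {
        const unsigned char lead = static_cast<unsigned char>(*it);
        ++it;

        int trail = 0;            // continuation bytes expected after the lead
        unsigned long cp = 0;     // code point being decoded
        unsigned long min_cp = 0; // smallest value allowed for this length

        if (lead < 0x80) {
            continue;                                   // plain ASCII byte
        } else if ((lead & 0xE0) == 0xC0) {
            trail = 1; cp = lead & 0x1F; min_cp = 0x80;
        } else if ((lead & 0xF0) == 0xE0) {
            trail = 2; cp = lead & 0x0F; min_cp = 0x800;
        } else if ((lead & 0xF8) == 0xF0) {
            trail = 3; cp = lead & 0x07; min_cp = 0x10000;
        } else {
            return false;         // stray continuation byte or invalid lead
        }

        for (int i = 0; i < trail; ++i) {
            if (it == end) return false;                // truncated sequence
            const unsigned char c = static_cast<unsigned char>(*it);
            if ((c & 0xC0) != 0x80) return false;       // not a trail byte
            cp = (cp << 6) | (c & 0x3F);
            ++it;
        }

        if (cp < min_cp) return false;                  // overlong encoding
        if (cp > 0x10FFFF) return false;                // beyond Unicode range
        if (cp >= 0xD800 && cp <= 0xDFFF) return false; // UTF-16 surrogate
    }
    return true;
}

// Hypothetical wrapper: validate a stream without buffering its content.
inline bool sketch_valid_utf8_stream(std::istream& in)
{
    std::istreambuf_iterator<char> it(in.rdbuf());
    std::istreambuf_iterator<char> eos;
    return sketch_is_valid_utf8(it, eos);
}
```

<p>
The same template accepts <code>std::string</code> iterators, which is the design choice being
illustrated: an algorithm that asks only for input iterators works for in-memory text and for
streamed files alike.
</p>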
+<p>
+Note that other functions that take input iterator arguments can be used in a similar way. For
+instance, to read the content of a UTF-8 encoded text file and convert the text to UTF-16, just 
+do something like:
+</p>
+<pre>
+    utf8::utf8to16(it, eos, back_inserter(u16string));
 </pre>
    <h3 id="fixinvalid">Ensure that a string contains valid UTF-8 text</h3>
+<p>
+If we have some text that "probably" contains UTF-8 encoded text and we want to
+replace any invalid UTF-8 sequence with a replacement character, something like 
+the following function may be used:
+</p>
 <pre>
 <span class="keyword">void</span> fix_utf8_string(std::string&amp; str)
 {
@@ -228,6 +249,9 @@
    str = temp;
 }
 </pre>
+<p>The function replaces each invalid UTF-8 sequence with the Unicode replacement character (U+FFFD).
+An overloaded version lets the caller supply a different replacement character.
+</p>
    <h2 id="reference">
      Reference
    </h2>