Another update of documentation

git-svn-id: http://svn.code.sf.net/p/utfcpp/code@93 a809a056-fc17-0410-9590-b4f493f8b08e
ntrifunovic 2009-07-07 00:46:34 +00:00
parent 9d935b3c69
commit 054defb568

@ -101,14 +101,14 @@
</h2>
<p>
Many C++ developers miss an easy and portable way of handling Unicode encoded
strings. C++ Standard is currently Unicode agnostic, and while some work is being
done to introduce Unicode to the next incarnation called C++0x, for the moment
nothing of the sort is available. In the meantime, developers use 3rd party
libraries like ICU, OS specific capabilities, or simply roll out their own
solutions.
strings. The original C++ Standard (known as C++98 or C++03) is Unicode-agnostic,
and while some work is being done to introduce Unicode support in the next revision,
called C++0x, nothing of the sort is available for the moment. In the meantime,
developers use third-party libraries like ICU, OS-specific capabilities, or simply
roll their own solutions.
</p>
<p>
In order to easily handle UTF-8 encoded Unicode strings, I have come up with a small
In order to easily handle UTF-8 encoded Unicode strings, I came up with a small
generic library. For anybody used to working with STL algorithms and iterators, it should be
easy and natural to use. The code is freely available for any purpose - check out
the license at the beginning of the utf8.h file. If you run into
@ -129,7 +129,7 @@
Introductory Sample
</h3>
<p>
To illustrate the use of this utf8 library, let's start with a small but complete program
To illustrate the use of the library, let's start with a small but complete program
that opens a file containing UTF-8 encoded text, reads it line by line, checks each line
for invalid UTF-8 byte sequences, and converts it to UTF-16 encoding and back to UTF-8:
</p>
@ -206,6 +206,10 @@
<code>utf16to8</code>.
</p>
<h3 id="validfile">Checking if a file contains valid UTF-8 text</h3>
<p>
Here is a function that checks whether the content of a file is valid UTF-8 encoded text without
reading the whole content into memory:
</p>
<pre>
<span class="keyword">bool</span> valid_utf8_file(<span class="keyword">const char</span>* file_name)
{
@ -218,8 +222,25 @@
<span class="keyword">return</span> utf8::is_valid(it, eos);
}
</pre>
<p>
Because the function <code>utf8::is_valid()</code> works with input iterators, we were able
to pass an <code>istreambuf_iterator</code> to it and validate the content of the file directly
from the stream, without loading it into memory first.</p>
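<p>
To make the point about input iterators concrete, here is a minimal standalone sketch of a
single-pass validity check. It is a simplified stand-in written for illustration, not the
library's implementation; the names <code>sketch_is_valid_utf8</code> and
<code>sketch_valid_utf8_stream</code> are hypothetical. Each byte is read exactly once and the
iterator never moves backwards, which is why an <code>istreambuf_iterator</code> is sufficient:
</p>

```cpp
#include <istream>
#include <iterator>
#include <sstream>

// Hypothetical helper (not part of the library): checks a byte range for
// well-formed UTF-8 in a single forward pass, so input iterators suffice.
template <typename InputIt>
bool sketch_is_valid_utf8(InputIt it, InputIt end)
{
    while (it != end) {
        const unsigned char lead = static_cast<unsigned char>(*it);
        ++it;
        if (lead < 0x80)
            continue;                        // one-byte (ASCII) sequence

        int trailing = 0;                    // continuation bytes expected
        unsigned long cp = 0;                // code point being decoded
        unsigned long min_cp = 0;            // smallest value allowed for this length
        if ((lead & 0xE0) == 0xC0)      { trailing = 1; cp = lead & 0x1F; min_cp = 0x80; }
        else if ((lead & 0xF0) == 0xE0) { trailing = 2; cp = lead & 0x0F; min_cp = 0x800; }
        else if ((lead & 0xF8) == 0xF0) { trailing = 3; cp = lead & 0x07; min_cp = 0x10000; }
        else
            return false;                    // stray continuation byte or invalid lead

        for (int i = 0; i < trailing; ++i) {
            if (it == end)
                return false;                // sequence cut off by end of input
            const unsigned char b = static_cast<unsigned char>(*it);
            if ((b & 0xC0) != 0x80)
                return false;                // not a continuation byte
            cp = (cp << 6) | (b & 0x3F);
            ++it;
        }
        if (cp < min_cp)                  return false;  // overlong encoding
        if (cp > 0x10FFFF)                return false;  // beyond the Unicode range
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // surrogate code point
    }
    return true;
}

// Hypothetical convenience wrapper: validates whatever is left in a stream.
bool sketch_valid_utf8_stream(std::istream& in)
{
    std::istreambuf_iterator<char> it(in.rdbuf());
    std::istreambuf_iterator<char> eos;
    return sketch_is_valid_utf8(it, eos);
}
```

<p>
The wrapper accepts any <code>std::istream</code>, so the same check works on an
<code>ifstream</code> opened on a file or on an <code>istringstream</code> in a test.
</p>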
<p>
Note that other functions that take input iterator arguments can be used in a similar way. For
instance, to read the content of a UTF-8 encoded text file and convert the text to UTF-16, just
do something like:
</p>
<pre>
utf8::utf8to16(it, eos, back_inserter(u16string));
</pre>
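<p>
For readers curious what such a conversion involves, the following is a rough standalone
sketch of UTF-8 to UTF-16 transcoding over input and output iterators. It is not the library's
code: the name <code>sketch_utf8to16</code> is hypothetical, and the sketch assumes the input
has already been checked for validity, so it performs no error handling.
</p>

```cpp
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper (not part of the library). Assumes the input is
// already valid UTF-8; malformed input is NOT detected or reported.
template <typename InputIt, typename OutputIt>
OutputIt sketch_utf8to16(InputIt it, InputIt end, OutputIt out)
{
    while (it != end) {
        const unsigned char lead = static_cast<unsigned char>(*it);
        ++it;

        unsigned long cp = lead;             // ASCII case: the byte is the code point
        int trailing = 0;                    // continuation bytes that follow
        if ((lead & 0xE0) == 0xC0)      { cp = lead & 0x1F; trailing = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; trailing = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; trailing = 3; }

        // Each continuation byte contributes its low six bits.
        for (int i = 0; i < trailing && it != end; ++i, ++it)
            cp = (cp << 6) | (static_cast<unsigned char>(*it) & 0x3F);

        if (cp > 0xFFFF) {
            // Outside the Basic Multilingual Plane: emit a surrogate pair.
            cp -= 0x10000;
            *out++ = static_cast<unsigned short>(0xD800 + (cp >> 10));
            *out++ = static_cast<unsigned short>(0xDC00 + (cp & 0x3FF));
        } else {
            *out++ = static_cast<unsigned short>(cp);
        }
    }
    return out;
}
```

<p>
Because the sketch only dereferences and increments its input iterator, it can be fed from a
<code>std::string</code> or directly from a pair of <code>istreambuf_iterator</code>s, with
<code>std::back_inserter</code> on a <code>std::vector&lt;unsigned short&gt;</code> as the output.
</p>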
<h3 id="fixinvalid">Ensure that a string contains valid UTF-8 text</h3>
<p>
If we have some text that "probably" contains UTF-8 encoded text and we want to
replace any invalid UTF-8 sequence with a replacement character, something like
the following function may be used:
</p>
<pre>
<span class="keyword">void</span> fix_utf8_string(std::string&amp; str)
{
@ -228,6 +249,9 @@
str = temp;
}
</pre>
<p>The function replaces every invalid UTF-8 sequence with the Unicode replacement character (U+FFFD).
An overloaded version lets the caller supply a different replacement character.
</p>
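<p>
The replacement described above can be pictured with a small standalone sketch. It is an
illustration under stated assumptions rather than the library's algorithm: the function names
are hypothetical, and the resynchronization policy (emit one U+FFFD and skip a single byte
whenever no valid sequence starts at the current position) is a choice made here for
simplicity, so the number of replacement characters produced may differ from the library's output.
</p>

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: length in bytes of the well-formed UTF-8 sequence
// starting at s[i], or 0 if no well-formed sequence starts there.
std::size_t sketch_sequence_length(const std::string& s, std::size_t i)
{
    const unsigned char lead = static_cast<unsigned char>(s[i]);
    if (lead < 0x80)
        return 1;

    std::size_t len = 0;
    unsigned long cp = 0, min_cp = 0;
    if ((lead & 0xE0) == 0xC0)      { len = 2; cp = lead & 0x1F; min_cp = 0x80; }
    else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; min_cp = 0x800; }
    else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; min_cp = 0x10000; }
    else
        return 0;                            // invalid lead byte

    if (i + len > s.size())
        return 0;                            // truncated sequence
    for (std::size_t k = 1; k < len; ++k) {
        const unsigned char b = static_cast<unsigned char>(s[i + k]);
        if ((b & 0xC0) != 0x80)
            return 0;                        // not a continuation byte
        cp = (cp << 6) | (b & 0x3F);
    }
    // Reject overlong forms, out-of-range values, and surrogates.
    if (cp < min_cp || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;
    return len;
}

// Hypothetical helper: copies valid sequences unchanged and substitutes
// U+FFFD (encoded as EF BF BD) for every byte that cannot start one.
std::string sketch_replace_invalid(const std::string& in)
{
    std::string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const std::size_t len = sketch_sequence_length(in, i);
        if (len == 0) {
            out += "\xEF\xBF\xBD";           // U+FFFD REPLACEMENT CHARACTER
            ++i;                             // assumed policy: skip one byte, retry
        } else {
            out.append(in, i, len);
            i += len;
        }
    }
    return out;
}
```

<p>
Text that is already valid UTF-8 passes through this sketch unchanged, so applying it twice
gives the same result as applying it once.
</p>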
<h2 id="reference">
Reference
</h2>