Another update of documentation
git-svn-id: http://svn.code.sf.net/p/utfcpp/code@93 a809a056-fc17-0410-9590-b4f493f8b08e
parent 9d935b3c69
commit 054defb568
1 changed file with 31 additions and 7 deletions
@@ -101,14 +101,14 @@
 </h2>
 <p>
 Many C++ developers miss an easy and portable way of handling Unicode encoded
-strings. C++ Standard is currently Unicode agnostic, and while some work is being
-done to introduce Unicode to the next incarnation called C++0x, for the moment
-nothing of the sort is available. In the meantime, developers use 3rd party
-libraries like ICU, OS specific capabilities, or simply roll out their own
-solutions.
+strings. The original C++ Standard (known as C++98 or C++03) is Unicode-agnostic,
+and while some work is being done to introduce Unicode to the next incarnation
+called C++0x, for the moment nothing of the sort is available. In the meantime,
+developers use third-party libraries like ICU, OS-specific capabilities, or simply
+roll their own solutions.
 </p>
 <p>
-In order to easily handle UTF-8 encoded Unicode strings, I have come up with a small
+In order to easily handle UTF-8 encoded Unicode strings, I came up with a small
 generic library. For anybody used to working with STL algorithms and iterators, it should be
 easy and natural to use. The code is freely available for any purpose - check out
 the license at the beginning of the utf8.h file. If you run into
@@ -129,7 +129,7 @@
 Introductory Sample
 </h3>
 <p>
-To illustrate the use of this utf8 library, let's start with a small but complete program
+To illustrate the use of the library, let's start with a small but complete program
 that opens a file containing UTF-8 encoded text, reads it line by line, checks each line
 for invalid UTF-8 byte sequences, and converts it to UTF-16 encoding and back to UTF-8:
 </p>
@@ -206,6 +206,10 @@
 <code>utf16to8</code>.
 </p>
 <h3 id="validfile">Checking if a file contains valid UTF-8 text</h3>
+<p>
+Here is a function that checks whether the content of a file is valid UTF-8 encoded text without
+reading the content into memory:
+</p>
 <pre>
 <span class="keyword">bool</span> valid_utf8_file(<span class="keyword">const char</span>* file_name)
 {
@@ -218,8 +222,25 @@
 
 <span class="keyword">return</span> utf8::is_valid(it, eos);
 }
+</pre>
+<p>
+Because the function <code>utf8::is_valid()</code> works with input iterators, we were able
+to pass an <code>istreambuf_iterator</code> to it and read the content of the file directly
+without loading it into memory first.</p>
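To see why an input iterator is enough for validation, here is a minimal sketch of a single-pass check. It is not the library's implementation: `sketch_is_valid_utf8` and `sketch_valid_stream` are hypothetical names, and the rules applied (continuation bytes, overlong forms, surrogates, the U+10FFFF limit) are the general UTF-8 well-formedness rules rather than anything shown in this documentation.

```cpp
#include <cstdint>
#include <istream>
#include <iterator>
#include <sstream>
#include <string>

// Hypothetical helper, not part of the utf8 library: returns true if the
// byte range [it, end) is well-formed UTF-8. Every byte is read exactly
// once and the iterator only moves forward, so an input iterator suffices.
template <typename InputIt>
bool sketch_is_valid_utf8(InputIt it, InputIt end)
{
    while (it != end) {
        const std::uint8_t lead = static_cast<std::uint8_t>(*it);
        ++it;
        if (lead < 0x80) continue;  // single-byte (ASCII) sequence

        int trailing = 0;           // continuation bytes still expected
        std::uint32_t cp = 0;       // code point being assembled
        std::uint32_t min_cp = 0;   // smallest code point allowed for this length
        if ((lead & 0xE0) == 0xC0)      { trailing = 1; cp = lead & 0x1F; min_cp = 0x80; }
        else if ((lead & 0xF0) == 0xE0) { trailing = 2; cp = lead & 0x0F; min_cp = 0x800; }
        else if ((lead & 0xF8) == 0xF0) { trailing = 3; cp = lead & 0x07; min_cp = 0x10000; }
        else return false;          // stray continuation byte or invalid lead byte

        for (int i = 0; i < trailing; ++i) {
            if (it == end) return false;            // sequence is truncated
            const std::uint8_t b = static_cast<std::uint8_t>(*it);
            if ((b & 0xC0) != 0x80) return false;   // not a continuation byte
            ++it;
            cp = (cp << 6) | (b & 0x3F);
        }
        if (cp < min_cp) return false;                   // overlong encoding
        if (cp > 0x10FFFF) return false;                 // beyond the Unicode range
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // UTF-16 surrogate
    }
    return true;
}

// Validates a stream directly from its buffer, without copying it into a string.
bool sketch_valid_stream(std::istream& in)
{
    std::istreambuf_iterator<char> it(in.rdbuf());
    std::istreambuf_iterator<char> eos;
    return sketch_is_valid_utf8(it, eos);
}
```

Passing a `std::ifstream` (or, for a quick experiment, a `std::istringstream`) to `sketch_valid_stream` checks the data as it is read from the stream buffer.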
+<p>
+Note that other functions that take input iterator arguments can be used in a similar way. For
+instance, to read the content of a UTF-8 encoded text file and convert the text to UTF-16, just
+do something like:
+</p>
+<pre>
+utf8::utf8to16(it, eos, back_inserter(u16string));
 </pre>
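The same input-iterator-in, output-iterator-out pattern can be sketched without the library. The function below is only an illustration of the conversion technique, not the library's `utf8to16`: `sketch_utf8to16` is a hypothetical name, `std::vector<std::uint16_t>` stands in for whatever container `u16string` is in the snippet above, and the error handling (throwing `std::invalid_argument`) is an assumption. For brevity it checks only the byte structure and the U+10FFFF limit, not overlong forms or surrogates.

```cpp
#include <cstdint>
#include <iterator>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical sketch: decodes UTF-8 from [it, end) and writes UTF-16 code
// units to out. Works with any input iterator and any output iterator.
template <typename InputIt, typename OutputIt>
OutputIt sketch_utf8to16(InputIt it, InputIt end, OutputIt out)
{
    while (it != end) {
        const std::uint8_t lead = static_cast<std::uint8_t>(*it);
        ++it;

        std::uint32_t cp = 0;   // decoded code point
        int trailing = 0;       // continuation bytes still expected
        if (lead < 0x80)                { cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; trailing = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; trailing = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; trailing = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        for (int i = 0; i < trailing; ++i) {
            if (it == end)
                throw std::invalid_argument("truncated UTF-8 sequence");
            const std::uint8_t b = static_cast<std::uint8_t>(*it);
            if ((b & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            ++it;
            cp = (cp << 6) | (b & 0x3F);
        }
        if (cp > 0x10FFFF)
            throw std::invalid_argument("code point out of range");

        if (cp < 0x10000) {
            // Basic Multilingual Plane: one UTF-16 code unit.
            *out++ = static_cast<std::uint16_t>(cp);
        } else {
            // Supplementary plane: split into a high and a low surrogate.
            cp -= 0x10000;
            *out++ = static_cast<std::uint16_t>(0xD800 + (cp >> 10));
            *out++ = static_cast<std::uint16_t>(0xDC00 + (cp & 0x3FF));
        }
    }
    return out;
}
```

Called with a pair of `std::istreambuf_iterator<char>` and `std::back_inserter` on a `std::vector<std::uint16_t>`, it converts a stream as it is read, mirroring the one-line call above.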
 <h3 id="fixinvalid">Ensure that a string contains valid UTF-8 text</h3>
+<p>
+If we have some text that "probably" contains UTF-8 encoded text and we want to
+replace any invalid UTF-8 sequence with a replacement character, something like
+the following function may be used:
+</p>
 <pre>
 <span class="keyword">void</span> fix_utf8_string(std::string& str)
 {
@@ -228,6 +249,9 @@
 str = temp;
 }
 </pre>
+<p>The function will replace any invalid UTF-8 sequence with a Unicode replacement character.
+There is an overloaded function that enables the caller to supply their own replacement character.
+</p>
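As a rough illustration of what such a replacement pass does, here is a self-contained sketch. It is an assumption-laden stand-in, not the library's code: `sketch_sequence_length` and `sketch_replace_invalid` are hypothetical names, the replacement is passed as an already encoded UTF-8 string (defaulting to U+FFFD), and each offending byte is replaced individually, which may differ from how the library groups invalid bytes.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: length in bytes (1 to 4) of the well-formed UTF-8
// sequence starting at s[pos], or 0 if no well-formed sequence starts there.
std::size_t sketch_sequence_length(const std::string& s, std::size_t pos)
{
    const std::uint8_t lead = static_cast<std::uint8_t>(s[pos]);
    if (lead < 0x80) return 1;

    std::size_t len = 0;
    std::uint32_t cp = 0;
    std::uint32_t min_cp = 0;   // smallest code point allowed for this length
    if ((lead & 0xE0) == 0xC0)      { len = 2; cp = lead & 0x1F; min_cp = 0x80; }
    else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; min_cp = 0x800; }
    else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; min_cp = 0x10000; }
    else return 0;              // stray continuation byte or invalid lead byte

    if (pos + len > s.size()) return 0;  // sequence is truncated
    for (std::size_t i = 1; i < len; ++i) {
        const std::uint8_t b = static_cast<std::uint8_t>(s[pos + i]);
        if ((b & 0xC0) != 0x80) return 0;
        cp = (cp << 6) | (b & 0x3F);
    }
    // Reject overlong forms, out-of-range values, and surrogates.
    if (cp < min_cp || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return 0;
    return len;
}

// Copies well-formed sequences unchanged and substitutes `replacement`
// (assumed to be valid UTF-8 itself) for every byte that cannot start one.
std::string sketch_replace_invalid(const std::string& in,
                                   const std::string& replacement = "\xEF\xBF\xBD")
{
    std::string out;
    std::size_t pos = 0;
    while (pos < in.size()) {
        const std::size_t len = sketch_sequence_length(in, pos);
        if (len == 0) {
            out += replacement;
            ++pos;              // skip the offending byte and resynchronize
        } else {
            out.append(in, pos, len);
            pos += len;
        }
    }
    return out;
}
```

The default argument plays the role of the Unicode replacement character, and the second parameter corresponds to the caller-supplied replacement described above.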
 <h2 id="reference">
 Reference
 </h2>