Added utfreader tests and more sample files
git-svn-id: http://svn.code.sf.net/p/utfcpp/code@37 a809a056-fc17-0410-9590-b4f493f8b08e
parent 0ac74b9a49
commit 5a06d4d77c
5 changed files with 515 additions and 12 deletions
167
test_data/utf8samples/Unicode_transcriptions.html
Normal file
@@ -0,0 +1,167 @@
*Unicode Transcriptions* Notes <#Notes>

Glyphs <http://www.macchiato.com/unicode/show.html> | Samples
<http://www.macchiato.com/unicode/Unicode_transcriptions.html> | Charts
<http://www.macchiato.com/unicode/charts.html> | UTF
<http://www.macchiato.com/unicode/convert.html> | Forms
<http://www-4.ibm.com/software/developer/library/utfencodingforms/> |
Home <http://www.macchiato.com>.
<http://member.linkexchange.com/cgi-bin/fc/fastcounter-login?750641>

Name Text Image
Arabic (Arabic) يونِكود ?
Arabic (Persian) یونیکُد / ?/
Armenian Յունիկօդ
Bengali য়ূনিকোড
Bopomofo ㄊㄨㄥ˅ ㄧˋ ㄇㄚ˅
    ㄨㄢˋ ㄍㄨㄛˊ ㄇㄚ˅
Braille
Buhid
Canadian Aboriginal ᔫᗂᑰᑦ
Cherokee ᏳᏂᎪᏛ
Cypriot
Cyrillic (Russian) Юникод ?
Deseret (English) ???????
Devanagari (Hindi) यूनिकोड ?
Ethiopic ዩኒኮድ
Georgian უნიკოდი ?
Gothic
Greek Γιούνικοντ
Gujarati યૂનિકોડ
Gurmukhi ਯੂਨਿਕੋਡ
Han (Chinese) 统一码 ?
    統一碼 ?
    万国码 ?
    萬國碼 ?
Hangul 유니코드
Hanunoo
Hebrew יוניקוד
Hebrew (pointed) יוּנִיקוׁד
Hebrew (Yiddish) יוניקאָד ?
Hiragana (Japanese) ゆにこおど
Katakana (Japanese) ユニコード ?
Kannada ಯೂನಿಕೋಡ್
Khmer យូនីគោដ
Lao
Latin Unicode Unicode
Latin (IPA <#English_Pronunciation>) ˈjunɪˌkoːd ?
Latin (Am. Dict. <#American_Dictionary>) Ūnĭcōde̽ ?
Limbu
Linear B
Malayalam യൂനികോഡ്
Mongolian
Myanmar
Ogham ᚔᚒᚅᚔᚉᚑᚇ / /
Old Italic
Oriya ୟୂନିକୋଡ
Osmanya
Runic (Anglo-Saxon) ᛡᚢᚾᛁᚳᚩᛞ
Shavian
Sinhala යණනිකෞද්
Syriac ܝܘܢܝܩܘܕ
Tagbanwa
Tagalog
Tai Le
Tamil யூனிகோட்
Telugu యూనికోడ్
Thaana
Thai ยูนืโคด
Tibetan (Dzongkha) ཨུ་ནི་ཀོཌྲ།
Ugaritic
Yi

Notes:

There are different ways to transcribe the word “Unicode”, depending on
the language and script. In some cases there is only one language that
customarily uses a given script; in others there are many languages. The
goal here is to collect at least one transcription for each
script in a language customarily written in that script, with more
languages if possible. If the transcription is the same for multiple
languages in a script, then a single representative language is used.

Still missing are transcriptions for the items above in RED (in at least
one language). I would appreciate any other transcriptions, or
corrections for the ones listed here. Send to mark3@macchiato.com
<mailto:mark3@macchiato.com>, using the directions below:

  * *Supplying Missing Items*
      o Most Latin-script languages will follow the spelling, and
        change the pronunciation. For any that would not, it would
        be good to have the alternate spelling.
      o For non-Latin scripts the goal is to match the English
        pronunciation, /*not*/ the spelling. Above is the IPA <#IPA>
        (in phonemic transcription) that should be matched as
        closely as possible (without sounding affected in the target
        language).
      o Text is best supplied either as UTF-8 text or as code
        points in hex HTML, e.g. either of the following:
          + "Юникод"
          + "&#x42E;&#x43D;&#x438;&#x43A;&#x43E;&#x434;"
          + Note: for /supplementary characters/
            <http://www.unicode.org/glossary/#supplementary_character>,
            there should be one hex number per code point, not two
            surrogates
            <http://www.unicode.org/glossary/#surrogate_code_point>:
              # &#x10000; /*not*/ &#xD800;&#xDC00;
      o If you have a good font, I'd also appreciate a GIF. It
        should be *96 x 24* bits, with the text centered, in black
        on white (plus grays if smoothed).
  * *Other Comments*
      o Because some browsers won't handle the text, both text and
        GIF image are supplied. If you can’t read the text columns,
        see Display Problems
        <http://www.unicode.org/help/display_problems.html>.
      o The Chinese versions (inc. Bopomofo) are translations, not
        transcriptions, since "transcription in Chinese is pretty
        lame" [J. Becker].
      o There are other "translations" of Unicode that may be in
        use, such as the Vietnamese "Thống Nhất Mã".
      o For sample pages in different languages on the Unicode site,
        see What is Unicode?
        <http://www.unicode.org/unicode/standard/WhatIsUnicode.html>
      o Americans are not generally used to IPA, and find a variety
        of different systems in their dictionaries. This one leaves
        the base letters as they are, and uses diacritics for
        pronunciation.
  * *Etymology of /Unicode/*
      o Coined by J. Becker. Not related to previous usages, such as:
          + A telegraphic code in which one word or set of letters
            represents a sentence or phrase; a telegram or message
            in this. (late 19th century, OED)
      o According to my references, the prefix "uni" is directly
        from Latin while the word "code" is through French.
      o The original Indo-European apparently would have been
        *oino-kau-do ("one strike give"): *kau apparently being
        related to such English words as: hew, haggle, hoe, hag,
        hay, hack, caudad, caudal, caudate, caudex, coda, codex,
        codicil, coward, incus, and Kovač (personal name: "smith").
          + I will leave the exact derivations to the exegetes,
            but I like the association with "haggle" myself.
  * *Contributions*
      o This draws on contributions or comments from:
          + Dixon Au
          + Joe Becker
          + Maurice Bauhahn
          + Abel Cheung
          + Peter Constable
          + Michael Everson
          + Christopher John Fynn
          + Michael Kaplan
          + George Kiraz
          + Abdul Malik
          + Siva Nataraja
          + Roozbeh Pournader
          + Jonathan Rosenne
          + Jungshik Shin

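The request under "Supplying Missing Items" for code points in hex HTML, with one hex number per code point rather than two surrogates, can be sketched in Python roughly as follows. This is only an illustration and is not part of the original page or of the utfreader tests; the function names to_hex_ncr and surrogate_pair are hypothetical.

```python
from typing import Tuple


def to_hex_ncr(text: str) -> str:
    """Encode every code point of text as an HTML hex character reference."""
    # Python strings iterate by code point, so a supplementary character
    # produces a single reference, never a surrogate pair.
    return "".join(f"&#x{ord(ch):X};" for ch in text)


def surrogate_pair(code_point: int) -> Tuple[int, int]:
    """Return the UTF-16 surrogates for a supplementary code point.

    Shown only to illustrate the form the page asks contributors to avoid.
    """
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary code point")
    offset = code_point - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


print(to_hex_ncr("Юникод"))      # &#x42E;&#x43D;&#x438;&#x43A;&#x43E;&#x434;
print(to_hex_ncr("\U00010000"))  # &#x10000;
print([hex(u) for u in surrogate_pair(0x10000)])  # ['0xd800', '0xdc00']
```

The first two calls give the preferred form; the last one shows the surrogate values (D800, DC00) that should not be written as separate references.
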
------------------------------------------------------------------------

Terms of Use <http://www.macchiato.com/terms_of_use.html>. Last updated:
MED - 04/20/2003 15:30:33.