Links to the materials presented here are given at
the end of this document.
The materials presented here are the results of the
author's attempts to analyse the encoding systems and
grammar of several languages. The information is presented as
Pascal programs (and data files for these programs).
If you are reading the English translation and are also
interested in Cyrillic encodings, or need them in order to
understand the illustrative alphabets presented here, the
author recommends downloading the file 'keyrus.zip' from the
main page of this site, section "Social and cultural issues".
The KEYRUS package needs MS DOS or an emulator of it; see also
the section \KIRILLIT in the materials presented here.
The illustrative encodings which I use contain Cyrillic
letters in the MS DOS encoding; later I shall try to make
illustrative encodings for users who have difficulties with
Cyrillic.
Also presented here are the programs recod.pas (for
recoding) and zamena.pas (for substitutions); their use is
described in Appendix 1 at the end of this document.
(The remainder of this text is not fully translated).
Below I describe what I have managed to work out for
several languages.
\GRECHESK (Greek language).
A well-known alphabet. In writing, stress marks (different
in different cases!), breathing marks etc. are used; it is
not yet clear to me which of these are encoded and which are
not. The presented files may contain errors and serve only
for preliminary analysis of a text. Only one encoding was
encountered. The file gr1 represents this encoding, and gr2
represents an illustrative encoding based on Cyrillic and
Latin letters.
\GRUZINSK (Georgian language).
An ordinary alphabet. Only one encoding was encountered. The
file h1 represents this encoding, and h2 represents an
illustrative encoding based on Cyrillic and Latin letters.
\TADZIKSK (Tajik language).
An ordinary alphabet. Only one encoding was encountered. The
file z represents this encoding, and zz represents an
illustrative encoding based on Cyrillic and Latin letters.
Some letters are not yet identified!
The principle on which the encoding is built is interesting:
in the Soviet era the alphabet was based on Cyrillic, and
evidently the Cyrillic basis is preserved, but each (or
almost each) letter is replaced by the Latin letter found on
the same key in the standard keyboard layouts; e.g., the
Latin letter 'f' denotes the Cyrillic letter 'a'.
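The same-key principle can be illustrated with a minimal
sketch in Python (not the author's Pascal code). It assumes
the standard Russian keyboard layout and lowercase letters
only; the names LATIN_KEYS, CYRILLIC_KEYS and keys_to_cyrillic
are made up for this illustration, and the specifically Tajik
letters, which the file z must also encode somehow, are not
covered.

```python
# Keys of the US layout and the Cyrillic letters printed on the same
# keys in the standard Russian layout (lowercase only, an assumption).
LATIN_KEYS    = "qwertyuiop[]asdfghjkl;'zxcvbnm,."
CYRILLIC_KEYS = "йцукенгшщзхъфывапролджэячсмитьбю"

KEY_TO_CYRILLIC = str.maketrans(LATIN_KEYS, CYRILLIC_KEYS)


def keys_to_cyrillic(text):
    """Replace each Latin key character by the Cyrillic letter on that key."""
    return text.translate(KEY_TO_CYRILLIC)


if __name__ == "__main__":
    print(keys_to_cyrillic("f"))      # the example from the text
    print(keys_to_cyrillic("cfkjv"))  # a Tajik/Russian-looking word
```

Characters outside the table (digits, spaces, uppercase) pass
through unchanged in this sketch.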
\IVRIT (Hebrew language).
An alphabet without vowels (in fact vowel points have been
developed for writing, but they are absent in the files, even
in the text of the Tanakh). Three encodings were encountered.
The files alfav1a, alfav1b, alfav1c represent these
encodings, and alfav1d is an illustrative encoding based on
Cyrillic and Latin letters. Sometimes the lines are stored
reversed, right to left (which is evidently connected with
the right-to-left direction of writing on paper).
Note: recently a text with vowel points turned up; an example
has been added to the presented materials, but it has not
been analysed in detail.
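Undoing the line reversal mentioned above is simple when every
letter occupies one byte. The following Python sketch (not
taken from the package; the function name unreverse_lines is
hypothetical) reverses the bytes inside each line and leaves
the line breaks where they are. It assumes a single-byte
encoding; for multi-byte encodings a byte-wise reversal would
corrupt the characters.

```python
def unreverse_lines(data):
    """Reverse the byte order inside each line, keeping line breaks in place.

    Assumes a single-byte encoding and LF or CR LF line endings.
    """
    fixed = []
    for line in data.split(b"\n"):
        has_cr = line.endswith(b"\r")
        body = line[:-1] if has_cr else line
        fixed.append(body[::-1] + (b"\r" if has_cr else b""))
    return b"\n".join(fixed)


if __name__ == "__main__":
    print(unreverse_lines(b"cba\r\nfed\r\n"))
```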
Purpose of files:
ALFAV1D     \
ALFAV1C     |
ALFAV1B     | - for recoding
ALFAV1A     |
IVR.Z       /
QUERY.HTM     - text with vowel points
KOHELET.TXT \ - test texts
SONG_4.TXT  /
PRAVILA.TXT \ - data files
SLOVARH.TXT /
PODSTR.PAS    - program for word-by-word translation (source)
PODSTR.EXE    - the same (executable)
ZAPUSK.BAT    - demonstration run
\ARABSK (Arabic language).
An alphabet without vowels (as in Hebrew, vowel marks have
been developed for writing, but they are absent in the
files). Only one encoding was encountered. The file 1.txt
represents this encoding, and 2.txt is an illustrative
encoding based on Cyrillic and Latin letters. Despite the
right-to-left direction of writing on paper, the lines in the
texts encountered were not reversed. Some letters are not yet
identified!
\JAPONSK (Japanese language).
(From an old description: there exists a shareware program
J-Text (link to be found!), which allows one to read texts in
Japanese (in the encodings mentioned here and in a number of
others) and to type them.)
Purpose of files:
DETE_KOD.PAS - for detecting the encoding
JI_EU_SJ.TXT - table for recoding JIS/EUC/SJS
JIS_SJS.PAS  - for recoding JIS -> SJS
EUC_SJS.PAS  - for recoding EUC -> SJS
SJS_L.Z      - dictionary of hieroglyphs
JAP_SLOV.Z   - dictionary of words
PRIEVOD.PAS  - for word-by-word translation
TEST.SJS     - test text
COMMENT.TXT  - this text
To compile the file PRIEVOD.PAS, one needs the TMT Pascal
Compiler, the Free Pascal Compiler or a similar compiler
(Turbo Pascal -> Structure too large). A ready-to-use
PRIEVOD.EXE is provided in the section on the Chinese
language.
Demonstration run:
ppc386.exe prievod.pas
prievod.exe sjs_l.z jap_slov.z test.sjs test.trd /otl
In Japanese texts, three encodings were actually
encountered: JIS, EUC and SJS; what the first two of these
abbreviations stand for is unknown to the compiler of this
package; SJS = Shift-JIS (although, judging by its structure,
it is rather EUC that deserves such a name). The presented
program is oriented to the SJS encoding as the most
widespread.
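The structural relation between EUC and SJS can be shown with
the commonly used conversion arithmetic. The Python sketch
below is an illustration only and makes no claim about how
EUC_SJS.PAS or the table JI_EU_SJ.TXT work; the function names
are hypothetical. It assumes that a two-byte EUC character is
the JIS code with the high bit set in both bytes, and it
handles only ASCII, half-width katakana (prefix 8E) and
two-byte characters.

```python
def jis_to_sjis(j1, j2):
    """Convert one two-byte JIS code (each byte 0x21..0x7E) to Shift-JIS."""
    s1 = ((j1 - 0x21) >> 1) + 0x81
    if s1 > 0x9F:          # skip the range used by single-byte katakana
        s1 += 0x40
    if j1 & 1:             # odd JIS row: first half of the SJIS row
        s2 = j2 + 0x1F
        if j2 >= 0x60:     # second byte 0x7F is never used
            s2 += 1
    else:                  # even JIS row: second half of the SJIS row
        s2 = j2 + 0x7E
    return s1, s2


def euc_to_sjis(data):
    """Recode EUC bytes to Shift-JIS bytes (sketch, limited coverage)."""
    out = bytearray()
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:                                    # ASCII
            out.append(b)
            i += 1
        elif b == 0x8E and i + 1 < len(data):           # half-width katakana
            out.append(data[i + 1])
            i += 2
        elif (0xA1 <= b <= 0xFE and i + 1 < len(data)
              and 0xA1 <= data[i + 1] <= 0xFE):         # two-byte character
            out += bytes(jis_to_sjis(b - 0x80, data[i + 1] - 0x80))
            i += 2
        else:
            raise ValueError("unsupported byte %#04x at offset %d" % (b, i))
    return bytes(out)


if __name__ == "__main__":
    sample = "日本語テスト".encode("euc_jp")
    print(euc_to_sjis(sample).hex())
```

In EUC the JIS bytes are merely given a high bit, while in SJS
pairs of JIS rows are packed into one lead byte, which is why
the second byte has two different offsets.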
To avoid conflicts with programs that display
hieroglyphs, the translations are written in Latin
letters.
Sometimes, for brevity, deviations from the standard
romaji transcription are allowed (chi -> ti,
sho -> s@, etc.); later it is planned to restore the
romaji spelling.
At present (2003,9,8) the dictionary contains
slightly more than 500 words, and for many
hieroglyphs '???' is written instead of the reading
and meaning; it is planned to extend the dictionary
and to describe new hieroglyphs in future.
\KITAJSK (Chinese language).
Purpose of files:
DETEB5GB.*   - for distinguishing the encodings GB/BIG5
B5GB.*       - for recoding BIG5 -> GB
GBB5.*       - for recoding GB -> BIG5
B5STAND.*    - for replacing variant forms in the BIG5 encoding
BIG5L.Z      - dictionary of hieroglyphs
KIT_SLOV.Z   - dictionary of words
PRIEVOD.*    - translation program
TEST1_GB.TXT - test text No. 1
TEST2_B5.TXT - test text No. 2
MAOZED_1.txt - test text No. 3
COMMENT.TXT  - this commentary
To compile the file PRIEVOD.PAS, one needs the TMT Pascal
Compiler, the Free Pascal Compiler or a similar compiler
(Turbo Pascal -> Structure too large).
Demonstration run:
--------
gbb5 test1_gb.txt test1_b5.txt
prievod big5l.z kit_slov.z test1_b5.txt test1_b5.trd
b5_stand test2_b5.txt tempor.txt
prievod.exe big5l.z kit_slov.z tempor.txt test2_b5.trd
gbb5 maozedgb.txt maozedb5.txt
prievod big5l.z kit_slov.z maozedb5.txt maozedb5.trd
--------
In Chinese texts, two encodings were actually
encountered: GB (Guojia Biaozhun, i.e. National
Standard; used in the PRC) and BIG5 (the name reflects
the fact that the encoding was developed by five large
companies; used in Taiwan).
The presented program is historically oriented to
the BIG5 encoding, because at first most of the texts
encountered were in that encoding.
In contrast to the other languages with several
encodings known to the compiler of this package, the
correspondence GB <-> BIG5 is not one-to-one: it happens
that several characters in one encoding correspond to a
single character in the other; usually this concerns
variant forms of one and the same hieroglyph.
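The consequence of a correspondence that is not one-to-one is
that a recoding table cannot simply be inverted. The Python
sketch below finds such collisions in a table; the function
find_collisions and the three-entry SAMPLE table are
hypothetical and are not taken from B5GB.* or GBB5.*. The
sample uses two traditional characters that share one
simplified form.

```python
from collections import defaultdict


def find_collisions(table):
    """Return {target: [sources]} for every target with 2 or more sources."""
    sources = defaultdict(list)
    for src, dst in table.items():
        sources[dst].append(src)
    return {dst: sorted(srcs) for dst, srcs in sources.items() if len(srcs) > 1}


# Hypothetical fragment of a BIG5 -> GB table, written as characters
# rather than byte codes to keep the example readable.
SAMPLE = {"發": "发", "髮": "发", "中": "中"}

if __name__ == "__main__":
    for target, srcs in find_collisions(SAMPLE).items():
        print(target, "<-", " ".join(srcs))
```

For every target reported here, the reverse table has to pick
one of the sources, so a round trip BIG5 -> GB -> BIG5 does
not always restore the original text.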
Unlike the previous versions of the GB <=> BIG5
recoding program, the current version uses data
obtained by recoding all admissible combinations in both
directions with a certain commercial program, and it
must recode exactly as that commercial program does.
Unfortunately, it happens that real rare hieroglyphs in
GB are replaced by a space in BIG5; apparently this is
unavoidable.
The program already makes it possible to recognise
the subject of a text, and sometimes to understand
combinations several words long. In future it is planned
to continue extending the dictionary and describing new
hieroglyphs.
\KOREJSK (Korean language).
An unusual (letter-and-syllable) writing system. Judging by
the literature, there have been several encodings, but only
the KSC encoding was actually encountered. Formally the
script is alphabetic, but in writing the letters are combined
into syllables (syllables usually have the form consonant -
vowel - consonant, and the shapes of the letters are adapted
to such combination, so that a typical written syllable fits
into a square). The creators of the encoding had the
following typical options:
(a) to allot one byte per letter and to group the letters
when printing;
(b) to allot two bytes per syllable, assigning some bits of
the two-byte field to a marker, some to the initial
consonant, some to the vowel and some to the final consonant;
(c) to number all syllables that actually occur and to encode
them like hieroglyphs, ignoring their internal structure.
It turned out that the third way was used in this encoding.
Moreover, it turned out that different initial consonants
admit different combinations of vowels and final consonants
(and even the number of admissible combinations differs)!
This sharply complicates decoding. Nevertheless, the
substitution file ksc.z makes it possible to recognise a
number of frequently occurring syllables, which makes it
possible to find words in a dictionary and, from them, to
work out the readings of new syllables.
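For contrast with the encoding actually found, option (b) can
be sketched in a few lines of Python. The bit widths below (1
marker bit and 5 bits each for the initial consonant, the
vowel and the final consonant) and the function names are
assumptions chosen for the illustration; they are not taken
from KSC or from the package.

```python
MARKER = 0x8000  # assumed: high bit marks a two-byte syllable


def pack_syllable(initial, vowel, final):
    """Pack three letter indices (0..31 each) into one 16-bit code."""
    for name, value in (("initial", initial), ("vowel", vowel), ("final", final)):
        if not 0 <= value < 32:
            raise ValueError("%s index out of range: %r" % (name, value))
    return MARKER | (initial << 10) | (vowel << 5) | final


def unpack_syllable(code):
    """Split a 16-bit code back into (initial, vowel, final) indices."""
    if not code & MARKER:
        raise ValueError("marker bit is not set")
    return (code >> 10) & 0x1F, (code >> 5) & 0x1F, code & 0x1F


if __name__ == "__main__":
    code = pack_syllable(18, 0, 4)
    print(hex(code), unpack_syllable(code))
```

With such a layout the letters of a syllable can be read off
with shifts and masks. With option (c) a lookup table such as
ksc.z is needed instead, because the code of a syllable says
nothing about its letters.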
\LATINSK (Latin language).
The aim of this program is to scan a Latin text and, for
each word, to propose the possible dictionary forms (e.g.,
'mitto' from 'missisti') and to find their translations in
the dictionary. At present the dictionary is very small,
some necessary structures are not described, and the block
that enumerates variants for texts in which every 'v' and
'j' is replaced by 'u' and 'i' is not yet implemented, but in
the test example some fragments of phrases are already
understandable.
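The enumeration of variants described above as not yet
implemented could look roughly as follows. This is a Python
sketch under the assumption that every 'u' may stand for 'u'
or 'v' and every 'i' for 'i' or 'j'; the names ALTERNATIVES
and spelling_variants are hypothetical and do not come from
the package.

```python
from itertools import product

# Each ambiguous letter and the letters it may stand for (an assumption).
ALTERNATIVES = {"u": "uv", "i": "ij", "U": "UV", "I": "IJ"}


def spelling_variants(word):
    """List every spelling obtained by restoring 'v' and 'j' in a word."""
    choices = [ALTERNATIVES.get(ch, ch) for ch in word]
    return ["".join(combo) for combo in product(*choices)]


if __name__ == "__main__":
    print(spelling_variants("uia"))
```

The number of variants doubles with each ambiguous letter, so
in practice each variant would be checked against the
dictionary and the ones not found would be discarded.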
\RUSSIAN
Russian language for foreigners. To see the program
working, run zapusk.bat and look at the resulting
files 0.txt and 2.txt. At present the dictionary and
the set of rules are small, but I plan to expand them,
as well as to add UNICODE support.
\CYRILLIC (Cyrillic).
Since the UNICODE encoding and the exotic encoding of some
HTML documents in Cyrillic (korchins.z) are close to the
subject of this package, I decided to include them in the
package. In addition to the "alternative" encoding and the
encodings Win1251 and KOI8 (a reference about them is to be
found, describing how false encodings arise when they are
combined!!!), files typed in WinWord contain the UNICODE
encoding. The program unicode.pas extracts fragments in the
UNICODE encoding from such files and converts them into the
"alternative" encoding; at present some punctuation marks are
lost in the process.
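The kind of extraction described for unicode.pas can be
sketched in Python as follows. This is not the author's
algorithm: the sketch assumes that the UNICODE fragments are
stored as little-endian 16-bit units, that the "alternative"
encoding corresponds to Python's cp866 codec, and that a
fragment is any run of Cyrillic and printable ASCII
characters containing at least one Cyrillic letter. The
function names are hypothetical.

```python
import re

# Runs of 16-bit little-endian units that are printable ASCII
# (high byte 00) or belong to the Cyrillic block (high byte 04).
UTF16_RUN = re.compile(rb"(?:[\x20-\x7e]\x00|[\x00-\xff]\x04){2,}")


def extract_unicode_fragments(data):
    """Return the Cyrillic-containing UTF-16LE fragments found in data."""
    fragments = []
    for match in UTF16_RUN.finditer(data):
        text = match.group().decode("utf-16-le")
        if any("\u0400" <= ch <= "\u04ff" for ch in text):
            fragments.append(text)
    return fragments


def to_alternative(text):
    """Encode text in cp866; characters absent from cp866 become '?'."""
    return text.encode("cp866", errors="replace")


if __name__ == "__main__":
    blob = b"\x01\x02" + "Привет, мир".encode("utf-16-le") + b"\xff\xff"
    for fragment in extract_unicode_fragments(blob):
        print(to_alternative(fragment))
```

Typographic punctuation such as long dashes or angle quotes
lies outside both ranges matched here and partly outside
cp866, which is one plausible way for punctuation to get lost
in such a conversion.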
-----
Finally, here are links to materials available on the
Internet that relate to the subject of this document or may
simply be useful to language enthusiasts.
Transcription of the Chinese, Japanese and Korean languages
in Latin and Cyrillic letters:
http://anime.dvdspecial.ru/Japan/romaji.shtml
http://anime.dvdspecial.ru/Japan/chinese.shtml
http://anime.dvdspecial.ru/Japan/korean.shtml
The same for the Korean language:
http://english.tour2korea.com/t2kzone/mcns/learn/roman/roman_korean_language.asp
Chinese encodings:
http://www.ldc.upenn.edu/Projects/Chinese/info_it.htm
----------------------------------------------------------
Appendix 1: recod.pas and zamena.pas.
Among the programs presented here, note the general-purpose
programs recod.pas and zamena.pas.
The first of them (recod.pas) is intended for byte-by-byte
recoding with a table. It is invoked as
recod.exe -c example1 example2 infile outfile [otlad] ;
here example1 and example2 are one and the same text in
different encodings (with header, see examples and source),
infile is a file in the same encoding as example1, and the
created outfile is converted to the encoding of the file
example2. The parameter otlad (otladka = debugging) is useful
for analysing a previously unseen encoding (guessed fragments
are taken as example1 and example2, and newly guessed letters
are then added to them). In this mode, first, characters that
could not be recoded are replaced by minus signs (i.e. the
decoded characters do not drown among those still undecoded),
and second, the direct and inverse recoding tables are
displayed (sometimes this allows one to guess the principle
on which the table is built and to fill in the remaining
letters at once).
The program recod is convenient for recoding texts in
languages with alphabetic scripts. The author usually recoded
such texts into illustrative alphabets based on Cyrillic and
Latin letters; the reader can, if desired, create an
illustrative alphabet of his own that is more convenient for
him.
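The idea behind recod, a byte table learned from two parallel
examples, can be sketched in Python as follows. The sketch
does not reproduce recod.pas: it ignores the header of the
example files, the function names are hypothetical, and
passing unknown bytes through unchanged outside the debugging
mode is an assumption.

```python
def build_table(example1, example2):
    """Learn a byte-to-byte table from the same text in two encodings."""
    if len(example1) != len(example2):
        raise ValueError("examples must have the same length")
    table = {}
    for src, dst in zip(example1, example2):
        # setdefault keeps the first mapping; a different later one is an error
        if table.setdefault(src, dst) != dst:
            raise ValueError("byte %#04x maps to two different bytes" % src)
    return table


def recode(data, table, debug=False):
    """Recode data; in debug mode unknown bytes become minus signs."""
    out = bytearray()
    for byte in data:
        if byte in table:
            out.append(table[byte])
        elif debug:
            out.append(ord("-"))
        else:
            out.append(byte)   # assumption: pass unknown bytes through
    return bytes(out)


if __name__ == "__main__":
    table = build_table("абв".encode("cp1251"), "абв".encode("koi8_r"))
    print(recode("баг".encode("cp1251"), table, debug=True))
```

The inverse table mentioned in the text can be obtained from
the same dictionary by swapping keys and values, provided that
no two source bytes share a target.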
The second of these programs (zamena.pas; zamena =
substitution) is intended for replacing arbitrary strings
with other strings; the strings may include arbitrary
characters in hexadecimal notation such as $0D$0A (see the
examples of substitution files of type *.z and the source).
The program is invoked as
zamena -c table.z infile outfile ;
this program is convenient, in particular, for quickly
replacing all multiple line breaks with single ones and for
similar editorial purposes. Here, however, we are going to
replace, for example, Chinese hieroglyphs with strings of the
form: opening bracket, reading, space, meaning, closing
bracket, line break.
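A substitution of this kind can be sketched in Python as
follows. The sketch is not zamena.pas: the format of the *.z
files is not reproduced, the rules are given directly as
pairs of byte strings, the function names are hypothetical,
and trying the longest pattern first at each position is an
assumption. Only the $XX notation for hexadecimal bytes is
taken from the description above.

```python
import re

HEX_ESCAPE = re.compile(r"\$([0-9A-Fa-f]{2})")


def parse_pattern(text):
    """Turn a string with $XX escapes, e.g. 'abc$0D$0A', into bytes."""
    out = bytearray()
    pos = 0
    for match in HEX_ESCAPE.finditer(text):
        out += text[pos:match.start()].encode("latin-1")
        out.append(int(match.group(1), 16))
        pos = match.end()
    out += text[pos:].encode("latin-1")
    return bytes(out)


def substitute(data, rules):
    """Replace patterns in one left-to-right pass, longest pattern first."""
    ordered = sorted(rules, key=lambda rule: len(rule[0]), reverse=True)
    out = bytearray()
    i = 0
    while i < len(data):
        for old, new in ordered:
            if old and data.startswith(old, i):
                out += new
                i += len(old)
                break
        else:
            out.append(data[i])   # no rule matched at this position
            i += 1
    return bytes(out)


if __name__ == "__main__":
    rules = [("中".encode("big5"), parse_pattern("(zhong middle)$0D$0A"))]
    print(substitute("中".encode("big5"), rules))
```

Because this sketch matches at every byte offset, a two-byte
pattern can in principle match across the boundary of two
neighbouring characters; a careful implementation for BIG5
text would step over whole characters.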
-------------------------------------------------------------
Appendix 2 (2002,6): types of verse in Latin poetry.
(From the textbook: Kozarzhevskij A.Ch., Uchebnik latinskogo
yazyka [Textbook of the Latin Language], Moscow, Vysshaya
Shkola, 1970).
Notation used:
_ - long syllable, . - short syllable, * - syllable of
indeterminate length;
= - long syllable with a stress mark, & - syllable of
indeterminate length with a stress mark;
/ and // - pauses.
Verse metres singled out by Kozarzhevskij:
5. Greater Asclepiad verse:
=_/=../=//=../=//=../=./.
Example: Tu ne quaesieris, scire nefas, quem mihi, quem tibi
(p. ...).
6. Lesser Asclepiad verse:
=_/=../=//=../=./.
Example: Exegi monument(um) aere perennius (p. 213).
8a. Alcaic stanza:
*/=./=_/=../=./=
*/=./=_/=../_./=
*/=./=_/=./=.
=../=../=./=.
Examples:
Eheu, fugaces, Postume, Postume (p. 170)
Delicta major(um) immeritus lues ... (p. 212)
(unnumbered) Third Asclepiad stanza:
=_/=../=//=../=./&
=_/=../=//=../=./&
=_/=../=.
=_/=../=./&
Example: O navis, referent in mare te novi (p. 212).
(unnumbered) iambic verse; e.g.:
.=/.=/.=/.=/.=/.=/.=
.=/.=/.=/.=
Example: Quo, quo scelesti ruitis? Aut cur dextera... (p.
211).
Appendix 3: references to the presented materials:
Download: ENEBI1.ZIP (23K)
Other languages.
Download: 1 2 3 4 5 6 (22..36K)
Chinese language (marking of readings and meanings of
hieroglyphs and words).
Download: 1 2 3 4 (18..30K)
Japanese language (marking of readings and meanings).
Download: LATINSK.ZIP (17K)
Latin language.
Download: ENEBI7.ZIP (26K)
Hebrew language (extraction of dictionary forms, etc.).
Download: RUSSIAN.ZIP (28K)
Russian language for foreigners (pre-product).
(Planned to be made available at:
http://aravidze.narod.ru/enebi*.zip ;
http://www.geocities.com/sekirin1/enebi*.zip .)