GRETIL - Göttingen Register of Electronic Texts in Indian Languages
Introduction
This site is no longer being updated and reflects the register as of June 28th, 2019 for documentation purposes! You can access the current register at: http://gretil.sub.uni-goettingen.de
THE CONCEPT
FORMATS / ENCODINGS
CONCORDANCE / SYSTEMATIC LIST OF ENCODINGS
HOW TO FIND AND DOWNLOAD AN E-TEXT
INPUT OF E-TEXTS -- SOME SUGGESTIONS
THE CONCEPT
GRETIL is intended as a cumulative register of the numerous
download sites
for electronic texts in Indian languages.
GRETIL registers only e-texts that are freely available for
scholarly
purposes and can be employed for word search etc. in a standard
word
processing programme. In general, this excludes PDF files
displaying text
in Devanagari or other Indian scripts, special formats for
proprietary
software, as well as e-texts distributed for commercial profit.
Apart from registering electronic texts in Indian languages,
GRETIL is
also intended to facilitate access to these texts. For this purpose
all
registered e-texts are made available in the GRETIL Archives.
It should be noted that the archived files are intended for
reference
purposes only.
FORMATS / ENCODINGS
The texts provided in the GRETIL Archive have been converted
from files
in various encodings. The details of these encodings are not
always stated
in the respective source files, and in some cases references to a
particular
encoding standard (e.g., "ITRANS") have to be
taken with some reservation.
In addition, some encodings may be ambiguous in their
designation of
characters (assigning, e.g., "n" to various Sanskrit
class nasals), or downright
obscure, thus denying the uninitiated access even for basic
reference tasks.
In order to alleviate these and other problems, the GRETIL
Archive
provides all available resources in three standardized formats.
See
CONCORDANCE
and SYSTEMATIC LIST
of GRETIL encodings and transliteration systems (PDF):
- REE
This encoding was devised sometime in the 1980s by the
late Ronald E. Emmerick,
Professor of Iranian Studies at Hamburg University, for
WordPerfect 5.1 DOS
and related utility programmes BHELA, CARAKA etc.
(DOS versions).
In memory of its esteemed author this encoding is here
referred to as
"REE".
The choice of "REE" encoding may seem
surprising, and it is
open to question whether Ronald E. Emmerick would have
approved of connecting
his name with an encoding that he probably considered
long outdated.
Nonetheless, quite a number of indologists still use it.
However, the
main reason for this choice was that, over the years, a
variety of tools
had been developed for conversion from various
encodings to REE.
Go to REE Archive
- CSX(+)
Classical Sanskrit eXtended
(Plus)
This encoding is based on two code lists first defined in
1990 as "CS"
for the basic transliteration of Classical Sanskrit, and
"CSX" for an
extended character set taking into account Vedic accents,
Tamil etc.
Unless stated otherwise, the minor additions and
modifications to CSX, later codified in CSX+,
are inconsequential for the files available from the
GRETIL CSX Archive.
Go to CSX Archive
- HTML Unicode (UTF-8)
UTF-8 was added as an additional archive format
on 19.2.2003,
although its use entails hardware and software
requirements that
may not be considered necessary, or desirable, by
everyone.
All UTF-8 Archive files have "Arial Unicode"
as standard
font setting. If you do not have this font (formerly
available as
freeware and now included in office packages), please
make sure
you have installed some other suitable Unicode font.
You can load the GRETIL files in UTF-8 directly into
your word processor, provided
it can handle Unicode.
Go to UTF-8 Archive
Unless stated otherwise in the files, Vedic accents were dropped
in
order to facilitate basic word search. Otherwise the texts remain
principally
unchanged, even though the sometimes peculiar conventions of
transliteration,
especially when designed to emulate Devanagari, leave much
room for further
standardization.
HOW TO FIND AND DOWNLOAD AN E-TEXT
You can approach the search for a particular e-text from two
angles:
- Systematic search
If you are looking for an e-text in a particular Indian language
(e.g., Sanskrit) and / or
of a particular literary genre (e.g., Poetry), you can look up the
language /
genre in the Index.
For easier orientation, the Index is organized according to Moriz
Winternitz's
"History of Indian Literature". The archive of the
Pali section
follows Oskar von Hinüber's "Handbook of Pali
Literature".
The Tamil section follows Kamil V. Zvelebil's "Lexicon of
Tamil Literature".
It is advisable to start your systematic search from the
Index of the
GRETIL homepage
because the Index of an individual Archive (REE | CSX etc.)
may not
register all e-texts available in all formats (e.g., Tamil texts are
not available in REE encoding; consequently, they are not
included in
the Index to the REE Archive).
- Alphabetic Search
The GRETIL HTML pages are written in plain ASCII code
without diacritics,
which makes the search for the name of a particular author /
text very
easy:
- Go to the top of the GRETIL homepage
(or the respective Archive page),
- open the "Search" interface of your browser
(usually by typing Ctrl-F),
- enter the name of the author / text.
- Download
To download the requested e-text, right-click the download
symbol of the respective file and choose "Save link".
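Since the GRETIL pages spell names in plain ASCII, a name copied from a source with diacritics has to be reduced to ASCII before it is typed into the browser search. The following is only a minimal sketch of that reduction, assuming the name is available as a Unicode string in standard scholarly transliteration. The function name ascii_fold is hypothetical, and the spelling actually used on the index pages may still differ (e.g. "Rgveda" versus "Rigveda").

```python
import unicodedata


def ascii_fold(name: str) -> str:
    """Drop diacritics from a transliterated name (assumed helper)."""
    # NFD splits each accented letter into base letter + combining marks.
    decomposed = unicodedata.normalize("NFD", name)
    # Keep only the base letters; combining marks are discarded.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(ascii_fold("Kālidāsa"))
```

The folded string can then be entered in the browser's Ctrl-F search on the GRETIL homepage or on the respective Archive page.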
INPUT OF E-TEXTS -- SOME SUGGESTIONS
(formerly despatched on the INDOLOGY discussion list)
Here are some points that I have come to find useful in my own
work as well as in preparing files from various sources for
GRETIL.
- Format
Assuming that the aim of the text input is to provide a
scholarly reference aid for a given text, rather than an
exercise
in piety, I consider transliteration in a PLAIN TEXT
FILE
preferable to any other format such as PDF, RTF,
HTML etc.,
which may turn out practically useless for the said
purpose,
especially when combined with non-Latin scripts.
- Encoding
No matter which encoding is used in transliteration,
it should be
- FREE FROM ANY AMBIGUITY (that may, e.g.,
arise from employing
"n" for different Sanskrit class nasals)
- and FULLY DOCUMENTED at the beginning of
every e-text,
preferably in a chart describing the character and/or
giving the
equivalent "ASCII" number. No matter what
may happen
to the e-text during the file transfer / download process,
the attached chart will suffer the same distortions -- and
will thus
enable the reader on the receiving end to
"reconstruct"
the encoding, if need be.
Here is an example of what such an encoding chart may
look like:
_____________________________________________
This file is encoded in CSX+
description          char.  =  ASCII
long a               à         224
long A               â         226
long i               ã         227
long I               ä         228
long u               å         229
vocalic r            ç         231
long vocalic r       é         233
vocalic l            ë         235
long vocalic l       í         237
velar n              ï         239
palatal n            ¤         164
retroflex t          ñ         241
retroflex d          ó         243
retroflex n          õ         245
palatal s            ÷         247
retroflex s          ù         249
anusvara             ü         252
visarga              þ         254
_____________________________________________
All GRETIL e-texts contain a chart of the respective encoding.
For full documentation see
CONCORDANCE
and SYSTEMATIC LIST
of GRETIL encodings and transliteration systems (PDF).
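A chart of this kind is enough for a reader to "reconstruct" the text mechanically. The following is a minimal sketch, not a complete CSX+ converter: it covers only the eighteen byte values listed in the sample chart above, and the Unicode characters chosen for each description (e.g. "ṃ" for anusvara) are an assumed rendering in standard scholarly transliteration. The names CSX_CHART and decode_csx_sample are hypothetical.

```python
# Byte values are taken from the sample chart; the Unicode targets are an
# assumed rendering of each description, not an official CSX+ table.
CSX_CHART = {
    224: "ā", 226: "Ā", 227: "ī", 228: "Ī", 229: "ū",
    231: "ṛ", 233: "ṝ", 235: "ḷ", 237: "ḹ",
    239: "ṅ", 164: "ñ",
    241: "ṭ", 243: "ḍ", 245: "ṇ",
    247: "ś", 249: "ṣ",
    252: "ṃ", 254: "ḥ",
}


def decode_csx_sample(raw: bytes) -> str:
    """Decode bytes using only the code points documented in the chart."""
    out = []
    for b in raw:
        if b in CSX_CHART:
            out.append(CSX_CHART[b])
        elif b < 128:
            out.append(chr(b))  # plain ASCII passes through unchanged
        else:
            # Refuse to guess: an undocumented byte would be ambiguous.
            raise ValueError(f"byte {b} is not covered by the sample chart")
    return "".join(out)


print(decode_csx_sample(b"k\xe0lid\xe0sa"))
```

Raising an error for any byte the chart does not document keeps the conversion free from ambiguity, in line with the requirement stated above.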
- Reference System
This is perhaps the most neglected aspect in
the majority of e-texts one comes across. And yet,
with the
computer's well-known limitation to one screenful of
text at a
time, it is crucial to provide readers with adequate
orientation,
citing, as it were, book, chapter and verse in each and
every
screenful of text.
- REFERENCES SHOULD BE PLACED
AT THE END of the respective
text unit (such as a verse or line) to allow
for later
SORTING of lines (or padas) in
alphabetical order
(cf. below).
- REFERENCES SHOULD BE GIVEN IN
FULL, e.g. "3,13.120",
instead of restricting them to the smallest
unit, say, the
verse number (just "120" instead
of "3,13.120"). Having
browsed two or three screens up or down
from a chapter
heading, one may easily have forgotten
where exactly one
happens to be. Orientation can be even more
difficult if an
ordinary word search takes you from the
beginning of the
file right to a verse with the enigmatic
reference "120":
for a start, you will have to scroll 119 verses
up to find
out that you're in chapter 13, and it is all too
plain that
your expedition through the text - and away
from the
passage you were looking for - doesn't end
there.
- With next to no additional effort, references
can be made
SUITABLE FOR CLASSIFIED SEARCH
simply by using distinctive
punctuation, such as COMMA between
book and chapter, and
DOT between chapter and verse. This
allows you to
distinguish the search for "3,13"
(=book 3, chapter 13)
from "3.13" (chapter 3, verse 13).
- Especially when a file contains more than
one e-text, the
reference should include an
ABBREVIATION FOR THE TEXT in
question, preferably with a connecting
underscore to
prevent accidental separation due to line
break, e.g.
"MBh_3,13.120". Such an
abbreviation is essential in pada /
verse indices that you may later want to
merge with indices
of other texts to search for parallels.
- In a file combining a root text and
interspersed
commentary, say, the Mahabharata and
Nilakantha's
Bharatabhavadipa, distinct abbreviations,
e.g.,
"MBh_3,13.120" resp.
"MBhN_3,13.120", will facilitate
orientation significantly.
- MARKERS FOR METRICAL UNITS
(padas) AND SECTIONS OF PROSE
(sentences) are indispensable for generating
indices of padas, etc.
For instance, the Anustubh pattern could
look like this:
For a four-pada verse:
........ $ ........ &
........ % ........ // Name_n,n.n //
For a six-pada verse:
........ $ ........ &
........ % ........ \
........ # ........ // Name_n,n.n //
Here, again, everything is fine as long as it
is
UNAMBIGUOUS.
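The conventions above (full reference at the end of the unit, comma and dot as distinct separators, a text abbreviation joined by underscore, and pada markers) are what make a sorted pada index easy to generate. The following is a minimal sketch under stated assumptions: the marker set is the one shown in the patterns above, the reference always has the form Name_book,chapter.verse, and the verse text used here is a placeholder, not a quotation. The names REF, PADA_MARKERS and pada_index are hypothetical.

```python
import re

# Full reference between "//" delimiters, e.g. "// MBh_3,13.120 //".
REF = re.compile(
    r"//\s*(?P<text>[A-Za-z]+)_(?P<book>\d+),(?P<chapter>\d+)\.(?P<verse>\d+)\s*//"
)
# Pada markers as used in the four- and six-pada patterns.
PADA_MARKERS = re.compile(r"[$&%\\#]")


def pada_index(verse: str) -> list[tuple[str, str]]:
    """Return (pada, full reference) pairs in alphabetical order."""
    m = REF.search(verse)
    if m is None:
        raise ValueError("no full reference found at the end of the verse")
    ref = f"{m['text']}_{m['book']},{m['chapter']}.{m['verse']}"
    body = verse[: m.start()]  # everything before the reference
    padas = [p.strip() for p in PADA_MARKERS.split(body)]
    return sorted((p, ref) for p in padas if p)


# Placeholder text only; the reference is the example used above.
sample = "dddd $ bbbb &\ncccc % aaaa // MBh_3,13.120 //"
for pada, ref in pada_index(sample):
    print(pada, ref)
```

Because every entry carries its full reference, including the text abbreviation, the resulting lists can be merged with indices of other texts and sorted again without losing track of where each pada comes from.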
Last update: 15.3.2016
Contact
© 2002 Niedersächsische Staats-
und Universitätsbibliothek
Göttingen