GRETIL - Göttingen Register of Electronic Texts in Indian Languages
Introduction
THE CONCEPT
FORMATS / ENCODINGS | ==> CONCORDANCE / SYSTEMATIC LIST OF ENCODINGS
HOW TO FIND AND DOWNLOAD AN E-TEXT
INPUT OF E-TEXTS -- SOME SUGGESTIONS
THE CONCEPT
GRETIL is intended as a cumulative register of the numerous download sites
for electronic texts in Indian languages.
GRETIL registers only e-texts that are freely available for scholarly
purposes and can be employed for word search etc. in a standard word
processing programme. In general, this excludes PDF files displaying text
in Devanagari or other Indian scripts, special formats for proprietary
software, as well as e-texts distributed for commercial profit.
Apart from registering electronic texts in Indian languages, GRETIL is
also intended to facilitate access to these texts. For this purpose all
registered e-texts are made available in the GRETIL Archives.
It should be noted that the archived files are intended for reference
purposes only.
FORMATS / ENCODINGS
The texts provided in the GRETIL Archive have been converted from files
in various encodings. The details of these encodings are not always stated
in the respective source files, and in some cases references to a particular
encoding standard (e.g., "ITRANS") have to be taken with some reservation.
In addition, some encodings may be ambiguous in their designation of
characters (assigning, e.g., "n" to various Sanskrit class nasals), or downright
obscure, thus denying the uninitiated access even for basic reference tasks.
In order to alleviate these and other problems, the GRETIL Archive
provides all available resources in three standardized formats. See the
CONCORDANCE and SYSTEMATIC LIST of GRETIL encodings and transliteration
systems (PDF):
- REE
This encoding was devised sometime in the 1980s by the late Ronald E. Emmerick,
Professor of Iranian Studies at Hamburg University, for WordPerfect 5.1 DOS
and related utility programmes BHELA, CARAKA etc. (DOS versions).
In memory of its esteemed author this encoding is here referred to as
"REE".
The choice of "REE" encoding may seem surprising, and it is open
to question whether Ronald E. Emmerick would have approved of connecting
his name with an encoding that he probably considered long outdated.
Nonetheless, quite a number of indologists still use it. However, the
main reason for this choice was that, over the years, a variety of tools
had been developed for conversion from various encodings to REE.
Go to REE Archive
- CSX(+)
Classical Sanskrit eXtended (Plus)
This encoding is based on two code lists first defined in 1990 as "CS"
for the basic transliteration of Classical Sanskrit, and "CSX" for an
extended character set taking into account Vedic accents, Tamil etc.
Unless stated otherwise, the minor additions and modifications to CSX, later codified in CSX+,
are inconsequential for the files available from the GRETIL CSX Archive.
Go to CSX Archive
- HTML Unicode (UTF-8)
UTF-8 was added as an additional archive format on 19.2.2003,
although its use entails hardware and software requirements that
may not be considered necessary, or desirable, by everyone.
All UTF-8 Archive files have "Arial Unicode" as standard
font setting. If you do not have this font (formerly available as
freeware and now included in office packages), please make sure
you have installed some other suitable Unicode font.
You can load the GRETIL files in UTF-8 directly into your word processor, provided
it can handle Unicode.
Go to UTF-8 Archive
Unless stated otherwise in the files, Vedic accents were dropped in
order to facilitate basic word search. Otherwise the texts remain essentially
unchanged, even though the sometimes peculiar conventions of transliteration,
especially when designed to emulate Devanagari, leave much room for further
standardization.
HOW TO FIND AND DOWNLOAD AN E-TEXT
You can approach the search for a particular e-text from two angles:
- Systematic search
If you are looking for an e-text in a particular Indian language (e.g., Sanskrit) and / or
of a particular literary genre (e.g., Poetry), you can look up the language /
genre in the Index.
For easier orientation, the Index is organized according to Moriz Winternitz's
"History of Indian Literature". The archive of the Pali section
follows Oskar von Hinüber's "Handbook of Pali Literature".
The Tamil section follows Kamil V. Zvelebil's "Lexicon of Tamil Literature".
We recommend starting your systematic search from the
Index of the GRETIL homepage
because the Index of an individual Archive (REE | CSX etc.) may not
register all e-texts available in all formats (e.g., Tamil texts are
not available in REE encoding; consequently, they are not included in
the Index to the REE Archive).
- Alphabetic Search
The GRETIL HTML pages are written in plain ASCII code without diacritics,
which makes the search for the name of a particular author / text very
easy:
- Go to the top of the GRETIL homepage
(or the respective Archive page),
- open the "Search" interface of your browser (usually by typing Ctrl-F),
- enter the name of the author / text.
- Download
In order to download the requested e-text, right-click the download symbol
of the respective file and choose "Save link".
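Since the pages are searched in plain ASCII, a name copied with diacritics has to be reduced to its base letters first. The following is a minimal sketch of that step in Python; the function name `strip_diacritics` is hypothetical, and whether the result matches the exact spelling used on a given GRETIL page is an assumption.

```python
import unicodedata

def strip_diacritics(name: str) -> str:
    """Drop combining marks so that, e.g., a macron or underdot disappears."""
    # NFD splits each accented letter into base letter + combining mark(s).
    decomposed = unicodedata.normalize("NFD", name)
    # Keep only characters that are not combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(strip_diacritics("Kālidāsa"))    # Kalidasa
print(strip_diacritics("Nīlakaṇṭha"))  # Nilakantha
```

Letters that have no decomposition (for instance "ß") pass through unchanged, so the result is not guaranteed to be pure ASCII for every input.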
INPUT OF E-TEXTS -- SOME SUGGESTIONS
(formerly despatched on the INDOLOGY discussion list)
Here are some points that I have come to find useful in my own
work as well as in preparing files from various sources for GRETIL.
- Format
Assuming that the aim of the text input is to provide a
scholarly reference aid for a given text, rather than an exercise
in piety, I consider transliteration in a PLAIN TEXT FILE
preferable to any other format such as PDF, RTF, HTML etc.,
which may turn out practically useless for the said purpose,
especially when combined with non-Latin scripts.
- Encoding
No matter which encoding is used in transliteration,
it should be
- FREE FROM ANY AMBIGUITY (that may, e.g., arise from employing
"n" for different Sanskrit class nasals)
- and FULLY DOCUMENTED at the beginning of every e-text,
preferably in a chart describing the character and/or giving the
equivalent "ASCII" number. No matter what may happen
to the e-text during the file transfer / download process,
the attached chart will suffer the same distortions -- and will thus
enable the reader on the receiving end to "reconstruct"
the encoding, if need be (see also Experimental Site).
Here is an example of what such an encoding chart may look like:
_____________________________________________
This file is encoded in CSX+
description          char. = ASCII
long a               à       224
long A               â       226
long i               ã       227
long I               ä       228
long u               å       229
vocalic r            ç       231
long vocalic r       é       233
vocalic l            ë       235
long vocalic l       í       237
velar n              ï       239
palatal n            ¤       164
retroflex t          ñ       241
retroflex d          ó       243
retroflex n          õ       245
palatal s            ÷       247
retroflex s          ù       249
anusvara             ü       252
visarga              þ       254
_____________________________________________
All GRETIL e-texts contain a chart of the respective encoding. For full
documentation see the CONCORDANCE and SYSTEMATIC LIST of GRETIL encodings
and transliteration systems (PDF).
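A chart of this kind is enough to "reconstruct" the encoding mechanically. The following is a minimal sketch in Python, not a complete CSX+ converter: the byte values come from the example chart above, the Unicode targets are the usual transliteration letters for the given descriptions (an assumption), and the names `CHART` and `decode_with_chart` are hypothetical.

```python
# Byte value -> Unicode letter, following the descriptions in the chart above.
# Only the 18 charted characters are covered; this is not the full CSX+ table.
CHART = {
    224: "ā", 226: "Ā", 227: "ī", 228: "Ī", 229: "ū",
    231: "ṛ", 233: "ṝ", 235: "ḷ", 237: "ḹ",
    239: "ṅ", 164: "ñ", 241: "ṭ", 243: "ḍ", 245: "ṇ",
    247: "ś", 249: "ṣ", 252: "ṃ", 254: "ḥ",
}

def decode_with_chart(raw: bytes) -> str:
    """Replace charted bytes by their Unicode letters; read all others as Latin-1."""
    return "".join(CHART.get(byte, chr(byte)) for byte in raw)

# The bytes for "k", 231, 249, 245, "a" spell a word with vocalic r,
# retroflex s and retroflex n.
print(decode_with_chart(bytes([107, 231, 249, 245, 97])))  # kṛṣṇa
```

Bytes that the chart does not mention are passed through unchanged, so any gap in the chart stays visible in the output instead of being silently guessed.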
- Reference System
This is perhaps the most neglected aspect in
the majority of e-texts one comes across. And yet, with the
computer's well-known limitation to one screenful of text at a
time, it is crucial to provide readers with adequate orientation,
citing, as it were, book, chapter and verse in each and every
screenful of text.
- REFERENCES SHOULD BE PLACED AT THE END of the respective
text unit (such as a verse or line) to allow for later
SORTING of lines (or padas) in alphabetical order
(cf. below).
- REFERENCES SHOULD BE GIVEN IN FULL, e.g. "3,13.120",
instead of restricting them to the smallest unit, say, the
verse number (just "120" instead of "3,13.120"). Having
browsed two or three screens up or down from a chapter
heading, one may easily have forgotten where exactly one
happens to be. Orientation can be even more difficult if an
ordinary word search takes you from the beginning of the
file right to a verse with the enigmatic reference "120":
for a start, you will have to scroll 119 verses up to find
out that you're in chapter 13, and it is all too plain that
your expedition through the text - and away from the
passage you were looking for - doesn't end there.
- With next to no additional effort, references can be made
SUITABLE FOR CLASSIFIED SEARCH simply by using distinctive
punctuation, such as COMMA between book and chapter, and
DOT between chapter and verse. This allows you to
distinguish the search for "3,13" (=book 3, chapter 13)
from "3.13" (chapter 3, verse 13).
- Especially when a file contains more than one e-text, the
reference should include an ABBREVIATION FOR THE TEXT in
question, preferably with a connecting underscore to
prevent accidental separation due to line break, e.g.
"MBh_3,13.120". Such an abbreviation is essential in pada /
verse indices that you may later want to merge with indices
of other texts to search for parallels.
- In a file combining a root text and interspersed
commentary, say, the Mahabharata and Nilakantha's
Bharatabhavadipa, distinct abbreviations, e.g.,
"MBh_3,13.120" resp. "MBhN_3,13.120", will facilitate
orientation significantly.
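To illustrate why full, distinctly punctuated references pay off, here is a minimal Python sketch that parses references of the form "MBh_3,13.120". The pattern and the names `REF`, `parse_reference` and `in_book_chapter` are hypothetical; the sketch assumes exactly three levels (book, chapter, verse) and an abbreviation made of letters only.

```python
import re

# Optional abbreviation + underscore, then book COMMA chapter DOT verse.
REF = re.compile(
    r"(?:(?P<text>[A-Za-z]+)_)?(?P<book>\d+),(?P<chapter>\d+)\.(?P<verse>\d+)"
)

def parse_reference(ref: str):
    """Return (text, book, chapter, verse), or None if ref is not a full reference."""
    m = REF.fullmatch(ref)
    if m is None:
        return None
    return (m.group("text"), int(m.group("book")),
            int(m.group("chapter")), int(m.group("verse")))

def in_book_chapter(ref: str, book: int, chapter: int) -> bool:
    """Classified search: does ref belong to the given book and chapter?"""
    parsed = parse_reference(ref)
    return parsed is not None and parsed[1:3] == (book, chapter)

print(parse_reference("MBh_3,13.120"))   # ('MBh', 3, 13, 120)
print(parse_reference("MBhN_3,13.120"))  # ('MBhN', 3, 13, 120)
print(parse_reference("120"))            # None
```

A bare "120" cannot be parsed at all, which is the programmatic counterpart of the orientation problem described above; the distinct abbreviations "MBh" and "MBhN" keep root text and commentary apart in a merged index.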
- MARKERS FOR METRICAL UNITS (padas) AND SECTIONS OF PROSE
(sentences) are indispensable for generating indices of padas, etc.
For instance, the Anustubh pattern could look like this:
For a four-pada verse:
........ $ ........ &
........ % ........ // Name_n,n.n //
For a six-pada verse:
........ $ ........ &
........ % ........ \
........ # ........ // Name_n,n.n //
Here, again, everything is fine as long as it is
UNAMBIGUOUS.
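With markers of this kind and the reference at the end of each verse, a sorted pada index can be generated in a few lines. The following Python sketch assumes the marker set shown above ($ & % \ #) and a reference enclosed in "// ... //"; the names `pada_index`, `PADA_MARKERS` and `VERSE` are hypothetical, and the sample padas are dummy text.

```python
import re

PADA_MARKERS = re.compile(r"[$&%\\#]")  # separators between padas
# Verse body, followed by "// reference //".
VERSE = re.compile(r"(?P<body>.*?)//\s*(?P<ref>\S+)\s*//", re.S)

def pada_index(text: str):
    """Return an alphabetically sorted list of (pada, reference) pairs."""
    entries = []
    for m in VERSE.finditer(text):
        for pada in PADA_MARKERS.split(m.group("body")):
            pada = " ".join(pada.split())  # normalize spaces and line breaks
            if pada:
                entries.append((pada, m.group("ref")))
    return sorted(entries)

sample = "ccc ccc $ aaa aaa &\nddd ddd % bbb bbb // Text_1,1.1 //"
for pada, ref in pada_index(sample):
    print(pada, ref)
```

Because every pada carries its full reference, the resulting list can be merged with the indices of other texts and sorted again without losing track of where each pada came from.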
GRETIL home | Link to Indological Resources
Last update: 10.6.2011
Reinhold Grünendahl
© 2002 Niedersächsische Staats- und Universitätsbibliothek Göttingen