GRETIL - Göttingen Register of Electronic Texts in Indian Languages

Introduction


THE CONCEPT

FORMATS / ENCODINGS | ==> CONCORDANCE / SYSTEMATIC LIST OF ENCODINGS

HOW TO FIND AND DOWNLOAD AN E-TEXT

INPUT OF E-TEXTS -- SOME SUGGESTIONS


THE CONCEPT

GRETIL is intended as a cumulative register of the numerous download sites for electronic texts in Indian languages.

GRETIL registers only e-texts that are freely available for scholarly purposes and can be employed for word search etc. in a standard word processing programme. In general, this excludes PDF files displaying text in Devanagari or other Indian scripts, special formats for proprietary software, as well as e-texts distributed for commercial profit.

Apart from registering electronic texts in Indian languages, GRETIL is also intended to facilitate access to these texts. For this purpose all registered e-texts are made available in the GRETIL Archives.

It should be noted that the archived files are intended for reference purposes only.


FORMATS / ENCODINGS

The texts provided in the GRETIL Archive have been converted from files in various encodings. The details of these encodings are not always stated in the respective source files, and in some cases references to a particular encoding standard (e.g., "ITRANS") have to be taken with some reservation. In addition, some encodings may be ambiguous in their designation of characters (assigning, e.g., "n" to various Sanskrit class nasals), or downright obscure, thus denying the uninitiated access even for basic reference tasks.

In order to alleviate these and other problems, the GRETIL Archive provides all available resources in three standardized formats. See CONCORDANCE and SYSTEMATIC LIST of GRETIL encodings and transliteration systems (PDF):

  • REE
    This encoding was devised sometime in the 1980s by the late Ronald E. Emmerick, Professor of Iranian Studies at Hamburg University, for WordPerfect 5.1 DOS and related utility programmes BHELA, CARAKA etc. (DOS versions). In memory of its esteemed author this encoding is here referred to as "REE".
    The choice of "REE" encoding may seem surprising, and it is open to question whether Ronald E. Emmerick would have approved of connecting his name with an encoding that he probably considered long outdated. Nonetheless, quite a number of indologists still use it. However, the main reason for this choice was that, over the years, a variety of tools had been developed for conversion from various encodings to REE.

    Go to REE Archive

  • CSX(+)
    Classical Sanskrit eXtended (Plus)
    This encoding is based on two code lists first defined in 1990 as "CS" for the basic transliteration of Classical Sanskrit, and "CSX" for an extended character set taking into account Vedic accents, Tamil etc.
    Unless stated otherwise, the minor additions and modifications to CSX, later codified in CSX+, are inconsequential for the files available from the GRETIL CSX Archive.

    Go to CSX Archive

  • HTML Unicode (UTF-8)
    UTF-8 was added as an additional archive format on 19.2.2003, although its use entails hardware and software requirements that may not be considered necessary, or desirable, by everyone.
    All UTF-8 Archive files have "Arial Unicode" as standard font setting. If you do not have this font (formerly available as freeware and now included in office packages), please make sure you have installed some other suitable Unicode font.
    You can load the GRETIL files in UTF-8 directly into your word processor, provided it can handle Unicode.

    Go to UTF-8 Archive


    Unless stated otherwise in the files, Vedic accents were dropped in order to facilitate basic word search. Otherwise the texts remain essentially unchanged, even though the sometimes peculiar conventions of transliteration, especially when designed to emulate Devanagari, leave much room for further standardization.


HOW TO FIND AND DOWNLOAD AN E-TEXT

You can approach the search for a particular e-text from two angles:

  • Systematic search
    If you are looking for an e-text in a particular Indian language (e.g., Sanskrit) and / or of a particular literary genre (e.g., Poetry), you can look up the language / genre in the Index.
    For easier orientation, the Index is organized according to Moriz Winternitz's "History of Indian Literature". The archive of the Pali section follows Oskar von Hinüber's "Handbook of Pali Literature". The Tamil section follows Kamil V. Zvelebil's "Lexicon of Tamil Literature".

    We recommend starting your systematic search from the Index of the GRETIL homepage because the Index of an individual Archive (REE | CSX etc.) may not register all e-texts available in all formats (e.g., Tamil texts are not available in REE encoding; consequently, they are not included in the Index to the REE Archive).

  • Alphabetic Search
    The GRETIL HTML pages are written in plain ASCII code without diacritics, which makes the search for the name of a particular author / text very easy:
    - Go to the top of the GRETIL homepage (or the respective Archive page),
    - open the "Search" interface of your browser (usually by pressing Ctrl-F),
    - enter the name of the author / text.

  • Download
    In order to download the requested e-text, right-click the download symbol of the respective file and choose "Save link".
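
Because the index pages are written in plain ASCII, a name copied from a source with diacritics has to be reduced to its ASCII spelling before you type it into the browser search. The following is a minimal sketch of that step in Python; the function name fold_to_ascii is hypothetical, and it assumes that the ASCII spelling is obtained simply by dropping the diacritical marks, which may not match the spelling used on every page.

```python
import unicodedata


def fold_to_ascii(name: str) -> str:
    """Drop combining diacritics so that a name can be searched in plain ASCII.

    Assumption: the plain spelling is the base letter without its marks.
    Characters without a canonical decomposition are left unchanged.
    """
    # NFD splits a letter such as "ī" into "i" plus a combining macron.
    decomposed = unicodedata.normalize("NFD", name)
    # Keep everything except the combining marks themselves.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


if __name__ == "__main__":
    print(fold_to_ascii("Nīlakaṇṭha"))
```

The result can then be entered in the browser's search field as described above.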



INPUT OF E-TEXTS -- SOME SUGGESTIONS
(formerly despatched on the INDOLOGY discussion list)

Here are some points that I have come to find useful in my own work as well as in preparing files from various sources for GRETIL.

  • Format
    Assuming that the aim of the text input is to provide a scholarly reference aid for a given text, rather than an exercise in piety, I consider transliteration in a PLAIN TEXT FILE preferable to any other format such as PDF, RTF, HTML etc., which may turn out practically useless for the said purpose, especially when combined with non-Latin scripts.

  • Encoding
    No matter which encoding is used in transliteration, it should be
    - FREE FROM ANY AMBIGUITY (that may, e.g., arise from employing "n" for different Sanskrit class nasals)
    - and FULLY DOCUMENTED at the beginning of every e-text, preferably in a chart describing the character and/or giving the equivalent "ASCII" number. No matter what may happen to the e-text during the file transfer / download process, the attached chart will suffer the same distortions -- and will thus enable the reader on the receiving end to "reconstruct" the encoding, if need be.

    Here is an example of what such an encoding chart may look like:

    
           _____________________________________________
           This file is encoded in CSX+

           description            char. = ASCII

           long a                 224
           long A                 226
           long i                 227
           long I                 228
           long u                 229
           vocalic r              231
           long vocalic r         233
           vocalic l              235
           long vocalic l         237
           velar n                239
           palatal n              164
           retroflex t            241
           retroflex d            243
           retroflex n            245
           palatal s              247
           retroflex s            249
           anusvara               252
           visarga                254
           _____________________________________________
    All GRETIL e-texts contain a chart of the respective encoding. For a full documentation see CONCORDANCE and SYSTEMATIC LIST of GRETIL encodings and transliteration systems (PDF).
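
A chart of this kind is enough to convert a file mechanically. The following Python sketch illustrates the idea under stated assumptions: the byte values are the ones listed in the chart above, while the Unicode target characters (the usual scholarly transliteration letters) and the names CSX_TO_UNICODE and csx_to_unicode are my own choices for illustration. Bytes above 127 that the chart does not list are passed through unchanged, so a complete conversion would need the full code list.

```python
# Byte values are taken from the chart above; the Unicode targets are an
# assumption (common transliteration letters), not part of the chart.
CSX_TO_UNICODE = {
    224: "ā",  # long a
    226: "Ā",  # long A
    227: "ī",  # long i
    228: "Ī",  # long I
    229: "ū",  # long u
    231: "ṛ",  # vocalic r
    233: "ṝ",  # long vocalic r
    235: "ḷ",  # vocalic l
    237: "ḹ",  # long vocalic l
    239: "ṅ",  # velar n
    164: "ñ",  # palatal n
    241: "ṭ",  # retroflex t
    243: "ḍ",  # retroflex d
    245: "ṇ",  # retroflex n
    247: "ś",  # palatal s
    249: "ṣ",  # retroflex s
    252: "ṃ",  # anusvara
    254: "ḥ",  # visarga
}


def csx_to_unicode(raw: bytes) -> str:
    """Convert bytes in the charted encoding to a Unicode string."""
    # Latin-1 maps every byte 0-255 to the code point with the same number,
    # so the chart's numbers can be used directly as translation keys.
    return raw.decode("latin-1").translate(CSX_TO_UNICODE)


if __name__ == "__main__":
    sample = bytes([114, 224, 109, 97, 254])  # "r", long a, "m", "a", visarga
    print(csx_to_unicode(sample))
```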

  • Reference System
    This is perhaps the most neglected aspect in the majority of e-texts one comes across. And yet, with the computer's well-known limitation to one screenful of text at a time, it is crucial to provide readers with adequate orientation, citing, as it were, book, chapter and verse in each and every screenful of text.

    • REFERENCES SHOULD BE PLACED AT THE END of the respective text unit (such as a verse or line) to allow for later SORTING of lines (or padas) in alphabetical order (cf. below).

    • REFERENCES SHOULD BE GIVEN IN FULL, e.g. "3,13.120", instead of restricting them to the smallest unit, say, the verse number (just "120" instead of "3,13.120"). Having browsed two or three screens up or down from a chapter heading, one may easily have forgotten where exactly one happens to be. Orientation can be even more difficult if an ordinary word search takes you from the beginning of the file right to a verse with the enigmatic reference "120": for a start, you will have to scroll 119 verses up to find out that you're in chapter 13, and it is all too plain that your expedition through the text - and away from the passage you were looking for - doesn't end there.

    • With next to no additional effort, references can be made SUITABLE FOR CLASSIFIED SEARCH simply by using distinctive punctuation, such as COMMA between book and chapter, and DOT between chapter and verse. This allows you to distinguish the search for "3,13" (=book 3, chapter 13) from "3.13" (chapter 3, verse 13).

    • Especially when a file contains more than one e-text, the reference should include an ABBREVIATION FOR THE TEXT in question, preferably with a connecting underscore to prevent accidental separation due to line break, e.g. "MBh_3,13.120". Such an abbreviation is essential in pada / verse indices that you may later want to merge with indices of other texts to search for parallels.

    • In a file combining a root text and interspersed commentary, say, the Mahabharata and Nilakantha's Bharatabhavadipa, distinct abbreviations, e.g., "MBh_3,13.120" and "MBhN_3,13.120" respectively, will facilitate orientation significantly.

    • MARKERS FOR METRICAL UNITS (padas) AND SECTIONS OF PROSE (sentences) are indispensable for generating indices of padas, etc. For instance, the Anustubh pattern could look like this:

      For a four-pada verse:
      ........ $ ........ &
      ........ % ........ // Name_n,n.n //

      For a six-pada verse:
      ........ $ ........ &
      ........ % ........ \
      ........ # ........ // Name_n,n.n //

      Here, again, everything is fine as long as it is UNAMBIGUOUS.
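
To show how these conventions pay off, here is a minimal Python sketch that splits marked-up verses into padas, attaches the full reference to each pada, and sorts the result into a simple pada index. It assumes the marker characters of the patterns above ($, &, %, \, #) and a reference of the form "Name_n,n.n" between double slashes at the end of the verse. The function names and the placeholder verse are hypothetical and serve only as illustration.

```python
import re

# Reference at the very end of a verse, e.g. "// MBh_3,13.120 //".
REF_AT_END = re.compile(r"//\s*(\S+)\s*//\s*$")
# Pada markers as used in the patterns above.
PADA_MARKERS = re.compile(r"[$&%\\#]")
# Full reference: abbreviation, underscore, book COMMA chapter DOT verse.
FULL_REF = re.compile(r"^([^_\s]+)_(\d+),(\d+)\.(\d+)$")


def parse_reference(reference: str):
    """Split "MBh_3,13.120" into ("MBh", 3, 13, 120)."""
    match = FULL_REF.match(reference)
    if match is None:
        raise ValueError(f"not a full reference: {reference!r}")
    text, book, chapter, verse = match.groups()
    return text, int(book), int(chapter), int(verse)


def split_verse(verse: str):
    """Return (reference, list of padas) for one marked-up verse."""
    match = REF_AT_END.search(verse)
    if match is None:
        raise ValueError("verse has no trailing // reference //")
    body = verse[: match.start()]
    padas = [p.strip() for p in PADA_MARKERS.split(body) if p.strip()]
    return match.group(1), padas


def pada_index(verses):
    """Build an alphabetically sorted list of "pada // reference //" lines."""
    entries = []
    for verse in verses:
        reference, padas = split_verse(verse)
        entries.extend(f"{pada} // {reference} //" for pada in padas)
    return sorted(entries)


if __name__ == "__main__":
    # Placeholder text, not a real verse.
    sample = "ccc ccc $ aaa aaa &\nddd ddd % bbb bbb // MBh_3,13.120 //"
    for line in pada_index([sample]):
        print(line)
```

Because every pada carries its reference at the end of the line, the sorted index stays usable on its own and can be merged with indices of other texts.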




Last update: 15.3.2016

© 2002 Niedersächsische Staats- und Universitätsbibliothek Göttingen