Abstract
A general theory of ordering lexicographic entries in various languages is formulated. Basic concepts and principles of one-step and many-step ordering are discussed. Algorithms for one-step (alphabetic), two-step (alphabetic with diacritical marks) and many-step (with attributes) ordering are derived and illustrated with examples. A simple method of computer implementation is proposed: extended alphanumeric strings containing the information about all steps of ordering are created and then sorted. Problems encountered in implementing the theory are discussed: transcription of lexicographic entries into a standard ASCII format; identification of alphabetic units, diacritical marks, ligatures and contractions; and creation of the extended strings. A master program applicable to various languages is described, and an example application to Spanish is discussed.
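The extended-string idea can be illustrated with a minimal Python sketch of two-step ordering. It is not the article's master program: the alphabet table (traditional Spanish order, with the contractions "ch" and "ll" treated as single alphabetic units), the numeric codes, the diacritic table, and the names `split_units` and `extended_key` are all assumptions made for illustration.

```python
import unicodedata

# Assumed alphabet: traditional Spanish order, with "ch" and "ll" as single units.
ALPHABET = ["a", "b", "c", "ch", "d", "e", "f", "g", "h", "i", "j", "k", "l",
            "ll", "m", "n", "ñ", "o", "p", "q", "r", "s", "t", "u", "v", "w",
            "x", "y", "z"]
# Fixed-width two-digit codes, so plain string comparison follows alphabet order.
RANK = {unit: f"{i:02d}" for i, unit in enumerate(ALPHABET, start=1)}
# Assumed second-step codes: 0 = no mark, 1 = acute accent, 2 = diaeresis.
DIACRITICS = {"\u0301": "1", "\u0308": "2"}


def split_units(word):
    """Split a word into (alphabetic unit, diacritic code) pairs."""
    word = unicodedata.normalize("NFC", word.lower())
    units = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if len(pair) == 2 and pair in RANK:      # contraction: "ch" or "ll"
            units.append((pair, "0"))
            i += 2
            continue
        ch = word[i]
        if ch in RANK:                           # plain letter, including "ñ"
            units.append((ch, "0"))
        else:                                    # letter carrying a diacritic
            decomposed = unicodedata.normalize("NFD", ch)
            base, mark = decomposed[0], decomposed[1:2]
            if base in RANK and mark in DIACRITICS:
                units.append((base, DIACRITICS[mark]))
            # any other character is ignored in this sketch
        i += 1
    return units


def extended_key(word):
    """Build one string holding both ordering steps, separated by a space."""
    units = split_units(word)
    primary = "".join(RANK[unit] for unit, _ in units)    # step 1: alphabetic
    secondary = "".join(code for _, code in units)        # step 2: diacritics
    # A space sorts before any digit, so a shorter entry precedes its extensions.
    return primary + " " + secondary


words = ["llama", "luz", "chico", "cosa", "más", "mas", "nube", "ñu", "dama"]
print(sorted(words, key=extended_key))
```

Under these assumptions, an ordinary string sort of the keys places "chico" after "cosa", "llama" after "luz", "ñu" after "nube", and "más" immediately after "mas", because the diacritic information only takes effect when the first-step parts of two keys are identical.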
Author information
Andrzej Ziabicki is a professor of polymer physics at the Institute of Fundamental Technological Research, Polish Academy of Sciences, in Warsaw. He is the author of several books and more than 150 papers on the theory, structure and physical properties of polymers. He is interested in computer-assisted education and computer applications in the humanities. He developed algorithms and computer programs for several lexicographical projects (a reverse index for a Latin dictionary, a dictionary of Hittite names). These provided the basis for a more general theory of ordering lexicographic entries. He has undertaken a systematic study of ordering systems in various languages.
Cite this article
Ziabicki, A. The theory of ordering lexicographic entries: Principles, algorithms and computer implementation. Comput Hum 26, 119–137 (1992). https://doi.org/10.1007/BF00116348