Digital EDOC

The Digital EDOC (version 1.0)

This site is currently under renovation as we migrate the backend to a graph database model. Version 2.0, with more sources
(說文解字, 經典釋文, 方言, Pulleyblank, etc.) and tools, is coming in 2024!


Welcome to the website of the Digital Etymological Dictionary of Old Chinese. This site contains a range of new tools designed to facilitate extensive analyses of the phonology and phonological structures of early Chinese texts.

Direct support for the development and maintenance of this site comes from a generous grant from the U.S. Department of State J. William Fulbright program, from the East Asian Languages and Civilizations Department of the University of Chicago, and from the Center for the Study of Excavated Documents and Ancient Philology at Fudan University.

Any questions, comments or bug reports may be sent to the email address at the bottom of this page.

For the Chinese version of this site (in traditional characters), please see the 中文版.


Instructions for Use:

The Digital EDOC comprises seven modules, which users can access via the left-side menu.

1) EDOC : Linear Output

Enter text in the large text area next to the “Enter Text” label, either by copying and pasting or by typing it in manually. The text should be in Chinese, in traditional characters; punctuation and other letters or symbols may also be included. Next, select at least one checkbox from the lists of fields for each database. (The “All” and “None” buttons select all or none of the data fields from a specific database.) Clicking the “Display Phonetics” button parses the character(s) entered in the text area and displays a table, with each character listed in the first column, followed by the data retrieved from the selected database field(s) running left to right. This output may then be copied and pasted into Excel, Word, or virtually any other spreadsheet or word-processing program.

At present, the text area is limited to 200 characters or symbols; there is no limit to the number of checkboxes a user may select.

In this module, headers for the data fields run left-to-right in the top row of each table. Any field which contains no information in the database is left blank, and any character which is not in the database will return blank fields.
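The row-per-character layout described above can be sketched as follows. This is a minimal illustration only, not the site's actual implementation: the dictionary `SAMPLE_DB`, the field names `field_a` and `field_b`, and the function names are all hypothetical, and the sketch assumes a single entry per character, whereas the real databases often hold several.

```python
# Hypothetical in-memory stand-in for one database. The field names and
# values are placeholders, not data from the Digital EDOC.
SAMPLE_DB = {
    "天": {"field_a": "a1", "field_b": "b1"},
    "地": {"field_a": "a2", "field_b": ""},
}


def linear_output(text, db, fields):
    """Return a header row followed by one row per input character."""
    rows = [["char"] + list(fields)]
    for ch in text:
        # Characters (or punctuation) missing from the database yield blanks.
        entry = db.get(ch, {})
        rows.append([ch] + [entry.get(field, "") for field in fields])
    return rows


def to_tsv(rows):
    """Join rows into tab-delimited text suitable for pasting into a spreadsheet."""
    return "\n".join("\t".join(row) for row in rows)


if __name__ == "__main__":
    print(to_tsv(linear_output("天,地", SAMPLE_DB, ["field_a", "field_b"])))
```

In this sketch the comma in the input produces a row with blank fields, mirroring the behavior described for characters that are not in the database.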

The data from each database is to be viewed as a separate entity; while there are often correlations from one to the next, the fields across databases are in no way aligned nor is any correlation implied by their arrangement. Also, there are often multiple entries for each character in each database, but no order of preference should be assumed based on vertical alignment. Data returned from each database may be directly compared, but should be considered separate from data returned from the other database(s), even when presented in the same table. A thick black line separates data returned from the different databases.

The output can be scrolled both horizontally and vertically, while the text entry and checkboxes will remain visible at all times.

One symbol has a special function in this module. The pilcrow ¶ (also known as the “paragraph mark”) splits the output into multiple tables at the point where it appears, and is itself removed from the output. One blank line is inserted between consecutive tables. For ease of use, the symbol can be copied and pasted from here: ¶
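The pilcrow behavior can be sketched roughly as below. The function names and the `render_segment` callback are hypothetical, and how the site treats an empty segment (for example, two pilcrows in a row) is not documented here, so the sketch simply keeps such segments.

```python
PILCROW = "\u00b6"  # the ¶ character


def split_on_pilcrow(text):
    """Split the input at each pilcrow; the pilcrow itself is dropped."""
    return text.split(PILCROW)


def render_tables(text, render_segment):
    """Render each segment as its own table, with one blank line between tables.

    `render_segment` is a hypothetical callback that turns one segment of
    text into the string for one table (without a trailing newline).
    """
    return "\n\n".join(render_segment(seg) for seg in split_on_pilcrow(text))


if __name__ == "__main__":
    # Toy renderer: one character per line stands in for a real table.
    print(render_tables("天地" + PILCROW + "人", lambda seg: "\n".join(seg)))
```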

2) EDOC : Table Output

In most respects, this module is identical to the first, with one significant difference: the output in this module presents the text from the text entry box running left-to-right across the top row of the table(s), and the data returned from the database(s) is then displayed vertically under each character. Headers for the data fields are displayed in the first column of each table and run top-to-bottom.
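One way to picture the difference between the two layouts is as a transposition: the rows of the linear layout become the columns of the table layout. The sketch below assumes this relationship for illustration; the field names and values are placeholders, not data from the Digital EDOC.

```python
def transpose(rows):
    """Swap the rows and columns of a rectangular table."""
    return [list(column) for column in zip(*rows)]


# Linear layout: headers in the top row, one row per character.
linear = [
    ["char", "field_a", "field_b"],
    ["天", "a1", "b1"],
    ["地", "a2", ""],
]

# Table layout: characters across the top row, headers down the first column.
table = transpose(linear)

if __name__ == "__main__":
    for row in table:
        print("\t".join(row))
```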

Again, each database often contains multiple entries for a single character, and the data returned from each database should be considered separate from data returned from the other database(s), even when aligned in the same table. As in the first module, a thick black line separates the data returned from different databases.

In this module, use of the pilcrow ¶ symbol will split the data into separate tables at the pilcrow.

3) EDOC : File Input

In terms of function, this module is identical to the first, except that rather than accepting input via a text box, it allows the user to upload a text file and parses the file as its input source. Either Unicode (UTF-8) or ANSI encoding may be used for the text file. The output is then arrayed in a tab-delimited file, and upon completion, the user is given a link via which the file may be downloaded.

This function is designed for large texts whose output would be difficult to display efficiently on screen because of size and display constraints.

The current file size limit for the input text file is 200,000 bytes (200 KB).
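Users preparing a file for upload may want to check it locally first. The sketch below enforces the stated 200,000-byte limit and tries UTF-8 before falling back to a legacy code page. Two points are assumptions rather than documented behavior: that the limit is inclusive, and that “ANSI” corresponds to code page 950 (the Windows code page for traditional Chinese); the function name is hypothetical.

```python
MAX_BYTES = 200_000  # input limit stated above (assumed to be inclusive)


def decode_upload(data, fallback_encoding="cp950"):
    """Decode the raw bytes of a text file, enforcing the size limit.

    UTF-8 (with or without a byte-order mark) is tried first. The fallback
    code page is an assumption: "ANSI" depends on the system locale, and
    cp950 is the usual Windows code page for traditional Chinese.
    """
    if len(data) > MAX_BYTES:
        raise ValueError(f"file is {len(data)} bytes; the limit is {MAX_BYTES}")
    try:
        return data.decode("utf-8-sig")
    except UnicodeDecodeError:
        return data.decode(fallback_encoding)


if __name__ == "__main__":
    print(decode_upload("天地".encode("utf-8")))
    print(decode_upload("天地".encode("cp950")))
```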

A file conversion normally takes less than a minute, though exceptionally high server load or internet traffic may result in longer processing times.

4) EDOC : Search

The search module allows the user to search using various criteria as found in any of the databases, up to five levels deep. Wildcards (% and _) may be used along with the operator "Like" for more open-ended pattern matching. A result-set of characters based on the search criteria is then displayed on the lower section of the screen.
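The wildcards follow the same convention as the SQL LIKE operator, where % matches any sequence of characters (including none) and _ matches exactly one character. The sketch below demonstrates this with an in-memory SQLite table. Whether the site itself runs on SQL is an assumption; the table name, column names, and the `reading` values are made-up strings, not reconstructions from any of the databases.

```python
import sqlite3

# In-memory table with placeholder rows (not real phonological data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (graph TEXT, reading TEXT)")
conn.executemany(
    "INSERT INTO entries VALUES (?, ?)",
    [("甲", "kan"), ("乙", "kam"), ("丙", "tan"), ("丁", "kang")],
)


def search_like(pattern):
    """Return the graphs whose reading matches a LIKE pattern."""
    cursor = conn.execute(
        "SELECT graph FROM entries WHERE reading LIKE ? ORDER BY rowid",
        (pattern,),
    )
    return [row[0] for row in cursor]


if __name__ == "__main__":
    print(search_like("ka_"))  # "ka" plus exactly one more character
    print(search_like("ka%"))  # "ka" plus anything, including nothing
    print(search_like("%an"))  # anything ending in "an"
```

With these placeholder rows, `ka_` matches the three-letter readings beginning with "ka", while `ka%` also matches the longer "kang".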

Results from the search are displayed in lines of 30 characters each; together these constitute the result-set. The characters can be copied and pasted into either of the two main output modules (Linear Output or Table Output) to access more data for comparative purposes.

If the result-set returns more than 300 graphs, only the first 300 are displayed, followed by "output limit reached".
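The display rules for the result-set (lines of 30 characters, a cap of 300 graphs, and a closing notice when the cap is hit) can be sketched as follows. The function and constant names are hypothetical, and the exact placement of the notice on the real page may differ.

```python
LINE_WIDTH = 30   # characters per displayed line
MAX_GRAPHS = 300  # maximum number of graphs displayed


def format_result_set(graphs):
    """Lay out a list of graphs in lines of 30, capped at 300 graphs."""
    shown = graphs[:MAX_GRAPHS]
    lines = [
        "".join(shown[i:i + LINE_WIDTH])
        for i in range(0, len(shown), LINE_WIDTH)
    ]
    if len(graphs) > MAX_GRAPHS:
        lines.append("output limit reached")
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_result_set(["字"] * 65))
```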


5) EDOC : Sound

This function is currently under construction.


6) Lists

This page provides lists which display a range of characters from the databases; each list is database-specific and returns only characters exactly as found in that database. The characters from these lists may then be used in the Search function.

The two lists currently displayed are the rhyme groups 韻部 from the Qièyùn 《切韻》 and Guǎngyùn 《廣韻》 rhyme dictionaries. Requests for other lists should be sent to the Digital EDOC Support email address: support@edoc.uchicago.edu.


7) Links (to Databases of and Resources for Digitized Chinese Texts)

This page provides (1) links to online databases of digitized Chinese texts and (2) links to other sites with digital applications for early Chinese phonology and philology.

(1) Any of the texts on these sites can be copied and pasted directly into the main Digital EDOC modules. Most-recommended sites (due to quality control and breadth or depth of materials) are listed first.

(2) These online applications and databases are primarily aids for early Chinese phonological and philological analysis. Most-recommended sites are likewise listed first.

How to Cite:

The Digital EDOC is not a citable resource. Any data from the databases must be independently verified in print editions before publication.

Publication data for citing and verifying the data in the Digital EDOC:

ABC Etymological Dictionary of Old Chinese:

Schuessler, Axel. ABC Etymological Dictionary of Old Chinese. Honolulu: University of Hawai'i Press, 2007.

Baxter-Sagart 2011:

Baxter, W. and L. Sagart (n.d.). Baxter-Sagart Old Chinese reconstruction (Version 1.00). Online at http://crlao.ehess.fr/document.php?id=1217. Accessed 7-30-2012.

Qièyùn《切韻》:

Long Yuchun (ed.) 龍宇純著. 《唐寫全本王仁昫刊謬補缺切韻校箋》. 香港: 香港中文大學, 1968.

Guǎngyùn《廣韻》:

Yu Naiyong (coll.) 余迺永校注. 《新校互註宋本廣韻定稿本》. 上海: 上海人民出版社, 2008.

Jingdian shiwen《經典釋文》:

………………..

Shuo wen jie zi《說文解字》:

………………..

Minimal Old Chinese and Later Han Chinese:

………………..



For a discussion of the development of the website, please see the About page.