Wordcorr Spells Success in Any Language
One of the most unusual projects DataHouse has completed is Wordcorr. The project began when linguist Dr. Joe Grimes and DataHouse applied for and received a grant from the National Science Foundation (NSF) to develop a way to compare families of languages using the power of computers.
The field of comparative linguistics depends on painstaking work tabulating correspondences between sounds and meanings. In the past, a researcher immersed in the study of one language might come across a word that sounded as if it came from another language. What followed was an arduous process of tracking and comparing sounds and words to show that the two languages were probably related, and to trace the origin and evolution of the language.
Dr. Grimes wanted a tool built on linguist-machine collaboration: one that would automate the tracking of sounds and probable meanings, letting linguists contribute their creativity while the computer did the heavy computation of comparing words and sounds. He selected DataHouse to help develop that tool.
Wordcorr is open source. Like the Linux operating system, it is available for download worldwide, either from SourceForge, the open source portal, or from the Wordcorr Web site. Wordcorr has been downloaded by teachers, students, and linguists doing field work across the globe, including Southeast Asia, China, India, Africa, and South America.

Figure 1 - Wordcorr's home on the open source marketplace
The Development Process
The Wordcorr application was developed in Java. Java was selected for its portability: linguists in the field needed an application that could run on any platform. The development team included DataHouse developers, Dr. Grimes, and two University of Hawaii linguistics students. The collaborative programming effort spanned two years.
For DataHouse, one of the most challenging aspects of the job was getting an education in linguistics. Professional linguists use a specialized vocabulary to describe languages, and transcribing words requires the International Phonetic Alphabet, which can represent sounds that the standard Roman alphabet cannot. The sounds of each language must be transcribed consistently so that words can be compared.

Figure 2 - Wordcorr interface shows words used to describe "octopus" in various Polynesian languages
The user interface, though complex, gives professional linguists the ability to define their own collections, create their own character combinations representing unique sounds, annotate their own work, and tabulate commonalities among languages that may be related. Linguists can also download and work with collections made available by others, and Wordcorr provides archival information to the Open Language Archives Community.
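Because linguists can define their own character combinations for single sounds, a transcription has to be split into sound segments before words can be compared. The sketch below is a hypothetical illustration, not Wordcorr's actual code: it segments a word by greedy longest match against a user-supplied set of combinations. The class name `SegmenterSketch`, the method `segment`, and the example combinations "ng" and "wh" (digraphs used in Maori spelling) are assumptions, and because it works on Java `char`s it does not handle combining diacritics.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class SegmenterSketch {
    // Hypothetical helper: splits a transcription into sound segments,
    // preferring the longest user-defined combination at each position
    // and falling back to a single character otherwise.
    public static List<String> segment(String word, Set<String> combinations) {
        int maxLen = 1;
        for (String c : combinations) {
            maxLen = Math.max(maxLen, c.length());
        }
        List<String> segments = new ArrayList<>();
        int i = 0;
        while (i < word.length()) {
            String match = null;
            // Try the longest possible combination first, down to length 2.
            for (int len = Math.min(maxLen, word.length() - i); len > 1; len--) {
                String candidate = word.substring(i, i + len);
                if (combinations.contains(candidate)) {
                    match = candidate;
                    break;
                }
            }
            if (match == null) {
                match = word.substring(i, i + 1);
            }
            segments.add(match);
            i += match.length();
        }
        return segments;
    }

    public static void main(String[] args) {
        // Illustrative combinations only; a real collection would define its own.
        Set<String> combos = Set.of("ng", "wh");
        System.out.println(segment("whenua", combos));  // [wh, e, n, u, a]
        System.out.println(segment("tangata", combos)); // [t, a, ng, a, t, a]
    }
}
```

Greedy longest match is a simple choice that works when no combination is a prefix trap for another; a production tool might need a more careful tokenizer.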
Using Wordcorr
Wordcorr can be used in the classroom to teach students how to compare languages, and anyone curious about languages can download it, but it is used primarily by field linguists who work with natural languages, recording and transcribing them for posterity. Linguists transcribe and annotate individual words in a variety of possibly related languages. This information is then loaded into Wordcorr, where it can easily be compared with other languages that might be related.
One comparison available for download covers Polynesian languages. A quick look shows, for example, that the word for "sea" is closely related across several of them: "tai" in Maori, Samoan, and Rarotongan, "tahi" in Tongan, and "kai" in Hawaiian.
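The "sea" example can be read as a small table of sound correspondences. The sketch below is a minimal illustration under stated assumptions, not Wordcorr's API: it assumes the words have already been aligned by hand to equal length, with '-' marking a gap where Tongan has "h", and the names `CorrespondenceSketch` and `correspondenceSets` are hypothetical. Reading down each aligned position yields one correspondence set.

```java
import java.util.ArrayList;
import java.util.List;

public class CorrespondenceSketch {
    // Hypothetical helper: given forms already aligned to equal length
    // ('-' marks a gap), returns one correspondence set per position.
    public static List<String> correspondenceSets(List<String> alignedForms) {
        int width = alignedForms.get(0).length();
        List<String> sets = new ArrayList<>();
        for (int pos = 0; pos < width; pos++) {
            StringBuilder set = new StringBuilder();
            for (String form : alignedForms) {
                if (form.length() != width) {
                    throw new IllegalArgumentException("forms must be aligned to equal length");
                }
                if (set.length() > 0) {
                    set.append(' ');
                }
                set.append(form.charAt(pos));
            }
            sets.add(set.toString());
        }
        return sets;
    }

    public static void main(String[] args) {
        // Maori, Samoan, Rarotongan, Tongan, Hawaiian words for "sea",
        // hand-aligned (an assumption) so Tongan "h" sits opposite a gap.
        List<String> sea = List.of("ta-i", "ta-i", "ta-i", "tahi", "ka-i");
        List<String> sets = correspondenceSets(sea);
        for (int pos = 0; pos < sets.size(); pos++) {
            System.out.println("position " + pos + ": " + sets.get(pos));
        }
    }
}
```

Running it prints `t t t t k`, `a a a a a`, `- - - h -`, and `i i i i i` for positions 0 through 3, making visible that Hawaiian "k" corresponds to "t" in the other languages and that only Tongan keeps "h". The hard part that this sketch skips, choosing the alignment, is exactly where a linguist's judgment comes in.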