
Library Scene

Research Focus: Computational Linguistics

by John Schriner on 2022-03-03T10:11:00-05:00 in Linguistics, Research Focus (Blog Posts)

There are currently about 7,000 distinct languages on Earth. That may or may not seem like a lot, depending on where you live, who you interact with, and the media you consume. Languages become endangered when they are no longer passed on to children, and it's generally agreed that we are steadily losing them, with estimates ranging from about one language every two weeks to one every three to four months.

There are MANY branches in linguistics, including Second Language Acquisition, Sociolinguistics (the study of the sociological aspects of language), Historical Linguistics (the study of language families and shared ancestor languages), Phonology (the study of sound systems in language), and Computational Linguistics (using computers and scripting to better understand language). Even a quick look at these branches shows that they overlap with other large academic areas such as psychology, anthropology, cognitive science, and computer science.

Historical linguistics is wildly interesting: we learn that English is a Germanic language rather than a Romance language like French, Spanish, or Romanian, and, stepping back further, that languages spread as far apart as Hindi, Dutch, and Russian all descend from one ancient language family called Indo-European.

Sociolinguistics is equally interesting when we think about variation in English: apparently, if you grew up in New York City, you likely say you're "standing on line" when queued up to buy movie tickets, whereas people outside the city say they are "standing in line" (of course, variation varies).

My research focus is computational linguistics for the purpose of better understanding how we use language, both written and spoken. This could mean projects like collecting years of Reddit posts into a corpus and tracking how word use and memes rise and fall. Or it could mean machine learning for improving speech-to-text systems like those behind Alexa or Siri. We use the Python scripting language heavily, and it turns out to be used in many other academic areas as well.
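The Reddit example boils down to counting how often a word appears in each time slice of a corpus. Here is a minimal Python sketch under simple assumptions: the posts are a tiny made-up list of (year, text) pairs rather than real Reddit data, tokenization is a naive lowercase regex, and the names `POSTS`, `tokenize`, and `relative_frequency_by_year` are hypothetical.

```python
import re
from collections import Counter

# Hypothetical mini-corpus: made-up (year, post text) pairs standing in
# for a real collection of Reddit posts.
POSTS = [
    (2019, "That meme is everywhere this year"),
    (2019, "Another meme thread, nothing new"),
    (2020, "Remember that meme? Nobody posts it now"),
    (2020, "Working from home again today"),
    (2021, "Home office setup thread"),
]


def tokenize(text):
    """Lowercase the text and split it into simple alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def relative_frequency_by_year(posts, word):
    """Return {year: occurrences of `word` per 1,000 tokens in that year}."""
    totals = Counter()  # total tokens per year
    hits = Counter()    # occurrences of `word` per year
    for year, text in posts:
        tokens = tokenize(text)
        totals[year] += len(tokens)
        hits[year] += tokens.count(word)
    return {year: 1000 * hits[year] / totals[year] for year in sorted(totals)}


if __name__ == "__main__":
    for year, freq in relative_frequency_by_year(POSTS, "meme").items():
        print(year, round(freq, 1))
```

Counts are normalized per 1,000 tokens because the amount of text usually differs from year to year; raw counts would mostly track how much was posted rather than how popular the word was.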

This recent project used machine learning to predict the pronunciation of words in Adyghe, a language of the Caucasus. Here is just one of the findings from the experiment. The phonological (sound) data is written in the International Phonetic Alphabet, and Adyghe is written in the Cyrillic script.

This may look complicated, but computational linguistics programs build scripting skills on top of foundational linguistics classes.
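One piece of scripting in a project like the Adyghe one is scoring the model by comparing each predicted IPA transcription with a reference transcription. The post doesn't say which metric the project used, so, as an assumption, here is a minimal Python sketch of phoneme error rate, a common choice: the edit distance between predicted and reference phoneme sequences, summed over all words and divided by the total reference length. The transcriptions are made-up examples, not project data, and the names `edit_distance`, `phoneme_error_rate`, and `PAIRS` are hypothetical.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance between two sequences; insertions, deletions,
    and substitutions each cost 1."""
    # prev[j] = distance between the reference prefix processed so far
    # and hypothesis[:j]; initially the reference prefix is empty.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_sym in enumerate(reference, start=1):
        curr = [i] + [0] * len(hypothesis)
        for j, hyp_sym in enumerate(hypothesis, start=1):
            curr[j] = min(
                prev[j] + 1,                         # delete ref_sym
                curr[j - 1] + 1,                     # insert hyp_sym
                prev[j - 1] + (ref_sym != hyp_sym),  # substitute (or match for free)
            )
        prev = curr
    return prev[-1]


def phoneme_error_rate(pairs):
    """Total edit distance over all (reference, prediction) pairs,
    divided by the total number of reference phonemes."""
    errors = sum(edit_distance(ref, hyp) for ref, hyp in pairs)
    length = sum(len(ref) for ref, _ in pairs)
    if length == 0:
        raise ValueError("reference transcriptions are empty")
    return errors / length


# Hypothetical (reference, predicted) transcriptions as lists of IPA symbols.
PAIRS = [
    (["t", "a", "p"], ["t", "a", "p"]),            # exact match
    (["ʃ", "ə", "ʁ", "a"], ["ʃ", "a", "ʁ"]),       # one substitution, one deletion
]

if __name__ == "__main__":
    print(phoneme_error_rate(PAIRS))
```

Transcriptions are stored as lists of phoneme symbols rather than plain strings because a single IPA sound, such as an aspirated consonant written with a superscript ʰ, can span more than one Unicode character, and counting characters would miscount errors.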

It's an area that's wide open for research!
