In A Matter of Interpretation, published in Fomoco News, Dr. James A. Martin discusses the progress made in computational linguistics and natural language processing through his work on the AQUAINT project. AQUAINT is a joint collaboration between CITRUS and UC Berkeley’s School of Information that studies how information can be processed to provide enhanced semantic understanding for humans and machines alike. This joint effort has produced a computational linguistics tool that could have implications for many fields, from neuroscience to computer science to education.
AQUAINT’s work in computational linguistics matters for natural language processing because the project has developed a tool that can handle multiple types of input simultaneously. A Matter of Interpretation describes how AQUAINT will continue to push the boundaries of what it means to understand language by developing tools that analyze not just written text but also speech and video within one complete system.
Aim of the research
The research aims to understand human communication competence, which could have implications for fields ranging from neuroscience to education. A Matter of Interpretation describes AQUAINT’s ambition to build systems capable of handling multiple types of input (audio and video along with written text) simultaneously.
The end result is a model that is accurate relative to comparable tools and that can be applied across multiple languages without major adjustments or retraining. A purely artificial-intelligence approach would not work here because too many unknown factors are involved; computers cannot currently derive meaning from context on their own. A computational linguistics approach is the better fit for this problem because it draws on factual knowledge about language and processes data in ways similar to how a human would solve the same task.
The first half of A Matter of Interpretation
It discusses what computational linguistics actually means, why it is important, and related concepts such as artificial intelligence (AI) and machine learning. The rest focuses on specific projects where these technologies were used successfully by academics across different disciplines; key areas discussed include advances in healthcare analytics that combine AI with computational linguistics.
Natural language processing
There are many different ways to build tools in computational linguistics, but this section focuses on two major techniques. The first is statistical natural language processing, which uses machine-learning methods such as neural networks or hidden Markov models to learn automatically from training data. The other is symbolic natural language processing, which requires you to define programmatically all of the knowledge needed to process language. A significant advantage of the symbolic approach is that an application can be extended beyond its original coverage relatively easily, whereas a statistical approach typically requires retraining from scratch when new kinds of data need to be processed.
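To make the contrast concrete, here is a minimal Python sketch of both styles applied to toy part-of-speech tagging. The suffix rules, the default tag, and the tiny training corpus are all invented for illustration; a real system would use a full rule grammar or a large annotated corpus.

```python
import re
from collections import Counter, defaultdict

# --- Symbolic NLP: knowledge is written down by hand as explicit rules. ---
# Hypothetical suffix rules for a toy part-of-speech tagger.
SUFFIX_RULES = [
    (re.compile(r".*ing$"), "VERB"),   # "running", "interpreting"
    (re.compile(r".*ly$"), "ADV"),     # "quickly"
    (re.compile(r".*tion$"), "NOUN"),  # "interpretation"
]

def symbolic_tag(word):
    """Tag a word using only hand-written rules; no training data needed."""
    for pattern, tag in SUFFIX_RULES:
        if pattern.match(word.lower()):
            return tag
    return "NOUN"  # arbitrary default for unknown word shapes

# --- Statistical NLP: knowledge is learned from labelled examples. ---
def train_statistical_tagger(corpus):
    """Learn the most frequent tag for each word from (word, tag) pairs."""
    counts = defaultdict(Counter)
    for word, tag in corpus:
        counts[word.lower()][tag] += 1
    return {w: c.most_common(1)[0][0] for w, c in counts.items()}

tiny_corpus = [("the", "DET"), ("run", "NOUN"), ("run", "VERB"),
               ("run", "VERB"), ("quickly", "ADV")]
model = train_statistical_tagger(tiny_corpus)

print(symbolic_tag("interpreting"))  # VERB, derived from a rule
print(model.get("run"))              # VERB, derived from observed frequencies
```

The symbolic tagger handles any word matching its rules, including words it has never seen, while the statistical tagger only knows what its corpus contained; this is the extension-versus-retraining trade-off described above.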
Historically, symbolic approaches were often preferred: a statistical approach needs a lot of data to work well, and training complex models such as neural networks from scratch can be time-consuming. Today, however, with the huge amount of text available on the internet, even very sophisticated NLP tasks may be learnable automatically, given enough computing power.
Two main techniques
These two main techniques are not mutually exclusive, and many applications combine them in some way or use more specialized tools. For example, a rule-based parser might make initial guesses about parts of speech by pattern matching against known words before applying a statistical method to disambiguate word senses (e.g., “run” can denote an action or a noun sense, as in “a successful business run”). Conversely, a statistical parser might rely on a rule-based part-of-speech tagger to avoid overfitting.
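A hybrid pipeline of that kind might look like the following sketch, assuming a hypothetical lexicon and a tiny tagged corpus invented for the example; the rule-based first pass commits to the easy cases, and simple bigram statistics settle the ambiguous ones.

```python
from collections import Counter, defaultdict

# Hypothetical lexicon of known words, as a rule-based tagger might use.
LEXICON = {"the": "DET", "a": "DET", "business": "NOUN"}

def rule_based_guess(word):
    """First pass: pattern matching against known words and suffixes."""
    if word.lower() in LEXICON:
        return LEXICON[word.lower()]
    if word.endswith("s"):
        return "NOUN"  # crude plural heuristic
    return None        # unknown word: defer to the statistical model

class BigramDisambiguator:
    """Second pass: pick the tag most often seen after the previous tag."""
    def __init__(self, tagged_corpus):
        self.counts = defaultdict(Counter)
        for (w1, t1), (w2, t2) in zip(tagged_corpus, tagged_corpus[1:]):
            self.counts[(t1, w2.lower())][t2] += 1

    def disambiguate(self, prev_tag, word):
        c = self.counts.get((prev_tag, word.lower()))
        return c.most_common(1)[0][0] if c else "NOUN"

# "run" is ambiguous: a verb after a pronoun, a noun after a determiner.
corpus = [("they", "PRON"), ("run", "VERB"), ("a", "DET"), ("run", "NOUN")]
model = BigramDisambiguator(corpus)

prev = rule_based_guess("a")            # DET, committed by the lexicon rule
print(model.disambiguate(prev, "run"))  # NOUN, chosen by the statistics
```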
A naïve Bayes classifier or a decision tree can be used for tagging or parsing. Such an approach is sometimes called “weakly supervised,” as opposed to the more common “semi-supervised” methods, which use additional labeled data (e.g., morphological information), though usually with less human effort spent collecting it. A related field is natural language generation, which attempts to produce coherent, meaningful sentences in the second half of NLP pipelines, working mostly from unannotated corpus sources such as newswire text.
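As an illustration of the classifier itself (shown here in a plainly supervised setting, with a made-up two-sense dataset for the ambiguous word “run”), a naïve Bayes sense tagger can be assembled in a few lines with scikit-learn:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical training contexts for the ambiguous word "run".
contexts = [
    "they went for a morning run in the park",  # action sense
    "she will run five miles tomorrow",         # action sense
    "the bakery had a good run of sales",       # business sense
    "their first production run sold out",      # business sense
]
senses = ["ACTION", "ACTION", "BUSINESS", "BUSINESS"]

# Bag-of-words features feeding a multinomial naive Bayes model.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(contexts, senses)

print(model.predict(["he went for a quick run"]))        # ['ACTION']
print(model.predict(["the run of new products sold"]))   # ['BUSINESS']
```

With only four training sentences this is a toy, but it shows the shape of the technique: the surrounding words are the evidence, and the classifier picks the sense they make most probable.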
A Study of Morphological Awareness for Grammatical Error Correction
A representative example of such a system is IBM Watson, which competed on Jeopardy! A “strongly supervised” approach can be considered a special case of semi-supervised learning in which all of the training data are labeled.
A recent paper, A Study of Morphological Awareness for Grammatical Error Correction, by Lample, Daelemans, and van der Laan, explored a way to better incorporate morphological information in order to improve the performance of machine-learning systems that perform grammatical error correction on noisy text.
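The sketch below is not the paper’s method, only a rough illustration of the general idea: derive simple morphological cues (here, a deliberately crude final-“s” check for plural nouns and third-person-singular verbs) and let a classifier use them to flag one error type, subject-verb agreement. The feature names and the six training pairs are invented.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def morph_features(subject, verb):
    """Crude morphological cues; a real system would use a morphological
    analyzer rather than this final-'s' heuristic."""
    subj_plural = subject.endswith("s")
    verb_3sg = verb.endswith("s")
    return {
        "subj_plural": subj_plural,
        "verb_3sg": verb_3sg,
        # Agreement clash: plural subject with a 3sg verb, or singular
        # subject with a bare verb form.
        "number_clash": subj_plural == verb_3sg,
    }

# Hypothetical subject-verb pairs labelled as correct or erroneous.
pairs = [("dog", "runs", "ok"), ("dogs", "run", "ok"),
         ("dog", "run", "error"), ("dogs", "runs", "error"),
         ("cat", "sleeps", "ok"), ("cats", "sleeps", "error")]

X = [morph_features(s, v) for s, v, _ in pairs]
y = [label for _, _, label in pairs]

model = make_pipeline(DictVectorizer(), LogisticRegression())
model.fit(X, y)

print(model.predict([morph_features("cats", "runs")]))  # ['error']
```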