Princeton Wordnet Explained

What is a Wordnet?

  • In today’s post we will explore the Princeton WordNet. The goal is to understand what it is, how it functions and how it is organised. We will also discuss how a wordnet differs from other language tools we may come across.
  • First, it is important to understand what exactly a wordnet does. Put simply, a wordnet is a collection of data containing lexical information about words. The Princeton WordNet, which we will be drawing from today, is an English wordnet, so all the lexical data it contains pertains to English.
  • Already we can begin to see a difference between this tool and other language tools. Take, for instance, parallel corpora, which align texts in two languages rather than describing one.
  • Another key concept to keep in mind is synonymy. Think of a wordnet as an advanced kind of thesaurus. Words are collected together in groups called synsets based on the closeness of their meaning, and linguists can then examine the relations between them. Essentially, a wordnet is a database that allows us to closely explore the relationships words have with each other.

Princeton WordNet Explained

Synsets, Hierarchy & Structure

Synsets

As mentioned above, a wordnet groups words into synsets based on how closely they’re related, using meaning and sense to organise these synsets.

Let’s take a moment to understand what exactly the characteristics of a synset are.

A synset is a collection of words that share a similar semantic concept or meaning; pictured right, we can see an example of a synset. As you can see, words can be more closely linked than one would initially presume. An example of a simple synset would be bag -> satchel -> rucksack.

This may appear to be the same as a traditional thesaurus. However, the organisation of a wordnet goes far beyond the information contained in a thesaurus.

https://www.cs.princeton.edu/courses/archive/fall10/cos226/assignments/wordnet.html
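
To make the idea of a synset a little more concrete, here is a minimal Python sketch. It is a toy model only: the `Synset` class, the `SYNSETS` list, the glosses and the second "bag" entry are all hypothetical and hand-written for illustration, and none of it is real WordNet data or the interface of any real library. It simply shows words grouped by shared meaning, and one word belonging to more than one group when it has more than one sense.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Synset:
    """A toy synset: a set of words sharing one meaning, plus a short gloss."""
    words: frozenset
    gloss: str


# Hypothetical, hand-written entries for illustration (not real WordNet data).
SYNSETS = [
    Synset(frozenset({"bag", "satchel", "rucksack"}),
           "a container used for carrying things"),
    Synset(frozenset({"bag", "capture"}),
           "to catch or obtain something"),
]


def synsets_for(word):
    """Return every toy synset that contains the given word."""
    return [s for s in SYNSETS if word in s.words]


def are_synonyms(a, b):
    """Treat two words as synonyms if they share at least one synset."""
    return any(b in s.words for s in synsets_for(a))


if __name__ == "__main__":
    for s in synsets_for("bag"):
        print(sorted(s.words), "-", s.gloss)
    print(are_synonyms("satchel", "rucksack"))
    print(are_synonyms("satchel", "capture"))
```

Notice that "bag" turns up in two synsets here, one per sense. A flat thesaurus entry tends to blur that distinction, whereas a synset keeps each meaning separate.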

Hierarchy & Structure

Synsets are organised in a hierarchical structure. They are arranged according to how broad or narrow their respective meanings are.

It is when looking at this hierarchy that we are introduced to the concepts of hypernyms and hyponyms. Although they may sound daunting, they are actually quite easy concepts to understand.

A hypernym is the more general term in a pair of related words, and a hyponym is the more specific one. Following hypernym links leads to broader and more abstract concepts, while following hyponym links leads to increasingly specific ones. This is sometimes called the super-subordinate relation.

An example would be cat -> rag doll. Here cat is the hypernym: the rag doll cat breed is a specific, subordinate kind of cat.

A hyponym, on the other hand, is the opposite: it is the specific term that sits beneath its more general hypernym. It’s essentially a role reversal: rag doll -> cat, where rag doll is the hyponym of cat.
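
The hypernym and hyponym relation can be sketched as a simple child-to-parent lookup. Again, this is a toy illustration under stated assumptions: the `HYPERNYM_OF` table and the helper names are hypothetical, each word is given just one parent for simplicity, and the levels above "cat" are filled in by hand rather than taken from WordNet.

```python
# Hypothetical hyponym -> hypernym links, written by hand for illustration.
HYPERNYM_OF = {
    "rag doll": "cat",
    "cat": "feline",
    "feline": "mammal",
    "mammal": "animal",
}


def hypernym_chain(word):
    """Walk upwards from a word to its most general ancestor."""
    chain = [word]
    while chain[-1] in HYPERNYM_OF:
        chain.append(HYPERNYM_OF[chain[-1]])
    return chain


def hyponyms(word):
    """Return the words that sit directly beneath the given word."""
    return sorted(child for child, parent in HYPERNYM_OF.items()
                  if parent == word)


def is_a(specific, general):
    """True if `general` appears somewhere above `specific` in the chain."""
    return general in hypernym_chain(specific)[1:]


if __name__ == "__main__":
    print(" -> ".join(hypernym_chain("rag doll")))
    print(hyponyms("cat"))
    print(is_a("rag doll", "animal"))
```

Walking up the chain gives ever more general terms, and `hyponyms` reads the same table in the other direction, which is the role reversal described above.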

Beyond this, a wordnet goes further than a traditional thesaurus or dictionary by addressing other key lexical questions one may have about word relationships. It can record antonymy (words which hold opposite meanings), meronymy (part-whole relations, e.g. finger -> hand) and holonymy (whole-part relations, e.g. hand -> finger).
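
Meronymy and holonymy are two views of the same part-whole link, which the short sketch below tries to show. The `PART_OF` pairs and the two lookup tables are hypothetical, hand-written examples and not real WordNet entries.

```python
from collections import defaultdict

# Hypothetical (part, whole) pairs, written by hand for illustration.
PART_OF = [
    ("finger", "hand"),
    ("thumb", "hand"),
    ("hand", "arm"),
]

MERONYMS = defaultdict(set)  # whole -> its parts
HOLONYMS = defaultdict(set)  # part -> the wholes it belongs to

# Each pair is stored twice, once in each direction.
for part, whole in PART_OF:
    MERONYMS[whole].add(part)
    HOLONYMS[part].add(whole)

if __name__ == "__main__":
    print(sorted(MERONYMS["hand"]))    # the parts of a hand
    print(sorted(HOLONYMS["finger"]))  # what a finger is part of
```

Storing each pair in both directions means either question, "what is this a part of?" or "what are its parts?", can be answered with a single lookup.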

Why Wordnet?

So why choose a wordnet over a traditional language tool such as a dictionary or thesaurus? Well, as we’ve learned today, the difference is found in the organisation and hierarchy used in a wordnet.

Dictionaries and thesauri lack a structured hierarchy. A dictionary is simply ordered alphabetically, while a thesaurus is organised by synonymy alone.

A wordnet stands out in its ability to display the relations between words in a nuanced, detailed and far more comprehensive way. It is an indispensable tool for anyone with a budding interest in lexicology and semantics.

(I’ve included this short video from Crash Course Linguistics which offers a great simple overview of semantics and word relations! Make sure to check it out)
