What if your computer knew what you were typing? What if it knew you were bullying someone online? Would you still type those words, aware that your computer knows you are being hurtful?
Artificial intelligence is already at work all over the web. Gmail uses an AI-based programme to suggest responses to emails. The iPhone has Siri, which listens to your commands and does her best (or his, if you alter the default settings) to provide answers. Even Grammarly, a browser extension, reads what you write and suggests edits to improve your correspondence. But do any of these tools actually understand your writing?
True AI should be able to understand the context of what you are writing, determine the overarching purpose of the communication, and perhaps even infer the reason behind writing it in the first place. From there, it could determine whether what you're saying is positive, neutral, or negative. If it deems your words hurtful towards someone else, it could seize control of your keyboard, helping to end online bullying or stopping you from sending that angry email immediately after a heated conversation.
Before we can end online bullying, we need to determine how to make computers understand what we are actually typing. For this, we look to word definitions.
If an AI can understand the meaning of the words you use, as well as the context in which those words are used, it could disambiguate the 28 percent of words that carry multiple meanings and work out what you're actually talking about.
Within text analytics, this is done by pulling apart sentences, then putting them back together after determining their meanings.
Each sentence is split into individual words. Stop words (e.g., it, the, a) are removed, and each remaining word is matched to every possible synset (a set of synonyms) and definition. Words are also matched by type: if a word should be a noun, we match it only with synsets and definitions that are also nouns. Filtering by word type shrinks the number of possibilities, reducing the computational time of the whole process.
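A minimal sketch of that lookup step. The tiny hand-built dictionary and stop-word list below are purely illustrative stand-ins for a real lexical database such as WordNet; all names are hypothetical:

```python
# Toy stand-in for a lexical database (e.g. WordNet): each
# (word, type) pair maps to its possible synsets and definitions.
TOY_SYNSETS = {
    ("bank", "noun"): [
        ("bank.n.01", "a financial institution that accepts money deposits"),
        ("bank.n.02", "sloping land beside a body of water"),
    ],
    ("deposit", "noun"): [
        ("deposit.n.01", "money placed in a bank account"),
    ],
}

# Illustrative subset of a stop-word list.
STOP_WORDS = {"i", "it", "the", "a", "an", "my", "at", "put"}

def candidate_senses(sentence, pos="noun"):
    """Split a sentence, drop stop words, and attach every possible
    (synset, definition) pair of the given word type."""
    words = [w.lower().strip(".,!?") for w in sentence.split()]
    return {
        w: TOY_SYNSETS[(w, pos)]
        for w in words
        if w not in STOP_WORDS and (w, pos) in TOY_SYNSETS
    }
```

Note how requiring the word type to match at lookup time (`pos="noun"`) already rules out whole families of senses before any distances are computed.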
Figure 1: Each word in the sentence is matched to possible definitions in the dictionary. Not every exact form of a word has its own entry in the dictionary.
The most likely word definitions are determined by minimising the cosine distance between all words in a sentence. A cosine distance is calculated between two sets of tokens, where each set of tokens is derived from a string of text.
Still following? Good.
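Here is what that distance might look like in code: a minimal bag-of-words sketch, assuming each string has already been split into tokens:

```python
from collections import Counter
from math import sqrt

def cosine_distance(tokens_a, tokens_b):
    """Bag-of-words cosine distance between two sets of tokens:
    0 when the bags point the same way, 1 when nothing is shared."""
    a, b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norms = (sqrt(sum(v * v for v in a.values()))
             * sqrt(sum(v * v for v in b.values())))
    # An empty token set shares nothing, so treat it as maximally distant.
    return 1.0 if norms == 0 else 1.0 - dot / norms
```

Two identical token sets sit at distance 0; two sets with no tokens in common sit at distance 1; everything else falls in between.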
The initial set of tokens consists of the words from the original sentence, plus the synsets, definitions, hypernyms (the next most general version of a word), and hypernym definitions of every word that has only a single possibility. The second set of tokens comes from the next candidate combination of word, definition, hypernym, and hypernym definition.
Figure 2: The initial comparison text is compared to each possible definition of a word. The definition that results in the lowest cosine distance is selected as the most likely and is added to the initial comparison text.
The greater the similarity between two sets of words, the smaller the cosine distance between them. The word-and-definition combination with the smallest cosine distance is added to the initial set of words. Then the analysis is repeated with the next word's possibilities.
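The greedy selection step could be sketched like so. The distance function is the bag-of-words cosine distance described above (repeated here so the sketch runs on its own), and the candidate definitions are illustrative:

```python
from collections import Counter
from math import sqrt

def cosine_distance(tokens_a, tokens_b):
    """Bag-of-words cosine distance: 0 for identical bags, 1 for disjoint ones."""
    a, b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norms = (sqrt(sum(v * v for v in a.values()))
             * sqrt(sum(v * v for v in b.values())))
    return 1.0 if norms == 0 else 1.0 - dot / norms

def pick_sense(comparison, candidates):
    """Choose the (name, definition) pair whose definition tokens sit
    closest to the running comparison text, then fold those tokens into
    the comparison text before the next word is disambiguated."""
    best = min(candidates,
               key=lambda c: cosine_distance(comparison, c[1].split()))
    return best, comparison + best[1].split()
```

Each call both decides one word and grows the comparison text, so later words are disambiguated against an ever richer context.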
Figure 3: Once one word has been decided on, it is added with its definition to the initial comparison text and compared to the possible definitions of the next word.
The process continues until every word has been assigned a definition. To account for word definitions that tie on cosine distance, the whole procedure is repeated around 1,000 times with the order of comparison shuffled each time. Senses that win only through tie-breaking appear at low frequency across the runs and are rejected as having no statistical significance.
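That repetition could be sketched as follows. Here `disambiguate` is a hypothetical stand-in for the greedy process described above, mapping an ordered word list to a chosen sense per word; the run count and seed are illustrative:

```python
import random
from collections import Counter

def stable_senses(words, disambiguate, runs=1000, seed=0):
    """Repeat greedy disambiguation over shuffled word orders and keep,
    for each word, the sense chosen most often.  Senses that win only by
    tie-breaking in a particular order wash out at low frequency."""
    rng = random.Random(seed)
    tallies = {w: Counter() for w in words}
    for _ in range(runs):
        order = words[:]
        rng.shuffle(order)          # vary the comparison order
        for word, sense in disambiguate(order).items():
            tallies[word][sense] += 1
    # The modal sense per word is the statistically significant one.
    return {w: c.most_common(1)[0][0] for w, c in tallies.items()}
```

The frequency table this builds is exactly the "evening out" the article describes: a sense picked in 900 of 1,000 shuffles is kept, while one picked in a handful of orderings is discarded.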