Dictionaries and Mappings

Pulse contains built-in dictionaries and maps that help determine the sentiment of sentences. You have the option of creating and loading user-defined dictionaries and maps.

Dictionaries and mappings are loaded across all client sessions and remain loaded even after the database is stopped and restarted.

Dictionaries

Pulse uses a proprietary system dictionary to help score sentiment. The system dictionary is not visible or modifiable. You can, however, alter the default way that Pulse scores sentiment by modifying user dictionaries. The user dictionaries provide flexibility so that you can tune sentiment scoring for your specific domain. You do not have to modify user dictionaries if Pulse is scoring your data appropriately.

Users can apply dictionaries on a per-user basis. Any number of Pulse users can concurrently apply different sets of dictionaries without conflicts and without disrupting the sessions of other users. Each user can have one dictionary of each type loaded at any given time. If a user does not specify a dictionary of a given type, Pulse uses the default dictionary for that type.

Mappings

Maps are lists of synonyms: one or more words that map to another word. Using maps allows you to analyze text that pertains to the same subject or concept but uses slightly different terminology.

For example, you can map both 'Hewlett Packard' and 'Hewlett-Packard' (with hyphen) to 'HP'. Pulse replaces the mapped words with the core word when it runs its analysis.

Dictionary and Mapping Tables

User dictionaries and a normalization map for each supported language reside in tables inside the Pulse schema. You can see the contents of the tables with simple queries such as:

SELECT * FROM pulse.pos_words_en;

Or:

SELECT * FROM pulse.pos_words_es;

There is one table per dictionary/map for each language. The table name has the language abbreviation as a suffix. For example, English tables have the suffix "_en" and Spanish tables have the suffix "_es". By default, the user dictionaries and normalization map are empty. You can modify these tables to tune Pulse to your specific needs. After you modify these tables, you must load the changes into memory.
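As a rough illustration of modifying one of these tables, the following sketch adds two entries to the English positive-words dictionary and then reviews the table. It assumes the table accepts a single text value per row, which is not stated here; check the table definition in your Pulse schema before running it.

```sql
-- Assumption: pulse.pos_words_en stores one word or phrase per row.
INSERT INTO pulse.pos_words_en VALUES ('adroit');
INSERT INTO pulse.pos_words_en VALUES ('hit the nail on the head');
COMMIT;

-- Review the stored entries. At this point they exist only in the table;
-- they do not affect scoring until they are loaded into memory.
SELECT * FROM pulse.pos_words_en;
```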

You can update the user dictionaries and normalization tables at any time. To do so, you must run load functions (see LoadDictionary() and LoadMapping()) to load the values from the tables into memory. Your changes affect sentiment scoring only after you load the new values.

Note: Loading a user dictionary or loading a normalization map overwrites the values in memory with the values from the specified table. You cannot append to user dictionaries or the normalization map in memory.
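Because a load replaces what is in memory, a new entry is normally added to the existing table and the whole table is then reloaded. The sketch below shows that pattern for the negative-words dictionary. The LoadDictionary() call form, including the column name standard and the listName parameter, is an assumption made for illustration; confirm the exact signature in the LoadDictionary() reference for your Pulse version.

```sql
-- Add the new entry to the existing table, not to a separate table:
-- loading from a table that holds only the new entry would replace
-- every previously loaded value.
INSERT INTO pulse.neg_words_en VALUES ('flash in the pan');
COMMIT;

-- Reload the full table into memory.
-- Assumption: column name (standard) and parameter (listName) are
-- illustrative; verify them against the LoadDictionary() reference.
SELECT LoadDictionary(standard USING PARAMETERS listName='neg_words')
       OVER() FROM pulse.neg_words_en;
```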

The following table describes the English user dictionaries. For Pulse versions that support Spanish, the same set of dictionaries with the suffix "_es" is present in the Pulse schema.

Dictionary Table Name Description
white_list_en

Words that are always marked as an attribute. This list augments the built-in Pulse attribute discovery process. Add words that you always want scored to the white_list user dictionary. For example, such words can include nouns, phrases or business-dependent attributes that are not auto-discovered by Pulse.

This list is typically modified to increase the accuracy of sentiment scoring for your domain.

stop_words_en

Words that are never marked as an attribute. Add words that you do not want scored to the stop_words user dictionary. Use this dictionary to filter out attributes that are not of interest to your analysis. This list is typically modified to increase the accuracy of sentiment scoring for your domain.

The stop_words dictionary can only contain nouns and compound nouns. If Pulse does not identify a stop word as a noun, it ignores it.

pos_words_en

Positive words that can be any type of word or phrase. Words in this list are more likely to carry a positive polarity in general.

You can also add exact phrases, such as idioms, to this list.

Examples: adroit, resolve, strong, hit the nail on the head

neg_words_en

Negative words that can be any type of word or phrase. Words in this list are more likely to carry a negative polarity in general.

You can also add exact phrases, such as idioms, to this list.

Examples: abhorrent, butcher, racist, wrath, flash in the pan

neutral_words_en

Words that indicate a neutral connotation. Words in this list are scored with a sentiment of 0, meaning neither positive nor negative.
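To see which English user dictionaries have been populated, a query along the following lines can help. It is a minimal sketch that uses only standard SQL and the table names listed above; the output column names dictionary and entries are chosen here for readability.

```sql
-- Row count per English user dictionary table.
-- A count of 0 means the table is still in its default (empty) state.
SELECT 'white_list_en' AS dictionary, COUNT(*) AS entries FROM pulse.white_list_en
UNION ALL
SELECT 'stop_words_en', COUNT(*) FROM pulse.stop_words_en
UNION ALL
SELECT 'pos_words_en', COUNT(*) FROM pulse.pos_words_en
UNION ALL
SELECT 'neg_words_en', COUNT(*) FROM pulse.neg_words_en
UNION ALL
SELECT 'neutral_words_en', COUNT(*) FROM pulse.neutral_words_en;
```

These counts reflect the table contents only, not what is currently loaded in memory.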

The following table describes the mapping table in Pulse.

Mapping Table Name Description Example
normalization_en

A list of word pairs used to map like terms (synonyms). You can also use this list to map common misspellings to the correct spelling. This list is frequently modified and is empty by default.

base/synonym:

  • 'hp' / 'hewlettpackard'
  • 'hp' / 'Hewlett-Packard'
  • 'Obama' / 'President Obama'
  • 'Obama' / 'Barack Obama'
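The pairs above could be stored with statements such as the following. This is a sketch that assumes the table has exactly two columns, with the base term first and the synonym second; confirm the column order in your Pulse schema before running it.

```sql
-- Assumption: two columns per row, base term first, synonym second.
INSERT INTO pulse.normalization_en VALUES ('hp', 'hewlettpackard');
INSERT INTO pulse.normalization_en VALUES ('hp', 'Hewlett-Packard');
INSERT INTO pulse.normalization_en VALUES ('Obama', 'President Obama');
INSERT INTO pulse.normalization_en VALUES ('Obama', 'Barack Obama');
COMMIT;

-- Review the stored pairs, grouped by base term.
SELECT * FROM pulse.normalization_en ORDER BY 1;
```

As with the user dictionaries, these pairs take effect only after the map is loaded into memory with LoadMapping().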