Bulk Loading Word Lists from Text Files
If you need to add many words to the user-dictionaries or the normalization map, it can be easier to create the word lists in text files and load them with the COPY command.
Bulk Loading User Dictionary Lists
To bulk load user-dictionary lists into the pulse schema, first create a text file for each user-dictionary that lists the words to add, one word per line. See Dictionaries and Mappings for a list of the user-dictionaries and the normalization map. Optionally, name each text file to match the corresponding user-dictionary. Place the text files in the /home/dbadmin directory.
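Word lists exported from spreadsheets or other tools often contain blank lines, stray whitespace, Windows line endings, or repeated entries. The following is a minimal sketch of a cleanup filter that produces a one-word-per-line file. The function name clean_word_list is hypothetical, and whether Pulse tolerates duplicates or blank rows is not stated here, so treat the cleanup as optional hygiene.

```shell
# Sketch: normalize a raw word list to one trimmed, unique word per line.
# clean_word_list is a hypothetical helper name, not part of Pulse.
clean_word_list() {
  tr -d '\r' |                                      # drop Windows carriage returns
    sed 's/^[[:space:]]*//; s/[[:space:]]*$//' |    # trim leading and trailing whitespace
    grep -v '^$' |                                  # drop blank lines
    sort -u                                         # sort and remove duplicate entries
}
```

For example, assuming a raw list in raw_neg_words.txt: clean_word_list < raw_neg_words.txt > /home/dbadmin/neg_words.txt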
Then, in vsql, use one or more of the following commands to load the respective text file into the pulse schema. These commands assume that you are using the English version of Pulse, that the built-in user-dictionary tables are in the pulse schema, and that the text files are named after the corresponding user-dictionaries.
copy pulse.neg_words_en(standard) from '/home/dbadmin/neg_words.txt';
copy pulse.neutral_words_en(standard) from '/home/dbadmin/neutral_words.txt';
copy pulse.pos_words_en(standard) from '/home/dbadmin/pos_words.txt';
copy pulse.stop_words_en(standard) from '/home/dbadmin/stop_words.txt';
copy pulse.white_list_en(standard) from '/home/dbadmin/white_list.txt';
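If you only have files for some of the dictionaries, a small script can generate just the COPY statements you need. The sketch below assumes each file is named after its dictionary table without the _en suffix (for example, pos_words.txt for pulse.pos_words_en); the function name make_load_script and the output file name are hypothetical.

```shell
# Sketch: write one COPY statement per word-list file found in a directory.
# make_load_script is a hypothetical helper; $1 = list directory, $2 = output SQL file.
make_load_script() {
  list_dir="$1"
  sql_out="$2"
  : > "$sql_out"                                   # start with an empty script
  for dict in neg_words neutral_words pos_words stop_words white_list; do
    file="$list_dir/$dict.txt"
    if [ -f "$file" ]; then
      # Same statement shape as the commands above, one per existing file.
      printf "copy pulse.%s_en(standard) from '%s';\n" "$dict" "$file" >> "$sql_out"
    else
      echo "skipping $dict: $file not found" >&2
    fi
  done
}
```

For example: make_load_script /home/dbadmin load_dictionaries.sql, then run the generated file with vsql -f load_dictionaries.sql.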
Bulk Loading the Normalization Map
You can load normalization terms into the pulse schema in much the same way as user-dictionaries. However, instead of one word per line, use one pair of terms per line, separated by a comma: the base term first, then the synonym to map to it. For example, to map the different forms of Hewlett Packard to HP, create a text file in /home/dbadmin named normalization.txt with the following content:
hp, hewlett packard
hp, hewlett-packard
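Because the file is loaded with a comma delimiter, a line with a missing or extra comma would not map cleanly onto the two columns. The following sketch checks that every line is a two-field pair before you load it. The function name check_pairs is hypothetical, and the check assumes that terms themselves never contain commas.

```shell
# Sketch: report lines that are not exactly "base,synonym" with both parts non-empty.
# check_pairs is a hypothetical helper; it exits non-zero if any line fails.
check_pairs() {
  awk -F',' '
    BEGIN { bad = 0 }
    NF != 2 || $1 ~ /^[[:space:]]*$/ || $2 ~ /^[[:space:]]*$/ {
      printf "line %d is not a base,synonym pair: %s\n", NR, $0
      bad = 1
    }
    END { exit bad }
  ' "$1"
}
```

For example: check_pairs /home/dbadmin/normalization.txt prints nothing and exits with status 0 when every line is a valid pair.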
Then, in vsql, use the following command to load the normalization map into the pulse schema:
copy pulse.normalization_en (standard_base, standard_synonym) from '/home/dbadmin/normalization.txt' delimiter ',';
When you have finished loading the text files, run the loadUserDictionaries.sql script to load the new terms into memory:
vsql -f /opt/vertica/packages/pulse/ddl/loadUserDictionaries.sql