SentimentAnalysis

Provides a sentiment score for each attribute (noun) in a given body of text. Positive sentiment receives a positive integer score and negative sentiment receives a negative integer score. A score of 0 indicates that the sentiment for the attribute is neutral.

This function must be used with the OVER() clause. Use OVER(PARTITION BEST) for the best performance if the query does not require specific columns in the OVER() clause. Any valid PARTITION BY clause is acceptable, but only a PARTITION BY clause that matches the segmentation clause of the table's projection provides optimum performance. You can improve performance by segmenting the projection on the columns in the PARTITION BY clause.
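
The segmentation advice above can be sketched as follows. This is a minimal illustration, not a recommended schema: the table name tweets and the columns tweet_id and tweet_text are hypothetical and do not appear elsewhere in this documentation.

```sql
-- Hypothetical table: tweets, tweet_id, and tweet_text are assumed names.
-- The table is segmented on the same column that the query partitions by.
CREATE TABLE tweets (
    tweet_id   INTEGER,
    tweet_text VARCHAR(500)
)
SEGMENTED BY HASH(tweet_id) ALL NODES;

-- PARTITION BY matches the segmentation column of the projection.
SELECT tweet_id, SentimentAnalysis(tweet_text)
       OVER(PARTITION BY tweet_id)
FROM tweets;

-- When the query does not need specific partition columns,
-- let the database choose the partitioning.
SELECT SentimentAnalysis(tweet_text) OVER(PARTITION BEST)
FROM tweets;
```

The first query keeps tweet_id in the output so that each score can be traced back to its source row. The second form is the simpler choice when that link is not needed.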

Syntax

SentimentAnalysis(text [, 'language'] [ USING PARAMETERS 
[ whitelistonly = boolean ] 
[, filterlinks = boolean ] 
[, filterusermentions = boolean ] 
[, filterhashtags = boolean ]
[, filterpunctuation = boolean ] 
[, filterretweets = boolean ]
[, relatedwords = boolean ]
[, adjustcasing = boolean ]
[, language = string ]
[, label='label']
[, granularity='ASD']
[, actionPattern = boolean ]
])

Note: language can be specified as an argument and/or as a parameter. When specified as both, the argument value supersedes the parameter value.
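
To illustrate the precedence rule in the note, the following sketch deliberately passes conflicting values. It is an illustrative query only, and no output is shown because it has not been run here.

```sql
-- The argument ('spanish') and the parameter ('english') disagree on purpose.
-- Per the note above, the argument value is the one expected to apply.
SELECT SentimentAnalysis('El zorro rapido brinco sobre el perro flojo.', 'spanish'
USING PARAMETERS language='english') OVER(PARTITION BEST);
```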

Parameters

Argument Description

text

The text to analyze. Limited to 65,000 bytes.

whitelistonly

Optional. Default false. When set to true, only attributes defined in the whitelist user-dictionary are scored. Use this setting to limit your analysis to the objects of action patterns.

filterlinks

Optional. Default false. When set to true, links are not included as attributes.

filterusermentions

Optional. Default false. When set to true, Twitter user mentions (@username) are not included as attributes.

filterhashtags

Optional. Default false. When set to true, Twitter hashtags (#hashtag) are not included as attributes.

filterpunctuation

Optional. Default true. Filters any punctuation that occurs at the beginning of an attribute other than @ and #.

filterretweets

Optional. Default false. When set to true, filters out the characters "RT" from re-tweets in attributes.

relatedwords

Optional. Default false. When set to true, provides up to three words from the sentence used to help determine the sentiment of the attribute.

adjustcasing

Optional. Default false. When set to true, all letters in the text are converted to uppercase before sentence detection. After performing sentence detection, Vertica converts all letters to lowercase. This option can help in cases where the original data is all in lowercase letters and Pulse is incorrectly identifying sentence boundaries.

language

The language:

  • 'english' or 'en'
  • 'spanish' or 'es'

label

Optional. The label of the dictionaries that you want to use for sentiment analysis. If you do not include a label, Pulse uses the default dictionaries.

granularity

Optional. The level of the sentiment analysis that you want to perform:

  • A — Attribute level analysis
  • S — Sentence level analysis
  • D — Document level analysis

You can specify any granularity level or combination of levels with your sentiment analysis. If you do not specify a granularity level, Pulse performs an attribute level analysis.

actionPattern

Optional. Default false. When set to true, checks for action patterns in the analyzed content.
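
As a brief sketch of how the parameters described above combine, the queries below request specific granularity levels and apply several filters together. The sample tweet text and the URL in it are made up for illustration, and no output is shown because the queries have not been run here.

```sql
-- Sentence-level and document-level analysis only; 'SD' leaves out
-- the attribute level (A).
SELECT SentimentAnalysis('The camera takes great quality pictures but is expensive. It feels like a professional one.'
USING PARAMETERS granularity='SD') OVER(PARTITION BEST);

-- Default attribute-level analysis with links, hashtags, and "RT"
-- markers filtered out of the attributes. The input text is invented.
SELECT SentimentAnalysis('RT @company is great! #happy http://example.com'
USING PARAMETERS filterlinks=true, filterhashtags=true, filterretweets=true)
OVER(PARTITION BEST);
```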

Examples

These examples show various ways you can use Pulse to detect user sentiment.

Query for sentiment in the following sentence.

SELECT SentimentAnalysis('The quick brown fox jumped over the lazy dog.') OVER(PARTITION BEST);
 sentence | attribute | sentiment_score
----------+-----------+-----------------
        1 | fox       |               1
        1 | dog       |              -1
(2 rows)

Query to identify the words that triggered the sentiment score.

SELECT SentimentAnalysis('The quick brown fox jumped over the lazy dog.' 
USING PARAMETERS relatedwords=true) OVER(PARTITION BEST);

 sentence | attribute | sentiment_score | related_word_1 | related_word_2 | related_word_3 
----------+-----------+-----------------+----------------+----------------+----------------
        1 | fox       |               1 | quick          | lazy           | 
        1 | dog       |              -1 | lazy           |                | 
(2 rows)

Query with the language specified as an argument.

SELECT SentimentAnalysis('The quick brown fox jumped over the lazy dog.', 'english') 
OVER(PARTITION BEST);
 sentence | attribute | sentiment_score
----------+-----------+-----------------
        1 | fox       |               1
        1 | dog       |              -1
(2 rows)

Query with the language specified as a parameter.

SELECT SentimentAnalysis('The quick brown fox jumped over the lazy dog.' 
using PARAMETERS language='english') OVER(PARTITION BEST);
 sentence | attribute | sentiment_score
----------+-----------+-----------------
        1 | fox       |               1
        1 | dog       |              -1
(2 rows)

Query Spanish text with the language specified as an argument.

SELECT SentimentAnalysis('El zorro rapido brinco sobre el perro flojo.', 
'spanish') OVER(PARTITION BEST); 
 sentence | attribute | sentiment_score
----------+-----------+-----------------
        1 | zorro     |               1
        1 | perro     |              -1
(2 rows)

Query Spanish text with the language specified as a parameter.

SELECT SentimentAnalysis('El zorro rapido brinco sobre el perro flojo.' 
using PARAMETERS language='spanish') OVER(PARTITION BEST);
 sentence | attribute | sentiment_score
----------+-----------+-----------------
        1 | zorro     |               1
        1 | perro     |              -1
(2 rows)

Query at the attribute, sentence, and document levels in a single call.

SELECT SentimentAnalysis('The camera takes great quality pictures but is 
expensive. It feels like a professional one.'  
USING PARAMETERS granularity='ASD') over();
 sentence |    attribute     | sentiment_score | mixed
----------+------------------+-----------------+-------
          |                  |               1 | true
        1 |                  |               0 | true
        2 |                  |               1 | false
        1 | camera           |               1 |
        1 | quality pictures |               1 |

Query for action patterns, scoring only whitelisted attributes.

SELECT SentimentAnalysis('Right after school on November 8th I will go to target, walmart, and best buy and buy #blueslidepark just for @MacMiller' USING PARAMETERS actionPattern=true, whitelistonly=true) OVER();
 sentence | attribute | sentiment_score |    action    |       action_pattern
----------+-----------+-----------------+--------------+----------------------------
        1 | walmart   |               1 | go to target | #action{$verb $prep $verb}
        1 | walmart   |               1 | go to target | #action{$verb to $verb}
(2 rows)

Getting Twitter User-Mentioned Sentiment

SELECT SentimentAnalysis('@company is great!') OVER(PARTITION BEST);
 sentence | attribute | sentiment_score
----------+-----------+-----------------
        1 | @company  |               1
(1 row)

Filtering Twitter User Sentiment

SELECT SentimentAnalysis('@company is great!' USING PARAMETERS 
filterusermentions=true) OVER(PARTITION BEST);
 sentence | attribute | sentiment_score
----------+-----------+-----------------
(0 rows)

See Also