PartsOfSpeech
Tags each word in one or more sentences with its part-of-speech classification, using Penn Treebank part-of-speech tags.
Syntax
SELECT PartsOfSpeech('sentences' [, 'language'] [USING PARAMETERS [language='lang'] [, adjustcasing=boolean]]) OVER(PARTITION BEST);
Parameters
Argument | Description |
---|---|
sentences | One or more sentences to be tagged with part-of-speech markup. |
language | Optional. The language of the text, for example 'english' or 'spanish' as in the examples below. |
adjustcasing | Optional. Defaults to false. When set to true, all letters in the text are converted to uppercase before sentence detection. After sentence detection, Vertica converts all letters to lowercase. This option can help when the original data is entirely lowercase and Pulse identifies sentence boundaries incorrectly. |
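The query below is a sketch, not taken from the Vertica documentation, of how the optional parameters might be passed in the USING PARAMETERS clause from the syntax line for text that is entirely lowercase. The sentence text is a placeholder, and no output is shown because the result depends on the Pulse installation.

```sql
-- Sketch only: pass the optional parameters in USING PARAMETERS.
-- adjustcasing=true may help Pulse find sentence boundaries in all-lowercase text.
SELECT PartsOfSpeech('the quick brown fox jumped over the lazy dog. every good boy deserves fudge.'
                     USING PARAMETERS language='english', adjustcasing=true)
       OVER(PARTITION BEST);
```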
Notes
- This function returns a part-of-speech tag for each word. The tags are the Penn Treebank Project part-of-speech tags, except for Spanish text, which is tagged with the Parole Reduced Tagset.
- This function must be used with an OVER() clause. Use OVER(PARTITION BEST) for the best performance if the query does not require specific columns in the OVER() clause.
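When the text is stored in a table and the query does need a specific column in the OVER() clause, the clause can partition by that column instead of using PARTITION BEST. The sketch below assumes a hypothetical table reviews with id and review_text columns, and assumes that PartsOfSpeech accepts a column reference as its first argument; neither is stated in this reference.

```sql
-- Hypothetical table, used only for illustration.
CREATE TABLE reviews (id INT, review_text VARCHAR(1000));
INSERT INTO reviews VALUES (1, 'Every good boy deserves fudge.');
INSERT INTO reviews VALUES (2, 'The quick brown fox jumped over the lazy dog.');
COMMIT;

-- Tag each review's text, partitioning the work by review id.
-- Assumption: the function accepts a column as its first argument.
SELECT PartsOfSpeech(review_text) OVER(PARTITION BY id)
FROM reviews;
```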
Examples
select partsOfSpeech('The quick brown fox jumped over the lazy dog.') OVER(PARTITION BEST);
 sentence | token  | part_of_speech
----------+--------+----------------
        1 | the    | DT
        1 | quick  | JJ
        1 | brown  | JJ
        1 | fox    | NN
        1 | jumped | VBD
        1 | over   | IN
        1 | the    | DT
        1 | lazy   | JJ
        1 | dog    | NN
        1 | .      | .
(10 rows)

select partsOfSpeech('Every good boy deserves fudge.') OVER(PARTITION BEST);
 sentence | token    | part_of_speech
----------+----------+----------------
        1 | every    | DT
        1 | good     | JJ
        1 | boy      | NN
        1 | deserves | VBZ
        1 | fudge    | NN
        1 | .        | .
(6 rows)

select partsOfSpeech('The quick brown fox jumped over the lazy dog.', 'english') OVER(PARTITION BEST);
 sentence | token  | part_of_speech
----------+--------+----------------
        1 | the    | DT
        1 | quick  | JJ
        1 | brown  | JJ
        1 | fox    | NN
        1 | jumped | VBD
        1 | over   | IN
        1 | the    | DT
        1 | lazy   | JJ
        1 | dog    | NN
        1 | .      | .
(10 rows)

select partsofSpeech('El zorro rapido brinco sobre el perro flojo','spanish') over();
 sentence | token  | part_of_speech
----------+--------+----------------
        1 | El     | DA
        1 | zorro  | NC
        1 | rapido | AQ
        1 | brinco | AQ
        1 | sobre  | SP
        1 | el     | DA
        1 | perro  | NC
        1 | flojo  | AQ
(8 rows)