verticapy.vDataFrame.regexp#
- vDataFrame.regexp(column: str, pattern: str, method: Literal['count', 'ilike', 'instr', 'like', 'not_ilike', 'not_like', 'replace', 'substr'] = 'substr', position: int = 1, occurrence: int = 1, replacement: str | None = None, return_position: int = 0, name: str | None = None) vDataFrame #
Computes a new vDataColumn based on regular expressions.
Parameters#
- column: str
Input vDataColumn to which the regular expression is applied.
- pattern: str
The regular expression.
- method: str, optional
Method used to evaluate the regular expression. One of:
- count:
Returns the number of times a regular expression matches each element of the input vDataColumn.
- ilike:
Returns True if the vDataColumn element contains a match for the case-insensitive regular expression.
- instr:
Returns the starting or ending position in a vDataColumn element where a regular expression matches.
- like:
Returns True if the vDataColumn element matches the regular expression.
- not_ilike:
Returns True if the vDataColumn element does not match the case-insensitive regular expression.
- not_like:
Returns True if the vDataColumn element does not contain a match for the regular expression.
- replace:
Replaces all occurrences of a substring that match a regular expression with another substring.
- substr:
Returns the substring that matches a regular expression within a vDataColumn.
- position: int, optional
The number of characters from the start of the string where the function should start searching for matches.
- occurrence: int, optional
Controls which occurrence of a pattern match in the string to return.
- replacement: str, optional
The string that replaces matched substrings (used by the replace method).
- return_position: int, optional
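Sets the position within the string to return (used by the instr method). In Vertica's REGEXP_INSTR, 0 returns the position of the first character of the match and 1 returns the position of the character that follows the end of the match.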
- name: str, optional
New feature name. If empty, a name is generated.
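The interaction between method, position, occurrence and return_position can be easier to follow on a single string. The sketch below is a rough pure-Python analogue built on the standard re module; it is not VerticaPy's implementation. The helper name regexp_sketch is hypothetical, and its semantics (1-based position and occurrence, "contains a match" behaviour for like, 0 returned by instr when nothing matches) are assumptions modelled on Vertica's REGEXP_* SQL functions. The replace method is left out.

```python
import re


def regexp_sketch(value, pattern, method="substr",
                  position=1, occurrence=1, return_position=0):
    """Hypothetical per-element analogue of vDataFrame.regexp (illustration only)."""
    # Assumption: only the *ilike methods match case-insensitively.
    flags = re.IGNORECASE if method in ("ilike", "not_ilike") else 0
    rx = re.compile(pattern, flags)

    # Assumption: the like family tests whether the string contains a match.
    if method in ("like", "ilike"):
        return rx.search(value) is not None
    if method in ("not_like", "not_ilike"):
        return rx.search(value) is None

    # position is treated as 1-based: position=1 starts at the first character.
    matches = list(rx.finditer(value, position - 1))
    if method == "count":
        return len(matches)

    # occurrence is treated as 1-based: occurrence=2 selects the second match.
    if occurrence > len(matches):
        return 0 if method == "instr" else None
    match = matches[occurrence - 1]

    if method == "substr":
        return match.group(0)
    if method == "instr":
        # 0 -> 1-based index of the first matched character,
        # otherwise -> 1-based index of the character after the match.
        return match.start() + 1 if return_position == 0 else match.end() + 1
    raise ValueError(f"method not covered by this sketch: {method}")


text = "alpha-beta-gamma"
n_all = regexp_sketch(text, "-", method="count")
n_late = regexp_sketch(text, "-", method="count", position=7)
second = regexp_sketch(text, "[a-z]+", method="substr", occurrence=2)
starts = regexp_sketch(text, "beta", method="instr")
after = regexp_sketch(text, "beta", method="instr", return_position=1)
```

Under these assumptions, n_all counts both hyphens while n_late only sees the one after character 7, second is the second run of letters, and starts and after bracket the match of "beta".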
Returns#
- vDataFrame
self
Examples#
Let’s begin by importing VerticaPy.
import verticapy as vp
Let’s generate a small dataset using the following data:
data = vp.vDataFrame(
    {
        "rollno": ['1', '2', '3', '4'],
        "subjects": [
            'English, Math',
            'English, Math, Computer',
            'Math, Computer, Science',
            'Math, Science',
        ],
    }
)
Let’s retrieve the second subject.
data.regexp(
    column = "subjects",
    pattern = "[^,]+",
    method = "substr",
    occurrence = 2,
    name = "subject_2",
).select(
    [
        "subjects",
        "subject_2",
    ]
)
    subjects (Varchar(23))     subject_2 (Varchar(23))
1   English, Math              Math
2   English, Math, Computer    Math
3   Math, Computer, Science    Computer
4   Math, Science              Science

Let’s count the number of subjects.
data.regexp(
    column = "subjects",
    pattern = ",",
    method = "count",
    name = "nb_subjects",
)
data["nb_subjects"].add(1)
data.select(["subjects", "nb_subjects"])
    subjects (Varchar(23))     nb_subjects (Integer)
1   English, Math              2
2   English, Math, Computer    3
3   Math, Computer, Science    3
4   Math, Science              2

See also
vDataFrame.eval(): Evaluates an expression.
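The two results in the Examples section can be cross-checked locally without a database. The sketch below uses Python's standard re module rather than VerticaPy, on the assumption that these two simple patterns behave the same way in both engines. It also shows a detail worth knowing: because "[^,]+" only excludes commas, the second match keeps the space that follows the comma, so stripping whitespace may be needed downstream.

```python
import re

subjects = [
    "English, Math",
    "English, Math, Computer",
    "Math, Computer, Science",
    "Math, Science",
]

# Second match of "[^,]+" per row; the space after the comma is part of the match.
second_raw = [re.findall(r"[^,]+", s)[1] for s in subjects]

# Stripped version, matching what the rendered table displays.
second_clean = [s.strip() for s in second_raw]

# Number of subjects = number of commas + 1.
nb_subjects = [len(re.findall(",", s)) + 1 for s in subjects]
```

Here second_clean and nb_subjects line up with the subject_2 and nb_subjects columns shown in the examples.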