verticapy.vDataFrame.regexp

vDataFrame.regexp(column: str, pattern: str, method: Literal['count', 'ilike', 'instr', 'like', 'not_ilike', 'not_like', 'replace', 'substr'] = 'substr', position: int = 1, occurrence: int = 1, replacement: str | None = None, return_position: int = 0, name: str | None = None) → vDataFrame

Computes a new vDataColumn based on regular expressions.

Parameters

column: str

Input vDataColumn on which the regular expression is evaluated.

pattern: str

The regular expression.

method: str, optional

Method used to evaluate the regular expression (default: 'substr'). One of:

count:
Returns the number of times a regular expression matches each element of the input vDataColumn.

ilike:
Returns True if the vDataColumn element contains a match for the case-insensitive regular expression.

instr:
Returns the starting or ending position in a vDataColumn element where a regular expression matches.

like:
Returns True if the vDataColumn element matches the regular expression.

not_ilike:
Returns True if the vDataColumn element does not match the case-insensitive regular expression.

not_like:
Returns True if the vDataColumn element does not contain a match for the regular expression.

replace:
Replaces all occurrences of a substring that match a regular expression with another substring.

substr:
Returns the substring that matches a regular expression within a vDataColumn.
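
To make the methods above concrete, here is a rough pure-Python sketch of what each one computes for a single string. It uses Python's re module rather than Vertica's regular-expression engine, so syntax and edge cases may differ. regexp_element is a hypothetical helper written for this illustration and is not part of VerticaPy; the values returned when nothing matches (0 for instr, None for substr) are assumptions.

```python
import re


def regexp_element(value, pattern, method="substr", replacement=""):
    """Hypothetical stand-in: apply one method to one string with Python's re."""
    if method == "count":
        # Number of non-overlapping matches.
        return sum(1 for _ in re.finditer(pattern, value))
    if method == "like":
        return re.search(pattern, value) is not None
    if method == "ilike":
        return re.search(pattern, value, re.IGNORECASE) is not None
    if method == "not_like":
        return re.search(pattern, value) is None
    if method == "not_ilike":
        return re.search(pattern, value, re.IGNORECASE) is None
    if method == "instr":
        # 1-based start of the first match; 0 when there is none (assumed).
        m = re.search(pattern, value)
        return m.start() + 1 if m else 0
    if method == "replace":
        return re.sub(pattern, replacement, value)
    if method == "substr":
        # First matching substring; None when there is none (assumed).
        m = re.search(pattern, value)
        return m.group(0) if m else None
    raise ValueError(f"unknown method: {method}")


print(regexp_element("English, Math, Computer", ",", "count"))
print(regexp_element("English, Math", "math", "like"))
print(regexp_element("English, Math", "math", "ilike"))
```

The only difference between like and ilike (and between their not_ counterparts) in this sketch is the re.IGNORECASE flag.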

position: int, optional

The number of characters from the start of the string where the function should start searching for matches.

occurrence: int, optional

Controls which occurrence of a pattern match in the string to return.

replacement: str, optional

The string to replace matched substrings.

return_position: int, optional

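Sets the position within the string to return when method is 'instr'. With 0 (the default), the position of the first character of the match is returned; with 1, the position of the first character after the end of the match is returned.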

name: str, optional

New feature name. If empty, a name is generated.
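
The way position, occurrence, and return_position interact for the 'instr' method can be sketched in plain Python. This is an illustration under stated assumptions, not VerticaPy code: instr_sketch is a hypothetical helper, it relies on Python's re module instead of Vertica's engine, positions are treated as 1-based, and 0 is assumed to be the result when no match is found.

```python
import re


def instr_sketch(value, pattern, position=1, occurrence=1, return_position=0):
    """Hypothetical helper: 1-based position of the chosen match, 0 if none."""
    # position is 1-based here, while Python's pos argument is 0-based.
    matches = list(re.compile(pattern).finditer(value, position - 1))
    if occurrence > len(matches):
        return 0  # assumed "no match" value
    m = matches[occurrence - 1]
    # return_position = 0: first character of the match.
    # return_position = 1: first character after the end of the match.
    offset = m.end() if return_position else m.start()
    return offset + 1


text = "easy come, easy go"
print(instr_sketch(text, "easy"))
print(instr_sketch(text, "easy", occurrence=2))
print(instr_sketch(text, "easy", position=2))
print(instr_sketch(text, "easy", return_position=1))
```

Starting the search at position 2 skips the first "easy", so the result is the same as asking for the second occurrence from position 1.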

Returns

vDataFrame: self

Examples

Let’s begin by importing VerticaPy.

import verticapy as vp

Let’s generate a small dataset using the following data:

data = vp.vDataFrame(
    {
        "rollno": ['1', '2', '3', '4'],
        "subjects": [
            'English, Math',
            'English, Math, Computer',
            'Math, Computer, Science',
            'Math, Science',
        ],
    }
)

Let’s retrieve the second subject.

# occurrence = 2 selects the second run of non-comma characters.
data.regexp(
    column = "subjects",
    pattern = "[^,]+",
    method = "substr",
    occurrence = 2,
    name = "subject_2",
).select(["subjects", "subject_2"])

	subjects (Varchar(23))	subject_2 (Varchar(23))
1	English, Math	Math
2	English, Math, Computer	Math
3	Math, Computer, Science	Computer
4	Math, Science	Science
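
As a cross-check of the pattern itself, the same extraction can be sketched with Python's re module on the raw strings. This is not VerticaPy code and the engines may differ in detail. Note that in this sketch each match after the first keeps the space that follows the comma, so the values are stripped before comparison.

```python
import re

subjects = [
    "English, Math",
    "English, Math, Computer",
    "Math, Computer, Science",
    "Math, Science",
]

# Index 1 is the second match of "[^,]+", mirroring occurrence = 2.
second_raw = [re.findall(r"[^,]+", s)[1] for s in subjects]

# Remove the leading space left over from the ", " separator.
second = [m.strip() for m in second_raw]

print(second_raw)
print(second)
```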

Let’s count the number of subjects.

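# Count the comma separators in each row.
data.regexp(
    column = "subjects",
    pattern = ",",
    method = "count",
    name = "nb_subjects",
)
# n commas separate n + 1 subjects, so add 1 to the count.
data["nb_subjects"].add(1)
data.select(["subjects", "nb_subjects"])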

	subjects (Varchar(23))	nb_subjects (Integer)
1	English, Math	2
2	English, Math, Computer	3
3	Math, Computer, Science	3
4	Math, Science	2
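
The counting logic can be checked the same way in plain Python, as a minimal sketch that again uses the re module in place of the database: count the commas, then add 1.

```python
import re

subjects = [
    "English, Math",
    "English, Math, Computer",
    "Math, Computer, Science",
    "Math, Science",
]

# One more subject than there are comma separators.
nb_subjects = [len(re.findall(",", s)) + 1 for s in subjects]

print(nb_subjects)
```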