verticapy.vDataFrame.regexp#

vDataFrame.regexp(column: str, pattern: str, method: Literal['count', 'ilike', 'instr', 'like', 'not_ilike', 'not_like', 'replace', 'substr'] = 'substr', position: int = 1, occurrence: int = 1, replacement: str | None = None, return_position: int = 0, name: str | None = None) -> vDataFrame

Computes a new vDataColumn based on regular expressions.

Parameters#

column: str

Input vDataColumn against which the regular expression is evaluated.

pattern: str

The regular expression.

method: str, optional

Method used to evaluate the regular expression. Defaults to 'substr'.

  • count:

    Returns the number of times a regular expression matches each element of the input vDataColumn.

  • ilike:

    Returns True if the vDataColumn element contains a match for the case-insensitive regular expression.

  • instr:

    Returns the starting or ending position in a vDataColumn element where a regular expression matches.

  • like:

    Returns True if the vDataColumn element matches the regular expression.

  • not_ilike:

    Returns True if the vDataColumn element does not match the case-insensitive regular expression.

  • not_like:

    Returns True if the vDataColumn element does not contain a match for the regular expression.

  • replace:

    Replaces all occurrences of a substring that match a regular expression with another substring.

  • substr:

    Returns the substring that matches a regular expression within a vDataColumn.
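As a rough analogy only, the eight methods can be mimicked on a single Python string with the standard re module. This sketch assumes each method behaves like its description above; Python's regex dialect may differ from the database's in details. The helper name regexp_local is hypothetical and is not part of VerticaPy.

```python
import re


def regexp_local(value, pattern, method="substr", replacement=""):
    """Hypothetical helper: mimic each method on one Python string."""
    if method == "count":
        # Number of non-overlapping matches.
        return len(re.findall(pattern, value))
    if method == "like":
        return re.search(pattern, value) is not None
    if method == "ilike":
        return re.search(pattern, value, re.IGNORECASE) is not None
    if method == "not_like":
        return re.search(pattern, value) is None
    if method == "not_ilike":
        return re.search(pattern, value, re.IGNORECASE) is None
    if method == "instr":
        match = re.search(pattern, value)
        # 1-based start of the first match; 0 for "no match" is an assumption.
        return match.start() + 1 if match else 0
    if method == "replace":
        return re.sub(pattern, replacement, value)
    if method == "substr":
        match = re.search(pattern, value)
        return match.group(0) if match else None
    raise ValueError(f"unknown method: {method}")


print(regexp_local("English, Math, Computer", ",", "count"))
print(regexp_local("English, Math", "math", "ilike"))
print(regexp_local("English, Math", ", ", "replace", " / "))
```

The like and ilike branches differ only in the IGNORECASE flag, which is the distinction the descriptions above draw between the case-sensitive and case-insensitive variants.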

position: int, optional

The number of characters from the start of the string where the function should start searching for matches.

occurrence: int, optional

Controls which occurrence of a pattern match in the string to return.

replacement: str, optional

The string that replaces matched substrings. Used with the replace method.

return_position: int, optional

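Sets which position within the string is returned. Used with the instr method: with the default 0, the position of the first character of the match is returned; with 1, the position of the first character after the end of the match.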

name: str, optional

New feature name. If empty, a name is generated.
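To make the interplay of position, occurrence, and return_position concrete, here is a minimal pure-Python sketch. It assumes 1-based positions and that return_position = 1 points at the character just after the match; the helpers nth_match, substr_local, and instr_local are hypothetical names used for illustration, and the database engine may differ in edge cases.

```python
import re


def nth_match(value, pattern, position=1, occurrence=1):
    """Return the occurrence-th match found at or after the 1-based position."""
    matches = re.compile(pattern).finditer(value, position - 1)
    for index, match in enumerate(matches, start=1):
        if index == occurrence:
            return match
    return None


def substr_local(value, pattern, position=1, occurrence=1):
    """Hypothetical analogue of method='substr'."""
    match = nth_match(value, pattern, position, occurrence)
    return match.group(0) if match else None


def instr_local(value, pattern, position=1, occurrence=1, return_position=0):
    """Hypothetical analogue of method='instr' (0 when nothing matches: assumption)."""
    match = nth_match(value, pattern, position, occurrence)
    if match is None:
        return 0
    # 0: 1-based index of the first matched character.
    # 1: 1-based index of the character just after the match.
    return match.start() + 1 if return_position == 0 else match.end() + 1


value = "Math, Computer, Science"
word = "[A-Za-z]+"
print(substr_local(value, word, occurrence=2))
print(substr_local(value, word, position=5, occurrence=2))
print(instr_local(value, word, occurrence=2))
print(instr_local(value, word, occurrence=2, return_position=1))
```

Raising position skips the leading characters before any match is counted, so occurrence is always counted from the search start rather than from the beginning of the string.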

Returns#

vDataFrame

self

Examples#

Let’s begin by importing VerticaPy.

import verticapy as vp

Let’s generate a small dataset using the following data:

data = vp.vDataFrame(
    {
        "rollno": ['1', '2', '3', '4'],
        "subjects": [
            'English, Math',
            'English, Math, Computer',
            'Math, Computer, Science',
            'Math, Science',
        ],
    }
)

Let’s retrieve the second subject.

data.regexp(
    column = "subjects",
    pattern = "[^,]+",
    method = "substr",
    occurrence = 2,
    name = "subject_2",
).select(["subjects", "subject_2"])
    subjects                   subject_2
    Varchar(23)                Varchar(23)
1   English, Math              Math
2   English, Math, Computer    Math
3   Math, Computer, Science    Computer
4   Math, Science              Science
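The same extraction can be checked locally without a database connection. The sketch below uses Python's re module as a stand-in for the database function, so it is an approximation rather than the library's implementation. In Python the second match keeps the space that follows the comma, so the sketch strips it to line up with the values displayed above; the output above does not show whether the database result carries that space.

```python
import re

subjects = [
    "English, Math",
    "English, Math, Computer",
    "Math, Computer, Science",
    "Math, Science",
]

# Second run of non-comma characters in each row, with surrounding spaces removed.
second_subject = [re.findall(r"[^,]+", s)[1].strip() for s in subjects]
print(second_subject)
```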

Let’s count the number of subjects.

data.regexp(
    column = "subjects",
    pattern = ",",
    method = "count",
    name = "nb_subjects",
)
data["nb_subjects"].add(1)
data.select(["subjects", "nb_subjects"])
    subjects                   nb_subjects
    Varchar(23)                Integer
1   English, Math              2
2   English, Math, Computer    3
3   Math, Computer, Science    3
4   Math, Science              2
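The arithmetic behind this example is that a list of n comma-separated subjects contains n - 1 commas, which is why 1 is added to the match count. A small pure-Python check of that reasoning, using re as a stand-in for the database function:

```python
import re

subjects = [
    "English, Math",
    "English, Math, Computer",
    "Math, Computer, Science",
    "Math, Science",
]

# Count the commas in each row, then add 1 to get the number of subjects.
nb_subjects = [len(re.findall(",", s)) + 1 for s in subjects]
print(nb_subjects)
```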

See also

vDataFrame.eval() : Evaluates an expression.