verticapy.vDataFrame.merge_similar_names#
- vDataFrame.merge_similar_names(skip_word: str | list[str]) → vDataFrame
Merges columns with similar names. The function generates a COALESCE statement that merges the columns into a single column that excludes the input words. Note that the order of the variables in the COALESCE statement is based on the order of the ‘get_columns’ method.
Parameters#
- skip_word: str | list, optional
List of words to exclude from the provided column names. For example, if two columns are named ‘age.information.phone’ and ‘age.phone’ and skip_word is set to ['.information'], then the two columns are merged together with the following COALESCE statement:
COALESCE("age.phone", "age.information.phone") AS "age.phone"
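The way such a statement can be derived from the column names is sketched below in plain Python. This is only an illustration under stated assumptions, not VerticaPy's actual implementation: the helper name build_coalesce_statements is hypothetical, and the list passed as columns stands in for whatever order ‘get_columns’ would return.

```python
def build_coalesce_statements(columns, skip_word):
    """Hypothetical helper (not part of VerticaPy): group column names that
    become identical once the skip words are removed, then emit one SQL
    expression per group."""
    words = [skip_word] if isinstance(skip_word, str) else list(skip_word)

    groups = {}  # cleaned name -> original column names, in input order
    for col in columns:
        cleaned = col
        for word in words:
            cleaned = cleaned.replace(word, "")
        groups.setdefault(cleaned, []).append(col)

    statements = []
    for cleaned, members in groups.items():
        quoted = ", ".join(f'"{m}"' for m in members)
        if len(members) > 1:
            # Several columns share the cleaned name: merge them.
            statements.append(f'COALESCE({quoted}) AS "{cleaned}"')
        else:
            # A single column: keep it under its cleaned name.
            statements.append(f'{quoted} AS "{cleaned}"')
    return statements


# Column order is assumed here; it decides the order inside COALESCE.
statements = build_coalesce_statements(
    ["age.phone", "age.information.phone"], [".information"]
)
print(statements[0])
```

Because COALESCE returns its first non-null argument, the column order matters: when two similar columns hold different non-null values in the same row, the column listed first wins.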
Returns#
- vDataFrame
An object containing the merged element.
Examples#
Let’s begin by importing VerticaPy.
import verticapy as vp
Hint
By assigning an alias to verticapy, we mitigate the risk of code collisions with other libraries. This precaution is necessary because verticapy uses commonly known function names like “average” and “median”, which can potentially lead to naming conflicts. The use of an alias ensures that the functions from verticapy are used as intended without interfering with functions from other libraries.
For this example, let’s generate a dataset that has two columns that are duplicates, with a slight change in spelling and some missing values:
vdf = vp.vDataFrame(
    {
        "user.id": [12, None, 13],
        "id": [12, 11, None],
    }
)
     user.id (Integer, 66%)    id (Integer, 66%)
1    12                        12
2    [null]                    11
3    13                        [null]
In order to remove the redundant column, we can combine them using merge_similar_names:
vdf.merge_similar_names(skip_word = "user.")
     id (Integer, 100%)
1    12
2    11
3    13
Note
This function is particularly useful when flattening highly nested JSON files. Such files may contain redundant features and inconsistencies. The function is designed to merge these features, ensuring consistent information.
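To see why the merged id column in the example has no missing values, the row-wise effect of COALESCE can be emulated in plain Python. This is a minimal sketch that does not call VerticaPy; the coalesce helper is hypothetical and simply mirrors the SQL semantics of returning the first non-null argument.

```python
def coalesce(*values):
    """Return the first value that is not None, mirroring SQL COALESCE."""
    for value in values:
        if value is not None:
            return value
    return None


# Same data as the example: None stands for a missing value.
user_id = [12, None, 13]   # column "user.id"
plain_id = [12, 11, None]  # column "id"

# Row by row, take "user.id" when present, otherwise fall back to "id".
merged_id = [coalesce(u, i) for u, i in zip(user_id, plain_id)]
print(merged_id)  # [12, 11, 13]
```

Each row takes its value from whichever column is populated, so the two 66% complete columns combine into a single fully populated one.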
See also
vDataFrame.pivot(): Pivots the vDataFrame.