verticapy.vDataFrame.cdt
- vDataFrame.cdt(columns: str | list[str] | None = None, max_cardinality: int = 20, nbins: int = 10, tcdt: bool = True, drop_transf_cols: bool = True) -> vDataFrame
Returns the complete disjunctive table of the vDataFrame. Numerical features are transformed to categorical using the vDataFrame.discretize() method. Applying PCA on TCDT leads to MCA (Multiple Correspondence Analysis).
Warning
This method can become computationally expensive when used with categorical variables with many categories.
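The cost grows because a disjunctive table holds one indicator column per category. The following pure-Python sketch estimates that width for a small in-memory table. It does not call VerticaPy; the helper name `indicator_column_count` and the sample data are hypothetical, and the estimate ignores the merging and binning controlled by `max_cardinality` and `nbins`.

```python
def indicator_column_count(table, columns):
    """Count the indicator columns a one-column-per-category expansion needs.

    Hypothetical helper for illustration only; not part of VerticaPy.
    """
    return sum(len(set(table[column])) for column in columns)


# Hypothetical sample data: 4 distinct cities and 2 distinct plans.
table = {
    "city": ["Paris", "Lyon", "Nice", "Paris", "Lille"],
    "plan": ["free", "pro", "free", "free", "pro"],
}

width = indicator_column_count(table, ["city", "plan"])
print(width)  # 4 + 2 distinct categories
```

A column with thousands of distinct values would add thousands of indicator columns on its own, which is the situation the warning describes.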
Parameters
- columns: SQLColumns, optional
List of the vDataColumns names.
- max_cardinality: int, optional
For any categorical variable, keeps the most frequent categories and merges the less frequent categories into a new unique category.
- nbins: int, optional
Number of bins used for the discretization (must be > 1).
- tcdt: bool, optional
If set to True, returns the transformed complete disjunctive table (TCDT).
- drop_transf_cols: bool, optional
If set to True, drops the columns used during the transformation.
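The merging described for max_cardinality can be pictured with a small pure-Python sketch. This is not VerticaPy's implementation: the helper name `merge_rare_categories`, the label `"Others"`, and the choice to keep exactly `max_cardinality` categories are assumptions made for illustration.

```python
from collections import Counter


def merge_rare_categories(values, max_cardinality, other_label="Others"):
    """Keep the most frequent categories and merge the rest into one label.

    Hypothetical helper: the name, the "Others" label, and keeping exactly
    `max_cardinality` categories are illustrative assumptions.
    """
    counts = Counter(values)
    # most_common() orders by frequency; ties keep first-seen order.
    kept = {category for category, _ in counts.most_common(max_cardinality)}
    return [value if value in kept else other_label for value in values]


# Hypothetical sample data: "blue" (3) and "red" (2) are the most frequent.
colors = ["red", "red", "blue", "blue", "blue", "green", "teal"]
merged = merge_rare_categories(colors, max_cardinality=2)
print(merged)
```

Capping the number of categories this way bounds the number of indicator columns each categorical variable contributes to the table.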
Returns
- vDataFrame
The CDT relation.
Examples
Let’s begin by importing VerticaPy.
import verticapy as vp
Hint
By assigning an alias to verticapy, we mitigate the risk of code collisions with other libraries. This precaution is necessary because verticapy uses commonly known function names like “average” and “median”, which can potentially lead to naming conflicts. The use of an alias ensures that the functions from verticapy are used as intended without interfering with functions from other libraries.
Let us create a vDataFrame with multiple columns:
vdf = vp.vDataFrame(
    {
        "id": [0, 1, 2, 3, 4, 5],
        "cats": ["A", "B", "C", "A", "B", "C"],
        "vals": [2, 4, 8, 1, 4, 2],
    },
)
     id (Integer)   ...   cats (Varchar(1))   vals (Integer)
1    0              ...   A                   2
2    1              ...   B                   4
3    2              ...   C                   8
4    3              ...   A                   1
5    4              ...   B                   4
6    5              ...   C                   2
We can create the complete disjunctive table of the vDataFrame:
vdf.cdt(columns=["cats", "vals"], tcdt=False)
     id (Integer)   ...   cats_A (Bool)   vals_8 (Bool)
1    0              ...   1               0
2    1              ...   0               0
3    2              ...   0               1
4    3              ...   1               0
5    4              ...   0               0
6    5              ...   0               0
The same can be done to create the transformed complete disjunctive table of the vDataFrame:
vdf.cdt(columns=["cats", "vals"], tcdt=True)
     id (Integer)   ...   cats_A (Numeric(36))   vals_8 (Numeric(36))
1    0              ...   -0.5                   -1.0
2    1              ...   -1.0                   -1.0
3    2              ...   -1.0                   0.0
4    3              ...   -0.5                   -1.0
5    4              ...   -1.0                   -1.0
6    5              ...   -1.0                   -1.0
Note
This method can be useful to build an MCA (Multiple Correspondence Analysis) model based on a PCA (Principal Component Analysis) one. The transformed complete disjunctive table refers to a table used in MCA, where the original categorical data is transformed into binary indicators to represent the absence or presence of categories.
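To make the two tables concrete, here is a minimal pure-Python sketch that rebuilds the indicator columns for the example data and then rescales them. It does not use VerticaPy. The helper names are hypothetical, and the rescaling formula (each indicator divided by its column sum, minus 1) is an assumption inferred from the example output shown earlier, not taken from the library's source.

```python
def complete_disjunctive_table(rows, columns):
    """Build one 0/1 indicator per (column, category) pair."""
    indicators = {}
    for column in columns:
        # sorted() gives a stable order for the generated indicator names.
        for category in sorted({row[column] for row in rows}):
            name = f"{column}_{category}"
            indicators[name] = [int(row[column] == category) for row in rows]
    return indicators


def transform_cdt(indicators):
    """Rescale each indicator as x / column_sum - 1.

    Assumption: formula inferred from the documented example output.
    """
    return {
        name: [x / sum(column) - 1 for x in column]
        for name, column in indicators.items()
    }


# Same data as the vDataFrame created in the example.
ids = [0, 1, 2, 3, 4, 5]
cats = ["A", "B", "C", "A", "B", "C"]
vals = [2, 4, 8, 1, 4, 2]
rows = [{"id": i, "cats": c, "vals": v} for i, c, v in zip(ids, cats, vals)]

cdt = complete_disjunctive_table(rows, ["cats", "vals"])
tcdt = transform_cdt(cdt)

print(cdt["cats_A"])   # [1, 0, 0, 1, 0, 0]
print(tcdt["cats_A"])  # [-0.5, -1.0, -1.0, -0.5, -1.0, -1.0]
print(tcdt["vals_8"])  # [-1.0, -1.0, 0.0, -1.0, -1.0, -1.0]
```

Under this assumed formula the sketch reproduces the cats_A and vals_8 columns of both example tables.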
See also
PCA: Principal Component Analysis.