SQL Magic¶
%sql -c sql_command -f input_file -ncols nb_of_cols -nrows nb_of_rows -o output_file
Executes SQL queries in the Jupyter cell.
Parameters¶
Name | Type | Optional | Description |
---|---|---|---|
-c / --command | str | ✓ | SQL Command to execute. |
-f / --file | str | ✓ | Input file to execute. |
-ncols | int | ✓ | Maximum number of columns to display. |
-nrows | int | ✓ | Maximum number of rows to display. |
-o / --output | str | ✓ | Output file. Use this option to export the result of the query to the CSV or JSON format. |
Returns¶
vDataFrame : Result of the query.
Examples¶
These examples demonstrate the following:
- Setting up the environment.
- SQL Magic.
- Getting the vDataFrame of a query.
- Using variables inside a query.
- Limiting the number of rows and columns.
- Exporting a query to JSON or CSV
- Executing SQL files
Setting up the environment¶
If you don't already have one, create a new connection:
import verticapy as vp
# Save a new connection
vp.new_connection({"host": "10.211.55.14",
"port": "5433",
"database": "testdb",
"password": "XxX",
"user": "dbadmin"},
name = "VerticaDSN")
If you already have a connection in a connection file, you can use it by running the following command:
# Connection using the VerticaDSN connection
vp.connect("VerticaDSN")
Load the extension:
%load_ext verticapy.sql
Load a sample dataset. These sample datasets are loaded into the public schema by default. You can specify a target schema with the 'name' and 'schema' parameters:
from verticapy.datasets import load_titanic, load_iris
titanic = load_titanic()
iris = load_iris()
SQL Magic¶
Use '%%sql' to run a query on the dataset:
%%sql
SELECT
survived,
AVG(fare) AS avg_fare,
AVG(age) AS avg_age
FROM titanic
GROUP BY 1;
You can also run queries with '%sql' and the '-c' option:
%sql -c 'SELECT DISTINCT Species FROM iris;'
You can use a single cell for multiple queries:
%%sql
DROP TABLE IF EXISTS test;
CREATE TABLE test AS SELECT 'Badr Ouali' AS name;
SELECT * FROM test;
To add comments to query:
%%sql
-- Comment Test
/* My Vertica Version */
SELECT version(); -- Looking at the current Vertica version.
Getting the vDataFrame of a query¶
Results are stored in a vDataFrame, which is assigned to a temporary variable called '_'. As shown in the following procedure, you can assign this to a variable to save your results.
%%sql
SELECT
age,
fare,
pclass
FROM titanic
WHERE age IS NOT NULL AND fare IS NOT NULL;
Assign the results to a new variable:
titanic_clean = _
display(titanic_clean)
Temporary results are stored in a vDataFrame, so you can call vDataFrame methods:
titanic_clean.corr()
Using variables inside a query¶
You can use variables in a SQL query with the ':' operator. This variable can be a vDataFrame, a tablesample, a pandas.DataFrame, or any standard Python type.
import verticapy.stats as st
class_fare = titanic_clean.groupby("pclass",
[st.avg(titanic_clean["fare"])._as("avg_fare")])
class_fare
Use the 'class_fare' variable in a SQL query:
%%sql
SELECT
x.*,
y.avg_fare
FROM titanic AS x LEFT JOIN (SELECT * FROM :class_fare) AS y
ON x.pclass = y.pclass;