Python Integration

DBCode Notebooks support Python cells, allowing you to combine SQL queries with Python data analysis tools. This powerful integration enables you to query your database with SQL and then process, visualize, and analyze the results using Python libraries like pandas, matplotlib, and numpy.

Prerequisites

To use Python in DBCode Notebooks, you need to have the Jupyter extension installed in VS Code.

Using Python Cells

Adding a Python Cell

When creating a new cell in a DBCode Notebook, you can choose the language for that cell. Simply change the language selector from “SQL” to “Python” to create a Python cell.

Selecting a Python Kernel

The first time you execute a Python cell in a notebook, DBCode will discover any running Jupyter kernels and prompt you to select one. If you have multiple Jupyter notebooks open with different kernels, you’ll be able to choose which kernel to use.

Your kernel selection is saved with the notebook, so you won’t be prompted again unless you explicitly change it. You can change the kernel at any time by clicking on the kernel selector in the cell status bar.

The kernel selector shows the Python environment and the Jupyter notebook it’s running from (e.g., “Python - analysis.ipynb”), making it easy to identify which kernel you’re using.

SQL to Python Injection

One of the most powerful features of Python integration is the ability to automatically inject SQL query results into Python as pandas DataFrames. This allows you to seamlessly transition from querying data to analyzing it.

Enabling Injection

To inject SQL results into Python:

  1. Add @var annotations to your SQL cells as comments
  2. Click the Python injection toggle in the SQL cell’s status bar to enable injection
  3. Execute the SQL cell
  4. The results will be automatically available in Python cells as pandas DataFrames

Note: The Python injection toggle only appears in SQL cells if your notebook contains at least one Python cell.

Using @var Annotations

The @var annotation tells DBCode which variable name to use when injecting the result set into Python. Add it as a SQL comment before your query:

-- @var users
SELECT * FROM users WHERE created_at > NOW() - INTERVAL '7 days';
-- @var orders
SELECT * FROM orders WHERE status = 'pending';

Multiple Result Sets

If your SQL cell produces multiple result sets (e.g., multiple SELECT statements), you can use multiple @var annotations. They are matched to result sets by order:

-- @var user_count
SELECT COUNT(*) FROM users;
-- @var order_count
SELECT COUNT(*) FROM orders;
-- @var product_count
SELECT COUNT(*) FROM products;

The first @var annotation gets the first result set, the second gets the second result set, and so on.

Using Injected Data in Python

Once injection is enabled and the SQL cell is executed, the variables are immediately available in Python cells:

import pandas as pd
import matplotlib.pyplot as plt
# The 'users' DataFrame is automatically available
print(f"Total users: {len(users)}")
print(users.head())
# Analyze and visualize
user_summary = users.groupby('country').size()
user_summary.plot(kind='bar')
plt.title('Users by Country')
plt.show()

Injection Help

When Python injection is enabled, a help icon (?) appears in the cell status bar. Click it to:

  • See which variables will be created
  • View the @var annotations found in the cell
  • Access this documentation

Data Types

SQL results are automatically converted to appropriate Python types:

  • Numbers: Integers and decimals become Python int/float
  • Strings: Text becomes Python str
  • Dates/Times: Timestamps are preserved as pandas datetime objects
  • NULL values: Converted to pandas NaN

Example Workflow

Here’s a complete example of using SQL and Python together:

SQL Cell 1:

-- @var sales_data
SELECT
date_trunc('month', order_date) as month,
SUM(amount) as total_sales,
COUNT(*) as order_count
FROM orders
WHERE order_date >= '2024-01-01'
GROUP BY date_trunc('month', order_date)
ORDER BY month;

Python Cell 1:

import pandas as pd
import matplotlib.pyplot as plt
# Calculate growth rate
sales_data['growth_rate'] = sales_data['total_sales'].pct_change() * 100
# Create visualization
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(12, 8))
# Sales trend
ax1.plot(sales_data['month'], sales_data['total_sales'], marker='o')
ax1.set_title('Monthly Sales Trend')
ax1.set_ylabel('Total Sales ($)')
# Growth rate
ax2.bar(sales_data['month'], sales_data['growth_rate'])
ax2.set_title('Month-over-Month Growth Rate')
ax2.set_ylabel('Growth (%)')
ax2.axhline(y=0, color='r', linestyle='--')
plt.tight_layout()
plt.show()

Tips and Best Practices

  • Variable Names: Use descriptive variable names in @var annotations (e.g., @var monthly_revenue instead of @var data)
  • One Query Per Variable: While you can have multiple result sets, consider using separate SQL cells for complex queries to keep your analysis organized
  • Check Your Data: Always inspect the injected DataFrame with .head(), .info(), or .describe() before analyzing
  • DataFrame Operations: All standard pandas operations work on injected DataFrames (filtering, grouping, joining, etc.)
  • Install Libraries: Make sure required Python libraries (pandas, matplotlib, etc.) are installed in your Python environment

Troubleshooting

No Python Kernel Found

If you don’t have any running Jupyter kernels, DBCode will prompt you to open a Jupyter notebook and run a cell first. This initializes a kernel that DBCode can connect to.

Variables Not Available

Make sure:

  1. Python injection is enabled (toggle in SQL cell status bar)
  2. The SQL cell has been executed after enabling injection
  3. Your @var annotations are correctly formatted as comments
  4. You’re using the correct variable name in Python

Duplicate Variable Names

Each @var annotation must have a unique name within the notebook. If you use the same variable name multiple times, DBCode will show an error and prevent injection.

Next Steps