How to Backtest an AI Trading Strategy with Claude Code (Python)

What You Are Building

A Python backtesting workflow that takes any trading strategy generated by Claude Code, tests it against historical price data, and produces real performance metrics: total return, Sharpe ratio, max drawdown, and trade-by-trade logs. You will use vectorbt for fast backtests and Claude Code to generate the strategy logic.

Why Backtesting Matters

Writing a trading bot is the easy part. Knowing whether it would have made or lost money in the past is what separates a toy project from a real strategy. Every bot tutorial on this site (DCA bots, momentum strategies, news-reactive bots) produces a strategy you can test on paper, but paper trading only shows you forward performance. Backtesting lets you see how the strategy would have handled the 2022 crypto crash, the 2023 rally, or the March 2026 liquidation cascade.

The catch: backtesting is easy to do wrong. Bad data, lookahead bias, and overfitting to historical patterns can make a losing strategy look profitable. This tutorial covers how to avoid those mistakes.

Prerequisites

Python 3.10+
Claude Code installed and working
Basic Python familiarity
No prior backtesting experience needed

Step 1: Set Up the Environment

Create a project and install dependencies:

mkdir backtest-lab && cd backtest-lab
python -m venv venv
source venv/bin/activate
pip install vectorbt yfinance pandas numpy matplotlib kaleido

vectorbt is a Python library built for fast backtesting. It uses NumPy under the hood, so it can test thousands of parameter combinations in seconds. yfinance provides free historical price data from Yahoo Finance. kaleido is needed later to export vectorbt's Plotly charts as PNG files.

Step 2: Fetch Historical Data

Start Claude Code and prompt it to build the data pipeline:

Write a Python script that:

Uses yfinance to download daily OHLCV data for a given ticker symbol

Accepts the ticker, start date, and end date as command-line arguments

Saves the data to a CSV file in a data/ directory inside the project

Prints the date range, number of rows, and any missing days

Handle stock splits and dividend adjustments (use adjusted close)

Claude Code will produce something like this:

import yfinance as yf
import pandas as pd
import sys
import os

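def fetch_data(ticker, start, end):
    os.makedirs("data", exist_ok=True)
    # auto_adjust=True returns prices adjusted for splits and dividends
    df = yf.download(ticker, start=start, end=end, auto_adjust=True)
    if df.empty:
        print(f"No data returned for {ticker}")
        return None

    # Recent yfinance versions return (field, ticker) MultiIndex columns even
    # for one ticker; flatten them so the CSV has a plain "Close" column
    if isinstance(df.columns, pd.MultiIndex):
        df.columns = df.columns.get_level_values(0)

    output_path = f"data/{ticker}_{start}_{end}.csv"
    df.to_csv(output_path)

    # Weekdays with no row in the data; this count includes market holidays
    trading_days = pd.bdate_range(start, end)
    missing = trading_days.difference(df.index)

    print(f"Ticker: {ticker}")
    print(f"Date range: {df.index[0].date()} to {df.index[-1].date()}")
    print(f"Total rows: {len(df)}")
    print(f"Missing trading days: {len(missing)}")

    return df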

if __name__ == "__main__":
    ticker = sys.argv[1] if len(sys.argv) > 1 else "SPY"
    start = sys.argv[2] if len(sys.argv) > 2 else "2020-01-01"
    end = sys.argv[3] if len(sys.argv) > 3 else "2026-05-01"
    fetch_data(ticker, start, end)

Run it:

python fetch_data.py SPY 2020-01-01 2026-05-01
python fetch_data.py BTC-USD 2020-01-01 2026-05-01
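
Before backtesting, it can help to sanity-check the saved data. The sketch below is one possible check and is not part of the generated script: `check_ohlcv` is a hypothetical helper, and it assumes the frame has a DatetimeIndex and a `Close` column. It runs on a small synthetic frame so you can see what the report looks like.

```python
import pandas as pd


def check_ohlcv(df: pd.DataFrame) -> dict:
    """Return basic data-quality facts about a price frame (hypothetical helper)."""
    return {
        "rows": len(df),
        "sorted": bool(df.index.is_monotonic_increasing),
        "duplicate_dates": int(df.index.duplicated().sum()),
        "missing_close": int(df["Close"].isna().sum()),
        "non_positive_close": int((df["Close"] <= 0).sum()),
    }


# Small synthetic frame with one duplicated date and one missing close.
dates = pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-03", "2024-01-04"])
sample = pd.DataFrame({"Close": [100.0, 101.5, 101.5, None]}, index=dates)

report = check_ohlcv(sample)
print(report)
```

To check real data, load a saved CSV with `pd.read_csv(path, index_col=0, parse_dates=True)` and pass the result to the same function. Duplicate dates or missing closes are worth fixing before you trust any backtest numbers.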

Step 3: Build and Backtest an SMA Crossover Strategy

Now prompt Claude Code to create the backtest:

Write a Python backtesting script using vectorbt that:

Loads OHLCV data from a CSV file

Implements a simple moving average crossover strategy (fast SMA crosses above slow SMA = buy, crosses below = sell)

Accepts fast_period and slow_period as parameters (default 20 and 50)

Runs the backtest with an initial capital of $10,000

Prints: total return %, annualized return %, Sharpe ratio, max drawdown %, total trades, win rate %

Generates an equity curve chart saved as PNG

Exports a trade log to CSV with entry date, exit date, entry price, exit price, PnL, and holding period

The core backtest code with vectorbt:

import vectorbt as vbt
import pandas as pd
import sys

def run_backtest(csv_path, fast_period=20, slow_period=50, initial_cash=10000):
    df = pd.read_csv(csv_path, index_col=0, parse_dates=True)
    close = df["Close"]

    fast_ma = vbt.MA.run(close, window=fast_period)
    slow_ma = vbt.MA.run(close, window=slow_period)

    entries = fast_ma.ma_crossed_above(slow_ma)
    exits = fast_ma.ma_crossed_below(slow_ma)

    portfolio = vbt.Portfolio.from_signals(
        close,
        entries=entries,
        exits=exits,
        init_cash=initial_cash,
        fees=0.001,  # 0.1% per trade
        freq="1D",
    )

    stats = portfolio.stats()
    print("\n=== Backtest Results ===")
    print(f"Total Return:     {stats['Total Return [%]']:.2f}%")
    # stats() has no annualized return entry by default, so compute it directly
    print(f"Annualized Return:{portfolio.annualized_return() * 100:.2f}%")
    print(f"Sharpe Ratio:     {stats['Sharpe Ratio']:.2f}")
    print(f"Max Drawdown:     {stats['Max Drawdown [%]']:.2f}%")
    print(f"Total Trades:     {stats['Total Trades']}")
    print(f"Win Rate:         {stats['Win Rate [%]']:.2f}%")

    # Save equity curve (Plotly figure; write_image requires the kaleido package)
    fig = portfolio.plot()
    fig.write_image("equity_curve.png")
    print("\nEquity curve saved to equity_curve.png")

    # Export trade log
    trades = portfolio.trades.records_readable
    trades.to_csv("trade_log.csv", index=False)
    print(f"Trade log saved to trade_log.csv ({len(trades)} trades)")

    return portfolio

if __name__ == "__main__":
    csv_path = sys.argv[1] if len(sys.argv) > 1 else "data/SPY_2020-01-01_2026-05-01.csv"
    fast = int(sys.argv[2]) if len(sys.argv) > 2 else 20
    slow = int(sys.argv[3]) if len(sys.argv) > 3 else 50
    run_backtest(csv_path, fast, slow)

Run it:

python backtest.py data/SPY_2020-01-01_2026-05-01.csv 20 50

The output will look something like this (the numbers are illustrative and yours will differ):

=== Backtest Results ===
Total Return:     34.82%
Annualized Return:5.12%
Sharpe Ratio:     0.41
Max Drawdown:     18.73%
Total Trades:     23
Win Rate:         43.48%

A Sharpe ratio below 1.0 and a 43% win rate are typical for a basic SMA crossover on SPY. The strategy makes money because its winners are larger than its losers, not because it is right most of the time.
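
To see what the crossover signals actually are, you can rebuild them in plain pandas. This is a minimal sketch, not vectorbt's implementation: `sma_crossover_signals` is a hypothetical helper, and its handling of the warm-up bars (where an SMA is still NaN) may differ from vectorbt's at the edges.

```python
import pandas as pd


def sma_crossover_signals(close: pd.Series, fast: int, slow: int):
    """Return (entries, exits) boolean Series for a fast/slow SMA crossover."""
    fast_sma = close.rolling(fast).mean()
    slow_sma = close.rolling(slow).mean()

    above = fast_sma > slow_sma  # False while either SMA is still NaN
    prev_above = above.shift(1, fill_value=False)

    # Only count a cross when both this bar and the previous bar have a slow SMA.
    valid = slow_sma.notna() & slow_sma.shift(1).notna()

    entries = above & ~prev_above & valid  # fast moved from below to above
    exits = ~above & prev_above & valid    # fast moved from above to below
    return entries, exits


# Toy prices: a dip, a rally, then a sell-off.
close = pd.Series([10, 9, 8, 7, 8, 10, 12, 13, 10, 8, 6, 5], dtype=float)
entries, exits = sma_crossover_signals(close, fast=2, slow=3)

print("Entry bars:", list(entries[entries].index))
print("Exit bars: ", list(exits[exits].index))
```

On this toy series the fast SMA moves above the slow SMA on bar 5 and back below on bar 8, so there is exactly one round trip. Comparing a hand-built signal like this against the backtest's trade log is a quick way to catch off-by-one errors.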

Step 4: Parameter Optimization

Test many SMA combinations at once to find which periods work best:

Add parameter optimization to the backtest:

Test fast periods from 5 to 50 (step 5) and slow periods from 20 to 200 (step 10)

Only test combinations where fast < slow

For each combination, record: total return, Sharpe ratio, max drawdown, number of trades

Print the top 10 combinations sorted by Sharpe ratio

Generate a heatmap of Sharpe ratios across the parameter grid, saved as PNG

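Claude Code will produce an optimization function along these lines, added to backtest.py so it can reuse the existing pandas and vectorbt imports:

import itertools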

def optimize(csv_path, initial_cash=10000):
    df = pd.read_csv(csv_path, index_col=0, parse_dates=True)
    close = df["Close"]

    fast_periods = range(5, 55, 5)
    slow_periods = range(20, 210, 10)

    results = []
    for fast, slow in itertools.product(fast_periods, slow_periods):
        if fast >= slow:
            continue
        fast_ma = vbt.MA.run(close, window=fast)
        slow_ma = vbt.MA.run(close, window=slow)
        entries = fast_ma.ma_crossed_above(slow_ma)
        exits = fast_ma.ma_crossed_below(slow_ma)

        pf = vbt.Portfolio.from_signals(
            close, entries=entries, exits=exits,
            init_cash=initial_cash, fees=0.001, freq="1D"
        )
        stats = pf.stats()
        results.append({
            "fast": fast,
            "slow": slow,
            "return_pct": stats["Total Return [%]"],
            "sharpe": stats["Sharpe Ratio"],
            "max_dd": stats["Max Drawdown [%]"],
            "trades": stats["Total Trades"],
        })

    results_df = pd.DataFrame(results)
    top10 = results_df.nlargest(10, "sharpe")
    print("\n=== Top 10 Parameter Combinations (by Sharpe Ratio) ===")
    print(top10.to_string(index=False))

    return results_df
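
The prompt also asks for a heatmap, which the function above does not draw. One way to add it, sketched here under assumptions: `sharpe_grid` and `save_heatmap` are hypothetical helpers, the input has the same `fast`, `slow`, and `sharpe` columns as `results_df`, and the demo numbers are made up purely to show the shape of the grid.

```python
import pandas as pd


def sharpe_grid(results_df: pd.DataFrame) -> pd.DataFrame:
    """Pivot long-form results into a fast x slow grid of Sharpe ratios."""
    return results_df.pivot(index="fast", columns="slow", values="sharpe")


def save_heatmap(grid: pd.DataFrame, path: str = "sharpe_heatmap.png") -> None:
    """Draw the grid with matplotlib; skipped combinations stay blank (NaN)."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file without needing a display
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots(figsize=(10, 5))
    image = ax.imshow(grid.values, aspect="auto", cmap="viridis")
    ax.set_xticks(range(len(grid.columns)))
    ax.set_xticklabels(grid.columns)
    ax.set_yticks(range(len(grid.index)))
    ax.set_yticklabels(grid.index)
    ax.set_xlabel("Slow SMA period")
    ax.set_ylabel("Fast SMA period")
    fig.colorbar(image, ax=ax, label="Sharpe ratio")
    fig.savefig(path, dpi=150, bbox_inches="tight")
    plt.close(fig)


# Made-up Sharpe values, only to illustrate the pivot.
demo = pd.DataFrame({
    "fast": [5, 5, 10, 10],
    "slow": [20, 30, 20, 30],
    "sharpe": [0.4, 0.6, 0.5, 0.3],
})
grid = sharpe_grid(demo)
print(grid)
```

With real results you would call `save_heatmap(sharpe_grid(results_df))`. A broad region of similar Sharpe ratios is more reassuring than a single bright cell, which is often a sign of curve-fitting.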

Step 5: Backtest a Strategy from Another Tutorial

The real value is backtesting strategies you actually plan to trade. If you built a momentum bot or a DCA bot, you can test that exact logic against historical data.

Prompt Claude Code:

Convert the momentum strategy from my momentum_bot.py into a vectorbt backtest. The strategy rules are:

Buy when 14-day RSI crosses above 30 from below AND price is above the 50-day SMA

Sell when RSI crosses above 70 OR price drops below the 50-day SMA

Use the same position sizing and fee assumptions

Run it on SPY daily data from 2020 to 2026

Claude Code will translate the live trading logic into vectorbt signals. The key difference: in live trading you deal with partial candles, API latency, and slippage. In a backtest, execution is assumed to be instant at the close price, so backtest results tend to look better than live results. Keep that gap in mind when you read the numbers.
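
For reference, the rules in that prompt can be written as boolean signals in plain pandas. This is a sketch under assumptions, not the code Claude Code will generate: `rsi` uses one common Wilder-style smoothing (exponential with alpha = 1/period), which may differ slightly from the RSI in your bot, and `momentum_signals` is a hypothetical helper. The demo data is a seeded random walk, not real prices.

```python
import numpy as np
import pandas as pd


def rsi(close: pd.Series, period: int = 14) -> pd.Series:
    """Wilder-style RSI using exponential smoothing with alpha = 1 / period."""
    delta = close.diff()
    gain = delta.clip(lower=0)
    loss = -delta.clip(upper=0)
    avg_gain = gain.ewm(alpha=1 / period, min_periods=period, adjust=False).mean()
    avg_loss = loss.ewm(alpha=1 / period, min_periods=period, adjust=False).mean()
    rs = avg_gain / avg_loss
    return 100 - 100 / (1 + rs)


def momentum_signals(close: pd.Series, rsi_period: int = 14, sma_period: int = 50):
    """Entries: RSI crosses up through 30 while price is above the SMA.
    Exits: RSI crosses up through 70, or price is below the SMA."""
    r = rsi(close, rsi_period)
    sma = close.rolling(sma_period).mean()

    rsi_crosses_30 = (r > 30) & (r.shift(1) <= 30)
    rsi_crosses_70 = (r > 70) & (r.shift(1) <= 70)

    entries = rsi_crosses_30 & (close > sma)
    # "Below the SMA" is a state, not an event; exit signals that fire while
    # no position is open are normally ignored by the backtester.
    exits = rsi_crosses_70 | (close < sma)
    return entries, exits


# Seeded random walk standing in for daily closes (synthetic data).
rng = np.random.default_rng(0)
close = pd.Series(100 + rng.normal(0, 1, 300).cumsum())

r = rsi(close)
entries, exits = momentum_signals(close)
print(f"Entry signals: {int(entries.sum())}, exit signals: {int(exits.sum())}")
```

The two boolean Series can be passed as `entries` and `exits` to `vbt.Portfolio.from_signals`, the same way the SMA crossover signals are.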

Common Backtesting Mistakes

These are the errors that make bad strategies look good:

Lookahead Bias

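Your strategy accidentally uses future data to make decisions. A common case is acting on a bar's closing price within that same bar: a signal computed from today's close cannot be traded until the close is known. It also happens when you fit parameters or normalize data on the full dataset before splitting into in-sample and out-of-sample periods. Always calculate indicators and signals using only data available up to the decision point.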
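
One simple guard is to lag every signal by one bar, so a decision made on one bar's close is only acted on at the next bar. A minimal pandas sketch with a toy signal (the prices and the signal rule are invented for illustration):

```python
import pandas as pd

close = pd.Series(
    [100.0, 102.0, 101.0, 105.0, 107.0],
    index=pd.date_range("2024-01-01", periods=5, freq="D"),
)

# Toy signal: today's close is higher than yesterday's close.
signal = close > close.shift(1)

# The signal for a bar is only known once that bar has closed, so act on it
# one bar later. shift(1) moves every signal forward by one row.
tradable = signal.shift(1, fill_value=False)

print(pd.DataFrame({"close": close, "signal": signal, "tradable": tradable}))
```

The `tradable` column is what should drive orders. If a backtest looks much worse after this shift, the original result was probably relying on information it could not have had in time.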

Survivorship Bias

If you test a stock screening strategy using today’s S&P 500 list, you are only testing companies that survived. Companies that went bankrupt or were delisted are not in your test data, and they would have been sell signals or losses. For index-level backtests (SPY, QQQ), this is less of an issue because you are testing the index itself.

Overfitting

If your optimized parameters work perfectly on 2020-2023 data but fail on 2024-2026 data, you have overfit to the training period. Split your data: use 2020-2023 for optimization and 2024-2026 for validation. As a rule of thumb, if the Sharpe ratio drops by more than 50% on the validation set, the strategy is probably curve-fitted.
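
The split and the 50% check can be sketched as follows. `split_by_date` and `sharpe_drop_pct` are hypothetical helpers, and the Sharpe values in the example are invented to show the arithmetic, not results from a real backtest.

```python
import pandas as pd


def split_by_date(close: pd.Series, split_date: str):
    """Split a price series into (train, validation) at split_date."""
    split = pd.Timestamp(split_date)
    train = close.loc[close.index < split]
    validation = close.loc[close.index >= split]
    return train, validation


def sharpe_drop_pct(train_sharpe: float, validation_sharpe: float) -> float:
    """Percentage drop in Sharpe ratio from training to validation."""
    return (train_sharpe - validation_sharpe) / abs(train_sharpe) * 100


# Seven business days straddling the split date (synthetic prices).
index = pd.bdate_range("2023-12-27", "2024-01-04")
close = pd.Series(range(100, 100 + len(index)), index=index, dtype=float)

train, validation = split_by_date(close, "2024-01-01")
print(f"Train rows: {len(train)}, validation rows: {len(validation)}")

# Invented Sharpe ratios: 1.2 in training, 0.5 in validation.
drop = sharpe_drop_pct(1.2, 0.5)
print(f"Sharpe drop: {drop:.1f}%")
```

Run the optimization on `train` only, then run a single backtest with the chosen parameters on `validation`. Re-tuning after looking at the validation result turns it into training data.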

Ignoring Trading Costs

A strategy that trades 500 times per year at 0.1% per trade pays about 50% of its capital in fees each year if every trade turns over the whole account (closer to 39% once the shrinking balance is compounded), before any market returns. Always include realistic fee assumptions. For crypto, 0.1% is a common spot trading fee on major exchanges. For stocks through a commission-free broker like Alpaca, fees are zero but you still pay the spread, roughly 0.01-0.05%.
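
The fee arithmetic is easy to check yourself. This sketch assumes the full account is turned over on every trade and the fee is charged once per trade; `fee_drag` is a hypothetical helper.

```python
def fee_drag(trades_per_year: int, fee_rate: float) -> tuple[float, float]:
    """Return (simple, compounded) fraction of capital lost to fees in a year."""
    simple = trades_per_year * fee_rate
    # Each trade leaves (1 - fee_rate) of the balance, applied repeatedly.
    compounded = 1 - (1 - fee_rate) ** trades_per_year
    return simple, compounded


simple, compounded = fee_drag(trades_per_year=500, fee_rate=0.001)
print(f"Simple sum: {simple:.1%}")
print(f"Compounded: {compounded:.1%}")
```

The simple sum gives 50%, while compounding gives roughly 39%, because each fee is taken from a balance that earlier fees have already reduced. Either way, the strategy needs a large gross return just to break even.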

Unrealistic Fill Assumptions

Backtests assume you can buy or sell at the exact close price. In reality, your limit order might not fill, or you might get a worse price during volatile moments. Adding 0.05-0.1% slippage to your backtest makes results more realistic.
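
To see how much slippage changes a single trade, you can adjust the fill prices by hand. This is a rough sketch: `apply_slippage` is a hypothetical helper, and the prices are invented. vectorbt's `Portfolio.from_signals` also accepts a `slippage` argument that plays a similar role; check its documentation for the exact semantics.

```python
def apply_slippage(price: float, side: str, slippage: float = 0.001) -> float:
    """Worsen a fill price by a fractional slippage: buys pay more, sells get less."""
    if side == "buy":
        return price * (1 + slippage)
    if side == "sell":
        return price * (1 - slippage)
    raise ValueError(f"side must be 'buy' or 'sell', got {side!r}")


# One round trip: buy at 100, sell at 110, with 0.1% slippage on each side.
entry = apply_slippage(100.0, "buy")
exit_ = apply_slippage(110.0, "sell")

ideal_return = 110.0 / 100.0 - 1
realistic_return = exit_ / entry - 1

print(f"Ideal return:     {ideal_return:.2%}")
print(f"With slippage:    {realistic_return:.2%}")
```

The cost looks small on one trade, but it is paid on every entry and exit, so it adds up quickly for strategies that trade often.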

Reading Your Results

Here are rough rules of thumb for interpreting the key metrics:

Metric	Good	Okay	Bad
Sharpe Ratio	> 1.5	0.5 - 1.5	< 0.5
Max Drawdown	< 15%	15% - 30%	> 30%
Win Rate (trend following)	> 40%	30% - 40%	< 30%
Win Rate (mean reversion)	> 55%	45% - 55%	< 45%
Profit Factor	> 2.0	1.2 - 2.0	< 1.2

A strategy does not need a high win rate to be profitable. Trend-following strategies often win only 35-45% of trades but make 2-3x more on winners than they lose on losers.
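
The relationship between win rate and payoff size can be made concrete with expectancy and profit factor. The sketch below uses hypothetical helper names and measures wins and losses in units of the average loss, so an `avg_win` of 2.5 means winners are 2.5x the size of losers.

```python
def expectancy(win_rate: float, avg_win: float, avg_loss: float) -> float:
    """Average profit per trade, in the same units as avg_win and avg_loss."""
    return win_rate * avg_win - (1 - win_rate) * avg_loss


def profit_factor(win_rate: float, avg_win: float, avg_loss: float) -> float:
    """Gross profit divided by gross loss."""
    return (win_rate * avg_win) / ((1 - win_rate) * avg_loss)


def breakeven_win_rate(avg_win: float, avg_loss: float) -> float:
    """Win rate at which expectancy is exactly zero."""
    return avg_loss / (avg_win + avg_loss)


# A trend follower that wins 40% of the time with winners 2.5x its losers.
e = expectancy(0.40, avg_win=2.5, avg_loss=1.0)
pf = profit_factor(0.40, avg_win=2.5, avg_loss=1.0)
be = breakeven_win_rate(avg_win=2.5, avg_loss=1.0)

print(f"Expectancy per trade: {e:.2f} (in units of average loss)")
print(f"Profit factor:        {pf:.2f}")
print(f"Breakeven win rate:   {be:.1%}")
```

With winners 2.5x the size of losers, the strategy only needs to win about 29% of the time to break even, which is why a 40% win rate can still be comfortably profitable before costs.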

Next Steps

Backtest every strategy before running it live or on paper
Use out-of-sample validation (train on one period, test on another)
Compare your strategy to a simple buy-and-hold benchmark; if it cannot beat holding, the added complexity is not worth it
Try backtesting on different assets: crypto (BTC DCA strategy), forex (forex bot), or individual stocks
Read our AI trading 101 guide for foundational concepts
Check the MCP servers guide for connecting Claude Code to live data feeds after your backtest looks promising
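
For the buy-and-hold comparison suggested above, the benchmark's return and drawdown can be computed directly from prices. This is a minimal sketch on invented prices, with hypothetical helper names and the same $10,000 starting capital used in the backtest; fees are ignored.

```python
import pandas as pd


def total_return(equity: pd.Series) -> float:
    """Fractional gain from the first to the last equity value."""
    return equity.iloc[-1] / equity.iloc[0] - 1


def max_drawdown(equity: pd.Series) -> float:
    """Largest peak-to-trough decline, as a negative fraction."""
    running_peak = equity.cummax()
    return ((equity - running_peak) / running_peak).min()


# Invented closing prices, not real market data.
close = pd.Series([100.0, 110.0, 99.0, 120.0, 108.0])

# Buy and hold: invest everything at the first close and never trade again.
buy_hold_equity = 10_000 * close / close.iloc[0]

bh_return = total_return(buy_hold_equity)
bh_drawdown = max_drawdown(buy_hold_equity)
print(f"Buy-and-hold return:   {bh_return:.2%}")
print(f"Buy-and-hold drawdown: {bh_drawdown:.2%}")
```

Put these two numbers next to the strategy's total return and max drawdown over the same dates. A strategy that returns less than holding can still be worth running if its drawdown is much smaller, but it should win on at least one of the two.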