Backtesting Trading Strategies: What It Tells You (and What It Doesn't)

Backtesting is the process of running a trading strategy against historical data to see how it would have performed in the past. For retail traders exploring systematic or algorithmic approaches on NSE, it often feels like the natural first step before risking real capital. But backtesting is also one of the most misunderstood tools in a trader's toolkit. Done carelessly, it produces numbers that look impressive while revealing almost nothing about future performance.

This article walks through what backtesting can and cannot tell you, the metrics worth tracking, and the pitfalls that trip up beginners and experienced traders alike.

What Backtesting Actually Does

When you backtest, you replay a strategy's rules against a historical dataset — say, one-minute NIFTY Futures candles from the past two years — and record every simulated trade. The output is a set of statistics summarising how the rules behaved on that data.

Notice the emphasis: on that data. A backtest does not simulate the market. It simulates your rules on one specific historical sequence of prices. The distinction matters enormously, as we will see.

Key Metrics to Examine

Before evaluating any backtest result, you need to understand what the numbers actually mean.

Metric	What it measures	Rough benchmark
Win rate	% of trades that were profitable	Meaningless alone; must pair with reward/risk
Profit factor	Gross profit ÷ Gross loss	> 1.5 is typically worth investigating
Average win / average loss ratio	Size of winning trades vs losing trades	Should align with your intended R:R
Max drawdown	Largest peak-to-trough equity decline	Should be tolerable relative to account size
Expectancy	Average rupees earned per trade (net of losses)	Must be positive; higher is better
Number of trades	Total sample size	Fewer than ~50–100 trades is usually insufficient

Expectancy is arguably the most important single number. A strategy with a 40% win rate can have strong positive expectancy if average winners are roughly 2× the average losers. Conversely, a 70% win rate with small winners and large losers will drain your account over time.
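The arithmetic behind these metrics is easy to check by hand. The sketch below is illustrative only: `summarise_trades` is a hypothetical helper, and the two trade lists are made-up rupee figures chosen to mirror the 40% and 70% win-rate cases above, not results from any real strategy.

```python
def summarise_trades(pnls):
    """Summarise a list of per-trade P&L values (in rupees, net of costs)."""
    wins = [p for p in pnls if p > 0]
    losses = [-p for p in pnls if p < 0]  # stored as positive magnitudes
    gross_profit = sum(wins)
    gross_loss = sum(losses)
    avg_win = gross_profit / len(wins) if wins else 0.0
    avg_loss = gross_loss / len(losses) if losses else 0.0
    return {
        "trades": len(pnls),
        "win_rate": len(wins) / len(pnls),
        # Profit factor is undefined with no losing trades; report infinity.
        "profit_factor": gross_profit / gross_loss if gross_loss else float("inf"),
        "payoff_ratio": avg_win / avg_loss if avg_loss else float("inf"),
        # Expectancy: average P&L per trade across winners and losers.
        "expectancy": sum(pnls) / len(pnls),
    }


# Made-up examples: 40% winners at 2x the loser size vs 70% small winners.
low_win_rate = [2000] * 4 + [-1000] * 6
high_win_rate = [500] * 7 + [-1500] * 3

print(summarise_trades(low_win_rate))
print(summarise_trades(high_win_rate))
```

On these inputs the 40% win-rate list averages +₹200 per trade, while the 70% win-rate list averages −₹100 per trade.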

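Max drawdown tells you how much pain the strategy inflicts between high-water marks. A backtest showing a ₹50,000 drawdown on a ₹1 lakh account might look acceptable on paper but is psychologically brutal in real trading. If the peak was ₹1 lakh, that is a 50% decline, and it takes a 100% gain just to get back to the previous high.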
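Max drawdown is straightforward to compute from an equity curve by tracking the running peak. The following is a minimal sketch; `max_drawdown` is a hypothetical helper and the equity values are invented for illustration.

```python
def max_drawdown(equity):
    """Return the largest peak-to-trough decline in an equity curve.

    `equity` is a sequence of account values in time order.
    """
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)           # highest value seen so far
        worst = max(worst, peak - value)  # deepest fall from that peak
    return worst


# Invented equity curve: peaks at 1.1 lakh, then falls to 60,000.
curve = [100_000, 110_000, 60_000, 80_000, 120_000]
print(max_drawdown(curve))
```

Here the worst decline runs from ₹1,10,000 down to ₹60,000, a drawdown of ₹50,000, even though the curve ends at a new high.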

Lookahead Bias: The Most Common Killer

Lookahead bias occurs when your strategy, during a simulation of a given moment, uses information that would not have been available at that moment in reality.

Common examples:

Using the closing price of a candle to trigger a trade that executes at the open of the same candle
Calculating an indicator on the "completed" bar but entering at the bar's own open
Using an end-of-day settlement price to make an intraday decision

Lookahead bias is dangerous because it is easy to introduce accidentally and it produces spectacular backtest results that completely evaporate in live trading. Every line of backtest code should be audited for this.
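A toy example makes the effect concrete. In the sketch below the rule is "buy when a bar closes above its open". With `lookahead=True` the trade is filled at the open of the very bar whose close produced the signal; otherwise it is filled at the next bar's open. The function name and the six (open, close) bars are invented for illustration.

```python
def backtest_up_bar_rule(bars, lookahead=False):
    """Sum the P&L in points of a one-bar long trade after each up bar.

    `bars` is a list of (open, close) tuples, oldest first.
    """
    pnl = 0.0
    for i in range(len(bars) - 1):
        bar_open, bar_close = bars[i]
        if bar_close > bar_open:  # signal is only known at the close of bar i
            if lookahead:
                # BUG: fills at the open of the same bar that generated the signal.
                pnl += bar_close - bar_open
            else:
                # Correct: fill at the next bar's open, exit at its close.
                next_open, next_close = bars[i + 1]
                pnl += next_close - next_open
    return pnl


# Invented price bars (open, close).
bars = [(100, 102), (102, 101), (101, 104), (104, 103), (103, 103.5), (103.5, 102)]
print(backtest_up_bar_rule(bars, lookahead=True))   # biased version
print(backtest_up_bar_rule(bars, lookahead=False))  # honest version
```

On this data the biased version reports +5.5 points and the honest version −3.5 points. The biased version can never lose, because it only "enters" bars it already knows closed higher.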

Overfitting: When the Model Learns Noise

Overfitting, also called curve-fitting, happens when a strategy is tuned — consciously or not — to perform well on the specific historical data it was tested on. The model learns the noise of the past rather than a genuine edge.

Signs of overfitting:

The strategy has many parameters (moving average periods, threshold values, time filters) and they were all optimised on the same dataset
Performance degrades sharply when you shift the date range even slightly
The best parameter combination outperforms adjacent combinations by an unusual margin

Parameter sweeps, which test a strategy across a grid of parameter values, can be useful for understanding sensitivity, but they can also mislead. If you sweep 50 parameter combinations and pick the one that performs best, you have implicitly run 50 experiments on the same dataset and cherry-picked the winner. The winner's result is inflated by that selection, and with thin samples it is frequently a false positive.
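The cherry-picking effect can be demonstrated with a small simulation. The sketch below generates 50 "strategies" whose trades are pure noise with zero true edge, then reports the best average alongside the overall average. Everything here is an assumption made for illustration: the function name, the 60 trades per strategy, and the choice of normally distributed trade results measured in risk units.

```python
import random


def best_of_random_strategies(n_strategies=50, n_trades=60, seed=7):
    """Simulate strategies with zero true edge and return (best, average) mean P&L."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means = []
    for _ in range(n_strategies):
        # Each trade is noise: mean 0, standard deviation 1 risk unit.
        trades = [rng.gauss(0.0, 1.0) for _ in range(n_trades)]
        means.append(sum(trades) / n_trades)
    return max(means), sum(means) / len(means)


best, average = best_of_random_strategies()
print(f"best strategy mean per trade:  {best:.3f}")
print(f"average across all strategies: {average:.3f}")
```

With 60 trades the standard error of each strategy's mean is about 0.13 risk units, so the best of 50 will usually sit well above zero even though no strategy has any edge. A sweep winner should be judged against that kind of baseline rather than against zero.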

Survivorship Bias in Indian Markets

Survivorship bias is more commonly discussed in the context of mutual fund performance data, but it affects strategy research too. If you only backtest on instruments that are currently listed and actively traded, you silently exclude instruments that were delisted, merged, or became illiquid over your test period. The historical data you work with looks healthier than the universe that actually existed at the time.

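For NIFTY 50 constituent-based strategies, note that the index composition changes periodically. Today's constituents are, by construction, companies that did well enough to be in the index now. Backtesting on that list over past years ignores the stocks that were in the index at the time and were later dropped, so the results tend to look better than what a trader could actually have achieved.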

The Indian Intraday Data Problem

Sourcing reliable historical intraday data for NSE Futures is genuinely difficult. A few specific challenges:

Expired contracts are often unavailable: Monthly NIFTY Futures contracts expire and are then difficult to obtain through standard retail data feeds. Most retail traders end up using NIFTY spot index data as a proxy, which does not reflect futures pricing, rollover costs, or expiry-day behaviour accurately.
Data gaps and quality: One-minute or tick data from brokers can have gaps, duplicate ticks, or timestamp inconsistencies, especially around market open/close and auction periods.
Limited free sources: Unlike US markets where years of clean tick data are readily available, Indian retail traders typically have access to only 60–400 days of historical candles through broker APIs.

These constraints directly limit how much you can trust a backtest. A two-year backtest on NIFTY Futures that uses spot index data as a price proxy is making assumptions about execution that may not hold.
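Before trusting any intraday dataset, it helps to run a basic audit for the gaps and duplicate timestamps mentioned above. The sketch below is a minimal version under stated assumptions: `audit_timestamps` is a hypothetical helper, the timestamps are already sorted and belong to a single trading session (so overnight breaks are not flagged), and the sample times are invented.

```python
from datetime import datetime, timedelta


def audit_timestamps(timestamps, step=timedelta(minutes=1)):
    """Return (duplicates, gaps) for a sorted list of candle timestamps.

    A gap is reported as a (previous, current) pair whose spacing exceeds `step`.
    """
    duplicates, gaps = [], []
    for prev, cur in zip(timestamps, timestamps[1:]):
        delta = cur - prev
        if delta == timedelta(0):
            duplicates.append(cur)
        elif delta > step:
            gaps.append((prev, cur))
    return duplicates, gaps


# Invented one-minute candle times: 09:16 is repeated, 09:17 and 09:18 are missing.
day = datetime(2024, 1, 2)
stamps = [day.replace(hour=9, minute=m) for m in (15, 16, 16, 19)]
duplicates, gaps = audit_timestamps(stamps)
print(duplicates)
print(gaps)
```

A real audit would also need to handle out-of-order rows, session boundaries, and trading holidays, which this sketch deliberately leaves out.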

The Importance of Out-of-Sample and Walk-Forward Testing

The standard response to overfitting is out-of-sample testing: you split your data into a training period (where you develop and tune the strategy) and a holdout period (where you test it without any further changes). If the strategy performs reasonably on the holdout data, that is modest evidence of a genuine edge.

Walk-forward testing takes this further by repeatedly advancing the training and test windows through time, simulating how the strategy would have been re-optimised periodically. Tools like AlgoRaj allow traders to run basic parameter sweeps and review period-by-period breakdowns, which helps identify whether a strategy is consistent or only strong in a particular market regime.
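The window mechanics of walk-forward testing can be sketched in a few lines. This is one common rolling-window variant, not a description of how any particular tool implements it; `walk_forward_windows` and its parameters are hypothetical names chosen for this example.

```python
def walk_forward_windows(n_bars, train_size, test_size):
    """Return a list of ((train_start, train_end), (test_start, test_end)) index pairs.

    End indices are exclusive. Each step advances both windows by `test_size`,
    so test windows never overlap and always come after their training window.
    """
    windows = []
    start = 0
    while start + train_size + test_size <= n_bars:
        train = (start, start + train_size)
        test = (start + train_size, start + train_size + test_size)
        windows.append((train, test))
        start += test_size
    return windows


# 10 bars of data, tuning on 4 and testing on the next 2 each time.
for train, test in walk_forward_windows(10, train_size=4, test_size=2):
    print("train", train, "test", test)
```

For 10 bars this yields three train/test pairs: (0, 4)/(4, 6), (2, 6)/(6, 8), and (4, 8)/(8, 10). Parameters are tuned only on each training slice and then applied unchanged to the test slice that follows it.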

The key rule: once you have looked at out-of-sample data and used it to make a decision, it is no longer out-of-sample. Repeated peeking at your holdout set turns it into an additional training set.

Costs and Slippage: The Silent Destroyers

Many beginners backtest without properly accounting for:

Brokerage and STT: On NIFTY Futures, even discount brokers charge per-lot fees plus Securities Transaction Tax on the sell side
Slippage: The difference between the price you expect and the price you actually get. On fast-moving contracts at market open, slippage of 3–5 points per side is common
Impact cost: For larger positions, your own order can move the market slightly

A strategy showing ₹200 average profit per trade before costs may show near-zero or negative expectancy after realistic charges are applied. Always model costs pessimistically.
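A rough calculation shows how quickly costs consume a small edge. In the sketch below every number other than the ₹200 gross profit and the 3-point slippage figure is an assumption made for illustration: the lot size of 75, the ₹20 per-order brokerage, and the function name itself. Check the current contract specification and your broker's charge sheet before relying on any of them. Taxes and exchange charges are left as a single `other_charges` input rather than modelled.

```python
def net_profit_per_trade(gross, slippage_points_per_side, lot_size,
                         brokerage_per_order, other_charges=0.0):
    """Net rupee profit for a one-lot round trip (entry plus exit)."""
    slippage_cost = 2 * slippage_points_per_side * lot_size  # both sides
    brokerage = 2 * brokerage_per_order                      # two orders
    return gross - slippage_cost - brokerage - other_charges


# Assumed inputs: lot size 75, Rs 20 brokerage per order, no other charges.
mild = net_profit_per_trade(200, slippage_points_per_side=1, lot_size=75,
                            brokerage_per_order=20)
harsh = net_profit_per_trade(200, slippage_points_per_side=3, lot_size=75,
                             brokerage_per_order=20)
print(mild, harsh)
```

Under these assumptions, one point of slippage per side leaves ₹10 of the ₹200, and three points per side turns it into a ₹290 loss per trade, before STT and other statutory charges.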

The Gap Between Backtest and Live Results

Even a well-constructed backtest with no lookahead bias, realistic costs, and a clean out-of-sample result will typically underperform in live trading. Reasons include:

Live data has gaps and latency that historical data does not capture
Your own psychology leads to skipped trades or early exits
Market regimes change; a strategy that worked in a trending 2023 may struggle in a choppy 2025
Execution systems have their own bugs and delays

Treat a successful backtest as a necessary condition for further investigation, not as proof of an edge. The real test begins with live trading on small position sizes.

Key Takeaways

A backtest tells you how your rules performed on one historical dataset — not how they will perform in the future
Lookahead bias and overfitting are the two most dangerous and common errors; audit your code carefully
Metrics like expectancy and max drawdown matter more than raw win rate
Indian intraday futures data is limited and often imperfect; treat NIFTY spot-as-proxy backtests with extra scepticism
Always model realistic brokerage, STT, and slippage before evaluating a strategy
Out-of-sample testing reduces but does not eliminate the risk of curve-fitting
A good-looking backtest is the beginning of due diligence, not the end of it

This article is for educational purposes only and is not investment advice. Trading in financial markets involves risk of loss.

Written and reviewed by the AlgoRaj Editorial Team, traders and engineers covering Indian intraday and F&O markets.