Backtesting Pairs Trading: Statistical Arbitrage for Retail Traders

By BacktestEverything·June 20, 2025

Pairs trading is one of the oldest quantitative strategies on Wall Street, famously employed by firms like D.E. Shaw and Morgan Stanley in the 1980s. The concept is elegant: find two stocks that move together, and when they temporarily diverge, bet on convergence. But does this strategy still work in an era where every quant fund runs similar algorithms?

The Basic Concept

Pairs trading involves identifying two securities whose prices have historically moved together and trading the spread between them. When the spread widens beyond its historical norm, you short the outperformer and buy the underperformer, expecting the relationship to revert to its mean. The strategy is approximately market-neutral: because the long and short positions offset broad market movements, you are exposed mainly to the relative performance of the pair.
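As a worked definition, here is one common way to write the spread between stocks A and B and its z-score. This is an assumption: the article does not state whether it uses prices or log prices, or how the hedge ratio β is estimated.

```latex
s_t = \ln P^{A}_t - \beta \ln P^{B}_t,
\qquad
z_t = \frac{s_t - \mu_t}{\sigma_t}
```

Here μ_t and σ_t are the mean and standard deviation of the spread over the lookback window, and β is a hedge ratio, for example the OLS slope of ln P^A on ln P^B. A large positive z_t means A has outperformed B relative to their usual relationship, so the trade shorts A and buys B. A large negative z_t calls for the opposite trade.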

Finding Suitable Pairs

We screened for pairs using three criteria: high correlation (above 0.8 over the prior 252 trading days), cointegration (passing the Engle-Granger test at the 5% significance level), and fundamental similarity (same sector and similar market capitalization). From the S&P 500 universe, this process typically identifies 30-50 viable pairs at any given time. Examples include KO/PEP, XOM/CVX, GOOG/META, and V/MA.
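The first two screens can be sketched in plain Python. This is a simplified Engle-Granger test under stated assumptions. The second step is a plain Dickey-Fuller regression with no lag terms; a full implementation would use an augmented test with lag selection, as library routines such as statsmodels' `coint` do. The cutoff of -3.34 is the approximate asymptotic 5% critical value for two variables with a constant, taken from MacKinnon's tables. Correlation is assumed to be measured on aligned daily closing prices. The sector and market-cap check is left out, and helper names like `is_candidate_pair` are hypothetical.

```python
import math
import random
import statistics

EG_CRIT_5PCT = -3.34  # approx. asymptotic 5% value, 2 variables with constant (MacKinnon)


def pearson(xs, ys):
    """Pearson correlation of two equal-length price series."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)


def ols(xs, ys):
    """Intercept and slope of the least-squares fit ys = a + b * xs."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return my - slope * mx, slope


def engle_granger_stat(ys, xs):
    """Simplified two-step Engle-Granger statistic (Dickey-Fuller, no lags)."""
    # Step 1: cointegrating regression; keep the residual spread.
    a, b = ols(xs, ys)
    resid = [y - (a + b * x) for x, y in zip(xs, ys)]
    # Step 2: regress the change in the residual on its lagged level (no constant).
    lagged = resid[:-1]
    diffs = [cur - prev for prev, cur in zip(resid, resid[1:])]
    sum_sq = sum(e * e for e in lagged)
    gamma = sum(e * d for e, d in zip(lagged, diffs)) / sum_sq
    errors = [d - gamma * e for e, d in zip(lagged, diffs)]
    sigma2 = sum(u * u for u in errors) / (len(diffs) - 1)
    return gamma / math.sqrt(sigma2 / sum_sq)  # t-statistic of gamma


def is_candidate_pair(prices_a, prices_b, lookback=252, min_corr=0.8):
    """Correlation and cointegration screens on the trailing lookback window."""
    a, b = prices_a[-lookback:], prices_b[-lookback:]
    if pearson(a, b) <= min_corr:
        return False
    return engle_granger_stat(a, b) < EG_CRIT_5PCT


# Synthetic check: B is a random walk, A = 1.5 * B + stationary noise.
rng = random.Random(7)
pair_b = [100.0]
for _ in range(299):
    pair_b.append(pair_b[-1] + rng.gauss(0, 1))
pair_a = [1.5 * p + rng.gauss(0, 1) for p in pair_b]
unrelated = [100.0 + rng.gauss(0, 1) for _ in range(300)]
```

The statistic depends on which stock is treated as the dependent variable. Here A is regressed on B.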

Backtest Methodology

We used a rolling 252-day window to calculate the spread z-score. Entry occurs when the z-score exceeds 2.0 or drops below -2.0. Exit occurs when the z-score returns to 0.0, or after a maximum holding period of 30 days. A stop loss triggers if the z-score reaches ±3.5, which signals a potential breakdown of the relationship. Positions are dollar-neutral, so each leg has equal dollar exposure.
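The entry, exit, and stop rules map onto a small state machine. In this sketch, the z-score window includes the current day. A z-score above +2 shorts the spread (short A, long B), and one below -2 buys it. Trades still open at the end of the data are not reported. Refusing to open a trade that is already beyond the 3.5 stop level is an added assumption, and all function names are hypothetical.

```python
import statistics

LOOKBACK = 252       # rolling window for the z-score (trading days)
ENTRY_Z = 2.0        # open when |z| exceeds this
EXIT_Z = 0.0         # close when z crosses back to the mean
STOP_Z = 3.5         # stop out when |z| reaches this
MAX_HOLD_DAYS = 30   # time stop


def rolling_zscores(spread, lookback=LOOKBACK):
    """z-score of each spread value against the trailing `lookback` values.

    The window ends at (and includes) the current day, so no future data is
    used. Returns None until a full window is available.
    """
    zscores = []
    for t in range(len(spread)):
        if t + 1 < lookback:
            zscores.append(None)
            continue
        window = spread[t + 1 - lookback:t + 1]
        mu = statistics.fmean(window)
        sd = statistics.pstdev(window)
        zscores.append((spread[t] - mu) / sd if sd > 0 else 0.0)
    return zscores


def generate_trades(zscores):
    """Apply the entry, exit, stop-loss and time-stop rules.

    Returns (entry_index, exit_index, side, reason) tuples, where side is -1
    for "short the spread" (short A, long B) and +1 for the reverse.
    """
    trades = []
    side, entry_idx = 0, None
    for t, z in enumerate(zscores):
        if z is None:
            continue
        if side == 0:
            # Assumption: never open a trade that is already at the stop level.
            if ENTRY_Z < abs(z) < STOP_Z:
                side = -1 if z > 0 else 1
                entry_idx = t
            continue
        if abs(z) >= STOP_Z:
            reason = "stop loss"
        elif (side == -1 and z <= EXIT_Z) or (side == 1 and z >= EXIT_Z):
            reason = "reverted"
        elif t - entry_idx >= MAX_HOLD_DAYS:
            reason = "time stop"
        else:
            continue
        trades.append((entry_idx, t, side, reason))
        side, entry_idx = 0, None
    return trades


def dollar_neutral_shares(notional_per_leg, price_a, price_b):
    """Share counts that give both legs the same dollar exposure."""
    return notional_per_leg / price_a, notional_per_leg / price_b


example_z = [None, 0.5, 2.3, 1.0, -0.1, 0.0, -2.5, -3.6, 0.0]
example_trades = generate_trades(example_z)
# [(2, 4, -1, 'reverted'), (6, 7, 1, 'stop loss')]
```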

Historical Results (2005-2024)

Across our universe of qualifying pairs, the strategy produced an annualized return of 6.8% with a Sharpe ratio of 1.1 and a maximum drawdown of 12%. The win rate was 63% with an average holding period of 8 days. These metrics look attractive until you consider that returns have been declining steadily: 11% annualized in 2005-2010, 7% in 2010-2015, 5% in 2015-2020, and only 3.5% in 2020-2024.
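The headline statistics can be computed from a daily return series roughly as follows. This is a minimal sketch that assumes 252 trading days per year, compounded (geometric) annualization, a Sharpe ratio with a zero risk-free rate by default, and drawdown measured on the compounded equity curve. The article does not specify its exact conventions.

```python
import math
import statistics

TRADING_DAYS = 252


def annualized_return(daily_returns):
    """Compounded annualized return from daily simple returns."""
    growth = math.prod(1.0 + r for r in daily_returns)
    return growth ** (TRADING_DAYS / len(daily_returns)) - 1.0


def sharpe_ratio(daily_returns, daily_rf=0.0):
    """Annualized Sharpe ratio: mean excess daily return / stdev * sqrt(252)."""
    excess = [r - daily_rf for r in daily_returns]
    return statistics.fmean(excess) / statistics.stdev(excess) * math.sqrt(TRADING_DAYS)


def max_drawdown(daily_returns):
    """Largest peak-to-trough decline of the compounded equity curve (fraction)."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in daily_returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, 1.0 - equity / peak)
    return worst


def win_rate(trade_pnls):
    """Fraction of closed trades with positive P&L."""
    return sum(p > 0 for p in trade_pnls) / len(trade_pnls)


# Toy input, not data from the backtest: +10%, -50%, +20%.
toy_drawdown = max_drawdown([0.10, -0.50, 0.20])  # 0.5, a 50% peak-to-trough loss
```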

The Declining Edge

The deterioration of pairs trading returns is well documented in the academic literature. As more participants run similar strategies with faster execution, divergences are arbitraged away more quickly and the spread rarely reaches extreme levels. The average z-score at entry has been declining, so you must either lower the entry threshold (reducing profit per trade) or accept fewer trades (reducing opportunity).

Cointegration Instability

A critical finding from our backtesting is that cointegration relationships are not permanent. Roughly 30% of pairs that pass cointegration tests in one year fail in the next. Corporate events, sector shifts, and changing business models can permanently alter the relationship between two stocks. Our backtest benefited from hindsight in pair selection. A realistic forward-looking approach must regularly rescreen for viable pairs and accept that some selected pairs will break down.
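A forward-looking version of pair selection can be sketched as a walk-forward loop. The loop only ever passes the screen data from before each rebalance date. A second helper measures how many selected pairs survive into the next period. The `screen(start, end)` callback, the 63-day (roughly quarterly) rebalance interval, and the pair sets below are hypothetical placeholders, not the article's data.

```python
def walk_forward_selection(n_days, screen, lookback=252, rebalance_every=63):
    """Re-run `screen` at each rebalance day using only earlier data.

    screen(start, end) is a hypothetical callback that returns the pairs passing
    on the half-open window [start, end); it must not read index >= end.
    Returns a list of (rebalance_day, selected_pairs).
    """
    schedule = []
    for day in range(lookback, n_days, rebalance_every):
        schedule.append((day, screen(day - lookback, day)))
    return schedule


def survival_rates(passed_by_period):
    """Share of pairs passing in one period that also pass in the next.

    passed_by_period maps a period label to the set of pairs that passed then.
    """
    periods = sorted(passed_by_period)
    rates = {}
    for period, following in zip(periods, periods[1:]):
        current = passed_by_period[period]
        if current:
            rates[period] = len(current & passed_by_period[following]) / len(current)
    return rates


windows_seen = []


def recording_screen(start, end):
    """Stand-in screen that only records which window it was given."""
    windows_seen.append((start, end))
    return set()


plan = walk_forward_selection(400, recording_screen)
# windows_seen == [(0, 252), (63, 315), (126, 378)]

hypothetical = {
    1: {("A", "B"), ("C", "D"), ("E", "F")},
    2: {("A", "B"), ("E", "F"), ("G", "H")},
}
rates = survival_rates(hypothetical)  # 2 of 3 period-1 pairs survive
```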

Enhancing the Basic Strategy

Several modifications improved our base results. Using Kalman-filter-based dynamic hedge ratios instead of a static OLS regression improved the Sharpe ratio from 1.1 to 1.3. Incorporating a momentum filter (avoiding pairs where the spread has been trending for more than 10 days) reduced the average time to reversion and improved the win rate by 5%. Sizing positions by pair-specific volatility normalized risk across trades.
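The three enhancements can be sketched as follows. The Kalman filter treats the intercept and hedge ratio as random walks observed through y_t = a_t + b_t x_t + noise, a common textbook setup. Its `delta` and `obs_var` settings are illustrative assumptions, not the article's parameters. "Trending" is assumed to mean the spread moved in the same direction on more than 10 consecutive days. Volatility-based sizing is assumed to target a fixed daily dollar volatility per pair. All function names are hypothetical.

```python
import math
import random
import statistics


def kalman_hedge_ratio(y, x, delta=1e-4, obs_var=1.0):
    """Track a time-varying intercept and hedge ratio for y_t = a_t + b_t * x_t.

    Both states follow a random walk with variance delta / (1 - delta) per step;
    obs_var is the measurement-noise variance. Returns per-day (a_t, b_t) lists.
    """
    q = delta / (1.0 - delta)
    a, b = 0.0, 0.0                 # state estimate: intercept, hedge ratio
    p00, p01, p11 = 1.0, 0.0, 1.0   # symmetric 2x2 state covariance
    intercepts, ratios = [], []
    for yt, xt in zip(y, x):
        # Predict: the state mean is unchanged, its uncertainty grows by q.
        p00 += q
        p11 += q
        # Update with observation matrix H = [1, x_t].
        err = yt - (a + b * xt)        # forecast error (innovation)
        ph0 = p00 + p01 * xt           # (P H^T)[0]
        ph1 = p01 + p11 * xt           # (P H^T)[1]
        s = ph0 + ph1 * xt + obs_var   # innovation variance H P H^T + R
        k0, k1 = ph0 / s, ph1 / s      # Kalman gain
        a += k0 * err
        b += k1 * err
        # Covariance update P <- P - K (H P).
        p00, p01, p11 = p00 - k0 * ph0, p01 - k0 * ph1, p11 - k1 * ph1
        intercepts.append(a)
        ratios.append(b)
    return intercepts, ratios


def trend_run_length(spread):
    """Number of consecutive most-recent days the spread moved in one direction."""
    run, direction = 0, 0
    for prev, cur in zip(spread, spread[1:]):
        step = (cur > prev) - (cur < prev)  # +1 up, -1 down, 0 unchanged
        if step != 0 and step == direction:
            run += 1
        else:
            direction = step
            run = 1 if step != 0 else 0
    return run


def passes_momentum_filter(spread, max_trend_days=10):
    """Reject entries while the spread is still trending (assumed definition)."""
    return trend_run_length(spread) <= max_trend_days


def vol_scaled_notional(pair_daily_returns, target_daily_risk=1_000.0):
    """Dollar notional per leg so the pair's daily P&L volatility is about the target.

    pair_daily_returns: daily returns of the long/short pair per $1 per leg.
    """
    vol = statistics.pstdev(pair_daily_returns)
    return target_daily_risk / vol if vol > 0 else 0.0


# Synthetic prices with a true intercept of 1.0 and a true hedge ratio of 2.0.
rng = random.Random(0)
x_prices = [50 + 10 * math.sin(t / 20) + rng.gauss(0, 1) for t in range(500)]
y_prices = [1.0 + 2.0 * xp + rng.gauss(0, 0.5) for xp in x_prices]
intercepts, ratios = kalman_hedge_ratio(y_prices, x_prices, obs_var=0.25)
```

A smaller `delta` makes the hedge ratio adapt more slowly and behave more like a rolling OLS fit. A larger one lets it track faster changes at the cost of noisier estimates.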

Transaction Costs and Borrowing

Pairs trading involves four legs per round trip (buying and selling both securities), doubling transaction costs compared to directional strategies. Additionally, the short side requires share borrowing, which incurs costs of 0.5-3% annualized for liquid large-caps. Our backtest incorporated 0.1% per leg for commissions/slippage and 1% annualized short borrow cost. These realistic costs reduced our annualized return from a gross 8.5% to the net 6.8% reported above.
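The per-trade cost drag implied by these assumptions can be computed as shown below. The sketch assumes both legs carry the same notional and that the 0.1% applies to each of the four legs. It also assumes the 1% borrow rate accrues on the short leg per trading day held; brokers may instead accrue borrow on calendar days. Function names are hypothetical.

```python
COST_PER_LEG = 0.001   # commissions + slippage: 0.1% of notional per leg
BORROW_RATE = 0.01     # short borrow: 1% annualized
TRADING_DAYS = 252     # assumption: borrow accrues per trading day held


def round_trip_cost(notional_per_leg, holding_days,
                    cost_per_leg=COST_PER_LEG, borrow_rate=BORROW_RATE):
    """Dollar cost of opening and closing one dollar-neutral pair trade.

    Four legs: e.g. buy A and short B to open, sell A and cover B to close.
    Borrow is charged on the short leg for the days the trade is held.
    """
    trading = 4 * cost_per_leg * notional_per_leg
    borrow = borrow_rate * notional_per_leg * holding_days / TRADING_DAYS
    return trading + borrow


def net_trade_pnl(gross_pnl, notional_per_leg, holding_days):
    """Gross P&L of a pair trade minus its round-trip costs."""
    return gross_pnl - round_trip_cost(notional_per_leg, holding_days)


# $10,000 per leg held for the article's average of 8 days.
example_cost = round_trip_cost(10_000, 8)  # $40 trading + about $3.17 borrow
```

At 0.1% per leg, the four legs alone cost 0.4% of one leg's notional per round trip. A trade therefore has to capture more than that before borrow is even counted.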

Risk Considerations

While pairs trading is market-neutral, it carries specific risks. Convergence risk is the possibility that the spread never reverts, for example after a structural break. Execution risk arises when one leg fills and the other does not. Crowding risk means that when many traders hold the same pair, the unwind can be violent. The August 2007 quant meltdown demonstrated that stat arb strategies can suffer sudden, severe drawdowns when multiple funds liquidate simultaneously.

Is Pairs Trading Still Viable for Retail?

Based on our backtesting, pairs trading remains viable but less attractive than it was a decade ago. The strategy works best as one component of a diversified quantitative portfolio rather than a standalone approach. Retail traders have one advantage: they can target less liquid, smaller-cap pairs that institutional algorithms avoid due to capacity constraints. Pairs outside the S&P 500 showed less return decay than large-cap pairs.

Conclusion

Pairs trading represents a genuine statistical edge based on mean reversion of cointegrated relationships. However, the edge has been steadily declining due to competition. Our backtest shows that the strategy still produces positive risk-adjusted returns net of costs, but expectations must be calibrated to current market conditions. Focus on less crowded pairs, use adaptive methods like Kalman filtering, and treat pairs trading as one strategy among many rather than relying on it exclusively.
