Frequency and Distribution of Price Gaps Across NSE Stocks

Table Of Contents

Introduction: Understanding Price Gaps in Indian Equities
The Fetch–Store–Measure Workflow
- Python Implementation: Vectorized Gap Detection
Defining Gaps and Normalization
Structural Origins of Gaps in NSE
Measuring Gap Frequency
- Python Code: Gap Frequency Calculation
Distribution of Gap Frequencies
- Python Code: Visualizing Gap Frequency
Impact of Gap Frequency Across Trading Horizons
Data Sources for NSE Gap Analysis
Python Libraries and Tools Overview
Distribution of Gap Magnitudes
- Python Code: Summary Statistics and Histogram
Tail Behavior and Extreme Gaps
Fitting Parametric Distributions
Stratification by Market Capitalization
Index vs Individual Stocks
Large-Scale NSE Universe Analysis
- DuckDB Example
Aggregated Metrics and Segmentation
Impact on Trading Horizons
Curated Data Sources
Python Libraries and Tools Overview
Advanced Gap Clustering and Autocorrelation
- Python Implementation: Autocorrelation of Gaps
Predictive Modeling of Gaps
Performance Optimization for Large-Scale Analysis
Integration into Trading Models
Curated Data Sources and Workflow
Impact of Gaps Across Trading Horizons
Gap Magnitude Risk Mapping
Advanced Libraries and Methodologies
Data Acquisition and Database Structure
Conclusion

Introduction: Understanding Price Gaps in Indian Equities

Price gaps are discrete price jumps between consecutive trading sessions that occur across National Stock Exchange (NSE) equities. These gaps often represent structural market characteristics rather than isolated trading anomalies. For analysts and quantitative researchers, understanding the frequency, magnitude, and distribution of price gaps is crucial for designing risk models, backtesting strategies, and evaluating market behavior.

By leveraging Python’s rich ecosystem of libraries and computational tools, we can systematically measure gaps, analyze their statistical properties, and model their implications across short-, medium-, and long-term trading horizons. This guide emphasizes a rigorous, market-wide approach rather than event-driven explanations.

The Fetch–Store–Measure Workflow

For large-scale analysis across thousands of NSE stocks over decades, a structured data pipeline is essential.

Fetch

Pull historical OHLC (Open, High, Low, Close) data using libraries such as nsepython or yfinance. For high-frequency institutional-grade data, APIs provided by NSE Data & Analytics can be used.

Store

Efficiently store the time-series data using Parquet files with pandas or a TimescaleDB instance to maintain alignment and rapid query performance.

Measure

Perform vectorized operations with NumPy to calculate gaps between consecutive closes and opens, normalizing them for cross-stock comparison.

Python Implementation: Vectorized Gap Detection

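import pandas as pd
import numpy as np

def calculate_gap_distribution(df):
    df = df.copy()  # avoid mutating the caller's DataFrame

    # Percentage gap: open relative to the previous session's close.
    # Both prices must be on the same adjustment basis, so the unadjusted
    # 'Close' is used here rather than 'Adj Close'.
    df['Gap_Pct'] = (df['Open'] / df['Close'].shift(1) - 1) * 100

    # Classification based on direction (an unchanged open is not a gap)
    df['Gap_Type'] = np.select(
        [df['Gap_Pct'] > 0, df['Gap_Pct'] < 0],
        ['Up-Gap', 'Down-Gap'],
        default='No-Gap',
    )

    # Absolute magnitude for distribution analysis
    df['Gap_Magnitude'] = df['Gap_Pct'].abs()

    # Drop the first row, which has no previous close
    return df.dropna(subset=['Gap_Pct'])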

Defining Gaps and Normalization

The gap on day t is defined as:

Gap_t = Open_t - Close_(t-1)
Gap_%_t = (Open_t - Close_(t-1)) / Close_(t-1) * 100

Normalizing gaps as percentages ensures comparability across stocks with varying price levels, allowing aggregation and statistical analysis.
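As a quick illustration of the two definitions above, the following sketch computes the absolute and the percentage gap for two stocks at very different price levels. The symbols and prices are made up for illustration only.

```python
import pandas as pd

# Hypothetical prices for two stocks at very different price levels
prices = pd.DataFrame({
    "symbol": ["AAA", "AAA", "BBB", "BBB"],
    "date": pd.to_datetime(["2024-01-01", "2024-01-02"] * 2),
    "open": [100.0, 102.0, 2000.0, 2010.0],
    "close": [101.0, 103.0, 2000.0, 2005.0],
}).sort_values(["symbol", "date"])

# Previous close within each symbol, then the two gap definitions
prices["prev_close"] = prices.groupby("symbol")["close"].shift(1)
prices["gap_abs"] = prices["open"] - prices["prev_close"]
prices["gap_pct"] = prices["gap_abs"] / prices["prev_close"] * 100

print(prices.dropna(subset=["gap_pct"])[["symbol", "gap_abs", "gap_pct"]])
```

In this toy data the higher-priced stock has the larger absolute gap but the smaller percentage gap, which is why the percentage form is used for cross-stock comparison.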

Structural Origins of Gaps in NSE

Session-Based Market Design

NSE operates discrete trading sessions. Overnight order imbalances accumulate, and the opening price reflects auction-based equilibrium, not a continuation of the previous close.

Pre-Open Auction Mechanism

The pre-open session (9:00–9:15 AM) determines the official open. Price gaps often arise from auction-clearing dynamics alone, even in the absence of external news.
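To make the auction-clearing idea concrete, here is a deliberately simplified sketch of a call auction that picks the price at which the largest quantity can be matched. The order book, the function name equilibrium_price, and the tie-break (lowest price among ties) are assumptions for illustration; the exchange's actual rules include further tie-breaks and order types that are not modeled here.

```python
def equilibrium_price(buys, sells):
    """Return (price, matched_qty) that maximizes executable volume.

    buys, sells: lists of (limit_price, quantity) tuples.
    Simplified: limit orders only; ties go to the lowest candidate price.
    """
    candidates = sorted({p for p, _ in buys} | {p for p, _ in sells})
    best_price, best_qty = None, 0
    for price in candidates:
        demand = sum(q for p, q in buys if p >= price)   # buyers accepting this price
        supply = sum(q for p, q in sells if p <= price)  # sellers accepting this price
        matched = min(demand, supply)
        if matched > best_qty:
            best_price, best_qty = price, matched
    return best_price, best_qty

# Hypothetical overnight order book around a previous close of 100
prev_close = 100.0
buys = [(103.0, 500), (102.0, 300), (101.0, 200)]
sells = [(101.0, 100), (102.0, 500), (103.0, 600)]

open_price, qty = equilibrium_price(buys, sells)
gap_pct = (open_price - prev_close) / prev_close * 100
print(open_price, qty, gap_pct)
```

With these made-up orders the auction clears above the previous close, so the session opens with an up-gap that comes purely from the overnight order imbalance.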

Liquidity Fragmentation

Small-cap and mid-cap stocks exhibit higher gap frequency due to sparse order books, wider bid-ask spreads, and structural liquidity differences.

Measuring Gap Frequency

Gap frequency per stock is calculated as:

Gap_Frequency_i = Number of Gap Days for Stock i / Total Trading Days

This metric allows for aggregation across indices, market-cap segments, and sectors to understand market-wide behavior.

Python Code: Gap Frequency Calculation

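# Sort so that shift(1) picks up the previous trading day within each symbol
df = df.sort_values(['symbol', 'date'])
df['prev_close'] = df.groupby('symbol')['close'].shift(1)
df['gap_pct'] = (df['open'] - df['prev_close']) / df['prev_close'] * 100

# Each symbol's first row has no previous close and is excluded
df = df.dropna(subset=['gap_pct'])

# Share of trading days on which the open differs from the previous close
gap_freq = (
    df.assign(is_gap=lambda x: x['gap_pct'] != 0)
      .groupby('symbol')['is_gap']
      .mean()
      .rename('gap_frequency')
)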

Distribution of Gap Frequencies

Empirical observations indicate that:

Large-cap stocks: Lower gap frequency, narrower distribution
Small-cap stocks: Higher dispersion, fat-tailed frequency distribution

Visualizing the distribution can reveal clustering patterns, skewness, and the presence of outlier behavior, critical for risk modeling.

Python Code: Visualizing Gap Frequency

import matplotlib.pyplot as plt

gap_freq.hist(bins=50)
plt.title('Distribution of Gap Frequency Across NSE Stocks')
plt.xlabel('Gap Frequency')
plt.ylabel('Number of Stocks')
plt.show()

Impact of Gap Frequency Across Trading Horizons

Understanding gap frequency informs trading strategies:

Short-term: High frequency increases intraday volatility and execution risk.
Medium-term: Rolling gap frequency identifies volatility regimes and potential momentum shifts.
Long-term: Structural liquidity assessment and tail-risk exposure influence portfolio construction.

Data Sources for NSE Gap Analysis

Curated, reliable sources include:

Official NSE Bhavcopy Archive for daily open/close data
Yahoo Finance or nsepython for historical OHLC data
BSE India Corporate Actions portal to adjust for splits, bonuses, and dividends

Python Libraries and Tools Overview

The primary Python libraries for gap analysis are:

pandas: Time-series manipulation and shift operations
numpy: Vectorized calculations and log returns
scipy.stats: Skewness, kurtosis, and distribution fitting
matplotlib / seaborn: Visualization of frequency distributions and density plots
duckdb / polars: Optimized large-universe aggregation

Distribution of Gap Magnitudes

After measuring gap frequency, analyzing the magnitude of these gaps provides insight into market volatility and structural risk. Gap magnitude refers to the size of the price difference between the previous close and the current open, normalized for comparability across different stocks.

Absolute vs. Normalized Gaps

Define the variables as:

Absolute Gap: Δ_t = Open_t - Close_(t-1)
Normalized Gap: G_t = Δ_t / Close_(t-1)

Normalized gaps enable cross-sectional comparisons and aggregation across the NSE universe, allowing meaningful statistical analyses.

Statistical Moments of Gap Distribution

To understand the behavior of gap magnitudes, calculate the following moments:

Mean: Indicates average directional bias.
Median: Central tendency robust to outliers.
Standard Deviation: Measures dispersion.
Skewness: Detects asymmetry in upward vs. downward gaps.
Kurtosis: Measures tail heaviness and fat-tail behavior.

Python Code: Summary Statistics and Histogram

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import kurtosis, skew

# Compute statistical moments per symbol using named aggregation,
# which yields one row per symbol and one column per statistic.
# The result is named gap_stats so it does not clash with scipy's stats module.
gap_stats = df.groupby("symbol")["gap_pct"].agg(
    gap_mean_pct="mean",
    gap_median_pct="median",
    gap_std_pct="std",
    gap_skewness=lambda s: skew(s.dropna()),
    gap_kurtosis=lambda s: kurtosis(s.dropna(), fisher=True),  # excess kurtosis
).reset_index()

# Plot distribution
sns.histplot(df["gap_pct"], bins=200, kde=True)
plt.title("Histogram of Normalized Gap (%) Across All NSE Stock Days")
plt.xlabel("Gap (%)")
plt.ylabel("Count")
plt.show()

Tail Behavior and Extreme Gaps

Large gaps are rare but critical for risk modeling. Extreme gaps often signify tail-risk events and may influence short-term trading strategies.

Threshold Analysis

Compute the proportion of gaps exceeding specific thresholds:

for thr in [2, 5, 10]:
    exceed = (df["gap_pct"].abs() > thr).mean()
    print(f"Proportion of |Gap| > {thr}%: {exceed:.4f}")

Fitting Parametric Distributions

Gap distributions often exhibit heavy tails. Candidate models include Gaussian, Laplace, Student’s t, and Generalized Extreme Value distributions. Fit these using Python:

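from scipy import stats

# Fit a Student's t distribution (missing values are dropped before fitting)
gaps = df["gap_pct"].dropna()
params = stats.t.fit(gaps)  # (degrees of freedom, location, scale)
df["gap_t_pdf"] = stats.t.pdf(df["gap_pct"], *params)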

Model selection can be guided by AIC/BIC values, QQ plots, and Kolmogorov-Smirnov tests.
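As a rough sketch of AIC-based selection, the snippet below fits three of the candidate distributions by maximum likelihood and compares their AIC values. The sample is synthetic (drawn from a Student's t with 3 degrees of freedom) and stands in for the real gap series; the seed and parameters are arbitrary assumptions.

```python
import numpy as np
from scipy import stats

# Synthetic heavy-tailed sample standing in for df["gap_pct"]
rng = np.random.default_rng(0)
gaps = stats.t.rvs(df=3, scale=0.8, size=5000, random_state=rng)

candidates = {"normal": stats.norm, "laplace": stats.laplace, "student_t": stats.t}

fitted, aic = {}, {}
for name, dist in candidates.items():
    fitted[name] = dist.fit(gaps)                        # maximum-likelihood fit
    loglik = np.sum(dist.logpdf(gaps, *fitted[name]))
    aic[name] = 2 * len(fitted[name]) - 2 * loglik       # AIC = 2k - 2 ln L

best = min(aic, key=aic.get)

# Kolmogorov-Smirnov statistic for the selected model (smaller means closer fit)
ks_stat, ks_pvalue = stats.kstest(gaps, candidates[best].cdf, args=fitted[best])
print(aic, best, ks_stat)
```

Because the parameters are estimated from the same sample, the KS p-value is optimistic and is better read as a relative diagnostic than as a formal test.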

Stratification by Market Capitalization

Market microstructure theory suggests that liquidity and volatility significantly influence gap behavior.

Segmentation

df["cap_bucket"] = pd.qcut(df["market_cap"], 4, labels=["Q1", "Q2", "Q3", "Q4"])

Visual Comparison

sns.boxplot(x="cap_bucket", y="gap_pct", data=df)
plt.title("Gap (%) by Market Cap Quartile")
plt.show()

Expect wider dispersion in small-cap stocks and tighter clustering for large-cap stocks.

Index vs Individual Stocks

Indices like NIFTY 50 show lower gap magnitudes due to diversification. Comparing index gaps with individual stock gaps illustrates market-wide stability vs stock-specific volatility.

Index Gap Calculation

idx = pd.read_parquet("index_prices.parquet")
idx["prev_close"] = idx["close"].shift(1)
idx["gap_pct"] = (idx["open"] - idx["prev_close"]) / idx["prev_close"] * 100

Visualization

sns.kdeplot(df["gap_pct"], label="Stocks")
sns.kdeplot(idx["gap_pct"], label="Index")
plt.legend()
plt.title("Stock vs Index Gap % Density")
plt.show()

Large-Scale NSE Universe Analysis

When handling hundreds or thousands of NSE stocks, performance and scalability are critical. Tools such as DuckDB, polars, and Dask facilitate fast aggregation and parallel computation.

DuckDB Example

import duckdb

con = duckdb.connect()
con.execute("""
    CREATE TABLE prices AS SELECT * FROM 'nse_data/*.parquet'
""")

gap_sql = """
SELECT symbol,
    AVG(CASE WHEN (open - prev_close) != 0 THEN 1 ELSE 0 END) AS gap_freq
FROM (
    SELECT *,
        LAG(close) OVER (PARTITION BY symbol ORDER BY date) AS prev_close
    FROM prices
)
-- Exclude each symbol's first row, which has no previous close
WHERE prev_close IS NOT NULL
GROUP BY symbol
"""
df_gap_sql = con.execute(gap_sql).df()

Aggregated Metrics and Segmentation

Compute market-wide statistics and segment by sector, liquidity, and volatility to identify patterns:

agg_stats = df["gap_pct"].agg(["mean", "median", "std"])
seg = df.groupby(["sector", "cap_bucket"])["gap_pct"].describe()

Impact on Trading Horizons

Short-Term Trading

Intraday and overnight traders are affected by gap frequency and tail events. Gaps increase execution risk and slippage.

Medium-Term Trading

Rolling gap frequency and magnitude help detect volatility regimes and momentum signals over weeks to months. Incorporating gap statistics into VaR and CVaR improves risk models.

Long-Term Investing

Structural gap patterns provide insights into liquidity and tail-risk exposure. For strategic investing, gaps are smoothed by compounding, but their cumulative contribution can still inform allocation decisions.

Curated Data Sources

Historical Adjusted Data: Yahoo Finance (NSE Symbols)
Corporate Actions: BSE India portal to adjust for splits, bonuses, dividends
Indicative Pre-Open Prices: NSE Live Market Pre-Open session

Python Libraries and Tools Overview

pandas: Time-series manipulation, groupby, shift
numpy: Vectorized calculations and log returns
scipy.stats: Skewness, kurtosis, and distribution fitting
matplotlib / seaborn: Visualization of frequency and tail behavior
duckdb / polars / Dask: Large-scale aggregation and parallel computation

Advanced Gap Clustering and Autocorrelation

Price gaps often do not occur independently; they tend to cluster during periods of high volatility. Detecting these clusters helps quantify market regimes and improves risk modeling.

Gap Clustering Concept

Clusters are periods where consecutive gaps occur more frequently than random expectation. Positive autocorrelation in gaps indicates clustering, while near-zero autocorrelation suggests independent gap events.

Python Implementation: Autocorrelation of Gaps

# Binary gap series: 1 if gap exists, 0 otherwise
# (rows without a previous close are assumed to have been dropped already)
df["gap_binary"] = (df["gap_pct"] != 0).astype(int)

# Lag-1 autocorrelation of the gap indicator, computed per stock
def autocorr(series, lag=1):
    return series.autocorr(lag=lag)

autocorr_stats = df.groupby("symbol")["gap_binary"].apply(autocorr)

Rolling Window Gap Frequency

Rolling measures highlight high-gap regimes and short-term volatility clustering:

df["gap_rolling"] = df.groupby("symbol")["gap_binary"].transform(lambda x: x.rolling(20).mean())

Predictive Modeling of Gaps

Although this guide does not attempt causal explanations, statistical models can still describe gap structures and assess their predictive relevance.

Probability of Gap Occurrence

Logistic regression can model the probability of a gap:

from sklearn.linear_model import LogisticRegression

X = df[["prev_volatility", "volume"]]
y = df["gap_binary"]

model = LogisticRegression()
model.fit(X, y)

Output probabilities can be used in simulation and risk evaluation.
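The feature prev_volatility used above is not defined in this guide. One plausible construction, assumed here purely for illustration, is a 20-day rolling standard deviation of close-to-close returns, lagged by one day so that it only uses information available before the gap. The data below is synthetic.

```python
import numpy as np
import pandas as pd

# Synthetic close prices for two hypothetical symbols
rng = np.random.default_rng(1)
n = 60
features = pd.DataFrame({
    "symbol": ["AAA"] * n + ["BBB"] * n,
    "date": list(pd.bdate_range("2024-01-01", periods=n)) * 2,
    "close": np.concatenate([
        100 * np.exp(np.cumsum(rng.normal(0, 0.01, n))),
        500 * np.exp(np.cumsum(rng.normal(0, 0.02, n))),
    ]),
}).sort_values(["symbol", "date"])

# Close-to-close returns, computed within each symbol
features["ret"] = features.groupby("symbol")["close"].pct_change()

# 20-day rolling volatility, shifted by one day so the feature for day t
# uses only returns up to day t-1 (no look-ahead)
features["prev_volatility"] = features.groupby("symbol")["ret"].transform(
    lambda r: r.rolling(20).std().shift(1)
)
print(features.dropna().head())
```

The window length of 20 is an arbitrary choice; rows where the feature is missing would need to be dropped before fitting the model.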

Gap Magnitude Modeling

Conditional models, including normal, Laplace, or quantile regression, estimate expected gap size relative to market conditions:

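import statsmodels.api as sm
from statsmodels.regression.quantile_regression import QuantReg

# QuantReg does not add an intercept automatically, so add a constant column
X = sm.add_constant(df[["prev_volatility", "volume"]])
y = df["gap_pct"]
model = QuantReg(y, X).fit(q=0.5)  # Median regression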

Performance Optimization for Large-Scale Analysis

Scalability is essential when analyzing thousands of NSE stocks:

Memory-efficient storage: Parquet/Feather files
Vectorized operations using pandas and NumPy
Parallel computation with Dask or Ray
SQL-style aggregation with DuckDB or Polars

Integration into Trading Models

Risk Models

Gap statistics inform overnight slippage, tail-risk adjustments, and Value-at-Risk (VaR) calculations.
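A minimal sketch of how gap statistics could feed a VaR-style measure, assuming a long position and the historical (empirical quantile) method. The function name historical_var_cvar and the synthetic sample are illustrative assumptions, not part of any library.

```python
import numpy as np

def historical_var_cvar(gap_pct, level=0.99):
    """Historical VaR and CVaR of overnight gaps for a long position.

    Losses are the negated gaps, so a -3% gap is a 3% loss.
    """
    losses = -np.asarray(gap_pct, dtype=float)
    losses = losses[~np.isnan(losses)]
    var = np.quantile(losses, level)        # loss not exceeded with probability `level`
    cvar = losses[losses >= var].mean()     # average loss at or beyond the VaR level
    return var, cvar

# Synthetic heavy-tailed gaps (in percent) standing in for one stock's gap_pct
rng = np.random.default_rng(42)
sample_gaps = rng.standard_t(df=3, size=10_000) * 0.7

var_99, cvar_99 = historical_var_cvar(sample_gaps, level=0.99)
print(var_99, cvar_99)
```

With heavy-tailed gaps the CVaR sits well above the VaR, which is the tail-risk adjustment referred to above.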

Strategy Backtesting

Accurate gap representation ensures realistic backtests. Use gap fills and clustering to avoid assumptions of continuous price paths:

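def check_gap_fill(df):
    # Column names follow the lowercase convention used in the gap frequency code.
    # An up-gap is filled when the day's low trades back to the previous close;
    # a down-gap is filled when the day's high does.
    df['Filled_Up'] = (df['gap_pct'] > 0) & (df['low'] <= df['prev_close'])
    df['Filled_Down'] = (df['gap_pct'] < 0) & (df['high'] >= df['prev_close'])
    df['Is_Gap_Filled'] = df['Filled_Up'] | df['Filled_Down']
    return df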

Portfolio Construction

High-gap and high-volatility stocks may require adjusted weighting to manage risk. Gap dispersion informs hedging strategies and allocation decisions.

Curated Data Sources and Workflow

Effective gap analysis relies on clean, structured data:

Historical OHLC Data: Yahoo Finance, nsepython library
Corporate Actions: BSE India portal (splits, dividends, bonuses)
Pre-Open Indicative Prices: NSE Live Market Pre-Open session (9:00–9:15 AM, with order entry closing at about 9:08 AM)

Fetch-Store-Measure in Practice

The workflow follows the steps described in the Fetch–Store–Measure Workflow section, with one additional step:

Visualize & Model: Density plots, QQ plots, logistic or quantile regression for probability and magnitude estimation.

Impact of Gaps Across Trading Horizons

Short-Term Trading (Intraday to 1 Week)

High gap frequency increases volatility and execution risk. Gap Fill Ratios indicate mean-reversion potential, with small gaps often filled quickly, whereas large gaps may continue momentum.
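The sketch below shows one way a Gap Fill Ratio could be computed by gap size. It assumes a gap counts as filled when price trades back to the previous close within the same session, and it uses size bands of 0.5% and 2%; both the definition and the toy bars are assumptions for illustration.

```python
import pandas as pd

# Hypothetical daily bars with the previous close already attached
bars = pd.DataFrame({
    "prev_close": [100.0, 100.0, 100.0, 100.0, 100.0, 100.0],
    "open":       [100.3,  99.6, 101.5,  97.0, 103.0, 100.2],
    "high":       [100.9, 100.1, 102.4,  98.5, 104.0, 100.8],
    "low":        [ 99.8,  99.1, 101.0,  96.0,  99.5, 100.1],
})
bars["gap_pct"] = (bars["open"] - bars["prev_close"]) / bars["prev_close"] * 100

# A gap is filled the same day if price trades back to the previous close
filled_up = (bars["gap_pct"] > 0) & (bars["low"] <= bars["prev_close"])
filled_down = (bars["gap_pct"] < 0) & (bars["high"] >= bars["prev_close"])
bars["is_gap_filled"] = filled_up | filled_down

# Bucket by absolute gap size and take the share of filled gaps per bucket
bars["size_bucket"] = pd.cut(
    bars["gap_pct"].abs(),
    bins=[0, 0.5, 2.0, float("inf")],
    labels=["small", "medium", "large"],
)
fill_ratio = bars.groupby("size_bucket", observed=True)["is_gap_filled"].mean()
print(fill_ratio)
```

On real data, comparing the ratio across buckets is what would show whether small gaps are in fact filled more often than large ones.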

Medium-Term Trading (1 Month to 1 Quarter)

Runaway gaps in liquid stocks can indicate new momentum regimes. Regression analysis of gap size vs forward returns helps identify actionable medium-term signals.
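A minimal sketch of such a regression, assuming a 20-day forward return horizon and a simple least-squares fit with scipy. The price path is synthetic and has no built-in relationship between gaps and later returns, so the point is the mechanics rather than the result.

```python
import numpy as np
import pandas as pd
from scipy.stats import linregress

# Synthetic single-stock path built from overnight and intraday log returns
rng = np.random.default_rng(7)
n = 750
overnight = rng.normal(0, 0.008, n)   # previous close -> open
intraday = rng.normal(0, 0.012, n)    # open -> close
close = 100 * np.exp(np.cumsum(overnight + intraday))
prev_close = np.concatenate([[100.0], close[:-1]])
open_ = prev_close * np.exp(overnight)

data = pd.DataFrame({"open": open_, "close": close})
data["gap_pct"] = (data["open"] / data["close"].shift(1) - 1) * 100

# Forward return over the next `horizon` sessions, measured close to close
horizon = 20
data["fwd_ret_pct"] = (data["close"].shift(-horizon) / data["close"] - 1) * 100

sample = data.dropna(subset=["gap_pct", "fwd_ret_pct"])
fit = linregress(sample["gap_pct"], sample["fwd_ret_pct"])
print(fit.slope, fit.rvalue, fit.pvalue)
```

Forward-return windows overlap from one day to the next, so the reported p-value overstates significance; it should be treated as indicative only.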

Long-Term Investing (Strategic Horizon)

Over multi-year horizons, microstructure noise is smoothed by compounding. The Cumulative Gap Contribution metric measures how much overnight movements contribute to overall long-term returns.
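The metric is not defined formally here. One possible definition, assumed for this sketch, splits each day's log return into an overnight part (previous close to open) and an intraday part (open to close) and takes the overnight share of the total. The four-day price series is made up.

```python
import numpy as np
import pandas as pd

# Hypothetical opens and closes for four consecutive sessions
ohlc = pd.DataFrame({
    "open":  [100.0, 102.0, 101.0, 104.0],
    "close": [101.0, 100.5, 103.0, 105.0],
})

prev_close = ohlc["close"].shift(1)
overnight = np.log(ohlc["open"] / prev_close)     # close(t-1) -> open(t)
intraday = np.log(ohlc["close"] / ohlc["open"])   # open(t) -> close(t)

# Drop the first day, which has no previous close
overnight, intraday = overnight.iloc[1:], intraday.iloc[1:]

# Log returns add up, so the total equals log(last close / first close)
total = overnight.sum() + intraday.sum()
gap_contribution = overnight.sum() / total
print(total, gap_contribution)
```

The ratio becomes unstable when the total return is close to zero, so over short or flat periods it is safer to report the overnight and intraday sums separately.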

Gap Magnitude Risk Mapping

0% – 0.5%: High frequency, short-term noise, standard liquidity risk
0.5% – 2.0%: Moderate frequency, potential medium-term breakout or breakdown
>2.0%: Low frequency, systemic events, tail-risk exposure
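The bands above can be applied to a gap series with pandas, as in this sketch. The sample values are made up, and the treatment of boundary values (a gap of exactly 0.5% falls in the lowest band, exactly 2.0% in the middle band) is an assumption, since the mapping does not specify it.

```python
import pandas as pd

# Hypothetical gap values in percent
gap_pct = pd.Series([0.1, -0.3, 0.5, -1.2, 2.0, 3.5, -6.0])

# Right-closed bins on the absolute gap; include_lowest keeps exact zeros
bands = pd.cut(
    gap_pct.abs(),
    bins=[0, 0.5, 2.0, float("inf")],
    labels=["0-0.5%: noise", "0.5-2%: breakout/breakdown", ">2%: tail risk"],
    include_lowest=True,
)

# Share of observations falling in each band
band_share = bands.value_counts(normalize=True, sort=False)
print(band_share)
```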

Advanced Libraries and Methodologies

pandas: Time-series processing, shift, groupby, rolling windows
numpy: Vectorized math, log returns, standardization
scipy.stats: Skewness, kurtosis, distribution fitting
matplotlib / seaborn: Visualization of distributions, KDE, histograms
statsmodels: Quantile regression, linear regression for predictive modeling
scikit-learn: Logistic regression for gap occurrence probability
duckdb / polars / Dask: Large-scale aggregation and parallel computation
TimescaleDB: Efficient storage and query of time-series OHLC data

Data Acquisition and Database Structure

Structured database schema for NSE gap analysis:

Symbols Table: Stock identifiers, sector, market cap, liquidity metrics
OHLC Table: Date, open, high, low, close, adjusted close, volume
Gap Metrics Table: Gap %, gap type, gap magnitude, rolling stats
Corporate Actions Table: Splits, dividends, bonuses, adjusted price factors
Pre-Open Table: Indicative equilibrium prices, session data
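A minimal sketch of part of this schema, using Python's built-in sqlite3 as a stand-in for a production database such as TimescaleDB. All table and column names are hypothetical, and the corporate actions and pre-open tables are omitted for brevity.

```python
import sqlite3

schema = """
CREATE TABLE symbols (
    symbol     TEXT PRIMARY KEY,
    sector     TEXT,
    market_cap REAL
);
CREATE TABLE ohlc (
    symbol     TEXT NOT NULL REFERENCES symbols(symbol),
    trade_date TEXT NOT NULL,
    open REAL, high REAL, low REAL, close REAL, adj_close REAL,
    volume INTEGER,
    PRIMARY KEY (symbol, trade_date)
);
CREATE TABLE gap_metrics (
    symbol     TEXT NOT NULL,
    trade_date TEXT NOT NULL,
    gap_pct REAL, gap_type TEXT, gap_magnitude REAL,
    PRIMARY KEY (symbol, trade_date)
);
"""

con = sqlite3.connect(":memory:")
con.executescript(schema)

# Two made-up rows for one hypothetical symbol
con.execute("INSERT INTO symbols VALUES ('AAA', 'IT', 1.0e9)")
con.executemany(
    "INSERT INTO ohlc VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [
        ("AAA", "2024-01-01", 100.0, 102.0, 99.0, 101.0, 101.0, 1000),
        ("AAA", "2024-01-02", 102.0, 103.0, 101.0, 102.5, 102.5, 1200),
    ],
)

# Derive the gap metrics from the OHLC table with a window function
con.execute("""
INSERT INTO gap_metrics
SELECT symbol, trade_date, gap_pct,
       CASE WHEN gap_pct > 0 THEN 'Up-Gap'
            WHEN gap_pct < 0 THEN 'Down-Gap'
            ELSE 'No-Gap' END,
       ABS(gap_pct)
FROM (
    SELECT symbol, trade_date,
           (open - LAG(close) OVER w) / LAG(close) OVER w * 100 AS gap_pct
    FROM ohlc
    WINDOW w AS (PARTITION BY symbol ORDER BY trade_date)
)
WHERE gap_pct IS NOT NULL
""")

rows = con.execute("SELECT * FROM gap_metrics").fetchall()
print(rows)
```

Keeping the gap metrics in their own table, keyed by symbol and date, lets them be recomputed after a corporate-action adjustment without touching the raw OHLC rows.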

Conclusion

This comprehensive Python-centric framework for analyzing the frequency and distribution of price gaps across NSE stocks provides quantitative researchers, traders, and portfolio managers with robust tools for market-wide risk assessment and strategy development. By combining statistical analysis, clustering, predictive modeling, and structured data workflows, TheUniBit empowers users to incorporate gap insights into risk-adjusted trading and long-term investment strategies, improving market understanding and decision-making.
