- Conceptual Foundations of Price Gaps
- Impact Across Trading Horizons (Classification Perspective)
- Normalization, Volatility Scaling, and Cross-Sectional Gap Comparability
- Fetch → Store → Measure Workflow for Normalized Gaps
- Impact Across Trading Horizons
- Event Drivers, Market Microstructure, and Statistical Integrity of Gaps
- Event-Based Drivers of Gap Formation
- The Role of the NSE Pre-Open Session
- Fetch → Store → Measure Workflow for Event-Aware Gap Analysis
- Impact Across Trading Horizons
- Advanced Quantitative Metrics, Data Architecture, and Production-Grade Integration
- Production-Grade Fetch → Store → Measure Architecture
- Database Structure and Storage Design
- Python Libraries Used and Their Roles
- Curated Data Sourcing Methodologies
- News Trigger Classification Framework
- Multi-Horizon Statistical Impact Summary
- Conclusion and Industry Application
Price gaps are among the most structurally important discontinuities in Indian equity price series. Unlike indicators derived from continuous intraday movement, gaps originate from discrete information arrival and market microstructure constraints between sessions. This article presents a rigorous, Python-centric, data-first framework for the statistical classification of partial gaps and full gaps, explicitly excluding strategy logic or post-gap price behavior.
The focus is on formal definitions, measurable conditions, reproducible algorithms, and data engineering workflows suitable for institutional-grade research and production analytics within the Indian stock market context.
Conceptual Foundations of Price Gaps
What Constitutes a Price Gap in Indian Markets
In Indian equity markets, a price gap arises when the official opening price of a trading session is discontinuous relative to the previous session’s trading range. This discontinuity reflects overnight information assimilation under constrained liquidity and auction-based opening mechanisms.
Unlike intraday volatility, gaps are not emergent properties of continuous trading. They are boundary phenomena occurring at the interface of two discrete trading sessions and must therefore be defined using exchange-published reference prices only.
Exchange Price References Used in Gap Measurement
Previous Session References
- Previous session close price
- Previous session intraday high
- Previous session intraday low
Current Session References
- Official opening price (post pre-open auction)
- Current session intraday high
- Current session intraday low
No intraday tick, VWAP, or midpoint prices are admissible for gap classification, as gaps are strictly inter-session constructs.
Formal Statistical Definition of Gaps
Notation and Price Variables
Let the following variables be defined for a given equity:
- $P_{c,t-1}$: Previous session closing price
- $H_{t-1}$: Previous session high
- $L_{t-1}$: Previous session low
- $O_t$: Current session official opening price
- $H_t$: Current session high
- $L_t$: Current session low
Gap Magnitude Definition
Formal Mathematical Definition of Gap Magnitude
$$G_t = O_t - P_{c,t-1}$$
This raw gap magnitude is directional and forms the base variable for all subsequent classification logic.
Absolute Gap Size
Absolute Gap Formula
$$|G_t| = \left|O_t - P_{c,t-1}\right|$$
Absolute magnitude is used for normalization and cross-sectional comparison.
Classification Logic: Partial Gaps vs Full Gaps
Definition of Partial Gaps
A partial gap occurs when the current session opens above or below the previous close, but the current session’s trading range still overlaps the previous session’s range.
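Mathematical Conditions for Partial Gaps
$$\text{Partial}_t = \big[(O_t > P_{c,t-1}) \land (L_t \le H_{t-1})\big] \lor \big[(O_t < P_{c,t-1}) \land (H_t \ge L_{t-1})\big]$$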
Partial gaps indicate overnight imbalance that is at least partially resolved intraday.
Definition of Full Gaps
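A full gap occurs when the current session’s trading range does not overlap the previous session’s range at all: the session trades entirely above the previous high or entirely below the previous low.
Mathematical Conditions for Full Gaps
$$\text{Full}_t = (L_t > H_{t-1}) \lor (H_t < L_{t-1})$$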
Full gaps represent a complete reset of the price discovery region between sessions.
Python-Centric Gap Detection Algorithms
Fetch → Store → Measure Workflow
Gap classification must be embedded within a deterministic data pipeline to ensure reproducibility and auditability.
Fetch
- Daily OHLC data from NSE or BSE
- Official opening price determined in the pre-open call auction
- Corporate-action-adjusted historical series
Store
- Columnar storage (Parquet / DuckDB)
- Date-indexed price frames
- Corporate action factor tables
Measure
- Vectorized gap magnitude computation
- Boolean classification masks
- Cross-sectional normalization
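A deterministic pipeline usually validates raw rows before the measure step, so that a malformed bar never becomes a spurious gap. Below is a minimal sketch of such a check. It assumes a single-symbol frame with a `date` column alongside lowercase OHLC columns. The name `validate_ohlc` and the specific checks are illustrative choices, not an exchange-defined data-quality specification.

```python
import pandas as pd


def validate_ohlc(df: pd.DataFrame) -> pd.DataFrame:
    """Basic integrity checks to run before any gap is measured.

    Assumes one symbol per frame, a 'date' column and lowercase OHLC columns.
    """
    issues = []
    dates = pd.to_datetime(df['date'])
    if not dates.is_monotonic_increasing:
        issues.append('trade dates are not sorted ascending')
    if dates.duplicated().any():
        issues.append('duplicate trade dates')
    if (df[['open', 'high', 'low', 'close']] <= 0).any().any():
        issues.append('non-positive prices')
    # The high must bound open and close from above, the low from below
    body_max = df[['open', 'close']].max(axis=1)
    body_min = df[['open', 'close']].min(axis=1)
    inconsistent = (df['high'] < body_max) | (df['low'] > body_min)
    if inconsistent.any():
        issues.append(f'{int(inconsistent.sum())} rows with inconsistent OHLC')
    if issues:
        raise ValueError('; '.join(issues))
    return df.copy()  # downstream steps work on a copy, never the raw input
```

Raising rather than silently dropping rows keeps failures auditable: a bad ingest stops the run instead of quietly changing gap counts.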
Python Implementation: Core Gap Classification
Python Algorithm for Partial and Full Gap Detection
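```python
import pandas as pd

def classify_gaps(df):
    """Flag partial and full gaps from inter-session reference prices only.

    Expects lowercase 'open', 'high', 'low', 'close' columns sorted by date.
    """
    df = df.copy()
    prev_close = df['close'].shift(1)
    prev_high = df['high'].shift(1)
    prev_low = df['low'].shift(1)
    # Signed raw gap: G_t = O_t - P_{c,t-1}
    df['gap'] = df['open'] - prev_close
    # Partial gap: open away from the previous close, ranges still overlap
    df['partial_gap'] = (
        ((df['open'] > prev_close) & (df['low'] <= prev_high)) |
        ((df['open'] < prev_close) & (df['high'] >= prev_low))
    )
    # Full gap: no overlap at all with the previous session's range.
    # The first row has no previous session, so both flags are False there.
    df['full_gap'] = (df['low'] > prev_high) | (df['high'] < prev_low)
    return df
```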
This implementation is intentionally minimal and deterministic, making it suitable for both exploratory research and production pipelines.
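The two boolean masks can also be collapsed into a single directional label. This is convenient when one classification label per session has to be stored. The sketch below uses the same column assumptions. The function name `gap_labels` and the label strings are hypothetical.

```python
import numpy as np
import pandas as pd


def gap_labels(df: pd.DataFrame) -> pd.Series:
    """Directional gap label per session (illustrative label strings)."""
    prev_close = df['close'].shift(1)
    prev_high = df['high'].shift(1)
    prev_low = df['low'].shift(1)
    conditions = [
        df['low'] > prev_high,                                 # full gap up
        df['high'] < prev_low,                                 # full gap down
        (df['open'] > prev_close) & (df['low'] <= prev_high),  # partial gap up
        (df['open'] < prev_close) & (df['high'] >= prev_low),  # partial gap down
    ]
    choices = ['full_up', 'full_down', 'partial_up', 'partial_down']
    # np.select takes the first matching condition; sessions matching none
    # (including the first row, which has no previous session) get 'none'
    return pd.Series(np.select(conditions, choices, default='none'), index=df.index)
```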
Impact Across Trading Horizons (Classification Perspective)
Short-Term Horizon
In short horizons, partial gaps dominate statistically and reflect transient liquidity imbalances rather than structural repricing.
Medium-Term Horizon
Medium-term distributions of full gaps often cluster around corporate disclosures and regulatory events.
Long-Term Horizon
Over longer horizons, persistent full gaps often coincide with regime shifts such as index inclusion, ownership change, or capital structure transformation.
These horizon effects are descriptive properties of gap distributions and do not imply any predictive strategy.
Normalization, Volatility Scaling, and Cross-Sectional Gap Comparability
Raw gap magnitudes are not directly comparable across stocks, sectors, or time periods due to differences in price levels, volatility regimes, and liquidity profiles. For statistically meaningful analysis, gap measurements must be normalized and scaled using robust quantitative constructs. This section formalizes those constructs and provides Python-centric implementations suitable for Indian equity data.
Price-Level Normalization of Gaps
Why Absolute Gaps Are Insufficient
A ₹10 gap has vastly different statistical meaning for a ₹100 stock versus a ₹2,000 stock. Therefore, gap magnitude must be expressed relative to an appropriate price anchor to enable cross-sectional analysis.
Close-Price Normalized Gap
Formal Mathematical Definition of Price-Normalized Gap
$$g_t = \frac{O_t - P_{c,t-1}}{P_{c,t-1}}$$
This ratio expresses the gap as a fraction of the previous close (multiply by 100 for a percentage), making it invariant to absolute price levels.
Python Implementation of Price-Normalized Gap
```python
def normalized_gap(df):
    df = df.copy()
    df['gap_norm'] = (df['open'] - df['close'].shift(1)) / df['close'].shift(1)
    return df
```
Volatility-Scaled Gap Metrics
Rationale for Volatility Scaling
Even normalized gaps must be interpreted relative to a stock’s historical volatility. A 1% gap is routine for high-beta stocks but statistically extreme for low-volatility defensive equities.
Realized Volatility Estimation
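Formal Mathematical Definition of Realized Volatility
$$\sigma_t = \sqrt{\frac{1}{n-1}\sum_{i=0}^{n-1}\left(r_{t-i} - \mu_t\right)^2}, \qquad r_t = \ln\frac{P_{c,t}}{P_{c,t-1}}$$
Here $r_t$ is the daily log return and $\mu_t$ is the mean of the $n$ log returns in the lookback window. The $n-1$ denominator matches the sample standard deviation that pandas computes by default.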
Python Implementation of Realized Volatility
```python
import numpy as np

def realized_volatility(df, window=20):
    log_returns = np.log(df['close'] / df['close'].shift(1))
    return log_returns.rolling(window).std()
```
Volatility-Scaled Gap
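Formal Mathematical Definition of Volatility-Scaled Gap
$$\tilde{G}_t = \frac{g_t}{\sigma_{t-1}} = \frac{O_t - P_{c,t-1}}{P_{c,t-1}\,\sigma_{t-1}}$$
Here $g_t$ is the price-normalized gap and $\sigma_{t-1}$ is realized volatility through the previous close. Both terms are therefore dimensionless returns, and no information from after the open is used.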
This measure expresses the gap in units of historical volatility, enabling regime-aware comparisons.
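As a worked illustration with assumed numbers, divide the same price-normalized gap $g_t = +1\%$ by the historical volatility $\sigma$ of two stocks, one at 0.5% and one at 2.5%:

$$\frac{0.01}{0.005} = 2.0 \qquad \text{versus} \qquad \frac{0.01}{0.025} = 0.4$$

The identical 1% gap is a two-volatility move for the defensive stock but well under one volatility unit for the high-beta stock.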
Python Implementation of Volatility-Scaled Gap
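```python
def volatility_scaled_gap(df, window=20):
    df = df.copy()
    # Use the price-normalized gap so numerator and denominator are both returns
    df['gap_norm'] = (df['open'] - df['close'].shift(1)) / df['close'].shift(1)
    # Lag volatility one session: sigma_t includes today's close, unknown at the open
    df['vol'] = realized_volatility(df, window).shift(1)
    df['gap_vol_scaled'] = df['gap_norm'] / df['vol']
    return df
```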
Corporate Action Integrity and Gap Validity
Why Corporate Actions Distort Gap Statistics
Stock splits, bonuses, rights issues, and spin-offs mechanically alter price levels. If unadjusted, these events produce artificial full gaps that do not represent genuine information-driven discontinuities.
Backward Price Adjustment Framework
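Formal Mathematical Definition of Adjusted Price
$$P^{\text{adj}}_t = A_t \cdot P_t, \qquad A_t = \prod_{k:\,\tau_k > t} f_k$$
Here $A_t$ is the cumulative corporate action adjustment factor: the product of the price factors $f_k$ of every corporate action whose ex-date $\tau_k$ falls after date $t$. For example, $f_k = 0.5$ for a 2-for-1 split.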
Python Implementation of Corporate Action Adjustment
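```python
def apply_adjustment(df, factor_col='adj_factor'):
    df = df.copy()  # keep the raw (unadjusted) frame immutable
    price_cols = ['open', 'high', 'low', 'close']
    for col in price_cols:
        # P_adj = A_t * P_t, applied to every OHLC field
        df[col] = df[col] * df[factor_col]
    return df
```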
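The adjustment factor column itself has to be built from corporate action records. The following is a minimal sketch. It assumes a hypothetical input that maps each ex-date to a price factor (0.5 for a 2-for-1 split or a 1:1 bonus) and multiplies every price dated strictly before an ex-date by that event's factor. The function name and input format are illustrative.

```python
import pandas as pd


def backward_adjustment_factors(dates, events):
    """Cumulative backward adjustment factor A_t for each trade date.

    `events` is a hypothetical mapping of ex-date -> price factor f_k.
    Prices dated before an ex-date are scaled by that event's factor,
    so A_t is the product of the factors of all later events.
    """
    trade_dates = pd.to_datetime(pd.Series(list(dates)))
    factors = pd.Series(1.0, index=trade_dates.index)
    for ex_date, factor in events.items():
        factors.loc[trade_dates < pd.Timestamp(ex_date)] *= factor
    return factors.to_numpy()
```

The returned array can populate the `adj_factor` column read by `apply_adjustment`. Keeping the raw prices and the factor table separate means a corrected corporate action record only requires recomputing the factors.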
Fetch → Store → Measure Workflow for Normalized Gaps
Fetch
- Adjusted OHLC data
- Corporate action factor history
- Continuous daily close series
Store
- Time-series optimized storage
- Separate adjustment factor tables
- Immutable raw data layers
Measure
- Normalized gap ratios
- Volatility-scaled gap scores
- Distributional summaries
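Distributional summaries are easiest to compare when computed per symbol. The following is a sketch. It assumes a long-format frame with a `symbol` column and a price-normalized gap column named `gap_norm`. The quantile levels are an arbitrary illustrative choice.

```python
import pandas as pd


def gap_distribution_summary(df: pd.DataFrame, col: str = 'gap_norm') -> pd.DataFrame:
    """Per-symbol moments and tail quantiles of a normalized gap column."""
    grouped = df.dropna(subset=[col]).groupby('symbol')[col]
    summary = grouped.agg(['count', 'mean', 'std', 'skew'])
    # Tail quantiles show the extreme gaps that mean and std understate
    tails = grouped.quantile([0.01, 0.05, 0.95, 0.99]).unstack()
    tails.columns = [f'q{round(q * 100):02d}' for q in tails.columns]
    return summary.join(tails)
```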
Impact Across Trading Horizons
Short-Term Horizon
Volatility-scaled gaps separate statistically extreme overnight moves from routine ones. A raw gap that looks modest in a low-volatility stock can be more unusual than a larger raw gap in a high-volatility stock.
Medium-Term Horizon
Because normalized gaps remove price-level bias, their distributions can be compared directly across sectors.
Long-Term Horizon
Persistent changes in volatility-scaled gap behavior often coincide with liquidity regime shifts and index reclassification.
Event Drivers, Market Microstructure, and Statistical Integrity of Gaps
While partial and full gaps are defined purely through price overlap conditions, their statistical occurrence is deeply influenced by market microstructure and discrete information events. This section examines the structural, institutional, and event-driven factors that shape gap distributions in Indian equities, without extending into trading strategy or post-gap price behavior.
Event-Based Drivers of Gap Formation
Hard Information Events
Hard information events introduce high-certainty changes to fundamental valuation and frequently result in full gaps, as the overnight information invalidates the prior session’s price discovery range.
Common Hard Event Categories
- Quarterly and annual earnings announcements
- Regulatory actions and compliance disclosures
- Mergers, demergers, and capital restructuring
- Credit rating changes
- Judicial rulings affecting operations
Soft and Anticipatory Information Events
Soft information alters sentiment without conclusively resetting valuation. Such events more frequently result in partial gaps, where initial imbalance is corrected during the session.
Common Soft Event Categories
- Global index futures movement
- Overseas ADR/GDR price changes
- Sector-wide macroeconomic news
- Pre-positioning ahead of scheduled announcements
The Role of the NSE Pre-Open Session
Pre-Open Auction Mechanics
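The NSE pre-open session collects orders during a short order-entry window before the 09:15 IST market open. It matches them in a call auction to produce the official opening price. This process aggregates overnight information into a single equilibrium price but does not eliminate discontinuities relative to the previous session.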
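The call auction can be illustrated with a simplified single-price matching rule. The rule picks the candidate price that maximizes executable quantity. It breaks ties first by the smallest buy/sell imbalance, then by proximity to the previous close. This is a teaching sketch loosely modeled on common call-auction practice. It ignores market orders and does not reproduce NSE's exact algorithm or tie-breakers, which are set out in exchange circulars.

```python
def call_auction_price(buys, sells, prev_close):
    """Simplified single-price call auction (illustrative, not NSE's rulebook).

    buys and sells are lists of (limit_price, quantity) limit orders.
    """
    candidates = sorted({p for p, _ in buys} | {p for p, _ in sells})
    if not candidates:
        raise ValueError('no orders to match')
    best_key, best_price, best_volume = None, None, 0
    for price in candidates:
        demand = sum(q for p, q in buys if p >= price)   # buyers willing to pay >= price
        supply = sum(q for p, q in sells if p <= price)  # sellers willing to accept <= price
        matched = min(demand, supply)
        # Maximize matched volume, then minimize imbalance, then distance to prev_close
        key = (-matched, abs(demand - supply), abs(price - prev_close))
        if best_key is None or key < best_key:
            best_key, best_price, best_volume = key, price, matched
    return best_price, best_volume
```

Take buy limits at ₹102, ₹101 and ₹100 for 100, 200 and 300 shares, and sell limits at ₹99, ₹100 and ₹101 for 150, 200 and 250 shares. The sketch selects ₹100, where 350 shares can match.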
Statistical Implications for Gap Classification
From a classification perspective, the pre-open price represents a new equilibrium estimate. Full gaps arise when the opening equilibrium and all subsequent trading lie outside the prior session’s price range. Partial gaps arise when the open is displaced from the previous close but the session’s trading still overlaps the prior range.
Data Quality Filters and Gap Validity
Why Filtering Is Essential
Not all detected gaps represent economically meaningful discontinuities. Structural distortions must be filtered to preserve statistical integrity.
Liquidity-Based Filters
Formal Mathematical Definition of Volume Filter
$$V_t \ge \theta \cdot V_{t,\text{avg}}$$
Here $\theta$ is a minimum liquidity threshold (e.g., 0.3) and $V_{t,\text{avg}}$ is the rolling average volume over the lookback window.
Python Implementation of Liquidity Filter
```python
def liquidity_filter(df, window=30, threshold=0.3):
    avg_vol = df['volume'].rolling(window).mean()
    return df['volume'] >= threshold * avg_vol
```
Trading Suspension and Circuit Filters
Sessions impacted by trading halts, upper/lower circuits, or surveillance actions must be excluded, as price ranges are mechanically constrained.
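The liquidity filter and the halt/circuit exclusions can be combined into one validity mask. The following is a sketch. It assumes boolean `halted` and `circuit_hit` columns derived from exchange metadata; these column names are hypothetical. Treating a session whose high equals its low as price-locked is an added heuristic, not an exchange rule.

```python
import pandas as pd


def session_validity_mask(df: pd.DataFrame, window: int = 30, threshold: float = 0.3) -> pd.Series:
    """True for sessions usable in gap statistics, False otherwise."""
    avg_vol = df['volume'].rolling(window).mean()
    # Before the first full window avg_vol is NaN, so the comparison is False
    liquid = df['volume'] >= threshold * avg_vol
    locked = df['high'] == df['low']  # heuristic for circuit-locked sessions
    halted = df['halted'].astype(bool)
    circuit = df['circuit_hit'].astype(bool)
    return liquid & ~halted & ~circuit & ~locked
```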
Gap Frequency and Distribution Analysis
Empirical Distribution Characteristics
Across Indian equities, partial gaps occur with significantly higher frequency than full gaps. Full gaps show heavier-tailed magnitude distributions and cluster more strongly around event-heavy periods.
Formal Gap Frequency Metric
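Mathematical Definition of Gap Frequency
$$f = \frac{1}{N}\sum_{t=1}^{N} \mathbb{1}\left(G_t \neq 0\right)$$
Here $N$ is the number of sessions that have a preceding session.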
Here, 𝟙(·) denotes the indicator function.
Python Implementation of Gap Frequency
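```python
def gap_frequency(df):
    # Drop the first row (no previous close): NaN != 0 would otherwise count as a gap
    return (df['gap'].dropna() != 0).mean()
```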
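The same indicator-function average applies to each class separately, which is how partial-gap and full-gap frequencies can be compared. The following is a sketch. It assumes a single-symbol frame with boolean `partial_gap` and `full_gap` columns and an optional boolean validity mask aligned with it. The function name is hypothetical.

```python
import pandas as pd


def gap_class_frequencies(df: pd.DataFrame, valid=None) -> pd.Series:
    """Empirical frequency of partial and full gaps for one symbol.

    The first row is dropped because it has no previous session; `valid`
    is an optional boolean Series aligned with df (e.g. a quality mask).
    """
    sample = df.iloc[1:]
    if valid is not None:
        sample = sample[valid.iloc[1:].to_numpy()]
    # Mean of a boolean column = (1/N) * sum of the indicator function
    return sample[['partial_gap', 'full_gap']].mean()
```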
Fetch → Store → Measure Workflow for Event-Aware Gap Analysis
Fetch
- Adjusted daily OHLC data
- Corporate action records
- Volume and turnover data
- Trading halt and circuit metadata
Store
- Partitioned time-series databases
- Event metadata tables
- Quality-flag columns
Measure
- Filtered gap counts
- Event-conditioned gap distributions
- Sector-level aggregation
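Event-conditioned distributions come from joining gap flags to event metadata on symbol and date. The following is a sketch. It assumes a gap frame with `symbol`, `date`, `partial_gap` and `full_gap` columns, and a hypothetical event table with `symbol`, `date` and `category`. Sessions without a matching event are grouped as `no_event`.

```python
import pandas as pd


def event_conditioned_gap_share(gaps: pd.DataFrame, events: pd.DataFrame) -> pd.DataFrame:
    """Share of partial and full gaps within each event category."""
    merged = gaps.merge(events[['symbol', 'date', 'category']],
                        on=['symbol', 'date'], how='left')
    # A session with several events appears once per event after the join
    merged['category'] = merged['category'].fillna('no_event')
    return merged.groupby('category')[['partial_gap', 'full_gap']].mean()
```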
Impact Across Trading Horizons
Short-Term Horizon
In short horizons, gap statistics are dominated by event density and overnight information flow, with strong sensitivity to market-wide news.
Medium-Term Horizon
Over medium horizons, gap frequency stabilizes into sector-specific signatures reflecting disclosure intensity and regulatory exposure.
Long-Term Horizon
Long-term gap distributions capture structural changes such as index inclusion, ownership transitions, and liquidity regime shifts.
These effects describe the statistical behavior of gaps rather than actionable signals.
Advanced Quantitative Metrics, Data Architecture, and Production-Grade Integration
This final section completes the comprehensive framework by introducing advanced quantitative measures, formal mathematical definitions, production-grade data architectures, curated data sourcing methodologies, and a consolidated view of all Python libraries applicable to statistical gap classification in Indian equity markets.
Advanced Gap Magnitude Normalisation Metrics
Gap Percentage Normalisation
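Raw gap magnitudes must be normalised before stocks trading at different price levels can be compared. Percentage normalisation removes the price-level effect, while the volatility-adjusted metrics further on also account for differing volatility regimes.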
Formal Mathematical Definition of Gap Percentage
$$\text{Gap\%}_t = \frac{|O_t - P_{c,t-1}|}{P_{c,t-1}} \times 100$$
Python Implementation of Gap Percentage
```python
def gap_percentage(open_price, prev_close):
    return abs(open_price - prev_close) / prev_close * 100
```
Gap-to-Range Ratio
This metric expresses the opening gap relative to the previous session’s total price range.
Formal Mathematical Definition of Gap-to-Range Ratio
$$\text{GRR}_t = \frac{|O_t - P_{c,t-1}|}{H_{t-1} - L_{t-1}}$$
Python Implementation of Gap-to-Range Ratio
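```python
def gap_to_range(open_price, prev_close, prev_high, prev_low):
    prev_range = prev_high - prev_low  # zero for circuit-locked sessions
    return abs(open_price - prev_close) / prev_range if prev_range else float('nan')
```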
Volatility-Adjusted Gap Metrics
Average True Range Normalisation
ATR-based normalisation allows gap magnitude to be interpreted in the context of prevailing volatility.
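Formal Mathematical Definition of Average True Range
$$TR_t = \max\left(H_t - L_t,\; |H_t - P_{c,t-1}|,\; |L_t - P_{c,t-1}|\right), \qquad ATR_t = \frac{1}{n}\sum_{i=0}^{n-1} TR_{t-i}$$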
Python Implementation of ATR
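```python
import pandas as pd

def compute_atr(df, window=14):
    prev_close = df['close'].shift(1)
    high_low = df['high'] - df['low']
    high_close = (df['high'] - prev_close).abs()
    low_close = (df['low'] - prev_close).abs()
    # True range: the largest of the three candidate ranges
    true_range = pd.concat([high_low, high_close, low_close], axis=1).max(axis=1)
    # Simple moving average of true range over `window` sessions
    return true_range.rolling(window).mean()
```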
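The implementation above averages true range with a simple rolling mean. Wilder's original ATR instead uses recursive smoothing, which is an exponential average with $\alpha = 1/n$. Below is a sketch of that variant using pandas `ewm`, assuming lowercase `high`, `low` and `close` columns. Note that `ewm` seeds from the first true range rather than from Wilder's initial simple average; the difference decays geometrically.

```python
import pandas as pd


def compute_atr_wilder(df: pd.DataFrame, window: int = 14) -> pd.Series:
    """ATR with Wilder-style recursive smoothing (alpha = 1 / window)."""
    prev_close = df['close'].shift(1)
    true_range = pd.concat([
        df['high'] - df['low'],
        (df['high'] - prev_close).abs(),
        (df['low'] - prev_close).abs(),
    ], axis=1).max(axis=1)  # the first row falls back to high - low
    return true_range.ewm(alpha=1 / window, adjust=False, min_periods=window).mean()
```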
Normalized Gap Z-Score
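Formal Mathematical Definition of Gap Z-Score
$$Z_t = \frac{O_t - P_{c,t-1}}{ATR_{t-1}}$$
Despite its name, the score is not mean-centred. It expresses the signed gap in units of the previous session's ATR, which is known before the open.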
Python Implementation of Gap Z-Score
```python
def gap_zscore(open_price, prev_close, prev_atr):
    return (open_price - prev_close) / prev_atr
```
Production-Grade Fetch → Store → Measure Architecture
Data Fetch Layer
- Exchange bhavcopies (daily OHLCV)
- Corporate action adjustment files
- Index composition history
- Trading halt and surveillance indicators
- Macro-event calendars
Data Store Layer
- Columnar storage using Parquet
- Symbol/year partitioning
- Immutable raw tables
- Derived feature tables for gap metrics
- Metadata flags for data quality
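Symbol/year partitioning can be prototyped without a storage engine by computing Hive-style partition keys, a directory layout that pyarrow and DuckDB can read as partition columns. The sketch below assumes a long-format frame with `symbol` and `date` columns. The function name is hypothetical. Writing each partition, for example with `DataFrame.to_parquet`, is left to the caller and requires a Parquet engine such as pyarrow.

```python
import pandas as pd


def partition_frames(df: pd.DataFrame) -> dict:
    """Split a long OHLCV frame into symbol/year partitions.

    Keys are Hive-style relative directory paths (symbol=.../year=...).
    """
    years = pd.to_datetime(df['date']).dt.year.rename('year')
    partitions = {}
    for (symbol, year), frame in df.groupby([df['symbol'], years]):
        partitions[f'symbol={symbol}/year={year}'] = frame.reset_index(drop=True)
    return partitions
```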
Data Measure Layer
- Gap classification labels
- Normalized gap magnitudes
- Volatility-adjusted ratios
- Event-conditioned aggregates
- Sector and index-level rollups
Database Structure and Storage Design
Core Time-Series Schema
- Trade date and symbol identifier (together forming the composite primary key)
- Open, High, Low, Close, Volume
- Adjusted price fields
- Corporate action factor
Gap Metadata Schema
- Gap classification label
- Gap magnitude
- Gap percentage
- Gap-to-range ratio
- ATR-normalized gap score
- Liquidity and validity flags
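The gap metadata fields can be derived from a single-symbol OHLC frame in one pass. The following is a sketch. The column names mirror the schema bullets, while the function name `build_gap_features`, the label strings, and the choice of a one-day-lagged simple-average ATR are assumptions for illustration. Liquidity and validity flags would come from a separate filtering step.

```python
import numpy as np
import pandas as pd


def build_gap_features(df: pd.DataFrame, atr_window: int = 14) -> pd.DataFrame:
    """One gap-metadata row per session for a single, date-sorted symbol."""
    out = pd.DataFrame(index=df.index)
    prev_close = df['close'].shift(1)
    prev_high = df['high'].shift(1)
    prev_low = df['low'].shift(1)

    full = (df['low'] > prev_high) | (df['high'] < prev_low)
    partial = (((df['open'] > prev_close) & (df['low'] <= prev_high)) |
               ((df['open'] < prev_close) & (df['high'] >= prev_low)))
    out['gap_label'] = np.select([full, partial], ['full', 'partial'], default='none')

    out['gap_magnitude'] = df['open'] - prev_close
    out['gap_pct'] = out['gap_magnitude'].abs() / prev_close * 100
    prev_range = (prev_high - prev_low).replace(0, np.nan)  # avoid division by zero
    out['gap_to_range'] = out['gap_magnitude'].abs() / prev_range

    true_range = pd.concat([df['high'] - df['low'],
                            (df['high'] - prev_close).abs(),
                            (df['low'] - prev_close).abs()], axis=1).max(axis=1)
    atr_prev = true_range.rolling(atr_window).mean().shift(1)  # known before the open
    out['atr_gap_score'] = out['gap_magnitude'] / atr_prev
    return out
```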
Python Libraries Used and Their Roles
Core Data Libraries
- pandas – time-series manipulation, rolling windows, joins
- numpy – vectorized numerical computation
- polars – high-performance columnar analytics
Data Storage and Performance
- pyarrow – Parquet IO and memory efficiency
- duckdb – analytical SQL over Parquet
Market Data Access
- yfinance – demonstration-grade OHLC fetch
- exchange-native feeds – production-grade ingestion
Visualization and Diagnostics
- matplotlib – distribution and density plots
- seaborn – exploratory visualization
Curated Data Sourcing Methodologies
- Primary exchange-distributed daily bhavcopies
- Separate ingestion of corporate actions to avoid false gaps
- Pre-open and auction metadata for open price integrity
- Calendar-aligned macroeconomic event datasets
News Trigger Classification Framework
- Macro-economic policy events
- Corporate disclosures and earnings
- Regulatory and compliance actions
- Global market and currency shocks
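A rule-based first pass is one simple way to map headlines onto these four categories before any manual review. The keyword lists and category keys below are hypothetical placeholders rather than a curated taxonomy. Dictionary order decides precedence, so the first matching category wins.

```python
import re

# Hypothetical keyword map; a production taxonomy would be curated and versioned
TRIGGER_KEYWORDS = {
    'macro_policy': ['repo rate', 'rbi', 'monetary policy', 'budget', 'gst council', 'inflation'],
    'corporate_disclosure': ['quarterly results', 'earnings', 'dividend', 'merger',
                             'acquisition', 'board meeting'],
    'regulatory_action': ['sebi', 'penalty', 'show cause', 'ban', 'surveillance', 'compliance'],
    'global_shock': ['fed', 'crude', 'dollar', 'rupee', 'nasdaq', 'treasury yield'],
}


def classify_trigger(headline: str) -> str:
    """Return the first category with a whole-word keyword match, else 'unclassified'."""
    text = headline.lower()
    for category, keywords in TRIGGER_KEYWORDS.items():
        if any(re.search(rf'\b{re.escape(k)}\b', text) for k in keywords):
            return category
    return 'unclassified'
```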
Multi-Horizon Statistical Impact Summary
Short-Term Horizon
Gap statistics inform the measurement of volatility clustering, liquidity discontinuity, and microstructure noise.
Medium-Term Horizon
Aggregated gap metrics help identify regime shifts, disclosure intensity, and sectoral sensitivity.
Long-Term Horizon
Persistent full-gap patterns contribute to structural break analysis and long-horizon volatility modelling.
Conclusion and Industry Application
This four-part article established a complete, production-ready, Python-centric framework for the statistical classification of partial and full gaps in Indian equity markets. By prioritizing formal definitions, data integrity, and reproducible workflows, it enables robust quantitative research without reliance on subjective interpretation.
For organizations seeking enterprise-grade market data engineering, analytics pipelines, or quantitative research systems built in Python, TheUniBit delivers scalable solutions aligned with the methodologies described in this article.
