Partial Gaps vs Full Gaps: Statistical Classification

This Python-centric guide explains how partial and full price gaps are formally defined and statistically classified for NSE- and BSE-listed equities. It covers exchange reference prices, gap normalization and volatility scaling, data engineering workflows, and data-quality validation, helping developers build reproducible gap analytics on reliable Indian market price data.

Table Of Contents
  1. Conceptual Foundations of Price Gaps
  2. Formal Statistical Definition of Gaps
  3. Classification Logic: Partial Gaps vs Full Gaps
  4. Python-Centric Gap Detection Algorithms
  5. Impact Across Trading Horizons (Classification Perspective)
  6. Normalization, Volatility Scaling, and Cross-Sectional Gap Comparability
  7. Price-Level Normalization of Gaps
  8. Volatility-Scaled Gap Metrics
  9. Corporate Action Integrity and Gap Validity
  10. Fetch → Store → Measure Workflow for Normalized Gaps
  11. Impact Across Trading Horizons
  12. Event Drivers, Market Microstructure, and Statistical Integrity of Gaps
  13. Event-Based Drivers of Gap Formation
  14. The Role of the NSE Pre-Open Session
  15. Data Quality Filters and Gap Validity
  16. Gap Frequency and Distribution Analysis
  17. Fetch → Store → Measure Workflow for Event-Aware Gap Analysis
  18. Impact Across Trading Horizons
  19. Advanced Quantitative Metrics, Data Architecture, and Production-Grade Integration
  20. Advanced Gap Magnitude Normalisation Metrics
  21. Volatility-Adjusted Gap Metrics
  22. Production-Grade Fetch → Store → Measure Architecture
  23. Database Structure and Storage Design
  24. Python Libraries Used and Their Roles
  25. Curated Data Sourcing Methodologies
  26. News Trigger Classification Framework
  27. Multi-Horizon Statistical Impact Summary
  28. Conclusion and Industry Application

Price gaps are among the most structurally important discontinuities in Indian equity price series. Unlike indicators derived from continuous intraday movement, gaps originate from discrete information arrival and market microstructure constraints between sessions. This article presents a rigorous, Python-centric, data-first framework for the statistical classification of partial gaps and full gaps, explicitly excluding strategy logic or post-gap price behavior.

The focus is on formal definitions, measurable conditions, reproducible algorithms, and data engineering workflows suitable for institutional-grade research and production analytics within the Indian stock market context.

Conceptual Foundations of Price Gaps

What Constitutes a Price Gap in Indian Markets

In Indian equity markets, a price gap arises when the official opening price of a trading session is discontinuous relative to the previous session’s trading range. This discontinuity reflects overnight information assimilation under constrained liquidity and auction-based opening mechanisms.

Unlike intraday volatility, gaps are not emergent properties of continuous trading. They are boundary phenomena occurring at the interface of two discrete trading sessions and must therefore be defined using exchange-published reference prices only.

Exchange Price References Used in Gap Measurement

Previous Session References

  • Previous session close price
  • Previous session intraday high
  • Previous session intraday low

Current Session References

  • Official opening price (post pre-open auction)
  • Current session intraday high
  • Current session intraday low

No intraday tick, VWAP, or midpoint prices are admissible for gap classification, as gaps are strictly inter-session constructs.
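
As a rough illustration of how the previous-session references listed above can be attached to each session, the following sketch uses pandas. The column names (`symbol`, `date`, `open`, `high`, `low`, `close`), the helper name `add_session_references`, and the sample values are assumptions made for this example, not an exchange format. The shift is done per symbol so that one stock's first session never borrows another stock's close.

```python
import pandas as pd

def add_session_references(df: pd.DataFrame) -> pd.DataFrame:
    """Attach previous-session close/high/low to every row, per symbol."""
    out = df.sort_values(["symbol", "date"]).copy()
    # Shift within each symbol so references never cross symbol boundaries
    prev = out.groupby("symbol")[["close", "high", "low"]].shift(1)
    out["prev_close"] = prev["close"]
    out["prev_high"] = prev["high"]
    out["prev_low"] = prev["low"]
    return out

# Hypothetical two-symbol sample; the values are illustrative only
sample = pd.DataFrame({
    "symbol": ["AAA", "AAA", "BBB", "BBB"],
    "date": pd.to_datetime(["2024-01-01", "2024-01-02",
                            "2024-01-01", "2024-01-02"]),
    "open": [100.0, 103.0, 50.0, 49.0],
    "high": [102.0, 105.0, 51.0, 49.5],
    "low": [99.0, 102.5, 49.5, 48.0],
    "close": [101.0, 104.0, 50.5, 48.5],
})
refs = add_session_references(sample)
print(refs[["symbol", "date", "open", "prev_close", "prev_high", "prev_low"]])
```

The first session of each symbol has no previous references (NaN), so it can never be classified as a gap.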

Formal Statistical Definition of Gaps

Notation and Price Variables

Let the following variables be defined for a given equity:

  • $P_{c,t-1}$: Previous session closing price
  • $H_{t-1}$: Previous session high
  • $L_{t-1}$: Previous session low
  • $O_t$: Current session official opening price
  • $H_t$: Current session high
  • $L_t$: Current session low

Gap Magnitude Definition

Formal Mathematical Definition of Gap Magnitude


$$G_t = O_t - P_{c,t-1}$$


This raw gap magnitude is directional and forms the base variable for all subsequent classification logic.

Absolute Gap Size

Absolute Gap Formula


$$|G_t| = |O_t - P_{c,t-1}|$$


Absolute magnitude is used for normalization and cross-sectional comparison.

Classification Logic: Partial Gaps vs Full Gaps

Definition of Partial Gaps

A partial gap occurs when the current session opens above or below the previous close, but the current session’s trading range still overlaps the previous session’s range.

Mathematical Conditions for Partial Gaps


Upward partial gap:

$$O_t > P_{c,t-1} \;\land\; L_t \le H_{t-1}$$

Downward partial gap:

$$O_t < P_{c,t-1} \;\land\; H_t \ge L_{t-1}$$


Partial gaps indicate overnight imbalance that is at least partially resolved intraday.

Definition of Full Gaps

A full gap occurs when the current session’s entire trading range does not overlap with the previous session’s range.

Mathematical Conditions for Full Gaps


Upward full gap:

$$L_t > H_{t-1}$$

Downward full gap:

$$H_t < L_{t-1}$$


Full gaps represent a complete reset of the price discovery region between sessions.
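
The partial and full conditions above can be checked on single sessions with a small scalar function. This is a minimal sketch: the function name `classify_session`, the label strings, and the sample prices are assumptions introduced for illustration. Testing the full-gap conditions first works because, once the ranges are known to overlap, the sign of the opening gap alone decides between the two partial cases.

```python
def classify_session(prev_close, prev_high, prev_low, open_, high, low):
    """Return one label per session using the overlap conditions above."""
    if low > prev_high:        # whole range sits above the prior range
        return "full_up"
    if high < prev_low:        # whole range sits below the prior range
        return "full_down"
    # From here on the two ranges overlap
    if open_ > prev_close:
        return "partial_up"
    if open_ < prev_close:
        return "partial_down"
    return "none"

# Previous session (hypothetical): close 100, high 102, low 98
examples = {
    "open 104, range 103-106": classify_session(100, 102, 98, 104, 106, 103),
    "open 101, range 100.5-103": classify_session(100, 102, 98, 101, 103, 100.5),
    "open 99, range 97-100": classify_session(100, 102, 98, 99, 100, 97),
    "open 97, range 95-97.5": classify_session(100, 102, 98, 97, 97.5, 95),
    "open 100, range 99-101": classify_session(100, 102, 98, 100, 101, 99),
}
for description, label in examples.items():
    print(f"{description}: {label}")
```

Under these definitions a session receives at most one label, so the partial and full classes are mutually exclusive.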

Python-Centric Gap Detection Algorithms

Fetch → Store → Measure Workflow

Gap classification must be embedded within a deterministic data pipeline to ensure reproducibility and auditability.

Fetch

  • Daily OHLC data from NSE or BSE
  • Pre-open adjusted official opening price
  • Corporate-action-adjusted historical series

Store

  • Columnar storage (Parquet / DuckDB)
  • Date-indexed price frames
  • Corporate action factor tables

Measure

  • Vectorized gap magnitude computation
  • Boolean classification masks
  • Cross-sectional normalization

Python Implementation: Core Gap Classification

Python Algorithm for Partial and Full Gap Detection
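import pandas as pd

def classify_gaps(df):
    """Flag partial and full gaps on a single-symbol, date-sorted OHLC frame."""
    df = df.copy()
    # Previous-session references (NaN on the first row, so it is never flagged)
    prev_close = df['close'].shift(1)
    prev_high = df['high'].shift(1)
    prev_low = df['low'].shift(1)

    # Signed raw gap: G_t = O_t - P_{c,t-1}
    df['gap'] = df['open'] - prev_close

    # Partial gap: open away from the previous close, ranges still overlap
    df['partial_gap'] = (
        ((df['open'] > prev_close) & (df['low'] <= prev_high)) |
        ((df['open'] < prev_close) & (df['high'] >= prev_low))
    )

    # Full gap: the two sessions' ranges do not overlap at all
    df['full_gap'] = (
        (df['low'] > prev_high) |
        (df['high'] < prev_low)
    )

    return df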

This implementation is intentionally minimal and deterministic, making it suitable for both exploratory research and production pipelines.

Impact Across Trading Horizons (Classification Perspective)

Short-Term Horizon

In short horizons, partial gaps dominate statistically and reflect transient liquidity imbalances rather than structural repricing.

Medium-Term Horizon

Medium-term distributions of full gaps often cluster around corporate disclosures and regulatory events.

Long-Term Horizon

Over longer horizons, persistent full gaps often coincide with regime shifts such as index inclusion, ownership change, or capital structure transformation.

These horizon effects are descriptive properties of gap distributions and do not imply any predictive strategy.

Normalization, Volatility Scaling, and Cross-Sectional Gap Comparability

Raw gap magnitudes are not directly comparable across stocks, sectors, or time periods due to differences in price levels, volatility regimes, and liquidity profiles. For statistically meaningful analysis, gap measurements must be normalized and scaled using robust quantitative constructs. This section formalizes those constructs and provides Python-centric implementations suitable for Indian equity data.

Price-Level Normalization of Gaps

Why Absolute Gaps Are Insufficient

A ₹10 gap has vastly different statistical meaning for a ₹100 stock versus a ₹2,000 stock. Therefore, gap magnitude must be expressed relative to an appropriate price anchor to enable cross-sectional analysis.

Close-Price Normalized Gap

Formal Mathematical Definition of Price-Normalized Gap


$$GN_t = \frac{O_t - P_{c,t-1}}{P_{c,t-1}}$$


This ratio expresses the gap as a fraction of the previous close (multiply by 100 for a percentage), making it invariant to absolute price levels.

Python Implementation of Price-Normalized Gap
def normalized_gap(df):
    df = df.copy()
    df['gap_norm'] = (df['open'] - df['close'].shift(1)) / df['close'].shift(1)
    return df
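
To make the ₹10 example concrete, the short sketch below applies the same ratio to two hypothetical stocks. The helper name `normalized_gap_value` and the prices are assumptions chosen only to mirror the ₹100 and ₹2,000 cases mentioned earlier.

```python
def normalized_gap_value(open_price, prev_close):
    """Gap as a fraction of the previous close."""
    return (open_price - prev_close) / prev_close

# The same Rs 10 raw gap on two hypothetical stocks
low_priced = normalized_gap_value(110.0, 100.0)      # previous close Rs 100
high_priced = normalized_gap_value(2010.0, 2000.0)   # previous close Rs 2,000

print(f"Rs 100 stock:   {low_priced:.2%}")
print(f"Rs 2,000 stock: {high_priced:.2%}")
```

The identical rupee move is a 10% gap for the first stock and a 0.5% gap for the second, which is why the raw magnitude cannot be compared across price levels.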

Volatility-Scaled Gap Metrics

Rationale for Volatility Scaling

Even normalized gaps must be interpreted relative to a stock’s historical volatility. A 1% gap is routine for high-beta stocks but statistically extreme for low-volatility defensive equities.

Realized Volatility Estimation

Formal Mathematical Definition of Realized Volatility


$$\sigma_{t,n} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(r_{t-i} - \mu\right)^{2}}$$




Where r represents daily log returns and μ is their mean over the lookback window.

Python Implementation of Realized Volatility
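import numpy as np

def realized_volatility(df, window=20):
    # Daily log returns: r_t = ln(C_t / C_{t-1})
    log_returns = np.log(df['close'] / df['close'].shift(1))
    # ddof=0 matches the 1/n formula; shift(1) makes the value at t use
    # returns t-1 ... t-n, so the session being measured is excluded
    return log_returns.rolling(window).std(ddof=0).shift(1)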

Volatility-Scaled Gap

Formal Mathematical Definition of Volatility-Scaled Gap


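$$GV_t = \frac{GN_t}{\sigma_{t,n}}$$

Because the volatility is measured in return units, the numerator is the price-normalized gap GN_t rather than the raw rupee gap G_t; dividing a rupee amount by a return volatility would leave the result dependent on the price level. This measure expresses the gap in units of historical volatility, enabling regime-aware comparisons.

Python Implementation of Volatility-Scaled Gap
def volatility_scaled_gap(df, window=20):
    df = df.copy()
    prev_close = df['close'].shift(1)
    df['gap'] = df['open'] - prev_close
    df['gap_norm'] = df['gap'] / prev_close
    df['vol'] = realized_volatility(df, window)
    # Dimensionless ratio: fractional gap divided by return volatility
    df['gap_vol_scaled'] = df['gap_norm'] / df['vol']
    return df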
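
The point that a 1% gap is routine for one stock and extreme for another can be illustrated with two synthetic price histories. This is a standard-library sketch under stated assumptions: the closing prices are invented, the gap is normalized by the previous close before scaling, and volatility is the population standard deviation of log returns over the sessions before the gap.

```python
import math
import statistics

def lagged_volatility(closes):
    """Population std of daily log returns over the supplied closes."""
    returns = [math.log(b / a) for a, b in zip(closes, closes[1:])]
    return statistics.pstdev(returns)

def scaled_gap(open_price, prior_closes):
    """Fractional opening gap divided by volatility of the prior sessions."""
    prev_close = prior_closes[-1]
    gap_norm = (open_price - prev_close) / prev_close
    return gap_norm / lagged_volatility(prior_closes)

# Two invented histories that both end at 100.4
calm = [100.0, 100.2, 100.1, 100.3, 100.2, 100.4]
choppy = [100.0, 103.0, 99.0, 104.0, 98.0, 100.4]

# Both stocks open 1% above their previous close
calm_score = scaled_gap(101.404, calm)
choppy_score = scaled_gap(101.404, choppy)
print(f"calm stock:   {calm_score:.2f} volatility units")
print(f"choppy stock: {choppy_score:.2f} volatility units")
```

The calm series produces the much larger score, even though the percentage gap is identical in both cases.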

Corporate Action Integrity and Gap Validity

Why Corporate Actions Distort Gap Statistics

Stock splits, bonuses, rights issues, and spin-offs mechanically alter price levels. If unadjusted, these events produce artificial full gaps that do not represent genuine information-driven discontinuities.

Backward Price Adjustment Framework

Formal Mathematical Definition of Adjusted Price


$$P_{t,\text{adj}} = P_t \times A_t$$


Where $A_t$ is the cumulative corporate action adjustment factor.

Python Implementation of Corporate Action Adjustment
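def apply_adjustment(df, factor_col='adj_factor'):
    # Work on a copy so the raw (unadjusted) frame is left untouched
    df = df.copy()
    price_cols = ['open', 'high', 'low', 'close']
    # Multiply every price column by that row's cumulative factor
    df[price_cols] = df[price_cols].mul(df[factor_col], axis=0)
    return df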
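
One possible way to build the cumulative factor used above is sketched below. It assumes a backward-adjustment convention in which every session before an ex-date is multiplied by that event's price multiplier (for example 0.5 for a 2-for-1 split). The function name `cumulative_adj_factor`, the event dictionary format, and the sample prices are hypothetical.

```python
import pandas as pd

def cumulative_adj_factor(dates, events):
    """Cumulative factor per session.

    `events` maps an ex-date to the multiplier applied to all earlier
    sessions (assumed convention, e.g. 0.5 for a 2-for-1 split).
    """
    factor = pd.Series(1.0, index=dates)
    for ex_date, multiplier in events.items():
        # Only sessions strictly before the ex-date are rescaled
        factor.loc[dates < pd.Timestamp(ex_date)] *= multiplier
    return factor

# Hypothetical closes around a 2-for-1 split with ex-date 2024-03-05
dates = pd.to_datetime(["2024-03-01", "2024-03-04", "2024-03-05", "2024-03-06"])
raw_close = pd.Series([200.0, 202.0, 101.0, 102.0], index=dates)

factor = cumulative_adj_factor(dates, {"2024-03-05": 0.5})
adj_close = raw_close * factor
print(pd.DataFrame({"raw": raw_close, "factor": factor, "adjusted": adj_close}))
```

On the raw series the ex-date looks like a drop from 202 to 101, which would be flagged as a large downward gap; on the adjusted series the close is unchanged at 101.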

Fetch → Store → Measure Workflow for Normalized Gaps

Fetch

  • Adjusted OHLC data
  • Corporate action factor history
  • Continuous daily close series

Store

  • Time-series optimized storage
  • Separate adjustment factor tables
  • Immutable raw data layers

Measure

  • Normalized gap ratios
  • Volatility-scaled gap scores
  • Distributional summaries

Impact Across Trading Horizons

Short-Term Horizon

Volatility-scaled gaps highlight statistically extreme overnight moves that raw gaps often obscure in high-volatility stocks.

Medium-Term Horizon

Normalized gap distributions stabilize across sectors, allowing structural comparison without price-level bias.

Long-Term Horizon

Persistent changes in volatility-scaled gap behavior often coincide with liquidity regime shifts and index reclassification.

Event Drivers, Market Microstructure, and Statistical Integrity of Gaps

While partial and full gaps are defined purely through price overlap conditions, their statistical occurrence is deeply influenced by market microstructure and discrete information events. This section examines the structural, institutional, and event-driven factors that shape gap distributions in Indian equities, without extending into trading strategy or post-gap price behavior.

Event-Based Drivers of Gap Formation

Hard Information Events

Hard information events introduce high-certainty changes to fundamental valuation and frequently result in full gaps, as the overnight information invalidates the prior session’s price discovery range.

Common Hard Event Categories

  • Quarterly and annual earnings announcements
  • Regulatory actions and compliance disclosures
  • Mergers, demergers, and capital restructuring
  • Credit rating changes
  • Judicial rulings affecting operations

Soft and Anticipatory Information Events

Soft information alters sentiment without conclusively resetting valuation. Such events more frequently result in partial gaps, where initial imbalance is corrected during the session.

Common Soft Event Categories

  • Global index futures movement
  • Overseas ADR/GDR price changes
  • Sector-wide macroeconomic news
  • Pre-positioning ahead of scheduled announcements

The Role of the NSE Pre-Open Session

Pre-Open Auction Mechanics

The NSE pre-open session aggregates overnight orders into a call auction, producing the official opening price. This process compresses information asymmetry but does not eliminate price discontinuities.

Statistical Implications for Gap Classification

From a classification perspective, the pre-open price represents a new equilibrium estimate. Full gaps reflect equilibria entirely outside the prior session’s feasible price region, while partial gaps indicate equilibria near the boundary.

Data Quality Filters and Gap Validity

Why Filtering Is Essential

Not all detected gaps represent economically meaningful discontinuities. Structural distortions must be filtered to preserve statistical integrity.

Liquidity-Based Filters

Formal Mathematical Definition of Volume Filter


$$V_t \ge \theta \times V_{t,\text{avg}}$$


Where $\theta$ is a minimum liquidity threshold (e.g., 0.3) and $V_{t,\text{avg}}$ is the rolling average volume.

Python Implementation of Liquidity Filter
def liquidity_filter(df, window=30, threshold=0.3):
    avg_vol = df['volume'].rolling(window).mean()
    return df['volume'] >= threshold * avg_vol

Trading Suspension and Circuit Filters

Sessions impacted by trading halts, upper/lower circuits, or surveillance actions must be excluded, as price ranges are mechanically constrained.
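
Where exchange-supplied halt or circuit flags are not available, a crude heuristic can stand in for them. The sketch below is an assumption-laden approximation, not an exchange rule: it treats a session as constrained when it has no volume or when the high equals the low (price locked for the whole session), and it invalidates a gap when either of the two sessions involved is constrained. The function names and sample values are hypothetical.

```python
import pandas as pd

def constrained_session_mask(df: pd.DataFrame) -> pd.Series:
    """Heuristic flag for sessions with mechanically constrained ranges."""
    no_trading = df["volume"].fillna(0) <= 0
    locked_range = df["high"] == df["low"]
    return no_trading | locked_range

def valid_gap_mask(df: pd.DataFrame) -> pd.Series:
    """A gap at t is usable only if sessions t and t-1 are unconstrained."""
    constrained = constrained_session_mask(df)
    # Integer round-trip keeps a clean boolean dtype after shifting
    prev_constrained = constrained.astype(int).shift(1, fill_value=0).astype(bool)
    return ~(constrained | prev_constrained)

# Hypothetical sessions: row 1 is locked at one price, row 4 has no volume
sessions = pd.DataFrame({
    "high": [102.0, 105.0, 105.0, 108.0, 110.0],
    "low": [99.0, 105.0, 103.0, 104.0, 107.0],
    "volume": [1000, 50, 900, 1200, 0],
})
sessions["gap_valid"] = valid_gap_mask(sessions)
print(sessions)
```

In production, flags published by the exchange should take precedence over a heuristic of this kind.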

Gap Frequency and Distribution Analysis

Empirical Distribution Characteristics

Across Indian equities, partial gaps occur with significantly higher frequency than full gaps. Full gaps exhibit heavier tails and stronger clustering around event-heavy periods.

Formal Gap Frequency Metric

Mathematical Definition of Gap Frequency


$$FG = \frac{\sum_{t=1}^{T} \mathbf{1}\left(\text{gap}_t \neq 0\right)}{T}$$


Here, $\mathbf{1}(\cdot)$ denotes the indicator function.

Python Implementation of Gap Frequency
def gap_frequency(df):
    # Drop NaN gaps (no previous close), which would otherwise count as non-zero
    gaps = df['gap'].dropna()
    return (gaps != 0).mean()

Fetch → Store → Measure Workflow for Event-Aware Gap Analysis

Fetch

  • Adjusted daily OHLC data
  • Corporate action records
  • Volume and turnover data
  • Trading halt and circuit metadata

Store

  • Partitioned time-series databases
  • Event metadata tables
  • Quality-flag columns

Measure

  • Filtered gap counts
  • Event-conditioned gap distributions
  • Sector-level aggregation
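
The Measure items above could be computed with a single grouped aggregation. The sketch below assumes a frame that already carries boolean `full_gap`, `partial_gap`, and `is_valid` columns plus hypothetical `sector` and `event_type` labels; all names and sample rows are invented for illustration.

```python
import pandas as pd

def gap_summary(df: pd.DataFrame, by: str) -> pd.DataFrame:
    """Filtered gap counts and rates, grouped by one label column."""
    valid = df[df["is_valid"]]          # apply quality filters first
    return valid.groupby(by).agg(
        sessions=("full_gap", "size"),
        full_gap_rate=("full_gap", "mean"),
        partial_gap_rate=("partial_gap", "mean"),
    )

# Hypothetical classified sessions
classified = pd.DataFrame({
    "sector": ["IT", "IT", "IT", "BANK", "BANK", "BANK"],
    "event_type": ["earnings", "none", "none", "earnings", "earnings", "none"],
    "is_valid": [True, True, True, True, False, True],
    "full_gap": [True, False, False, True, True, False],
    "partial_gap": [False, True, False, False, False, True],
})

by_event = gap_summary(classified, "event_type")
by_sector = gap_summary(classified, "sector")
print(by_event)
print(by_sector)
```

Filtering before grouping keeps invalid sessions out of both the numerator and the denominator of each rate.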

Impact Across Trading Horizons

Short-Term Horizon

In short horizons, gap statistics are dominated by event density and overnight information flow, with strong sensitivity to market-wide news.

Medium-Term Horizon

Over medium horizons, gap frequency stabilizes into sector-specific signatures reflecting disclosure intensity and regulatory exposure.

Long-Term Horizon

Long-term gap distributions capture structural changes such as index inclusion, ownership transitions, and liquidity regime shifts.

These effects describe the statistical behavior of gaps rather than actionable signals.

Advanced Quantitative Metrics, Data Architecture, and Production-Grade Integration

This final section completes the comprehensive framework by introducing advanced quantitative measures, formal mathematical definitions, production-grade data architectures, curated data sourcing methodologies, and a consolidated view of all Python libraries applicable to statistical gap classification in Indian equity markets.

Advanced Gap Magnitude Normalisation Metrics

Gap Percentage Normalisation

Raw gap magnitudes must be normalised to allow comparison across stocks with different price levels and volatility regimes.

Formal Mathematical Definition of Gap Percentage


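$$G^{\text{pct}}_t = \frac{|O_t - C_{t-1}|}{C_{t-1}} \times 100$$

Here $C_{t-1}$ is the previous session close, written $P_{c,t-1}$ in the earlier sections.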


Python Implementation of Gap Percentage
def gap_percentage(open_price, prev_close):
    return abs(open_price - prev_close) / prev_close * 100

Gap-to-Range Ratio

This metric expresses the opening gap relative to the previous session’s total price range.

Formal Mathematical Definition of Gap-to-Range Ratio


$$GRR = \frac{|O_t - C_{t-1}|}{H_{t-1} - L_{t-1}}$$


Python Implementation of Gap-to-Range Ratio
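def gap_to_range(open_price, prev_close, prev_high, prev_low):
    prev_range = prev_high - prev_low
    # Undefined when the previous session traded at a single price
    return abs(open_price - prev_close) / prev_range if prev_range else float('nan')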

Volatility-Adjusted Gap Metrics

Average True Range Normalisation

ATR-based normalisation allows gap magnitude to be interpreted in the context of prevailing volatility.

Formal Mathematical Definition of Average True Range


$$TR_t = \max\left(H_t - L_t,\; |H_t - C_{t-1}|,\; |L_t - C_{t-1}|\right)$$

$$ATR_t = \frac{1}{n}\sum_{i=0}^{n-1} TR_{t-i}$$


Python Implementation of ATR
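import pandas as pd

def compute_atr(df, window=14):
    prev_close = df['close'].shift(1)
    high_low = df['high'] - df['low']
    high_close = (df['high'] - prev_close).abs()
    low_close = (df['low'] - prev_close).abs()
    # True range is the row-wise maximum of the three candidates
    true_range = pd.concat([high_low, high_close, low_close], axis=1).max(axis=1)
    # Simple moving average of true range over `window` sessions
    return true_range.rolling(window).mean()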

Normalized Gap Z-Score

Formal Mathematical Definition of Gap Z-Score


$$Z^{\text{gap}}_t = \frac{O_t - C_{t-1}}{ATR_{t-1}}$$


Python Implementation of Gap Z-Score
def gap_zscore(open_price, prev_close, prev_atr):
    return (open_price - prev_close) / prev_atr
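
For a whole price frame, the same score can be computed column-wise. The following is a minimal sketch assuming lowercase `open`, `high`, `low`, `close` columns on a single-symbol, date-sorted frame; the function name `atr_scaled_gap`, the short window, and the sample prices are illustrative assumptions. The ATR is lagged by one session so that the denominator never includes the range of the session whose gap is being measured.

```python
import pandas as pd

def atr_scaled_gap(df: pd.DataFrame, window: int = 14) -> pd.Series:
    """Signed opening gap divided by the previous session's ATR."""
    prev_close = df["close"].shift(1)
    true_range = pd.concat(
        [
            df["high"] - df["low"],
            (df["high"] - prev_close).abs(),
            (df["low"] - prev_close).abs(),
        ],
        axis=1,
    ).max(axis=1)
    atr = true_range.rolling(window).mean()
    # shift(1) supplies ATR_{t-1}, as in the definition above
    return (df["open"] - prev_close) / atr.shift(1)

# Hypothetical three-session sample with a 2-session ATR window
prices = pd.DataFrame({
    "open": [100.0, 100.0, 105.0],
    "high": [102.0, 103.0, 106.0],
    "low": [98.0, 99.0, 104.0],
    "close": [100.0, 101.0, 105.0],
})
scores = atr_scaled_gap(prices, window=2)
print(scores)
```

In this sample the third session opens 4 points above the prior close while the lagged two-session ATR is also 4, so its score is 1.0; earlier rows are NaN because no lagged ATR exists yet.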

Production-Grade Fetch → Store → Measure Architecture

Data Fetch Layer

  • Exchange bhavcopies (daily OHLCV)
  • Corporate action adjustment files
  • Index composition history
  • Trading halt and surveillance indicators
  • Macro-event calendars

Data Store Layer

  • Columnar storage using Parquet
  • Symbol/year partitioning
  • Immutable raw tables
  • Derived feature tables for gap metrics
  • Metadata flags for data quality
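
One way to express the symbol/year partitioning above is a deterministic path function. The sketch below uses only the standard library; the `key=value` directory naming, the layer names, the file name, and the symbol `ABC` are assumptions for illustration rather than a required layout.

```python
from datetime import date
from pathlib import PurePosixPath

def partition_path(root: str, layer: str, symbol: str, trade_date: date) -> PurePosixPath:
    """Build the partition location for one symbol-year of data.

    `layer` separates immutable raw tables from derived feature tables.
    """
    return (
        PurePosixPath(root)
        / layer
        / f"symbol={symbol}"
        / f"year={trade_date.year}"
        / "part-0.parquet"
    )

raw_path = partition_path("lake", "raw", "ABC", date(2024, 1, 2))
derived_path = partition_path("lake", "gap_metrics", "ABC", date(2024, 1, 2))
print(raw_path)
print(derived_path)
```

Keeping raw and derived layers under separate prefixes makes it easier to treat the raw files as write-once while recomputing gap metrics freely.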

Data Measure Layer

  • Gap classification labels
  • Normalized gap magnitudes
  • Volatility-adjusted ratios
  • Event-conditioned aggregates
  • Sector and index-level rollups

Database Structure and Storage Design

Core Time-Series Schema

  • Trade date (composite primary key together with the symbol identifier)
  • Symbol identifier
  • Open, High, Low, Close, Volume
  • Adjusted price fields
  • Corporate action factor

Gap Metadata Schema

  • Gap classification label
  • Gap magnitude
  • Gap percentage
  • Gap-to-range ratio
  • ATR-normalized gap score
  • Liquidity and validity flags
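
The two schemas above might be declared as follows. This sketch uses the standard-library `sqlite3` module purely as a stand-in so that it runs anywhere; it is not the Parquet or DuckDB storage discussed elsewhere in this article. Table names, column names, and types are assumptions. A multi-symbol table needs the trade date and symbol together as its key, which the example checks by attempting a duplicate insert.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE daily_prices (
    trade_date TEXT NOT NULL,
    symbol     TEXT NOT NULL,
    open REAL, high REAL, low REAL, close REAL, volume INTEGER,
    adj_open REAL, adj_high REAL, adj_low REAL, adj_close REAL,
    adj_factor REAL NOT NULL DEFAULT 1.0,
    PRIMARY KEY (trade_date, symbol)
);
CREATE TABLE gap_metadata (
    trade_date TEXT NOT NULL,
    symbol     TEXT NOT NULL,
    gap_label  TEXT CHECK (gap_label IN ('none', 'partial', 'full')),
    gap REAL, gap_pct REAL, gap_to_range REAL, gap_atr_score REAL,
    is_liquid INTEGER, is_valid INTEGER,
    PRIMARY KEY (trade_date, symbol),
    FOREIGN KEY (trade_date, symbol)
        REFERENCES daily_prices (trade_date, symbol)
);
""")

insert_sql = (
    "INSERT INTO daily_prices "
    "(trade_date, symbol, open, high, low, close, volume) "
    "VALUES (?, ?, ?, ?, ?, ?, ?)"
)
row = ("2024-01-02", "ABC", 100.0, 102.0, 99.0, 101.0, 1000)  # hypothetical row
conn.execute(insert_sql, row)

# A second row for the same date and symbol must be rejected
try:
    conn.execute(insert_sql, row)
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print("duplicate rejected:", duplicate_rejected)
```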

Python Libraries Used and Their Roles

Core Data Libraries

  • pandas – time-series manipulation, rolling windows, joins
  • numpy – vectorized numerical computation
  • polars – high-performance columnar analytics

Data Storage and Performance

  • pyarrow – Parquet IO and memory efficiency
  • duckdb – analytical SQL over Parquet

Market Data Access

  • yfinance – demonstration-grade OHLC fetch
  • exchange-native feeds – production-grade ingestion

Visualization and Diagnostics

  • matplotlib – distribution and density plots
  • seaborn – exploratory visualization

Curated Data Sourcing Methodologies

  • Primary exchange-distributed daily bhavcopies
  • Separate ingestion of corporate actions to avoid false gaps
  • Pre-open and auction metadata for open price integrity
  • Calendar-aligned macroeconomic event datasets

News Trigger Classification Framework

  • Macro-economic policy events
  • Corporate disclosures and earnings
  • Regulatory and compliance actions
  • Global market and currency shocks

Multi-Horizon Statistical Impact Summary

Short-Term Horizon

Gap statistics influence volatility clustering, liquidity discontinuity, and microstructure noise measurement.

Medium-Term Horizon

Aggregated gap metrics help identify regime shifts, disclosure intensity, and sectoral sensitivity.

Long-Term Horizon

Persistent full-gap patterns contribute to structural break analysis and long-horizon volatility modelling.

Conclusion and Industry Application

This four-part article established a complete, production-ready, Python-centric framework for the statistical classification of partial and full gaps in Indian equity markets. By prioritizing formal definitions, data integrity, and reproducible workflows, it enables robust quantitative research without reliance on subjective interpretation.

For organizations seeking enterprise-grade market data engineering, analytics pipelines, or quantitative research systems built in Python, TheUniBit delivers scalable solutions aligned with the methodologies described in this article.
