Partial Gaps vs Full Gaps: Statistical Classification

Table Of Contents

Conceptual Foundations of Price Gaps
Formal Statistical Definition of Gaps
- Formal Mathematical Definition of Gap Magnitude
- Absolute Gap Formula
Classification Logic: Partial Gaps vs Full Gaps
- Mathematical Conditions for Partial Gaps
- Mathematical Conditions for Full Gaps
Python-Centric Gap Detection Algorithms
- Python Algorithm for Partial and Full Gap Detection
Impact Across Trading Horizons (Classification Perspective)
Normalization, Volatility Scaling, and Cross-Sectional Gap Comparability
Price-Level Normalization of Gaps
- Formal Mathematical Definition of Price-Normalized Gap
- Python Implementation of Price-Normalized Gap
Volatility-Scaled Gap Metrics
- Formal Mathematical Definition of Realized Volatility
- Python Implementation of Realized Volatility
- Formal Mathematical Definition of Volatility-Scaled Gap
- Python Implementation of Volatility-Scaled Gap
Corporate Action Integrity and Gap Validity
- Formal Mathematical Definition of Adjusted Price
- Python Implementation of Corporate Action Adjustment
Fetch → Store → Measure Workflow for Normalized Gaps
Impact Across Trading Horizons
Event Drivers, Market Microstructure, and Statistical Integrity of Gaps
Event-Based Drivers of Gap Formation
The Role of the NSE Pre-Open Session
Data Quality Filters and Gap Validity
- Formal Mathematical Definition of Volume Filter
- Python Implementation of Liquidity Filter
Gap Frequency and Distribution Analysis
- Mathematical Definition of Gap Frequency
- Python Implementation of Gap Frequency
Fetch → Store → Measure Workflow for Event-Aware Gap Analysis
Impact Across Trading Horizons
Advanced Quantitative Metrics, Data Architecture, and Production-Grade Integration
Advanced Gap Magnitude Normalisation Metrics
- Formal Mathematical Definition of Gap Percentage
- Python Implementation of Gap Percentage
- Formal Mathematical Definition of Gap-to-Range Ratio
- Python Implementation of Gap-to-Range Ratio
Volatility-Adjusted Gap Metrics
- Formal Mathematical Definition of Average True Range
- Python Implementation of ATR
- Formal Mathematical Definition of Gap Z-Score
- Python Implementation of Gap Z-Score
Production-Grade Fetch → Store → Measure Architecture
Database Structure and Storage Design
Python Libraries Used and Their Roles
Curated Data Sourcing Methodologies
News Trigger Classification Framework
Multi-Horizon Statistical Impact Summary
Conclusion and Industry Application

Price gaps are among the most structurally important discontinuities in Indian equity price series. Unlike indicators derived from continuous intraday movement, gaps originate from discrete information arrival and market microstructure constraints between sessions. This article presents a rigorous, Python-centric, data-first framework for the statistical classification of partial gaps and full gaps, explicitly excluding strategy logic or post-gap price behavior.

The focus is on formal definitions, measurable conditions, reproducible algorithms, and data engineering workflows suitable for institutional-grade research and production analytics within the Indian stock market context.

Conceptual Foundations of Price Gaps

What Constitutes a Price Gap in Indian Markets

In Indian equity markets, a price gap arises when the official opening price of a trading session is discontinuous relative to the previous session’s trading range. This discontinuity reflects overnight information assimilation under constrained liquidity and auction-based opening mechanisms.

Unlike intraday volatility, gaps are not emergent properties of continuous trading. They are boundary phenomena occurring at the interface of two discrete trading sessions and must therefore be defined using exchange-published reference prices only.

Exchange Price References Used in Gap Measurement

Previous Session References

Previous session close price
Previous session intraday high
Previous session intraday low

Current Session References

Official opening price (post pre-open auction)
Current session intraday high
Current session intraday low

No intraday tick, VWAP, or midpoint prices are admissible for gap classification, as gaps are strictly inter-session constructs.

Formal Statistical Definition of Gaps

Notation and Price Variables

Let the following variables be defined for a given equity:

$P_{c,t-1}$: Previous session closing price
$H_{t-1}$: Previous session high
$L_{t-1}$: Previous session low
$O_t$: Current session official opening price
$H_t$: Current session high
$L_t$: Current session low

Gap Magnitude Definition

Formal Mathematical Definition of Gap Magnitude

$G_t = O_t - P_{c,t-1}$

This raw gap magnitude is directional and forms the base variable for all subsequent classification logic.

Absolute Gap Size

Absolute Gap Formula

$|G_t| = |O_t - P_{c,t-1}|$

Absolute magnitude is used for normalization and cross-sectional comparison.

Classification Logic: Partial Gaps vs Full Gaps

Definition of Partial Gaps

A partial gap occurs when the current session opens above or below the previous close, but the current session’s trading range still overlaps with the previous session’s range.

Mathematical Conditions for Partial Gaps

$O_t > P_{c,t-1} \;\land\; L_t \leq H_{t-1}$ (upward partial gap)

$O_t < P_{c,t-1} \;\land\; H_t \geq L_{t-1}$ (downward partial gap)

Partial gaps indicate overnight imbalance that is at least partially resolved intraday.

Definition of Full Gaps

A full gap occurs when the current session’s entire trading range does not overlap with the previous session’s range.

Mathematical Conditions for Full Gaps

$L_t > H_{t-1}$ (upward full gap)

$H_t < L_{t-1}$ (downward full gap)

Full gaps represent a complete reset of the price discovery region between sessions.
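
The partial and full gap conditions can be checked on a single pair of sessions. The sketch below is a minimal, dependency-free illustration; the function name `classify_session`, its labels, and the sample prices are hypothetical and chosen only to exercise each branch.

```python
def classify_session(prev_close, prev_high, prev_low, open_, high, low):
    """Label one session using the partial/full gap conditions defined above."""
    if low > prev_high:
        return "full_up"       # entire range above the previous range
    if high < prev_low:
        return "full_down"     # entire range below the previous range
    if open_ > prev_close and low <= prev_high:
        return "partial_up"    # opened higher, ranges still overlap
    if open_ < prev_close and high >= prev_low:
        return "partial_down"  # opened lower, ranges still overlap
    return "none"              # open equals the previous close


# Hypothetical previous session: high 105, low 98, close 102.
prev = dict(prev_close=102.0, prev_high=105.0, prev_low=98.0)

print(classify_session(**prev, open_=104.0, high=108.0, low=103.0))  # partial_up
print(classify_session(**prev, open_=107.0, high=110.0, low=106.0))  # full_up
print(classify_session(**prev, open_=96.0, high=97.5, low=94.0))     # full_down
print(classify_session(**prev, open_=102.0, high=103.0, low=101.0))  # none
```

Checking the full gap conditions first is a design choice for readability; under the definitions above the two classes cannot both hold for the same session.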

Python-Centric Gap Detection Algorithms

Fetch → Store → Measure Workflow

Gap classification must be embedded within a deterministic data pipeline to ensure reproducibility and auditability.

Fetch

Daily OHLC data from NSE or BSE
Pre-open adjusted official opening price
Corporate-action-adjusted historical series

Store

Columnar storage (Parquet / DuckDB)
Date-indexed price frames
Corporate action factor tables

Measure

Vectorized gap magnitude computation
Boolean classification masks
Cross-sectional normalization

Python Implementation: Core Gap Classification

Python Algorithm for Partial and Full Gap Detection

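import pandas as pd

def classify_gaps(df):
    """Add gap magnitude and partial/full gap flags to a daily OHLC frame.

    Expects columns 'open', 'high', 'low', 'close', sorted by date ascending.
    """
    df = df.copy()
    prev_close = df['close'].shift(1)
    prev_high = df['high'].shift(1)
    prev_low = df['low'].shift(1)

    # Signed gap: current open minus previous close (NaN for the first row)
    df['gap'] = df['open'] - prev_close

    # Partial gap: open differs from the previous close, but the ranges overlap
    df['partial_gap'] = (
        ((df['open'] > prev_close) & (df['low'] <= prev_high)) |
        ((df['open'] < prev_close) & (df['high'] >= prev_low))
    )

    # Full gap: the current range lies entirely above or below the previous range
    df['full_gap'] = (df['low'] > prev_high) | (df['high'] < prev_low)

    return df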

This implementation is intentionally minimal and deterministic, making it suitable for both exploratory research and production pipelines.
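
For storage, the two Boolean masks are often easier to handle as a single label column. The following sketch shows one possible way to do that with `numpy.select`; the four-session sample, the `gap_label` column name, and the label strings are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical four-session OHLC sample (values are illustrative only).
ohlc = pd.DataFrame(
    {
        "open":  [100.0, 104.0, 111.0, 112.0],
        "high":  [105.0, 108.0, 115.0, 114.0],
        "low":   [98.0, 103.0, 110.0, 109.0],
        "close": [102.0, 107.0, 113.0, 110.0],
    },
    index=pd.to_datetime(["2024-03-04", "2024-03-05", "2024-03-06", "2024-03-07"]),
)

prev = ohlc.shift(1)  # previous-session values aligned to the current row

full_gap = (ohlc["low"] > prev["high"]) | (ohlc["high"] < prev["low"])
partial_gap = (
    ((ohlc["open"] > prev["close"]) & (ohlc["low"] <= prev["high"]))
    | ((ohlc["open"] < prev["close"]) & (ohlc["high"] >= prev["low"]))
)

# np.select takes the first matching condition; rows matching neither get "none".
ohlc["gap_label"] = np.select([full_gap, partial_gap], ["full", "partial"], default="none")
print(ohlc["gap_label"].tolist())  # ['none', 'partial', 'full', 'partial']
```

The first row is labelled "none" because comparisons against the missing previous session evaluate to False.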

Impact Across Trading Horizons (Classification Perspective)

Short-Term Horizon

In short horizons, partial gaps dominate statistically and reflect transient liquidity imbalances rather than structural repricing.

Medium-Term Horizon

Medium-term distributions of full gaps often cluster around corporate disclosures and regulatory events.

Long-Term Horizon

Over longer horizons, persistent full gaps often coincide with regime shifts such as index inclusion, ownership change, or capital structure transformation.

These horizon effects are descriptive properties of gap distributions and do not imply any predictive strategy.

Normalization, Volatility Scaling, and Cross-Sectional Gap Comparability

Raw gap magnitudes are not directly comparable across stocks, sectors, or time periods due to differences in price levels, volatility regimes, and liquidity profiles. For statistically meaningful analysis, gap measurements must be normalized and scaled using robust quantitative constructs. This section formalizes those constructs and provides Python-centric implementations suitable for Indian equity data.

Price-Level Normalization of Gaps

Why Absolute Gaps Are Insufficient

A ₹10 gap has vastly different statistical meaning for a ₹100 stock versus a ₹2,000 stock. Therefore, gap magnitude must be expressed relative to an appropriate price anchor to enable cross-sectional analysis.
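
The arithmetic behind this comparison is short enough to verify directly. In the sketch below, `gap_fraction` is a hypothetical helper and the prices are the illustrative ₹100 and ₹2,000 levels from the paragraph above.

```python
def gap_fraction(open_price, prev_close):
    """Gap expressed as a fraction of the previous close."""
    return (open_price - prev_close) / prev_close


low_priced = gap_fraction(open_price=110.0, prev_close=100.0)     # ₹10 gap on a ₹100 stock
high_priced = gap_fraction(open_price=2010.0, prev_close=2000.0)  # ₹10 gap on a ₹2,000 stock

print(f"{low_priced:.2%}")   # 10.00%
print(f"{high_priced:.2%}")  # 0.50%
```

The same rupee gap differs by a factor of twenty once it is expressed relative to price.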

Close-Price Normalized Gap

Formal Mathematical Definition of Price-Normalized Gap

$G^{N}_t = \frac{O_t - P_{c,t-1}}{P_{c,t-1}}$

This ratio expresses the gap as a fraction of the previous close (multiply by 100 for a percentage), making it invariant to absolute price levels.

Python Implementation of Price-Normalized Gap

def normalized_gap(df):
    df = df.copy()
    df['gap_norm'] = (df['open'] - df['close'].shift(1)) / df['close'].shift(1)
    return df

Volatility-Scaled Gap Metrics

Rationale for Volatility Scaling

Even normalized gaps must be interpreted relative to a stock’s historical volatility. A 1% gap is routine for high-beta stocks but statistically extreme for low-volatility defensive equities.
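
A small numeric illustration makes the point concrete. The daily volatility figures below (2.5% and 0.5%) are assumed values for a hypothetical high-beta stock and a hypothetical defensive stock, not estimates from market data, and `gap_in_sigmas` is an illustrative helper.

```python
import math


def gap_in_sigmas(gap_fraction, daily_vol):
    """Express a fractional gap in units of daily return volatility."""
    return gap_fraction / daily_vol


# The same 1% gap measured against two assumed volatility levels.
high_beta = gap_in_sigmas(0.01, daily_vol=0.025)
defensive = gap_in_sigmas(0.01, daily_vol=0.005)

print(f"{high_beta:.1f} sigma")  # 0.4 sigma
print(f"{defensive:.1f} sigma")  # 2.0 sigma
assert math.isclose(defensive / high_beta, 5.0)
```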

Realized Volatility Estimation

Formal Mathematical Definition of Realized Volatility

$\sigma_{t,n} = \sqrt{\frac{1}{n} \sum_{i=0}^{n-1} (r_{t-i} - \mu)^2}$

Where $r_{t-i}$ is the daily log return on session $t-i$, $n$ is the length of the lookback window, and $\mu$ is the mean of those returns over the window.

Python Implementation of Realized Volatility

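import numpy as np

def realized_volatility(df, window=20):
    # Daily log returns from consecutive closes
    log_returns = np.log(df['close'] / df['close'].shift(1))
    # ddof=0 matches the 1/n normalisation in the definition (pandas defaults to ddof=1)
    return log_returns.rolling(window).std(ddof=0)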
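
The definition divides by n, whereas the pandas rolling `.std()` method defaults to the sample estimator (`ddof=1`, dividing by n − 1). The dependency-free sketch below, using a hypothetical five-close series, shows how far apart the two estimators are on a short window.

```python
import math
import statistics

# Hypothetical closing prices (illustrative only).
closes = [100.0, 102.0, 101.0, 103.0, 104.0]
log_returns = [math.log(curr / prev) for prev, curr in zip(closes, closes[1:])]

n = len(log_returns)
mu = sum(log_returns) / n

# Population form: divide by n, as in the formal definition.
sigma_population = math.sqrt(sum((r - mu) ** 2 for r in log_returns) / n)

# Sample form: divide by n - 1.
sigma_sample = statistics.stdev(log_returns)

print(f"population: {sigma_population:.6f}, sample: {sigma_sample:.6f}")
```

The two differ by a factor of sqrt(n / (n − 1)), which is material for short windows and negligible for long ones; whichever convention is chosen should be applied consistently across symbols.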

Volatility-Scaled Gap

Formal Mathematical Definition of Volatility-Scaled Gap

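$G^{V}_t = \frac{G^{N}_t}{\sigma_{t-1,n}}$

Here $G^{N}_t$ is the price-normalized gap, so numerator and denominator are both in return units, and $\sigma_{t-1,n}$ is the realized volatility measured through the previous close, so the estimate uses only information available at the open. This measure expresses the gap in units of historical volatility, enabling regime-aware comparisons.

Python Implementation of Volatility-Scaled Gap

def volatility_scaled_gap(df, window=20):
    df = df.copy()
    prev_close = df['close'].shift(1)
    df['gap'] = df['open'] - prev_close
    # Lag the volatility by one session: the value at t includes the close of t
    df['vol'] = realized_volatility(df, window).shift(1)
    # Scale the fractional gap (a return) by return volatility
    df['gap_vol_scaled'] = (df['gap'] / prev_close) / df['vol']
    return df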

Corporate Action Integrity and Gap Validity

Why Corporate Actions Distort Gap Statistics

Stock splits, bonus issues, rights issues, and spin-offs mechanically alter price levels. If unadjusted, these events produce artificial full gaps that do not represent genuine information-driven discontinuities.

Backward Price Adjustment Framework

Formal Mathematical Definition of Adjusted Price

$P_{t,adj} = P_t \times A_t$

Where $A_t$ is the cumulative corporate action adjustment factor.

Python Implementation of Corporate Action Adjustment

def apply_adjustment(df, factor_col='adj_factor'):
    df = df.copy()  # leave the raw, unadjusted frame untouched
    price_cols = ['open', 'high', 'low', 'close']
    for col in price_cols:
        df[col] = df[col] * df[factor_col]
    return df
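
To see why adjustment matters for classification, consider a hypothetical 2-for-1 split taking effect on the third session of a three-session sample. The sketch below assumes a backward adjustment factor of 0.5 on pre-split sessions and 1.0 afterwards; the prices and the `full_gap_mask` helper are illustrative.

```python
import pandas as pd

raw = pd.DataFrame(
    {
        "open":  [1000.0, 1010.0, 506.0],
        "high":  [1015.0, 1020.0, 512.0],
        "low":   [995.0, 1004.0, 503.0],
        "close": [1010.0, 1012.0, 510.0],
        # Assumed backward factor: 0.5 before the split, 1.0 from the split date.
        "adj_factor": [0.5, 0.5, 1.0],
    }
)


def full_gap_mask(df):
    """Full gap: current range entirely above or below the previous range."""
    return (df["low"] > df["high"].shift(1)) | (df["high"] < df["low"].shift(1))


price_cols = ["open", "high", "low", "close"]
adjusted = raw.copy()
adjusted[price_cols] = raw[price_cols].mul(raw["adj_factor"], axis=0)

print(full_gap_mask(raw).tolist())       # [False, False, True]
print(full_gap_mask(adjusted).tolist())  # [False, False, False]
```

On raw prices the split session is flagged as a full gap down; after adjustment the ranges overlap and the artificial gap disappears.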

Fetch → Store → Measure Workflow for Normalized Gaps

Fetch

Adjusted OHLC data
Corporate action factor history
Continuous daily close series

Store

Time-series optimized storage
Separate adjustment factor tables
Immutable raw data layers

Measure

Normalized gap ratios
Volatility-scaled gap scores
Distributional summaries

Impact Across Trading Horizons

Short-Term Horizon

Volatility-scaled gaps highlight statistically extreme overnight moves that raw gaps often obscure in high-volatility stocks.

Medium-Term Horizon

Normalized gap distributions stabilize across sectors, allowing structural comparison without price-level bias.

Long-Term Horizon

Persistent changes in volatility-scaled gap behavior often coincide with liquidity regime shifts and index reclassification.

Event Drivers, Market Microstructure, and Statistical Integrity of Gaps

While partial and full gaps are defined purely through price overlap conditions, their statistical occurrence is deeply influenced by market microstructure and discrete information events. This section examines the structural, institutional, and event-driven factors that shape gap distributions in Indian equities, without extending into trading strategy or post-gap price behavior.

Event-Based Drivers of Gap Formation

Hard Information Events

Hard information events introduce high-certainty changes to fundamental valuation and frequently result in full gaps, as the overnight information invalidates the prior session’s price discovery range.

Common Hard Event Categories

Quarterly and annual earnings announcements
Regulatory actions and compliance disclosures
Mergers, demergers, and capital restructuring
Credit rating changes
Judicial rulings affecting operations

Soft and Anticipatory Information Events

Soft information alters sentiment without conclusively resetting valuation. Such events more frequently result in partial gaps, where initial imbalance is corrected during the session.

Common Soft Event Categories

Global index futures movement
Overseas ADR/GDR price changes
Sector-wide macroeconomic news
Pre-positioning ahead of scheduled announcements

The Role of the NSE Pre-Open Session

Pre-Open Auction Mechanics

The NSE pre-open session aggregates overnight orders into a call auction, producing the official opening price. This process compresses information asymmetry but does not eliminate price discontinuities.

Statistical Implications for Gap Classification

From a classification perspective, the pre-open price represents a new equilibrium estimate. Full gaps reflect equilibria entirely outside the prior session’s feasible price region, while partial gaps indicate equilibria near the boundary.

Data Quality Filters and Gap Validity

Why Filtering Is Essential

Not all detected gaps represent economically meaningful discontinuities. Structural distortions must be filtered to preserve statistical integrity.

Liquidity-Based Filters

Formal Mathematical Definition of Volume Filter

$V_t \geq \theta \times V_{t,avg}$

Where $\theta$ is a minimum liquidity threshold (e.g., 0.3) and $V_{t,avg}$ is the rolling average volume over the lookback window.

Python Implementation of Liquidity Filter

def liquidity_filter(df, window=30, threshold=0.3):
    avg_vol = df['volume'].rolling(window).mean()
    return df['volume'] >= threshold * avg_vol

Trading Suspension and Circuit Filters

Sessions impacted by trading halts, upper/lower circuits, or surveillance actions must be excluded, as price ranges are mechanically constrained.
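
One possible way to apply such exclusions is a validity mask built from per-session flags. In the sketch below the column names `is_halted` and `hit_circuit` are hypothetical; the actual flags depend on how halt and circuit metadata is ingested.

```python
import pandas as pd

# Hypothetical per-session gap values with quality flags (illustrative only).
sessions = pd.DataFrame(
    {
        "gap": [1.5, -0.8, 4.0, 2.2],
        "is_halted": [False, False, True, False],
        "hit_circuit": [False, True, False, False],
    }
)

# A session is valid only if neither mechanical constraint applied.
valid = ~(sessions["is_halted"] | sessions["hit_circuit"])
clean_gaps = sessions.loc[valid, "gap"]

print(clean_gaps.tolist())  # [1.5, 2.2]
```

Keeping the flags as columns rather than dropping rows at ingestion preserves the raw record and lets the filter be revised later.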

Gap Frequency and Distribution Analysis

Empirical Distribution Characteristics

Across Indian equities, partial gaps occur with significantly higher frequency than full gaps. Full gaps exhibit heavier tails and stronger clustering around event-heavy periods.

Formal Gap Frequency Metric

Mathematical Definition of Gap Frequency

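$F_G = \frac{\sum_{t=1}^{T} \mathbb{1}(G_t \neq 0)}{T}$

Here, $\mathbb{1}(\cdot)$ denotes the indicator function, which equals 1 when the gap condition holds and 0 otherwise, and $T$ is the number of sessions with a defined gap.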

Python Implementation of Gap Frequency

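def gap_frequency(df):
    # Exclude sessions without a previous close: NaN != 0 would otherwise count as a gap
    gaps = df['gap'].dropna()
    return (gaps != 0).mean()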
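
The same indicator-average idea extends to class-specific frequencies by averaging the Boolean classification flags. The sketch below assumes a frame that already carries `partial_gap` and `full_gap` columns; the six flag values are made up for illustration and do not represent empirical frequencies.

```python
import pandas as pd

# Hypothetical classification flags for six sessions (illustrative only).
labels = pd.DataFrame(
    {
        "partial_gap": [False, True, True, False, True, False],
        "full_gap":    [False, False, False, True, False, False],
    }
)

# Drop the first session, which has no previous session to compare against.
usable = labels.iloc[1:]

# The mean of a Boolean column is the share of sessions where the flag is True.
frequencies = usable.mean()
print(frequencies.to_dict())
```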

Fetch → Store → Measure Workflow for Event-Aware Gap Analysis

Fetch

Adjusted daily OHLC data
Corporate action records
Volume and turnover data
Trading halt and circuit metadata

Store

Partitioned time-series databases
Event metadata tables
Quality-flag columns

Measure

Filtered gap counts
Event-conditioned gap distributions
Sector-level aggregation
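
An event-conditioned summary in the Measure step can be sketched as a simple group-by, assuming each session has already been tagged with an event type. The `event_type` labels and the `gap_pct` values below are hypothetical.

```python
import pandas as pd

# Hypothetical absolute gap percentages tagged with an event type.
gaps = pd.DataFrame(
    {
        "gap_pct": [0.4, 3.1, 0.6, 2.5, 0.2, 0.9],
        "event_type": ["none", "earnings", "none", "earnings", "none", "macro"],
    }
)

# Count, mean and maximum gap size per event category.
summary = gaps.groupby("event_type")["gap_pct"].agg(["count", "mean", "max"])
print(summary)
```

The same pattern applies to sector-level aggregation by grouping on a sector column instead.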

Impact Across Trading Horizons

Short-Term Horizon

In short horizons, gap statistics are dominated by event density and overnight information flow, with strong sensitivity to market-wide news.

Medium-Term Horizon

Over medium horizons, gap frequency stabilizes into sector-specific signatures reflecting disclosure intensity and regulatory exposure.

Long-Term Horizon

Long-term gap distributions capture structural changes such as index inclusion, ownership transitions, and liquidity regime shifts.

These effects describe the statistical behavior of gaps rather than actionable signals.

Advanced Quantitative Metrics, Data Architecture, and Production-Grade Integration

This final section completes the comprehensive framework by introducing advanced quantitative measures, formal mathematical definitions, production-grade data architectures, curated data sourcing methodologies, and a consolidated view of all Python libraries applicable to statistical gap classification in Indian equity markets.

Advanced Gap Magnitude Normalisation Metrics

Gap Percentage Normalisation

Raw gap magnitudes must be normalised to allow comparison across stocks with different price levels and volatility regimes.

Formal Mathematical Definition of Gap Percentage

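$G^{pct}_t = \frac{|O_t - C_{t-1}|}{C_{t-1}} \times 100$

Here $C_{t-1}$ denotes the previous session close, written $P_{c,t-1}$ in the earlier sections.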

Python Implementation of Gap Percentage

def gap_percentage(open_price, prev_close):
    return abs(open_price - prev_close) / prev_close * 100

Gap-to-Range Ratio

This metric expresses the opening gap relative to the previous session’s total price range.

Formal Mathematical Definition of Gap-to-Range Ratio

$GRR_t = \frac{|O_t - C_{t-1}|}{H_{t-1} - L_{t-1}}$

Python Implementation of Gap-to-Range Ratio

def gap_to_range(open_price, prev_close, prev_high, prev_low):
    return abs(open_price - prev_close) / (prev_high - prev_low)
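
The ratio is undefined when the previous session had no range, for example when every trade printed at a single price. The variant below is a sketch of one way to guard against that case; returning NaN rather than raising is an assumption about what downstream code prefers, and `gap_to_range_safe` is a hypothetical name.

```python
import math


def gap_to_range_safe(open_price, prev_close, prev_high, prev_low):
    """Gap-to-range ratio that returns NaN when the previous range is zero."""
    prev_range = prev_high - prev_low
    if prev_range <= 0:
        return math.nan  # ratio undefined without a previous-session range
    return abs(open_price - prev_close) / prev_range


print(gap_to_range_safe(103.0, 100.0, 102.0, 98.0))                # 0.75
print(math.isnan(gap_to_range_safe(103.0, 100.0, 100.0, 100.0)))   # True
```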

Volatility-Adjusted Gap Metrics

Average True Range Normalisation

ATR-based normalisation allows gap magnitude to be interpreted in the context of prevailing volatility.

Formal Mathematical Definition of Average True Range

$TR_t = \max(H_t - L_t, |H_t - C_{t-1}|, |L_t - C_{t-1}|)$

$ATR_t = \frac{1}{n} \sum_{i=0}^{n-1} TR_{t-i}$

Python Implementation of ATR

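import pandas as pd

def compute_atr(df, window=14):
    # Column names follow the 'open'/'high'/'low'/'close' convention used above
    prev_close = df['close'].shift(1)
    high_low = df['high'] - df['low']
    high_close = (df['high'] - prev_close).abs()
    low_close = (df['low'] - prev_close).abs()
    # True range is the largest of the three candidate ranges for each session
    true_range = pd.concat([high_low, high_close, low_close], axis=1).max(axis=1)
    return true_range.rolling(window).mean()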

Normalized Gap Z-Score

Formal Mathematical Definition of Gap Z-Score

$Z^{gap}_t = \frac{O_t - C_{t-1}}{ATR_{t-1}}$

Python Implementation of Gap Z-Score

def gap_zscore(open_price, prev_close, prev_atr):
    return (open_price - prev_close) / prev_atr
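
The true range, ATR, and ATR-scaled gap definitions can be chained on a small frame to check the alignment of the lagged terms. The sketch below uses a hypothetical four-session sample and a window of 2, chosen only to keep the arithmetic easy to follow.

```python
import pandas as pd

# Hypothetical OHLC sample (illustrative only).
bars = pd.DataFrame(
    {
        "open":  [100.0, 101.0, 103.0, 108.0],
        "high":  [102.0, 104.0, 105.0, 110.0],
        "low":   [99.0, 100.0, 102.0, 107.0],
        "close": [101.0, 103.0, 104.0, 109.0],
    }
)

prev_close = bars["close"].shift(1)

# True range: largest of high-low, |high - prev close|, |low - prev close|.
true_range = pd.concat(
    [
        bars["high"] - bars["low"],
        (bars["high"] - prev_close).abs(),
        (bars["low"] - prev_close).abs(),
    ],
    axis=1,
).max(axis=1)

atr = true_range.rolling(2).mean()

# Divide the signed gap by the ATR known at the previous close.
gap_score = (bars["open"] - prev_close) / atr.shift(1)

print(true_range.tolist())            # [3.0, 4.0, 3.0, 6.0]
print(round(gap_score.iloc[3], 4))    # 1.1429
```

The last session opens 4 points above the previous close against a lagged ATR of 3.5, giving a score of roughly 1.14.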

Production-Grade Fetch → Store → Measure Architecture

Data Fetch Layer

Exchange bhavcopies (daily OHLCV)
Corporate action adjustment files
Index composition history
Trading halt and surveillance indicators
Macro-event calendars

Data Store Layer

Columnar storage using Parquet
Symbol/year partitioning
Immutable raw tables
Derived feature tables for gap metrics
Metadata flags for data quality

Data Measure Layer

Gap classification labels
Normalized gap magnitudes
Volatility-adjusted ratios
Event-conditioned aggregates
Sector and index-level rollups
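
The symbol/year partitioning in the store layer can be expressed as a directory convention. The sketch below builds Hive-style partition paths with the standard library only; the root directory, the `raw` layer name, and the symbol `EXAMPLE` are placeholders, not part of any exchange specification.

```python
from datetime import date
from pathlib import PurePosixPath


def partition_path(root, symbol, trade_date, layer="raw"):
    """Build <root>/<layer>/symbol=<SYMBOL>/year=<YYYY> for a given session."""
    return PurePosixPath(root) / layer / f"symbol={symbol}" / f"year={trade_date.year}"


path = partition_path("/data/equities", "EXAMPLE", date(2024, 3, 7))
print(path)  # /data/equities/raw/symbol=EXAMPLE/year=2024
```

Writing derived gap metrics under a separate layer name keeps the raw tables immutable, in line with the store layer described above.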

Database Structure and Storage Design

Core Time-Series Schema

Trade date (composite primary key together with the symbol identifier)
Symbol identifier
Open, High, Low, Close, Volume
Adjusted price fields
Corporate action factor

Gap Metadata Schema

Gap classification label
Gap magnitude
Gap percentage
Gap-to-range ratio
ATR-normalized gap score
Liquidity and validity flags
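
The two schemas can be sketched as relational tables. The example below uses an in-memory SQLite database purely for illustration; the table and column names are hypothetical, and the composite key on symbol and trade date is an assumption made so that multiple symbols can share one table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE daily_prices (
        symbol      TEXT NOT NULL,
        trade_date  TEXT NOT NULL,
        open REAL, high REAL, low REAL, close REAL, volume INTEGER,
        adj_factor  REAL DEFAULT 1.0,
        PRIMARY KEY (symbol, trade_date)
    );
    CREATE TABLE gap_metrics (
        symbol       TEXT NOT NULL,
        trade_date   TEXT NOT NULL,
        gap_label    TEXT,
        gap          REAL,
        gap_pct      REAL,
        gap_to_range REAL,
        gap_atr      REAL,
        is_valid     INTEGER,
        PRIMARY KEY (symbol, trade_date),
        FOREIGN KEY (symbol, trade_date) REFERENCES daily_prices (symbol, trade_date)
    );
    """
)

# Insert one placeholder row to confirm the schema accepts data.
insert_sql = (
    "INSERT INTO daily_prices (symbol, trade_date, open, high, low, close, volume) "
    "VALUES (?, ?, ?, ?, ?, ?, ?)"
)
conn.execute(insert_sql, ("EXAMPLE", "2024-03-07", 100.0, 102.0, 99.0, 101.0, 150000))

tables = [
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")
]
print(tables)  # ['daily_prices', 'gap_metrics']
```

In a Parquet-based store the same column layout would apply, with uniqueness of the symbol and date pair enforced by the ingestion code rather than by the storage format.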

Python Libraries Used and Their Roles

Core Data Libraries

pandas – time-series manipulation, rolling windows, joins
numpy – vectorized numerical computation
polars – high-performance columnar analytics

Data Storage and Performance

pyarrow – Parquet IO and memory efficiency
duckdb – analytical SQL over Parquet

Market Data Access

yfinance – demonstration-grade OHLC fetch
exchange-native feeds – production-grade ingestion

Visualization and Diagnostics

matplotlib – distribution and density plots
seaborn – exploratory visualization

Curated Data Sourcing Methodologies

Primary exchange-distributed daily bhavcopies
Separate ingestion of corporate actions to avoid false gaps
Pre-open and auction metadata for open price integrity
Calendar-aligned macroeconomic event datasets

News Trigger Classification Framework

Macro-economic policy events
Corporate disclosures and earnings
Regulatory and compliance actions
Global market and currency shocks

Multi-Horizon Statistical Impact Summary

Short-Term Horizon

Gap statistics influence volatility clustering, liquidity discontinuity, and microstructure noise measurement.

Medium-Term Horizon

Aggregated gap metrics help identify regime shifts, disclosure intensity, and sectoral sensitivity.

Long-Term Horizon

Persistent full-gap patterns contribute to structural break analysis and long-horizon volatility modelling.

Conclusion and Industry Application

This four-part article established a complete, production-ready, Python-centric framework for the statistical classification of partial and full gaps in Indian equity markets. By prioritizing formal definitions, data integrity, and reproducible workflows, it enables robust quantitative research without reliance on subjective interpretation.
