Delivery Volume Data and What It Represents in Indian Cash Markets


Table Of Contents
  1. Conceptual Theory: The Mechanics of Settlement in Indian Equities
  2. The Regulatory and Operational Framework of Delivery Data
  3. Statistical Distribution: How Delivery Volume Clusters in Indian Stocks
  4. Advanced Python Workflows: Processing Historical Delivery Archives
  5. The Final Compendium: Missed Algorithms, Databases, and Technical Ecosystem

Conceptual Theory: The Mechanics of Settlement in Indian Equities

In the high-velocity environment of the Indian stock market, the distinction between a trade and a transfer of ownership is often blurred by the sheer volume of intraday activity. For an institutional trader or a retail investor, understanding “Delivery Volume” is akin to separating the signal from the noise. While the total traded volume represents the aggregate of every buy and sell order executed during market hours, delivery volume represents the specific subset of those trades that result in the actual movement of shares between demat accounts. In the Indian cash market, this process is governed by the clearing and settlement mechanism, ensuring that for every share “delivered,” a corresponding monetary obligation is met on a T+1 basis.

The “Finality” of a trade is reached only when the settlement cycle completes. A significant portion of daily market activity is “squared off” within the same session, where traders speculate on price movements without the intention of holding the asset. These intraday fluctuations contribute to liquidity and price discovery but do not alter the underlying shareholding pattern of the company. In contrast, delivery volume captures the essence of settlement obligation. It is the quantification of shares that have transitioned from a “traded” state to a “settled” state. For a Python-specialized software development company, the challenge lies in building robust systems that can ingest massive datasets from the National Stock Exchange (NSE) and Bombay Stock Exchange (BSE) to isolate these settlement statistics from raw transaction flows.

Leading software firms like TheUniBit empower market participants by designing bespoke “Data Plumbing” solutions. These systems automate the extraction of the Daily Bhavcopy and the Security-wise Deliverable Positions reports, transforming flat files into queryable relational databases. By leveraging Python’s analytical ecosystem, such firms allow traders to move beyond manual spreadsheet tracking toward high-precision ETL (Extract, Transform, Load) pipelines. This structural clarity is essential for any quantitative framework that seeks to measure the actual equity changing hands in the Indian market, free from the behavioral biases often associated with volume interpretation.

The Traded vs. Delivered Identity

To mathematically define the relationship between market activity and settlement, we must establish the Traded vs. Delivered Identity. Total Traded Volume is the sum of all transactions, but it is functionally composed of two distinct components: the volume that is neutralized within the session and the volume that proceeds to the clearinghouse.

Mathematical Specification of Market Volume Decomposition

The formal definition of Total Traded Volume is expressed as the aggregate of Intraday and Delivery components:

VT = VI + VD

Where the delivery component is defined as:

VD = Σ Si for i = 1 to n, such that trade i is not squared off

Detailed Explanation of Variables and Symbols:

  • VT (Total Traded Volume): The Resultant of the expression representing the total number of equity shares transacted during the trading hours on the exchange.
  • VI (Intraday Volume): The Term representing shares that were bought and sold (or sold and bought) by the same entity within the same trading session.
  • VD (Delivery Volume): The Summand representing the quantity of shares that remain as a net obligation for delivery to the buyer’s demat account.
  • Si: The individual share count for a specific trade ‘i’ that qualifies for settlement.
  • ∑ (Summation Operator): Denotes the total accumulation of all settled shares across ‘n’ number of successful delivery-based transactions.
  • = (Equality Operator): Indicates that the left-hand side (Total Volume) is exactly balanced by the sum of its constituent parts.
Python Implementation: Market Volume Decomposition
def decompose_market_volume(total_volume: int, delivery_volume: int) -> dict:
    """
    Calculates the Intraday Volume component based on the Traded vs. Delivered Identity.

    This function implements the standard market volume identity where the total
    volume recorded on the exchange is composed of two exclusive components:
    1. Delivery Volume: Shares marked for settlement (demat transfer).
    2. Intraday Volume: Shares squared off within the same trading session.

    Methodological Definition:
    V(Intraday) = V(Total) - V(Delivery)

    Parameters:
    -----------
    total_volume (int):
        The gross volume of shares traded during the market session.
        Must be a non-negative integer.

    delivery_volume (int):
        The quantity of shares marked for actual delivery/settlement.
        Must be a non-negative integer and cannot exceed total_volume.

    Returns:
    --------
    dict:
        A dictionary containing the decomposed volume metrics:
        - 'Total Traded Volume': The input total volume.
        - 'Delivery Volume': The input delivery volume.
        - 'Intraday Volume': The calculated speculative/squared-off volume.
        - 'Delivery Percentage': The delivery share of total volume (0 to 100).

    Raises:
    -------
    ValueError:
        If volumes are negative or if delivery_volume exceeds total_volume.
    """

    # Validation: Ensure volumes are non-negative
    if total_volume < 0 or delivery_volume < 0:
        raise ValueError("Volume metrics cannot be negative.")

    # Validation: Delivery volume cannot logically exceed total traded volume
    if delivery_volume > total_volume:
        raise ValueError(
            f"Delivery volume ({delivery_volume}) cannot exceed "
            f"Total volume ({total_volume})."
        )

    # Core Calculation: Apply the subtraction identity
    # Intraday volume represents the liquidity provided by day-traders
    intraday_volume = total_volume - delivery_volume

    # Derived Metric: Calculate Delivery Percentage (quality of buying)
    # Handle division by zero if total_volume is 0
    if total_volume > 0:
        delivery_pct = round((delivery_volume / total_volume) * 100, 2)
    else:
        delivery_pct = 0.0

    # Construct the result dictionary
    stats = {
        "Total Traded Volume": total_volume,
        "Intraday Volume": intraday_volume,
        "Delivery Volume": delivery_volume,
        "Delivery Percentage": delivery_pct
    }

    return stats

# --- Example Usage ---

# Example 1: Standard NIFTY stock scenario
try:
    # Hypothetical Data: 1.5 million shares traded, 600k marked for delivery
    market_data = decompose_market_volume(
        total_volume=1_500_000,
        delivery_volume=600_000
    )

    print("--- Market Volume Decomposition ---")
    for metric, value in market_data.items():
        print(f"{metric:<20}: {value}")

except ValueError as e:
    print(f"Calculation Error: {e}")

# Example 2: High speculation scenario (Low delivery)
print("\n--- Speculative Scenario ---")
speculative_data = decompose_market_volume(1_000_000, 100_000)
print(f"Intraday Activity: {speculative_data['Intraday Volume']}")

Step-by-Step Code Summary

The Python script provided above is designed to decompose stock market volume data into its two fundamental components: settlement activity and speculative activity. This process is essential for technical analysts to understand the nature of market participation.

1. Methodological Definition

The code relies on the fundamental Market Volume Identity. In any given trading session, a trade is either settled (marked for delivery) or squared off (intraday). Therefore, the speculative component is derived by subtracting the delivery quantity from the gross traded quantity.

Mathematical Specification:

V(Intraday) = V(Total) - V(Delivery)

2. Input Validation Layer

Before performing calculations, the function validates the raw data to prevent logical errors:

  • It checks if the inputs Total Volume (VT) and Delivery Volume (VD) are non-negative integers.
  • It enforces the logical constraint where VD ≤ VT. If delivery volume exceeds total volume, the function raises a ValueError to alert the user of data corruption.

3. Calculation Execution

Once validated, the script executes the subtraction operation. It calculates the Intraday Volume (VI), which acts as a proxy for speculative interest or day-trading liquidity in the stock.

4. Metric Aggregation and Output

Finally, the function packages the results into a dictionary structure. This includes the original inputs alongside the calculated Intraday Volume and a derived Delivery Percentage metric. This output format is optimized for integration with data analysis libraries such as pandas for larger datasets.

The Regulatory and Operational Framework of Delivery Data

The reporting of delivery data in India is a byproduct of the Securities and Exchange Board of India (SEBI) guidelines regarding market transparency. Currently, the Indian equity market operates on a T+1 Settlement Cycle. This means that if a trade is executed on Monday, the shares must be delivered to the buyer and funds to the seller by Tuesday. The exchange-reported delivery volume is the final “Security-wise Deliverable Position” calculated after the close of market hours, once all intraday positions have been netted out by the clearing members.

It is critical to distinguish between Gross Delivery and Net Delivery. At the exchange level, the reported delivery volume is usually a “Gross” figure of client-level obligations. For example, if Client A buys 100 shares for delivery and Client B (under the same broker) sells 50 shares for delivery, the broker’s net obligation to the clearing corporation is 50 shares, but the exchange statistics will capture the total volume that resulted in a change of ownership across all clients. This article strictly defines delivery volume as these settled shares reported by the exchanges, avoiding behavioral inferences such as “conviction.” From a data perspective, we treat this as a structural trading statistic that describes the physical throughput of the market.
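The gross-versus-net distinction above can be made concrete with a short sketch. The client names and quantities below are the hypothetical figures from the paragraph, not exchange data:

```python
# Hypothetical client-level delivery obligations under one broker.
# Positive = shares to receive (delivery buy); negative = shares to give.
client_obligations = {"Client A": +100, "Client B": -50}

# Exchange-level statistics capture the gross change of ownership across
# all clients, so both legs count in absolute terms.
gross_delivery = sum(abs(qty) for qty in client_obligations.values())

# The broker's net obligation to the clearing corporation nets buys
# against sells across its clients.
net_obligation = abs(sum(client_obligations.values()))

print(f"Gross delivery in exchange statistics: {gross_delivery}")  # 150
print(f"Broker's net obligation to clearing corp: {net_obligation}")  # 50
```

This is why exchange-reported delivery figures can exceed the net quantities that actually move between clearing members.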

Workflow: Data Fetch, Store, and Measure

For a developer or quantitative researcher, the “Fetch-Store-Measure” workflow is the backbone of equity analysis. Python’s versatility makes it the preferred language for this automation.

Data Fetching (The Python Ingestion Layer)

The primary sources for this data are the official NSE and BSE websites. The NSE provides a daily file named “security_wise_deliverable_positions_DDMMYYYY.dat” or “.csv”. Automated fetching involves handling SSL handshakes, user-agent headers to mimic browser behavior, and decompression of files.

Python Script: Fetch_Securitywise_Delivery_Report_Script
import requests
import pandas as pd
from io import StringIO

def fetch_nse_delivery_data(date_str: str) -> pd.DataFrame:
    """
    Fetches the security-wise delivery report (Full Bhavcopy) from the National
    Stock Exchange (NSE) archives for a specific date.

    This function automates the retrieval of the 'sec_bhavdata_full' CSV file,
    which contains critical data points like Delivery Quantity and Delivery Percentage,
    essential for analyzing genuine investment interest vs. intraday speculation.

    Parameters:
    -----------
    date_str (str):
        The target trading date in 'DDMMYYYY' format (e.g., '10012026').

    Returns:
    --------
    pd.DataFrame:
        A cleaned DataFrame containing specific columns:
        [SYMBOL, SERIES, TURNOVER_LACS, DELIV_QTY, DELIV_PER].

    Raises:
    -------
    ConnectionError:
        If the NSE server returns a non-200 status code (e.g., 403 Forbidden
        or 404 Not Found if the date is a holiday).
    ValueError:
        If the CSV content cannot be parsed.
    """

    # URL Construction:
    # NSE archives use a specific naming convention: sec_bhavdata_full_DDMMYYYY.csv
    base_url = f"https://archives.nseindia.com/products/content/sec_bhavdata_full_{date_str}.csv"

    # Header Mimicry:
    # NSE servers often block automated scripts (bots). We use a standard Browser
    # User-Agent to mimic a legitimate user request.
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'
    }

    print(f"Requesting data from: {base_url}")

    try:
        # Execute HTTP GET request with a timeout to prevent hanging
        response = requests.get(base_url, headers=headers, timeout=15)

        # Check HTTP Status Code
        if response.status_code == 200:
            # Parse CSV Data:
            # We use StringIO to convert the raw string content into a file-like
            # object that pandas can read.
            raw_data = pd.read_csv(StringIO(response.text))

            # Data Cleaning - Step 1: Strip Whitespace from Column Headers
            # NSE CSV headers often contain leading/trailing spaces (e.g., ' SERIES').
            raw_data.columns = [c.strip() for c in raw_data.columns]

            # Data Cleaning - Step 2: Select Relevant Columns
            # We focus on the delivery metrics essential for the analysis.
            # Note: We ensure these keys match the stripped column names.
            required_columns = [
                'SYMBOL',
                'SERIES',
                'TURNOVER_LACS',
                'DELIV_QTY',
                'DELIV_PER'
            ]

            # Filter the DataFrame to keep only necessary columns
            # We use .copy() to avoid SettingWithCopy warnings later
            clean_df = raw_data[required_columns].copy()

            # Formatting: Ensure numeric columns are actually numeric
            # (sometimes NSE uses '-' for missing data)
            clean_df['DELIV_QTY'] = pd.to_numeric(clean_df['DELIV_QTY'], errors='coerce')
            clean_df['DELIV_PER'] = pd.to_numeric(clean_df['DELIV_PER'], errors='coerce')

            return clean_df

        else:
            # Handle Server Errors (404 for holidays, 403 for blocks)
            raise ConnectionError(
                f"Failed to fetch data. Status Code: {response.status_code}. "
                "Check if the date is a holiday/weekend."
            )

    except Exception as e:
        # Catch network or parsing errors
        print(f"An error occurred: {str(e)}")
        return pd.DataFrame()  # Return empty DF on failure to prevent crash

# --- Example Usage ---

# Define the target date (e.g., 10th January 2026)
target_date = '10012026'

print(f"--- Fetching NSE Delivery Data for {target_date} ---")
df = fetch_nse_delivery_data(target_date)

if not df.empty:
    print("\nData Fetched Successfully!")
    print(f"Total Records: {len(df)}")

    # Display first 5 rows
    print("\nSample Data:")
    print(df.head().to_string(index=False))

    # Basic Analysis: Find top 3 stocks by Delivery Percentage
    print("\nTop 3 Stocks by Delivery %:")
    # Filter for Equity (EQ) series only
    eq_df = df[df['SERIES'] == 'EQ']
    top_delivery = eq_df.sort_values(by='DELIV_PER', ascending=False).head(3)
    print(top_delivery[['SYMBOL', 'DELIV_PER']].to_string(index=False))

else:
    print("No data returned. Please verify the date.")

Step-by-Step Code Summary

The Python script provided creates an automated pipeline to retrieve, clean, and structure market data directly from the National Stock Exchange (NSE) archives. This process converts raw CSV data into an analytical format suitable for calculating delivery-based insights.

1. Request Construction and Header Spoofing

The function accepts a date string and constructs a dynamic URL pointing to the sec_bhavdata_full file. Crucially, the code defines a custom User-Agent. This mimics a standard web browser (like Chrome), ensuring the NSE server does not reject the request as an automated bot (a 403 Forbidden error).

2. Data Acquisition and Parsing

The script executes an HTTP GET request. Upon receiving a successful response (Status 200), it utilizes the StringIO library to treat the raw text response as an in-memory file buffer. This allows the pandas library to read the data directly without needing to save a physical file to the disk first.

3. Header Sanitization

Raw data files from exchanges often contain inconsistent formatting. The script explicitly iterates through the column headers and performs a strip operation to remove leading and trailing whitespace, for example converting " SERIES" to "SERIES". This prevents KeyError exceptions during column selection.

4. Column Filtration and Typing

The script filters the dataset to retain only the metrics required for delivery analysis:

  • SYMBOL: The stock ticker.
  • SERIES: The instrument category (e.g., EQ for Equity).
  • DELIV_QTY: The absolute number of shares marked for settlement.
  • DELIV_PER: The ratio of delivery volume to total volume.

The script also forces numeric conversion on the quantitative columns, handling potential non-numeric placeholders (like hyphens) by coercing them to NaN (Not a Number).

5. Output Delivery

Finally, the function returns a clean pandas DataFrame. If the request fails (e.g., if the date provided is a market holiday), the code raises a descriptive ConnectionError or returns an empty container, ensuring the main program does not crash unexpectedly.

Storage Architecture (The Database Design)

Storing delivery data requires a schema that supports time-series analysis. Using SQLAlchemy with a PostgreSQL backend is the industry standard for handling millions of rows efficiently. The database must account for corporate actions like stock splits; for instance, if a 1:10 split occurs, historical delivery volumes must be multiplied by 10 to remain comparable to current figures.
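As a minimal sketch of such a schema, the snippet below uses Python's built-in sqlite3 module in place of the PostgreSQL/SQLAlchemy stack described above, so it runs without external dependencies; the table and column names are illustrative assumptions. It also demonstrates the 1:10 split adjustment applied to a stored historical volume:

```python
import sqlite3

# Illustrative time-series schema: one row per stock per trading session.
# SQLite stands in here for the PostgreSQL backend mentioned in the text.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE delivery_positions (
        symbol     TEXT    NOT NULL,
        trade_date TEXT    NOT NULL,
        deliv_qty  INTEGER NOT NULL,
        PRIMARY KEY (symbol, trade_date)
    )
""")

# A hypothetical stock delivered 250,000 shares before a 1:10 split.
conn.execute(
    "INSERT INTO delivery_positions VALUES ('DEMO', '2025-06-30', 250000)"
)

# Corporate-action handling: after the 1:10 split, historical delivery
# volumes are multiplied by 10 so they stay comparable to current figures.
conn.execute(
    "UPDATE delivery_positions SET deliv_qty = deliv_qty * 10 "
    "WHERE symbol = 'DEMO'"
)

adjusted_qty = conn.execute(
    "SELECT deliv_qty FROM delivery_positions WHERE symbol = 'DEMO'"
).fetchone()[0]
print(adjusted_qty)  # 2500000
```

In a production pipeline the same adjustment would typically be driven by a corporate-actions table rather than an ad-hoc UPDATE.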

Measurement and Metric Calculation

The most fundamental measure derived from this data is the Delivery Percentage. This ratio allows us to normalize delivery volume against total traded volume, making it possible to compare the settlement intensity of a high-liquidity stock like Reliance with a low-liquidity small-cap stock.

Mathematical Specification of Delivery Percentage

The Delivery Percentage is the ratio of deliverable quantity to the total traded quantity, expressed as a percentage:

Pdelivery = (VD / VT) × 100

Python Function: Calculate_Delivery_Percentage
def calculate_delivery_percentage(deliv_qty, traded_qty):
    """
    Calculates the proportion of shares that went for delivery.

    Variables:
    deliv_qty (VD): Deliverable quantity of shares
    traded_qty (VT): Total traded quantity of shares
    """
    if traded_qty == 0:
        return 0.0

    delivery_percentage = (deliv_qty / traded_qty) * 100
    return round(delivery_percentage, 2)

Detailed Explanation of Variables and Symbols:

  • Pdelivery: The Resultant percentage representing the settlement intensity of a stock for a given session.
  • VD (Numerator): The deliverable quantity of shares reported by the exchange clearinghouse.
  • VT (Denominator): The total traded quantity, encompassing both intraday and delivery-based trades.
  • ⋅ 100: The Multiplier used to convert a decimal fraction into a percentage format.
  • Grouping Symbols (Parentheses): Ensures the division operation is completed prior to the multiplication by 100.

Impact on Trading Horizons

The interpretation of delivery data changes based on the trader’s timeframe, yet it remains anchored in the physical reality of share movement:

  • Short-Term (Intraday/Swing): High delivery volume in the previous session informs the “Available Float.” If a large percentage of shares were delivered, they are effectively “locked” in demat accounts for at least T+1, potentially reducing the immediate liquidity available for circular intraday churn in the next session.
  • Medium-Term (Positional): Traders monitor the “Delivery Baseline.” A structural increase in the average daily delivery volume over a 20-day period suggests a shift in the settlement requirements of the stock, indicating that more participants are opting for ownership over speculation.
  • Long-Term (Investing): Large-scale investors track aggregate settlement volumes over years. Periods of consistently high delivery during price consolidation often precede shifts in the demat “holding” patterns of the broader market, as tracked by systems developed by TheUniBit.
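The 20-day "Delivery Baseline" mentioned in the positional bullet can be sketched with a pandas rolling mean. The volumes below are synthetic, and a 5-day window stands in for the 20-day period to keep the example short:

```python
import pandas as pd

# Synthetic daily delivery volumes for one stock (illustrative only).
deliv = pd.Series([50_000, 52_000, 48_000, 60_000, 75_000, 90_000, 95_000])

# Rolling baseline: the trailing mean of delivery volume.
# (A real positional screen would use window=20 on 20+ sessions of data.)
baseline = deliv.rolling(window=5, min_periods=5).mean()

# A session whose delivery sits well above its trailing baseline flags a
# structural rise in settlement demand for the stock.
latest_ratio = deliv.iloc[-1] / baseline.iloc[-1]
print(round(latest_ratio, 2))  # 1.29
```

A ratio persistently above 1 over several sessions is the kind of shift in the delivery baseline the bullet describes.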

Statistical Distribution: How Delivery Volume Clusters in Indian Stocks

Delivery volume is not uniformly distributed across the Indian equity landscape. In the cash market, different categories of stocks—ranging from high-cap index heavyweights to volatile small-cap securities—exhibit distinct settlement signatures. These signatures are often a function of the stock’s liquidity profile and the nature of market participants involved. By analyzing the statistical clustering of delivery data, Python-driven systems can identify “structural norms” for specific stocks, allowing traders to detect when a current session’s delivery deviates from historical expectations.

In Indian markets, “Blue Chip” stocks (typically found in the NIFTY 50) often maintain a high delivery-to-trade ratio, frequently exceeding 50% to 60%. This is because these stocks are staples in institutional portfolios and Mutual Fund schemes where ownership transfer is the primary objective. Conversely, “Beta-heavy” mid-cap stocks may show massive traded volumes with delivery percentages as low as 15% to 20%, indicating that the bulk of the activity is driven by intraday speculators and high-frequency trading (HFT) algorithms that square off positions before the closing bell.

High-Volume vs. High-Delivery Categorization

A common misconception in volume analysis is equating high traded volume with high delivery. To differentiate these, we use a categorization logic that separates “Activity” from “Settlement.” A stock can be in a state of high activity (high $V_T$) but low settlement (low $V_D$), or vice versa. This distinction is vital for understanding the physical liquidity of a security.

Mathematical Specification of Delivery Intensity Index (DII)

To quantify the concentration of delivery relative to market activity, we define the Delivery Intensity Index ($DII$). This metric measures the deviation of the current delivery volume from its rolling mean, normalized by the total trading activity:

DII = (VD - μVD) / VT

Detailed Explanation of Variables and Symbols:

  • DII: The Resultant index value. A positive value indicates that the current session has higher-than-average delivery relative to total trades.
  • VD: The current session’s deliverable quantity (Numerator component).
  • μVD (Mu): The Arithmetic Mean of the delivery volume over a specified period (e.g., 20 days). It acts as the statistical baseline.
  • VT (Denominator): The Total Traded Volume, used here as a scaling factor to normalize the deviation.
  • – (Subtraction Operator): Determines the raw “Excess Delivery” by finding the difference between current and average settlement.
  • Indices (Subscripts): Used to distinguish between Delivery Volume (D) and Total Volume (T).
Python Implementation: Categorize_Stocks_By_Delivery_Ratio_Algorithm
import pandas as pd
import numpy as np

def categorize_delivery_profile(df: pd.DataFrame, window: int = 20) -> pd.DataFrame:
    """
    Categorizes stocks based on delivery concentration and calculates the
    Delivery Intensity Index (DII).

    This function adds context to raw volume data by comparing current delivery
    volume against a historical baseline (Moving Average). It helps identify
    if buying interest is 'Fresh Accumulation' or just 'Routine Activity'.

    Methodological Definition:
    DII = (Current Delivery Volume - Mean Delivery Volume) / Current Total Volume

    Parameters:
    -----------
    df (pd.DataFrame):
        Input DataFrame containing:
        - 'SYMBOL': Stock identifier
        - 'V_T': Total Traded Volume
        - 'V_D': Delivery Volume (Settled Quantity)

    window (int):
        The lookback period for the rolling mean calculation (default 20 days).

    Returns:
    --------
    pd.DataFrame:
        A subset DataFrame with calculated metrics:
        - 'DELIV_PER': Delivery Percentage (0-100)
        - 'DII': Delivery Intensity Index (Normalized deviation)
        - 'PROFILE': Classification (High Settlement, Neutral, High Churn)
    """

    # Pre-computation Check: Ensure required columns exist
    required_cols = {'SYMBOL', 'V_T', 'V_D'}
    if not required_cols.issubset(df.columns):
        missing = required_cols - set(df.columns)
        raise ValueError(f"Input DataFrame is missing columns: {missing}")

    # 1. Calculate Delivery Percentage (The baseline metric)
    # We use .copy() to prevent SettingWithCopy warnings on slices
    df = df.copy()

    # Handle division by zero if V_T is 0
    df['DELIV_PER'] = np.where(
        df['V_T'] > 0,
        (df['V_D'] / df['V_T']) * 100,
        0.0
    )

    # 2. Calculate Rolling Mean of Delivery Volume (Baseline)
    # We group by SYMBOL to ensure the moving average respects stock boundaries.
    # The transform function maintains the original index shape.
    df['MEAN_VD'] = df.groupby('SYMBOL')['V_D'].transform(
        lambda x: x.rolling(window=window, min_periods=1).mean()
    )

    # 3. Calculate Delivery Intensity Index (DII)
    # DII measures how 'abnormal' the delivery volume is relative to total liquidity.
    # Positive DII = Delivery is higher than average (Potential Accumulation)
    # Negative DII = Delivery is lower than average (Potential Weakness)
    df['DII'] = np.where(
        df['V_T'] > 0,
        (df['V_D'] - df['MEAN_VD']) / df['V_T'],
        0.0
    )

    # 4. Categorization Logic using Vectorized Operations
    # We use np.select for high performance classification instead of .apply()
    conditions = [
        (df['DELIV_PER'] > 60),                             # Condition 1: High Delivery
        (df['DELIV_PER'] >= 30) & (df['DELIV_PER'] <= 60),  # Condition 2: Neutral
        (df['DELIV_PER'] < 30)                              # Condition 3: Low Delivery (High Speculation)
    ]

    choices = ['High Settlement', 'Neutral', 'High Churn']

    # Apply logic; 'Unknown' catches any edge cases (like NaNs)
    df['PROFILE'] = np.select(conditions, choices, default='Unknown')

    # Return specific columns for analysis
    return df[['SYMBOL', 'DELIV_PER', 'DII', 'PROFILE']]

# --- Example Usage ---

# Create dummy data for 25 days for a single stock 'RELIANCE'
# to demonstrate the rolling window calculation.
np.random.seed(42)
days = 25
data = {
    'SYMBOL': ['RELIANCE'] * days,
    # Random total volume between 1M and 2M
    'V_T': np.random.randint(1_000_000, 2_000_000, days),
}
# Create delivery volume as a fraction of total (approx 20% to 70%)
data['V_D'] = (data['V_T'] * np.random.uniform(0.2, 0.7, days)).astype(int)

df_sample = pd.DataFrame(data)

print("--- Processing Delivery Profile ---")
result_df = categorize_delivery_profile(df_sample, window=5)

# Display the last 5 days to see the DII in action
print(result_df.tail(5).to_string(index=False))

# Check the profile distribution
print("\n--- Profile Distribution ---")
print(result_df['PROFILE'].value_counts())

Step-by-Step Code Summary

The Python script calculates advanced delivery metrics to classify stocks based on their settlement behavior. Unlike simple delivery percentage, this script introduces a “Delivery Intensity Index” (DII) to contextualize buying pressure against historical averages.

1. Delivery Percentage Calculation

The script first computes the standard delivery ratio. This indicates what portion of the day’s activity resulted in actual ownership transfer versus intraday squaring off.

Delivery % = (VDelivery / VTotal) × 100

2. Historical Baseline (Rolling Mean)

To detect anomalies, the code calculates a rolling average of Delivery Volume (e.g., a 20-day mean). This is done using a Group-Transform operation, which ensures that the moving average is calculated independently for each unique stock Symbol in the dataset.

3. Delivery Intensity Index (DII) Formulation

The script computes the DII, a proprietary metric derived from the deviation of current delivery from the mean, normalized by total liquidity. A high positive DII suggests an unusual surge in delivery-based buying.

DII = (VDelivery - Mean(VDelivery)) / VTotal

4. Vectorized Categorization

Finally, the code classifies each record into one of three profiles using NumPy Select. This method is computationally superior to standard loops (like `if-else` or `.apply()`) for large financial datasets:

  • High Settlement (>60%): Indicates strong conviction; investors are taking delivery.
  • Neutral (30% – 60%): Balanced market activity.
  • High Churn (<30%): Dominated by intraday speculation with little long-term interest.

Sectoral Benchmarking: Comparing Delivery “Norms”

Every sector in the Indian market has a unique “Delivery Signature.” For instance, the IT Sector (e.g., TCS, Infosys) often exhibits stable delivery patterns due to heavy institutional holding. In contrast, the Banking and Financial Services (BFSI) sector, particularly high-beta stocks, might show erratic delivery spikes during periods of high news flow. By using Python to benchmark these sectors, a software firm can build automated alerts that trigger only when a stock breaks its own sectoral norm, rather than a generic market-wide threshold.

Workflow: Measure → Compare → Alert

The workflow for sectoral analysis involves aggregating stock-level delivery data into sectoral baskets. Using Python’s groupby function, we calculate the median delivery percentage for a sector and compare individual constituents against this benchmark. This helps in identifying stocks that are undergoing an unusual “Settlement Event” within their peer group.
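The Measure → Compare → Alert loop above can be sketched with pandas groupby. The symbols, sector labels, delivery percentages, and the 20-point deviation threshold below are all illustrative assumptions, not exchange-published mappings:

```python
import pandas as pd

# Hypothetical stock-level delivery data with sector labels.
df = pd.DataFrame({
    "SYMBOL":    ["TCS", "INFY", "WIPRO", "HDFCBANK", "ICICIBANK"],
    "SECTOR":    ["IT", "IT", "IT", "BFSI", "BFSI"],
    "DELIV_PER": [62.0, 58.0, 30.0, 45.0, 48.0],
})

# Measure: median delivery percentage per sector (robust to outliers).
df["SECTOR_MEDIAN"] = df.groupby("SECTOR")["DELIV_PER"].transform("median")

# Compare: deviation of each constituent from its own sectoral norm.
df["DEVIATION"] = df["DELIV_PER"] - df["SECTOR_MEDIAN"]

# Alert: flag stocks breaking their sectoral norm by more than 20 points,
# rather than applying a generic market-wide threshold.
df["ALERT"] = df["DEVIATION"].abs() > 20

print(df[df["ALERT"]][["SYMBOL", "DEVIATION"]].to_string(index=False))
```

With these sample figures only WIPRO trips the alert, because it sits 28 points below the IT sector median while both BFSI constituents stay within 2 points of theirs.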

Advanced Metric: The Delivery Quantity Delta

While the percentage is useful, the absolute change in the number of settled shares provides insight into the scale of settlement obligation. We call this the Delivery Quantity Delta.

Mathematical Specification of Delivery Quantity Delta

The Delta measures the session-over-session growth in settled shares:

ΔVD = VD,t - VD,t-1

Detailed Explanation of Variables and Symbols:

  • ΔVD (Delta V_D): The Resultant value representing the absolute change in delivery volume.
  • VD,t: The delivery volume for the current trading day ‘t’.
  • VD,t-1: The delivery volume for the immediately preceding trading day ‘t-1’.
  • Δ (Delta Symbol): An operator denoting the difference or change in a variable’s value over a temporal interval.
  • t, t-1 (Subscripts): Represent the discrete time indices in the time-series dataset.
Python Algorithm: Calculate_Delivery_Quantity_Delta
import pandas as pd

def calculate_delivery_delta(df: pd.DataFrame) -> pd.DataFrame:
    """
    Calculates the change in delivery volume between the current session (t)
    and the previous session (t-1).

    This function measures the momentum of delivery accumulation. A positive delta
    indicates increasing institutional or serious interest, while a negative delta
    suggests waning delivery conviction.

    Methodological Definition:
    Delta(V_D) = V_D(t) - V_D(t-1)

    Parameters:
    -----------
    df (pd.DataFrame):
        Input DataFrame containing:
        - 'SYMBOL': Stock identifier.
        - 'DATE': Trading date (must be sortable, preferably datetime).
        - 'V_D': Delivery Volume (Numeric).

    Returns:
    --------
    pd.DataFrame:
        The original DataFrame with an added 'DELIVERY_DELTA' column.
        Note: The first record for each symbol will have a NaN delta.

    Raises:
    -------
    ValueError:
        If required columns are missing from the input DataFrame.
    """

    # Validation: Check for necessary columns
    required_columns = {'SYMBOL', 'DATE', 'V_D'}
    if not required_columns.issubset(df.columns):
        missing = required_columns - set(df.columns)
        raise ValueError(f"Missing required columns: {missing}")

    # Data Preparation:
    # Sorting is critical for time-series difference calculations.
    # We sort by SYMBOL first to group stocks, then by DATE to order time.
    df = df.sort_values(by=['SYMBOL', 'DATE'], ascending=[True, True])

    # Core Calculation: Grouped Difference
    # 1. groupby('SYMBOL'): Isolates each stock so calculations don't bleed across symbols.
    # 2. ['V_D']: Selects the Delivery Volume column.
    # 3. .diff(): Calculates the discrete difference (Current Row - Previous Row).
    df['DELIVERY_DELTA'] = df.groupby('SYMBOL')['V_D'].diff()

    return df

# --- Example Usage ---

# Create dummy data for two stocks over 3 days
data = {
    'DATE': pd.to_datetime(['2026-01-01', '2026-01-02', '2026-01-03',
                            '2026-01-01', '2026-01-02', '2026-01-03']),
    'SYMBOL': ['INFY', 'INFY', 'INFY', 'TCS', 'TCS', 'TCS'],
    'V_D': [50000, 55000, 40000, 20000, 25000, 30000]
    # INFY: +5k, -15k | TCS: +5k, +5k
}

df_raw = pd.DataFrame(data)

print("--- Raw Data ---")
print(df_raw.to_string(index=False))

# Apply the function
df_processed = calculate_delivery_delta(df_raw)

print("\n--- Calculated Delivery Delta ---")
# Fill NaN with 0 for cleaner display (optional)
df_display = df_processed.copy()
df_display['DELIVERY_DELTA'] = df_display['DELIVERY_DELTA'].fillna(0)

print(df_display.to_string(index=False))

# Analysis of results
print("\n--- Interpretation ---")
last_row = df_display.iloc[-1]
if last_row['DELIVERY_DELTA'] > 0:
    print(f"{last_row['SYMBOL']}: Delivery Volume increased by {last_row['DELIVERY_DELTA']} shares.")

Step-by-Step Code Summary

The Python function provided computes the period-over-period change in delivery volume. This metric, known as “Delivery Delta,” is crucial for tracking the acceleration or deceleration of buying interest in a specific security.

1. Temporal Ordering

Before any calculation can occur, the data must be strictly ordered. The function sorts the dataset first by Symbol (to keep specific stock data together) and then by Date. This ensures that the mathematical operation compares a specific trading session ($t$) directly against its immediate predecessor ($t-1$).

2. Isolation via Grouping

To prevent calculation errors where the data of one stock spills over into another (e.g., subtracting TCS’s volume from Infosys’s volume), the script uses a Group By operation on the ‘SYMBOL’ column. This creates isolated logical partitions for every unique stock ticker.

3. Discrete Difference Calculation

The function applies a vectorised difference operation within each group. It subtracts the previous day’s delivery volume from the current day’s volume.

Mathematical Specification:

$\Delta V_D = V_D(t) - V_D(t-1)$

4. Result Interpretation

The output column ‘DELIVERY_DELTA’ provides a signed integer:

  • Positive Value (+): Indicates rising delivery volume (Accumulation gaining strength).
  • Negative Value (-): Indicates falling delivery volume (Accumulation slowing down).
  • NaN (Not a Number): Appears on the first record of every stock, as there is no prior data point ($t-1$) to subtract from.

Impact on Trading: Short, Medium, and Long-Term Perspectives

  • Short-Term: A massive spike in DELIVERY_DELTA combined with low price volatility often suggests a “Transfer of Ownership” in the cash market. Short-term traders use this to gauge if a stock’s available float is being tightened, which can lead to higher price sensitivity in subsequent sessions.
  • Medium-Term: Positional traders look for “Delivery Stability.” If the DII (Delivery Intensity Index) stays consistently positive over 10 sessions, it indicates a structural period of settlement where more shares are exiting the “trading pool” and entering “holding” status.
  • Long-Term: For long-term investors, sectoral benchmarking is crucial. A company consistently maintaining a delivery percentage higher than its sectoral median is often a sign of institutional absorption, a metric that software solutions from TheUniBit track to help investors identify fundamental settlement trends.
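The sectoral-benchmarking idea above can be sketched in a few lines of pandas. The `SECTOR` and `DELIV_PCT` column names and the sample values are illustrative assumptions, not real exchange data:

```python
import pandas as pd

# Hypothetical delivery percentages with a sector tag per stock
df = pd.DataFrame({
    'SYMBOL': ['INFY', 'TCS', 'WIPRO', 'SBIN', 'HDFCBANK'],
    'SECTOR': ['IT', 'IT', 'IT', 'BANK', 'BANK'],
    'DELIV_PCT': [62.0, 48.0, 35.0, 55.0, 41.0],
})

# Sectoral median via groupby-transform, broadcast back onto every row
df['SECTOR_MEDIAN'] = df.groupby('SECTOR')['DELIV_PCT'].transform('median')

# Flag stocks whose delivery % exceeds their own sector's median
df['ABOVE_SECTOR_MEDIAN'] = df['DELIV_PCT'] > df['SECTOR_MEDIAN']
print(df[df['ABOVE_SECTOR_MEDIAN']]['SYMBOL'].tolist())  # ['INFY', 'SBIN']
```

The `transform('median')` pattern keeps the original row count, so the comparison is a plain vectorised inequality rather than a join.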

Advanced Python Workflows: Processing Historical Delivery Archives

Analyzing delivery volume in a single session provides a vertical snapshot of market activity, but the true statistical power of this metric is unlocked through longitudinal analysis. In the Indian cash market, historical delivery data is archived by the exchanges in compressed formats that require specialized Python workflows to decompress, parse, and normalize. For a software development firm like TheUniBit, building a “Historical Delivery Trend Analyzer” involves managing a massive time-series database where each entry represents the settlement outcome of thousands of stocks over several years.

The primary challenge in historical processing is the structural evolution of exchange files. Over the last decade, the NSE and BSE have modified their reporting formats multiple times, shifting from fixed-width DAT files to modern, comma-separated CSVs. A robust Python pipeline must be “format-aware,” utilizing flexible parsing logic to ensure data continuity. Furthermore, since analyzing ten years of daily delivery data involves processing approximately 2,500 individual files, the use of Multiprocessing is essential to reduce the execution time from hours to minutes.
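A minimal sketch of such "format-aware" parsing might dispatch on the file extension. The fixed-width column layout below is purely illustrative, not the actual NSE DAT specification:

```python
import io
import pandas as pd

def read_delivery_report(raw: bytes, filename: str) -> pd.DataFrame:
    """Dispatch on file extension: legacy fixed-width .DAT vs modern .csv.
    The widths used here are assumptions for demonstration only."""
    if filename.lower().endswith('.dat'):
        # Hypothetical layout: 10-char symbol field, 12-char quantity field
        df = pd.read_fwf(io.BytesIO(raw), widths=[10, 12],
                         names=['SYMBOL', 'DELIV_QTY'])
    else:
        df = pd.read_csv(io.BytesIO(raw))
    # Schema normalization keeps downstream code format-agnostic
    df.columns = df.columns.str.strip().str.upper()
    return df

# Legacy-style fixed-width sample, built to match the widths above
dat = ("INFY".ljust(10) + "55000".rjust(12) + "\n"
       + "TCS".ljust(10) + "25000".rjust(12) + "\n").encode()
legacy = read_delivery_report(dat, 'report_legacy.DAT')

# Modern CSV sample with messy headers that the normalization cleans up
modern = read_delivery_report(b"Symbol , Deliv_Qty\nTCS,25000\n", 'report.csv')
print(legacy['DELIV_QTY'].tolist())  # [55000, 25000]
```

Both branches funnel into the same normalized column set, which is what preserves data continuity across format changes.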

The “Bhavcopy” Parser: Decoupling Settlement from Trade

The Bhavcopy (literally “Price Copy”) is the comprehensive record of all securities traded in a session. However, the specific delivery data is often sequestered in a separate report: the “Security-wise Deliverable Positions.” A high-quality parser must join these two datasets using a composite key (Symbol + Series + Date) to create a unified view of the stock’s traded value and its final settlement quantity.
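A sketch of that composite-key join, with illustrative column names and values rather than the exact exchange file headers:

```python
import pandas as pd

# Bhavcopy slice: traded quantities (illustrative)
bhav = pd.DataFrame({
    'SYMBOL': ['INFY', 'TCS'],
    'SERIES': ['EQ', 'EQ'],
    'DATE': ['2026-01-02', '2026-01-02'],
    'TRADED_QTY': [500000, 300000],
})

# Security-wise deliverable positions slice: settled quantities
deliv = pd.DataFrame({
    'SYMBOL': ['INFY', 'TCS'],
    'SERIES': ['EQ', 'EQ'],
    'DATE': ['2026-01-02', '2026-01-02'],
    'DELIV_QTY': [200000, 150000],
})

# Inner join on the composite key (Symbol + Series + Date)
unified = bhav.merge(deliv, on=['SYMBOL', 'SERIES', 'DATE'], how='inner')
unified['DELIV_PCT'] = (unified['DELIV_QTY'] / unified['TRADED_QTY'] * 100).round(2)
print(unified[['SYMBOL', 'DELIV_PCT']].to_string(index=False))
```

Using all three key columns guards against accidental row multiplication when a symbol trades in multiple series (EQ, BE, etc.) on the same date.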

Mathematical Specification of Delivery Persistence Ratio (DPR)

To evaluate the stability of settlement patterns over time, we introduce the Delivery Persistence Ratio (DPR). This metric quantifies the consistency of delivery volume relative to its historical volatility. A high DPR suggests that a stock’s settlement behavior is predictable and not subject to erratic intraday speculative spikes.

$DPR = \dfrac{\mu_{V_D}}{\sigma_{V_D} + \varepsilon}$

Detailed Explanation of Variables and Symbols:

  • DPR (Delivery Persistence Ratio): The Resultant of the expression, representing a signal-to-noise ratio for settled shares.
  • $\mu_{V_D}$ (Numerator): The Arithmetic Mean of the delivery volume over the chosen historical window (e.g., 252 trading days).
  • $\sigma_{V_D}$ (Denominator component): The Standard Deviation of delivery volume, measuring the dispersion of settlement data.
  • ε (Epsilon): A very small Constant used to ensure mathematical stability in cases where the standard deviation is zero.
  • + (Addition Operator): Combines the volatility measure with the safety constant in the denominator.
  • Indices (Subscripts): Specify that the statistics are derived specifically from the Delivery Volume (D) dataset.
Python Workflow: Historical_Delivery_Trend_Analyzer
import zipfile
import pandas as pd
import numpy as np
import os
from multiprocessing import Pool, cpu_count
import shutil

# ==========================================
# PART 1: MOCK DATA GENERATION (For Execution)
# ==========================================
def setup_dummy_environment(folder_name='archives'):
    """Creates dummy zipped CSVs to make this script executable."""
    if os.path.exists(folder_name):
        shutil.rmtree(folder_name)
    os.makedirs(folder_name)

    # Create 5 dummy daily reports
    for i in range(1, 6):
        date_str = f"2026-01-0{i}"
        # Dummy data for 3 stocks
        data = {
            'SYMBOL': ['RELIANCE', 'TCS', 'INFY'],
            'TRADED_QTY': np.random.randint(100000, 500000, 3),
            'DELIV_QTY': np.random.randint(20000, 100000, 3)
        }
        df = pd.DataFrame(data)

        # Save as CSV, then Zip it
        csv_name = f"bhavcopy_{date_str}.csv"
        df.to_csv(csv_name, index=False)

        zip_name = os.path.join(folder_name, f"bhavcopy_{date_str}.zip")
        with zipfile.ZipFile(zip_name, 'w') as zf:
            zf.write(csv_name)

        os.remove(csv_name)  # Cleanup raw CSV
    print(f"--- Created dummy archives in '{folder_name}' ---")

# ==========================================
# PART 2: CORE LOGIC
# ==========================================

def parse_archive_file(file_path):
    """
    Parses a single zipped delivery report and returns a DataFrame.

    This function is designed to be the 'worker' unit in a multiprocessing pool.
    It handles the extraction and cleaning of a single day's data in isolation.
    """
    try:
        with zipfile.ZipFile(file_path, 'r') as z:
            # Iterate through files in the zip (Assumes standard NSE single-file format)
            for filename in z.namelist():
                if filename.endswith('.csv'):
                    with z.open(filename) as f:
                        df = pd.read_csv(f)

                    # Cleaning: Standardize headers to handle variations across years
                    # Stripping whitespace and converting to UPPER CASE is crucial
                    df.columns = df.columns.str.strip().str.upper()

                    # Return only necessary columns to minimize memory usage during IPC
                    # (Inter-Process Communication)
                    return df[['SYMBOL', 'DELIV_QTY', 'TRADED_QTY']]
    except Exception as e:
        print(f"Error processing {file_path}: {e}")
    return pd.DataFrame()  # Empty DF on failure or if the zip held no CSV

def calculate_persistence(series):
    """
    Mathematical implementation of Delivery Persistence Ratio (DPR).

    This calculates the inverse coefficient of variation.
    High Persistence = Consistent delivery volume (High Mean, Low Variance).
    Low Persistence = Erratic delivery spikes (High Variance).

    Formula: Mean / (Standard Deviation + Epsilon)
    """
    mu = series.mean()
    sigma = series.std()
    epsilon = 1e-9  # Mathematical stabilizer to prevent DivisionByZero errors

    return mu / (sigma + epsilon)

def process_archives_parallel(archive_folder):
    """Orchestrates the parallel processing pipeline."""

    # 1. Identify all target files
    files = [os.path.join(archive_folder, f)
             for f in os.listdir(archive_folder)
             if f.endswith('.zip')]

    print(f"Found {len(files)} archives. Starting parallel processing...")

    # 2. Initialize Multiprocessing Pool
    # We use roughly the number of available CPU cores
    with Pool(processes=cpu_count()) as pool:
        # map() distributes the 'parse_archive_file' function across the 'files' list
        results = pool.map(parse_archive_file, files)

    # 3. Aggregation
    # Combine the list of independent DataFrames into one large panel dataset
    print("Merging results...")
    full_data = pd.concat(results, ignore_index=True)

    return full_data

# ==========================================
# PART 3: MAIN EXECUTION
# ==========================================

if __name__ == '__main__':
    # Step A: Setup dummy data (So code runs out-of-the-box)
    setup_dummy_environment('archives')

    # Step B: Run the parallel extraction
    df_all = process_archives_parallel('archives')

    if not df_all.empty:
        print("\n--- Aggregated Data (Sample) ---")
        print(df_all.head())

        # Step C: Apply Analytical Metric (Persistence)
        print("\n--- Calculating Delivery Persistence (Signal-to-Noise) ---")

        # Group by SYMBOL to calculate persistence for each stock independently
        persistence_scores = df_all.groupby('SYMBOL')['DELIV_QTY'].apply(calculate_persistence)

        print(persistence_scores)

        # Cleanup (Optional)
        shutil.rmtree('archives')
    else:
        print("No data processed.")

A mock data generator (Part 1 of the script) creates temporary zip archives, so the workflow can be executed immediately without an external dataset.

Step-by-Step Code Summary

The Python script implements a high-performance data pipeline designed to ingest historical market archives. By leveraging Parallel Computing, it overcomes the I/O bottlenecks typically associated with reading thousands of zipped daily reports.

1. Compressed Data Extraction (Worker Function)

The parse_archive_file function acts as an isolated worker unit. It accesses a zipped archive without fully decompressing it to the disk, reading the data stream directly into memory using zipfile. It performs immediate schema normalization (stripping and uppercasing headers) to ensure column consistency across different years of data.

2. Parallel Map-Reduce Architecture

Instead of processing files sequentially (one after another), the script utilizes the system’s multiple CPU cores.

  • Map Phase: The Pool.map() method distributes the file list across available processors. If you have 4 cores, 4 files are parsed simultaneously.
  • Reduce Phase: The results from these parallel processes return as a list of DataFrames, which are then concatenated into a single master dataset.

3. Methodological Definition: Persistence Ratio

The script introduces a metric called the Delivery Persistence Ratio (DPR). This measures the quality of accumulation. It is mathematically defined as the “Inverse Coefficient of Variation.” Ideally, we seek stocks with high average delivery ($\mu$) and low volatility in that delivery ($\sigma$).

Mathematical Specification:

$DPR = \dfrac{\mu}{\sigma + \varepsilon}$

Where:

  • μ (Mu): Mean Delivery Volume over the period.
  • σ (Sigma): Standard Deviation of Delivery Volume.
  • ε (Epsilon): A negligible constant ($10^{-9}$) added to prevent division-by-zero errors in case of zero variance.

4. Aggregation and Grouping

Finally, the persistence logic is applied using a Group-Apply pattern. The dataset is grouped by unique stock symbols, and the DPR formula is applied to each stock’s historical delivery vector, producing a final “Quality Score” for every security.

Data Store: Designing a Time-Series SQL Schema

When storing historical archives, a flat-file system becomes unmanageable. A specialized SQL structure, optimized for Python’s pandas.to_sql method, is required. The schema must enforce data integrity through unique constraints on Symbol and Trade_Date, preventing duplicate entries during incremental daily updates.
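One way to make incremental daily loads duplicate-safe is to filter the incoming batch against the keys already stored before calling `to_sql`. This sketch uses SQLite for portability; the `delivery_history` table and its columns are illustrative assumptions:

```python
import sqlite3
import pandas as pd

conn = sqlite3.connect(':memory:')
# Unique constraint on (symbol, trade_date) enforces integrity at the DB layer
conn.execute("""CREATE TABLE delivery_history (
    symbol TEXT NOT NULL,
    trade_date TEXT NOT NULL,
    deliv_qty INTEGER,
    UNIQUE (symbol, trade_date)
)""")

def load_incremental(df: pd.DataFrame) -> int:
    """Append only rows whose (symbol, trade_date) key is not yet stored."""
    existing = pd.read_sql("SELECT symbol, trade_date FROM delivery_history", conn)
    merged = df.merge(existing, on=['symbol', 'trade_date'],
                      how='left', indicator=True)
    fresh = merged[merged['_merge'] == 'left_only'].drop(columns='_merge')
    fresh.to_sql('delivery_history', conn, if_exists='append', index=False)
    return len(fresh)

day1 = pd.DataFrame({'symbol': ['INFY'], 'trade_date': ['2026-01-02'],
                     'deliv_qty': [200000]})
n1 = load_incremental(day1)   # first load inserts the row
n2 = load_incremental(day1)   # re-running the same day inserts nothing
print(n1, n2)  # 1 0
```

The anti-join via `merge(..., indicator=True)` makes the loader idempotent: re-running a day's file is a no-op instead of an integrity error.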

Impact on Trading: Short, Medium, and Long-Term Perspectives

  • Short-Term (Mean Reversion): Historical analyzers identify “Delivery Exhaustion.” If a stock’s delivery volume is $3\sigma$ away from its historical mean, short-term traders anticipate a reversion to the mean, as such extreme settlement events are rarely sustainable across multiple sessions.
  • Medium-Term (Regime Detection): Using the DPR, positional traders detect shifts in “Settlement Regimes.” A rising DPR often correlates with a stock moving from a speculative phase (high intraday churn) to an accumulation phase (steady delivery).
  • Long-Term (Structural Analysis): Institutional software solutions, like those provided by TheUniBit, use historical archives to map out the “Long-Term Float Lock-up.” By aggregating ten years of delivery, one can estimate the percentage of total outstanding shares that have effectively moved into long-term demat holdings, providing a clear picture of the market’s underlying supply-demand structural health.

The Final Compendium: Missed Algorithms, Databases, and Technical Ecosystem

The final stage of mastering delivery volume data involves the integration of high-level quantitative metrics into a scalable Python infrastructure. While previous sections established the conceptual and historical framework, this compendium provides the rigorous mathematical specifications and the architectural blueprint required to deploy an authoritative delivery-tracking system. In the Indian stock market, where liquidity and settlement dynamics can shift rapidly due to regulatory changes or news cycles, a software solution must be both mathematically sound and data-resilient.

For investors and traders, these algorithms serve as the final filter to separate routine market noise from significant settlement anomalies. By utilizing the Python-friendly APIs and structured data sources listed below, firms like TheUniBit enable a sophisticated “Fetch-Store-Measure” workflow that operates with institutional-grade precision.

Advanced Quantitative Metrics

Beyond simple percentages, advanced traders use normalized metrics to compare settlement intensity across different price regimes and volatility windows. The following algorithms represent the “gold standard” for quantitative delivery analysis.

Mathematical Specification of Normalized Delivery Volume (NDV)

The NDV is a descriptive statistic that measures current settlement activity against a historical baseline (typically a 20-day moving average), providing a relative measure of volume “surge.”

$NDV = \dfrac{V_{D,i}}{\frac{1}{N}\sum_{j=1}^{N} V_{D,i-j}}$

Python Implementation: Normalized_Delivery_Volume_Algorithm
import pandas as pd
import numpy as np

def calculate_ndv(df: pd.DataFrame, window: int = 20) -> pd.DataFrame:
    """
    Calculates the Normalized Delivery Volume (NDV).

    NDV is a relative strength metric for volume. Instead of looking at absolute
    shares (which vary wildly between stocks), NDV measures the current delivery
    volume against its own recent historical average.

    Interpretation:
    - NDV > 1.0: Current accumulation is higher than the recent trend.
    - NDV > 2.0: Significant volume anomaly (High conviction buying/selling).
    - NDV < 1.0: Below-average institutional activity.

    Methodological Definition:
        NDV = Current Delivery Volume / Moving Average(Delivery Volume, n)

    Parameters:
    -----------
    df (pd.DataFrame):
        Input DataFrame containing:
        - 'SYMBOL': Stock identifier.
        - 'V_D': Delivery Volume (Numeric).

    window (int):
        The lookback period for the moving average (default 20 days).

    Returns:
    --------
    pd.DataFrame:
        DataFrame with the added 'MA_VD' (Baseline) and 'NDV' (Ratio) columns.
    """

    # Validation: Ensure required columns exist
    if 'V_D' not in df.columns or 'SYMBOL' not in df.columns:
        raise ValueError("Input DataFrame must contain 'SYMBOL' and 'V_D' columns.")

    # Data Preparation: Sort to ensure rolling window respects time order
    # (Assuming the dataframe might not be sorted; if 'DATE' existed we would sort by it)
    # Here we assume the input is sequentially ordered or we rely on index.

    # 1. Calculate Rolling Mean of Delivery Volume (The Baseline)
    # We use .transform() instead of .apply() because we want to broadcast
    # the scalar rolling mean back to every row in the original DataFrame,
    # preserving the original shape and index.
    df['MA_VD'] = df.groupby('SYMBOL')['V_D'].transform(
        lambda x: x.rolling(window=window, min_periods=1).mean()
    )

    # 2. Calculate Normalized Delivery Volume (The Ratio)
    # We handle division by zero (though mean of positive volumes is rarely 0)
    # and fill NaNs with 0 to maintain data integrity.
    df['NDV'] = np.where(
        df['MA_VD'] > 0,
        df['V_D'] / df['MA_VD'],
        0.0
    )

    # Rounding for cleaner readability
    df['NDV'] = df['NDV'].round(2)

    return df

# --- Example Usage ---

# Create dummy data for a single stock 'TATASTEEL' over 30 days
np.random.seed(42)
days = 30
data = {
    'SYMBOL': ['TATASTEEL'] * days,
    # Generate random delivery volume between 1M and 5M
    'V_D': np.random.randint(1_000_000, 5_000_000, days)
}

# Introduce a massive spike on the last day to simulate Institutional Buying
data['V_D'][-1] = 15_000_000

df_sample = pd.DataFrame(data)

print("--- Calculating Normalized Delivery Volume ---")
result_df = calculate_ndv(df_sample, window=20)

# Display the last 5 days to see the spike
print(result_df.tail(5).to_string(index=False))

# Interpretation of the spike
last_record = result_df.iloc[-1]
if last_record['NDV'] > 2.0:
    print(f"\n[ALERT] Anomaly Detected! Current Delivery is {last_record['NDV']}x the 20-day average.")

Step-by-Step Code Summary

The Python function provided computes the Normalized Delivery Volume (NDV). This indicator standardizes volume data, allowing analysts to detect significant accumulation or distribution events regardless of the absolute liquidity of the stock.

1. Establishing the Baseline (Moving Average)

To determine if today’s volume is “high” or “low,” the code first establishes a baseline. It calculates the Simple Moving Average (SMA) of the delivery volume over a specified window (typically 20 days, representing one trading month). The script utilizes a Group-Transform logic to ensure that this average is calculated separately for each stock symbol without mixing data.

2. Methodological Definition: The Normalization Ratio

The core calculation divides the current session’s delivery volume by the historical baseline. This transforms absolute share counts into a relative ratio (a scalar multiplier).

Mathematical Specification:

$NDV = \dfrac{V_D}{MA(V_D, n)}$

Where:

  • $V_D$: The Delivery Volume for the current session.
  • $MA(V_D, n)$: The Moving Average of Delivery Volume over n periods.

3. Handling Edge Cases

The code includes specific logic (using np.where) to handle potential “Division by Zero” errors. This is necessary for illiquid stocks where the historical average volume might be zero.

4. Interpretation of Output

The resulting ‘NDV’ column provides an immediate signal strength:

  • NDV ≈ 1.0: Normal market activity (Status Quo).
  • NDV > 2.0: Anomaly. Indicates aggressive institutional participation (Accumulation).
  • NDV < 0.5: Drying up of liquidity (Disinterest).

Detailed Explanation of Variables:

  • NDV: The Resultant ratio. A value of 2.0 indicates delivery is twice the historical average.
  • $V_{D,i}$ (Numerator): The deliverable quantity for the current observation $i$.
  • $N$ (Denominator parameter): The Window Length, representing the number of past sessions used for smoothing.
  • $\sum$ (Summation): The sum of delivered shares over the interval $N$.
  • $V_{D,i-j}$: The lagged delivery volume for the session occurring $j$ days prior to $i$.

Mathematical Specification of Delivery Z-Score

The Z-Score identifies “Settlement Anomalies” by measuring how many standard deviations the current delivery volume is from its historical mean. This is crucial for detecting outliers without the bias of absolute stock price or share count.

$Z = \dfrac{V_D - \mu_{V_D}}{\sigma_{V_D}}$

Python Implementation: Z-Score_Delivery_Calculator
import pandas as pd
import numpy as np

def calculate_delivery_zscore(df: pd.DataFrame, window: int = 50) -> pd.DataFrame:
    """
    Calculates the statistical Z-score for delivery volume.

    This function normalizes the delivery volume using statistical standardization.
    It identifies anomalies by measuring how many standard deviations the current
    volume is away from the mean.

    Interpretation:
    - Z > 3: Extremely Rare Accumulation (3 Sigma event).
    - Z between -1 and 1: Normal variation (Noise).
    - Z < -2: Significant drop in interest.

    Methodological Definition:
        Z = (Current Volume - Rolling Mean) / Rolling StdDev

    Parameters:
    -----------
    df (pd.DataFrame):
        Input containing 'SYMBOL' and 'V_D' (Delivery Volume).
    window (int):
        Lookback period (default 50 days).

    Returns:
    --------
    pd.DataFrame:
        DataFrame with added 'MEAN_VD', 'STD_VD', and 'Z_SCORE' columns.
    """

    # Validation: Ensure required columns exist
    if 'V_D' not in df.columns or 'SYMBOL' not in df.columns:
        raise ValueError("DataFrame must contain 'SYMBOL' and 'V_D'.")

    # 1. Create a Group Object
    # We group by SYMBOL to ensure statistics are calculated per stock,
    # preventing data contamination between different companies.
    grouped = df.groupby('SYMBOL')['V_D']

    # 2. Calculate Rolling Mean (Mu)
    # Using transform preserves the original index, allowing direct assignment
    # back to the main DataFrame.
    df['MEAN_VD'] = grouped.transform(
        lambda x: x.rolling(window=window, min_periods=window).mean()
    )

    # 3. Calculate Rolling Standard Deviation (Sigma)
    df['STD_VD'] = grouped.transform(
        lambda x: x.rolling(window=window, min_periods=window).std()
    )

    # 4. Calculate Z-Score
    # We handle the potential division by zero (if variance is 0) by replacing
    # 0s in STD with NaN or a small epsilon, though pandas handles this by returning inf/NaN.
    # Here, we allow NaNs where data is insufficient (first 'window' rows).
    df['Z_SCORE'] = (df['V_D'] - df['MEAN_VD']) / df['STD_VD']

    # Optional: Fill NaNs with 0 if you want a continuous plot,
    # but strictly speaking, Z-score is undefined for the first N days.
    # df['Z_SCORE'] = df['Z_SCORE'].fillna(0)

    return df

# --- Example Usage ---

# Create dummy data for 'HDFCBANK'
np.random.seed(101)
days = 100
data = {
    'SYMBOL': ['HDFCBANK'] * days,
    # Normal distributed volume around 100k
    'V_D': np.random.normal(loc=100000, scale=20000, size=days).astype(int)
}

# Create a dataframe
df_test = pd.DataFrame(data)

# Inject an "Anomaly" (Huge Accumulation) at the end
df_test.iloc[-1, df_test.columns.get_loc('V_D')] = 300000  # 3x the mean

print("--- Calculating Delivery Z-Score ---")
# Using a shorter window (20) for this small dataset example
result = calculate_delivery_zscore(df_test, window=20)

# Display the last 5 rows to see the anomaly
print(result[['SYMBOL', 'V_D', 'MEAN_VD', 'Z_SCORE']].tail(5).to_string(index=False))

# Check the anomaly
last_z = result.iloc[-1]['Z_SCORE']
if last_z > 3:
    print(f"\n[ALERT] Statistically Significant Accumulation Detected! Z-Score: {last_z:.2f}")

Step-by-Step Code Summary

The provided Python script performs anomaly detection on stock market delivery data using the statistical Z-score method. Unlike simple percentage changes, the Z-score accounts for the historical volatility of the stock, allowing analysts to distinguish between normal market noise and statistically significant events.

1. Statistical Baseline Construction

The code first establishes the “normal behavior” of the stock over a specified lookback period (e.g., 50 days). It computes two rolling metrics for each stock symbol independently:

  • Rolling Mean (μ): The average delivery volume, representing the central tendency.
  • Rolling Standard Deviation (σ): A measure of dispersion, representing how much the volume typically fluctuates.

2. Methodological Definition: Standardization

The core logic transforms the absolute volume data into a standard normal distribution score. This calculates how many standard deviations the current observation ($x$) is away from the historical mean ($\mu$).

Mathematical Specification:

$Z = \dfrac{x - \mu}{\sigma}$

Where:

  • x: Current Delivery Volume ($V_D$).
  • μ (Mu): Rolling Mean of $V_D$.
  • σ (Sigma): Rolling Standard Deviation of $V_D$.

3. Anomaly Detection Logic

The resulting Z_SCORE column provides a normalized signal:

  • Z ≈ 0: Volume is exactly average.
  • Z > +3.0: Positive Sigma Event. The volume is statistically anomalous (occurring in less than 0.3% of cases in a normal distribution), often signaling a major institutional entry.
  • Z < -2.0: Negative Sigma Event. Volume is significantly below usual levels.
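The “less than 0.3% of cases” figure can be checked directly from the standard normal tail probability. Note the 0.27% value is the two-sided figure ($|Z| > 3$); the one-sided tail $P(Z > 3)$ is about 0.135%. A quick verification using only the standard library:

```python
import math

def normal_sf(z: float) -> float:
    """P(Z > z) for a standard normal variable, via the complementary
    error function: sf(z) = 0.5 * erfc(z / sqrt(2))."""
    return 0.5 * math.erfc(z / math.sqrt(2))

one_sided = normal_sf(3.0)       # tail beyond +3 sigma only
two_sided = 2 * normal_sf(3.0)   # beyond either +3 or -3 sigma
print(f"P(Z > 3)   = {one_sided:.5f}")   # ~0.00135 (0.135%)
print(f"P(|Z| > 3) = {two_sided:.5f}")   # ~0.00270 (0.27%)
```

Either way, a +3 sigma delivery session is rare enough under a normal-noise assumption to justify treating it as an anomaly rather than routine variation.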

4. Grouping and Transformation

The script utilizes groupby().transform(). This ensures that the rolling statistics are computed strictly within the boundaries of each specific stock symbol, preventing data from one stock (e.g., a high-volume stock like Vodafone) from distorting the statistics of another (e.g., a low-volume stock like MRF).

Detailed Explanation of Variables:

  • Z: The Resultant score. |Z| > 2.0 typically indicates a statistically significant deviation.
  • $\mu_{V_D}$: The Arithmetic Mean of the delivery volume over the lookback period.
  • $\sigma_{V_D}$ (Denominator): The Standard Deviation, representing the volatility of the settlement data.
  • − (Subtraction Operator): Determines the distance of the current volume from the average.

Data Ecosystem and Storage Design

To implement the above, a structured environment is mandatory. Below is the technical specification for a delivery-centric database and the recommended Python stack.

Database Structure: PostgreSQL Table delivery_analytics

  • trade_date (DATE): Primary Key component; the session date.
  • symbol (VARCHAR): Primary Key component; NSE/BSE ticker symbol.
  • total_volume (BIGINT): The raw traded volume ($V_T$).
  • delivery_volume (BIGINT): The quantity of settled shares ($V_D$).
  • delivery_percent (NUMERIC): Calculated as ($V_D / V_T$) * 100.
  • ndv_ratio (NUMERIC): Normalized delivery volume index.
  • z_score (NUMERIC): Settlement anomaly score.
  • exchange_flag (CHAR): ‘N’ for NSE, ‘B’ for BSE.
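As one possible realization of this schema, here is a sketch using SQLite via Python's standard library (SQLite has no BIGINT/NUMERIC/VARCHAR distinctions, so the PostgreSQL types above map to INTEGER, REAL, and TEXT). It shows how the composite primary key makes incremental daily loads idempotent:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute("""
    CREATE TABLE delivery_analytics (
        trade_date       TEXT    NOT NULL,
        symbol           TEXT    NOT NULL,
        total_volume     INTEGER NOT NULL,
        delivery_volume  INTEGER NOT NULL,
        delivery_percent REAL,
        ndv_ratio        REAL,
        z_score          REAL,
        exchange_flag    TEXT,
        PRIMARY KEY (symbol, trade_date)  -- blocks duplicate daily rows
    )
""")

row = ('2026-01-02', 'INFY', 500000, 200000, 40.0, 1.1, 0.4, 'N')
# INSERT OR IGNORE makes a re-run of the daily loader a harmless no-op
conn.execute("INSERT OR IGNORE INTO delivery_analytics VALUES (?,?,?,?,?,?,?,?)", row)
conn.execute("INSERT OR IGNORE INTO delivery_analytics VALUES (?,?,?,?,?,?,?,?)", row)

count = conn.execute("SELECT COUNT(*) FROM delivery_analytics").fetchone()[0]
print(count)  # 1: the duplicate insert was ignored
```

In PostgreSQL the equivalent idiom would be `INSERT ... ON CONFLICT (symbol, trade_date) DO NOTHING`; the constraint design is the same.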

Python-Friendly APIs and Data Sources

  • Official Sources: NSE India (Archives – Security-wise Deliverable Positions), BSE India (Equity Delivery Volume).
  • NSEPython: Highly efficient wrapper for fetching EOD delivery stats and bhavcopies directly into Pandas dataframes.
  • Kite Connect API: Useful for retrieving historical volume data with adjusted corporate actions.
  • Data News Triggers: Automated monitoring of “Large Trade Alerts” from the exchange, cross-referencing them against EOD delivery figures to see if “Block Deals” resulted in 100% delivery.

Impact Summary

  • Short-Term: Use the Z-Score to find stocks with “Dry Liquidity.” A high Z-Score in delivery often implies a localized shortage of intraday float, making the stock susceptible to price gaps.
  • Medium-Term: Monitor the NDV. A sustained NDV > 1.5 suggests a period of intense equity transfer that may redefine the stock’s support and resistance levels based on new ownership costs.
  • Long-Term: Structural delivery tracking allows for “Float-Adjusted” valuation models, providing a clearer picture of how much of a company’s market cap is actually being transacted versus locked away.

For advanced implementations of these Python workflows or to build custom high-frequency data pipelines, partnering with a specialized firm like TheUniBit ensures your market infrastructure remains robust, scalable, and authoritative. By moving from raw volume counts to mathematically verified settlement metrics, you gain a unique vantage point on the Indian cash market.
