Objectives
- Data Handling: Efficiently process large volumes of tick data using multiprocessing techniques.
- Latency Analysis: Generate and incorporate realistic latency data into the trading simulation.
- Strategy Implementation: Develop and implement a grid trading strategy tailored for high-frequency environments.
- Backtesting: Assess the strategy's performance using historical tick data.
- Analysis and Visualization: Present the results through detailed analysis and graphical representations.
Technologies and Tools
- Programming Language: Python
- Libraries and Frameworks:
- numpy: Numerical computations.
- numba: Just-In-Time (JIT) compilation for performance optimization.
- polars: High-performance DataFrame library.
- matplotlib: Data visualization.
- hftbacktest: High-frequency backtesting library.
Why Hadoop or Spark Isn't a Fit for This Project
Hadoop and Spark are not suitable for this project for the following reasons:
- Granularity and Latency: HFT relies on tick-by-tick data with nanosecond-level timestamps, which Hadoop and Spark cannot efficiently handle.
- Latency Modeling: Incorporating realistic feed and order latencies requires low-overhead, fine-grained processing that batch-oriented distributed systems like Hadoop and Spark do not offer.
- Specialized Computations: Tools like Numba and Python multiprocessing provide better performance for HFT computations than Hadoop or Spark.
- Iterative and Adaptive Processing: HFT strategies involve continuous adjustments, which in-memory processing with Python handles more efficiently than Hadoop or Spark.
Why Use Polars?
Polars offers high-performance features ideal for this project, including:
- Columnar Data Storage: Efficient for analytical workloads and SIMD optimizations.
- Lazy Execution Engine: Defers computations for optimized query plans.
- Parallelism: Enables multi-threaded operations for scalability.
- Memory Efficiency: Leverages Rust's memory safety for efficient management.
Architecture
Data Collection and Preprocessing
- Source: Binance Futures tick-level data.
- Method:
  - Used hftbacktest to fetch data.
  - Stored 30 days of BTC-USDT and ETH-USDT tick data in .gz format.
  - Converted the combined-stream dumps:

```python
_ = binancefutures.convert(
    input_filename=filepath,
    output_filename=output_filepath,
    combined_stream=True
)
```

- Outcome: High-resolution tick data ready for preprocessing.
Data Conversion (GZ to NPZ)
- Rationale: .npz format is more efficient for in-memory operations.
- Process: Converted .gz files to .npz using multiprocessing.

```python
for file in gz_files:
    output_file = file.replace(".gz", ".npz")
    hftbacktest.convert(input_filename=file, output_filename=output_file)
```

- Result: Structured arrays ready for analysis.
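The loop above runs serially; since each file is independent, the multiprocessing variant the text describes can be sketched as follows. `convert_one` is a stand-in wrapper here (the actual converter call is commented out because it needs real data files):

```python
from multiprocessing import Pool
import os

def convert_one(path):
    """Convert one .gz file to .npz (stand-in for the actual converter)."""
    output_file = path.replace(".gz", ".npz")
    # hftbacktest.convert(input_filename=path, output_filename=output_file)
    return output_file

def convert_all(gz_files, workers=None):
    # Conversions are independent, so they fan out across worker processes.
    with Pool(processes=workers or os.cpu_count()) as pool:
        return pool.map(convert_one, gz_files)
```

On a machine with N cores this cuts wall-clock conversion time roughly N-fold, since the work is I/O-plus-decompression bound per file.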
Market Depth Snapshot Creation
- Necessity: Ensures continuity for daily trading simulations.
- Methodology: Generated end-of-day (EOD) snapshots using create_last_snapshot:

```python
_ = create_last_snapshot(
    ['usdm/btcusdt_20240808.npz'],
    tick_size=0.1,
    lot_size=0.001,
    output_snapshot_filename='usdm/btcusdt_20240808_eod.npz'
)
```
Grid Trading Strategy Implementation
Overview
Grid trading involves placing buy and sell orders at predefined intervals around a reference price, capitalizing on market volatility.
Strategy Parameters
- Half Spread: Distance from the mid-price for initial orders.
- Grid Interval: Spacing between successive orders.
- Skew: Adjustment based on the current position.
- Order Quantity: Maintained at a notional value of $100.
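To illustrate how these parameters interact, here is a hypothetical helper (not the library's internal logic) that lays out grid levels around a mid-price, with the percentages above expressed as fractions:

```python
def grid_prices(mid_price, half_spread, grid_interval, skew, position, n_levels=3):
    """Bid/ask grid levels around the mid-price.

    half_spread, grid_interval, and skew are fractions of the mid-price;
    skew shifts the quoting reference against the current inventory so the
    grid leans toward reducing the position.
    """
    reference = mid_price * (1.0 - skew * position)
    bids = [reference - (half_spread + i * grid_interval) * mid_price
            for i in range(n_levels)]
    asks = [reference + (half_spread + i * grid_interval) * mid_price
            for i in range(n_levels)]
    return bids, asks

# Flat inventory; 0.023%, 0.086%, and 0.0004% written as fractions.
bids, asks = grid_prices(60000.0, 0.00023, 0.00086, 0.000004, position=0)
```

With a flat position the grid is symmetric; a positive position shifts the reference down, making sells more likely to fill and the inventory mean-revert.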
Implementation

```python
@njit
def grid_trading(hbt, recorder, half_spread, grid_interval, skew, order_qty):
    # Function implementation
    pass
```

Parameter values used in the backtest:
- half_spread: 0.023% of the mid-price.
- grid_interval: 0.086% of the mid-price.
- skew: 0.0004% of the mid-price.

- Used Numba JIT compilation for speed.
- Dynamically managed orders and recorded performance metrics.
Visualization of Equity Curve
```python
from matplotlib import pyplot as plt

plt.plot(net_equity_df['timestamp'], net_equity_df['cum_ret'])
plt.xlabel('Timestamp')
plt.ylabel('Cumulative Returns (%)')
plt.grid()
plt.show()
```
Results and Analysis
Performance Metrics
- Cumulative Returns: Total return over the backtesting period.
- Sharpe Ratio: Risk-adjusted return efficiency.
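A sketch of how these two metrics can be computed from per-period returns. The return values below are hypothetical placeholders, not results from this backtest, and the risk-free rate is assumed to be roughly zero:

```python
import numpy as np

def sharpe_ratio(returns, periods_per_year):
    """Annualized Sharpe ratio from per-period returns (risk-free rate ~ 0)."""
    returns = np.asarray(returns, dtype=float)
    return returns.mean() / returns.std(ddof=1) * np.sqrt(periods_per_year)

# Hypothetical per-day strategy returns over part of the 30-day window.
daily_returns = np.array([0.002, -0.001, 0.0015, 0.0005, -0.0008, 0.0012])

# Cumulative return compounds the per-period returns.
cum_return = (1.0 + daily_returns).prod() - 1.0
annualized_sharpe = sharpe_ratio(daily_returns, periods_per_year=365)
```

For crypto, 365 periods per year is the usual annualization since markets trade continuously; equity strategies would use 252 instead.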
Key Insights
- Profitability: Consistent cumulative returns.
- Risk Management: Favorable Sharpe Ratio.
- Scalability: Efficient multiprocessing and data handling.
Conclusion and Future Work
Conclusion
- Demonstrated end-to-end HFT backtesting, including:
- Data acquisition and preprocessing.
- Latency modeling.
- Grid strategy execution and analysis.
Future Work
- Parametric Optimization: Leverage machine learning for refining parameters.
- Expanded Asset Coverage: Apply to more cryptocurrency pairs or asset classes.
- Real-Time Adaptation: Integrate dynamic adjustments based on real-time data.
- Deeper Risk Analytics: Explore tail-risk behavior and extreme volatility scenarios.
References
- Binance API Documentation
- NumPy, Numba, Polars, Matplotlib Documentation
- HFTBacktest Library