Causal Transformer for High-Frequency Return Prediction
This was my Lingjun Quant Challenge project. I used it to practice sequence modeling under a harsh information boundary: high-frequency return prediction is not only about architecture, but also about refusing to leak the future into validation.
Challenge Context
The task was to model ten-minute-ahead returns from A-share high-frequency microstructure data:
- 500 stocks
- 239 intraday minutes per day
- 384 features
- large-scale data stored in Parquet files
The central difficulty was not just modeling a large sequential table: I also had to design a validation protocol that preserved enough intraday context while preventing information leakage.
Method
I treated each stock-day as a minute-level sequence. Each minute became a token, and a strict causal mask prevented the model from reading future minutes during minute-wise regression, so the prediction at minute t could use only minutes up to and including t. In this way the architecture itself enforced the information boundary of the task.
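To make the causal mask concrete, here is a minimal single-head sketch in NumPy. It is an illustration under stated assumptions, not the challenge code: the framework, head count, and widths used in the project are not given here, so `causal_self_attention`, `d_model`, and `d_head` are hypothetical names and sizes. The check at the end perturbs the later minutes of a sequence and compares the outputs for the earlier minutes.

```python
import numpy as np

def causal_self_attention(x, w_q, w_k, w_v):
    """Single-head causal self-attention over one stock-day sequence.

    x: (T, d_model) array with one row per intraday minute.
    Returns the (T, d_head) outputs and the (T, T) attention weights.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])               # (T, T)
    n_steps = x.shape[0]
    # True strictly above the diagonal: key minute j is later than query minute i.
    future = np.triu(np.ones((n_steps, n_steps), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)
    # Row-wise softmax; the diagonal is never masked, so each row max is finite.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

rng = np.random.default_rng(0)
T, d_model, d_head = 239, 16, 8                           # widths are assumptions
x = rng.normal(size=(T, d_model))
w_q, w_k, w_v = (rng.normal(scale=0.1, size=(d_model, d_head)) for _ in range(3))
out, weights = causal_self_attention(x, w_q, w_k, w_v)

# Leakage check: change minutes 120 onward and recompute.
x_perturbed = x.copy()
x_perturbed[120:] += 10.0
out_perturbed, _ = causal_self_attention(x_perturbed, w_q, w_k, w_v)
print(np.allclose(out[:120], out_perturbed[:120]))        # True: the past is unaffected
```

A perturbation test like this is a cheap guard to keep next to any masked model, because a mask that is off by one position still trains without errors.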
The main components were:
- causal self-attention over intraday minute sequences
- intraday time embeddings
- stock identity embeddings
- train-date-fitted normalization reused unchanged for validation and inference
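One way these components can combine into input tokens is sketched below. This is an assumption-laden illustration: it sums a linear projection of the 384 features with a per-minute time embedding and a per-stock identity embedding, but whether the project summed or concatenated them is not stated, and `D_MODEL`, `params`, and `build_tokens` are hypothetical names. The randomly initialized tables stand in for learned parameters.

```python
import numpy as np

N_STOCKS, N_MINUTES, N_FEATURES = 500, 239, 384
D_MODEL = 32  # hypothetical model width

rng = np.random.default_rng(0)
params = {
    "w_in": rng.normal(scale=0.02, size=(N_FEATURES, D_MODEL)),      # feature projection
    "time_emb": rng.normal(scale=0.02, size=(N_MINUTES, D_MODEL)),   # one row per minute
    "stock_emb": rng.normal(scale=0.02, size=(N_STOCKS, D_MODEL)),   # one row per stock
}

def build_tokens(features, stock_id, params):
    """Turn one stock-day of normalized features into a token sequence.

    features: (n_minutes, N_FEATURES) array; stock_id: integer index of the stock.
    Returns an (n_minutes, D_MODEL) array, one token per minute.
    """
    minute_idx = np.arange(features.shape[0])
    tokens = features @ params["w_in"]
    tokens = tokens + params["time_emb"][minute_idx]   # position within the day
    tokens = tokens + params["stock_emb"][stock_id]    # same vector added at every minute
    return tokens

features = rng.normal(size=(N_MINUTES, N_FEATURES))
tokens = build_tokens(features, stock_id=7, params=params)
print(tokens.shape)  # (239, 32)
```

Neither embedding depends on other minutes, so adding them does not weaken the causal constraint.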
Validation and Leakage Control
I split the data chronologically by dateid, holding out later dates for validation. Standardization statistics were fit once on training dates and then fixed. That was the main discipline of the project: time-respecting splits, fixed normalization, and suspicion toward any improvement that might come from leakage or time artifacts.
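The protocol can be sketched on toy data as follows. This is a minimal illustration, assuming a flat table with one integer date id per row; `chronological_split`, `fit_standardizer`, `apply_standardizer`, and the 20% validation fraction are hypothetical choices, not the settings used in the challenge.

```python
import numpy as np

def chronological_split(date_ids, val_fraction=0.2):
    """Return boolean (train, val) row masks with every validation date after every train date."""
    unique_dates = np.unique(date_ids)                 # sorted ascending
    if len(unique_dates) < 2:
        raise ValueError("need at least two distinct dates to split")
    n_val = int(round(len(unique_dates) * val_fraction))
    n_val = min(max(n_val, 1), len(unique_dates) - 1)
    cutoff = unique_dates[-n_val]                      # earliest validation date
    val_mask = date_ids >= cutoff
    return ~val_mask, val_mask

def fit_standardizer(train_features, eps=1e-8):
    """Per-feature mean and std computed from training rows only."""
    mean = train_features.mean(axis=0)
    std = np.maximum(train_features.std(axis=0), eps)  # guard against constant features
    return mean, std

def apply_standardizer(features, mean, std):
    """Apply previously fitted statistics without refitting."""
    return (features - mean) / std

rng = np.random.default_rng(0)
date_ids = np.repeat(np.arange(10), 50)                # 10 toy dates, 50 rows each
features = rng.normal(loc=3.0, scale=2.0, size=(date_ids.size, 4))

train_mask, val_mask = chronological_split(date_ids, val_fraction=0.2)
mean, std = fit_standardizer(features[train_mask])     # fit once, on training dates
train_x = apply_standardizer(features[train_mask], mean, std)
val_x = apply_standardizer(features[val_mask], mean, std)  # reused unchanged
```

The validation features will generally not have exactly zero mean and unit variance after this transform. That is expected, and refitting to "fix" it would feed validation-period statistics back into preprocessing.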
Diagnostics
I used multi-checkpoint diagnostics to understand model behavior instead of only watching a scalar loss:
- feature dependency views
- parameter distribution tracking
- prediction and residual distributions
- per-timeid behavior profiles
- positional embedding norm tracking
These diagnostics helped identify late-training degradation and rising dependence on time features.
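Two of these diagnostics are simple enough to sketch. The code below is an illustration on synthetic data, not the tooling used in the project: `per_timeid_profile` computes residual bias and RMSE per intraday minute, and `embedding_norms` computes the L2 norm of each row of a positional embedding table so that it can be compared across checkpoints. Both function names, the toy sizes, and the synthetic checkpoints are assumptions.

```python
import numpy as np

def per_timeid_profile(time_ids, y_true, y_pred, n_minutes):
    """Residual count, bias, and RMSE for each intraday minute index."""
    residual = y_pred - y_true
    counts = np.bincount(time_ids, minlength=n_minutes)
    sums = np.bincount(time_ids, weights=residual, minlength=n_minutes)
    sq_sums = np.bincount(time_ids, weights=residual ** 2, minlength=n_minutes)
    safe_counts = np.maximum(counts, 1)                # avoid division by zero
    return {
        "count": counts,
        "bias": sums / safe_counts,
        "rmse": np.sqrt(sq_sums / safe_counts),
    }

def embedding_norms(embedding_table):
    """L2 norm of each row of a (n_positions, d_model) embedding table."""
    return np.linalg.norm(embedding_table, axis=1)

rng = np.random.default_rng(0)
n_minutes = 5                                          # toy value; the real data has 239
time_ids = np.tile(np.arange(n_minutes), 100)
y_true = rng.normal(size=time_ids.size)
y_pred = y_true + rng.normal(scale=0.01, size=time_ids.size)
y_pred[time_ids == 4] += 0.5                           # inject a bias at the last minute

profile = per_timeid_profile(time_ids, y_true, y_pred, n_minutes)

# Synthetic checkpoints: the same table scaled up, imitating growing embedding norms.
base_table = rng.normal(size=(n_minutes, 8))
checkpoints = {step: base_table * scale for step, scale in [(1000, 1.0), (2000, 1.5), (3000, 2.5)]}
norm_trend = {step: float(embedding_norms(table).mean()) for step, table in checkpoints.items()}
```

In this toy setup the injected bias shows up only in `profile["bias"][4]`, which is the kind of minute-specific pattern a single scalar loss averages away.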
What I Learned
The final metric was weak, and I do not want to oversell the project. The lesson I keep is more basic and more useful: in high-frequency sequence modeling, validation protocol and leakage control matter as much as the architecture. Diagnostics are necessary because an apparent gain can be a real signal, a time artifact, or simply the model learning the wrong shortcut.
