From Trained Model to Silicon: Automatic Verilog Generation for Streaming Neural Decoders

Abstract

We present a reproducible pipeline that turns a trained model into synthesizable hardware. Starting from a synthetic dataset, we train a logistic decoder, quantize its weights to 16-bit fixed-point, and emit synthesizable Verilog RTL together with an auto-generated testbench. The pipeline targets closed-loop neural decoding, such as calcium-imaging or miniscope-style recording, where low-latency on-device inference matters and shipping a trained model to silicon should not require hand-written HDL.

On the example task the float decoder reaches 0.9816 train accuracy and 0.9781 eval accuracy with an eval AUROC of 0.9986. After fixed-point quantization the model loses less than 2% accuracy. The generated testbench passes simulation under Icarus Verilog (iverilog), which checks the integer RTL behavior against the floating-point reference on held-out sequences.

Pipeline Overview

The pipeline is a single, reproducible path from data to hardware. Each stage feeds the next, and the final stage produces files that go straight into a synthesis or simulation flow:

Stage	Output
1. Synthetic dataset	Labeled sequences for a streaming decode task
2. Train logistic decoder	Floating-point reference model and metrics
3. Quantize weights	16-bit fixed-point weights and features
4. Emit RTL	Synthesizable Verilog module
5. Emit testbench	Auto-generated testbench for iverilog

Because the quantized weights, the RTL, and the testbench are all generated from the same trained model, the hardware is a faithful, reproducible artifact of that model rather than a separate manual reimplementation. No hand-written HDL sits between the Python model and the synthesizable Verilog.
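The first two stages can be sketched in plain Python. This is a minimal illustration, not the pipeline's own code: the dataset generator (a hidden linear rule over Gaussian features), the function names, and the hyperparameters are all assumptions chosen for the example.

```python
import math
import random


def make_dataset(n_samples, n_features, seed=0):
    """Synthetic two-class data; the label follows a hidden linear rule (assumed)."""
    rng = random.Random(seed)
    true_w = [rng.uniform(-1.0, 1.0) for _ in range(n_features)]
    xs, ys = [], []
    for _ in range(n_samples):
        x = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        score = sum(w * v for w, v in zip(true_w, x))
        xs.append(x)
        ys.append(1 if score > 0.0 else 0)
    return xs, ys


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(xs, ys, epochs=30, lr=0.1):
    """Plain stochastic gradient descent on the logistic loss."""
    w = [0.0] * len(xs[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            p = sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))
            g = p - y  # gradient of the loss with respect to the logit
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b


def accuracy(w, b, xs, ys):
    """Fraction of samples whose thresholded logit matches the label."""
    hits = 0
    for x, y in zip(xs, ys):
        logit = b + sum(wi * xi for wi, xi in zip(w, x))
        hits += int((logit >= 0.0) == (y == 1))
    return hits / len(xs)
```

The float weights and bias returned by `train_logistic` play the role of the floating-point reference model that the later stages quantize and check against.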

The Decoder and Fixed-Point Quantization

The example module is an 8-ROI logistic decoder with a 4-frame history buffer. The buffer lets the decoder condition on recent frames, which suits the streaming nature of closed-loop neural recording, where each new frame arrives in sequence and a decision is needed with low latency.
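A software view of the history buffer might look like the sketch below. The class name, the zero-initialized buffer, and the oldest-first ordering of the flattened window are assumptions made for illustration; the generated RTL may organize its buffer differently.

```python
from collections import deque

N_ROI = 8      # regions of interest per frame (from the example module)
HISTORY = 4    # frames kept in the history buffer (from the example module)


class StreamingWindow:
    """Keeps the most recent HISTORY frames and exposes them as one feature vector."""

    def __init__(self, n_roi=N_ROI, history=HISTORY):
        self.n_roi = n_roi
        # Assumption: the buffer starts filled with zero frames.
        self.frames = deque(([0] * n_roi for _ in range(history)), maxlen=history)

    def push(self, frame):
        """Add one frame, drop the oldest, and return the flattened window."""
        if len(frame) != self.n_roi:
            raise ValueError("frame must have one value per ROI")
        self.frames.append(list(frame))
        # Oldest frame first, newest frame last (assumed ordering).
        return [value for f in self.frames for value in f]
```

Each call to `push` yields an 8 × 4 = 32-element vector, so a decision can be produced on every incoming frame rather than once per batch.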

Parameter	Value
Regions of interest (ROI)	8
History buffer	4 frames
Feature width	16-bit
Weight width	16-bit
Accumulator width	48-bit

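Quantization maps the trained float weights to 16-bit fixed-point, with 16-bit features and a 48-bit accumulator sized to hold the running dot product without overflow across the history window. If all 8 × 4 = 32 feature-weight products are summed, each product of two signed 16-bit values fits in 32 bits and the full sum fits in 37 bits, so 48 bits leaves ample headroom. With no overflow in the accumulation, the remaining error relative to the floating-point reference should come mainly from rounding the weights and features to 16 bits; the resulting accuracy loss is less than 2%.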
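The quantization step and the accumulator sizing can be illustrated as follows. The number of fractional bits (`FRAC_BITS = 12`), the round-to-nearest rule, and saturation on overflow are assumptions for this sketch; the text specifies only the 16-bit and 48-bit widths.

```python
WIDTH = 16       # weight and feature width, from the parameter table
FRAC_BITS = 12   # assumed fractional bits; not stated in the text


def quantize(value, frac_bits=FRAC_BITS, width=WIDTH):
    """Round to the nearest fixed-point code and saturate to the signed range."""
    lo = -(1 << (width - 1))
    hi = (1 << (width - 1)) - 1
    code = int(round(value * (1 << frac_bits)))
    return max(lo, min(hi, code))


def dequantize(code, frac_bits=FRAC_BITS):
    """Map a fixed-point code back to the real value it represents."""
    return code / (1 << frac_bits)


def accumulator_bits(n_terms, width=WIDTH):
    """Signed bits needed to hold any sum of n_terms products of signed width-bit values."""
    # Worst case: every term is (-2^(width-1)) * (-2^(width-1)).
    max_magnitude = n_terms * (1 << (width - 1)) ** 2
    return max_magnitude.bit_length() + 1  # +1 for the sign bit


# 8 ROIs x 4 frames = 32 products per decision.
print(accumulator_bits(8 * 4))  # prints 37
```

Under these assumptions a 37-bit accumulator already suffices for the 32-term dot product, which is consistent with the 48-bit accumulator never overflowing.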

RTL Generation and Testbench

From the quantized model the pipeline emits a synthesizable Verilog module and an auto-generated testbench. The testbench drives the module with held-out sequences and compares the integer RTL output against the floating-point reference, so correctness is checked rather than assumed.
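To make the idea of emitting RTL from Python concrete, the sketch below formats quantized weights into a small Verilog module. It is not the module the pipeline generates: the function name, port names, and module layout are hypothetical, the design is purely combinational with no clock or history buffer, and the decision is taken as the sign of the logit (equivalent to thresholding the sigmoid at 0.5).

```python
def emit_decoder_verilog(weights, bias, module_name="decoder", width=16, acc_width=48):
    """Return Verilog source for a combinational dot-product decoder (illustrative only)."""
    lo, hi = -(1 << (width - 1)), (1 << (width - 1)) - 1
    if any(not lo <= w <= hi for w in weights):
        raise ValueError("weight does not fit in the signed weight width")

    def literal(value, bits):
        # Two's-complement hex literal, e.g. -1 at 16 bits -> 16'shffff.
        digits = (bits + 3) // 4
        return f"{bits}'sh{value & ((1 << bits) - 1):0{digits}x}"

    n = len(weights)
    lines = [
        f"module {module_name} (",
        f"    input  wire signed [{n * width - 1}:0] features_flat,",
        f"    output wire signed [{acc_width - 1}:0] logit,",
        "    output wire decision",
        ");",
    ]
    terms = ["BIAS"]
    for i, w in enumerate(weights):
        msb, lsb = (i + 1) * width - 1, i * width
        lines.append(f"    wire signed [{width - 1}:0] x{i} = features_flat[{msb}:{lsb}];")
        lines.append(f"    localparam signed [{width - 1}:0] W{i} = {literal(w, width)};")
        terms.append(f"x{i} * W{i}")
    lines.append(f"    localparam signed [{acc_width - 1}:0] BIAS = {literal(bias, acc_width)};")
    lines.append(f"    assign logit = {' + '.join(terms)};")
    # sigmoid(z) >= 0.5 exactly when z >= 0, so the sign bit gives the decision.
    lines.append(f"    assign decision = ~logit[{acc_width - 1}];")
    lines.append("endmodule")
    return "\n".join(lines) + "\n"
```

Writing constants as two's-complement hex literals keeps negative weights unambiguous in the emitted text, and generating the source from the same quantized integers used in software is what keeps the RTL tied to the trained model.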

The generated testbench passes iverilog simulation. This is the key validation step: it confirms on held-out sequences that the quantized hardware reproduces the trained model's decisions within the reported accuracy loss.
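The comparison the testbench performs can be mirrored in software with an integer model of the datapath. The sketch below assumes 12 fractional bits for weights and features, a bias scaled to the accumulator's 24 fractional bits, no saturation, and a decision taken as the sign of the logit; all function names are hypothetical.

```python
FRAC_BITS = 12  # assumed fractional bits for weights and features


def to_fixed(value, frac_bits):
    """Round a real value to the nearest fixed-point integer code."""
    return int(round(value * (1 << frac_bits)))


def float_decision(weights, bias, features):
    """Reference decision: sign of the floating-point logit."""
    return bias + sum(w * x for w, x in zip(weights, features)) >= 0.0


def integer_decision(weights, bias, features, frac_bits=FRAC_BITS):
    """Integer model: quantize inputs, accumulate exact products, test the sign."""
    q_w = [to_fixed(w, frac_bits) for w in weights]
    q_x = [to_fixed(x, frac_bits) for x in features]
    # Products carry 2 * frac_bits fractional bits, so the bias is scaled to match.
    q_b = to_fixed(bias, 2 * frac_bits)
    acc = q_b + sum(w * x for w, x in zip(q_w, q_x))
    return acc >= 0


def agreement(weights, bias, sequences, frac_bits=FRAC_BITS):
    """Fraction of inputs on which the integer and float decisions coincide."""
    same = sum(
        float_decision(weights, bias, f) == integer_decision(weights, bias, f, frac_bits)
        for f in sequences
    )
    return same / len(sequences)
```

Disagreements, when they occur, are expected on inputs whose float logit lies close to zero, where rounding can flip the sign.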

Results

The float decoder is accurate on the example task, and the accuracy survives quantization. AUROC near 1.0 indicates the decoder separates the classes cleanly, and the small drop after fixed-point conversion shows the integer datapath preserves that behavior.

On held-out data the trained logistic decoder reaches an eval accuracy of 0.9781 and an eval AUROC of 0.9986.

Metric	Value
Train accuracy	0.9816
Eval accuracy	0.9781
Eval AUROC	0.9986
Accuracy loss after fixed-point quantization	< 2%

Train accuracy (0.9816) and eval accuracy (0.9781) differ by only 0.0035, so the decoder shows little sign of overfitting. The less-than-2% loss after fixed-point quantization indicates that the synthesizable RTL closely tracks the float reference on this task.
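For reference, AUROC can be computed as the probability that a randomly chosen positive example scores higher than a randomly chosen negative one, with ties counted as one half. The sketch below uses that pairwise definition directly; it is not the evaluation code behind the reported numbers, and its quadratic cost makes it suitable only for small evaluation sets.

```python
def auroc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


print(auroc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]))  # prints 1.0
print(auroc([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0]))  # prints 0.75
```

Unlike accuracy, this measure does not depend on the decision threshold, which is why the two are reported side by side.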

Discussion

The result of interest is RTL generation from Python models without hand-written HDL. A practitioner trains a decoder in the usual way and receives synthesizable Verilog plus a testbench that already passes iverilog, with the integer behavior checked against the float reference. The manual, error-prone step of reimplementing a model in HDL is removed from the loop.

For closed-loop neural decoding this matters because the inference has to run on-device with low latency. Emitting fixed-point RTL directly from the trained model keeps the deployed hardware in step with the validated model, and the auto-generated testbench makes each regenerated module re-verifiable against the same reference.

Conclusion

We turn a trained logistic decoder into synthesizable hardware along a single reproducible path: synthetic dataset, trained float model, 16-bit fixed-point quantization, and emitted Verilog RTL with an auto-generated testbench. The example 8-ROI decoder reaches 0.9781 eval accuracy and 0.9986 eval AUROC, loses less than 2% accuracy after quantization, and passes iverilog simulation with the integer RTL validated against the float reference. The path from a Python model to silicon, with no hand-written HDL, is the contribution we report.