RWKV-7 "Goose": RWKV Model Performance Summary as of Early 2026
✨步子哥 (steper) · Topic created 2026-02-12 14:06:16
Reply #1
✨步子哥 (steper)
2026-02-12 14:27
Web-RWKV - Pure WebGPU Inference Engine
~/projects/web-rwkv/README.md

Web-RWKV

Inference engine for RWKV implemented in pure WebGPU
v0.10 · Rust · WebGPU · WASM · Cross-Platform
Core Features
• No CUDA/Python dependencies
• Supports Nvidia/AMD/Intel GPUs
• Vulkan/Dx12/OpenGL backends
• WASM support (browser ready)
• Batched inference
• Int8 and Float4 quantization
• Supports RWKV v4 through v7
• LoRA merging & model serialization
Functional Scope
✅ Provides
• Tokenizer
• Model Loading
• State Creation & Updating
• GPU-accelerated `run` & `softmax`
• Model Quantization
❌ Does Not Provide
• OpenAI-compatible API
• Built-in Samplers
• State Caching System
• Python Bindings
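Since the engine returns raw logits and ships no built-in samplers, the caller implements sampling itself. A minimal greedy (argmax) sketch in Rust; `greedy_sample` is a hypothetical helper, not part of the crate's API:

```rust
// Greedy sampling: pick the index of the highest logit as the next
// token. Real applications would layer temperature/top-p on top.
fn greedy_sample(logits: &[f32]) -> usize {
    logits
        .iter()
        .enumerate()
        .max_by(|(_, a), (_, b)| a.partial_cmp(b).unwrap())
        .map(|(i, _)| i)
        .unwrap_or(0)
}

fn main() {
    let logits = vec![0.1_f32, 2.5, -1.0, 0.7];
    println!("next token: {}", greedy_sample(&logits)); // next token: 1
}
```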
Usage Examples
# Performance Test (500 tokens)
cargo run --release --example gen
# Chat Demo
cargo run --release --example chat -- --model /path/to/model.st
# Quantization Example (First 32 layers)
cargo run --release --example chat -- --quant 32
Advanced Features
let runtime = TokioRuntime::new(bundle).await; // Async runtime
The asynchronous runtime API lets the CPU and GPU work in parallel, maximizing hardware utilization.
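The pipelining idea behind the async runtime can be sketched without the crate itself: while one stage (standing in for the GPU) consumes batch N, the CPU stage is already preparing batch N+1. Plain threads and a channel stand in for the Tokio tasks here; `pipelined_token_count` is an illustrative name, not the library's API:

```rust
use std::sync::mpsc;
use std::thread;

// Producer/consumer pipeline: the producer thread ("CPU") prepares
// token batches while the main thread ("GPU") drains them as they
// arrive, so the two stages overlap in time.
fn pipelined_token_count(batches: u32, batch_size: u32) -> usize {
    let (tx, rx) = mpsc::channel::<Vec<u32>>();

    // CPU stage: prepare batches and hand them off.
    let producer = thread::spawn(move || {
        for batch in 0..batches {
            let tokens: Vec<u32> = (0..batch_size).map(|t| batch * 10 + t).collect();
            tx.send(tokens).unwrap();
        }
        // Sender drops here, closing the channel and ending the loop below.
    });

    // "GPU" stage: consume batches as soon as they are ready.
    let mut processed = 0;
    for tokens in rx {
        processed += tokens.len();
    }
    producer.join().unwrap();
    processed
}

fn main() {
    println!("processed {} tokens", pipelined_token_count(3, 4)); // processed 12 tokens
}
```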
Hooks: inject tensor ops into the inference process (Input Tokens → Hook Point → Output Logits) for dynamic LoRA, control nets, etc.
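Conceptually, a hook is a closure registered at a point in the forward pass that can rewrite the intermediate tensor in flight. A simplified sketch with a plain `Vec<f32>` standing in for a GPU tensor; the `Hook`/`Pipeline` names are illustrative, not the crate's API:

```rust
// A hook mutates the intermediate activations in place; the pipeline
// applies every registered hook in order before producing its output.
type Hook = Box<dyn Fn(&mut Vec<f32>)>;

struct Pipeline {
    hooks: Vec<Hook>,
}

impl Pipeline {
    fn run(&self, mut x: Vec<f32>) -> Vec<f32> {
        for hook in &self.hooks {
            hook(&mut x);
        }
        x
    }
}

fn main() {
    // A "dynamic LoRA"-style hook that scales activations by 2.
    let pipeline = Pipeline {
        hooks: vec![Box::new(|x: &mut Vec<f32>| x.iter_mut().for_each(|v| *v *= 2.0))],
    };
    let out = pipeline.run(vec![1.0, 2.0]);
    println!("{:?}", out); // [2.0, 4.0]
}
```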
Model Conversion
python assets/scripts/convert_safetensors.py --input model.pth --output model.st
© 2024 Web-RWKV Project