RWKV-7 "Goose": RWKV Model Performance Summary as of Early 2026
✨步子哥 (steper) · Topic created 2026-02-12 14:06:16
Reply #1
✨步子哥 (steper)
2026-02-12 14:27
Web-RWKV - Pure WebGPU Inference Engine
~/projects/web-rwkv/README.md

Web-RWKV

Inference engine for RWKV implemented in pure WebGPU
v0.10 · Rust · WebGPU · WASM · Cross-Platform
Core Features
• No CUDA/Python dependencies
• Supports Nvidia/AMD/Intel GPUs
• Vulkan/Dx12/OpenGL backends
• WASM support (browser ready)
• Batched inference
• Int8 and Float4 quantization
• Supports RWKV v4 through v7
• LoRA merging & model serialization
Functional Scope
✅ Provides
• Tokenizer
• Model Loading
• State Creation & Updating
• GPU-accelerated `run` & `softmax`
• Model Quantization
❌ Does Not Provide
• OpenAI-compatible API
• Built-in Samplers
• State Caching System
• Python Bindings
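Since the engine returns raw logits and ships no built-in samplers, the caller implements sampling itself. A minimal greedy (argmax) sketch in Rust; `greedy_sample` is a hypothetical helper, not part of the crate's API:

```rust
// Greedy sampling: pick the index of the highest logit as the next
// token. Real applications would layer temperature/top-p on top.
fn greedy_sample(logits: &[f32]) -> usize {
    logits
        .iter()
        .enumerate()
        .max_by(|(_, a), (_, b)| a.partial_cmp(b).unwrap())
        .map(|(i, _)| i)
        .unwrap_or(0)
}

fn main() {
    let logits = vec![0.1_f32, 2.5, -1.0, 0.7];
    println!("next token: {}", greedy_sample(&logits)); // next token: 1
}
```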
Usage Examples
# Performance Test (500 tokens)
cargo run --release --example gen
# Chat Demo
cargo run --release --example chat -- --model /path/to/model.st
# Quantization Example (First 32 layers)
cargo run --release --example chat -- --quant 32
Advanced Features
let runtime = TokioRuntime::new(bundle).await; // Async runtime
The asynchronous runtime API lets the CPU and GPU work in parallel, maximizing hardware utilization.
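The pipelining idea behind the async runtime can be sketched without the crate itself: while one stage (standing in for the GPU) consumes batch N, the CPU stage is already preparing batch N+1. Plain threads and a channel stand in for the Tokio tasks here; `pipelined_token_count` is an illustrative name, not the library's API:

```rust
use std::sync::mpsc;
use std::thread;

// Producer/consumer pipeline: the producer thread ("CPU") prepares
// token batches while the main thread ("GPU") drains them as they
// arrive, so the two stages overlap in time.
fn pipelined_token_count(batches: u32, batch_size: u32) -> usize {
    let (tx, rx) = mpsc::channel::<Vec<u32>>();

    // CPU stage: prepare batches and hand them off.
    let producer = thread::spawn(move || {
        for batch in 0..batches {
            let tokens: Vec<u32> = (0..batch_size).map(|t| batch * 10 + t).collect();
            tx.send(tokens).unwrap();
        }
        // Sender drops here, closing the channel and ending the loop below.
    });

    // "GPU" stage: consume batches as soon as they are ready.
    let mut processed = 0;
    for tokens in rx {
        processed += tokens.len();
    }
    producer.join().unwrap();
    processed
}

fn main() {
    println!("processed {} tokens", pipelined_token_count(3, 4)); // processed 12 tokens
}
```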
Hooks: inject tensor ops into the inference process (Input Tokens → Hook Point → Output Logits) for dynamic LoRA, control nets, etc.
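Conceptually, a hook is a closure registered at a point in the forward pass that can rewrite the intermediate tensor in flight. A simplified sketch with a plain `Vec<f32>` standing in for a GPU tensor; the `Hook`/`Pipeline` names are illustrative, not the crate's API:

```rust
// A hook mutates the intermediate activations in place; the pipeline
// applies every registered hook in order before producing its output.
type Hook = Box<dyn Fn(&mut Vec<f32>)>;

struct Pipeline {
    hooks: Vec<Hook>,
}

impl Pipeline {
    fn run(&self, mut x: Vec<f32>) -> Vec<f32> {
        for hook in &self.hooks {
            hook(&mut x);
        }
        x
    }
}

fn main() {
    // A "dynamic LoRA"-style hook that scales activations by 2.
    let pipeline = Pipeline {
        hooks: vec![Box::new(|x: &mut Vec<f32>| x.iter_mut().for_each(|v| *v *= 2.0))],
    };
    let out = pipeline.run(vec![1.0, 2.0]);
    println!("{:?}", out); // [2.0, 4.0]
}
```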
Model Conversion
python assets/scripts/convert_safetensors.py --input model.pth --output model.st
© 2024 Web-RWKV Project