# 🔊 FunASR in Depth: Alibaba's Open-Source Industrial-Grade Speech Recognition Toolkit
> **In one sentence**: FunASR is an end-to-end speech recognition toolkit open-sourced by Alibaba's DAMO Academy. It integrates SOTA models such as Paraformer and SenseVoice, supports ASR, VAD, punctuation restoration, emotion recognition, audio event detection and more, and serves as a bridge between academic research and industrial application.
---
## 📋 Table of Contents
1. [Project Overview](#project-overview)
2. [Core Architecture and Technical Principles](#core-architecture-and-technical-principles)
3. [Model Family in Detail](#model-family-in-detail)
4. [Quick Start and Hands-On Practice](#quick-start-and-hands-on-practice)
5. [Deployment and Optimization](#deployment-and-optimization)
6. [Application Scenarios and Case Studies](#application-scenarios-and-case-studies)
7. [Comparison with Other Toolkits](#comparison-with-other-toolkits)
8. [Summary and Outlook](#summary-and-outlook)
---
## Project Overview
### What is FunASR?
**FunASR** (Fundamental End-to-End Speech Recognition Toolkit) is an end-to-end speech recognition toolkit open-sourced by **Alibaba's DAMO Academy**. Officially released in 2023, its goal is to **bridge the gap between academic research and industrial application in speech recognition**.
### Core Positioning
| Dimension | Description |
|-----|------|
| **Open-source status** | Fully open source, Apache 2.0 license |
| **Development team** | Alibaba DAMO Academy |
| **Model ecosystem** | Distributed on both ModelScope and Hugging Face |
| **Training data scale** | Trained on tens of thousands of hours of industrial-grade annotated data |
| **Community activity** | High-star GitHub project with continuous updates |
### Feature Matrix
```
┌──────────────────────────────────────────────────────────────┐
│                     FunASR Feature Matrix                    │
├───────────────┬──────────────┬──────────────┬────────────────┤
│ Speech        │ Voice        │ Punctuation  │ Language       │
│ Recognition   │ Activity     │ Restoration  │ Model          │
│ (ASR)         │ Detection    │ (Punc)       │ (LM)           │
│               │ (VAD)        │              │                │
├───────────────┼──────────────┼──────────────┼────────────────┤
│ Speaker       │ Speaker      │ Emotion      │ Audio Event    │
│ Verification  │ Diarization  │ Recognition  │ Detection      │
│ (SV)          │ (SD)         │ (SER)        │ (AED)          │
├───────────────┴──────────────┴──────────────┴────────────────┤
│               Multi-talker ASR                               │
└──────────────────────────────────────────────────────────────┘
```
### Recent Updates (2024-2025)
- **2024/10**: Chinese real-time transcription service 1.12 released, with SenseVoiceSmall support
- **2024/09**: New keyword-spotting (wake word) models added (fsmn_kws, sanm_kws, etc.)
- **2024/07**: **SenseVoice** released, a multimodal speech understanding model supporting ASR+LID+SER+AED
- **2024/05**: New emotion recognition model added (emotion2vec+)
- **2024/03**: **Qwen-Audio** audio-text multimodal large model added
- **2024/01**: FunASR 1.0 released with a fully upgraded architecture
---
## Core Architecture and Technical Principles
### 1. Overall Architecture
FunASR follows a **modular design** philosophy, integrating multiple speech-processing tasks behind a unified `AutoModel` interface:
```
Input audio
     │
     ▼
┌─────────────┐
│ VAD pre-    │ ← endpoint detection, long-audio segmentation
│ processing  │
│ (optional)  │
└──────┬──────┘
       │
       ▼
┌─────────────┐
│ Feature     │ ← mel spectrogram / FBank
│ extraction  │
│ (encoder)   │
└──────┬──────┘
       │
       ▼
┌─────────────┐
│ Core model  │ ← Paraformer / SenseVoice / Whisper
│ (ASR)       │
└──────┬──────┘
       │
       ▼
┌─────────────┐
│ Post-       │ ← punctuation restoration, ITN, hotword boosting
│ processing  │
│ (optional)  │
└──────┬──────┘
       │
       ▼
Output text
```
### 2. Paraformer: A Non-Autoregressive Breakthrough
#### Core Innovation: the CIF Mechanism
**Paraformer** (Parallel Transformer) is FunASR's flagship model. Its core innovation is the **Continuous Integrate-and-Fire (CIF) mechanism**.
**Problems with traditional autoregressive models**:
- Serial decoding: inference time grows linearly with sequence length
- GPU parallelism cannot be fully exploited
- Error propagation: early mistakes affect subsequent outputs
**Paraformer's solution**:
```
Traditional AR model:
audio → [frame-by-frame decode] → "今" → "天" → "天" → "气" → ... (serial, slow)
Paraformer NAR model:
audio → [CIF predictor] → length prediction → [parallel decode] → "今天天气很好" (one shot, fast)
```
#### How CIF Works
```python
import numpy as np

# Core logic of the CIF predictor (simplified, illustrative sketch)
def cif_predictor(encoder_output, alphas, threshold=1.0):
    # In the real model, a 1-D convolution over the encoder output captures
    # local acoustic context, and a sigmoid output layer produces the
    # frame-level importance weights `alphas`; both are taken as given here.
    tokens = []
    accumulated = 0.0
    integrated = np.zeros(encoder_output.shape[1])
    for alpha, frame in zip(alphas, encoder_output):
        # Accumulate alpha and integrate the weighted frame features
        accumulated += alpha
        integrated += alpha * frame
        if accumulated >= threshold:
            # Fire: emit one token embedding and carry the remainder over
            tokens.append(integrated)
            accumulated -= threshold
            integrated = np.zeros_like(integrated)
    return tokens, len(tokens)  # predicted token embeddings and sequence length
```
**CIF advantages**:
- No need to preset a target length
- Adapts to different speaking rates and speech types
- Frame-level alignment precision (10 ms)
#### Performance Comparison
| Model | AISHELL-1 CER | Inference speed (RTF) | Parameters |
|-----|-------------|---------------|--------|
| Transformer | 5.8% | 0.82 | 180M |
| Conformer | 3.4% | 0.56 | 190M |
| **Paraformer** | **1.94%** | **0.12** | **220M** |
*RTF (Real Time Factor) = inference time / audio duration; smaller is faster*
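The RTF definition above can be made concrete with a couple of lines (a standalone helper for illustration, not part of FunASR):

```python
def real_time_factor(inference_seconds: float, audio_seconds: float) -> float:
    """RTF = inference time / audio duration; values below 1.0 are faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return inference_seconds / audio_seconds

# At RTF 0.12, a 10-second clip takes 1.2 s to transcribe,
# i.e. roughly an 8.3x real-time speedup.
rtf = real_time_factor(1.2, 10.0)
print(f"RTF = {rtf:.2f}, speedup = {1 / rtf:.1f}x")
```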
### 3. SenseVoice: Multimodal Speech Understanding
**SenseVoice** is FunASR's new-generation speech foundation model, built on a **multi-task learning** framework:
#### Architecture Comparison
**SenseVoice Small** (non-autoregressive, lightweight):
```
Input audio → Feature Extractor → Task Embedder → parallel outputs
                                        │
                            ┌───────────┼───────────┐
                           LID         SER        AED/ASR
```
**SenseVoice Large** (autoregressive, more powerful):
```
Audio input → SAN-M Encoder → Transformer Decoder → autoregressive sequence
Example output:
SOS → LID:zh → SER:happy → AED:bgm → ASR:阿 → AED:/bgm → ASR:里 → ASR:巴 → happy → EOS
```
#### Multi-Task Capabilities
| Task | Description | Example output |
|-----|------|---------|
| **ASR** | Speech recognition | "阿里巴巴" |
| **LID** | Language identification | [zh] Chinese, [en] English, [yue] Cantonese, [ja] Japanese, [ko] Korean |
| **SER** | Speech emotion recognition | [happy], [angry], [sad], [neutral] |
| **AED** | Audio event detection | [bgm] background music, [applause] applause, [laughter] laughter, [cough] coughing |
| **ITN** | Inverse text normalization | "二零二四年" → "2024年" |
#### Performance Highlights
- **Multilingual**: trained on 400k hours of data, supporting 50+ languages
- **High accuracy**: Chinese/Cantonese recognition outperforms Whisper
- **High efficiency**: SenseVoice-Small infers about 15× faster than Whisper-Large
- **Rich text**: emits emotion, event, and other semantic information alongside the transcript
---
## Model Family in Detail
### Model Overview
| Model | Task | Language | Parameters | Highlights |
|-----|------|-----|--------|------|
| **SenseVoiceSmall** | ASR+LID+SER+AED | Multilingual | 330M | ⭐ multi-task understanding |
| **paraformer-zh** | ASR | Chinese | 220M | Non-streaming, high accuracy |
| **paraformer-zh-streaming** | ASR | Chinese | 220M | Real-time streaming recognition |
| **paraformer-en** | ASR | English | 220M | Optimized for English |
| **conformer-en** | ASR | English | 220M | Conformer architecture |
| **ct-punc** | Punctuation restoration | Chinese/English | 290M | Context-aware |
| **fsmn-vad** | VAD | Multilingual | 0.4M | Real-time endpoint detection |
| **fsmn-kws** | Keyword spotting | Chinese | 0.7M | Real-time wake word |
| **cam++** | Speaker verification | Multilingual | 7.2M | Deep speaker embeddings |
| **Whisper-large-v3** | ASR | Multilingual | 1550M | OpenAI model |
| **Qwen-Audio** | Multimodal LLM | Multilingual | 8B | Audio-text alignment |
| **emotion2vec+** | Emotion recognition | Multilingual | 300M | 4 emotion classes |
### Representative Models in Detail
#### 1. Paraformer-zh (Chinese Speech Recognition)
```python
from funasr import AutoModel

# Load the model
model = AutoModel(
    model="paraformer-zh",
    vad_model="fsmn-vad",    # voice activity detection
    punc_model="ct-punc",    # punctuation restoration
    # spk_model="cam++"      # speaker diarization (optional)
)

# Inference
res = model.generate(
    input="asr_example_zh.wav",
    batch_size_s=300,
    hotword='魔搭'  # hotword boosting ("魔搭" is ModelScope's Chinese name)
)
print(res)
```
**Output format**:
```json
[{
    "key": "asr_example_zh",
    "text": "魔搭是一个开源的模型即服务平台。",
    "timestamp": [[0, 800], [800, 1200], ...],  // character-level timestamps (ms)
    "confidence": 0.95
}]
```
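The character-level timestamps make subtitle generation straightforward. The sketch below assumes the result shape shown above, with `text` and `timestamp` aligned one-to-one per character (punctuation, which carries no timestamps, would need to be stripped first); it is an illustration, not a FunASR API:

```python
def ms_to_srt(ms: int) -> str:
    """Format milliseconds as an SRT timestamp (HH:MM:SS,mmm)."""
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def result_to_srt(text: str, timestamps: list, chars_per_cue: int = 10) -> str:
    """Group char-level [start_ms, end_ms] stamps into fixed-size subtitle cues."""
    cues = []
    for i in range(0, len(text), chars_per_cue):
        chunk = text[i:i + chars_per_cue]
        start = timestamps[i][0]
        end = timestamps[min(i + chars_per_cue, len(timestamps)) - 1][1]
        cues.append(f"{len(cues) + 1}\n{ms_to_srt(start)} --> {ms_to_srt(end)}\n{chunk}\n")
    return "\n".join(cues)
```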
#### 2. SenseVoiceSmall (Multi-Task Understanding)
```python
from funasr import AutoModel
from funasr.utils.postprocess_utils import rich_transcription_postprocess

model = AutoModel(
    model="iic/SenseVoiceSmall",
    vad_model="fsmn-vad",
    vad_kwargs={"max_single_segment_time": 30000},
    device="cuda:0",
)
res = model.generate(
    input="example.mp3",
    cache={},
    language="auto",   # automatic language detection
    use_itn=True,      # enable inverse text normalization
    batch_size_s=60,
    merge_vad=True,
    merge_length_s=15,
)

# Rich-text post-processing
text = rich_transcription_postprocess(res[0]["text"])
print(text)
```
**Example output**:
```
<|zh|><|happy|><|bgm|>阿里巴巴是一家科技公司<|/bgm|>
```
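A transcript in this rich format can be split into tags and plain text with a small regex helper (the tag names follow the example above; the parser itself is a sketch, not part of the FunASR API):

```python
import re

TAG = re.compile(r"<\|(/?)([^|]+)\|>")

def parse_rich_text(rich: str) -> dict:
    """Separate SenseVoice-style <|tag|> markers from the plain transcript."""
    opening = [m.group(2) for m in TAG.finditer(rich) if not m.group(1)]
    plain = TAG.sub("", rich)
    return {"tags": opening, "text": plain}

out = parse_rich_text("<|zh|><|happy|><|bgm|>阿里巴巴是一家科技公司<|/bgm|>")
print(out["tags"])  # ['zh', 'happy', 'bgm']
print(out["text"])  # 阿里巴巴是一家科技公司
```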
#### 3. Paraformer-zh-streaming (Real-Time Recognition)
```python
from funasr import AutoModel
import soundfile

# Streaming configuration
chunk_size = [0, 10, 5]  # [0, 10, 5] = 600 ms latency
encoder_chunk_look_back = 4
decoder_chunk_look_back = 1

model = AutoModel(model="paraformer-zh-streaming")

# Simulate streaming input
speech, sample_rate = soundfile.read("test.wav")
chunk_stride = chunk_size[1] * 960  # 10 units x 60 ms x 16 samples/ms = 9600 samples (600 ms)
cache = {}
total_chunk_num = int(len(speech) / chunk_stride) + 1
for i in range(total_chunk_num):
    speech_chunk = speech[i*chunk_stride:(i+1)*chunk_stride]
    is_final = (i == total_chunk_num - 1)
    res = model.generate(
        input=speech_chunk,
        cache=cache,
        is_final=is_final,
        chunk_size=chunk_size,
        encoder_chunk_look_back=encoder_chunk_look_back,
        decoder_chunk_look_back=decoder_chunk_look_back
    )
    print(f"Chunk {i}: {res}")
```
**Streaming latency notes**:
- `chunk_size = [0, 10, 5]`: text is emitted at a granularity of 10×60 ms = 600 ms, with 5×60 ms = 300 ms of lookahead
- Each inference step consumes 600 ms of audio and emits the corresponding text
- Set `is_final=True` on the last chunk to flush the final characters
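The latency figures above follow directly from the chunk arithmetic at 16 kHz (one chunk unit is 60 ms in this configuration):

```python
SAMPLE_RATE = 16_000  # Hz
UNIT_MS = 60          # one chunk unit = 60 ms in this configuration
chunk_size = [0, 10, 5]

samples_per_unit = SAMPLE_RATE * UNIT_MS // 1000   # 960 samples per 60 ms unit
chunk_stride = chunk_size[1] * samples_per_unit    # 9600 samples = 600 ms per step
lookahead_ms = chunk_size[2] * UNIT_MS             # 300 ms of future context

print(samples_per_unit, chunk_stride, lookahead_ms)  # 960 9600 300
```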
---
## Quick Start and Hands-On Practice
### Installation
**Option 1: pip install (recommended)**
```bash
pip install funasr
# For the industrial pretrained models, additionally install:
pip install -U modelscope huggingface_hub
```
**Option 2: install from source**
```bash
git clone https://github.com/alibaba/FunASR.git
cd FunASR
pip install -e ./
```
**Requirements**:
- Python >= 3.8
- PyTorch >= 1.13
- torchaudio
### First Run
**Command line**:
```bash
funasr ++model=paraformer-zh \
       ++vad_model="fsmn-vad" \
       ++punc_model="ct-punc" \
       ++input=asr_example_zh.wav
```
**Python API**:
```python
from funasr import AutoModel

# One-line load; the model is downloaded automatically
model = AutoModel(model="paraformer-zh")
res = model.generate(input="test.wav")
print(res[0]["text"])
```
### Hands-On Cases
#### Case 1: Long-Audio Transcription
```python
from funasr import AutoModel

# Long audio needs VAD enabled for automatic segmentation
model = AutoModel(
    model="paraformer-zh",
    vad_model="fsmn-vad",
    vad_kwargs={"max_single_segment_time": 30000},  # at most 30 s per segment
    punc_model="ct-punc",
)

# Audio of arbitrary length is supported
res = model.generate(
    input="long_meeting_recording.wav",
    batch_size_s=300,  # dynamic batching: 300 s of audio per batch
)

# Print the full transcript with timestamps
for item in res:
    print(f"[{item['timestamp'][0][0]}ms] {item['text']}")
```
#### Case 2: Real-Time Meeting Transcription
```python
import numpy as np
import pyaudio
from funasr import AutoModel

# Initialize the streaming model
model = AutoModel(model="paraformer-zh-streaming")
chunk_size = [0, 10, 5]  # 600 ms latency

# Configure the audio stream
p = pyaudio.PyAudio()
stream = p.open(
    format=pyaudio.paInt16,
    channels=1,
    rate=16000,
    input=True,
    frames_per_buffer=9600  # 600 ms @ 16 kHz
)
cache = {}
print("Starting real-time transcription...")
try:
    while True:
        # Read one audio chunk
        data = stream.read(9600)
        speech_chunk = np.frombuffer(data, dtype=np.int16)
        # Streaming inference
        res = model.generate(
            input=speech_chunk,
            cache=cache,
            is_final=False,
            chunk_size=chunk_size
        )
        if res[0]["text"]:
            print(res[0]["text"], end="", flush=True)
except KeyboardInterrupt:
    print("\nTranscription stopped")
    stream.stop_stream()
    stream.close()
    p.terminate()
```
#### Case 3: Emotion Analysis
```python
from funasr import AutoModel

model = AutoModel(model="iic/SenseVoiceSmall")
res = model.generate(
    input="customer_service_call.wav",
    language="auto",
)
text = res[0]["text"]

# Parse the emotion tag
if "<|happy|>" in text:
    emotion = "happy"
elif "<|angry|>" in text:
    emotion = "angry"
elif "<|sad|>" in text:
    emotion = "sad"
else:
    emotion = "neutral"
print(f"Transcript: {text}")
print(f"Emotion: {emotion}")
```
---
## Deployment and Optimization
### Deployment Options Compared
| Deployment | Platform | Performance profile | Best for |
|---------|-----|---------|---------|
| **Python SDK** | CPU/GPU | Flexible, easy to use | Development and testing |
| **ONNX Runtime** | Cross-platform | High-performance inference | Production |
| **TensorRT** | NVIDIA GPU | Maximum throughput | High-concurrency workloads |
| **Mobile** | Android/iOS | Lightweight | Mobile apps |
| **Web service** | HTTP/WebSocket | Remote invocation | Cloud services |
### One-Command Docker Deployment
```bash
# Chinese offline file transcription service (CPU)
docker pull registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-cpu-0.4.5
# Chinese real-time transcription service
docker pull registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-online-cpu-0.1.12
# Start the service
docker run -p 10095:10095 -it --privileged=true \
  -v $PWD/funasr-runtime-resources:/workspace/models \
  registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-cpu-0.4.5
```
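The dockerized runtime exposes a WebSocket endpoint on port 10095. The handshake below reflects my understanding of the runtime's JSON protocol; the field names (`mode`, `wav_name`, `is_speaking`, etc.) are assumptions to be checked against the runtime docs for the version you deploy:

```python
import json

def build_handshake(wav_name: str, mode: str = "offline") -> str:
    """First frame of the (assumed) FunASR runtime WebSocket protocol:
    a JSON config message, after which raw 16 kHz / 16-bit PCM frames follow."""
    return json.dumps({
        "mode": mode,           # "offline", "online", or "2pass" (assumed values)
        "wav_name": wav_name,
        "wav_format": "pcm",
        "is_speaking": True,    # send a final frame with False to flush results
        "chunk_size": [5, 10, 5],
    })

msg = build_handshake("meeting.wav")
# With a client library such as `websockets`: connect to ws://localhost:10095,
# send `msg`, stream the PCM chunks, then send a frame with "is_speaking": false.
```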
### Performance Tuning Tips
#### 1. Quantization
```python
# INT8 quantization: roughly 2.3x faster inference with < 0.3% accuracy loss
from funasr_onnx import Paraformer

model_dir = "./paraformer-onnx"  # path to the exported ONNX model (placeholder)

model = Paraformer(
    model_dir,
    batch_size=1,
    quantize=True  # enable quantization
)
```
#### 2. Dynamic Batching
```python
# Batch size adapts to available GPU memory automatically
model.generate(
    input="test.wav",
    batch_size_s=60,  # batch by total audio duration (60 s), not a fixed sample count
)
```
#### 3. Hotword Boosting
```python
# Improve recognition accuracy for domain-specific vocabulary
model.generate(
    input="tech_conference.wav",
    hotword='大语言模型 人工智能 深度学习 Transformer'
)
```
#### 4. Multi-Stream GPU Concurrency
The GPU build of the Chinese offline file transcription service supports dynamic batching; on a long-audio test set it reports a single-stream RTF of 0.0076 and a multi-stream speedup of 1200+.
---
## Application Scenarios and Case Studies
### Scenario Matrix
| Scenario | Recommended models | Key features |
|-----|---------|---------|
| **Meeting transcription** | paraformer-zh + vad + punc | Long-audio segmentation, punctuation restoration |
| **Live captions** | paraformer-zh-streaming | Low-latency streaming output |
| **Call-center QA** | SenseVoiceSmall | Emotion recognition, event detection |
| **Voice assistants** | fsmn-kws + paraformer | Wake word + recognition |
| **Multilingual translation** | Whisper-large-v3 | 99 languages supported |
| **Voiceprint recognition** | cam++ | Speaker verification/diarization |
| **Content moderation** | SenseVoiceSmall | Audio event detection |
| **Smart cockpit** | SenseVoiceSmall | Multiple tasks in one model |
### Real-World Cases
#### Case 1: Smart Meeting System
**Requirements**:
- Transcribe a 2-hour meeting recording
- Distinguish between speakers
- Automatically generate meeting minutes
**Solution**:
```python
model = AutoModel(
    model="paraformer-zh",
    vad_model="fsmn-vad",
    punc_model="ct-punc",
    spk_model="cam++",  # speaker diarization
)
res = model.generate(
    input="meeting_2h.wav",
    batch_size_s=300,
)
# Output format:
# [Speaker A] 10:05 We need to discuss the goals for the second half of the year
# [Speaker B] 10:07 I think we should focus on user experience
```
#### Case 2: Smart Customer-Service Assistant
**Requirements**:
- Recognize customer speech in real time
- Analyze customer sentiment
- Detect keywords (complaints, refunds, etc.)
**Solution**:
```python
model = AutoModel(model="iic/SenseVoiceSmall")
res = model.generate(
    input=audio_stream,
    language="auto",
)
text = res[0]["text"]

# Sentiment analysis
if "<|angry|>" in text or "<|sad|>" in text:
    alert_manager()  # notify a supervisor
# Keyword detection ("投诉" complaint, "退款" refund, "不满意" dissatisfied, "举报" report)
keywords = ["投诉", "退款", "不满意", "举报"]
if any(kw in text for kw in keywords):
    escalate_ticket()  # escalate the ticket
```
---
## Comparison with Other Toolkits
### FunASR vs Whisper
| Dimension | FunASR | Whisper |
|-----|--------|---------|
| **Development team** | Alibaba DAMO Academy | OpenAI |
| **Chinese accuracy** | ⭐⭐⭐⭐⭐ optimized for Chinese | ⭐⭐⭐⭐ general-purpose model |
| **Speed** | SenseVoice ~15× faster than Whisper-Large | Slower |
| **Feature breadth** | ASR+VAD+Punc+SER+AED+... | ASR + translation only |
| **Deployment** | One-command Docker, full service chain | Build your own serving stack |
| **Hotword boosting** | ✅ supported | ❌ not supported |
| **License** | Apache 2.0 | MIT |
### FunASR vs Other Chinese ASR Toolkits
| Toolkit | Highlights | Best for |
|-----|------|---------|
| **FunASR** | Full-featured, industrial-grade, actively updated | Enterprise applications, research |
| **PaddleSpeech** | From Baidu, Paddle ecosystem | Paddle users |
| **WeNet** | Lightweight, efficient | Edge deployment |
| **Kaldi** | Classic ASR framework | Academic research |
---
## Summary and Outlook
### Core Strengths
1. **Industrial-grade quality**: trained on tens of thousands of hours of industrial data, with strong generalization
2. **Comprehensive features**: a one-stop solution from ASR to multimodal understanding
3. **Leading performance**: Paraformer's non-autoregressive architecture delivers both speed and accuracy
4. **Deployment-friendly**: multiple deployment options, one-command Docker startup
5. **Continuous evolution**: frontier models such as SenseVoice and Qwen-Audio keep being integrated
### Future Directions
- **Multimodal fusion**: deeper audio-text-vision integration
- **Low-resource optimization**: support for more dialects and low-resource languages
- **On-device deployment**: lighter models for IoT devices
- **Personalization**: better speaker adaptation
### Quick Start
```bash
# 1. Install
pip install funasr modelscope
```
```python
# 2. Up and running in four lines of code
from funasr import AutoModel
model = AutoModel(model="paraformer-zh")
res = model.generate(input="test.wav")
print(res[0]["text"])
```
---
## References
- **GitHub**: https://github.com/alibaba/FunASR
- **Gitee**: https://gitee.com/wenjiakai/FunASR
- **ModelScope**: https://modelscope.cn/organization/damo
- **Hugging Face**: https://huggingface.co/funasr
- **Docs**: https://github.com/alibaba/FunASR/tree/main/docs
- **Paper**: Paraformer: Fast and Accurate Parallel Transformer for Non-autoregressive End-to-End Speech Recognition (INTERSPEECH 2022)
---
*This article is compiled from the official FunASR documentation, the GitHub repository, and community resources, and aims to give developers a comprehensive overview of this powerful open-source speech recognition toolkit.*
#FunASR #SpeechRecognition #ASR #Alibaba #DAMOAcademy #OpenSource #AI #Paraformer #SenseVoice #Tutorial