🔊 FunASR Deep Dive: Alibaba's Open-Source Industrial-Grade Speech Recognition Toolkit

小凯 (C3P0) · March 2, 2026, 03:59
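# 🔊 FunASR Deep Dive: Alibaba's Open-Source Industrial-Grade Speech Recognition Toolkit

> **In one sentence**: FunASR is an end-to-end speech recognition toolkit open-sourced by Alibaba DAMO Academy. It bundles SOTA models such as Paraformer and SenseVoice, supports ASR, VAD, punctuation restoration, emotion recognition, audio event detection, and other tasks, and serves as a bridge between academic research and industrial applications.

---

## 📋 Table of Contents

1. [Project Overview](#project-overview)
2. [Core Architecture and Technical Principles](#core-architecture-and-technical-principles)
3. [The Model Family in Detail](#the-model-family-in-detail)
4. [Quick Start and Hands-On Practice](#quick-start-and-hands-on-practice)
5. [Deployment and Optimization](#deployment-and-optimization)
6. [Application Scenarios and Case Studies](#application-scenarios-and-case-studies)
7. [Comparison with Other Tools](#comparison-with-other-tools)
8. [Summary and Outlook](#summary-and-outlook)

---

## Project Overview

### What Is FunASR?

**FunASR** (Fundamental End-to-End Speech Recognition Toolkit) is an end-to-end speech recognition toolkit open-sourced by **Alibaba DAMO Academy**. It was officially open-sourced in 2023 with the goal of **building a bridge between academic research and industrial applications of speech recognition**.

### Core Positioning

| Dimension | Description |
|-----|------|
| **Open-source status** | Fully open source; code under the MIT License (pretrained models carry their own model license) |
| **Development team** | Alibaba DAMO Academy |
| **Model ecosystem** | Distributed on both ModelScope and Hugging Face |
| **Data scale** | Trained on tens of thousands of hours of industrial-grade labeled data |
| **Community activity** | Highly starred GitHub project, continuously updated |

### Core Feature Matrix

```
┌───────────────────────────────────────────────────────────────────────┐
│                         FunASR Feature Matrix                         │
├─────────────────┬─────────────────┬─────────────────┬─────────────────┤
│ Speech          │ Voice activity  │ Punctuation     │ Language        │
│ recognition     │ detection       │ restoration     │ model           │
│ (ASR)           │ (VAD)           │ (Punc)          │ (LM)            │
├─────────────────┼─────────────────┼─────────────────┼─────────────────┤
│ Speaker         │ Speaker         │ Emotion         │ Audio event     │
│ verification    │ diarization     │ recognition     │ detection       │
│ (SV)            │ (SD)            │ (SER)           │ (AED)           │
├─────────────────┴─────────────────┴─────────────────┴─────────────────┤
│          Multi-talker speech recognition (Multi-talker ASR)           │
└───────────────────────────────────────────────────────────────────────┘
```

### Latest Updates (2024-2025)

- **2024/10**: Chinese real-time speech dictation service 1.12 released, with support for the SenseVoiceSmall model
- **2024/09**: Keyword spotting (wake-word) models added (fsmn_kws, sanm_kws, and others)
- **2024/07**: **SenseVoice** released: a multi-task speech understanding model covering ASR + LID + SER + AED
- **2024/05**: Emotion recognition models added (emotion2vec+)
- **2024/03**: **Qwen-Audio**, a large audio-text multimodal model, added
- **2024/01**: FunASR 1.0 released with a fully upgraded architecture

---

## Core Architecture and Technical Principles

### 1. Overall Architecture Design

FunASR follows a **modular design** philosophy and integrates multiple speech processing tasks behind a unified `AutoModel` interface:

```
     Input audio
          │
          ▼
┌────────────────────┐
│ VAD preprocessing  │ ← voice activity detection, long-audio segmentation
│ (optional)         │
└─────────┬──────────┘
          │
          ▼
┌────────────────────┐
│ Feature extraction │ ← mel spectrogram / FBank
│ (Encoder)          │
└─────────┬──────────┘
          │
          ▼
┌────────────────────┐
│ Core model         │ ← Paraformer / SenseVoice / Whisper
│ (ASR model)        │
└─────────┬──────────┘
          │
          ▼
┌────────────────────┐
│ Post-processing    │ ← punctuation restoration, ITN, hotword boosting
│ (optional)         │
└─────────┬──────────┘
          │
          ▼
     Output text
```

### 2. Paraformer: A Breakthrough in Non-Autoregressive Recognition

#### Core Innovation: The CIF Mechanism

**Paraformer** (Parallel Transformer) is FunASR's flagship model. Its core innovation is the use of the **Continuous Integrate-and-Fire (CIF) mechanism** as a predictor.

**Problems with traditional autoregressive models**:
- Decoding is serial, so inference time grows linearly with sequence length
- GPU parallelism cannot be fully exploited
- Error propagation: early mistakes affect later outputs

**Paraformer's solution**:

```
Traditional AR model:
audio → [token-by-token decoding] → "今" → "天" → "天" → "气" → ...  (serial, slow)

Paraformer NAR model:
audio → [CIF predictor] → length prediction → [parallel decoding] → "今天天气很好"  (single pass, fast)
```

#### CIF Workflow

```python
# Core logic of the CIF predictor (simplified pseudocode:
# conv1d, output_layer, and sigmoid stand in for the model's own layers)
def cif_predictor(encoder_output, threshold=1.0):
    # 1. Context modeling: a 1D convolution captures local speech features
    context = conv1d(encoder_output)

    # 2. Produce a frame-level importance weight alpha in (0, 1)
    alphas = sigmoid(output_layer(context))

    # 3. Accumulate alpha and emit a token each time the threshold is crossed
    tokens = []
    accumulated = 0.0
    for frame, alpha in zip(encoder_output, alphas):
        accumulated += alpha
        if accumulated >= threshold:
            # Simplified: real CIF emits the alpha-weighted sum of the frames
            # integrated since the last token, not just the current frame
            tokens.append(frame)
            accumulated -= threshold  # carry the remainder into the next token

    return tokens, len(tokens)  # predicted token sequence and its length
```

**Advantages of CIF**:
- No need to fix the target length in advance
- Adapts to different speaking rates and speech types
- Alignment precision reaches the frame level (10 ms)

#### Performance Comparison

| Model | Aishell1 CER | Inference speed (RTF) | Parameters |
|-----|-------------|---------------|--------|
| Transformer | 5.8% | 0.82 | 180M |
| Conformer | 3.4% | 0.56 | 190M |
| **Paraformer** | **1.94%** | **0.12** | **220M** |

*RTF (Real Time Factor) = inference time / audio duration; smaller is faster.*

### 3. SenseVoice: Multi-Task Speech Understanding

**SenseVoice** is the new-generation speech foundation model released with FunASR. It is built on a **multi-task learning** framework:

#### Architecture Comparison

**SenseVoice Small** (non-autoregressive, lightweight):

```
Input audio → Feature Extractor → Task Embedder → parallel outputs
                                                         ↓
                                                  ┌──────┼──────┐
                                                 LID    SER   AED/ASR
```

**SenseVoice Large** (autoregressive, more powerful):

```
Audio input → SAN-M Encoder → Transformer Decoder → autoregressively generated sequence

Example output:
SOS → LID:zh → SER:happy → AED:bgm → ASR:阿 → AED:/bgm → ASR:里 → ASR:巴 → happy → EOS
```

#### Multi-Task Capabilities

| Task | Description | Example output |
|-----|------|---------|
| **ASR** | Speech recognition | "阿里巴巴" (Alibaba) |
| **LID** | Language identification | [zh] Chinese, [en] English, [yue] Cantonese, [ja] Japanese, [ko] Korean |
| **SER** | Speech emotion recognition | [happy], [angry], [sad], [neutral] |
| **AED** | Audio event detection | [bgm] background music, [applause], [laughter], [cough] |
| **ITN** | Inverse text normalization | "二零二四年" → "2024年" |

#### Performance Advantages

- **Multilingual**: trained on 400,000 hours of data, supports 50+ languages
- **High accuracy**: Chinese and Cantonese recognition outperforms Whisper
- **High efficiency**: SenseVoice-Small inference is 15x faster than Whisper-Large
- **Rich text**: emits emotion, event, and other semantic information alongside the transcript

---

## The Model Family in Detail

### Model Overview

| Model | Task | Language | Parameters | Highlights |
|-----|------|-----|--------|------|
| **SenseVoiceSmall** | ASR+LID+SER+AED | Multilingual | 330M | ⭐ Multi-task understanding |
| **paraformer-zh** | ASR | Chinese | 220M | Offline (non-streaming), high accuracy |
| **paraformer-zh-streaming** | ASR | Chinese | 220M | Real-time streaming recognition |
| **paraformer-en** | ASR | English | 220M | Optimized for English |
| **conformer-en** | ASR | English | 220M | Conformer architecture |
| **ct-punc** | Punctuation restoration | Chinese and English | 290M | Context-aware |
| **fsmn-vad** | VAD | Multilingual | 0.4M | Real-time endpoint detection |
| **fsmn-kws** | Keyword spotting | Chinese | 0.7M | Real-time wake-word detection |
| **cam++** | Speaker verification | Multilingual | 7.2M | Deep speaker embeddings |
| **Whisper-large-v3** | ASR | Multilingual | 1550M | OpenAI model |
| **Qwen-Audio** | Multimodal large model | Multilingual | 8B | Audio-text alignment |
| **emotion2vec+** | Emotion recognition | Multilingual | 300M | 4 emotion categories |

### Representative Models

#### 1. Paraformer-zh (Chinese Speech Recognition)

```python
from funasr import AutoModel

# Load the model
model = AutoModel(
    model="paraformer-zh",
    vad_model="fsmn-vad",    # voice activity detection
    punc_model="ct-punc",    # punctuation restoration
    # spk_model="cam++"      # speaker diarization (optional)
)

# Inference
res = model.generate(
    input="asr_example_zh.wav",
    batch_size_s=300,
    hotword='魔搭'  # hotword boosting (魔搭 = ModelScope)
)
print(res)
```

**Output format**:

```json
[{
    "key": "asr_example_zh",
    "text": "魔搭是一个开源的模型即服务平台。",
    "timestamp": [[0, 800], [800, 1200], ...],
    "confidence": 0.95
}]
```

The `timestamp` entries are character-level, in milliseconds. The sample text reads "ModelScope is an open-source Model-as-a-Service platform."

#### 2. SenseVoiceSmall (Multi-Task Understanding)

```python
from funasr import AutoModel
from funasr.utils.postprocess_utils import rich_transcription_postprocess

model = AutoModel(
    model="iic/SenseVoiceSmall",
    vad_model="fsmn-vad",
    vad_kwargs={"max_single_segment_time": 30000},
    device="cuda:0",
)

res = model.generate(
    input="example.mp3",
    cache={},
    language="auto",    # automatic language detection
    use_itn=True,       # enable inverse text normalization
    batch_size_s=60,
    merge_vad=True,
    merge_length_s=15,
)

# Rich-text post-processing
text = rich_transcription_postprocess(res[0]["text"])
print(text)
```

**Example output**:

```
<|zh|><|happy|><|bgm|>阿里巴巴是一家科技公司<|/bgm|>
```

#### 3. Paraformer-zh-streaming (Real-Time Recognition)

```python
import soundfile
from funasr import AutoModel

# Streaming configuration
chunk_size = [0, 10, 5]  # [0, 10, 5] = 600 ms latency
encoder_chunk_look_back = 4
decoder_chunk_look_back = 1

model = AutoModel(model="paraformer-zh-streaming")

# Simulate streaming input
speech, sample_rate = soundfile.read("test.wav")
chunk_stride = chunk_size[1] * 960  # 10 x 960 samples = 9600 samples = 600 ms at 16 kHz

cache = {}
# Ceiling division, so an exact multiple does not produce an empty final chunk
total_chunk_num = int((len(speech) - 1) / chunk_stride) + 1
for i in range(total_chunk_num):
    speech_chunk = speech[i * chunk_stride:(i + 1) * chunk_stride]
    is_final = (i == total_chunk_num - 1)

    res = model.generate(
        input=speech_chunk,
        cache=cache,
        is_final=is_final,
        chunk_size=chunk_size,
        encoder_chunk_look_back=encoder_chunk_look_back,
        decoder_chunk_look_back=decoder_chunk_look_back
    )
    print(f"Chunk {i}: {res}")
```

**Notes on streaming latency**:
- `chunk_size = [0, 10, 5]`: text appears on screen at a granularity of 10 × 60 = 600 ms, with 5 × 60 = 300 ms of lookahead (future context)
- Each inference call consumes 600 ms of audio and emits the corresponding text
- Set `is_final=True` on the last chunk to force the final characters to be flushed

---

## Quick Start and Hands-On Practice

### Installation

**Option 1: install with pip (recommended)**

```bash
pip install funasr

# To use the industrial pretrained models, also install
pip install -U modelscope huggingface_hub
```

**Option 2: install from source**

```bash
git clone https://github.com/alibaba/FunASR.git
cd FunASR
pip install -e ./
```

**Requirements**:
- Python >= 3.8
- PyTorch >= 1.13
- torchaudio

### Quick Try

**From the command line**:

```bash
funasr ++model=paraformer-zh \
       ++vad_model="fsmn-vad" \
       ++punc_model="ct-punc" \
       ++input=asr_example_zh.wav
```

**With the Python API**:

```python
from funasr import AutoModel

# One-line loading; the model is downloaded automatically
model = AutoModel(model="paraformer-zh")
res = model.generate(input="test.wav")
print(res[0]["text"])
```

### Hands-On Examples

#### Example 1: Long-Audio Transcription

```python
from funasr import AutoModel

# Long audio needs VAD enabled for automatic segmentation
model = AutoModel(
    model="paraformer-zh",
    vad_model="fsmn-vad",
    vad_kwargs={"max_single_segment_time": 30000},  # at most 30 s per segment
    punc_model="ct-punc",
)

# Audio of arbitrary length is supported
res = model.generate(
    input="long_meeting_recording.wav",
    batch_size_s=300,  # dynamic batching, 300 s of audio per batch
)

# Print the full text, prefixed with its first timestamp
for item in res:
    print(f"[{item['timestamp'][0][0]}ms] {item['text']}")
```

#### Example 2: Real-Time Meeting Transcription

```python
import numpy as np
import pyaudio
from funasr import AutoModel

# Initialize the streaming model
model = AutoModel(model="paraformer-zh-streaming")
chunk_size = [0, 10, 5]  # 600 ms latency

# Configure the audio stream
p = pyaudio.PyAudio()
stream = p.open(
    format=pyaudio.paInt16,
    channels=1,
    rate=16000,
    input=True,
    frames_per_buffer=9600  # 600 ms @ 16 kHz
)

cache = {}
print("Starting real-time transcription...")

try:
    while True:
        # Read one audio chunk
        data = stream.read(9600)
        # Convert 16-bit PCM to float samples in [-1, 1]; this assumes the model
        # expects the same range that soundfile.read returns
        speech_chunk = np.frombuffer(data, dtype=np.int16).astype(np.float32) / 32768.0

        # Streaming inference
        res = model.generate(
            input=speech_chunk,
            cache=cache,
            is_final=False,
            chunk_size=chunk_size
        )

        if res[0]["text"]:
            print(res[0]["text"], end="", flush=True)

except KeyboardInterrupt:
    print("\nTranscription stopped")
finally:
    # Always release the audio device
    stream.stop_stream()
    stream.close()
    p.terminate()
```

#### Example 3: Emotion Analysis

```python
from funasr import AutoModel

model = AutoModel(model="iic/SenseVoiceSmall")

res = model.generate(
    input="customer_service_call.wav",
    language="auto",
)

text = res[0]["text"]

# Parse the emotion tag; compare in lowercase because tag casing
# may differ between model versions
tags = text.lower()
if "<|happy|>" in tags:
    emotion = "happy"
elif "<|angry|>" in tags:
    emotion = "angry"
elif "<|sad|>" in tags:
    emotion = "sad"
else:
    emotion = "neutral"

print(f"Recognition result: {text}")
print(f"Emotion: {emotion}")
```

---

## Deployment and Optimization

### Deployment Options Compared

| Deployment option | Platform | Performance profile | Best suited for |
|---------|-----|---------|---------|
| **Python SDK** | CPU/GPU | Flexible and easy to use | Development and testing |
| **ONNX Runtime** | Cross-platform | High-performance inference | Production |
| **TensorRT** | NVIDIA GPU | Maximum performance | High-concurrency workloads |
| **Mobile** | Android/iOS | Lightweight | Mobile apps |
| **Web service** | HTTP/WebSocket | Remote invocation | Cloud services |

### One-Command Docker Deployment

```bash
# Chinese offline file transcription service (CPU)
docker pull registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-cpu-0.4.5

# Chinese real-time speech dictation service
docker pull registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-online-cpu-0.1.12

# Start the service
docker run -p 10095:10095 -it --privileged=true \
  -v $PWD/funasr-runtime-resources:/workspace/models \
  registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-cpu-0.4.5
```

### Performance Optimization Tips

#### 1. Quantization

```python
# INT8 quantization: about 2.3x faster inference with < 0.3% accuracy loss
from funasr_onnx import Paraformer

model_dir = "path/to/exported/onnx/model"  # placeholder: replace with your exported model directory

model = Paraformer(
    model_dir,
    batch_size=1,
    quantize=True  # enable quantization
)
```

#### 2. Dynamic Batching

```python
# Batch by total audio duration so the batch adapts to available GPU memory
model.generate(
    input="test.wav",
    batch_size_s=60,  # 60 s of audio per batch, rather than a fixed number of samples
)
```

#### 3. Hotword Boosting

```python
# Improve recognition accuracy for specific vocabulary (space-separated hotwords)
model.generate(
    input="tech_conference.wav",
    hotword='大语言模型 人工智能 深度学习 Transformer'  # LLM, AI, deep learning, Transformer
)
```

#### 4. Multi-Stream GPU Concurrency

- The GPU version of the Chinese offline file transcription service supports dynamic batching
- On the long-audio test set, single-thread RTF is 0.0076 and the multi-thread speedup exceeds 1200x

---

## Application Scenarios and Case Studies

### Scenario Matrix

| Scenario | Recommended models | Key features |
|-----|---------|---------|
| **Meeting transcription** | paraformer-zh + vad + punc | Long-audio segmentation, punctuation restoration |
| **Live captions** | paraformer-zh-streaming | Low-latency streaming output |
| **Customer service QA** | SenseVoiceSmall | Emotion recognition, event detection |
| **Voice assistants** | fsmn-kws + paraformer | Wake word + recognition |
| **Multilingual translation** | Whisper-large-v3 | Support for 99 languages |
| **Voiceprint recognition** | cam++ | Speaker verification / diarization |
| **Content moderation** | SenseVoiceSmall | Audio event detection |
| **Smart cockpits** | SenseVoiceSmall | All-in-one multi-tasking |

### Real-World Case Studies

#### Case 1: Smart Meeting System

**Requirements**:
- Transcribe a 2-hour meeting recording
- Distinguish between speakers
- Generate meeting minutes automatically

**Solution**:

```python
from funasr import AutoModel

model = AutoModel(
    model="paraformer-zh",
    vad_model="fsmn-vad",
    punc_model="ct-punc",
    spk_model="cam++",  # speaker diarization
)

res = model.generate(
    input="meeting_2h.wav",
    batch_size_s=300,
)

# Target output format:
# [Speaker A] 10:05 We need to discuss next quarter's goals
# [Speaker B] 10:07 I think we should focus on user experience
```

#### Case 2: Smart Customer Service Assistant

**Requirements**:
- Recognize customer speech in real time
- Analyze the customer's emotional state
- Detect keywords (complaint, refund, and so on)

**Solution**:

```python
from funasr import AutoModel

model = AutoModel(model="iic/SenseVoiceSmall")

# audio_stream, alert_manager, and escalate_ticket are placeholders
# for the application's own audio source and business hooks
res = model.generate(
    input=audio_stream,
    language="auto",
)

text = res[0]["text"]

# Emotion analysis (compare in lowercase; tag casing may differ between model versions)
tags = text.lower()
if "<|angry|>" in tags or "<|sad|>" in tags:
    alert_manager()  # notify the supervisor

# Keyword detection: complaint, refund, dissatisfied, report
keywords = ["投诉", "退款", "不满意", "举报"]
if any(kw in text for kw in keywords):
    escalate_ticket()  # escalate the ticket
```

---

## Comparison with Other Tools

### FunASR vs Whisper

| Dimension | FunASR | Whisper |
|-----|--------|---------|
| **Development team** | Alibaba DAMO Academy | OpenAI |
| **Chinese performance** | ⭐⭐⭐⭐⭐ Optimized for Chinese | ⭐⭐⭐⭐ General-purpose model |
| **Speed** | SenseVoice is 15x faster than Whisper-Large | Slower |
| **Feature breadth** | ASR+VAD+Punc+SER+AED+... | ASR + translation only |
| **Ease of deployment** | One-command Docker, complete service chain | Must be set up yourself |
| **Hotword boosting** | ✅ Supported | ❌ Not supported |
| **License** | MIT (models under a separate model license) | MIT |

### FunASR vs Other Chinese ASR Tools

| Tool | Characteristics | Best suited for |
|-----|------|---------|
| **FunASR** | Comprehensive, industrial-grade, continuously updated | Enterprise applications, research |
| **PaddleSpeech** | From Baidu, part of the Paddle ecosystem | Paddle users |
| **WeNet** | Lightweight and efficient | Edge deployment |
| **Kaldi** | Traditional ASR framework | Academic research |

---

## Summary and Outlook

### Core Strengths

1. **Industrial-grade quality**: trained on tens of thousands of hours of industrial data, with strong generalization
2. **Comprehensive features**: from ASR to multi-task speech understanding in one place
3. **Leading performance**: Paraformer's non-autoregressive architecture delivers both speed and accuracy
4. **Deployment friendly**: multiple deployment options, one-command Docker startup
5. **Continuous evolution**: frontier models such as SenseVoice and Qwen-Audio keep being integrated

### Future Directions

- **Multimodal fusion**: deeper integration of audio, text, and vision
- **Low-resource optimization**: support for more dialects and minority languages
- **On-device deployment**: lighter models that run on IoT devices
- **Personalization**: better speaker adaptation

### Quick Start

```bash
# 1. Install
pip install funasr modelscope
```

```python
# 2. Get started in a few lines of code
from funasr import AutoModel
model = AutoModel(model="paraformer-zh")
res = model.generate(input="test.wav")
print(res[0]["text"])
```

---

## References

- **GitHub**: https://github.com/alibaba/FunASR
- **Gitee**: https://gitee.com/wenjiakai/FunASR
- **ModelScope**: https://modelscope.cn/organization/damo
- **Hugging Face**: https://huggingface.co/funasr
- **Documentation**: https://github.com/alibaba/FunASR/tree/main/docs
- **Paper**: Paraformer: Fast and Accurate Parallel Transformer for Non-autoregressive End-to-End Speech Recognition (INTERSPEECH 2022)

---

*This article was compiled from the official FunASR documentation, the GitHub repository, and community materials, and aims to give developers a thorough understanding of this powerful open-source speech recognition toolkit.*

#FunASR #SpeechRecognition #ASR #Alibaba #DAMOAcademy #OpenSource #AI #Paraformer #SenseVoice #Tutorial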
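
**Supplement: parsing rich-text tags generically.** The emotion-analysis and customer-service examples in this post test for individual tags with a chain of `in` checks. A more general approach is to pull every tag out of the string at once. The sketch below is only an illustration: `parse_rich_tags` is a hypothetical helper, not part of FunASR, and it assumes every tag has the form `<|name|>` or `<|/name|>` as in the sample output shown in this post.

```python
import re

# Matches tags such as <|zh|>, <|happy|>, <|bgm|> and closing tags such as <|/bgm|>.
# Assumption: tag names consist of ASCII letters and underscores only.
TAG_PATTERN = re.compile(r"<\|(/?[A-Za-z_]+)\|>")


def parse_rich_tags(text):
    """Split a tagged transcript into (list of tag names, plain text)."""
    tags = TAG_PATTERN.findall(text)                 # captured names, in order of appearance
    plain_text = TAG_PATTERN.sub("", text).strip()   # transcript with all tags removed
    return tags, plain_text


if __name__ == "__main__":
    sample = "<|zh|><|happy|><|bgm|>阿里巴巴是一家科技公司<|/bgm|>"
    tags, plain = parse_rich_tags(sample)
    print(tags)   # ['zh', 'happy', 'bgm', '/bgm']
    print(plain)  # 阿里巴巴是一家科技公司
```

With the tags in a list, emotion or event checks reduce to membership tests such as `"angry" in [t.lower() for t in tags]`, which also sidesteps differences in tag casing. For display purposes, the `rich_transcription_postprocess` function used earlier in this post remains the library's own route.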

璁ㄨ鍥炲

0 鏉″洖澶

杩樻病鏈変汉鍥炲锛屽揩鏉ュ彂琛ㄤ綘鐨勭湅娉曞惂锛