Voiceprint Purification Based on Silero VAD - 深圳市維司達科技有限公司

Supports:

Extracting clean speech
Total speech duration
Total non-speech duration
Longest single non-speech interval

import torchaudio
from silero_vad import load_silero_vad, get_speech_timestamps, collect_chunks

from src.ultis import load_audio


# Excerpt of a class method: the enclosing class is expected to provide
# self.vad_model and self.debug.
def purified_voice(
    self,
    audio_source,
    sample_rate=16000,
    min_silence_duration_ms=700,
    speech_pad_ms=100,
    output_path=None,
):
    """Voice / voiceprint purification.

    Args:
        audio_source (str | Path | bytes | np.ndarray | torch.Tensor):
            A path, byte stream, NumPy array, or Tensor.
        sample_rate (int, optional): Defaults to 16000.
        min_silence_duration_ms (int, optional): Minimum silence duration. Defaults to 700.
        speech_pad_ms (int, optional): Padding around speech edges. Defaults to 100.
        output_path (str, optional): Defaults to None.

    Returns:
        Dict:
            - waveform (torch.Tensor): shape [1, t].
            - has_speech (bool): True if speech was found and purified;
              False if no speech was found and the original audio is returned.
            - speech_duration (float): speech duration in seconds.
            - silence_duration (float): non-speech duration in seconds.
            - max_silence_interval (float): longest non-speech interval in seconds.
    """
    waveform, sr, total_frames, duration_time = load_audio(audio_source, target_sr=sample_rate)
    ori_waveform = waveform.flatten()

    speech_timestamps = get_speech_timestamps(
        ori_waveform,
        self.vad_model,
        sampling_rate=sample_rate,
        threshold=0.5,
        min_silence_duration_ms=min_silence_duration_ms,
        speech_pad_ms=speech_pad_ms,
    )

    # Timestamps are sample indices, so divide by the sample rate to get seconds.
    speech_duration = sum(
        ((seg["end"] - seg["start"]) / sample_rate for seg in speech_timestamps), 0.0
    )
    silence_duration = max(duration_time - speech_duration, 0.0)

    max_silence_interval = 0.0
    if not speech_timestamps:
        max_silence_interval = duration_time
    else:
        # Leading silence before the first speech segment.
        max_silence_interval = max(max_silence_interval, speech_timestamps[0]["start"] / sample_rate)
        # Gaps between consecutive speech segments.
        for i in range(len(speech_timestamps) - 1):
            gap = (speech_timestamps[i + 1]["start"] - speech_timestamps[i]["end"]) / sample_rate
            max_silence_interval = max(max_silence_interval, gap)
        # Trailing silence after the last speech segment.
        max_silence_interval = max(
            max_silence_interval,
            (len(ori_waveform) - speech_timestamps[-1]["end"]) / sample_rate,
        )

    if speech_timestamps:
        # Concatenate the speech segments and restore the channel dimension.
        purified_waveform = collect_chunks(speech_timestamps, ori_waveform)
        purified_waveform = purified_waveform[None, :]
        if output_path:
            torchaudio.save(
                output_path, purified_waveform, sample_rate, encoding="PCM_S", bits_per_sample=16
            )
        if self.debug:
            print(
                f"Purification complete, waveform shape:{ori_waveform.shape}, "
                f"purified_waveform shape:{purified_waveform.shape}, "
                f"\nSaved to: {output_path}"
            )
        return {
            "waveform": purified_waveform,
            "has_speech": True,
            "speech_duration": round(speech_duration, 4),
            "silence_duration": round(silence_duration, 4),
            "max_silence_interval": round(max_silence_interval, 4),
        }
    else:
        if self.debug:
            print("No valid speech detected in the audio.")
        return {
            "waveform": ori_waveform[None, :],
            "has_speech": False,
            "speech_duration": 0.0,
            "silence_duration": duration_time,
            "max_silence_interval": duration_time,
        }
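The duration bookkeeping can be checked in isolation, without loading a model or any audio. The following is a minimal sketch, not part of the original code: `silence_stats` is a hypothetical helper name, and it assumes the timestamps are sample-index dicts with `start` and `end` keys, as `get_speech_timestamps` is used in the function here.

```python
from typing import Dict, List


def silence_stats(
    speech_timestamps: List[Dict[str, int]],
    total_samples: int,
    sample_rate: int = 16000,
) -> Dict[str, float]:
    """Summarise speech and silence durations (hypothetical helper).

    Assumes each timestamp is {"start": int, "end": int} in sample indices,
    sorted by time and non-overlapping.
    """
    total_seconds = total_samples / sample_rate
    if not speech_timestamps:
        # No speech at all: the whole clip is one silent interval.
        return {
            "speech_duration": 0.0,
            "silence_duration": total_seconds,
            "max_silence_interval": total_seconds,
        }

    speech_samples = sum(seg["end"] - seg["start"] for seg in speech_timestamps)

    # Silent gaps: before the first segment, between segments, after the last.
    gaps = [speech_timestamps[0]["start"]]
    gaps += [
        nxt["start"] - cur["end"]
        for cur, nxt in zip(speech_timestamps, speech_timestamps[1:])
    ]
    gaps.append(total_samples - speech_timestamps[-1]["end"])

    return {
        "speech_duration": speech_samples / sample_rate,
        "silence_duration": max(total_samples - speech_samples, 0) / sample_rate,
        "max_silence_interval": max(max(gaps), 0) / sample_rate,
    }


# A 10-second clip at 16 kHz with speech at 1-3 s and 5-6 s.
example = silence_stats(
    [{"start": 16000, "end": 48000}, {"start": 80000, "end": 96000}],
    total_samples=160000,
)
print(example)
# {'speech_duration': 3.0, 'silence_duration': 7.0, 'max_silence_interval': 4.0}
```

In this example the longest silent interval is the 4-second tail after the last segment, which is why the trailing gap has to be included alongside the leading gap and the gaps between segments.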
