Emotion2Vec+ Output Files Explained: How to Use the JSON and npy Files - 深圳市維司達科技有限公司


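Contents

Why the output file format matters
A deep dive into result.json
A complete guide to embedding.npy
Hands-on: processing emotion recognition results in Python
Common secondary-development scenarios and code templates
Pitfalls: 5 common beginner mistakes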

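Why the output file format matters

You upload a 3-second voice clip and click "Start Recognition" (开始识别). A few seconds later the interface shows a smiley face with 85.3% confidence, and the job looks done. If you stop at that level, though, you miss the core value of the Emotion2Vec+ system.

The real value lives in two unassuming files: result.json and embedding.npy. They are not just snapshots of the result. They are the key to secondary development.

- result.json is structured input for decisions. It turns the model's output, including the score for every emotion, into a dictionary you can program against, so you can write conditional logic, compute statistics, and feed results into business systems.
- embedding.npy is a mathematical fingerprint of the audio. It turns a sound clip into a vector of numbers, so you can measure distances between similar clips, cluster them, and build knowledge graphs.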

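A concrete example: an online education platform used this system to analyze 100,000 recordings of students speaking in class. They did not stop at "happy/sad" labels. Instead, they used embedding.npy to compute an emotion-fluctuation curve for each lesson and correlated it with attendance and quiz accuracy. They found that when a student's emotion score stayed below 0.4 for 3 consecutive minutes, mastery of the subsequent material dropped by 37%.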
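The anecdote does not say how the per-minute score was derived. The sketch below therefore assumes you already have one emotion score per minute, for example an averaged positive-emotion score. That is an assumption, not the platform's method. The hypothetical helper `find_low_runs` finds stretches where the score stays below 0.4 for at least 3 consecutive minutes:

```python
from typing import List, Tuple


def find_low_runs(scores: List[float], threshold: float = 0.4,
                  min_len: int = 3) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs, end exclusive, of runs where every
    score is below `threshold` and the run lasts at least `min_len` steps."""
    runs = []
    start = None
    for i, s in enumerate(scores):
        if s < threshold:
            if start is None:
                start = i          # a low run begins here
        else:
            if start is not None and i - start >= min_len:
                runs.append((start, i))
            start = None           # the run (if any) has ended
    # Close a run that lasts until the end of the series
    if start is not None and len(scores) - start >= min_len:
        runs.append((start, len(scores)))
    return runs


per_minute = [0.62, 0.38, 0.35, 0.31, 0.55, 0.2, 0.1]  # made-up demo data
print(find_low_runs(per_minute))  # [(1, 4)]
```

Each pair is half-open, so (1, 4) covers minutes 1 through 3. The trailing two low minutes are too short to qualify.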

This is what real-world AI deployment looks like: not showing off, but weaving model capabilities into business logic.

A deep dive into result.json

Field-by-field walkthrough

Open the result.json produced by any recognition run and you will see a structure like this:

```json
{
  "emotion": "happy",
  "confidence": 0.853,
  "scores": {
    "angry": 0.012,
    "disgusted": 0.008,
    "fearful": 0.015,
    "happy": 0.853,
    "neutral": 0.045,
    "other": 0.023,
    "sad": 0.018,
    "surprised": 0.021,
    "unknown": 0.005
  },
  "granularity": "utterance",
  "timestamp": "2024-01-04 22:30:00"
}
```
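In the sample above, `emotion` is the top-scoring label and `confidence` equals that label's score. Assuming this holds for your outputs too, a quick consistency check catches truncated or hand-edited files before they reach business logic. `check_consistency` is a hypothetical helper, not part of Emotion2Vec+:

```python
import json

# The sample result.json from above, inlined so the sketch runs on its own
SAMPLE = """
{"emotion": "happy", "confidence": 0.853,
 "scores": {"angry": 0.012, "disgusted": 0.008, "fearful": 0.015, "happy": 0.853,
            "neutral": 0.045, "other": 0.023, "sad": 0.018, "surprised": 0.021,
            "unknown": 0.005},
 "granularity": "utterance", "timestamp": "2024-01-04 22:30:00"}
"""


def check_consistency(data: dict, tol: float = 1e-6) -> bool:
    """True if `emotion` is the top-scoring label and `confidence` equals its score."""
    scores = data["scores"]
    top_label = max(scores, key=scores.get)
    return top_label == data["emotion"] and abs(scores[top_label] - data["confidence"]) <= tol


data = json.loads(SAMPLE)
print(check_consistency(data))  # True
```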

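Here is what each field is actually used for:

`emotion` and `confidence`

- More than a label: "happy" is the label with the highest of the 9 emotion scores, and 0.853 expresses how certain the model is about that conclusion.
- An entry point for business logic: a customer-service QA system, for example, could send a result to manual review whenever confidence < 0.6.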
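The manual-review rule takes only a few lines. This is a minimal sketch: `route_result` and its return strings are hypothetical names, and the 0.6 threshold is the example value from the text, not a recommended setting.

```python
def route_result(data: dict, review_threshold: float = 0.6) -> str:
    """Send low-confidence results to a human; accept the rest automatically."""
    # review_threshold mirrors the 0.6 example above; tune it on your own data
    if data["confidence"] < review_threshold:
        return "manual_review"
    return "auto_accept"


print(route_result({"emotion": "happy", "confidence": 0.853}))  # auto_accept
print(route_result({"emotion": "angry", "confidence": 0.52}))   # manual_review
```

A result exactly at the threshold is accepted. Flip the comparison to `<=` if your QA policy prefers to review borderline cases.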

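`scores` object

- Hidden gold: the 9 values sum to 1.0 (up to rounding), and each value indicates how strongly that emotion is "present" in the clip.
- The key to mixed emotions: next to happy: 0.853, surprised: 0.021 is small but non-zero and may hint at "pleasantly surprised" happiness, which matters in ad-effectiveness analysis. At a value this low, treat it as a weak hint rather than a conclusion.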
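To surface secondary emotions systematically, you can check that the scores sum to about 1.0 and then list every non-top label above a floor. In this sketch, `secondary_emotions` is hypothetical and the 0.02 floor is an arbitrary illustration:

```python
import math


def secondary_emotions(scores: dict, min_score: float = 0.02) -> list:
    """Return (label, score) pairs other than the top label, highest first."""
    total = sum(scores.values())
    # Allow for rounding in the stored values
    if not math.isclose(total, 1.0, abs_tol=1e-2):
        raise ValueError(f"scores sum to {total:.4f}, expected about 1.0")
    top = max(scores, key=scores.get)
    extras = [(k, v) for k, v in scores.items() if k != top and v >= min_score]
    return sorted(extras, key=lambda kv: kv[1], reverse=True)


scores = {"angry": 0.012, "disgusted": 0.008, "fearful": 0.015, "happy": 0.853,
          "neutral": 0.045, "other": 0.023, "sad": 0.018, "surprised": 0.021,
          "unknown": 0.005}
print(secondary_emotions(scores))
# [('neutral', 0.045), ('other', 0.023), ('surprised', 0.021)]
```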

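`granularity`

Granularity determines how deep the analysis goes:
- "utterance" (whole-utterance level): suited to quickly judging the overall emotional tendency
- "frame" (frame level): produces a time-series array you can plot as an emotion line chart, to spot turning points such as "angry first, then calm"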
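Turning points are easiest to find once per-frame labels are collapsed into runs. The sketch assumes the frame-level output includes a list with one label per frame. The field name `frame_emotions` used later in this article is illustrative, and `emotion_segments` is a hypothetical helper:

```python
from itertools import groupby


def emotion_segments(frame_emotions):
    """Collapse per-frame labels into (label, start_frame, n_frames) runs."""
    segments = []
    pos = 0
    for label, group in groupby(frame_emotions):
        n = sum(1 for _ in group)      # length of this run of identical labels
        segments.append((label, pos, n))
        pos += n
    return segments


frames = ["angry", "angry", "angry", "neutral", "neutral"]
segs = emotion_segments(frames)
print(segs)  # [('angry', 0, 3), ('neutral', 3, 2)]
# Every segment after the first starts at a turning point
turning_points = [start for _, start, _ in segs[1:]]
print(turning_points)  # [3]
```

To express turning points in seconds, multiply the frame index by the frame duration of your model version. Check that value rather than assuming it.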

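`timestamp`

More than a record: combined with the directory name used in batch processing (e.g. outputs_20240104_223000/), it lets you trace exactly which run produced each record and when.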
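The directory name and the `timestamp` field can be cross-checked. This sketch assumes the `outputs_YYYYMMDD_HHMMSS` naming and the `YYYY-MM-DD HH:MM:SS` timestamp format shown in this article. Both helpers are hypothetical:

```python
from datetime import datetime
from pathlib import Path


def dir_timestamp(output_dir) -> datetime:
    """Parse an outputs_YYYYMMDD_HHMMSS directory name into a datetime."""
    return datetime.strptime(Path(output_dir).name, "outputs_%Y%m%d_%H%M%S")


def json_timestamp(data: dict) -> datetime:
    """Parse the 'YYYY-MM-DD HH:MM:SS' timestamp field of result.json."""
    return datetime.strptime(data["timestamp"], "%Y-%m-%d %H:%M:%S")


d = dir_timestamp("outputs_20240104_223000/")
j = json_timestamp({"timestamp": "2024-01-04 22:30:00"})
print(d, (j - d).total_seconds())  # 2024-01-04 22:30:00 0.0
```

A large gap between the two values suggests a file was copied or regenerated after the run that named its directory.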

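Advanced usage: from JSON to a business rules engine

Suppose you are building a meeting-mood monitor that must raise an early warning when negative emotions cluster. Looking at a single emotion field is not enough. You need composite rules: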

```python
import json


def analyze_meeting_emotion(json_path):
    with open(json_path, "r", encoding="utf-8") as f:
        data = json.load(f)

    # Rule 1: high-confidence negative emotion -> alert immediately
    if data["emotion"] in ["angry", "fearful", "sad"] and data["confidence"] > 0.7:
        return "CRITICAL_ALERT"

    # Rule 2: neutral score above 0.6 suggests low engagement
    if data["scores"]["neutral"] > 0.6:
        return "LOW_ENGAGEMENT"

    # Rule 3: every negative emotion scores above 0.15 -> potential conflict
    negative_scores = [data["scores"][e] for e in ["angry", "disgusted", "fearful", "sad"]]
    if all(s > 0.15 for s in negative_scores):
        return "POTENTIAL_CONFLICT"

    return "NORMAL"


# Example call
alert_level = analyze_meeting_emotion("outputs_20240104_223000/result.json")
print(f"Meeting mood level: {alert_level}")
```

A rules engine built on JSON fields like this is far more powerful than relying on what the UI displays.
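The rules above judge one clip at a time. Detecting negative emotions that *cluster* needs a view across consecutive segments. This sketch uses a sliding window over results that are already loaded and in time order. `cluster_alerts`, the window of 5 and the threshold of 3 are all hypothetical choices:

```python
from collections import deque

NEGATIVE = {"angry", "disgusted", "fearful", "sad"}


def cluster_alerts(results, window=5, min_negative=3):
    """Return indices at which at least `min_negative` of the last `window`
    results (including the current one) are negative."""
    recent = deque(maxlen=window)   # oldest flag drops out automatically
    alerts = []
    for i, data in enumerate(results):
        recent.append(data.get("emotion") in NEGATIVE)
        if sum(recent) >= min_negative:
            alerts.append(i)
    return alerts


segments = [{"emotion": e} for e in ["happy", "angry", "sad", "neutral", "fearful", "happy"]]
print(cluster_alerts(segments))  # [4, 5]
```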

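A complete guide to embedding.npy

Understanding what this file is

When you tick "Extract Embedding features" (提取Embedding特征), the system does not simply save the audio. It runs a deep neural network that compresses the 16 kHz WAV file into a fixed-length vector. Think of the vector as the clip's DNA: the distance between two clips' vectors reflects how similar they are in emotional content.

Technical details:

- File format: NumPy binary (.npy)
- Dimension: the Emotion2Vec+ Large model outputs a 1024-dimensional vector
- Data type: float32 (32-bit floating point, a balance between precision and storage)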
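These three facts determine the storage cost. The sketch round-trips a random stand-in vector (not a real embedding) through the .npy format in memory:

```python
import io

import numpy as np

# Random stand-in for a real 1024-dim float32 embedding
emb = np.random.default_rng(0).standard_normal(1024).astype(np.float32)

buf = io.BytesIO()
np.save(buf, emb)          # same format np.save writes to embedding.npy
buf.seek(0)
restored = np.load(buf)

print(restored.shape, restored.dtype)  # (1024,) float32
print(emb.nbytes)                      # 4096 bytes of raw data (1024 x 4)
print(len(buf.getvalue()))             # raw data plus a small .npy header
```

At 4096 bytes per clip, 100,000 clips come to about 410 MB of raw float32 data, before file-system overhead.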

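Loading and basic operations

```python
import numpy as np

# 1. Load the embedding
embedding = np.load("outputs_20240104_223000/embedding.npy")
print(f"Vector shape: {embedding.shape}")  # expected: (1024,)
print(f"Data type: {embedding.dtype}")     # expected: float32

# 2. Sanity-check the vector (guard against an all-zero embedding)
if np.all(embedding == 0):
    print("Warning: all-zero embedding detected; check the audio quality")

# 3. Compute the L2 norm (vector length)
norm = np.linalg.norm(embedding)
# This guide expects roughly 0.8-1.2; the real range depends on whether
# the model normalizes its output, so check it on your own data
print(f"L2 norm: {norm:.4f}")
```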
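If your embeddings are not unit length, normalizing them first makes the dot product equal to cosine similarity. That simplifies the later similarity and clustering steps. `l2_normalize` is a hypothetical helper that handles both single vectors and row-stacked matrices:

```python
import numpy as np


def l2_normalize(v, eps=1e-12):
    """Scale a vector (or each row of a matrix) to unit L2 norm; all-zero rows stay zero."""
    v = np.asarray(v, dtype=np.float32)
    norm = np.linalg.norm(v, axis=-1, keepdims=True)
    return v / np.maximum(norm, eps)   # eps avoids division by zero


rng = np.random.default_rng(0)
a = l2_normalize(rng.standard_normal(1024))
b = l2_normalize(rng.standard_normal(1024))
print(round(float(np.linalg.norm(a)), 4))  # 1.0
print(float(a @ b))  # cosine similarity of the two original vectors
```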

Core use cases in practice

Scenario 1: clustering clips by emotion pattern

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Suppose you have embedding files for 100 clips (directory naming is illustrative)
embeddings = []
for i in range(100):
    emb = np.load(f"outputs_{i:06d}/embedding.npy")
    embeddings.append(emb)
X = np.array(embeddings)  # shape: (100, 1024)

# Standardize (important: keeps all dimensions on a comparable scale)
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# K-means clustering (k=5 assumes 5 emotion patterns)
kmeans = KMeans(n_clusters=5, random_state=42, n_init=10)
labels = kmeans.fit_predict(X_scaled)

# Visualize by projecting to 2-D with PCA
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)
points = plt.scatter(X_pca[:, 0], X_pca[:, 1], c=labels, cmap="viridis")
plt.title("Emotion-pattern clusters of 100 clips")
plt.xlabel(f"PC1 ({pca.explained_variance_ratio_[0]:.2%} of variance)")
plt.ylabel(f"PC2 ({pca.explained_variance_ratio_[1]:.2%} of variance)")
plt.colorbar(points)
plt.show()

print(f"Number of cluster centers: {kmeans.cluster_centers_.shape[0]}")
```

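This code automatically sorts the 100 clips into 5 groups by emotion pattern, so that clips in the same group have the closest embeddings. That is far faster than listening to 100 recordings by hand.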
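The value k=5 above is an assumption. One common way to choose k from the data is the silhouette score, which is a technique swapped in here, not something the original workflow prescribes. The sketch uses synthetic, well-separated blobs as stand-ins for embeddings:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def pick_k(X, candidates=range(2, 8), random_state=42):
    """Return (k, score) with the highest mean silhouette score among candidates."""
    best_k, best_score = None, -1.0
    for k in candidates:
        labels = KMeans(n_clusters=k, random_state=random_state, n_init=10).fit_predict(X)
        score = silhouette_score(X, labels)   # in [-1, 1], higher is better
        if score > best_score:
            best_k, best_score = k, score
    return best_k, best_score


# Synthetic stand-in: 3 well-separated blobs of 30 points in 16 dimensions
rng = np.random.default_rng(0)
centers = rng.normal(0, 10, size=(3, 16))
X = np.vstack([c + rng.normal(0, 0.5, size=(30, 16)) for c in centers])

best_k, best_score = pick_k(X)
print(best_k, round(best_score, 3))
```

On real embeddings the silhouette curve is usually much flatter than on this toy data, so treat the chosen k as a starting point for inspection.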

Scenario 2: building an emotion-similarity search

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity


def find_similar_voices(target_embedding, all_embeddings, top_k=3):
    """
    Find the K clips most similar to the target clip.

    Args:
        target_embedding: embedding vector of the target clip, shape (1024,)
        all_embeddings: matrix of all clip embeddings, shape (N, 1024)
        top_k: number of most similar clips to return

    Returns:
        list: [(index, similarity), ...]
    """
    # Cosine similarity, range [-1, 1]; 1 means most similar
    similarities = cosine_similarity(
        target_embedding.reshape(1, -1), all_embeddings
    ).flatten()
    # Indices of the top_k most similar clips
    top_indices = np.argsort(similarities)[::-1][:top_k]
    return [(i, similarities[i]) for i in top_indices]


# Usage example
target_emb = np.load("target_voice/embedding.npy")
all_embs = np.stack([np.load(f"voice_{i}/embedding.npy") for i in range(50)])
similar_pairs = find_similar_voices(target_emb, all_embs)
for idx, score in similar_pairs:
    print(f"Similar clip {idx}: cosine similarity {score:.4f}")
```

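This is especially useful in customer-service QA. When you find an "angry" clip, you can quickly pull up other clips with similar acoustic features and review them in bulk to check for systemic service problems.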
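For large collections you may not want scikit-learn in the loop. Here is a NumPy-only variant that uses `np.argpartition` to avoid fully sorting every similarity. `top_k_cosine` is hypothetical, and it assumes no all-zero rows (see Mistake 3 below):

```python
import numpy as np


def top_k_cosine(query, matrix, k=3):
    """Top-k rows of `matrix` by cosine similarity to `query`, highest first."""
    q = np.asarray(query, dtype=np.float64)
    m = np.asarray(matrix, dtype=np.float64)
    q = q / np.linalg.norm(q)
    m = m / np.linalg.norm(m, axis=1, keepdims=True)
    sims = m @ q
    k = min(k, len(sims))
    idx = np.argpartition(-sims, k - 1)[:k]   # unordered top-k in linear time
    idx = idx[np.argsort(-sims[idx])]         # sort only those k
    return [(int(i), float(sims[i])) for i in idx]


matrix = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]])
print(top_k_cosine([1.0, 0.0], matrix, k=2))  # row 0 first, then row 2
```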

Hands-on: processing emotion recognition results in Python

Batch-processing script template

```python
import json
from pathlib import Path

import numpy as np
import pandas as pd


def process_batch_results(base_dir="outputs"):
    """
    Collect the recognition results from every outputs_* subdirectory.

    Returns:
        pd.DataFrame: one row per clip with its emotion analysis results
    """
    results = []
    # Walk every outputs_YYYYMMDD_HHMMSS directory
    for output_dir in sorted(Path(base_dir).glob("outputs_*")):
        if not output_dir.is_dir():
            continue
        json_path = output_dir / "result.json"
        npy_path = output_dir / "embedding.npy"
        if not json_path.exists():
            continue

        # Read the JSON result
        with open(json_path, "r", encoding="utf-8") as f:
            data = json.load(f)

        # Basic fields (.get() because frame-level results lack emotion/confidence)
        result = {
            "timestamp": data.get("timestamp", ""),
            "emotion": data.get("emotion"),
            "confidence": data.get("confidence"),
            "granularity": data.get("granularity"),
        }
        # Expand the scores object into one column per emotion (utterance level only)
        scores = data.get("scores")
        if isinstance(scores, dict):
            for emotion, score in scores.items():
                result[f"score_{emotion}"] = score

        # Embedding info, if present
        if npy_path.exists():
            emb = np.load(npy_path)
            result["embedding_norm"] = float(np.linalg.norm(emb))
            result["embedding_dim"] = emb.shape[-1]
        else:
            result["embedding_norm"] = None
            result["embedding_dim"] = None
        results.append(result)
    return pd.DataFrame(results)


# Run the batch job
df = process_batch_results()
print(f"Processed {len(df)} clips")
print(df.head())

# Save as CSV for further analysis (utf-8-sig so Excel detects the encoding)
df.to_csv("batch_analysis_results.csv", index=False, encoding="utf-8-sig")
```

Running it produces a structured table with detailed emotion scores for every clip, ready for visualization in Excel or a BI tool.
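Two questions usually come first once the table exists: how the emotions are distributed, and which rows are too uncertain to trust. This sketch answers both. `summarize`, the demo data and the 0.6 cut-off are hypothetical:

```python
import pandas as pd


def summarize(df, low_conf=0.6):
    """Return the emotion distribution (as shares) and the rows below `low_conf`."""
    dist = df["emotion"].value_counts(normalize=True)
    low = df[df["confidence"] < low_conf]
    return dist, low


demo = pd.DataFrame({
    "emotion": ["happy", "angry", "happy", "neutral"],
    "confidence": [0.90, 0.55, 0.80, 0.40],
})
dist, low = summarize(demo)
print(dist)  # happy 0.5, angry 0.25, neutral 0.25
print(low)   # rows 1 and 3
```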

Emotion trend analysis (time series)

```python
import matplotlib.pyplot as plt
import pandas as pd

# Assumes df has already been loaded (from the previous step)
df["datetime"] = pd.to_datetime(df["timestamp"])

# Aggregate emotion statistics by hour of day.
# Mixing lists and strings in agg() yields MultiIndex columns, e.g. ("score_happy", "mean").
hourly_stats = df.groupby(df["datetime"].dt.hour).agg({
    "confidence": ["mean", "std"],
    "score_happy": "mean",
    "score_angry": "mean",
    "score_sad": "mean",
}).round(3)

fig, axes = plt.subplots(2, 1, figsize=(12, 10))

# Confidence trend with a one-standard-deviation band
conf_mean = hourly_stats[("confidence", "mean")]
conf_std = hourly_stats[("confidence", "std")].fillna(0)  # std is NaN for single-record hours
axes[0].plot(hourly_stats.index, conf_mean, marker="o", label="Mean confidence")
axes[0].fill_between(hourly_stats.index, conf_mean - conf_std, conf_mean + conf_std, alpha=0.2)
axes[0].set_ylabel("Confidence")
axes[0].set_title("Hourly emotion-recognition confidence")
axes[0].legend()

# Scores of the main emotions
for emo in ["happy", "angry", "sad"]:
    axes[1].plot(hourly_stats.index, hourly_stats[(f"score_{emo}", "mean")],
                 marker="s", label=emo.capitalize())
axes[1].set_xlabel("Hour (24-hour clock)")
axes[1].set_ylabel("Emotion score")
axes[1].set_title("Hourly distribution of main emotion scores")
axes[1].legend()

plt.tight_layout()
plt.show()

# Key insight
peak_happy_hour = int(hourly_stats[("score_happy", "mean")].idxmax())
print(f"Happiest hour: {peak_happy_hour}:00-{peak_happy_hour + 1}:00")
```

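This kind of analysis can surface business patterns. For example, you might find that a support team's happy scores drop markedly after 3 p.m., which suggests the shift schedule needs adjusting.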
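A claim like "drops after 3 p.m." can be checked directly by comparing the two periods. This minimal sketch uses made-up demo data and a hypothetical helper. A raw difference in means does not establish significance, so with real data you would follow it with a proper statistical test.

```python
import pandas as pd


def compare_before_after(df, hour=15, col="score_happy"):
    """Mean of `col` for records before vs. at/after `hour` (local clock time)."""
    hours = pd.to_datetime(df["timestamp"]).dt.hour
    before = df.loc[hours < hour, col].mean()
    after = df.loc[hours >= hour, col].mean()
    return before, after


demo = pd.DataFrame({
    "timestamp": ["2024-01-04 10:00:00", "2024-01-04 14:30:00",
                  "2024-01-04 15:10:00", "2024-01-04 17:00:00"],
    "score_happy": [0.8, 0.6, 0.4, 0.2],
})
before, after = compare_before_after(demo)
print(f"before 15:00: {before:.2f}, from 15:00: {after:.2f}")  # before 15:00: 0.70, from 15:00: 0.30
```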

Common secondary-development scenarios and code templates

Scenario 1: wrapping the system in a Web API (FastAPI)

```python
import json
from pathlib import Path

from fastapi import FastAPI, File, HTTPException, UploadFile
from pydantic import BaseModel

app = FastAPI(title="Emotion2Vec+ API", version="1.0")


class EmotionResult(BaseModel):
    emotion: str
    confidence: float
    scores: dict
    granularity: str


@app.post("/analyze", response_model=EmotionResult)
async def analyze_audio(file: UploadFile = File(...)):
    # 1. Save the upload (keep only the base name to block path traversal)
    upload_dir = Path("temp_uploads")
    upload_dir.mkdir(exist_ok=True)
    file_path = upload_dir / Path(file.filename).name
    with open(file_path, "wb") as buffer:
        buffer.write(await file.read())

    try:
        # 2. Call the local WebUI, which must be running. The WebUI is a Gradio
        #    app, so the Gradio client is the recommended way to call it.
        from gradio_client import Client
        client = Client("http://localhost:7860")

        # 3. Run recognition. The argument order and api_name must match the
        #    WebUI's actual interface; these values are placeholders. Recent
        #    gradio_client versions expect file inputs wrapped with handle_file().
        client.predict(
            str(file_path),  # audio file path
            "utterance",     # granularity
            True,            # export embedding
            api_name="/predict",
        )

        # 4. Parse result.json. "outputs/latest/result.json" is a placeholder:
        #    point it at wherever the WebUI actually writes its output.
        with open("outputs/latest/result.json", encoding="utf-8") as f:
            result_json = json.load(f)
        return EmotionResult(**result_json)
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))
    finally:
        # Clean up the temporary file
        if file_path.exists():
            file_path.unlink()
```
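Taking only the base name blocks path traversal, but two concurrent uploads with the same filename can still overwrite each other. A common alternative is a random server-side name that keeps only the extension. In this sketch, `safe_upload_path` is hypothetical and the allowed-extension list is an assumption to adjust to the formats your WebUI accepts:

```python
import uuid
from pathlib import Path

ALLOWED_SUFFIXES = {".wav", ".mp3", ".flac", ".m4a"}  # assumed accepted formats


def safe_upload_path(upload_dir, filename) -> Path:
    """Build a unique path inside upload_dir from an untrusted client filename."""
    suffix = Path(filename or "").suffix.lower()
    if suffix not in ALLOWED_SUFFIXES:
        raise ValueError(f"unsupported file type: {suffix!r}")
    # The client name is discarded; only the validated extension survives
    return Path(upload_dir) / f"{uuid.uuid4().hex}{suffix}"


print(safe_upload_path("temp_uploads", "../../etc/meeting.WAV"))
```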

Scenario 2: integrating with a WeCom (企业微信) group bot

```python
import json

import requests


def send_to_wechat_work(emotion_data, webhook_url):
    """
    Send an emotion analysis result to a WeCom group bot.

    Args:
        emotion_data: the dict parsed from result.json
        webhook_url: the bot's webhook URL
    """
    # Build the message card
    message = {
        "msgtype": "template_card",
        "template_card": {
            "card_type": "text_notice",
            "source": {
                "icon_url": "https://example.com/emotion-icon.png",
                "desc": "Emotion2Vec+ analysis result",
                "desc_color": 0,
            },
            "main_title": {
                "title": f"Detected emotion: {emotion_data['emotion'].capitalize()}"
            },
            "emphasis_content": {
                "title": f"{emotion_data['confidence'] * 100:.1f}%",
                "desc": "Confidence",
            },
            "quote_area": {"type": 0, "url": "", "appid": "", "pagepath": ""},
            "horizontal_content_list": [
                {"keyname": "Happy", "value": f"{emotion_data['scores']['happy']:.2%}"},
                {"keyname": "Angry", "value": f"{emotion_data['scores']['angry']:.2%}"},
                {"keyname": "Sad", "value": f"{emotion_data['scores']['sad']:.2%}"},
            ],
            "jump_list": [],
            "card_action": {"type": 1, "url": "http://localhost:7860"},
        },
    }
    # Send the message. WeCom reports failures via "errcode" in the JSON body,
    # so an HTTP 200 on its own does not mean the message was accepted.
    response = requests.post(webhook_url, json=message, timeout=10)
    return response.status_code == 200 and response.json().get("errcode") == 0


# Usage example
with open("outputs_20240104_223000/result.json", encoding="utf-8") as f:
    data = json.load(f)
send_to_wechat_work(data, "https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=xxx")
```
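Posting every result floods a group chat. A lighter pattern only builds a message for confident negative results. It uses the bot's plain `text` message type (`{"msgtype": "text", "text": {"content": ...}}`); verify that shape against the current WeCom bot documentation before relying on it. `build_alert_payload` and the 0.7 cut-off are hypothetical:

```python
NEGATIVE = {"angry", "disgusted", "fearful", "sad"}


def build_alert_payload(data, min_confidence=0.7):
    """Return a WeCom text-message payload for confident negative results, else None."""
    if data.get("emotion") not in NEGATIVE or data.get("confidence", 0) < min_confidence:
        return None
    content = f"Emotion alert: {data['emotion']} ({data['confidence']:.1%} confidence)"
    return {"msgtype": "text", "text": {"content": content}}


print(build_alert_payload({"emotion": "angry", "confidence": 0.82}))
print(build_alert_payload({"emotion": "happy", "confidence": 0.95}))  # None
```

Pass a non-None payload to `requests.post(webhook_url, json=payload, timeout=10)` just as the card version does.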

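Scenario 3: offline batch processing (no WebUI dependency)

```python
import json
from datetime import datetime
from pathlib import Path

import numpy as np
from emotion2vec_plus import Emotion2VecPlus  # hypothetical: assumes an official Python SDK

# Initialize the model (weights must be downloaded beforehand)
model = Emotion2VecPlus(
    model_path="/root/models/emotion2vec_plus_large",
    device="cuda",  # or "cpu"
)


def offline_analyze(audio_path, granularity="utterance"):
    """Call the model directly in offline mode."""
    # Load the audio (wav, mp3, etc.)
    waveform, sample_rate = model.load_audio(audio_path)

    # Model inference
    result = model.inference(
        waveform=waveform,
        sample_rate=sample_rate,
        granularity=granularity,
        return_embedding=True,
    )

    # Save the results
    output_dir = Path("offline_outputs") / datetime.now().strftime("outputs_%Y%m%d_%H%M%S")
    output_dir.mkdir(parents=True, exist_ok=True)

    # Take the embedding out first: a NumPy array is not JSON-serializable
    embedding = result.pop("embedding", None)

    # Save the JSON
    with open(output_dir / "result.json", "w", encoding="utf-8") as f:
        json.dump(result, f, indent=2, ensure_ascii=False)

    # Save the embedding
    if embedding is not None:
        np.save(output_dir / "embedding.npy", np.asarray(embedding, dtype=np.float32))

    return result


# Process a single file
result = offline_analyze("test.wav")
print(f"Offline result: {result['emotion']} ({result['confidence']:.2%})")
```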
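If your offline backend returns parallel lists of labels and scores instead of a ready-made dict, a small adapter can produce the result.json layout. Everything about the input format here is an assumption to check against your model's actual output, including the bilingual `中文/english` label style and the `<unk>` spelling. `to_result_json` is a hypothetical helper:

```python
# Hypothetical adapter: maps parallel label/score lists to the result.json layout
LABEL_ALIASES = {"<unk>": "unknown"}  # assumed spelling of the unknown label


def to_result_json(labels, scores, granularity="utterance", timestamp=""):
    """Build a result.json-style dict from parallel label and score lists."""
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    score_map = {}
    for label, score in zip(labels, scores):
        name = label.split("/")[-1]            # "开心/happy" -> "happy" (assumed format)
        name = LABEL_ALIASES.get(name, name)
        score_map[name] = float(score)
    top = max(score_map, key=score_map.get)
    return {
        "emotion": top,
        "confidence": score_map[top],
        "scores": score_map,
        "granularity": granularity,
        "timestamp": timestamp,
    }


result = to_result_json(["生气/angry", "开心/happy", "<unk>"], [0.10, 0.85, 0.05])
print(result["emotion"], result["confidence"])  # happy 0.85
```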

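Pitfalls: 5 common beginner mistakes

Mistake 1: comparing the emotion field against a hard-coded string

Wrong:

```python
# Risky: the comparison is case-sensitive and typos go unnoticed
if data['emotion'] == 'Happy':
    do_something()
```

Right:

```python
# Use a predefined mapping of the valid labels (values are Chinese display names)
EMOTION_MAP = {
    'angry': '愤怒',
    'disgusted': '厌恶',
    'fearful': '恐惧',
    'happy': '快乐',
    'neutral': '中性',
    'other': '其他',
    'sad': '悲伤',
    'surprised': '惊讶',
    'unknown': '未知',
}

# Safe comparison
if data['emotion'] in EMOTION_MAP:
    chinese_label = EMOTION_MAP[data['emotion']]
    print(f"Chinese label: {chinese_label}")
```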
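Labels can also arrive from other systems with stray whitespace or different casing. Normalizing them once at the boundary keeps every later comparison simple. Mapping unrecognized labels to "unknown" is a design choice in this hypothetical helper, not documented Emotion2Vec+ behavior:

```python
VALID_EMOTIONS = frozenset({
    "angry", "disgusted", "fearful", "happy", "neutral",
    "other", "sad", "surprised", "unknown",
})


def normalize_emotion(label) -> str:
    """Lower-case and trim a label; anything unrecognized becomes 'unknown'."""
    key = str(label).strip().lower()
    return key if key in VALID_EMOTIONS else "unknown"


print(normalize_emotion(" Happy "))  # happy
print(normalize_emotion("joy"))      # unknown
```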

Mistake 2: ignoring granularity and mixing up the data

Typical problem:
With frame granularity, result.json has a completely different structure: scores becomes a 2-D array and the emotion field disappears.

Correct handling:

```python
import json


def safe_parse_result(json_path):
    with open(json_path, encoding="utf-8") as f:
        data = json.load(f)

    if data.get("granularity") == "utterance":
        # Utterance-level result
        return {
            "type": "utterance",
            "emotion": data["emotion"],
            "confidence": data["confidence"],
            "scores": data["scores"],
        }
    else:
        # Frame-level result. Example structure (field names are illustrative):
        # {
        #   "frame_scores": [[0.1, 0.2, ...], [0.05, 0.85, ...], ...],
        #   "frame_emotions": ["neutral", "happy", ...],
        #   "frame_confidences": [0.92, 0.87, ...]
        # }
        return {
            "type": "frame",
            "frame_scores": data["frame_scores"],
            "frame_emotions": data["frame_emotions"],
            "frame_confidences": data["frame_confidences"],
        }


# Usage
parsed = safe_parse_result("result.json")
if parsed["type"] == "frame":
    print(f"{len(parsed['frame_emotions'])} frame-level results")
```
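Code written for utterance-level results often only needs a dominant label. One way to get it from frame-level output is a majority vote over frames. This sketch assumes the illustrative `frame_emotions` / `frame_confidences` lists above, and `summarize_frames` is hypothetical. Note that a majority vote can hide short but important spikes such as a brief burst of anger.

```python
from collections import Counter


def summarize_frames(frame_emotions, frame_confidences):
    """Reduce frame-level output to a dominant label, its share, and mean confidence."""
    if not frame_emotions or len(frame_emotions) != len(frame_confidences):
        raise ValueError("need at least one frame and one confidence per frame")
    label, count = Counter(frame_emotions).most_common(1)[0]
    return {
        "dominant_emotion": label,
        "share": count / len(frame_emotions),
        "mean_confidence": sum(frame_confidences) / len(frame_confidences),
    }


print(summarize_frames(["neutral", "happy", "happy", "happy"], [0.92, 0.87, 0.80, 0.61]))
```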

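Mistake 3: computing with an embedding before validating it

Risk: for some clips, poor audio quality produces an all-zero embedding, and downstream calculations then break. Cosine similarity, for example, divides by a zero norm.

Defensive programming:

```python
import numpy as np


def load_embedding_safely(npy_path, expected_dim=1024):
    try:
        emb = np.load(npy_path)
        # Check for an all-zero vector
        if np.all(emb == 0):
            raise ValueError("embedding is all zeros; the audio may be corrupted")
        # Check the shape
        if emb.ndim != 1 or emb.shape[0] != expected_dim:
            raise ValueError(f"wrong embedding shape: expected ({expected_dim},), got {emb.shape}")
        # Check for NaN or infinite values
        if not np.isfinite(emb).all():
            raise ValueError("embedding contains NaN or infinite values")
        return emb
    except Exception as e:
        print(f"Failed to load embedding: {e}")
        return None


# Usage
embedding = load_embedding_safely("embedding.npy")
other_embedding = load_embedding_safely("other/embedding.npy")  # illustrative second clip
if embedding is not None and other_embedding is not None:
    # Safe to continue; the dot product equals cosine similarity
    # only if both vectors are L2-normalized
    similarity = np.dot(embedding, other_embedding)
```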
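When embeddings are already stacked into a matrix for clustering or search, the same checks can run once over all rows instead of per file. `invalid_rows` is a hypothetical helper:

```python
import numpy as np


def invalid_rows(X):
    """Indices of rows that are all zeros or contain NaN/inf."""
    X = np.asarray(X, dtype=np.float32)
    all_zero = np.all(X == 0, axis=1)
    non_finite = ~np.isfinite(X).all(axis=1)
    return np.flatnonzero(all_zero | non_finite)


X = np.array([[0.0, 0.0], [0.3, -1.2], [np.nan, 1.0]])
print(invalid_rows(X))  # [0 2]
```

Dropping these rows, for example with `np.delete(X, invalid_rows(X), axis=0)`, before `StandardScaler` or cosine similarity avoids NaN propagating through the whole batch.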

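Mistake 4: Chinese text in JSON looks garbled

Cause: Python's json.dump defaults to ensure_ascii=True, which escapes every non-ASCII character (Chinese included) as \uXXXX.

Solution:

```python
import json

# Default (Chinese is written as \uXXXX escapes)
with open("result.json", "w") as f:
    json.dump(data, f)

# Correct (keeps Chinese characters as-is)
with open("result.json", "w", encoding="utf-8") as f:
    json.dump(data, f, ensure_ascii=False, indent=2)
```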
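The escaped form is not actually corrupted. It is valid JSON that parses back to the same data, and only the on-disk text is harder for people to read. A quick demonstration with a made-up record:

```python
import json

data = {"emotion": "happy", "label_zh": "快乐"}
escaped = json.dumps(data)
readable = json.dumps(data, ensure_ascii=False)
print(escaped)   # {"emotion": "happy", "label_zh": "\u5feb\u4e50"}
print(readable)  # {"emotion": "happy", "label_zh": "快乐"}
# Both parse back to the same dict
print(json.loads(escaped) == json.loads(readable))  # True
```

This is why `ensure_ascii=False` must be paired with `encoding="utf-8"` when writing files. Otherwise the platform's default encoding may be unable to write the characters.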

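Mistake 5: ignoring time zones in timestamps

Hazard: the timestamp field is a naive local time with no time-zone information, so deployments across time zones can end up with inconsistent times.

Best practice:

```python
from datetime import datetime, timezone


# Generate a time-zone-aware timestamp (recommended)
def get_utc_timestamp():
    return datetime.now(timezone.utc).isoformat()


# Store UTC time in result.json
result_data = {
    "timestamp_utc": get_utc_timestamp(),
    "timestamp_local": datetime.now().isoformat(),  # keep local time for reference
    # ... other fields
}


# Normalize to UTC when reading
def parse_timestamp(timestamp_str):
    # Accept a trailing "Z", which fromisoformat() rejects before Python 3.11
    dt = datetime.fromisoformat(timestamp_str.replace("Z", "+00:00"))
    # A naive timestamp is interpreted by astimezone() as this machine's local
    # time, so this is only correct if the reader shares the writer's time zone
    return dt.astimezone(timezone.utc)
```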
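The naive-timestamp fallback depends on the *reading* machine's time zone. If you know the zone of the machine that *wrote* the file, you can attach it explicitly. The sketch assumes that machine ran on UTC+8; that offset and `naive_to_utc` are illustrative assumptions:

```python
from datetime import datetime, timedelta, timezone

# Assumption: the machine that wrote result.json runs on UTC+8; change to match yours
SERVER_TZ = timezone(timedelta(hours=8))


def naive_to_utc(ts: str, tz: timezone = SERVER_TZ) -> datetime:
    """Interpret a naive 'YYYY-MM-DD HH:MM:SS' timestamp in `tz` and convert it to UTC."""
    dt = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
    return dt.replace(tzinfo=tz).astimezone(timezone.utc)


print(naive_to_utc("2024-01-04 22:30:00"))  # 2024-01-04 14:30:00+00:00
```

A fixed offset ignores daylight-saving changes. For zones that observe it, `zoneinfo.ZoneInfo` is the better choice, provided the tz database is installed.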

