Move ASR model path; switch virtual environment to conda

2026-05-06 10:40:05 +08:00 · 2026-05-06 10:40:05 +08:00 · bdfce46d7c
parent cebfddd13b
commit bdfce46d7c
5 changed files with 34 additions and 114 deletions
--- a/CLEANUP_SUMMARY.md
+++ b/CLEANUP_SUMMARY.md
@@ -1,107 +0,0 @@
-# Code Cleanup Summary
-
-## Cleanup Goal
-Treat the flow in `main.py` as authoritative and remove unused code from the other files.
-
-## Current Flow (main.py)
-
-```
-Stage 1: Speaker diarization (3D-Speaker)
-  ↓
-Clear the CUDA cache
-  ↓
-Stage 2: ASR recognition + merge results
-```
-
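-## What Was Cleaned Up
-
-### 1. asr_service.py
-
-**Removed**:
- ❌ `use_3d_speaker` parameter and its related logic (now handled manually in main.py)
- ❌ `_merge_diarization_segments()` method (unused)
- ❌ `_map_asr_to_speaker()` method (now inlined in main.py)
-
-**Kept**:
- ✅ `recognize()` - basic ASR recognition
- ✅ `_parse_result()` - parses the recognition result
- ✅ `export_to_json()` / `export_to_srt()` - export functions
-
-**Notes on the changes**:
- The default speaker in ASR results is always set to `speaker_0`
- 3D-Speaker is no longer called from inside the ASR service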
-
-### 2. map_speaker.py
-
-**Removed**:
- ❌ `find_speaker()` function (now inlined in main.py)
- ❌ `main()` function (outdated example code)
-
-**Kept**:
- ✅ `load_json()` - loads a JSON file
- ✅ `save_json()` - saves a JSON file
-
-### 3. example_usage.py
-
-**Changes**:
- Updated the speaker format in the sample output: `SPEAKER_00` → `speaker_0`
- Kept the example code useful as a reference
-
-### 4. Deleted Files
-
- ❌ `test_staged.py` - temporary test script
- ❌ `test_model_load.py` - temporary test script
-
-## Core Logic (main.py)
-
-### Stage 1: Speaker Diarization
-```python
-diar_service = DiarizationService()
-diar_service._load_model()
-
-for video in videos:
-    wav = extract_wav(video)
-    segments = diar_service.diarize(wav)
-    save_json(temp_file, {"segments": segments})
-```
-
-### Stage 2: ASR Recognition + Merge
-```python
-asr_service = ASRService()
-asr_service._load_model()
-
-for video in videos:
-    asr_sentences = asr_service.recognize(wav)
-    
-    # Merge speakers (use only the 3D-Speaker results)
-    for sentence in asr_sentences:
-        matched_speaker = ...  # find the speaker with the largest overlap
-        if matched_speaker:
-            sentence.speaker = matched_speaker
-        else:
-            sentence.speaker = "speaker_0"
-    
-    export_to_json(output_file, asr_sentences)
-```
-
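-## Benefits
-
-1. **Clear logic**: speaker merging is handled in exactly one place (main.py)
-2. **No duplication**: the speaker alignment logic that was repeated in several places has been removed
-3. **Easy to maintain**: the core flow lives in main.py, and the service classes handle only basic functionality
-4. **Consistent format**: all speaker labels use the form `speaker_0`, `speaker_1`, ...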
-
-## File Dependencies
-
-```
-main.py
-  ├── asr_service.py (basic ASR recognition)
-  ├── diarization_service.py (speaker diarization)
-  └── map_speaker.py (JSON utility functions)
-```
-
-## Files Not Cleaned Up
-
- `server.py` - Web API service (standalone feature)
- `test_asr.py` - test script (can be kept)
- `example_usage.py` - example code (already updated)
--- a/INSTALL.md
+++ b/INSTALL.md
@@ -30,12 +30,12 @@ source funasr_env/bin/activate

 **CUDA 11.8:**
 ```bash
-pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
+pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu118
 ```

 **CPU version:**
 ```bash
-pip install torch torchvision torchaudio
+pip install torch torchaudio
 ```

 ### 4. Install 3D-Speaker
--- a/README.md
+++ b/README.md
@@ -1,3 +1,31 @@
+**Download the Miniconda installer**
+Invoke-WebRequest -Uri "https://repo.anaconda.com/miniconda/Miniconda3-latest-Windows-x86_64.exe" -OutFile ".\Miniconda3-latest-Windows-x86_64.exe"
+
+**Enable the conda channels**
+conda tos accept --override-channels --channel https://repo.anaconda.com/pkgs/main
+conda tos accept --override-channels --channel https://repo.anaconda.com/pkgs/r
+conda tos accept --override-channels --channel https://repo.anaconda.com/pkgs/msys2
+
+**Enter the conda environment from the current directory**
+%WINDIR%\System32\cmd.exe "/K" D:\ProgramData\miniconda3\Scripts\activate.bat D:\ProgramData\miniconda3
+
+**Create the virtual environment**
+conda create -n audio python=3.10 -y
+**Activate**
+conda activate audio
+**Deactivate**
+conda deactivate
+
+**Run inside conda**
+pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu118
+pip install -r requirements.txt
+
+**Use the Python interpreter of the audio virtual environment**
+D:\ProgramData\miniconda3\envs\audio\python.exe
+
+
+
+
 # FunASR Speech Recognition Service

 A local speech recognition solution based on [FunASR](https://github.com/alibaba-damo-academy/FunASR) from Alibaba DAMO Academy.
--- a/asr_service.py
+++ b/asr_service.py
@@ -6,7 +6,7 @@ FunASR speech recognition service
 import os
 import sys

-MODEL_CACHE_DIR = os.path.join(os.path.dirname(os.path.abspath(__file__)), "models")
+MODEL_CACHE_DIR = os.path.dirname(os.path.abspath(__file__))
 os.makedirs(MODEL_CACHE_DIR, exist_ok=True)
 os.environ["MODELSCOPE_CACHE"] = MODEL_CACHE_DIR
 os.environ["FUNASR_MODELS_DIR"] = MODEL_CACHE_DIR
--- a/requirements.txt
+++ b/requirements.txt
@@ -10,7 +10,6 @@
 # ---------- Core frameworks ----------
 torch>=2.7.0
 torchaudio>=2.7.0
-torchvision>=0.22.0

 # ---------- FunASR speech recognition ----------
 funasr>=1.3.0
@@ -20,13 +19,13 @@ transformers>=5.7.0
 # ---------- 3D-Speaker speaker diarization ----------
 # Note: 3D-Speaker must be cloned manually into the project directory
 # git clone https://github.com/alibaba-damo-academy/3D-Speaker.git
-speakerlab>=1.0.0
+speakerlab>=0.0.6

 # ---------- Audio processing ----------
 soundfile>=0.12.0
 librosa>=0.11.0
 scipy>=1.15.0
-numpy>=2.2.0
+numpy>=1.26.0

 # ---------- Machine learning base libraries ----------
 scikit-learn>=1.7.0
@@ -44,7 +43,7 @@ lightning>=2.6.0
 pyannote.audio>=3.4.0

 # ---------- Data processing ----------
-datasets>=4.8.0
+datasets>=3.0.0,<4.0.0
 pyarrow>=24.0.0
 sentencepiece>=0.2.1
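
The `datasets>=3.0.0,<4.0.0` pin above combines a lower and an upper bound. As a rough illustration of how such a comma-separated specifier narrows the accepted versions, here is a minimal Python sketch. It is a simplification that only handles plain numeric versions and the operators `>=`, `<=`, `==`, `>`, `<`; pip itself follows the full PEP 440 rules. The helper names `parse_version` and `satisfies` are hypothetical and not part of this repository.

```python
import operator
import re

# Comparison operators supported by this simplified sketch (not full PEP 440).
OPERATORS = {
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
    ">": operator.gt,
    "<": operator.lt,
}


def parse_version(text):
    """Turn '3.10.1' into (3, 10, 1); only plain numeric releases are handled."""
    return tuple(int(part) for part in text.strip().split("."))


def _pad(a, b):
    """Pad the shorter tuple with zeros so that 3.0 compares equal to 3.0.0."""
    width = max(len(a), len(b))
    return a + (0,) * (width - len(a)), b + (0,) * (width - len(b))


def satisfies(version, spec):
    """Return True if `version` meets every clause of a spec like '>=3.0.0,<4.0.0'."""
    installed = parse_version(version)
    for clause in spec.split(","):
        match = re.fullmatch(r"\s*(>=|<=|==|>|<)\s*([0-9]+(?:\.[0-9]+)*)\s*", clause)
        if match is None:
            raise ValueError(f"unsupported specifier clause: {clause!r}")
        op, bound = match.groups()
        left, right = _pad(installed, parse_version(bound))
        if not OPERATORS[op](left, right):
            return False
    return True


if __name__ == "__main__":
    # The old pin (>=4.8.0) and the new pin accept disjoint version ranges.
    print(satisfies("3.6.0", ">=3.0.0,<4.0.0"))
    print(satisfies("4.8.0", ">=3.0.0,<4.0.0"))
```

Under these assumptions, any 3.x release of `datasets` passes the new pin, while the 4.8.0 release required by the previous pin is rejected by the `<4.0.0` clause.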