Move the ASR model path; switch the virtual environment to conda
parent cebfddd13b
commit bdfce46d7c

@@ -1,107 +0,0 @@
# Code Cleanup Summary

## Cleanup Goal

Treat the flow in `main.py` as authoritative and delete the unused code in the other files.

## Current Flow (main.py)

```
Stage 1: Speaker diarization (3D-Speaker)
    ↓
Clear the CUDA cache
    ↓
Stage 2: ASR recognition + merge results
```

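The ordering above can be sketched as a small driver. This is only an illustration under stated assumptions: `run_two_stage`, `release_gpu_memory`, and the two callables passed in are hypothetical names, not the functions used in `main.py`. The `torch` import is guarded so the sketch also runs where PyTorch is not installed.

```python
import gc
from typing import Any, Callable, Dict, List, Sequence


def release_gpu_memory() -> bool:
    """Free cached GPU memory between stages. Returns True if CUDA was cleared."""
    gc.collect()  # drop unreachable Python objects that may still hold tensors
    try:
        import torch  # optional dependency for this sketch
    except ImportError:
        return False
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
        return True
    return False


def run_two_stage(
    videos: Sequence[str],
    diarize: Callable[[str], Any],
    recognize_and_merge: Callable[[str, Any], Any],
) -> List[Any]:
    """Run stage 1 for every file, clear the cache, then run stage 2 for every file."""
    # Stage 1: speaker diarization for all inputs
    segments_by_video: Dict[str, Any] = {v: diarize(v) for v in videos}
    # Between stages: release whatever the diarization model left cached
    release_gpu_memory()
    # Stage 2: ASR recognition plus merging with the stage 1 segments
    return [recognize_and_merge(v, segments_by_video[v]) for v in videos]
```

Running all of stage 1 before any of stage 2 means only one model needs to sit in GPU memory at a time, which appears to be the reason for the cache-clearing step in the flow.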
## What Was Cleaned Up

### 1. asr_service.py

**Removed features**:
- ❌ `use_3d_speaker` parameter and its related logic (now handled manually in main.py)
- ❌ `_merge_diarization_segments()` method (unused)
- ❌ `_map_asr_to_speaker()` method (now implemented inline in main.py)

**Kept features**:
- ✅ `recognize()` - basic ASR recognition
- ✅ `_parse_result()` - parses the recognition result
- ✅ `export_to_json()` / `export_to_srt()` - export functions

**Notes on the changes**:
- The default speaker in ASR results is now always `speaker_0`
- The ASR service no longer calls 3D-Speaker internally

### 2. map_speaker.py

**Removed features**:
- ❌ `find_speaker()` function (now implemented inline in main.py)
- ❌ `main()` function (outdated example code)

**Kept features**:
- ✅ `load_json()` - loads a JSON file
- ✅ `save_json()` - saves a JSON file

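For orientation, the two retained helpers could look roughly like the sketch below. The bodies are an assumption, since the file itself is not shown in this diff. Only the names and the `save_json(path, data)` argument order are taken from the summary.

```python
import json
from pathlib import Path
from typing import Any


def load_json(path: str) -> Any:
    """Read a UTF-8 JSON file and return the decoded object."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def save_json(path: str, data: Any) -> None:
    """Write `data` as UTF-8 JSON, creating the parent directory if needed."""
    Path(path).parent.mkdir(parents=True, exist_ok=True)
    with open(path, "w", encoding="utf-8") as f:
        # ensure_ascii=False keeps Chinese transcript text readable in the file
        json.dump(data, f, ensure_ascii=False, indent=2)
```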
### 3. example_usage.py

**Changes**:
- Updated the speaker format in the sample output: `SPEAKER_00` → `speaker_0`
- Kept the example code useful as a reference

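The label change amounts to lower-casing the prefix and dropping the zero padding. A minimal sketch of that conversion follows. `normalize_speaker_label` is a hypothetical helper written for illustration, not a function from the repository.

```python
import re

# Matches labels such as "SPEAKER_00" or "speaker_3", case-insensitively
_LABEL = re.compile(r"^speaker_(\d+)$", re.IGNORECASE)


def normalize_speaker_label(label: str) -> str:
    """Convert e.g. 'SPEAKER_00' to 'speaker_0'."""
    match = _LABEL.match(label.strip())
    if match is None:
        raise ValueError(f"unrecognized speaker label: {label!r}")
    # int() strips the zero padding, so "00" becomes 0 and "07" becomes 7
    return f"speaker_{int(match.group(1))}"
```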
### 4. Deleted Files

- ❌ `test_staged.py` - temporary test script
- ❌ `test_model_load.py` - temporary test script

## Core Logic (main.py)

### Stage 1: Speaker Diarization
```python
# Load the diarization model once, before looping over the videos
diar_service = DiarizationService()
diar_service._load_model()

for video in videos:
    wav = extract_wav(video)                      # extract the audio track
    segments = diar_service.diarize(wav)          # speaker segments for this file
    save_json(temp_file, {"segments": segments})  # cache the result for stage 2
```

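The summary does not show how `extract_wav` works. One common approach is to call `ffmpeg`, sketched below. Everything here is an assumption: the use of `ffmpeg`, the 16 kHz mono output, the `temp` output directory, and the helper `build_ffmpeg_command` are illustrative choices, not details from the repository.

```python
import subprocess
from pathlib import Path
from typing import List


def build_ffmpeg_command(video: str, wav: str, sample_rate: int = 16000) -> List[str]:
    """Build an ffmpeg command that writes a mono WAV at `sample_rate` Hz."""
    return [
        "ffmpeg",
        "-y",                     # overwrite the output file if it exists
        "-i", video,              # input video
        "-vn",                    # drop the video stream
        "-ac", "1",               # one audio channel (mono)
        "-ar", str(sample_rate),  # output sample rate
        wav,
    ]


def extract_wav(video: str, out_dir: str = "temp") -> str:
    """Extract the audio track of `video` into `out_dir` and return the WAV path."""
    out_path = Path(out_dir) / (Path(video).stem + ".wav")
    out_path.parent.mkdir(parents=True, exist_ok=True)
    subprocess.run(build_ffmpeg_command(video, str(out_path)), check=True)
    return str(out_path)
```

Keeping the command construction separate from the `subprocess.run` call makes the arguments easy to inspect without having `ffmpeg` installed.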
### Stage 2: ASR Recognition + Merge
```python
# Load the ASR model once, after stage 1 has finished
asr_service = ASRService()
asr_service._load_model()

for video in videos:
    # `wav` is the audio extracted for this video in stage 1
    asr_sentences = asr_service.recognize(wav)

    # Merge speakers (using only the 3D-Speaker result)
    for sentence in asr_sentences:
        # Pseudocode in the original: find the speaker with the largest overlap.
        # `find_max_overlap_speaker` is a hypothetical name for that inline search.
        matched_speaker = find_max_overlap_speaker(sentence, segments)
        if matched_speaker:
            sentence.speaker = matched_speaker
        else:
            sentence.speaker = "speaker_0"

    export_to_json(output_file, asr_sentences)
```

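The "largest overlap" step is the only non-trivial part of the merge, so a worked sketch may help. It assumes each diarization segment is a dict with `start`, `end` (seconds), and `speaker` keys, and that sentences expose `start`, `end`, and `speaker` attributes. Those shapes, the `Sentence` dataclass, and the function names are assumptions for illustration. The actual inline code in `main.py` is not shown in this diff.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Sentence:
    start: float
    end: float
    text: str
    speaker: str = "speaker_0"


def find_max_overlap_speaker(sentence: Sentence, segments: List[Dict]) -> Optional[str]:
    """Return the speaker whose segment overlaps the sentence longest, or None."""
    best_speaker: Optional[str] = None
    best_overlap = 0.0
    for seg in segments:
        # Length of the intersection of the two intervals (negative if disjoint)
        overlap = min(sentence.end, seg["end"]) - max(sentence.start, seg["start"])
        if overlap > best_overlap:
            best_overlap, best_speaker = overlap, seg["speaker"]
    return best_speaker


def assign_speakers(
    sentences: List[Sentence], segments: List[Dict], default: str = "speaker_0"
) -> List[Sentence]:
    """Label every sentence, falling back to `default` when nothing overlaps."""
    for sentence in sentences:
        sentence.speaker = find_max_overlap_speaker(sentence, segments) or default
    return sentences
```

For a sentence spanning 1.5 s to 4.0 s and segments `speaker_0` (0 to 2 s) and `speaker_1` (2 to 5 s), the overlaps are 0.5 s and 2.0 s, so the sentence is assigned to `speaker_1`.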
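## Advantages

1. **Clear logic**: speaker merging is handled in exactly one place (main.py)
2. **No duplication**: the speaker-alignment logic that was repeated in several places has been removed
3. **Easy to maintain**: the core flow is concentrated in main.py, and the service classes handle only basic functionality
4. **Uniform format**: all speaker labels follow the form `speaker_0`, `speaker_1`, ...
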
## File Dependencies

```
main.py
├── asr_service.py (basic ASR recognition)
├── diarization_service.py (speaker diarization)
└── map_speaker.py (JSON utility functions)
```

## Files Not Cleaned Up

- `server.py` - Web API service (standalone feature)
- `test_asr.py` - test script (may be kept)
- `example_usage.py` - example code (already updated)

@@ -30,12 +30,12 @@ source funasr_env/bin/activate

**CUDA 11.8:**
```bash
-pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
+pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu118
```

**CPU version:**
```bash
-pip install torch torchvision torchaudio
+pip install torch torchaudio
```

### 4. Install 3D-Speaker

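After running either install command, one way to confirm which PyTorch build was picked up is a short check like the one below. `describe_torch` is a hypothetical helper. It relies only on `torch.__version__`, `torch.version.cuda`, and `torch.cuda.is_available()`, and it reports a missing install instead of failing.

```python
from typing import Any, Dict


def describe_torch() -> Dict[str, Any]:
    """Summarize the installed PyTorch build, or report that it is missing."""
    try:
        import torch
    except ImportError:
        return {"installed": False}
    return {
        "installed": True,
        "version": torch.__version__,
        "cuda_build": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    print(describe_torch())
```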
28 README.md

@@ -1,3 +1,31 @@
**Download the Miniconda installer**
Invoke-WebRequest -Uri "https://repo.anaconda.com/miniconda/Miniconda3-latest-Windows-x86_64.exe" -OutFile ".\Miniconda3-latest-Windows-x86_64.exe"

**Enable the conda channels (accept their terms of service)**
conda tos accept --override-channels --channel https://repo.anaconda.com/pkgs/main
conda tos accept --override-channels --channel https://repo.anaconda.com/pkgs/r
conda tos accept --override-channels --channel https://repo.anaconda.com/pkgs/msys2

**Enter the conda environment from the project directory**
%WINDIR%\System32\cmd.exe "/K" D:\ProgramData\miniconda3\Scripts\activate.bat D:\ProgramData\miniconda3

**Create the virtual environment**
conda create -n audio python=3.10 -y
**Activate**
conda activate audio
**Deactivate**
conda deactivate

**Run inside conda**
pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt

**Use the Python interpreter of the `audio` virtual environment**
D:\ProgramData\miniconda3\envs\audio\python.exe

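Since the setup depends on running inside the `audio` environment, a script can check this before loading any models. The sketch below reads `CONDA_DEFAULT_ENV`, which `conda activate` sets. `active_conda_env` and `require_env` are hypothetical helpers, and the optional `environ` parameter exists only so the check can be exercised without touching the real environment.

```python
import os
import sys
from typing import Mapping, Optional


def active_conda_env(environ: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the name of the activated conda environment, or None."""
    env = os.environ if environ is None else environ
    return env.get("CONDA_DEFAULT_ENV") or None


def require_env(expected: str = "audio", environ: Optional[Mapping[str, str]] = None) -> str:
    """Raise if the expected conda environment is not active; return the interpreter path."""
    name = active_conda_env(environ)
    if name != expected:
        raise RuntimeError(
            f"expected conda env {expected!r}, found {name!r} "
            f"(interpreter: {sys.executable})"
        )
    return sys.executable
```

If the interpreter is launched directly by its full path without `conda activate`, `CONDA_DEFAULT_ENV` may be unset. In that case, comparing `sys.executable` against the expected path is a reasonable alternative.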
# FunASR Speech Recognition Service

A local speech recognition solution built on [FunASR](https://github.com/alibaba-damo-academy/FunASR) from Alibaba DAMO Academy.

@@ -6,7 +6,7 @@ FunASR Speech Recognition Service
import os
import sys

-MODEL_CACHE_DIR = os.path.join(os.path.dirname(os.path.abspath(__file__)), "models")
+MODEL_CACHE_DIR = os.path.dirname(os.path.abspath(__file__))
os.makedirs(MODEL_CACHE_DIR, exist_ok=True)
os.environ["MODELSCOPE_CACHE"] = MODEL_CACHE_DIR
os.environ["FUNASR_MODELS_DIR"] = MODEL_CACHE_DIR

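The hunk above moves the cache root from a `models` subdirectory to the script's own directory. The same setup can be written as a function, which makes the before and after easy to compare. `configure_model_cache` and its parameters are hypothetical. Only the two environment variable names come from the diff.

```python
import os
from typing import MutableMapping, Optional


def configure_model_cache(
    base_dir: str,
    subdir: Optional[str] = None,
    environ: Optional[MutableMapping[str, str]] = None,
) -> str:
    """Create the cache directory and point both cache variables at it."""
    env = os.environ if environ is None else environ
    # subdir="models" reproduces the old layout; subdir=None reproduces the new one
    cache_dir = os.path.join(base_dir, subdir) if subdir else base_dir
    os.makedirs(cache_dir, exist_ok=True)
    env["MODELSCOPE_CACHE"] = cache_dir
    env["FUNASR_MODELS_DIR"] = cache_dir
    return cache_dir
```

In the original module these assignments sit at the top of the file, presumably so they take effect before the libraries that read them are imported. Models already downloaded under the old `models` subdirectory would likely need to be moved or downloaded again.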
@@ -10,7 +10,6 @@
# ---------- Core frameworks ----------
torch>=2.7.0
torchaudio>=2.7.0
-torchvision>=0.22.0

# ---------- FunASR speech recognition ----------
funasr>=1.3.0

@@ -20,13 +19,13 @@ transformers>=5.7.0
# ---------- 3D-Speaker speaker diarization ----------
# Note: 3D-Speaker must be cloned manually into the project directory
# git clone https://github.com/alibaba-damo-academy/3D-Speaker.git
-speakerlab>=1.0.0
+speakerlab>=0.0.6

# ---------- Audio processing ----------
soundfile>=0.12.0
librosa>=0.11.0
scipy>=1.15.0
-numpy>=2.2.0
+numpy>=1.26.0

# ---------- Machine learning base libraries ----------
scikit-learn>=1.7.0

@@ -44,7 +43,7 @@ lightning>=2.6.0
pyannote.audio>=3.4.0

# ---------- Data processing ----------
-datasets>=4.8.0
+datasets>=3.0.0,<4.0.0
pyarrow>=24.0.0
sentencepiece>=0.2.1

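Several pins changed in this commit, including a new upper bound on `datasets`, so it can be useful to check an installed version against a requirement line. The sketch below is a deliberately simplified parser that handles only plain numeric versions and the operators `>=`, `<=`, `==`, `<`, `>`. It is not a replacement for pip's resolver, and all function names are hypothetical.

```python
import operator
import re
from typing import List, Optional, Tuple

Spec = Tuple[str, Tuple[int, ...]]

_OPS = {">=": operator.ge, "<=": operator.le, "==": operator.eq,
        "<": operator.lt, ">": operator.gt}
_SPEC = re.compile(r"(>=|<=|==|<|>)\s*([0-9][0-9.]*)")


def parse_version(text: str) -> Tuple[int, ...]:
    """Parse the leading numeric part, e.g. '2.7.0+cu118' -> (2, 7, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", text.strip())
    return tuple(int(p) for p in match.group(0).split(".")) if match else ()


def parse_requirement(line: str) -> Optional[Tuple[str, List[Spec]]]:
    """Split 'name>=1.0,<2.0' into its name and constraints; None for comments."""
    line = line.split("#", 1)[0].strip()
    if not line:
        return None
    name = re.match(r"[A-Za-z0-9_.\-]+", line)
    if name is None:
        return None
    rest = line[name.end():]
    return name.group(0), [(op, parse_version(v)) for op, v in _SPEC.findall(rest)]


def satisfies(installed: str, specs: List[Spec]) -> bool:
    """Check an installed version string against every parsed constraint."""
    have = parse_version(installed)
    for op, want in specs:
        width = max(len(have), len(want))
        # Pad with zeros so that (3, 0) and (3, 0, 0) compare as equal
        a = have + (0,) * (width - len(have))
        b = want + (0,) * (width - len(want))
        if not _OPS[op](a, b):
            return False
    return True
```

With the new pin `datasets>=3.0.0,<4.0.0`, a 3.x release passes the check, while the previously required `4.8.0` no longer does.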