Move the ASR model path and switch the virtual environment to conda
parent cebfddd13b
commit bdfce46d7c

@@ -1,107 +0,0 @@
# Code Cleanup Summary

## Cleanup Goal

Treat the flow in `main.py` as the source of truth and delete unused code from the other files.

## Current Flow (main.py)

```
Stage 1: speaker diarization (3D-Speaker)
    ↓
Clear the CUDA cache
    ↓
Stage 2: ASR recognition + merge results
```
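
The "clear the CUDA cache" step between the two stages could look roughly like the sketch below. The helper name `release_gpu_memory` is hypothetical and is not taken from main.py. It assumes the stage 1 model object has already been dropped (for example with `del diar_service`), and it treats PyTorch as optional so that it also runs on CPU-only machines.

```python
import gc
import importlib.util


def release_gpu_memory() -> bool:
    """Free Python garbage and, if possible, PyTorch's cached CUDA memory.

    Returns True when the CUDA cache was actually cleared, False otherwise.
    Hypothetical helper: main.py may implement this step differently.
    """
    # Collect unreachable objects first so their tensors can be freed.
    gc.collect()

    # PyTorch is optional here; skip quietly when it is not installed.
    if importlib.util.find_spec("torch") is None:
        return False

    import torch

    if not torch.cuda.is_available():
        return False

    # Hand cached, currently unused blocks back to the CUDA driver.
    torch.cuda.empty_cache()
    return True


if __name__ == "__main__":
    print("CUDA cache cleared:", release_gpu_memory())
```

Note that `torch.cuda.empty_cache()` only releases memory that no live tensor still references, which is why the model has to be dropped first.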

## What Was Cleaned Up

### 1. asr_service.py

**Removed:**
- ❌ The `use_3d_speaker` parameter and its related logic (now handled directly in main.py)
- ❌ The `_merge_diarization_segments()` method (unused)
- ❌ The `_map_asr_to_speaker()` method (now implemented inline in main.py)

**Kept:**
- ✅ `recognize()` - basic ASR recognition
- ✅ `_parse_result()` - parses the recognition result
- ✅ `export_to_json()` / `export_to_srt()` - export functions

**Changes:**
- The default speaker in ASR results is now always `speaker_0`
- The ASR service no longer calls 3D-Speaker internally

### 2. map_speaker.py

**Removed:**
- ❌ The `find_speaker()` function (now implemented inline in main.py)
- ❌ The `main()` function (outdated example code)

**Kept:**
- ✅ `load_json()` - loads a JSON file
- ✅ `save_json()` - saves a JSON file
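
The implementations of the two retained helpers are not shown in this commit. A minimal sketch of what they might look like follows, assuming the `(path, data)` argument order used by the `save_json(temp_file, {"segments": segments})` call in the stage 1 snippet; the bodies are an illustration only.

```python
import json
from pathlib import Path
from typing import Any


def save_json(path: str, data: Any) -> None:
    """Write data as UTF-8 JSON, keeping non-ASCII text readable."""
    target = Path(path)
    # Create the parent folder so callers do not have to.
    target.parent.mkdir(parents=True, exist_ok=True)
    with target.open("w", encoding="utf-8") as f:
        json.dump(data, f, ensure_ascii=False, indent=2)


def load_json(path: str) -> Any:
    """Read a UTF-8 JSON file and return the decoded object."""
    with Path(path).open("r", encoding="utf-8") as f:
        return json.load(f)
```

Passing `ensure_ascii=False` keeps Chinese transcripts human-readable in the intermediate files instead of escaping them as `\uXXXX` sequences.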

### 3. example_usage.py

**Changes:**
- Updated the speaker label format in the sample output: `SPEAKER_00` → `speaker_0`
- Kept the example code useful as a reference
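
The commit only states that the label format changed, not how labels are converted. If old-style labels ever need to be mapped to the new form, a small normalizer along these lines would do it; the function name `normalize_speaker_label` is hypothetical.

```python
import re

# Matches labels such as "SPEAKER_00" or "speaker_3", case-insensitively.
_LABEL = re.compile(r"^speaker_(\d+)$", re.IGNORECASE)


def normalize_speaker_label(label: str) -> str:
    """Convert labels like 'SPEAKER_00' to the 'speaker_0' form."""
    match = _LABEL.match(label.strip())
    if match is None:
        raise ValueError(f"unrecognized speaker label: {label!r}")
    # int() drops the zero padding: "00" -> 0, "07" -> 7.
    return f"speaker_{int(match.group(1))}"


print(normalize_speaker_label("SPEAKER_00"))  # speaker_0
print(normalize_speaker_label("speaker_12"))  # speaker_12
```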

### 4. Deleted Files

- ❌ `test_staged.py` - temporary test script
- ❌ `test_model_load.py` - temporary test script

## Core Logic (main.py)

### Stage 1: Speaker Diarization

```python
# Load the diarization model once, then process every video
diar_service = DiarizationService()
diar_service._load_model()

for video in videos:
    wav = extract_wav(video)              # extract the audio track
    segments = diar_service.diarize(wav)  # speaker segments for this file
    save_json(temp_file, {"segments": segments})  # cache for stage 2
```

### Stage 2: ASR Recognition + Merge

```python
asr_service = ASRService()
asr_service._load_model()

for video in videos:
    # `wav` is the audio extracted from this video, as in stage 1
    asr_sentences = asr_service.recognize(wav)

    # Merge speakers (using only the 3D-Speaker result)
    for sentence in asr_sentences:
        # Placeholder for the inline lookup in main.py: pick the speaker
        # whose diarization segment overlaps this sentence the most
        matched_speaker = speaker_with_max_overlap(sentence, segments)
        if matched_speaker:
            sentence.speaker = matched_speaker
        else:
            sentence.speaker = "speaker_0"

    export_to_json(output_file, asr_sentences)
```
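
The max-overlap lookup that the snippet above leaves as pseudocode can be sketched as follows. This is an illustration, not the code inlined in main.py: the function name and the `start`, `end`, and `speaker` field names are assumptions about the shape of the diarization segments.

```python
from typing import Dict, List, Optional, Union

Segment = Dict[str, Union[float, str]]


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length of the intersection of two time intervals (0.0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def find_max_overlap_speaker(
    sent_start: float, sent_end: float, segments: List[Segment]
) -> Optional[str]:
    """Return the speaker whose segment overlaps the sentence the most.

    Returns None when no segment overlaps at all, so the caller can fall
    back to the default label.
    """
    best_speaker: Optional[str] = None
    best_overlap = 0.0
    for seg in segments:
        ov = overlap(sent_start, sent_end, float(seg["start"]), float(seg["end"]))
        if ov > best_overlap:
            best_overlap = ov
            best_speaker = str(seg["speaker"])
    return best_speaker


segments = [
    {"start": 0.0, "end": 2.0, "speaker": "speaker_0"},
    {"start": 2.0, "end": 5.0, "speaker": "speaker_1"},
]

# Sentence 1.5-4.0 overlaps speaker_0 for 0.5 s and speaker_1 for 2.0 s.
print(find_max_overlap_speaker(1.5, 4.0, segments))  # speaker_1

# No segment covers 6.0-7.0, so fall back to the default label.
print(find_max_overlap_speaker(6.0, 7.0, segments) or "speaker_0")  # speaker_0
```

This version keeps only the single best segment. Summing overlap per speaker across all segments is an alternative when a sentence spans many short segments from the same speaker.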
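
## Benefits

1. **Clear logic**: speaker merging happens in exactly one place (main.py)
2. **No duplication**: the speaker-alignment logic that was repeated in several places is gone
3. **Easier maintenance**: the core flow lives in main.py, and the service classes only provide basic functionality
4. **Consistent format**: all speaker labels follow the form `speaker_0`, `speaker_1`, ...

## File Dependencies
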
```
main.py
├── asr_service.py (basic ASR recognition)
├── diarization_service.py (speaker diarization)
└── map_speaker.py (JSON utility functions)
```

## Files Not Cleaned Up

- `server.py` - Web API service (standalone feature)
- `test_asr.py` - test script (can be kept)
- `example_usage.py` - example code (already updated)

@@ -30,12 +30,12 @@ source funasr_env/bin/activate

**CUDA 11.8:**
```bash
-pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
+pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu118
```

**CPU version:**
```bash
-pip install torch torchvision torchaudio
+pip install torch torchaudio
```

### 4. Install 3D-Speaker

README.md (28 lines changed)

@@ -1,3 +1,31 @@
**Download the Miniconda installer**
Invoke-WebRequest -Uri "https://repo.anaconda.com/miniconda/Miniconda3-latest-Windows-x86_64.exe" -OutFile ".\Miniconda3-latest-Windows-x86_64.exe"

**Enable the conda channels (accept their terms of service)**
conda tos accept --override-channels --channel https://repo.anaconda.com/pkgs/main
conda tos accept --override-channels --channel https://repo.anaconda.com/pkgs/r
conda tos accept --override-channels --channel https://repo.anaconda.com/pkgs/msys2

**Open a conda shell in the current directory**
%WINDIR%\System32\cmd.exe "/K" D:\ProgramData\miniconda3\Scripts\activate.bat D:\ProgramData\miniconda3

**Create the virtual environment**
conda create -n audio python=3.10 -y
**Activate it**
conda activate audio
**Deactivate it**
conda deactivate

**Run inside the conda environment**
pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt

**Use the Python interpreter of the `audio` virtual environment**
D:\ProgramData\miniconda3\envs\audio\python.exe
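
After following the steps above, it can help to confirm that scripts really run under the `audio` environment's interpreter and that the key packages are visible to it. The sketch below is one way to check this; the function name `describe_environment` and the default package list are illustrative choices, not part of the repository.

```python
import importlib.util
import sys
from typing import Dict, Iterable


def describe_environment(
    packages: Iterable[str] = ("torch", "torchaudio", "funasr"),
) -> Dict[str, object]:
    """Report which interpreter is running and which packages it can see."""
    return {
        # Path of the running interpreter; inside the conda env this should
        # point into ...\miniconda3\envs\audio\.
        "python": sys.executable,
        "version": "%d.%d" % sys.version_info[:2],
        # find_spec returns None for top-level packages that are not installed.
        "installed": {
            name: importlib.util.find_spec(name) is not None for name in packages
        },
    }


if __name__ == "__main__":
    info = describe_environment()
    print("interpreter:", info["python"])
    print("python version:", info["version"])
    for name, present in info["installed"].items():
        print(f"{name}: {'found' if present else 'MISSING'}")
```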

# FunASR Speech Recognition Service

A local speech recognition solution based on [FunASR](https://github.com/alibaba-damo-academy/FunASR) from Alibaba DAMO Academy.

@@ -6,7 +6,7 @@ FunASR speech recognition service
import os
import sys

-MODEL_CACHE_DIR = os.path.join(os.path.dirname(os.path.abspath(__file__)), "models")
+MODEL_CACHE_DIR = os.path.dirname(os.path.abspath(__file__))
os.makedirs(MODEL_CACHE_DIR, exist_ok=True)
os.environ["MODELSCOPE_CACHE"] = MODEL_CACHE_DIR
os.environ["FUNASR_MODELS_DIR"] = MODEL_CACHE_DIR
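
The change above points both cache variables at the script's own directory instead of a `models` subfolder. The same setup can be wrapped in a function, as in the sketch below; `configure_model_cache` is a hypothetical name, and calling it before `funasr` is imported is assumed here to be the safe ordering rather than a documented requirement.

```python
import os
from pathlib import Path


def configure_model_cache(base_dir: str) -> str:
    """Point the ModelScope and FunASR caches at base_dir and return it."""
    cache_dir = str(Path(base_dir).resolve())
    os.makedirs(cache_dir, exist_ok=True)
    # Both variable names are taken from the hunk above.
    os.environ["MODELSCOPE_CACHE"] = cache_dir
    os.environ["FUNASR_MODELS_DIR"] = cache_dir
    return cache_dir


if __name__ == "__main__":
    # Mirrors the new behaviour: use the directory containing this script.
    here = os.path.dirname(os.path.abspath(__file__))
    print("model cache:", configure_model_cache(here))
```

With the script's own directory as the target, the `os.makedirs` call is effectively a no-op, since that directory already exists.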

@@ -10,7 +10,6 @@
# ---------- Core frameworks ----------
torch>=2.7.0
torchaudio>=2.7.0
-torchvision>=0.22.0

# ---------- FunASR speech recognition ----------
funasr>=1.3.0

@@ -20,13 +19,13 @@ transformers>=5.7.0
# ---------- 3D-Speaker speaker diarization ----------
# Note: 3D-Speaker must be cloned manually into the project directory
# git clone https://github.com/alibaba-damo-academy/3D-Speaker.git
-speakerlab>=1.0.0
+speakerlab>=0.0.6

# ---------- Audio processing ----------
soundfile>=0.12.0
librosa>=0.11.0
scipy>=1.15.0
-numpy>=2.2.0
+numpy>=1.26.0

# ---------- Core machine learning libraries ----------
scikit-learn>=1.7.0

@@ -44,7 +43,7 @@ lightning>=2.6.0
pyannote.audio>=3.4.0

# ---------- Data processing ----------
-datasets>=4.8.0
+datasets>=3.0.0,<4.0.0
pyarrow>=24.0.0
sentencepiece>=0.2.1
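
The new `datasets>=3.0.0,<4.0.0` line combines a lower and an upper bound, and a version must satisfy both. The sketch below shows how such a range is evaluated. It is a simplified illustration that handles plain numeric versions only; pip applies the full version-specifier rules, which the `packaging` library implements for real use.

```python
import operator
import re
from typing import Tuple

_OPS = {
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
    ">": operator.gt,
    "<": operator.lt,
}

# Two-character operators come first so ">=" is not read as ">".
_CLAUSE = re.compile(r"\s*(>=|<=|==|>|<)\s*([\d.]+)\s*")


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '3.6' into (3, 6, 0) so that tuples compare component-wise."""
    parts = tuple(int(p) for p in text.split("."))
    return parts + (0,) * max(0, 3 - len(parts))


def satisfies(version: str, spec: str) -> bool:
    """True if version meets every comma-separated clause in spec."""
    current = parse_version(version)
    for clause in spec.split(","):
        match = _CLAUSE.fullmatch(clause)
        if match is None:
            raise ValueError(f"unsupported clause: {clause!r}")
        op, bound = match.groups()
        if not _OPS[op](current, parse_version(bound)):
            return False
    return True


print(satisfies("3.6.0", ">=3.0.0,<4.0.0"))  # True
print(satisfies("4.8.0", ">=3.0.0,<4.0.0"))  # False
print(satisfies("1.26.4", ">=1.26.0"))       # True
```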