Add: video transcoding endpoints and functionality

- Transcode video to H.264
2026-05-09 11:00:06 +08:00 · 2026-05-09 11:00:06 +08:00 · 6afea795f8
parent 79b933e66e
commit 6afea795f8
3 changed files with 83 additions and 196 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -54,6 +54,7 @@ Thumbs.db
 # Input/output
 output/
 input/
+vid_h264/

 # Audio files (optional; adjust as needed)
 # *.wav
--- a/INSTALL.md
+++ b/INSTALL.md
@@ -1,196 +0,0 @@
-# Project Dependency Installation Guide
-
-## 📋 Requirements
-
-- **Python**: 3.10+
-- **CUDA**: 11.8+ (optional, for GPU acceleration)
-- **OS**: Windows 10/11, Linux, macOS
-
-## 🚀 Quick Install
-
-### 1. Create a virtual environment
-
-```bash
-python -m venv funasr_env
-```
-
-### 2. Activate the virtual environment
-
-**Windows:**
-```bash
-funasr_env\Scripts\activate
-```
-
-**Linux/macOS:**
-```bash
-source funasr_env/bin/activate
-```
-
-### 3. Install PyTorch (with CUDA support)
-
-**CUDA 11.8:**
-```bash
-pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu118
-```
-
-**CPU version:**
-```bash
-pip install torch torchaudio
-```
-
-### 4. Install 3D-Speaker
-
-```bash
-# Clone the 3D-Speaker project into the parent directory
-cd ..
-git clone https://github.com/alibaba-damo-academy/3D-Speaker.git
-
-# Install the 3D-Speaker dependencies
-cd 3D-Speaker
-pip install -e .
-```
-
-### 5. Install the remaining dependencies
-
-```bash
-# Return to the project directory
-cd ../audio2
-
-# Install the packages listed in requirements.txt
-pip install -r requirements.txt
-```
-
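-## 📦 Dependencies
-
-### Core dependencies
-
-| Package | Purpose | Required |
-|------|------|------|
-| torch | Deep learning framework | ✅ |
-| funasr | Speech recognition engine | ✅ |
-| modelscope | Model download and management | ✅ |
-| speakerlab | 3D-Speaker speaker diarization | ✅ |
-| soundfile | Audio file I/O | ✅ |
-| librosa | Audio analysis | ✅ |
-
-### Optional dependencies
-
-| Package | Purpose | When needed |
-|------|------|----------|
-| onnxruntime-gpu | ONNX inference acceleration | When higher performance is needed |
-| Flask | Web API service | When deploying the web service |
-| SQLAlchemy | Database ORM | When persistent storage is needed |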
-
-## 🔧 Verify the installation
-
-Run the test scripts to verify the installation:
-
-```bash
-# Test model loading
-python test_model_load.py
-
-# Run the main program
-python main.py
-```
-
-## ⚠️ Troubleshooting
-
-### 1. CUDA version mismatch
-
-**Error message:**
-```
-RuntimeError: CUDA error: no kernel image is available for execution
-```
-
-**Solution:**
-```bash
-# Uninstall the current PyTorch
-pip uninstall torch torchvision torchaudio
-
-# Reinstall to match your CUDA version
-# CUDA 11.8
-pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
-
-# CUDA 12.1
-pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
-```
-
-### 2. 3D-Speaker import fails
-
-**Error message:**
-```
-ModuleNotFoundError: No module named 'speakerlab'
-```
-
-**Solution:**
-```bash
-# Make sure 3D-Speaker sits in the project's parent directory
-# Expected layout:
-#   project/
-#     ├── audio2/
-#     └── 3D-Speaker/
-
-# Reinstall 3D-Speaker
-cd 3D-Speaker
-pip install -e .
-```
-
-### 3. Model download fails
-
-**Error message:**
-```
-ConnectionError: Failed to download model from ModelScope
-```
-
-**Solution:**
-```bash
-# Use the Alibaba Cloud mirror
-export MODELSCOPE_CACHE="./models"
-
-# Or download the model manually and place it in the cache directory
-```
-
-### 4. Out of memory
-
-**Error message:**
-```
-RuntimeError: CUDA out of memory
-```
-
-**Solution:**
-- Reduce concurrency: set `max_workers=1` in `main.py`
-- Use CPU mode: `device='cpu'`
-- Close other programs that are using the GPU
-
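-## 📝 Pinning dependency versions
-
-To control versions precisely, use:
-
-```bash
-# Snapshot the dependencies of the current environment
-pip freeze > requirements.lock.txt
-
-# Install from the pinned versions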
-pip install -r requirements.lock.txt
-```
-
-## 🎯 Minimal install
-
-If you only need the basic features:
-
-```bash
-# Minimal dependency set
-pip install torch funasr modelscope soundfile scipy numpy tqdm pyyaml
-```
-
-## 📊 Disk space requirements
-
-| Component | Space required |
-|------|----------|
-| Base dependencies | ~2 GB |
-| PyTorch (CUDA) | ~3 GB |
-| FunASR models | ~2 GB |
-| 3D-Speaker models | ~1 GB |
-| **Total** | **~8 GB** |
-
-We recommend leaving at least **10 GB** of free space.
--- a/server.py
+++ b/server.py
@@ -4,6 +4,7 @@ Web API Server for ASR and Speaker Diarization
 """

 import os
+import subprocess
 import sys
 import gc
 import json
@@ -16,6 +17,8 @@ from werkzeug.utils import secure_filename
 import uuid
 from datetime import datetime, timezone

+from lib.convert import convert_to_h264
+

 def make_response(status="success", data=None, errors=None, message=None, extra=None):
    """
@@ -213,6 +216,78 @@ def result():
            errors=[str(e)]
        )), 500

+@app.route('/api/convert', methods=['GET'])
+def convert():
+    """Transcode a video file."""
+    try:
+        # Read the file path from the query parameters
+        path = request.args.get('path', '')
+
+        if not path:
+            return jsonify(make_response(
+                status="error",
+                message="Please provide a file path",
+                errors=["Missing required parameter: path"]
+            )), 400
+
+        # Transcode the video file
+        output_path = convert_to_h264(path)
+
+        return jsonify(make_response(
+            status="success",
+            message="Video transcoding complete",
+            data={"path": output_path}
+        )), 200
+    except Exception as e:
+        import traceback
+        traceback.print_exc()
+        return jsonify(make_response(
+            status="error",
+            message=str(e),
+            errors=[str(e)]
+        )), 500
+
+@app.route('/api/getVidUrl', methods=['GET'])
+def getVidUrl():
+    """Get the URL of a video file."""
+    try:
+        # Read the file path from the query parameters
+        path = request.args.get('path', '')
+
+        if not path:
+            return jsonify(make_response(
+                status="error",
+                message="Please provide a file path",
+                errors=["Missing required parameter: path"]
+            )), 400
+
+
+        # Check whether the video file exists
+        if not Path(f"vid_h264/{Path(path).stem}_h264.mp4").exists():
+            return jsonify(make_response(
+                status="error",
+                message="Video file not found",
+                errors=["Video file not found"]
+            )), 404
+
+        # Build the video file URL
+        url = f"http://localhost:8086/{Path(path).stem}_h264.mp4"
+        print(url)
+
+        return jsonify(make_response(
+            status="success",
+            message="Retrieved successfully",
+            data={"url": url}
+        )), 200
+    except Exception as e:
+        import traceback
+        traceback.print_exc()
+        return jsonify(make_response(
+            status="error",
+            message=str(e),
+            errors=[str(e)]
+        )), 500
+
 if __name__ == '__main__':
    print("=" * 60)
    print("          ASR & Speaker Diarization API Server")
@@ -220,10 +295,17 @@ if __name__ == '__main__':
    print("\nAPI endpoints:")
    print("  GET /api/recognize - Run inference on a file")
    print("  GET /api/result - Get the inference result for a file")
+    print("  GET /api/convert - Transcode a video file")
+    print("  GET /api/getVidUrl - Get the URL of a video file")
    print("\n" + "=" * 60)
    print("Starting server: http://localhost:5000")
    print("Using the Waitress WSGI server (no timeout limit)")
    print("=" * 60)

+    # Start the Caddy service (runs in the background)
+    caddy_dir = os.path.join(os.path.dirname(__file__), "vid_h264")
+    caddy_exe = os.path.join(os.path.dirname(__file__), "lib", "caddy_windows_amd64.exe")
+    subprocess.Popen([caddy_exe, "file-server", "--listen", ":8086", "--browse"], cwd=caddy_dir, shell=True)
+
    from waitress import serve
    serve(app, host='0.0.0.0', port=5000, threads=4, connection_limit=100)
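
Note: `server.py` now imports `convert_to_h264` from `lib.convert`, but that module is not among the three files changed in this commit, so its behavior cannot be read from the diff. The following is only a minimal sketch of what such a helper might look like, assuming `ffmpeg` is available on `PATH` and that output lands in `vid_h264/` as `<stem>_h264.mp4` (the name `getVidUrl` checks for). The names `h264_output_path`, `build_ffmpeg_command`, and `convert_to_h264_sketch` are hypothetical and are not taken from the repository.

```python
import subprocess
from pathlib import Path


def h264_output_path(src: str, out_dir: str = "vid_h264") -> Path:
    """Return <out_dir>/<stem>_h264.mp4, the naming that getVidUrl looks for."""
    return Path(out_dir) / f"{Path(src).stem}_h264.mp4"


def build_ffmpeg_command(src: str, dst: Path) -> list[str]:
    """Build an ffmpeg argument list that re-encodes video as H.264 and audio as AAC."""
    return [
        "ffmpeg",
        "-y",               # overwrite the output file if it already exists
        "-i", str(src),     # input file
        "-c:v", "libx264",  # H.264 video encoder
        "-c:a", "aac",      # AAC audio encoder
        str(dst),
    ]


def convert_to_h264_sketch(src: str, out_dir: str = "vid_h264") -> str:
    """Hypothetical stand-in for lib.convert.convert_to_h264 (assumption, not the real code)."""
    dst = h264_output_path(src, out_dir)
    dst.parent.mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError if ffmpeg exits with a non-zero status
    subprocess.run(build_ffmpeg_command(src, dst), check=True)
    return str(dst)
```

Passing the arguments as a list without `shell=True` keeps the user-supplied `path` from being interpreted by a shell, which matters here because `/api/convert` takes that value straight from the query string.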