UnisMindMap/projects/multi_gpu_v2/README_zh.md

87 lines
2.0 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters!

This file contains ambiguous Unicode characters that may be confused with others in your current locale. If your use case is intentional and legitimate, you can safely ignore this warning. Use the Escape button to highlight these characters.

# MinerU v2.0 多GPU服务器
[English](README.md)
这是一个精简的多GPU服务器实现。
## 快速开始
### 1. 安装 MinerU
```bash
pip install --upgrade pip
pip install uv
uv pip install -U "mineru[core]"
uv pip install litserve aiohttp loguru
```
### 2. 启动服务器
```bash
python server.py
```
### 3. 启动客户端
```bash
python client.py
```
现在,`[demo](../../demo/)` 文件夹下的PDF文件将并行处理。假设您有2个GPU如果您将 `workers_per_device` 更改为 `2`则可以同时处理4个PDF文件
## 自定义
### 服务器
以下示例展示了如何启动带有自定义设置的服务器:
```python
server = ls.LitServer(
MinerUAPI(output_dir='/tmp/mineru_output'), # 自定义输出文件夹
accelerator='auto', # 您可以指定 'cuda'
devices='auto', # "auto" 使用所有可用的GPU
workers_per_device=1, # 每个GPU启动一个工作实例
timeout=False # 禁用超时,用于长时间处理
)
server.run(port=8000, generate_client_file=False)
```
### 客户端
客户端支持同步和异步处理:
```python
import asyncio
import aiohttp
from client import mineru_parse_async
async def process_documents():
async with aiohttp.ClientSession() as session:
# 基本用法
result = await mineru_parse_async(session, 'document.pdf')
# 带自定义选项
result = await mineru_parse_async(
session,
'document.pdf',
backend='pipeline',
lang='ch',
formula_enable=True,
table_enable=True
)
# 运行异步处理
asyncio.run(process_documents())
```
### 并行处理
同时处理多个文件:
```python
async def process_multiple_files():
files = ['doc1.pdf', 'doc2.pdf', 'doc3.pdf']
async with aiohttp.ClientSession() as session:
tasks = [mineru_parse_async(session, file) for file in files]
results = await asyncio.gather(*tasks)
return results
```