M1 MacBook 运行大模型笔记

大模型作为效率工具确实很有用，但唯一的问题就是 GPT 需要特殊网络环境，同时还需要花钱。如果能在本地部署运行，想必会很方便、省钱。

恰好有一台 M1 Pro 的 MacBook，内存 32G，感觉性能还行。网上查了一下，通过量化的方式，确实可以在本地运行起来大模型。

这里以百川大模型为例，小结下本地执行百川的方式。

1.下载大模型

几乎所有的开源大模型都可以在 huggingface 上面找到，百川 13B 2.0 的大模型下载链接在这里。

大模型比较大，下载需要一段时间，因此我们先下载大模型。如果网络环境不太好，那么网上也有一些第三方备份的网站，可以自行查找下载。

1
2


git lfs install
git clone https://huggingface.co/baichuan-inc/Baichuan2-13B-Chat

2. 下载 chat.cpp

chatglm.cpp 是一个 ChatGLM-6B、ChatGLM2-6B、ChatGLM3-6B 等的 C++ 实现 LLMs，可在 MacBook 上进行实时聊天。

使用命令直接将仓库克隆到本地。

1
2
3


git clone --recursive https://github.com/li-plus/chatglm.cpp.git && cd chatglm.cpp

git submodule update --init --recursive

编译。

1
2
3
4
5
6


# 使用brew 安装 cmake
brew install cmake

# 编译
cmake -B build
cmake --build build -j --config Release

3. 创建 Python 虚拟环境

1
2
3
4
5
6


# 创建虚拟环境并激活
conda create -n baichuan python=3.10
conda activate baichuan

# 安装依赖
python3 -m pip install torch tabulate tqdm transformers accelerate sentencepiece

4. 量化模型

如果模型已经下载完成，那么使用 chat.cpp 提供的工具量化模型。

1
2
3
4


# -i 参数后跟下载到的百川模型的路径
# -t 参数指定量化的参数
# -o 参数指定量化后的模型输出
python3 chatglm_cpp/convert.py -i <baichuan_path> -t q4_0 -o baichuan-ggml.bin

可选的 -t 参数如下，更详细的可以参考 chat.cpp 的 readme。

q4_0: 4-bit integer quantization with fp16 scales.
q4_1: 4-bit integer quantization with fp16 scales and minimum values.
q5_0: 5-bit integer quantization with fp16 scales.
q5_1: 5-bit integer quantization with fp16 scales and minimum values.
q8_0: 8-bit integer quantization with fp16 scales.
f16: half precision floating point weights without quantization.
f32: single precision floating point weights without quantization.

5. 运行模型

接下来就可以使用 chatglm.cpp 运行量化后的模型了。

1

./build/bin/main -m baichuan-ggml.bin -i

Pasted image 20240323182039.webp

除了百川模型外，通过 chat.cpp 也可以运行 chatglm 等其他模型，不过中文大模型中，百川效果最好，这里直接就是用百川了，其他的不用尝试了。

如果想使用 web 页面，那么首先安装依赖库。

1

pip install chatglm-cp gradio

之后执行命令。

1

python web_demo.py -m ../baichuan-ggml.bin

之后浏览器将会自动打开页面，接下来就可以愉快的玩耍了。

Pasted image 20240323220337.webp

文章目录

1.下载大模型

2. 下载 chat.cpp

3. 创建 Python 虚拟环境

4. 量化模型

5. 运行模型