Problem description

There are three modules:

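  1. loader: an executable that loads the test.so shared library described below.
  2. test.so: a shared library that is linked against libpythonx.x.so and uses pybind11 to load the logic.py script below and call some of its functions.
  3. logic.py: some Python logic; it imports numpy.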

Running the loader process fails with the error below: numpy cannot be imported.

terminate called after throwing an instance of 'pybind11::error_already_set'
  what():  ImportError: 

IMPORTANT: PLEASE READ THIS FOR ADVICE ON HOW TO SOLVE THIS ISSUE!

Importing the numpy C-extensions failed. This error can happen for
many reasons, often due to issues with your setup or how NumPy was
installed.

We have compiled some common reasons and troubleshooting tips at:

    https://numpy.org/devdocs/user/troubleshooting-importerror.html

Please note and check the following:

  * The Python version is: Python3.7 from "/usr/bin/python3"
  * The NumPy version is: "1.21.2"

and make sure that they are the versions you expect.
Please carefully study the documentation linked above for further help.

Original error was: /home/lighthouse/.local/lib/python3.7/site-packages/numpy/core/_multiarray_umath.cpython-37m-x86_64-linux-gnu.so: undefined symbol: PyExc_ImportError


At:
  /home/lighthouse/.local/lib/python3.7/site-packages/numpy/core/__init__.py(51): <module>
  <frozen importlib._bootstrap>(219): _call_with_frames_removed
  <frozen importlib._bootstrap_external>(728): exec_module
  <frozen importlib._bootstrap>(677): _load_unlocked
  <frozen importlib._bootstrap>(967): _find_and_load_unlocked
  <frozen importlib._bootstrap>(983): _find_and_load
  <frozen importlib._bootstrap>(219): _call_with_frames_removed
  <frozen importlib._bootstrap>(1043): _handle_fromlist
  /home/lighthouse/.local/lib/python3.7/site-packages/numpy/__init__.py(150): <module>
  <frozen importlib._bootstrap>(219): _call_with_frames_removed
  <frozen importlib._bootstrap_external>(728): exec_module
  <frozen importlib._bootstrap>(677): _load_unlocked
  <frozen importlib._bootstrap>(967): _find_and_load_unlocked
  <frozen importlib._bootstrap>(983): _find_and_load

Aborted (core dumped)

Solution

After reading "Numpy import fails on multiarray extension library when called from embedded Python within a C++ application", I found the root cause: the module _multiarray_umath.cpython-37m-x86_64-linux-gnu.so is not explicitly linked against libpythonx.x.so, even though it depends on symbols that libpythonx.x.so provides. Inside a python process this is not a problem, because python itself loads libpythonx.x.so.

What puzzled me is that in my case, although loader does not link against libpythonx.x.so, test.so does.

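After rereading the answer above more carefully, my understanding is as follows. The module _multiarray_umath.cpython-37m-x86_64-linux-gnu.so depends on the symbol PyExc_ImportError, which is exported by libpythonx.x.so. In a python process, the python binary is linked against libpythonx.x.so (or, in builds where the interpreter is linked in statically, exports these symbols itself). When _multiarray_umath.cpython-37m-x86_64-linux-gnu.so is loaded dynamically, it does not list libpythonx.x.so as a dependency of its own, but the dynamic linker searches the dependencies of the main executable, finds libpythonx.x.so there, and resolves PyExc_ImportError, so nothing goes wrong.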

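When test.so is the one linked against libpythonx.x.so, libpythonx.x.so does get loaded, but it is not a dependency of the main executable. Presumably loader opens test.so at run time without RTLD_GLOBAL, so test.so and its dependencies stay in a local scope. As a result, when _multiarray_umath.cpython-37m-x86_64-linux-gnu.so is loaded, PyExc_ImportError cannot be found through the main process.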
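
The scope behaviour described above can be checked from inside loader. The sketch below is not from the original post: the helper name visible_in_default_scope is hypothetical, and it assumes glibc with g++ (RTLD_DEFAULT is only declared when _GNU_SOURCE is defined, which g++ does by default). It asks the dynamic linker whether a symbol resolves through the default lookup scope of the caller.

```cpp
#include <dlfcn.h>

// Hypothetical helper (illustration only): reports whether `symbol` resolves
// through the default lookup scope of the calling object. Called from the
// main executable, that scope covers the executable, its link-time
// dependencies, and libraries that were opened with RTLD_GLOBAL.
bool visible_in_default_scope(const char* symbol) {
    dlerror();  // clear any stale error state before the lookup
    return dlsym(RTLD_DEFAULT, symbol) != nullptr;
}
```

If the explanation above is right, calling visible_in_default_scope("PyExc_ImportError") from loader after test.so has been loaded should return false in the failing setup, and true once libpythonx.x.so has been made globally visible. On older glibc versions the program may need to be linked with -ldl.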

So there are two ways to fix this:

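  1. At build time, link the main executable explicitly against libpythonx.x.so, so that the exported PyExc_ImportError symbol can be found by modules loaded later. If the linker is configured to drop libraries that the executable does not reference directly (--as-needed), -Wl,--no-as-needed may be required in front of the library.
  2. In code, call dlopen("libpythonx.x.so", RTLD_LAZY | RTLD_GLOBAL) to explicitly load libpythonx.x.so once more. The RTLD_GLOBAL flag makes the symbols defined in libpythonx.x.so available for relocation processing of modules opened afterwards, so PyExc_ImportError can be resolved when _multiarray_umath.cpython-37m-x86_64-linux-gnu.so is loaded.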
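
The second option might look roughly like the sketch below. The helper name load_globally is hypothetical, and the concrete library name has to be supplied by the caller; the post only gives it as libpythonx.x.so. This is a minimal illustration of the dlopen call with error reporting, not the exact code used in the original project.

```cpp
#include <dlfcn.h>
#include <cstdio>

// Hypothetical helper (illustration only): opens `soname` with RTLD_GLOBAL so
// that its symbols can satisfy undefined references in modules loaded later.
// If the library is already loaded in a local scope, this call promotes it
// to the global scope instead of loading a second copy.
bool load_globally(const char* soname) {
    dlerror();  // clear any stale error state
    void* handle = dlopen(soname, RTLD_LAZY | RTLD_GLOBAL);
    if (handle == nullptr) {
        const char* reason = dlerror();
        std::fprintf(stderr, "dlopen(%s) failed: %s\n", soname,
                     reason ? reason : "unknown error");
        return false;
    }
    // The handle is deliberately never passed to dlclose(): the library
    // has to stay loaded for as long as the interpreter is in use.
    return true;
}
```

The call has to happen before logic.py imports numpy, for example in loader before it opens test.so, or in test.so before it starts the interpreter. On older glibc versions the program may need to be linked with -ldl.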

References

  1. Numpy import fails on multiarray extension library when called from embedded Python within a C++ application
  2. How are symbols contained in the libpythonX.X linked to numpy extension dynamic libraries?