This is the official codebase of the paper
Neural Bellman-Ford Networks: A General Graph Neural Network Framework for Link Prediction
Zhaocheng Zhu, Zuobai Zhang, Louis-Pascal Xhonneux, Jian Tang
A PyG re-implementation of NBFNet can be found here.
NBFNet is a graph neural network framework inspired by traditional path-based methods. It enjoys the advantages of both traditional path-based methods and modern graph neural networks, including generalization in the inductive setting, interpretability, high model capacity and scalability. NBFNet can be applied to solve link prediction on both homogeneous graphs and knowledge graphs.
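Concretely, NBFNet parameterizes the generalized Bellman-Ford iteration: node representations start from an indicator boundary condition on the source node and are repeatedly updated by aggregating messages over incoming edges. Below is a minimal sketch of that iteration, assuming a DistMult-style multiplication message and sum aggregation (one of the combinations studied in the paper); `bellman_ford_sketch` and its toy inputs are illustrative, not the repo's API.

```python
# Illustrative sketch of the generalized Bellman-Ford iteration (not the
# repo's implementation):
#   h_v^(0) = indicator of the source node u (boundary condition)
#   h_v^(t) = AGGREGATE over edges (x, r, v) of MESSAGE(h_x^(t-1), w_r), plus boundary
import torch

def bellman_ford_sketch(edges, relation_emb, num_nodes, source, num_steps=6):
    """edges: LongTensor of (head, tail, relation) triples."""
    dim = relation_emb.size(1)
    boundary = torch.zeros(num_nodes, dim)
    boundary[source] = 1.0                              # indicator boundary condition
    h = boundary.clone()
    for _ in range(num_steps):
        msg = h[edges[:, 0]] * relation_emb[edges[:, 2]]  # MESSAGE: DistMult-style mult
        new_h = boundary.clone()                          # boundary joins the aggregation
        new_h.index_add_(0, edges[:, 1], msg)             # AGGREGATE: sum over incoming edges
        h = new_h
    return h  # h[v] summarizes path representations from source to v

# hypothetical toy usage: 3 nodes, 2 relations, 2 triples
edges = torch.tensor([[0, 1, 0], [1, 2, 1]])
rel = torch.randn(2, 8)
print(bellman_ford_sketch(edges, rel, num_nodes=3, source=0).shape)  # torch.Size([3, 8])
```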
This codebase is based on PyTorch and TorchDrug. It supports training and inference with multiple GPUs or multiple machines.
You may install the dependencies via either conda or pip. Generally, NBFNet works with Python 3.7/3.8 and PyTorch version >= 1.8.0.
From conda:

```bash
conda install torchdrug pytorch=1.8.2 cudatoolkit=11.1 -c milagraph -c pytorch-lts -c pyg -c conda-forge
conda install ogb easydict pyyaml -c conda-forge
```
From pip:

```bash
pip install torch==1.8.2+cu111 -f https://download.pytorch.org/whl/lts/1.8/torch_lts.html
pip install torchdrug
pip install ogb easydict pyyaml
```
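After installation, a quick sanity check (illustrative, not part of the repo) confirms that the expected versions are in place and that PyTorch can see a GPU:

```python
# Sanity check after installation (illustrative, not part of the repo).
import torch
import torchdrug

print(torch.__version__)           # expected: 1.8.2+cu111 (pip) or 1.8.2 (conda)
print(torch.cuda.is_available())   # expected: True on a GPU machine
print(torchdrug.__version__)       # e.g. 0.1.2
```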
To reproduce the results of NBFNet, use the following command. Alternatively, you may use `--gpus null` to run NBFNet on a CPU. All the datasets are automatically downloaded by the code.

```bash
python script/run.py -c config/inductive/wn18rr.yaml --gpus [0] --version v1
```
We provide the hyperparameters for each experiment in configuration files. All the configuration files can be found in `config/*/*.yaml`. For experiments on inductive relation prediction, you need to additionally specify the split version with `--version v1`.
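For example, a transductive knowledge graph experiment takes no `--version` flag. The config file name below is an assumption based on the repo's `config/*/*.yaml` layout; substitute the experiment you actually want to run.

```bash
python script/run.py -c config/knowledge_graph/fb15k237.yaml --gpus [0]
```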
To run NBFNet with multiple GPUs or multiple machines, use the following commands.

Single machine, 4 GPUs:

```bash
python -m torch.distributed.launch --nproc_per_node=4 script/run.py -c config/inductive/wn18rr.yaml --gpus [0,1,2,3]
```

4 machines, 4 GPUs each:

```bash
python -m torch.distributed.launch --nnodes=4 --nproc_per_node=4 script/run.py -c config/inductive/wn18rr.yaml --gpus [0,1,2,3,0,1,2,3,0,1,2,3,0,1,2,3]
```
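Note that for the multi-machine command above, `torch.distributed.launch` also needs each node's rank and a rendezvous address; these are standard launcher flags, not repo-specific options. A hedged sketch, assuming 4 nodes with `NODE_RANK` and `MASTER_ADDR` set appropriately on each machine:

```bash
# Run on every node, with NODE_RANK in 0..3 and MASTER_ADDR pointing at node 0.
python -m torch.distributed.launch --nnodes=4 --node_rank=$NODE_RANK \
    --master_addr=$MASTER_ADDR --master_port=29500 --nproc_per_node=4 \
    script/run.py -c config/inductive/wn18rr.yaml --gpus [0,1,2,3,0,1,2,3,0,1,2,3,0,1,2,3]
```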
Once you have models trained on FB15k-237, you can visualize the path interpretations with the following command. Please replace the checkpoint with your own path.

```bash
python script/visualize.py -c config/knowledge_graph/fb15k237_visualize.yaml --checkpoint /path/to/nbfnet/experiment/model_epoch_20.pth
```
Due to the large size of ogbl-biokg, we only evaluate on a small portion of the validation set during training. The following command evaluates a model on the full validation / test sets of ogbl-biokg. Please replace the checkpoint with your own path.

```bash
python script/run.py -c config/knowledge_graph/ogbl-biokg_test.yaml --checkpoint /path/to/nbfnet/experiment/model_epoch_10.pth
```
Here are the results of NBFNet on standard benchmark datasets. All results were obtained with 4 V100 GPUs (32GB). Note that the results may differ slightly if the model is trained with 1 GPU and/or a smaller batch size.
Knowledge graph completion:

Dataset | MR | MRR | HITS@1 | HITS@3 | HITS@10
---|---|---|---|---|---
FB15k-237 | 114 | 0.415 | 0.321 | 0.454 | 0.599
WN18RR | 636 | 0.551 | 0.497 | 0.573 | 0.666
ogbl-biokg | - | 0.829 | 0.768 | 0.870 | 0.946
Homogeneous graph link prediction:

Dataset | AUROC | AP
---|---|---
Cora | 0.956 | 0.962
CiteSeer | 0.923 | 0.936
PubMed | 0.983 | 0.982
Inductive relation prediction, HITS@10 (50 sample), across split versions v1-v4:

Dataset | v1 | v2 | v3 | v4
---|---|---|---|---
FB15k-237 | 0.834 | 0.949 | 0.951 | 0.960
WN18RR | 0.948 | 0.905 | 0.893 | 0.890
If the code gets stuck or crashes while compiling the JIT kernels at the beginning of training, this is probably because the JIT cache is broken. Try

```bash
rm -r ~/.cache/torch_extensions/*
```

and run the code again.
If you find this codebase useful in your research, please cite the following paper.
```bibtex
@article{zhu2021neural,
  title={Neural Bellman-Ford Networks: A General Graph Neural Network Framework for Link Prediction},
  author={Zhu, Zhaocheng and Zhang, Zuobai and Xhonneux, Louis-Pascal and Tang, Jian},
  journal={Advances in Neural Information Processing Systems},
  volume={34},
  year={2021}
}
```
Hi there.

I have tried running this code on one of my machines with four RTX 3090 GPUs (24GB memory each):

```bash
python -m torch.distributed.launch --nproc_per_node=4 script/run.py -c config/inductive/wn18rr.yaml --gpus [0,1,2,3]
```

I did not change any other part of this repo. However, I encountered a CUDA error saying that I need more GPU memory. Later I changed the command to

```bash
python script/run.py -c config/inductive/wn18rr.yaml --gpus [0]
```

and ran it on a machine with one A100 GPU (40GB memory). The code runs successfully and uses roughly 32GB of GPU memory. I am really puzzled by this: why does the code not utilize the total 4 × 24GB = 96GB of GPU memory, and why does it still report a memory issue? Is there something wrong with my setup?
Hi, I ran into some problems when running the code on Linux and would really appreciate your help.
```
15:43:32   Preprocess training set
15:43:36   >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
15:43:36   Epoch 0 begin
Traceback (most recent call last):
  File "/data1/home/wza/.conda/envs/linkp/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1666, in _run_ninja_build
    subprocess.run(
  File "/data1/home/wza/.conda/envs/linkp/lib/python3.8/subprocess.py", line 516, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "script/run.py", line 62, in <module>
```
Hi, I followed the instructions to reproduce the results but ran into a problem with the module 'spmm'. My torch version is 1.8.2, torchdrug is 0.1.2. Any ideas how to fix it?
```
12:53:15   Epoch 0 begin
Traceback (most recent call last):
  File "script/run.py", line 78, in <module>
  File "script/run.py", line 30, in train_and_validate
  File "C:\Users\Pengfei\anaconda3\envs\py38\lib\site-packages\torchdrug\core\engine.py", line 143, in train
    loss, metric = model(batch)
  File "C:\Users\Pengfei\anaconda3\envs\py38\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\Users\Pengfei\anaconda3\envs\py38\lib\site-packages\torchdrug\tasks\reasoning.py", line 85, in forward
    pred = self.predict(batch, all_loss, metric)
  File "C:\Users\Pengfei\Documents\cse research\NBFNet-master\nbfnet\task.py", line 288, in predict
    pred = self.model(graph, h_index, t_index, r_index, all_loss=all_loss, metric=metric)
  File "C:\Users\Pengfei\anaconda3\envs\py38\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\Users\Pengfei\Documents\cse research\NBFNet-master\nbfnet\model.py", line 149, in forward
    output = self.bellmanford(graph, h_index[:, 0], r_index[:, 0])
  File "C:\Users\Pengfei\anaconda3\envs\py38\lib\site-packages\decorator.py", line 232, in fun
    return caller(func, *(extras + args), **kw)
  File "C:\Users\Pengfei\anaconda3\envs\py38\lib\site-packages\torchdrug\utils\decorator.py", line 56, in wrapper
    return forward(self, *args, **kwargs)
  File "C:\Users\Pengfei\Documents\cse research\NBFNet-master\nbfnet\model.py", line 115, in bellmanford
    hidden = layer(step_graph, layer_input)
  File "C:\Users\Pengfei\anaconda3\envs\py38\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "C:\Users\Pengfei\anaconda3\envs\py38\lib\site-packages\torchdrug\layers\conv.py", line 91, in forward
    update = self.message_and_aggregate(graph, input)
  File "C:\Users\Pengfei\Documents\cse research\NBFNet-master\nbfnet\layer.py", line 140, in message_and_aggregate
    sum = functional.generalized_rspmm(adjacency, relation_input, input, sum="add", mul=mul)
  File "C:\Users\Pengfei\anaconda3\envs\py38\lib\site-packages\torchdrug\layers\functional\spmm.py", line 378, in generalized_rspmm
    return Function.apply(sparse.coalesce(), relation, input)
  File "C:\Users\Pengfei\anaconda3\envs\py38\lib\site-packages\torchdrug\layers\functional\spmm.py", line 172, in forward
    forward = spmm.rspmm_add_mul_forward_cuda
  File "C:\Users\Pengfei\anaconda3\envs\py38\lib\site-packages\torchdrug\utils\torch.py", line 27, in __getattr__
    return getattr(self.module, key)
  File "C:\Users\Pengfei\anaconda3\envs\py38\lib\site-packages\torchdrug\utils\decorator.py", line 21, in __get__
    result = self.func(obj)
  File "C:\Users\Pengfei\anaconda3\envs\py38\lib\site-packages\torchdrug\utils\torch.py", line 31, in module
    return cpp_extension.load(self.name, self.sources, self.extra_cflags, self.extra_cuda_cflags,
  File "C:\Users\Pengfei\anaconda3\envs\py38\lib\site-packages\torch\utils\cpp_extension.py", line 1079, in load
    return _jit_compile(
  File "C:\Users\Pengfei\anaconda3\envs\py38\lib\site-packages\torch\utils\cpp_extension.py", line 1317, in _jit_compile
    return _import_module_from_library(name, build_directory, is_python_module)
  File "C:\Users\Pengfei\anaconda3\envs\py38\lib\site-packages\torch\utils\cpp_extension.py", line 1700, in _import_module_from_library
    file, path, description = imp.find_module(module_name, [path])
  File "C:\Users\Pengfei\anaconda3\envs\py38\lib\imp.py", line 296, in find_module
    raise ImportError(_ERR_MSG.format(name), name=name)
ImportError: No module named 'spmm'
```
Hello,

I followed the instructions to install the torchdrug-related packages with a matching PyTorch/CUDA version. However, I got the following error when initializing the code. Any ideas how to fix this? The system has intel/19.0.3.199 loaded.
```
01:24:15   Epoch 0 begin
Traceback (most recent call last):
File "script/run.py", line 62, in <module>
train_and_validate(cfg, solver)
File "script/run.py", line 27, in train_and_validate
solver.train(**kwargs)
File "~/anaconda3/envs/dlg_env/lib/python3.8/site-packages/torchdrug/core/engine.py", line 143, in train
loss, metric = model(batch)
File "~/anaconda3/envs/dlg_env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "~/anaconda3/envs/dlg_env/lib/python3.8/site-packages/torchdrug/tasks/reasoning.py", line 85, in forward
pred = self.predict(batch, all_loss, metric)
File "~/Workspace/Python/NBFNet/nbfnet/task.py", line 288, in predict
pred = self.model(graph, h_index, t_index, r_index, all_loss=all_loss, metric=metric)
File "~/anaconda3/envs/dlg_env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "~/Workspace/Python/NBFNet/nbfnet/model.py", line 149, in forward
output = self.bellmanford(graph, h_index[:, 0], r_index[:, 0])
File "<decorator-gen-888>", line 2, in bellmanford
File "~/anaconda3/envs/dlg_env/lib/python3.8/site-packages/torchdrug/utils/decorator.py", line 56, in wrapper
return forward(self, *args, **kwargs)
File "~/Workspace/Python/NBFNet/nbfnet/model.py", line 115, in bellmanford
hidden = layer(step_graph, layer_input)
File "~/anaconda3/envs/dlg_env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "~/anaconda3/envs/dlg_env/lib/python3.8/site-packages/torchdrug/layers/conv.py", line 91, in forward
update = self.message_and_aggregate(graph, input)
File "~/Workspace/Python/NBFNet/nbfnet/layer.py", line 124, in message_and_aggregate
adjacency = graph.adjacency.transpose(0, 1)
File "~/anaconda3/envs/dlg_env/lib/python3.8/site-packages/torchdrug/utils/decorator.py", line 21, in __get__
result = self.func(obj)
File "~/anaconda3/envs/dlg_env/lib/python3.8/site-packages/torchdrug/data/graph.py", line 658, in adjacency
return utils.sparse_coo_tensor(self.edge_list.t(), self.edge_weight, self.shape)
File "~/anaconda3/envs/dlg_env/lib/python3.8/site-packages/torchdrug/utils/torch.py", line 182, in sparse_coo_tensor
return torch_ext.sparse_coo_tensor_unsafe(indices, values, size)
File "~/anaconda3/envs/dlg_env/lib/python3.8/site-packages/torchdrug/utils/torch.py", line 27, in __getattr__
return getattr(self.module, key)
File "~/anaconda3/envs/dlg_env/lib/python3.8/site-packages/torchdrug/utils/decorator.py", line 21, in __get__
result = self.func(obj)
File "~/anaconda3/envs/dlg_env/lib/python3.8/site-packages/torchdrug/utils/torch.py", line 31, in module
return cpp_extension.load(self.name, self.sources, self.extra_cflags, self.extra_cuda_cflags,
File "~/anaconda3/envs/dlg_env/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1079, in load
return _jit_compile(
File "~/anaconda3/envs/dlg_env/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1292, in _jit_compile
_write_ninja_file_and_build_library(
File "~/anaconda3/envs/dlg_env/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1378, in _write_ninja_file_and_build_library
check_compiler_abi_compatibility(compiler)
File "~/anaconda3/envs/dlg_env/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 282, in check_compiler_abi_compatibility
if not check_compiler_ok_for_platform(compiler):
File "~/anaconda3/envs/dlg_env/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 249, in check_compiler_ok_for_platform
version_string = subprocess.check_output([compiler, '-v'], stderr=subprocess.STDOUT).decode()
File "~/anaconda3/envs/dlg_env/lib/python3.8/subprocess.py", line 415, in check_output
return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
File "~/anaconda3/envs/dlg_env/lib/python3.8/subprocess.py", line 516, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['icpc', '-v']' returned non-zero exit status 1.
```
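(A possible workaround, not an official fix: the traceback shows that PyTorch's JIT extension builder picked up the Intel compiler `icpc` from the loaded module and failed while probing it with `icpc -v`. `torch.utils.cpp_extension` honors the `CXX` environment variable, so pointing it at GCC may let the kernels build.)

```bash
# Assumes g++ is available on the system; CXX is read by torch.utils.cpp_extension.
export CXX=g++
rm -r ~/.cache/torch_extensions/*   # clear any half-built cache, as in the note above
python script/run.py -c config/inductive/wn18rr.yaml --gpus [0]
```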