This repo hosts the code for the paper Biomedical Event Extraction with Hierarchical Knowledge Graphs. We represent knowledge from UMLS in hierarchical knowledge graphs, and integrate knowledge into pre-trained contextual representations with a proposed graph neural networks, Graph Edge-conditioned Attention Neural Networks (GEANet). Currently, only SciBERT Baseline is available, and our best performing model GEANet-SciBERT will soon be released.
| Model | Dev Set F1 | Test Set F1 | | ------------- |-------------:| -----:| | SciBERT Baseline | 59.33 | 58.50 | | GEANet-SciBERT | 60.38 | 60.06 | | Previous SOTA | N/A | 58.65 |
With the increasing concern about the COVID-19 pandemic, researchers have been putting much effort in providing useful insights into the COVID-19 Open Research Dataset Challenge (CORD-19). This repo also demonstrates how we extract biomedical events with SciBERT, a BERT trained on scientific corpus, which was fine-tuned on the GENIA BioNLP shared task 2011. We took reference from the pipeline described in Bjorne et al., where the pipeline can be broken into 3 stages: trigger detection, edge/argument detection and unmerging. The extracted events can be found here.
In addition, we adopted the framework from Han et al., where trigger and edge detections are trained in a multitask setting.
Our experiments were ran with Python 3.6.10, PyTorch 1.4.0 and CUDA 10.1 on a CentOS machine. Please follow the instructions of GPU dependencies and usage as illustrated in the official PyTorch website. We utilized the NER model en_ner_jnlpba_md
provided by ScispaCy for tagging biomedical entities.
Run pip install -r requirements
to install all the required packages.
(Optional) If you would like to use the knowledge incorporation component, additional packages are required for Pytorch Geometric can be installed as follows:
$ pip install torch-scatter==latest+${CUDA} -f https://pytorch-geometric.com/whl/torch-1.4.0.html
$ pip install torch-sparse==latest+${CUDA} -f https://pytorch-geometric.com/whl/torch-1.4.0.html
$ pip install torch-cluster==latest+${CUDA} -f https://pytorch-geometric.com/whl/torch-1.4.0.html
$ pip install torch-spline-conv==latest+${CUDA} -f https://pytorch-geometric.com/whl/torch-1.4.0.html
, where ${CUDA}
should be replaced by your installed CUDA version. (e.g. cpu
, cu92
, cu101
, cu102
)
The trained model can be downloaded from here. You need to unzip decompress it before running the program.
Preprocess CORD.ipynb
to generate the processed .txt
files to genia_cord_19
from the json files in the custom_license/custom_license/pmc_json/
directory.. run_multitask_bert.sh
in the terminal to run the whole event extraction pipeline and generate event annotations into genia_cord_19_output
.Call the biomedical_evet_extraction
function in the predict.py
script. This function takes in a single string as parameter. The event extraction pipeline will be run on this string.
```
biomedical_evet_extraction("BMP-6 inhibits growth of mature human B cells; induction of Smad phosphorylation and upregulation of Id1.") [{'tokens': ['BMP-6', 'inhibits', 'growth', 'of', 'mature', 'human', 'B', 'cells', ';', 'induction', 'of', 'Smad', 'phosphorylation', 'and', 'upregulation', 'of', 'Id1', '.'], 'events': [{'event_type': 'Positive_regulation', 'triggers': [{'event_type': 'Positive_regulation', 'text': 'upregulation', 'start_token': 14, 'end_token': 14}], 'arguments': [{'role': 'Theme', 'text': 'Id1', 'start_token': 16, 'end_token': 16}]}], 'ner': [[0, 0, 'PROTEIN'], [4, 7, 'CELL_TYPE'], [11, 11, 'PROTEIN'], [16, 16, 'PROTEIN']]}] ```
``` ├── genia_cord_19 ├── genia_cord_19_output ├── preprocessed_data ├── eval └── weights
```
bibtex
@inproceedings{huang-etal-2020-biomedical,
title = "Biomedical Event Extraction with Hierarchical Knowledge Graphs",
author = "Huang, Kung-Hsiang and
Yang, Mu and
Peng, Nanyun",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2020",
month = nov,
year = "2020",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/2020.findings-emnlp.114",
doi = "10.18653/v1/2020.findings-emnlp.114",
pages = "1277--1285",
abstract = "Biomedical event extraction is critical in understanding biomolecular interactions described in scientific corpus. One of the main challenges is to identify nested structured events that are associated with non-indicative trigger words. We propose to incorporate domain knowledge from Unified Medical Language System (UMLS) to a pre-trained language model via Graph Edge-conditioned Attention Networks (GEANet) and hierarchical graph representation. To better recognize the trigger words, each sentence is first grounded to a sentence graph based on a jointly modeled hierarchical knowledge graph from UMLS. The grounded graphs are then propagated by GEANet, a novel graph neural networks for enhanced capabilities in inferring complex events. On BioNLP 2011 GENIA Event Extraction task, our approach achieved 1.41{\%} F1 and 3.19{\%} F1 improvements on all events and complex events, respectively. Ablation studies confirm the importance of GEANet and hierarchical KG.",
}
Bumps certifi from 2020.4.5.2 to 2022.12.7.
9e9e840
2022.12.07b81bdb2
2022.09.24939a28f
2022.09.14aca828a
2022.06.15.2de0eae1
Only use importlib.resources's new files() / Traversable API on Python ≥3.11 ...b8eb5e9
2022.06.15.147fb7ab
Fix deprecation warning on Python 3.11 (#199)b0b48e0
fixes #198 -- update link in license9d514b4
2022.06.154151e88
Add py.typed to MANIFEST.in to package in sdist (#196)Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase
.
Bumps pillow from 7.1.2 to 9.3.0.
Sourced from pillow's releases.
9.3.0
https://pillow.readthedocs.io/en/stable/releasenotes/9.3.0.html
Changes
- Initialize libtiff buffer when saving #6699 [
@radarhere
]- Limit SAMPLESPERPIXEL to avoid runtime DOS #6700 [
@wiredfool
]- Inline fname2char to fix memory leak #6329 [
@nulano
]- Fix memory leaks related to text features #6330 [
@nulano
]- Use double quotes for version check on old CPython on Windows #6695 [
@hugovk
]- GHA: replace deprecated set-output command with GITHUB_OUTPUT file #6697 [
@nulano
]- Remove backup implementation of Round for Windows platforms #6693 [
@cgohlke
]- Upload fribidi.dll to GitHub Actions #6532 [
@nulano
]- Fixed set_variation_by_name offset #6445 [
@radarhere
]- Windows build improvements #6562 [
@nulano
]- Fix malloc in _imagingft.c:font_setvaraxes #6690 [
@cgohlke
]- Only use ASCII characters in C source file #6691 [
@cgohlke
]- Release Python GIL when converting images using matrix operations #6418 [
@hmaarrfk
]- Added ExifTags enums #6630 [
@radarhere
]- Do not modify previous frame when calculating delta in PNG #6683 [
@radarhere
]- Added support for reading BMP images with RLE4 compression #6674 [
@npjg
]- Decode JPEG compressed BLP1 data in original mode #6678 [
@radarhere
]- pylint warnings #6659 [
@marksmayo
]- Added GPS TIFF tag info #6661 [
@radarhere
]- Added conversion between RGB/RGBA/RGBX and LAB #6647 [
@radarhere
]- Do not attempt normalization if mode is already normal #6644 [
@radarhere
]- Fixed seeking to an L frame in a GIF #6576 [
@radarhere
]- Consider all frames when selecting mode for PNG save_all #6610 [
@radarhere
]- Don't reassign crc on ChunkStream close #6627 [
@radarhere
]- Raise a warning if NumPy failed to raise an error during conversion #6594 [
@radarhere
]- Only read a maximum of 100 bytes at a time in IMT header #6623 [
@radarhere
]- Show all frames in ImageShow #6611 [
@radarhere
]- Allow FLI palette chunk to not be first #6626 [
@radarhere
]- If first GIF frame has transparency for RGB_ALWAYS loading strategy, use RGBA mode #6592 [
@radarhere
]- Round box position to integer when pasting embedded color #6517 [
@radarhere
]- Removed EXIF prefix when saving WebP #6582 [
@radarhere
]- Pad IM palette to 768 bytes when saving #6579 [
@radarhere
]- Added DDS BC6H reading #6449 [
@ShadelessFox
]- Added support for opening WhiteIsZero 16-bit integer TIFF images #6642 [
@JayWiz
]- Raise an error when allocating translucent color to RGB palette #6654 [
@jsbueno
]- Moved mode check outside of loops #6650 [
@radarhere
]- Added reading of TIFF child images #6569 [
@radarhere
]- Improved ImageOps palette handling #6596 [
@PososikTeam
]- Defer parsing of palette into colors #6567 [
@radarhere
]- Apply transparency to P images in ImageTk.PhotoImage #6559 [
@radarhere
]- Use rounding in ImageOps contain() and pad() #6522 [
@bibinhashley
]- Fixed GIF remapping to palette with duplicate entries #6548 [
@radarhere
]- Allow remap_palette() to return an image with less than 256 palette entries #6543 [
@radarhere
]- Corrected BMP and TGA palette size when saving #6500 [
@radarhere
]
... (truncated)
Sourced from pillow's changelog.
9.3.0 (2022-10-29)
Limit SAMPLESPERPIXEL to avoid runtime DOS #6700 [wiredfool]
Initialize libtiff buffer when saving #6699 [radarhere]
Inline fname2char to fix memory leak #6329 [nulano]
Fix memory leaks related to text features #6330 [nulano]
Use double quotes for version check on old CPython on Windows #6695 [hugovk]
Remove backup implementation of Round for Windows platforms #6693 [cgohlke]
Fixed set_variation_by_name offset #6445 [radarhere]
Fix malloc in _imagingft.c:font_setvaraxes #6690 [cgohlke]
Release Python GIL when converting images using matrix operations #6418 [hmaarrfk]
Added ExifTags enums #6630 [radarhere]
Do not modify previous frame when calculating delta in PNG #6683 [radarhere]
Added support for reading BMP images with RLE4 compression #6674 [npjg, radarhere]
Decode JPEG compressed BLP1 data in original mode #6678 [radarhere]
Added GPS TIFF tag info #6661 [radarhere]
Added conversion between RGB/RGBA/RGBX and LAB #6647 [radarhere]
Do not attempt normalization if mode is already normal #6644 [radarhere]
... (truncated)
d594f4c
Update CHANGES.rst [ci skip]909dc64
9.3.0 version bump1a51ce7
Merge pull request #6699 from hugovk/security-libtiff_buffer2444cdd
Merge pull request #6700 from hugovk/security-samples_per_pixel-sec744f455
Added release notes0846bfa
Add to release notes799a6a0
Fix linting00b25fd
Hide UserWarning in logs05b175e
Tighter test case13f2c5a
Prevent DOS with large SAMPLESPERPIXEL in Tiff IFDDependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase
.
Bumps joblib from 0.14.1 to 1.2.0.
Sourced from joblib's changelog.
Release 1.2.0
Fix a security issue where
eval(pre_dispatch)
could potentially run arbitrary code. Now only basic numerics are supported. joblib/joblib#1327Make sure that joblib works even when multiprocessing is not available, for instance with Pyodide joblib/joblib#1256
Avoid unnecessary warnings when workers and main process delete the temporary memmap folder contents concurrently. joblib/joblib#1263
Fix memory alignment bug for pickles containing numpy arrays. This is especially important when loading the pickle with
mmap_mode != None
as the resultingnumpy.memmap
object would not be able to correct the misalignment without performing a memory copy. This bug would cause invalid computation and segmentation faults with native code that would directly access the underlying data buffer of a numpy array, for instance C/C++/Cython code compiled with older GCC versions or some old OpenBLAS written in platform specific assembly. joblib/joblib#1254Vendor cloudpickle 2.2.0 which adds support for PyPy 3.8+.
Vendor loky 3.3.0 which fixes several bugs including:
robustly forcibly terminating worker processes in case of a crash (joblib/joblib#1269);
avoiding leaking worker processes in case of nested loky parallel calls;
reliability spawn the correct number of reusable workers.
Release 1.1.0
Fix byte order inconsistency issue during deserialization using joblib.load in cross-endian environment: the numpy arrays are now always loaded to use the system byte order, independently of the byte order of the system that serialized the pickle. joblib/joblib#1181
Fix joblib.Memory bug with the
ignore
parameter when the cached function is a decorated function.
... (truncated)
5991350
Release 1.2.03fa2188
MAINT cleanup numpy warnings related to np.matrix in tests (#1340)cea26ff
CI test the future loky-3.3.0 branch (#1338)8aca6f4
MAINT: remove pytest.warns(None) warnings in pytest 7 (#1264)067ed4f
XFAIL test_child_raises_parent_exits_cleanly with multiprocessing (#1339)ac4ebd5
MAINT add back pytest warnings plugin (#1337)a23427d
Test child raises parent exits cleanly more reliable on macos (#1335)ac09691
[MAINT] various test updates (#1334)4a314b1
Vendor loky 3.2.0 (#1333)bdf47e9
Make test_parallel_with_interactively_defined_functions_default_backend timeo...Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase
.
Bumps nbconvert from 5.6.1 to 6.5.1.
Sourced from nbconvert's releases.
Release 6.5.1
No release notes provided.
6.5.0
What's Changed
- Drop dependency on testpath. by
@anntzer
in jupyter/nbconvert#1723- Adopt pre-commit by
@blink1073
in jupyter/nbconvert#1744- Add pytest settings and handle warnings by
@blink1073
in jupyter/nbconvert#1745- Apply Autoformatters by
@blink1073
in jupyter/nbconvert#1746- Add git-blame-ignore-revs by
@blink1073
in jupyter/nbconvert#1748- Update flake8 config by
@blink1073
in jupyter/nbconvert#1749- support bleach 5, add packaging and tinycss2 dependencies by
@bollwyvl
in jupyter/nbconvert#1755- [pre-commit.ci] pre-commit autoupdate by
@pre-commit-ci
in jupyter/nbconvert#1752- update cli example by
@leahecole
in jupyter/nbconvert#1753- Clean up pre-commit by
@blink1073
in jupyter/nbconvert#1757- Clean up workflows by
@blink1073
in jupyter/nbconvert#1750New Contributors
@pre-commit-ci
made their first contribution in jupyter/nbconvert#1752Full Changelog: https://github.com/jupyter/nbconvert/compare/6.4.5...6.5
6.4.3
What's Changed
- Add section to
customizing
showing how to use template inheritance by@stefanv
in jupyter/nbconvert#1719- Remove ipython genutils by
@rgs258
in jupyter/nbconvert#1727- Update changelog for 6.4.3 by
@blink1073
in jupyter/nbconvert#1728New Contributors
@stefanv
made their first contribution in jupyter/nbconvert#1719@rgs258
made their first contribution in jupyter/nbconvert#1727Full Changelog: https://github.com/jupyter/nbconvert/compare/6.4.2...6.4.3
6.4.0
What's Changed
- Optionally speed up validation by
@gwincr11
in jupyter/nbconvert#1672- Adding missing div compared to JupyterLab DOM structure by
@SylvainCorlay
in jupyter/nbconvert#1678- Allow passing extra args to code highlighter by
@yuvipanda
in jupyter/nbconvert#1683- Prevent page breaks in outputs when printing by
@SylvainCorlay
in jupyter/nbconvert#1679- Add collapsers to template by
@SylvainCorlay
in jupyter/nbconvert#1689- Fix recent pandoc latex tables by adding calc and array (#1536, #1566) by
@cgevans
in jupyter/nbconvert#1686- Add an invalid notebook error by
@gwincr11
in jupyter/nbconvert#1675- Fix typos in execute.py by
@TylerAnderson22
in jupyter/nbconvert#1692- Modernize latex greek math handling (partially fixes #1673) by
@cgevans
in jupyter/nbconvert#1687- Fix use of deprecated API and update test matrix by
@blink1073
in jupyter/nbconvert#1696- Update nbconvert_library.ipynb by
@letterphile
in jupyter/nbconvert#1695- Changelog for 6.4 by
@blink1073
in jupyter/nbconvert#1697New Contributors
... (truncated)
7471b75
Release 6.5.1c1943e0
Fix pre-commit8685e93
Fix tests0abf290
Run black and prettier418d545
Run test on 6.x branchbef65d7
Convert input to string prior to escape HTML0818628
Check input type before escapingb206470
GHSL-2021-1017, GHSL-2021-1020, GHSL-2021-1021a03cbb8
GHSL-2021-1026, GHSL-2021-102548fe71e
GHSL-2021-1024Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase
.
Bumps mistune from 0.8.4 to 2.0.3.
Sourced from mistune's releases.
Version 2.0.2
Fix
escape_url
via lepture/mistune#295Version 2.0.1
Fix XSS for image link syntax.
Version 2.0.0
First release of Mistune v2.
Version 2.0.0 RC1
In this release, we have a Security Fix for harmful links.
Version 2.0.0 Alpha 1
This is the first release of v2. An alpha version for users to have a preview of the new mistune.
Sourced from mistune's changelog.
Changelog
Here is the full history of mistune v2.
Version 2.0.4
Released on Jul 15, 2022
- Fix
url
plugin in<a>
tag- Fix
*
formattingVersion 2.0.3
Released on Jun 27, 2022
- Fix
table
plugin- Security fix for CVE-2022-34749
Version 2.0.2
Released on Jan 14, 2022
Fix
escape_url
Version 2.0.1
Released on Dec 30, 2021
XSS fix for image link syntax.
Version 2.0.0
Released on Dec 5, 2021
This is the first non-alpha release of mistune v2.
Version 2.0.0rc1
Released on Feb 16, 2021
Version 2.0.0a6
</tr></table>
... (truncated)
3f422f1
Version bump 2.0.3a6d4321
Fix asteris emphasis regex CVE-2022-347495638e46
Merge pull request #307 from jieter/patch-10eba471
Fix typo in guide.rst61e9337
Fix table plugin76dec68
Add documentation for renderer heading when TOC enabled799cd11
Version bump 2.0.2babb0cf
Merge pull request #295 from dairiki/bug.escape_urlfc2cd53
Make mistune.util.escape_url less aggressive3e8d352
Version bump 2.0.1Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase
.
Bumps numpy from 1.16.0 to 1.22.0.
Sourced from numpy's releases.
v1.22.0
NumPy 1.22.0 Release Notes
NumPy 1.22.0 is a big release featuring the work of 153 contributors spread over 609 pull requests. There have been many improvements, highlights are:
- Annotations of the main namespace are essentially complete. Upstream is a moving target, so there will likely be further improvements, but the major work is done. This is probably the most user visible enhancement in this release.
- A preliminary version of the proposed Array-API is provided. This is a step in creating a standard collection of functions that can be used across application such as CuPy and JAX.
- NumPy now has a DLPack backend. DLPack provides a common interchange format for array (tensor) data.
- New methods for
quantile
,percentile
, and related functions. The new methods provide a complete set of the methods commonly found in the literature.- A new configurable allocator for use by downstream projects.
These are in addition to the ongoing work to provide SIMD support for commonly used functions, improvements to F2PY, and better documentation.
The Python versions supported in this release are 3.8-3.10, Python 3.7 has been dropped. Note that 32 bit wheels are only provided for Python 3.8 and 3.9 on Windows, all other wheels are 64 bits on account of Ubuntu, Fedora, and other Linux distributions dropping 32 bit support. All 64 bit wheels are also linked with 64 bit integer OpenBLAS, which should fix the occasional problems encountered by folks using truly huge arrays.
Expired deprecations
Deprecated numeric style dtype strings have been removed
Using the strings
"Bytes0"
,"Datetime64"
,"Str0"
,"Uint32"
, and"Uint64"
as a dtype will now raise aTypeError
.(gh-19539)
Expired deprecations for
loads
,ndfromtxt
, andmafromtxt
in npyio
numpy.loads
was deprecated in v1.15, with the recommendation that users usepickle.loads
instead.ndfromtxt
andmafromtxt
were both deprecated in v1.17 - users should usenumpy.genfromtxt
instead with the appropriate value for theusemask
parameter.(gh-19615)
... (truncated)
4adc87d
Merge pull request #20685 from charris/prepare-for-1.22.0-releasefd66547
REL: Prepare for the NumPy 1.22.0 release.125304b
wipc283859
Merge pull request #20682 from charris/backport-204165399c03
Merge pull request #20681 from charris/backport-20954f9c45f8
Merge pull request #20680 from charris/backport-20663794b36f
Update armccompiler.pyd93b14e
Update test_public_api.py7662c07
Update init.py311ab52
Update armccompiler.pyDependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase
.
CORD19 Biomedical Event Extraction First Release
event-extraction biomedical covid-19 cord-19 natural-language-processing bert multitask-learning deep-learning