Implementation of XLNet that can load pretrained checkpoints

CyberZHG, updated 2022-01-22 11:15:20

Keras XLNet



Unofficial implementation of XLNet. The embedding extraction and embedding extraction with memory demos show how to get the outputs of the last transformer layer using pre-trained checkpoints.


```bash
pip install keras-xlnet
```


Fine-tuning on GLUE

Click a task name to see the demo with the base model:

| Task Name | Metrics | Approximate Results on Dev Set |
|:----------|:-------------------------------:|------:|
| CoLA | Matthews Corr. | 52 |
| SST-2 | Accuracy | 93 |
| MRPC | Accuracy / F1 | 86/89 |
| STS-B | Pearson Corr. / Spearman Corr. | 86/87 |
| QQP | Accuracy / F1 | 90/86 |
| MNLI | Accuracy | 84/84 |
| QNLI | Accuracy | 86 |
| RTE | Accuracy | 64 |
| WNLI | Accuracy | 56 |

(Only 0s are predicted on the WNLI dataset.)
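The CoLA score is a Matthews correlation, and the WNLI note describes a model that only ever predicts one class. The pure-Python sketch below (the helper names `matthews_corr` and `accuracy` are illustrative, not part of keras-xlnet) shows why such a constant predictor still earns a nonzero accuracy, equal to the share of 0 labels, while its Matthews correlation is 0.

```python
import math


def matthews_corr(y_true, y_pred):
    """Matthews correlation coefficient for binary (0/1) labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # A constant predictor makes the denominator zero; report 0.0 by convention.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction equals the label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


labels = [0, 0, 1, 0, 1]
all_zeros = [0] * len(labels)
print(accuracy(labels, all_zeros), matthews_corr(labels, all_zeros))
```

`sklearn.metrics.matthews_corrcoef` computes the same quantity; the hand-written version only makes the zero-denominator case explicit.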

Load Pretrained Checkpoints

```python
import os
from keras_xlnet import Tokenizer, load_trained_model_from_checkpoint, ATTENTION_TYPE_BI

checkpoint_path = '.../xlnet_cased_L-24_H-1024_A-16'

tokenizer = Tokenizer(os.path.join(checkpoint_path, 'spiece.model'))
model = load_trained_model_from_checkpoint(
    config_path=os.path.join(checkpoint_path, 'xlnet_config.json'),
    checkpoint_path=os.path.join(checkpoint_path, 'xlnet_model.ckpt'),
    batch_size=16,
    memory_len=512,
    target_len=128,
    in_train_phase=False,
    attention_type=ATTENTION_TYPE_BI,
)
model.summary()
```

The arguments `batch_size`, `memory_len`, and `target_len` are the maximum sizes used to initialize the memories. If `in_train_phase` is `True`, the model for training a language model is returned; otherwise, a model for fine-tuning is returned.
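Because the memories are allocated once, the values passed at load time act as upper bounds. As a mental model only, the NumPy sketch below keeps a preallocated buffer of the most recent `memory_len` hidden states per batch row and rejects inputs larger than the sizes it was built with. The class `FixedMemory` and the `units` size are hypothetical; keras-xlnet's actual memory handling may differ in layout and details.

```python
import numpy as np


class FixedMemory:
    """Hypothetical fixed-size memory (illustration, not keras-xlnet code).

    Stores the most recent `memory_len` hidden states for each batch row.
    """

    def __init__(self, batch_size, memory_len, target_len, units):
        if memory_len <= 0:
            raise ValueError("memory_len must be positive")
        self.memory_len = memory_len
        self.target_len = target_len
        # Allocated once with the *maximum* sizes.
        self.states = np.zeros((batch_size, memory_len, units))

    def read(self, batch):
        """Memory the next segment would attend to, for the first `batch` rows."""
        return self.states[:batch]

    def write(self, hidden):
        """Store a segment of hidden states with shape (batch, seg_len, units)."""
        batch, seg_len, _ = hidden.shape
        if batch > self.states.shape[0] or seg_len > self.target_len:
            raise ValueError("input exceeds the sizes the memory was built with")
        combined = np.concatenate([self.states[:batch], hidden], axis=1)
        # Keep only the most recent memory_len positions.
        self.states[:batch] = combined[:, -self.memory_len:]


mem = FixedMemory(batch_size=2, memory_len=4, target_len=3, units=1)
mem.write(np.ones((2, 3, 1)))
print(mem.read(2)[0, :, 0])
```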

About I/O

Note that `shuffle` should be `False` in both `fit` and `fit_generator` if memories are used.
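Shuffling breaks memories because, assuming row i of one batch continues row i of the previous batch, reordering samples would attach a segment to the memory of an unrelated text. The hypothetical helper below (not part of keras-xlnet) arranges token streams so that batch k, row i holds segment k of stream i; the batches would then be fed in order with `shuffle=False`. Padding with ID 0 is an assumption.

```python
def memory_friendly_batches(streams, target_len, pad_id=0):
    """Split each token stream into target_len segments so that
    batch k, row i holds segment k of stream i (hypothetical helper).

    streams: one list of token IDs per batch row.
    Short segments are right-padded with pad_id (an assumed padding ID).
    """
    num_segments = max((len(s) + target_len - 1) // target_len for s in streams)
    batches = []
    for k in range(num_segments):
        batch = []
        for s in streams:
            seg = s[k * target_len:(k + 1) * target_len]
            batch.append(seg + [pad_id] * (target_len - len(seg)))
        batches.append(batch)
    return batches


batches = memory_friendly_batches([[1, 2, 3, 4, 5], [6, 7, 8]], target_len=2)
for batch in batches:
    print(batch)
```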

in_train_phase is False

3 inputs:

  • IDs of tokens, with shape (batch_size, target_len).
  • IDs of segments, with shape (batch_size, target_len).
  • Length of memories, with shape (batch_size, 1).

1 output:

  • The feature for each token, with shape (batch_size, target_len, units).
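The three fine-tuning inputs can be packed with NumPy from already-tokenized sequences. In the sketch below, `build_inputs` is a hypothetical helper; the padding ID 0, left padding, all-zero segment IDs (a single segment), and a memory length of 0 (no previous memory) are assumptions, so check the GLUE demos for the conventions they actually use.

```python
import numpy as np


def build_inputs(token_id_lists, target_len, pad_id=0, memory_length=0):
    """Pack pre-tokenized sequences into [tokens, segments, memory lengths].

    Hypothetical helper: pad_id, left padding, zero segment IDs, and the
    memory length are assumptions for illustration.
    """
    batch_size = len(token_id_lists)
    tokens = np.full((batch_size, target_len), pad_id, dtype='int32')
    for i, ids in enumerate(token_id_lists):
        ids = ids[-target_len:]                   # keep the last target_len IDs
        tokens[i, target_len - len(ids):] = ids   # left-pad with pad_id
    segments = np.zeros((batch_size, target_len), dtype='int32')
    memories = np.full((batch_size, 1), memory_length, dtype='int32')
    return [tokens, segments, memories]


x = build_inputs([[5, 6, 7], [8]], target_len=4)
print([a.shape for a in x])
```

The returned list matches the input order listed above and could be passed to `model.predict` on a model loaded with `in_train_phase=False`.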

in_train_phase is True

4 inputs:

  • IDs of tokens, with shape (batch_size, target_len).
  • IDs of segments, with shape (batch_size, target_len).
  • Length of memories, with shape (batch_size, 1).
  • Masks of tokens, with shape (batch_size, target_len).

1 output:

  • The probability of each token in each position, with shape (batch_size, target_len, num_token).
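For the training-phase model, the extra input is a per-position token mask, and the output is a distribution over the vocabulary at each position. The NumPy sketch below is illustrative only: it assumes 1 marks a position to predict and picks positions uniformly at random, a simplification that is not necessarily how keras-xlnet or XLNet choose prediction targets. It also turns a (batch_size, target_len, num_token) output into token IDs with an argmax over the last axis. Both helper names are hypothetical.

```python
import numpy as np


def random_target_mask(batch_size, target_len, ratio=0.15, seed=0):
    """Hypothetical helper: mark about `ratio` of positions with 1 (assumed 'predict')."""
    rng = np.random.default_rng(seed)
    return (rng.random((batch_size, target_len)) < ratio).astype('int32')


def most_likely_tokens(probs):
    """(batch_size, target_len, num_token) probabilities -> (batch_size, target_len) IDs."""
    return np.argmax(probs, axis=-1)


mask = random_target_mask(2, 8, ratio=0.5, seed=1)
probs = np.zeros((1, 2, 3))
probs[0, 0, 2] = 1.0
probs[0, 1, 1] = 1.0
print(mask.shape, most_likely_tokens(probs))
```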


AttributeError: 'Node' object has no attribute 'output_masks'

opened on 2022-07-23 08:16:21 by yangbin-Neil

Describe the Bug

I am getting AttributeError: 'Node' object has no attribute 'output_masks' when I use keras-xlnet.

Version Info

  • keras 2.2.0
  • tensorflow 1.9.0
  • keras-xlnet 0.16.0
  • scikit-learn 0.19.1
  • numpy 1.19.5
  • python 3.6.13

Minimal Codes To Reproduce

The location of the error is as follows:

File "D:\anaconda3\envs\Xlnet-gru-crf36new\lib\site-packages\keras_xlnet\", line 128, in build_xlnet
  )([token_embed, query_input])
File "D:\anaconda3\envs\Xlnet-gru-crf36new\lib\site-packages\keras\engine\", line 446, in call
  previous_mask = _collect_previous_mask(inputs)
File "D:\anaconda3\envs\Xlnet-gru-crf36new\lib\site-packages\keras\engine\", line 1326, in _collect_previous_mask
  mask = node.output_masks[tensor_index]
AttributeError: 'Node' object has no attribute 'output_masks'

```python
def _collect_previous_mask(input_tensors):
    """Retrieves the output mask(s) of the previous node.

    # Arguments
        input_tensors: A tensor or list of tensors.

    # Returns
        A mask tensor or list of mask tensors.
    """
    input_tensors = to_list(input_tensors)
    masks = []
    for x in input_tensors:
        if hasattr(x, '_keras_history'):
            inbound_layer, node_index, tensor_index = x._keras_history
            node = inbound_layer._inbound_nodes[node_index]
            mask = node.output_masks[tensor_index]  # I got an error here, but I don't know why
            masks.append(mask)
        else:
            masks.append(None)
    if len(masks) == 1:
        return masks[0]
    return masks
```


Zhao HG

Knowledge is bacon. Please don't send emails.


keras xlnet language-model nlp glue