Abstractive Pre-trained Models & Results

BART Converted to LongformerEncoderDecoder

Important

The models in this section are the output of the convert_bart_to_longformerencoderdecoder.py script without any gradient updates. This means they need to be fine-tuned on a long document summarization dataset, such as arXiv-PubMed, in order to create a model that can summarize long sequences.

The additional position embeddings for these models were initialized by copying the embeddings of the first 512 positions. This initialization is crucial for model performance (see Table 6 in the Longformer paper for results without it).
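The sketch below illustrates how such a copy-initialization can be implemented; the function and variable names are illustrative, and the actual logic lives in convert_bart_to_longformerencoderdecoder.py.

```python
import torch.nn as nn


def copy_position_embeddings(old_embed: nn.Embedding, new_max_pos: int) -> nn.Embedding:
    """Illustrative sketch: build a longer position-embedding table by
    repeatedly copying the original (short) table, as described above."""
    old_max_pos, dim = old_embed.weight.shape
    new_embed = nn.Embedding(new_max_pos, dim)
    pos = 0
    while pos < new_max_pos:
        # Copy as many of the original positions as still fit.
        chunk = min(old_max_pos, new_max_pos - pos)
        new_embed.weight.data[pos:pos + chunk] = old_embed.weight.data[:chunk]
        pos += chunk
    return new_embed
```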

The models output by the convert_bart_to_longformerencoderdecoder.py script do not work for long documents without further training. Tables 6 and 11 in the Longformer paper suggest that models converted to handle long content may perform well before any additional gradient updates. However, this does not appear to be true for summarization: the converted facebook/bart-large-cnn model from huggingface/transformers (aka longformer-encdec-bart-large-cnn-converted) produces almost random summaries that rarely pertain to the input document. Thus, these models need to be fine-tuned on a long document summarization dataset.

These are huggingface/transformers models, so they need to be used with the --model_name_or_path option. They can also be loaded directly in huggingface/transformers using LEDForConditionalGeneration.from_pretrained(), as in the example below.
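For instance, one of the checkpoints listed in the table below can be loaded directly from the Hugging Face model hub (no TransformerSum code required); the input text here is only a placeholder.

```python
from transformers import LEDForConditionalGeneration, LEDTokenizer

# Load a converted checkpoint directly from the Hugging Face model hub.
tokenizer = LEDTokenizer.from_pretrained("allenai/led-base-16384")
model = LEDForConditionalGeneration.from_pretrained("allenai/led-base-16384")

# Note: as explained above, converted models that have not been fine-tuned
# will not produce useful summaries yet.
inputs = tokenizer("Some very long document ...", return_tensors="pt")
summary_ids = model.generate(**inputs, max_length=256)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```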

The Google Drive folder containing my contributions to the models below is available at this link.

| Name (Shortcut Code)                | Initialized From             | GDrive Download |
|-------------------------------------|------------------------------|-----------------|
| allenai/led-base-16384              | facebook/bart-base           |                 |
| allenai/led-large-16384             | facebook/bart-large          |                 |
| HHousen/distil-led-large-cnn-16384  | sshleifer/distilbart-cnn-12-6 | Folder Link    |

Note

In previous versions of TransformerSum, this section listed models that could be used with the outdated LED model (which relied on custom versions of huggingface/transformers and allenai/longformer). Those models can still be found in this Google Drive folder.

arXiv-PubMed

| Name                         | Comments | Model Download            | Data Download |
|------------------------------|----------|---------------------------|---------------|
| led-base-4096-arxiv-pubmed   | None     | Model & All Checkpoints   | Not yet…      |
| led-large-4096-arxiv-pubmed  | None     | Not yet…                  | Not yet…      |
| led-base-16384-arxiv-pubmed  | None     | Not yet…                  | Not yet…      |
| led-large-16384-arxiv-pubmed | None     | Not yet…                  | Not yet…      |

arXiv-PubMed ROUGE Scores

Test set results on the arXiv-PubMed dataset using ROUGE F1.

| Name                         | ROUGE-1  | ROUGE-2  | ROUGE-L  | ROUGE-L-Sum |
|------------------------------|----------|----------|----------|-------------|
| led-base-4096-arxiv-pubmed   | Not yet… | Not yet… | Not yet… | Not yet…    |
| led-large-4096-arxiv-pubmed  | Not yet… | Not yet… | Not yet… | Not yet…    |
| led-base-16384-arxiv-pubmed  | Not yet… | Not yet… | Not yet… | Not yet…    |
| led-large-16384-arxiv-pubmed | Not yet… | Not yet… | Not yet… | Not yet…    |

Individual ArXiv and PubMed models

The Hugging Face model hub has two pre-trained models for long text summarization: allenai/led-large-16384-arxiv and patrickvonplaten/led-large-16384-pubmed. These models can be used with pipelines to easily summarize long documents, as in the example below. Please see their model cards (by clicking on their names above) for more information.
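A minimal usage sketch with the huggingface/transformers pipelines API; the file path is only a placeholder for a long input document.

```python
from transformers import pipeline

# Summarize a long document with one of the pre-trained models mentioned above.
summarizer = pipeline("summarization", model="allenai/led-large-16384-arxiv")

long_document = open("paper.txt").read()  # e.g. the full text of an arXiv paper
summary = summarizer(long_document)
print(summary[0]["summary_text"])
```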