A continually updated repository of cross-lingual and monolingual summarization data for different language pairs.
- Added English-English v0.1.
- English-English
- document: the document text.
- document_lang: predominant language code of the document text.
- summary: the summary text.
- summary_lang: predominant language code of the summary text.
- cosine_similarity: cosine similarity between the distUSE embeddings of the document and the summary.
- unigram_overlap: lexical overlap quotient (0-1) between the summary and the document, or translated lexical overlap for cross-lingual pairs. Higher values mean more unigram overlap.
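To make the two score fields concrete, here is a minimal Python sketch of how such values could be computed. It is an illustration only, not the pipeline used to build this dataset: the whitespace tokenization, the choice to normalize overlap by summary length, and the function names `unigram_overlap` and `cosine_similarity` are all assumptions. The embedding vectors are passed in as plain lists, so no particular embedding model is assumed.

```python
import math


def unigram_overlap(document: str, summary: str) -> float:
    """Fraction of summary tokens that also occur in the document.

    Assumption: lowercase whitespace tokenization, normalized by summary
    length. The dataset's actual definition may differ.
    """
    doc_tokens = set(document.lower().split())
    summary_tokens = summary.lower().split()
    if not summary_tokens:
        return 0.0
    hits = sum(1 for token in summary_tokens if token in doc_tokens)
    return hits / len(summary_tokens)


def cosine_similarity(a, b) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    doc = "the cat sat on the mat"
    summ = "the cat slept"
    print(unigram_overlap(doc, summ))
    print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))
```

Under these assumptions, both scores can be used to filter records, for example keeping only pairs whose summary is semantically close to the document but not a near copy of it.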
- TBD
P.S.: More language pairs will be added soon.