Code

Papers

2021

  • Corpulyzer: A Novel Framework for Building Low Resource Language Corpora — Bilal Tahir, Muhammad Amir Mehmood – University of Engineering and Technology, Lahore, Pakistan
  • A COVID-19 news coverage mood map of Europe — Frankie Robertson, Jarkko Lagus, Kaisla Kajava – University of Jyväskylä, Finland; University of Helsinki, Finland
  • Quality at a Glance: An Audit of Web-Crawled Multilingual Datasets — Isaac Caswell, Julia Kreutzer, Lisa Wang, Ahsan Wahab, Daan van Esch, Nasanbayar Ulzii-Orshikh, Allahsera Tapo, Nishant Subramani, Artem Sokolov, Claytone Sikasote, Monang Setyawan, Supheakmungkol Sarin, Sokhar Samb, Benoît Sagot, Clara Rivera, Annette Rios, Isabel Papadimitriou, Salomey Osei, Pedro Javier Ortiz Suárez, Iroro Orife, Kelechi Ogueji, Rubungo Andre Niyongabo, Toan Q. Nguyen, Mathias Müller, André Müller, Shamsuddeen Hassan Muhammad, Nanda Muhammad, Ayanda Mnyakeni, Jamshidbek Mirzakhalov, Tapiwanashe Matangira, Colin Leong, Nze Lawson, Sneha Kudugunta, Yacine Jernite, Mathias Jenny, Orhan Firat, Bonaventure F. P. Dossou, Sakhile Dlamini, Nisansa de Silva, Sakine Çabuk Ballı, Stella Biderman, Alessia Battisti, Ahmed Baruwa, Ankur Bapna, Pallavi Baljekar, Israel Abebe Azime, Ayodele Awokoya, Duygu Ataman, Orevaoghene Ahia, Oghenefego Ahia, Sweta Agrawal, Mofetoluwa Adeyemi – Google Research; Masakhane NLP; Turkic Interlingua; Haverford College; RobotsMali; Intel Labs; University of Zambia; Google; AIMS-AMMI; Inria; University of Zurich; Stanford University; Kwame Nkrumah University of Science and Technology; Sorbonne Université; Niger-Volta LTI; University of Waterloo; University of Electronic Science and Technology of China; University of Notre Dame; Bayero University Kano; University of South Florida; Hugging Face; Jacobs University Bremen; University of Moratuwa; EleutherAI; Obafemi Awolowo University; University of Ibadan; Instadeep; University of Maryland; Defence Space Administration Abuja
  • Documenting the English Colossal Clean Crawled Corpus — Jesse Dodge, Maarten Sap, Ana Marasovic, William Agnew, Gabriel Ilharco, Dirk Groeneveld, Matt Gardner – Paul G. Allen School of Computer Science & Engineering, University of Washington, USA; Allen Institute for Artificial Intelligence, USA
  • mT5: A massively multilingual pre-trained text-to-text transformer — Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel – Google Research
  • Detecting Phishing Sites — An Overview — P. Kalaharsha, B. M. Mehtre – Institute for Development and Research in Banking Technology (IDRBT), Hyderabad, India; School of Computer Science and Information Sciences (SCIS), University of Hyderabad, Hyderabad, India

2020

2019

2018

2017

2016

2015

2014

2013

2012