News Crawl
News is a text genre that is often discussed on our user and developer mailing list.
Yet our monthly crawl and release schedule is not well adapted to this type of content, which is driven by current and developing events. Decoupling the news from the main dataset into a smaller sub-dataset makes it feasible to publish the WARC files shortly after they are written.