The '''ACL Data and Code Repository''' is a repository of data (e.g., hand-labeled text, hand-parsed text, feature vectors for machine learning) and source code (e.g., taggers, parsers, chunkers) for computational linguistics and natural language processing. The goal of the repository is to make it easier for researchers to replicate each other's work and to compare different approaches using the same benchmarks.

−

If you are contributing source code, consider hosting it on [http://sourceforge.net/ SourceForge] or another code hosting service instead of the ''ACL Data and Code Repository''. These services offer features, such as version control and bug tracking, that are not available here. The ''ACL Data and Code Repository'' is better suited to static, archival, or historical source code than to code that is still evolving.

+

The ''ACL Data and Code Repository'' is experimental. Upload file size is currently limited to '''8 MB'''. If you encounter problems, let [[User_talk:Pdturney|us]] know.

−

The ''ACL Data and Code Repository'' is experimental. Uploads are limited in size. If your file exceeds the limit or you encounter other problems, let [[User_talk:Pdturney|us]] know.

+

== Downloading ==

−

== Instructions for contributors ==

+

* [[Resources by Date (Repository)]]

−

* '''Metadata''': Each item in the repository must have an associated metadata entry in the ACL Wiki. Choose a good name for your contribution and add "(Repository)" to the name (e.g., "Wall Street Journal Corpus (Repository)"). Add a link below, under ''Data'' or ''Source Code'', to a new ACL Wiki entry, using this name. Click on the link to begin editing the metadata for your new entry.

+

== Uploading ==

−

* '''Contributor''': The first entry in the metadata should be the name of the person who is depositing the data or code. Include contact information, such as a link to your personal web page. Include the date the deposit was made.

+

* [[Instructions for contributors (Repository)]]

−

−

* '''Copyright''': The second entry in the metadata should state who owns the copyright for the data or code, and the date of the copyright. If you (the depositor) do not own the copyright, then you must explicitly state that the owner of the copyright has granted you permission to contribute the data or code to the ''ACL Data and Code Repository''. Contributions that violate copyright will be deleted.

−

−

* '''Licensing''': State the terms under which the data or code may be used. For data, we suggest one of the [http://creativecommons.org/ Creative Commons] licenses. For code, we suggest one of the [http://www.gnu.org/licenses/gpl.html GNU] licenses. You may use any other widely recognized license, but do not create your own custom license. Contributions without license terms will be deleted.

−

−

* '''Citation''': State how (or whether) you would like the contribution to be cited or acknowledged in any publications that result from using the contribution.

−

−

* '''Description''': Briefly describe the data or code. A longer description and documentation should be included in the download package.

−

−

* '''Packaging''': The item should be a .zip or .gz file. If you have multiple files (and you should have at least the data or code file plus the documentation file), bundle them together into a single .zip or .gz file.

−

−

* '''Uploading''': Upload [[Special:Upload|here]].

−

−

* '''Linking''': Add a link to the uploaded file in the metadata page.
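
The ''Packaging'' step above, together with the upload size limit, can be sketched as a small script. This is only an illustration, not a tool provided by the ACL Wiki: the function name <code>bundle_for_upload</code> is hypothetical, and reading "8 MB" as 8 × 1024 × 1024 bytes is an assumption, since the page does not say how the limit is counted.

```python
import zipfile
from pathlib import Path

# Assumption: "8 MB" is read as 8 * 1024 * 1024 bytes; the wiki does not specify.
MAX_UPLOAD_BYTES = 8 * 1024 * 1024


def bundle_for_upload(files, archive_path, max_bytes=MAX_UPLOAD_BYTES):
    """Bundle files into a single .zip and check it against the upload limit.

    Hypothetical helper for illustration; it is not part of the ACL Wiki.
    Returns the archive size in bytes, or raises ValueError.
    """
    files = [Path(f) for f in files]
    # The instructions expect at least the data or code file plus documentation.
    if len(files) < 2:
        raise ValueError("expected at least a data/code file and a documentation file")

    archive_path = Path(archive_path)
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for f in files:
            # Store each file under its bare name (flat layout inside the archive).
            zf.write(f, arcname=f.name)

    size = archive_path.stat().st_size
    if size > max_bytes:
        raise ValueError(
            f"{archive_path.name} is {size} bytes, over the {max_bytes}-byte limit"
        )
    return size
```

For example, calling <code>bundle_for_upload(["corpus.txt", "README.txt"], "my_corpus.zip")</code> would write <code>my_corpus.zip</code> and return its size, or raise an error if the archive is too large to upload. Because files are stored under their bare names, two inputs with the same file name in different directories would collide; rename them first.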

−

−

== Data ==

* [[Template for Data (Repository)]]

−

−

* [[CLAIR collection of fraud email (Repository)]]

−

−

== Source Code ==

* [[Template for Source Code (Repository)]]

−

[[Category:Data and code repository]]

+

[[Category:Data and code repository| ]]

Latest revision as of 05:03, 19 August 2012
