Comments on Tatoeba Project Blog: Submission policy - What kind of content do we want?
Allan SIMON (http://www.blogger.com/profile/13749792303223682327)
Updated 2016-04-19

Comment by Trang (https://www.blogger.com/profile/16592803601736153085), 2010-08-07:

I have added links to a discussion we had on the Wall about this.

=> http://tatoeba.org/eng/wall/show_message/1085

I'll copy my response here as well:

Our position is: people can do whatever they like. If they want to add all the possible variations, they can. If they don't want to, they don't have to.

It doesn't hurt to have "near duplicates". It just makes Tatoeba a bit noisy. But it's our job, as engineers, to figure out how to filter and organize the data so that language learners can use it efficiently.

Meanwhile, as sysko said, variations of sentences can be very useful for language processing, so we shouldn't delete them.

[http://tatoeba.org/eng/wall/show_message/1237#message_1237]

Comment by Anonymous, 2010-08-03:

A quick clarification/reconsideration request on near-duplicate sentences.
So far, I've resisted adding near-duplicate translations, as I don't see how they would add any real value when there are already heaps of search results to wade through.

Most commonly it's something like translating 'they' from English to Icelandic, which can be any of 'þeir', 'þær' or 'þau' depending on whether the group is all male, all female or mixed. Other times it's synonyms or all-but-identical parts of sentences.

In these cases I've added comments on the alternatives, looking forward to the time when there may be fields for metadata (I sent an email to Allan/sysko with this feature request, and he noted that it had been discussed in the dev team). I'd be hesitant to add them, but is it really better to add bunches of near-redundant sentences?