Saturday, March 29, 2008

Revision of Assignment

On my webpage http://nmrpredict.orc.univie.ac.at/csearchlite/NMRSHIFTDB_March_2008.html I have proposed a reassignment of 10 signals (out of 16!) based solely on spectrum prediction using CSEARCH, even though the original assignment had been made by means of HH-COSY, HMQC and HMBC. I am glad that these proposals have been fully integrated into NMRShiftDB, obviously after extensive verification by another program according to the protocols of my web-server.

The statement 'This was possible, because the data are open' is definitely wrong - within a more professional system such a wrong entry would never have been able to step from the 'purgatory database' into the 'production database'. The detailed analysis can be found on the webpage given above.

3 comments:

- Once more, thanks for making the errors public. - I made the reassignments of the NMRShiftDB entry.

Comments:
- PROFESSIONAL prediction tools like CSEARCH and the ACD predictor also give/gave prediction errors greater than 20 ppm!
- "Professional/production databases" also show severe assignment errors.
- This is the reality DESPITE data checking by an "nmr-robot-referee".
- Therefore it is the person BEHIND the "nmr-robot-referee" who has to decide whether the data are reliable or not, not the "robot-referee".
- The assignments of this NMRShiftDB entry were supported by experiments like COSY, HMQC and HMBC.
- Therefore the reviewer/referee could normally assume that the assignments are reliable.
- However, he failed in this case, whatever the reason may be.
- Therefore the misassignments of this NMRShiftDB entry are "common errors", as can be found in many OTHER examples on your webpage.
- Errors of this type are thus not a specialty of NMRShiftDB entries.
- As already noted, errors can easily be detected and removed by the community (social scientists) if the data are open.
- Purgatory databases are an indication that there are thousands of misassignments in the literature, despite the usage of an "nmr-robot-referee".

Thanks a lot for your comments! As you can see from my examples (meanwhile 10 are online), I DO NOT FOCUS on a specific journal/publisher/database; the only criterion necessary to appear in my 'Hall of Fame' is a deviation of more than approx. 15-20 ppm between experimental and predicted shift values AND manual verification that there really is an assignment error and not just a specific prediction problem!
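The screening criterion described above - flag any signal whose experimental and predicted shifts differ by more than roughly 15-20 ppm, then verify manually - can be sketched in a few lines. This is a minimal illustration, not CSEARCH code; the atom labels and shift values below are hypothetical:

```python
# Flag 13C signals whose |experimental - predicted| deviation exceeds a
# threshold. Flagged signals are only CANDIDATES for misassignment; the
# text above stresses that manual verification is still required.

THRESHOLD_PPM = 15.0  # approximate cutoff mentioned in the post

def flag_outliers(signals, threshold=THRESHOLD_PPM):
    """signals: iterable of (atom_label, experimental_ppm, predicted_ppm).
    Returns (atom_label, experimental, predicted, deviation) tuples for
    every signal whose absolute deviation exceeds the threshold."""
    outliers = []
    for atom, experimental, predicted in signals:
        deviation = abs(experimental - predicted)
        if deviation > threshold:
            outliers.append((atom, experimental, predicted, deviation))
    return outliers

# Hypothetical assignment table: one entry deviates strongly.
signals = [
    ("C1", 128.5, 129.1),
    ("C2", 45.2, 68.9),   # deviation ~23.7 ppm -> flagged
    ("C3", 170.3, 171.0),
]

for atom, exp, pred, dev in flag_outliers(signals):
    print(f"{atom}: exp {exp} ppm, pred {pred} ppm, deviation {dev:.1f} ppm")
```

A flagged signal is only the starting point; distinguishing a genuine misassignment from a prediction problem remains a manual step.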

You said: ***snip*** Errors can be detected and removed by the community (social scientists) ***snip*** In principle true, but the community of social scientists seems to consist mainly of you (contributing to NMRShiftDB AND doing data curation). The second 'social scientist' seems to be Anthony Williams (he verified the errors I had found - and maybe found some additional ones - during the first debate last year). The third social scientist seems to be me, because I do regular checks, and it seems that the most sophisticated checking procedures are available in CSEARCH. (Maybe some people might think that here 'social' should be substituted by 'antisocial' in my case, because my criticism is always very direct ;-))

The error detection and curation itself is DEFINITELY NOT based on or limited to OPEN data. In this NMRShiftDB example, I simply created 2 datasets manually and ran them through CSEARCH. I would apply exactly the same procedure if I were a proud user of any other (closed) package, and I would report my findings to the supplier. The only difference might be faster curation in an open system (as long as there is some 'social' NMR spectroscopist like you doing this cumbersome work).