"The astronomical community did not believe we would ever really make the data public," says Mr. Szalay. The typical practice in the mid-1990s was to guard data because it was so difficult to get telescope time, and scholars did not want to get scooped on an analysis of something they gathered.

One incident demonstrates the mood at the time. A young astronomer saw a data set in a published journal and wanted to reanalyze it, so he asked his colleague for the numbers. The scholar who published the paper refused, so the junior scholar took the published scatterplot, guessed the numbers, and published his own analysis. The original scholar was so upset that he called for the second journal to retract the young scholar's paper.

Mr. Szalay said that astronomers changed their minds once the first big data sets hit the Web, starting with some images from NASA, followed by the official release of the first Sloan survey results in 2000.

I was surprised by that anecdote, but then I only started working in physics in '97. I do recall, though, converting one figure or another into a table to be able to reuse the data, an extremely annoying procedure, even with suitable software. However, these were figures from decade-old textbooks, whose data I needed in order to check whether a code I had written would produce a sufficiently good fit. And 5 years back or so, when I had a phase of sudden interest in neutrino physics, I noticed that while one finds plenty of papers on the results of Monte-Carlo simulations fitting neutrino experiments, the data used is not listed for all experiments. In one case, I ended up browsing a bunch of Japanese PhD theses (luckily in English) until I found the tables in the appendix of one, and then I had to type them in. Not sure how much the situation in that area has changed since. But change is inevitably on its way...

17 comments:

If scientists collect data with public money, they should be expected to release that data to the public without exception, or else look for a research job at some private company.

The same applies to the many scientific journals owned by private companies, which make money simply by keeping public data behind paywalls. Such an approach was understandable in the era of paper-based information, because the paper medium costs money, but now such behavior is evidently immoral and harmful to the rest of society.

Your idealism is all well and nice, but a little naive. If you force scientists to make their data public before they themselves have had a chance to analyze it, then the risk is they'll wonder what's the point of doing the experiment to begin with. They could just as well sit patiently and wait for somebody else to do the work. That's not so much an academic consideration as an economic one. Scientists, as well as funding agencies, need incentives to invest money, time and effort. That you, as a taxpayer (I presume), want somebody to do this and that isn't going to convince anybody. You'll have to make the case that it's a change in the interest of society, and for that you need to consider its potentially adverse effect on the progress of research in the scientific community. Best,

Astronomy caved when only distributed computation (tens of thousands of eyeballs) had sufficient capacity to analyze data tsunamis. Accelerator physics is self-contained (if both the analysis model and the interaction model from which it evolves are valid). SETI was wholly external.

Have the idea, generate the data, analyze the data, make sense of it all; secure grant funding. A division of labor is evident. PIs get glory, grad students get degrees, everybody else is euchred.

/*...if you force scientists to make their data public before they themselves have had a chance to analyze it..*/

I didn't say anything like that.

But at the moment they publish their first analysis of it, they should make the underlying data available, too. If for no other reason, then for the sake of the reproducibility and falsifiability of their results.

BTW, in general, experimentalists aren't the best arbiters of an independent analysis of their own data. Separating measurement from analysis would eliminate many cases of more or less intentional misconduct.

Therefore, making results available is also a matter of public oversight.

It is indeed about cross-pollination between different branches of science, so that each can lend the other an advance in thinking while a scientist individually pursues the distinctive branch of the trade he or she has focused on.

In my opinion it can be helpful to combine such efforts in order to see in new ways?

Bee, as a lay person who has had a deep interest in these sciences over the years, I find that the beautiful pictures cannot fully convey the work of the many colleagues of yours who present their results as illuminating images. Still, such pictures help us see the real vastness of the cosmos with real-time data, and experimental data processing helps us understand the world we live in better.

In January of 2004, Ben Segal and François Grey of the IT Department were asked to plan an outreach event for CERN’s 50th anniversary that would allow people around the world to get an impression of the computational challenges facing the LHC. Ben and François got in touch with Dave Anderson, the Director of SETI@home, who was just beginning to test the new BOINC platform his team had developed. At the same time, a couple of Danish students got in touch with François, eager to find an exciting project for their Master's thesis. This was the beginning of LHC@home. Christian Søttrup and Jakob Pedersen worked furiously all spring and summer to get SixTrack and BOINC to function together. You can read their thesis, which describes the opportunities for combining public resource computing, such as LHC@home, with Grid computing like the LHC Computing Grid. (LHC@Home)

For example, such thinking has been applied to data mining (the use of your screens) for SETI and for LIGO data? I vaguely remember this association with public input, but I could be wrong here.

So the idea exemplified in the article, together with the search across the vast ocean for Mr. Gray, lends support to what can be asked of the public. The public can stand in for his colleagues, as it did at the time, as shown in the article you linked.

The goal of the Large Hadron Collider (LHC) computing grid is to link roughly 6,000 scientists so they can perform large-scale experiments and simulations to help the world better understand subatomic particles. The grid will ultimately link more than 200 research institutions.

"This service challenge is a key step on the way to managing the torrents of data anticipated from the LHC," Jamie Shiers, manager of the service challenges at CERN, said in a statement. "When the LHC starts operating in 2007, it will be the most data-intensive physics instrument on the planet, producing more than 1,500 megabytes of data every second for over a decade." (Sorry, link dead. Bold added for emphasis.)

Everything that you need is on the web: Cosmology, a collection of 70/80/90-year-old postulates [a fancy word for assumptions], vigorously defended by adherents of these several quaintly bizarre constructs, is now stagnating unexamined in a highly compartmentalized collection of fantastic data sets, thereby obscuring the path to understanding. These postulates have less physical support than any religion and consequently require a much more rigorous defense system: heresy; damnation to the unbeliever. Heresy be damned. Feynman's [start from scratch] principle, Sven's [Occam's razor] approach and modern NASA technology "on the web" come to the rescue and provide us with a very simple view of our Universe. Abstract of Sven's 12-year research of NASA and other equivalent data: Triangulation and constructive limitations mapping the universe, using the coordinates of the Hubble North and South Deep Field studies and the WMAP Eridanus Cold Spot observations, combined with the physics of light's line of sight, greatly restrict the geometry and location of Earth, along with our related local family of galaxies, to a position right next to the epicenter of the Big Bang's explosive conversion of Dark Energy into matter as demonstrated by Stanford Labs. We are some 0.2% of the distance from the epicenter to the CMB. An expanded treatment of this article is available at my web site: Center of the Universe Located by Triangulation of NASA Data 9/25/10 http://www.allnewuniverse.com/Center-by-Triangulation.pdf

Sorry for the misunderstanding. It seems, then, that you agree with me as far as the outcome is concerned. However, either way, the point of my comment was that the reason you give to justify your opinion is unconvincing and not well thought through. "The public" wants a lot of things, and sometimes these wishes are in conflict with each other. That's why we don't have direct democracies. If you want to argue for public access to experimental data once the analysis is published, you'll have to make the case that it benefits the scientific process, rather than claiming the public "expects the scientists" to do this and that. Which, I'm quite sure, is not true anyway. The majority of the public doesn't care one way or the other.

Besides, a suitable analysis of raw data is most often pretty much impossible without access to the experiment. The raw data itself would be pretty much useless for anybody else, so a "separation of measurement and analysis" that you ask for simply isn't feasible. Best,

Well, yes, I agree with you, a cross-pollination between the computer sciences and the natural sciences is certainly something that, as one can already see, has a lot of promise. It also has its pitfalls, though, so one needs to watch out. For example, reliance on widespread, shared software increases the risk of systematic mistakes. And, as we discussed several times earlier, crowd-sourcing only works well under certain circumstances, some details of which are not well understood. Best,

Sorry, but I'll have to disagree. It clearly isn't. Both Stefan and I could give you many examples of topics that were poorly, if at all, covered online, and sources that were not available. I'll give you just two examples: 1) I was looking for the original version of Einstein's 1916 letter to Dällenbach, but for all I can tell it's not on the web. This is true for a lot of historic documents, btw. 2) Stefan and I were recently trying to figure out some details of the Lund String model. The Wikipedia entry has basically no content and refers to the paper where the model was first laid out, but that's neither a complete nor a particularly useful reference. The best references in this case are good old-fashioned textbooks. This is the case for many not-so-recent topics in physics if you need some details. You're still better off going to a library and looking up a textbook on heavy-ion physics or waveguides, or whatever it is you're interested in. It is, at least at the moment, far from clear to me whether all that currently missing information will ever make it onto the web. As far as physics is concerned, it's probably just a matter of time, but there's the risk that things that are not considered interesting right now will never make it, and become increasingly forgotten because people believe "everything is on the web"... Best,

An intriguing reference regarding the ongoing struggle to have the work of researchers made completely, immediately and freely available to their colleagues and, more generally, beyond. The truth of it, however, is that this has been a tug of war for hundreds, if not thousands, of years between those who labour to produce results and those who wish to access them.

Your article has me mindful of a particular battle which ensued between Isaac Newton and the first astronomer royal, John Flamsteed, over hastening the publication of Flamsteed’s catalogue of the then observable heavenly bodies before Flamsteed was content that it was complete and concise enough to be published. Essentially, Newton wished to have access to the data to aid his calculations for a more accurate account of the motion of the moon, which he wished to include in the planned second edition of his Principia Mathematica. Flamsteed, on the other hand, thought that he should be the one to decide when his observations were complete enough to be published, to become available and thus be more generally useful. I find it interesting, then, that Szalay holds a position that is sympathetic to both sentiments and has taken steps to act upon it.

”The observatory was founded to the intent that a complete catalogue of the fixed stars should be composed by observations to be made at Greenwich and the duty of your place to furnish the observations. But you have delivered an imperfect catalogue without so much as sending the observations of the stars that are wanting, and I hear the press now stops for want of them. You are therefore desired either to send the rest of your catalogue to Dr. Arbuthnot, or at least to send him the observations which are wanting to complete it, that the press might proceed. And if instead thereof you propose anything else or make any excuses or unnecessary delays it will be taken for an indirect refusal to comply with her Majesty’s order. Your speedy and direct answer and compliance is expected.”

”I have now spent 35 years in composing and work of my catalogue, which may in time be published for the use of her Majesty’s subjects and ingenious men the world over. I have endured long and painful tempers by my night watches and day labours. I have spent a large sum of money above my appointment, out of my own estate, to complete my catalogue and complete my work under my own hands. Do not tease me with banter by telling me yet these alterations are made to please me when you are sensible nothing can be more displeasing nor injurious than to be told so.”

science purely for the intellectual love of it, as distinct from the highly professionalized and career-oriented science of universities and industry) that's enabled by the deluge of highly accessible data, powerful low-cost observational, computational and experimental gear enabled in part by digital technologies, and the ability to collaborate and disseminate results globally using the web.

There are of course parameters that define the extent to which observation has been enhanced through educational understanding, so in a sense, the areas of observation may be consistent with the category of that education?

Do you want to restrict observational data, and the intellectual possibilities inherent in it, to the category of the university, given that perception can be extended as the public carries its education forward beyond universities?

Hanny van Arkel, a Dutch schoolteacher, made a major astronomical discovery using a public Web site of telescope images.

Now that naturalism has become an accepted component of philosophy, there has recently been interest in reassessing Kuhn's work in the light of developments in the relevant sciences, many of which provide corroboration for Kuhn's claim that science is driven by relations of perceived similarity and analogy to existing problems and their solutions (Nickles 2003b, Nersessian 2003). It may yet be that a characteristically Kuhnian thesis will play a prominent part in our understanding of science. (Thomas Kuhn)

While I respect the cautions, I do not like creativity being constrained by the parameters applied to its "own limit of viewing." :)

One has to take solace in the fact that one's observation is constrained by the parameters it sets for itself, and all the world under observation appears as such?

Shall I constrain your observations to your knowledge, or give you the opportunity to grow?