For over a decade, the SETI@home experiment has engaged millions of people in the search for extraterrestrial intelligence by harnessing the spare CPU cycles of project participants to process data collected at the Arecibo radio telescope. The immensely powerful distributed supercomputer formed by SETI@home is enabling the most sensitive and thorough sky survey for extraterrestrial technological radio emission ever performed. Today we are launching a new public-participation SETI project to complement SETI@home: SETI Brainstorm. In addition to offering the opportunity to passively process SETI data, we are now making some of the raw data from our observations available to the world and asking YOU to take a look with your own eyes, ears, and algorithms. Our experiments generate massive amounts of data, often many gigabits per second, so for now we will distribute observations of only a single target. If the response to this project is strong, we will find ways to make even more of our observations accessible.

The initially released data are observations of the binary star system 55 Cancri collected during the commissioning phase of our Kepler Field SETI experiment at the Green Bank Telescope (GBT). 55 Cancri A (the larger of the two stars in the binary system) is known to host at least five extrasolar planets, and there may be more. The 55 Cancri system is about 40 ly away from our own Solar System in the constellation Cancer. 55 Cancri was the target of a directed interstellar communication signal in 2003, scheduled to arrive in 2044.

These data are now available for BitTorrent download using the links at SETI Brainstorm. If you have questions, please post here first. SETI@home and Kepler SETI scientists will be checking in regularly to help. Good luck, and good hunting!

Our hope is that interested participants will search these data using their own ideas and algorithms, create software tools for visualizations and perhaps audio realizations of the data and share their results back with the community. Signal detection in time series data is a challenge in many branches of science and engineering, and we know that there are techniques in use in other fields that could be useful for SETI. Moreover, we are frequently asked by both astronomers and the public if raw data is available, and in the past it has been difficult to provide because we didn't have easy to use code to extract and explain our data. This project is also designed to simply make our data more accessible to a wider group of people.
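As a starting point for anyone wanting to build such tools, here is a minimal sketch of one common visualization: channelizing a stream of voltage samples into a time-frequency "waterfall" of power, the kind of display in which narrowband signals show up as bright vertical lines. This is an illustration only, not the project's actual pipeline: the released file format is not assumed here, so synthetic complex noise with a weak injected tone stands in for real data, and reading the downloaded files is left to the reader.

```python
import numpy as np

def waterfall(voltages, n_chan=1024):
    """Channelize a 1-D stream of complex voltage samples into a
    time-frequency power array (a simple FFT 'waterfall')."""
    n_spectra = len(voltages) // n_chan
    blocks = voltages[: n_spectra * n_chan].reshape(n_spectra, n_chan)
    # fftshift puts zero frequency at the center of each spectrum
    spectra = np.fft.fftshift(np.fft.fft(blocks, axis=1), axes=1)
    return np.abs(spectra) ** 2  # power per (time, frequency) cell

# Synthetic stand-in for raw data: complex noise plus a weak carrier
# placed exactly in FFT bin 160 (index 672 after fftshift).
rng = np.random.default_rng(0)
n = 1024 * 256
noise = rng.normal(size=n) + 1j * rng.normal(size=n)
tone = 0.2 * np.exp(2j * np.pi * (160 / 1024) * np.arange(n))
power = waterfall(noise + tone)
print(power.shape)  # (256, 1024)
```

Plotting `power` with any image viewer (e.g. matplotlib's `imshow`) makes the injected carrier visible as a single bright column; averaging over the time axis and looking for outlier channels is the simplest step from visualization toward detection.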

Andrew, it is sometimes difficult for me, who lives in Italy, to understand why there are different SETI projects related to the Berkeley campus of the University of California. I am a SETI cruncher but also an Einstein cruncher, a QMC cruncher, a CPDN cruncher, and a Beta tester of LHC@home 2.0 using VirtualBox. Your project seems to me a carbon copy of setiQuest, which I tried to understand at www.setiquest.org. I downloaded the Open SonATA source code and tried to compile it on my SuSE Linux 11.1. I did not succeed because it requires SuSE Linux 11.3, which I have but do not want to install, at least for now. So I think what you are asking is beyond my skills. I can compile a program like mplayer and mplayerplug-in, but most of the programs I use are executables written and compiled by others. On my BOINC_VM virtual machine I am running CERN programs written in FORTRAN in a Scientific Linux environment. That is all I can do. Cheers.
Tullio

Andrew,
Interesting idea, but I can't see this taking off. People, volunteers, might be useful for spotting something unusual in a visual image displaying a signal on a graph. But offering dot py files to the public and asking them to spot sequences in the numbers will get you nowhere.

To be honest, it sounds like you would be better off writing to the editor of one of the astronomy journals or science magazines and asking them to publish a short article asking for participation from the science community to help process the data using new methods.

This will flop if you're asking the public to manually scan pages of numbers and unprocessed data. Take an example from the success they have had with Galaxy Zoo. Galaxy Zoo is successful because it's an interesting human task to look at pictures.

Thanks for your thoughts. Our intent is not for the general public to mechanically scan pages of numbers. Hopefully the smaller fraction of people with some experience in signal processing and electrical engineering will be able to put together tools to translate the data into a form where the less technically inclined majority can participate. We are working on ways to do this within our group, but we would love to have your help. Galaxy Zoo is a fantastic project, but we are still experimenting with depictions of our data that would be effective at engaging the public's minds as well as their machines. Again, one of the goals of this project is to 'brainstorm' ways of looking at and thinking about our data. If you have some specific ideas, please share them!

I think that this initiative is worthwhile. SETI has limited funds and a finite number of people to work on the ever-increasing amount of data that we continue to feed into them. Providing some of this raw data to others allows hundreds, maybe thousands, of other scientists and gifted people to have a look at it and analyse it, perhaps in different ways than SETI does.

It is just extending the basic principle of distributed computing a little further. DC computing collects the data, now DC computing is distributing the most promising parts of it back, for further analysis.

Good Luck!

Those are my principles, and if you don't like them ... well, I have others.
Groucho Marx 1895-1977

I also have mine, and if you don't like them ... tough, live with it.
Chris S 2017

This is a fantastic initiative and exactly the type of openness in science we need. There are literally millions of IT professionals globally. Many participate (as in, actually write software) in a wide variety of open source projects - think Linux, Firefox, OpenOffice, BOINC - we are talking millions of lines of code donated by qualified developers. Almost all of them grew up on Star Trek, run SETI@home, and keep up with news on Kepler. Some, such as myself, even have a statistical programming background and run multi-million dollar IT projects. You have plenty of potential talent.

What you need is better marketing. The only reason I heard about this effort is because I randomly happened upon the forums. You need to get the word out to the right crowd. You also need to find a few key people who can form a "backbone" group that can answer questions about development environments, useful libraries, kick around algo ideas, database structures, etc...

The only downside is that you're going to initially get a huge amount of skepticism around a project like this. Most scientists or software developers don't think the "public" can seriously contribute to science and typically exist in a culture where your data, your technology, and your methods must be strictly guarded (maybe the reason you were only able to release one small piece of data). What if someone publishes before us?!
These folks won't understand a crowdsourcing project.

I strongly encourage you to keep this idea alive despite the nay-sayers.

Several years ago I happened to crunch workunits with Seti@home v3.08 and returned 486 classic workunits back to the server.

If a typical SETI@home user is allowed access to the material that has obviously been collected over the years, should this user be able to draw his or her own conclusions from the material that is readily available to him or her?

What if something should be visible in this material?

It's a good point. We should probably have a few different data samples, some containing known signals (maybe pulsars, satellites, or terrestrial interference), so we can test any algorithms we develop.

...Although part of the point is for us to come up with altogether new methods of data analysis that would vet the data in a new way.
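One way to run that kind of self-test, sketched below under assumptions of my own (the function name, channel count, and 5-sigma cut are all illustrative, not anything from the project): inject a known narrowband carrier into synthetic noise, then check that a simple outlier detector on the time-averaged spectrum recovers it. Any algorithm we brainstorm here could be validated the same way before being pointed at the real data.

```python
import numpy as np

def detect_narrowband(voltages, n_chan=1024, sigma=5.0):
    """Flag frequency channels whose time-averaged power exceeds the
    median channel power by `sigma` standard deviations - a crude
    test for persistent narrowband carriers."""
    n_spec = len(voltages) // n_chan
    blocks = voltages[: n_spec * n_chan].reshape(n_spec, n_chan)
    power = np.abs(np.fft.fft(blocks, axis=1)) ** 2
    avg = power.mean(axis=0)               # average spectrum over time
    mu, sd = np.median(avg), avg.std()
    return np.flatnonzero(avg > mu + sigma * sd)

# Self-test: complex noise with one carrier injected exactly in bin 200.
rng = np.random.default_rng(1)
n = 1024 * 128
data = rng.normal(size=n) + 1j * rng.normal(size=n)
data += 0.3 * np.exp(2j * np.pi * (200 / 1024) * np.arange(n))
hits = detect_narrowband(data)
print(hits)  # [200]
```

A real test suite would also inject drifting and pulsed signals, which this single-channel threshold would miss; that is exactly the kind of gap new algorithms could fill.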

Maybe we could develop a distance-adjusted power threshold (i.e., the more distant the target, the lower the received power needed to trigger interest)?
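To make the idea concrete, here is a toy sketch of such a scaling, assuming the simplest possible model: a transmitter of fixed power falls off with the inverse square of distance, so the trigger threshold shrinks by the same factor. The function name, the 10-light-year reference distance, and the unit threshold are all made-up parameters for illustration, not anything the project has defined.

```python
def distance_adjusted_threshold(distance_ly, ref_distance_ly=10.0,
                                ref_threshold=1.0):
    """Scale a detection threshold by the inverse-square law: a fixed
    transmitter appears (ref_distance/distance)^2 times fainter at a
    larger distance, so the threshold shrinks by the same factor."""
    return ref_threshold * (ref_distance_ly / distance_ly) ** 2

# A target at 40 ly (roughly 55 Cancri) gets a 16x lower threshold
# than one at the 10 ly reference distance.
t = distance_adjusted_threshold(40.0)
print(t)  # 0.0625
```

In practice the threshold would be bounded below by the instrument's noise floor - beyond some distance, lowering it further just admits noise - so a real scheme would clamp this value rather than let it shrink indefinitely.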

1. What license will the data be released under? It's imperative that a good licensing model is selected from the outset.

2. I have a good book on digital signal processing, generously donated by an ace programmer at SETI Institute. I would like to write a series of articles on this topic to get others started. Is there any chance that someone at SETI@home could please set up a wiki for the project where I could post this information?

Your project seems to me a carbon copy of setiQuest, which I tried to understand at www.setiquest.org.

I see a couple of differences.

On the upside, this initiative is publicly funded. setiQuest is privately funded, and SETI Institute, which runs it, is intimately reliant on being able to tell a story to its private donors. Their greatest asset is star power - you've probably heard all the stories about Carl Sagan's tales of the cosmos, Frank Drake's equation and seminal SETI experiments, and not least Jill Tarter as supposedly depicted by Jodie Foster in Contact. These are all parts of SETI Institute's brand, and because this is their main source of revenue, when faced with a decision between star power and science, unfortunately, the former often wins out. One must hope that UC Berkeley, as a public research institution, is able to prioritize differently.

Also, SETI@home already has an extensive infrastructure for continuously provisioning millions of computers on the internet with SETI data. Perhaps, at some point in the future, this initiative will be allowed to piggyback on that infrastructure, so the standard SETI@home data can be routinely processed by complex pipelines of any sophisticated new algorithms we come up with here.

On the downside, UC Berkeley has no capability for real-time follow-up on candidate signals. This means that even if we do find something, unless the signal is very persistent, we have no way of investigating it beyond the initial data record. Perhaps in the future, however, data from multiple beams will be made available (who knows?). That would make it easier to rule out a lot of RFI from the results.