About sixteen months ago, I launched the SciPy Documentation Project
and its Marathon. Dozens pitched in and now numpy docs are rapidly
approaching a professional level. The "pink wave" ("Needs Review"
status) is at 56% today! There is consensus among doc writers that
much of the rest can be placed in the "unimportant" category, so
we're close to starting the review push (hold your fire, there is a
web site mod to be done first).

We're also nearing the end of the summer, and it's time to look ahead.
The path for docs is clear, but the path for SciPy is not. I think
our weakest area right now is organization of the project. There is
no consensus-based plan for improvement of the whole toward a stated
goal, no centralized coordination of work, and no funded work focused
on many of our weaknesses, notwithstanding my doc effort and what
Enthought does for code.

I define success as popular adoption in preference to commercial
packages. I believe in vote-with-your-feet: this goal will not be
reached until all aspects of the package and its presentation to the
world exceed those of our commercial competition. Scipy is now a
grassroots effort, but that takes it only so far. Other projects,
such as OpenOffice and Sage, don't follow this model and do produce
quality products that compete with commercial offerings, at least on
open-source platforms. Before we can even hope for that, we have to
do the following:

- Docs
  - Rest of numpy reference pages reviewed and proofed or marked
    unimportant
  - Scipy reference pages
  - User manual for the whole toolstack
  - Multiple commercial books
- Packaging
  - Personal Package Archive or equivalent for every release of every
    OS for the full toolstack (There are tools that do this but we
    don't use them. NSF requires Metronome -
    http://nmi.cs.wisc.edu/ - for funding most development grants, so
    right now we're not even on NSF's radar.)
  - Track record of having the whole toolstack installation "just
    work" in a few command lines or clicks for *everyone*
  - Regular, scheduled releases of numpy and scipy
  - Coordinated releases of numpy, scipy, and stable scikits into PPA
    system
- Public communication
  - A real marketing plan
  - Executing on that plan
  - Web site geared toward multiple audiences, run by experts at that
    kind of communication
  - More webinars, conference booths, training, aimed at all levels
  - Demos, testimonials, topical forums, all showcased
- Code
  - A full design review for numpy 2.0
    - No more inconsistencies like median(), lacking "out", degrees
      option for angle functions?
    - Trimming of financial functions, maybe others, from numpy?
    - Package structure review (eliminate "fromnumeric"?)
    - Goal that this be the last breakage for numpy API (the real 1.0)
  - Scipy
    - Is it maintainable? should it be broken up?
    - Clear code addition path (or decide never to add more)
    - Docs (see above)
  - Add-on packages
    - Both existence of and good indexing/integration/support for
      field-specific packages
    - Clearer development path for new packages
    - Central hosting system for packages (svn, mailing lists, web,
      build integration, etc.)
    - Simultaneous releases of stable packages along with numpy/scipy

I posted a basic improvement plan some years back. The core ideas
have not changed; it is linked from the bottom of
http://scipy.org/Developer_Zone. I chose our major weakness to begin
with and started the doc project, using some money I could justify
spending simply for the utility of docs for my own research. I funded
the work of two doc coordinators, one each this summer and last.
Looking at http://docs.scipy.org/numpy/stats/, you can see that when a
doc coordinator was being paid (summers), work got done. When not,
then not. Without publicly announcing what these guys made, I'll be
the first to admit that it wasn't a lot. Yet, those small sums bought
a huge contribution to numpy through the work of several dozen
volunteers and the major contributions of a few.

My conclusion is that active and constant coordination is central to
motivating volunteer work, and that without a salary we cannot depend
on coordination remaining active. On the other hand, I have heard
Enthought's leaders bemoan the high cost of devoting employee time to
this project, and the low returns available from selling support to
universities and non-profit research institutes. Their leadership has
moved us forward, particularly in the area of code, but has not
provided the momentum necessary to carry us forward on all fronts. It
is time for the public and education sectors to kick in some resources
and organizational leadership. We are, after all, benefitting
immensely.

Since the cost of employee time is not so high for us in the public
and education sectors, I propose to continue hiring people like Stefan
and David as UCF employees or contractors, and to expand to hiring
others in areas like packaging and marketing, provided that funding
for those hires can be found. However, my grant situation is no
longer as rich as it has been the past two years, and the needs going
forward are greater than in the past if we're now to tackle all the
points above. So, I will not be hiring another doc guru from my
research grants next year.

I am confident that others are willing to pitch in financially, but
few will pitch in a full FTE, and we need several. We can (and will)
set up a donations site, but donation sites tend to receive pizza
money unless a sugar daddy comes along. Those benefitting most from
the software, notably education, non-profit research, and government
institutions, are *forbidden* from making donations by the terms of
their grants. NSF doesn't give you money so you can give it away.

We need to provide services they can buy on subcontract and a means
for handling payments from them. Selling support does not solve the
problem, as that requires spending most of the income on servicing
that particular client. Rather, we need to sell a chunk of
documentation or the packaging of a particular release, and then
provide the product not just to that client but to everyone.

We can also propose directly for federal and corporate grant funds. I
have spoken with several NASA and NSF program managers and with
Google's Federal Accounts Representative, and the possibilities for
funding are good. But I am not going to do this alone. We need a
strong proposal team to be credible.

So, I am seeking a group that is willing to work with me to put up the
infrastructure of a funded project, to write grant proposals, and to
coordinate a financial effort. Members of this group must have a
track record of funded grants, business success, foundation support,
etc. We might call it the SciPy Foundation. It could be based at
UCF, which has a low overhead rate and has infrastructure (like an HR
staff), or it might be independent if we can find a good director
willing to devote significant time for relatively low pay compared to
what they can likely make elsewhere. I would envision hiring
permanent coordinators for docs, packaging, and marketing
communications. Enthought appears to have code covered by virtue of
having hired Travis, Robert, etc.; how to integrate that with this
effort is an open question but not a difficult one, I think, as code
is our strongest asset at this point.

I invite discussion of this approach and the task list above on the
scipy-dev@scipy.org mailing list. If you are seeing this post
elsewhere, please reply only on scipy-dev@scipy.org.

If you are eligible to lead funding proposals and are interested in
participating in grant writing and management activities related to
work in our weak areas, please contact me directly.

Thanks,
--jh--
Prof. Joseph Harrington
Planetary Sciences Group
Department of Physics
MAP 414
4000 Central Florida Blvd.
University of Central Florida
Orlando, FL 32816-2385
jh@physics.ucf.edu
planets.ucf.edu