See my comments within the message below.
Steve Christensen
>
> In article <cmfd5u$7ta$1 at smc.vnet.net>,
> "Steven M. Christensen" <steve at smc.vnet.net> wrote:
>
> > I want to take the opportunity to reply to Paul's suggestion in
> > as much detail as possible.
> >
> > I am sorry I was not at the event at the Wolfram Technology
> > Conference when this was discussed.
> >
> > First, here are the steps I take each day to moderate this group.
> > Figuring out where in these steps to put in categorization would need
> > to fit into this.
> >
> > 1. I get perhaps 2500-3000 emails a day, every day. Of these, perhaps
> > 500 are not spam. Because the Mathgroup addresses are easily found
> > by spammers, there is no way around getting a lot of spam.
>
> Do you mean that the spammers are forging email addresses of MathGroup
> participants and using these to post messages to MathGroup
> (mathgroup at wolfram.com)? I can see how that would make things more difficult
> to filter.
Yes, this happens all the time. Spam comes to mathgroup via mailing
list messages, newsgroup posts, and spammers who have just found addresses
in the newsgroups and archives.
>
> If I understand you correctly, requiring individuals to "register" with
> you, possibly listing multiple email addresses, and bouncing email that
> is not from registered participants, with a message telling them how to
> register, would not work.
No, this would not work. I even get spam from wolfram.com addresses
even though I know it did not come from there. I sometimes get
spam from myself!
>
> Because I usually post from a news reader, my messages have the
> following field:
>
> Newsgroups: comp.soft-sys.math.mathematica
>
> Could this be used as a filter (or do spammers forge this as well)?
Spammers forge every element of posts.
>
> [As an aside, a solution to SPAM needs to be found. To me, it should
> cost money, only of the order of a couple of cents, to send any email
> message. You would need to purchase a valid one-off "e-stamp", using
> some form of encryption technology, from some site (I'm surprised that
> the automatic billing sites have not already done this). Then only valid
> e-stamps would be routed through the network. There are, of course, many
> issues with this proposal ...]
>
> > Further, because MathGroup users often, unfortunately,
> > send html email or other attachments, maybe 10-20 of their mails get
> > filtered by my, fairly sophisticated but not perfect, spam filters into
> > my spam folder.
>
> To me, one of the major limitations of MathGroup is that we cannot
> attach Notebooks (without including them in the body of the message).
Attaching notebooks causes numerous problems.
1. Notebooks as attachments are very often rejected by spam filters,
either at ISPs, at the moderation level, or by end users.
2. Can a Windows user really trust that a notebook attachment is not
a virus or worm? If I were using a Windows machine and saw an
attachment, I would not open it.
3. Many notebooks are very long and some mail systems will not be able
to handle them. Rules about attached notebooks would have
to be devised. Not a simple matter given that I get so many
posts that can't follow even simpler rules.
It is far simpler to have someone put their notebook on a server somewhere
where it can be downloaded and then include a link within the post.
>
> > 2. Of the 500 good emails that get past my spam filters, I then have to
> > filter out those mails that are for Mathgroup. Then, I have to
> > go through the spam folder to find any MathGroup posts that might be
> > there. So, there are usually about 70 emails relevant to MathGroup.
> > Some, maybe 10, do not follow the rules - flames, licensing questions,
> > discussions of other systems, really trivial items, totally
> > non-Mathematica related. In the end, there are 30-60 emails to read
> > in more detail.
>
> Actually, if the Subject line included question categories as is being
> proposed, couldn't you use this as the primary filter (or again, do
> spammers forge this as well)?
Again, spammers will grab email addresses, Subject lines, even
content sometimes. Most of that comes to me where I filter it.
But I have had some reports that people get email from mathgroup
that I did not send.
>
> > 3. Once I decide that the posts are OK, I run them through a number of
> > UNIX scripts and do some more editing to take out unneeded mail headers
> > etc.
> >
> > 4. Then the mails are run through scripts that send them to the
> > newsgroup and the mailing list. One of the scripts adds the
> > numbers to the Subject line of the mail that goes to
> > the mailing list. Note that the [ ] are really needed.
>
> As I read MathGroup in a newsreader or sometimes via Google at
>
> http://groups.google.com/groups?q=comp.soft-sys.math.mathematica
>
> I do not see the numbers or the []. Google seems to handle threading
> better than my newsreader.
>
The [mg ... ] numbers only go out to the mailing list to help with
filtering. They will not be seen in the newsgroup or on google.
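
As a rough illustration of the numbering step, here is a minimal Python
sketch. It is not the actual script used for the list. The tag format
"[mgN]" and the names MG_TAG, tag_subject and has_mg_tag are assumptions
made for the example.

```python
import re

# Assumed tag format: "[mg" + digits + "]". The real scripts may differ.
MG_TAG = re.compile(r"\[mg\s*\d+\]\s*")


def tag_subject(subject: str, number: int) -> str:
    """Return the subject with exactly one [mgN] tag in front."""
    # Drop any tag already present (e.g. on a reply) so tags do not pile up.
    cleaned = MG_TAG.sub("", subject).strip()
    return f"[mg{number}] {cleaned}"


def has_mg_tag(subject: str) -> bool:
    """Reader-side check: does this subject carry a list tag?"""
    return MG_TAG.search(subject) is not None
```

The brackets matter because a bare "mg" followed by digits could turn up
in ordinary text, while the bracketed form is unlikely to appear by
accident, so a mail filter can match on it safely.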
> The numbers do not appear at
>
> http://forums.wolfram.com/mathgroup/archive/2004/Nov/
>
> until you click on a particular message so I'm not sure exactly how they
> are useful (but then again, I avoid mailing lists and prefer to use
> newsgroups or the web). (And I wonder why the Mathgroup archive is not
> threaded?)
The archive gets its messages from the mailing list and I also think
it just uses a mail to html script and not a threaded system. I do
not do the archive.
>
> > Suppose you just put Statistics in the Subject line, mail filters might
> > not always know how to do the filtering, whereas [Statistics]
> > is easier to filter.
>
> > This process takes from 1-3 hours typically, depending on the
> > number of emails, their complexity, etc.
>
> I did not realise exactly how big a task you face.
Clearly if it weren't for the spam, it would be easier.
>
> > So, the questions are, when during this process would categorisation
> > take place? Who would do it?
>
> It would be best if contributors did such a categorisation for you, i.e.
> at the time of posting.
>
> > What would it look like?
>
> Instead of [], another suggestion would be (mock Mathematica syntax
> using /:), e.g.,
>
> Statistics /: Chi-square test
>
> This would also be harder to forge and should still be easy to filter.
It might be possible if we can define say only 10 categories and
then put the category either in a special header or within the
text of a message. This could be done in a voluntary way by
the person sending the post.
If people want to send me a list of 10 categories, I can collect
them and see if there really are 10 or maybe 100, which would
be silly I think.
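
To make the header-or-text idea concrete, here is a minimal Python
sketch using the standard email module. The header name "X-Category",
the "Category:" body line, and the small CATEGORIES set (taken from the
examples in the original suggestion) are all assumptions, not an agreed
convention.

```python
import re
from email import message_from_string

# Illustrative list only; the real list would have to be agreed by the group.
CATEGORIES = {"Graphics", "Numerics", "Programming", "Statistics"}

# Assumed convention for the in-text form: "Category: Name" on its own line.
BODY_LINE = re.compile(r"^\s*Category:\s*(.+?)\s*$", re.IGNORECASE)


def category_of(raw_message: str):
    """Return the declared category, or None if none is declared or known."""
    msg = message_from_string(raw_message)

    # 1. Prefer the special header if the sender supplied one.
    header = msg.get("X-Category")
    if header and header.strip() in CATEGORIES:
        return header.strip()

    # 2. Otherwise look at the first non-empty line of a plain-text body.
    if not msg.is_multipart():
        for line in msg.get_payload().splitlines():
            if line.strip():
                match = BODY_LINE.match(line)
                if match and match.group(1) in CATEGORIES:
                    return match.group(1)
                break
    return None
```

Because the category is voluntary, a post with no declaration simply
returns None and goes through as it does today. As with every other
field, a spammer could forge this one too.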
Another idea would be for someone clever to write a script that
could categorize a post. For example, all words in a post
could be extracted to a list and then compared to a list of
categories and those categories that fit could be chosen
and put on say the top line of the post to help with filtering.
Some posts might not be easy to treat in this way, but it might
help.
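
A minimal Python sketch of the word-matching idea, for illustration
only. The KEYWORDS table is invented for the example. A real one would
need far more words per category and would still misfile some posts.

```python
import re

# Hypothetical keyword lists; both the categories and the words are examples.
KEYWORDS = {
    "Graphics": {"plot", "plot3d", "graphics", "axes"},
    "Statistics": {"mean", "variance", "regression", "chi", "fit"},
    "Numerics": {"ndsolve", "nintegrate", "precision", "findroot"},
}


def categorize(text: str, min_hits: int = 1) -> list:
    """Return the categories whose keywords appear in the post, best match first."""
    # Extract all words in the post as a lower-case set.
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    # Count how many keywords of each category occur in the post.
    hits = {cat: len(words & kws) for cat, kws in KEYWORDS.items()}
    matching = [cat for cat, n in hits.items() if n >= min_hits]
    # Most hits first; ties broken alphabetically for a stable result.
    return sorted(matching, key=lambda cat: (-hits[cat], cat))


def add_category_line(text: str) -> str:
    """Put the chosen categories on the top line of the post, if any were found."""
    cats = categorize(text)
    if not cats:
        return text
    return "Categories: " + ", ".join(cats) + "\n" + text
```

Raising min_hits makes the script more conservative, at the cost of
leaving more posts uncategorized.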
Paul, this is your suggestion and you are known to be very clever, want to
write such a program?
In truth, I don't think I want to do anything unless there is
a significant vote from end users to do it and a nice way
to handle it consistently.
>
> > How would it affect mail and newsgroup readers?
>
> I imagine that it would have little effect, except the desired one of
> allowing better filtering.
>
> > I think it would be a bad idea to put things like [Statistics] in
> > the Subject line. Would newsgroup and mail readers be able to
> > thread such Subject lines?
>
> Surely that is exactly what they are designed to do.
>
> And I could filter the messages into subfolders of my MathGroup folder
> automatically.
>
> > It might be better to put it in something like an X-Category mail header,
> > but I am not sure that all readers could handle this.
>
> This idea has merit and, again, it might be harder to forge, but I don't
> know enough about these headers.
>
> > Personally, I think they would just make the Subject lines longer
> > and harder to read.
>
> Nested Re: Re: Re: ... already does this, though Google handles this
> very well, in its threading, dropping all Re at the top level, listing
> only the subject, and then listing the contributor for each item in the
> thread.
Yes, the Re Re Re is a problem and I will try to fix that.
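
One possible way to do that, sketched in Python. This is an assumed
approach, not the fix that was actually applied, and the names
LEADING_RE and normalize_subject are made up for the example.

```python
import re

# One or more leading "Re:" prefixes, in any letter case, with optional spaces.
LEADING_RE = re.compile(r"^(?:\s*re\s*:\s*)+", re.IGNORECASE)


def normalize_subject(subject: str) -> str:
    """Collapse a run of leading "Re:" prefixes into a single "Re: "."""
    stripped, count = LEADING_RE.subn("", subject)
    stripped = stripped.strip()
    # Keep one "Re: " only if the subject was a reply in the first place.
    return f"Re: {stripped}" if count else stripped
```

Subjects that merely begin with the letters "Re", such as "Regression",
are left alone because the pattern requires the colon.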
>
> > Who is going to do the categorisation?
>
> The contributor.
>
> > I know a lot about
> > Mathematica and mathematics, but certainly not enough to figure
> > out what every message best fits into. If I make a poor selection
> > and a message has gone out it is virtually impossible to re-do
> > the categorization in the newsgroups, mailing list, google group
> > listings, archives, etc.
>
> Sometimes categorizations have to change. You could have
>
> Numerics -> Graphics /: Accurate plotting
>
> when there is such a change.
>
> > Search therefore becomes inaccurate very quickly.
>
> I don't think that this is true.
>
> > What if someone disagrees with my selections?
>
> Not a big issue, I think. I think the group will come to consensus on a
> categorization, or move on to a different categorization as required.
>
> > How much time will this add to moderation?
>
> I would hope that it would greatly _reduce_ your moderation time.
I can assure you that adding more complexity to the posts will
increase moderation time.
>
> > If others select the categories to help me out, that will just
> > delay moderation.
>
> I do not see why.
>
> > Maybe, we can urge the person who originally writes the message to select
> > a category, but how does a new user know what category to pick?
>
> There should be a list in the rules section at
>
> http://smc.vnet.net/mathgroup.html
>
> > What if a user forgets to include a categorisation?
>
> You can add one.
>
> > Is someone going to go back and categorise the 51,000 messages that
> > are already in the archive?
>
> Unlikely, I think. However, I expect that the archive has grown
> exponentially and will continue to do so.
>
> > The simplest thing to do would be to have some group that is willing
> > to categorise the posts once they get into the Wolfram Research
> > archive only. Then search could be done fairly easily.
> >
> > This sort of categorisation may be done in other newsgroups, but
> > I have not seen it.
>
> I expect that it is used on other newsgroups, but I have not seen it, or
> there are subgroups.
>
> sci.math
> sci.math.symbolic
If you look at these groups you will find no real categorisation
of any kind. I could not find any group that had any.
>
> > I am open to suggestions and comments, but I frankly think this
> > is going to be a very difficult process to do.
>
> It was intended as a suggestion to reduce your workload, to speed up the
> rate of posting to MathGroup, and to improve the automatic filtering
> (and threading) of messages.
>
> Cheers,
> Paul
>
>
> >
> >
> > Hi all, and especially Steve Christensen:
> >
> > At the recent Wolfram Technology Conference in Champaign, Luc Barthelet
> > <lucb at ea.com>, a regular user of MathGroup suggested that it would be
> > good if all postings to MathGroup included a categorisation in their
> > header, e.g.
> >
> > Newbies, Graphics, Functions, Programming, Statistics, Teaching,
> > Integration, Numerics, Symbolic Algebra, Special Functions, ...
> >
> > so a Subject line might take the form
> >
> > [Statistics]: How to fit to an elliptical function?
> >
> > (not sure if the [ ] are required or useful). In this way, sorting by
> > Subject would be easier. Of course, it's not always easy to do such a
> > categorisation, and they may change with time (as a problem stated as a
> > Numerics might end up being solved using Symbolic Algebra).
> > Nevertheless, I think such a change would be very useful. It should also
> > help when doing searches on MathGroup archives.
> >
> > Cheers,
> > Paul
>
> --
> Paul Abbott Phone: +61 8 6488 2734
> School of Physics, M013 Fax: +61 8 6488 1014
> The University of Western Australia (CRICOS Provider No 00126G)
> 35 Stirling Highway
> Crawley WA 6009 mailto:paul at physics.uwa.edu.au
> AUSTRALIA http://physics.uwa.edu.au/~paul
>