PAD Spam filter

Disclaimer

This essay does not describe an existing computer program, just one that should exist. It is about a suggested student project in
Java programming and gives a rough overview of how it might work. I have no source, object code, specifications, file layouts or anything
else useful for implementing this project. Everything I have prepared to help you is right here.

This project outline is not like the artificial, tidy little problems you are spoon-fed in school, where all the facts you need are included, nothing extraneous is mentioned, the answer is
fully specified and hints nudge you toward a single expected canonical solution. This project is much more like the real world of messy problems, where it is up to you to fully
define the end point (or a series of ever more difficult versions of this project) and to research the information yourself to solve them.

Everything I have to say to help you with this project is written below. I am not prepared to help you implement it or to give you any additional materials. I have too many
other projects of my own.

Though I am a programmer by profession, I don’t do people’s homework for them. That just robs them of an education.

You have my full permission to implement this project in any way you please and to keep all the profits from your endeavour.

Please do not email me about this project without reading the disclaimer above.

If you run a website such as the ones in this list, you will be bombarded by
requests from people wanting you to list their programs. The requests arrive in the form of
URLs (Uniform Resource Locators) pointing to PAD files. Unfortunately,
most of this is junk, a form of spam. Some are advertisements disguised as programs. Some
are harmful trojan programs. Some are porn videos. Some are ebooks. Some are useless
junk. It takes an inordinate amount of time to sift through all this to find suitable
programs to list. What the world needs is a filtering mechanism specialised for
PAD (Portable Application Description) files. What mechanisms might it use?

A Bayesian filter on the various PAD
fields.

A blacklist/whitelist of hosting websites.

A collaborative list of judgements on various PAD
URLs, websites and other identifying indicators. Participants see a histogram of how
other sites adjudicated each PAD.

A PAD
verifier to make sure all fields are present and correctly filled in.

A list of certified URLs
that you personally research and guarantee to be spam-free, virus-free and trojan-free.
For this treatment, you might select PADs that others have rated highly.
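The Bayesian filter is the most technical item on that list. Here is a minimal sketch of one in plain Java, assuming the PAD has already been parsed and flattened into a map from field name to text. The class name PadBayes, the tokenising rule and the field names in the example are illustrative, not taken from any existing filter. Each word is tagged with the field it came from, so "free" in a program name is counted separately from "free" in a description.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/**
 * Minimal naive Bayes spam score for a PAD, assuming its fields have already been
 * flattened into a name -> value map (field names used with it are illustrative).
 */
class PadBayes {
    private final Map<String, Integer> spamCounts = new HashMap<>();
    private final Map<String, Integer> hamCounts = new HashMap<>();
    private int spamTokens, hamTokens, spamDocs, hamDocs;

    /** Lower-cases each field value, splits it into words and tags every word with its field name. */
    static List<String> tokens(Map<String, String> fields) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String> e : fields.entrySet()) {
            for (String word : e.getValue().toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
                if (!word.isEmpty()) out.add(e.getKey() + ":" + word);
            }
        }
        return out;
    }

    /** Adds one PAD you have already judged by hand to the training counts. */
    void train(Map<String, String> fields, boolean isSpam) {
        Map<String, Integer> counts = isSpam ? spamCounts : hamCounts;
        for (String t : tokens(fields)) {
            counts.merge(t, 1, Integer::sum);
            if (isSpam) spamTokens++; else hamTokens++;
        }
        if (isSpam) spamDocs++; else hamDocs++;
    }

    /** P(spam | fields), computed in log space with add-one (Laplace) smoothing. */
    double spamProbability(Map<String, String> fields) {
        Set<String> vocab = new HashSet<>(spamCounts.keySet());
        vocab.addAll(hamCounts.keySet());
        int v = vocab.size() + 1; // +1 reserves probability mass for never-seen tokens
        double logSpam = Math.log((spamDocs + 1.0) / (spamDocs + hamDocs + 2.0));
        double logHam = Math.log((hamDocs + 1.0) / (spamDocs + hamDocs + 2.0));
        for (String t : tokens(fields)) {
            logSpam += Math.log((spamCounts.getOrDefault(t, 0) + 1.0) / (spamTokens + v));
            logHam += Math.log((hamCounts.getOrDefault(t, 0) + 1.0) / (hamTokens + v));
        }
        // Combine the two log scores into a probability; logs avoid underflow from many small factors.
        return 1.0 / (1.0 + Math.exp(logHam - logSpam));
    }
}
```

You would train it on PADs you have already judged by hand, then route anything scoring above some threshold you choose to a human for confirmation rather than rejecting it outright.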
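For the blacklist/whitelist of hosting websites, one approach is to look up the PAD URL's host and then each of its parent domains, so that listing example.com also covers files.example.com. HostFilter and its Verdict values are hypothetical names, and the lists are assumed to hold lower-case domain names.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Set;

/** Hypothetical host blacklist/whitelist for PAD URLs that also matches parent domains. */
class HostFilter {
    enum Verdict { WHITELISTED, BLACKLISTED, UNKNOWN }

    private final Set<String> whitelist; // lower-case domain names
    private final Set<String> blacklist; // lower-case domain names

    HostFilter(Set<String> whitelist, Set<String> blacklist) {
        this.whitelist = whitelist;
        this.blacklist = blacklist;
    }

    /**
     * Tries files.example.com, then example.com, then com. The most specific listed
     * name wins; if a name is on both lists, the blacklist wins.
     */
    Verdict check(String padUrl) {
        String host;
        try {
            host = URI.create(padUrl).getHost();
        } catch (IllegalArgumentException e) {
            return Verdict.UNKNOWN; // not a syntactically valid URI; leave it for a human
        }
        if (host == null) return Verdict.UNKNOWN;
        host = host.toLowerCase(Locale.ROOT);
        while (true) {
            if (blacklist.contains(host)) return Verdict.BLACKLISTED;
            if (whitelist.contains(host)) return Verdict.WHITELISTED;
            int dot = host.indexOf('.');
            if (dot < 0) return Verdict.UNKNOWN;
            host = host.substring(dot + 1); // drop the leftmost label and try the parent domain
        }
    }
}
```

Returning UNKNOWN rather than rejecting lets the other mechanisms, such as the Bayesian score, decide the cases the lists do not cover.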
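The histogram of how other sites adjudicated a PAD could be as simple as a count of votes per rating. This sketch assumes a 1 to 5 rating scale, which the essay does not specify, and draws a text bar chart; VoteHistogram is a hypothetical name.

```java
import java.util.List;

/** Tallies other sites' ratings of one PAD on an assumed 1..5 scale and draws a text bar chart. */
class VoteHistogram {
    static final int MAX_RATING = 5;

    static int[] tally(List<Integer> ratings) {
        int[] counts = new int[MAX_RATING + 1]; // counts[r] = number of votes of r; index 0 unused
        for (int r : ratings) {
            if (r < 1 || r > MAX_RATING) throw new IllegalArgumentException("rating out of range: " + r);
            counts[r]++;
        }
        return counts;
    }

    /** One line per rating, best first, e.g. "5 | ## 2" for two votes of 5. */
    static String render(int[] counts) {
        StringBuilder sb = new StringBuilder();
        for (int r = MAX_RATING; r >= 1; r--) {
            sb.append(r).append(" | ").append("#".repeat(counts[r])).append(' ').append(counts[r]).append('\n');
        }
        return sb.toString();
    }
}
```

Showing the whole distribution, rather than just an average, lets a site manager spot a PAD that splits opinion sharply.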
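A PAD verifier starts by parsing the XML and checking that the required elements are present and non-empty. This sketch uses only the JDK's built-in DOM parser and XPath. The element paths are an assumed subset of the PAD layout under its XML_DIZ_INFO root; check them against the PAD specification before relying on them. A real verifier would check many more fields and formats.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

/** Minimal PAD completeness check; REQUIRED is an assumed subset, not the official rule set. */
class PadVerifier {
    static final String DOWNLOAD_URL = "/XML_DIZ_INFO/Web_Info/Download_URLs/Primary_Download_URL";
    static final String[] REQUIRED = {
        "/XML_DIZ_INFO/Company_Info/Company_Name",
        "/XML_DIZ_INFO/Program_Info/Program_Name",
        "/XML_DIZ_INFO/Program_Info/Program_Version",
        DOWNLOAD_URL,
    };

    /** Returns the problems found; an empty list means the PAD passed these checks. */
    static List<String> verify(String padXml) {
        List<String> problems = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Refuse DOCTYPEs so a hostile PAD cannot pull in external entities.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(padXml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            for (String path : REQUIRED) {
                // evaluate() returns "" when the element is absent.
                if (xpath.evaluate(path, doc).trim().isEmpty()) problems.add("missing or empty: " + path);
            }
            String url = xpath.evaluate(DOWNLOAD_URL, doc).trim();
            if (!url.isEmpty() && !url.matches("(?i)https?://\\S+")) problems.add("not an http(s) URL: " + url);
        } catch (Exception e) { // malformed XML, parser configuration or XPath errors
            problems.add("unparseable PAD: " + e.getMessage());
        }
        return problems;
    }
}
```

Returning a list of problems, rather than a bare pass/fail, lets you tell the submitter exactly what to fix.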

Rules

You can only vote on PADs
if you maintain a PAD distribution website. We don’t allow the general
public or program authors to warp the results.

You can get a list of N good PADs
contributed by others that you don’t already have, if you contribute N new good
PADs
to the common pool. You can request specific categories of interest. Normally, you
submit your entire catalog and then get back N entries, where N is the number of your
entries that were new to the pool. You don’t have to participate in this to get the filtering
service. URLs
you submit for filtering are not shared, just the filtering result. If there are not
enough new good PADs
currently available, you bank your points to be redeemed later when there
are.
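The contribute-N, receive-N bookkeeping, including banking points when the pool runs dry, can be sketched in memory. PadExchange, its method names and the one-point-per-new-PAD rule are assumptions drawn from the description above; a real service would keep this state in the server's database, and deciding which PADs count as good happens elsewhere. It uses a record, so it needs Java 16 or later.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** In-memory sketch of the "contribute N, receive N" pool with banked points. */
class PadExchange {
    record Pad(String url, String category) {}

    private final Map<String, Pad> pool = new LinkedHashMap<>();       // URL -> PAD, oldest first
    private final Map<String, Set<String>> catalogs = new HashMap<>(); // site -> URLs it already lists
    private final Map<String, Integer> points = new HashMap<>();       // site -> banked points

    /** Submits a site's catalog of PADs already judged good; earns one point per PAD new to the pool. */
    int submitCatalog(String site, List<Pad> goodPads) {
        Set<String> catalog = catalogs.computeIfAbsent(site, k -> new HashSet<>());
        int fresh = 0;
        for (Pad p : goodPads) {
            catalog.add(p.url());
            if (pool.putIfAbsent(p.url(), p) == null) fresh++; // null means the URL was not yet pooled
        }
        points.merge(site, fresh, Integer::sum);
        return fresh;
    }

    /** Spends points on PADs the site lacks, optionally in one category (null = any); leftovers stay banked. */
    List<Pad> redeem(String site, String category) {
        Set<String> catalog = catalogs.computeIfAbsent(site, k -> new HashSet<>());
        int budget = points.getOrDefault(site, 0);
        List<Pad> delivered = new ArrayList<>();
        for (Pad p : pool.values()) {
            if (delivered.size() == budget) break;
            if (catalog.contains(p.url())) continue;
            if (category != null && !category.equals(p.category())) continue;
            delivered.add(p);
        }
        for (Pad p : delivered) catalog.add(p.url()); // never hand out the same PAD twice
        points.put(site, budget - delivered.size());
        return delivered;
    }

    int balance(String site) {
        return points.getOrDefault(site, 0);
    }
}
```

Only catalogs contributed to the exchange enter the pool, which matches the rule that URLs submitted merely for filtering are not shared.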

Cheating

If we discover someone cheating, as shown by their ratings being much higher or
lower than average, we ban them and effectively withdraw all their adjudications, but
keep the URLs
they submitted (most likely with bad ratings). We also catch cheaters who rate
PADs
highly but don’t list them themselves, or who rate them as stinkers and list them
anyway.
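Both cheating checks can be sketched with plain collections. The first flags raters whose votes sit, on average, far from everyone else's votes on the same PADs; the second counts votes that contradict what the rater's own site lists. The 1 to 5 scale, the cut-offs of 4 for "high" and 2 for "stinker", the threshold parameter and the method names are all assumptions.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Two hypothetical cheating checks; the rating scale and cut-offs are assumptions. */
class CheatDetector {
    /**
     * votes: rater -> (PAD URL -> rating). Flags raters whose mean absolute difference from
     * everyone else's average rating of the same PADs exceeds the threshold.
     */
    static Set<String> outliers(Map<String, Map<String, Integer>> votes, double threshold) {
        Map<String, int[]> totals = new HashMap<>(); // URL -> {sum of ratings, number of ratings}
        for (Map<String, Integer> byUrl : votes.values()) {
            for (Map.Entry<String, Integer> e : byUrl.entrySet()) {
                int[] t = totals.computeIfAbsent(e.getKey(), k -> new int[2]);
                t[0] += e.getValue();
                t[1]++;
            }
        }
        Set<String> flagged = new TreeSet<>();
        for (Map.Entry<String, Map<String, Integer>> rater : votes.entrySet()) {
            double deviation = 0;
            int compared = 0;
            for (Map.Entry<String, Integer> e : rater.getValue().entrySet()) {
                int[] t = totals.get(e.getKey());
                if (t[1] < 2) continue; // nobody else has rated this PAD yet
                // Leave this rater out of the average so a cheater cannot drag it toward themselves.
                double othersMean = (t[0] - e.getValue()) / (double) (t[1] - 1);
                deviation += Math.abs(e.getValue() - othersMean);
                compared++;
            }
            if (compared > 0 && deviation / compared > threshold) flagged.add(rater.getKey());
        }
        return flagged;
    }

    /** Counts votes contradicting the rater's own listings: rated >= 4 yet unlisted, or <= 2 yet listed. */
    static int contradictions(Map<String, Integer> ratings, Set<String> listedUrls) {
        int n = 0;
        for (Map.Entry<String, Integer> e : ratings.entrySet()) {
            boolean listed = listedUrls.contains(e.getKey());
            if ((e.getValue() >= 4 && !listed) || (e.getValue() <= 2 && listed)) n++;
        }
        return n;
    }
}
```

Honest raters disagree now and then, so the threshold would need tuning against real voting data before anyone is banned on its say-so.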

Implementation

This program would most easily be implemented with a Java Web Start interface running
on each client and an SQL (Structured Query Language)
database running on a server, exchanging binary messages. The PAD
site manager would feed it lists of PADs
or PAD URLs. For tighter
integration, it might have a Servlet interface and decide, the
instant a new PAD arrives or even before it is submitted (based on the submitter’s
IP (Internet Protocol) address), whether the
PAD is
acceptable. PAD sites tend to be written in PHP (Hypertext Preprocessor).
To integrate, PAD sites would have to host a Java server as well as a
PHP
server. This could easily be too technically challenging to be acceptable.
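The binary messages between the Web Start client and the server could be framed with java.io.DataOutputStream and DataInputStream. This is a minimal sketch of one possible format: a type byte, an item count, then the items. The message-type codes, verdict codes, MAX_ITEMS limit and the FilterProtocol name are assumptions, not part of any existing service.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical wire format: a "judge these PAD URLs" request and a reply with one verdict byte per URL. */
class FilterProtocol {
    static final byte JUDGE_REQUEST = 1;  // assumed message-type codes
    static final byte JUDGE_REPLY = 2;
    static final int MAX_ITEMS = 100_000; // reject absurd counts from a corrupt or hostile peer

    static void writeRequest(DataOutputStream out, List<String> padUrls) throws IOException {
        out.writeByte(JUDGE_REQUEST);
        out.writeInt(padUrls.size());
        for (String url : padUrls) out.writeUTF(url); // 2-byte length, then modified UTF-8
        out.flush();
    }

    static List<String> readRequest(DataInputStream in) throws IOException {
        expect(in, JUDGE_REQUEST);
        int n = readCount(in);
        List<String> urls = new ArrayList<>(n);
        for (int i = 0; i < n; i++) urls.add(in.readUTF());
        return urls;
    }

    /** Verdicts are in request order, e.g. 0 = accept, 1 = reject, 2 = needs a human (assumed codes). */
    static void writeReply(DataOutputStream out, byte[] verdicts) throws IOException {
        out.writeByte(JUDGE_REPLY);
        out.writeInt(verdicts.length);
        out.write(verdicts);
        out.flush();
    }

    static byte[] readReply(DataInputStream in) throws IOException {
        expect(in, JUDGE_REPLY);
        byte[] verdicts = new byte[readCount(in)];
        in.readFully(verdicts);
        return verdicts;
    }

    private static void expect(DataInputStream in, byte type) throws IOException {
        byte got = in.readByte();
        if (got != type) throw new IOException("expected message type " + type + ", got " + got);
    }

    private static int readCount(DataInputStream in) throws IOException {
        int n = in.readInt();
        if (n < 0 || n > MAX_ITEMS) throw new IOException("bad item count: " + n);
        return n;
    }
}
```

Over a real connection you would wrap the socket’s input and output streams in DataInputStream and DataOutputStream. Sending only the URLs and getting back bare verdicts keeps the client’s catalog private, as the rules above require.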

Likely you would not write a Bayesian filter or PAD
verifier from scratch. Generally, merging existing software requires more skill than
writing from scratch.

However, the biggest hurdle is political. How do you get websites to use it? How do
you get them to share any information with the competition? I have seen this problem
before with the Phoenix project. Its job was to help NGOs in Africa coordinate their
development efforts by letting everyone know what everyone else was doing in any given
area. Everyone said it was a wonderful idea. But it turned out that everyone wanted the
information about what others were doing but was completely unwilling to share what they
were doing. I was shocked at the pettiness of the Red Cross and similar organisations.
Website owners might be equally reluctant to share any information, so you might start
the project without any collaboration features and add them gradually.

Before embarking on this, you might write to all the PAD
vendors in the hassle-free
list or the minor-hassle
list to see whether they would be interested in the service, what they might be willing to
pay and what else they need.

Rather than offering the system entirely free, you might offer it in return for a plug on the
submissions page. That is a great place to tout your company’s expertise to
software developers.