Contents

Made by: Nadir Latif (nadir.latif@yahoo.com)
Dependencies: Uses the NNTP class written by Tony Leake, which is available at http://www.phpclasses.org/browse/file/525.html.
This package is a collection of scripts for extracting news from public NNTP servers. The scripts can extract a list of public NNTP servers from well-known websites, retrieve the list of groups on any given public NNTP server, and download the listings in each group of a server. All downloaded information is stored in a database. For scalability, the large volume of listings is split across multiple tables. The scripts can be used to download very large volumes of listings from public NNTP servers.
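Splitting listings across multiple tables can be illustrated with a minimal Python sketch (not the package's PHP code). It assumes one listings table per NNTP server; the table prefix mp_usenet_listings_ and the column names are hypothetical. Because a table name cannot be passed as a bound SQL parameter, the server name is reduced to letters, digits and underscores before it is used.

```python
import re
import sqlite3

# Hypothetical prefix: the package's real listings table names are not documented here.
LISTING_TABLE_PREFIX = "mp_usenet_listings_"


def listing_table_name(server: str) -> str:
    """Map an NNTP server host name to a safe, per-server SQL table name."""
    # Replace every character that is not an ASCII letter, digit or underscore.
    safe = re.sub(r"[^0-9A-Za-z_]", "_", server.lower())
    return LISTING_TABLE_PREFIX + safe


def ensure_listing_table(conn: sqlite3.Connection, server: str) -> str:
    """Create the listings table for one server if it does not exist yet."""
    table = listing_table_name(server)
    # Safe to interpolate: listing_table_name() leaves only [0-9a-z_] characters.
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {table} ("
        " id INTEGER PRIMARY KEY,"
        " group_name TEXT NOT NULL,"  # hypothetical columns: group, title,
        " title TEXT,"                # description and author of each listing
        " description TEXT,"
        " author TEXT)"
    )
    return table
```

For example, `listing_table_name("news.example.com")` returns `"mp_usenet_listings_news_example_com"`, so each server's listings land in their own table and no single table grows with the total across all servers.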
1) Usage:
-Run the SQL script in the file script.sql. This creates the tables for servers and groups.
-Edit the config.inc file (listed as config.inc.php in the list of files below). It contains the database connection settings for the server on which the save_group_listings.php script is run, as well as the connection settings for the server that runs the rest of the scripts.
-The index.php file lets the user choose which script to run. Copy the files to a directory served by a web server, then run index.php from the browser or from cron, depending on which script is to be run. The scripts are described below:
-get_server_list.php (gets a list of public NNTP servers from well-known websites; can be run directly from the browser)
-save_group_info.php (gets the list of groups for each server obtained by the script above; should be run from cron, see below for details)
-get_group_description.php (gets a description from Google for each group obtained by the script above; should be run from cron, see below for details)
-save_group_listings.php (gets the listings from each group obtained by save_group_info.php; should be run from cron, see below for details)
2) What do the scripts do?
get_server_list.php
Extracts a list of NNTP servers from well-known websites using regular expressions. The list of servers is saved to the mp_usenet_servers table. The script fetches around 2,000 NNTP server names. It only needs to be run once and can be run directly from the browser.
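The regular-expression extraction can be illustrated with a short Python sketch. The pattern is an assumption made for illustration only: it treats any host name whose first label starts with news, nntp or usenet as an NNTP server, which is not necessarily how get_server_list.php matches servers on its source sites.

```python
import re

# Hypothetical pattern: host names such as news.example.com or nntp.example.org.
# Real server-list pages vary, so a pattern like this needs tuning per site.
SERVER_RE = re.compile(
    r"\b((?:news|nntp|usenet)[a-z0-9-]*(?:\.[a-z0-9-]+)+\.[a-z]{2,})\b",
    re.IGNORECASE,
)


def extract_servers(html: str) -> list[str]:
    """Return the unique server host names in a page, in order of first appearance."""
    seen: dict[str, None] = {}  # dicts keep insertion order, giving ordered de-duplication
    for match in SERVER_RE.finditer(html):
        seen.setdefault(match.group(1).lower(), None)
    return list(seen)
```

Requiring at least two dots keeps file names such as news.html from being mistaken for servers, at the cost of missing two-label hosts.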
save_group_info.php
Uses a socket to connect to port 119 of each NNTP server and retrieves the list of group names along with the number of articles in each group. The information is stored in the mp_usenet_groups table. Since retrieving the list of groups from over 2,000 servers can take a while, it is best to run the script as a background task using cron.
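The protocol exchange can be sketched in Python as follows. The package itself uses Tony Leake's PHP NNTP class, whose API is not shown here. The sketch assumes the plain LIST command of RFC 3977, whose response lines have the form "group high low status", and estimates each group's article count from its high and low article numbers. The function names are hypothetical.

```python
import socket


def parse_list_active(lines):
    """Turn LIST lines ("group high low status") into (group, article_count) pairs.

    The count is estimated as high - low + 1, which can overstate it when
    articles in that range have expired or been cancelled.
    """
    groups = []
    for line in lines:
        parts = line.split()
        if len(parts) < 3:
            continue  # skip malformed lines
        try:
            high, low = int(parts[1]), int(parts[2])
        except ValueError:
            continue  # skip lines with non-numeric article numbers
        groups.append((parts[0], high - low + 1 if high >= low else 0))
    return groups


def read_multiline(fp):
    """Read a dot-terminated NNTP multi-line block from a binary file object."""
    lines = []
    while True:
        raw = fp.readline()
        if not raw:
            raise ConnectionError("connection closed before end of list")
        line = raw.decode("latin-1").rstrip("\r\n")
        if line == ".":
            return lines  # a lone dot ends the block
        if line.startswith("."):
            line = line[1:]  # undo dot-stuffing
        lines.append(line)


def fetch_groups(host, port=119, timeout=30.0):
    """Connect to an NNTP server, send LIST and return (group, article_count) pairs."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with sock.makefile("rb") as fp:
            greeting = fp.readline()
            if not greeting.startswith((b"200", b"201")):
                raise ConnectionError(f"unexpected greeting: {greeting!r}")
            sock.sendall(b"LIST\r\n")
            status = fp.readline()
            if not status.startswith(b"215"):
                raise ConnectionError(f"LIST failed: {status!r}")
            groups = parse_list_active(read_multiline(fp))
            sock.sendall(b"QUIT\r\n")
    return groups
```

A call such as `fetch_groups("news.example.com")` (hypothetical host) would return pairs ready to be inserted into mp_usenet_groups.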
get_group_description.php
Uses regular expressions to extract a description from Google for each group in the mp_usenet_groups table. Since fetching a description for every group in the table can take a while, it is best to run the script as a background task using cron.
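Because this job is meant to run in the background, it helps if an interrupted run can resume. The Python sketch below shows one way to do that. It is an illustration, not the script's actual logic: the column names group_name and description in mp_usenet_groups are assumptions, and fetch_description is a hypothetical callback standing in for the regular-expression lookup against Google.

```python
import sqlite3
from typing import Callable, Optional


def fill_missing_descriptions(
    conn: sqlite3.Connection,
    fetch_description: Callable[[str], Optional[str]],
    batch_size: int = 100,
) -> int:
    """Fill in missing group descriptions, committing after every batch.

    Returns the number of groups processed. Progress is committed per batch,
    so a run that is killed part-way resumes at the first unprocessed group.
    """
    processed = 0
    while True:
        rows = conn.execute(
            "SELECT group_name FROM mp_usenet_groups WHERE description IS NULL LIMIT ?",
            (batch_size,),
        ).fetchall()
        if not rows:
            return processed
        for (name,) in rows:
            description = fetch_description(name)
            # Store "" when nothing was found so the group is not fetched again.
            conn.execute(
                "UPDATE mp_usenet_groups SET description = ? WHERE group_name = ?",
                (description or "", name),
            )
            processed += 1
        conn.commit()
```

Marking "not found" with an empty string rather than leaving NULL is what keeps the loop from retrying the same groups forever.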
save_group_listings.php
Used to extract the listings from each group in the mp_usenet_groups table. The listings from each server are stored in a separate table. For each listing, the title, description and author name are stored. The parent/child relationship of listings can easily be determined from the table of listings. Since retrieving the list of messages for hundreds of thousands of groups can take a while, the script should be run as a background task using cron; it cannot be run from the browser. Ideally, several instances of the script should be run from cron, each fetching listings from one NNTP server. The script for getting the listings can be run from cron with the command:
php -f full_path_to_script script_name server_name
where script_name is any name given to the script and server_name is the name of the server on which the script is run (it can be any name).
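In Usenet the parent/child relationship is conventionally carried by each article's Message-ID and References headers, where the last entry in References is the direct parent. The Python sketch below rebuilds threads that way. It assumes, hypothetically, that both headers are stored with each listing; the source does not say which columns the listings tables contain.

```python
def build_threads(articles):
    """Group articles into threads using their Message-ID and References headers.

    articles: list of dicts with "message_id" and an optional "references"
    string (space-separated message IDs, oldest ancestor first).
    Returns (roots, children): thread-starting IDs and a parent -> child IDs map.
    """
    known = {a["message_id"] for a in articles}
    roots = []
    children = {}
    for a in articles:
        refs = a.get("references", "").split()
        # The last reference is the direct parent; if that article was not
        # downloaded, fall back to the nearest ancestor that was.
        parent = next((r for r in reversed(refs) if r in known), None)
        if parent is None:
            roots.append(a["message_id"])
        else:
            children.setdefault(parent, []).append(a["message_id"])
    return roots, children
```

Falling back to the nearest downloaded ancestor keeps replies attached to their thread even when an intermediate article has expired from the server.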
3) List of files:
a) connect.inc (main program file)
b) nntp.inc (main program file)
c) get_group_description.php (main program file)
d) get_server_list.php (main program file)
e) save_group_info.php (main program file)
f) save_group_listings.php (main program file)
g) config.inc.php (main program file)
h) index.php (initial file)
-Feel free to contact me for any assistance regarding these scripts.