
Curl and preg_match

I do not think I am doing this correctly. What I want to do is get the HTML page title from a page into a variable; the next step is to put the URL into a variable as well.
I am trying to create a sitemap but my pages are dynamic and I want something like: http://www.rubblewebs.co.uk/imagemagick/xml/sitemap.php
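
For reference, the title step on its own can be sketched as below. This is only a minimal sketch, not the code I ended up with: `fetch_html()` and `extract_title()` are made-up helper names, it assumes the curl extension is enabled, and a regex like this will miss unusual markup.

```php
<?php
// Hypothetical helper: fetch a page with cURL and return its body, or null on failure.
function fetch_html(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);          // give up after 10 seconds
    $html = curl_exec($ch);
    return $html === false ? null : $html;
}

// Hypothetical helper: return the text of the first <title> element, or null if none.
function extract_title(string $html): ?string
{
    // "i" = case-insensitive, "s" = let "." match newlines inside the title
    if (preg_match('/<title[^>]*>(.*?)<\/title>/is', $html, $matches)) {
        // Decode entities such as &amp; and collapse runs of whitespace
        $title = html_entity_decode($matches[1], ENT_QUOTES, 'UTF-8');
        return trim(preg_replace('/\s+/', ' ', $title));
    }
    return null;
}
```

With the URL already in a variable, something like `$title = extract_title(fetch_html($url) ?? '');` would then put the title into a variable as well.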

This brings me to my next stumbling block: I need to get my links. Some links are relative, e.g. <a href="server/server.php">; some have a class, e.g. <a class="index" href="server/server.php">; and some are up a directory, e.g. <a href="../forum">. I do not think I have any absolute links.
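
One way to handle the relative and "up a directory" links is to resolve each href against the URL of the page it was found on. The sketch below assumes a hypothetical `resolve_url()` helper and an http(s) base URL; it is not a full RFC 3986 resolver (for example, protocol-relative `//host/...` links and hrefs that are only a `?query` or `#fragment` are not handled properly).

```php
<?php
// Hypothetical helper: turn an href found on the page $base into an absolute URL.
function resolve_url(string $base, string $href): string
{
    // Anything with its own scheme (http:, mailto:, ...) is returned unchanged
    if (preg_match('#^[a-z][a-z0-9+.-]*:#i', $href)) {
        return $href;
    }

    $parts  = parse_url($base);
    $origin = $parts['scheme'] . '://' . $parts['host']
            . (isset($parts['port']) ? ':' . $parts['port'] : '');

    // Set any query string or fragment aside so it is not treated as path
    $cut    = strcspn($href, '?#');
    $suffix = (string) substr($href, $cut);
    $href   = (string) substr($href, 0, $cut);

    if (substr($href, 0, 1) === '/') {
        $path = $href; // root-relative link
    } else {
        // Directory of the base page, e.g. /imagemagick/ for /imagemagick/index.php
        $basePath = isset($parts['path']) ? $parts['path'] : '/';
        $path = substr($basePath, 0, strrpos($basePath, '/') + 1) . $href;
    }

    // Collapse "." and ".." segments
    $segments = [];
    foreach (explode('/', $path) as $segment) {
        if ($segment === '..') {
            array_pop($segments);
        } elseif ($segment !== '' && $segment !== '.') {
            $segments[] = $segment;
        }
    }
    $resolved = '/' . implode('/', $segments);
    // Keep a trailing slash when the path pointed at a directory
    if ($segments && preg_match('#/(\.\.?)?$#', $path)) {
        $resolved .= '/';
    }
    return $origin . $resolved . $suffix;
}
```

With a made-up base of `http://www.example.com/imagemagick/index.php`, the href `server/server.php` resolves to `http://www.example.com/imagemagick/server/server.php` and `../forum` resolves to `http://www.example.com/forum`.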

It has taken a while to get back to this as I kept going around in circles!
I found some interesting code on http://www.merchantos.com/makebeta/p.../#put_together. Although I do not know anything about DOM, or curl for that matter, it does what I want, and when I get time I will look into it more.
The code has a couple of "bodges" to get the output I wanted; I could modify the site links to make it better. Again, I may do that later.
Anyway, my code is below. It needs quite a bit of cleaning up, but put simply it reads the links on the web pages and then uses this information to get the page titles, all of which is saved into a database. I am then going to use this to create a sitemap. The next stage is to try to get a cron job running, probably a couple of times a month, to update it.
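
For anyone else new to DOM, the "read the links, then get the title" step can be sketched separately from the full script. This is a minimal sketch and not the code from the linked article: `parse_page()` is a made-up name, and it assumes PHP's DOM extension is available.

```php
<?php
// Hypothetical helper: return the page title and the unique hrefs found in an HTML string.
function parse_page(string $html): array
{
    if (trim($html) === '') {
        return ['title' => null, 'links' => []];
    }

    $dom = new DOMDocument();
    // Real-world HTML is rarely valid, so collect libxml warnings instead of printing them
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $titles = $dom->getElementsByTagName('title');
    $title  = $titles->length > 0 ? trim($titles->item(0)->textContent) : null;

    $links = [];
    foreach ($dom->getElementsByTagName('a') as $anchor) {
        $href = trim($anchor->getAttribute('href'));
        // Skip empty hrefs and same-page anchors such as #top
        if ($href !== '' && $href[0] !== '#') {
            $links[$href] = true; // keyed by href so duplicates are dropped
        }
    }
    return ['title' => $title, 'links' => array_map('strval', array_keys($links))];
}
```

Because the hrefs are read as attributes, a link with a class attribute is picked up the same way as a plain one, and the result can be passed on to whatever stores the rows in the database.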

// ********** Start the code **********
// Connect to the database using the details entered in the variables above
$conn = mysql_connect( $host, $username, $password );
// If the connection cannot be made, stop with an error message
if ( !$conn ) die ( "Could not connect to MySQL server" );
// If the database could not be opened or found print Could not open database
mysql_select_db( $database,$conn ) or die ( "Could not open database" );

// http://www.justin-cook.com/wp/2006/12/12/remove-duplicate-entries-rows-a-mysql-database-table/
// Remove duplicate data based on the url column
// Create a new table with the data from the current table without the duplicates
mysql_query( "CREATE TABLE temp_table AS SELECT * FROM links WHERE 1 GROUP BY url" )
or die( 'Error: CREATE TABLE failed: ' . mysql_error() );
// Delete the original table
mysql_query( "DROP TABLE links" )
or die( 'Error: DROP TABLE failed: ' . mysql_error() );
// Rename the new table to the original name
mysql_query( "RENAME TABLE temp_table TO links" )
or die( 'Error: RENAME TABLE failed: ' . mysql_error() );

// Code to sort out the pages that returned a 404 error. These were caused by links found in files inside a folder, e.g. server/server.php
$query = "SELECT * FROM links ORDER BY title";
$returned = mysql_query( $query ) or die( 'Error, SELECT query for 404 error failed' );