Geograph is a web-based project collecting a large number of geographically located images, with the aim of achieving very broad coverage. The PHP source code is open source to facilitate similar projects in other countries.

This is a summary of some ideas for projects within the Geograph Project, or projects that simply use the Geograph Archive of images. Presented here are some of the more standalone ideas that would each make a good project in their own right.

We have lots of raw data to play with - including pretty pictures, all semantically referenced (primarily location, but also time, subject categorization etc.) - as well as large quantities of textual data. This leads to many possibilities for visualizations, interactive exploration tools, map mashups, and searching/browsing tools. Such tools could easily be reused for other non-Geograph sources of data.

Alternatively, you could choose to work on a project more directly involved with the website itself: enhancing existing features, adding new ones, or making the open-source code even easier to reuse.

Image Search/Browse technology

These projects aim at giving external users a better, faster and more visual browsing experience. The code developed will not be confined to Geograph - other projects collating mixed datasets of visual and textual information such as image or map galleries, conservation projects building databases of recorded species or museum collections will have similar needs and can benefit from these developments.

Sample selection algorithm

Given a reasonably sized collection of images - say in the range of 30 to 5000 images - create a resource-efficient method to generate a representative sample of, say, 20 images. The sample should not contain too many similar images, but rather show a wide range of images without excluding 'minority' sub-collections. The sample should also evolve slowly as more images are added. A bonus would be a few variations of the algorithm to produce a few different samples.

Difficulty: Easy-Medium

Requirements: PHP/MySQL

Skills developed: Data processing/algorithms
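
As one possible starting point (a sketch, not an existing Geograph algorithm), the sample could be built by taking images round-robin from each category, with each category ordered by a hash of the image id. This assumes every image carries a single category label; the names stable_rank and representative_sample and the variant parameter are hypothetical.

```python
import hashlib
from collections import defaultdict
from itertools import zip_longest


def stable_rank(image_id, variant=""):
    """Pseudo-random but repeatable sort key derived from the image id."""
    return hashlib.md5(f"{variant}:{image_id}".encode("utf-8")).hexdigest()


def representative_sample(images, size=20, variant=""):
    """Pick up to `size` ids from an iterable of (image_id, category) pairs."""
    groups = defaultdict(list)
    for image_id, category in images:
        groups[category].append(image_id)

    # Order each category by a hash of the id, so a newly added image slots
    # into the existing order instead of reshuffling it.
    for ids in groups.values():
        ids.sort(key=lambda i: stable_rank(i, variant))

    # Largest categories first; ties broken by name to stay deterministic.
    ordered = [ids for _, ids in sorted(
        groups.items(), key=lambda kv: (-len(kv[1]), str(kv[0])))]

    # Round-robin: one image per category per pass, so small categories
    # are represented before large ones contribute a second image.
    sample = []
    for row in zip_longest(*ordered):
        for image_id in row:
            if image_id is not None and len(sample) < size:
                sample.append(image_id)
    return sample
```

Passing a different variant string changes the hash order and so gives an alternative sample from the same collection. Because the order within a category depends only on the ids, adding a few images tends to change the sample only slightly.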

Faceted Browsing

Because our data is highly categorized, it lends itself well to browsing by so-called facets.
We've previously explored using specific technologies (noted below), but a similar home-grown system could be built too.

Flamenco is a search interface for browsing large information spaces. A prototype installation and dataset was tested here. We have nearly three million photos to display; Flamenco doesn't seem to scale to that number, so the task is to optimize it for larger datasets.

An alternative might be possible with senseidb - a more modern system along similar lines. Again, a prototype installation has been tried with a small collection only; it didn't work when tried with over 10,000 records.

Pivot from Microsoft Live Labs was an impressive faceted browser. It's no longer being developed, but is available here. It may still be feasible to get Geograph data working in the framework.

imdb.cloudmining.net - is an excellent demonstration of the sort of thing that could be done. It is based on the fSphinx framework and SphinxSearch (which is already used heavily by Geograph).

A self built prototype - built using javascript/jquery - interacting with the Geograph API.
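
To illustrate what any of these systems has to compute, here is a minimal in-memory sketch of facet counting. It is only an illustration: the record fields are hypothetical, and at nearly three million photos the counting would have to be pushed into the search engine or database rather than done in a loop like this.

```python
from collections import Counter


def facet_counts(records, selected, facets):
    """Return (matching records, {facet: Counter of values}) for a selection."""
    # Keep only the records that agree with every facet value chosen so far.
    matching = [
        r for r in records
        if all(r.get(name) == value for name, value in selected.items())
    ]
    # For each remaining facet, count how many matching records hold each value.
    counts = {
        name: Counter(r[name] for r in matching if name in r)
        for name in facets
    }
    return matching, counts
```

The counts are what the interface shows next to each facet value, so the user can see how many results a further click would leave.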

Interactive Graph Visualization

Create a visualization front-end to allow browsing of information and photos by exploring links between nodes in a 2D graph network. In particular, it needs an interface to link a graphical front end with the Geograph Database. Screenshots from a TouchGraph prototype. Another demo using the Arbor JS framework.

Difficulty: Easy

Requirements: XML familiarity, PHP/MySQL for XML generation.

Skills developed: UI Design, data processing

Photo Clustering

Given a large collection of images, find an automated way to group images into clusters, for example by geographic location, subject, date or a combination of these. The aim is to make browsing large collections easier. Rather than simply getting a long list of images, the user would get a good overview and be able to drill down into interesting areas. This could be implemented either as a bulk offline process to be displayed later, or as an interactive front-end.
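
The simplest geographic variant can be sketched as grid bucketing. This assumes each photo has an easting and northing in metres; the function name and the tuple layout are hypothetical, and clustering by subject or date would need a different key.

```python
from collections import defaultdict


def cluster_by_grid(photos, cell_size_m=10000):
    """Group (photo_id, easting_m, northing_m) tuples into square grid cells."""
    clusters = defaultdict(list)
    for photo_id, easting, northing in photos:
        # Integer division maps every point to the cell that contains it.
        cell = (int(easting // cell_size_m), int(northing // cell_size_m))
        clusters[cell].append(photo_id)
    return dict(clusters)
```

Drilling down then amounts to re-running the function on one cluster's photos with a smaller cell size, for example 1000 metres instead of 10000.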

Timeline View

Take an arbitrary collection of images (say, the results of a search) and organize them by date taken. This will be particularly useful for aligning images of the same location to identify changes. If there are lots of images, there could be sliders to zoom into specific periods.
This could perhaps be powered by something like SIMILE or timemap.

Difficulty: Easy-Medium

Requirements: PHP/MySQL/HTML

Skills developed: UI development, search/browse technologies.
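
The data side of a timeline can be sketched as bucketing by date at a chosen granularity, which is also what a zoom slider would change. This is an illustration only; the function name and the (image_id, date) pair layout are assumptions.

```python
from collections import defaultdict
from datetime import date


def timeline_buckets(images, granularity="year"):
    """Group (image_id, date_taken) pairs into chronologically ordered buckets."""
    # ISO dates look like 2005-07-14, so a prefix gives the year, month or day.
    prefix_length = {"year": 4, "month": 7, "day": 10}[granularity]
    buckets = defaultdict(list)
    for image_id, taken in sorted(images, key=lambda pair: pair[1]):
        buckets[taken.isoformat()[:prefix_length]].append(image_id)
    # ISO labels sort chronologically as plain strings.
    return sorted(buckets.items())
```

Zooming in would switch the granularity from "year" to "month" or "day" for the period the user has selected.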

Visually Similar Images

An interesting way to browse/search images would be by visual similarity. Firstly, we will need a method to compare images and to find similar/dissimilar images. A number of frameworks exist for comparing images, so the main task would be implementing one of these as a searchable database. A bonus would be a search interface that takes advantage of this data; for example we could either exclude groups of similar images (to get a broad selection), or group/cluster them by similarity (i.e. "find more like this").

Difficulty: Medium

Requirements: Ideally PHP/MySQL and the Linux environment. Experience with basic HTML is a bonus.
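
One well-known comparison technique is the "average hash", sketched below. This is only one option among the frameworks mentioned above, not a chosen design. The sketch assumes each image has already been decoded and shrunk to an 8x8 grid of greyscale values by some image library, which is not shown.

```python
def average_hash(pixels):
    """Turn a small greyscale grid (list of rows) into an integer fingerprint."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        # One bit per pixel: 1 if brighter than the image's mean, else 0.
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(hash_a, hash_b):
    """Number of differing bits; a small distance suggests similar images."""
    return bin(hash_a ^ hash_b).count("1")
```

The fingerprints are plain integers, so they could be stored in a database column. Excluding near-duplicates or finding "more like this" then becomes a matter of comparing Hamming distances against a threshold that would need tuning on real photos.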

'Term' Identification in freeform text

For example, given an image description like "A peaceful reach of the South Esk in winter, taken from the bridge near Clova Hotel.", automatically identify the terms useful for further searching, e.g. in the above "South Esk" and "Clova Hotel" could be considered such terms. A site visitor could use these terms to search for related images in the area. Something like Link might make a good starting point.

Difficulty: Medium

Requirements: PHP/MySQL/HTML. Linguistics

Skills developed: Text processing
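
A very naive baseline, shown here only to make the task concrete, is to pick out runs of two or more capitalised words. It misses single-word names and lower-case terms, so a real solution would need a gazetteer or proper linguistic processing. The function name is hypothetical.

```python
import re

# Two or more consecutive words that each start with a capital letter.
CAPITALISED_RUN = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)+\b")


def candidate_terms(description):
    """Return capitalised multi-word phrases in the order they appear."""
    return CAPITALISED_RUN.findall(description)


text = ("A peaceful reach of the South Esk in winter, "
        "taken from the bridge near Clova Hotel.")
print(candidate_terms(text))  # ['South Esk', 'Clova Hotel']
```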

Related Images

Given the current image, find a list of related images by various means: geo-location, subject, timeline etc. Project deliverables are the algorithm to locate images as well as the interface to display them without being intrusive. This is basically a 'more like this' page, given an arbitrary image as input.

Difficulty: Easy

Requirements: PHP/MySQL/HTML

Skills developed: Code development, UI development. Data Processing
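
One way to combine several of these signals is a simple additive score, sketched below. The field names (e, n, tags), the 5 km cut-off and the equal weighting are all assumptions made for illustration, not a description of existing Geograph code.

```python
import math


def related_images(current, candidates, limit=5, max_distance_m=5000.0):
    """Rank candidates by closeness to `current` plus overlap of their tags."""
    def score(image):
        distance = math.hypot(image["e"] - current["e"], image["n"] - current["n"])
        # 1.0 at the same spot, falling linearly to 0.0 at max_distance_m.
        proximity = max(0.0, 1.0 - distance / max_distance_m)
        ours, theirs = set(current["tags"]), set(image["tags"])
        union = ours | theirs
        # Jaccard overlap: shared tags divided by all tags on either image.
        overlap = len(ours & theirs) / len(union) if union else 0.0
        return proximity + overlap

    scored = [(score(c), c["id"]) for c in candidates if c["id"] != current["id"]]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [image_id for value, image_id in scored[:limit] if value > 0]
```

A date term could be added to the score in the same way, and the weights would need tuning against what visitors actually find useful.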

Develop GeoBrowser further

GeoBrowser is an interactive application for exploring a large collection of images by various means. It's designed to be a standalone project interacting via the Geograph API, running as JavaScript in the user's browser. This could be re-implemented with Link.

Difficulty: Easy

Requirements: Experience with javascript/html a bonus

Skills developed: UI Development, Design, javascript/html

Educational tools and games

The projects in this section develop applications which help students (in a classroom context or otherwise) to learn about maps and geography in a hopefully interesting and enjoyable way. Teachers can use some of these tools to build course materials to provide their students with a personalised and localised take on textbook topics. Code developed in these projects may be reused in projects in other countries, and the course material builder will be applicable to other subjects drawing on visual information.

Map/Photo Games

Having access to large numbers of geolocated images and interactive maps creates lots of opportunities for interactive educational games.

Difficulty: Easy-Medium

Requirements: Javascript/Flash experience ideally

Skills developed: UI development, games, educational software

Possible examples:

Draw a map based on a picture (or a few pictures) of a grid square, then compare it with the map.

Map interpretation game. The player would be shown a map excerpt showing a camera position and view cone. They would then draw on a canvas a schematic diagram of what they see from that point. This would probably make a good smartphone app as it's easier to draw on a touchscreen than with a mouse. It would also need a few icons (houses, woods etc.) to drop on that canvas. At the end, the drawing can be compared with a photo of the scene.

Find features along a route - guide creator

A tool for virtual tourists. The user specifies a route (by drawing it on a map or by uploading a gpx file); the software then translates this into centisquares of the Geograph grid along that route and looks up nearby points of interest in a database. Ideally, a user-configurable filter could determine which features are given prominence in areas where there are a lot of photos in the database. A nice feature would be if the route could be generated from Google Maps' Get Directions facility, so people could plan their travel according to interest en route. This would basically allow the creation of personalised travel guides based on information on Geograph.
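
The route-to-squares step could be sketched as follows, assuming the route has already been converted to grid coordinates in metres and that points of interest are indexed by 100 m cell. Both function names and the dictionary index are hypothetical, and sampling along the line is an approximation that can miss a cell the route only clips at a corner.

```python
import math


def cells_along_route(points, cell_m=100):
    """List the grid cells visited by a polyline of (easting, northing) metres."""
    cells, seen = [], set()
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        length = math.hypot(x1 - x0, y1 - y0)
        # Sample at roughly half-cell spacing along each segment.
        steps = max(1, int(length // (cell_m / 2)) + 1)
        for i in range(steps + 1):
            t = i / steps
            cell = (int((x0 + t * (x1 - x0)) // cell_m),
                    int((y0 + t * (y1 - y0)) // cell_m))
            if cell not in seen:
                seen.add(cell)
                cells.append(cell)
    return cells


def features_along_route(points, features_by_cell, cell_m=100, radius=1):
    """Collect features in the route's cells and their neighbours, in route order."""
    found = []
    for cx, cy in cells_along_route(points, cell_m):
        for dx in range(-radius, radius + 1):
            for dy in range(-radius, radius + 1):
                for feature in features_by_cell.get((cx + dx, cy + dy), []):
                    if feature not in found:
                        found.append(feature)
    return found
```

The user-configurable filter mentioned above would be applied to the returned list, for example by keeping only certain feature types where many features are found.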

Themed collections of images

Something like Link - build a tool to create a categorized and themed collection of images, see Link .
The categorization could be crowd-sourced, but a hierarchy would first need to be defined. A search/browse interface would also be needed.
This article, saved searches and tags are very basic prototypes. The goal is to pre-select large quantities of images to help students and teachers browse geographical images.

Illustrated quiz system

Build a system to allow creation of multiple choice quizzes. Each question or answer can be illustrated with images from Geograph. Users would be able to create and share quizzes. Visitors can fill out quizzes and compete on leaderboards. Prototype

Annotating images and footnotes

This project will help teachers to prepare course materials by being able to annotate and perhaps draw onto images and adding footnotes. Of course there is wider application potential, e.g. outdoor enthusiasts could indicate routes up cliffs and mountains or down river rapids (early prototype).

Curated Collections Creator

Create an interface for users to pick and choose from a large collection of images, to create a highly specific 'Gallery'. The interface should scale to potentially thousands of images, for example by seeding images from keyword search results. The idea is to keep control of the collection without having to copy/paste every single result.
A separate project would be to facilitate browsing of the collection by end users.
This has already been started: Geograph Portals

Difficulty: Easy-Medium

Requirements: PHP/MySQL/HTML

Skills developed: UI development, search/browse technologies.

Website Development

Port the site to a new Country

The site has already been ported to Germany, but there are plenty more countries out there!

Make the generic version of the site truly generic

We have started making a generic version of the code - using the current projects as a starting point. However, it still contains a number of wordings/features specific to one country. Cleaning up this code and putting all strings and messages in common files will make porting the site much easier.

Difficulty: Medium

Requirements: PHP/MySQL/HTML

Skills developed: Code development, version control (SVN). International mapping systems.

Smartphone interface with mapping

A location-aware smartphone app that will allow plotting Geograph coverage data, images (selection narrowed down by user input) and image-specific data on a zoomable map. This needs to work with a number of different mapping and grid systems to allow international roaming coverage, e.g. OpenStreetMap, Ordnance Survey OpenSpace and Google Maps. The app should also allow direct upload while on the move. Here are two screenshots from a prototype developed for Geograph Deutschland.
This could either be a dedicated iPhone, Android etc. app (built to interface with APIs) or a specific HTML version of the website, probably using localStorage for offline use.

Content/Collection search

Besides images, Geograph has a wide range of 'content', such as Articles, Galleries, Local Discussions, Placenames, Shared Descriptions, Routes, and User Profiles. Some of this is geographical (either referring to precise point locations or to ill defined general areas) but not all of it. The goal is to provide a single unified search/browse interface, in particular to 'find interesting stuff near here' or 'about this subject'.

Site-wide Filter

Build a site-wide filter, so that a user could e.g. filter the whole site to only show photos taken during a specific period or showing a specific geography. This would affect the search, maps, check-sheets, leaderboards and general site browsing. The two major tasks here are to identify all the places that could be filtered, and to implement 'namespacing' within each technology (search/map tiles/database query cache/smarty cache). A sort of lightweight version of this is implemented via Portals (http://www.geographs.org/portals/), which create mini-websites that have already been filtered.

Difficulty: Medium

Requirements: PHP/MySQL/Sphinx

Skills developed: coding, development, database, scaling
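
The 'namespacing' idea can be sketched for a cache as prefixing every key with a short fingerprint of the active filter, so filtered and unfiltered views never share cached entries. The function names and the filter layout (a plain dictionary) are assumptions for illustration.

```python
import hashlib
import json


def filter_namespace(site_filter):
    """Short, stable identifier for a filter; 'all' when nothing is filtered."""
    if not site_filter:
        return "all"
    # Sorting the keys makes the same filter always serialise the same way.
    canonical = json.dumps(site_filter, sort_keys=True, separators=(",", ":"))
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()[:12]


def cache_key(base_key, site_filter):
    """Prefix an existing cache key with the namespace of the active filter."""
    return f"{filter_namespace(site_filter)}:{base_key}"
```

The same namespace string could in principle be reused as a directory name for map tiles or as a template cache id, though each technology would need its own integration work.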

Website/Template Redesign

Come up with a fresh new template for the site. In particular, help integrate it into the current framework, working out which features need tweaking to work with the new layout.

Difficulty: Easy

Requirements: UI/Design flair

Skills developed: UI development, html/css, user testing

Other

Bulk data download server

We have an API for making small extracts (up to 1000 results) and site dumps of the whole database offering 2.8M images. There is a niche for mid-size dumps, e.g. getting all of a user's contribution (which may number 50,000 results), or all photos in a hectad (which can be 25,000+ images). This mechanism should perhaps be tailored to delivering between 1000 and 250,000 results in a single download.
This may take the form of an on-demand dump service, i.e. the user submits a 'request' and then the system prepares the dump and lets the user know when it's ready (as the dump could take minutes to prepare). We could perhaps offer a choice of CSV or mysqldump formats.

Difficulty: Easy-Medium

Requirements: PHP/MySQL

Skills developed: APIs / Data processing
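
The request-then-notify flow could look roughly like the sketch below, which keeps everything in memory. In practice the jobs would live in a database table, the CSV would be written to a file, and the user would be notified when the status changes. The class name, the fetch_rows callable and the row fields are hypothetical.

```python
import csv
import io
from collections import deque


class DumpQueue:
    """In-memory stand-in for a request table plus a background worker."""

    def __init__(self, fetch_rows, max_rows=250_000):
        self.fetch_rows = fetch_rows  # callable(query) -> iterable of dict rows
        self.max_rows = max_rows      # upper bound on rows per download
        self.jobs = {}
        self.pending = deque()

    def submit(self, query):
        """Record a request and return its job id immediately."""
        job_id = len(self.jobs) + 1
        self.jobs[job_id] = {"query": query, "status": "queued", "csv": None}
        self.pending.append(job_id)
        return job_id

    def run_next(self):
        """Process one queued request; a cron job or daemon would call this."""
        if not self.pending:
            return None
        job_id = self.pending.popleft()
        job = self.jobs[job_id]
        out = io.StringIO()
        writer = None
        for count, row in enumerate(self.fetch_rows(job["query"])):
            if count >= self.max_rows:
                break
            if writer is None:
                # Take the column names from the first row.
                writer = csv.DictWriter(out, fieldnames=list(row))
                writer.writeheader()
            writer.writerow(row)
        job["csv"] = out.getvalue()
        job["status"] = "ready"
        return job_id
```

Separating submit from run_next is what lets a dump take minutes without holding a web request open.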

Streaming Server/Clients

We maintain the authoritative copy of the "Geograph Archive" in a MySQL database. There are lots of 'interested parties' that would like to maintain their own copy - in close to real time. This could be used to power internal services (e.g. a sphinxsearch RT index), offsite backups (to log files/dumps), or third-party websites (e.g. portals). We also need a server-side component to either 'broadcast' the changes out or just publish them somewhere. Then client adapters can either receive data from the server or contact it periodically, and put the data into their host application (database/index/files etc.).

Maybe use pubsubhubbub - it then needs a 'publisher' script that publishes a feed of updates and notifies subscribers.
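
The 'contact it periodically' variant can be sketched as a numbered change log that clients read from their last known position. This is a toy in-memory model with hypothetical class names; a real server would persist the log and expose it over HTTP, and a pubsubhubbub setup would push notifications instead of relying on polling.

```python
class ChangeFeed:
    """Server side: an append-only log of changes with increasing sequence numbers."""

    def __init__(self):
        self._log = []

    def publish(self, key, data):
        """Record an upsert, or a delete when data is None; return its sequence."""
        seq = len(self._log) + 1
        self._log.append({"seq": seq, "key": key, "data": data})
        return seq

    def changes_since(self, seq, limit=100):
        """Return up to `limit` changes with a sequence number above `seq`."""
        return self._log[seq:seq + limit]


class Replica:
    """Client adapter: applies changes to a local copy and remembers its position."""

    def __init__(self):
        self.rows = {}
        self.last_seq = 0

    def sync(self, feed):
        """Fetch and apply batches until caught up; return the number applied."""
        applied = 0
        while True:
            batch = feed.changes_since(self.last_seq)
            if not batch:
                return applied
            for change in batch:
                if change["data"] is None:
                    self.rows.pop(change["key"], None)
                else:
                    self.rows[change["key"]] = change["data"]
                self.last_seq = change["seq"]
                applied += 1
```

Because each client stores only its own last sequence number, a search index, a backup script and a portal can all follow the same feed at their own pace.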

Project auto-installer

While installing a copy of the site is relatively easy for an experienced web developer, there are lots of dependencies (php/mysql/apache/sphinx/memcache/redis etc.) which need configuring. To simplify this, build an installer that will check for the dependencies, install and configure them if required, and download the latest copy of the site code. The goal would be for someone to get a running copy of the site in less than half an hour!

Difficulty: Medium

Requirements: Linux fundamentals, knowledge of package installers

Skills developed: Linux system administration
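
The 'check for' step could start from something like the sketch below, which looks for each dependency's executable on the PATH. The executable names are assumptions and differ between distributions; installing and configuring the missing pieces is the larger part of the project and is not shown.

```python
import shutil

# Hypothetical mapping of dependency -> executable names to look for.
DEFAULT_COMMANDS = {
    "php": ["php"],
    "mysql": ["mysqld", "mysql"],
    "apache": ["apache2", "httpd"],
    "sphinx": ["searchd"],
    "memcache": ["memcached"],
    "redis": ["redis-server"],
}


def missing_dependencies(commands=DEFAULT_COMMANDS, which=shutil.which):
    """Return the dependencies for which none of the executables is on the PATH."""
    missing = []
    for name, candidates in commands.items():
        if not any(which(candidate) for candidate in candidates):
            missing.append(name)
    return missing
```

The which parameter exists so the check can be exercised without touching the real system, by passing a function that pretends certain programs are installed.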

Access Log Processing

We have years of Apache access logs but have never really analysed them. The analysis should be tailored to the structure of the site, e.g. to aggregate by photo, location or contributor. The project could also identify patterns in how people arrive at the site, and the subjects people search for when they find the site via search engines.

Difficulty: Easy-Medium

Requirements: General scripting; experience with data patterns a bonus

Skills developed: Data processing. Statistics and analytics
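
Aggregating by photo could begin with a sketch like this one, which assumes the logs use Apache's common or combined log format and that photo pages have paths of the form /photo/ followed by a numeric id. Both are assumptions to verify against the real logs, and the function name is hypothetical.

```python
import re
from collections import Counter

# Common log format, with the combined format's referer and agent optional.
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?'
)
PHOTO_PATH = re.compile(r"^/photo/(\d+)")


def photo_hits(lines):
    """Count successful requests per photo id, skipping lines that do not parse."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match or match.group("status") != "200":
            continue
        photo = PHOTO_PATH.match(match.group("path"))
        if photo:
            hits[int(photo.group(1))] += 1
    return hits
```

The referer group captured by the same pattern is where arrival patterns and search engine queries would be mined from, and joining the photo ids against the database would give totals by location or contributor.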

Resources

In fact many of the above projects could be done in isolation as a standalone project, rather than directly integrated into the main codebase. Of course the project SVN Repository etc can be used to hold code, but data could be processed remotely. On the other hand some features would be directly integrated into the website, in which case having a local development version of the site would be essential.

Code: SVN and Downloads (Write access is available for anyone interested in contributing code)

Virtual Machine: We have produced a VMware machine that runs the Geograph Site. This is possibly the easiest way to get going on developing website code. It runs with the free VMware Player software. Once it is running, we can provide database dumps to get a more realistic test environment.