BioGeography is a module under development by [[User:Matzke|Nick Matzke]] for a [http://socghop.appspot.com/program/home/google/gsoc2009 Google Summer of Code 2009] project. It is run through NESCENT's [https://www.nescent.org/wg_phyloinformatics/Phyloinformatics_Summer_of_Code_2009 Phyloinformatics Summer of Code 2009]. See the project proposal at: [http://socghop.appspot.com/student_project/show/google/gsoc2009/nescent/t124022798250 Biogeographical Phylogenetics for BioPython]. The mentors are [http://blackrim.org/ Stephen Smith] (primary), [http://bcbio.wordpress.com/ Brad Chapman], and [http://evoviz.nescent.org/ David Kidd]. The code currently lives at the nmatzke branch on [http://github.com/nmatzke/biopython/tree/master GitHub], and you can see a timeline and other info about ongoing development [http://github.com/nmatzke/biopython/tree/master here].

'''Abstract:''' Create a BioPython module that will enable users to automatically access and parse species locality records from online biodiversity databases; link these to user-specified phylogenies; calculate basic alpha- and beta-phylodiversity summary statistics; produce input files for the various inference algorithms available for inferring historical biogeography; and convert output from these programs into files suitable for mapping, e.g. in Google Earth (KML files).

Input geographic points and determine which region (polygon) each point falls in (via a point-in-polygon algorithm); also output the points that are unclassified. For example, some GBIF locations were mistyped in the source database, so a record can fall in the middle of the ocean.
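The classification step can be sketched with the standard ray-casting test. This is a minimal illustration, not the module's actual code: the names point_in_polygon and classify_points are hypothetical, polygons are assumed to be simple lists of (longitude, latitude) vertices, and the geometry is treated as planar (no handling of the date line or the poles).

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: True if (lon, lat) is inside the polygon.

    polygon is a list of (lon, lat) vertices (hypothetical input format).
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge crosses that horizontal line
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def classify_points(points, regions):
    """Assign each point to the first region containing it.

    Returns (classified, unclassified); unclassified holds points that fall
    in no region, e.g. mistyped records that land in the ocean.
    """
    classified = {name: [] for name in regions}
    unclassified = []
    for pt in points:
        for name, polygon in regions.items():
            if point_in_polygon(pt[0], pt[1], polygon):
                classified[name].append(pt)
                break
        else:
            unclassified.append(pt)
    return classified, unclassified
```

Returning the unclassified points separately makes it easy to inspect suspicious records by hand rather than silently dropping them.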

Note: creating functions for all possible interactions with GBIF is not possible in the time available, so I will focus on searching and downloading basic occurrence record data. The relevant GBIF web service is here: http://data.gbif.org/ws/rest/occurrence

Function: searchGBIFrecords – user inputs parameters and a list of GBIF records is returned
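As a rough sketch of the first half of such a search, the request URL could be assembled from the user's parameters as below. Only the base URL comes from the note above; the "list" action, the "scientificname" parameter, and the helper name build_gbif_query_url are assumptions for illustration and should be checked against the GBIF web service documentation. Fetching and parsing the response is left out.

```python
from urllib.parse import urlencode

# Base URL of the GBIF occurrence web service, as given in the text above.
GBIF_OCCURRENCE_URL = "http://data.gbif.org/ws/rest/occurrence"


def build_gbif_query_url(params, action="list"):
    """Build a query URL from a dict of search parameters.

    The default action and any parameter names passed in are assumptions,
    not a confirmed description of the GBIF API.
    """
    # Sort the parameters so the same search always yields the same URL
    query = urlencode(sorted(params.items()))
    return "%s/%s?%s" % (GBIF_OCCURRENCE_URL, action, query)


# Example with a hypothetical parameter name
url = build_gbif_query_url({"scientificname": "Puma concolor"})
```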

Function: check_input_lagrange_tree – checks whether the input phylogeny meets the requirements for lagrange, i.e. it has ultrametric branch lengths, its tips end at time 0, and its tip names are in the species/ranges input file
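The checks listed above might look roughly like this. The tree representation, nested (name, branch_length, children) tuples, and the function names are hypothetical stand-ins; a real implementation would work on whatever tree object the module ends up using. If every root-to-tip distance is equal, the tree is ultrametric and all tips end at the same time.

```python
def tip_depths(node, depth_so_far=0.0):
    """Yield (tip_name, root-to-tip distance) for every tip below node."""
    name, branch_length, children = node
    depth = depth_so_far + branch_length
    if not children:
        yield name, depth
    else:
        for child in children:
            yield from tip_depths(child, depth)


def check_tree_for_lagrange(tree, range_names, tol=1e-6):
    """Return a list of problems found; an empty list means the checks passed.

    Assumes tip names are unique (duplicates would be merged here).
    """
    problems = []
    depths = dict(tip_depths(tree))
    # Ultrametric: all tips are the same distance from the root
    if max(depths.values()) - min(depths.values()) > tol:
        problems.append("tree is not ultrametric")
    # Every tip must have an entry in the species/ranges input
    missing = sorted(set(depths) - set(range_names))
    if missing:
        problems.append("tips missing from ranges file: " + ", ".join(missing))
    return problems
```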

Function: parse_lagrange_output – take the output file from lagrange and get ages and estimated regions for each node

Regarding where to put reconstructed nodes, or tips where the only location information is the region: within regions, when linking already geo-located tips, spatial averaging can be used, as currently happens with GeoPhyloBuilder. If there is only one node in a region, the centroid or something similar could be used (e.g. the "root" of the polygon skeleton, which would work even for weird concave polygons).

If there are multiple ancestral nodes or region-only tips in a region, they need to be spread out inside the polygon, or lines will just be drawn on top of each other. This can be done by putting the most ancient node at the root of the polygon skeleton/medial axis, and then spreading out the daughter nodes along the skeleton/medial axis of the polygon.

Function: assign_node_locations_in_region -- within a region’s polygon, given a list of nodes, their relationships, and their ages, spread the nodes out along the middle 50% of the longest axis of the polygon skeleton, with the oldest node in the middle
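A simplified sketch of the spreading rule follows. It assumes the longest skeleton axis has already been reduced to a straight segment between two (lon, lat) endpoints, and it uses only node ages, not the tree relationships; computing the actual polygon skeleton is not shown. The function name spread_nodes_along_axis is hypothetical.

```python
import math


def spread_nodes_along_axis(nodes, start, end):
    """Place nodes along the middle 50% of the segment from start to end.

    nodes is a list of (name, age); the oldest node goes at the midpoint and
    younger nodes alternate outward on either side of it.
    """
    ordered = sorted(nodes, key=lambda node: node[1], reverse=True)  # oldest first
    steps_per_side = math.ceil((len(ordered) - 1) / 2)
    # Fractions of the axis run from 0.25 to 0.75, i.e. the middle 50%
    step = 0.25 / steps_per_side if steps_per_side else 0.0
    placed = {}
    for i, (name, _age) in enumerate(ordered):
        k = (i + 1) // 2             # distance from centre: 0, 1, 1, 2, 2, ...
        sign = -1 if i % 2 else 1    # alternate sides of the midpoint
        t = 0.5 + sign * k * step
        placed[name] = (start[0] + t * (end[0] - start[0]),
                        start[1] + t * (end[1] - start[1]))
    return placed
```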

Function: assign_node_locations_between_regions – connect the nodes that are linked to branches that cross between regions (for this initial project, just the great circle lines)
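For the great circle lines, intermediate points between two nodes can be computed on a unit sphere with the usual spherical interpolation formula. The sketch below is illustrative only: the function name great_circle_points and its signature are hypothetical, the Earth is treated as a perfect sphere, and antipodal endpoints (where the great circle is not unique) are rejected.

```python
import math


def great_circle_points(lon1, lat1, lon2, lat2, n_points=10):
    """Return n_points (lon, lat) pairs in degrees along the great circle."""
    if n_points < 2:
        raise ValueError("n_points must be at least 2")
    phi1, lam1, phi2, lam2 = (math.radians(v) for v in (lat1, lon1, lat2, lon2))
    # Angular distance between the endpoints (haversine formula)
    h = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin((lam2 - lam1) / 2) ** 2)
    d = 2 * math.asin(math.sqrt(min(1.0, h)))
    if d < 1e-12:
        return [(lon1, lat1)] * n_points  # identical endpoints
    if abs(math.sin(d)) < 1e-12:
        raise ValueError("antipodal points: great circle is not unique")
    points = []
    for i in range(n_points):
        f = i / (n_points - 1)  # fraction of the way along the arc
        a = math.sin((1 - f) * d) / math.sin(d)
        b = math.sin(f * d) / math.sin(d)
        # Weighted sum of the endpoint vectors in Cartesian coordinates
        x = a * math.cos(phi1) * math.cos(lam1) + b * math.cos(phi2) * math.cos(lam2)
        y = a * math.cos(phi1) * math.sin(lam1) + b * math.cos(phi2) * math.sin(lam2)
        z = a * math.sin(phi1) + b * math.sin(phi2)
        points.append((math.degrees(math.atan2(y, x)),
                       math.degrees(math.atan2(z, math.hypot(x, y)))))
    return points
```

The resulting list of points can be passed directly to whatever writes the line geometry, so the curve shows up correctly on a globe rather than as a straight line in latitude/longitude space.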

Function: write_history_to_shapefile -- write the biogeographic history to a shapefile

Function: write_history_to_KML – write the biogeographic history to a KML file for input into Google Earth
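A minimal sketch of the KML output, using only the Python standard library. The input format (a list of (branch_name, [(lon, lat), ...]) pairs) and the function name history_to_kml are assumptions made for illustration; styling, node ages, and time spans are omitted. KML coordinates are written as longitude,latitude,altitude.

```python
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def history_to_kml(branches):
    """Return a KML document string with one LineString Placemark per branch.

    branches is a list of (name, [(lon, lat), ...]) pairs (hypothetical format).
    """
    ET.register_namespace("", KML_NS)  # serialize KML as the default namespace

    def tag(local_name):
        return "{%s}%s" % (KML_NS, local_name)

    kml = ET.Element(tag("kml"))
    document = ET.SubElement(kml, tag("Document"))
    for name, coords in branches:
        placemark = ET.SubElement(document, tag("Placemark"))
        ET.SubElement(placemark, tag("name")).text = name
        line = ET.SubElement(placemark, tag("LineString"))
        # KML coordinate tuples are lon,lat,alt separated by whitespace
        ET.SubElement(line, tag("coordinates")).text = " ".join(
            "%s,%s,0" % (lon, lat) for lon, lat in coords)
    return ET.tostring(kml, encoding="unicode")
```

Writing the returned string to a file with a .kml extension should give something Google Earth can open.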

August, week 2: Beta testing

Make the series of functions available, along with suggested input files; have others with various levels of expertise run them on various platforms (e.g. the Evolutionary Biogeography Discussion Group at U.C. Berkeley). Also get final feedback from mentors and advisors.