Methods

- Phylogeny and biogeography of African Murinae based on mitochondrial and nuclear gene sequences, with a new tribal classification of the subfamily

Taxon and gene sampling

We obtained sequences from 83 species, including representatives of
49 murine genera from most previously identified major murine lineages,
as well as eight genera of Deomyinae and Gerbillinae used as outgroups
(Table 3) [5-7]. Our sampling of African Murinae and otomyines covers
25 of the 32 living African genera and includes representatives of all
four previously identified lineages. Most genera are represented by a
single species, but multiple species are included for highly
diversified or potentially paraphyletic genera.

Table 3. List of the taxa examined in this study and their GenBank accession numbers.

The two nuclear genes, growth hormone receptor (GHR) and
interphotoreceptor retinoid binding protein (IRBP), were chosen because
of their proven utility for resolving muroid relationships and because
an existing sequence dataset is available for this group
[6,7,14,24,116,117]. GHR and IRBP are not genetically linked: they lie
on chromosomes 15 and 14 in Mus and on chromosomes 2 and 16 in Rattus,
respectively [118]. The mitochondrial cytochrome b gene was chosen
because it provides a third independent marker that evolves faster than
either nuclear gene and is also well represented in previous datasets.

All taxa but one are represented by sequences of two or three genes;
the exception is Parotomys, for which only a GHR sequence is available
(Table 3). For each ingroup genus, all genes were sequenced from the
same species and, where possible, from the same DNA sample. Chimeric
data (i.e., sequences of different genes derived from more than one
species of a genus) were used for only two outgroup taxa: Acomys
(A. cahirinus and A. ignitus) and Meriones (M. unguiculatus and M. shawi).

DNA extraction and sequencing

Total genomic DNA was extracted from ethanol-preserved tissues using
either a CTAB protocol [119] or a QIAamp extraction kit (Qiagen). The
cytochrome b gene (1140 bp) was amplified as described in Lecompte et
al. [25] or Montgelard et al. [120]. PCRs used the following thermal
cycling profile: initial denaturation at 94°C for 4 min; 35 cycles of
40 s at 94°C, 45 s at 50°C and 1 min at 72°C; and a final extension at
72°C for 10 min.

Part of exon 1 of IRBP (ca. 1270 bp) was sequenced using the methods
of Poux and Douzery [121]. IRBP amplification used the following
thermal profile: one cycle of denaturation at 94°C (5 min), annealing
at 50°C (45 s) and extension at 72°C (1 min); 34 cycles of denaturation
at 94°C (45 s), annealing at 50°C or 60°C (45 s) and extension at 72°C
(1 min); and a final extension at 72°C (10 min).
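As an illustrative sketch only, the two thermal profiles above can be written as data and summed to check cycle counts and programmed run time. The `Step` type, `stage` helper and profile names are hypothetical, and ramping time between temperatures (which depends on the thermocycler) is ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    temp_c: int   # target temperature (°C)
    seconds: int  # hold time at that temperature


def stage(steps, repeats=1):
    """A stage is a tuple of steps plus how many times it is repeated."""
    return (tuple(steps), repeats)


# Cytochrome b profile as described in the text.
CYTB = [
    stage([Step(94, 4 * 60)]),                              # initial denaturation
    stage([Step(94, 40), Step(50, 45), Step(72, 60)], 35),  # 35 cycles
    stage([Step(72, 10 * 60)]),                             # final extension
]


def irbp_profile(anneal_c=50):
    """IRBP profile; anneal_c is 50 °C, or 60 °C for the alternative annealing."""
    return [
        stage([Step(94, 5 * 60), Step(50, 45), Step(72, 60)]),        # first cycle
        stage([Step(94, 45), Step(anneal_c, 45), Step(72, 60)], 34),  # 34 cycles
        stage([Step(72, 10 * 60)]),                                   # final extension
    ]


def hold_seconds(profile):
    """Total programmed hold time in seconds, ignoring temperature ramps."""
    return sum(n * sum(s.seconds for s in steps) for steps, n in profile)


def amplification_cycles(profile):
    """Count denaturation/annealing/extension cycles (three-step stages)."""
    return sum(n for steps, n in profile if len(steps) == 3)


for name, prof in [("cytb", CYTB), ("IRBP", irbp_profile())]:
    print(name, amplification_cycles(prof), "cycles,", hold_seconds(prof) / 60, "min")
```

Written this way, both profiles contain 35 three-step cycles; the IRBP profile folds its longer initial denaturation into the first cycle.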

Double-stranded PCR products were purified, either directly or after
agarose gel electrophoresis, using the MinElute purification kit
(Qiagen) or Amicon Ultrafree-DNA columns (Millipore), and sequenced
directly on both strands with an automated sequencer, either a CEQ2000
(Beckman) or an ABI 310 (PE Applied Biosystems).

The new sequences were deposited in the EMBL data bank. Accession
numbers for all sequences used in this analysis are listed in Table 3.

Analyses

Phylogenetic reconstruction

Sequences were manually aligned with the ED editor of the MUST package version 2000 [123].
Unsequenced positions and gaps were coded as missing data.
Phylogenetic reconstructions were performed on the complete DNA data
set by maximum likelihood (ML) with PAUP* (version 4.0b10) [124] and by Bayesian inference (BI) with MrBayes (version 3.1.2) [125].
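A minimal sketch of how per-gene alignments can be combined into one data matrix with missing data coded as described: a gene that was not sequenced for a taxon (as for Parotomys, which has only GHR) becomes a run of missing characters, and alignment gaps are recoded as missing as well. The `concatenate` function, the `?` symbol and the toy sequences are illustrative assumptions, not the authors' files or software.

```python
def concatenate(alignments, taxa, missing="?"):
    """Concatenate per-gene alignments into one matrix.

    alignments: gene -> {taxon: aligned sequence}; sequences of a gene share one length.
    Genes absent for a taxon, and alignment gaps ('-'), are coded as `missing`.
    Returns (matrix, bounds); bounds maps gene -> (first, last) 1-based site.
    """
    matrix = {t: [] for t in taxa}
    bounds = {}
    start = 1
    for gene, seqs in alignments.items():
        lengths = {len(s) for s in seqs.values()}
        if len(lengths) != 1:
            raise ValueError(f"{gene}: sequences must share one aligned length")
        length = lengths.pop()
        for t in taxa:
            seq = seqs.get(t, missing * length)            # gene not sequenced
            matrix[t].append(seq.replace("-", missing))     # gaps as missing data
        bounds[gene] = (start, start + length - 1)
        start += length
    return {t: "".join(parts) for t, parts in matrix.items()}, bounds


# Toy data: the second taxon has only a GHR sequence.
alignments = {
    "GHR": {"Mus": "ATG-CA", "Parotomys": "ATGTCA"},
    "IRBP": {"Mus": "GGCT"},
    "cytb": {"Mus": "AT"},
}
matrix, bounds = concatenate(alignments, ["Mus", "Parotomys"])
for taxon, row in matrix.items():
    print(taxon, row)
print(bounds)
```

Keeping the gene boundaries alongside the matrix is a convenience that makes later per-gene or per-codon partitioning straightforward.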

Modeltest 3.7 [126] was used to select the model of sequence evolution
that best fits our data under the Akaike Information Criterion (AIC).
The program compares 56 models, built from 14 substitution schemes,
each considered alone or with a proportion of invariable sites (I),
gamma-distributed rate variation among sites (G), or both (I + G).
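The AIC comparison can be illustrated with a short sketch: AIC = −2 ln L + 2K, where K is the number of free parameters, and the model with the lowest AIC is preferred. Akaike weights (relative support among the candidates) are shown as a common companion statistic. The log-likelihoods are invented placeholders, not values from this study, and K here counts only substitution-model parameters (an illustrative assumption); adding the same number of branch-length parameters to every model on a fixed tree would shift all AIC values equally and leave the ranking unchanged.

```python
import math


def aic(log_likelihood, k):
    """Akaike Information Criterion: AIC = -2 ln L + 2K."""
    return -2.0 * log_likelihood + 2.0 * k


def rank_models(fits):
    """fits: model name -> (K, ln L). Returns (best model, Akaike weights)."""
    scores = {m: aic(lnl, k) for m, (k, lnl) in fits.items()}
    best_score = min(scores.values())
    # Relative likelihood of each model, exp(-delta/2), then normalized.
    rel = {m: math.exp(-0.5 * (s - best_score)) for m, s in scores.items()}
    total = sum(rel.values())
    best = min(scores, key=scores.get)
    return best, {m: r / total for m, r in rel.items()}


# Placeholder fits: K = free substitution-model parameters, ln L values invented.
fits = {
    "JC": (0, -30000.0),
    "HKY+G": (5, -28500.0),     # 3 base frequencies + kappa + gamma shape
    "GTR+G": (9, -28460.0),     # 3 base frequencies + 5 rates + gamma shape
    "GTR+I+G": (10, -28455.0),  # as above + proportion of invariable sites
}
best, weights = rank_models(fits)
print(best, round(weights[best], 3))
```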

To avoid excessive computation times, the PAUP* ML analyses were
conducted by alternating two steps, parameter estimation and tree
search. A heuristic ML search with Tree Bisection Reconnection (TBR)
branch swapping was first run under the parameters estimated by
Modeltest to identify the optimal tree. This tree was then used to
re-estimate the model parameters, which served for a new round of TBR
branch swapping; this procedure was repeated until both topology and
parameters stabilized. Node robustness was assessed with ML bootstrap
percentages (BPML) computed in PHYML [127] from 1000 pseudoreplicates,
using the best PAUP* ML tree as the starting tree. PHYML was preferred
over PAUP* for bootstrapping because of its speed. We also performed
Bayesian inference with MrBayes and report posterior probabilities (PP)
for the recovered nodes. The Bayesian analysis used nine partitions,
one for each codon position of each of the three genes.
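PHYML generates its bootstrap pseudoreplicates internally; the sketch below only illustrates the standard nonparametric bootstrap behind BPML values. Each pseudoreplicate resamples alignment columns with replacement, a tree is built from each replicate (not shown here), and a node's bootstrap percentage is the share of replicate trees containing that clade. The function names and toy replicate trees are hypothetical.

```python
import random


def pseudoreplicate(alignment, rng):
    """One nonparametric bootstrap replicate: resample columns with replacement.

    alignment maps taxon name -> aligned sequence (all of equal length).
    """
    taxa = list(alignment)
    n_sites = len(alignment[taxa[0]])
    picks = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {t: "".join(alignment[t][i] for i in picks) for t in taxa}


def bootstrap_percentage(replicate_clades, clade):
    """Percentage of replicate trees (each a set of clades) containing `clade`."""
    hits = sum(clade in clades for clades in replicate_clades)
    return 100.0 * hits / len(replicate_clades)


# Toy usage: four hypothetical replicate trees summarized as sets of clades.
mus_rattus = frozenset({"Mus", "Rattus"})
trees = [
    {mus_rattus},
    {mus_rattus},
    {frozenset({"Mus", "Apodemus"})},
    {mus_rattus},
]
print(bootstrap_percentage(trees, mus_rattus))  # 75.0
```

Passing an explicit `random.Random` instance keeps replicates reproducible from a seed.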

Estimating dates of divergences

Divergence times were estimated on the optimal ML topology. The
hypothesis of a constant molecular clock was tested with a likelihood
ratio test, as proposed by Felsenstein [128], computed in PAUP* 4.0b10.
We used the relaxed Bayesian molecular clock approach implemented in MultiDivTime [129], with parameter estimates obtained in PAML [130] as described by Yoder and Young [131]. Two fossil-based calibrations were used: 1) the Mus/Rattus divergence, constrained to 10–12 Mya [65,66,132,133]; and 2) the divergence between Apodemus mystacinus and the species of the subgenus Sylvaemus (A. flavicollis and A. sylvaticus), constrained to a minimum of 7 Mya [51,78].
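The clock test compares the log-likelihood of the tree without a clock constraint (ln L1) with that of the same topology under a strict clock (ln L0). In the usual formulation, the statistic 2(ln L1 − ln L0) is compared to a χ² distribution with s − 2 degrees of freedom for s taxa. The sketch below takes the two log-likelihoods as inputs (e.g. as reported by PAUP*) and computes the χ² tail probability with a standard regularized incomplete-gamma routine, hand-coded here to avoid a SciPy dependency. The function names and the numbers in the usage line are hypothetical placeholders, not results of this study.

```python
import math


def _upper_reg_gamma(a, x, eps=1e-14, max_iter=10_000):
    """Regularized upper incomplete gamma Q(a, x)."""
    if x <= 0:
        return 1.0
    log_prefactor = -x + a * math.log(x) - math.lgamma(a)
    if x < a + 1:
        # Series for the lower function P(a, x); Q = 1 - P.
        term = total = 1.0 / a
        ap = a
        for _ in range(max_iter):
            ap += 1
            term *= x / ap
            total += term
            if abs(term) < abs(total) * eps:
                break
        return 1.0 - total * math.exp(log_prefactor)
    # Continued fraction for Q(a, x), evaluated with the modified Lentz method.
    tiny = 1e-300
    b = x + 1 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, max_iter):
        an = -i * (i - a)
        b += 2
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return math.exp(log_prefactor) * h


def chi2_sf(stat, df):
    """Upper-tail probability of a chi-square distribution."""
    return _upper_reg_gamma(df / 2.0, stat / 2.0)


def clock_lrt(lnl_no_clock, lnl_clock, n_taxa):
    """Likelihood ratio test of a strict clock; returns (statistic, df, p-value)."""
    stat = max(0.0, 2.0 * (lnl_no_clock - lnl_clock))
    df = n_taxa - 2
    return stat, df, chi2_sf(stat, df)


# Placeholder log-likelihoods for a hypothetical 10-taxon tree.
print(clock_lrt(-28455.0, -28600.0, 10))
```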