EMBOSS: Project Meeting (Mon 26th October 09)

Attendees

1. Minutes of the last meeting

2. Maintenance etc.

2.1 Applications

Mahmut has committed needleall. Calculations use single
precision floats.

2.2 Libraries

Peter noted that a user has pointed out data access is
remarkably fast from ACNUC servers, although the data itself may be
rather old if the server has not been recently updated. He will look
into adding ACNUC as a data access method.

Peter will look into issues with plotting in
mEMBOSS. prettyplot produces poorly scaled text and boxes in a
'win3' graphics window under mEMBOSS.

Mahmut has updated the ajAlignDefineSS function to specify
'const' sequences as the originals are unchanged. The constructors
have been refactored to make further constructors and ajAlignDefine
functions easier to add.

2.3 Other

Mahmut suggested quoting any wildcard characters in command
lines reported by Jemboss. There was a recent example where a user
specified '*' as a search term for extractfeat and reported very
confusing error messages.
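
The suggested quoting can be illustrated with a minimal Python sketch.
This is not Jemboss code (Jemboss is written in Java); it assumes the
reported command line is meant to be pasted into a POSIX shell. The helper
name report_command, the file name input.embl and the choice of the -type
qualifier are hypothetical, used only for illustration.

```python
import shlex


def report_command(args):
    """Join arguments into one command line, quoting any argument that
    contains characters a POSIX shell would interpret (such as '*')."""
    return " ".join(shlex.quote(arg) for arg in args)


# Hypothetical command: the wildcard is quoted, plain arguments are not.
print(report_command(["extractfeat", "-sequence", "input.embl", "-type", "*"]))
# prints: extractfeat -sequence input.embl -type '*'
```

Reported this way, the '*' reaches the application as typed rather than
being expanded by the shell into a list of file names.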

Mahmut will implement the proposed Jemboss jar file
changes. Two additional jar files for 3D output are not well supported
and will be removed. Another jar file to print DNA alignments can be
merged into the main jar file.

Mahmut has checked in a file of Illumina adaptor sequences
obtained from users at Sanger. He is using a BioPerl example data file
for testing as it is publicly available data.

3. New developments

Alan has completed work on the ensembl/ library under
Unix, and is now testing with mEMBOSS. There are no issues with DLL
file compatibility. The SQL application runs. The ensembl application
fails because there is a schema change to the exon handling fields
in a recent ensembl release.

Michael explained that schema changes are only allowed in
alternate releases, around 5 per year. We can test the latest
development server and compare results with the Ensembl BioPerl API.
Peter will incorporate these tests into the QA suite.

Michael will update the ensembl library code after the next
ensembl release. It is his turn as release coordinator, so he will
have no time until next month.

Peter explained the modifications needed to function and
datatype naming to conform to EMBOSS library standards. There was some
discussion about the readability of code with long multi-word
datatypes. As these can be renamed with a script (renaming datatypes
and functions, and adjusting @namrule numbers), the current cleanup
will continue, followed by a code review once it is complete.

Michael has split one of the source files as it was
exceptionally large. He will send details of the split so that we can
update the EMBOSS copy of the library.

Michael explained the preference for ENSEMBL to serve whole
result sets rather than chunks. The server receives a very large number
of requests and would be more strained by repeated queries to serve
chunks of data.

Michael will work through the 'FIXME' comments in the code. Many
of these are use cases where the native ensembl API may need changes.

Michael suggested that for complex queries it may be faster to
use a local copy of the ENSEMBL data. We can try both local and remote
access.

Alan is now looking into the BioMart (Perl) API. This uses XML, for
which we will need to use the expat library. As we aim to avoid
additional dependencies, the current expat source code will be
included in an AJAX sub-library with the name 'libeexpat'. Functions
will be renamed using macros to avoid name clashes with the native
library which may be linked in automatically on some systems (possibly
with X11 under CYGWIN).

Alan noted a similar renaming is needed in pcre where
the 'regcomp' function has a conflict warning on MacOSX.

Peter will look into reading large next generation sequence
data assemblies from SAM or BAM data files. This will require a
lighter version of the AjPAlign data structure to limit the memory
requirements. BAM format will also need a compression/decompression
library. Further information is in the samtools documentation.
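
As a rough illustration of what a lighter record might hold, the Python
sketch below keeps only the first six mandatory SAM columns for each read
and ignores the rest. It is not the planned AJAX design: the names
LightAlignment and parse_sam_line are hypothetical, and the choice of
which fields to keep is an assumption.

```python
from typing import NamedTuple, Optional


class LightAlignment(NamedTuple):
    """Hypothetical compact record: just enough to place a read."""
    qname: str   # read name
    flag: int    # bitwise flag
    rname: str   # reference sequence name
    pos: int     # 1-based leftmost mapping position
    mapq: int    # mapping quality
    cigar: str   # CIGAR string


def parse_sam_line(line: str) -> Optional[LightAlignment]:
    """Parse one SAM alignment line; header lines (starting '@') give None."""
    if line.startswith("@"):
        return None
    fields = line.rstrip("\n").split("\t")
    return LightAlignment(fields[0], int(fields[1]), fields[2],
                          int(fields[3]), int(fields[4]), fields[5])


record = parse_sam_line("r1\t0\tchr1\t100\t60\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\n")
print(record.rname, record.pos, record.cigar)
```

Dropping the sequence and quality strings is where most of the memory
would be saved in this sketch; whether that is acceptable depends on what
the applications need from each read.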

Jon is preparing the first beta release of the EDAM
ontology. The model has been checked by Tony Burdett. Meetings have
been held with the EBI BioCatalogue team and the links to the ontology
and documentation have been passed to the EMBRACE registry and other
interested EMBRACE partners. An issue remains in the relations between
reports and datatypes.

The BioCatalogue team at EBI are keen to use EDAM to supplement or
replace their internal ontology, especially for use in registry
searches. Their main concern is the sustainability of the ontology
beyond the end of EMBRACE.

UCL will check new terms for the beta release. They have also provided
a list of protein domain/family proposed terms.

Jon has been in communication with Paul Gordon in Calgary. Paul
has recommended syntax for SAWSDL annotations that conform to practice
in Canadian BioMOBY services. The annotations can use PURL persistent
URLs or LSID life science identifiers, with the PURL option preferred
to avoid issues with LSID resolution and so that users get more useful
messages from links that have since moved.

Paul also suggested adding some annotation for the data schema in
SAWSDL, especially useful for defining workflow inputs and outputs but
also useful for linking services more accurately.

Jon reported from a meeting with EBI External Services that
they would like to see the EB-eye interface included as it is
particularly good at defining and reporting cross-referenced data.

Peter plans to define a "SERVER" object
in emboss.defaults which represents access to a set of data
resources which can be defined individually with reference to the
server, or referenced by the user through a server name and database
name combination.

4. Administration

Alan will look into converting the code repository to SVN. This
would allow branches and make it easier to delete or move files. We
currently do not use CVS tags or branches so a move should be
straightforward.

Alan will look into options for Windows 7, including the need
for separate 32 and 64 bit versions.

5. Documentation and Training

5.1 Books

Jon had a reply from the typesetters. There are a few required
fonts which we need to check for. In some cases stylesheet
modifications will be needed.

The main issue the publishers have is the timing of physical
production as they would like to have a target conference for the book
launch.

The timeline to produce the books is about 8 months. We would clearly
have a new release out by then. We can provide an addendum to the
books for each release to keep them up to date.

A deadline of December 24th was agreed for the final book text,
with 2 weeks to test the generated text before passing it to CUP at around
the time of the next EMBOSS release.

5.2 Other

Peter reported that Thunderbird 3.0b4 is more stable after the
latest bugfix, although the bugfix does not specifically refer to the
inbox display problems.

6. User queries and answers

Mahmut tested the prettyseq problem but found it worked
in the current CVS version.

Peter will look into extractfeat and features on the
reverse strand under Linux, then Alan can check on mEMBOSS.

Mahmut checked on a report of problems with tfm and the
location of documentation files. This was a local environment variable
issue.

Peter is working on extending the Phylip formats for sequence
input and output.

A user reported that gap penalty limits are not consistent. We
will review all applications and set similar limits for each of them.

A user has requested an option to define disulphide bridges between
cysteines in iep for molecular weight calculation.

The pepstats documentation fails to describe the local data
file Emolwt.dat.

Mahmut has corrected vectorstrip to trim the sequence
quality scores.
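
The idea behind this fix, sketched in Python rather than the C of
vectorstrip itself: whenever the sequence is trimmed, the same slice must
be applied to the per-base quality scores so the two stay aligned. The
function name trim_with_quality is hypothetical and the coordinates are
0-based, end-exclusive by assumption.

```python
def trim_with_quality(seq, qual, start, end):
    """Return seq[start:end] together with the matching quality slice."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality lengths differ")
    return seq[start:end], qual[start:end]


# Trimming two bases from each end keeps one quality score per base.
trimmed_seq, trimmed_qual = trim_with_quality("ACGTACGT", "IIIIHHHH", 2, 6)
print(trimmed_seq, trimmed_qual)
# prints: GTAC IIHH
```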

A user has sent a patch to water to align multiple sequence
inputs.

7. AOB

Peter reported from the Large Database and Networks meeting in
Beijing. He gained the impression that the EMBOSS updatable indexing
capability could be particularly important for remote sites needing to
limit their network traffic for database updates. He also discussed
the WebLab interface with the development/support team.