A single object might be writeable to more than one file format. For example,
an skbio.alignment.Alignment object could be written to FASTA, FASTQ,
QSEQ, PHYLIP, or Stockholm formats, just to name a few.

You might not know the exact file format of your file, but you want to read
it into an appropriate object.

You might want to read multiple files into a single object, or write an
object to multiple files.

Instead of reading a file into an object, you might want to stream the file
using a generator (e.g., if the file cannot be fully loaded into memory).

To address these issues (and others), scikit-bio provides a simple, powerful
interface for dealing with I/O. We accomplish this by using a single I/O
registry. Below is a description of how to use the registry and how to extend
it.

In the case of skbio.io.read, if into is not provided, a generator is
returned. What the generator yields depends on the format being read.

When into is provided, format may be omitted and the registry will use its
knowledge of the available formats for the requested class to infer the correct
format. This format inference is also available in the OO interface, meaning
that format may be omitted there as well.
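The difference between the two calling styles can be illustrated with a small, self-contained sketch. This is not scikit-bio's implementation: toy_read, Lines, and the "lines" format are hypothetical stand-ins, and the inference step is reduced to a table lookup, whereas the real registry inspects the file contents.

```python
import io


class Lines:
    """Hypothetical container standing in for a scikit-bio object."""

    def __init__(self, items):
        self.items = list(items)


def _lines_generator(fh):
    # Yield one record per line without loading the whole file.
    for line in fh:
        yield line.rstrip("\n")


# Hypothetical lookup tables (assumptions for this sketch only).
READERS = {"lines": _lines_generator}      # format name -> generator reader
FORMATS_FOR_CLASS = {Lines: ["lines"]}     # class -> formats it can be read from


def toy_read(fh, format=None, into=None):
    if into is None:
        # No target class: the caller gets a generator and must name the format.
        if format is None:
            raise ValueError("format is required when into is omitted")
        return READERS[format](fh)
    if format is None:
        # Simplified "inference": pick the only format known for this class.
        format = FORMATS_FOR_CLASS[into][0]
    return into(READERS[format](fh))


records = toy_read(io.StringIO("a\nb\n"), format="lines")   # a generator
loaded = toy_read(io.StringIO("a\nb\n"), into=Lines)        # a Lines object
```

Here records is consumed lazily, one line at a time, while loaded holds every record in memory.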

We call format inference sniffing, a term borrowed from the csv module of
Python’s standard library. The goal of a sniffer is twofold: to identify
whether a file is in a specific format and, if it is, to provide **kwargs
that can be used to better parse the file.
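As a rough illustration of that twofold goal, the sketch below assumes a sniffer reports its result as a (matched, kwargs) pair. The function name sniff_delimited and the toy delimited-text format are hypothetical and exist only for this example.

```python
import io


def sniff_delimited(fh):
    """Return (matched, kwargs) for a toy delimited-text format."""
    first_line = fh.readline()
    for delimiter in ("\t", ","):
        if delimiter in first_line:
            # The format matches; pass along what was learned so a
            # reader could be called with delimiter=... later.
            return True, {"delimiter": delimiter}
    # Not this format: report a non-match and no extra kwargs.
    return False, {}


matched, kwargs = sniff_delimited(io.StringIO("id\tlength\nseq1\t42\n"))
```

A reader for the same toy format could then be called as reader(fh, **kwargs), so the delimiter does not have to be detected twice.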

Note

There is a built-in sniffer which results in a useful error message
if an empty file is provided as input and the format was omitted.

In the procedural interface, format is required. Without it, scikit-bio does
not know how you want to serialize an object. OO interfaces define a default
format, so it may not be necessary to include it.

To extend I/O in skbio, developers should create a submodule in skbio/io/
named after the file format it implements.

For example, if you were to create readers and writers for a fasta file, you
would create a submodule skbio/io/fasta.py.
In this submodule you would use the following decorators:
register_writer, register_reader, and register_sniffer.
These associate your functionality with a format string and potentially an
skbio class. Please see the relevant documentation for more information about
these functions and the specifications for readers, writers, and sniffers.

Once you are satisfied with the functionality, you will need to ensure that
skbio/io/__init__.py imports your new submodule, so that the decorators are
executed when the user-facing functions described above are imported. Use the
function import_module('skbio.io.my_new_format') for this.

The following keyword arguments may not be used when defining new readers or
writers, because they already have special meaning to the registry system:

format

into

mode

verify

If a keyword argument is a file, such as in the case of fasta with qual,
then you can set its default to a specific marker, the FileSentinel sentinel,
to indicate to the registry that the kwarg should have special handling.

After the registry reads your function, it will replace FileSentinel with
None, allowing you to perform normal checks for kwargs
(e.g., if my_kwarg is not None:). If a user provides input for the kwarg, the
registry will convert it to an open filehandle.
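The sentinel mechanism can be sketched with the standard library alone. Everything named here is hypothetical: FILE_SENTINEL stands in for FileSentinel, handles_file_kwargs plays the part of the registry, and read_scores is a toy reader. The sketch also assumes the user passes a file path and that the reader returns its result eagerly.

```python
import contextlib
import functools
import inspect

FILE_SENTINEL = object()  # hypothetical stand-in for FileSentinel


def handles_file_kwargs(reader):
    """Swap sentinel defaults for None and open user-supplied paths."""
    sentinel_names = [
        name
        for name, param in inspect.signature(reader).parameters.items()
        if param.default is FILE_SENTINEL
    ]

    @functools.wraps(reader)
    def wrapper(fh, **kwargs):
        with contextlib.ExitStack() as stack:
            for name in sentinel_names:
                path = kwargs.get(name)
                # Omitted kwarg -> None; supplied path -> open file handle.
                kwargs[name] = (
                    None if path is None else stack.enter_context(open(path))
                )
            # Files are closed on exit, so the reader must not be lazy.
            return reader(fh, **kwargs)

    return wrapper


@handles_file_kwargs
def read_scores(fh, qual=FILE_SENTINEL):
    residues = fh.read().strip()
    if qual is not None:
        # qual arrives as an open file handle, never as the sentinel.
        return residues, [int(score) for score in qual.read().split()]
    return residues, None
```

Calling read_scores(fh) yields None for the scores, while read_scores(fh, qual="some_path.qual") reads them from the opened file; the path shown is a placeholder.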

Note

Keyword arguments are not permitted in sniffers. Sniffers may not
raise exceptions; if an exception is thrown by a sniffer, the user will be
asked to report it on our issue tracker.

When raising errors in readers and writers, the error should be a subclass of
FileFormatError specific to your new format.
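A format-specific error might look like the sketch below. The FileFormatError class defined here is only a local stand-in so that the example runs on its own; ToyFASTAFormatError and parse_header are hypothetical names.

```python
class FileFormatError(Exception):
    """Local stand-in for the base file format error."""


class ToyFASTAFormatError(FileFormatError):
    """Raised when a toy FASTA record is malformed."""


def parse_header(line):
    if not line.startswith(">"):
        # Name the problem and show the offending input.
        raise ToyFASTAFormatError(
            f"Expected header line starting with '>', found: {line!r}"
        )
    return line[1:].strip()
```

Because the new error subclasses the shared base, callers can catch either the specific error or any file format error.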

Because scikit-bio handles all of the I/O boilerplate, you only need to test
the actual business logic of your readers, writers, and sniffers. The
easiest way to accomplish this is to create a list of files and their expected
results when deserialized. Then you can iterate through the list ensuring the
expected results occur and that the expected results can be reserialized into
an equivalent file. This process is called ‘roundtripping’.
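A roundtrip test along these lines could be sketched as follows. To keep the example self-contained, in-memory strings take the place of files on disk, and read_pairs and write_pairs are hypothetical toy functions rather than real scikit-bio readers or writers.

```python
import io
import unittest


def read_pairs(fh):
    """Toy reader: tab-separated key/value lines -> dict."""
    return dict(line.rstrip("\n").split("\t") for line in fh)


def write_pairs(obj, fh):
    """Toy writer: dict -> tab-separated key/value lines."""
    for key, value in obj.items():
        fh.write(f"{key}\t{value}\n")


class RoundtripTests(unittest.TestCase):
    # Each case pairs serialized text with its expected deserialized form.
    cases = [
        ("a\t1\nb\t2\n", {"a": "1", "b": "2"}),
        ("", {}),
    ]

    def test_roundtrip(self):
        for text, expected in self.cases:
            with self.subTest(text=text):
                # Deserialize and compare against the expected object.
                self.assertEqual(read_pairs(io.StringIO(text)), expected)
                # Reserialize and compare against the original text.
                out = io.StringIO()
                write_pairs(expected, out)
                self.assertEqual(out.getvalue(), text)
```

Adding a new case only requires appending one (text, expected) pair to the list.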

It is also important to test some invalid inputs and ensure that the correct
error is raised by your readers. Consider using assertRaises as a context
manager like so:

    with self.assertRaises(SomeFileFormatErrorSubclass) as cm:
        do_something_wrong()
    self.assertIn('action verb or subject of an error', str(cm.exception))

A good example to review when preparing to write your first I/O unit tests is
the ordination test code (see skbio/io/tests/test_ordination.py).