OMR engine output file format

Disclaimer: I cannot guarantee the correctness or completeness of
the information here. I cannot guarantee that the format
will not change in the future, though I have made a serious attempt to
make it future-proof.

Overview

The format is text-based and human readable (with difficulty). It is not
intended to be edited by hand.

It is extendible, so that programs that read the format should be able to
read newer versions and skip the parts they don't understand. I doubt that I
have achieved this aim entirely, but I hope the changes needed to keep
reading programs up to date will be minimised.

It is only intended that information will be stored in this format as a
transition into another notation format. It is too bulky and limited as a
general purpose format.

SharpEye is written in C and the format reflects that. It is intended to be
read using fscanf() and its structure closely reflects the C structs and
arrays I use.

The output from the OMR engine (Liszt) is less interpreted than the
output from SharpEye. The same format is used for both kinds
of file.

The current interpretations done by SharpEye: SharpEye unifies time
signatures (makes all time signatures occurring at the same time the same)
and makes the score 'rectangular', ie the same number of staves per system.
SharpEye attaches slurs/ties to notes where it can, and decides if they are
ties or not. SharpEye attaches lyric syllables to notes where it can.
SharpEye does rhythm analysis when it loads, and this includes guessing
which notes belong to a triplet.

I use the file extension .mro on Windows, and a file type of 'SharpEye'
(0x183) on RISC OS for these files.

General Preliminaries

Syntactical structure

The first thing in the file is an identifier. The rest consists of

[name] [value]

pairs. Each [name] is a text string made of printable non-whitespace ASCII
characters. It is a 'slot name' or 'field name'. The [value] can be of two
main types: simple or complex.

Simple values are of two types. (1) They can be strings containing printable
non-whitespace ASCII characters, usually representing numeric, boolean, or
'type' information. (2) They can be strings enclosed in double quotes (as in
a CSV file) representing textual information. Non-ASCII characters may
appear between the double quotes but nowhere else in the file.

It is important to note that names, values, '{', and '}' are always
separated from one another by whitespace.

The name preceding a textual value always ends with a '$'. Other names
never do.

Many items in the file are in arrays or lists. In order to conform with the
"name-value" structure, arrays look like:

arrayname { nof 2 elementname {...} elementname {...} }

(The 'nof' is literal, arrayname and elementname will vary.)

When reading this format, you should not assume that parts of any structure
occur in any particular order. You should assume that you will find [name]
tokens you don't understand.
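As a rough illustration of skipping an unknown [name], here is a minimal
sketch in C. It works on an in-memory buffer rather than with fscanf(), and
the function names next_token and skip_value are my own inventions for this
example, not anything taken from SharpEye. It relies only on the rules above:
tokens are separated by whitespace, quoted text may contain spaces and ""
pairs, and complex values are wrapped in '{' and '}'.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Copy the next token from *cur into out (at most cap-1 chars, cap >= 1)
   and advance *cur past it. A token is either a run of non-whitespace
   characters or a double-quoted string kept verbatim, quotes included.
   Returns 1 if a token was found, 0 at end of input. */
static int next_token(const char **cur, char *out, size_t cap)
{
    const char *p = *cur;
    size_t n = 0;

    while (*p && isspace((unsigned char)*p))
        p++;
    if (*p == '\0') {
        *cur = p;
        return 0;
    }
    if (*p == '"') {
        /* Quoted text: "" inside the string is an escaped quote. */
        if (n + 1 < cap) out[n++] = *p;
        p++;
        while (*p) {
            if (*p == '"' && p[1] == '"') {
                if (n + 1 < cap) out[n++] = '"';
                if (n + 1 < cap) out[n++] = '"';
                p += 2;
            } else if (*p == '"') {
                if (n + 1 < cap) out[n++] = '"';
                p++;
                break;
            } else {
                if (n + 1 < cap) out[n++] = *p;
                p++;
            }
        }
    } else {
        while (*p && !isspace((unsigned char)*p)) {
            if (n + 1 < cap) out[n++] = *p;
            p++;
        }
    }
    out[n] = '\0';
    *cur = p;
    return 1;
}

/* Skip one [value]: a simple token, a quoted string, or a { ... } group
   with arbitrary nesting. Returns 1 on success, 0 if input ended early
   or a stray '}' was found. */
static int skip_value(const char **cur)
{
    char tok[256];
    int depth = 0;

    do {
        if (!next_token(cur, tok, sizeof tok))
            return 0;
        if (strcmp(tok, "{") == 0)
            depth++;
        else if (strcmp(tok, "}") == 0)
            depth--;
    } while (depth > 0);
    return depth == 0;
}
```

A reader would call next_token to get a [name], and call skip_value whenever
the name is not one it knows. Over-long tokens are truncated in the output
buffer but still consumed in full, which is harmless when skipping.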

Within reason, you should not count on finding things within a structure.
For example, you won't find a list of clefs in a bar with no clefs in it,
or a list of lyric lines for a stave if there are no lyrics. In most cases,
the bottom level fields will all be present. It would be absurd to have a
note with no pitch, for example. Mostly, information will be explicitly
present even when there are obvious defaults, eg the note head structure
will say "accid None", but it is safest for future compatibility
to construct defaults, then overwrite them with what you find
in the file (if you find anything).

Low level interpretations

The value 'True' means true or present, and 'False' means false or absent. Eg:

staccato True

- this note has a staccato dot.

Integer values are represented in decimal form with an optional minus sign.
Eg:

nofpages 3

- there are 3 pages in this score.

A pair of integer values is represented by two decimal numbers with a comma
between, often to represent a position. Eg:

flagposn 66,86

- the flag position of this chord is 66 units from the top and 86 units
from the left of the stave's top-left.

A rational number is represented by two decimal numbers with a forward slash
'/' between, often to represent a time. Eg:

tupletransform 2/3

- this note is to last 2/3 of its normal value.

There are no floating point numbers in the current version.
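The pair and rational forms can both be picked apart with sscanf(). The
sketch below is only one way of doing it; the function names are made up for
this example. The trailing %c is there to reject tokens with junk after the
second number.

```c
#include <stdio.h>

/* Parse a "row,col" token such as "66,86". Returns 1 on success, 0 if the
   token is not exactly two integers separated by a comma. */
static int parse_pair(const char *tok, int *row, int *col)
{
    char tail;
    return sscanf(tok, "%d,%d%c", row, col, &tail) == 2;
}

/* Parse a "p/q" token such as "2/3". Returns 1 on success. The value 0/0
   is accepted, so the caller must check the denominator before dividing. */
static int parse_rational(const char *tok, int *num, int *den)
{
    char tail;
    return sscanf(tok, "%d/%d%c", num, den, &tail) == 2;
}
```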

Text strings are enclosed in double quotes, using "" within the string if a
double quote is needed. The text may be encoded in ASCII, or ISO8859-1, possibly
UTF8 or others in the future. Note that ISO8859-1 and UTF8 are both
extensions of ASCII, so a string in ASCII is the same in all three
encodings.
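Decoding a quoted value is a matter of stripping the outer quotes and
collapsing each "" pair into a single quote. The following is a minimal
sketch with an invented function name (unquote). It treats the text as
opaque bytes, so it makes no attempt to convert between encodings.

```c
#include <stddef.h>
#include <string.h>

/* Decode a complete quoted token (outer quotes included) into out, which
   must have room for cap >= 1 bytes. Returns the number of bytes written,
   not counting the terminating NUL, or -1 if the token is not properly
   quoted or does not fit. */
static int unquote(const char *tok, char *out, size_t cap)
{
    size_t len = strlen(tok), n = 0, i;

    if (len < 2 || tok[0] != '"' || tok[len - 1] != '"')
        return -1;
    for (i = 1; i + 1 < len; i++) {
        if (tok[i] == '"') {
            /* A quote inside the string must be doubled. */
            if (i + 2 >= len || tok[i + 1] != '"')
                return -1;
            i++;
        }
        if (n + 1 >= cap)
            return -1;
        out[n++] = tok[i];
    }
    out[n] = '\0';
    return (int)n;
}
```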

Some general conventions

These are not strictly adhered to. They are to aid readability.

* Use all lower case for slots (field names).

* Use upper case initials for types/shapes.

* For integer values which are normally nonnegative, use -1 for
impossible/nonexistent/unknown. Use 0/0 for a similar purpose for rationals.

Comments

The names 'comment' and 'comment$' are reserved. They will never be used to
represent any musical element. Therefore, as long as the values following
them obey the syntax, they will be skipped by a reading program.

The most generally useful form is:

comment$ "This is a comment"

Units

All graphical coordinates increase to right and down. They are written
as row,column pairs, ie y,x.

There are 16 units between stave lines in the output, at least for the
current version. Nearly all coordinates are in these units. Exceptions will
be pointed out.

Some values are in 'pitch units' where the midline of the stave is zero,
with values going up towards the bottom. So the note B on the midline of a
treble stave is 0, C is -1, D is -2, etc.
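To make the sign convention concrete, here is a small sketch that turns a
pitch-unit value into a letter name. It assumes a standard treble clef, and
it ignores octave, key signature and accidentals. The function name is
invented for this example.

```c
/* Letter name for a position in pitch units on a treble stave.
   0 is the midline (B); negative values are higher on the page. */
static char treble_letter(int pitchposn)
{
    static const char letters[] = "BCDEFGA";
    int steps_up = -pitchposn;              /* steps above the midline */
    int idx = ((steps_up % 7) + 7) % 7;     /* wrap into 0..6 */
    return letters[idx];
}
```

With this, 0 gives 'B', -1 gives 'C', and 4 (the bottom line of the stave)
gives 'E'.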

Main structure

Since the format is extendible, there could be other structure in
later versions. Things will be added within the score structure,
so what follows is a minimum you can expect to find.

A score has some information to itself, plus a list of pages.

A page has some information to itself, plus a list of systems.

A system has some information to itself, plus a list of staves, plus a list
of slurs/ties.

A stave has some information to itself, plus a list of bars, plus a list
of lyric lines, plus a list of dynamics (ppp...fff and hairpins).

A bar has some information to itself, plus a list of clefs, a list of
keysigs, a list of chords, a bar line, and possibly a timesig.

A chord is fairly complex, and is used to represent single notes and rests
as well as proper chords.

A slur represents a slur or tie or phrase mark.

A lyric line has some information to itself, plus a list of elements
(syllables).

fileheader {...} is
version [N] characterencoding [E]

[N] is the version number of the file as an integer. It is 1000-1999 for
version 1 of SharpEye (currently only 1000 used). It is 2000-??? for
version 2 of SharpEye. Currently 2000 (SharpEye 2.00-2.10),
2011 (SharpEye 2.11-2.30), 3000 (SharpEye 2.31-2.49),
3100 (SharpEye 2.50-??) are possible.
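A reading program may want to report which release wrote a file. The lookup
below is only a sketch built from the numbers listed above; the function
name is invented. Since the format is meant to be extendible, an unknown
number is probably better treated as a warning than as a reason to give up.

```c
#include <stddef.h>

/* Map the fileheader version number to the SharpEye releases listed in
   this document. Returns NULL for a number this table does not know. */
static const char *sharpeye_release(int version)
{
    switch (version) {
    case 1000: return "SharpEye 1";
    case 2000: return "SharpEye 2.00-2.10";
    case 2011: return "SharpEye 2.11-2.30";
    case 3000: return "SharpEye 2.31-2.49";
    case 3100: return "SharpEye 2.50 or later";
    default:   return NULL;
    }
}
```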

[E] is "ASCII" or "ISO88591" or "UTF8". It is ASCII in v1000, and ISO88591
in v2000,v2011,v3000,v3100.

[T] is the title of the piece of music. It may be the empty string, ie "".
[U] is the number of units per stave spacing. In the current version (SharpEye
1 and 2) this is 16. Thus a normal 5-line staff is 64 units high. The positions of
objects are stored in these units.

Note that the file contains some positions relating to the input image.
These are in pixels. When generating another format you would skip these,
and they are ignored here.

[K], [R], [C], are for mapping between output and input coordinates by
SharpEye.

[S] is the input staff spacing in units of 1/1024 of a pixel, for the 'dominant'
size of staff on the page. In general that means the most common size of
staff, but don't count on that: since the scale is estimated before staves
are found, it is even possible that it will find no staves of the dominant
size. This number can be used together with unitsperstavespacing in the
score structure to relate the dimensions in the output to the original image.
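As a sketch of that relationship, the function below scales a length in
output units back to pixels in the original image. It assumes that [S]
divided by 1024 is the stave spacing in pixels, which is my reading of the
description above, and it only converts lengths: it does not apply the
[K], [R], [C] mapping, so it does not give positions. The function name is
invented for this example.

```c
/* Convert a length in output units to input-image pixels.
   spacing is the page's [S] value (assumed: stave spacing in 1/1024 pixel);
   units_per_stave_spacing is the score's [U] value, 16 at present. */
static double units_to_pixels(int units, int spacing,
                              int units_per_stave_spacing)
{
    return (double)units * spacing / (1024.0 * units_per_stave_spacing);
}
```

For example, with [S] = 20480 (a 20 pixel stave spacing) and [U] = 16, one
stave spacing of 16 units comes out as 20 pixels.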

[T] is the distance between top of page and top of system. [L] is the
distance between left of page and left of system. [W] is the width of the
system, and [H] is its height, from top of top stave to bottom of bottom
stave.

[T] is the distance between top of page and top of stave. [L] is the
distance between left of page and left of stave. [W] is the width of the
stave. In SharpEye versions 1 and 2 at least, [L] and [W] will be identical
to the values for the system.

[Z] is the (vertical) size of the stave. It will normally be very close to
64 since the spacing between lines is 16 units. For a stave which is not the
dominant size on the page, it may be bigger or smaller. Also see the 'spacing'
field in the page structure.

[VS] is 'True' or 'False'. It will always be False in an mro file directly
from the recognition engine, but can be set by the user of SharpEye. [JB] is
'True' or 'False'. From 2.50, stave braces such as those joining left and
right hand piano staves are recognised and will result in this being True. Both
[VS] and [JB] affect export from SharpEye as NIFF, MusicXML, MIDI.

Note that a 'bar' is a physical/graphical bar, not always a musical/logical
bar. A bar ends at a barline, but that barline may be a double bar line, or
a repeat sign and does not always mean the end of a musical bar.

Note also that symbols in the bar are stored by type, and not left to right.
They need to be sorted in order to make musical sense of them. The symbols
have position information, and this can be used for sorting.
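One way to do the sorting is to gather every symbol in the bar into a single
array and sort it by column with qsort(). The struct below is hypothetical
(it is not how SharpEye stores anything); it just holds a tag for the kind
of symbol and its r,c position relative to the stave top-left.

```c
#include <stdlib.h>

/* Hypothetical flattened record for any symbol found in a bar. */
struct symbol {
    char kind;   /* eg 'c' clef, 'k' keysig, 'n' chord */
    int row;
    int col;
};

/* Order by column (left to right), then by row (top to bottom). */
static int by_column(const void *a, const void *b)
{
    const struct symbol *x = a;
    const struct symbol *y = b;

    if (x->col != y->col)
        return (x->col > y->col) - (x->col < y->col);
    return (x->row > y->row) - (x->row < y->row);
}

static void sort_bar_symbols(struct symbol *s, size_t n)
{
    qsort(s, n, sizeof *s, by_column);
}
```

Sorting on the centre column alone is a simplification. Wide symbols that
overlap horizontally may need a more careful comparison.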

clef {...} is
clef { shape [S] centre [r,c] pitchposn [P] }

[S] is one of 'Treble', 'Bass' or 'Alto' (G clef, F clef, C clef).

[r,c] is the position relative to stave top-left of the centre of the clef.

[P] is the 'pitch position' of the clef. It is in pitch units. For
a standard treble clef it will be 2, bass clef -2, alto 0, tenor -2.
Currently you won't see any other values. It doesn't make much odds
for now, but the [P] value should be used in preference to the r
value ready for the day when eg baritone clefs are recognised.

Version 2000: [S] can now be 'TrebleUp8' 'TrebleDown8' as well as the
above, meaning a treble clef with a little 8 top or bottom to indicate
an octave shift up or down.

Version 2000: [P] can now be any of -4,-2,0,2,4 for Alto clefs.

keysig {...} is
keysig { key [K] centre [r,c] }

[K] is an integer in the range -7 to 7. Negative numbers count flats, and
positive ones count sharps.
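In code, the sign convention amounts to the following. This is a trivial
sketch with an invented function name, included mainly to show the range
check.

```c
/* Split a keysig [K] value into counts of sharps and flats.
   Returns 1 on success, 0 if k is outside the documented range -7..7. */
static int keysig_counts(int k, int *sharps, int *flats)
{
    if (k < -7 || k > 7)
        return 0;
    *sharps = k > 0 ? k : 0;
    *flats = k < 0 ? -k : 0;
    return 1;
}
```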

[r,c] is the position relative to stave top-left of the centre of the
keysig.

nofmmrestbars is new in version 3100. marcato, staccatissimo,
upbow, downbow, trill, mordent, invmordent are new in version 3100.

Like NIFF's stem, and ENIGMA's entry, this structure represents chords,
single notes, and rests. Single notes are regarded as 'degenerate' chords,
and rests as silent chords.

[V] is 'True' or 'False'. If True it means there is no stem, ie the chord is
a breve, semi-breve or rest. (It's redundant but convenient.)

[U] is 'True' or 'False'. If True it means the stem points up from the
note(s).

[SS] is 'True' or 'False'. If True it means the stem has a slash, as in
an acciaccatura grace note.

[p/q] is the multiplier applied to the time to deal with tuplets. It is
1/1 for most notes, and 2/3 for notes in triplets.
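Applying the multiplier is a plain multiplication of rationals. The sketch
below assumes the reading program keeps nominal durations as fractions of a
semibreve (so a quaver is 1/8), which is a choice made for this example and
not something the format dictates. The struct and function names are
invented.

```c
struct rational {
    int num;
    int den;
};

/* Greatest common divisor of two non-negative integers. */
static int gcd(int a, int b)
{
    while (b != 0) {
        int t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Multiply a nominal duration by the tuplet multiplier [p/q] and reduce
   the result to lowest terms. Both arguments are assumed non-negative. */
static struct rational apply_tuplet(struct rational nominal,
                                    struct rational transform)
{
    struct rational r;
    int g;

    r.num = nominal.num * transform.num;
    r.den = nominal.den * transform.den;
    g = gcd(r.num, r.den);
    if (g > 1) {
        r.num /= g;
        r.den /= g;
    }
    return r;
}
```

A triplet quaver then comes out as 1/8 times 2/3, which is 1/12.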

[TC] is 0 for notes not in a tuplet, otherwise a count from 1. This
is used when editing. Tuplets are an area that needs reworking, and you
should ignore this. (SharpEye version 1)

[TI] is -1 for notes not in a tuplet, otherwise an integer >= 0 that
uniquely identifies the tuplet within a bar. (Version 2000 onwards)

[MMR] is the number of bars (measures) in the multi-measure rest.

[E] is 'True' or 'False', and signifies the presence or
absence of various expression marks, articulations and ornaments.
'pause' means a fermata sign. 'invmordent' is an inverted mordent.
The others should be clear. It is likely that these will not be present in later
versions if the value is False. Eg. there will either be "staccato True" or
nothing.
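Because a mark may simply be missing when it is False, it is safest to start
from defaults and overwrite them. Here is a minimal sketch of that pattern
for three of the marks. The struct and function are invented for this
example, and it only copes with simple (unquoted, unbraced) values.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical holder for a few of the marks named in this document. */
struct chord_marks {
    int staccato;
    int pause;    /* fermata */
    int trill;
};

/* Read "name value" pairs of simple tokens from text. Every mark starts
   out False and is overwritten only if it is found. Names that are not
   recognised are skipped together with their value. */
static struct chord_marks read_marks(const char *text)
{
    struct chord_marks m = {0, 0, 0};
    char name[64], value[64];
    int used;

    while (sscanf(text, " %63s %63s%n", name, value, &used) == 2) {
        int flag = (strcmp(value, "True") == 0);

        if (strcmp(name, "staccato") == 0)
            m.staccato = flag;
        else if (strcmp(name, "pause") == 0)
            m.pause = flag;
        else if (strcmp(name, "trill") == 0)
            m.trill = flag;
        text += used;   /* move past the pair just read */
    }
    return m;
}
```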

[SDR], [TDR], [PDR], [ADR] are not yet implemented. They are the
vertical offset of the centre of the expression mark from the
chord's flag position. They are therefore positive if the expression
is below the flag.

[R] is the number of augmentation dots following the chord. It is 0,1,2 or
3.

[F] is the number of flags on a chord which is not a rest, or part of a
beamed group. It is 1 for a quaver, 2 for a semi-quaver, etc. NB: This
applies to grace notes as well as normal notes. An earlier version of
this documentation said otherwise.

Also note that flags on grace notes are not currently counted (August
2001) so this field will be 1 for all grace notes for the time being.

[SS] is True or False. If True it means the stem has a slash, eg for
acciaccatura. This field will not always be present, so assume a default of
false when reading. Stem slashes are not currently (August 2001)
recognised by the engine. (New in version 2000).

[r,c] is the position of the flag or beam end of the stem on this chord. In
the case of a stemless note (rest, breve, semi-breve) the c value is still
valid, and is the centre of the note, chord or rest.

[H] is the position of the head that is furthest from the flag or beam. It
is in 'pitch' units, which means the midline of the stave is zero, with
values going up towards the bottom. So the note B on the midline of a
treble stave is 0, C is -1, D is -2, etc.

Note that there will always be at least one note in the note list, which has
further information.

beam {...} is
beam
{ id [I] nofnodes [N] nofleft [L] nofright [R] }

Like NIFF, beams are made of 'nodes'. There is one node for each chord that
the beam joins.

Version 2000: Grace notes are new. Version 3100: multi-bar rests are new.

[O] is the stave offset of this notehead. It is usually zero, meaning that
the notehead belongs to the same stave as the chord structure. However, when
a chord or beamed group spans more than one stave, it is regarded as
belonging logically to the uppermost stave on which it has any noteheads,
and any noteheads which belong to staves below this will have a positive
stave offset. The engine currently doesn't recognise multi-stave
objects like this, so [O] will be 0.

Version 2000: The engine does now recognise multi-stave objects, but
it chops them up into single-stave objects, so [O] remains at zero.

[P] is the pitch position, using the same encoding as the [H] field in the
chord structure. It also gives the vertical position of a rest in the
case where a chord structure is used for a rest. In this case, [P]
is the position of the centre of the rest in most cases, but the top
of a rest for semibreve rest, and the bottom for a minim rest.

Slurs, ties, phrase marks, and any other curves found are approximated
by an arc, and assigned to a system, but not interpreted further by the OMR
engine.

The coordinates are relative to the system top left. [RAD] is a signed value:
a negative value means the slur is above the centre of the arc, ie it is
like /^\, and a positive value means \_/. The absolute value of [RAD]
is the radius.

[A] is the vertical coordinate of the line relative to the top of the stave.
It is the position of the baseline of the text, the bottom of an 'a', not a 'g'.

[H] is the height of the text. This is the point size of the text (height of
'Åg'). Not properly implemented until version 2011 of this format (version
2.11 of SharpEye). If reading an earlier version than 2011, you should ignore
this and make up a default. 36 is about right, ie 2.25 stave spacings. From
2011 this is based on the scan. It is probably best to average these values
for all the lyrics in the score.

[E] is 'True' or 'False'. If true it means this lyricelement is an extender
line like this_______ and the text$ field should be ignored. Currently, [E]
will always be False as extender lines are not recognised.

[L], [R], [M] are the left, right, and middle x-posns of the element.
I intend using [L] and [R] for extender lines and [M] for syllables.
Currently you should only rely on [M].

If [E] is False, [TEXT] is the text of the syllable, in double quotes. Syllables
with one or more hyphens following are represented by making the last character
in the syllable a hyphen.

[T] is the type (function) of the text. 0 is musical direction, 1 is chord.
Currently, 0 means anything that isn't a lyric or a chord, since the recognition
does not distinguish further. Version 3000: type is new.