LibreOffice & OpenOffice documents

All single quotes (') in text are encoded as "&apos;", but the OpenTBS plugin replaces them automatically.

The main information is stored in the file 'content.xml'.

Pictures are stored in the directory 'Pictures' and should be registered in the file 'META-INF/manifest.xml' (OpenTBS does this automatically when you use the parameter "addpic").

Since OpenOffice 3.2, a picture that is not registered in the manifest file can cause an error message when the document is opened.
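
The registration rule above can be checked with a short script. The following is only a sketch: the function name and the in-memory demo archive are hypothetical, and it assumes the usual ODF manifest layout (one <manifest:file-entry> per file, with the path in the attribute manifest:full-path).

```python
import io
import zipfile
import xml.etree.ElementTree as ET

MANIFEST_NS = "urn:oasis:names:tc:opendocument:xmlns:manifest:1.0"


def unregistered_pictures(odf_file):
    """Return the files stored under 'Pictures/' that the manifest does not list."""
    with zipfile.ZipFile(odf_file) as archive:
        manifest = ET.fromstring(archive.read("META-INF/manifest.xml"))
        registered = {
            entry.get(f"{{{MANIFEST_NS}}}full-path")
            for entry in manifest.iter(f"{{{MANIFEST_NS}}}file-entry")
        }
        stored = [
            name for name in archive.namelist()
            if name.startswith("Pictures/") and not name.endswith("/")
        ]
    return sorted(name for name in stored if name not in registered)


# Hypothetical demo data: a tiny archive with one registered and one
# unregistered picture (not a complete, valid document).
DEMO_MANIFEST = (
    f'<manifest:manifest xmlns:manifest="{MANIFEST_NS}">'
    '<manifest:file-entry manifest:full-path="content.xml" manifest:media-type="text/xml"/>'
    '<manifest:file-entry manifest:full-path="Pictures/a.png" manifest:media-type="image/png"/>'
    '</manifest:manifest>'
)


def build_demo_archive():
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("META-INF/manifest.xml", DEMO_MANIFEST)
        archive.writestr("Pictures/a.png", b"")
        archive.writestr("Pictures/b.png", b"")
    buffer.seek(0)
    return buffer


print(unregistered_pictures(build_demo_archive()))
```

With a real document, a file path can be passed instead of the in-memory buffer.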

Video and sound cannot be stored in OpenOffice documents.

Main file (content.xml)

Synopsis

<office:document-content>
...
<office:body>
<office:text>
<text:p text:style-name="Standard">
Normal new lines are made with a new paragraph <text:p>...</text:p>
Simple new lines are made with <text:line-break/>
Tabs are made with <text:tab/>
Page-breaks are made with a new paragraph having a style which has the attribute fo:break-before="page" or fo:break-after="page".
Note that the page-break does not work if the attribute is in the paragraph element. A "break-before" at the first page, or a "break-after" on the last page has no effect.
Local styles (bold, color,...) are made with <text:span text:style-name="T1">...</text:span>
</text:p>
<text:h>...</text:h> A paragraph typed as a heading
<text:list>...</text:list> A list of items
<table:table>...</table:table> A table
</office:text>
</office:body>
</office:document-content>
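
As an illustration of the synopsis above, here is a minimal sketch that flattens the paragraphs of a "content.xml" into plain strings. The sample XML and the function names are made up for the example; a real file contains many more elements, which this sketch simply traverses.

```python
import xml.etree.ElementTree as ET

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"


def paragraph_text(element):
    """Flatten one element: line breaks become newlines, tabs become tab characters."""
    parts = [element.text or ""]
    for child in element:
        if child.tag == f"{{{TEXT_NS}}}line-break":
            parts.append("\n")
        elif child.tag == f"{{{TEXT_NS}}}tab":
            parts.append("\t")
        else:
            # Any other child, for example <text:span>, is flattened recursively.
            parts.append(paragraph_text(child))
        parts.append(child.tail or "")
    return "".join(parts)


def paragraphs(content_xml):
    """Return the text of every <text:p> found in the document."""
    root = ET.fromstring(content_xml)
    return [paragraph_text(p) for p in root.iter(f"{{{TEXT_NS}}}p")]


# Hypothetical sample following the synopsis.
SAMPLE = (
    f'<office:document-content xmlns:office="{OFFICE_NS}" xmlns:text="{TEXT_NS}">'
    '<office:body><office:text>'
    '<text:p text:style-name="Standard">Hello<text:line-break/>'
    '<text:span text:style-name="T1">bold</text:span><text:tab/>end</text:p>'
    '<text:p>Second</text:p>'
    '</office:text></office:body></office:document-content>'
)

print(paragraphs(SAMPLE))
```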

Cells merged vertically

Cells that are not displayed because of vertical or horizontal merging are replaced with <table:covered-table-cell/>. But such elements seem to be optional: LibreOffice displays the table correctly if they are omitted.

The attribute table:number-rows-spanned="1" seems to be supported and has no effect.

ODP

Header and footer contents are saved in the "content.xml" file.
They are defined as available styles and can be used for any slide, and for the whole document in the "handout" view, which is a set of several slides.

Microsoft Office documents

That is, documents with the extensions DOCX, XLSX and PPTX.

Microsoft Word document (.DOCX)

The main file is usually "word/document.xml", but its actual location is defined in the file "[Content_Types].xml", in the element: <Override PartName="/word/document.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>

Note: I tested renaming "word/document.xml" in both the "[Content_Types].xml" file and the archive, but Word 2010 was then unable to open the document and reported it as corrupted.
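
The lookup in "[Content_Types].xml" can be sketched as follows. The function name and the sample are hypothetical, and the sketch only handles the content type quoted above (templates and macro-enabled documents use other content types).

```python
import xml.etree.ElementTree as ET

CT_NS = "http://schemas.openxmlformats.org/package/2006/content-types"
MAIN_TYPE = (
    "application/vnd.openxmlformats-officedocument"
    ".wordprocessingml.document.main+xml"
)


def main_part_name(content_types_xml):
    """Return the archive path of the main document part, or None if not found."""
    root = ET.fromstring(content_types_xml)
    for override in root.iter(f"{{{CT_NS}}}Override"):
        if override.get("ContentType") == MAIN_TYPE:
            # Part names start with "/", archive paths do not.
            return override.get("PartName").lstrip("/")
    return None


# Hypothetical, shortened "[Content_Types].xml".
SAMPLE = (
    f'<Types xmlns="{CT_NS}">'
    '<Default Extension="xml" ContentType="application/xml"/>'
    f'<Override PartName="/word/document.xml" ContentType="{MAIN_TYPE}"/>'
    '</Types>'
)

print(main_part_name(SAMPLE))
```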

The main file "word/document.xml" (DOCX)

<w:document>
<w:body>
<w:p> // New paragraph
<w:pPr> // Parameters of the paragraph
<w:rPr> ... </w:rPr> // Set of parameters for a Run
<w:sectPr> // Start a new section. A section is a set of page layout settings (margins, columns, ...) that applies until the next section.
<w:type w:val="continuous"/> // May be present when the section is defined manually.
</w:sectPr>
<w:pageBreakBefore/> // Page break before the paragraph (way #1)
</w:pPr>
<w:r> // New run item. A run item is a set of content having common layout properties.
<w:rPr> // Set of parameters for a Run. Examples: <w:i/> is italic, <w:b/> is bold. </w:rPr>
<w:t>Your text is here</w:t>
// Simple new lines are made with <w:br/>
// Page breaks can also be made with <w:br w:type="page"/> (way #2)
</w:r>
<w:r>
<w:tab/> // Tabs are placed inside a <w:r> element.
<w:t xml:space="preserve"> Next text </w:t> // Leading and trailing spaces are kept thanks to the attribute xml:space="preserve".
</w:r>
</w:p>
</w:body>
</w:document>
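
To make the synopsis more concrete, here is a small sketch that extracts the plain text of each paragraph of a "word/document.xml". The function names, the sample and the choice of a form feed character for page breaks are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = f"{{{W_NS}}}"


def collect(node, parts):
    """Walk the content of a paragraph, skipping the property elements."""
    for child in node:
        if child.tag in (W + "pPr", W + "rPr"):
            continue  # layout parameters, no text inside
        if child.tag == W + "t":
            parts.append(child.text or "")
        elif child.tag == W + "tab":
            parts.append("\t")
        elif child.tag == W + "br":
            # Arbitrary choice for this sketch: a page break becomes a form feed.
            parts.append("\f" if child.get(W + "type") == "page" else "\n")
        else:
            collect(child, parts)


def paragraphs(document_xml):
    """Return the text of every <w:p> found in the document."""
    result = []
    for paragraph in ET.fromstring(document_xml).iter(W + "p"):
        parts = []
        collect(paragraph, parts)
        result.append("".join(parts))
    return result


# Hypothetical sample following the synopsis.
SAMPLE = (
    f'<w:document xmlns:w="{W_NS}"><w:body><w:p>'
    '<w:pPr><w:pageBreakBefore/></w:pPr>'
    '<w:r><w:rPr><w:b/></w:rPr><w:t>Your text is here</w:t><w:br/></w:r>'
    '<w:r><w:tab/><w:t xml:space="preserve"> Next text </w:t></w:r>'
    '</w:p></w:body></w:document>'
)

print(paragraphs(SAMPLE))
```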

What are attributes "w:rsidR" and "w:rsidRPr" for?

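Attribute "w:rsidR" is a revision identifier (RSID). Word assigns a new id to each editing session of the document, and each modification is marked with the id of the session in which it was made. Attribute "w:rsidRPr" is the id of the session in which the formatting properties were last modified.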

Cells merged vertically

<w:tc>
<w:tcPr>
<w:vMerge w:val="restart"/> // marks the cell as the start of a new vertical merge
<w:vMerge w:val="continue"/> // marks the cell as the continuation of a vertical merge (the cell is merged with a previous one having "restart" or "continue")
<w:vMerge/> // same as "continue"
// no <w:vMerge> element means the cell is not merged.
</w:tcPr>
...
</w:tc>
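
The rules above can be summarized in a few lines of code. This is only a sketch: the function name is hypothetical, and the sample puts four unrelated cells in one container purely to exercise the four cases.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = f"{{{W_NS}}}"


def vertical_merge_state(cell):
    """Return "restart", "continue" or None (not merged) for a <w:tc> element."""
    merge = cell.find(f"{W}tcPr/{W}vMerge")
    if merge is None:
        return None
    # A bare <w:vMerge/> has no w:val attribute and means "continue".
    return merge.get(W + "val", "continue")


# Hypothetical demo cells, gathered in one element for the example only.
SAMPLE_CELLS = ET.fromstring(
    f'<w:tr xmlns:w="{W_NS}">'
    '<w:tc><w:tcPr><w:vMerge w:val="restart"/></w:tcPr></w:tc>'
    '<w:tc><w:tcPr><w:vMerge w:val="continue"/></w:tcPr></w:tc>'
    '<w:tc><w:tcPr><w:vMerge/></w:tcPr></w:tc>'
    '<w:tc><w:tcPr/></w:tc>'
    '</w:tr>'
)

states = [vertical_merge_state(cell) for cell in SAMPLE_CELLS]
print(states)
```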

Cells merged horizontally

Headers and footers (DOCX)

There are 3 types of headers and footers in Microsoft Word: Default, Even (for even-numbered pages only) and First (for the first page only). The Even and First types are optional.

Each section of the document may have its own set of headers/footers of the 3 types, but by default a new section has its headers/footers linked to those of the previous section.

Each header and footer is saved in a separate XML file. If no header/footer is defined for the document, then there are no header or footer XML files. Even and First headers are optional: they may not be defined for a section, and then have no corresponding XML files.

Example of header and footer files: "word/header1.xml" and "word/footer1.xml".

The actual types and locations of headers and footers are defined in the main document "word/document.xml", within the section properties.
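
A possible way to list the header and footer files of a document is sketched below. It relies on details that are not described in these notes: in WordprocessingML, a section refers to its headers and footers with <w:headerReference> and <w:footerReference> elements carrying a type and a relationship id, and the id is resolved in "word/_rels/document.xml.rels". The function name and the samples are hypothetical, and the sketch assumes that targets are relative to the "word" directory.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
R_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
W = f"{{{W_NS}}}"
R = f"{{{R_NS}}}"


def header_footer_files(document_xml, rels_xml):
    """Return (kind, type, file) for each header/footer reference of each section."""
    targets = {
        rel.get("Id"): rel.get("Target")
        for rel in ET.fromstring(rels_xml).iter(f"{{{REL_NS}}}Relationship")
    }
    found = []
    for section in ET.fromstring(document_xml).iter(W + "sectPr"):
        for ref in section:
            name = ref.tag.replace(W, "")
            if name in ("headerReference", "footerReference"):
                kind = name[:6]  # "header" or "footer"
                # Assumption: the target is relative to the "word" directory.
                path = "word/" + targets[ref.get(R + "id")]
                found.append((kind, ref.get(W + "type"), path))
    return found


# Hypothetical samples.
DOCUMENT = (
    f'<w:document xmlns:w="{W_NS}" xmlns:r="{R_NS}"><w:body><w:p/>'
    '<w:sectPr>'
    '<w:headerReference w:type="default" r:id="rId7"/>'
    '<w:footerReference w:type="first" r:id="rId8"/>'
    '</w:sectPr></w:body></w:document>'
)
RELS = (
    f'<Relationships xmlns="{REL_NS}">'
    f'<Relationship Id="rId7" Type="{R_NS}/header" Target="header1.xml"/>'
    f'<Relationship Id="rId8" Type="{R_NS}/footer" Target="footer1.xml"/>'
    '</Relationships>'
)

print(header_footer_files(DOCUMENT, RELS))
```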

Charts (DOCX)

The first chart is saved under "word/charts/chart1.xml", and so on for the next ones. The XML file of the chart contains a copy of the data used for the chart.

If the chart is designed manually, then "chart1.xml" also contains references to cells of an Ms Excel file that is used by Ms Word for managing series.

The Excel file is embedded in the Docx file, for example: "word/embeddings/Worksheet_Microsoft_Excel1.xlsx". The path of the Excel file is saved in "word/charts/_rels/chart1.xml.rels".
Nevertheless the references to that Excel file are optional and can be deleted from the XML of the chart.

The titles of the chart, of the axes and of the series are saved in "chart1.xml". Other custom text boxes are saved in a shape file, for example: "word/drawings/drawing1.xml".

Example of a series saved in the XML (the tags are different for an XY series):

Microsoft Excel spreadsheet (XLSX)

General

An Excel workbook can have one or several worksheets. The contents of cells are saved in worksheets.
Worksheet files are named "xl/worksheets/sheet1.xml", and also sheet2.xml, sheet3.xml...

The file names are not the names defined in Excel by the user; they are internal names. But it seems that there is always at least one worksheet file named "sheet1.xml".

All string values of cells are stored in the file "xl/sharedStrings.xml". The cells in fact contain the index of the string in the sharedStrings.xml file. This separation is likely to make merging an Excel sheet difficult.

All sheets of the workbook are listed in the file "xl/workbook.xml".
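
The shared string table can be loaded into a simple list, so that a cell index can be turned back into its text. This is a minimal sketch with a hypothetical function name and sample; it concatenates every <t> element found in an item, which also covers strings split into several formatted runs.

```python
import xml.etree.ElementTree as ET

S_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
S = f"{{{S_NS}}}"


def load_shared_strings(shared_strings_xml):
    """Return the strings of "xl/sharedStrings.xml" in index order."""
    root = ET.fromstring(shared_strings_xml)
    return [
        "".join(t.text or "" for t in item.iter(S + "t"))
        for item in root.iter(S + "si")
    ]


# Hypothetical sample: a plain string and a string made of two runs.
SAMPLE = (
    f'<sst xmlns="{S_NS}" count="2" uniqueCount="2">'
    '<si><t>Total</t></si>'
    '<si><r><t>Rich</t></r><r><t xml:space="preserve"> text</t></r></si>'
    '</sst>'
)

strings = load_shared_strings(SAMPLE)
print(strings[0])
```

A cell with t="s" and a value of 1 would then display strings[1].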

Synopsis of a sheet file like "xl/worksheets/sheet1.xml" (XLSX)

<worksheet>
...
<sheetData>
<row r="2" spans="2:2" ht="90">
// A range of one row in which several cells are defined
<c r="B2" s="1" t="s">
/* Definition of a cell:
* Attribute r is the address of the cell in the sheet (format A1). This attribute is optional.
* Attribute s is the style of the cell (the format). Styles are saved in the file 'xl/styles.xml' but I have not found the link yet.
* Attribute t is the type of data; by default it is numeric.
* t="s" means that the displayed value is a string, and the saved value is the index of the string in the file "sharedStrings.xml".
* b: boolean, d: date, e: error, n: number, s: shared string, inlineStr: inline string, str: string as the result of a formula
*/
<f>B13+B14</f> // The formula if any. If there is no formula, this tag is absent. The type of <c> is the type of the result.
<v>0</v> // The inner value without formatting.
// If t="s" then the value is in fact the index of the string in the "xl/sharedStrings.xml" file.
// If t="str" then the value is the string result of the formula.
</c>
<c r="C2" s="1" t="inlineStr">
/* The type "inlineStr" is a special value that allows the string to be stored in the cell instead of in the file "sharedStrings.xml".
* It is used by OpenTBS to transfer strings containing TBS fields from "sharedStrings.xml" into the XML of the sheet.
*/
<is><t>This is a string</t></is>
</c>
</row>
</sheetData>
</worksheet>
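
The cell types described in the synopsis can be decoded as in the sketch below. The function names, the sample sheet and the shared string list are hypothetical; dates, errors and formula strings are returned as raw text, and styles are ignored.

```python
import xml.etree.ElementTree as ET

S_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
S = f"{{{S_NS}}}"


def cell_value(cell, shared_strings):
    """Decode one <c> element according to its attribute t."""
    kind = cell.get("t", "n")  # the default type is numeric
    if kind == "inlineStr":
        return "".join(t.text or "" for t in cell.iter(S + "t"))
    raw = cell.findtext(S + "v")
    if raw is None:
        return None  # empty cell
    if kind == "s":
        return shared_strings[int(raw)]
    if kind == "b":
        return raw == "1"
    if kind == "n":
        return float(raw)
    return raw  # "str", "e", "d": kept as text in this sketch


def sheet_values(sheet_xml, shared_strings):
    """Return {address: value} for the cells that carry an r attribute."""
    root = ET.fromstring(sheet_xml)
    return {
        cell.get("r"): cell_value(cell, shared_strings)
        for cell in root.iter(S + "c")
        if cell.get("r")
    }


# Hypothetical sample following the synopsis.
SAMPLE = (
    f'<worksheet xmlns="{S_NS}"><sheetData><row r="2">'
    '<c r="B2" s="1" t="s"><v>0</v></c>'
    '<c r="C2" s="1" t="inlineStr"><is><t>This is a string</t></is></c>'
    '<c r="D2"><f>B13+B14</f><v>12.5</v></c>'
    '</row></sheetData></worksheet>'
)

values = sheet_values(SAMPLE, ["Total"])
print(values)
```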

Headers and footers (XLSX)

As in DOCX, there can be up to 6 headers/footers for each sheet.
Headers and footers are saved in the sheet file.

<headerFooter differentOddEven="1" differentFirst="1">
<oddHeader>My header for odd page in this sheet</oddHeader>
<evenHeader>My header for even page in this sheet</evenHeader>
<firstHeader>My header for first page in this sheet</firstHeader>
</headerFooter>
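
Reading these entries back from a sheet file is straightforward. The sketch below uses a hypothetical function name and a shortened sample sheet; it returns whatever entries are present under <headerFooter>.

```python
import xml.etree.ElementTree as ET

S_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
S = f"{{{S_NS}}}"


def sheet_headers_footers(sheet_xml):
    """Return {entry name: text} for the headers and footers of a sheet."""
    node = ET.fromstring(sheet_xml).find(S + "headerFooter")
    if node is None:
        return {}
    return {child.tag.replace(S, ""): child.text or "" for child in node}


# Hypothetical, shortened sheet file.
SAMPLE = (
    f'<worksheet xmlns="{S_NS}"><sheetData/>'
    '<headerFooter differentOddEven="1" differentFirst="1">'
    '<oddHeader>My header for odd page in this sheet</oddHeader>'
    '<evenHeader>My header for even page in this sheet</evenHeader>'
    '<firstHeader>My header for first page in this sheet</firstHeader>'
    '</headerFooter></worksheet>'
)

entries = sheet_headers_footers(SAMPLE)
print(sorted(entries))
```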

Pictures (XLSX)

The binary content of each picture is saved as a file in "xl/media/".

The presence of pictures in the sheet is indicated by a single <drawing> element at the bottom of the <worksheet> element. The <drawing> element carries a reference id, which is defined in the Rels file of the sheet. All properties of all the pictures in a sheet are finally saved in a third XML file.

File "xl/worksheets/sheet1.xml"

<worksheet ...>
...
<drawing r:id="rId2"/> // (only one entity for all pictures in the sheet)
</worksheet>

File "xl/worksheets/_rels/sheet1.xml.rels"

<Relationship Id="rId2" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/drawing" Target="../drawings/drawing1.xml"/>
// (only one entity for all pictures in the sheet)
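
The chain described above (sheet, then Rels file, then drawing file) can be followed with a sketch like this one. The function name and the samples are hypothetical, and the relative target is resolved against the directory of the sheet file.

```python
import posixpath
import xml.etree.ElementTree as ET

S_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
R_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"


def drawing_part(sheet_path, sheet_xml, rels_xml):
    """Return the archive path of the drawing file of a sheet, or None."""
    drawing = ET.fromstring(sheet_xml).find(f"{{{S_NS}}}drawing")
    if drawing is None:
        return None  # no picture in this sheet
    rel_id = drawing.get(f"{{{R_NS}}}id")
    for rel in ET.fromstring(rels_xml).iter(f"{{{REL_NS}}}Relationship"):
        if rel.get("Id") == rel_id:
            base = posixpath.dirname(sheet_path)
            return posixpath.normpath(posixpath.join(base, rel.get("Target")))
    return None


# Hypothetical samples following the two files shown above.
SHEET = (
    f'<worksheet xmlns="{S_NS}" xmlns:r="{R_NS}">'
    '<sheetData/><drawing r:id="rId2"/></worksheet>'
)
RELS = (
    f'<Relationships xmlns="{REL_NS}">'
    f'<Relationship Id="rId2" Type="{R_NS}/drawing" Target="../drawings/drawing1.xml"/>'
    '</Relationships>'
)

print(drawing_part("xl/worksheets/sheet1.xml", SHEET, RELS))
```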