Tales of LibreOffice interoperability: the missing files

I’m welcoming the release of LibreOffice 4.3.0 in the name of Igalia with the last post in this series. This time we will talk about the preservation of embeddings in OOXML text documents, and devote some lines to the support of Standard Document Tags.

Embedded content in documents

Our goal this time was the preservation of embedded content in OOXML text documents, as a first step towards full support like we did with other features. The insertion of new embeddings in .docx documents or the edition of existing ones will have to come later in the future.

An embedded document usually consists of two files; one of them is a preview picture to be shown in the parent document, and the other one is the actual embedded document. For the case of a spreadsheet embedded in a text document, the most common case, you will find these two files in the document:

[code]
/word/media/image1.emf
/word/embeddings/Microsoft_Excel_Worksheet1.xlsx
[/code]

With the corresponding entries in the relations file:

[code language=”xml”]
<Relationship Id="rId2"
Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/package"
Target="embeddings/oleObject1.xlsx" />
<Relationship Id="rId3"
Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/image"
Target="media/image1.emf" />
[/code]

The relevant bits in the document.xml file are below. Notice a w:object consists of one shape, which is filled with data from an image file, and the OLE object itself, linked to the embedded spreadsheet. Also notice the attribute ProgID, which defines the program, document type and version:

[code language=”xml”]
<w:object>
<v:shape id="ole_rId2"
style="width:362.25pt;height:146.25pt" o:ole="">
<v:imagedata r:id="rId3" o:title="" />
</v:shape>
<o:OLEObject Type="Embed" ProgID="Excel.Sheet.12"
ShapeID="ole_rId2" DrawAspect="Content"
ObjectID="_570182397" r:id="rId2" />
</w:object>
[/code]

There is one more element that allows the embedded file to be properly detected by Word, it’s a content type definition in the content types file:

[code language=”xml”]
<Override PartName="/word/embeddings/oleObject1.xlsx"
ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet" />
[/code]

As you can see there are three elements that determine the kind of embedding we are dealing with, and Word requires the right combination of the three of them:

  • The properties in the tag in document.xml
  • The ContentType for the file defined in [Content_Types].xml
  • The Type of the Relationship defined in document.xml.rels

The most convenient way to achieve our goal was using the grab bag technique to store the ProgID attribute of the object, and infer the correct content type and relation type. Some examples:

  • An object with ProgID Excel.Sheet.12 is a OOXML spreadsheet. Its media type must be application/vnd.openxmlformats-officedocument.spreadsheetml.sheet and the relation type is http://schemas.openxmlformats.org/officeDocument/2006/relationships/package.
  • If the ProgID is Excel.Sheet.8, this is an old Office spreadsheet. Now the media type must be application/vnd.ms-excel and the relation type http://schemas.openxmlformats.org/officeDocument/2006/relationships/oleObject.

If you detect a particular type of embedding in your documents that isn’t being preserved, drop us a line in the Bugzilla. A patch to add new relations of this kind should be quick and easy.

Bonus track: Structured Document Tags

Structured Document Tags (SDTs) is a family of document objects that contains form-like controls, citations, contents tables or bibliography tables among many other. This variety of uses means that they can live inside a paragraph or they can be a high-level element that contains several paragraphs and even shapes, which of course is tricky to implement.

For 4.3.0 we have worked on some of these tags, and we can say we properly implemented the import and export of combo, date and check boxes. We also wrote some code to preserve generic SDTs and now most of the tags are preserved but there are formatting issues. The proper way to support every kind of SDT is translating them to the equivalent objects in LibreOffice on import and translate them back to SDTs on export, but that will require time and work. Any volunteers? 😉

Wrap-up

Despite the 6-month development cycles, I’m feeling like the development of 4.3 line started a long time ago and I may have forgotten to write about some little feature or fix… Anyway, it’s time to close this batch of blog posts about interoperability features, all of them developed by Igalia and sponsored by CloudOn.

Enjoy our shiny new LibreOffice, and happy hacking!

2 thoughts on “Tales of LibreOffice interoperability: the missing files

  1. So basically, viewing embedded object like picture, OOXML document/spreadsheet, plain text in a MS Office document is still not supported by LibreOffice (at least untill LibO 5.3 version in my machine?

    Is there any plan to suppport this? I’m very frustated with my daily office document because of this.

Leave a Reply

Your email address will not be published. Required fields are marked *