Using Microsoft Open XML SDK for on-demand document services

ON-DEMAND DOCUMENT SERVICES

Many of our customer-facing applications have the requirement to provide documents on-demand. One example is, that we want to offer a possibility to get real-time quotes of our products in the most simple and intuitive way for our users.

CHOOSING THE RIGHT TECHNOLOGY – MICROSOFT OPEN XML

Each type of document has to be provided in a widely adopted file format. In the case of spreadsheets, the file format needs to be supported by one of the industry standards – Microsoft Office Excel.

Microsoft, Ecma and ISO / IEC introduced Open XML file standard back in 2006 and now it’s a widely accepted open standard for spreadsheet documents, charts, presentations, and word processing documents.

Microsoft provides a .net Standard compatible NuGet package library – an OpenXML SDK DocumentFormat.OpenXML, available under MIT license.

DEVELOPMENT

To develop the needed document solution, the only required part is the NuGet package.  Although OpenXML SDK is meant to abstract XML structure away from the developer for easier development, as a developer, you need to work with many abstraction layers.

The library does not include almost any validation logic. The developer can easily create a document, which is not built by OpenXML standards – no exception or error is thrown at a runtime, but by opening a document in Office editor, it might. This is why unit testing is a must.

GENERIC FUNCTIONALITY

We started with the development of a functionality base. A set of methods that create and manipulate a document has been written in parallel with unit tests. Once the “solid ground” was built, we have done some optimization refactoring.

We ended up with many logically separated methods. In the following code snippet, you can see one of those methods. This example shows a method, which copies a set of rows in the Spreadsheet document and inserts them at a given index.

Manipulating a Spreadsheet document – method make a copy of rows and insert them at a given index

 

 TEMPLATING

Focusing on spreadsheet and word documents, we separated static and dynamic content. In case of the spreadsheet document, we designed the template (design and content) in Excel editor and developed a service that fills the template with additional content. The end result is our desired spreadsheet file.

Chunk of Spreadsheet template – cells with keys define the design waiting to be filled

 

 OPTIMIZATION PROBLEMS

Manipulating the whole document structure should be well thought out. If every node change demands an additional search for that node in the document object and the complete restructuring of that document, so that its structure is in a correct state, a more customized approach is necessary. In general, logic should not be “too generic”.

The previously mentioned method CopyRowsAndInsertAt copies needed rows and reindex all others every time it is called. Many sequential calls of such logic are better to replace with a more customized logic, for example, to do all copies of the needed rows and then reindex all rows in one big swing. Such an approach makes execution way faster.

Part of generated Pricelist excel file – also optimized for printing

LOOKING INTO THE BRIGHT FUTURE

Although the pace of development was not as fast as we anticipated, we believe that the OpenXML format is well accepted by the community as well as DocumentFormat.OpenXML library by developers. That gives us the confidence to use this technology as a robust platform for on-demand document services in the future.

back to overview

Leave a Reply

Your email address will not be published. Required fields are marked *