Converting all the file types in use effectively required desktop processing, so the solution is a native Windows application, tightly coupled to the existing web repository.
PDF files proved the most difficult format to convert, tabular content in particular. We overcame this by allowing PDF documents to be recaptured as images wherever the conversion to text was inadequate.
The software uses a RESTful service to authenticate users with their existing web application credentials. Once a user is logged in, the same service is used to retrieve data from, and write data to, the web application.
With a couple of mouse clicks, the desktop application can generate a zip file containing the current log file and a dump of the processing data, so that any issues that arise can be diagnosed and resolved easily.
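The diagnostics bundle can be sketched in a few lines. This is a minimal Python illustration, not the application's own code: it assumes the processing data lives in an SQLite database (as the application's data storage does) and uses `Connection.iterdump()` to serialise it as SQL. The function name `build_diagnostics_zip` and the archive entry `processing_dump.sql` are hypothetical.

```python
import sqlite3
import tempfile
import zipfile
from pathlib import Path


def build_diagnostics_zip(log_path: Path, db: sqlite3.Connection, out_path: Path) -> Path:
    """Bundle the current log file and an SQL dump of the processing data into one zip.

    Hypothetical helper: the archive layout is an assumption for illustration.
    """
    # iterdump() yields the SQL statements needed to recreate the database.
    dump_sql = "\n".join(db.iterdump())
    with zipfile.ZipFile(out_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # Keep the log under its own file name so it is easy to recognise.
        zf.write(log_path, arcname=log_path.name)
        # Store the dump as a plain-text .sql entry.
        zf.writestr("processing_dump.sql", dump_sql)
    return out_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        tmp_dir = Path(tmp)
        log = tmp_dir / "app.log"
        log.write_text("INFO conversion started\n")
        db = sqlite3.connect(":memory:")
        db.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, name TEXT, status TEXT)")
        db.execute("INSERT INTO documents (name, status) VALUES (?, ?)", ("report.pdf", "recaptured"))
        db.commit()
        bundle = build_diagnostics_zip(log, db, tmp_dir / "diagnostics.zip")
        with zipfile.ZipFile(bundle) as zf:
            print(sorted(zf.namelist()))  # ['app.log', 'processing_dump.sql']
        db.close()
```

A SQL dump keeps the bundle readable in any text editor and lets the recipient rebuild the exact database state with the `sqlite3` command-line shell.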
With thanks to …
The software relies heavily on external applications and libraries: Microsoft Office DLLs, Calibre and PDFToImage for document conversion; HTMLTidy to keep formatting consistent; and Nvu for HTML viewing and editing. It uses SQLite both for data storage and for logging. On reflection, perhaps I shouldn’t claim to have written the software at all: most of the functionality that matters comes from others.
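Using SQLite for logging can be as simple as a custom handler that turns each log record into a table row. The sketch below is written in Python against the standard `logging` and `sqlite3` modules purely for illustration; the class name `SQLiteHandler` and the `log` table schema are hypothetical, not taken from the application.

```python
import logging
import sqlite3


class SQLiteHandler(logging.Handler):
    """Hypothetical logging handler that stores each record as a row in SQLite."""

    def __init__(self, conn: sqlite3.Connection, table: str = "log"):
        super().__init__()
        self.conn = conn
        self.table = table  # interpolated into SQL, so pass trusted identifiers only
        self.conn.execute(
            f"CREATE TABLE IF NOT EXISTS {table} ("
            "id INTEGER PRIMARY KEY, created REAL, level TEXT, logger TEXT, message TEXT)"
        )
        self.conn.commit()

    def emit(self, record: logging.LogRecord) -> None:
        try:
            self.conn.execute(
                f"INSERT INTO {self.table} (created, level, logger, message) VALUES (?, ?, ?, ?)",
                (record.created, record.levelname, record.name, self.format(record)),
            )
            self.conn.commit()
        except Exception:
            # Logging convention: a failure to log must never crash the application.
            self.handleError(record)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    logger = logging.getLogger("converter")
    logger.setLevel(logging.INFO)
    logger.addHandler(SQLiteHandler(conn))
    logger.info("Converted %s", "report.docx")
    for row in conn.execute("SELECT level, logger, message FROM log"):
        print(row)  # ('INFO', 'converter', 'Converted report.docx')
```

Keeping the log in the same kind of store as the application data means it can be queried with ordinary SQL. Note that a `sqlite3` connection is tied to the thread that created it by default, so a multithreaded application would need its own connection strategy.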