Extracting and viewing PDF files in a SimpleIndex database

SimpleIndex is an application that stores PDF files in SQL Server databases.  The PDFs are stored in the General table, in the Image column.  That column is of the SQL Server image type, generally used to store binary data (or blobs).

We recently had a user that could not export the PDF files from his SimpleIndex database.    The user was very helpful to send us the original PDF file, and the PDF file content as stored in his database for comparison.  Turns out that SimpleIndex first converts the binary data in your PDF files to a unicode text string, then stores this unicode text.

This is certainly a strange way to go about storing a file in a column that has a data type that’s perfectly suited to storing binary files.  In addition to making it difficult to extract the PDF file, it doubles the storage requirements.  Whatever the reason, SQL Image Viewer could not identify and display the PDF files.

So the first thing we did was to convert the exported ‘mangled’ PDF files back to their original state.  We added this feature to our UTF8Tool application.

Now the user could export the content from his database, then use this tool to convert those files into proper PDF files.

However, the user wanted to be able to query and view the PDF files from within SQL Image Viewer.  We ended up with the easiest option – we created a second column to store the proper PDF content, then create an application to convert the mangled PDF files from the first column and store them in this new column.

It’s a very basic unpolished conversion application, and you can download and use this application freely.

IMPORTANT NOTE:  This application updates a column in your database, so please make a backup of your database first, in case things don’t turn out exactly the way you want it.

Currently, this application supports only SQL Server databases.  If you need a version that supports another database engine, or if you require some modifications, drop us a line at support@yohz.com.

Now that the user could display the PDF directly in SQL Image Viewer, we had another issue.  By default, SQL Image Viewer only displays the first page of each PDF file that it detects e.g.

This made it difficult to review each PDF file in details.  Fortunately, SQL Image Viewer (professional edition) supports custom layouts.  With custom layouts, you can choose to display a specific number of pages from your PDF files, and also the size of the pages e.g.

Now the user could easily preview more of his PDF files prior to exporting them.

In summary:

  • if you need to convert binary data from unicode files to ansi files, give our free UTF8Tool a try
  • if you want to convert a column containing binary data in unicode text format to ansi text format and store the converted data in a separate column, give this free application a try (but please back up your database first)
  • if you want to preview multiple pages of a PDF file, use the custom layouts feature in SQL Image Viewer (professional edition)
  • if you have data that SQL Image Viewer/Access OLE Export/SQL Blob Export does not recognize, send us an email at support@yohz.com

 

Leave a Reply

Your email address will not be published. Required fields are marked *