Monthly Archives: July 2021

The day SQL Image Viewer outsmarted me

Never thought this would happen, but I was stumped by my own application for a good 24 hours before realizing what was happening.  I’m getting too old for this ****

A user wrote in asking why SQL Image Viewer could identify XML content in his SQL Server table, but he could not use his database’s XML functions to query the XML data.

No problem, or so I thought.  Using a subset of the database he sent, I could see the first few bytes of the XML content and also query the size in SSMS:

Ok, so that’s about 119 Kb of data.  In SQL Image Viewer, the following is returned:

So far so good, SQL Image Viewer identified the XML content.  Now what happens when I try to use the XML function in SSMS:

SQL Server raised an error: XML parsing: line 1, character 2, illegal xml character

Ok, so what’s wrong with character 2?  Back in SQL Image Viewer, I opened the hex viewer, and the XML header looked fine:

Ok then, I decided to export the XML file to disk, and could open it in my browser.  Then it stuck me that the file was over 2 MB in size, but the blob size was only 119 Kb.

What was going on here?  In the hex viewer I scrolled to the end of the XML content, and it too was indicating the content was about 2.6 MB in size.

Why was SSMS and SQL Image Viewer reporting the blob size as 119 Kb, but when viewed in the hex viewer or exported, the content was closer to 2.6 MB in size?

This being a SQL Server database, my first thought was that row or column compression was in place.  However, the compression/decompression would have been done transparently and wouldn’t be showing up this way.  I checked anyway, and sure enough, no row/column compression was active.

Feeling rather lost, I explained the situation to the user hoping they might shed some light on what was going on.  Later, in the shower, while clearing my head (literally and figuratively), it stuck me: zlib streams.

Basically, if the blob content is a zlib compressed stream, SQL Image Viewer automatically decompresses the stream, identifies the content type, and lets you work with the decompressed data.  That was exactly what was happening here – the zlib compressed stream of 119 Kb was being uncompressed to 2.6 MB, identified as an XML file, and subsequent viewing and exporting allowed me to work with the uncompressed data.  I had totally forgotten about this feature!

This also explained why the XML functions could not work directly on the blob content – it was a zlib stream and not a XML text file.

Next time, I should compare the first few bytes as displayed in SSMS against the  values displayed in SQL Image Viewer.  It would then have been immediately clear that we were working with a compressed stream.

And that is how my own application fooled me (or my memory is just getting poorer).

Extracting and viewing PDF files in a SimpleIndex database

SimpleIndex is an application that stores PDF files in SQL Server databases.  The PDFs are stored in the General table, in the Image column.  That column is of the SQL Server image type, generally used to store binary data (or blobs).

We recently had a user that could not export the PDF files from his SimpleIndex database.    The user was very helpful to send us the original PDF file, and the PDF file content as stored in his database for comparison.  Turns out that SimpleIndex first converts the binary data in your PDF files to a unicode text string, then stores this unicode text.

This is certainly a strange way to go about storing a file in a column that has a data type that’s perfectly suited to storing binary files.  In addition to making it difficult to extract the PDF file, it doubles the storage requirements.  Whatever the reason, SQL Image Viewer could not identify and display the PDF files.

So the first thing we did was to convert the exported ‘mangled’ PDF files back to their original state.  We added this feature to our UTF8Tool application.

Now the user could export the content from his database, then use this tool to convert those files into proper PDF files.

However, the user wanted to be able to query and view the PDF files from within SQL Image Viewer.  We ended up with the easiest option – we created a second column to store the proper PDF content, then create an application to convert the mangled PDF files from the first column and store them in this new column.

It’s a very basic unpolished conversion application, and you can download and use this application freely.

IMPORTANT NOTE:  This application updates a column in your database, so please make a backup of your database first, in case things don’t turn out exactly the way you want it.

Currently, this application supports only SQL Server databases.  If you need a version that supports another database engine, or if you require some modifications, drop us a line at support@yohz.com.

Now that the user could display the PDF directly in SQL Image Viewer, we had another issue.  By default, SQL Image Viewer only displays the first page of each PDF file that it detects e.g.

This made it difficult to review each PDF file in details.  Fortunately, SQL Image Viewer (professional edition) supports custom layouts.  With custom layouts, you can choose to display a specific number of pages from your PDF files, and also the size of the pages e.g.

Now the user could easily preview more of his PDF files prior to exporting them.

In summary:

  • if you need to convert binary data from unicode files to ansi files, give our free UTF8Tool a try
  • if you want to convert a column containing binary data in unicode text format to ansi text format and store the converted data in a separate column, give this free application a try (but please back up your database first)
  • if you want to preview multiple pages of a PDF file, use the custom layouts feature in SQL Image Viewer (professional edition)
  • if you have data that SQL Image Viewer/Access OLE Export/SQL Blob Export does not recognize, send us an email at support@yohz.com

 

New product suggestion: SQL Blob Edit

We recently helped a client convert PDF files stored as a unicode string in a SQL Server binary column to actual PDF files, so that they can view the PDF files from within SQL Image Viewer.

Expanding on this idea, we could develop a product that helps users convert binary data from one form to another e.g.

  • convert images from one format to another
  • process images e.g. resize, add watermark etc
  • compress blob data using zip or lz algorithms to reduce blob size
  • link to external applications to process the blob data (e.g. ghostscript to compress PDF)

We would like to hear your thoughts  to know if such an application would actually be useful to you.  Please leave your comments below, thank you.