Category Archives: Access OLE Export

Extracting and viewing PDF files in a SimpleIndex database

SimpleIndex is an application that stores PDF files in SQL Server databases.  The PDFs are stored in the General table, in the Image column.  That column is of the SQL Server image type, generally used to store binary data (or blobs).

We recently had a user that could not export the PDF files from his SimpleIndex database.    The user was very helpful to send us the original PDF file, and the PDF file content as stored in his database for comparison.  Turns out that SimpleIndex first converts the binary data in your PDF files to a unicode text string, then stores this unicode text.

This is certainly a strange way to go about storing a file in a column that has a data type that’s perfectly suited to storing binary files.  In addition to making it difficult to extract the PDF file, it doubles the storage requirements.  Whatever the reason, SQL Image Viewer could not identify and display the PDF files.

So the first thing we did was to convert the exported ‘mangled’ PDF files back to their original state.  We added this feature to our UTF8Tool application.

Now the user could export the content from his database, then use this tool to convert those files into proper PDF files.

However, the user wanted to be able to query and view the PDF files from within SQL Image Viewer.  We ended up with the easiest option – we created a second column to store the proper PDF content, then create an application to convert the mangled PDF files from the first column and store them in this new column.

It’s a very basic unpolished conversion application, and you can download and use this application freely.

IMPORTANT NOTE:  This application updates a column in your database, so please make a backup of your database first, in case things don’t turn out exactly the way you want it.

Currently, this application supports only SQL Server databases.  If you need a version that supports another database engine, or if you require some modifications, drop us a line at support@yohz.com.

Now that the user could display the PDF directly in SQL Image Viewer, we had another issue.  By default, SQL Image Viewer only displays the first page of each PDF file that it detects e.g.

This made it difficult to review each PDF file in details.  Fortunately, SQL Image Viewer (professional edition) supports custom layouts.  With custom layouts, you can choose to display a specific number of pages from your PDF files, and also the size of the pages e.g.

Now the user could easily preview more of his PDF files prior to exporting them.

In summary:

  • if you need to convert binary data from unicode files to ansi files, give our free UTF8Tool a try
  • if you want to convert a column containing binary data in unicode text format to ansi text format and store the converted data in a separate column, give this free application a try (but please back up your database first)
  • if you want to preview multiple pages of a PDF file, use the custom layouts feature in SQL Image Viewer (professional edition)
  • if you have data that SQL Image Viewer/Access OLE Export/SQL Blob Export does not recognize, send us an email at support@yohz.com

 

Extracting files from a ProgressCRM database

We recently had a user who needed to extract files from a SQL Server database created by ProgressCRM and needed help.  He was helpful enough to send us a couple of the sample files that were not extracted correctly.

It turned out that each file that was being uploaded was stored in an OLE container.  Normally, our products (SQL Image Viewer, Access OLE Export, and SQL Blob Export) can extract the embedded files, but in this case, the entire OLE container itself had a 12-byte prefix, most likely added by ProgressCRM for their internal use.

So the task was to first extract the content without the 12-byte prefix, and then let SQL Blob Export handle the extraction of the files from the OLE container.

Luckily, this was easy enough to do in a SQLServer database.  We could use the SUBSTRING command to retrieve only the content from the 13-th byte onwards e.g.

SELECT SUBSTRING(CAST(<the column containing the binary data> AS VARBINARY(MAX)), 13, DATALENGTH(content))  FROM <the table name>

So if you find that your application is uploading your files with their own prefix data, you can use the SUBSTRING function (in SQL Server) to remove the prefix so that our products can recognize the actual file content and export them for you correctly.

Improved debugging in Access OLE Export / SQL Blob Export

Sometimes it happens that you may have some items that Access OLE Export or SQL Blob Export is unable to export correctly.  This can be due to many reasons e.g.

  • the binary data has been compressed by the uploading application
  • the binary data has been encrypted by the uploading application
  • the uploading application has appended additional header bytes to the data
  • an unsupported OLE server was used to upload the file

To troubleshoot these issues, we would ask the user to send us the raw data exactly as stored in the database.  This usually involves using SQL Image Viewer to perform the extraction.

We’ve now added the option to extract the raw data using Access OLE Export / SQL Blob Export.  You can do this by selecting the debugging – create bin files for analysis item.

Now when you have problems exporting your files, selecting this option will cause a raw copy of the data to be exported together, with the .debug.bin extension.  You can then send us these .debug.bin files for further analysis.

Extracting binary data from a nvarchar(max) column!

Recently we had a user that needed to extract PDF files stored in a SQL Server nvarchar(max) column.  Now, you wouldn’t normally be able to store binary files correctly in a text column, but the application they used (Deacom) first encoded the PDF file to UTF-8 text, then stored the UTF-8 text into the text column.

While SQL Image Viewer (and SQL Blob Export/Access OLE Export) could extract the UTF-8 encoded data from the table and store it in a file, the resulting PDF file cannot be opened because the data contained therein is invalid.  So we had to provide a small utility to convert the UTF-8 encoded file back to its original contents.  You can read more about the utility here.

After converting the PDF files, the user then compressed the PDF files and now wanted to upload the PDF files back into the database.  They used SQL File Import to perform the upload, but first the PDF files had be to UTF-8 encoded again, otherwise the application (Deacom) could not open the PDF files.  That can also be one using the above utility.

Finally, we had one last issue where only 8 Kb of the encoded PDF file was being uploaded.  The cause was traced to the use of the MDAC/ODBC driver to connect to SQL Server.

In our database products, when you connect to a SQL Server instance, the default is to let the application select the best available drivers.

By design, our products will use the SQL Server Native Client drivers if available, and only use MDAC/ODBC drivers when the Native Client drivers are not available.  However, MDAC/ODBC drivers limit text fields to only 8000 bytes.

When the user attempted to use the Native Client driver, the connection failed.

Installing and using the Native Client drivers then resolved the connection and upload issue.

Extracting attachments from an Access table using Access OLE Export or SQL Blob Export

Access OLE Export 4 and SQL Blob Export 4 have added support to extract files from attachment-type columns in Microsoft Access tables.

Say we have a table named table001 with the following structure:

Attachment-type columns in Access contain sub-columns that contains details of one or more attachments stored in that row.  If we just select the table during the export process like this:

we would not be able to extract the attached files.  What we need to do is to select the Extract items using SQL query option, and enter a query to retrieve the sub-column (FileData) that contains the actual file contents.  The other useful sub-column is FileName, which stores the original file name.

We select the sub-columns by entering the column name (files), followed by a dot (.), then the sub-column name.  In the example above, to retrieve the contents of the attachment column, we enter

files.FileData

and to retrieve the original file name, we enter

files.FileName

Now if we want to, we can name the exported files using the original file name.  Enter

<filename>

as the column to use to name all the files that are exported, in the Other types section.

Access OLE Export or SQL Blob Export will then export each attachment returned by your query, and name the files using the original file names.  Of course, you can still use a different naming convention if you need to do so.

Connecting to SQL Server via ODBC using SQL authentication

We recently had a user who had his SQL Server hosted in a data center, and needed to extract files from his database.  He could connect to the instance remotely, but it took too long to extract the files this way.  He tried connecting to the database via another computer in the data center, but received the ‘TCP Provider: An existing connection was forcibly closed by the remote host.‘ error.

This suggests a network configuration setting issue.  Instead of getting the data center to change the server settings, we instead set up an ODBC connection and used that connection in SQL Blob Export.

Using Windows authentication to connect to the instance via ODBC was easy enough.  However, if using SQL Server authentication, the password is not stored in the ODBC settings, and the connection would still fail.

In order to connect to the SQL Server instance via ODBC using SQL Server authentication, choose the Connection string option instead.

Click on the Browse button

and select the ODBC connection to your SQL Server database.

The SQL Server ODBC connection manager will then prompt you to enter the password.

Enter the password in the provided area and click OK.  The ODBC connection string is then filled out, together with your password.

Click on the Connect button.  SQL Blob Export should now successfully connect to your SQL Server database using SQL Server authentication.

The same steps apply to our other database products that support SQL Server and ODBC connections.

If you require further assistance, you can post your questions in our support forum.

Redirecting OLE-Object linked files during export

In SQL Image Viewer 10.3, and Access OLE Export/SQL Blob Export 3.18 we added the option to redirect where to look for OLE-Object linked files during the export process.

A common issue when using OLE-Object linked files is that the location of the linked files have changed.  For e.g. you may have previously linked files from the g:\customer_invoices\ folder, but say those files have now been moved to g:\archive\customer_invoices\2018\.

Previous versions of SQL Image Viewer will fail to locate these items during the export process.  In version 10.3, you will now find a redirect button in the Image and File Options screen.

In Access OLE Export/SQL Blob Export, you will find the Redirect button in the Options screen.

Click on that button to bring up the Redirect window.  Here, you can define how the old folders map to the new folders.  For our previous example, we would enter the following:

Now during the export process, when SQL Image Viewer fails to find the files in the original folder, it will look to the redirection rules defined here and look in the new folder for the files.  It will then export the files that it finds.  You can enter multiple redirection entries, and each entry must be on a new line.

Note that the redirection entries are essentially a search/replace process.  Anything entered prior to the redirection symbol (>) is replaced with the value entered after the symbol.  Thus, the entries are not just limited to redirecting entire paths.

Using the column name in your file naming convention in Access OLE Export / SQL Blob Export

When you export images and files using Access OLE Export / SQL Blob Export and each row of your table or result set contains multiple columns that contain the blob data e.g.

you could previously only use the column index in the file naming convention to identify the source column of the exported file e.g.

which would then name the exported files this way:

Files with the suffix 0002 indicates that it was exported from the 2nd column (ThumbnailPhoto), and 0003 from the 3rd column (LargePhoto).

From version 3.17 onwards, you can now use the column name in the naming convention, using the <%columnname%> tag e.g.

The exported files are then named this way:

which is more meaningful.

 

Merge or combine TIFF files

We had a user who recently asked if SQL Blob Export is able to merge or combine the extracted single-page TIFF files from her database into multi-page TIFF files.  This feature isn’t built-in, but we have provided an external command line application (MergeTiff) that you can freely use to perform the same task.

You can use MergeTiff on any tif files, including those extracted using SQL Image Viewer, SQL Blob Export, and Access OLE Export.  It is important that the extracted files use names that group the pages into the right order.

For e.g. the default naming convention for exported files in SQL Image Viewer is row and column index, so assuming each row in your result set only had 1 blob column, your exported files would be named this way:

0001_0001.tif
0002_0001.tif
0003_0001.tif

This isn’t going to help MergeTiff determine how to group the files.  What you need is a ‘grouping’ value e.g. say a CustomerID column that identifies the customer each image belongs to.  We then use this naming convention in SQL Image Viewer/ SQL Blob Export/ Access OLE Export:

<CustomerID>_<%row:0000%>

and our exported files may be named this way:

A00123_0001.tif
A00123_0002.tif
A00123_0003.tif
C72186_0006.tif
D12472_0007.tif
E88822_0004.tif
E88822_0005.tif

In this way, we know that for customer A00123, we need to merge 3 files, for customer E88822, we need to merge 2 files, and so on.

To use MergeTiff, you need to provide at least 1 parameter:

  • the search pattern for the files to process using the -i  parameter e.g.

MergeTiff.exe -i “g:\exported images\*.tif” 

Other parameters supported by MergeTiff:

-o

this is the output folder to store the merged files.  If not provided, the merged files will be stored in the same folder as the source files.  If you want to store the merged files in a different folder, you would need to provide the -o parameter e.g.

MergeTiff.exe -i “g:\exported images\*.tif” -o “g:\exported images\merged\”

-d

this is the delimiter that the source file name uses to separate elements, default value is _

In our example, our file name elements use the underscore character e.g. _.  You might use another symbol e.g.

A00123-0001.tif
A00123-0002.tif
A00123-0003.tif
E88822-0004.tif

In this case, you need to provide the -d parameter and the delimiter character e.g.

MergeTiff.exe -i “g:\exported images\*.tif” -o “g:\exported images\merged\” -d “-“

-g

this is the grouping element index, default value is 1.  In our example, the grouping element is the first element i.e. all the characters before the first delimiter i.e.

A00123
E88822
C72186
D12472
E88822

If instead our file named had been named this way:

scanneddocsA00123_invoice1005_A00123_0001.tif
scanneddocsA00123_invoice1005_A00123_0002.tif
scanneddocsA00123_invoice1006_A00123_0003.tif
scanneddocsC72186_invoice1008_C72186_0006.tif
scanneddocsD12472_invoice2010_D12472_0007.tif

and we still wanted to merge all documents belonging to the same customer, then our grouping element is the 3rd element, and we would need to use the -g parameter e.g.

MergeTiff.exe -i “g:\exported images\*.tif” -o “g:\exported images\merged\” -g 3

-n

this is the naming element index, default value is 1.  Following on from the above example, supposing we merge our files by invoice number instead of customer code, we would need to use both the -g and -n parameters e.g.

MergeTiff.exe -i “g:\exported images\*.tif” -o “g:\exported images\merged\” -g 2 -n 2

-ow

use this parameter to overwrite any existing files of the same name

If you want MergeTiff to overwrite any existing files, use the -ow parameter e.g.

MergeTiff.exe -i “g:\exported images\*.tif” -ow

-r

use this parameter to also process files in the input path subfolders

To process files recursively beneath the input path, use the -r parameter e.g.

MergeTiff.exe -i “g:\exported images\*.tif” -r

You can download MergeTiff from here.

Incorrectly extracted files in Access OLE Export/SQL Blob Export

Access OLE Export / SQL Blob Export may sometimes extract items that appear to be wrong.  For e.g. it might extract an expected PDF file as an image file.  To troubleshoot such issues, we need your help in sending us the data exactly as stored in your database to us for further investigation.

One way to extract the data is to use SQL Image Viewer.  Once installed, connect to the same database you were trying to extract your items from, and run your query to select the wrongly extracted item(s).

Select the item you’re having problems with, right click to bring up the context menu, and select the View as hex option.

SQL Image Viewer will display the selected item in a hex viewer.  Click on the Save button to save the raw data to a file.

and send us that file at support@yohz.com.  Having the raw data to work with will help us immensely to determine why Access OLE Export / SQL Blob Export appears to be extracting your files wrongly.