Monthly Archives: November 2017

Compressing PDFs in your database

→ This article refers to SQL Blob Viewer, which has now been renamed to SQL Image Viewer.  The techniques described in this blog is still applicable, as the functionality of the product remains the same.  Only the name has changed.

Recently, a user wanted to compress PDFs stored in his database, in order to reduce the overall size of the database.  He asked if we had any application that could do this.  Unfortunately, we don’t, but it got me to exploring the available options.

Turns out that PDF software development kits aren’t cheap at all.  Licensing can run into thousands of dollars, which isn’t feasible for us.  Open-source software is another option, which is what I finally went with.  In this case, I used Ghostscript, an all-purpose PDF toolkit, available at https://www.ghostscript.com/.

There are 3 steps to compressing PDFs in your database – extracting the PDFs, compressing or optimizing them, and finally uploading them back into the database.  We will use SQL Blob Viewer to first extract the PDFs, then Ghostscript to reduce the PDF size, and finally SQL File Import to upload the PDFs back into the database.  For reference purposes, these PDFs were created from document scans, so they have a 600 dpi resolution and are not optimized for PDF storage.  We’re running this example on Windows, but there is also a Linux version of Ghostscript, and both SQL Blob Viewer and SQL File Import will run on Linux using Wine.

 

Extracting the PDFs

Extracting PDFs from your database using SQL Blob Viewer is very simple – first write the SQL command to retrieve the PDFs.

We then export the PDF files to disk, using the primary key value in the ID field to name the exported files.  We do this so that when we upload the compressed files, we can use the ID value to update the correct rows.

If you have a lot of PDFs to export, you should choose to retrieve only the first few rows, to avoid loading the entire data set into memory.  After that, when you export the result set, the entire result set will be exported.  See this page for details on how to export large result sets with SQL Blob Viewer.

 

Compressing the PDFs

Now that we’ve exported the files, it’s time to use Ghostscript to compress the images found in those PDFs.

The easiest way to do this is to reduce the resolution of the images.  You can do this using the PDFSETTINGS option.  The possible values are:

  • /screen – converts to 72 dpi
  • /ebook – converts to 150 dpi
  • /printer – converts to 300 dpi
  • /prepress – converts to 300 dpi, color preserving

Depending on your requirements, you might want to test the various options to see which best suits your needs.  I took one of the exported PDFs, and converted them using each of the 4 options.  As you can see, the size of the PDF drops dramatically for all 4 options.

Here is the DOS batch script I used to convert the PDFs using the /prepress option (NOTE: Ghostscript options are case-sensitive, so you cannot for e.g write -PDFSETTINGS as -PDFSettings):

for %%x in (*.pdf)  do gswin64c.exe -sDEVICE=pdfwrite -dPDFSETTINGS=/prepress -dBATCH -dNOPAUSE -dQUIET -SOutputFile=”%%~nx_compressed.pdf” %%x

The options used are:

  • -sDEVICE=pdfwrite – this tells Ghostscript that we want to create a PDF file
  • dPDFSETTINGS=/prepress – this tells Ghostscript to convert all images found in the source PDF to 300 dpi resolution
  • -dBATCH -dNOPAUSE -dQUIET – these options indicate that the process should run non-interactively
  • -SOutputFile=’%%~nx_compressed.pdf’ – this tells Ghostscript how to name the output file.  Since we want to add a _compressed suffix, we first use the ~nx option to extract just the source file name without the extension, add the _compressed suffix, followed by the .pdf extension.
  • %%x – this is the source file name that matches the search pattern in the for %%x in (*.pdf) loop

Basically, this loops through all the PDFs in the current folder, converts all images in each PDF to 300 dpi, and saves the PDF with the _compressed suffix.

As you can see, the new PDFs are significantly smaller than the original PDFs.

 

Updating the database

Now, we need to update the existing record with the optimized PDF file.  We can do this using SQL File Import.  First, we enter the search pattern for the files we want to use i.e. those with the _compressed suffix.

Next, we need to map the columns.  Using the file name as the input value for the ID column, we need to:

  • extract the ID value from the file name
  • indicate that this value is a key field
  • indicate that this is an update process

We do this via the following script:

For the attachment column, we simply indicate to SQL File Import that we want to use the file contents.

Internally, SQL File Import will form the following UPDATE statement based on our script as follows:

UPDATE attachments SET attachment = :attachment WHERE ID = :ID

The test script shows that we have extracted the ID value correctly, and that the attachment column will use the contents of the files.

Now, we just need to run the script in SQL File Import, and our records are updated with compressed versions of the PDF files.

 

That is basically all you need to do if you want to reduce the size of the PDFs in your database.  The steps are similar if you want to process any of your blob data and update them in your database e.g.

  • resize images
  • compress files into archive (zip) files
  • process images e.g. add watermarks, convert to grayscale etc

SQL Blob Viewer and SQL File Import will handle the extraction and update process respectively.  You are free to use any external tools to process your images/files.

Restoring a SQL Server bak file that’s password protected

Here at Yohz Software, we offer a service to remove passwords from a SQL Server backup files (or BAK files) that are password protected.  Please note that this is not the same as decrypting an encrypted backup.

Prior to SQL Server 2012, you could protect a backup with a password using the PASSWORD option e.g.

BACKUP DATABASE mydb TO DISK = ‘g:\backups\mydb.bak’ WITH PASSWORD = ‘mypassword’

You would then need to supply the password when you restored the backup.  An interesting point is that while the backup itself is password protected, the data in the backup file was still identical to that found in a backup file that was not password protected.  Thus, using a text editor, you could still inspect the contents of a password protected backup file.

The PASSWORD option was removed in SQL Server 2012 onwards, and in current versions of SQL Server, you can encrypt the backup file.  Now, the data in the backup file is truly protected, and you need the correct certificate to be able to restore the backup.  This is a much better and secure option compared to the old PASSWORD option.

Now, if you want to restore an old backup file that’s password protected, but you no longer have the password, we can help you with that.  Contact us at sales@yohz.com for a quote.

Exporting and extracting images and files from Microsoft Access databases

→ This article refers to SQL Blob Viewer, which has now been renamed to SQL Image Viewer.  The techniques described in this blog is still applicable, as the functionality of the product remains the same.  Only the name has changed.

Using OLE Object column types, there are 3 ways you can store images and files in a Microsoft Access database, or in a database (e.g. SQL Server) used by a Microsoft Access front-end.

  • embedding

Embedding involves creating the image/file directly using the associated OLE Server application.  For example, if I chose to embed a bitmap image, Access will open Paint for me to create my bitmap image, and save that image into that field.

  • insert from a file

If you already have the image/file you want to store in your database, you can insert the file into your Access database directly using the ‘Create from File’ option.  Access then copies the file into the field.

  • link to a file

Similar to the above, but this time, the image/file is not stored in the database.  Access just creates a link to the external file, much like a hyperlink in your browser.

If you store your images/files using the first 2 methods, then Microsoft Access adds additional data to your image/file, so that it knows which application to use to open that particular image/file.  This means that if you extracted the data from your database, it will appear different from your original file.

For example, let’s insert an image into an OLE Object field.  The image file is 18.7 Kb in size.

Once inserted into the Access database, the size has increased to 19.192 Kb.

The increase is due to the additional data added by Access.  If you now extract the data as is from the database, it cannot be opened by Paint because of the additional data.  The PNG data is preceded by OLE headers.

This is a common issue faced by Microsoft Access users – when they want to extract images and files from OLE Object fields to their original format, there isn’t a way they can do this easily.  Every image/file has been modified by Access.  You have to open each image/file in Access, then save the item to disk manually.

This is the reason we created Access OLE Export.  In just 4 simple steps, Access OLE Export will extract and export your embedded images and files to disk, stripping out the additional data added by Access.  This means that the exported images and files can be opened ‘normally’, using the appropriate application.

If you require more control over what is exported, SQL Blob Viewer is another option.  SQL Blob Viewer allows you to inspect your data before exporting them, but requires that you be able to write SQL commands.

There may be situations where both products are unable to identify embedded images correctly.  This may be because the registered OLE Server for that content type is not yet supported.  In these cases, send us (at support@yohz.com) a sample of the embedded data, and we will add support for that OLE server type.