Monthly Archives: August 2023

Searching PDF files using word stemmers

Easy PDF Search by default searches for complete words/phrases in your PDF files.  For example, if we search for the word like, only files containing that exact word are returned and highlighted in the search results.

If we want to search for words starting with like, we can perform a prefix search using the * character, e.g. like*

This returns all words that begin with like.  Words that are unrelated from a grammar perspective, such as likelihood and likewise, will be returned, while a related word such as liking will not.

Stemmed words

Stemming is the process of removing a part of a word, or reducing a word to its stem or root.  In the example above, the words like, likes, liking, liked, and likely all share the same root word i.e. like.
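The difference between a prefix search and a stem search can be sketched in a few lines of Python.  This is a toy illustration only: the suffix list and the toy_stem function are hypothetical and deliberately tiny, and they are not the stemmer that Easy PDF Search uses.

```python
# Toy illustration only: this is NOT the stemmer used by Easy PDF Search.
SUFFIXES = ("ing", "ed", "ly", "es", "s")  # hypothetical, deliberately tiny rule set


def toy_stem(word: str) -> str:
    """Reduce a word to a crude stem by stripping one common English suffix."""
    word = word.lower()
    for suffix in SUFFIXES:
        # Keep at least three characters so very short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[: -len(suffix)]
            break
    # Drop a trailing "e" so that "like" and "liking" end up with the same stem.
    if word.endswith("e") and len(word) > 3:
        word = word[:-1]
    return word


def prefix_matches(term: str, words: list[str]) -> list[str]:
    """Words that a prefix search such as like* would return."""
    return [w for w in words if w.lower().startswith(term.lower())]


def stem_matches(term: str, words: list[str]) -> list[str]:
    """Words whose toy stem equals the toy stem of the search term."""
    target = toy_stem(term)
    return [w for w in words if toy_stem(w) == target]


words = ["like", "likes", "liking", "liked", "likely", "likelihood", "likewise"]
print(prefix_matches("like", words))  # includes likelihood and likewise, misses liking
print(stem_matches("like", words))    # like, likes, liking, liked, likely
```

A real stemmer applies many more rules per language, so its results will differ from this sketch for words outside the example.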

When we want Easy PDF Search to use stem words when searching, we need to first create a stem database, then search that stem database.

Creating a stem database

To create a stem database, click on the Options > Stemmer language > Settings item.

In the Stemmer Settings window, select up to 5 languages to create a stem database for.

You can create stem databases for the following 27 languages:

  • Armenian
  • Basque
  • Catalan
  • Danish
  • Dutch
  • English
  • Finnish
  • French
  • German
  • Greek
  • Hindi
  • Hungarian
  • Indonesian
  • Irish
  • Italian
  • Lithuanian
  • Nepali
  • Norwegian
  • Portuguese
  • Romanian
  • Russian
  • Serbian
  • Spanish
  • Swedish
  • Tamil
  • Turkish
  • Yiddish

When you want to search the stem database, select the stem language you want to search in from the Options menu.

Easy PDF Search then displays the stem language database that the search will be performed in.

In the search results, the stem database that was searched will also be displayed.

Testing the stemmers

To test which words stem to the same root word, you can use the test utility in the Stemmer Settings window.  Select the language you want to test, then click on the Test … stemmer tab.

Enter the search word, then a list of words whose root you want to compare against the root of the search word.

Next, click on the Test button.  Non-matches will be displayed with a strikethrough.

Download a 14-day trial of Easy PDF Search now and experience how easy and fast it is to search your PDF files collection, now with the ability to perform stem word searches.

Using the bookmarks bar in Easy PDF Explorer

The Bookmarks bar in Easy PDF Explorer appears below the main menu.

Using the Bookmarks bar

The Bookmarks bar lets you quickly open a folder or file by clicking on its item on the bar.  To place an item, drag and drop the folder or file from the Explorer window to the Bookmarks bar.

For example, this is the visual representation when you drag a folder over the Bookmarks bar.

When you drop the item on the Bookmarks bar, a button is created to represent the folder.

When you now click on that button, the active Explorer window will open to that folder.

Similarly, when you drag a file item on to the Bookmarks bar, a button is created for that file.  In the image below, we have 2 file items – a PDF file and an Excel workbook.

When you click on the file name on the Bookmarks bar, the active Explorer window will open to the folder containing that file, and highlight the file name.

Opening the context menu for items on the Bookmarks bar

You can open the context menu associated with the folder or file item by right-clicking on the button in the Bookmarks bar.  The image below shows the context menu when we right-click on our Excel file item.

Rearranging items on the Bookmarks bar

Right-click on the folder or file item, then drag and drop the item at its new position.

Removing items from the Bookmarks bar

Right click on the folder or file item and drag it away from the Bookmarks bar.

 

Extracting files from a Thomson Reuters FileCabinet CS database

We recently had a user who needed to extract Word and Excel documents from a Thomson Reuters FileCabinet CS database using SQL Blob Export.  The user was helpful enough to send us a couple of samples of the raw data stored in the table, allowing us to inspect the data in detail.

It turns out Thomson Reuters stores the files differently from what SQL Blob Export expected, so we had to make some adjustments to the export process.

Beginning with SQL Blob Export version 6.1, you can now export items from your FileCabinet CS database.  The same feature is found in SQL Image Viewer.  When you run a query to return the items, they will be shown as OLE Structured Storage packages.

You can then export the files using the export wizards.

If you encounter a situation where SQL Blob Export or SQL Image Viewer is unable to export your files, please send us an email at support@yohz.com.

It would be very helpful to also attach a couple of samples of the rows that failed to be exported.  To extract the data exactly as stored in the database in SQL Blob Export, please do the following:

  • select the Extract bin files option on the Options page
  • once the export process has completed, you will find files with the .bin extension.  Please send us a couple of those files so that we may check how the data is stored in your database.
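
Before sending the .bin files, you may want a rough idea of what they contain.  The sketch below looks for the 8-byte signature that starts every OLE compound (structured storage) file.  It is a minimal check under assumptions: the function names and the 4096-byte search window are ours for illustration, and we are not claiming that FileCabinet CS data carries this signature at any particular offset, or at all.

```python
from pathlib import Path

# Signature that opens every OLE compound (structured storage) file.
OLE_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")


def find_ole_signature(data: bytes, search_limit: int = 4096) -> int:
    """Return the offset of the OLE signature in the first search_limit bytes, or -1."""
    # The raw column data may carry a wrapper before the OLE header,
    # so we search a window instead of only checking offset 0.
    return data[:search_limit].find(OLE_SIGNATURE)


def describe_bin_file(path: Path, search_limit: int = 4096) -> str:
    """Summarise whether a raw .bin export appears to contain an OLE header."""
    offset = find_ole_signature(path.read_bytes(), search_limit)
    if offset < 0:
        return f"{path.name}: no OLE signature in the first {search_limit} bytes"
    return f"{path.name}: OLE signature found at byte offset {offset}"


# In-memory demonstration with made-up bytes (not real FileCabinet CS data).
sample = b"\x00" * 10 + OLE_SIGNATURE + b"payload"
print(find_ole_signature(sample))         # offset of the signature in the sample
print(find_ole_signature(b"plain text"))  # -1 when the signature is absent
```

To check a real export, call describe_bin_file with the path to one of the .bin files.  A missing signature does not mean the file is unusable; it only means the data is stored in some other form.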