Category Archives: Easy PDF Search

Finding PDF files that do not contain any text

The situation is as follows: you use Easy PDF Search to index and search for words and phrases in your PDF files.  Suddenly, you realise that not all your files have been indexed because they actually do not contain any searchable text.

This can happen if your PDF files are scans of documents and you rely on an OCR application to recognise the text and store it inside the PDF.  Now you need a quick and easy way to identify the PDF files that have not yet been processed by your OCR application.

You can easily do this in Easy PDF Explorer.  First, select the Count images and text option.

If you want an accurate count of the number of images and characters, leave the option set to in all pages.  If you just want to know whether a file contains any characters, selecting the but stop when text is found option will speed up the process significantly.

This is because Easy PDF Explorer no longer has to scan every page in the PDF file – the moment it encounters a page that contains text, it stops scanning.  If your aim is just to know which files don’t contain any text, then this option is the fastest.
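The idea behind the early exit can be sketched in a few lines of Python. This is a hypothetical illustration, not Easy PDF Explorer's code. It assumes the text of each page has already been extracted into a string. The function name `count_characters` and its rule of ignoring whitespace are assumptions of this sketch.

```python
from typing import Iterable


def count_characters(pages: Iterable[str], stop_when_text_found: bool = False) -> int:
    """Count non-whitespace characters across pages (hypothetical helper).

    With stop_when_text_found=True, return as soon as a page containing text is seen,
    so the remaining pages are never examined.
    """
    total = 0
    for page_text in pages:
        chars = sum(1 for ch in page_text if not ch.isspace())
        total += chars
        if stop_when_text_found and chars > 0:
            break  # we only need to know that the file has text
    return total
```

When `pages` is a lazy iterator, passing `stop_when_text_found=True` means later pages are never read once text turns up. That is where the time saving comes from.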

Once you’ve selected those options, select the PDF files you want to scan in the explorer window.

Easy PDF Explorer then lists the files together with the number of characters in each file.

Click on the Characters column, and you can quickly sort the files by number of characters found.

Say you want to copy the files with no text to another folder.  Right-click and select the Deselect all item.

Now click on the first file with no text, and while holding down the SHIFT key, click on the last file with no text.  The range of files will be highlighted.

Now right-click to bring up the context menu, and click on the Select item.  The range of files will then be selected.

Now just click on the Copy to folder button, select the folder to copy the selected files to, and you’re done.  Now all you have to do is run your OCR application on those files.
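If you'd rather script the same filter-and-copy step, here is a minimal Python sketch. It assumes you already have a mapping from each file path to its character count. The function `copy_files_without_text` is hypothetical and not part of Easy PDF Explorer. Like the sorted Characters column, it walks the files in order of character count and stops at the first file that contains text.

```python
import shutil
from pathlib import Path


def copy_files_without_text(char_counts: dict[str, int], dest_folder: str) -> list[Path]:
    """Copy every file with a character count of 0 into dest_folder (hypothetical helper)."""
    dest = Path(dest_folder)
    dest.mkdir(parents=True, exist_ok=True)
    copied = []
    # Sort by character count, mirroring a click on the Characters column.
    for path, count in sorted(char_counts.items(), key=lambda item: item[1]):
        if count > 0:
            break  # every remaining file contains text
        copied.append(Path(shutil.copy2(path, dest)))  # copy2 also preserves timestamps
    return copied
```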

Download a 14-day trial of Easy PDF Explorer now to work with your PDF files faster.  Also give Easy PDF Search a try – probably the fastest way to search your PDF files for multiple words and/or phrases.

What’s new in Easy PDF Search 4

Easy PDF Search (EPS) 4 was recently released with the following changes:

Notes window

There is now a Notes window in EPS which you can use to take notes while working with your search results.  The Notes Editor is a simple text editor, and will float above all other EPS windows.

Your notes are saved in Rich Text Format (RTF), and if you have Microsoft Word installed, you can also save them in DOCX and PDF formats.

You can also embed PDF documents into the Notes window, so you can view multiple PDF documents simultaneously for reference.

Usability improvements

  • You can now scroll across multiple pages by holding down the SHIFT key and moving the mouse wheel.
  • You can now zoom in and out of a page by holding down the CONTROL key and moving the mouse wheel.
  • You can now specify the font size to be used in the Search and Results windows via the Settings screen.

Database clean-up

Over time, the full-text index database may contain entries for non-existing files.  You can now delete those entries using the Clean function in the Settings screen.

This frees up space inside the database for new files, but it does not reduce the size of the database file itself.
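The sketch below shows what a clean-up like this might look like, and why the file does not shrink. It assumes an SQLite database with a hypothetical `indexed_files` table holding one row per indexed file. Easy PDF Search's actual schema isn't documented here. In SQLite with the default `auto_vacuum` setting, deleting rows frees pages that are reused for new data. The file itself only shrinks if `VACUUM` is run.

```python
import os
import sqlite3


def clean_missing_files(conn: sqlite3.Connection) -> int:
    """Delete index entries whose source file no longer exists; return how many were removed.

    Assumes a hypothetical table: indexed_files(path TEXT PRIMARY KEY, ...).
    """
    paths = [row[0] for row in conn.execute("SELECT path FROM indexed_files")]
    missing = [(path,) for path in paths if not os.path.exists(path)]
    conn.executemany("DELETE FROM indexed_files WHERE path = ?", missing)
    conn.commit()
    # The freed pages are reused for new rows, but the database file keeps its size
    # unless something like conn.execute("VACUUM") is run afterwards.
    return len(missing)
```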

Existing users who purchased a license within the last 12 months can upgrade to version 4 for free.  Users with older licenses can purchase a license extension for only USD 10.

You can download a 14-day trial version using this link.

High DPI support

We recently added high DPI support to some of our applications so that they render better when display scaling is set to 125% or more.  We may have missed one or two items, so if you encounter any GUI elements that are too large or too small, we would very much appreciate it if you could let us know at support@yohz.com.

The applications we’ve added high DPI support for are:

Search text in multiple PDF files fast

So you want to search for text in multiple PDF files?  You can do that in Adobe Acrobat, and Google will turn up a few guides on doing that.

That’s all good and fine, but what if you need the search results fast and you need to search hundreds or thousands of PDF files?  Then you should consider Easy PDF Search.

Speed

Easy PDF Search is fast.  Watch this video comparing Easy PDF Search with Adobe Acrobat.  In short, to search for "the second" in 46 files totaling 1 GB, Easy PDF Search took 3 seconds while Adobe Acrobat took 3 minutes 13 seconds.

We have a user who regularly searches his collection of over 12,000 PDF files using Easy PDF Search, and he gets his search results in less than 20 seconds.

Search multiple words simultaneously

Search for multiple words simultaneously.  Why waste time searching the same files for different words?  Easy PDF Search lets you search for as many words or phrases as you require.

Quickly see where your words were found

Easy PDF Search doesn’t just tell you which files contain your words.  It tells you exactly which pages the words are on, and how often each word appears on each page and in the entire file.
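To make "frequency on each page and in the entire file" concrete, here's a small Python sketch that counts whole-word, case-insensitive matches per page. It's an illustration built on assumptions. The page texts are given as plain strings, and `word_frequencies` is a hypothetical helper, not how Easy PDF Search matches words internally.

```python
import re


def word_frequencies(pages: list[str], words: list[str]) -> dict[str, dict]:
    """For each search word, count whole-word, case-insensitive matches per page and in total."""
    results = {}
    for word in words:
        pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
        per_page = {}
        for page_no, text in enumerate(pages, start=1):  # page numbers start at 1
            hits = len(pattern.findall(text))
            if hits:
                per_page[page_no] = hits
        results[word] = {"pages": per_page, "total": sum(per_page.values())}
    return results
```

For example, with pages `["Pressure monitoring", "no match here", "pressure, PRESSURE"]`, the word "pressure" is reported on pages 1 and 3, with a total of 3.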

In the integrated PDF viewer, all your words are highlighted on each page.

View results from past searches

Easy PDF Search maintains a search history of the words you searched for and also of the search results.

This means you can easily view the results of past searches without having to run the search again.
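A search history that also stores results is essentially a cache keyed by the search terms. The Python sketch below illustrates that idea under stated assumptions. `SearchHistory` and the `run_search` callback are hypothetical. Treating searches for the same words in any order or case as identical is a design choice of this sketch, not documented Easy PDF Search behaviour.

```python
from typing import Callable

SearchKey = tuple[str, ...]


class SearchHistory:
    """Remembers the results of each search so they can be viewed again without re-running it."""

    def __init__(self, run_search: Callable[[SearchKey], list[str]]) -> None:
        self._run_search = run_search  # performs the actual (slow) search
        self._results: dict[SearchKey, list[str]] = {}

    def search(self, words: list[str], rerun: bool = False) -> list[str]:
        key = tuple(sorted(w.lower() for w in words))  # same words, any order/case -> same entry
        if rerun or key not in self._results:
            self._results[key] = self._run_search(key)
        return self._results[key]

    def past_searches(self) -> list[SearchKey]:
        return list(self._results)
```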

By now, you can see that Easy PDF Search is designed to save you time and help you search for text in multiple PDF files quickly and easily.

In addition to the above, there is a lot more you can do with Easy PDF Search, such as:

  • merge all the pages from the search results into a single PDF file
  • copy all the files in the search results
  • extract text from the pages where the words were found
  • perform proximity searches e.g. NEAR (authorities “homeland security”, 20)
  • perform exclusion searches e.g. monitoring NOT daily
  • search PDF annotations and file attributes
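The proximity and exclusion examples above can be approximated in Python as follows. The exact rules Easy PDF Search uses aren't described here. This sketch therefore assumes three things: NEAR measures the distance in words between the starts of the two phrases, matching is case-insensitive, and `near` and `exclude` are hypothetical helper names.

```python
import re


def tokenize(text: str) -> list[str]:
    return re.findall(r"\w+", text.lower())


def phrase_positions(tokens: list[str], phrase: str) -> list[int]:
    """Word offsets at which the phrase starts in tokens."""
    words = tokenize(phrase)
    n = len(words)
    return [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == words]


def near(text: str, a: str, b: str, distance: int) -> bool:
    """True if phrases a and b start within `distance` words of each other."""
    tokens = tokenize(text)
    starts_a, starts_b = phrase_positions(tokens, a), phrase_positions(tokens, b)
    return any(abs(i - j) <= distance for i in starts_a for j in starts_b)


def exclude(text: str, term: str, excluded: str) -> bool:
    """True if text contains term but not excluded, e.g. monitoring NOT daily."""
    tokens = tokenize(text)
    return bool(phrase_positions(tokens, term)) and not phrase_positions(tokens, excluded)
```

Under these rules, `near(text, "authorities", "homeland security", 20)` matches any page where the two phrases start no more than 20 words apart.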

Download a 14-day trial of Easy PDF Search and start using your PDF collection to its full potential, or visit our web site for more details.

Full text index for your PDF files

Are you considering creating a full text index on your PDF files, so that you can frequently search for words and phrases fast?  That’s what Easy PDF Search was created for.

Say you have a collection of PDF files for various topics.  You can organize your files into libraries so that when you run your search, you can choose to search only in specific libraries.  You don’t have to always search your entire PDF collection.

In Easy PDF Search, you can search for multiple words simultaneously.  Here, we are searching for all files containing the words monitoring, splices or pressure.

The search results are then returned, grouped by search word.
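Conceptually, a full text index like this is an inverted index: a map from each word to the files that contain it. The Python sketch below is a toy version. The `LibraryIndex` class and its methods are hypothetical, and a real index such as Easy PDF Search's would also need to record pages and positions. The sketch shows how restricting a search to selected libraries and grouping results by search word both fall out naturally.

```python
import re
from collections import defaultdict
from typing import Iterable, Optional


class LibraryIndex:
    """Toy inverted index: each word maps to the (library, file) pairs that contain it."""

    def __init__(self) -> None:
        self._postings = defaultdict(set)  # word -> {(library, file_name), ...}

    def add_file(self, library: str, file_name: str, text: str) -> None:
        for word in re.findall(r"\w+", text.lower()):
            self._postings[word].add((library, file_name))

    def search(self, words: Iterable[str], libraries: Optional[set[str]] = None) -> dict[str, list[str]]:
        """Return matching file names grouped by search word, optionally limited to some libraries."""
        results = {}
        for word in words:
            hits = self._postings.get(word.lower(), set())  # .get avoids adding empty entries
            results[word] = sorted(
                name for lib, name in hits if libraries is None or lib in libraries
            )
        return results
```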

And on each page, our search words are highlighted in a different color.

Now what can you do with those search results?  In Easy PDF Search, lots.

For starters, you can export the search results listing or just the file names, for future or offline reference.

Next, you can work with the PDF pages from the search results.

You could extract each of the pages containing your search words and compile them into a single PDF file.  You could also extract the text found on those pages, or extract the pages into individual PDF files, and much more.

Easy PDF Search also keeps a search history, so you can refer to past results whenever the need arises without having to run the search again.

Give Easy PDF Search a try.  We offer a 14-day fully functional trial so you can experience for yourself how easy it is to create a full text index for your PDF files and search those files fast.

Introducing Easy PDF Search 3

Easy PDF Search (EPS) 3 focuses on three areas: more search options, more user actions on the search results, and general performance improvements.

More search options

In version 2, we added the option to search only the existing index.  This lets you run very fast searches without checking for new or modified files to index.  It also lets you search when the indexed files are not accessible.  In version 3, we added an option to search the existing index only for files in the selected libraries.

We also added the option to return only the file names from the search.

A good portion of the search duration is actually spent identifying which words to highlight in the search results.

When you only need the list of files containing the search words, selecting the Return file names only option speeds up your searches even more.

User actions on search results

In previous versions, you could work with the search results, for example by combining all the matching pages into a single file or extracting them into individual files.  However, you could not work with the results listing itself.

In version 3, you now have a context menu that allows you to perform various actions on the search results listing, like copying the list of files to the clipboard, opening the containing folder etc.

General performance improvements

We have improved performance where possible, especially when dealing with large collections of files.  The search history listing now loads faster too.

Miscellaneous UI improvements

We have also made various minor UI tweaks to improve usability.  An obvious addition is a set of built-in icons you can easily add to your library definitions.

This helps you to quickly make your libraries more distinctive.  Of course you can still always use your own icons.

If you would like to give Easy PDF Search a try, you can download a free 14-day fully functional trial here.

Easy PDF Search – the search options explained

When searching for words and phrases in Easy PDF Search (EPS), you have 4 options:

For the first option, the process flow is as follows:

  • EPS looks for all the folders set up in the selected libraries
  • in each folder, EPS compiles a list of all the files matching the search pattern
  • for each new file, EPS will index that file
  • for each modified file, EPS will rebuild the index
  • EPS then searches for the entered words/phrases in the list of files compiled in the second step above

For the second option, the process flow is as follows:

  • EPS looks for all the folders set up in the selected libraries
  • in each folder, EPS compiles a list of all the files matching the search pattern
  • for each file, EPS deletes any existing index, and builds the index again
  • EPS then searches for the entered words/phrases in the list of files compiled in the second step above

For the third option, the process flow is as follows:

  • EPS looks for all the folders set up in the selected libraries
  • in each folder, EPS compiles a list of all the files matching the search pattern
  • EPS then searches for the entered words/phrases only in the files where an index has already been created

For the fourth option, the process flow is as follows:

  • EPS searches for the entered words/phrases in its existing index.

The point to note is that with the first 3 options, Easy PDF Search only returns results from files that exist.  If a PDF file was indexed previously but no longer exists, EPS will not search the index of that file.
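The four process flows can be summarised in a short Python sketch. It's a simplified model, not EPS's code. `files_on_disk` stands for the files found in the library folders, and `index` maps each indexed file to the modification time it had when it was indexed. The enum names are hypothetical labels for the four options.

```python
from enum import Enum


class SearchOption(Enum):
    INDEX_NEW_FILES = 1       # first option
    INDEX_ALL_FILES = 2       # second option
    SEARCH_INDEXED_FILES = 3  # third option
    SEARCH_INDEX_ONLY = 4     # fourth option


def files_to_search(option: SearchOption, files_on_disk: dict[str, float],
                    index: dict[str, float]) -> set[str]:
    """Return the files whose index is searched, updating `index` where the option re-indexes."""
    if option is SearchOption.SEARCH_INDEX_ONLY:
        return set(index)  # no folder scan, so files that no longer exist are included
    if option is SearchOption.INDEX_NEW_FILES:
        for path, mtime in files_on_disk.items():
            if index.get(path) != mtime:  # new or modified file
                index[path] = mtime       # (re)build its index
        return set(files_on_disk)
    if option is SearchOption.INDEX_ALL_FILES:
        for path, mtime in files_on_disk.items():
            index[path] = mtime  # replace any existing index entry
        return set(files_on_disk)
    # SEARCH_INDEXED_FILES: existing files that already have an index; nothing is re-indexed
    return {path for path in files_on_disk if path in index}
```

With the first three options the result never includes a file missing from `files_on_disk`, which is the point made above.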

Searching an existing index in Easy PDF Search

Easy PDF Search indexes your PDF files and allows you to search your files for keywords.  When you perform a search in Easy PDF Search, it first scans your library paths for PDF files.  New and modified files are indexed, and then only files that still exist are searched.

In some situations, you may not have the source PDF files with you, but only the Easy PDF Search index database.  Or you may not want Easy PDF Search to spend time scanning for existing files, but just want to search for keywords in the already indexed files.

In Easy PDF Search 2.1, we added the option to skip the file scanning process and directly search the existing index.  This is available under the Options menu.

Selecting the Search index only option will search the existing index and return the results, regardless of whether the file exists.

To recap the 4 options:

  • Index new files only
    This option scans the search folders defined in each library, and indexes only the new and modified files it finds, then searches for keywords in those indexed files that exist.
  • Index all files
    This option scans the search folders defined in each library and indexes all the files it finds, deleting any existing index for each file.  It then searches for keywords in those indexed files that exist.
  • Search only indexed files
    This option scans the search folders defined in each library for files, and searches for keywords in those indexed files.  It ignores any new or modified files.
  • Search index only
    This option performs searches on the existing index, and does not scan to check if the indexed files exist.


Easy PDF Search – updating the PDF file locations

You may sometimes move your PDF files to another folder, and you don’t want to re-index all the files in Easy PDF Search.

For example, you may have one or more libraries that index the files in g:\pdflib\.  Let’s say you’ve now moved all the PDF files to a larger drive, say h:\pdfs\.  If you don’t want to reindex all the files, do this.

Select the Tools > Manage library paths item from the main menu.

Easy PDF Search then displays the list of libraries and the paths associated with each library.

As our PDF files have moved from g:\pdflib\ to h:\pdfs\, we need to make the change to our libraries’ paths.

Once you save the new values, Easy PDF Search will update all the details of the indexed files accordingly.  Files previously indexed in g:\pdflib\ will have their source locations updated to h:\pdfs\, if they now exist in h:\pdfs\.  The search paths for the libraries will also be updated.
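The path update amounts to swapping the old folder prefix for the new one wherever the moved file is actually found. Here's a hypothetical Python sketch of that logic. It uses the standard library's `ntpath` module so that Windows-style paths like g:\pdflib\ are compared case-insensitively on any platform. The function name and the `exists` callback are assumptions; `exists` stands in for a real file-system check such as `os.path.exists`.

```python
import ntpath
from typing import Callable


def update_library_paths(indexed_files: list[str], old_root: str, new_root: str,
                         exists: Callable[[str], bool]) -> list[str]:
    """Return indexed_files with old_root replaced by new_root wherever the moved file exists."""
    old_prefix = ntpath.normcase(ntpath.join(old_root, ""))  # lower-case, trailing backslash
    updated = []
    for path in indexed_files:
        if ntpath.normcase(path).startswith(old_prefix):
            candidate = ntpath.join(new_root, path[len(old_prefix):])
            if exists(candidate):  # only relocate files that really are in the new folder
                path = candidate
        updated.append(path)
    return updated
```

Files that were not moved, or that live outside the old folder, keep their original paths and their existing index entries.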

In this way, you do not need to reindex all the files that you have moved.

Moving your Easy PDF Search database

Sometimes you may want to move your Easy PDF Search database to a faster or bigger drive, or to another computer, without having to rebuild the index again.

Moving the database file

Select the File > Settings item from the main menu.

The path value indicates the location where the database file is stored.

In that folder, you should see 3 files.  EPSDataFile.db3 is the main database file.  The other 2 files are support files, and only exist while Easy PDF Search is running.

Enter the new folder where you want to store the database file.

Once you save the new settings, Easy PDF Search will then copy the database file from the old folder to the new folder.