Category Archives: Easy PDF Search

Multi-user support in Easy PDF Search

Easy PDF Search 5.3 (EPS) now has limited support for multi-user access.  Users running EPS on different computers can now access the same EPS database to search their indexed files.

There are limitations to the multi-user support.  EPS uses SQLite as the underlying database to store the indexed words.  SQLite is a file-based database.  Unlike a client-server database (PostgreSQL, SQL Server, etc.), a file-based database has no server process to run queries next to the data, so each user’s copy of EPS has to read the parts of the database file it needs, often a large share of it, over the network.  This means that it may not be feasible to run EPS against a 100 GB database, as the users’ computers would spend a lot of time reading the large file over the network.
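To get a feel for the scale, here is a back-of-the-envelope sketch of how long it takes just to move data over a network link.  The link speeds are hypothetical, and the sketch assumes the link runs at its full nominal rate with no protocol overhead.  These are not measurements of EPS, and EPS will not necessarily read the whole file for every search.

```python
def transfer_seconds(size_gb: float, link_mbit_per_s: float) -> float:
    """Ideal time to move size_gb gigabytes over a link of the given speed."""
    size_bits = size_gb * 1000**3 * 8          # decimal gigabytes to bits
    return size_bits / (link_mbit_per_s * 1000**2)


for size_gb in (1, 10, 100):
    for link in (100, 1000):                   # hypothetical 100 Mbit/s and 1 Gbit/s links
        seconds = transfer_seconds(size_gb, link)
        print(f"{size_gb} GB over {link} Mbit/s: {seconds:,.0f} s")
```

Under these assumptions, reading 100 GB in full over a 1 Gbit/s link takes about 800 seconds, which is why a modestly sized database matters for multi-user use.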

Setting up Easy PDF Search for multi-user access

EPS version

Ensure that you are using EPS 5.3 or newer.  You can check the version you have by selecting the Help > About item on the main menu.

In the About screen, the version number is displayed.

It is important that all users accessing the same database be on version 5.3 or newer.

Database file location

The database file needs to be stored on a network share.  You will then need to change the EPS database settings to point to that network share.  To do that, click on the File > Settings item on the main menu.

Enter the network share path.

PDF files path

To open the PDF files in the search results, each user needs to be able to access the PDF files.  This means that the paths in the library need to be network shares.

Presently, the path explorer is unable to browse network shares, so you will need to enter the network share manually in the path area.

Performance

By default, every time you perform a search, EPS first scans for new PDF files and indexes them.  In a multi-user scenario, this is inefficient.

You should designate a single user to update the index for new files.  Other users should use any of the other options that skip the indexing process:

  • Search only indexed files
    EPS will scan the selected libraries’ folders for PDF files, and only return results from files that have been indexed.  New or updated files are not indexed.
  • Search index only (selected libraries)
    EPS will search all indexed files in folders used by the selected libraries.
  • Search index only
    EPS will search all indexed files.

Points to note

We do not expect this multi-user access implementation to support a client-server experience.  It will probably work for database files of a modest size, supporting a small number of users.  The exact numbers will vary, depending on the network infrastructure and machine specifications in use.

If you discover any bugs or have any suggestions to improve EPS, please drop us a line at support@yohz.com.

You can download the latest version of Easy PDF Search using this link.

Displaying logical page numbers in search results

By default, the search results in Easy PDF Search display the physical page numbers where the search words/phrases are found.

You may sometimes want to display the logical page numbers instead.  For example, say we have a PDF file that has a cover, followed by a blank page, then 5 pages of prefaces using roman numbering, followed by another blank page, then finally the actual content.

The logical numbering of the PDF would look something like this:

  • cover (physical page 1)
  • blank page (physical page 2)
  • page I (physical page 3)
  • page II (physical page 4)
  • page III (physical page 5)
  • page IV (physical page 6)
  • page V (physical page 7)
  • blank page (physical page 8)
  • page 1 (physical page 9)
  • page 2 (physical page 10)
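The mapping above can be sketched in a few lines of code.  This is only an illustration of the idea, not how EPS stores the numbering: the section list and the style names ("cover", "blank", "roman", "arabic") are hypothetical labels chosen for this example.

```python
def to_roman(n: int) -> str:
    """Convert a positive integer to an upper-case Roman numeral."""
    symbols = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
               (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
               (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I")]
    out = []
    for value, symbol in symbols:
        count, n = divmod(n, value)
        out.append(symbol * count)
    return "".join(out)


# (page count, numbering style) for each run of pages, in physical order.
SECTIONS = [(1, "cover"), (1, "blank"), (5, "roman"), (1, "blank"), (2, "arabic")]

# Labels for page types that carry no number of their own.
UNNUMBERED = {"cover": "cover", "blank": "blank page"}


def logical_labels(sections):
    """Return the logical label for every physical page, in order."""
    labels = []
    for count, style in sections:
        for i in range(1, count + 1):      # numbering restarts in each section
            if style == "roman":
                labels.append(f"page {to_roman(i)}")
            elif style == "arabic":
                labels.append(f"page {i}")
            else:
                labels.append(UNNUMBERED[style])
    return labels


for physical, label in enumerate(logical_labels(SECTIONS), start=1):
    print(f"{label} (physical page {physical})")
```

Each section only needs a page count and a numbering style; the physical page number falls out of the running position in the file.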

In Easy PDF Search 5.2, you can now define the logical page numbering for a PDF file and display that numbering scheme in your search results.  The search results using logical page numbering for the above example will then be displayed this way:

To define the logical page numbering for a PDF file, right click on the PDF file name in the search results and click on the Define logical page numbering item.

In the Logical Page Numbers screen, enter the page count for each page type that has its own numbering scheme or none at all.  Using the example above, this is how we would define the logical page numbers.

Easy PDF Search 5 – highlighting function

In Easy PDF Search 5 (EPS), we added a highlighting function.  This is to help users who need to highlight content in their PDF files after EPS has delivered the search results.  Previously, we had to open the PDF file in another application like Acrobat Reader to highlight the content, thus losing the search results and navigation functions available in EPS.

IMPORTANT NOTE

DON’T use the highlighting function in EPS to redact your PDF content.  The highlighted areas are path ‘objects’, which can be easily removed.

Getting started

We will need to open the file we want to highlight in EPS’s internal viewer.  To do that, select the Open PDF file using internal viewer option from the context menu in the search tree.

The file is then opened in its own tab.

The search words are still highlighted, but using outlines instead of a solid color.  Using a solid color is confusing, as it becomes difficult to differentiate between our own highlighted text and the text highlighted by EPS for the search words.

You can still highlight the search terms in solid color by deselecting the Show outlines only for search words option.

Highlighting content

To highlight the content, or remove the highlighting, use these options on the toolbar.

Toggle the Highlight text button to enable and disable highlighting.  When enabled, drag the mouse over the content you want to highlight.  Release the mouse to apply the highlight.

TIP: You can also click on the right mouse button to toggle the Highlight text button.

The color drop down allows you to select the highlighting color.

Click on the Remove highlight button to remove existing highlighted areas.  When enabled, click on a highlighted area to remove the highlight.  Note that EPS is unable to differentiate between highlights applied in EPS and those applied by other applications.  If it detects a path object where you clicked, it will simply attempt to remove it.

The Remove all highlights on current page button allows you to remove all highlighted areas on the current page.  Again, EPS cannot differentiate between highlights it applied and existing highlights applied by another application.  It will simply remove all PDF path objects it finds on the current page.

Saving the highlighted file

You must save the modified PDF file using a different name from the original file name.  Click on the Save PDF file button to save your modified file.

Once you have saved the file, the modified PDF file name is displayed in the drop down list under the Save button.

To open this file in an external viewer, click on the Open PDF file using external application button.  Windows will then attempt to open the file using the registered PDF application.

Your PDF file is also automatically saved when you close EPS, but you need to have previously saved the file.

Persistence

When you close and reopen EPS, any files that were previously opened in the internal viewer will also be automatically loaded and displayed.  This allows you to continue your work from where you last left off.

Next steps

Depending on user feedback and sales, the following items are currently being considered for implementation:

  • an option to remove all highlights from the entire file
  • a function to extract all the highlighted text from a page/file
  • make the highlighting function available to files opened outside of the search function

If you have any other suggestions, please drop us a line at support@yohz.com.

Finding PDF files that do not contain any text

The situation is as follows: you use Easy PDF Search to index and search for words and phrases in your PDF files.  Suddenly, you realise that not all your files have been indexed because they actually do not contain any searchable text.

This can happen if your PDF files are actually scans of documents, and you rely on an OCR application to read the text and store it inside the PDF.  Now you need a quick and easy way to identify those PDF files that have not yet been through OCR.

You can easily do this in Easy PDF Explorer.  First, select the Count images and text option.

If you want an accurate count of the number of images and text, you can leave the option at in all pages.  If you just want to know if a file contains any characters, then selecting the but stop when text is found option will speed up the process significantly.

This is because Easy PDF Explorer no longer has to scan every page in the PDF file – the moment it encounters a page that contains text, it stops scanning.  If your aim is just to know which files don’t contain any text, then this option is the fastest.
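The effect of stopping early can be sketched as follows.  This is a simplified illustration, not Easy PDF Explorer's implementation: pages are represented as plain strings, and the function name and parameter are hypothetical.

```python
def count_characters(pages, stop_when_text_found=False):
    """Return (characters counted, pages scanned) for a list of page texts."""
    total = 0
    scanned = 0
    for text in pages:
        scanned += 1
        total += len(text.strip())
        if stop_when_text_found and total > 0:
            break              # we already know the file contains text
    return total, scanned


# Hypothetical 500-page file whose first page already has text.
text_file = ["Introduction"] + ["more text"] * 499
# Hypothetical 500-page scan with no text layer at all.
scanned_file = [""] * 500

print(count_characters(text_file))                              # every page is read
print(count_characters(text_file, stop_when_text_found=True))   # stops after the first page
print(count_characters(scanned_file, stop_when_text_found=True))
```

Note that a file with no text at all still has to be read in full, since there is never a page that triggers the early stop.  The saving comes from the files that do contain text.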

Once you’ve selected those options, select the PDF files you want to scan in the explorer window.

Easy PDF Explorer then lists the files together with the number of characters in each file.

Click on the Characters column, and you can quickly sort the files by number of characters found.

Say you want to copy those files with no text to another folder.  Right click and select the Deselect all item.

Now click on the first file with no text, and while holding down the SHIFT key, click on the last file with no text.  The range of files will be highlighted.

Now right click to bring up the context menu, and click on the Select item.  The range of files will then be selected.

Now just click on the Copy to folder button, select the folder to copy the selected files to, and you’re done.  Now all you have to do is run your OCR application on those files to create a searchable PDF file.
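If you ever need to do the same thing outside of Easy PDF Explorer, the sort, select and copy steps can be approximated with a short script.  This is only a sketch: it assumes you already have a character count for each file (here a hypothetical dictionary), and the file and folder names are placeholders.

```python
import shutil
import tempfile
from pathlib import Path


def copy_files_without_text(char_counts, destination):
    """Copy every file whose character count is 0 into destination."""
    destination = Path(destination)
    destination.mkdir(parents=True, exist_ok=True)
    copied = []
    # Sort by character count, mirroring a click on the Characters column.
    for path, chars in sorted(char_counts.items(), key=lambda item: item[1]):
        if chars > 0:
            break              # sorted ascending, so no more empty files follow
        shutil.copy2(path, destination / Path(path).name)
        copied.append(Path(path).name)
    return copied


# Demonstration with placeholder files in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp, "source")
    source.mkdir()
    counts = {}
    for name, chars in [("scan1.pdf", 0), ("report.pdf", 1520), ("scan2.pdf", 0)]:
        file_path = source / name
        file_path.write_bytes(b"placeholder content")
        counts[str(file_path)] = chars
    copied = copy_files_without_text(counts, Path(tmp, "needs_ocr"))
    print(sorted(copied))
```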

Download a 14-day trial of Easy PDF Explorer now to work with your PDF files faster.

Easy PDF Search

Easy PDF Search is our advanced PDF indexing and search tool.  Read how Roberto Mantovani, Assistant Professor at the University of Urbino (Italy), uses Easy PDF Search to help him search his collection of over 12000 PDF files totalling over 320 GB in size.

What’s new in Easy PDF Search 4

Easy PDF Search (EPS) 4 was recently released with the following changes:

Notes window

There is now a Notes window in EPS which you can use to take notes while working with your search results.  The Notes Editor is a simple text editor, and will float above all other EPS windows.

Your notes are saved in the rich-text format, and if you have Microsoft Word installed, you can also save the notes to DOCX and PDF formats.

You can also embed PDF documents into the Notes window, so you can view multiple PDF documents simultaneously for reference.

Usability improvements

  • You can now scroll across multiple pages by holding down the SHIFT key and moving the mouse wheel.
  • You can now zoom in and out of a page by holding down the CONTROL key and moving the mouse wheel.
  • You can now specify the font size to be used in the Search and Results windows via the Settings screen.

Database clean-up

Over time, the full-text index database may contain entries for non-existing files.  You can now delete those entries using the Clean function in the Settings screen.

This will free up space in the database for new files, but it will not reduce the size of the file.
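This is general SQLite behaviour: deleted rows leave free pages inside the file, which SQLite reuses for new data, and the file only shrinks when it is rebuilt with the VACUUM command.  The sketch below demonstrates this on a throwaway database using Python's built-in sqlite3 module.  The table is hypothetical and is not the EPS schema, and the example is shown only to illustrate the behaviour, not as a supported way to shrink the EPS database.

```python
import os
import sqlite3
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    db_path = os.path.join(tmp, "demo.db")
    con = sqlite3.connect(db_path)
    con.execute("PRAGMA auto_vacuum = NONE")   # SQLite's usual default, set explicitly
    # Hypothetical table; this is not the EPS schema.
    con.execute("CREATE TABLE words (file TEXT, word TEXT)")
    con.executemany("INSERT INTO words VALUES (?, ?)",
                    [("gone.pdf", f"word{i}") for i in range(50000)])
    con.commit()
    size_full = os.path.getsize(db_path)

    # Remove the entries for a file that no longer exists.
    con.execute("DELETE FROM words WHERE file = 'gone.pdf'")
    con.commit()
    size_after_delete = os.path.getsize(db_path)
    free_pages = con.execute("PRAGMA freelist_count").fetchone()[0]

    con.execute("VACUUM")                      # rebuilds the file without the free pages
    size_after_vacuum = os.path.getsize(db_path)
    con.close()

print("after insert:", size_full)
print("after delete:", size_after_delete, "free pages:", free_pages)
print("after vacuum:", size_after_vacuum)
```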

Existing users who purchased a license within the last 12 months can upgrade to version 4 for free.  Users with older licenses can purchase a license extension for only USD 10.

You can download a 14-day trial version using this link.

High DPI support

We recently added high DPI support to some of our applications so that they render better when user displays are scaled to 125% or more.  We may have missed 1 or 2 items, so if you encounter any GUI elements that are oversized or undersized, we would appreciate it very much if you could let us know at support@yohz.com.

The applications we’ve added high DPI support for are:

Search text in multiple PDF files fast

So you want to search for text in multiple PDF files?  You can do that in Adobe Acrobat, and Google will turn up a few guides on doing that.

That’s all good and fine, but what if you need the search results fast and you need to search hundreds or thousands of PDF files?  Then you should consider Easy PDF Search.

Speed

Easy PDF Search is fast.  Watch this video comparing Easy PDF Search with Adobe Acrobat.  In short, to search for a word the second in 46 files totaling 1 GB in size, Easy PDF Search took 3 seconds while Adobe Acrobat took 3 minutes 13 seconds.

We have a user who regularly searches his collection of over 12000 PDF files using Easy PDF Search, and he gets his search results in less than 20 seconds.

Search multiple words simultaneously

Search for multiple words simultaneously.  Why waste time searching the same files for different words?  Easy PDF Search lets you search for as many words or phrases as you require.

Quickly see where your words were found

Easy PDF Search doesn’t just tell you which files your words were found in, it tells you exactly which page you can find the words in, and the frequency of the words on each page and the entire file.
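As a rough illustration of what per-page and per-file frequencies look like, here is a minimal sketch.  It is not how Easy PDF Search computes its results: pages are plain strings, matching is a simple case-insensitive whole-word comparison, and the function name is hypothetical.

```python
import re
from collections import Counter


def word_frequencies(pages, search_words):
    """Count each search word per page and across the whole file."""
    wanted = {w.lower() for w in search_words}
    per_page = {}
    total = Counter()
    for number, text in enumerate(pages, start=1):
        tokens = re.findall(r"\w+", text.lower())
        counts = Counter(t for t in tokens if t in wanted)
        if counts:                             # only keep pages with a match
            per_page[number] = counts
            total.update(counts)
    return per_page, total


pages = [
    "Pressure monitoring began.",
    "No match here.",
    "Monitoring the pressure, then monitoring again.",
]
per_page, total = word_frequencies(pages, ["monitoring", "pressure"])
for number, counts in per_page.items():
    print(f"page {number}: {dict(counts)}")
print(f"whole file: {dict(total)}")
```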

In the integrated PDF viewer, all your words are highlighted on each page.

View results from past searches

Easy PDF Search maintains a search history of the words you searched for and also of the search results.

This means you can easily view the search results from past searches without having to reperform the search.

By now, you can see that Easy PDF Search is designed to save you time and help you search for text in multiple PDF files fast and easily.

In addition to the above, there is a lot more you can do with Easy PDF Search like:

  • merge all the pages from the search results into a single PDF file
  • copy all the files in the search results
  • extract text from the pages where the words were found
  • perform proximity searches e.g. NEAR (authorities “homeland security”, 20)
  • perform exclusion searches e.g. monitoring NOT daily
  • search PDF annotations and file attributes
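To make the idea behind proximity and exclusion searches concrete, here is a small sketch in plain Python.  It uses a simplified definition of our own (two terms are "near" if at most N words separate them) and does not reproduce the exact matching rules or query syntax of Easy PDF Search.  All function names are hypothetical.

```python
import re


def tokens(text):
    """Lower-case word tokens of a text."""
    return re.findall(r"\w+", text.lower())


def phrase_positions(words, phrase):
    """Start indexes at which the phrase (one or more words) occurs."""
    parts = tokens(phrase)
    n = len(parts)
    return [i for i in range(len(words) - n + 1) if words[i:i + n] == parts]


def near(text, phrase_a, phrase_b, distance):
    """True if the phrases occur with at most `distance` words between them."""
    words = tokens(text)
    len_a, len_b = len(tokens(phrase_a)), len(tokens(phrase_b))
    for a in phrase_positions(words, phrase_a):
        for b in phrase_positions(words, phrase_b):
            # Words between the end of the earlier phrase and the start of the later one.
            gap = b - (a + len_a) if a < b else a - (b + len_b)
            if 0 <= gap <= distance:
                return True
    return False


def contains_but_not(text, wanted, unwanted):
    """True if `wanted` occurs in the text and `unwanted` does not."""
    words = tokens(text)
    return bool(phrase_positions(words, wanted)) and not phrase_positions(words, unwanted)


sentence = "The authorities met with the Department of Homeland Security yesterday."
print(near(sentence, "authorities", "homeland security", 20))
print(near(sentence, "authorities", "homeland security", 3))
print(contains_but_not("Weekly monitoring report", "monitoring", "daily"))
print(contains_but_not("Daily monitoring report", "monitoring", "daily"))
```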

Download a 14-day trial of Easy PDF Search and start using your PDF collection to its full potential, or visit our web site for more details.

Full text index for your PDF files

Are you considering creating a full text index on your PDF files, so that you can frequently search for words and phrases fast?  That’s what Easy PDF Search was created for.

Say you have a collection of PDF files for various topics.  You can organize your files into libraries so that when you run your search, you can choose to search only in specific libraries.  You don’t have to always search your entire PDF collection.

In Easy PDF Search, you can search for multiple words simultaneously.  Here, we are searching for all files containing the words monitoring, splices or pressure.

Our search results are then returned, grouped by each search word.

And on each page, our search words are highlighted in a different color.

Now what can you do with those search results?  In Easy PDF Search, lots.

For starters, you can export the search results listing or just the file names, for future or offline reference.

Next, you can work with the PDF pages from the search results.

You could extract each of the pages containing your search words and compile them into a single PDF file.  You could also extract the text found on those pages, or extract the pages into individual PDF files, and much more.

Easy PDF Search also keeps a search history, so you can just refer to it whenever the need arises without having to reperform the search.

Give Easy PDF Search a try.  We offer a 14-day fully functional trial so you can experience for yourself how easy it is to create a full text index for your PDF files and search those files fast.

Introducing Easy PDF Search 3

Easy PDF Search (EPS) 3 focuses on 3 areas – support for more search options, more user actions on the search results, and general performance improvements.

More search options

In version 2, we added the option to search only the existing index.  This allows you to make very fast searches without having to check for new or modified files to index, or when the indexed files are not accessible.  In version 3, we added an additional option to search the existing index only for files in the selected libraries.

We also added the option to return only the file names from the search.

A good portion of the search duration is actually spent identifying which words to highlight in the search results.

When you only need the list of files where the search words were found, then selecting the Return file names only option would speed up your searches even more.

User actions on search results

In previous versions, you could work with the search results (combining all the pages into a single file, extracting the search pages into individual files, etc.), but you could not work with the results listing itself.

In version 3, you now have a context menu that allows you to perform various actions on the search results listing, like copying the list of files to the clipboard, opening the containing folder etc.

General performance improvements

We have improved the performance where possible, especially when dealing with large collections of files.  The search history listing now loads faster too.

Miscellaneous UI improvements

We have also made various minor UI tweaks to improve usability.  An obvious addition is the availability of in-built icons you can easily add to your library definition.

This helps you to quickly make your libraries more distinctive.  Of course you can still always use your own icons.

If you would like to give Easy PDF Search a try, you can download a free 14-day fully functional trial here.

Easy PDF Search – the search options explained

When searching for words and phrases in Easy PDF Search (EPS), you have 4 options:

For the first option, the process flow is as follows:

  • EPS looks for all the folders set up in the selected libraries
  • in each folder, EPS compiles a list of all the files matching the search pattern
  • for each new file, EPS will index that file
  • for each modified file, EPS will rebuild the index
  • EPS then searches for the entered words/phrases in the list of files it compiled in step 2 above

For the second option, the process flow is as follows:

  • EPS looks for all the folders set up in the selected libraries
  • in each folder, EPS compiles a list of all the files matching the search pattern
  • for each file, EPS deletes any existing index, and builds the index again
  • EPS then searches for the entered words/phrases in the list of files it compiled in step 2 above

For the third option, the process flow is as follows:

  • EPS looks for all the folders set up in the selected libraries
  • in each folder, EPS compiles a list of all the files matching the search pattern
  • EPS then searches for the entered words/phrases only in the files where an index has already been created

For the fourth option, the process flow is as follows:

  • EPS searches for the entered words/phrases in its existing index.

The point to note is that in the first 3 options, Easy PDF Search only returns results from files that exist.  If a PDF file has already been indexed previously but no longer exists, EPS will not search the index of that file.
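The four process flows can be sketched with a toy in-memory index.  This is a minimal model for illustration only, not the EPS implementation: files are represented as a dictionary of path to (modified time, text), the index is a dictionary of path to (modified time, set of words), and the mode names are hypothetical stand-ins for the first to fourth options.

```python
import re

MODES = ("index_new_and_modified", "rebuild", "indexed_only", "index_only")


def build_index(text):
    """Index a file as the set of lower-case words it contains."""
    return set(re.findall(r"\w+", text.lower()))


def search(files, index, word, mode):
    """files: {path: (modified_time, text)} for PDFs that currently exist.
    index: {path: (modified_time, set_of_words)}, updated in place.
    Returns the sorted paths whose index contains `word`."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    word = word.lower()
    if mode == "index_only":                        # fourth option
        candidates = list(index)                    # no check that files still exist
    else:
        candidates = list(files)                    # compile the list of existing files
        for path in candidates:
            modified, text = files[path]
            if mode == "rebuild":                   # second option
                index[path] = (modified, build_index(text))
            elif mode == "index_new_and_modified":  # first option
                if path not in index or index[path][0] != modified:
                    index[path] = (modified, build_index(text))
            # "indexed_only" (third option) never touches the index
        candidates = [p for p in candidates if p in index]
    return sorted(p for p in candidates if word in index[p][1])


files = {"a.pdf": (1, "pressure monitoring"), "b.pdf": (1, "splices")}
index = {}
print(search(files, index, "pressure", "indexed_only"))            # nothing is indexed yet
print(search(files, index, "pressure", "index_new_and_modified"))  # indexes first, then searches
del files["a.pdf"]                                                 # the PDF is removed from disk
print(search(files, index, "pressure", "indexed_only"))            # missing files are skipped
print(search(files, index, "pressure", "index_only"))              # the old index entry still matches
```

The last two calls show the point made above: only the fourth flow returns results for a file that has been indexed but no longer exists.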