Search PDF using Easy PDF Search

Easy PDF Search lets you search your PDF files quickly.  Unlike other PDF tools, Easy PDF Search creates an index of your PDF files' contents.  The first time you search a set of files, Easy PDF Search indexes them automatically.  Subsequent searches use the index, which can be up to 100x faster than unindexed searches.

This guide gets you started on configuring Easy PDF Search to search PDF files, but broadly the steps are as follows:

Create a library

A library is a collection of PDF files.  You must create at least one library, and you can create as many libraries as you want.

A library tells Easy PDF Search where your PDF files are located.

When you search PDF files for words, you can choose to search all or some of your libraries.  This gives you great flexibility in limiting your search to specific sets of files.

The first time that Easy PDF Search searches your PDF files, it creates an index of the files’ contents.  On subsequent searches, it uses the index directly, which can make the search up to 100x faster.

If any of your PDF files are updated, Easy PDF Search will automatically recreate the index for those files.
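To illustrate why an index speeds up repeated searches, here is a minimal sketch of an inverted index in Python.  It is not Easy PDF Search's actual implementation; the function name, the sample library, and the whitespace tokenization are all assumptions made for illustration.

```python
from collections import defaultdict

def build_index(pages_by_file):
    """Map each lowercase word to the set of (file, page number) pairs containing it."""
    index = defaultdict(set)
    for file_name, pages in pages_by_file.items():
        for page_number, text in enumerate(pages, start=1):
            for word in text.lower().split():
                index[word].add((file_name, page_number))
    return index

# Hypothetical extracted text: one string per page of each file.
library = {
    "guide.pdf": ["data consistency matters", "concurrency control"],
    "notes.pdf": ["monitoring data quality"],
}
index = build_index(library)

# A lookup only touches the index; the page text is not scanned again.
hits = sorted(index["data"])
```

Building the index costs one pass over the text, which is why the first search is slower; later lookups are a single dictionary access.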

Enter your search words/phrases

Enter one or more search words/phrases to search for.  This guide provides more details on how you can perform both simple and advanced searches.  Basically, advanced searches allow you to use operators to refine the search.

For example, entering the words

data consistency and data concurrency 

will return all files containing the words data, consistency, and, and concurrency.  Entering it in double quotes

“data consistency and data concurrency”

will return all files containing the phrase data consistency and data concurrency.  Entering it this way

“data consistency” OR “data concurrency”

will return all files containing the phrase data consistency or data concurrency.  Entering

“data consistency” AND “data concurrency”

will return only files containing the phrases data consistency and data concurrency. Entering

“data consistency” NOT “data concurrency”

will return only files that contain the phrase data consistency and do not contain the phrase data concurrency.  Finally, entering

NEAR(“data consistency”, “concurrency”, 20)

will return files containing the phrase data consistency where the word concurrency appears within 20 words before or after that phrase.
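The difference between the phrase queries above can be sketched in Python.  This is only an illustration of the described behaviour, not the product's search engine; the sample files and helper names are hypothetical.

```python
def tokenize(text):
    return text.lower().split()

def has_phrase(words, phrase):
    """True if the words of `phrase` appear consecutively in `words`."""
    target = tokenize(phrase)
    n = len(target)
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))

# Hypothetical file contents.
docs = {
    "a.pdf": "data consistency and data concurrency",
    "b.pdf": "data consistency without locking",
    "c.pdf": "data concurrency and consistency",
}
tokens = {name: tokenize(text) for name, text in docs.items()}

def search(predicate):
    return sorted(name for name, words in tokens.items() if predicate(words))

# "data consistency" AND "data concurrency"
both = search(lambda w: has_phrase(w, "data consistency") and has_phrase(w, "data concurrency"))
# "data consistency" OR "data concurrency"
either = search(lambda w: has_phrase(w, "data consistency") or has_phrase(w, "data concurrency"))
# "data consistency" NOT "data concurrency"
only_first = search(lambda w: has_phrase(w, "data consistency") and not has_phrase(w, "data concurrency"))
```

Note that c.pdf contains all the individual words but not the phrase data consistency, so it only matches the OR query.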

Review the search results

The search results are summarized, and the pages where the search words/phrases were found are listed.

Expanding the file summary will list out all the pages where the words/phrases were located together with the number of occurrences.

Clicking on a listed page will display that page from your PDF file, with the search words/phrases highlighted.

Beyond just viewing your search results, you can perform additional tasks like combining all the pages from the search results into a single PDF file.  You can also extract the pages into separate PDF files, extract text and images from those pages, and export those pages as images.

Download a free 14-day trial now and experience how Easy PDF Search makes searching your PDF files so much easier and faster.

Speeding up searches in Easy PDF Search

When you enter words to search for in Easy PDF Search, you need to be mindful of whether you want to search by each individual word, or the phrase.

For example, if you enter the following:

data consistency and data concurrency

Easy PDF Search will search your library of PDF files for pages that contain the word data, or consistency, or and, or concurrency.  This will slow down the search process significantly because of the inclusion of the word and, which is a common word.  The search will complete faster if you omit the word and.

If, however, you wanted to search for pages that contain the phrase data consistency and data concurrency, you need to enclose the entire phrase in double quotes, i.e.

“data consistency and data concurrency”

Now only your PDF files that contain the phrase data consistency and data concurrency are returned by the search.  The search will also complete faster because you are no longer searching for individual words.

You may also have wanted to search your PDF files for pages that contain the phrase data consistency or data concurrency.  You can do this in one of two ways.  Either enter each phrase on a separate line, e.g.

“data consistency”
“data concurrency”

or use the OR operator this way:

“data consistency” OR “data concurrency”

Both methods of entering the search phrases will return the same results.  The only difference lies in how the results are displayed.  With the first method, each phrase is displayed under a separate heading, while using the OR operator displays the results under a single heading.

So in summary, a phrase search is faster than a multiple word search, and if you must perform a multiple word search, omitting common words will speed up the search.

 

Cooking in the labs – a DICOM tags search application

This started out as a request from a user, and we thought it would be an interesting project for us.  We’re currently in the early stages of evaluating the feasibility of developing a DICOM tags search application.  The objective is to allow users to use SQL queries to search for DICOM images using values from the DICOM tags embedded in the images.

Step 1 – reading the tags from your DICOM images

We plan to allow you to read the DICOM images from a database, or from DICOM files already on your computer.

Step 2 – storing the data elements/tag values

This is the hardest part.  A DICOM image can contain both standard and private data elements/tags.  For a start, we will only be indexing most of the standard data elements/tags (over 2800).

Step 3 – querying the tag values

Once stored in a database, users can use SQL syntax to query the tag values.  We will also look into adding a query builder to help non-technical users.
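As a rough sketch of what steps 2 and 3 could look like, the Python example below stores a few tag values in an in-memory SQLite database and queries them with SQL.  The table name, column names, and file paths are hypothetical and are not the planned schema; the tag numbers shown for Modality and PatientName are the standard DICOM ones.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical layout: one row per (image, tag) pair.
conn.execute(
    "CREATE TABLE dicom_tags ("
    " image_path TEXT, tag TEXT, name TEXT, value TEXT)"
)
rows = [
    ("scans/001.dcm", "(0008,0060)", "Modality", "CT"),
    ("scans/001.dcm", "(0010,0010)", "PatientName", "DOE^JANE"),
    ("scans/002.dcm", "(0008,0060)", "Modality", "MR"),
    ("scans/002.dcm", "(0010,0010)", "PatientName", "DOE^JOHN"),
]
conn.executemany("INSERT INTO dicom_tags VALUES (?, ?, ?, ?)", rows)

# Find every image whose Modality tag is CT.
ct_images = [
    row[0]
    for row in conn.execute(
        "SELECT image_path FROM dicom_tags"
        " WHERE name = ? AND value = ? ORDER BY image_path",
        ("Modality", "CT"),
    )
]
```

The list of image paths returned by the query is what would then be handed to a DICOM viewer in step 4.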

Step 4 – displaying the images

Once we have the results of the query, we can display the images using the user’s preferred DICOM viewer.

Issues/bottlenecks

We do not have any experience working with PACS, nor do we have access to any such system.  We are approaching this purely from a database developer's perspective – read the images, store the tag values in a database, allow users to query the database, and display the results.

If you are interested in such an application and can spare some time to help us test the application as we go along, or there are features you would like to see in such an application, please drop us an email at support@yohz.com.  Thank you.

Raw image file support in SQL Image Viewer

SQL Image Viewer 10 adds support for displaying RAW camera image files.  If SQL Image Viewer fails to display your RAW images, please send a copy of the file to support@yohz.com for further analysis.

You need to take the following into account when exporting RAW image files from your database using SQL Image Viewer.  Raw image files are exported using the .raw extension by default.  This is because SQL Image Viewer is unable to distinguish between the different raw formats (cr2, crw, nef, etc.).

To export the raw image files using the correct extension, you need to have a column that contains the source file name.  For example, our result set contains the file name in the source column.

We cannot use the source column as is in our naming convention, because it would include the path portion.  To use only the file name portion, we use the FILENAME operator.

The FILENAME operator tells SQL Image Viewer to treat the value in the column as a fully qualified file name, and to use only the file name portion of the value.

What if we wanted to name the exported images using the ID column?  We will need to use the FILEEXT operator in this case.  Our file naming convention will be entered like this:

The FILEEXT operator tells SQL Image Viewer to treat the value in the column as a fully qualified file name, and to use only the file extension portion of the value (including the . separator).

By using the FILENAME and FILEEXT operators, you have more control over how the exported files are named, and how you can use elements from columns containing file names.
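The effect of the two operators can be mimicked in a few lines of Python.  This sketch only illustrates the behaviour described above; it is not SQL Image Viewer's naming syntax, and the function names, the sample path, and the ID value are hypothetical.

```python
import ntpath  # Windows-style path handling, works on any platform

def filename_part(value):
    """Like FILENAME: keep only the file name portion of a full path."""
    return ntpath.basename(value)

def fileext_part(value):
    """Like FILEEXT: keep only the extension, including the . separator."""
    return ntpath.splitext(value)[1]

# Hypothetical row: the source column holds the original path, id is the row ID.
source = r"c:\photos\2019\img_0042.cr2"
row_id = 17

by_source = filename_part(source)          # named after the original file
by_id = f"{row_id}{fileext_part(source)}"  # named after the ID, keeping the extension
```

Either way, the exported file keeps its original extension instead of the default .raw.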

If you have any questions or requests, please do drop us a line at support@yohz.com.

Extracting pages from PDF files to individual files using Easy PDF Explorer

In Easy PDF Explorer 2.5, we added the option to extract the pages from PDF files into individual files.

Select the file(s) you want to extract the pages from, and click on the Extract pages button.

The default option is to create a single PDF file containing the pages you want to extract.  For example, if we enter <FIRST:5>, <LAST:10> as the pages to extract, Easy PDF Explorer will create a single file containing the first 5 pages and last 10 pages for each of our selected files.

To extract each page into an individual file, we select the Store each page in a separate file option.  The suffix value will append the page number to the end of our file name.

Now when we run the task, each page is extracted to a separate file, using the combined naming convention of the file name and suffix.

Let’s say we want to extract each file into a separate folder, and each folder contains the individual pages.  To do that, enter the following values for the folder, file name, and suffix name.

Using the above values:

  • each file will be extracted into its own folder using the file name (without the extension)
  • the base file name will simply be .pdf
  • and the suffix (page_<PAGENUM:0000>) will be appended to the base file name.

Running the task creates one folder for each selected PDF file, and in each folder, each page is exported to its own file.

Each folder contains the files for the first 5 pages and last 10 pages of the corresponding PDF file.

If you want to extract all the pages instead of a range of pages, leave the Pages to extract value empty.
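The page selection and naming rules above can be sketched in Python as follows.  This is an illustration only, not Easy PDF Explorer's code.  It assumes that the zeros in <PAGENUM:0000> set the padding width, and that overlapping FIRST and LAST ranges in a short file are extracted once; both function names are hypothetical.

```python
import re

def pages_to_extract(spec, page_count):
    """Return sorted 1-based page numbers for a spec such as '<FIRST:5>, <LAST:10>'."""
    if not spec.strip():
        # An empty spec means all pages.
        return list(range(1, page_count + 1))
    selected = set()
    for kind, count in re.findall(r"<(FIRST|LAST):(\d+)>", spec):
        n = min(int(count), page_count)
        if kind == "FIRST":
            selected.update(range(1, n + 1))
        else:
            selected.update(range(page_count - n + 1, page_count + 1))
    return sorted(selected)

def page_file_name(suffix_template, page_number):
    """Expand a <PAGENUM:0000>-style tag and add the .pdf base name."""
    def expand(match):
        return str(page_number).zfill(len(match.group(1)))
    return re.sub(r"<PAGENUM:(0+)>", expand, suffix_template) + ".pdf"

pages = pages_to_extract("<FIRST:5>, <LAST:10>", 40)
names = [page_file_name("page_<PAGENUM:0000>", p) for p in pages]
```

For a 40-page file, this selects pages 1 to 5 and 31 to 40, giving 15 files per folder.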

 

SQL Image Viewer post processing option

In SQL Image Viewer 9.10, we added an option to allow you to run an application on each exported file.  So say your database contained zip archives, and you use SQL Image Viewer to export those zip archives to disk.  Using the post processing option, you can then use your favorite archiver to extract the files from the zip archives.

The post processing option is only available when you choose the Export images and files option.

In the export wizard, you will find the option on the Log, Email, and Post-processing Options page.

In this example, we will be using the 7-Zip command-line interface to extract the contents of our zip files.  The most important thing to note here is that whichever application you use, it must run to completion without waiting for user input, regardless of its execution status.

For example, in 7zip, if a file of the same name already exists, it will prompt you to overwrite or skip the file extraction.

You cannot allow this to happen when running the application from within SQL Image Viewer, because you cannot respond to the application's prompt from there, and the waiting application will block the export process.

To prevent this in our example, we use the -aoa flag to choose to always overwrite any existing files.

So to run your application, enter the fully qualified name of its executable file.  In our example, that’s E:\Program Files (x86)\7-zip\7z.exe.  Because the path contains spaces, we need to enclose it in double quotes.

After the executable path name,  enter the required options for your application.  There are 4 tags you can use to represent the exported file to process.  Given a file name of f:\temp\exports\0002_0003.zip:

  • <FILENAME> returns f:\temp\exports\0002_0003.zip
  • <FILENAME_PATH> returns f:\temp\exports\
  • <FILENAME_NOPATH> returns 0002_0003.zip
  • <FILENAME_NOPATH_NOEXT> returns 0002_0003

In our example, we want to extract the files from our zip archive, so we use the e option.  We then need to provide the archive file name, which we do using the <FILENAME> tag.  Again, we enclose the <FILENAME> tag in double quotes in case it contains spaces.

We then want to specify the folder to extract the items into, using the -o option.  In this case, we want to extract the files into a subfolder using the zip file name.  So given a zip file name of f:\temp\exports\0002_0003.zip, the contents of that zip file will be extracted into the f:\temp\_dump\0002_0003 folder.

Now, after each zip file has been exported to disk, SQL Image Viewer will run 7-Zip to extract the contents of that zip file.

One last option is the Delete file after successful processing item.  Selecting this will cause SQL Image Viewer to delete the exported file if the processing application returns an exit code of 0.  Most command-line applications return 0 on success, while a non-zero exit code usually signifies an error.
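To make the tag expansion concrete, here is a small Python sketch that expands the four tags for an exported file and applies the exit-code rule.  It is not SQL Image Viewer's code.  The options string is reconstructed from the description above (the e command, the -o output folder, and the -aoa overwrite switch) rather than copied from the product, and the function names are hypothetical.

```python
import ntpath  # Windows-style path handling, works on any platform

def expand_tags(template, exported_file):
    """Replace the four file name tags with values taken from exported_file."""
    folder, name = ntpath.split(exported_file)
    values = {
        "<FILENAME>": exported_file,
        "<FILENAME_PATH>": folder + "\\",
        "<FILENAME_NOPATH>": name,
        "<FILENAME_NOPATH_NOEXT>": ntpath.splitext(name)[0],
    }
    for tag, value in values.items():
        template = template.replace(tag, value)
    return template

def should_delete(exit_code):
    """Exported files are deleted only when the application reports success."""
    return exit_code == 0

exported = r"f:\temp\exports\0002_0003.zip"
# Assumed options string, built from the description in the text.
options = r'e "<FILENAME>" -o"f:\temp\_dump\<FILENAME_NOPATH_NOEXT>" -aoa'
expanded = expand_tags(options, exported)
```

Each archive ends up in its own subfolder because <FILENAME_NOPATH_NOEXT> differs for every exported file.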

We hope you find this option useful.  If you want to use an application that requires some parameters using the input file that’s not provided by any of our tags, drop us a line at support@yohz.com, and we’ll try to help you out.

 

Using Easy PDF Search

Easy PDF Search helps you search for words or phrases in your PDF files.  You can also search for values in the file annotations and attributes.

To start off, enter the words or phrases you want to search in the search window.  There are a few ways you can refine your search terms, like conditional searching using AND and OR operators, using NOT to exclude words, using NEAR to perform proximity searches, all described here.

For now, let’s just keep things simple, and search for the words performance or monitoring.

Next, we need to tell Easy PDF Search which files to search in.  To do that, we need to define one or more libraries.  Each library can contain one or more paths and search patterns.

You can define generic paths, or more specific search patterns.

Once you have defined your libraries, select the libraries you want to include in your search.

In this way, you can easily choose which paths you want to search in if you group files of similar topics into the same libraries.

Easy PDF Search will then index your files where necessary, and display the search results in the main window. Click on any of the items to open the relevant file or page.

Easy PDF Search will highlight all the search terms found on that page.

With the search results, you can now perform various tasks like compiling all the results into a single PDF file, extracting text and images from the pages, converting the pages to images etc.

You can also review your search history and recall the results of those searches.

You can also use the search parameters from your search history to perform a new search if you have new or modified files in your libraries.

Download a 14-day trial of Easy PDF Search now, and see how it lets you work faster and opens up new possibilities on how you can work with your PDF files.


Searching PDF content using Easy PDF Search

When using Easy PDF Search to search for words or phrases, here are a few pointers.

When you enter a single word to search, Easy PDF Search will return all pages containing one or more occurrences of that word.

If you enter two words on different lines, e.g.

monitoring
quality

Easy PDF Search will return all files containing the words monitoring or quality.

Likewise, if you enter multiple words on separate lines, all files containing any of the entered words will be returned.

If you enter two words on the same line, only files containing both the first and second words will be returned.

If you enter two lines of two words each, e.g.

monitoring sensors
arduino quality

then only files containing monitoring and sensors or arduino and quality are returned.
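The line rules above amount to "OR between lines, AND within a line".  A minimal Python sketch of that rule follows; it is not the product's implementation, and the sample files and the function name are hypothetical.

```python
def matches(query, file_words):
    """Lines are alternatives (OR); the words on one line must all be present (AND)."""
    lines = [line.split() for line in query.lower().splitlines() if line.strip()]
    return any(all(word in file_words for word in line) for line in lines)

# Hypothetical files, each reduced to the set of words it contains.
files = {
    "a.pdf": {"monitoring", "sensors"},
    "b.pdf": {"arduino", "quality"},
    "c.pdf": {"monitoring", "quality"},
}

query = "monitoring sensors\narduino quality"
found = sorted(name for name, words in files.items() if matches(query, words))
```

Here c.pdf is not returned, because it satisfies only half of each line.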

You can also search for phrases in place of words.  To search for phrases, enclose the words in double quotes, e.g.

“monitoring quality”

This will then return only files containing the phrase monitoring quality.  The rules for words described above apply to phrases too.  For example,

“monitoring quality” arduino

will return all files containing the phrase monitoring quality and the word arduino.

Refining your search using AND, OR, NOT

When you enter two words on a line to search for, e.g.

monitoring quality

there’s an implicit AND operator added, i.e.

monitoring AND quality

You can use the OR operator if you want the search results to return files containing either of the two words, e.g.

monitoring OR quality

You can also use the NOT operator to exclude files containing specific words.  For example,

“monitoring quality” NOT arduino

will return all files that contain the phrase monitoring quality and do not contain the word arduino.

You can combine multiple operators and words to refine your search e.g.

Use parentheses to make it clear the order in which to apply the search operators and words e.g.

Note that the AND, OR, NOT operators must always be written in uppercase.

Prefix search

Instead of complete words, you can also use prefix searches e.g.

This will then return all files containing words starting with monitor e.g. monitoring, monitored, monitors, etc.

Proximity searches

Proximity searches allow you to search for 2 or more words based on their proximity, using the NEAR operator.  E.g.

will return all files containing the words monitoring and performance when they appear within 20 words of each other.  If you omit the distance value e.g.

a default distance value of 10 words is used.  Note that common words like the, and, it, etc. are ignored when determining proximity.
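A proximity check along these lines can be sketched in Python.  This is only an illustration of the rule described above, not the NEAR operator's real implementation.  It assumes distance is the difference in word positions after common words are removed, and the list of common words contains only the three named in the text.

```python
STOP_WORDS = {"the", "and", "it"}  # only the common words named in the text
DEFAULT_DISTANCE = 10

def near(text, first, second, distance=DEFAULT_DISTANCE):
    """True if `first` and `second` occur within `distance` words of each other."""
    words = [w for w in text.lower().split() if w not in STOP_WORDS]
    first_at = [i for i, w in enumerate(words) if w == first]
    second_at = [i for i, w in enumerate(words) if w == second]
    return any(abs(i - j) <= distance for i in first_at for j in second_at)

text = "monitoring the cpu and the disk improves performance"
within_default = near(text, "monitoring", "performance")
within_four = near(text, "monitoring", "performance", 4)
within_three = near(text, "monitoring", "performance", 3)
```

With the common words dropped, the two words are 4 positions apart rather than 7, which is why the distance of 4 matches in this sketch.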

Searching by file and by page

By default, Easy PDF Search will treat the entire PDF file as one single page.  Instead of applying the search criteria to the entire file, you can choose to search by individual pages.  For example, entering

performance NOT optimization

will return all files containing the word performance but not optimization.  If, however, you choose to search by page, then only individual pages containing the word performance but not optimization are returned.
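The difference between the two modes can be shown with a short Python sketch.  It only mirrors the behaviour described above; the sample files and page text are hypothetical.

```python
# Hypothetical extracted text: one string per page.
pages = {
    "report.pdf": ["performance tuning", "query optimization"],
    "manual.pdf": ["performance counters", "installation"],
}

def wanted(text):
    """performance NOT optimization, applied to one block of text."""
    words = text.lower().split()
    return "performance" in words and "optimization" not in words

# Search by file: all pages of a file are treated as one block of text.
by_file = sorted(name for name, p in pages.items() if wanted(" ".join(p)))

# Search by page: each page is tested on its own.
by_page = sorted(
    (name, number)
    for name, p in pages.items()
    for number, text in enumerate(p, start=1)
    if wanted(text)
)
```

In this sketch report.pdf is excluded from the file search because page 2 mentions optimization, yet its page 1 is still returned by the page search.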

Searching PDF attributes and date values

Each PDF file has a set of common attributes, like author, creator, title, subject, producer etc.  Using Easy PDF Search, you can easily search for PDF files with attributes matching one or more values.

If you want to see which of your PDF files contain attributes, just enter a wildcard search value and select the attributes you’re interested in seeing.

Easy PDF Search will then return all files containing values for the attribute types you selected.

You can also search on the PDF creation and modification dates.  All dates are stored as a single string of digits in the order year, month, day, hour, minute, second.

For example, July 27, 2010 9:30 AM will be stored as 20100727093000.

To search for files created or modified on a specific date, we enter the date elements and use a wildcard for the time elements.  For example, to search for files created on March 23, 2009, we would enter the following:

Easy PDF Search will then return all files created on that date, regardless of the time value.
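The date format and the wildcard match can be sketched in Python.  This is an illustration only.  It assumes a 24-hour clock and uses * as the wildcard character, neither of which is confirmed by the text above; the function name is hypothetical.

```python
from datetime import datetime
from fnmatch import fnmatchcase

def stored_date(moment):
    """Format a date as year, month, day, hour, minute, second digits."""
    return moment.strftime("%Y%m%d%H%M%S")

created = stored_date(datetime(2009, 3, 23, 14, 5, 0))
next_day = stored_date(datetime(2009, 3, 24, 8, 0, 0))

# Date elements followed by a wildcard for the time elements.
pattern = "20090323*"
on_that_day = fnmatchcase(created, pattern)
other_day = fnmatchcase(next_day, pattern)
```

Because the date elements come first in the stored value, a wildcard at the end is enough to ignore the time.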

 

Searching PDF keywords

You can search PDF keywords for specific values using Easy PDF Search.

If you enter a value without any wildcard symbols, e.g.

then Easy PDF Search will return only files whose keywords consist solely of the value urgent or performance.

In most cases, the keywords attribute contains multiple words, e.g.

urgent attention required for performance

In these cases, you would need to use the wildcard symbols to find files containing your keywords e.g.

To see all keywords for your files, just enter a wildcard character e.g.

Easy PDF Search will then return all files containing keywords.
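The whole-value versus wildcard behaviour described in this section can be sketched in Python.  This is only an illustration; the * wildcard character is an assumption, since the text does not state which symbol Easy PDF Search uses.

```python
from fnmatch import fnmatchcase

keywords = "urgent attention required for performance"

# Without wildcards, the search value must equal the whole keywords value.
exact = fnmatchcase(keywords, "urgent")

# With wildcards on both sides, the value may appear anywhere in the keywords.
contains = fnmatchcase(keywords, "*urgent*")

# A lone wildcard matches any keywords value.
any_value = fnmatchcase(keywords, "*")
```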