Extracting pages from PDF files to individual files using Easy PDF Explorer

In Easy PDF Explorer 2.5, we added the option to extract the pages from PDF files into individual files.

Select the file(s) you want to extract the pages from, and click on the Extract pages button.

The default option is to create a single PDF file containing the pages you want to extract.  For example, if we enter <FIRST:5>, <LAST:10> as the pages to extract, Easy PDF Explorer will create a single file containing the first 5 pages and last 10 pages of each of our selected files.
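To make the page selection concrete, here is a minimal Python sketch of how a value like <FIRST:5>, <LAST:10> could be resolved to page numbers.  It is only an illustration, not Easy PDF Explorer's code: the function name resolve_pages is hypothetical, and we assume that a page selected by both tags is only extracted once.

```python
import re

def resolve_pages(spec: str, page_count: int) -> list[int]:
    """Resolve tags like <FIRST:5> and <LAST:10> to 1-based page numbers."""
    pages: list[int] = []
    for kind, count in re.findall(r"<(FIRST|LAST):(\d+)>", spec):
        n = min(int(count), page_count)  # never ask for more pages than exist
        if kind == "FIRST":
            selected = range(1, n + 1)
        else:
            selected = range(page_count - n + 1, page_count + 1)
        for page in selected:
            if page not in pages:  # assumption: overlapping pages are kept once
                pages.append(page)
    return pages

# A 40-page file yields pages 1 to 5 followed by pages 31 to 40.
print(resolve_pages("<FIRST:5>, <LAST:10>", 40))
```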

To extract each page into an individual file, we select the Store each page in a separate file option.  The suffix value will append the page number to the end of our file name.

Now when we run the task, each page is extracted to a separate file, using the combined naming convention of the file name and suffix.

Let’s say we want to extract each file into a separate folder, and each folder contains the individual pages.  To do that, enter the following values for the folder, file name, and suffix name.

Using the above values:

  • each file will be extracted into its own folder using the file name (without the extension)
  • the base file name will simply be .pdf
  • and the suffix (page_<PAGENUM:0000>) will be appended to the base file name.

Running the task results in the following folders:

and in each folder, each page is exported to its individual file.

Each folder contains the files for the first 5 pages and last 10 pages of each selected PDF file.
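The naming scheme above can be sketched in Python as follows.  This is a hedged illustration only: the helper names and the sample path C:\docs\report.pdf are hypothetical, we assume the zeros in <PAGENUM:0000> set the zero-padded width, and we assume each folder is created next to the source file.

```python
import re
from pathlib import PureWindowsPath

def expand_pagenum(suffix: str, page: int) -> str:
    """Replace <PAGENUM:0000> with the page number, padded to the mask width."""
    return re.sub(
        r"<PAGENUM:(0+)>",
        lambda m: str(page).zfill(len(m.group(1))),
        suffix,
    )

def output_path(source: str, page: int, suffix: str = "page_<PAGENUM:0000>") -> str:
    src = PureWindowsPath(source)
    folder = src.parent / src.stem  # one folder per file, named without the extension
    return str(folder / (expand_pagenum(suffix, page) + ".pdf"))

print(output_path(r"C:\docs\report.pdf", 7))
```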

If you want to extract all the pages instead of a range of pages, leave the Pages to extract value empty.

 

SQL Image Viewer post processing option

In SQL Image Viewer 9.10, we added an option to allow you to run an application on each exported file.  So say your database contained zip archives, and you use SQL Image Viewer to export those zip archives to disk.  Using the post processing option, you can then use your favorite archiver to extract the files from the zip archives.

The post processing option is only available when you choose the Export images and files option.

In the export wizard, you will find the option on the Log, Email, and Post-processing Options page.

In this example, we will be using the 7zip command-line interface to extract the contents of our zip files.  The most important thing to note here is that whichever application you use, it needs to complete regardless of its execution status.

For example, in 7zip, if a file of the same name already exists, it will prompt you to overwrite or skip the file extraction.

You cannot allow this to happen when running the application from within SQL Image Viewer: you cannot respond to the application's prompt from within SQL Image Viewer, so the prompt will block the export process.

To prevent this in our example, we use the -aoa flag to choose to always overwrite any existing files.

So to run your application, enter the fully qualified name of its executable file.  In our example, that’s E:\Program Files (x86)\7-zip\7z.exe.  Because the path contains spaces, we need to enclose it in double quotes.

After the executable path name,  enter the required options for your application.  There are 4 tags you can use to represent the exported file to process.  Given a file name of f:\temp\exports\0002_0003.zip:

  • <FILENAME> returns f:\temp\exports\0002_0003.zip
  • <FILENAME_PATH> returns f:\temp\exports\
  • <FILENAME_NOPATH> returns 0002_0003.zip
  • <FILENAME_NOPATH_NOEXT> returns 0002_0003
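The four tags behave like simple text substitutions.  The Python sketch below reproduces the values listed above for the same sample file name; expand_tags is a hypothetical helper written for illustration, not SQL Image Viewer's own code.

```python
import ntpath  # Windows path rules, regardless of the platform running this

def expand_tags(template: str, exported_file: str) -> str:
    """Substitute the four file name tags in a command-line template."""
    folder, name = ntpath.split(exported_file)
    stem, _extension = ntpath.splitext(name)
    values = {
        "<FILENAME>": exported_file,
        "<FILENAME_PATH>": folder + "\\",
        "<FILENAME_NOPATH>": name,
        "<FILENAME_NOPATH_NOEXT>": stem,
    }
    for tag, value in values.items():
        template = template.replace(tag, value)
    return template

exported = r"f:\temp\exports\0002_0003.zip"
for tag in ("<FILENAME>", "<FILENAME_PATH>", "<FILENAME_NOPATH>", "<FILENAME_NOPATH_NOEXT>"):
    print(tag, "->", expand_tags(tag, exported))
```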

In our example, we want to extract the files from our zip archive, so we use the e option.  We then need to provide the archive file name, which we do using the <FILENAME> tag.  Again, we enclose the <FILENAME> tag in double quotes in case it contains spaces.

We then want to specify the folder to extract the items into, using the -o option.  In this case, we want to extract the files into a subfolder using the zip file name.  So given a zip file name of f:\temp\exports\0002_0003.zip, the contents of that zip file will be extracted into the f:\temp\_dump\0002_0003 folder.
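Putting the pieces together, the command for one exported file could be assembled as in this Python sketch.  The e command and the -o and -aoa switches are the 7-zip options described above; the function name build_7z_args and its dump_root parameter are hypothetical names used only for this illustration.

```python
import ntpath
import subprocess

SEVEN_ZIP = r"E:\Program Files (x86)\7-zip\7z.exe"  # path used in this article

def build_7z_args(archive: str, dump_root: str = r"f:\temp\_dump") -> list[str]:
    """Build the 7-zip arguments that extract one archive into its own subfolder."""
    stem = ntpath.splitext(ntpath.basename(archive))[0]
    return [
        SEVEN_ZIP,
        "e",                                   # extract files from the archive
        archive,
        "-o" + ntpath.join(dump_root, stem),   # output folder, no space after -o
        "-aoa",                                # always overwrite, so no prompt appears
    ]

args = build_7z_args(r"f:\temp\exports\0002_0003.zip")
# list2cmdline adds the double quotes needed for arguments containing spaces.
print(subprocess.list2cmdline(args))
```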

Now after each zip file has been exported to disk, SQL Image Viewer will run 7-zip to extract the contents of that zip file.

One last option is the Delete file after successful processing item.  Selecting this will cause SQL Image Viewer to delete the exported files if the processing application returns an exit code value of 0.  Most command line applications do that.  A non-zero exit code usually signifies an error.
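The exit code rule can be illustrated with a short Python sketch.  It only mimics the behavior described above and is not how SQL Image Viewer is implemented; post_process is a hypothetical name, and small Python commands stand in for the processing application.

```python
import os
import subprocess
import sys
import tempfile

def post_process(args: list[str], exported_file: str, delete_on_success: bool) -> int:
    """Run the processing application; delete the exported file if it exits with 0."""
    # No interactive input is available to the child process.
    result = subprocess.run(args, stdin=subprocess.DEVNULL)
    if delete_on_success and result.returncode == 0:
        os.remove(exported_file)
    return result.returncode

# Stand-in commands: one reports an error (exit code 2), one succeeds (exit code 0).
failing = [sys.executable, "-c", "import sys; sys.exit(2)"]
succeeding = [sys.executable, "-c", "import sys; sys.exit(0)"]

handle, exported = tempfile.mkstemp(suffix=".zip")
os.close(handle)

post_process(failing, exported, delete_on_success=True)
print("kept after error:", os.path.exists(exported))

post_process(succeeding, exported, delete_on_success=True)
print("kept after success:", os.path.exists(exported))
```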

We hope you find this option useful.  If you want to use an application that requires some parameters using the input file that’s not provided by any of our tags, drop us a line at support@yohz.com, and we’ll try to help you out.

 

Using Easy PDF Search

Easy PDF Search helps you search for words or phrases in your PDF files.  You can also search for values in the file annotations and attributes.

To start off, enter the words or phrases you want to search in the search window.  There are a few ways you can refine your search terms, like conditional searching using AND and OR operators, using NOT to exclude words, using NEAR to perform proximity searches, all described here.

For now, let’s just keep things simple, and search for the words performance or monitoring.

Next, we need to tell Easy PDF Search which files to search in.  To do that, we need to define one or more libraries.  Each library can contain one or more paths and search patterns.

You can define generic paths like this:

or more specific search patterns like this:

Once you have defined your libraries, select the libraries you want to include in your search.

In this way, you can easily choose which paths you want to search in if you group files of similar topics into the same libraries.

Easy PDF Search will then index your files where necessary, and display the search results in the main window. Click on any of the items to open the relevant file or page.

Easy PDF Search will highlight all the search terms found on that page.

With the search results, you can now perform various tasks like compiling all the results into a single PDF file, extracting text and images from the pages, converting the pages to images etc.

You can also review your search history and recall the results of those searches.

You can also use the search parameters from your search history to perform a new search if you have new or modified files in your libraries.

Download a 14-day trial of Easy PDF Search now, and see how it lets you work faster and opens up new possibilities on how you can work with your PDF files.


 

Searching PDF content using Easy PDF Search

When using Easy PDF Search to search for words or phrases, here are a few pointers.

When you enter a single word to search, Easy PDF Search will return all pages containing one or more occurrences of that word.

If you enter two words on different lines e.g.

monitoring
quality

Easy PDF Search will return all files containing the words monitoring or quality.

Likewise, if you enter multiple words e.g.

all files containing any of the entered words will be returned.

If you enter two words on the same line e.g.

only files containing the first and second words will be returned.

If you enter two lines of two words each e.g.

monitoring sensors
arduino quality

then only files containing monitoring and sensors or arduino and quality are returned.
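The rules so far (words on the same line are combined with AND, separate lines are combined with OR) can be sketched in a few lines of Python.  This is only an illustration of the matching logic under those stated rules; file_matches is a hypothetical name and a file is represented simply as a set of lowercase words.

```python
def file_matches(query: str, file_words: set[str]) -> bool:
    """Each line is an alternative (OR); every word on a line must be present (AND)."""
    lines = [line.split() for line in query.lower().splitlines() if line.strip()]
    return any(all(word in file_words for word in words) for words in lines)

query = "monitoring sensors\narduino quality"

# Second line is fully satisfied, so the file is returned.
print(file_matches(query, {"arduino", "quality", "report"}))

# One word from each line is not enough.
print(file_matches(query, {"monitoring", "quality"}))
```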

You can also search for phrases in place of words.  To search for phrases, enclose the words in double quotes e.g.

"monitoring quality"

This will then return only files containing the phrase monitoring quality.  The rules for words described above apply to phrases too.  E.g.

"monitoring quality" arduino

will return all files containing the phrase monitoring quality and the word arduino.

Refining your search using AND, OR, NOT

When you enter two words on a line to search for e.g.

there’s an implicit AND operator added i.e.

You can use the OR operator if you want the search results to return files containing either of the two words e.g.

You can also use the NOT operator to exclude files containing specific words.  E.g.

"monitoring quality" NOT arduino

will return all files that contain the phrase monitoring quality but do not contain the word arduino.

You can combine multiple operators and words to refine your search e.g.

Use parentheses to make clear the order in which the search operators and words are applied e.g.

Note that the AND, OR, NOT operators must always be written in uppercase.

Prefix search

Instead of complete words, you can also use prefix searches e.g.

monitor*

This will then return all files containing words starting with monitor e.g. monitoring, monitored, monitors, etc.
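As a rough Python illustration of what a prefix search matches (the function name prefix_matches is hypothetical, and we assume matching is not case sensitive):

```python
def prefix_matches(term: str, words: list[str]) -> list[str]:
    """Return the words that start with the prefix written as 'prefix*'."""
    prefix = term.rstrip("*").lower()
    return [word for word in words if word.lower().startswith(prefix)]

# 'premonitory' contains 'monitor' but does not start with it, so it is skipped.
print(prefix_matches("monitor*", ["monitoring", "monitored", "premonitory", "monitors"]))
```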

Proximity searches

Proximity searches allow you to search for 2 or more words based on their proximity, using the NEAR operator.  E.g.

NEAR(monitoring performance,20)

will return all files containing the words monitoring and performance when they appear within 20 words of each other.  If you omit the distance value e.g.

a default distance value of 10 words is used.  Note that common words like the, and, it etc are ignored when determining proximity.
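A simplified Python sketch of a proximity test follows.  It is an assumption-laden illustration, not the search engine's actual algorithm: the stop word list is a made-up sample, and we interpret "within N words" as the two word positions differing by at most N once the ignored words are removed.

```python
STOP_WORDS = {"the", "and", "it"}  # sample list only; the real list is not documented here

def near(text: str, first: str, second: str, distance: int = 10) -> bool:
    """True if both words occur within `distance` counted words of each other."""
    words = [w for w in text.lower().split() if w not in STOP_WORDS]
    first_positions = [i for i, w in enumerate(words) if w == first]
    second_positions = [i for i, w in enumerate(words) if w == second]
    return any(abs(a - b) <= distance for a in first_positions for b in second_positions)

close = "monitoring the performance"
far = "monitoring " + "filler " * 30 + "performance"

print(near(close, "monitoring", "performance", 20))
print(near(far, "monitoring", "performance", 20))
```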

Searching by file and by page

By default, Easy PDF Search will treat the entire PDF file as one single page.  Instead of applying the search criteria on the entire file, you can choose to search by individual pages.  For example, entering

performance NOT optimization

will return all files containing the word performance but not optimization.  If however you choose to search by page

then only individual pages containing the word performance but not optimization are returned.
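The difference between the two modes can be shown with a small Python sketch.  The two-page sample document is invented for the illustration, and each page is represented simply as a set of words.

```python
def matches(words: set[str]) -> bool:
    """The criteria from the example: performance but not optimization."""
    return "performance" in words and "optimization" not in words

# A hypothetical two-page file.
pages = [
    {"performance", "tuning"},     # page 1
    {"optimization", "tips"},      # page 2
]

# Search by file: all pages are treated as one, so 'optimization' excludes the file.
whole_file = set().union(*pages)
print("file matches:", matches(whole_file))

# Search by page: each page is tested on its own, so page 1 is returned.
print("matching pages:", [n for n, page in enumerate(pages, start=1) if matches(page)])
```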

Searching PDF attributes and date values

Each PDF file has a set of common attributes, like author, creator, title, subject, producer etc.  Using Easy PDF Search, you can easily search for PDF files with attributes matching one or more values.

If you want to see which of your PDF files contain attributes, just enter a wildcard search value and select the attributes you’re interested in.

Easy PDF Search will then return all files containing values for the attribute types you selected.

You can also search on the PDF creation and modification date.  All dates are stored in the format |year|month|date|hour|minute|second

For example, July 27, 2010 9:30 AM will be stored as 20100727093000

To search for files created or modified on a specific date, we enter the date elements and use a wildcard for the time elements.  For example, to search for files created on March 23, 2009, we would enter the following:

20090323*

Easy PDF Search will then return all files created on that date, regardless of the time value.
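The date format and the wildcard match can be reproduced in Python as a quick check.  This sketch assumes the hour is stored in 24-hour form; the function name stored_date and the sample time are made up for the illustration.

```python
from datetime import datetime
from fnmatch import fnmatchcase

def stored_date(value: datetime) -> str:
    """Format a date as year, month, day, hour, minute, second with no separators."""
    return value.strftime("%Y%m%d%H%M%S")

created = stored_date(datetime(2009, 3, 23, 14, 5, 0))
print(created)                              # 20090323140500

# The wildcard covers the time elements, so any time on that date matches.
print(fnmatchcase(created, "20090323*"))    # True
print(fnmatchcase(created, "20090324*"))    # False
```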

 

Searching PDF keywords

You can search PDF keywords for specific values using Easy PDF Search.

If you enter a value without any wildcard symbols, e.g.

then Easy PDF Search will return only files whose keywords value is exactly urgent or performance.

In most cases, the keywords attribute contains multiple words e.g.

urgent attention required for performance

In these cases, you would need to use the wildcard symbols to find files containing your keywords e.g.

To see all keywords for your files, just enter a wildcard character e.g.

Easy PDF Search will then return all pages containing keywords.

 

Searching PDF annotations

You can search PDF annotations for specific values using Easy PDF Search.

You can easily search for values for one or more annotation types.  You can use wildcards in your search too.  For example, to search for an annotation’s author, you could use

david

which would return all pages containing annotations whose author value is david.  You could use

*david*

which would return annotations whose author value contains the text david.  Or you could use

david*

or

*david

which would return annotations starting with the text david or ending with the text david respectively.
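The four patterns can be compared side by side with Python's fnmatch module, which uses the same * wildcard.  The author values below are invented for the illustration, and we assume here that matching ignores case.

```python
from fnmatch import fnmatchcase

def find_authors(pattern: str, authors: list[str]) -> list[str]:
    """Return the author values matching a pattern that may contain * wildcards."""
    return [a for a in authors if fnmatchcase(a.lower(), pattern.lower())]

authors = ["david", "david smith", "john david", "mr david jr"]  # sample values

for pattern in ("david", "*david*", "david*", "*david"):
    print(pattern, "->", find_authors(pattern, authors))
```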

If you want to see all annotations for your files, just enter a wildcard character and select the annotation types you want to view e.g.

Easy PDF Search will then return all pages containing annotations of the free text, highlight, squiggly, strikeout, text and underline types.

 

Difference between search functions in Easy PDF Search and Easy PDF Explorer

Easy PDF Search and Easy PDF Explorer both offer search functions, but they differ in how they work.

  • indexed and non-indexed search

When you first search a file in Easy PDF Search, it indexes the contents of the file so that subsequent searches are completed much faster.  New and modified files are also automatically indexed when needed.  In Easy PDF Explorer, the search is performed by opening each file and searching each page every time.

  • storage requirements

Because Easy PDF Search indexes the contents of your files, it needs additional storage to store the indexes.  The trade-off is your searches are completed much faster compared to if your files were not indexed.

  • search operators

When you search for a phrase like

transactional replication

in Easy PDF Explorer, it will return only files containing pages which contain the exact phrase.  Easy PDF Search will instead return all files containing the words transactional and replication anywhere in the file; both words must exist in the file, but they do not have to appear together.  You can add more words e.g.

transactional replication monitoring

in which case only files containing all 3 words are returned.  To return all files containing any of the words, you can use the OR operator e.g.

transactional OR replication OR monitoring

You can even exclude files containing specific words or phrases.  For example, to return all files containing the phrase transactional replication but not the word monitoring, enter

"transactional replication" NOT monitoring

You can refine your search phrases as much as you want using the operators and parentheses e.g.

("transactional replication" OR "full backups") NOT monitoring

To return only files containing a specific phrase, you would enter the phrase enclosed in double quotes i.e.

"transactional replication"

  • proximity searches

Easy PDF Search can perform proximity searches using the NEAR function.  For e.g. entering

NEAR(transactional replication,20)

will return all files containing pages where the words transactional and replication appear within 20 words of each other.

  • wildcard searches

Easy PDF Explorer allows you to search for fragments of a word e.g. searching for tor can return torn, monitor, or story.  Easy PDF Search works by searching entire words, and it can only perform prefix searches e.g. tor* will return torn.  It does not allow suffix or wildcard searches e.g. *tor or *tor*.

  • searching annotations and attributes

In Easy PDF Search, you can search the values of annotations and file attributes e.g. the annotation author, the PDF author etc.  You can use wildcard searches when searching annotations and attributes.

Naming the exported files in Access OLE Export and SQL Blob Export

When exporting your database blobs to files using Access OLE Export and SQL Blob Export, the default naming convention is the row number and column number (of the blob column).

For example, if your data set contains 10 rows and 2 columns of blobs (say columns 2 and 5), the files will be named in this way:

0001_0002.<file extension>
0001_0005.<file extension>
0002_0002.<file extension>
0002_0005.<file extension>

0010_0002.<file extension>
0010_0005.<file extension>
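The default convention amounts to two zero-padded numbers joined by an underscore.  Here is a minimal Python sketch of it; the function name default_name and the jpg extension are placeholders chosen for the illustration.

```python
def default_name(row: int, column: int, extension: str) -> str:
    """Row number and column number, each padded to 4 digits."""
    return f"{row:04d}_{column:04d}.{extension}"

# 10 rows with blobs in columns 2 and 5 give 20 file names.
names = [default_name(row, column, "jpg") for row in range(1, 11) for column in (2, 5)]
print(names[0], names[1], "...", names[-1])
```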

You can change the naming convention to use values from the exported table or data set (if running a SQL query).  For example, if your table or data set has the following columns:

and you want to name the exported files from the SalesOrderImage column using the SalesOrderID value, you would use the following naming convention:

The exported files will then be named this way:

  • 43659.<file extension>
  • 43660.<file extension>
  • 43661.<file extension>

and so on.

Simply enclose any column values you want to use in angled brackets, as in <SalesOrderID> above.  You can combine multiple columns together, so for e.g. using  <Category>_<SalesOrderID> will name the files

  • Bikes_43659.<file extension>
  • Bikes_43660.<file extension>
  • Components_43661.<file extension>

and so on.
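Column tags work like placeholders that are filled in from each row.  The Python sketch below shows the idea using the sample values above; expand_column_tags is a hypothetical helper, not the products' own code, and any tag that is not a column name in the row is left unchanged.

```python
import re

def expand_column_tags(convention: str, row: dict[str, object]) -> str:
    """Replace each <ColumnName> tag with that column's value from the row."""
    def lookup(match: re.Match) -> str:
        name = match.group(1)
        return str(row[name]) if name in row else match.group(0)
    return re.sub(r"<(\w+)>", lookup, convention)

rows = [
    {"Category": "Bikes", "SalesOrderID": 43659},
    {"Category": "Bikes", "SalesOrderID": 43660},
    {"Category": "Components", "SalesOrderID": 43661},
]

for row in rows:
    print(expand_column_tags("<Category>_<SalesOrderID>", row))
```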

There are also attribute tags you can use as part of the naming convention.  We have already described the <%column%> and <%row%> tags above.  There is also a <%size%> tag, which returns the file size value.  You can freely mix column tags and attribute tags in the naming convention.

Using tags in output folders

You can also use column tags as part of the folder naming convention, if you need to store the files in individual folders or grouped by folders.  For e.g. in this sample data set:

if we wanted to group our exported images by the Category value, we would enter F:\exports\<Category>\ as our output folder naming convention.

This will result in the first 2 files stored in the F:\exports\Bikes\ folder, the 3rd file in the F:\exports\Components\ folder, and so on.  You can use multiple column tags in the folder naming convention if required.  Access OLE Export and SQL Blob Export will create the folders if they do not already exist, as long as you have the permissions to create those folders.
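Here is a short Python sketch of how the folder convention groups the files.  It only computes the paths; the helper name output_folder and the jpg extension are assumptions made for the illustration, and the comment shows where a real export would create the folder.

```python
import ntpath
import re

def output_folder(convention: str, row: dict[str, object]) -> str:
    """Fill the column tags in a folder naming convention from one row."""
    return re.sub(r"<(\w+)>", lambda m: str(row[m.group(1)]), convention)

rows = [
    {"Category": "Bikes", "SalesOrderID": 43659},
    {"Category": "Bikes", "SalesOrderID": 43660},
    {"Category": "Components", "SalesOrderID": 43661},
]

grouped: dict[str, list[str]] = {}
for row in rows:
    folder = output_folder("F:\\exports\\<Category>\\", row)
    # A real export would create the folder here, e.g. os.makedirs(folder, exist_ok=True).
    grouped.setdefault(folder, []).append(ntpath.join(folder, f"{row['SalesOrderID']}.jpg"))

for folder, files in grouped.items():
    print(folder, files)
```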

OLE-Object types

For files extracted from OLE-Object packages, you have the option of using the original file name:

or a custom name using column and attribute tags, as described above.  If using a custom name, you can use the <%package_file_name%> tag, allowing you to mix column values with the original package file name e.g.

 

Importing from multiple CSV files in Easy Excel Analysis

To import data from multiple CSV files in Easy Excel Analysis, first set up the import settings for a single file as described here.

NOTE: The layout and format of the data in all the additional files must be identical to that of the initial file. 

When you reach this page of the import wizard, you can then set up the additional files to import from.

To add files, click on the Add button below the file list window.

If you need to be able to identify each data source and your worksheets do not contain any columns that identify the source data, you can add additional columns on this page to help identify the data source.  Click on the Add button below the additional columns list.

On this page, enter the column name, type, length, and source of the column.  There are several options available to use the filename or worksheet details.

Using an example file name of G:\data\Sales_Osaka_July2020.xlsx,

  • filename = Sales_Osaka_July2020
  • filename 1st value = Sales
  • filename 2nd value = Osaka
  • filename 3rd value = July2020

Note that the separator can also be a dash symbol (e.g. Sales-Osaka-July2020.xlsx), a comma (e.g. Sales,Osaka,July2020.xlsx), or a dot (e.g. Sales.Osaka.July2020.xlsx).  You cannot mix separator symbols in the same file name e.g. Sales-Osaka.July2020.xlsx.
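The file name values can be pictured as a simple split on a single separator.  The Python sketch below is our own illustration, not the product's code: filename_values is a hypothetical name, the underscore is included as a separator because the example above uses it, and rejecting mixed separators with an error is an assumption about how such names are handled.

```python
import ntpath

SEPARATORS = "_-,."  # underscore, dash, comma, dot

def filename_values(path: str) -> list[str]:
    """Split the file name (without folder or extension) on its one separator."""
    stem = ntpath.splitext(ntpath.basename(path))[0]
    used = [symbol for symbol in SEPARATORS if symbol in stem]
    if len(used) > 1:
        raise ValueError(f"mixed separators in file name: {stem}")
    return stem.split(used[0]) if used else [stem]

print(filename_values(r"G:\data\Sales_Osaka_July2020.xlsx"))
print(filename_values(r"G:\data\Sales.Osaka.July2020.xlsx"))
```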

For further details, please refer to this topic in the help file.