PDFancyFolders FAQ
If you can't find your answers here, you can always get in touch with support.
PDFancyFolders, what is it good for?
Absolutely nothing, unless you want to view or extract text from PDF files that share similar formatting. If all your files are completely different, PDFancyFolders won't help much. If you have folders full of files from the same source, such as meeting minutes, periodic publications, reports or contracts, you may want to extract certain recurring sections for further processing or simply to get a quick summary of what's in there.
Here are some things you could do:
- Find all level 1 to 3 section headings
- Get a list of document titles and bylines
- Get a list of URLs pointing to http://example.com
- Extract all footnotes recognizable by their tiny font size
- Get all quotes printed in an italic/cursive font
- Get all text that has a specific color or background color
- Extract all text from all PDF files in a folder and generate one huge HTML file
What are some of the limitations of PDFancyFolders?
PDFancyFolders only works with PDF files. Other file formats cannot be added at all. The app does not work with password-protected or copy-protected PDF files. If you add such a file, you will see an exclamation mark on the right-hand side of the file indicating that the file has not been processed.
PDFancyFolders currently ignores images completely. Unfortunately, this also means that most scanned documents won't work unless they are preprocessed with some sort of text recognition or OCR software.
How can I add files?
First you have to create a folder to which you can add files. Tap "New Folder" and give it a name. Then tap your new folder to select it and tap "Add Files". You can now select files from your device, from your iCloud Drive or from any other location that you have configured on your device.
Files you add are automatically parsed by PDFancyFolders and made available for filtering. This process takes a while. In the files list you will initially notice an hourglass icon next to each file name. Once a file has been parsed successfully, the hourglass will turn into a check mark. If parsing fails, you will instead see an exclamation mark in a triangle next to the file. This could mean that the file is password-protected or copy-protected. Such files are not supported. In some cases parsing fails for no apparent reason. In these cases it sometimes helps to delete the file (swipe left) and re-add it to the folder.
Where are my files stored?
Files are stored and processed locally on your device. When you add files to PDFancyFolders they are copied into the app. When you delete files from PDFancyFolders, your original files are unaffected.
How can I delete files and folders?
You can delete files and folders by swiping left and tapping the red "Delete" button to confirm. If you delete a folder, the folder and all the files it contains will be deleted permanently from PDFancyFolders but not from the original location you added them from.
There is no undo or file recovery functionality in PDFancyFolders. You should never store files only in PDFancyFolders. The app is not suitable as a file management app.
I don't understand the filter results I'm seeing
The most important thing to understand is that filters apply to individual text fragments. Any uninterrupted sequence of characters on the same page sharing the same formatting attributes belongs to the same text fragment. Sometimes a text fragment is just a single character, which looks a bit weird in the results. In other cases a text fragment can be an entire page. Most of the time, it will be a headline, a paragraph or a hyperlink.
Also note that filters are always applied to the selected folder only. There is currently no way to search multiple folders at the same time. Select the folder you want to search, then select the filter you want to apply and tap "Apply Filter" in the toolbar.
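The fragment rule described above can be pictured with a small Python sketch. This is only an illustration, not the app's actual parser: the Char record, its fields and the style_key function are hypothetical names, and the real set of formatting attributes is larger (color, italics and so on).

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Char:
    """Hypothetical record for one character and its formatting."""
    page: int
    text: str
    font: str
    size: float
    bold: bool = False


def style_key(c):
    # Characters belong together only if page and formatting are identical.
    return (c.page, c.font, c.size, c.bold)


def to_fragments(chars):
    """Merge consecutive characters that share the same page and formatting."""
    return ["".join(c.text for c in run) for _, run in groupby(chars, key=style_key)]


chars = (
    [Char(1, ch, "Helvetica", 18.0, True) for ch in "Minutes"]
    + [Char(1, ch, "Times", 11.0) for ch in "The meeting opened at "]
    + [Char(1, "9", "Times", 11.0, True)]  # a single bold character
    + [Char(1, ch, "Times", 11.0) for ch in " am."]
)
fragments = to_fragments(chars)
```

In this sketch the lone bold "9" becomes a fragment of its own, which is how single-character results can end up in a result list.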
How do filter definitions work?
The filter editor shows one or more rows of filter criteria. The boxes in each row are connected by AND. The rows themselves are connected by OR. In order to be included in the filter results, a text fragment has to meet all the criteria in any of the rows. You can add more boxes to a row by tapping "+AND". You can add more rows by tapping "+OR". Tapping the small "x" in the upper right hand corner of each criteria box deletes the box. Once all boxes in a row have been deleted, the row will disappear.
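As a rough illustration of the row logic, here is a minimal Python sketch. It assumes a fragment is a plain dictionary and each criteria box is a function returning True or False. Both are hypothetical stand-ins, not the app's internals.

```python
def matches(fragment, rows):
    """True if the fragment meets every criterion in at least one row."""
    return any(all(criterion(fragment) for criterion in row) for row in rows)


# Hypothetical fragment records for illustration only.
fragments = [
    {"text": "Quarterly Report", "font_size": 24.0, "bold": True},
    {"text": "Revenue grew.", "font_size": 11.0, "bold": False},
    {"text": "Note", "font_size": 11.0, "bold": True},
]

rows = [
    # Row 1: a single box.
    [lambda f: f["font_size"] > 20],
    # Row 2: two boxes joined by AND.
    [lambda f: f["bold"], lambda f: f["font_size"] < 12],
]

# The two rows are joined by OR.
hits = [f["text"] for f in fragments if matches(f, rows)]
```

Here "Quarterly Report" is included because of row 1 and "Note" because of row 2, while "Revenue grew." satisfies neither row.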
How can I save my filter?
Filters are saved automatically. When you first open the filter tab, it will show a filter named [New]. By entering a name in the "Filter Name" text box underneath the filter picker, you can save your filter under a name of your choice. In the same way, you can rename your filter. Just change its name in the text box.
The [New] filter does more than its name suggests. It works more like a scratch area that is always saved automatically so that you can do quick ad-hoc queries without ever having to name your filter. The [New] filter will remember your filter definition unless and until you decide to save it under a different name or you tap "Delete".
All saved filters are available in the filter picker above the "Filter Name" text box.
How can I create a new filter?
Just select [New] from the filter picker at the top of the filter tab. Define your filter criteria and save your filter under a name of your choice.
How can I delete a filter?
Tap "Delete" at the bottom of the filter tab. Be careful though. The filter is deleted right away without further confirmation.
Which Filter Types can I use?
The following sections explain all filter types that are available. Some filter types may only be available as a paid add-on. Paid filter types have a $ sign next to their name in the Filter Type picker.
Text Filter Type
The Text filter type allows you to search for text fragments matching the text string you enter. You can select from three different comparison operators. For the = operator to match, the fragment has to be exactly the same as the text you enter, including any leading and trailing whitespace. As PDF files contain lots of spurious whitespace, exact comparisons often don't work.
The wildcard (*) comparison operator is a bit slower but far less frustrating. So if you are searching for something, tap on the wildcard operator and enter *something*.
The negation operator <> plays an important role beyond its obvious "not equal to" semantics. Selecting this operator and leaving the filter text box empty matches all text fragments. So this is the way to go if all you want is the entire text contents of all PDF files in the selected folder.
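The three operators can be approximated in a few lines of Python. Treat this as a sketch under assumptions: the function names are made up, matching is assumed to be case-sensitive, and * is assumed to be the only special character. The app may differ on any of these points.

```python
import re


def wildcard_match(pattern, text):
    """Match text against a pattern in which * stands for any run of characters."""
    # Escape everything except *, which becomes "match anything".
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, text, flags=re.DOTALL) is not None


def text_filter(op, pattern, text):
    """Apply one of the three comparison operators to a fragment's text."""
    if op == "=":
        return text == pattern
    if op == "*":
        return wildcard_match(pattern, text)
    if op == "<>":
        return text != pattern
    raise ValueError(f"unknown operator: {op}")


fragment = " something \n"  # spurious whitespace, as often found in PDF text
exact = text_filter("=", "something", fragment)
wild = text_filter("*", "*something*", fragment)
everything = text_filter("<>", "", fragment)
```

The exact comparison fails because of the surrounding whitespace, the wildcard comparison succeeds, and <> with an empty text box matches any fragment that contains at least one character.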
Heading Level Filter Type
Heading Level is the rank of the font size in each individual PDF file. So if you select the = comparison operator and enter 1 in the text box, you will get all text fragments that have the largest font size relative to other text fragments in the same file.
This means that text fragments on the same heading level may have completely different font sizes if they come from different files. But they will all be the largest, second largest, etc., relative to other text fragments within their respective files.
This filter type can be very useful to extract section headings even if the font sizes used vary a bit between files. For instance, periodic publications tend to get redesigned once in a while to make them look fresh. From some date onward, the largest section heading may be a bit smaller or a bit larger than before. This filter type will still return all the section headings of a particular level as long as they keep their relative sizes.
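One way to picture the ranking is the following Python sketch. It assumes that every distinct font size in a file gets its own level, starting at 1 for the largest. How the app treats nearly identical sizes is not documented here, so take the details as an assumption.

```python
def heading_levels(font_sizes):
    """Map each distinct font size found in one file to its rank (1 = largest)."""
    distinct = sorted(set(font_sizes), reverse=True)
    return {size: rank for rank, size in enumerate(distinct, start=1)}


# Hypothetical font sizes from two issues of the same publication,
# before and after a redesign.
old_issue = [24.0, 16.0, 11.0, 11.0, 16.0]
new_issue = [22.0, 15.0, 10.5]

old_levels = heading_levels(old_issue)
new_levels = heading_levels(new_issue)
```

The 24-unit heading in the old issue and the 22-unit heading in the new one both end up on level 1, so a filter for heading level 1 returns both.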
Font Family Filter Type
The Font Family filter type lets you select from a list of font family names that occur in any of the files in the selected folder.
Font Size Filter Type
The Font Size filter type lets you select text fragments that have a particular font size. By using this filter type more than once in the same row, you can select a range of sizes, for example text fragments with a font size greater than 10 but smaller than 17 units. Font sizes are given in PDF units, i.e. 1/72 of an inch.
If you don't know which font sizes occur in your files (and who does?), you can first run your filter without specifying a font size. The result list will show you the font size of each fragment. You can then narrow down your search by adding a font size restriction.
The Heading Level filter type is almost always easier to use than working with font sizes directly.
Text Color Filter Type
Select from the colors that occur in any of the files in the selected folder.
Background Color Filter Type
Select from the background colors that occur in any of the files in the selected folder.
Bold Filter Type
Select only bold text. This is sometimes useful in combination with other filter types such as font size when you are looking for text that isn't a section heading but uses a bold font face for emphasis or for column headings.
Italics Filter Type
The italics filter type is sometimes useful to select quoted text that is otherwise hard to pin down.
Page Number Filter Type
Select text fragments that are on a specific page. Adding two of these in a row allows you to select page ranges. This filter type is excellent for extracting things like document titles or bylines.
Links Filter Type
This filter type allows you to select all text fragments that have links associated with them. Typically, the fragment text will be the link text (i.e. the underlined or blue part that you can tap on to follow the link). This is not always the case though. Sometimes the URL is attached to an entire paragraph. Unfortunately, it is not always possible to extract exactly the right span of text that the link belongs to.
The search phrase you enter is compared to both the associated text and the URL itself. If either matches, the text fragment is returned. Just like with the Text filter type, it is often advisable to use the wildcard (*) operator rather than the = operator.
You can select all links by tapping on the <> (not equal) comparison operator and leaving the text box empty.
There are two types of links. Page links point to other pages in the same PDF file. Web links point to arbitrary web pages. Page links have the form ?page=N, where N is the page number. Web links start with http. To find all web links, tap the wildcard (*) comparison operator and enter http*. Don't use http:* or http://*, because most URLs nowadays start with https and would not match. To find all page links, enter ?page=*, again using the wildcard (*) comparison operator.
To find links to a specific domain such as example.com, select the wildcard (*) operator and enter *example.com*. You don't have to use http in this case as your search term cannot be mistaken for a page link anyway.
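If you export link URLs and process them yourself, the two link forms described above are easy to tell apart. The following Python sketch uses plain prefix checks and hypothetical function names. It only restates the rules given here and does not call any PDFancyFolders API.

```python
def link_kind(url):
    """Classify a link as a page link, a web link or something else."""
    if url.startswith("?page="):
        return "page"
    if url.startswith("http"):  # covers both http:// and https://
        return "web"
    return "other"


def page_target(url):
    """Return N for a page link of the form ?page=N, otherwise None."""
    if link_kind(url) != "page":
        return None
    value = url[len("?page="):]
    return int(value) if value.isdigit() else None


# Hypothetical link URLs for illustration only.
urls = ["?page=4", "https://example.com/about", "http://docs.example.org/"]
kinds = [link_kind(u) for u in urls]
to_example = [u for u in urls if "example.com" in u]
```

The substring check in the last line plays the same role as entering *example.com* with the wildcard operator.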
Position Filter Type
The position filter (a paid add-on) allows you to search for text fragments that start at the top, bottom, left or right of the page. What matters is the position of the first character. For instance, if you know that a particular item always occurs in the footer of the page, you can use this filter to search only the bottom 10% of the page. You can play with the percentages to extract exactly the items you're looking for.
You can use the position filter multiple times in the same row to find text fragments that are located in the header or in the footer rather than in the middle of the page. And of course you can combine this filter type with any other filter type. If you're looking for footnotes, you could look for small text at the bottom of the page.
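The footnote idea can be sketched in Python as follows. The sketch assumes that the vertical position of a fragment's first character is measured downward from the top of the page, in the same units as the page height. This convention, like the function names and the size threshold of 9, is an assumption made for illustration.

```python
def in_bottom(y_from_top, page_height, percent):
    """True if a fragment starts within the bottom `percent` of the page."""
    return y_from_top >= page_height * (1 - percent / 100)


def looks_like_footnote(fragment, page_height, percent=10, max_size=9.0):
    """Small text that starts in the bottom part of the page."""
    return (
        in_bottom(fragment["y"], page_height, percent)
        and fragment["font_size"] < max_size
    )


PAGE_HEIGHT = 792  # a US Letter page is 11 inches tall, i.e. 792 PDF units

# Hypothetical fragments for illustration only.
fragments = [
    {"text": "1 See appendix B.", "y": 760, "font_size": 8.0},
    {"text": "Body text.", "y": 400, "font_size": 11.0},
    {"text": "Page 3", "y": 770, "font_size": 11.0},
]
footnotes = [f["text"] for f in fragments if looks_like_footnote(f, PAGE_HEIGHT)]
```

The page number "Page 3" sits in the footer too, but it is excluded because its font is not small. Adjusting the percentage or the size threshold corresponds to playing with the values in the filter editor.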
How can I save the filter results?
PDFancyFolders does not store your search results, but you can export the results as CSV (for import into Excel), plain text or HTML. Just select the format you want from the picker in the search results view and tap the "Export File" button. Note that exporting is only available as a paid add-on.
Which export formats do you support?
PDFancyFolders offers a paid add-on that allows you to export search results as CSV (suitable for importing into Excel), plain text as well as HTML.
The CSV format contains absolutely all information we extract from the PDF files. You can load this file into a spreadsheet or into a database for further processing. The CSV file currently has the following columns:
file name, page number, sequence number, text, color_r, color_g, color_b, background_r, background_g, background_b, font family, font size, bold, italic, heading level, x position, y position, link URL.
We will never remove columns, but we might add further columns.
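Because columns may be added later, it is safer to read the export by column name than by column position. The Python sketch below does that with the standard csv module. It assumes that the file starts with a header row using exactly the column names listed above and that fields are comma-separated. The sample rows are made up, and the encoding of values such as bold (shown here as 1 and 0) is a guess.

```python
import csv
import io

# Hypothetical sample export for illustration only.
SAMPLE = (
    "file name,page number,sequence number,text,color_r,color_g,color_b,"
    "background_r,background_g,background_b,font family,font size,bold,italic,"
    "heading level,x position,y position,link URL\n"
    'minutes.pdf,1,1,"Minutes, March",0,0,0,255,255,255,Helvetica,18,1,0,1,72,60,\n'
    "minutes.pdf,1,2,Attendance,0,0,0,255,255,255,Helvetica,14,1,0,2,72,100,\n"
    "minutes.pdf,1,3,Everyone was present.,0,0,0,255,255,255,Times,11,0,0,3,72,120,\n"
)


def headings(csv_file, max_level=2):
    """Return (file name, page number, text) for fragments up to max_level."""
    reader = csv.DictReader(csv_file)  # looks columns up by header name
    return [
        (row["file name"], int(row["page number"]), row["text"])
        for row in reader
        if int(row["heading level"]) <= max_level
    ]


result = headings(io.StringIO(SAMPLE))
```

For a real export you would pass an open file instead of the StringIO object, for example open("export.csv", newline=""), where the file name is whatever you chose when exporting.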