WCopyfind 2.1 Instructions

Purpose:

WCopyfind 2.1 compares text or word processor documents with one another to determine if they share words in phrases.

Overview:

  1. Select the documents by dragging them from Windows Explorer into WCopyfind.
  2. Adjust the scanning parameters.
  3. Select a folder in which all the report files will be placed.
  4. Run the comparison process.
  5. Examine the results.

Step-by-step Instructions:

Step 1: Start WCopyfind 2.1.

  1. Locate WCopyfind and click on its icon

Step 2: Choose Documents to Compare

  1. Start Windows Explorer.
  2. In Explorer, select one or more documents that you wish to compare. See Note 5 for a discussion of Web-resident documents.
  3. Drag-and-drop those documents into WCopyfind’s document windows.
  4. Repeat steps 2 and 3 as needed.

Step 3: Adjust Comparison Rule Parameters

  1. Shortest Phrase to Match — Range: 0 to infinite
  2. This number is the minimum phrase length, in words, that WCopyfind 2.1 will consider to be a match. For example, when this parameter is set to 6, WCopyfind 2.1 will ignore matching phrases that are 5 words long or shorter. I recommend leaving this parameter at 6 (words).

  3. Fewest Matches to Report — Range: 0 to infinite
  4. This number is the fewest matching words in a pair of documents that will cause WCopyfind 2.1 to report a document match in its “Compare Documents” window and generate a pair of underlined comparison documents in the Report Files Folder. There is no recommended value for this parameter.

  5. Shortest Text String to Consider — Range: 0 to 255
  6. This number is the shortest sequence of printable characters that WCopyfind 2.1 will extract from a word processor or other document to use in its comparisons. Decreasing it will allow WCopyfind to extract shorter snippets of text, but may cause WCopyfind to include some non-text portions of word processor documents in the comparisons. I recommend leaving this parameter at 100 (characters).

  7. Most Imperfections to Allow — Range: 0 to 9
  8. This number is the maximum number of non-matches that WCopyfind 2.1 will allow between perfectly matching portions of a phrase. For example, if this value is set to 2, then WCopyfind 2.1 will bridge its way across two non-matching words to connect pieces of perfectly matching prose. A value of 0 will limit WCopyfind 2.1 to finding only perfect matches, while a value of 1 to 9 will allow WCopyfind 2.1 to find imperfectly matching phrases (matches that contain flaws). Increasing this value slows the program down. I recommend a value of 0 (if speed or absolute matching are your main requirements) or 2 (if you want to find matches despite minor editing).

  9. Minimum % of Matching Words — Range: 0 to 100
  10. This number is the minimum percentage of perfect matches that a phrase can contain and be considered a match. Setting this value at 100 limits WCopyfind 2.1 to finding only perfect matches. I recommend a value of 100 (if speed or absolute matching are your main requirements) or 80 (if you want to find matches despite minor editing).

  11. Ignore All Punctuation — Checked: Yes or No
  12. When checked, this parameter causes WCopyfind 2.1 to ignore all punctuation characters when it is performing its comparisons. While punctuation will continue to appear in the reports that WCopyfind 2.1 generates, it will not affect the phrase matching. The matching will normally increase when punctuation is ignored. I recommend against checking this box unless you really want to ignore all punctuation.

  13. Ignore Outer Punctuation — Checked: Yes or No
  14. When checked, this parameter causes WCopyfind 2.1 to ignore any punctuation characters that appear to the left or right of a word when it is performing its comparisons. For example, the quoted sentence: “The box, which I found, is broken.” will be treated as though it were simply: The box which I found is broken (with no final period). While this “outer punctuation” will continue to appear in the reports that WCopyfind 2.1 generates, it will not affect the phrase matching. The matching will normally increase when outer punctuation is ignored. I recommend against checking this box if you want absolute matching, but for checking this box if you want to find matches despite minor editing.

  15. Ignore Numbers — Checked: Yes or No
  16. When checked, this parameter causes WCopyfind 2.1 to ignore any number characters when it is performing its comparisons. For example, the words 8-fold and 10-fold will match if this parameter is checked. While numbers will continue to appear in the reports that WCopyfind 2.1 generates, they will not affect the phrase matching. The matching will normally increase when numbers are ignored. I recommend against checking this box if you want absolute matching, but for checking this box if you want to find matches despite minor editing.

  17. Ignore Letter Case — Checked: Yes or No
  18. When checked, this parameter causes WCopyfind 2.1 to ignore capitalization of letters when it is performing its comparisons. For example, the words Whenever and whenever will match if this parameter is checked. While capital letters will continue to appear in the reports that WCopyfind 2.1 generates, they will not affect the phrase matching. The matching will normally increase when capitalization is ignored. I recommend against checking this box if you want absolute matching, but for checking this box if you want to find matches despite minor editing.

  19. Skip Non-Words — Checked: Yes or No
  20. When checked, this parameter causes WCopyfind 2.1 to completely skip words that contain any characters other than letters, except for internal hyphens and apostrophes. The non-words will neither be used in matching, nor will they appear in the reports that WCopyfind 2.1 generates. If you check this box, I suggest also checking “Ignore Outer Punctuation,” so that words that begin or end with punctuation aren’t skipped over (including plural possessives). I recommend against checking this box if you want absolute matching, but for checking this box if the documents you are comparing contain many non-textual items, including filenames, URLs, and other word-processor junk.

  21. Skip Words Longer than _____ Characters — Checked: Yes or No, with Range: 0 to 255
  22. When checked, this parameter causes WCopyfind 2.1 to completely skip words that are longer than the number of characters you select. The too-long words will neither be used in matching, nor will they appear in the reports that WCopyfind 2.1 generates. I recommend checking this box and setting the number of characters at 20, unless your documents really do contain words longer than that. This choice will allow WCopyfind 2.1 to skip over many non-textual items, including filenames, URLs, image data, and other word-processor junk.

  23. Use Word Map — Checked: Yes or No, with File Name and Browse Button
  24. When checked, this parameter causes WCopyfind 2.1 to load and use a word map (a generalized thesaurus) of your choice. Once the map has been loaded, WCopyfind 2.1 will examine each word it reads to see if there is a substitute in the word map. It will then perform that substitution prior to doing any comparisons for matching phrases. For example, if the word map indicates that the word “excellent” should be replaced by the word “good”, then WCopyfind 2.1 will consider excellent and good to be matching words. Any number of word substitutions is allowed. The original words, rather than the substituted words, will appear in any reports generated by WCopyfind 2.1. If you check this box, I suggest also checking “Ignore Outer Punctuation” and “Ignore Letter Case” because the word map requires perfect matching: it considers “Excellent” and “excellent” to be different words. Checking this box will slow the loading and hashing of the documents, but not the comparisons themselves. The format of a word map is described in Note 2. I recommend against checking this box unless you know what you are doing, have a good word map file prepared, and want to be able to find matches despite the presence of synonyms.
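
To make the effect of these parameters more concrete, here is a minimal Python sketch of how a few of them could interact. It is not WCopyfind's actual code: the function names (normalize, tokenize, shared_phrases) and their arguments are hypothetical, only perfect matching is modeled (no imperfections), and only plain ASCII punctuation is stripped.

```python
import string


def normalize(word, ignore_outer_punct=True, ignore_case=True,
              ignore_numbers=False):
    """Apply a few of the 'Ignore' options to one word (illustrative only)."""
    if ignore_outer_punct:
        # Drop ASCII punctuation at both ends; internal punctuation is kept.
        word = word.strip(string.punctuation)
    if ignore_case:
        word = word.lower()
    if ignore_numbers:
        word = "".join(ch for ch in word if not ch.isdigit())
    return word


def tokenize(text, skip_longer_than=20, **options):
    """Split on whitespace, normalize, and skip empty or too-long words."""
    words = (normalize(w, **options) for w in text.split())
    return [w for w in words if w and len(w) <= skip_longer_than]


def shared_phrases(words_a, words_b, shortest_phrase=6):
    """Return the word runs of length `shortest_phrase` found in both lists."""
    def runs(words):
        n = shortest_phrase
        return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}
    return runs(words_a) & runs(words_b)


doc_a = tokenize("The box, which I found, is broken.")
doc_b = tokenize("the box which I found is broken")
print(len(shared_phrases(doc_a, doc_b)))  # number of shared 6-word runs
```

With outer punctuation and letter case ignored, the two sample sentences reduce to the same seven words, so every 6-word run in one also appears in the other. Turning both options off in this sketch leaves "The" and "box," unequal to "the" and "box", and no 6-word run is shared.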

Step 4: Choose Reporting Folder and Style

  1. Browse to locate the reporting folder. Make sure it exists, because WCopyfind 2.1 will not create it for you.
  2. Check the "Brief Report" box if you want the comparison files to contain only the matching phrases (see Note 7).

Step 5: Run Comparison and Examine Results

  1. Click “Run” — matches will be reported in the Comparison Window.
  2. When the comparison is done, open the Reporting Folder in Windows Explorer.
  3. Click on each comparison file to open it in your web browser.
  4. Click on the “matches.txt” file to see a complete list of the matches.

Notes:

  1. The comparison window and “matches.txt” both list two numbers of matches. The first or “Total Match” is the number of perfectly matching words that have been marked in the pair of documents. The second or “Basic Match” is the number of perfectly matching words in phrases of at least “Shortest Phrase to Match” words. That second value is essentially the value that would have been obtained if no imperfections were allowed in the matching. In fact, if the “Most Imperfections to Allow” parameter is set to zero, “Total Match” and “Basic Match” will be the same.
  2. The format of a word map is simple. It must be a normal text file (*.txt) containing a series of one-line entries. Each one-line entry should begin with the substituted word, followed by any number of words that are equivalent to that first word. All of these words should be separated by white-space (e.g. blank or tab characters). For example, the line “good excellent terrific wonderful fabulous” will cause any appearance of the words “excellent,” “terrific,” “wonderful,” and “fabulous” to be matched as though the document contained the word “good” instead.
  3. The “Make Vocab” button is intended to help people develop word maps, but it may prove useful in other ways. When you press this button, WCopyfind 2.1 generates a long list of all of the words that appear in all of the document files. It produces an output file, which you choose after pressing the “Make Vocab” button, listing all the words and the numbers of times they appear in all of the documents. The final list is roughly in descending order of usage frequency, though it is not truly sorted. Generating this vocabulary may take a long time, particularly as the list of words encountered gets longer during the generation process. The ignore and skip parameters are active when making a vocabulary and should be selected carefully. I recommend checking “Ignore Outer Punctuation,” “Ignore Letter Case,” “Skip Non-Words,” and “Skip Words Longer than 20 Characters.”
  4. In the reports, perfect matches are indicated by red-underlined words, while bridging (non-matching) words are indicated by green, italicized, underlined words.
  5. WCopyfind 2.1 can "surf the web." It can follow "internet shortcuts." If you want it to load a document from the web, simply create an "internet shortcut" to that web-document and drag the "internet shortcut" into one of the two input windows of WCopyfind 2.1. For example, if you want to include http://www.nytimes.com/ in the documents to be searched, first create an "internet shortcut" to that site somewhere on your desktop or in a folder. You can create it by using an internet browser to open http://www.nytimes.com/ and then dragging the link icon onto your desktop or into a folder. Once you have the "internet shortcut," you can drag this "internet shortcut" into WCopyfind 2.1's document windows. You'll see the "internet shortcut" name, followed by the extension ".url". When you run the comparison, WCopyfind 2.1 will load the document over the web and compare it as though it were a local file. You can save a folder full of "internet shortcuts" and drag them into WCopyfind 2.1 whenever you want to compare local documents against them. Be aware that broken links will terminate the loading process. Also, WCopyfind reloads web pages when preparing reports that contain them, so if a page changes between the load for comparison and the load for reporting (a very unlikely event, except with news services, etc.), the report may be scrambled.
  6. WCopyfind 2.1 "understands" html-formatted documents, both local and web-resident. If you include a local document that has a .htm or .html extension, or a web-resident document that reports itself as html-formatted, WCopyfind 2.1 will do its best to extract only the text from that file. It should handle html tags almost perfectly, but it is less sophisticated with special characters (it doesn't really understand "&char;"-type instructions).
  7. If you select "Brief Report," WCopyfind 2.1 will abbreviate its report files so that they contain only the matching phrases, in the order in which they appear in the document. Each of these phrases will be followed by a line break. In effect, the "Brief Report" option simply suppresses the inclusion of non-matching text in the reports and also inserts a line break (actually a new paragraph) at the end of each matching phrase.
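
The word map format in Note 2 and the word counting in Note 3 can be illustrated with a short Python sketch. This is only an approximation under stated assumptions, not the program's own implementation: the names parse_word_map, apply_word_map, and make_vocab are hypothetical, and unlike the "Make Vocab" output described above, this version returns a fully sorted list.

```python
from collections import Counter


def parse_word_map(lines):
    """Build {equivalent_word: substituted_word} from one-line entries."""
    mapping = {}
    for line in lines:
        words = line.split()      # any run of blanks or tabs separates words
        if not words:
            continue              # skip blank lines
        target = words[0]         # the first word on the line is the substitute
        for word in words[1:]:
            mapping[word] = target
    return mapping


def apply_word_map(words, mapping):
    """Replace each word that has an entry; leave all other words unchanged."""
    return [mapping.get(w, w) for w in words]


def make_vocab(documents):
    """Count each word across all documents (each a list of words)."""
    counts = Counter()
    for words in documents:
        counts.update(words)
    return counts.most_common()   # (word, count) pairs, most frequent first


word_map = parse_word_map(["good excellent terrific wonderful fabulous"])
print(apply_word_map(["an", "excellent", "idea"], word_map))
```

Because the lookup is an exact string match, "Excellent" with a capital letter would not be substituted here, which mirrors the reason for checking “Ignore Letter Case” when a word map is in use.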

Copyright 1997-2006 © Louis A. Bloomfield, All Rights Reserved
Page Last Updated: May 9, 2002 3:45 PM