For example, finding a 10-letter crossword puzzle word matching the pattern below in a dictionary file containing hundreds of thousands of properly spelled English words:
Although many people believe that we are in a "Windows World", this is certainly not true in the world of industrial and high-reliability equipment. Virtually all of this equipment uses some variant of UNIX because the reliability of Windows is simply not high enough for things like telephone exchanges, Internet infrastructure equipment, factory floor automation, transportation, and military systems. UNIX excels in reliability because nearly all of the academic research over the past 30 years has been UNIX based. This in turn is because other operating systems (Windows, Mac OS 9 and prior, VAX/VMS, etc.) are proprietary, and thus not fitting platforms for research. Serious research requires full transparency, rather than secrecy, in the computer hardware and software being used.
One of the reasons for the new-found popularity of the Mac among tech enthusiasts is that it is the only UNIX-based system on which you can run regular mass-market programs (Microsoft Office, Word Perfect, etc.) without complex emulation.
Windows is now the only commonly-used operating system that is not based on UNIX.
Mac users originally didn't have access to anything like this, but since Mac OS X is based on UNIX, the Terminal application lets you reach the UNIX (Darwin) layer of the operating system and issue UNIX commands. If you remember "DOS", Microsoft's command-line-based operating system and the predecessor of "Windows", the experience is similar. To activate Terminal, navigate to the Terminal application as shown below:
Double-click the "Terminal" application, and it will start up and give you a window that looks something like this:
We have started a new UNIX session (called a "shell"). Notice that the host computer is "Robert-Beemans-Computer" (hardly surprising) and the login ID in use is "bee_hive". On your computer these will have the name of your computer and your login ID rather than mine. The "%" sign followed by a space is called a "prompt", and is the shell's invitation to enter a command. In the following discussion we will ignore prompts, but they will always be there when the computer is ready to accept another command.
The command that we want to enter first is one that takes us to a directory where there is a file containing 234,936 English words, one word per line. The reason we want to go to that directory is that it will minimize the amount of typing we have to do to enter the path to the file in the "grep" command later. To do this we enter a "Change Directory" command, abbreviated "cd" which takes us to a directory by following a path that we name. The file we want is in the following directory:
We then enter another command: "ls" ("LS" but in lower case), short for "list", with a parameter called "l" (lower case "L", not the numeral one), meaning a "long" listing. In UNIX, parameters of commands are entered after the command, with a hyphen "-" immediately preceding the parameter, so the full command is "ls -l".
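As a minimal sketch of these two commands: the directory name below is an assumption (/usr/share/dict is where Mac OS X and many other UNIX systems usually keep their word lists), so substitute whatever path applies on your machine.

```shell
# Assumption: the word lists live in /usr/share/dict; change this if yours are elsewhere.
dict_dir=/usr/share/dict

if [ -d "$dict_dir" ]; then
    # "Change Directory" to the word-list directory, then ask for a long listing
    # (one file per line, with its size and modification date).
    cd "$dict_dir" && ls -l
else
    echo "No such directory: $dict_dir"
fi
```

If the directory exists, files such as "web2" should appear in the listing.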
The "README" file in the same directory has some interesting insights into the origin of the "web2" and "web2a" files, including the fact that web2 is a complete list of all 234,936 words included in Webster's Second International dictionary, which was copyrighted in 1934 and whose copyright (according to Webster's) has now expired. The web2a file contains additional hyphenated words.
Because this file is from 1934, you won't find "microprocessor" or "sputnik". To partially remedy this, John M. Lawler, an Associate Professor of Linguistics at the University of Michigan, searched a lot of web pages and sorted the results, obtaining 69,903 words. The file is posted on his web site. This will probably catch most of the words that web2 misses, but searching both files would be safest.
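Searching both lists need not mean typing two commands, because grep accepts several file names after the pattern. The sketch below uses two tiny stand-in files created on the spot. "web2_sample" and "modern_sample" are hypothetical names with made-up contents, not the real web2 or Professor Lawler's file.

```shell
# Stand-in word lists (hypothetical contents), so the example runs anywhere.
tmp=$(mktemp -d)
printf 'school\nschoolyard\nspunk\n' > "$tmp/web2_sample"
printf 'microprocessor\nsputnik\n' > "$tmp/modern_sample"

# One pattern, two files. When more than one file is named, grep prefixes
# each matching line with the name of the file it came from.
both=$(cd "$tmp" && grep sputnik web2_sample modern_sample)
echo "$both"    # prints modern_sample:sputnik

rm -rf "$tmp"
```

The file-name prefix tells you at a glance which list supplied the word.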
For an online version go to the UNIX Manual Pages at the Huntsville, AL Macintosh User's Group.
Grep works when you type "grep" on the command line, followed by a space (press the space bar) and the pattern you are trying to match, followed by a space and the path to and name of the file that you are going to search. The blank spaces are the standard UNIX way to indicate separate command line elements. You activate the command by pressing the "return" or "enter" key.
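The general shape of the command can be sketched as follows. The file here is a hypothetical stand-in created just for this illustration, so that the example does not depend on where your dictionary lives.

```shell
# Hypothetical stand-in for a dictionary file: one word per line.
words=$(mktemp)
printf 'apple\nbanana\ncherry\npineapple\n' > "$words"

# grep <space> pattern <space> file: print every line that contains "apple".
found=$(grep apple "$words")
echo "$found"

rm -f "$words"
```

Both "apple" and "pineapple" are printed, because a plain pattern matches anywhere within a line.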
grep pattern indicators include:
. | (a period) This means "Match any character". |
^ | (a caret or "shift 6") This means the next character in the pattern must be at the beginning of a line. |
$ | (dollar sign or "shift 4") This means the previous character in the pattern must be at the end of a line. |
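To see each indicator working on its own, here is a small sketch run against a hypothetical four-word list (the words and variable names are mine, chosen only for illustration). The patterns are wrapped in single quotes so that the shell passes characters such as "$" to grep untouched.

```shell
# Hypothetical four-word list, one word per line.
words=$(mktemp)
printf 'cat\ncart\nscat\ncatalog\n' > "$words"

any=$(grep 'c.t' "$words")      # "." stands for any one character: cat, scat, catalog
start=$(grep '^cat' "$words")   # "^" pins the match to the start of the line: cat, catalog
end=$(grep 'cat$' "$words")     # "$" pins the match to the end of the line: cat, scat

printf '%s\n---\n%s\n---\n%s\n' "$any" "$start" "$end"
rm -f "$words"
```

Note that "cart" never matches: there is no single character between its "c" and its "t", and it neither starts nor ends with "cat".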
Other than that, we just enter the explicit letters that we are looking for. For example the following search:
crash | Because the "ash" occurs starting in the third position, not the second. Also because the word is 5 letters long, not 4. We allowed only 4-letter words when we specified both the beginning of the word (with a "^") and the end of the word (with a "$"). |
washer | Because the word is longer than 4 characters. |
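Judging from the explanations above, the search in question is presumably "^.ash$" (any one character, then "ash", then the end of the line). That reconstruction is an assumption, as is the hypothetical six-word list used in this minimal sketch:

```shell
# Hypothetical word list containing both matching and non-matching words.
words=$(mktemp)
printf 'bash\ncash\ncrash\ndash\nwash\nwasher\n' > "$words"

# Exactly four characters per line: any one character followed by "ash".
four=$(grep '^.ash$' "$words")
echo "$four"

rm -f "$words"
```

This prints "bash", "cash", "dash" and "wash", and skips "crash" and "washer" for the reasons given above.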
The required pattern for this is:
Indicating that it is 10 letters long, that there must be an "s" in the first position, an "o" in the fourth position, an "l" in the sixth position, a "y" in the seventh position, and a "d" in the tenth position, which must be the last letter of the 10-letter word.
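If you jot the puzzle entry down with underscores for the unknown letters, translating it into a grep pattern is mechanical, as this sketch suggests. The underscore notation for this clue is reconstructed from the description above, and the variable names are illustrative only.

```shell
# The crossword entry as you might write it on paper (reconstructed, see above).
clue='s _ _ o _ l y _ _ d'

# Remove the spaces, turn each blank into ".", then anchor both ends with "^" and "$".
pattern="^$(printf '%s' "$clue" | tr -d ' ' | tr '_' '.')\$"
echo "$pattern"    # prints ^s..o.ly..d$
```

Each "." holds the place of exactly one unknown letter, so the pattern's length fixes the length of the word.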
After this pattern is one or more spaces, followed by the description of the file to be searched, which is "web2", the file containing the 234,936 English words.
Our command line is thus:
Here is a picture of the terminal window just an instant before we press the return key to start grep searching the web2 file for the indicated pattern:
And here is the terminal window a moment after we press the return key, revealing the one word found in the "web2" file that matches the given pattern:
"Schoolyard" is the word we were seeking. Simple, eh?
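Put together, the whole search can be sketched as follows. To keep the example runnable on any machine, it searches a tiny hypothetical stand-in list rather than the real web2. On your Mac you would name web2 itself as the file.

```shell
# Hypothetical stand-in for web2, with a few near misses included.
words=$(mktemp)
printf 'scholarly\nschoolward\nschoolyard\nschoolyards\n' > "$words"

# The quotes keep the shell from interpreting "^" or "$" itself.
answer=$(grep '^s..o.ly..d$' "$words")
echo "$answer"    # prints schoolyard

rm -f "$words"
```

The near misses fail for different reasons: "scholarly" has the wrong letter in the sixth position, "schoolward" has no "y" in the seventh, and "schoolyards" runs one letter past the "$".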
When you are through using UNIX, type "exit" at the prompt and then quit the Terminal application, or just quit the Terminal application like any other Mac program.
To receive my assessment of how well you understood this tutorial, search for the following pattern using grep in web2:
_ a _ _ _ f _ c _ n _ _ _ (13 letters)
Happy grep-ing!
My only reward for writing this is the 15 milliseconds of fame I receive from having my name here. Don't deprive me of that.
You can copy this page by simply doing a "Save As" in your browser and putting it somewhere on your hard drive (or your web site). If you stop there the background will be gone. To preserve the background, copy the following file into this same folder, without changing its name, by again using your browser's "Save As". The next time you refresh the page, the background should be restored: