Return to main culling page
Automated Culling of Data from the Internet
Step 2: Compiling the Data Input List
To switch from manual to automatic data culling, you need to create a list
of the inputs that you would have entered had you been working manually.
This list is a plain text file containing the input information. In the
simple case of one entry item per form, just put each item on a separate line.
When a form takes multiple entries, there is a choice among several formats.
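As a rough illustration of the one-item-per-line case, the following Python sketch reads such a list into memory. The function name read_input_list and the sample items are hypothetical, and skipping blank lines is an assumption made here, not something the format requires.

```python
import io


def read_input_list(stream):
    """Return the items of a data input list, one item per line.

    Assumption: blank lines carry no input and are skipped, and
    surrounding whitespace is not part of an item.
    """
    items = []
    for line in stream:
        item = line.strip()
        if item:
            items.append(item)
    return items


if __name__ == "__main__":
    # Hypothetical sample contents; a real list would be read from a file,
    # e.g. with open("chemicals.input") as f: read_input_list(f)
    sample = io.StringIO("benzene\ntoluene\n\nacetone\n")
    for item in read_input_list(sample):
        print(item)
```

Each returned item then stands in for one thing you would otherwise have typed into the form by hand.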
Example 1: Chemicals
Return to chemical example, step 1
A sample data input list for the chemical database example
is in the chemicals.input file.
Jump to step 3
Example 2: Baseball
Return to baseball example, step 1
A sample data input list for the baseball database example
is in the players.dat file.
Jump to baseball example step 3
Example 3: Zip Codes
Return to zip code example, step 1
A sample data input list for the zip code example
is in the firms.dat file. Note that
there are two lines per firm, the name and the street address.
I am restricting this example to Pittsburgh, PA only, but
that is easy to change. Also note that we must analyze pairs
of lines rather than pairs of words, because the firm name and
the address can each have a variable number of words.
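The two-lines-per-firm layout can be read by grouping consecutive lines, as in this minimal Python sketch. The function name read_firm_pairs and the sample firms are made up for illustration, and ignoring blank lines is an assumption; the actual firms.dat file may differ.

```python
import io


def read_firm_pairs(stream):
    """Group an input list into (firm name, street address) pairs.

    Lines are paired whole, so a name or address may contain any
    number of words. Assumption: blank lines are ignored.
    """
    lines = [line.strip() for line in stream if line.strip()]
    if len(lines) % 2 != 0:
        raise ValueError("expected two lines per firm: name, then address")
    # Even-indexed lines are names; the line after each is its address.
    return [(lines[i], lines[i + 1]) for i in range(0, len(lines), 2)]


if __name__ == "__main__":
    # Hypothetical sample contents, not taken from the real data file.
    sample = io.StringIO(
        "Example Steel Works Inc\n"
        "100 Sample Street\n"
        "Acme Co\n"
        "2500 Placeholder Avenue Suite 4\n"
    )
    for name, address in read_firm_pairs(sample):
        print(name, "|", address)
```

Splitting on whitespace instead would not work here, since there is no way to tell where a multi-word firm name ends and the address begins.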
Jump to zip code example step 3
Continue on to step 3