

Automated Culling of Data from the Internet

Step 2: Compiling the Data Input List

To switch from manual to automatic data culling, you need to create a list of the inputs you would have entered had you been working manually. This list is a plain text file containing the input information. In the simple case of one entry per form, put each item on a separate line. When a form takes multiple entries, you can choose among several formats; Example 3 below, for instance, uses two consecutive lines per record.
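For the simple one-entry-per-form case, an input list can be produced and read back with a few lines of Python. This is a minimal sketch, not the tooling used in this tutorial: the function names `format_input_list` and `parse_input_list` are hypothetical, the entries are made up, and it assumes blank lines carry no meaning and no item itself contains a newline.

```python
def format_input_list(items):
    """Render input items as a data input list: one item per line."""
    items = list(items)  # allow any iterable, and iterate it more than once
    for item in items:
        # An embedded newline would silently split one item into two lines.
        if "\n" in item:
            raise ValueError(f"item contains a newline: {item!r}")
    return "".join(f"{item}\n" for item in items)


def parse_input_list(text):
    """Return the items in a data input list, ignoring blank lines and edge whitespace."""
    return [line.strip() for line in text.splitlines() if line.strip()]


# Hypothetical entries, standing in for what you would have typed into the form.
entries = ["first entry", "second entry", "third entry"]
text = format_input_list(entries)
assert parse_input_list(text) == entries
print(text, end="")
```

To work with a real file (here a hypothetical `my_inputs.txt`), write the result of `format_input_list` to it, and later pass `open("my_inputs.txt").read()` to `parse_input_list`.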

Example 1: Chemicals


A sample data input list for the chemical database example is in the chemicals.input file.



Example 2: Baseball


A sample data input list for the baseball database example is in the players.dat file.



Example 3: Zip Codes


A sample data input list for the zip code example is in the firms.dat file. Note that each firm takes two lines: the firm's name, then its street address. I am restricting this example to Pittsburgh, PA, but that is easy to change. Also note that we must analyze pairs of lines rather than pairs of words, because the firm name and the address can each contain a variable number of words.
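The pairing of lines described above can be sketched in Python as follows. This is a minimal sketch, not the tutorial's own code: `read_pairs` is a hypothetical helper, the sample firms and addresses are made up, and it assumes each firm occupies exactly two consecutive non-blank lines (name, then street address) and that any blank lines are separators to ignore. Grouping by line, rather than splitting on whitespace, is what keeps multi-word names and addresses intact.

```python
def read_pairs(lines):
    """Group lines into (firm name, street address) records, two lines per firm.

    Assumes every record is exactly two non-blank lines; blank lines are skipped.
    """
    fields = [line.strip() for line in lines if line.strip()]
    if len(fields) % 2 != 0:
        raise ValueError(f"expected an even number of non-blank lines, got {len(fields)}")
    # Step through the list two at a time: fields[0] is a name, fields[1] its address, ...
    return [(fields[i], fields[i + 1]) for i in range(0, len(fields), 2)]


# Made-up sample data: the name and address each have a different number of words.
sample = """Acme Widget Co
123 Example St
Example Bakery and Cafe
45 Sample Ave
"""
for name, address in read_pairs(sample.splitlines()):
    print(f"{name} | {address}")
# prints:
# Acme Widget Co | 123 Example St
# Example Bakery and Cafe | 45 Sample Ave
```

Assuming firms.dat follows the two-lines-per-firm layout described above, `with open("firms.dat") as f: records = read_pairs(f)` would yield one (name, address) tuple per firm.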





Send comments/suggestions/corrections to: