This program expects your directory to contain the following lexicon files, one per syntactic category:

  lexC  complementizers
  lexD  determiners
  lexJ  adjectives
  lexN  nouns
  lexP  prepositions
  lexV  verbs
  lexX  unknowns
  lexY  adverbs

You should also have run your emails (or whatever text you are using) through a statistical package that builds bigrams and trigrams. One example is the CMU_Stat_Toolkit.

If you have done this and generated .arpa files, you can use the following script to clean up the formatting:

  cleanGrams.pl - splits the given .arpa file into separate files containing only the unigrams, bigrams, and trigrams
    input:  filename
    output: filename-1, filename-2, filename-3

Then run the Unix command "sort -r" (reverse order) on each of the individual n-gram files.

There are two generator programs. genbo.pl generates phrases according to the trained language model, and ranbo.pl generates them randomly.

  ranbo.pl - generates random words taken from the lexicons
    input:  POS
    output: sentence

  genbo.pl - takes the sorted trigrams and generates text from the most probable words in the trigrams (those that match the lexicons)
    input:  POS
    output: sentence
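The cleanGrams.pl step can be sketched in Python as follows. This is not the original Perl script. It is a minimal sketch that assumes the input is a standard ARPA back-off file: a \data\ header with the n-gram counts, then \1-grams:, \2-grams:, and \3-grams: sections, ending in \end\. The function names split_arpa and write_split are hypothetical. Each entry line is kept verbatim (log10 probability, the words, and an optional back-off weight), so the resulting files can still be sorted with sort -r.

```python
# Hypothetical Python sketch of the split that cleanGrams.pl is described as doing.
# Assumes a standard ARPA back-off file; this is not the original Perl code.
import sys


def split_arpa(lines):
    """Return {1: [...], 2: [...], 3: [...]} with the raw entry lines found
    under the \\1-grams:, \\2-grams: and \\3-grams: sections."""
    sections = {1: [], 2: [], 3: []}
    order = None  # n-gram order of the section currently being read
    for raw in lines:
        line = raw.strip()
        if line.startswith("\\") and line.endswith("-grams:"):
            # Section header such as "\2-grams:"; higher orders are ignored.
            n = int(line[1:line.index("-")])
            order = n if n in sections else None
        elif line.startswith("\\"):
            # "\data\" or "\end\": not an n-gram section.
            order = None
        elif order is not None and line:
            sections[order].append(line)
    return sections


def write_split(filename):
    """Write filename-1, filename-2 and filename-3, one entry per line."""
    with open(filename) as f:
        sections = split_arpa(f)
    for n, entries in sections.items():
        with open(f"{filename}-{n}", "w") as out:
            out.write("\n".join(entries) + ("\n" if entries else ""))


if __name__ == "__main__" and len(sys.argv) > 1:
    write_split(sys.argv[1])
```

Saved under a hypothetical name such as split_arpa.py, running "python split_arpa.py model.arpa" would produce model.arpa-1, model.arpa-2, and model.arpa-3, which can then be passed to sort -r as described above.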