Natural Language Processing

For now this is just an archive of various NLP programs and supporting data that I am generating for LING403. Eventually I will clean all this up and make it more presentable.

Prolog Earley Parser

earley.pl
Prolog source for an Earley parser. The top-level goal is "earley_parse(L).", where L is a list of words. Use "print_chart(_)." to display the parse chart and "print_parses." to display the valid parses it contains. A grammar file must be loaded before parsing.
big.pl
Loads all the files needed to parse with the "big" grammar.
big_grammar.pl
Grammar generated by expand_grammar.pl from grammar.cfg.schema.in.
big_lexicon.pl
A lexicon for the big_grammar.pl grammar.
small.pl
Loads all the files needed to parse with the "small" grammar.
small_grammar.pl
Toy grammar from Jurafsky.
small_lexicon.pl
Lexicon for toy grammar from Jurafsky.
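As a rough illustration of what earley.pl computes, here is a minimal Earley recognizer sketched in Python. It is not a translation of the Prolog source. It fills the chart one column at a time using the predictor, scanner, and completer operations. The grammar and lexicon are hypothetical stand-ins, not the contents of small_grammar.pl or small_lexicon.pl. Unlike print_parses, the sketch reports only whether a parse exists.

```python
from collections import namedtuple

# One chart entry: the dotted rule lhs -> rhs, with the dot at index `dot`,
# begun at input position `origin`.
State = namedtuple("State", "lhs rhs dot origin")

# Hypothetical toy grammar and lexicon (NOT the contents of small_grammar.pl).
# Categories that never appear as GRAMMAR keys (Det, Noun, ...) are parts of speech.
GRAMMAR = {
    "S": [("NP", "VP"), ("Aux", "NP", "VP"), ("VP",)],
    "NP": [("Det", "Nominal"), ("ProperNoun",)],
    "Nominal": [("Noun",), ("Noun", "Nominal")],
    "VP": [("Verb",), ("Verb", "NP")],
}
LEXICON = {"book": {"Noun", "Verb"}, "that": {"Det"}, "flight": {"Noun"}}


def earley_recognize(words, grammar, lexicon, start="S"):
    """Return True if `words` can be derived from `start` (assumes no empty rules)."""
    n = len(words)
    chart = [[] for _ in range(n + 1)]  # chart[i]: states whose dot is at position i

    def add(i, state):
        if state not in chart[i]:
            chart[i].append(state)

    add(0, State("GAMMA", (start,), 0, 0))  # dummy start state
    for i in range(n + 1):
        k = 0
        while k < len(chart[i]):  # chart[i] may grow while it is processed
            st = chart[i][k]
            k += 1
            if st.dot < len(st.rhs):
                nxt = st.rhs[st.dot]
                if nxt in grammar:  # PREDICTOR: expand the expected nonterminal
                    for rhs in grammar[nxt]:
                        add(i, State(nxt, rhs, 0, i))
                elif i < n and nxt in lexicon.get(words[i], ()):  # SCANNER
                    add(i + 1, st._replace(dot=st.dot + 1))
            else:  # COMPLETER: advance every state that was waiting for st.lhs
                for prev in chart[st.origin]:
                    if prev.dot < len(prev.rhs) and prev.rhs[prev.dot] == st.lhs:
                        add(i, prev._replace(dot=prev.dot + 1))
    return State("GAMMA", (start,), 1, 0) in chart[n]


if __name__ == "__main__":
    print(earley_recognize("book that flight".split(), GRAMMAR, LEXICON))
```

Each column is an ordered list, so states added during a pass over that column are still picked up by the same loop. To recover parse trees as well, the usual extension is to store back-pointers in completed states.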

Perl Grammar Schema Expander

expand_grammar.pl
Perl source to expand a schematized CFG into Prolog rules. Run it with the -h flag for a usage message.
expand_grammar_out.pl
Prolog output from expand_grammar.pl applied to grammar.cfg.schema.in. This uses a format suitable for Zach and Scott's parsers.
expand_grammar_out_arun.pl
Prolog output from expand_grammar.pl applied to grammar.cfg.schema.in. This uses a format suitable for Arun's parser.
expand_grammar_out.txt
Human-readable output from expand_grammar.pl applied to grammar.cfg.schema.in.
expand_grammar_test.pl
Same as expand_grammar_out.pl but with some of the lexical items filled in. Suitable for use with earley.pl above.
grammar.cfg.schema.in
Sample input for expand_grammar.pl.
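This listing does not describe the syntax of grammar.cfg.schema.in or the exact Prolog rule format that expand_grammar.pl writes. The Python sketch below therefore assumes both, and shows only the general idea of schema expansion: generate every combination of variable values and emit one ground rule per combination. Its assumptions are all hypothetical. Schema variables are written as <name>, each variable ranges over a finite list of values, and each ground rule becomes a Prolog fact of the form rule(lhs, [rhs, ...]).

```python
import itertools
import re

# Hypothetical schema notation (NOT the real grammar.cfg.schema.in format):
# a variable is written <name> and ranges over a finite list of values.
VAR = re.compile(r"<(\w+)>")


def expand_rule(rule, domains):
    """Yield every ground instance of a schematic rule such as 'S -> NP_<num> VP_<num>'."""
    names = sorted(set(VAR.findall(rule)))
    # itertools.product over zero domains yields one empty tuple,
    # so a rule with no variables is passed through unchanged.
    for values in itertools.product(*(domains[name] for name in names)):
        binding = dict(zip(names, values))
        yield VAR.sub(lambda m: binding[m.group(1)], rule)


def to_prolog(ground_rule):
    """Render 'LHS -> A B' as a hypothetical Prolog fact 'rule(lhs, [a, b]).'."""
    left, right = ground_rule.split("->")
    lhs = left.strip().lower()
    rhs = [symbol.lower() for symbol in right.split()]
    return f"rule({lhs}, [{', '.join(rhs)}])."


if __name__ == "__main__":
    domains = {"num": ["sg", "pl"]}
    for ground in expand_rule("S -> NP_<num> VP_<num>", domains):
        print(to_prolog(ground))
```

Run as a script, this prints rule(s, [np_sg, vp_sg]). followed by rule(s, [np_pl, vp_pl]). Because a shared variable is bound once per combination, agreement between NP and VP is enforced in every ground rule.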
Last modified: Tue Nov 16 13:04:56 1999