	diff --git a/README b/README
	index bbad423..39b382f 100644
	--- a/README
	+++ b/README
	@@ -1,38 +1,39 @@
	This script extracts proper nouns from an English text with NLTK.

	Install dependencies
	--------------------
	* Install NLTK according to your OS (pkg install nltk on FreeBSD, for example)
	* Install numpy (pkg install py27-numpy)
	* Download the needed NLTK resources with nltk.download():
	+** averaged_perceptron_tagger
	** maxent_treebank_pos_tagger
	** punkt
	** treebank

	Source text
	-----------
	You need a copy of the text you want to extract from as plain text.

	Source English word list
	------------------------
	The expected format is a lowercase list with one substantive word per line.
	The file should be named wordsEn.txt; otherwise, change the filename in the eliminate-common-nouns script.

	Such a file is available at http://www-01.sil.org/linguistics/wordlists/english/

	Usage
	-----
	./extract-proper-nouns source.txt > nouns.txt

	To sort them and eliminate duplicates:
	./extract-proper-nouns source.txt | sort | uniq > nouns.txt

	To discard known English words:
	./eliminate-common-nouns nouns.txt

	Acknowledgment
	--------------

	Thank you to Rama for the NLTK suggestion and some brief guidance.

	The original code idea is from Alvations and can be seen at http://stackoverflow.com/a/17672491/1930997.
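
The README in this diff describes three steps: downloading NLTK resources, extracting proper nouns, and discarding known English words. The following is a minimal sketch of how those steps could fit together. It is not the code of the extract-proper-nouns or eliminate-common-nouns scripts. All function names are hypothetical, and the approach (keeping tokens tagged NNP or NNPS) is an assumption based on the README's description. The resource names are copied from the README and may differ in other NLTK releases.

```python
"""Illustrative sketch only; not the scripts shipped in this repository."""

# Resource names as listed in the README (may vary between NLTK releases).
NLTK_RESOURCES = [
    "averaged_perceptron_tagger",
    "maxent_treebank_pos_tagger",
    "punkt",
    "treebank",
]

# Penn Treebank tags for singular and plural proper nouns.
PROPER_NOUN_TAGS = {"NNP", "NNPS"}


def download_resources():
    """Fetch the NLTK resources named in the README (needs network access)."""
    import nltk  # imported lazily so the pure helpers work without NLTK

    for name in NLTK_RESOURCES:
        nltk.download(name)


def proper_nouns_from_tagged(tagged_tokens):
    """Keep the tokens whose part-of-speech tag marks a proper noun."""
    return [word for word, tag in tagged_tokens if tag in PROPER_NOUN_TAGS]


def extract_proper_nouns(text):
    """Tokenize and tag `text` with NLTK, then keep the proper nouns.

    Assumes NLTK and the resources above are already installed.
    """
    import nltk

    tagged = nltk.pos_tag(nltk.word_tokenize(text))
    return proper_nouns_from_tagged(tagged)


def load_word_list(lines):
    """Build a set from a lowercase word list, one word per line."""
    return {line.strip() for line in lines if line.strip()}


def eliminate_common_nouns(nouns, known_words):
    """Drop candidates whose lowercase form is a known English word."""
    return [noun for noun in nouns if noun.lower() not in known_words]


if __name__ == "__main__":
    # Hand-tagged sample, so this demo runs without NLTK installed.
    tagged = [("Alice", "NNP"), ("visited", "VBD"), ("Paris", "NNP"),
              ("with", "IN"), ("Hope", "NNP")]
    candidates = proper_nouns_from_tagged(tagged)
    known = load_word_list(["hope\n", "visit\n", "with\n"])
    print(eliminate_common_nouns(candidates, known))  # ['Alice', 'Paris']
```

In this sketch "Hope" is discarded because its lowercase form appears in the word list. This shows the trade-off of the filtering step: a proper noun that is also an ordinary English word is dropped along with the false positives.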