Python for NLP and the Natural Language Toolkit - page 33 / 47

Simple Word Length Example

>>> from nltk.token import WSTokenizer
>>> from nltk.probability import FreqDist
>>> corpus = open('corpus.txt').read()
>>> tokens = WSTokenizer().tokenize(corpus)
>>> # What is the distribution of word lengths in a corpus?
>>> freq_dist = FreqDist()
>>> for token in tokens:
...     freq_dist.inc(len(token.type()))

The length of each token is the "outcome" of our experiment, so we use inc() to increment that length's count in the frequency distribution.
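The slide's code appears to target an early NLTK release, and names such as WSTokenizer and FreqDist.inc() may not be available in the version you have installed. As a minimal sketch of the same idea using only the standard library, the example below substitutes collections.Counter for FreqDist and str.split() for the whitespace tokenizer. The function name word_length_distribution and the sample sentence are illustrative assumptions, not part of the original slide.

```python
from collections import Counter


def word_length_distribution(text):
    """Count how many whitespace-separated tokens have each length."""
    freq_dist = Counter()  # stand-in for the slide's FreqDist (assumption)
    for token in text.split():  # split on whitespace, like a whitespace tokenizer
        # The token length is the "outcome"; increment its count.
        freq_dist[len(token)] += 1
    return freq_dist


# Hypothetical sample text used in place of corpus.txt
sample = "the quick brown fox jumps over the lazy dog"
dist = word_length_distribution(sample)

# Print each word length with the number of tokens of that length
for length in sorted(dist):
    print(length, dist[length])
```

To run this against a real corpus, pass the contents of a file (for example, the result of open('corpus.txt').read()) instead of the sample string.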
