Python for NLP and the Natural Language Toolkit

Simple Word Length Example

>>> from nltk.token import WSTokenizer
>>> from nltk.probability import FreqDist
>>> corpus = open('corpus.txt').read()
>>> tokens = WSTokenizer().tokenize(corpus)
>>> # What is the distribution of word lengths in a corpus?
>>> freq_dist = FreqDist()
>>> for token in tokens:
...     freq_dist.inc(len(token.type()))

The length of each token is the "outcome" of our experiment, so we call inc() to increment that length's count in the frequency distribution.
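The calls on the slide (WSTokenizer, FreqDist.inc(), token.type()) appear to come from an early NLTK release and may not exist in current versions. As a rough sketch of the same counting idea using only the Python standard library, the following assumes plain whitespace tokenization with str.split() and uses collections.Counter in place of FreqDist. The function name word_length_distribution and the sample sentence are hypothetical, chosen only for illustration.

```python
from collections import Counter


def word_length_distribution(text):
    """Count how many whitespace-separated tokens have each length.

    Assumption: str.split() stands in for the slide's WSTokenizer,
    and Counter stands in for FreqDist.
    """
    freq_dist = Counter()
    for token in text.split():
        # The token's length is the "outcome"; increment its count.
        freq_dist[len(token)] += 1
    return freq_dist


# Hypothetical sample text used in place of corpus.txt.
sample = "the quick brown fox jumps over the lazy dog"
dist = word_length_distribution(sample)

# Print each word length with the number of tokens of that length.
for length, count in sorted(dist.items()):
    print(length, count)
```

To run this on a file as the slide does, the sample string could be replaced with the contents of open('corpus.txt').read().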
