Python for NLP and the Natural Language Toolkit - page 32 / 47

Simple Word Length Example

    >>> from nltk.token import WSTokenizer
    >>> from nltk.probability import FreqDist
    >>> corpus = open('corpus.txt').read()
    >>> tokens = WSTokenizer().tokenize(corpus)

    # What is the distribution of word lengths in a corpus?
    >>> freq_dist = FreqDist()
    >>> for token in tokens:
    ...     freq_dist.inc(len(token.type()))

What is the "outcome" for our experiment?
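
The session above relies on an early NLTK interface (`nltk.token.WSTokenizer`, `FreqDist.inc`, `token.type()`) that NLTK 3 no longer provides. Below is a minimal sketch of the same experiment that assumes whitespace tokenization via `str.split()` and uses `collections.Counter` in place of `FreqDist`. Each token contributes one outcome, its length in characters, and the counter tallies how often each length occurs. The function name `word_length_distribution` is hypothetical.

```python
from collections import Counter


def word_length_distribution(text: str) -> Counter:
    """Count how many whitespace-separated tokens have each length.

    Hypothetical helper: str.split() stands in for the slide's WSTokenizer,
    and collections.Counter stands in for FreqDist.
    """
    tokens = text.split()  # split on runs of whitespace
    # Each token yields one outcome: its length in characters.
    return Counter(len(token) for token in tokens)


if __name__ == "__main__":
    sample = "the cat sat on the mat with a hat"
    dist = word_length_distribution(sample)
    # Pairs of (word length, count), most frequent length first.
    print(dist.most_common())
```

With NLTK 3 installed, `nltk.FreqDist(len(t) for t in text.split())` should build an equivalent distribution, because `FreqDist` in that release subclasses `Counter`.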