Python for NLP and the Natural Language Toolkit

Simple Word Length Example

>>> from nltk.token import WSTokenizer
>>> from nltk.probability import FreqDist
>>> corpus = open('corpus.txt').read()
>>> tokens = WSTokenizer().tokenize(corpus)
>>> # What is the distribution of word lengths in a corpus?
>>> freq_dist = FreqDist()
>>> for token in tokens:
...     freq_dist.inc(len(token.type()))

What is the "outcome" for our experiment?
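One way to read the question: each token is a sample, and the outcome recorded for it is its length. The slide's calls (`nltk.token.WSTokenizer`, `FreqDist.inc`, `token.type()`) appear to come from an early NLTK release and may not exist in current versions, so the sketch below swaps in the standard library's `collections.Counter` and `str.split()` instead. The function name `word_length_distribution` and the sample sentence are made up for illustration.

```python
from collections import Counter


def word_length_distribution(text):
    """Count how often each word length occurs in whitespace-separated text."""
    # str.split() with no arguments splits on runs of whitespace,
    # standing in for the slide's whitespace tokenizer (an assumption).
    tokens = text.split()
    freq_dist = Counter()
    for token in tokens:
        # The "outcome" of each sample is the token's length.
        freq_dist[len(token)] += 1
    return freq_dist


# Hypothetical sample text in place of corpus.txt.
sample = "the cat sat on the mat"
dist = word_length_distribution(sample)
for length, count in sorted(dist.items()):
    print(length, count)
```

For the sample sentence, five tokens have length 3 and one has length 2, so the distribution has just two outcomes.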