# Chapter 2. The Simulations

For the control net, the frequency with which the words appear in text was also taken from the CELEX database. However, using the raw CELEX frequencies would make training take too long before every word in the training corpus had been presented to the net at least once, because the difference between very high and very low frequency words is very large (e.g. ‘that’ appears 217376 times and ‘thaw’ only 45 times in the CELEX corpus). For this reason, the frequencies were compressed into log-frequencies according to the formula:

$$
p_i = \frac{\log\left(f_i/100 + 1\right)}{\log\left(m/100\right)} \tag{2.1}
$$

where $p_i$ is the log-frequency of word $i$, $f_i$ is the CELEX frequency of word $i$, and $m$ is the frequency of the most frequent word in the training corpus (see Harm and Seidenberg (1999) for the formula and Plaut et al. (1996) for a discussion of the log frequency). Essentially, this expresses each frequency relative to that of the most frequent word: the most frequent word in the corpus gets a frequency of one, and all the other words have frequencies between zero and one. Words with a final log-frequency less than 0.05 were given a frequency of 0.05 so that they appear in the training corpus often enough for the network to learn them. Thus the frequency of the most frequent word is 20 times the frequency of the least frequent words. It also means that, in the training regime, the most frequent word will appear on each epoch and the least frequent words should appear once every 20 epochs. Note that in the control network, each of the five possible fixation positions has the same frequency. Note also that during training the network selects events at random (according to their frequency) from the 9940 possible events, so a word is not presented at all of its possible fixation points one after the other.
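The compression and the 0.05 floor can be illustrated with a minimal Python sketch. It reads equation 2.1 as log(f_i/100 + 1) / log(m/100) and uses only the two CELEX counts quoted above. Several things here are assumptions made for illustration and are not taken from the text: that ‘that’ is the most frequent word in the training corpus, the names `compress`, `counts`, `events` and `weights`, and the use of weighted sampling with replacement as one possible reading of "selects events at random (according to their frequency)".

```python
import math
import random

FLOOR = 0.05  # minimum compressed frequency stated in the text


def compress(freq, max_freq, floor=FLOOR):
    """Equation 2.1 as read above, followed by the 0.05 floor."""
    p = math.log(freq / 100 + 1) / math.log(max_freq / 100)
    return max(p, floor)


# CELEX counts quoted in the text. Treating 'that' as the most frequent
# word in the training corpus is an assumption made for this example.
counts = {"that": 217376, "thaw": 45}
m = max(counts.values())
compressed = {word: compress(f, m) for word, f in counts.items()}

# The raw value for 'thaw' is below 0.05, so the floor applies.
ratio = compressed["that"] / compressed["thaw"]  # roughly 20

# Control network: every word gets the same weight at each of its five
# fixation positions, so an event is a (word, position) pair.
events = [(word, pos) for word in compressed for pos in range(5)]
weights = [compressed[word] for word, _ in events]

# One possible reading of the random selection: weighted sampling with
# replacement. The fixed seed only makes the example repeatable.
rng = random.Random(0)
batch = rng.choices(events, weights=weights, k=10)
```

Because events are drawn independently, successive presentations of a word are not forced to cycle through its fixation positions in order, which matches the remark at the end of the paragraph above.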

For the second net, each one of the possible fixation positions has a different frequency according to Table 1.1. The original frequencies from the CELEX database were changed according to the table and then the log-frequency was taken. Thus every
