Edward II [1].

## 3.1 Character Distribution

Our basic character distribution analysis attempts to determine differences in style between authors by determining the probability that each author uses a given character. The basic algorithm is to first determine the relative probability of every unique character found in a work, then, for each character, average those probabilities over all works by a given author. The relative probabilities are plotted in Figure[1], and a χ² test is carried out on them.
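A minimal Python sketch of the two steps above, assuming each work is available as a plain string. The helper names `char_distribution` and `author_profile` are hypothetical, and since the original analysis was done in R with unstated preprocessing (case folding, punctuation handling), this sketch simply counts every character as-is.

```python
from collections import Counter


def char_distribution(text):
    """Relative probability of every unique character in one work."""
    counts = Counter(text)
    total = sum(counts.values())
    return {ch: n / total for ch, n in counts.items()}


def author_profile(works):
    """For each character, average its relative probability over all works
    by one author. A character absent from a work counts as 0 in that work."""
    dists = [char_distribution(w) for w in works]
    chars = set().union(*dists)
    return {ch: sum(d.get(ch, 0.0) for d in dists) / len(dists) for ch in chars}


# Toy usage with two tiny "works" (not real data)
profile = author_profile(["aab", "ab"])
print(profile["a"], profile["b"])
```

Treating a missing character as probability 0 keeps every work on the same character set, so the averages for one author still sum to 1.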

A quick glance at Figure[1] shows significant differences in the usage of certain letters. Marlowe tends to use the vowel "e" far more often than Shakespeare does. It also appears that Marlowe uses more spaces in his work, which could imply that Marlowe tends to use shorter words, as we see in the next section. All of this can be quantified with a χ² test. The null hypothesis H₀ is that both bodies of work share the same source: given the relative probabilities and variances of each character in either author's work, the two authors' average probabilities for each character are the same. Table[1] shows the output of our R code, which includes the average probability of each character appearing in Shakespeare's and Marlowe's works, as well as the difference and the Z-score. Before even looking at the P-value for the χ² test, we can see that for the vast majority of characters, the Z-score implies that the probability of both values coming from the same normal distribution is almost 0. The P-value of the χ² test confirms this suspicion: we soundly reject H₀ in favor of the alternative H₁, that the works by Shakespeare and Marlowe do indeed come from different sources. Although the corpus sizes of the two authors differ, the amount of text we know was authored by Marlowe is fairly substantial, so our results carry some weight.
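The text does not spell out how the per-character Z-scores were combined into one χ² statistic. One common construction, sketched below as an assumption rather than as the authors' R code, computes a two-sample Z-score for each character from the per-work means and sample variances and sums the squared Z-scores. Under H₀, and treating characters as independent (only approximately true, since each work's probabilities sum to 1), that sum is roughly χ² with one degree of freedom per character. The p-value uses the Wilson-Hilferty approximation so the sketch needs only the standard library; `scipy.stats.chi2.sf(stat, df)` gives the exact tail probability if SciPy is available. All function names are hypothetical.

```python
import math
from statistics import mean, variance


def chi2_sf_approx(x, df):
    """Approximate P(X >= x) for X ~ chi-squared(df) via Wilson-Hilferty:
    (X/df)**(1/3) is roughly normal with mean 1 - 2/(9 df), variance 2/(9 df)."""
    if x <= 0:
        return 1.0
    mu = 1 - 2 / (9 * df)
    sd = math.sqrt(2 / (9 * df))
    z = ((x / df) ** (1 / 3) - mu) / sd
    return 0.5 * math.erfc(z / math.sqrt(2))


def character_chi2(dists_a, dists_b):
    """Per-character two-sample Z-scores, their squared sum, df, and p-value.
    dists_a / dists_b: lists of {char: relative probability}, one per work.
    Each author needs at least two works so a sample variance exists."""
    chars = set().union(*dists_a, *dists_b)
    z_scores = {}
    for ch in sorted(chars):
        pa = [d.get(ch, 0.0) for d in dists_a]  # 0 where the character is absent
        pb = [d.get(ch, 0.0) for d in dists_b]
        se = math.sqrt(variance(pa) / len(pa) + variance(pb) / len(pb))
        if se > 0:  # skip characters with no spread in either corpus
            z_scores[ch] = (mean(pa) - mean(pb)) / se
    stat = sum(z * z for z in z_scores.values())
    df = len(z_scores)
    return z_scores, stat, df, chi2_sf_approx(stat, df)


# Toy usage with made-up numbers (not the paper's data)
author_a = [{"e": 0.120, " ": 0.180}, {"e": 0.125, " ": 0.175}, {"e": 0.118, " ": 0.182}]
author_b = [{"e": 0.130, " ": 0.190}, {"e": 0.134, " ": 0.186}, {"e": 0.128, " ": 0.193}]
z_scores, stat, df, p = character_chi2(author_a, author_b)
print(df, round(stat, 2), p)
```

Summing squared Z-scores is what links the per-character columns of a table like Table[1] to a single overall P-value; a per-character permutation test would be a more assumption-free alternative.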

## 3.2 Word Length Analysis

We also chose to analyze the difference in word length distributions among the authors. For each work by each author, we calculated the fraction of words of each length. Specifically, we looked at words of length 1 to 19, because longer words are too rare to be significant. We then averaged these proportions for each word length across each author's works. The result is depicted in Figure[2].
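A minimal Python sketch of the word-length profile described above. The original tokenization is not described, so this sketch assumes a word is a run of letters with optional internal apostrophes, drops words longer than 19 characters to match the cutoff, and takes proportions over the words that remain. `word_length_props` and `average_props` are hypothetical names.

```python
import re

MAX_LEN = 19  # word lengths 1..19, matching the cutoff in the text

# Assumption: a word is a run of letters, optionally with internal apostrophes
WORD_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)*")


def word_length_props(text):
    """Fraction of words of each length 1..MAX_LEN in one work."""
    lengths = [len(w) for w in WORD_RE.findall(text) if len(w) <= MAX_LEN]
    total = len(lengths)
    return [lengths.count(k) / total if total else 0.0 for k in range(1, MAX_LEN + 1)]


def average_props(works):
    """Average the per-work proportions for each word length."""
    profiles = [word_length_props(w) for w in works]
    return [sum(col) / len(profiles) for col in zip(*profiles)]


# Toy usage (not real data): index 0 is length 1, index 1 is length 2, ...
avg = average_props(["to be or not to be", "that is the question"])
print([round(p, 3) for p in avg[:8]])
```

Averaging per-work proportions gives each work equal weight regardless of its length; pooling raw counts instead would let long works dominate.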

One can immediately notice that Shakespeare uses noticeably more 4-letter words than 3-letter words, whereas Marlowe uses more 3-letter words than words of any other length. We found that Shakespeare was one of the few authors who used more 4-letter words than words of any other length. The graph is a good way to understand the data, but the χ² test for independence can tell us whether the word length distributions of Shakespeare and Marlowe are actually significantly different. This test is shown below in Table[2].

After running the test between the average word length distributions of 37 Shakespeare works and 5 Marlowe works, we obtained a P-value of 0.256. Using the standard 0.05 cutoff, we cannot reject the null hypothesis: the distribution of Marlowe's word length usage is not significantly different from Shakespeare's.
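For reference, here is a sketch of Pearson's χ² test of independence on an authors × word-lengths table. The text does not say whether the R test was given word counts or the averaged proportions. Pearson's test is defined on counts, so this sketch assumes word-length counts pooled over each author's works. The p-value uses the Wilson-Hilferty approximation to stay within the standard library; `scipy.stats.chi2_contingency` computes the exact test. Function names are hypothetical.

```python
import math


def chi2_sf_approx(x, df):
    """Approximate upper-tail chi-squared probability (Wilson-Hilferty)."""
    if x <= 0:
        return 1.0
    mu = 1 - 2 / (9 * df)
    sd = math.sqrt(2 / (9 * df))
    z = ((x / df) ** (1 / 3) - mu) / sd
    return 0.5 * math.erfc(z / math.sqrt(2))


def chi2_independence(table):
    """Pearson chi-squared test of independence for a table of counts.
    Rows are authors (each with a nonzero total), columns are word lengths.
    Columns whose total is 0 are dropped, since their expected counts are 0."""
    keep = [j for j in range(len(table[0])) if sum(row[j] for row in table) > 0]
    table = [[row[j] for j in keep] for row in table]
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    grand = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / grand
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_tot) - 1)
    return stat, df, chi2_sf_approx(stat, df)


# Toy counts for word lengths 1..5 (made up, not the paper's data)
counts = [
    [30, 170, 210, 230, 150],  # author A
    [25, 160, 240, 200, 140],  # author B
]
stat, df, p = chi2_independence(counts)
print(round(stat, 2), df, round(p, 3))
```

With two authors and 19 word lengths, the test has (2 − 1)(19 − 1) = 18 degrees of freedom, assuming every length occurs at least once.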

## 3.3 Proportion of Unique Words

The vocabulary used in written works often varies from author to author. We decided to look at the ratio of unique words in a work to the total number of words used. We averaged the ratios over each of the 5 works by Marlowe and got an average ratio of 0.2073 with a variance of 0.0005. The extremely low variance tells us that Marlowe was quite consistent in how many unique words he used in his works relative to the total number of words. We did the same with a set of Shakespeare's works. It is important to note that we had 37 works for Shakespeare, significantly more than our corpus for Marlowe. The average ratio of unique words to total words for Shakespeare was 0.16 with a variance of 0.0002. Once again we see a very low variance, which gives us hope that this ratio can help distinguish between Shakespeare (the 'real' Shakespeare) and other contemporaries. Given how small both variances are relative to the gap between the averages (0.2073 versus 0.16), Marlowe and Shakespeare have significantly different writing styles when it comes to vocabulary usage. These results are summarized in Table[3].
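A minimal Python sketch of the unique-word ratio (commonly called the type-token ratio) and of one way to check the "significantly different" claim numerically. Assumptions not stated in the text: words are lowercased runs of letters, the reported variances are sample variances of the per-work ratios, and the comparison uses a Welch-style two-sample Z statistic. Function names are hypothetical.

```python
import math
import re
from statistics import mean, variance

# Assumption: a word is a run of letters, optionally with internal apostrophes
WORD_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)*")


def unique_word_ratio(text):
    """Distinct words divided by total words in one work (case-insensitive).
    Assumes the work contains at least one word."""
    words = [w.lower() for w in WORD_RE.findall(text)]
    return len(set(words)) / len(words)


def summarize(works):
    """Mean, sample variance, and count of the ratio across one author's works."""
    ratios = [unique_word_ratio(w) for w in works]
    return mean(ratios), variance(ratios), len(ratios)


def welch_z(m1, v1, n1, m2, v2, n2):
    """Two-sample Z statistic for a difference in mean ratios."""
    return (m1 - m2) / math.sqrt(v1 / n1 + v2 / n2)


# Plugging in the summary figures reported above (Marlowe: 5 works,
# Shakespeare: 37 works). The variances are given to one significant
# figure, so the result is only indicative.
z = welch_z(0.2073, 0.0005, 5, 0.16, 0.0002, 37)
print(round(z, 2))
```

Under these assumptions the reported figures give a Z of roughly 4.6. With only five Marlowe works, a t distribution (Welch's t-test) is the more careful reference, but a value this large is still well beyond the usual two-sided 5% critical values.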

# 4 Francis Bacon

The second Shakespeare candidate we examine is Francis Bacon. Francis Bacon was born in 1561 to Sir Nicholas Bacon, who held the title of Lord Keeper of the Great Seal. Bacon had a good education, attending Trinity College, Cambridge, at the age of 12. He was an ambassador, and then a member of the House of Commons. After the accession of James I (James VI of Scotland) to the English throne, he was knighted and moved to higher political positions. Bacon remains a noted philosopher who took an interest in learning and scientific discovery. Francis Bacon died in 1626, making him

2