R> na_profits <- is.na(Forbes2000$profits)
R> table(na_profits)
na_profits
FALSE  TRUE
 1995     5

R> Forbes2000[na_profits,
+             c("name", "sales", "profits", "assets")]
                      name sales profits assets
772                    AMP  5.40      NA  42.94
1085                   HHG  5.68      NA  51.65
1091                   NTL  3.50      NA  10.59
1425      US Airways Group  5.50      NA   8.58
1909 Laidlaw International  4.48      NA   3.98

where the function is.na returns a logical vector that is TRUE wherever the corresponding element of the supplied vector is NA. A more convenient approach is available when we want to remove all observations with at least one missing value from a data.frame object. The function complete.cases takes a data.frame and returns a logical vector that is TRUE when the corresponding observation does not contain any missing value:

R> table(complete.cases(Forbes2000))
FALSE  TRUE
    5  1995
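As a minimal sketch of how this logical vector can be used to actually drop the incomplete observations, consider the small data.frame below. The object `firms` and its values are made up for illustration and are not part of the Forbes 2000 data; `na.omit` is a base R function that removes the same rows in a single step.

```r
# Hypothetical toy data, not taken from Forbes2000
firms <- data.frame(
  name    = c("A", "B", "C", "D"),
  sales   = c(5.4, 5.7, NA, 4.5),
  profits = c(NA, 1.2, 0.3, 0.8),
  stringsAsFactors = FALSE
)

# TRUE for every row that contains no missing value
ok <- complete.cases(firms)

# Keep only the complete observations via logical indexing
firms_complete <- firms[ok, ]

# na.omit() drops the same rows in one call
firms_omitted <- na.omit(firms)

print(table(ok))
print(firms_complete)
```

Indexing with `ok` keeps the original row names, so the retained rows can still be traced back to their positions in the full data set.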

Subsetting a data.frame with explicit logical expressions can involve a lot of typing, which can be avoided. The subset function takes a data.frame as its first argument and a logical expression as its second argument. For example, we can select the subset of the Forbes 2000 list consisting of all companies situated in the United Kingdom by

R> UKcomp <- subset(Forbes2000, country == "United Kingdom")
R> dim(UKcomp)
[1] 137   8

i.e., 137 of the 2000 companies are from the UK. Note that it is not necessary to extract the variable country from the data.frame Forbes2000 when formulating the logical expression.
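The difference in typing can be seen in a small sketch that selects the same rows once with explicit logical indexing and once with subset. The data.frame `firms` and its values are hypothetical and serve only as an illustration; the `select` argument of subset, which restricts the columns, is shown as an optional extra.

```r
# Hypothetical toy data, not taken from Forbes2000
firms <- data.frame(
  name    = c("A", "B", "C", "D"),
  country = c("United Kingdom", "Japan", "United Kingdom", "Germany"),
  sales   = c(5.4, 5.7, 3.5, 4.5),
  stringsAsFactors = FALSE
)

# Explicit logical indexing: the data.frame name has to be repeated
uk_index <- firms[firms$country == "United Kingdom", ]

# subset(): column names are looked up inside the data.frame
uk_subset <- subset(firms, country == "United Kingdom")

# The optional select argument restricts the columns as well
uk_sales <- subset(firms, country == "United Kingdom",
                   select = c(name, sales))

print(isTRUE(all.equal(uk_index, uk_subset)))
print(dim(uk_sales))
```

One behavioural difference is worth keeping in mind: subset drops rows for which the logical expression evaluates to NA, whereas explicit logical indexing returns a row of NAs for each such case.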

# 1.7 Simple Summary Statistics

Two functions are helpful for getting an overview of R objects: str and summary, where str gives more detail about data types and summary gives a collection of sensible summary statistics. For example, applying the summary method to the Forbes2000 data set,

R> summary(Forbes2000)

results in the following output

      rank            name                     country
 Min.   :   1.0   Length:2000        United States :751
 1st Qu.: 500.8   Class :character   Japan         :316
 Median :1000.5   Mode  :character   United Kingdom:137
 Mean   :1000.5                      Germany       : 65
