
# Brian S. Everitt and Torsten Hothorn


AN INTRODUCTION TO R

                 name  sales profits  assets
1           Citigroup  94.71   17.85 1264.03
2    General Electric 134.19   15.59  626.93
3 American Intl Group  76.66    6.46  647.66

extracts the variables name, sales, profits and assets for the three largest companies. Alternatively, a single variable can be extracted from a data.frame by

R> companies <- Forbes2000$name

which is equivalent to the previously shown statement

R> companies <- Forbes2000[,"name"]
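The equivalence of the two extraction forms can be checked on a small scale. The following is a minimal sketch, assuming a hand-typed three-row data frame called `forbes_sample` (a hypothetical name; the values are copied from the output printed above, and it is not the full Forbes2000 data set).

```r
# Hand-typed subset of rows printed in the text; not the full Forbes2000 data.
forbes_sample <- data.frame(
  name    = c("Citigroup", "General Electric", "American Intl Group"),
  sales   = c(94.71, 134.19, 76.66),
  profits = c(17.85, 15.59, 6.46),
  assets  = c(1264.03, 626.93, 647.66),
  stringsAsFactors = FALSE
)

by_dollar  <- forbes_sample$name        # extraction with the $ operator
by_bracket <- forbes_sample[, "name"]   # extraction by column name

identical(by_dollar, by_bracket)        # TRUE: both are plain character vectors

# Single brackets without the comma keep the data.frame structure instead
as_frame <- forbes_sample["name"]
is.data.frame(as_frame)                 # TRUE
```

The last two lines show why the comma matters: `forbes_sample["name"]` returns a one-column data.frame rather than a vector.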

We might be interested in extracting the largest companies with respect to an alternative ordering. The three top selling companies can be computed along the following lines. First, we need to compute the ordering of the companies’ sales

R> order_sales <- order(Forbes2000$sales)

which returns the indices of the ordered elements of the numeric vector sales. Consequently the three companies with the lowest sales are

R> companies[order_sales[1:3]]

[1] "Custodia Holding"       "Central European Media"
[3] "Minara Resources"

The indices of the three top sellers are the elements 1998, 1999 and 2000 of the integer vector order_sales

R> Forbes2000[order_sales[c(2000, 1999, 1998)],
+             c("name", "sales", "profits", "assets")]

              name  sales profits assets
10 Wal-Mart Stores 256.33    9.05 104.91
5               BP 232.57   10.27 177.57
4       ExxonMobil 222.88   20.96 166.99
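The ordering steps above can be retraced on a few rows. This is a minimal sketch, assuming a hand-typed six-row data frame `forbes_sample` (a hypothetical name, with names and sales copied from the output printed in this section), so the positions differ from those in the full data set. It also shows one possible alternative, the `decreasing` argument of `order()`, which is not the approach used in the text.

```r
# Hand-typed subset of rows printed in the text; not the full Forbes2000 data.
forbes_sample <- data.frame(
  name  = c("Citigroup", "General Electric", "American Intl Group",
            "ExxonMobil", "BP", "Wal-Mart Stores"),
  sales = c(94.71, 134.19, 76.66, 222.88, 232.57, 256.33),
  stringsAsFactors = FALSE
)

# order() returns positions, not values: the first entry is the row
# holding the smallest sales figure, the last entry the largest.
order_sales <- order(forbes_sample$sales)
order_sales                                # 3 1 2 4 5 6

# Lowest sales within this subset
forbes_sample$name[order_sales[1]]         # "American Intl Group"

# Approach from the text: pick the last three positions, largest first
n <- nrow(forbes_sample)
top3_text <- forbes_sample[order_sales[c(n, n - 1, n - 2)], ]

# Alternative: request a decreasing ordering and keep the first three
top3_alt <- forbes_sample[order(forbes_sample$sales, decreasing = TRUE)[1:3], ]

top3_text$name                             # "Wal-Mart Stores" "BP" "ExxonMobil"
identical(top3_text$name, top3_alt$name)   # TRUE
```

When sales figures are tied, the two variants can list the tied rows in a different order, so the choice only matters if values repeat.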

Another way of selecting vector elements is to use a logical vector that is TRUE where the corresponding element is to be selected and FALSE otherwise. The companies with assets of more than 1000 billion US dollars are

R> Forbes2000[Forbes2000$assets > 1000,
+             c("name", "sales", "profits", "assets")]

                name sales profits  assets
1          Citigroup 94.71   17.85 1264.03
9         Fannie Mae 53.13    6.48 1019.17
403 Mizuho Financial 24.40  -20.11 1115.90
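Selection by a logical vector can be tried on a handful of rows. This is a minimal sketch, assuming a hand-typed five-row data frame `forbes_sample` (a hypothetical name, with values copied from the output printed in this section); the variable name `is_large` is also only illustrative.

```r
# Hand-typed subset of rows printed in the text; not the full Forbes2000 data.
forbes_sample <- data.frame(
  name   = c("Citigroup", "General Electric", "American Intl Group",
             "Fannie Mae", "Mizuho Financial"),
  assets = c(1264.03, 626.93, 647.66, 1019.17, 1115.90),
  stringsAsFactors = FALSE
)

# The comparison is evaluated element by element: one TRUE/FALSE per row
is_large <- forbes_sample$assets > 1000
is_large                        # TRUE FALSE FALSE TRUE TRUE

# TRUE counts as 1 in arithmetic, so sum() counts the selected rows
sum(is_large)                   # 3

# Rows where the logical vector is TRUE are kept
large <- forbes_sample[is_large, ]
large$name                      # "Citigroup" "Fannie Mae" "Mizuho Financial"

# which() converts the logical vector into the matching positions
which(is_large)                 # 1 4 5
```

If the compared column contained NA, the comparison would be NA for that row as well, and bracket selection would return an all-NA row for it; wrapping the condition in `which()` is a common way to drop such rows.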

Here the expression Forbes2000$assets > 1000 is a logical vector of length 2000 whose elements are either FALSE or TRUE:

R> table(Forbes2000$assets > 1000)

FALSE  TRUE
 1997     3

In fact, for some of the companies the measurements of the profits variable are missing. In R, missing values are represented by a special symbol, NA, indicating that the measurement is not available. The observations with profit information missing can be obtained via
