
# AN INTRODUCTION TO R

    3rd Qu.:1500.2   France : 63
    Max.   :2000.0   Canada : 56
                     (Other):612
                      category         sales
     Banking               : 313   Min.   :  0.010
     Diversified financials: 158   1st Qu.:  2.018
     Insurance             : 112   Median :  4.365
     Utilities             : 110   Mean   :  9.697
     Materials             :  97   3rd Qu.:  9.547
     Oil & gas operations  :  90   Max.   :256.330
     (Other)               :1120
        profits              assets           marketvalue
     Min.   :-25.8300   Min.   :   0.270   Min.   :  0.02
     1st Qu.:  0.0800   1st Qu.:   4.025   1st Qu.:  2.72
     Median :  0.2000   Median :   9.345   Median :  5.15
     Mean   :  0.3811   Mean   :  34.042   Mean   : 11.88
     3rd Qu.:  0.4400   3rd Qu.:  22.793   3rd Qu.: 10.60
     Max.   : 20.9600   Max.   :1264.030   Max.   :328.54
     NA's   :5

From this output we can immediately see that more companies are situated in the US than in any other country, that banking is the most common sector, and that negative profits, or losses, of up to 26 billion US dollars occur.

Internally, summary is a so-called generic function with methods for a multitude of classes, i.e., summary can be applied to objects of different classes and will report sensible results. Here, we supply a data.frame object to summary where it is natural to apply summary to each of the variables in this data.frame. Because a data.frame is a list with each variable being an element of that list, the same effect can be achieved by

    R> lapply(Forbes2000, summary)
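The list view of a data.frame can be checked on a small example. The sketch below uses a made-up data frame (`toy` and its columns are hypothetical stand-ins, not part of Forbes2000) and shows that summary picks a suitable method for each column: quantiles for numeric variables, level counts for a factor.

```r
# Hypothetical toy data, standing in for Forbes2000
toy <- data.frame(
  sales   = c(1.5, 2.0, 10.2, 4.4),
  profits = c(0.1, -0.3, 0.8, 0.2),
  sector  = factor(c("Banking", "Utilities", "Banking", "Insurance"))
)

# A data.frame is a list whose elements are the variables
is.list(toy)

# lapply calls summary once per column; the generic dispatches
# on the class of each column (numeric vs. factor)
per_column <- lapply(toy, summary)

names(per_column)    # one list element per variable
per_column$sales     # quantiles and mean of a numeric column
per_column$sector    # counts per level of a factor
```

The result is a plain list, so individual summaries can be picked out by name, which the tabular print of summary applied to the whole data.frame does not offer as directly.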

The members of the apply family help to solve recurring tasks for each element of a data.frame, matrix, list or for each level of a factor. It might be interesting to compare the profits in each of the 27 categories. To do so, we first compute the median profit for each category from

    R> mprofits <- tapply(Forbes2000$profits,
    +                     Forbes2000$category, median, na.rm = TRUE)

a command that should be read as follows: for each level of the factor category, determine the corresponding elements of the numeric vector profits and supply them to the median function with the additional argument na.rm = TRUE. The latter is necessary because profits contains missing values, which would otherwise make the median function return NA:

    R> median(Forbes2000$profits)
    [1] NA
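The effect of na.rm inside tapply can be reproduced on a few made-up numbers. In this minimal sketch the vectors `profits` and `sector` are hypothetical and only mimic the structure of the Forbes2000 variables.

```r
# Hypothetical toy data: six profits, one of them missing
profits <- c(0.4, NA, 0.1, 0.9, 0.3, 0.5)
sector  <- factor(c("Banking", "Banking", "Utilities",
                    "Banking", "Utilities", "Utilities"))

# A single missing value makes the overall median missing
median(profits)

# Without na.rm, only the group containing the NA is affected
tapply(profits, sector, median)

# Extra arguments to tapply are passed on to the function,
# so na.rm = TRUE reaches median() for every group
med <- tapply(profits, sector, median, na.rm = TRUE)
med
```

Here the Banking group contains the missing value, so its median is NA unless na.rm = TRUE is forwarded; the Utilities group is unaffected either way.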

The three categories with highest median profit are computed from the vector of sorted median profits

    R> rev(sort(mprofits))[1:3]
             Oil & gas operations         Drugs & biotechnology
                             0.35                          0.35
    Household & personal products
                             0.31
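The sort-then-reverse idiom can be tried on a small named vector. The values in `med` below are made up for illustration and are not Forbes2000 results; the alternative using the decreasing argument of sort together with head is shown only for comparison.

```r
# Hypothetical named vector of group medians (no ties, for clarity)
med <- c(Banking = 0.21, Utilities = 0.18, Insurance = 0.30,
         Materials = 0.12, Retailing = 0.25)

# As in the text: sort ascending, reverse, keep the first three
top3 <- rev(sort(med))[1:3]
top3

# Same selection: sort in decreasing order, then take the head
top3_alt <- head(sort(med, decreasing = TRUE), 3)
top3_alt
```

Both forms keep the names attached to the values, which is what makes the printed result readable. When several values are tied, as with the two categories at 0.35 above, the two forms may list the tied names in a different order.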