R> median(Forbes2000[,"sales"])


[1] 4.365

R> mean(Forbes2000[,"sales"])

[1] 9.69701

R> range(Forbes2000[,"sales"])


[1]   0.01 256.33

The summary method can be applied to a numeric vector to give a set of useful summary statistics, namely the minimum, maximum, mean, median and the 25% and 75% quantiles (the first and third quartiles); for example

R> summary(Forbes2000[,"sales"])

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  0.010     ...   4.365   9.697   9.548 256.300
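To see how these six numbers relate to the individual functions used above, the following minimal sketch applies them to a small made-up vector. The values in `x` are illustrative assumptions and are not taken from the Forbes 2000 data.

```r
# Illustrative data only; not the Forbes 2000 sales figures
x <- c(0.5, 1.2, 2.0, 3.3, 4.1, 7.8, 25.0)

# Six-number summary in one call
s <- summary(x)

# The same statistics computed one at a time
six <- c(min(x), quantile(x, 0.25), median(x), mean(x),
         quantile(x, 0.75), max(x))

print(s)
print(unname(six))
```

The printed summary may show fewer digits than the individually computed values, since summary output is formatted to a limited number of significant digits.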

1.5 Data Import and Export

In the previous section, the data from the Forbes 2000 list of the world’s largest companies were loaded into R from the HSAUR package, but we will now explore more practically relevant ways to import data into the R system. The data formats an analyst most frequently encounters are comma-separated files, Excel spreadsheets, files in SPSS format and a variety of SQL database engines. Querying databases is a non-trivial task that requires additional knowledge of query languages, and we therefore refer to the ‘R Data Import/Export’ manual (see Section 1.3). We assume that a comma-separated file containing the Forbes 2000 list is available as Forbes2000.csv (such a file is part of the HSAUR source package in directory HSAUR/inst/rawdata). When the fields are separated by commas and each row begins with a name (a text format typically created by Excel), we can read in the data as follows using the read.table function

R> csvForbes2000 <- read.table("Forbes2000.csv",
+      header = TRUE, sep = ",", row.names = 1)
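A call of this kind can be tried without Forbes2000.csv at hand. The following sketch writes two small made-up files to a temporary location and reads them in three ways: with read.table and explicit arguments, with read.csv (which presets header = TRUE and sep = ","), and, for a semicolon-separated file with decimal commas, with read.csv2. The file contents and object names are assumptions made for illustration only.

```r
# Two tiny made-up files (hypothetical data, not the Forbes 2000 list)
comma_file <- tempfile(fileext = ".csv")
writeLines(c('rank,name,sales',
             '1,"Company A",94.71',
             '2,"Company B",134.19'), comma_file)

semi_file <- tempfile(fileext = ".csv")
writeLines(c('rank;name;sales',
             '1;"Company A";94,71',
             '2;"Company B";134,19'), semi_file)

# read.table with explicit arguments
tab_comma <- read.table(comma_file, header = TRUE, sep = ",", row.names = 1)

# read.csv presets header = TRUE and sep = ","
tab_csv <- read.csv(comma_file, row.names = 1)

# read.csv2 presets sep = ";" and dec = "," for decimal-comma files
tab_semi <- read.csv2(semi_file, row.names = 1)

print(tab_comma)
print(tab_semi)
```

In this sketch all three calls yield the same sales values, and the first column of each file ends up as row names rather than as a variable.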

The argument header = TRUE indicates that the entries in the first line of the text file "Forbes2000.csv" should be interpreted as variable names. Columns are separated by a comma (sep = ","). Users of continental versions of Excel should take care of the character used for the decimal point (by default dec = "."). Finally, the first column should be interpreted as row names rather than as a variable (row.names = 1). Alternatively, the function read.csv can be used to read comma-separated files. By default, the function read.table guesses the class of each variable from the specified file. In our case, character variables are stored as factors

R> class(csvForbes2000[,"name"])

[1] "factor"

which is suboptimal since the names of the companies are unique. However, we can supply the types for each variable to the colClasses argument
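As a rough illustration of the idea, and not the call used for the real Forbes2000.csv, the sketch below reads a small made-up file twice. Since R versions from 4.0.0 on no longer convert strings to factors by default, the first call sets stringsAsFactors = TRUE explicitly to reproduce the behaviour described above. The second call passes one class per column of the file to colClasses, with the first entry covering the column that is used for the row names. The file contents and object names are assumptions.

```r
# Tiny made-up file standing in for Forbes2000.csv (hypothetical data)
f <- tempfile(fileext = ".csv")
writeLines(c('rank,name,country,sales',
             '1,"Company A","Country X",94.71',
             '2,"Company B","Country Y",134.19'), f)

# Strings converted to factors (requested explicitly here)
as_factor <- read.table(f, header = TRUE, sep = ",", row.names = 1,
                        stringsAsFactors = TRUE)
print(class(as_factor[, "name"]))

# One class per column of the file, including the row-name column
typed <- read.table(f, header = TRUE, sep = ",", row.names = 1,
                    colClasses = c("character", "character",
                                   "factor", "numeric"))
print(sapply(typed, class))
```

Here the unique company names stay character strings, while the country column, which would repeat across companies in a larger file, is stored as a factor.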
