Chapter 4 Loading Data
This chapter introduces two functions that can be used to load data from csv- and Excel-files.
Experienced readers will find more information about data imports here:
csv-Files
Load data from comma separated files
# read data from current folder
df <- read.csv("datafile.csv")
# read data from a specific folder
df <- read.csv("C:/Desktop/datafile.csv")
# read data from a file in the internet
df <- read.csv2("https://github.com/hslu-ige-laes/edar/raw/master/sampleData/flatElectricity.csv")
# add arguments on how to parse the content
df <- read.csv("datafile.csv",
header=FALSE,
stringsAsFactors=FALSE,
sep =",",
na.strings = c("", "NA"))
Attention: By default, strings in the data are treated as factors, so add
stringsAsFactors=FALSE
Set your cursor to the function name
read.csv()
and press F1. Then you get in R Studio bottom right in the tabHelp
information about other function arguments you can use.
Note that the function read.csv() has the default
sep =","
and read.csv2() has the defaultsep =";"
Excel-Files
Load data from *.xlsx
Excel files:
# Only need to install once
install.packages("xlsx")
library(xslx)
df <- read.xlsx("datafile.xlsx", 1)
df <- read.xlsx("datafile.xlsx", sheetIndex=2)
df <- read.xlsx("datafile.xlsx", sheetName="Revenues")
# show the first lines of the so called "data frame"
head(df)
For reading older Excel files in the .xls format, the gdata package has the function read.xls:
# Only need to install once
install.packages("gdata")
library(gdata)
df <- read.xls("datafile.xls") # Read first sheet
df <- read.xls("datafile.xls", sheet=2) # Read second sheet
Both the xlsx and gdata packages require other software to be installed on your computer. For xlsx, you need to install Java on your machine. For gdata, you need Perl, which comes as standard on Linux and Mac OS X, but not Windows. On Windows, you’ll need ActiveState Perl. The Community Edition can be obtained for free.