5.1 Add Metadata for later filtering
First, we load a dataset into a data frame:
# load data set
df <- read.csv("https://github.com/hslu-ige-laes/edar/raw/master/sampleData/centralOutsideTemp.csv",
stringsAsFactors=FALSE,
sep =";")
5.1.1 Year, Month, Day, Day of Week
To group, filter and aggregate data, we eventually need the date split into day, month and year:
library(dplyr)
library(lubridate)
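# parse the timestamp (assumption: the data is in time zone "Europe/Zurich"; adjust to your data)
df$time <- parse_date_time(df$time, "YmdHMS", tz = "Europe/Zurich")
df$year <- as.Date(cut(df$time, breaks = "year"))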
df$month <- as.Date(cut(df$time, breaks = "month"))
df$day <- as.Date(cut(df$time, breaks = "day"))
df$weekday <- wday(df$time,
label = TRUE,
locale = "English", # Windows locale name; on Linux/macOS use e.g. "en_US.UTF-8"
abbr = TRUE,
week_start = getOption("lubridate.week.start", 1))
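This code first parses the timestamp with an explicit time zone. Then four columns are added: year, month, day and weekday.
Please note that the month column also contains the year and a day: each value is the first day of its month (for example, 2018-03-01). This keeps months from different years distinct when you group the series in a later step.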
##                  time centralOutsideTemp
## 1 2018-03-21 11:00:00                5.2
## 2 2018-03-21 12:00:00                6.7
##                      time centralOutsideTemp
## 21864 2020-09-17 10:00:00              26.65
## 21865 2020-09-17 11:00:00              28.10
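The grouping step mentioned above could look roughly like the following. This is a minimal sketch, not the chapter's own code: the data frame `df_small` is a hypothetical stand-in built from the four rows printed above, the time zone `"UTC"` and the column name `meanTemp` are assumptions.

```r
library(dplyr)

# Hypothetical stand-in for the full dataset (the four rows shown above)
df_small <- data.frame(
  time = as.POSIXct(c("2018-03-21 11:00:00", "2018-03-21 12:00:00",
                      "2020-09-17 10:00:00", "2020-09-17 11:00:00"),
                    tz = "UTC"),  # assumption: time zone chosen for illustration
  centralOutsideTemp = c(5.2, 6.7, 26.65, 28.10)
)

# Same technique as above: the month column holds the first day of each month
df_small$month <- as.Date(cut(df_small$time, breaks = "month"))

# Aggregate to one mean temperature per month
monthly <- df_small %>%
  group_by(month) %>%
  summarise(meanTemp = mean(centralOutsideTemp))

print(monthly)
```

Because the month column carries the year as well, March 2018 and a later March end up in separate groups.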
5.1.2 Hour, Minute, Second
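As a minimal sketch, assuming the `time` column has already been parsed to a date-time as in 5.1.1, the lubridate accessors `hour()`, `minute()` and `second()` can extract these components into separate columns. The data frame `df_small`, its second timestamp and the new column names are illustrative assumptions.

```r
library(lubridate)

# Hypothetical sample in the timestamp format used in this chapter
df_small <- data.frame(
  time = ymd_hms(c("2018-03-21 11:00:00", "2020-09-17 10:30:15"),
                 tz = "UTC")  # assumption: time zone chosen for illustration
)

# Extract the time-of-day components into separate columns
df_small$hour   <- hour(df_small$time)
df_small$minute <- minute(df_small$time)
df_small$second <- second(df_small$time)

print(df_small)
```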
5.1.3 Season of Year
For some analyses it is useful to color single points of a scatterplot according to the season. For this we need to have the season in a separate column:
## [1] "Spring"
If you want to change the language, you can pass custom season names to the function:
## [1] "Frühling"
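A season function of this kind could be sketched as follows. The name `get_season`, its `labels` parameter and the example date are hypothetical, not the function used in the original text. The sketch assumes Northern Hemisphere seasons with fixed, approximate astronomical boundaries (20 March, 21 June, 22 September, 21 December).

```r
# Hypothetical helper: map Date or POSIXct values to a season label.
# Assumption: fixed approximate astronomical boundaries, Northern Hemisphere.
get_season <- function(dates,
                       labels = c("Winter", "Spring", "Summer", "Autumn")) {
  # Encode month and day as one number, e.g. 21 March -> 321
  md <- as.integer(format(dates, "%m%d"))
  # Interval index: 0 before 20 March, 1 spring, 2 summer, 3 autumn,
  # 4 from 21 December on; the modulo maps both 0 and 4 to winter
  labels[findInterval(md, c(320, 621, 922, 1221)) %% 4 + 1]
}

# Default English labels
get_season(as.Date("2019-04-01"))

# Custom labels, here German
get_season(as.Date("2019-04-01"),
           labels = c("Winter", "Frühling", "Sommer", "Herbst"))
```

The label vector must list the seasons in the order winter, spring, summer, autumn, because the function selects them by position.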
To apply this function to a whole dataframe we can use the dplyr mutate function. The code below creates a new column named “season”:
##                  time centralOutsideTemp season
## 1 2018-03-21 11:00:00                5.2 Spring
##                      time centralOutsideTemp season
## 21865 2020-09-17 11:00:00               28.1 Summer
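The mutate step described above can be sketched as follows. The helper `get_season` is hypothetical and assumes fixed, approximate astronomical season boundaries for the Northern Hemisphere. The data frame `df_small` is a stand-in built from the two rows shown above, and the time zone `"UTC"` is an assumption.

```r
library(dplyr)

# Hypothetical helper (not the original function): season label per date,
# assuming boundaries at 20 March, 21 June, 22 September and 21 December
get_season <- function(dates,
                       labels = c("Winter", "Spring", "Summer", "Autumn")) {
  md <- as.integer(format(dates, "%m%d"))  # e.g. 21 March -> 321
  labels[findInterval(md, c(320, 621, 922, 1221)) %% 4 + 1]
}

# Hypothetical stand-in for the full dataset (two rows shown above)
df_small <- data.frame(
  time = as.POSIXct(c("2018-03-21 11:00:00", "2020-09-17 11:00:00"),
                    tz = "UTC"),  # assumption: time zone chosen for illustration
  centralOutsideTemp = c(5.2, 28.1)
)

# Add the season as a new column
df_small <- df_small %>%
  mutate(season = get_season(time))

print(df_small)
```

Under these assumed boundaries, 17 September still counts as summer, which matches the output shown above.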