R was built by statisticians for statistical computing. It is the dominant language in academia and data science for statistical analysis, hypothesis testi
In R, almost every basic value is a vector. There is no separate scalar type, so a single number is simply a vector of length 1. Arithmetic applies element-wise: c(1,2,3) * 2 returns c(2,4,6). Because of this vectorized design, most operations need no explicit loop. They work on entire vectors at once, which is both concise and fast, since the loop runs in the underlying C code. The <- operator assigns; = also works for assignment, but <- is idiomatic R.
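As a small sketch of the point above, the following compares element-wise arithmetic with an equivalent explicit loop. The variable names (n, v, doubled, doubled_loop) are made up for illustration.

```r
# A single number is already a vector of length 1.
n <- 42
length(n)      # 1
is.vector(n)   # TRUE

# Element-wise arithmetic, as in the text.
v <- c(1, 2, 3)
doubled <- v * 2   # 2 4 6

# The same result with an explicit loop, for comparison only.
doubled_loop <- numeric(length(v))
for (i in seq_along(v)) {
  doubled_loop[i] <- v[i] * 2
}

identical(doubled, doubled_loop)   # TRUE
```

The loop version is valid R, but the vectorized form is shorter and is what most R code uses.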
Atomic types: numeric (double), integer (1L), character ('hello'), logical (TRUE/FALSE), complex, and raw. Data structures: vector (1D, same type), matrix (2D, same type), list (1D, mixed types), data frame (2D, columns can be different types), and factor (categorical variable with levels). The is.*() family tests types; as.*() coerces. NA represents missing values; NULL represents the absence of an object.
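The type tests, coercions, and the NA/NULL distinction described above can be tried directly at the console. This is a minimal sketch; the variables (mixed, scores, not_a_number) are illustrative only.

```r
# Atomic types, inspected with typeof()
typeof(3.14)      # "double"
typeof(1L)        # "integer"
typeof('hello')   # "character"
typeof(TRUE)      # "logical"

# is.*() tests and as.*() coercions
is.numeric(1L)                  # TRUE: integers count as numeric
int_val <- as.integer('42')     # 42L
chr_val <- as.character(3.5)    # "3.5"
# A string that is not a number coerces to NA (R also emits a warning,
# suppressed here so the script runs quietly).
not_a_number <- suppressWarnings(as.numeric('abc'))

# A vector holds one type, so mixed input is coerced to a common type.
mixed <- c(1, 'a', TRUE)
typeof(mixed)     # "character"

# NA is a missing value that still occupies a slot; NULL is no object at all.
scores <- c(90, NA, 70)
length(scores)                  # 3
mean(scores)                    # NA
mean(scores, na.rm = TRUE)      # 80
length(NULL)                    # 0
is.null(NULL)                   # TRUE
```

Note that many summary functions return NA when any input is NA unless na.rm = TRUE is passed.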
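R has three subsetting operators: [ (returns an object of the same type as the original and can select multiple elements), [[ (extracts a single element and drops the container), and $ (extracts an element by name; x$name is shorthand for x[['name']], although on lists $ also allows partial name matching). Subsetting with a logical vector keeps the elements where the condition is TRUE: x[x > 5] returns the values greater than 5. Negative indexing excludes: x[-1] drops the first element. A data frame column can be extracted by name with df$col or df[, 'col'], or by position with df[[1]].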
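The difference between [ and [[ is easiest to see on a list. The sketch below uses a hypothetical list (person) and a hypothetical one-column data frame (scores_df), neither of which appears elsewhere in this lesson.

```r
# Hypothetical list, for illustration only
person <- list(name = 'Alice', scores = c(92, 78, 95))

sub_list <- person['scores']     # [ keeps the container: a list of length 1
sub_vec  <- person[['scores']]   # [[ extracts the element: a numeric vector
same_vec <- person$scores        # $ reaches the same element by name

class(sub_list)                  # "list"
class(sub_vec)                   # "numeric"
identical(sub_vec, same_vec)     # TRUE

# Logical and negative indexing on an atomic vector
x <- c(3, 8, 1, 12, 5)
x[x > 5]    # 8 12
x[-1]       # 8 1 12 5

# Three ways to reach the same data frame column
scores_df <- data.frame(col = c(10, 20, 30))
identical(scores_df$col, scores_df[, 'col'])   # TRUE
identical(scores_df$col, scores_df[[1]])       # TRUE
```

A practical rule of thumb: use [ when you want a smaller version of the same structure, and [[ or $ when you want the thing stored inside it.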
# Vectors are the foundation
x <- c(1, 4, 9, 16, 25)
cat('Square roots:', sqrt(x), '\n')
# Vectorized operations (no loop needed)
temps_c <- c(0, 20, 37, 100)
temps_f <- temps_c * 9/5 + 32
cat('Fahrenheit:', temps_f, '\n') # 32 68 98.6 212
# Data frame basics
df <- data.frame(
  name = c('Alice', 'Bob', 'Carol', 'Dave'),
  score = c(92, 78, 95, 85),
  grade = c('A', 'C', 'A', 'B'),
  stringsAsFactors = FALSE
)
# Subsetting
df[df$score >= 90, ] # rows where score >= 90
df[df$grade == 'A', 'name'] # names of A students
subset(df, score > 80, select = c(name, score))
# Summary statistics
cat('Mean:', mean(df$score), '\n')
cat('SD: ', sd(df$score), '\n')
summary(df)
Before moving on, confirm understanding of these key concepts: