str() Function
Compactly display the internal structure of an R object.
library(tibble)  # provides tibble()

nested_lists <- list(
  a = list(
    b = list(
      matrix = matrix(rnorm(4), ncol = 2),
      numeric_vector = 1:10,
      character_vector = LETTERS,
      data_frame = tibble(data_a = 1:10, data_b = 11:20),
      list = list(1:10)
    )
  )
)
# On an object
str(nested_lists)
## List of 1
## $ a:List of 1
## ..$ b:List of 5
## .. ..$ matrix : num [1:2, 1:2] -0.438 -0.669 0.488 0.592
## .. ..$ numeric_vector : int [1:10] 1 2 3 4 5 6 7 8 9 10
## .. ..$ character_vector: chr [1:26] "A" "B" "C" "D" ...
## .. ..$ data_frame :Classes 'tbl_df', 'tbl' and 'data.frame': 10 obs. of 2 variables:
## .. .. ..$ data_a: int [1:10] 1 2 3 4 5 6 7 8 9 10
## .. .. ..$ data_b: int [1:10] 11 12 13 14 15 16 17 18 19 20
## .. ..$ list :List of 1
## .. .. ..$ : int [1:10] 1 2 3 4 5 6 7 8 9 10
# On a function
str(matrix)
## function (data = NA, nrow = 1, ncol = 1, byrow = FALSE, dimnames = NULL)
# On a data frame
str(airquality)
## 'data.frame': 153 obs. of 6 variables:
## $ Ozone : int 41 36 12 18 NA 28 23 19 8 NA ...
## $ Solar.R: int 190 118 149 313 NA NA 299 99 19 194 ...
## $ Wind : num 7.4 8 12.6 11.5 14.3 14.9 8.6 13.8 20.1 8.6 ...
## $ Temp : int 67 72 74 62 56 66 65 59 61 69 ...
## $ Month : int 5 5 5 5 5 5 5 5 5 5 ...
## $ Day : int 1 2 3 4 5 6 7 8 9 10 ...
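For deeply nested objects the full display can get long. The sketch below, which uses a small list made up for this illustration, shows how the max.level argument of str() limits how many levels of nesting are expanded.

```r
# A small nested list, invented for this illustration
nested <- list(a = list(b = list(c = 1:3, d = letters)))

# Full structure: every level is expanded
str(nested)

# Only the first level of nesting is expanded
str(nested, max.level = 1)

# capture.output() collects the printed lines so the two displays can be compared
full_lines <- capture.output(str(nested))
top_lines  <- capture.output(str(nested, max.level = 1))
length(top_lines) < length(full_lines)
```

Limiting the depth is mostly useful for a first look at a large object before drilling into one branch.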
Simulation
Generating Random Numbers
Functions for probability distributions:
- rnorm(): generate random normal variates.
- dnorm(): evaluate the normal probability density at a point.
- pnorm(): evaluate the cumulative distribution function for a normal distribution.
- rpois(): generate random Poisson variates with a given rate.
For each distribution there are usually four functions with different prefixes:
- ‘r’ for random numbers.
- ‘d’ for density.
- ‘p’ for cumulative distribution.
- ‘q’ for quantile.
## [1] 109.98245 95.49942 85.53378 97.56938
## [1] 0.5
## [1] 1 2 3 0 0 3 0 2 0 1
## [1] 2 6 4 8 3 3 6 5 5 7
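The four prefixes can be tried side by side on the normal distribution. This is a minimal sketch; the seed, means, and rates are arbitrary values chosen for illustration, not the ones used to produce the output shown above.

```r
set.seed(10)  # arbitrary seed, so the random draws are reproducible

# r: random draws from N(mean = 100, sd = 10)
draws <- rnorm(4, mean = 100, sd = 10)

# d: density of the standard normal at 0
dens <- dnorm(0)

# p: cumulative probability P(X <= 0) for the standard normal
prob <- pnorm(0)

# q: quantile function, the inverse of pnorm()
quant <- qnorm(prob)

# r again, this time for the Poisson distribution with rate 2
counts <- rpois(10, lambda = 2)

prob   # 0.5, because the standard normal is symmetric around 0
quant  # 0, because qnorm() undoes pnorm()
```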
Simulating a Linear Model
Suppose we want to simulate from the linear model
\[ y = \beta_0 + \beta_1 x + \epsilon \]
where:
$$
\epsilon \sim \mathcal{N}(0, 2^2), \quad x \sim \mathcal{N}(0, 1^2), \quad \beta_0 = 0.5, \quad \beta_1 = 2
$$
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -5.4816 -1.1493 0.7582 0.6422 2.3404 6.1534
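A minimal sketch of one way to run this simulation is given below. The seed and the sample size of 100 are assumptions made for illustration, so the summary will not match the numbers printed above.

```r
set.seed(20)  # arbitrary seed (assumption)
n <- 100      # sample size (assumption)

x <- rnorm(n, mean = 0, sd = 1)  # predictor: x ~ N(0, 1^2)
e <- rnorm(n, mean = 0, sd = 2)  # noise: epsilon ~ N(0, 2^2)
y <- 0.5 + 2 * x + e             # beta_0 = 0.5, beta_1 = 2

summary(y)
```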
What if \(x\) is binary?
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -3.3287 -0.1764 1.1098 1.4248 3.1967 6.8452
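For a binary predictor, one option is to draw x with rbinom(). The sketch below assumes a success probability of 0.5, an arbitrary seed, and 100 observations; none of these are stated in the text.

```r
set.seed(10)  # arbitrary seed (assumption)
n <- 100      # sample size (assumption)

x_bin <- rbinom(n, size = 1, prob = 0.5)  # binary predictor; prob = 0.5 is an assumption
e     <- rnorm(n, mean = 0, sd = 2)       # noise: epsilon ~ N(0, 2^2)
y_bin <- 0.5 + 2 * x_bin + e              # same coefficients as the continuous case

summary(y_bin)
```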
Simulating a Poisson Model
$$
\begin{aligned}
Y &\sim \text{Poisson}(\mu) \\
\log(\mu) &= \beta_0 + \beta_1 x \\
\beta_0 &= 0.5, \quad \beta_1 = 0.3
\end{aligned}
$$
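library(tidyverse)  # tibble(), ggplot(), and %>%

set.seed(1)
x <- rnorm(100)
log_mu <- 0.5 + 0.3 * x        # linear predictor on the log scale
y <- rpois(100, exp(log_mu))   # counts with mean mu = exp(log_mu)

# Scatter plot of the simulated counts against x
tibble(x = x, y = y) %>%
  ggplot(aes(x, y)) +
  geom_point()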
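One way to sanity-check a simulation like this is to fit the same model back to the simulated data and compare the estimates with the true coefficients. The sketch below does this with glm(); it is an illustrative check, not part of the original notes, and with only 100 observations the estimates will be close to 0.5 and 0.3 rather than equal to them.

```r
set.seed(1)
x <- rnorm(100)
y <- rpois(100, exp(0.5 + 0.3 * x))

# Poisson regression with a log link, matching the simulated model
fit <- glm(y ~ x, family = poisson(link = "log"))

# Estimated intercept (true value 0.5) and slope (true value 0.3)
est <- coef(fit)
est
```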
Random Sampling
The sample() function draws randomly from a specified set of scalar objects.
## [1] 9 4 7 1
## [1] 2 7 3 6
## [1] "r" "s" "a" "u" "w"
## [1] 10 6 9 2 1 5 8 4 3 7
## [1] 5 5 2 10 9 1 4 3
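The calls below sketch the common ways of using sample(). The seed and the sizes are arbitrary choices for illustration and are not claimed to be the calls behind the output above.

```r
set.seed(42)  # arbitrary seed (assumption)

four    <- sample(1:10, 4)                  # four distinct values from 1..10
chars   <- sample(letters, 5)               # works on character vectors too
perm    <- sample(1:10)                     # no size given: a random permutation
with_rp <- sample(1:10, 8, replace = TRUE)  # with replacement, repeats are possible

four
chars
perm
with_rp
```

Without replace = TRUE, sample() never returns the same element twice, so the requested size cannot exceed the length of the input.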
R Profiler
A very basic tool is system.time(). It returns an object of class proc_time, which reports user time, system (kernel) time, and elapsed (wall clock) time.
## user system elapsed
## 7.764 0.072 7.837
- User time is less than elapsed time if the process spends time off the CPU.
- User time is greater than elapsed time if parallel processing has occurred (multi-threading).
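The first case is easy to reproduce. In the sketch below, Sys.sleep() stands in for any operation that waits rather than computes, so almost all of the elapsed time is spent off the CPU. The 0.2 second duration is an arbitrary choice.

```r
# Time an expression that mostly waits instead of computing
timing <- system.time(Sys.sleep(0.2))

class(timing)  # "proc_time"
timing         # prints user, system, and elapsed time

# Individual components can be extracted by name
user_time    <- timing[["user.self"]]
elapsed_time <- timing[["elapsed"]]

# User time is far below elapsed time because the process was idle
user_time < elapsed_time
```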
Rprof
The Rprof() function starts the profiler in R.
The summaryRprof() function summarises the output of Rprof().
The profiler keeps track of the call stack at regular intervals - default is 0.02 seconds.
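library(magrittr)    # %>% and use_series()
library(knitr)       # kable()
library(kableExtra)  # kable_styling()

x <- c(1:2000)
y <- rnorm(2000)

# Start profiling, writing samples to a temporary file
Rprof(tmp <- tempfile())
invisible(
  solve(matrix(rnorm(2048 * 2048), ncol = 2048))
)
# Stop profiling; Rprof() with no arguments would start a new profile in "Rprof.out"
Rprof(NULL)

# Summarise the samples and display the "by self" table
summaryRprof(tmp) %>%
  use_series(by.self) %>%
  kable() %>%
  kable_styling()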
function | self.time | self.pct | total.time | total.pct |
---|---|---|---|---|
"solve.default" | 7.30 | 96.31 | 7.30 | 96.31 |
"rnorm" | 0.26 | 3.43 | 0.26 | 3.43 |
"matrix" | 0.02 | 0.26 | 0.28 | 3.69 |
Note: C or Fortran code is not profiled.
“By Total” and “By Self”
By total is how much time was spent in a function including the functions it calls. By self is how much time was spent in that function alone, excluding the functions it calls.
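The difference shows up clearly when one function does nothing but call another. In the sketch below, busy_inner() and busy_outer() are hypothetical helpers written for this illustration; busy_outer() should show a large total time but little or no self time, because the work happens in functions further down the call stack.

```r
# Hypothetical helper: keep the CPU busy for roughly `seconds` of wall clock time
busy_inner <- function(seconds) {
  start <- proc.time()[["elapsed"]]
  while (proc.time()[["elapsed"]] - start < seconds) {
    sum(sqrt(seq_len(1000)))  # cheap arithmetic, repeated until time is up
  }
  invisible(NULL)
}

# Hypothetical wrapper: does no work of its own, only calls busy_inner()
busy_outer <- function() busy_inner(0.5)

prof_file <- tempfile()
Rprof(prof_file)  # start sampling the call stack (default interval 0.02 seconds)
busy_outer()
Rprof(NULL)       # stop the profiler

prof <- summaryRprof(prof_file)
head(prof$by.total)  # time including callees
head(prof$by.self)   # time in each function alone
```

The exact rows and timings depend on the machine and the R version, so no output is shown here.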