Course 2 - R Programming - Week 4 - Notes

Greg Foletta

2019-09-26

str() Function

Compactly display the internal structure of an R object.

## List of 1
##  $ a:List of 1
##   ..$ b:List of 5
##   .. ..$ matrix          : num [1:2, 1:2] -0.438 -0.669 0.488 0.592
##   .. ..$ numeric_vector  : int [1:10] 1 2 3 4 5 6 7 8 9 10
##   .. ..$ character_vector: chr [1:26] "A" "B" "C" "D" ...
##   .. ..$ data_frame      :Classes 'tbl_df', 'tbl' and 'data.frame':  10 obs. of  2 variables:
##   .. .. ..$ data_a: int [1:10] 1 2 3 4 5 6 7 8 9 10
##   .. .. ..$ data_b: int [1:10] 11 12 13 14 15 16 17 18 19 20
##   .. ..$ list            :List of 1
##   .. .. ..$ : int [1:10] 1 2 3 4 5 6 7 8 9 10
## function (data = NA, nrow = 1, ncol = 1, byrow = FALSE, dimnames = NULL)
## 'data.frame':    153 obs. of  6 variables:
##  $ Ozone  : int  41 36 12 18 NA 28 23 19 8 NA ...
##  $ Solar.R: int  190 118 149 313 NA NA 299 99 19 194 ...
##  $ Wind   : num  7.4 8 12.6 11.5 14.3 14.9 8.6 13.8 20.1 8.6 ...
##  $ Temp   : int  67 72 74 62 56 66 65 59 61 69 ...
##  $ Month  : int  5 5 5 5 5 5 5 5 5 5 ...
##  $ Day    : int  1 2 3 4 5 6 7 8 9 10 ...
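The calls that produced the output above are not preserved in these notes. As a minimal sketch, assuming a small hypothetical nested list (`nested` is an illustrative name, not the original object), `str()` can be applied to a list, a function, and a data frame like this:

```r
# Hypothetical nested list, built only for illustration.
nested <- list(
  a = list(
    b = list(
      numeric_vector   = 1:10,
      character_vector = LETTERS,
      data_frame       = data.frame(data_a = 1:10, data_b = 11:20)
    )
  )
)

# One line per component, indented by nesting depth.
str(nested)

# str() also works on functions: it shows the argument list.
str(matrix)

# For a data frame, it shows the dimensions and the type of each column.
# airquality ships with the base 'datasets' package.
str(airquality)
```

Because `str()` prints one compact line per component, it is often a quicker first look at an unfamiliar object than `print()` or `summary()`.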

Simulation

Generating Random Numbers

Functions for probability distributions:

  • rnorm() - generate random normal variates.
  • dnorm() - evaluate the normal probability density at a point.
  • pnorm() - evaluate the cumulative distribution function for a normal distribution.
  • rpois() - generate random Poisson variates with a given rate.

For each distribution there are usually four functions with different prefixes:

  • ‘r’ for random numbers.
  • ‘d’ for density.
  • ‘p’ for cumulative distribution.
  • ‘q’ for quantile.
## [1] 109.98245  95.49942  85.53378  97.56938
## [1] 0.5
##  [1] 1 2 3 0 0 3 0 2 0 1
##  [1] 2 6 4 8 3 3 6 5 5 7
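The original calls behind the output above are not shown, so the following is only a sketch of how the four prefixes are typically used. The parameter values and the use of `set.seed()` to make the draws reproducible are assumptions added for illustration.

```r
# Fix the random number generator state so the draws are reproducible.
set.seed(1)
first_draw <- rnorm(4, mean = 100, sd = 10)

# Resetting the seed reproduces exactly the same variates.
set.seed(1)
second_draw <- rnorm(4, mean = 100, sd = 10)
print(identical(first_draw, second_draw))

# 'd': density of the standard normal at 0, which equals 1 / sqrt(2 * pi).
density_at_zero <- dnorm(0)

# 'p': cumulative probability P(X <= 0) for the standard normal, which is 0.5.
prob_below_zero <- pnorm(0)

# 'q': the quantile function is the inverse of 'p', so the median is 0.
median_normal <- qnorm(0.5)

# 'r' for another distribution: ten Poisson counts with rate 1, then rate 5.
counts_rate_1 <- rpois(10, lambda = 1)
counts_rate_5 <- rpois(10, lambda = 5)
print(counts_rate_1)
print(counts_rate_5)
```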

Simulating a Linear Model

Suppose we want to simulate

\[ y = \beta_0 + \beta_1 x + \epsilon \]

where:

$$
\epsilon \sim N(0, 2^2), \quad x \sim N(0, 1^2), \quad \beta_0 = 0.5, \quad \beta_1 = 2
$$

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -5.4816 -1.1493  0.7582  0.6422  2.3404  6.1534
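The code for this simulation is not preserved in the notes. A minimal sketch of it, assuming 100 observations and an arbitrary seed (both are illustrative choices, so the summary will not match the numbers above), could look like this:

```r
set.seed(20)   # arbitrary seed, chosen only for reproducibility
n <- 100       # assumed sample size

x <- rnorm(n, mean = 0, sd = 1)   # predictor: x ~ N(0, 1^2)
e <- rnorm(n, mean = 0, sd = 2)   # noise: epsilon ~ N(0, 2^2)
y <- 0.5 + 2 * x + e              # beta_0 = 0.5, beta_1 = 2

summary(y)

# Sanity check: a fitted linear model should recover coefficients
# in the neighbourhood of 0.5 and 2.
fit <- lm(y ~ x)
print(coef(fit))
```

Note that `rnorm()` takes the standard deviation, not the variance, so the noise term uses `sd = 2`.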

What if \(x\) is binary?

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -3.3287 -0.1764  1.1098  1.4248  3.1967  6.8452
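For a binary predictor, one option is to draw `x` from a Bernoulli distribution with `rbinom()`. This is a sketch that assumes a success probability of 0.5, 100 observations, and the same coefficients as before. None of these values is confirmed by the notes.

```r
set.seed(10)   # arbitrary seed, chosen only for reproducibility
n <- 100       # assumed sample size

x <- rbinom(n, size = 1, prob = 0.5)   # binary predictor: 0 or 1
e <- rnorm(n, mean = 0, sd = 2)        # noise: epsilon ~ N(0, 2^2)
y <- 0.5 + 2 * x + e                   # same coefficients as the normal case

summary(y)

# Compare the two groups: their means should differ by roughly beta_1.
print(tapply(y, x, mean))
```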

Simulating a Poisson Model

$$
\begin{aligned}
Y &\sim \text{Poisson}(\mu) \\
\log(\mu) &= \beta_0 + \beta_1 x \\
\beta_0 &= 0.5, \quad \beta_1 = 0.3
\end{aligned}
$$
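Because the errors are no longer normal, the response has to be drawn with `rpois()` rather than by adding `rnorm()` noise. The sketch below assumes a standard normal predictor and 100 observations. The notes do not state either.

```r
set.seed(1)    # arbitrary seed, chosen only for reproducibility
n <- 100       # assumed sample size

x <- rnorm(n)                       # assumed predictor distribution
log_mu <- 0.5 + 0.3 * x             # linear predictor on the log scale
y <- rpois(n, lambda = exp(log_mu)) # exponentiate to get the Poisson mean

summary(y)
```

The key step is `exp(log_mu)`: the model is linear in the log of the mean, so the rate passed to `rpois()` must be transformed back to the original scale.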

Random Sampling

The sample() function draws a random sample from a specified set of scalar objects, either with or without replacement.

## [1] 9 4 7 1
## [1] 2 7 3 6
## [1] "r" "s" "a" "u" "w"
##  [1] 10  6  9  2  1  5  8  4  3  7
## [1]  5  5  2 10  9  1  4  3
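The calls behind the output above are not shown. The following sketch illustrates the common ways of calling `sample()`. The seed and sample sizes are illustrative assumptions.

```r
set.seed(1)   # arbitrary seed, chosen only for reproducibility

# Four values from 1..10, without replacement (the default).
four_numbers <- sample(1:10, 4)

# Five lower-case letters, without replacement.
five_letters <- sample(letters, 5)

# With no size given, sample() returns a random permutation of the input.
permutation <- sample(1:10)

# With replace = TRUE the same value can be drawn more than once.
with_replacement <- sample(1:10, replace = TRUE)

print(four_numbers)
print(five_letters)
print(permutation)
print(with_replacement)
```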

R Profiler

A very basic tool is system.time(). It takes an R expression, evaluates it, and returns an object of class proc_time containing the user time, system (kernel) time, and elapsed (wall clock) time, all in seconds.

##    user  system elapsed 
##   7.764   0.072   7.837
  • User time is less than elapsed time if the process spends time off the CPU, for example waiting on I/O.
  • User time is greater than elapsed time if parallel processing has occurred (multi-threading).
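As a rough illustration of the first case, the sketch below times a CPU-bound expression and an expression that mostly waits. The matrix size and sleep duration are arbitrary choices, and the exact timings will vary by machine.

```r
set.seed(1)

# CPU-bound work: invert a random 500 x 500 matrix.
m <- matrix(rnorm(500 * 500), nrow = 500)
cpu_timing <- system.time(solve(m))
print(cpu_timing)

# Mostly waiting: the process sleeps, so it is off the CPU
# and elapsed time exceeds user time.
sleep_timing <- system.time(Sys.sleep(0.5))
print(sleep_timing)

# Individual components can be extracted by name.
print(sleep_timing[["user.self"]])
print(sleep_timing[["elapsed"]])
```

Longer expressions can be timed by wrapping them in curly braces inside the `system.time()` call.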

Rprof

The Rprof() function starts the profiler in R.

The summaryRprof() function summarises the output of Rprof().

The profiler samples the call stack at regular intervals; the default is every 0.02 seconds.

                 self.time  self.pct  total.time  total.pct
"solve.default"       7.30     96.31        7.30      96.31
"rnorm"               0.26      3.43        0.26       3.43
"matrix"              0.02      0.26        0.28       3.69

Note: C or Fortran code is not profiled.

“By Total” and “By Self”

“By total” is how much time was spent in a function including the functions it calls. “By self” is how much time was spent in that function alone, excluding the functions it calls.
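A minimal sketch of a full profiling session is shown below. The workload (`busy_work()`) is a hypothetical helper written for this example, not the code that produced the table above, and it assumes an R build with profiling support.

```r
# Hypothetical workload: repeatedly invert random matrices for about one second,
# so the profiler has enough samples to report.
busy_work <- function(seconds = 1) {
  start <- proc.time()[["elapsed"]]
  while (proc.time()[["elapsed"]] - start < seconds) {
    m <- matrix(rnorm(200 * 200), nrow = 200)
    solve(m)
  }
  invisible(NULL)
}

set.seed(1)
prof_file <- tempfile(fileext = ".out")

Rprof(prof_file, interval = 0.02)  # start sampling the call stack
busy_work()
Rprof(NULL)                        # stop the profiler

prof <- summaryRprof(prof_file)

# Time attributed to each function alone.
print(head(prof$by.self))

# Time attributed to each function including everything it calls.
print(head(prof$by.total))
```

In the `by.total` table the top-level function usually accounts for close to 100% of the time, which is why `by.self` tends to be the more useful view for finding the actual bottleneck.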