Wednesday, January 16, 2013

R and explicit loops

How much difference does it make whether you use explicit or implicit loops in R? By "implicit loop", I mean passing the whole length to a vectorized function and letting it iterate internally, rather than writing a for loop yourself. Here is a code snippet illustrating three ways to draw 200,000 normal random variables. The first two are functions containing explicit loops; the third is an implicit loop, simply supplying the sample size as an argument to rnorm. The two explicit versions differ only in that the second states the size of the vector when creating it.
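 # Explicit loop; y starts empty and grows by one element per iteration
 randnorm <- function(n) {
  y <- vector()
  for (i in seq_len(n)) y[i] <- rnorm(1)
  return(y)
 }
 # Explicit loop; y is allocated at its full length before the loop
 randnorm2 <- function(n) {
  y <- vector(length=n, mode='numeric')
  for (i in seq_len(n)) y[i] <- rnorm(1)
  return(y)
 }
 n <- 2e05
 system.time(y <- randnorm(n))
 system.time(y <- randnorm2(n))
 # Implicit loop: rnorm draws all n values in one call
 system.time(y <- rnorm(n))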
Here are the results:
 > n <- 2e05  
 > system.time(y <- randnorm(n))  
   user system elapsed   
  28.170  0.352 28.618   
 > system.time(y <- randnorm2(n))  
   user system elapsed   
  0.856  0.000  0.858   
 > system.time(y <- rnorm(n))  
   user system elapsed   
  0.032  0.000  0.031      
What do we conclude?

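  • The implicit loop (rnorm(n)) is much faster than the alternatives: in elapsed time, roughly 28 times faster than randnorm2 and more than 900 times faster than randnorm.
  • If you do have to write an explicit loop, it is better to create the vector in advance, as in randnorm2, specifying the length and mode. Here that alone made the loop about 33 times faster.
  • The unsized vector in randnorm makes for a very slow function. Moreover, in tests its run time appears to grow much faster than linearly as the vector gets longer, which is what you would expect if the vector has to be copied each time it grows. A function written like randnorm should be a last resort.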
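
One way to check how the growing-vector version scales is to time it at a few sizes and compare. The following is only a rough sketch: the helper name grow and the sizes are chosen here for illustration and kept small so it runs quickly. Timings depend heavily on the machine and on the R version, and newer versions of R may handle vector growth more efficiently, so the pattern can look quite different from the results above.

```r
# Hypothetical helper: the same growing-vector loop as randnorm
grow <- function(n) {
  y <- vector()
  for (i in seq_len(n)) y[i] <- rnorm(1)
  y
}

# Illustrative sizes (an assumption, not from the original test);
# each one is double the previous
sizes <- c(1e4, 2e4, 4e4)

# Elapsed seconds for each size
elapsed <- sapply(sizes, function(n) system.time(grow(n))[["elapsed"]])

# If run time were linear in n, elapsed would roughly double from row to row
print(data.frame(n = sizes, elapsed = elapsed))
```

If each doubling of n multiplies the elapsed time by clearly more than two, the loop is scaling worse than linearly.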

The moral is to avoid constructs like that in randnorm whenever possible.
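
Sometimes an explicit loop is hard to avoid, for example when each value depends on the one before it. Both lessons can still be applied: draw all the random numbers in one call, and create the result vector at its full length before the loop. Below is a minimal sketch under assumed details; the function ar1, its coefficient phi and the recurrence itself are made up for illustration and are not part of the timing test above.

```r
# Hypothetical example: each value depends on the previous one,
# so the loop over i cannot simply be replaced by one vectorized call
ar1 <- function(n, phi = 0.5) {
  e <- rnorm(n)     # implicit loop: all n draws in a single call
  y <- numeric(n)   # result allocated at its final length
  if (n == 0) return(y)
  y[1] <- e[1]
  # seq_len(n)[-1] is 2, 3, ..., n (and empty when n is 1)
  for (i in seq_len(n)[-1]) y[i] <- phi * y[i - 1] + e[i]
  y
}

set.seed(1)
y <- ar1(1000)
```

The loop only fills in existing elements of y, so the vector never has to grow.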
