Wednesday, January 16, 2013

R and the year 2038 problem

I am teaching an R workshop, which has met twice. There are things I either don't get to in class, or fumble when discussing them. I'm going to occasionally discuss a few of these issues here for the benefit of students, for me (this will serve as a reference in the future), and for anyone else who may be interested.

Dates in base R can be represented using the Date class, POSIXct, or POSIXlt. We can examine date handling by considering the year 2038 problem. The year 2038 problem arises as follows: if you use a signed 32-bit integer to represent the number of seconds since January 1, 1970 (as Unix systems traditionally have), you exhaust the capacity of that integer on January 19, 2038. Using R to verify this requires first that we know what it means to use a "signed 32-bit integer."
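As a rough illustration of how the three representations differ, the sketch below stores the same instant in each of them. The variable names `ct`, `lt`, and `d` are only for this example, and the instant is the January 19, 2038 moment mentioned above.

```r
# One instant, three base R representations.
ct <- as.POSIXct("2038-01-19 03:14:07", tz = "GMT")  # seconds since the epoch
lt <- as.POSIXlt(ct)                                  # list of calendar fields
d  <- as.Date(ct, tz = "GMT")                         # days since the epoch

unclass(ct)      # a double holding 2147483647, plus a tzone attribute
lt$year + 1900   # 2038; POSIXlt stores years since 1900
lt$mon + 1       # 1; POSIXlt numbers months from 0
unclass(d)       # 24855 days since 1970-01-01
```

POSIXct is usually the more convenient class for arithmetic and data frames, while POSIXlt is handy when you want to pull out individual fields such as the year or month.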

Background: Suppose you want to represent an integer as a "signed 3-bit integer". You can do this by representing integers in binary, with the leftmost bit telling you whether the integer is positive or negative. So we have that decimal 0 is 000, decimal 1 is 001, decimal 2 is 010, and decimal 3 is 011. What happens when we go to the next value? Binary 100 is a negative number because the leftmost bit is a 1. Thus, when we have a 3-bit signed integer, the largest positive value we can represent is 3 = 2^2 - 1. In general, for an n-bit signed representation, the largest value is 2^(n-1) - 1. With a 32-bit signed integer, the largest (positive) value you can represent is 2^31 - 1.
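The formula can be checked directly in R. This is a small sketch; `max_signed` is a hypothetical helper defined only for this example, not a base R function. R's own integer type happens to be a signed 32-bit integer, so `.Machine$integer.max` gives an independent check.

```r
# Largest value of a signed n-bit integer: 2^(n-1) - 1.
# max_signed is a hypothetical helper for illustration only.
max_signed <- function(n) 2^(n - 1) - 1

max_signed(3)    # 3
max_signed(32)   # 2147483647

# R's integer type is signed 32-bit, so its limit should match.
.Machine$integer.max == max_signed(32)   # TRUE

# Going one past the limit does not wrap around in R;
# integer overflow yields NA with a warning.
overflowed <- suppressWarnings(.Machine$integer.max + 1L)
is.na(overflowed)   # TRUE
```

Note that `2^31 - 1` itself is computed as a double in R, which is why it can be written without overflow.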

R's as.POSIXlt function provides a general-purpose way to convert values to the POSIXlt class for displaying and manipulating dates. The Unix epoch (0 seconds) occurred at midnight GMT on January 1, 1970. When will the 32-bit time integer be exhausted? We can use as.POSIXlt to find out:

 > as.POSIXlt(2^31 - 1, origin = '1970-1-1', tz="GMT")  
 [1] "2038-01-19 03:14:07 GMT"  

Thus, unpatched 32-bit Unix systems will come to a screeching and erratic halt on January 19, 2038, one second after 03:14:07 GMT. Note that as.POSIXlt treats the number you supply as a count of seconds from the origin, and performs a reasonable (and correct) calculation. When given a character string instead, the function tries a few standard date-time formats to work out what you intend.
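If you would rather not take the function's word for it, the same answer falls out of plain integer arithmetic. The sketch below splits 2^31 - 1 seconds into whole days plus a time of day; the variable names are just for this example.

```r
secs <- 2^31 - 1

days      <- secs %/% 86400              # whole days since 1970-01-01
remainder <- secs %% 86400               # seconds into the final day
hours     <- remainder %/% 3600
minutes   <- (remainder %% 3600) %/% 60
seconds   <- remainder %% 60

c(days = days, hours = hours, minutes = minutes, seconds = seconds)
# days = 24855, hours = 3, minutes = 14, seconds = 7

# The calendar date that many days after the epoch.
last_day <- as.Date(days, origin = "1970-01-01")
format(last_day)   # "2038-01-19"
```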

The POSIXlt calculations in R continue to work for larger values, since the number of seconds is passed as a double rather than as a 32-bit integer. You may wish to amuse yourself by discovering the largest value for which as.POSIXlt will continue to provide a valid date. (Hint: it's between 2^45 and 2^50, and it is millions of years in the future.)
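As a starting point for that exploration, here is a short sketch showing two values just past the 32-bit limits. The names `past_signed` and `past_unsigned` are made up for this example, and the format string is only there to make the printed text predictable.

```r
fmt <- "%Y-%m-%d %H:%M:%S"

# One second past the signed 32-bit limit.
past_signed <- format(as.POSIXlt(2^31, origin = "1970-01-01", tz = "GMT"), fmt)
past_signed     # "2038-01-19 03:14:08"

# One second past the unsigned 32-bit limit.
past_unsigned <- format(as.POSIXlt(2^32, origin = "1970-01-01", tz = "GMT"), fmt)
past_unsigned   # "2106-02-07 06:28:16"

# The seconds count behind a POSIXct value is stored as a double.
typeof(as.POSIXct(2^31, origin = "1970-01-01", tz = "GMT"))   # "double"
```

From here you can raise the exponent step by step and watch where the result stops being a sensible date.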
