The New York Times graphics department maintains a fascinating blog providing a behind-the-scenes look at the construction of graphics appearing in the paper and on the web site. Many of the posts deal with R, which the graphics department apparently uses a lot.
In this post they show you some graphs pertaining to political party dominance in state government, and this subsequent post shows the R code used to design the charts. The post also provides the raw data, so you can recreate the plots. What I really like about this post (and the blog in general) is that it doesn't just advertise the finished product (which is typically excellent), but it shows you the process by which the team arrived at the finished product.
Two of my favorite R-related posts are those depicting all passes made by each player on the US Women's Soccer team, and this one comparing the Facebook IPO to other IPOs (this post doesn't mention R, but I'm 99.9% sure that the graphs are produced in R --- axis variable names like test$date give it away).
Sunday, December 30, 2012
Wednesday, June 27, 2012
Computing the Normal Distribution
I teach options, so I am interested in how accurately different software packages compute the cumulative normal distribution function. Not being a statistician, my usual approach is to compare different packages and look for discrepancies. A few years ago I noticed a tiny discrepancy in option prices computed using Excel and OpenOffice. I became curious, and started checking the calculation of the area under the curve of the standard normal distribution from minus infinity to -5.5. I found agreement between Matlab and Gnumeric, a free, open-source spreadsheet. Everything else differed.
I am revising my derivatives text, so I checked again. Things appear to have changed for the better. At least there is convergence among the different programs.
Recent versions of Matlab and Gnumeric return the same values as several years ago. The current version of Octave matches Gnumeric, as does R. Excel appears to have caught up with Gnumeric (wait, shouldn't that have been the other way around??), and Libreoffice 3.5 comes much closer than it did.
It's hard to generalize other than to note that both commercial and open source software had hits and misses a few years ago. Excel has long been pilloried for sloppy statistical programming, so it's good to see that Microsoft may finally be getting its act together. The percentage differences below were all computed in Excel 2010.
Blogger's table support is non-existent, so you may not be able to read the table below. Here is a pdf version.
I am revising my derivatives text, so I checked again. Things appear to have changed for the better. At least there is convergence among the different programs.
Recent versions of Matlab and Gnumeric return the same values as several years ago. The current version of Octave matches Gnumeric, as does R. Excel appears to have caught up with Gnumeric (wait, shouldn't that have been the other way around??), and Libreoffice 3.5 comes much closer than it did.
It's hard to generalize other than to note that both commercial and open source software had hits and misses a few years ago. Excel has long been pilloried for sloppy statistical programming, so it's good to see that Microsoft may finally be getting its act together. The percentage differences below were all computed in Excel 2010.
Blogger's table support is non-existent, so you may not be able to read the table below. Here is a pdf version.
| Software | Function | Value | % diff from Gnumeric |
|---|---|---|---|
| Gnumeric 1.6.3 | Normsdist(-5.5) | 1.89895624658877E-08 | 0.0000E+00 |
| Matlab 2006b | Normcdf(-5.5) | 1.89895624658877E-08 | 0.0000E+00 |
| Octave 3.0.1 | Normcdf(-5.5) | 1.89895624780334E-08 | 6.3960E-10 |
| Excel 2007 | Normsdist(-5.5) | 1.89895624780334E-08 | 6.3960E-10 |
| OO 2.4.1/3.0 | Normsdist(-5.5) | 1.89901047664698E-08 | 2.8558E-05 |
| Excel 2010 | Normsdist(-5.5) | 1.89895624658877E-08 | -5.2272E-16 |
| R 2.15.1 | pnorm(-5.5) | 1.89895624658877E-08 | 0.0000E+00 |
| Octave 3.6.1 | Normcdf(-5.5) | 1.89895624658877E-08 | 0.0000E+00 |
| Libreoffice 3.5.4 | Normsdist(-5.5) | 1.89895624658900E-08 | 1.2110E-13 |
Subscribe to:
Comments (Atom)