Preface
My programming background is mostly in interpreted languages, so much of what follows will be obvious to experienced C programmers. If you notice any errors please let me know. I share this in the hopes it may be helpful to others taking the leap into C from R.
Getting on CRAN
Over the past couple of years I started including C code in my R packages. At first just by adapting existing code for use in R, and more recently by writing substantial portions of my packages in C.
I know enough about C to realize I need to be more careful with code I write in C than with code I write in R. For my newest package I used covr
to ensure 100% coverage of the code, ran my tests on R-devel and several earlier versions of R, on Linux, OS X, Windows (with Uwe Ligges winbuilder), and even Solaris (with Gábor Csárdi’s r-hub). I used Tomas Kalibera’s rchk to check for protection errors. Finally I re-read Kevin Ushey’s great 2015 Post about using R and valgrind
, and ran my tests locally on OSX with valgrind
.
I convinced myself this would be good enough and submitted to R. A day goes by, and another, … and then my package just shows up on CRAN. Yes! Except shortly afterwards I find in my inbox the following e-mail from an R Core member know not to suffer fools gladly:
See fansi (my package) check results. valgrind is reporting the use of uninitialized strings.
Please correct ASAP and before April 20 to safely retain the package on CRAN.
Oh sh##. But I checked, I swear! I tested again with the same tarball, and again, I could not reproduce the error. Artist rendition of Brodie shortly after failing to reproduce bug:
Docker to The Rescue
Thankfully not long ago I had run across Winston Chang’s r-debug docker container (for a great introduction to using docker with R, see Jim Hester’s great post on docker containers).
So I sacrificed a bleating lamb to the bug-reproducibility gods and fired up r-debug
to run my tests:
docker run --rm -ti -v $(pwd):/mydir wch1/r-debug
RDvalgrind -e "install.packages('/mydir/fansi_0.2.1.tar.gz')"
RDvalgrind -d valgrind # and run tests
and got a bunch of errors, unlike with my OSX version of valgrind
. So a step in the right direction. But here is an example of the type of error I got:
==518== Conditional jump or move depends on uninitialised value(s)
==518== at 0x404DD28: ???
==518== by 0x17F0601C: ???
==518== by 0x17F0601C: ???
==518== by 0x17F0601D: ???
==518== by 0x1FFEFD151F: ???
At least valgrind
is as confused as I am… On CRAN, the errors looked like:
==8932== Conditional jump or move depends on uninitialised value(s)
==8932== at 0x403EBFB: ???
==8932== by 0x10D25C25: ???
==8932== by 0x10D25C25: ???
==8932== by 0x10D25DC7: ???
==8932== by 0x1FFEFE6A5F: ???
==8932== Uninitialised value was created by a heap allocation
==8932== at 0x4C2DB6B: malloc (m_replacemalloc/vg_replace_malloc.c:299)
==8932== by 0x511B31: Rf_allocVector3 (svn/R-devel/src/main/memory.c:2712)
==8932== by 0x4B73D0: Rf_mkCharLenCE (svn/R-devel/src/main/envir.c:3913)
==8932== by 0x1D6B5BE8: FANSI_strip (packages/tests-vg/fansi/src/strip.c:170)
... snip ...
==8932== by 0x4DA52F: Rf_eval (svn/R-devel/src/main/eval.c:624)
==8932==
A lot more information, and most importantly, highlights the particular line in my code potentially responsible for the problems:
==8932== by 0x1D6B5BE8: FANSI_strip (packages/tests-vg/fansi/src/strip.c:170)
At a loss I resorted to re-reading the WRE section on valgrind, and lo and behold:
valgrind is good at spotting the use of uninitialized values: use option –track-origins=yes to show where these originated from.
So I run my tests under:
RDvalgrind -d "valgrind --track-origins=yes"
And boom: I get the exact same error message as CRAN! Perhaps I lead a cloistered life and am easily excited, but there are few things I find as exhilarating as reproducing a hard to reproduce bug. Reproducing the bug in a controlled environment is the climax of the debugging story. The fix barely rates a mention in the epilogue.
What I Learned About Reproducing valgrind
Errors
- Use an R version that was built with level 2
valgrind
instrumentation. This is whatRDvalgrind
inr-debug
has, and this is also part of the reason I could not reproduce the errors that showed up on CRAN. More details to follow. - Compile your package with the
-O0
flag (r-debug
sets this as a default option) to minimize false positive and to get the most useful information.-g
is also a good setting, but R sets that by default. - Use
RDvalgrind -d "valgrind --track-origins=yes"
as CRAN seems to do the same, and this provides useful context. - Use an R-devel version as close as possible to what CRAN is using.
- You can never re-read Writing R Extensions too many times.
Additionally, as part of the debugging process I ended up having to build R myself for more fine grained control over what version of R-devel I tested against. I had never done this before, and It turns out that Tomas Kalibera’s excellent rchck
vagrant image is a fantastic platform to build R in.
All I had to do was vagrant ssh
into his image and:
sudo apt-get install valgrind
svn checkout https://svn.r-project.org/R/trunk
cd ~/trunk
./config --with-valgrind-instrumentation=2
make
And then, after setting the “-O0” package compilation setting in ~/.R/Makevars
:
./bin/R -d "valgrind --track-origins=yes"
More on --with-valgrind-instrumentation=2
--with-valgrind-instrumentation=2
is the configure option that sets up R to build with valgrind
level 2 instrumentation. One thing this does is it allows valgrind
to detect memory errors in portions of memory that are wholly owned by R. For example, R maintains an internal heap for R_alloc
allocations of 128 bytes or fewer. Normally valgrind
would not detect memory errors occurring within this heap. Here is an example from the CRAN errors:
==518== Invalid read of size 1
==518== at 0x4EE519C: substr (character.c:286)
==518== by 0x4EE563D: do_substr (character.c:342)
==518== by 0x4F83444: bcEval (eval.c:6771)
==518== by 0x4F70257: Rf_eval (eval.c:624)
... snip ...
==518== by 0x4F70257: Rf_eval (eval.c:624)
==518== Address 0xc5cb7ef is 3,087 bytes inside a block of size 7,960 alloc'd
==518== at 0x4C2FB0F: malloc (in vgpreload_memcheck-amd64-linux.so)
==518== by 0x4FBF2B9: GetNewPage (memory.c:888)
==518== by 0x4FCD13C: Rf_allocVector3 (memory.c:2691)
==518== by 0x4FB2D9C: Rf_allocVector (Rinlinedfuns.h:577)
... snip ...
==518== by 0x4F4B1AA: duplicate1 (duplicate.c:312)
==518==
This is an error that you would not see without the instrumentation. The give-away is that the read is inside an allocated block. Usually valgrind
errors involve read/writes before, after, etc., an allocated block. The GetNewPage
call in the trace highlights this is related to R’s internal memory pages.
Thankfully I could tell from the trace that nothing in this error involved compiled code from my package…
OMG I think I found a bug in R.
It turns out that I triggered a bug in base::substr
with corner case UTF-8 tests that were part of my package.
Epilogue
None of the errors showing up on CRAN were actually caused by my package. Most of these errors except for the R bug went away with different R-devel builds. Details for the bored.
However, as a result of my new found understanding of these memory errors I went back to look at an old valgrind
error in another package of mine. I had ignored it because I did not understand it, I could not reproduce it, nothing was obviously blowing up, and no one was yelling at me about it. Sure enough, the error was real, and was able to fix before anyone else ran into it.
Acknowledgments
Figuring all this out would have been impossible without all the work others put in and freely shared. Thank you to:
- R core for writing WRE in as much detail as they have, and CRAN maintainers for running a tight ship.
- Winston Chang for the r-debug docker container.
- Gábor Csárdi for helping confirm my interpretation of the
valgrind
error messages. - Kevin Ushey for the blog post about valgrind in R.
- Tomas Kalibera for his rchk image.