R Tips and Links

Links     The Shiny package     Tips for working in R     R Programming Tips     Problems and Solutions     Useful Functions    


Helpful R Links

  1. 2014 MSP Orientation
  2. My R Class Notes    (Advanced)
  3. Official R Site Search; RSeek (Google-type search for R related material); Inside R (documentation for base R and packages)
  4. Documentation (incl. Download) (Hint: Try creating a bookmark to "C:\Program Files\R\rw2001\doc\html\rwin.html" in Windows; substitute your most recent version for "rw2001"; the link to "Search Engine and Keywords" is most helpful.)
  5. Packages; Crantastic Package Page; R Example Graph Library
  6. FAQ; R-help for asking questions; Bug reporting
  7. R Inferno: problems and solutions
  8. R Studio: an integrated development environment for R
  9. R color chart (pdf)
  10. Intro to R
  11. R Language Definition
  12. Operator Precedence
  13. R Data Import/Export
  14. R Reference Guide (pdf, 450 pages, 12MB)
  15. Evaluating the design of the R language (pdf)
  16. Shalizi's guide to R Markdown
  17. Simple R Markdown figure labels
  18. cache_with_help.zip is an R markdown template that sets good cache settings, includes simple R markdown figure labels, and has markdown help at the bottom. To use it, unzip it in the same folder as "github_document". Then choose Template after File / New Markdown in RStudio.
  19. aRrgh: a newcomer's (angry) guide to R (gripes from the CS community)
  20. R inferno (traps and tips)
  21. Advanced R Programming
  22. Vectorization in R
  23. R Data Import/Export
  24. Big Data with R (and Python)
  25. A good proposed Programming Style Guide
  26. Areas of statistics: CRAN Task Views
  27. Lumley's R Fundamentals (pdf)
  28. R Reference Card
  29. Peng's Debugging in R (pdf)
  30. R Programming Resource Center
  31. Mathematical annotation in plots ("plotmath")
  32. R-News
  33. R wiki
  34. Spherula (R quick reference, scripts, book notes, etc.)
  35. OmegaHat: interfaces to other languages
  36. Package gdata has function read.xls() which can read Excel files. An alternative is the RODBC package.
  37. Exchanging data between R and MS Windows apps (Excel, etc)
  38. Rtools needed for building packages (choose "Download R for" and an operating system)
  39. Nabble R forum
  40. Kickstarting R
  41. Using R for psychological research (personality-project)
  42. York U R tips
  43. Books
  44. R Overview and Links
  45. Theresa Scott's tutorial
  46. O'Reilly free tutorial
  47. R for Cats tutorial
  48. Online courses in R from Statistics.com (expensive)
  49. Missing data blog page
  50. CUNY R Tutorial
  51. Zoonek R Tutorial
  52. Tips for Creating, Modifying, and Checking Data Frames
  53. Practical Regression and Anova using R (pdf)
  54. Non-Parametric Inference with R by Larry and Chad (pdf)
  55. Example of running repeated measures in R to match SPSS (etc)
  56. nlme() mixed models guide (pdf)

"Shiny": a package for creating interactive R applications with a web browser interface

  1. Try it: To try shiny, just click on one of the links given here. Note that these apps run on a remote server. Normally you will develop, and perhaps deploy, your shiny apps from within your R session. Also note that I have turned on the optional "showcase" mode so that you can see the code below each app.
  2. Official Homepage: The Shiny website
  3. Function Reference: Shiny functions
  4. Official Tutorial: The shiny tutorial
  5. Advanced Tips: Attali's Tips
  6. Advanced Videos: Developer Conference
  7. ShinyTex: shinyTex
  8. Getting started:
    1. Start R in any directory
    2. Install shiny using install.packages("shiny") if it has not previously been installed.
    3. Run library("shiny") once per R session (or place this command in the .First() function).
    4. Place files named ui.R and server.R in the directory (or in another directory, e.g., named "foo"). You can create these files from scratch, or you may want to start with the files linked here, which implement a simple histogram vs. boxplot app.
    5. From R in your working directory, run runApp() (or runApp("foo") if the ui.R and server.R files were placed in another directory).
    6. Interact with the app in the supplied test window. In this window you can click "Open in Browser" to open the app in your default browser. Certain functions, such as downloading a file using downloadButton() work only in a browser, and not in the test window.
    7. Make whatever additions / changes you need to either file to change the app so that it does what you want it to do (see the official tutorial and Inside R for details).
    8. Note that you can change the ui.R and/or server.R files while the app is running, save the changes, and click "Reload" in your browser to see the effects of the changes without quitting the app. (Check the R console for possible error messages.)
    9. To quit the app, use the "escape" key in the R console window or close the test window or browser window.
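
The starter files linked in step 4 are not reproduced on this page, so here is a minimal sketch of what a histogram vs. boxplot app might look like. It is written as a single script using shinyApp() rather than as separate ui.R and server.R files, and the input names (plotType, n) and the random data are illustrative assumptions, not the original files.

```r
library(shiny)

# User interface: a plot-type chooser and a sample-size slider (hypothetical inputs)
ui <- fluidPage(
  titlePanel("Histogram vs. boxplot"),
  sidebarLayout(
    sidebarPanel(
      radioButtons("plotType", "Plot type:", c("Histogram", "Boxplot")),
      sliderInput("n", "Sample size:", min = 10, max = 500, value = 100)
    ),
    mainPanel(plotOutput("mainPlot"))
  )
)

# Server logic: redraw the plot whenever an input changes
server <- function(input, output) {
  output$mainPlot <- renderPlot({
    x <- rnorm(input$n)
    if (input$plotType == "Histogram") {
      hist(x, main = "Histogram of x")
    } else {
      boxplot(x, main = "Boxplot of x")
    }
  })
}

# Build the app object; launch it only in an interactive session
app <- shinyApp(ui = ui, server = server)
if (interactive()) runApp(app)
```
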
  9. Showcase mode: If you want your users to learn about how you made your app, you can turn on showcase mode. The simplest way to do this is to place a DESCRIPTION plain text file in the same directory as your ui.R and server.R files. Create your version by editing this example of a DESCRIPTION file. If you want additional text about the app to show along with your ui.R and server.R files, create a plain text file called Readme.md. Here is an example that demonstrates headers and spacing. Full formatting information for this file type is given at Wikipedia Markdown, but currently not all markdown features are implemented in Shiny.
  10. Five ways to deploy your shiny app:
    1. Distribute your ui.R and server.R files to users. They only need to put them in a directory, load shiny in R (library("shiny")), and run an R command like runApp() or runApp("myShinyDirectory").
    2. Put your ui.R and server.R files in a github gist. Then users only need to load shiny and run an R command like runGist("myFirstShinyApp")
    3. Put your ui.R and server.R files in a zip file (as a subdirectory) and put the zip file on your website. Then users only need to load shiny and run an R command like runUrl("myFirstShinyApp.zip")
    4. Let RStudio host your app on their server. You register with "shinyapps", load the "shinyapps" package in R, and run deployApp() to upload your app to the RStudio servers. After some processing, the function tells you the URL that corresponds to your app. Then your users only need to enter the URL in their browser.
    5. Set up your own server (advanced). Your users only need to enter the URL in their browser.
  11. In development: locator() equivalent
  12. Unless a Shiny app isolates its text inputs and uses an "action button" to trigger any response to the input text, it will initiate the response roughly after every other letter typed. A good compromise would be to change text inputs so that they "invalidate" (initiate a response) only when Enter or Tab is pressed. Based on information from ZJ in the Google Groups Shiny discussion forum, I suggest this solution. (The main disadvantage is that it must be manually redone whenever the Shiny package is updated.)

    The core of the problem is that "shiny.js" sets the "key-up" event to trigger a response to text input. So my suggestion is to remove that bit of JavaScript code from the shiny library file. Actually, the file "shiny.js" is a human-readable version of the code, but the real code is "minified" in "shiny.min.js", and that is what must be fixed.

    In addition to just searching for the file, here is a way to find it. In R, run myPaths = .libPaths() followed by print(myPaths) to see the one or several places that packages live on your computer. Then run list.files(myPaths[1]), etc. until you find the "shiny" package. Within that shiny folder, you will find "shiny.min.js" in the shared subfolder of the "www" subfolder.

    For safety's sake, first copy the file to a backup name. Then use a plain text editor to open the file and change
    subscribe:function(a,b){$(a).on("keyup.textInputBinding input.textInputBinding",function(a){b(!0)}),$(a).on("change.textInputBinding",function(a){b(!1)})}
    
    to
    subscribe:function(a,b){$(a).on("change.textInputBinding",function(a){b(!1)})}
    
    Save the file and the behavior of textInput will now be fixed.
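
If you would rather not edit the package's JavaScript, the approach mentioned at the start of this item (isolate the text input and respond only to an action button) can be sketched as follows. The input and output names (txt, go, echo) are illustrative assumptions.

```r
library(shiny)

ui <- fluidPage(
  textInput("txt", "Type something:"),
  actionButton("go", "Submit"),
  verbatimTextOutput("echo")
)

server <- function(input, output) {
  output$echo <- renderText({
    input$go             # take a reactive dependency on the button only
    isolate(input$txt)   # read the text without reacting to each keystroke
  })
}

# Build the app object; launch it only in an interactive session
app <- shinyApp(ui = ui, server = server)
if (interactive()) runApp(app)
```

With this arrangement the output is refreshed when the button is clicked, not while the user is typing.
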

Tips for Working in R

  1. Use help.start() to bring up help in a browser; the link to "Search Engine and Keywords" is most useful.
  2. When looking at complex expressions, decode them by working from the inside out. E.g., here is a decomposition of some code to make a density plot of the product of two normal random variables using a sample of size 20. (Note the optional use of "tmp" to keep the lines so that they all use the same random numbers.)
         plot(density(apply(matrix(rnorm(40),20), 1, prod)))
         tmp = rnorm(40)   # a vector of 40 standard normal variates
         tmp
         matrix(tmp,20)  # put into a matrix of 20 rows and 40/20=2 columns
         apply(matrix(tmp,20), 1, prod)  # the 20 products
         density(apply(matrix(tmp,20), 1, prod)) # the density estimate
         plot(density(apply(matrix(tmp,20), 1, prod))) # the plot
        
  3. Keep a text record of all working R commands needed to re-run your analysis. Ideally you should be able to source() the file and recreate your work, e.g. if your client finds an error in the data (which happens 98% of the time according to Seltman's Law of Data Analysis).
  4. Under Linux, "ESS" (Emacs Speaks S) is usually the most efficient way to work. Briefly, you start emacs, then use "Alt-X R" to start R from within emacs. The ESS menu (with keyboard shortcuts) allows you to automatically run code that you write, among other features. The home page is ESS.
  5. Write out TRUE and FALSE, because T and F can be redefined.
  6. When defining and redefining columns of a data.frame, make liberal use of summary() and table(..., exclude=NULL) to verify that you accomplished what you tried to accomplish.
  7. Important: Remember that table() ignores missing data; use table(..., exclude=NULL) to also see missing data.
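
A small illustration of the difference, using a made-up vector with one missing value:

```r
x <- c("a", "b", NA, "a")

tab_default <- table(x)                  # the NA is silently dropped
tab_all     <- table(x, exclude = NULL)  # the NA gets its own cell

print(tab_default)
print(tab_all)
```

The counts in the first table add up to 3 and those in the second to 4, which is a quick way to notice that something was missing.
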
  8. You can use the .First function to automatically load libraries that you frequently use, or to perform other startup tasks. E.g.
     .First = function() {library(nlme); options(locatorBell=FALSE)} 
    will load the nlme library every time R starts up in the current directory. It also turns off the annoying sound associated with the locator() function.
  9. Use this function to find large, unneeded objects that can be removed to free up space:
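         sizes = function() {
           env = parent.frame()  # the environment that sizes() was called from
           ob = objects(name=env)
           # look each object up in that same environment, not in the search path
           rslt = sapply(ob, function(x) {object.size(get(x, envir=env))})
           return(sort(rslt))
         }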
         
  10. Contrast testing in R (using C() or contrasts()) ignores your scaling, so although the t-values and p-values are correct, the estimates and standard errors (and any confidence intervals you construct from them) are incorrect. To do this correctly, use fit.contrast() in package "gmodels". E.g.
         x = factor(rep(LETTERS[1:3], each=20)); y = rnorm(60)
         m1 = aov(y~x)
         library(gmodels)
         cont = rbind(AvsBC = c(1, -1/2, -1/2), BvsC = c(0, 1, -1))
         fit.contrast(m1, "x", cont, conf.int=0.95)
         
  11. If you are working on a public computer without write access to where most of R lives, you can still install packages to a private space (Windows example shown here, but it is similar on Linux). Make a directory you can write to, e.g., c:\\myPackages. In R, to install, e.g., package "mice" use
         install.packages("mice", "c:\\myPackages")
         
    Then each R session use
         library(mice, lib.loc="c:\\myPackages")
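
As a possible alternative to passing lib.loc in every session, you can add the private directory to R's library search path with .libPaths(); install.packages() then installs into the first entry by default, and library() searches all entries. This sketch uses a temporary directory as a stand-in for c:\\myPackages.

```r
# Stand-in for a private, writable package directory such as "c:\\myPackages"
myLib <- file.path(tempdir(), "myPackages")
dir.create(myLib, showWarnings = FALSE)

# Put the private directory at the front of the library search path
.libPaths(c(myLib, .libPaths()))
print(.libPaths())

# For the rest of this session, these would use the private directory first:
# install.packages("mice")
# library(mice)
```
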
         
  12. A system for documenting data analysis projects:

    Here is an idea for making R code that stores comments and results in a separate, readable file. This is especially nice when you might need to re-source() your code due to changes in the data or analysis (i.e., essentially always). Optionally, you can run reportLatex() after all of your report() commands to create a .tex file that is formatted better and incorporates graphical output (see below).

    (An alternative is Sweave. Unlike Sweave, report does not require you to understand LaTeX, and it has only a single command to learn.)

    The code and more documentation are at report.R.

    Put these two lines near the top of your code:

         if (!exists("report")) source("http://www.stat.cmu.edu/~hseltman/files/report.R")
         report("Start of my report on project X", new=TRUE, prefix="myProjectX")
         
    This creates a file named "myProjectXYYYY-MM-DD.txt" with the quoted string in the first argument as the text at the top of the file. You can include "\n" in the first argument to write multiple lines in one call. You can optionally add the argument useTime=TRUE to include the time of creation along with the date in the file name if you want to keep multiple versions from the same day.

    Note that the variable "reportFileName" is created in your global environment and you should not delete this variable, at least while you are working on any one report.

    Now anytime in your code, you can include code of the form

     report(x) 
    or
     report(x, ..., z)
    to cause the value of x (or all of the variables x through z) to go to both the screen and the report file. This constructs the report on-the-fly as you work through your analysis. (If multiple arguments are used with report() and they are all strings or single numbers, then they are pasted together without any spaces between them (i.e., using sep="")).

    Note that you can manually erase errors from the report file using a text editor.

    Note that with a little planning, you will be in the situation such that if you re-source() your whole .R file, e.g., after correcting an error in the data or analysis, you will end up with a brand new, complete, readable report of the entire analysis with no effort.

    Note that the screen width affects the output by controlling the usual R text wrapping, e.g., with table(). Normally, you will want to keep the screen width around 60-70 characters to make it easier to read the report.

    Note that whenever you run report(x, new=TRUE, ...), if the report file name matches an existing file, the old file is deleted.

    Here are some examples that demonstrate what you can do:

         report("\nDemographics")
         report(table(age, gender))
         report(paste("Number of visits =", nrow(dat)))
         report("\nSuccess by treatment")
         report(with(dat, table(success, treatment, exclude=NULL)))
         report("\nYears of education:")
         report(summary(demog$educ))
         report(paste("\nDropping", sum(noVisits|oneVisit), 
                      "subjects with no CERAD's or only 1 visit"))
         report(expression(str(my.data.frame)))
         
    The last example uses "expression()" because the "str()" function breaks the usual R rules and uses "cat()" rather than returning its result as an object. The "stem()" function is another example.
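
To see why str() needs special handling, note that it prints its output and returns NULL invisibly, so there is no result object to pass along. In plain R (outside of report()), capture.output() is one way to collect such printed text as a character vector. This sketch is independent of report.R, and the data.frame is made up.

```r
d <- data.frame(id = 1:3, grp = c("a", "b", "a"))

rslt <- str(d)                 # prints the structure; the return value is NULL
print(is.null(rslt))

txt <- capture.output(str(d))  # the printed lines, as a character vector
print(txt)
```
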

    There are three helper functions in report.R.

    1. matForm(x, cols=12) converts a vector (string, numeric or factor) into a string matrix with a specific number of columns (even if length(x)%%cols!=0), so that long vectors don't ruin the appearance of the report.
    2. total(tab, margins=1:2) adds totals to the result of table()
    3. pct(tab, margins=1:2) adds percents to the result of table()

    Note that pct() and total() can both be used on the same table, in either order. Each respects the results of the other to avoid the incorrect and/or confusing output that could result from, e.g., including data and their total when computing percents.

    Optionally, you can use reportLatex() (code and description in reportLatex.R) to convert your .txt file into a .tex (Latex) file. This can incorporate plots as follows: when you are going through your analysis use the report text "See ... in myPlotFile.pdf", e.g.,

         plot(rnorm(20), type="b", main="Random normals", xlab="time", ylab="x")
         fname = "rnorm.pdf"
         dev.copy(pdf, fname); dev.off()
         # Important: Be sure to put a blank between "in" and the end quote
         #            since  sep="" will be in effect.
         report("\nSee 20 Gaussians in ", fname)
         
    With or without these special graphics commands, when you run reportLatex() a .tex file is created with the same base name as your .txt report file. Note that you can manually edit the .tex file at this point if desired.

    You then process this .tex file with pdflatex myReportFile.tex in Linux (or however else you know to process Latex files on any operating system) to produce the .pdf report file.

    If you used the special graphics indicator text "See ... in someFileName.pdf", then the plots will be included in the report, and the caption of the figure will be the text between "See" and "in". Also the caption will include figure numbers starting at "Figure 1".

    If you prefer to use a graphics file type other than "pdf" (as long as it is compatible with whatever version of pdflatex or latex you are using), just run the optional form, e.g., reportLatex(extension=".pdf"), substituting your graphics extension for "pdf".

Tips for Programming in R

  1. End each function with return() or invisible() rather than using implicit returns. This conforms to standard programming practice in most other languages and makes your program easier to read.

  2. Start each function with checks of the arguments. It takes a little extra time but will usually repay you (or other users of the function) by pointing out the source of errors. Here is an example:
         myfun = function(dtf, name, p=0.5) {
           if (is.matrix(dtf)) dtf = data.frame(dtf)
           if (!is.data.frame(dtf)) stop("dtf must be a data.frame or matrix")
           if (!is.character(name) || length(name)!=1) stop("name must be a single character string")
           if (p<=0 || p>=1) stop("p must be in the interval (0,1)")
           ...
           return(rslt)
         }
         
  3. Allow for stopping and restarting of functions with long loops (e.g., MCMC).
    A good trick is to set up your function (or even just a loop) as follows:
         myfun = function() {
           if (file.exists("myresults.dat")) {
              ...load and use old results...
           }
           ...
           for (i in 1:10000) {
             if (file.exists("stop")) {
               write.table(myresults, file="myresults.dat")
               stop("Early stop due to detection of stop file")
             }
             ...
           }
           ...
           return(...)
         }
         
    Then, you can create a file called "stop" at any time (e.g., in Linux using "echo stop>stop" at the Linux prompt) and the function will gracefully stop at the start of the next loop iteration. Without too much work, you can probably set up your function to automatically continue wherever you left off. Just remember to delete or rename the "stop" file before running the function again.
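
Here is one runnable sketch of the stop-and-resume pattern, including the automatic restart. It departs from the outline above in two labeled ways: it saves state with saveRDS() rather than write.table(), and it leaves the loop with return() rather than stop(). The function name runLoop, its arguments, and the running-total "computation" are all hypothetical stand-ins.

```r
# Hypothetical long-running loop that can be stopped and resumed
runLoop <- function(nIter, resultsFile, stopFile) {
  # Resume from saved state if it exists; otherwise start fresh
  if (file.exists(resultsFile)) {
    state <- readRDS(resultsFile)
  } else {
    state <- list(i = 0, total = 0)
  }
  while (state$i < nIter) {
    # Check for the stop file at the start of each iteration
    if (file.exists(stopFile)) {
      saveRDS(state, resultsFile)
      message("Early stop after iteration ", state$i, "; state saved")
      return(invisible(state))
    }
    state$i <- state$i + 1
    state$total <- state$total + state$i  # stand-in for the real computation
  }
  saveRDS(state, resultsFile)
  return(state)
}

resultsFile <- tempfile(fileext = ".rds")
stopFile <- tempfile()

file.create(stopFile)                            # simulates "echo stop>stop"
partial <- runLoop(100, resultsFile, stopFile)   # stops at once; state is saved
file.remove(stopFile)                            # delete the stop file first
final <- runLoop(100, resultsFile, stopFile)     # continues from the saved state
print(final$total)
```
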

  4. Avoid using "attach" as a way to save typing. The major problem is that modification of old elements or creation of new ones is not saved when you quit (and "save workspace") R. This leads to insidious errors. One alternative is to use with(), e.g., something like:
         with(mydtf, plot(x, y, col=gender))
         
    where the columns of "mydtf" are x, y, and gender.
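
The pitfall with attach() can be seen in a few lines. In this sketch (with a made-up data.frame), the assignment creates a separate copy of x in the workspace and leaves the data.frame untouched:

```r
mydtf <- data.frame(x = 1:3, y = c(2, 4, 6))

attach(mydtf)
x <- x * 10        # creates a new x in the workspace; mydtf$x is NOT changed
detach(mydtf)

print(mydtf$x)     # still the original values
print(x)           # the modified copy lives outside the data.frame

# with() evaluates an expression using the columns and leaves nothing attached
ratioMean <- with(mydtf, mean(y / x))
print(ratioMean)
```
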

  5. Working with "non-visible functions": If you try, e.g., methods(logLik), you will find some methods (e.g., logLik.glm) that are marked with an asterisk and are "non-visible". Here is how to get a copy of those functions. Use getAnywhere(logLik.glm) to find that it is in the "namespace" called "stats". Then mylogLik.glm=stats:::logLik.glm will get you a copy of the function.
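
The steps just described look like this in practice. The last line shows getFromNamespace() from the utils package, which is an equivalent way to fetch the function without writing ":::"; it is an addition to the steps above.

```r
print(methods(logLik))                 # non-visible methods are marked with *

found <- getAnywhere("logLik.glm")     # reports where the function lives
print(found$where)

mylogLik.glm <- stats:::logLik.glm     # copy it using the ::: operator
mylogLik2 <- getFromNamespace("logLik.glm", "stats")  # equivalent lookup
```
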

  6. (Advanced) To make a nice user interface with dialog boxes, etc. consider the Tcl/Tk package. Here is a good introduction. Here is the R help. Here is a primer with an update. You might prefer a higher-level package called rpanel, described here and here, with this home page, and this package reference, and this cute little example which needs spacer.gif. Here are more R examples. Here are links about comparing tcl/tk to other systems. And here is a (non-R) Tcl/Tk Electronic Reference.

Problems and Solutions

  1. Problem: After reading data from a file into a data.frame (e.g., with read.csv()) the first column name has a weird "ï.." prefix.
        Solution: This is due to Windows adding a "byte order mark" to the beginning of the file. If you use, e.g., read.csv("myfile.csv", fileEncoding = "UTF-8-BOM"), the extra character will be removed, if necessary.
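
To try this without a real Windows-generated file, the following sketch writes a tiny CSV that begins with the UTF-8 byte order mark and then reads it back. The file contents are made up.

```r
# Write a small CSV that starts with the UTF-8 byte order mark (bytes EF BB BF)
f <- tempfile(fileext = ".csv")
bom <- as.raw(c(0xEF, 0xBB, 0xBF))
writeBin(c(bom, charToRaw("id,score\n1,90\n2,85\n")), f)

# Reading with the BOM-aware encoding gives clean column names
clean <- read.csv(f, fileEncoding = "UTF-8-BOM")
print(names(clean))
```
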
  2. Problem: Loading dates, e.g., from Excel, and working with dates is poorly documented.
        Solution: Load datetest.csv, then try the examples in Rdates.R.
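
The files datetest.csv and Rdates.R are not reproduced on this page. As a rough sketch of the usual base R tools (not the contents of those files), as.Date() with a format string converts text dates, and Excel for Windows serial day numbers can be converted using the origin "1899-12-30". The sample values are made up.

```r
# Text dates as they often arrive in a CSV file exported from Excel
txt <- c("5/15/2014", "12/1/2013")
d1 <- as.Date(txt, format = "%m/%d/%Y")
print(d1)

# The same two dates as Excel (Windows) serial day numbers
serial <- c(41774, 41609)
d2 <- as.Date(serial, origin = "1899-12-30")
print(d2)

# Date arithmetic and formatting
gap <- d1[1] - d1[2]            # a "difftime" in days
print(gap)
print(format(d1, "%Y-%m-%d"))
```
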
  3. Problem: Each click for the locator() annoyingly causes the bell to ring.
        Solution: options(locatorBell=FALSE)
  4. Problem: Create a new data.frame column that is a complex code based on old columns.
        Solution: Create a function for one subject and apply() it to all subjects. This is much more efficient than a for loop. E.g.
         myfun = function(x) {
           # Argument x should contain one row, columns a,b,e,f.
           # The result is the mean of a and f unless b is missing or not positive,
           # in which case the min of e and f is returned.
           if (is.na(x[2]) || x[2]<=0) {
             return(min(c(x[3],x[4])))
           } else {
             return((x[1]+x[4])/2)
           }
         }
         dtf$new = apply(dtf[,c("a","b","e","f")], 1, myfun)
         
    An alternative is as follows. The optional first line may prevent some wonky errors, and is good practice.
         dtf$new = NA  # in general, NA is safer than 0, protecting against bad logic
         Sel = is.na(dtf$b) | dtf$b<=0
         dtf[Sel, "new"] = pmin(dtf$e[Sel], dtf$f[Sel])
         Sel = !is.na(dtf$b) & dtf$b>0
         dtf[Sel, "new"] = (dtf$a[Sel]+dtf$f[Sel])/2
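
A third option, not among the two above, is a single vectorized ifelse() call. This sketch applies the same rule to a small made-up data.frame so the result can be checked by eye.

```r
dtf <- data.frame(a = c(1, 2, 3, 4),
                  b = c(5, NA, -1, 2),
                  e = c(10, 20, 30, 40),
                  f = c(7, 25, 9, 8))

# Min of e and f where b is missing or not positive; otherwise the mean of a and f
useMin <- is.na(dtf$b) | dtf$b <= 0
dtf$new <- ifelse(useMin, pmin(dtf$e, dtf$f), (dtf$a + dtf$f) / 2)
print(dtf)
```

Note that ifelse() evaluates both alternatives for every row and then picks one, which is fine here because both expressions are cheap and safe to compute everywhere.
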
         
  5. Problem: Analyze (all) subsets of a data.frame
       Solution: To analyze a single subset of a data.frame, you can use an index vector (logical or numeric) as the "row.selector" (first argument) of the form "incdata[row.selector, col.selector]". For example, the expression median(incdata[incdata$sex=="female", "income"]) calculates the median income of just the female subjects in data.frame "incdata".

    But writing a separate expression for each of several categories is awkward and inefficient, so the methods below present efficient alternatives. If no subsetting variable exists, consider using the R function cut() or Problem/Solution #2 to create it.

    Here is code you can paste into R to generate a sample data.frame to use as an example:

         n = 20
         incdata = data.frame(sex=c("male","female")[1+rbinom(n,1,0.5)],
                           race=c("black","white","hispanic","Asian")[1+rbinom(n,3,0.5)],
                           income=round(rnorm(n,50000,15000)),
                           networth=pmax(0,round(rnorm(n,50000,30000))))
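
The methods themselves are not reproduced on this page. As a sketch of standard base R tools for this job (not necessarily the ones the author had in mind), tapply(), aggregate(), and split() with sapply() each compute a summary for every subset at once. A small fixed data.frame with the same column names is used here, instead of the random one above, so that the results are reproducible.

```r
incdata <- data.frame(sex = c("male", "female", "female", "male", "female"),
                      race = c("white", "black", "white", "Asian", "white"),
                      income = c(40000, 52000, 61000, 45000, 58000))

# Median income for each sex
medBySex <- tapply(incdata$income, incdata$sex, median)
print(medBySex)

# Mean income for each sex-by-race combination that occurs in the data
meanByBoth <- aggregate(income ~ sex + race, data = incdata, FUN = mean)
print(meanByBoth)

# Any function of a whole sub-data.frame: split first, then apply
nBySex <- sapply(split(incdata, incdata$sex), nrow)
print(nBySex)
```
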
         
  6. Problem: Need a function like read.table(), but the data are already in a character variable
        dat = c("category size type", "abc 23.4 g17a72", "aaa 3.2 h19h33", "bar 17 z12z12")
        dtf = read.table(textConnection(dat), header=TRUE)
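
An equivalent and slightly shorter form uses the text argument of read.table(), which sets up the connection for you:

```r
dat <- c("category size type", "abc 23.4 g17a72", "aaa 3.2 h19h33", "bar 17 z12z12")

# text= takes the character vector directly; no explicit textConnection() needed
dtf <- read.table(text = dat, header = TRUE)
str(dtf)
```
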
      
  7. Problem: R is too slow

Useful Functions

Note: In R, use source("somefunction.R"), including the quotes, to make the functions in somefunction.R available.


All links active 5/15/2014. Please report missing links, errors, and suggestions to

