A comprehensive guide to setup an R development environment for Ubuntu running in the Windows Subsystem for Linux (WSL)
Updates (2022-11-04)
This post is now a few years old, thus some of the specifics are outdated. It installs Ubuntu Bionic (18.04) on Windows via WSL1. The general strategy below is still valid, but I recommend you install a more recent version of Ubuntu via WSL2 (which is now the default)
If you previously installed Ubuntu via WSL1, you can upgrade the existing installation to WSL2
This post describes how to run RStudio Server in the WSL and access it from your browser running in Windows. However, Tom Palmer taught me about an alternative option. You can install a graphical desktop environment in WSL and then run RStudio Desktop directly on Ubuntu. This requires some extra setup, but if you want a more integrated Ubuntu experience, the effort will be worth it. See Tom’s Issue for more details
The Windows Subsystem for Linux (WSL) is really impressive. It provides a convenient and smooth experience for running Ubuntu (and other Linux distributions) from Windows because the Linux executables are able to run natively on the Windows machine. In the past, I’ve tried running Ubuntu in a virtual machine, but it was painfully slow, so much so that I never really used it for real work.1 I’ve also dual-booted multiple machines, but I always found reformatting the hard drive to be a nerve-wracking and time-consuming experience. And while I productively used a dual-boot setup for years, I essentially only used Ubuntu during that time. Only in the rare occasion that I had to use Adobe Illustrator would I boot into Windows. Finally with the WSL, using both Windows and Ubuntu is realistic.
I’ve experimented with the WSL since early 2018 after David Smith provided instructions for installing R in the WSL and Jeroen Ooms announced that you can host RStudio Server and OpenCPU apps from the WSL. However, the most I’ve used it for real work was for speed benchmarking my R package workflowr.2 However, for one of my current freelance jobs, I was provided a Windows laptop. With RStudio and Git Bash, I’ve been productively developing on Windows for the past couple of months. But it has been noticeably slower. Recently I had some code that was taking minutes to run on Windows, so I tried it on the WSL, and it ran it in seconds. This finally convinced me that I should really switch over to primarily using the WSL.
In this post, I’ll describe how to install the WSL, configure it for R development, and share other tips and tricks.
The easiest method for installing the WSL is to use the Windows Store. Follow these steps:
On my work laptop, IT blocked the Microsoft Store app. And when I tried the Microsoft Store via the browser, the “Get” link failed (in multiple browsers, so I assume IT somehow also blocked this). If you find yourself in a similar situation, or if you simply don’t want to create a Microsoft Account, you can perform all the steps above from a PowerShell terminal.
Enable the WSL feature and restart your computer (source).
Enable-WindowsOptionalFeature -Online -FeatureName Microsoft-Windows-Subsystem-Linux
Download Ubuntu (source).
Invoke-WebRequest -Uri https://aka.ms/wsl-ubuntu-1804 -OutFile Ubuntu.appx -UseBasicParsing
Install Ubuntu (source).
Add-AppxPackage .\Ubuntu.appx
Open Ubuntu and configure your username and password, as described above.
You can install software using the package manager
APT. The very first thing you
should do after installing Ubuntu is to upgrade the software. You’ll need to
enter your Ubuntu password to run commands as the root user (sudo
).
# Check for latest package versions
sudo apt update
# Upgrade currently installed packages
sudo apt upgrade
# Install emacs and unzip. Enter "y" to accept
sudo apt install emacs unzip
It’s a good habit to regularly update the software by running sudo apt update && sudo apt upgrade
.
The software versions available from the default sources used by APT eventually become outdated if you don’t regularly upgrade Ubuntu. Ubuntu 18.04 was released in April 2018, so this is definitely a potential issue. The solution to this is to install more recent versions from Personal Package Archives (PPAs).
For example, the versions of Git available in the default channels are 2.17.0 and 2.17.1.3
$ apt list -a git
Listing... Done
git/bionic-updates,bionic-security,now 1:2.17.1-1ubuntu0.5 amd64 [installed,automatic]
git/bionic 1:2.17.0-1ubuntu1 amd64
Fortunately, the Git developers provide a PPA to install the latest stable upstream version, which can be added using the command below:
sudo add-apt-repository ppa:git-core/ppa
And now Git 2.26.0 is available for download:
$ apt list -a git
Listing... Done
git/bionic 1:2.26.0-1~ppa1~ubuntu18.04.1 amd64 [upgradable from: 1:2.17.1-1ubuntu0.5]
git/bionic-updates,bionic-security,now 1:2.17.1-1ubuntu0.5 amd64 [installed,upgradable to: 1:2.26.0-1~ppa1~ubuntu18.04.1]
git/bionic 1:2.17.0-1ubuntu1 amd64
Running sudo apt upgrade
will replace Git 2.17.1 with 2.26.0.
And since this is a new environment, don’t forget to set your user.name
and
user.email
to sign your Git commits:
git config --global user.name "Your Name"
git config --global user.email "email@domain"
R is available for installation with APT, but similar to Git, it is outdated (R 3.4.4 was released in March 2018):
$ apt list r-base
Listing... Done
r-base/bionic 3.4.4-1ubuntu1 all
The official CRAN Ubuntu installation
instructions describe
how to manually edit the file /etc/apt/sources.list
, which lists the sources
for APT to check for available software. However, since I am recommending the
use of PPAs, I think it is easier to directly add Michael Rutter’s RRutter v3.5
PPA.4 This is the source PPA that is synced to CRAN,
so it is identical to the official CRAN source.
sudo add-apt-repository ppa:marutter/rrutter3.5
Instead of manually editing /etc/apt/sources.list
, the command above added the
new source in a separate file in the directory /etc/apt/sources.list.d/
. Note
that the Git PPA is also there. For me, this makes it easier to manage PPAs. You
can quickly view which PPAs you are using, and you know which file to edit if
you need to update or disable the PPA.
$ ls /etc/apt/sources.list.d/
git-core-ubuntu-ppa-bionic.list git-core-ubuntu-ppa-bionic.list.save marutter-ubuntu-rrutter3_5-bionic.list
Now R 3.6.3 is available to install:
$ apt list -a r-base
Listing... Done
r-base/bionic 3.6.3-1bionic all
r-base/bionic 3.4.4-1ubuntu1 all
Before you install R, an optional but advisable step is to register Michael Rutter’s signing key to ensure the binaries you are installing on your machine are actually the ones he uploaded to his PPA. Note you’ll need to have admin access to your Windows machine for this step.
sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys E298A3A825C0D65DFD57CBB651716619E084DAB9
If you have trouble connecting to the server and you use a VPN, try disconnecting from it. Also, the CRAN instructions have alternative commands to try. The below option worked for me on my work laptop:
gpg --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys E298A3A825C0D65DFD57CBB651716619E084DAB9
gpg -a --export E298A3A825C0D65DFD57CBB651716619E084DAB9 | sudo apt-key add -
Install r-base
(which includes r-base-core
, r-base-dev
, the documentation
files, and the set of recommended packages in r-recommended
).
sudo apt install r-base
To install R packages, you could rely on install.packages()
. But unlike
Windows and macOS, CRAN does not build binaries for any of the Linux
distributions. This means that every package is installed from source, so
packages like stringi will take minutes to install each time there is a new
release. Thus I try as much as possible to rely on the binaries built for
Ubuntu.5
There are almost 500 R package binaries available for Ubuntu.
$ apt list r-cran-* 2> /dev/null | wc -l
496
Unfortunately the RRutter v3.5 PPA (and its equivalent CRAN sync) added above only includes R itself and the dozen or so recommended packages. The remaining packages will be the version from when R 3.4.4. was released years ago, which isn’t very useful.
Luckily Michael Rutter maintains a second PPA, cran2deb4ubuntu_3.5, that builds binaries for all the packages listed in the CRAN Task Views (and their package dependencies), and these packages are regularly updated. It’s not immediate, but if you regularly upgrade your APT packages, you’ll have the latest version of any given R package within a few days of its release to CRAN. Add this PPA to access over 4000 R package binaries.
sudo add-apt-repository ppa:marutter/c2d4u3.5
$ apt list r-cran-* 2> /dev/null | wc -l
4525
This allows you to install the entire tidyverse suite of R packages plus their system dependencies in only a few minutes.
sudo apt install r-cran-tidyverse
If a package you want isn’t available when you try to install r-cran-foo
(note
that the R package should always be lowercase), then you can install it with
install.packages()
into a personal library directory.
Note that there is a downside to using the pre-built binary R packages. Since
they were already built, they don’t include any system dependencies that are
only required for compiling the package. For example, you can install and use
the RCurl package with sudo apt install r-cran-rcurl
, but still get an error
when running install.packages("RCurl")
. This is because RCurl requires the
system library libcurl4 to run,
but it additionally needs the system library
libcurl4-openssl-dev
to compile the package from source. Thus if you find yourself installing R
packages from source because they aren’t available from a PPA, you’ll want to
install some of the commonly required system libraries like
libcurl4-openssl-dev:
sudo apt install libcurl4-openssl-dev
For other system software you may need to install, check out the Dockerfile for the rhub Ubuntu image.
There are a decent number of Bioconductor packages available for Ubuntu, but they are all outdated.
$ apt list r-bioc-* 2> /dev/null | wc -l
84
Thus I recommend installing BiocManager with APT. It’s part of the c2d4u PPA, so it will be kept up-to-date.
sudo apt install r-cran-biocmanager
And then you can install individual Bioconductor packages as needed from within R:
BiocManager::install(c("GenomicRanges", "SummarizedExperiment"))
This will unfortunately take awhile since the packages are built from source.
If some packages fail to install, it is likely due to the new staged install
check
introduced in R 3.6. I got this error trying to install
Biobase, and others have
reported WSL-specific issues, e.g. for
Biobase
and
ps.
A workaround is to disable this check prior to installation by setting an
environment variable with Sys.setenv(R_INSTALL_STAGED = FALSE)
.
Unless you’re a devoted emacs or vi user, you probably aren’t overly excited about the prospect of doing all of your R development from a bash shell. You’re in luck! As I mentioned at the beginning of the post, Jeroen Ooms’ amazing OpenCPU project conveniently provides all the infrastructure to install and serve RStudio Server via the Apache web server.6
As you might have guessed, the first step is to add the OpenCPU PPA. Then you can install OpenCPU server, which also installs the Apache web server, and RStudio Server. Note that you will be prompted about email addresses. Since you are not setting up an email server, you can accept the default settings. Also, you’ll again need to have admin rights to install Apache and RStudio Server.
sudo add-apt-repository -y ppa:opencpu/opencpu-2.1
sudo apt install opencpu-server
sudo apt install rstudio-server
Note that the OpenCPU
PPA is versioned,
i.e. the 2.1
. Instead of updating the software in one PPA, he creates new ones
(thus each PPA is stable long-term). Therefore you might want to check back in
the future to see if there is a new PPA available.7
Now you should be able to navigate to http://localhost/rstudio/ with your
browser. This is the RStudio Server you are running in the Ubuntu WSL. Login
with your Ubuntu username and password, and then run sessionInfo()
to confirm.
RStudio Server is setup automatically when you first install it. However, the
next time you start your machine and open Ubuntu, it won’t be ready to use. Thus
I always create a file called start.sh
that contains the startup steps from
the OpenCPU Server
Manual:
#!/bin/bash
set -eux
# Enable OpenCPU for Apache
a2ensite opencpu
# Restart Apache
apachectl restart
# Start RStudio Server
rstudio-server start
You’ll need root access, so execute it with sudo bash start.sh
. Then you’ll
once again be able to access RStudio Server via your browser at
http://localhost/rstudio/. Importantly, the RStudio Server will continue running
even if you close the Ubuntu terminal. Thus the only time you need to re-run
start.sh
is after you shutdown your computer.
The above instructions work fine on my personal laptop running Windows 10 Home edition. However, I wasn’t able to use port 80 on my work laptop running Windows 10 Pro.
$ sudo bash start.sh
+ a2ensite opencpu
Site opencpu already enabled
+ apachectl restart
httpd not running, trying to start
(13)Permission denied: AH00072: make_sock: could not bind to address [::]:80
(13)Permission denied: AH00072: make_sock: could not bind to address 0.0.0.0:80
no listening sockets available, shutting down
AH00015: Unable to open logs
Action 'restart' failed.
The Apache error log may have more information.
If you also receive the error about not being able to bind to port 80, then follow the instructions in this Ask Ubuntu answer. The default port used by Apache is 80, which you can verify by viewing the ports configuration file.
$ cat /etc/apache2/ports.conf
# If you just change the port or add more ports here, you will likely also
# have to change the VirtualHost statement in
# /etc/apache2/sites-enabled/000-default.conf
Listen 80
<IfModule ssl_module>
Listen 443
</IfModule>
<IfModule mod_gnutls.c>
Listen 443
</IfModule>
# vim: syntax=apache ts=4 sw=4 sts=4 sr noet
To get around a blocked port 80, edit that file (sudo nano /etc/apache2/ports.conf
) so that Apache listens on any high port, e.g. Listen 8080
. After changing the port, you can access RStudio Server in the browser by
including the port in the URL, e.g. http://localhost:8080/rstudio/.
Note that you’ll still probably receive a warning like [core:warn] [pid 62] (92)Protocol not available: AH00076: Failed to enable APR_TCP_DEFER_ACCEPT
. You
can safely ignore this because its a known issue and it doesn’t affect Apache’s
ability to run (source).
Since we are discussing editing ports, it’s a good time to discuss security.
With this setup, you are running a web server on your local machine. Now this
shouldn’t be a big deal, since if your computer was open to outside network
traffic, you’d have way bigger problems
(source).
But if you are worried about any potential issue, you can be overly cautious and
restrict Apache to only listen to ports on your local machine, i.e. localhost
or the equivalent IP address 127.0.0.1
. In other words, edit
/etc/apache2/ports.conf
, replacing Listen 8080
and Listen 443
with Listen 127.0.0.1:8080
and Listen 127.0.0.1:443
(source).
To make it easy to clone, push, and pull repositories from online Git hosting
services, e.g. GitHub, you’ll want to create SSH keys on the WSL and register
them with the service. I recommend the instructions from
GitHub,
which I’ve modified below. Note that I added the flag -o
to use the newer,
more secure OpenSSH format for creating the key.
# Create new SSH keys
ssh-keygen -o -t rsa -b 4096 -C "Any message you would like to annotate the key"
# Start the SSH agent
eval "$(ssh-agent -s)"
# Add SSH key to SSH agent
ssh-add ~/.ssh/id_rsa
Once you’ve created the key pair, you can copy the public key to GitHub and/or the other services you use.
Unfortunately the experience with using SSH keys is not as seamless as it is
with a standard Ubuntu installation. If you use a passphrase with your SSH key,
you will be prompted to enter your passphrase every time. On standard Ubuntu,
you only have to enter it the first time you use your key after restarting your
computer. I found a workaround for this, but it is hacky, so please let me know
if you know of a better solution. First, add the line eval "$(ssh-agent -s)" >/dev/null
to ~/.bashrc
so that the agent is automatically started each time
you open a window. 8 Second, add the line
AddKeysToAgent yes
to the file ~/.ssh/config
. Now you will only have to
enter your passphrase once, though you will have to do it for every new window
you open.
Another nice feature of the WSL is that it is straightforward to access the
files on your Windows machine. Unlike a virtual machine or Docker container, you
don’t have to take any extra steps to configure this. From Ubuntu, you can
access your Windows filesystem mounted at /mnt/c/
. For example, a common use
case for me is to download a file from the internet to my Windows desktop, and
then copy it to the WSL for analysis.
cp /mnt/c/Users/demo/Desktop/file.txt .
However, transferring files between Ubuntu and Windows has its gotchas, so you’ll want to use this feature sparingly. For example, if you want to access your Git repos from Ubuntu and Windows, I’d recommend cloning them separately to Windows and Ubuntu, i.e. treating them as if they were on separate machines.
Some guidelines:
File paths are case-sensitive on Ubuntu, but not on Windows. This can create
some real headaches. If you are accessing Windows files from Ubuntu via
/mnt/c/
, the WSL performs some magic so that the names are
non-case-sensitive. But you really don’t to rely on that behavior. Also, if
you are creating files from Ubuntu on Windows, you can configure the
case-sensitivity of the filenames on a per-directory
basis.
But again, I’d recommend you avoid this complication by naming all your files
on both Ubuntu and Windows such that they don’t rely on case to be
distinguished.
Windows and Linux use different end of line characters to specify a
newline. This is the main reason I
recommended above that you should keep separate copies of Git repos, otherwise
you have configure your Git settings to handle different line
endings,
otherwise it will constantly look like all of your files have been edited.
Confusingly, it may appear that the line endings are fine when copying back
and forth because they display ok. I recommend running dos2unix
after
copying from Windows to Ubuntu, and unix2dos
before copying from Ubuntu to
Windows (you can install these tools with sudo apt install dos2unix
).
While you can, with caution, create or modify Windows files directly from
Ubuntu, never create or modify the Ubuntu files located in the AppData
directory from
Windows.
If you need to copy a file between the two, I recommend always doing this from
the Ubuntu terminal via /mnt/c/
. If you really want to edit Ubuntu files
from Windows, you can run explorer.exe .
in the Ubuntu terminal to launch
Windows File Explorer, or you can access them from PowerShell using the path
\\wsl$\{distro name}\
, e.g. ls \\wsl$\Ubuntu-18.04\home\demo\
(source).
Another option for developing your code from the WSL is to use Visual Studio Code. You can install the extension for remote development in the WSL.9 This extension allows you to open VS Code in Windows, but switch to developing as if you were in the WSL. You can access all your files, and if you open a terminal, it is a bash shell running in the WSL!
Steps:
Ctrl
+Shift
+P
to open the command palettewsl
and then choose Remote-WSL: New Window
(you’ll need admin rights
to approve this the first time)
Ctrl
+Shift
+`
) to run bash commands
directly in the WSL
Remote: Close Remote Connection
If you have any issues with Git, there is documentation for fixing line ending issues and sharing credentials between Windows and the WSL (but you shouldn’t need to worry about the latter if you created SSH keys above).
And if you find you really like VS Code, you can configure it for R development. To learn more, check out these blog posts from Kun Ren (one of the co-developers of the languageserver package), Miles McBain, and Jozef Hajnala.
If you only want to occasionally test your code on Ubuntu, you can open a terminal running bash in the WSL directly in RStudio.
Steps:
Tools
->Global Options...
->Terminal
New terminals open with:
to Bash (Windows Subsystem for Linux)
Alt
+Shift
+R
pwd
to confirm the terminal is running in the WSLR
to start the R console in the WSL terminalCtrl
+Alt
+Enter
10If you are familiar with the behavior of the terminal window on Linux and macOS, you will find the Windows terminal to be very strange. However, if you’ve used Git Bash before, then you will already know how it works.
To copy text from the terminal, highlight the text and press Enter
.
Alternatively, you can right-click the Ubuntu icon in the top left of the
window, select Edit
, and finally select Copy
. Now you can paste into another
window with Ctrl
+V
.
To paste text into the terminal, right click with the mouse. Alternatively, you
can use the menu: Edit
->Paste
. You can also try Ctrl
+V
or the Insert
key to see if either of those happen to work on your computer.
Technically everything I have said above applies to WSL1. The new WSL2 is a rewrite that changes the technical implementation of running Linux on Windows (it uses a lightweight VM to run the Linux kernel), with the goal of faster file access. From the user perspective, you shouldn’t have to worry much if you want to try out the WSL2. You should still be able to access RStudio Server at http://localhost/rstudio/ (source). And you can switch back and forth between WSL1 and WSL2, or run them both. However, it is still in development, so you may have to do some more troubleshooting. For example, the VS Code Remote-WSL extension explicitly states that its support for the WSL2 is experimental.
If you’re feeling ambitious, you can install the WSL2. Also, see this post for using the WSL 2 with Visual Studio Code. Note that you’ll need a Windows account to be able to access this new feature via the Windows Insiders Program.
After following these steps, hopefully you are now able to productively develop your R code on Ubuntu running via the WSL. Please let me know if you ran into any issues or if a critical step is missing. Also, please let me know if you have any tips and tricks to share, especially if you have a better strategy for activating the SSH agent.
Not to mention that sharing host files with the VM requires additional setup.↩
For whatever reason, read/write operations and filepath manipulations, e.g. converting an absolute path to a relative path, take a lot longer on Windows 10. On the WSL, which has access to the same computational resources on my laptop as Windows, workflowr functions run a lot faster.↩
Note that bionic refers to Bionic Beaver, the release name for Ubuntu 18.04.↩
Don’t be concerned about the 3.5. There was a big change between versions 3.4 and 3.5 in how R was packaged for Debian/Ubuntu. We’re far enough away from that event that you shouldn’t have to worry about it. Just remember that for the latest version of R, you want the 3.5 version.↩
See this recent blog post from Jumping Rivers on various strategies for Faster R package installation.↩
Serving RStudio Server is a very limited use case for OpenCPU. It’s primary purpose is for converting R code into web applications. This post is already too long to include a detailed description, but you should check out the example apps to get a sense of the possibilities.↩
Also note that his PPA contains r-base and the r-recommended packages. These will eventually become outdated, but since you’ve added the RRutter v3.5 PPA, you’ll continue to get the latest version of R from there. Thus you are only relying on the OpenCPU PPA for OpenCPU server, Apache, and RStudio Server.↩
I got this idea from this blog
post.
However, I don’t agree with all of his conclusions. He says that the ssh-agent
doesn’t persist. When I run ps -ef | grep ssh-agent
, it clearly persists
across sessions. But for whatever reason it doesn’t work, and a new agent has to
be started. One reason I feel this is so hacky is that each time I open a
window, a new ssh-agent process is spawned.↩
There are similar extensions for remote development in Docker containers or in any remote machine accessible via SSH, e.g. an HPC cluster. I’ve used all 3, and they are amazing.↩
Unfortunately none of the other RStudio panes are aware of R running in the terminal. Plots won’t display, and the Environment pane won’t update. If you need these features, use the RStudio Server strategy described above.↩
If you see mistakes or want to suggest changes, please create an issue on the source repository.
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. Source code is available at https://github.com/jdblischak/blog.jdblischak.com, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Blischak (2020, March 31). John Blischak's blog: R programming in the Windows Subsystem for Linux. Retrieved from https://blog.jdblischak.com/posts/wsl-r/
BibTeX citation
@misc{blischak2020r, author = {Blischak, John}, title = {John Blischak's blog: R programming in the Windows Subsystem for Linux}, url = {https://blog.jdblischak.com/posts/wsl-r/}, year = {2020} }