Chapter 2 Introduction
2.1 What will this course be about?
This will be an introductory course on programming in R and biological data analysis. It is primarily directed at graduate students in the Life Sciences at the Weizmann Institute of Science, but it is open to auditors from neighboring institutions. There are no prerequisites for the course: we will build from the most basic analysis concepts and operations to more complex ones as the course progresses. We will not delve deeply into the mathematical guts of the tools and algorithms. Instead, we will focus on the general concepts, so that students can perform and understand widely used analyses such as PCA, hierarchical clustering, linear regression, and differential gene expression.
2.2 What is R and why learn R?
R is a programming language, and as such is at heart a way to give instructions to computers. In practice, R is mostly used as a statistical and data analysis environment, a virtual bench to work with data. R is also free software. It is licensed under the GNU General Public License, which means that anyone can use it to create programs, even for commercial purposes, without paying a fee or buying a usage license from the language's creators. R is widely used in quantitative and data-intensive research fields, as well as by corporate giants such as Google and Microsoft.
R has become quite a popular language. At the time of writing, R ranks as the 7th most popular language in the world and the 4th in the US according to the PYPL index, which ranks languages by how many searches are made for tutorials on each of them.
Most importantly for us, many scientists, bioinformaticians, and software developers who work in Life Sciences research have joined forces to establish the Bioconductor Project. The purpose of this project is to develop R packages (code that extends R’s functionality) that enable more straightforward, biologically oriented data analysis.
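To give a feel for what "extending R's functionality" with a package looks like, here is a minimal sketch of installing and loading a Bioconductor package. It assumes a working R installation and an internet connection, so it only becomes useful once R is installed. The package name `limma` is just an illustrative choice and is not prescribed by this course.

```r
# Install the BiocManager helper from CRAN if it is not already present.
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}

# Install a Bioconductor package. "limma" is only an example;
# replace it with whichever package you actually need.
BiocManager::install("limma")

# Load the package so that its functions are available in this session.
library(limma)
```

Installation is needed only once per computer, whereas `library()` has to be called in every new R session in which you want to use the package.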
2.3 Install R and RStudio
To be prepared for the first class, you must install R and RStudio (a development environment for R) before classes start. Here we point you to the relevant web pages, but if needed you will find plenty of step-by-step installation guides elsewhere online.
- To install R, head to the R Project’s web page and click "download R". Choose a mirror location close to you for a faster download, but any will do. Then choose your operating system. R runs on Mac, Windows, and most UNIX-like environments. Read and follow the instructions on the site.
- To install RStudio, head to the RStudio download page, then download and install the version that matches your computer’s operating system.
You should install R first and only then install RStudio. RStudio looks for an existing R installation. If R is not already there, you will have to configure RStudio afterwards, which can be troublesome.
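Once both programs are installed, a quick way to check that everything works is to open RStudio and run a couple of commands in the console. The following is a minimal sketch of such a check, not an official installation test. The variable name `result` is arbitrary.

```r
# Print the version of the R installation that is currently running.
cat(R.version.string, "\n")

# A trivial computation: if this prints [1] 2, R is evaluating commands.
result <- 1 + 1
print(result)
```

If the first line prints a version string and the last line prints `[1] 2`, RStudio has found your R installation and you are ready for the first class.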
We hope that the installation process does not intimidate you. If you run into trouble that you cannot solve, contact the Teaching Assistants, who will be glad to help you out. See you in the next chapter and class!