Why R?
Spreadsheets are admirable software. They are great for data
entry, for viewing raw data and for making quick charts. If you have
been using them for a long time, you must have learned a lot of tricks to
get the most out of them: things like formulas, pivot tables, and even
macros. But you have surely also suffered from their limitations.
In a spreadsheet there is no clear boundary between data and
analysis. Overwriting data is a very real danger and complicated
analyses are very hard to understand, especially if you open a
spreadsheet put together by someone else (who may be you from the past).
Also, repeating an analysis on different data or with different
parameters can become very cumbersome.
If what you need are frequent, automated reports and data
analyses with many moving parts, it would be nice to write a
recipe-like, step-by-step set of instructions and have the computer run
everything automatically every time you ask it to. For that to work,
those steps have to be written in a language that the
computer can understand. R is one of those languages.
How are we going to work?
We are going to use R as a language and RStudio as an IDE (an
Integrated Development Environment). If you don’t have these installed on
your computer, don’t worry: we have this RStudio Cloud project
for you to work with.
To launch RStudio, double-click on the RStudio icon. Launching
RStudio also launches R (actually you will probably never open R by
itself).
Notice the default panes:
- Console (entire left)
- Environment/History (tabbed in upper right)
- Files/Plots/Packages/Help (tabbed in lower right)
We don’t need to know how to use all of this right away. We will
become familiar with more of the options and capabilities throughout the
workshop.
We can write code, that is, instructions to be executed by R, in the
Console. For example, we can calculate two plus two by writing
2 + 2
in the Console and pressing Enter. The result appears right below:
## [1] 4
We can also save that result to an object, in this case called x:
x <- 2 + 2
That little arrow is the assignment operator and works like an =.
Now the result is saved in the Environment as a variable named x
and is not printed in the Console.
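As a small sketch of how assignment behaves in practice (the names y and z are made up for illustration): an assignment prints nothing, typing an object's name prints its value, and a saved object can be reused in later calculations.

```r
x <- 2 + 2   # save the result in x; nothing is printed
x            # typing the name prints the stored value
# [1] 4

y <- x * 10  # reuse x in a new calculation (y is a made-up name)
y
# [1] 40

z = 5        # `=` also assigns here, but `<-` is the usual R style
z
# [1] 5
```

After running these lines, x, y and z should all be listed in the Environment pane.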
This is great when we are trying out code to see if it works, but
we’ll lose everything when we close RStudio. What we need to do is
save the code that generates our analysis. For that we use R scripts
and RMarkdown files.
We are going to have our first experience with R through RMarkdown,
so let’s see what an RMarkdown document is. We prepared this example report for you; please open
it in RStudio or in RStudio Cloud. The file will appear in a new fourth
panel on the left-hand side of the screen, and the Console panel will
move to the bottom.
RMarkdown
An RMarkdown file is a plain text file with some rules and special
syntax that allow us to write code and text together. When it is
“knitted,” the code is run and the text is formatted,
producing a reproducible report or document that is nice to
read and contains all your work.
Keeping code and text together is critical to reproducibility. It also saves us time and
can help with automation tasks. The document recreates your figures
for you in the same place where you write the text that explains
them. This saves you the effort of doing some analysis, saving a
plot to a file, copy-pasting that plot into Word, PowerPoint or
Google Slides, and having to do it all over again after discovering a
typo.
Now let’s see what our Penguins Report looks like.
- The top part has the Title and the output type (which in this case
is an HTML document).
- Below that there are alternating white and grey
sections. These are the two main sections that make up an RMarkdown
file:
  - Grey sections are R code
  - White sections are Markdown text
- There is black, blue and green text.
Let’s go ahead and “Knit” the document by clicking the blue yarn
icon at the top of the RMarkdown file.
We’ve just made an HTML file! This is a single webpage that we are
viewing locally on our own computers. By knitting this RMarkdown
document, R has formatted the Markdown text and run the R code.
Markdown text
You can find a guide to RMarkdown in this
cheat sheet, but here is the minimum syntax to get you started:
- headers start with # or ## and so on (it’s important to put a space after the last #).
- bold words are surrounded with **
- and italics, with _
R Code
The R code is written inside code “chunks”. Code chunks start with ```{r label}
(where “label” is an optional, unique name) and end with ```. In RStudio, you can
create a new chunk with the Ctrl + Alt + I keyboard shortcut
(Cmd + Option + I on macOS).
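To make the pieces above concrete, here is a minimal sketch that builds a tiny RMarkdown document from R itself. The document contents, the chunk label "addition" and the object names rmd_lines, rmd_file and html_file are all made up for illustration; this is not the Penguins Report. The last step calls rmarkdown::render(), which is roughly what the Knit button runs for you, and it is only attempted if the rmarkdown package and pandoc are available.

```r
# Lines of a small, made-up RMarkdown document (not the workshop's report).
rmd_lines <- c(
  "---",
  "title: \"A tiny report\"",
  "output: html_document",
  "---",
  "",
  "## A header",
  "",
  "Some **bold** and _italic_ text.",
  "",
  "```{r addition}",
  "2 + 2",
  "```"
)

# Write the document to a temporary .Rmd file.
rmd_file <- tempfile(fileext = ".Rmd")
writeLines(rmd_lines, rmd_file)

# Knit it from code, if the rmarkdown package and pandoc are available.
if (requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  html_file <- rmarkdown::render(rmd_file, quiet = TRUE)
  print(html_file)  # path of the HTML file that was created
}
```

In day-to-day work you would write the .Rmd file directly in the RStudio editor and click Knit; building it from strings here is only a way to keep the sketch self-contained.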
This report shows information about the Gentoo penguins but we could
change a few code lines to create the same analysis for the other two
species, Adelie and Chinstrap.
Now it’s your turn. Go ahead and look through the code; wherever you find a
mention of "gentoo", change it to one of the other
species.
This task gets cumbersome if you have to change many things every
time you want to re-run the analysis for a different species. But don’t
worry, we’ll learn how to make everything more automatic by the end of
the workshop.
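As a preview, here is one possible way to cut down on that repetition, sketched under assumptions: keep the species name in a single variable and refer to that variable everywhere else. The data frame below is a tiny made-up stand-in (not real penguin measurements), and the names toy_penguins, species_to_report and one_species are hypothetical; the workshop may take a different route.

```r
# Made-up stand-in data (NOT real measurements), only to make the sketch runnable.
toy_penguins <- data.frame(
  species     = c("adelie", "gentoo", "chinstrap", "gentoo"),
  body_mass_g = c(3700, 5000, 3800, 5200)
)

# Change the species in ONE place...
species_to_report <- "gentoo"

# ...and every later step picks it up automatically.
one_species <- subset(toy_penguins, species == species_to_report)

nrow(one_species)              # number of rows for the chosen species
# [1] 2
mean(one_species$body_mass_g)  # average body mass for the chosen species
# [1] 5100
```

Switching the report to another species then means editing only the species_to_report line.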