Introductory R: A beginner’s guide to programming, data visualisation and statistical analysis in R
2021-10-14
Chapter 1 Preface
Rob Knell School of Biological and Chemical Sciences Queen Mary University of London
Published by Robert Knell, Hersham, Walton on Thames United Kingdom KT12 5RH
Copyright © Robert Knell 2021
All rights reserved. No part of this book may be reproduced, stored in a retrieval system or transmitted at any time or by any means mechanical, electronic, photocopying, recording or otherwise without the prior written agreement of the publisher.
The right of Robert Knell to be identified as the author of this work has been asserted by him in accordance with the Copyright, Designs and Patents Act 1988.
First printed March 2013
ISBN 978-0-9575971-1-2
This edition was edited and rendered with the bookdown package (Xie 2016), using R Markdown and knitr (Xie 2015).
1.1 3rd Edition draft
This is a draft copy of the third edition of this book. As such it’s not yet complete and is still having things added to it or revised.
1.2 Why this book?
This book has slowly arisen from a set of course notes that I originally wrote for the postgraduate course, and has since been field-tested on the undergraduates. I decided to self-publish it as an e-book for several reasons. Firstly, it’s an interesting experiment. Secondly, it keeps the price down to roughly 20% of the cost of a book published in the traditional sense. Thirdly, it leaves me with complete control over the content of the book, where it’s available and what’s included in it.
I am writing from the perspective of a biologist, and many of the examples given are either biological or medical ones. The basic principles of the analyses presented are universal, however, and I’ve made an effort to explain the examples in such a way that I hope most people will understand them. Nonetheless, if you’re more used to data from other fields then you might have some difficulty following some of the examples, in which case I’m sorry. Please drop me a line (r.knell@qmul.ac.uk) and let me know if you think one or more of the examples used are particularly opaque and I’ll try to take your feedback into account.
1.3 Acknowledgements
I’d just like to thank all the staff and students at QMUL who gave feedback on the various drafts of the manuscript, especially Mark Stevenson who took a lot of time to check through the code for a previous version, Robin Wyatt who helped with the conversion of the previous version to the new format and Richard Nichols for ruthlessly weeding out any wording that might possibly be interpreted as implying that rejecting a null hypothesis might mean that the alternative hypothesis is true.
1.4 How to use this book
The structure of the book is, I hope, fairly self-evident. The first part introduces and explains some of the fundamental concepts of R (e.g. basic use of the console, importing data, record keeping, graphics). Some of these chapters have sets of exercises associated with them (e.g. the “Basics” chapter); others don’t, but have lots of examples instead. The second part deals with graphics in R, with detailed coverage of base R graphics in one chapter and a second chapter on ggplot2, which is a very popular package offering a very different approach to data visualisation in R. The third part deals with more advanced topics in R programming, including loops, pipelines and conditional statements.
1.5 Where are the statistics?
Previous editions had a large component on using statistics in R. As the R programming aspect of the book grew, the whole thing became something of a monster, so for clarity I decided to split the book in two: this book focuses on using R, and the statistics are now presented in a companion volume called Introductory Biostatistics in R.
1.6 Tidyverse versus base R
R nowadays is almost two different languages. Firstly, there is base R, which is the foundation of the language and is included in every R installation. Secondly, there is what is called the Tidyverse: a series of add-on packages for R that offer alternative methods for importing data, a new data structure, a suite of packages that give alternative ways of selecting and manipulating data and finally the graphics package ggplot2, which gives a different way of producing graphics with R. The Tidyverse packages are the vision of Hadley Wickham and his co-workers, many of whom work at RStudio. In this book the focus is on base R, which is often easier to learn and is more similar to other programming languages. Nonetheless, you can’t use R nowadays without knowing something about the Tidyverse so we also look at some of the more important Tidyverse packages, including dplyr, purrr and especially ggplot2: after the ggplot2 chapter you’ll note that I switch to using ggplot2 rather than base R graphics for the rest of the book.
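To give a rough flavour of the difference, here is a minimal sketch of the same small task done both ways: calculating the mean fuel economy for each cylinder count in mtcars, a data set that ships with R. The data set and the object names base_means and tidy_means are my own choices for illustration and are not taken from later chapters. The Tidyverse version only runs if dplyr is installed.

```r
# Base R: mean mpg for each value of cyl, using the formula interface
base_means <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
print(base_means)

# Tidyverse: the same summary written as a dplyr pipeline.
# Guarded so that the script still runs when dplyr is not installed.
if (requireNamespace("dplyr", quietly = TRUE)) {
  suppressPackageStartupMessages(library(dplyr))

  tidy_means <- mtcars %>%
    group_by(cyl) %>%            # split the rows by cylinder count
    summarise(mpg = mean(mpg))   # one mean per group

  print(tidy_means)
}
```

Both versions give one row per cylinder count with the same means. The base R version returns an ordinary data frame, whereas the dplyr version returns a tibble, which is the Tidyverse's own variant of the data frame.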
If you want to know more about the Tidyverse there are a number of excellent online resources, especially Hadley Wickham’s book R for Data Science for the philosophy and use of the Tidyverse generally, and Winston Chang’s R Graphics Cookbook for ggplot2.