32 R Programming – 1
Mr. Hardik Joshi
Overview of R
R is an open-source software environment for statistical computing, licensed under the GNU GPL. It supports a variety of statistical functions along with programming and graphics utilities, and is widely used by statisticians, data miners and researchers. R has recently gained popularity for machine learning, since the relevant functionality can be added simply by loading libraries. R is available for most operating systems. Its default interface is the command line, and several GUI front-ends are also available.
R is also an effective programming language. It provides programming features such as variables, conditions, loops, user-defined recursive functions and dataset facilities, together with facilities for data handling, calculation, retrieval and storage. R offers a wide range of operators for calculations on arrays and on data structures like lists, vectors and matrices, and an integrated collection of tools for data analysis. Its graphical facilities can create charts and plots in the desired formats.
Download and Installation
R can be downloaded from its website, https://www.r-project.org. It is available in binary form for Windows, while on Red Hat based distributions it can be installed using the following command:
$ sudo yum install R -y
Alternatively, R can be downloaded as source code and compiled on Unix-like systems.
Interface & Other tools
R can be used either from the command prompt or through GUI tools made available by third parties. The following screen shows the command line interface that is available after downloading R from the above-mentioned website:
Figure 1: Default R Console
The default software uses a command line interface. We can run commands or execute R scripts using this interface. The following are a few examples of using the command line interface to execute commands and R scripts.
As shown in the above screen, R can be started by using the ‘R’ command at the $ prompt. R then displays its own ‘>’ prompt. We can type commands at this prompt one by one and have them executed; the output is displayed in the same console. R also supports the execution of scripts. A script can be typed as plain text in a file, say “first.R”, and executed by issuing the Rscript command at the command prompt. The execution of R scripts is shown below:
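As a minimal sketch of such a script, the file below could be saved as first.R (the file name used in the text) and run from the shell with `Rscript first.R`. The contents and variable names are illustrative assumptions, not the script from the original figure.

```r
# first.R: illustrative contents; run from the shell with:  Rscript first.R
message_text <- "Hello from an R script"   # hypothetical variable name
print(message_text)

# A simple calculation whose result is printed to the console
total <- sum(1:10)
print(total)   # 55
```

The same lines can also be typed one at a time at the ‘>’ prompt, where each result is displayed immediately.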
Commands related to R Environment
R provides functions to interact with the system environment. R has a default working directory, which can be displayed using the getwd() function. We can also set the working directory to some other directory on secondary storage. The files in a directory can be listed from the R prompt using the dir() function. The following table shows a few widely used functions of R:
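The environment functions mentioned above might be used as in the following sketch. It assumes the standard setwd() function for changing the directory, and the file name example.txt is purely hypothetical.

```r
# Show the current working directory
old_dir <- getwd()
print(old_dir)

# Switch to a temporary directory (tempdir() always exists in an R session)
setwd(tempdir())

# Create an empty file so that the listing has something to show
file.create("example.txt")   # hypothetical file name

# List the files in the current working directory
files <- dir()
print(files)

# Restore the original working directory
setwd(old_dir)
```

Restoring the original directory at the end keeps the rest of the session unaffected by the change.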
Packages in R
The real power of R lies in its wide variety of packages. An R package is a collection of R functions, compiled code and sample data that provides some specific functionality. Packages are stored under a directory called “library” in the R environment. R installs a set of packages by default. More packages can be added later, when they are needed for some specific purpose. When we start the R console in a standard installation, only the default packages are available.
R provides a variety of packages for executing specialized tasks. There are packages for numerical computing, machine learning, financial calculations, etc. The above table lists functions that can be used to display or search packages, or to install additional packages that are not provided in the standard installation of R. Functions like help() or help.start() can be used to browse the topics available in the R manual. The above-listed functions can be typed at the command prompt, as shown in the figure below:
Figure 2: Command prompt to execute various functions of R
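A short sketch of the package-related functions follows. The selection of functions here is an assumption about what is typically used, and "somePackage" is a placeholder rather than a real package name.

```r
# Directories in which packages are stored (the "library" locations)
lib_paths <- .libPaths()
print(lib_paths)

# Packages currently attached to the session (the defaults at start-up)
attached <- search()
print(attached)

# Names of all packages installed on this machine
installed <- rownames(installed.packages())
print(head(installed))

# Load a package that ships with R
library(stats)

# Installing an extra package needs network access, so it is left commented out;
# "somePackage" is a placeholder, not a real package name
# install.packages("somePackage")

# Open the manual page for a function (best tried interactively)
# help("mean")
```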
Using Command Prompt for R programming
Let us go through some examples of how the command prompt can be used to execute R commands. As seen in the code below, the ‘>’ prompt is used to execute mathematical calculations. Values can be assigned to variables using the ‘<-’ operator (or its rightward form ‘->’), or by using the assign() function, as shown below:
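The calculation and assignment forms described here can be sketched as follows; the variable names x, y and z are illustrative.

```r
# Arithmetic typed directly at the prompt
5 + 3 * 2        # 11: multiplication binds tighter than addition

# Leftward assignment, the most common form
x <- 10

# Rightward assignment stores the value on the left in the name on the right
20 -> y

# assign() takes the variable name as a string
assign("z", x + y)

print(z)         # 30
```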
Programming Support in R
R has a rich set of data types, as the ‘C’ language does. The data types supported by R are listed in the following table:
Since R is a scientific tool, it supports a complex data type, and it also supports a raw data type in which data items are stored as raw bytes.
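As an illustration, the sketch below creates one value of each basic type and asks R for its class. The variable names are hypothetical.

```r
num <- 3.14             # numeric (double precision)
int <- 42L              # integer; the L suffix marks an integer literal
chr <- "R language"     # character
lgl <- TRUE             # logical
cpx <- 2 + 3i           # complex
rw  <- charToRaw("R")   # raw: the byte behind the letter "R"

# class() reports the data type of each value
types <- sapply(list(num, int, chr, lgl, cpx, rw), class)
print(types)
```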
The following table lists various operators that are supported by R. As we have already seen, R supports arithmetic operations at the command prompt or through scripts. The following operators can be used over variables or numbers to perform calculations:
R also supports a rich set of logical operators to carry out comparison of values. The list of logical operators is shown in the following table:
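The arithmetic operators can be tried out as in this brief sketch, which uses two arbitrary example values.

```r
a <- 17
b <- 5

a + b      # addition: 22
a - b      # subtraction: 12
a * b      # multiplication: 85
a / b      # division: 3.4
a ^ 2      # exponentiation: 289
a %% b     # remainder (modulus): 2
a %/% b    # integer division: 3

# Operators are vectorized: each element is multiplied by 2
v <- c(1, 2, 3) * 2
print(v)
```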
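A small sketch of comparisons and logical combinations, again with arbitrary example values:

```r
p <- 7
q <- 10

p < q      # TRUE
p == q     # FALSE
p != q     # TRUE

# Combine comparisons with & (and), | (or) and ! (not)
in_range <- (p > 5) & (p < q)
either   <- (p > q) | (q == 10)
negated  <- !(p == q)

# Comparisons are applied element by element on vectors
cmp <- c(1, 5, 9) >= 5
print(cmp)   # FALSE TRUE TRUE
```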
Data Structures in R
Data structures are generally used to store multiple items under a common variable name. In R, there are two categories of data structures: those that store homogeneous values and those that store heterogeneous values. The data structures supported by R are:
- atomic vector
- list
- matrix
- data frame
- factors
- tables
An atomic vector is a data structure that stores values of the same type. A vector can be created using the ‘c()’ function, as shown in the code below:
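A minimal sketch of vector creation with c(); the values and names are illustrative only.

```r
# A character vector and a numeric vector
fruits <- c("apple", "banana", "cherry")
nums   <- c(10, 20, 30, 40)

length(nums)     # number of elements: 4
nums[2]          # indexing starts at 1, so this is 20
class(fruits)    # "character"

# Mixing types forces every element to a common type (here, character)
mixed <- c(1, "two", TRUE)
print(class(mixed))
```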
The output of the above code is shown in the following figure:
The matrix() function accepts arguments such as the number of rows, the number of columns, the arrangement of elements and the data items. These arguments are described in the following table:
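The arguments described above correspond to data, nrow, ncol, byrow and dimnames in the matrix() function. The following sketch uses illustrative row and column labels.

```r
# Fill a 2 x 3 matrix row by row with the numbers 1 to 6
m <- matrix(data = 1:6, nrow = 2, ncol = 3, byrow = TRUE,
            dimnames = list(c("r1", "r2"), c("c1", "c2", "c3")))
print(m)

# Access elements by position or by name
m[1, 3]          # first row, third column: 3
m["r2", "c1"]    # 4
```

With byrow = FALSE (the default), the same data would instead be filled column by column.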
A data frame is a two-dimensional array-like structure in which each column contains the values of one variable and each row contains one set of values from each column. Data frames are used when we want to process data that is available in CSV or spreadsheet format, and they can handle a large number of records quickly. The following are the characteristics required to create a data frame:
- The column names must be non-empty
- The row names should be unique
- Each column should contain the same number of data items
The following code creates a data frame of 6 letters and two columns labelled ‘x’ and ‘y’.
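One way such a data frame might be written is sketched below. The exact contents of the original listing are not known, so the choice of the numbers 1 to 6 for ‘x’ and the first six letters for ‘y’ is an assumption.

```r
# Two columns of equal length: numbers in x, letters in y
df <- data.frame(x = 1:6, y = letters[1:6])
print(df)

# Inspect the structure, dimensions and column names
str(df)
nrow(df)    # 6
ncol(df)    # 2
names(df)   # "x" "y"
```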
Datasets
For those who do not want to create datasets, R provides built-in datasets that can be accessed from the command prompt. Most standard installations include a few datasets. Let us see an example of the ‘mtcars’ dataset, which is provided by default in R installations. Typing ‘mtcars’ at the command prompt lists the dataset values, as shown in the following figure:
We can create a table of data items from the above-mentioned dataset. We can also select or summarize columns, as shown in the following code:
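Since the full listing is long, a sketch like the following can be used to look at only part of the dataset. The variable names are illustrative.

```r
# Load the built-in dataset explicitly (it is also available without this call)
data(mtcars)

# Show only the first six rows instead of the whole dataset
first_rows <- head(mtcars)
print(first_rows)

# Number of rows and columns, and the column names
size <- dim(mtcars)
print(size)
print(names(mtcars))
```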
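As a sketch of these operations, the code below tabulates, selects and summarizes columns of mtcars. The choice of the cyl and mpg columns is an assumption made for illustration.

```r
# Count how many cars have 4, 6 and 8 cylinders
cyl_counts <- table(mtcars$cyl)
print(cyl_counts)

# Select two columns by name
subset_cols <- mtcars[, c("mpg", "cyl")]
print(head(subset_cols))

# Summary statistics (min, quartiles, mean, max) for one column
mpg_summary <- summary(mtcars$mpg)
print(mpg_summary)
```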
Functions in R
R provides a wide variety of functions for mathematical, statistical and financial calculations. The list of functions provided by R can be browsed using the ‘help()’ command at the prompt. Let us see how functions can be used in R. The following code shows the use of the statistical functions mean and median over a vector of numbers.
Let us summarize the key concepts covered in this module:
- Why R is needed
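A minimal sketch of mean() and median(), using made-up sample values:

```r
# Sample data (illustrative values)
values <- c(3, 1, 4, 1, 5, 9, 2, 6)

# Arithmetic mean: the sum of the values divided by their count
avg <- mean(values)
print(avg)    # 3.875

# Median: the middle of the sorted values (average of 3 and 4 here)
med <- median(values)
print(med)    # 3.5
```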
- How to install and use R
- Overview of variables and operators
- How to use data structures
- Overview of functions and packages