Introduction to RStudio

Top-down vs. bottom-up

As with learning a natural language, there are different approaches to learning a programming language. In the bottom-up approach, you first deal with the elements of the language (vocabulary) and how to combine them correctly (grammar). This approach is often difficult and laborious, especially at the beginning, because the connections only become clear after a while and the language can only be used successfully after a certain lead time.

With the top-down approach, you are confronted directly with the full scope of the language. Building blocks of the language are adopted from others; details of structure and rules, as well as a broader vocabulary, are acquired by borrowing from experts.

When learning a programming language, the countless examples and templates available on the Internet, together with the very detailed help pages for the respective language, make it possible to mix both approaches. We will begin our introduction to the R programming language with a top-down approach.

Copy and paste (exercise block)

  1. Copy the line v <- c(1,4,4,3,2,2,3) from Indexing with numbers and names into the new script and execute it. Describe what changes in the console and environment windows.
  2. Copy (from the website) the line v[c(2,3,4)] into the new script and execute it. Describe what changes in the console and environment windows.
  3. Go to the website Bar and line graphs (ggplot2) and copy the first four lines of the block Bar graphs of values into the editor. Execute these lines and discuss the result.
  4. Now copy the second code block from this website and execute the code. Discuss the result (see plot window).
  5. Now copy the two code blocks in the chapter “Bar graphs of counts” and execute the code. Pay particular attention to where the data (tips) come from!
  6. Enter the commands data() and ?tips, or ?french_fries, and discuss the result.
  7. Execute the following two lines:
    • Dat_FF <- french_fries
    • str(Dat_FF)
  8. Load the SPSS file “bigfive.sav” with the help of Import Dataset and copy the corresponding commands into your script.
  9. Now load the file “bigfive_excel.xlsx” (also in ../Data/) with the help of Import Dataset and copy the corresponding R commands into the editor.

Objects in R (bottom up)

Everything can be saved as an object in R.

  • Individual values
  • Multiple values (e.g. as a data set with raw data)
  • Tables
  • Statistical models
  • Results of statistical analyses
  • Functions, etc.

All objects created or opened in an R session are located in the so-called environment (workspace). In the last exercise, the following objects were created in the environment:

Figure 25: Environment of the last exercise

An object is created in the environment via the assignment ObjectName <- ObjectContent. Assignment via <- is specific to R. You can also use the more common =, although in R it has an additional role (details later).

The following rules must be observed when naming objects:

  • An object name must not begin with a number.
  • An object name must not contain any operators (+, -, *, etc.).
  • Object names are case-sensitive.
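
The assignment and the naming rules can be tried out with a short sketch. All object names and values below are hypothetical; make.names() is used only to show how R would repair names that break the rules.

```r
# Assignment with <- creates an object in the environment
height_cm <- 181
# The = operator also assigns at the top level of a script
weight_kg = 75

# Object names are case-sensitive: these are two different objects
value <- 1
Value <- 2

# Names that start with a number or contain an operator are invalid.
# make.names() shows how R would turn them into valid names.
repaired <- make.names(c("2nd_value", "a-b"))
repaired
```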

There are a variety of different object types in R. The basic object types (and their counterparts among the SPSS data types) are:

  • Vectors \(\rightarrow\) ordinal / scale variables
    • numeric (numbers)
    • character (text)
  • Factors \(\rightarrow\) nominal / ordinal variables
    • nominal variable
    • Categories of the factor = levels (can contain letters or numbers)
  • Data frames (multiple rows and columns) \(\rightarrow\) data set
    • Columns (vectors and factors)
    • Rows (cases, e.g. test subjects)
  • Matrices \(\rightarrow\) not available in SPSS
    • multiple rows and columns
    • can only hold data of one type (e.g. only numeric values or only character values)
  • Arrays \(\rightarrow\) not available in SPSS
    • Combination of several matrices (collections of elements of the same data type (numeric, character, logical) that are addressed via two or more indices)
  • Lists \(\rightarrow\) not available in SPSS
    • Combination of several objects
    • Lists can contain any object, including objects of different types.
    • In contrast to data frames and arrays, objects of different lengths can also be stored.
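
As a rough orientation, the object types listed above can each be created in one line. This is only a sketch with hypothetical names and values.

```r
v_num <- c(1.5, 2, 3.25)                     # numeric vector
v_chr <- c("a", "b", "c")                    # character vector
f     <- factor(c("m", "f", "f"))            # factor with two levels
df    <- data.frame(id = 1:3, name = v_chr)  # data frame: columns of different types
m     <- matrix(1:6, nrow = 2)               # matrix: 2 rows, 3 columns, one data type
a     <- array(1:24, dim = c(2, 3, 4))       # array: four 2 x 3 matrices
l     <- list(v_num, df, "text")             # list: objects of different type and length

# class() reports the type of an object
class(v_num)
class(f)
class(df)
class(l)
```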

Vectors

The simplest object is a vector made up of several elements. Open a new script file and copy the following content into this file:
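
A script of this kind might look like the following minimal sketch. The vector v is taken from the first exercise; the other names and values are hypothetical.

```r
a <- 5                       # a single value is a vector of length 1
v <- c(1, 4, 4, 3, 2, 2, 3)  # c() combines several values into one vector
w <- c("yes", "no", "yes")   # character vector
v                            # typing the name prints the content in the console
length(v)                    # number of elements
```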

Save the file under the name 06_Objects.R. Now execute the content line by line and discuss the effect of the individual lines!

Now copy the following code into the file and execute it line by line. What happens in the environment?
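
One possibility for such code is the generation of sequences with the colon operator and seq(). The sketch below uses hypothetical object names; each line adds a new object to the environment.

```r
s1 <- 1:10                  # integers from 1 to 10
s2 <- seq(1, 10, by = 2)    # 1 3 5 7 9
s3 <- seq(10, 1, by = -3)   # 10 7 4 1
s4 <- c(s2, 100)            # append a value to an existing vector
```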

There are several options for accessing individual elements of a vector.

  1. Access through direct indexing
  2. Access through variables (vectors) in which the indices of the elements to be accessed are stored.

The following code shows both of these possibilities. Copy the code into the script and execute it line by line.
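
A minimal sketch of both access methods, using the vector v from the first exercise; the name idx is hypothetical.

```r
v <- c(1, 4, 4, 3, 2, 2, 3)

# 1. Direct indexing
v[2]            # second element
v[c(2, 3, 4)]   # elements 2 to 4
v[-1]           # all elements except the first

# 2. Indexing through a variable that stores the indices
idx <- c(1, 7)
v[idx]          # first and last element

# Indexing also works on the left-hand side of an assignment
v2 <- v
v2[idx] <- 0
```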

To determine which type or class of data (numeric, alphanumeric, date, etc.) is stored in an object, the function class() can be used. The data type of a (simple) object can be changed with the functions as.numeric(variable) and as.character(variable). In addition, there are many other functions that deal with the data type of objects. Some of them are discussed in detail in the relevant applications.

Conversions from numeric to alphanumeric and vice versa, however, are used frequently, especially when importing data from other applications.
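
The following sketch shows class() and the two conversion functions on hypothetical values. It also shows what happens when text cannot be interpreted as a number.

```r
num <- c(10, 20, 30)
class(num)                  # "numeric"
txt <- as.character(num)    # numbers become text
class(txt)                  # "character"
back <- as.numeric(txt)     # text becomes numbers again
class(back)                 # "numeric"

# Text that does not look like a number becomes NA (R also issues a warning,
# which is suppressed here)
bad <- suppressWarnings(as.numeric(c("1.5", "abc")))
```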

Other useful vector functions

In addition to the function seq() already discussed, the following functions are often useful for working with vectors. Since we use these functions constantly, not all details are discussed here; use the help() function for them.
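
Which functions were meant is not recoverable from the text, so the selection below is an assumption: a sketch of a few base R functions that are commonly used with vectors.

```r
v <- c(1, 4, 4, 3, 2, 2, 3)
length(v)                 # number of elements: 7
sort(v)                   # ascending order
rev(v)                    # reversed order
unique(v)                 # distinct values: 1 4 3 2
which(v == 4)             # positions of the value 4: 2 3
range(v)                  # minimum and maximum: 1 4
rep(c(1, 2), times = 3)   # 1 2 1 2 1 2
```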

Logical vectors

A logical vector consists of TRUE and FALSE elements. These vectors follow Boolean logic with the following principles (in R the logical AND is written as &, the logical OR as |):

  • T and T equals T (T & T \(\rightarrow\) T)
  • F and F equals F (F & F \(\rightarrow\) F)
  • T and F equals F (T & F \(\rightarrow\) F)
  • T or T equals T (T | T \(\rightarrow\) T)
  • F or F equals F (F | F \(\rightarrow\) F)
  • T or F equals T (T | F \(\rightarrow\) T)

If brackets are used, the expression within the brackets is evaluated first. With the sum() function, the number of TRUE elements of a vector can be determined. One application would be to use a logical query to determine how many yes answers a vector of yes / no answers contains (see code below).
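
A minimal sketch of these rules and of counting with sum(); the answers are hypothetical.

```r
TRUE & FALSE             # FALSE
TRUE | FALSE             # TRUE
(TRUE | FALSE) & FALSE   # brackets first: TRUE & FALSE gives FALSE

# Hypothetical yes/no answers
answers <- c("yes", "no", "yes", "yes", "no")
is_yes  <- answers == "yes"   # logical vector
n_yes   <- sum(is_yes)        # TRUE counts as 1, FALSE as 0
```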

Task block vectors

Copy the following code into a new R script and save it under the name 06_Objects_Tasks.

Now work on the following tasks:

  1. Compute the mean of the variable large.
  2. Copy the variable large into a new variable large1.
  3. Replace the first value of large1 with the value NA.
  4. Compute the mean of large1 and discuss the result.
  5. Compute the mean of large1 taking missing values into account (use the help to read about the parameters of the mean() function!).
  6. Compute the mean of large1, excluding 50% of the values (the lowest 25% and the highest 25%) from the calculation.
  7. Compute the product of the second and third values of the variable large and save the result in the variable gross_prod.
  8. Set the third value of the variable sex to 1.
  9. Create a variable (x) whose values run in descending order from 50 to 1 in steps of one.
  10. Display all but the second value of the variable x.
  11. Create a sequence (x) which starts with 1 and counts up in steps of two.
  12. Create a sequence (x) which starts with 10 and counts down in steps of two.
  13. Create a sequence (x) which starts with 0, counts up to 10 in steps of two, and has 50 as its last value (i.e. x = 0 2 4 … 10 50).
  14. Determine which class the vector x belongs to.
  15. Convert the vector x from the data type ‘numeric’ to ‘character’ and save the result in x_c.
  16. Determine which class the vector x_c belongs to.
  17. Convert the vector x_c back to the data type ‘numeric’ and save the result in x_n.
  18. Determine which class the vector x_n belongs to.
  19. How many people have a height greater than or equal to (\(\ge\)) 173?
  20. For which element of the variable large is height = 181 true (T)?

Factors

Factors are a special form of vectors and are used in R to represent nominal data. For example, the division of test subjects by gender is usually stored in an object of type factor. Such a factor would usually have two so-called factor levels (levels): male / female.

However, factors can also have several factor levels. For example, a factor could depict the highest level of school education (secondary school leaving certificate, high school diploma, university of applied sciences, university). This factor would therefore have 4 levels.

Since factors are defined at the nominal level, levels and labels can be assigned. This option is somewhat confusing for SPSS users, as the meaning in R is as follows:

  • levels are the input, i.e. how the levels are coded (in the example with 1, 2, 3).
  • labels are the output, i.e. which label each level is given.

To assign several values to a factor, the factor() command is combined with the c() command. Copy the following code into the script file and execute the commands line by line. Discuss the results.
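
A minimal sketch of factor() with levels and labels. The coding (1 = male, 2 = female) and the object names are assumptions made for illustration.

```r
# Hypothetical coding: 1 = male, 2 = female
sex_code <- c(1, 2, 2, 1, 2)
sex_f <- factor(sex_code,
                levels = c(1, 2),              # input: how the data are coded
                labels = c("male", "female"))  # output: how the levels are shown
levels(sex_f)   # "male" "female"
table(sex_f)    # frequency of each level
```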

Task block factors

  1. Create the variable x = c(1,2,3,1,1,2,2) and convert it into a factor x_fact. Name the levels of the factor x_fact ‘A’, ‘B’, ...
  2. Copy x_fact into the variable x_fact2 and change the names of the factor levels to ‘S1’, ‘S2’, … (use the function levels(…)). Use the command table() to display the variable x_fact2. What does the command do?
  3. Enter the following command: x_fact3 = factor(x_fact2, levels = c('S3', 'S1', 'S2')). Compare x_fact2 and x_fact3. What has changed?

Matrices

In R, matrices are objects in which elements of the same data type are arranged in rows and columns. This means that vectors of the same data type can be combined and stored in one object (the matrix) as rows or columns. In R, a matrix can be assembled from vectors with the functions rbind() and cbind().

  • With rbind() the vectors are stored as rows of the matrix, and
  • with cbind() as columns.

If vectors with different data types (e.g. numeric and character) are combined using these functions, all values in the matrix are stored as type character. Copy the code from the following block and execute it line by line. Discuss the results.
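
A minimal sketch of rbind(), cbind() and the type conversion, followed by setting row and column names. All names and values are hypothetical.

```r
a <- c(1, 2, 3)
b <- c(4, 5, 6)
m_rows <- rbind(a, b)   # 2 x 3 matrix, the vectors become rows
m_cols <- cbind(a, b)   # 3 x 2 matrix, the vectors become columns

# Mixing numeric and character vectors turns everything into character
n <- c("x", "y", "z")
m_mixed <- cbind(a, n)
class(m_mixed[1, 1])    # "character"

# Row and column names can be set afterwards
colnames(m_cols) <- c("first", "second")
rownames(m_cols) <- c("r1", "r2", "r3")
m_cols
```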

As can be seen from the code above, both the column names and the row names can be set with the functions colnames() and rownames(). This proves very helpful in many applications, especially for column headings.

To determine how many rows or columns a matrix has, you can either read it from the Environment (Value column) or query it directly with the functions nrow(matrix) / ncol(matrix). The function dim(matrix) returns a vector whose first entry contains the number of rows and whose second entry contains the number of columns.
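
A short sketch of the three functions on a hypothetical matrix:

```r
m <- matrix(1:12, nrow = 3)   # 3 rows, 4 columns
nrow(m)    # 3
ncol(m)    # 4
d <- dim(m)
d[1]       # number of rows
d[2]       # number of columns
```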

Access to elements of a matrix

As with vectors, an index is also used with matrices to address a position. In contrast to the vector, two indices are used for matrices:

  • the first index always refers to the row number of a matrix
  • the second index always refers to the column number of a matrix

The following examples illustrate the possible uses of addressing with indices:
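
A minimal sketch with a hypothetical 3 x 4 matrix; leaving an index empty selects all rows or all columns.

```r
m <- matrix(1:12, nrow = 3)   # filled column by column
m[2, 3]        # row 2, column 3: 8
m[3, ]         # entire third row: 3 6 9 12
m[, 2]         # entire second column: 4 5 6
m[1:2, 3:4]    # rows 1 to 2 and columns 3 to 4 (a 2 x 2 matrix)
```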

Note: if only one index is used with a matrix, the matrix is treated like a vector and the corresponding single element is returned. The numbering is determined as follows: starting from the first row and column, the elements of the first column are numbered in ascending order. At the end of a column, the numbering continues in the first row of the next column, until the end of the matrix is reached.
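
The column-wise numbering can be checked on a small hypothetical matrix:

```r
m <- matrix(c(10, 20, 30, 40, 50, 60), nrow = 2)
# Layout:  10 30 50
#          20 40 60
m[2]   # second element, counting down the first column: 20
m[3]   # first element of the second column: 30
m[6]   # last element: 60
```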

Generation of matrices

There are various ways of generating matrices with the help of R functions. In particular, the simulation of data for the evaluation of statistical models (e.g. drawing a sample of size \(N\) from a normally distributed population, or from a uniform distribution, etc.) can be solved elegantly this way. The following examples give an insight into a few possibilities to generate matrices. Copy the code into your editor and execute it line by line. Discuss the features and the results.
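
A sketch of a few such possibilities, using matrix() together with rnorm() and runif(). Sample sizes, distribution parameters and object names are hypothetical.

```r
set.seed(1)  # makes the random numbers reproducible

m_fill  <- matrix(0, nrow = 2, ncol = 3)                     # all zeros
m_byrow <- matrix(1:6, nrow = 2, byrow = TRUE)               # filled row by row
m_norm  <- matrix(rnorm(20, mean = 100, sd = 15), ncol = 4)  # N = 20 normal draws
m_unif  <- matrix(runif(10, min = 0, max = 1), nrow = 5)     # N = 10 uniform draws
```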

Arithmetic functions on two matrices

When arithmetic operations are applied to two matrices, the elements in the same row and column are added, subtracted, multiplied or divided. For this reason, the dimensions (= number of rows and columns, also \(m \times n\)) of the two matrices must match. This can be checked with the help of logical operators.

Without going into further details, the calculation of the matrix product of two matrices should be mentioned here (for two vectors of the same length, it yields the scalar product). It is calculated using the operator %*%. Copy the following code into the editor and execute it line by line. Discuss the results.

In addition, functions such as mean(), median(), sum(), etc. can be applied to all elements of a matrix. Copy the following code into the editor and execute it line by line. Discuss the results.
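
A minimal sketch of element-wise arithmetic with a preceding check of the dimensions; the matrices are hypothetical.

```r
A <- matrix(1:4, nrow = 2)                  # rows: (1 3) and (2 4)
B <- matrix(c(10, 20, 30, 40), nrow = 2)    # rows: (10 30) and (20 40)

same_dim <- all(dim(A) == dim(B))   # logical check of the dimensions
if (same_dim) {
  S <- A + B    # element-wise sum
  P <- A * B    # element-wise product (not the matrix product)
}
```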
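
A small sketch of the %*% operator on hypothetical data, first with two matrices and then with two vectors:

```r
A <- matrix(1:4, nrow = 2)   # rows: (1 3) and (2 4)
B <- matrix(5:8, nrow = 2)   # rows: (5 7) and (6 8)
AB <- A %*% B                # rows of A are combined with columns of B

u <- c(1, 2, 3)
w <- c(4, 5, 6)
uw <- u %*% w                # 1 x 1 matrix holding 1*4 + 2*5 + 3*6
```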
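
A minimal sketch on a hypothetical matrix: mean() and sum() work on all elements, while apply() restricts the calculation to rows (1) or columns (2).

```r
m <- matrix(1:6, nrow = 2)        # rows: (1 3 5) and (2 4 6)
mean(m)                           # mean of all six elements: 3.5
sum(m)                            # sum of all six elements: 21
row_means <- apply(m, 1, mean)    # 1 = row by row: 3 4
col_sums  <- apply(m, 2, sum)     # 2 = column by column: 3 7 11
```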

The function apply() used here is part of a group of functions that can be used as an alternative to loops. Other important functions of this group are:

  • lapply() - applies a function to the elements of vectors, data frames and lists. Returns a list as the result.
  • sapply() - simplifies the output of the lapply() function.
  • rep() - repeatedly replicates vectors and / or factors.

The application and details of these functions are dealt with in the corresponding chapters.
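
As a first impression, the three functions can be sketched as follows; the list scores and all other names are hypothetical.

```r
scores <- list(a = c(1, 2, 3), b = c(10, 20))   # hypothetical data
res_list <- lapply(scores, mean)         # list with the elements a = 2 and b = 15
res_vec  <- sapply(scores, mean)         # named numeric vector: a = 2, b = 15
groups   <- rep(c("x", "y"), each = 2)   # "x" "x" "y" "y"
```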

Exercise block matrices

Using the above variables, the following tasks are to be processed:

  1. Create a matrix X in which the variables lalt, sex and large are stored as columns.
  2. Create a matrix Z in which the variables lalt, sex and large are stored as rows.
  3. Output the value in the second row and third column of the variable X.
  4. Output the third row of the variable X (all columns).
  5. Output the 3rd to 5th rows and 1st to 2nd columns of the variable X.
  6. Use the command colnames() to display the column names of the variable X.
  7. Create a variable ColNames with the values ‘Age’, ‘Weight’ and ‘Height’.
  8. Assign these new column names to the variable X (use the command colnames()).
  9. Create a variable names in which 7 arbitrary names are stored.
  10. Use this variable to assign these names to the matrix X as row labels (use the command rownames()).
  11. Try the command fix(X), look up in the help what the command does and discuss its properties.
  12. Determine the ‘dimension’ of the matrix X (use the command dim()).
  13. Determine the ‘length’ of the matrix X (use the command length()).
  14. Calculate the height in meters and add the result to the existing matrix X as another column.
  15. Determine the positions (indices) of the people who are taller than 200 cm. What is the name of the person?

Data frames

An extension of the data type matrix is the so-called data frame. With this data type it is possible to store different formats (data types) within one object. For vectors and matrices, the restriction applies that all elements must have the same data type. In addition, when another column is added to a matrix, for example, its number of rows must match the number of rows already in the matrix (otherwise R issues a warning or an error message).

In a data frame, data of different types can now be combined in one object. However, the requirement remains that the different elements (vectors) have the same length.

A data frame is in a way comparable to an Excel spreadsheet.

Figure 26: Example of different data in an Excel table

The first and third columns (LNr, Age) contain numerical values, the second a date and the fourth a name (string, character). If this table were stored in R as a matrix, R would convert all the data to the data type character! In a data frame, in contrast, the data type of each column is preserved.

To create a data frame from existing objects (vectors, matrices), the function data.frame() is used. Elements of a data frame are accessed in the same way as with matrices: the name of the data frame is followed by a pair of square brackets containing the row and column indices. Correspondingly, the names of the columns and rows can also be displayed or changed using the colnames() and rownames() functions (see matrices).

In addition, the data frame offers a special form of addressing columns. If the columns are viewed as variables whose names are given by the column names, the corresponding column can be accessed using the following syntax: DataframeName$ColumnName. Elements within a variable (column) are still addressed with the index method.

Some programmers prefer lean code, i.e. there should be as little text as possible in the program lines. For data frames, there are the functions attach(DataframeName) and detach(DataframeName). The former means that the name of the data frame no longer has to be specified when referring to its variables (so you save the DataframeName$). However, the content is then only available for reading. With detach(DataframeName), direct referencing via variable names is canceled again. Note, however, that problems can arise in connection with the names of other variables! For this reason, using these functions is often not recommended.
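
A minimal sketch of creating a data frame and of both ways of addressing its content. The data are hypothetical and only loosely modelled on the table described above.

```r
people <- data.frame(
  LNr  = 1:3,
  Age  = c(23, 35, 41),
  Name = c("Anna", "Ben", "Cleo"),
  stringsAsFactors = FALSE
)
str(people)          # each column keeps its own data type
people[2, 3]         # row 2, column 3: "Ben"
people[2, ]          # entire second row
people$Age           # column Age as a vector
people$Age[1:2]      # first two elements of that column
colnames(people)     # "LNr" "Age" "Name"
```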

The following code shows the properties just described. Copy the code into the editor and then execute it line by line:
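
The effect of attach() and detach() can be sketched as follows; the data frame survey and its columns are hypothetical.

```r
survey <- data.frame(id = 1:3, item_score = c(4, 2, 5))   # hypothetical data

attach(survey)                     # columns can now be addressed without survey$
m_attached <- mean(item_score)
detach(survey)                     # cancel the direct referencing again

m_dollar <- mean(survey$item_score)      # explicit form, works without attach()
still_visible <- exists("item_score")    # FALSE again after detach()
```

Because attached columns can be masked by other objects of the same name, the explicit form with $ is usually the safer choice.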

Task block data frames

Again, use the following data to process the tasks:

  1. Create a data structure (data.frame) named questionnaire with the following content: id, gender, sex, lalt, large, mon, date, decided, proj, i1, i2, i3, i4, i5
  2. Output the 3rd row of questionnaire.
  3. Use the function head() and discuss the result.
  4. Try the command questionnaire$i1[1:3]. Discuss the result and find out what the $ means!
  5. Output the values of the 1st to 3rd rows of the column proj.
  6. Try the attach() command and the detach() command. Use the help function if necessary.

Data tables

An extended version of data frames is provided by the package data.table. One of the main advantages of using data.table is considerably faster processing, especially with very large data sets. The package contains a function (fread()) for reading csv files, which far outperforms the frequently used function read.csv() in terms of loading times for very large files. Handling data in a data.table is also far faster than in a data frame.

Despite these advantages, we will not use these functions in the following, since on the one hand we are not working with big data and on the other hand a large part of the sample data sets provided by R come in the form of data frames and lists.

Lists

To escape the restrictions of data frames, the data type list can be used. In addition to different data types, list objects can also contain objects of different sizes (scalars, vectors, matrices, data frames), as well as object structures such as functions. Lists are a way of storing pretty much everything that can be created in R in a single variable of type list. A list is generated simply by using the function list(). First, let's look at the following code:
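
A minimal sketch of such a list, with hypothetical element names and content:

```r
my_list <- list(
  number = 42,                                     # scalar
  values = c(1, 2, 3),                             # vector
  table  = data.frame(a = 1:2, b = c("x", "y")),   # data frame
  fun    = mean                                    # function
)
names(my_list)                  # "number" "values" "table" "fun"
my_list$values                  # access by name
my_list[["number"]]             # access with double square brackets
my_list$fun(my_list$values)     # call the stored function: 2
```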