
## Introduction to RStudio

### Top-down vs. bottom-up

As with learning a new language, there are different approaches to learning a programming language. In the bottom-up approach, you first deal with the elements of a language (vocabulary) and their correct composition (grammar). This approach is often difficult and laborious, especially at the beginning, as the connections only become clear after a while and the (successful) use of the language is only possible after a shorter or longer *lead time*.

With the top-down approach, you are confronted directly with the entire scope of the language. Building blocks of the language are adopted from others; details of the structure, the rules and an expanded vocabulary are acquired by copying from experts.

When learning a programming language, the countless examples and templates on the Internet, together with the meticulously detailed help pages for the respective languages, make a mixture of both approaches possible. We will start our introduction to the R programming language with a top-down approach.

### Copy and paste (exercise block)

- Copy the line \(v <- c(1,4,4,3,2,2,3)\) from Indexing with numbers and names into the new script and execute it. Describe what changes in the console and environment windows.
- Copy (from the website) the line \(v[c(2,3,4)]\) into the new script and execute it. Describe what changes in the console and environment windows.
- Go to the following website: Bar and line graphs (ggplot2) and copy the first four lines of the block *Bar graphs of values* into the editor. Execute these lines and discuss the result.
- Now copy the second code block from this website and execute the code. Discuss the result (see plot window).
- Now copy the two code blocks in the chapter “Bar graphs of counts” and execute the code. Pay particular attention to where the data (tips) come from!
- Enter the commands data() and ?tips, or ?french_fries, and discuss the result.
- Execute the following two lines:
  - Dat_FF <- french_fries
  - str(Dat_FF)

- Load the SPSS file “bigfive.sav” with the help of Import Dataset and copy the corresponding commands into your script.
- Now load the file “bigfive_excel.xlsx” (also in ../Data/) with the help of Import Dataset and copy the corresponding R commands into the editor.

### Objects in R (bottom up)

Everything can be saved as an object in R.

- Individual values
- Multiple values (e.g. as a data record with raw data)
- Tables
- Statistical models
- Results of statistical analyses
- Functions, etc.

All objects that were created or opened in an R session are located in the so-called environment (work area). In the last exercise, the following objects were created in the environment:

**Figure 25**: Environment of the last exercise

An object is created in the environment via the assignment *Object name <- object content*. Assignment via *<-* is R-specific. You can also use the usual *=*, although in R it has an additional role (details later).

The following rules must be observed when choosing *object names*:

- An object name cannot begin with a number
- An object name must not contain any operators (+, -, * etc.)
- Object names are case-sensitive (pay attention to upper and lower case)
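The assignment and naming rules above can be tried out with a minimal sketch. The object names and values here (`age`, `Age`, `height`) are made up for illustration and do not come from the course material.

```r
# Create an object with the R-specific assignment operator
age <- 25

# '=' also works for a simple assignment at the top level
height = 181

# Names are case-sensitive: 'Age' is a different object than 'age'
Age <- 30

age      # still the first value
Age      # the second, separate object
ls()     # lists the objects currently in the environment
```

After running these lines, three separate entries should appear in the environment window.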

There are a variety of different object types in R. The basic object types (and what they have in common with SPSS data types) are:

- Vectors \(\rightarrow\) ordinal / scale variables
  - numeric (numbers)
  - character (letters)

- Factors \(\rightarrow\) nominal / ordinal variables
  - nominal variable
  - categories of the factor = levels (can contain letters or numbers)

- Data frames (multiple rows and columns) \(\rightarrow\) data set
  - columns (vectors and factors)
  - rows (cases, e.g. test subjects)

- Matrices \(\rightarrow\) not available in SPSS
  - multiple rows and columns
  - can hold data of only one type (e.g. only numeric or only character values)

- Arrays \(\rightarrow\) not available in SPSS
  - combination of several matrices: collections of elements of the same data type (numeric, character, logical) that are addressed via two or more indices

- Lists \(\rightarrow\) not available in SPSS
  - combination of several objects
  - can contain any object, including objects of various types
  - in contrast to data frames and arrays, objects of different lengths can also be stored
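As a rough orientation, the object types listed above could be created as follows. This is only a sketch with invented names and values; each type is covered in more detail in its own section.

```r
num_vec <- c(1.5, 2, 3.25)                        # numeric vector
chr_vec <- c("a", "b", "c")                       # character vector
fac     <- factor(c("m", "f", "f"))               # factor with two levels
mat     <- matrix(1:6, nrow = 2)                  # 2 x 3 matrix, one data type
arr     <- array(1:24, dim = c(2, 3, 4))          # array addressed via three indices
df      <- data.frame(id = 1:3, name = chr_vec)   # columns of different types
lst     <- list(num_vec, mat, df)                 # objects of different kinds and lengths

# Show the (first) class of each object
sapply(list(num_vec, chr_vec, fac, mat, arr, df, lst),
       function(obj) class(obj)[1])
```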

### Vectors

The simplest object is a vector made up of several elements. Open a new script file and copy the following content into this file:

Save the file under the name *06_Objects.R*. Now execute the content line by line and discuss the effect of the individual lines! Now copy the following code into the file and execute this code line by line. What happens in the environment?
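The original listings are not reproduced in this text. As a stand-in, the following sketch shows typical ways of creating vectors; apart from the vector from the first exercise block, all names and values are assumptions.

```r
# Combine individual values into a vector with c()
v <- c(1, 4, 4, 3, 2, 2, 3)

# A character vector (made-up values)
v_chr <- c("yes", "no", "yes")

# Sequences: the colon operator counts in steps of one,
# seq() allows an arbitrary step width
v_int <- 1:5
v_seq <- seq(from = 2, to = 10, by = 2)

v            # print the vector in the console
length(v)    # number of elements
```

Each assignment adds one entry to the environment window; lines without an assignment only print to the console.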

There are several options for accessing individual elements of a vector.

- Access through direct indexing
- Access through variables (vectors) in which the indices of the elements to be accessed are stored.

The following code shows both of these possibilities. Copy the code into the script and execute it line by line.
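A minimal sketch of the two access methods might look like this. It reuses the vector from the first exercise block; the index variable `idx` is a hypothetical name.

```r
v <- c(1, 4, 4, 3, 2, 2, 3)

# 1) Direct indexing (R starts counting at 1)
v[2]             # second element
v[c(2, 3, 4)]    # elements 2 to 4
v[-1]            # all elements except the first

# 2) Indexing through a variable that stores the positions
idx <- c(1, 7)
v[idx]           # first and last element

# The same notation is used to replace elements
v[idx] <- 0
v
```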

To determine what type or class of data (numeric, alphanumeric, date, etc.) is stored in an object, the function *class()* can be used. The data type of a (simple) object can be changed with the functions *as.numeric(variable)* or *as.character(variable)*. In addition, there are many other functions that deal with the data type of objects. Some of these functions are discussed in detail in the relevant applications.

Conversions from numeric to alphanumeric and vice versa, however, are needed frequently, especially when importing data from other applications.
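The following sketch illustrates *class()* and the two conversion functions. The variable names are invented for this example.

```r
num <- c(10, 20, 30)
class(num)                        # "numeric"

num_as_text <- as.character(num)  # numbers stored as text
class(num_as_text)                # "character"

back <- as.numeric(num_as_text)   # converted back to numbers
class(back)                       # "numeric"

# Text that does not represent a number becomes NA;
# suppressWarnings() hides the coercion warning R would otherwise print
converted <- suppressWarnings(as.numeric(c("1.5", "abc")))
is.na(converted)
```

The last two lines show a typical pitfall when importing data: entries that are not valid numbers are silently turned into missing values.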

### Other useful vector functions

In addition to the function *seq()* already discussed, the following functions are often useful for working with vectors. Since we use these functions constantly, not all details are discussed here; use the help() function for this.
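The original list of functions is not reproduced here. The selection below is an assumption about which base R functions are commonly meant in this context, shown on a made-up vector.

```r
v <- c(3, 1, 4, 1, 5, 9, 2, 6)

length(v)       # number of elements
sort(v)         # values in ascending order
rev(v)          # values in reversed order
unique(v)       # distinct values
range(v)        # smallest and largest value
which(v > 4)    # positions of the elements greater than 4
sum(v)          # sum of all elements
mean(v)         # arithmetic mean

# rep() repeats a vector
rep(c(1, 2), times = 3)

# help("sort") or ?sort opens the corresponding help page
```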

### Logical vectors

A logical vector consists of TRUE and FALSE elements. These vectors follow so-called *Boolean logic* with the following principles (in R the logical AND is written as \(\&\), the logical OR as \(|\)):

- T and T equals T (T & T \(\rightarrow\) T)
- F and F equals F (F & F \(\rightarrow\) F)
- T and F equals F (T & F \(\rightarrow\) F)
- T or T equals T (T | T \(\rightarrow\) T)
- F or F equals F (F | F \(\rightarrow\) F)
- T or F equals T (T | F \(\rightarrow\) T)

If brackets are used, the following applies: the expression within brackets is evaluated first! With the *sum()* function, the number of true (T) elements of a vector can be determined. One application would be to use a logical query to determine how many yes answers there are in a vector consisting of yes / no answers (see code below).
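A short sketch of these rules and of the yes / no example, using invented answers:

```r
a <- c(TRUE, TRUE, FALSE, FALSE)
b <- c(TRUE, FALSE, TRUE, FALSE)

a & b           # element-wise AND
a | b           # element-wise OR
!a              # negation
(a | b) & !a    # the expression in brackets is evaluated first

# Counting yes answers with a logical query (made-up data)
answers <- c("yes", "no", "yes", "yes", "no")
is_yes  <- answers == "yes"    # logical vector
sum(is_yes)                    # TRUE counts as 1, so the result is 3
```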

### Task block vectors

Copy the following code into a new R script and save it under the name *06_Objects_Tasks*.

Now work on the following tasks:

- compute the mean of the variable *large*.
- copy the variable *large* into a new variable *large1*.
- replace the first value of *large1* with the value NA.
- compute the mean of *large1* and discuss the result.
- compute the mean of *large1* taking missing values into account (use the help to read about the parameters of the mean() function!).
- compute the mean of *large1*, whereby 50% of the values (25% at the lower end and 25% at the upper end) should not be included in the calculation.
- compute the product of the second and the third value of the variable *large* and save the result in the variable *gross_prod*.
- set the third value of the variable *sex* to 1.
- create a variable (*x*) whose values run in descending order from 50 to 1 in steps of one.
- display all but the second value of the variable *x*.
- create a sequence (*x*) which starts with 1 and counts up in steps of two.
- create a sequence (*x*) which starts with 10 and counts down in steps of two.
- create a sequence (*x*) which starts with 0, counts up to 10 in steps of two, and has 50 as its last value (i.e. x = 0 2 4 … 10 50).
- determine which class the vector *x* belongs to.
- convert the vector *x* from the data type ‘numeric’ to ‘character’ and save the result in *x_c*.
- determine which class the vector *x_c* belongs to.
- convert the vector *x_c* back to the data type ‘numeric’ and save the result in *x_n*.
- determine which class the vector *x_n* belongs to.
- how many people have a height greater than or equal to (\(\ge\)) 173?
- for which element of the variable *large* is height = 181 true (T)?

### Factors

Factors are a special form of vectors and are used in R to represent nominal data. For example, the classification of test persons according to gender is usually stored in the data type *factor*. This factor would usually have 2 so-called factor levels (*levels*), e.g. male / female.

However, factors can also have more than two factor levels. For example, a factor could depict the highest level of school education (secondary school leaving certificate, high school diploma, university of applied sciences, university). This factor would therefore have 4 levels.

Since factors are defined at the nominal level, *levels* and *labels* can be assigned. This option is a bit confusing for SPSS users, as the meaning in R is as follows:

- levels are the input, i.e. how the levels are coded (in the example with 1, 2, 3).
- labels are the output, i.e. which label each level is given.

To assign several values to a factor, the *factor()* command is combined with the *c()* command. Copy the following code into the script file and execute the commands line by line. Discuss the results.
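The interplay of levels (input codes) and labels (output names) can be sketched as follows. The coding 1 / 2 / 3 and the three education labels are assumptions chosen for illustration.

```r
# Raw data as numeric codes (hypothetical coding)
edu_code <- c(1, 2, 2, 1, 3, 2)

# levels = how the data are coded, labels = how the levels are displayed
edu <- factor(edu_code,
              levels = c(1, 2, 3),
              labels = c("secondary", "high school", "university"))

edu                 # values are shown with their labels
levels(edu)         # the factor levels (now the labels)
table(edu)          # frequency of each level
as.integer(edu)     # the internal integer codes
```

After the conversion, *levels()* returns the labels, which is one reason why the terminology can be confusing at first.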

### Task block factors

- create the variable x = c(1,2,3,1,1,2,2) and convert it into a factor *x_fact*. Name the levels of the factor *x_fact* ‘A’, ‘B’, ...
- copy *x_fact* into the variable *x_fact2* and change the names of the factor levels to ‘S1’, ‘S2’, … (use the function levels(…)). Use the command *table()* to display the variable *x_fact2*. What does the command do?
- enter the following command: x_fact3 = factor(x_fact2, levels = c('S3', 'S1', 'S2')). Compare *x_fact2* and *x_fact3*. What has changed?

### Matrices

In R, matrices are objects in which elements of the same data type are arranged in rows and columns. This means that vectors of the same data type can be combined and stored in one object (the matrix) as rows or columns. In R, a matrix can be assembled from vectors with the *rbind()* and *cbind()* functions.

- With *rbind()* the vectors are stored as rows of the matrix, and
- with *cbind()* as columns.

If vectors with different data types (for example numeric and character) are combined using these functions, all values in the matrix are stored as type *character*. Copy the code from the following block and execute it line by line. Discuss the results.
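A minimal sketch of both functions, using three made-up vectors (`age`, `height`, `name`):

```r
age    <- c(23, 31, 45)
height <- c(170, 182, 165)
name   <- c("Ann", "Ben", "Cem")

m_rows <- rbind(age, height)    # vectors become rows: 2 x 3 matrix
m_cols <- cbind(age, height)    # vectors become columns: 3 x 2 matrix

m_rows
m_cols

# Mixing numeric and character vectors turns every entry into text
m_mixed <- cbind(age, name)
m_mixed
class(m_mixed[1, 1])            # "character"
```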

As can be seen from the code above, both the column names and the row names can be set with the functions *colnames()* and *rownames()*. This option proves very helpful in many applications, especially for the column headings.

To determine how many rows or columns a matrix has, you can either read it off the Environment (Value column) or query it directly with the functions *nrow(matrix)* / *ncol(matrix)*. The function *dim(matrix)* returns a vector whose first entry contains the number of rows and whose second entry contains the number of columns.
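The naming and dimension functions can be sketched on a small matrix; the values and names are invented.

```r
m <- cbind(c(23, 31, 45), c(170, 182, 165))

# Set column and row names
colnames(m) <- c("age", "height")
rownames(m) <- c("p1", "p2", "p3")
m

nrow(m)    # number of rows
ncol(m)    # number of columns
dim(m)     # both at once: rows first, then columns
```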

### Access to elements of a matrix

As with vectors, an index is also used with matrices to address a position. In contrast to the vector, two indices are used for matrices:

- the first index always refers to the row number of a matrix
- the second index always refers to the column number of a matrix

The following examples illustrate the possible uses of addressing with indices:

*Comment:* if only one index is used with a matrix, the matrix is treated like a vector and the single corresponding element is returned. The elements are numbered column by column: starting in the first row of the first column, the elements of the first column are numbered in ascending order. At the end of a column, the numbering continues in the first row of the next column, until the end of the matrix is reached.
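The addressing rules, including the single-index case, can be checked with a small sketch. The matrix is filled with the numbers 1 to 12 so that each value equals its position in the column-wise numbering.

```r
m <- matrix(1:12, nrow = 3)   # 3 x 4 matrix, filled column by column
m

m[2, 3]           # row 2, column 3
m[2, ]            # entire second row
m[, 3]            # entire third column
m[1:2, c(1, 4)]   # rows 1 to 2 of columns 1 and 4

# A single index counts down the columns:
# position 5 is the second row of the second column
m[5]
m[2, 2]
```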

### Generation of matrices

There are various ways of generating matrices with the help of R functions. In particular, the simulation of data for evaluating statistical models (e.g. drawing a sample of size \(N\) from a normally distributed population or from a uniform distribution) can be solved elegantly in this way. The following examples give an insight into a few possibilities to generate matrices. Copy the code into your editor and execute it line by line. Discuss the features and the results.
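The original examples are not reproduced here. The sketch below shows a few common possibilities using base R; the sizes and distribution parameters are arbitrary choices.

```r
set.seed(1)   # makes the random numbers reproducible

m_zero  <- matrix(0, nrow = 2, ncol = 3)            # matrix filled with zeros
m_byrow <- matrix(1:6, nrow = 2, byrow = TRUE)      # fill row by row instead of column by column
m_diag  <- diag(3)                                  # 3 x 3 identity matrix

# Simulated data: 20 draws from a normal distribution, arranged in 5 rows
m_norm <- matrix(rnorm(20, mean = 100, sd = 15), nrow = 5)

# Simulated data: 6 draws from a uniform distribution on [0, 1], in 2 columns
m_unif <- matrix(runif(6, min = 0, max = 1), ncol = 2)

m_byrow
dim(m_norm)
```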

### Arithmetic functions on two matrices

When arithmetic operators are applied to two matrices, the two elements in the same row and column are always added, subtracted, multiplied or divided. For this reason, the dimensions (= number of rows and columns, also written \(m \times n\)) of the two matrices must match. This can be checked with the help of logical operators.

Without going into further details, the calculation of the matrix product of two matrices (for two vectors, the scalar product) should be mentioned here. It is calculated using the operator \(\%*\%\). Copy the following code into the editor and execute it line by line. Discuss the results.
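The difference between element-wise arithmetic and the matrix product can be sketched with two small made-up matrices:

```r
A <- matrix(1:4, nrow = 2)   # rows: (1, 3) and (2, 4)
B <- matrix(5:8, nrow = 2)   # rows: (5, 7) and (6, 8)

# Check that the dimensions match before element-wise arithmetic
all(dim(A) == dim(B))

A + B      # element-wise addition
A * B      # element-wise multiplication: rows (5, 21) and (12, 32)

# Matrix product: rows (23, 31) and (34, 46)
A %*% B
```

Note that `*` and `%*%` give different results: the first multiplies matching positions, the second combines rows of `A` with columns of `B`.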

In addition, functions such as *mean()*, *median()*, *sum()*, etc. can be applied to all elements of a matrix. Copy the following code into the editor and execute it line by line. Discuss the results.
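As a stand-in for the original listing, the following sketch applies such functions to a whole matrix and, with *apply()*, separately to each row or column. The matrix values are chosen so that the results are easy to verify by hand.

```r
m <- matrix(1:6, nrow = 2)   # rows: (1, 3, 5) and (2, 4, 6)

mean(m)      # mean of all six elements: 3.5
sum(m)       # sum of all six elements: 21
median(m)    # median of all six elements: 3.5

# apply(matrix, margin, function): margin 1 = rows, margin 2 = columns
apply(m, 1, sum)     # row sums: 9 12
apply(m, 2, mean)    # column means: 1.5 3.5 5.5
```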

The function *apply()* used here belongs to a group of functions that can be used as an alternative to loops. Other important functions of this group are:

- lapply() - applies a function to the elements of vectors, data frames and lists. Returns a list as the result.
- sapply() - simplifies the output of the lapply() function.
- rep() - replicates vectors and / or factors.

The application and details of these functions are dealt with in the corresponding chapters.
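Until then, a brief sketch may help to tell the three functions apart. The list `scores` is invented for this example.

```r
scores <- list(a = c(1, 2, 3), b = c(10, 20), c = 5)

# lapply() applies mean() to every list element and returns a list
res_list <- lapply(scores, mean)
res_list

# sapply() does the same but simplifies the result to a named vector
res_vec <- sapply(scores, mean)
res_vec

# rep() replicates values: the whole vector or each element in turn
rep(c("x", "y"), times = 2)   # "x" "y" "x" "y"
rep(c("x", "y"), each = 2)    # "x" "x" "y" "y"
```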

### Exercise block matrices

Using the above variables, the following tasks are to be processed:

- create a matrix *X* in which the variables *lalt*, *sex* and *large* are stored as columns.
- create a matrix *Z* in which the variables *lalt*, *sex* and *large* are stored as rows.
- output the value in the second row and third column of the variable *X*.
- output the third row of the variable *X* (all columns).
- output rows 3 to 5 and columns 1 to 2 of the variable *X*.
- use the command *colnames()* to display the column names of the variable *X*.
- create a variable *ColNames* with the values ‘Age’, ‘Weight’ and ‘Height’.
- assign these new column names to the variable *X* (use the command *colnames()*).
- create a variable *names* in which 7 arbitrary names are stored.
- use this variable to assign these names as row labels of the matrix *X* (use the command *rownames()*).
- try the command *fix(X)*, look up in the help what the command does, and discuss its properties.
- determine the ‘dimension’ of the matrix *X* (use the command *dim()*).
- determine the ‘length’ of the matrix *X* (use the command *length()*).
- calculate the height in meters and add the result as another column to the existing matrix *X*.
- determine the positions (indices) of the people who are taller than 200 cm. What is the name of the person?

### Data frames

An extension of the data type *matrix* is the so-called *data frame*. With this data type it is possible to store different formats (data types) within one object. For vectors and matrices, the restriction applies that all elements must have the same data type. Moreover, when another column is added to a matrix, its number of elements must match the number of rows already in the matrix (otherwise R complains with a warning or an error message).

In a *data frame*, data of different types can be combined in one object. However, the requirement remains that the different elements (vectors) all have the same length.

A *data frame* is in a way comparable to an Excel spreadsheet.

**Figure 26**: Example of different data in an Excel table

The first and third columns (LNr, Age) contain numerical values, the second a date and the fourth a name (string, character). If this table were stored in R as a matrix, R would convert all the data to the data type *character*! In a *data frame*, in contrast, the data type of each column is preserved.

To create a data frame from existing objects (vectors, matrices), the function *data.frame()* is used. Elements of a data frame are accessed in the same way as with matrices: the name of the *data frame* is followed by square brackets, within which the indices of the rows and columns are given. Correspondingly, the names of the columns and rows can also be displayed or changed using the colnames() and rownames() functions (see matrices).

In addition, the data frame offers a special way of addressing columns. If the columns are viewed as variables whose names are given by the column names, the corresponding column can be accessed using the following syntax: *DataframeName$ColumnName*. Elements within a variable (column) are still addressed with the index method.

Some programmers prefer *lean code*, i.e. there should be as little text as possible in the program lines. For data frames, there are the functions *attach(DataframeName)* and *detach(DataframeName)*. The former means that the name of the data frame no longer has to be specified when referring to its variables (so you save the *DataframeName$*). However, the content is then only available for reading. With *detach(DataframeName)*, direct referencing via variable names is cancelled again. It should be noted, however, that conflicts with the names of other variables can arise! *Therefore, using these functions is often not recommended*.

The following code shows the properties just described. Copy the code into the editor and then execute it line by line:
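As a stand-in for the original listing, here is a minimal sketch. The column names follow the Excel example described above (LNr, Date, Age, Name); the values themselves are invented.

```r
df <- data.frame(
  LNr  = 1:3,
  Date = as.Date(c("2020-01-15", "2020-02-01", "2020-03-10")),
  Age  = c(23, 31, 45),
  Name = c("Ann", "Ben", "Cem"),
  stringsAsFactors = FALSE      # keep names as character, not factor
)

str(df)          # each column keeps its own data type

# Index access as with matrices: [row, column]
df[2, 3]         # second row, third column
df[2, ]          # entire second row

# Column access via $
df$Age           # whole column
df$Age[1:2]      # first two elements of the column
colnames(df)

# attach() makes the columns available without the 'df$' prefix
attach(df)
mean(Age)
detach(df)       # undo the attachment again
```

In this sketch, `mean(df$Age)` would give the same result as the attach() variant without the risk of name conflicts.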

### Task block data frames

Again, use the following data to process the tasks:

- create a data structure (data.frame) with the name *questionnaire* with the following content: *id*, *gender*, *sex*, *lalt*, *large*, *mon*, *date*, *decided*, *proj*, *i1*, *i2*, *i3*, *i4*, *i5*.
- output the 3rd row of *questionnaire*.
- use the function *head()* and discuss the result.
- try the command questionnaire$i1[1:3]. Discuss the result and find out what the $ means!
- output the values of rows 1 to 3 of the column *proj*.
- try the attach() command and the detach() command. Use the help function if necessary.

### Data tables

An extended version of the *data frame* is provided by the package *data.table*. One of the main advantages of using data.table is the considerably faster processing, especially with very large data sets. The package contains a function (fread()) for reading *csv* files, which far outperforms the commonly used function *read.csv()* in terms of loading times for very large files. Handling data in a *data.table* is also far faster than in a *data frame*.

Despite these advantages, we will forgo the use of these functions in the following, since on the one hand we are not working with *big data*, and on the other hand a large part of the sample data sets made available by R come in the form of *data frames* and *lists*.

### Lists

To escape the restrictions of *data frames*, the data type *list* can be used. In addition to different data types, list objects can also contain objects of different sizes (scalars, vectors, matrices, data frames), as well as object structures such as functions. Lists are a way of storing pretty much everything that can be created in R in a single variable of the list type. A list is generated simply with the function *list()*. First, let's look at the following code:
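The original listing is not reproduced here. The sketch below, with invented element names, shows a list holding objects of different types and sizes and some ways of accessing them.

```r
my_list <- list(
  number = 42,                                        # single value
  vec    = c(1, 2, 3),                                # vector
  mat    = matrix(1:4, nrow = 2),                     # matrix
  df     = data.frame(id = 1:2, name = c("Ann", "Ben")),
  fun    = mean                                       # even a function
)

names(my_list)             # names of the list elements
length(my_list)            # number of elements

my_list$vec                # access by name with $
my_list[["mat"]][1, 2]     # double brackets return the element itself
my_list["number"]          # single brackets return a (shorter) list

# The stored function can be called like any other function
my_list$fun(my_list$vec)
```

The difference between single and double brackets is a common stumbling block: `my_list["number"]` is still a list, whereas `my_list[["number"]]` is the number itself.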
