apply family functions - Part 4

eapply(), rapply() and mapply() functions

eapply function

Using environments

In R, an Environment is a place where the variables and values that we assign to objects are stored. Each time a new R session begins, every object we create is stored in the global Environment, which is R's default Environment. If we create the object x, it will therefore live in the global Environment. The ls() function lists the objects that have been created, so we can check that x actually exists.

x <- 28022020
ls()
## [1] "x"

It is possible to assign the global Environment to an object that we’ll call environment_1. The curious thing is that, after doing this, environment_1 is an object in the global Environment and, at the same time, it is the global Environment.

environment_1 <- globalenv()
class(environment_1)
## [1] "environment"
ls()
## [1] "environment_1" "x"

This process is similar to using a list, a topic that we talked about in this post. Since the object environment_1 is the global Environment, it contains the object x, which we assigned to the global Environment, and it also contains itself in a cyclical way…

environment_1$x
## [1] 28022020
environment_1$environment_1
## <environment: R_GlobalEnv>
environment_1$environment_1$environment_1
## <environment: R_GlobalEnv>
ls(environment_1$environment_1$environment_1$environment_1)
## [1] "environment_1" "x"

As Emmett Brown once said, the environment_1 object could create a paradox that would destroy the universe, so it is best to remove it. Section 2.1.10 of the R Language Definition offers a more formal description of what an Environment is.

rm(environment_1)
ls()
## [1] "x"

It is possible to create a new Environment that is separate from the global Environment, which we will call environment_2. When we create it, we can see that it is empty, unlike the global Environment, which contains x and environment_2. Also, if we create a new variable directly, say z, it will be assigned to the global Environment.

environment_2 <- new.env()
environment_2
## <environment: 0x000000001564e3b8>
ls(globalenv())
## [1] "environment_2" "x"
ls(environment_2)
## character(0)
z <- pi

ls(globalenv())
## [1] "environment_2" "x"             "z"
ls(environment_2)
## character(0)

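Now we create a new object called y and assign it to environment_2. The object y is contained only in environment_2 and not in the global Environment, even though the latter contains environment_2. The last call below, environment(x), returns NULL because x is a plain number and, unlike a function, has no Environment attached to it.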

environment_2$y <- "This is an abstract topic"

ls(globalenv())
## [1] "environment_2" "x"             "z"
ls(environment_2)
## [1] "y"
environment(x)
## NULL
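The same isolation can be checked programmatically. The following is only a minimal sketch using the base R functions assign(), exists() and get(); the names env_demo and demo_value are hypothetical and were chosen just for this illustration.

```r
# A fresh Environment, separate from the one we are working in
env_demo <- new.env()

# assign() with `envir` does the same job as env_demo$demo_value <- "..."
assign("demo_value", "This is an abstract topic", envir = env_demo)

# inherits = FALSE restricts the search to the given Environment,
# without falling back to its parent Environments
in_demo   <- exists("demo_value", envir = env_demo, inherits = FALSE)
in_global <- exists("demo_value", envir = globalenv(), inherits = FALSE)
in_demo    # TRUE: the object lives in env_demo
in_global  # FALSE: it was never assigned in the global Environment

# get() retrieves the value stored under a name in a given Environment
get("demo_value", envir = env_demo)
```

Using inherits = FALSE matters here: without it, exists() would also search the parent Environments of the one we pass in.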

Does an Environment have a real use? The answer is yes: Environments are fundamental to something widely used in R, namely functions. In general, when building a function, we assume that it has only two components: the arguments and the content of the function. Consider a simple function that reverses the sign of a number.

reverse_sign <- function(number){
    number*-1
}

In the previous case, the reverse_sign() function has a single argument, number, while its content is \(number\cdot -1\). However, functions have a third component: the Environment. When the reverse_sign() function was created, it was placed in the global Environment, which also became the Environment of the function.

ls(globalenv())
## [1] "environment_2" "reverse_sign"  "x"             "z"
ls(environment_2)
## [1] "y"
environment(reverse_sign)
## <environment: R_GlobalEnv>
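The three components of a function can be inspected directly. This is a small sketch that uses the base R functions formals(), body() and environment() on the same reverse_sign() function, redefined here so that the example runs on its own.

```r
reverse_sign <- function(number){
    number*-1
}

formals(reverse_sign)      # the arguments: a pairlist with one entry, `number`
body(reverse_sign)         # the content: the expression between the braces
environment(reverse_sign)  # the Environment in which the function was created
```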

One consequence is that names are always resolved relative to an Environment. For example, when we call reverse_sign(x) from the global Environment, the x passed to the function is the one found in the global Environment:

reverse_sign(x)
## [1] -28022020

If we create an object that is also called x, but that lives in environment_2, the call reverse_sign(x) still uses the x from the global Environment.

environment_2$x <- 123456
reverse_sign(x)
## [1] -28022020
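If we do want the x stored in environment_2, we have to say so explicitly. The sketch below shows two ways to do it, and then illustrates how the Environment of a function affects free variables, that is, names used in the body that are neither arguments nor local variables. The helper reverse_x() is hypothetical and exists only for this illustration; the other objects are recreated so that the example runs on its own.

```r
x <- 28022020
environment_2 <- new.env()
environment_2$x <- 123456

reverse_sign <- function(number){
    number*-1
}

# Option 1: pass the value stored in environment_2 explicitly
from_dollar <- reverse_sign(environment_2$x)

# Option 2: evaluate the whole call inside environment_2, so that
# the name x is looked up there first
from_evalq <- evalq(reverse_sign(x), environment_2)

# A function with a free variable: x is not an argument of reverse_x()
reverse_x <- function(){
    x*-1
}
before <- reverse_x()                    # uses the x visible where reverse_x() was defined
environment(reverse_x) <- environment_2  # change the Environment of the function
after <- reverse_x()                     # now uses environment_2$x

c(from_dollar, from_evalq, before, after)
```

Replacing the Environment of a function is rarely needed in everyday code; it is done here only to make the lookup rule visible.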

eapply function

These small details, which we usually don’t deal with in everyday tasks in R, may become indispensable in certain situations, such as when you want to use the eapply() function. Now that we know the basics of how an Environment works, we are going to remove all the objects created so far to keep everything in order.

rm(list = ls())

As we also reviewed in this post, the lapply() function applies a function to each element of a list. Similarly, the eapply() function applies a function to each named element of an Environment, with the difference that in eapply() the first argument is an Environment and not a list as in lapply(). Let’s now create a new Environment and apply to its elements a function that returns the square root of each one plus 10:

environment_1 <- new.env()
environment_1$element_1 <- 4
environment_1$element_2 <- 9
environment_1$element_3 <- 25

eapply(environment_1, function(x){
    sqrt(x)+10
})
## $element_1
## [1] 12
## 
## $element_2
## [1] 13
## 
## $element_3
## [1] 15

We can also display the result without name tags:

eapply(environment_1, function(x){
    sqrt(x)+10
}, USE.NAMES = FALSE)
## [[1]]
## [1] 12
## 
## [[2]]
## [1] 13
## 
## [[3]]
## [1] 15

The previous runs of the eapply() function evaluate our function on all the elements contained in the Environment, but hidden elements are an exception. Hidden elements are objects whose names begin with a dot: they exist, but they are not listed by default. We can create a hidden element, .element_4, as follows:

environment_1$.element_4 <- 48
ls(environment_1)
## [1] "element_1" "element_2" "element_3"
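Hidden elements are still there, and ls() can list them when asked. A minimal sketch, using a hypothetical Environment called env_hidden so that the example runs on its own:

```r
env_hidden <- new.env()
env_hidden$element_1 <- 4
env_hidden$.element_4 <- 48

visible_names <- ls(env_hidden)                    # only the visible element
all_names     <- ls(env_hidden, all.names = TRUE)  # includes names that start with a dot
visible_names
all_names

# Hidden objects remain accessible by name
env_hidden$.element_4
```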

If we repeat the previous evaluation of the eapply() function, it only runs with the visible elements:

eapply(environment_1, function(x){
    sqrt(x)+10
})
## $element_1
## [1] 12
## 
## $element_2
## [1] 13
## 
## $element_3
## [1] 15

But we can run the function over all elements, hidden ones included, by setting all.names = TRUE:

eapply(environment_1, function(x){
    sqrt(x)+10
}, all.names = TRUE)
## $element_1
## [1] 12
## 
## $.element_4
## [1] 16.9282
## 
## $element_2
## [1] 13
## 
## $element_3
## [1] 15
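Note that .element_4 does not appear first or last in the output above: the documentation of eapply() states that its output is not sorted. If a reproducible order is needed, one option is to reorder the resulting list by name. This is only a sketch, and the Environment env_sorted is a hypothetical name used for this illustration.

```r
env_sorted <- new.env()
env_sorted$element_3 <- 25
env_sorted$element_1 <- 4
env_sorted$element_2 <- 9

result <- eapply(env_sorted, function(x){
    sqrt(x)+10
})

# Reorder the list by element name to get a reproducible order
result <- result[order(names(result))]

# Collapse the list into a named numeric vector
unlist(result)
```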

rapply function

In this function, the “r” stands for “recursive”. The function has two purposes: to apply a function recursively to a list, or to apply that function only to the elements of a list that have a specific class. The second use is especially helpful because lists are perhaps the most versatile objects in R: they can store data sets, numbers, character strings, graphics, and more. We can apply a function to every numeric element of a list without needing to know the positions of those elements within the list. A simple example uses the famous iris data set, which works because a data frame is a list of columns:

rapply(iris, mean, classes="numeric")
## Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
##     5.843333     3.057333     3.758000     1.199333
rapply(iris, table, classes="factor")
##     Species.setosa Species.versicolor  Species.virginica 
##                 50                 50                 50

Or, if we have a list whose elements have different classes, we may want to multiply by two only the numeric ones:

rapply(list(2,5,7,"We can't multiply this element because it is a string"), function(x){x*2}, classes="numeric")
## [1]  4 10 14
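The recursive part becomes visible with a nested list. The sketch below uses a hypothetical list called nested and the how argument of rapply(), which controls whether the result is flattened or keeps the structure of the input.

```r
nested <- list(a = 1,
               b = list(c = 2,
                        d = "text",
                        e = list(f = 3)))

# how = "unlist" (the default) flattens the result into a vector;
# elements that do not match `classes` are dropped
flat <- rapply(nested, function(x){x*2}, classes = "numeric", how = "unlist")
flat

# how = "replace" keeps the structure of the list and leaves
# non-matching elements untouched
doubled <- rapply(nested, function(x){x*2}, classes = "numeric", how = "replace")
doubled$b$e$f  # the innermost numeric element, doubled
doubled$b$d    # the string is returned unchanged
```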

mapply function

The mapply() function can be seen as the multivariate version of the apply functions. For example, lapply() iterates over the elements of a single list, but if you have one list whose elements are the first argument of a function and another list whose elements are the second argument, then mapply() is the function to use. The function to be applied must accept as many arguments as the number of lists passed to mapply(). The MoreArgs argument is useful when the function needs additional arguments that stay the same in every call.

It is easier to show how this works with an example than with words. Suppose we want to obtain the result of \(x\cdot y+1\) by varying the values of \(x\) and \(y\) as follows: \(1\cdot 2 + 1, 2\cdot 3 + 1, 3\cdot 4 + 1, \ldots , 10000\cdot 10001 + 1\). We can perform this calculation with a for loop as follows:

x <- 1:10000
y <- 2:10001
z <- numeric(length(x))  # preallocate the result vector
for(i in seq_along(x)){
    z[i] <- x[i]*y[i]+1
}

But we can also use the mapply() function:

mapply(function(x,y){x*y+1},
       x=1:10000,
       y=2:10001)

Both approaches produce the same values; however, mapply() expresses the calculation more compactly and can be more efficient. We will compare the running time of different processes in the next post, where I’ll show you some parallel versions of the apply family functions.
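The MoreArgs argument mentioned earlier can be sketched with the same calculation. Here the constant that is added to each product is passed through a hypothetical extra argument called const, which is not part of the original example. Because arithmetic in R is already vectorized, this particular calculation can also be written without any apply function, which gives a convenient way to check the result.

```r
x <- 1:10000
y <- 2:10001

# MoreArgs holds the arguments that stay fixed across all calls;
# here, the constant added to each product
result_mapply <- mapply(function(x, y, const){x*y+const},
                        x = x,
                        y = y,
                        MoreArgs = list(const = 1))

# The same calculation using vectorized arithmetic
result_vectorized <- x*y+1

head(result_mapply, 3)
all.equal(result_mapply, result_vectorized)
```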

ORCID iD: https://orcid.org/0000-0001-6733-4759