Run R for Loop in Parallel

How do you run an R for loop in parallel? In any programming language, loops execute a set of statements repeatedly. As the code inside the loop gets more complex, the loop can take longer to finish. To address this, you can run the iterations of the loop in parallel.

Following are the steps to run an R for loop in parallel:

  • Step 1: Install the foreach package
  • Step 2: Load the foreach package into R
  • Step 3: Use the foreach() statement
  • Step 4: Install and load the doParallel package, then run foreach() with %dopar%

Let’s execute these steps and run an example.

Step 1 – Install foreach package

To run an R for loop in parallel, you use the foreach() statement from the foreach package. This package is not part of the default R installation, so you need to install it first.


#Install foreach package
install.packages("foreach")
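
If the script is run more than once, reinstalling the package every time is unnecessary. As a minimal sketch (the guard is an addition here, not part of the original steps), the install can be skipped when the package is already available:

```r
# Install foreach only when it is not already available.
# requireNamespace() returns FALSE instead of raising an error
# when the package is missing.
if (!requireNamespace("foreach", quietly = TRUE)) {
  install.packages("foreach")
}
```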

Step 2 – Load and initialize the foreach library

Load the foreach library using library(foreach).


#Load foreach library
library(foreach)

Step 3 – Use foreach

By default, foreach() returns a list containing the result of each iteration. You also need to place the %do% operator between the foreach() call and the loop body.


#Run foreach loop sequentially with %do%
x <- foreach(i = 1:20) %do% {
  sqrt(i)
}
x
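
To make the difference from an ordinary loop concrete, the sketch below builds the same results twice: once with a plain for loop that fills a pre-allocated list, and once with foreach(), which collects the value of each iteration for you. The variable names are illustrative only.

```r
library(foreach)

# A plain for loop needs a pre-allocated container for its results
squares_for <- vector("list", 5)
for (i in 1:5) {
  squares_for[[i]] <- i^2
}

# foreach() returns the value of each iteration, collected in a list
squares_foreach <- foreach(i = 1:5) %do% {
  i^2
}

# Compare the two results element by element
all.equal(unlist(squares_for), unlist(squares_foreach))
```

Because foreach() is an expression that returns a value, its result can be assigned directly, with no bookkeeping inside the loop body.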

To change the return type, use the .combine argument of foreach(). For example, .combine=c concatenates the results into a vector instead of a list.

Similarly, you can pass cbind or rbind, or even a custom function, to control how the results are combined. The example below uses cbind, which returns a one-row matrix with one column per iteration.


#Combine the results into a matrix with cbind
x <- foreach(i = 1:20, .combine=cbind) %do% {
  sqrt(i)
}
x
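
The other .combine options mentioned above can be sketched as follows. This is an illustrative example with made-up variable names, shown sequentially with %do% so it runs without a parallel backend.

```r
library(foreach)

# .combine = c concatenates the results into a numeric vector
v <- foreach(i = 1:5, .combine = c) %do% {
  sqrt(i)
}

# .combine = rbind stacks each result as one row of a matrix
m <- foreach(i = 1:3, .combine = rbind) %do% {
  c(value = i, square = i^2)
}

# A function such as "+" reduces all results to a single value
total <- foreach(i = 1:5, .combine = "+") %do% {
  i
}

v
m
total
```

Use c when each iteration yields a single value, rbind or cbind when each iteration yields a row or column, and a reducing function when only an aggregate is needed.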

Step 4 – Run foreach loops in parallel

The foreach loop with the %do% operator shown above processes each iteration sequentially. To run the iterations in parallel, use foreach with the %dopar% operator instead. You also need to install and load the doParallel package, which provides the parallel backend.

Before running in parallel, you need to create a cluster of worker processes that uses the processors or cores on your server or laptop, and register it with foreach.


#Install doParallel package (only needed once)
install.packages("doParallel")
library(doParallel)

#Set up the backend to use multiple cores
totalCores <- detectCores()

#Leave one core free to avoid overloading your computer
cluster <- makeCluster(totalCores[1]-1)
registerDoParallel(cluster)
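
The setup above assumes detectCores() returns a number greater than one. As a hedged sketch, the variant below guards against the two edge cases: detectCores() is documented to return NA when the core count is unknown, and a single-core machine would otherwise ask for zero workers. The names total_cores and n_workers are illustrative.

```r
library(doParallel)  # also attaches foreach and parallel

# detectCores() may return NA when the count is unknown, so fall back to 1
total_cores <- detectCores()
if (is.na(total_cores)) total_cores <- 1

# Leave one core free, but always keep at least one worker
n_workers <- max(1, total_cores - 1)

cluster <- makeCluster(n_workers)
registerDoParallel(cluster)

# Confirm how many workers foreach will use
getDoParWorkers()

# Release the workers and switch foreach back to sequential execution
stopCluster(cluster)
registerDoSEQ()
```

getDoParWorkers() is a quick way to verify that the backend was registered with the number of workers you intended.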

Now run the foreach loop in parallel.


library(foreach)
#Run for loop in parallel
x <- foreach(i = 1:20, .combine=cbind) %dopar% {
  sqrt(i)
}
x

#Stop cluster
stopCluster(cluster)
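
Computing sqrt(i) is far too cheap to show any benefit from parallelism, since starting workers and sending data to them has a cost of its own. The sketch below, which assumes a two-worker cluster and uses Sys.sleep() as a stand-in for slow work, times the same loop with %do% and %dopar% so you can compare them on your own machine.

```r
library(doParallel)

# Assumption for this sketch: a small cluster with 2 workers
cluster <- makeCluster(2)
registerDoParallel(cluster)

# Sequential run: each iteration waits 0.5 seconds
seq_time <- system.time(
  seq_result <- foreach(i = 1:4, .combine = c) %do% {
    Sys.sleep(0.5)
    sqrt(i)
  }
)

# Parallel run: the same iterations are spread over the workers
par_time <- system.time(
  par_result <- foreach(i = 1:4, .combine = c) %dopar% {
    Sys.sleep(0.5)
    sqrt(i)
  }
)

stopCluster(cluster)
registerDoSEQ()

# Both runs should produce the same values
all.equal(seq_result, par_result)

# Compare wall-clock times
seq_time["elapsed"]
par_time["elapsed"]
```

The results come back in iteration order in both cases; only the elapsed time should differ, and the gain depends on how long each iteration takes compared with the overhead of communicating with the workers.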

If you get the error below when you run your program, you may have allocated too many worker processes. Try running again with fewer cores.

Error in serialize(data, node$con) : error writing to connection


Conclusion

In this article, you have learned how to run an R for loop in parallel by using foreach with the %dopar% operator. You have also learned that, to run in parallel, you first need to create a cluster with the number of cores you want to use and register it. The number of workers in the cluster determines how many iterations can run at the same time.


Naveen Nelamali
