Distributed R brings scalable, high-performance Big Data analytics to R

Distributed R

The R statistical and programming language is a single-threaded language, a factor that counts against it when it comes to analyzing massive datasets.

HP Labs has launched a platform called Distributed R that makes it possible to analyze such massive datasets using computing resources available to R instances running on multiple computing nodes. It is an open source project hosted on GitHub.

IBM has a similar system called Big R that’s used for working with datasets hosted on the company’s InfoSphere® BigInsights servers.

According to the project’s official description:

Distributed R consists of a single master process and multiple workers. Logically, each worker resides on one server. The master controls each worker and can be co-located with a worker or on a separate server. Each worker manages multiple local R instances.

Distributed R allows users to write programs that are executed in a distributed fashion. That is, developer-specified program components can run in multiple single-threaded R-processes. The result dramatically reduces execution times for Big Data analysis.

This video provides a short introduction to Distributed R.

Related Post:  Linux Deepin needs your help with the Deepin Localization Project

Distributed R

Setting up a Distributed R cluster is slightly involved, but the process is well-documented (PDF), so if you know your way around Linux, the project’s only supported platform, you should be able to get one running fairly easily. It is only supported on the 64-bit editions Red Hat Enterprise Linux 6.x+, CentOS 6.x+, and Ubuntu 12.04 LTS and Debian 7. I haven’t tried to set up a cluster yet, but I’ll take a stab at it as soon as time permits.

Share:

Share on facebook
Facebook
Share on twitter
Twitter
Share on pinterest
Pinterest
Share on linkedin
LinkedIn

Newsletter: Subscribe for updates

Subscribe
Notify of
guest
0 Comments
Inline Feedbacks
View all comments

Get the latest

On social media

Security distros

Hacker
Linux distros for hacking and pentesting

Crypto mining OS

Bitcoin
Distros for mining bitcoin and other cryptocurrencies

Crypto hardware

MSI GeForce GTX 1070
Installing Nvidia GTX 1070 GPU drivers on Ubuntu

Disk guide

LVM
Beginner's guide to disks & disk partitions in Linux

Bash guide

Bash shell terminal
How to set the PATH variable in Bash
Categories
Archives
0
Hya, what do you think? Please comment.x
()
x
Algorithm 2020

Did you get your ticket yet?

Algorithm 2022 is a 3-day conference on blockchain, cryptocurrencies and AI set for Feb. 10 – 12, 2022, in Dallas. Speakers from the US Air Force, Ministry of Digital Transformation, Ukraine, and more. click that button to learn more and get your ticket. Use BSD20 code for 20% off ticket price.