Distributed R

The R statistical and programming language is a single-threaded language, a factor that counts against it when it comes to analyzing massive datasets.

HP Labs has launched a platform called Distributed R that makes it possible to analyze such massive datasets using computing resources available to R instances running on multiple computing nodes. It is an open source project hosted on GitHub.

IBM has a similar system called Big R that’s used for working with datasets hosted on the company’s InfoSphere® BigInsights servers.

According to the project’s official description:

Distributed R consists of a single master process and multiple workers. Logically, each worker resides on one server. The master controls each worker and can be co-located with a worker or on a separate server. Each worker manages multiple local R instances.

Distributed R allows users to write programs that are executed in a distributed fashion. That is, developer-specified program components can run in multiple single-threaded R-processes. The result dramatically reduces execution times for Big Data analysis.

This video provides a short introduction to Distributed R.

Related Post:  Vagrant 1.8 released. Includes support for linked clones and snapshots

Distributed R

Setting up a Distributed R cluster is slightly involved, but the process is well-documented (PDF), so if you know your way around Linux, the project’s only supported platform, you should be able to get one running fairly easily. It is only supported on the 64-bit editions Red Hat Enterprise Linux 6.x+, CentOS 6.x+, and Ubuntu 12.04 LTS and Debian 7. I haven’t tried to set up a cluster yet, but I’ll take a stab at it as soon as time permits.

Related Post:  Why did this server run out of disk space?

Share:

Share on facebook
Facebook
Share on twitter
Twitter
Share on pinterest
Pinterest
Share on linkedin
LinkedIn

Hola! Did you notice that LinuxBSDos.com no longer runs network ads?  Yep, no more ads from the usual suspects that track you across the Internet.  But since  I still need to pay to keep the site running, feel free to make a small donation by PayPal.

Subscribe for updates. Trust me, no spam!

Mailchimp Signup Form

Sponsored links

1. Attend Algorithm Conference, a top AI and ML event for 2020.
2. Reasons to use control panel for your server.
3. DHgate Computers Electronics, Cell Phones & more.

Leave a Reply

Your email address will not be published. Required fields are marked *

Get the latest

On social media
Via my newsletter
Mailchimp Signup Form

Sponsored links

1. Attend Algorithm Conference, a top AI and ML event for 2020.
2. Reasons to use control panel for your server.
3. DHgate Computers Electronics, Cell Phones & more.
Hacking, pentesting distributions

Linux Distributions for Hacking

Experts use these Linux distributions for hacking, digital forensics, and pentesting.

Categories
Archives

The authors of these books are confirmed to speak during

Algorithm Conference

T-minus AI

Author was the first chairperson of AI for the U.S. Air Force.

The case for killer robots

Author is the Director of the Center for Natural and Artificial Intelligence.

Why greatness cannot be planned

Author works on AI safety as a Senior Research Scientist at Uber AI Labs.

Anastasia Marchenkova

An invitation from Anastasia Marchenkova

Hya, after stints as a quantum researcher at Georgia Tech Quantum Optics & Quantum Telecom Lab, and the University of Maryland Joint Quantum Institute, I’m now working on superconducting qubit quantum processors at Bleximo. I’ll be speaking during Algorithm Conference in Austin, Texas, July 16 – 18, 2020. Meet me there and let’s chat about progress and hype in quantum computing.