Docker & Container Runtimes, News & Announcements

Distributed data analysis with plain UNIX commands and Docker Swarm

Editor: For setting up the Docker Swarm cluster used in this article, the author uses Docker Machine. Keep that in mind because the pre-stable version of Docker has orchestration built-in, so Docker Machine is about to go the way of the dodo.

The purpose of this post is to show how powerful and flexible Docker Swarm can be when combined with standard UNIX tools to analyze data in a distributed fashion. To do this, let’s write a simple MapReduce implementation in bash/sh that uses Docker Swarm to schedule Map jobs on nodes across the cluster.

Related Post:  Samsung Chromebook Pro, Chromebook Plus will run Android apps

MapReduce is usually implemented when there’s a large dataset to process. For the sake of simplicity and for reproducibility by the reader, we’re using a very small dataset composed of a few megabytes of text files.

This post is not about showing you how to write a MapReduce program. It’s also not about suggesting that MapReduce is best done in this way. Instead, this post is about making you aware that the plain old UNIX tools such as sort, awk, netcat, pv, uniq, xargs, pipe, join, time, and cat can be useful for distributed data processing when running on top of a Docker Swarm cluster.

Related Post:  What is Kubernetes? An intro for beginners

Read the complete article here.

Docker

Subscribe to LinuxBSDos.com

Subscribe to receive the latest articles in your Inbox

Trust me, you'll not be spammed...

Please share:
Tags:

We Recommend These Vendors and Free Offers

Google has got competition, because Presearch is building a blockchain-based search engine controlled by the community. At $0.15 a token, you can participation in Lot 3 of the token sale by clicking here

Open Money is building a solution that will run mainstream software on blockchain tech. Click here to get free tokens that will be the digital currency of the platform

COMSA allows centralized businesses to adopt blockchain technology. The token sale starts soon! Sign up for free by clicking here

Register now for Blockchain & Cryptocurrency Con 2018, international conference on blockchain technnology in Dallas, TX (USA), Feb. 23-24, 2018. Students can register at a 50% discount.

Launch an SSD VPS in Europe, USA, Asia & Australia on Vultr's KVM-based Cloud platform starting at $5:00/month (15 GB SSD, 768 MB of RAM).


Leave a Comment

Your email address will not be published. Required fields are marked *

*