
Distributed data analysis with plain UNIX commands and Docker Swarm

Editor: The author uses Docker Machine to set up the Docker Swarm cluster in this article. Keep that in mind: the upcoming, not-yet-stable version of Docker has orchestration built in, so Docker Machine is about to go the way of the dodo.

The purpose of this post is to show how powerful and flexible Docker Swarm can be when combined with standard UNIX tools to analyze data in a distributed fashion. To do this, let’s write a simple MapReduce implementation in bash/sh that uses Docker Swarm to schedule Map jobs on nodes across the cluster.
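
Below is a minimal local sketch of that idea. It assumes a word-count job: the input is split into chunks, each chunk is mapped to "word count" pairs, and a reduce step sums the partial counts. This is not the author's implementation. Here the map jobs run as local processes through `xargs -P` (supported by GNU and BSD xargs, but not part of POSIX) instead of being scheduled on Swarm nodes. The `mapreduce` function and its arguments are hypothetical names chosen for illustration.

```shell
#!/bin/sh
# Hypothetical local sketch: word count in MapReduce style with plain UNIX tools.
# Map jobs run as local processes via xargs -P; in a Swarm setup each map job
# would instead be scheduled on a cluster node (not shown here).

# mapreduce INPUT_FILE LINES_PER_CHUNK MAX_PARALLEL_MAPS
# Prints "word count" lines, sorted by word, to stdout.
mapreduce() {
  workdir=$(mktemp -d) || return 1

  # Split: cut the input into chunks named chunk.aa, chunk.ab, ...
  split -l "$2" "$1" "$workdir/chunk."

  # Map script: turn one chunk into "word count" pairs, written to CHUNK.out.
  cat > "$workdir/map.sh" <<'EOF'
tr -cs '[:alnum:]' '\n' < "$1" |
  tr '[:upper:]' '[:lower:]' |
  grep -v '^$' |
  sort |
  uniq -c |
  awk '{ print $2, $1 }' > "$1.out"
EOF

  # Map phase: one job per chunk, at most MAX_PARALLEL_MAPS at a time.
  # The glob expands before any job starts, so *.out files are not picked up.
  # xargs waits for every job to finish before the reduce phase begins.
  printf '%s\n' "$workdir"/chunk.* | xargs -n 1 -P "$3" sh "$workdir/map.sh"

  # Reduce phase: sum the partial counts of each word across all chunks.
  cat "$workdir"/chunk.*.out |
    awk '{ sum[$1] += $2 } END { for (w in sum) print w, sum[w] }' |
    sort

  rm -rf "$workdir"
}
```

For example, for a file containing the lines `the cat`, `the dog` and `The end`, running `mapreduce file.txt 1 2` maps each line as its own chunk, two chunks at a time, and prints `cat 1`, `dog 1`, `end 1` and `the 3`.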

MapReduce is typically used when there’s a large dataset to process. For simplicity, and so readers can reproduce the results, we use a small dataset made up of a few megabytes of text files.

This post is not about teaching you how to write a MapReduce program. Nor does it suggest that this is the best way to do MapReduce. Instead, the goal is to show that plain old UNIX tools such as sort, awk, netcat, pv, uniq, xargs, join, time, and cat, chained together with pipes, can be useful for distributed data processing when run on top of a Docker Swarm cluster.
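
As a small illustration of how `sort`, `join` and `awk` fit together, the sketch below merges two partial "word count" files into one, as if each file had come back from a different node. It is not taken from the article; the `merge_counts` helper and the two-file setup are hypothetical. `LC_ALL=C` is set so that `sort` and `join` agree on collation, which `join` requires.

```shell
#!/bin/sh
# merge_counts A B: combine two partial "word count" files into one.
# Hypothetical helper: A and B stand in for results returned by two map jobs.
merge_counts() {
  a=$(mktemp) && b=$(mktemp) || return 1
  # join needs both inputs sorted on the join field with the same collation.
  LC_ALL=C sort -k1,1 "$1" > "$a"
  LC_ALL=C sort -k1,1 "$2" > "$b"
  # -a 1 -a 2 keeps words that appear in only one file; -e 0 fills the
  # missing count with 0; -o 0,1.2,2.2 prints word, count in A, count in B.
  LC_ALL=C join -a 1 -a 2 -e 0 -o 0,1.2,2.2 "$a" "$b" |
    awk '{ print $1, $2 + $3 }'
  rm -f "$a" "$b"
}
```

Because `join` emits lines in the sorted order of its inputs, the merged result stays sorted by word, so it can be fed into another `merge_counts` call to combine results from more than two nodes.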

Read the complete article here.
