Distributed data analysis with plain UNIX commands and Docker Swarm

Editor's note: The author uses Docker Machine to set up the Docker Swarm cluster used in this article. Keep that in mind, because the pre-stable version of Docker has orchestration built in, so Docker Machine is about to go the way of the dodo.

The purpose of this post is to show how powerful and flexible Docker Swarm can be when combined with standard UNIX tools to analyze data in a distributed fashion. To do this, let’s write a simple MapReduce implementation in bash/sh that uses Docker Swarm to schedule Map jobs on nodes across the cluster.

MapReduce is usually implemented when there’s a large dataset to process. For the sake of simplicity and for reproducibility by the reader, we’re using a very small dataset composed of a few megabytes of text files.

This post is not about showing you how to write a MapReduce program. It’s also not about suggesting that MapReduce is best done in this way. Instead, this post is about making you aware that plain old UNIX tools such as sort, awk, netcat, pv, uniq, xargs, join, time, and cat, chained together with pipes, can be useful for distributed data processing when running on top of a Docker Swarm cluster.
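To make the idea concrete, here is a minimal sketch of a word-count map step and reduce step built only from tr, awk, and sort. It is not the author's implementation: the function names map_words and reduce_counts are hypothetical, and everything runs locally. In a Swarm setup, each map invocation would presumably run in its own container on a different node, with only the reduce step run in one place.

```shell
#!/bin/sh
# Hypothetical helper names; a local sketch, not the article's actual code.

# Map step: read text on stdin, split it into one lowercase word per line,
# and emit partial counts as "word<TAB>count".
map_words() {
    tr -cs '[:alpha:]' '\n' | tr '[:upper:]' '[:lower:]' |
        awk 'NF { counts[$1]++ }
             END { for (w in counts) printf "%s\t%d\n", w, counts[w] }'
}

# Reduce step: merge the partial counts produced by any number of map runs,
# then order by count (descending) and word (ascending).
reduce_counts() {
    awk -F '\t' '{ total[$1] += $2 }
                 END { for (w in total) printf "%s\t%d\n", w, total[w] }' |
        LC_ALL=C sort -t "$(printf '\t')" -k2,2nr -k1,1
}
```

Assuming two input chunks named part1.txt and part2.txt (placeholder file names), the steps could be combined like this: `for f in part1.txt part2.txt; do map_words < "$f"; done | reduce_counts`. Because the reduce step only sums "word, count" pairs, it does not matter where or in what order the map outputs were produced, which is what makes the map side easy to spread across nodes.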

Read the complete article here.
