
Distributed data analysis with plain UNIX commands and Docker Swarm

Editor: The author uses Docker Machine to set up the Docker Swarm cluster used in this article. Keep that in mind, because the pre-stable version of Docker has orchestration built in, so Docker Machine is about to go the way of the dodo.

The purpose of this post is to show how powerful and flexible Docker Swarm can be when combined with standard UNIX tools to analyze data in a distributed fashion. To do this, let’s write a simple MapReduce implementation in bash/sh that uses Docker Swarm to schedule Map jobs on nodes across the cluster.


MapReduce is usually implemented when there’s a large dataset to process. For the sake of simplicity and for reproducibility by the reader, we’re using a very small dataset composed of a few megabytes of text files.

This post is not about showing you how to write a MapReduce program. It’s also not about suggesting that MapReduce is best done in this way. Instead, this post is about making you aware that plain old UNIX tools such as sort, awk, netcat, pv, uniq, xargs, join, time, and cat, chained together with pipes, can be useful for distributed data processing when running on top of a Docker Swarm cluster.
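
To make the idea concrete, here is a minimal local sketch of the map, shuffle, and reduce steps for a word count, built only from standard tools. It is not the author's implementation: the function names map_words, reduce_counts, and merge_counts are hypothetical, and everything runs on one machine. In the setup the article describes, each map step would presumably run in its own container scheduled by Swarm, with only the merge step running locally.

```shell
#!/bin/sh
# Word-count sketch of the MapReduce pattern using plain UNIX tools.
# All function names are illustrative assumptions, not from the article.

# Map: emit one lowercase word per line from the files given as arguments.
map_words() {
    cat "$@" \
        | tr -cs '[:alpha:]' '\n' \
        | tr '[:upper:]' '[:lower:]' \
        | grep -v '^$'
}

# Reduce: sort brings identical words together, uniq -c counts each group,
# and awk reorders the output to "word count".
reduce_counts() {
    sort | uniq -c | awk '{ print $2, $1 }'
}

# Merge: sum partial "word count" lists produced by several mappers.
merge_counts() {
    awk '{ total[$1] += $2 } END { for (w in total) print w, total[w] }' | sort
}
```

With the functions loaded, each input file can be processed independently and the partial results combined afterwards, for example `{ map_words a.txt | reduce_counts; map_words b.txt | reduce_counts; } | merge_counts`, where a.txt and b.txt stand in for whatever text files you have on hand. Because each per-file pipeline reads nothing but its own input, it is the part that lends itself to being farmed out to separate nodes.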


Read the complete article here.





