Big Data

Tidy Text Mining with R

Editor: Tidy Text Mining with R is the title of a book being written by Julia Silge and David Robinson. In it, the authors show how to use tidytext, an R package they co-developed, as a solution for text mining in R. It will be published by O’Reilly, but the draft copy is available free online right now. And a GitHub repo is also available. The following text is the draft introduction.

If you work in analytics or data science, like we do, you are familiar with the fact that data is being generated all the time at ever faster rates. (You may even be a little weary of people pontificating about this fact.) Analysts are often trained to handle tabular or rectangular data that is mostly numeric, but much of the data proliferating today is unstructured and typically text-heavy. Many of us who work in analytic fields are not trained in even simple interpretation of natural language.

Related Post:  Learn to speak Hadoop in 5 minutes

We developed a new R package, tidytext (Silge and Robinson 2016), because we were familiar with many methods for data wrangling and visualization, but couldn’t easily apply these same methods to text. We found that using tidy data principles can make many text mining tasks easier, more effective, and consistent with tools already in wide use. By treating text as data frames of words, we can manipulate, summarize, and visualize the characteristics of text easily and integrate natural language processing into effective workflows we were already using.

Related Post:  Using Pachyderm to analyze the 2016 World Chess Championship

The tools provided by the tidytext package are relatively simple; what is important is the possible applications. Thus, this book provides compelling examples of real text mining problems.

Click here to read the book.

Tidy Text Mining with R

Subscribe to LinuxBSDos.com

Subscribe to receive the latest articles in your Inbox

Trust me, you'll not be spammed...

Please share:
Tags:

We Recommend These Vendors and Free Offers

Get 100 free cryptocurrency tokens when you join the Datum martketplace 100 Datum network tokens, Offer ends August 11, 2018

Launch an SSD VPS in Europe, USA, Asia & Australia on Vultr's KVM-based Cloud platform starting at $5:00/month (15 GB SSD, 768 MB of RAM).

Deploy an SSD Cloud server in 55 seconds on DigitalOcean. Built for developers and starting at $5:00/month (20 GB SSD, 512 MB of RAM).

Want to become an expert ethical hacker and penetration tester? Request your free video training course of Online Penetration Testing and Ethical Hacking

Whether you're new to Linux or are a Linux guru, you can learn a lot more about the Linux kernel by requesting your free ebook of Linux Kernel In A Nutshell.


Leave a Comment

Your email address will not be published. Required fields are marked *

*