Tidy Text Mining with R

Editor: Tidy Text Mining with R is the title of a book being written by Julia Silge and David Robinson. In it, the authors show how to use tidytext, an R package they co-developed, for text mining in R. The book will be published by O’Reilly, but a draft is available free online now, along with a GitHub repository. The following text is the draft introduction.

If you work in analytics or data science, like we do, you are familiar with the fact that data is being generated all the time at ever faster rates. (You may even be a little weary of people pontificating about this fact.) Analysts are often trained to handle tabular or rectangular data that is mostly numeric, but much of the data proliferating today is unstructured and typically text-heavy. Many of us who work in analytic fields are not trained in even simple interpretation of natural language.


We developed a new R package, tidytext (Silge and Robinson 2016), because we were familiar with many methods for data wrangling and visualization, but couldn’t easily apply these same methods to text. We found that using tidy data principles can make many text mining tasks easier, more effective, and consistent with tools already in wide use. By treating text as data frames of words, we can manipulate, summarize, and visualize the characteristics of text easily and integrate natural language processing into effective workflows we were already using.
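The tidytext package itself is written in R. Purely as a rough illustration of the "one word per row" idea, here is a minimal Python sketch using only the standard library. The function name `unnest_words` is hypothetical (loosely echoing tidytext's `unnest_tokens()`), the tokenizer is a simplifying assumption (lowercase, letters and apostrophes only), and none of this is the package's actual implementation.

```python
import re
from collections import Counter


def unnest_words(lines):
    """Hypothetical helper: turn lines of text into one (line_number, word) row per token."""
    rows = []
    for line_number, text in enumerate(lines, start=1):
        # Simplifying assumption: lowercase, keep runs of letters/apostrophes as words
        for word in re.findall(r"[a-z']+", text.lower()):
            rows.append((line_number, word))
    return rows


text = [
    "Because I could not stop for Death -",
    "He kindly stopped for me -",
    "The Carriage held but just Ourselves -",
    "and Immortality",
]

# Each row is a single token tagged with the line it came from
tidy = unnest_words(text)

# Once text is in this shape, summarizing it is ordinary data manipulation
counts = Counter(word for _, word in tidy)

print(tidy[:3])
print(counts.most_common(2))
```

Because every token is its own row, counting, filtering, and joining become the same operations you would apply to any other table, which is the point the authors make about reusing existing tools.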


The tools provided by the tidytext package are relatively simple; what is important is the possible applications. Thus, this book provides compelling examples of real text mining problems.

