Tidy Text Mining with R


Editor: Tidy Text Mining with R is the title of a book being written by Julia Silge and David Robinson. In it, the authors show how to use tidytext, an R package they co-developed, as a solution for text mining in R. The book will be published by O’Reilly, but a draft is available free online right now, and a GitHub repository is also available. The following text is the draft introduction.

If you work in analytics or data science, like we do, you are familiar with the fact that data is being generated all the time at ever faster rates. (You may even be a little weary of people pontificating about this fact.) Analysts are often trained to handle tabular or rectangular data that is mostly numeric, but much of the data proliferating today is unstructured and typically text-heavy. Many of us who work in analytic fields are not trained in even simple interpretation of natural language.


We developed a new R package, tidytext (Silge and Robinson 2016), because we were familiar with many methods for data wrangling and visualization, but couldn’t easily apply these same methods to text. We found that using tidy data principles can make many text mining tasks easier, more effective, and consistent with tools already in wide use. By treating text as data frames of words, we can manipulate, summarize, and visualize the characteristics of text easily and integrate natural language processing into effective workflows we were already using.
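As a rough sketch of what "treating text as data frames of words" looks like in practice, the snippet below tokenizes a small made-up sample with tidytext's unnest_tokens() and then counts words with ordinary dplyr verbs. The sample sentences and the names text_df, tidy_words, and word_counts are illustrative assumptions, not examples taken from the book.

```r
library(dplyr)
library(tidytext)

# Hypothetical sample corpus: one row per line of text
text_df <- tibble(
  line = 1:3,
  text = c("Tidy data makes text mining easier.",
           "Text is data, too.",
           "Tidy tools work on text data.")
)

# Tokenize into the tidy format: one row per word.
# unnest_tokens() lowercases and strips punctuation by default,
# and keeps the other columns (here, `line`).
tidy_words <- text_df %>%
  unnest_tokens(word, text)

# Once text is a data frame, standard dplyr tools apply:
# count how often each word appears across the corpus
word_counts <- tidy_words %>%
  count(word, sort = TRUE)

print(word_counts)
```

The design point is that after tokenizing, nothing text-specific is needed: filtering, joining, counting, and plotting use the same tools an analyst already applies to tabular data.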


The tools provided by the tidytext package are relatively simple; what matters is how they can be applied. Thus, this book provides compelling examples of real text mining problems.

