Tidy Text Mining with R

Editor: Tidy Text Mining with R is a book being written by Julia Silge and David Robinson. In it, the authors show how to use tidytext, an R package they co-developed, for text mining in R. It will be published by O’Reilly, but a draft is available free online now, along with a GitHub repo. The following text is the draft introduction.

If you work in analytics or data science, like we do, you are familiar with the fact that data is being generated all the time at ever faster rates. (You may even be a little weary of people pontificating about this fact.) Analysts are often trained to handle tabular or rectangular data that is mostly numeric, but much of the data proliferating today is unstructured and typically text-heavy. Many of us who work in analytic fields are not trained in even simple interpretation of natural language.

We developed a new R package, tidytext (Silge and Robinson 2016), because we were familiar with many methods for data wrangling and visualization, but couldn’t easily apply these same methods to text. We found that using tidy data principles can make many text mining tasks easier, more effective, and consistent with tools already in wide use. By treating text as data frames of words, we can manipulate, summarize, and visualize the characteristics of text easily and integrate natural language processing into effective workflows we were already using.
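
To make the idea of text as a data frame of words concrete, here is a minimal sketch, not an excerpt from the book. It assumes the dplyr and tidytext packages are installed; the two sample sentences and the variable names are made up for illustration.

```r
library(dplyr)
library(tidytext)

# Two illustrative lines of text (invented for this sketch)
text_df <- data.frame(
  line = 1:2,
  text = c("Text mining with tidy data principles",
           "Tidy data makes text mining easier"),
  stringsAsFactors = FALSE
)

# Reshape to one row per word; by default unnest_tokens()
# lowercases the tokens and strips punctuation
tidy_words <- text_df %>%
  unnest_tokens(word, text)

# Ordinary dplyr verbs now apply to words like any other column
word_counts <- tidy_words %>%
  count(word, sort = TRUE)

print(word_counts)
```

Once the text is in this one-word-per-row shape, the same grouping, joining, and plotting tools used for numeric tables can be applied without any text-specific data structures.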

The tools provided by the tidytext package are relatively simple; what matters is their possible applications. Thus, this book provides compelling examples of real text mining problems.
