Big Data

Tidy Text Mining with R

Editor: Tidy Text Mining with R is the title of a book being written by Julia Silge and David Robinson. In it, the authors show how to use tidytext, an R package they co-developed, as a solution for text mining in R. It will be published by O’Reilly, but the draft copy is available free online right now. And a GitHub repo is also available. The following text is the draft introduction.

If you work in analytics or data science, like we do, you are familiar with the fact that data is being generated all the time at ever faster rates. (You may even be a little weary of people pontificating about this fact.) Analysts are often trained to handle tabular or rectangular data that is mostly numeric, but much of the data proliferating today is unstructured and typically text-heavy. Many of us who work in analytic fields are not trained in even simple interpretation of natural language.

Related Post:  OpenStack Monitoring With Elasticsearch, Logstash, and Kibana

We developed a new R package, tidytext (Silge and Robinson 2016), because we were familiar with many methods for data wrangling and visualization, but couldn’t easily apply these same methods to text. We found that using tidy data principles can make many text mining tasks easier, more effective, and consistent with tools already in wide use. By treating text as data frames of words, we can manipulate, summarize, and visualize the characteristics of text easily and integrate natural language processing into effective workflows we were already using.

Related Post:  The best R package for learning to “think about visualization”

The tools provided by the tidytext package are relatively simple; what is important is the possible applications. Thus, this book provides compelling examples of real text mining problems.

Click here to read the book.

Tidy Text Mining with R

Subscribe to LinuxBSDos.com

Subscribe to receive the latest articles in your Inbox

Trust me, you'll not be spammed...

Please share:
Tags:

We Recommend These Vendors and Free Offers

Register now for Blockchain & Cryptocurrency Con 2018, international conference on blockchain technnology in Dallas, TX (USA), Feb. 23-24, 2018. A 50% discount for students.

Best WhatsApp Plus features in Gbwhatsapp latest APK download

Best binary auto trading software reviews by 7binaryoptions.com

Google has got competition, because Presearch is building a blockchain-based search engine controlled by the community. At $0.15 a token, you can participation in Lot 3 of the token sale by clicking here

Open Money is building a solution that will run mainstream software on blockchain tech. Click here to get free tokens that will be the digital currency of the platform

Launch an SSD VPS in Europe, USA, Asia & Australia on Vultr's KVM-based Cloud platform starting at $5:00/month (15 GB SSD, 768 MB of RAM).


Leave a Comment

Your email address will not be published. Required fields are marked *

*