Machine Learning with R for Text Data

By Benjamin Soltoff in R liveproject

September 1, 2016


Picture this: You’re an academic researcher tasked with helping social scientists determine how responsive the U.S. government is to public demands. One clear expression of this responsiveness is the types of policies legislators seek to advance. In this series of liveProjects, you’ll apply machine learning to predict the policy focus of each congressional bill in a legislation dataset. Working with tools common among data scientists and academic researchers (including R, the tidymodels framework, feature engineering techniques, and ML algorithms), you’ll perform exploratory data analysis (EDA) to prepare for predictive modeling, preprocess the text data, develop core machine learning models, and train deep learning (DL) models.

In this liveProject series, you’ll learn to use feature engineering, machine learning workflows, and deep learning techniques to generate predictions.

Import and clean data
Generate basic data visualizations for time-series datasets
Tokenize and clean text data
Calculate summary statistics for text data
Resample datasets for unbiased measures of model performance
Engineer features for text data
Fit models using a tidy framework
Evaluate classification models using appropriate metrics
Tune machine learning models to maximize their effectiveness
Generate feature hashes for categorical variables
Implement pre-trained word embeddings in a machine learning workflow
Subsample an unbalanced dataset to minimize bias
Fit models using Keras
Tune hyperparameters to maximize model performance
Explain how a machine learning model generates specific predictions
