Machine Learning with R for Text Data

By Benjamin Soltoff in R liveproject

September 1, 2016

Picture this: You’re an academic researcher helping social scientists measure how responsive the U.S. government is to public demands. One clear signal of that responsiveness is the types of policies legislators seek to advance. In this series of liveProjects, you’ll apply machine learning to predict the policy focus of each congressional bill in a legislation dataset. Using tools widely adopted by data scientists and academic researchers (R, the tidymodels framework, feature engineering techniques, and ML algorithms), you’ll perform exploratory data analysis (EDA) to prepare for predictive modeling, preprocess the text data, develop core machine learning (ML) models, and train deep learning (DL) models.

In this liveProject series, you’ll use feature engineering, machine learning workflows, and deep learning techniques to generate predictions. Along the way, you’ll learn how to:

  • Import and clean data
  • Generate basic data visualizations for time-series datasets
  • Tokenize and clean text data
  • Calculate summary statistics for text data
  • Resample datasets for unbiased measures of model performance
  • Engineer features for text data
  • Fit models using a tidy framework
  • Evaluate classification models using appropriate metrics
  • Tune machine learning models to maximize their effectiveness
  • Generate feature hashes for categorical variables
  • Implement pre-trained word embeddings in a machine learning workflow
  • Subsample an unbalanced dataset to minimize bias
  • Fit models using Keras
  • Tune hyperparameters to maximize model performance
  • Explain how a machine learning model generates specific predictions