Machine Learning with R for Text Data
By Benjamin Soltoff in R liveproject
September 1, 2016
Picture this: You’re an academic researcher tasked with helping social scientists determine the U.S. government’s responsiveness to public demands. One clear way to gauge this responsiveness is to examine the types of policies legislators seek to advance. In this series of liveProjects, you’ll apply machine learning (ML) to predict the policy focus of each congressional bill in a legislation dataset. Using tools widely adopted by data scientists and academic researchers, including R, the tidymodels framework, feature engineering techniques, and ML algorithms, you’ll perform exploratory data analysis (EDA) to prepare for predictive modeling, preprocess the text data, develop core ML models, and train deep learning (DL) models.
In this liveProject series, you’ll learn to use feature engineering, machine learning workflows, and deep learning techniques to generate predictions. The series covers the following skills:
- Import and clean data
- Generate basic data visualizations for time-series datasets
- Tokenize and clean text data
- Calculate summary statistics for text data
- Resample datasets for unbiased measures of model performance
- Engineer features for text data
- Fit models using a tidy framework
- Evaluate classification models using appropriate metrics
- Tune machine learning models to maximize their effectiveness
- Generate feature hashes for categorical variables
- Implement pre-trained word embeddings in a machine learning workflow
- Subsample an unbalanced dataset to minimize bias
- Fit models using Keras
- Tune hyperparameters to maximize model performance
- Explain how a machine learning model generates specific predictions