NLP · Sentiment Analysis · R · TF-IDF · Random Forest

Strava App Review Sentiment Analysis

What do 2,000 Google Play reviews actually say about Strava? Using a TF-IDF + LASSO text model and a Random Forest metadata model in R, I identified the language patterns that separate satisfied users from frustrated ones, and what product teams should do about it.

Model Performance

Two-Model Comparison

Model            Features                                    Accuracy   AUC
Random Forest    Behavioral only (length, date, version…)    75.5%      0.822
TF-IDF + LASSO   Text tokens (logistic regression)           82.0%      0.893

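Text content is a stronger signal than behavioral metadata alone: the text-based model beat the metadata-only model by 6.5 pp in accuracy (82.0% vs. 75.5%) and by 0.071 in AUC (0.893 vs. 0.822). The two models also differ in algorithm, so part of the gap may come from the model class rather than the features.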
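For readers who want to see how the two metrics in the table are defined, here is a minimal base R sketch. The helper names (accuracy, auc), the 0.5 threshold, and the toy scores are assumptions for illustration only; they are not the code or data behind the reported numbers.

```r
# Hypothetical helpers, written for illustration only.

# Share of reviews whose thresholded prediction matches the true label.
accuracy <- function(truth, prob, threshold = 0.5) {
  pred <- ifelse(prob >= threshold, 1, 0)
  mean(pred == truth)
}

# AUC via the rank-sum identity: the probability that a randomly chosen
# positive review scores higher than a randomly chosen negative one.
auc <- function(truth, prob) {
  r     <- rank(prob)            # average ranks handle tied scores
  n_pos <- sum(truth == 1)
  n_neg <- sum(truth == 0)
  (sum(r[truth == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Toy data (made up): 1 = Good, 0 = Bad
truth <- c(1, 1, 1, 0, 0, 0)
prob  <- c(0.9, 0.8, 0.4, 0.6, 0.3, 0.2)

acc_value <- accuracy(truth, prob)   # 4 of 6 correct
auc_value <- auc(truth, prob)        # 8 of 9 positive/negative pairs ordered correctly
```

Accuracy depends on the chosen threshold, while AUC summarizes ranking quality across all thresholds, which is why the table reports both.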

LASSO Coefficients

Top 20 Sentiment Tokens

Larger bars = stronger predictive weight toward that sentiment class

Pattern Recognition

Theme Clusters

Tokens grouped by the underlying problem or strength they represent

Negative Clusters

Authentication & Onboarding Failures

signed · unable · isn · won · received

Reviews repeatedly describe failures during sign-in and initial setup. By token weight, authentication appears to be the most damaging friction point in the experience.

Upload & Server Outages

upload · server · errors

Activity upload failures and server errors are directly visible to users and break the core loop of record-and-share.

Account & Access Problems

account · issue

Ongoing account management issues compound onboarding friction, suggesting systemic identity/auth infrastructure problems.

Crashes & Forced Upsell

uninstalled · constantly

App instability and aggressive paywalls drive uninstalls. Users explicitly cite these as deal-breakers in their reviews.

Accuracy & Measurement Issues

mile · minutes · time

Inaccurate GPS and pace data undermine the product's core promise for serious athletes who depend on precise metrics.

Positive Clusters

Tracking & Core Utility

tracking · track · fitness · exercise

When the product works, users love it for exactly what it is: a reliable fitness tracker. This is the core value prop in action.

Social & Community

friends · motivates · motivated · world

Strava's social layer is a genuine differentiator. Users don't just track—they compete, cheer, and stay accountable together.

Ease of Use

easy · nice · excellent · helpful · awesome

When onboarding succeeds, users find the app intuitive and well-designed. The UX itself isn't the problem—reliability is.

Activity Culture

walks · trail · running

Positive reviews span all activity types. Strava's brand is broader than running—it's the home for all outdoor fitness culture.
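The grouping above was done by hand. One way to make it reusable is a token-to-theme lookup table, sketched below in base R. The object names (themes, token_theme, theme_of) are hypothetical; the tokens and theme labels are copied from the clusters listed above.

```r
# Theme labels and tokens copied from the clusters above; the structure is an assumption.
themes <- list(
  "Authentication & Onboarding Failures" = c("signed", "unable", "isn", "won", "received"),
  "Upload & Server Outages"              = c("upload", "server", "errors"),
  "Account & Access Problems"            = c("account", "issue"),
  "Crashes & Forced Upsell"              = c("uninstalled", "constantly"),
  "Accuracy & Measurement Issues"        = c("mile", "minutes", "time"),
  "Tracking & Core Utility"              = c("tracking", "track", "fitness", "exercise"),
  "Social & Community"                   = c("friends", "motivates", "motivated", "world"),
  "Ease of Use"                          = c("easy", "nice", "excellent", "helpful", "awesome"),
  "Activity Culture"                     = c("walks", "trail", "running")
)

# Flatten to a two-column table: values = token, ind = theme.
token_theme <- stack(themes)

# Hypothetical helper: look up the theme for one token (NA if the token is unlisted).
theme_of <- function(token) {
  as.character(token_theme$ind[match(token, token_theme$values)])
}

theme_of("upload")
```

A table like this can be joined to model coefficients to sum predictive weight per theme instead of per token.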

So What

What This Means for Strava

The data points to a clear strategic gap: the fitness experience is loved, but infrastructure failures are destroying it.

Product

Fix infrastructure first

Auth flow, upload reliability, and crash reduction dominate negative reviews. These aren't feature gaps — they're broken foundations that undermine the entire experience for every user.

Marketing

Double down on social fitness

"Friends", "motivates", and "motivated" are among the strongest positive predictors. Strava's unique angle isn't tracking; it's the social accountability layer. Lead with that.

Insight

Complaints aren't about fitness

Almost none of the top negative tokens describe the workout itself. The few that touch on activity ("mile", "minutes", "time") point to measurement accuracy, not the exercise experience. Users are mostly frustrated by sign-in errors, server failures, and crashes.

Insight

The core value prop works

"Tracking", "fitness", "exercise", and "trail" all drive positive sentiment. When Strava works, users love exactly what it's supposed to do. The product vision is validated — execution is the issue.

How It Was Built

Methodology Pipeline

01

Data Collection

2,000 Google Play reviews scraped and labeled (Bad: 1–3 stars, Good: 4–5 stars)
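The labeling rule can be written as a one-line threshold on the star rating. This is a minimal sketch; the data frame, its column names (review, stars, sentiment), and the example rows are made up for illustration.

```r
# Toy reviews (made up); column names are assumptions.
reviews <- data.frame(
  review = c("Can't sign in", "Love the segments", "Okay but crashes",
             "Great app", "Upload failed"),
  stars  = c(1, 5, 3, 4, 2),
  stringsAsFactors = FALSE
)

# Bad: 1-3 stars, Good: 4-5 stars (the rule stated above).
reviews$sentiment <- factor(
  ifelse(reviews$stars >= 4, "Good", "Bad"),
  levels = c("Bad", "Good")
)

# Class balance is worth checking before modeling.
label_counts <- table(reviews$sentiment)
```

Under this rule, 3-star reviews count as Bad, so neutral reviews are grouped with negative ones.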

02

Feature Engineering

Review length, word count, time/date features, app version, and season extracted
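A rough base R sketch of how such features could be derived follows. The column names, the example rows, the version strings, and the Northern Hemisphere month-to-season mapping are all assumptions; the original feature definitions are not spelled out above.

```r
# Toy reviews (made up); version strings are placeholders, not real releases.
reviews <- data.frame(
  review  = c("Great app for tracking runs", "Cannot sign in"),
  date    = as.Date(c("2023-07-14", "2023-12-02")),
  version = c("310.9", "335.2"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: map a date to a meteorological season
# (Northern Hemisphere assumed).
season_of <- function(d) {
  m <- as.integer(format(d, "%m"))
  ifelse(m %in% c(12, 1, 2), "Winter",
  ifelse(m %in% 3:5,         "Spring",
  ifelse(m %in% 6:8,         "Summer", "Fall")))
}

reviews$n_chars <- nchar(reviews$review)                               # review length
reviews$n_words <- lengths(strsplit(trimws(reviews$review), "\\s+"))   # word count
reviews$month   <- as.integer(format(reviews$date, "%m"))              # date feature
reviews$season  <- season_of(reviews$date)
```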

03

Random Forest

Behavioral features only — accuracy 75.5%, AUC 0.822
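As an illustration of this step, here is a small sketch using the randomForest package named in the footer. The data are simulated, and the 80/20 split, ntree = 300, and feature names are assumptions rather than the settings behind the reported 75.5% accuracy.

```r
library(randomForest)

set.seed(42)
n <- 200

# Simulated stand-in for the behavioral features (not the real data).
features <- data.frame(
  n_chars = rpois(n, 120),
  n_words = rpois(n, 22),
  month   = sample(1:12, n, replace = TRUE)
)

# Simulated label, loosely tied to review length for the sake of the example.
p_good <- plogis(1.5 - 0.012 * features$n_chars)
features$sentiment <- factor(ifelse(runif(n) < p_good, "Good", "Bad"),
                             levels = c("Bad", "Good"))

# Hold out 20% of the rows for evaluation.
train_idx <- sample(n, 0.8 * n)
train <- features[train_idx, ]
test  <- features[-train_idx, ]

rf_fit  <- randomForest(sentiment ~ ., data = train, ntree = 300)
rf_prob <- predict(rf_fit, newdata = test, type = "prob")[, "Good"]  # scores for AUC
rf_pred <- predict(rf_fit, newdata = test)                           # class labels
rf_acc  <- mean(rf_pred == test$sentiment)
```

The class probabilities in rf_prob are what an AUC calculation would consume; rf_acc uses the default majority-vote class.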

04

TF-IDF + LASSO

Text vectorization + logistic regression — accuracy 82.0%, AUC 0.893
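To make the vectorization step concrete, the sketch below computes TF-IDF by hand in base R on three made-up reviews. The analysis itself names tidytext; base R is used here only so the arithmetic is visible. The weighting shown (term frequency times the natural log of inverse document frequency) is the common definition and is assumed, not confirmed, to match the original pipeline.

```r
# Three toy reviews (made up for illustration).
docs <- c(
  d1 = "great app easy tracking",
  d2 = "cannot upload server errors",
  d3 = "easy upload great friends"
)

# Tokenize on anything that is not a lowercase letter.
tokens <- strsplit(tolower(docs), "[^a-z]+")
vocab  <- sort(unique(unlist(tokens)))

# Document-term count matrix: one row per review, one column per token.
counts <- t(sapply(tokens, function(tk) {
  tabulate(match(tk, vocab), nbins = length(vocab))
}))
rownames(counts) <- names(docs)
colnames(counts) <- vocab

tf    <- counts / rowSums(counts)                  # term frequency within each review
idf   <- log(nrow(counts) / colSums(counts > 0))   # rarer tokens get larger weights
tfidf <- sweep(tf, 2, idf, `*`)                    # multiply each column by its idf
```

In this toy corpus, "tracking" appears in only one review and so gets a higher weight in d1 than "great", which appears in two. The resulting matrix is the kind of input a penalized logistic regression would take.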

05

Findings

Top tokens by LASSO coefficient reveal what language separates happy users from frustrated ones
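The sketch below shows one way to pull ranked token coefficients out of a LASSO logistic regression with glmnet. The document-term matrix and labels are simulated, and the choice of lambda.min as the penalty is an assumption; the source does not say how the penalty was selected.

```r
library(glmnet)

set.seed(1)
n <- 300
tokens <- c("easy", "friends", "tracking", "upload", "server", "account")

# Simulated sparse token weights (a stand-in for a real TF-IDF matrix).
x <- matrix(
  rexp(n * length(tokens)) * rbinom(n * length(tokens), 1, 0.3),
  nrow = n, dimnames = list(NULL, tokens)
)

# Simulated labels (1 = Good): the first three tokens push toward Good,
# the next two toward Bad, and "account" has no built-in effect.
true_beta <- c(1.5, 1.2, 1.0, -1.5, -1.2, 0)
y <- rbinom(n, 1, plogis(drop(x %*% true_beta)))

# alpha = 1 selects the LASSO penalty; cross-validation picks lambda.
cv_fit <- cv.glmnet(x, y, family = "binomial", alpha = 1)

# Coefficients at the chosen penalty, without the intercept.
coefs  <- as.matrix(coef(cv_fit, s = "lambda.min"))[-1, 1]

# Keep tokens the penalty did not shrink to zero, strongest positive first.
ranked <- sort(coefs[coefs != 0], decreasing = TRUE)
```

Positive coefficients lean toward the Good class and negative ones toward Bad, so the two ends of ranked correspond to the positive and negative bars in a top-token chart.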

Analysis conducted in R using the tidytext, randomForest, and glmnet packages. 2,000 Google Play reviews collected and labeled by star rating.