Strava App Review Sentiment Analysis
What do 2,000 Google Play reviews actually say about Strava? Using a Random Forest on behavioral features and a TF-IDF + LASSO text model in R, I identified the language patterns that separate satisfied users from frustrated ones, and what those patterns suggest product teams should do next.
Model Performance
Two-Model Comparison
| Model | Features | Accuracy | AUC |
|---|---|---|---|
| Random Forest | Behavioral only (length, date, version…) | 75.5% | 0.822 |
| TF-IDF + LASSO | Text tokens (logistic regression) | 82.0% | 0.893 |
Text content is a stronger signal than behavioral metadata alone: the text-based model beat the behavioral model by 6.5 pp in accuracy (82.0% vs. 75.5%) and by 0.071 in AUC (0.893 vs. 0.822).
LASSO Coefficients
Top 20 Sentiment Tokens
Larger bars = stronger predictive weight toward that sentiment class
Pattern Recognition
Theme Clusters
Tokens grouped by the underlying problem or strength they represent
Negative Clusters
Authentication & Onboarding Failures
Users consistently report failures during sign-in and initial setup. Authentication is the most damaging friction point in the entire experience.
Upload & Server Outages
Activity upload failures and server errors are directly visible to users and break the core loop of record-and-share.
Account & Access Problems
Ongoing account management issues compound onboarding friction, suggesting systemic identity/auth infrastructure problems.
Crashes & Forced Upsell
App instability and aggressive paywalls drive uninstalls. Users explicitly cite these as deal-breakers in their reviews.
Accuracy & Measurement Issues
Inaccurate GPS and pace data undermine the product's core promise for serious athletes who depend on precise metrics.
Positive Clusters
Tracking & Core Utility
When the product works, users love it for exactly what it is: a reliable fitness tracker. This is the core value prop in action.
Social & Community
Strava's social layer is a genuine differentiator. Users don't just track—they compete, cheer, and stay accountable together.
Ease of Use
When onboarding succeeds, users find the app intuitive and well-designed. The UX itself isn't the problem—reliability is.
Activity Culture
Positive reviews span all activity types. Strava's brand is broader than running—it's the home for all outdoor fitness culture.
So What
What This Means for Strava
The data points to a clear strategic gap: the fitness experience is loved, but infrastructure failures are destroying it.
Product
Fix infrastructure first
Auth flow, upload reliability, and crash reduction dominate negative reviews. These aren't feature gaps — they're broken foundations that undermine the entire experience for every user.
Marketing
Double down on social fitness
"Friends", "motivates", and "community" are among the strongest positive predictors. Strava's unique angle isn't tracking — it's the social accountability layer. Lead with that.
Insight
Complaints aren't about fitness
Not a single fitness-related word appears in the top negative tokens. Users aren't unhappy with the workout experience — they're frustrated by sign-in errors, server failures, and crashes.
Insight
The core value prop works
"Tracking", "fitness", "exercise", and "trail" all drive positive sentiment. When Strava works, users love exactly what it's supposed to do. The product vision is validated — execution is the issue.
How It Was Built
Methodology Pipeline
Data Collection
2,000 Google Play reviews scraped and labeled (Bad: 1–3 stars, Good: 4–5 stars)
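The labeling rule can be sketched in a few lines. The analysis itself was done in R; the Python below is only an illustration, and `label_review` and the sample ratings are hypothetical, not taken from the project.

```python
def label_review(stars: int) -> str:
    """Map a 1-5 star rating to the binary label: 1-3 -> Bad, 4-5 -> Good."""
    if not 1 <= stars <= 5:
        raise ValueError(f"star rating out of range: {stars}")
    return "Good" if stars >= 4 else "Bad"


# Hypothetical ratings, for illustration only (not real review data).
sample_ratings = [1, 3, 4, 5]
labels = [label_review(s) for s in sample_ratings]
print(labels)
```

Note that a 3-star review lands in the Bad class under this rule, so a neutral review counts as a negative one.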
Feature Engineering
Review length, word count, time/date features, app version, and season extracted
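A rough sketch of how such behavioral features might be derived from one review, written in Python for illustration rather than in the project's R. The function and field names are hypothetical. Treating "time/date features" as weekday and month, and using Northern Hemisphere meteorological seasons, are both assumptions; the write-up does not say how either was defined.

```python
from datetime import date

# Assumption: meteorological seasons, Northern Hemisphere.
SEASONS = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "fall", 10: "fall", 11: "fall",
}


def extract_features(text: str, posted: date, app_version: str) -> dict:
    """Build non-text ("behavioral") features for a single review."""
    return {
        "char_length": len(text),         # review length in characters
        "word_count": len(text.split()),  # whitespace-separated words
        "weekday": posted.weekday(),      # 0 = Monday ... 6 = Sunday
        "month": posted.month,
        "season": SEASONS[posted.month],
        "app_version": app_version,       # kept as a categorical string
    }


# Hypothetical review, date, and version string, for illustration only.
features = extract_features("Cannot sign in after the update",
                            date(2024, 1, 15), "1.0.0")
print(features)
```

None of these features look at which words the review contains, which is why the Random Forest step counts as a behavioral-only baseline.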
Random Forest
Behavioral features only — accuracy 75.5%, AUC 0.822
TF-IDF + LASSO
Text vectorization + logistic regression — accuracy 82.0%, AUC 0.893
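To make the vectorization step concrete, here is a minimal standard-library Python sketch of TF-IDF weighting, using term frequency times ln(N / document frequency). It is an illustration, not the project's R code. The tokenizer, the function names, and the three toy documents are assumptions, and the exact weighting variant used in the analysis is not stated.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list:
    """Lowercase and split into runs of letters/apostrophes (simplified)."""
    return re.findall(r"[a-z']+", text.lower())


def tf_idf(docs: list) -> list:
    """Return one {token: weight} dict per document."""
    token_lists = [tokenize(d) for d in docs]
    n_docs = len(token_lists)

    # Document frequency: in how many documents each token appears.
    doc_freq = Counter()
    for tokens in token_lists:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in token_lists:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            tok: (n / total) * math.log(n_docs / doc_freq[tok])
            for tok, n in counts.items()
        })
    return weights


# Toy documents, invented for illustration (not real reviews).
toy_docs = ["love the app", "app crashes", "app login fails"]
weights = tf_idf(toy_docs)
print(weights[1])
```

A token that appears in every document, like "app" here, gets weight zero, so the downstream LASSO-penalized logistic regression sees only tokens that distinguish reviews from one another.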
Findings
Top tokens by LASSO coefficient reveal what language separates happy users from frustrated ones
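Ranking tokens by coefficient can be sketched as follows. This is an illustrative Python snippet rather than the project's R code. The coefficient values are invented for the example, `top_tokens` is a hypothetical helper, and the sign convention (positive pushes toward Good, negative toward Bad) is an assumption.

```python
def top_tokens(coefs: dict, k: int = 20) -> list:
    """Return the k tokens with the largest absolute coefficient.

    LASSO shrinks many coefficients to exactly zero; those tokens were
    dropped by the model, so they are excluded from the ranking.
    """
    nonzero = {tok: c for tok, c in coefs.items() if c != 0}
    ranked = sorted(nonzero.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return ranked[:k]


# Invented coefficients, for illustration only (not the model's values).
example_coefs = {
    "friends": 0.8,
    "login": -1.2,
    "crashes": -0.9,
    "the": 0.0,
    "trail": 0.4,
}
top3 = top_tokens(example_coefs, k=3)
print(top3)
```

Sorting on absolute value keeps strongly negative and strongly positive tokens in one ranking; the sign then tells which sentiment class each token points toward.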
Analysis conducted in R using the tidytext, randomForest, and glmnet packages. 2,000 Google Play reviews collected and labeled by star rating.