Self-quantification is a double-edged tool. The wearable industry sells "measure to improve"; the clinical literature documents a subset of users who measure themselves into worse outcomes.
The clinical phenomenon
Baron and colleagues (2017) coined the term "orthosomnia" after seeing patients in the sleep clinic whose primary sleep complaint was driven by their tracker data. The presentation pattern:
- Patient reports "bad sleep" based on app metrics.
- Objective sleep (measured independently by PSG or actigraphy) is often normal.
- Preoccupation with sleep quality scores creates anxiety that worsens sleep onset latency and increases awakenings.
- Attempts to "game" the tracker (earlier bedtime, sleep positions, supplements) fail to produce the expected score improvement.
- The cycle reinforces itself.
Chinoy et al. (2021, n=8) documented the accuracy gap between wearables and polysomnography: total sleep time is accurate to within ~5-15 min; stage classification is mediocre; REM detection is worst. Users anchor on the least-accurate numbers.
Sleep biology is mostly invariant to whether you track it: Besedovsky et al. (2019) review how short sleep measurably suppresses immune function within 24 hours, regardless of wearable interpretation. The biology does not care about your score.
The 90-day test
If you have worn a tracker for 90 days and cannot name a specific behavior that changed as a result, the tracker is not earning its keep. Stop wearing it.
Examples of tracker-driven behavior changes that pass the test:
- Learned that alcohol within 4 h of bed crashes HRV and REM. Stopped drinking on training nights.
- Noticed step count below 4,000 on weekends. Committed to Saturday morning walks.
- Observed HRV crashing 2 days before perceived illness. Now uses it as an early warning to deload.
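The HRV early-warning pattern above can be sketched as a simple rolling-baseline rule. This is an illustrative sketch, not a clinically validated protocol: the window size, the z-score cutoff, and the function name are all assumptions for the example.

```python
from statistics import mean, stdev

def hrv_flag(history, today, window=14, z_cut=-1.5):
    """Flag a deload if today's HRV falls well below the rolling baseline.

    history: recent nightly HRV readings (ms), oldest first.
    Thresholds are illustrative, not clinically validated.
    """
    baseline = history[-window:]
    mu, sigma = mean(baseline), stdev(baseline)
    if sigma == 0:
        return False
    z = (today - mu) / sigma  # how unusual is today's reading?
    return z < z_cut

# 14 stable nights around 60 ms, then a sharp drop
history = [58, 61, 60, 59, 62, 60, 61, 59, 60, 63, 58, 60, 61, 59]
print(hrv_flag(history, today=48))  # → True (flags the crash)
print(hrv_flag(history, today=60))  # → False (within normal variation)
```

The point of comparing against your own rolling baseline, rather than a population norm, is that absolute HRV varies enormously between individuals; only the deviation from your own recent history carries a signal.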
Examples that fail the test:
- Check the score every morning; feel good or bad accordingly; take no action.
- Go to bed 15 min earlier on weekends; the score doesn't change; conclude the tracker is broken.
- Buy an apigenin/magnesium/glycine stack based on the "deep sleep minutes" readout; stop after 4 weeks without noticeable change.
Better heuristics than sleep score
- Subjective sleep pressure before bed. If you're drowsy, you slept short or badly. If you're alert, you probably slept enough, regardless of the wearable's verdict.
- Morning alertness 30 min after waking. Groggy after caffeine + morning light = sleep debt. Normal = baseline is OK.
- Training performance. If you're hitting expected loads in the gym, recovery is adequate regardless of score.
- Hunger pattern. Consistently excessive morning hunger often signals under-sleeping more reliably than any tracker metric.
When a tracker pays off
- Chronic insomniacs under CBT-I: objective sleep logging valuable for therapy homework.
- Shift workers: circadian drift tracking genuinely useful.
- Athletes with structured periodization: HRV-driven deload decisions (see HRV-Guided Training).
- Someone investigating a specific intervention: 30-day wearable trial, then stop. Not a permanent lifestyle fixture.
When it doesn't pay off
- Casual users who wear it "in case it's useful".
- People prone to health anxiety generally.
- Anyone who has started supplementing based on stage minutes without changing the behavioral basics (see Sleep Hygiene Ranked).
The quantified self paradox
The data-rich lifestyle only works if you act on the data. Most humans do not. Measuring without acting increases awareness of all the things wrong with you without changing any of them. This can cause negative utility even when the data is accurate.
The fix is either:
- Commit to decision rules before wearing. Write down what you'll change if metric X moves beyond threshold Y.
- Stop wearing. Accept your body's direct signal (energy, mood, performance) as the feedback loop.
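The first fix, pre-committed decision rules, amounts to writing down metric/threshold/action triples before the data starts arriving. A minimal sketch, where every rule name, threshold, and action is a made-up example rather than a recommendation:

```python
# Pre-committed decision rules: written down BEFORE wearing the tracker,
# so the data drives action instead of anxiety. All names and thresholds
# here are illustrative examples only.
RULES = [
    # (metric, trigger predicate, action)
    ("weekend_steps", lambda v: v < 4000, "schedule Saturday morning walk"),
    ("hrv_drop_pct", lambda v: v > 20, "deload next training session"),
    ("sleep_debt_h", lambda v: v > 2, "no alcohol tonight, bed 30 min early"),
]

def actions_for(readings):
    """Return the pre-committed actions triggered by today's readings."""
    return [action for metric, trig, action in RULES
            if metric in readings and trig(readings[metric])]

print(actions_for({"weekend_steps": 3200, "hrv_drop_pct": 8}))
# → ['schedule Saturday morning walk']
```

The design point is that the mapping from number to behavior is fixed in advance; a reading that triggers no rule triggers no rumination either.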
Counter-view
Casey Means and Levels argue that continuous biometric awareness durably changes food and lifestyle choices; this is plausible for a subset of high-engagement users. Andrew Huberman advocates aggressive tracking protocols; fine for him, but they fail for many. Skeptics of consumer wearable accuracy (e.g., Kenneth Chang, Thomas Goetz) argue that most consumer health data is too noisy to drive individual decisions. The empirical middle: tracking helps people who use it to drive decisions, and harms people who use it to drive anxiety.