Lab Notes: 100 Bets in 18 Days, +12.5% ROI, +0.3% CLV, and a Model That Loves Unders + Dogs

I just crossed my first 100 tracked NBA sides and totals bets on Betstamp, and I want to do a “Lab Notes” checkpoint. While the initial results look promising, it’s still far too early to declare victory over the sportsbooks or to claim this is a long-term profitable model. This post is meant to be a transparent report on what happened, what I think it means, and what the roadmap for model enhancements looks like going forward.

TL;DR

Official tracked record: 60–41 on Betstamp (NBA spreads + totals only).
ROI: +12.5% (as tracked by Betstamp).
CLV: +0.3% (not great; more on this below).
Split by market:
- Spreads: 33–26
- Totals: 27–15
Big behavioral bias in the signals:
- Underdogs: 50 of 59 spread bets
- Unders: 31 of 42 totals bets
Statistical reality: at ~100 bets, this can still absolutely be variance. The data is encouraging, not conclusive.

The figure below shows the “Official” results and net profit from flat-betting $100 per bet.

What I’m tracking

All tracked bets are:

NBA only
Spreads + totals only. No parlays, no moneylines, no props
Placed using a simple signal rule: a minimum model-implied win probability of ~57% (some bets higher, a handful much higher)

Important context: the model is possession-level and trained on season-to-date performance. It does not explicitly ingest injury news yet, and I often bet early, which matters a lot for CLV.

Results (and a quick note on tracking errors)

Official Betstamp record: 60–41.

Two honest footnotes:

I accidentally tracked the wrong side of one bet (the tracked side lost and had worse than -10% CLV).
I had a couple of tired moments where I misread a line and logged something I shouldn’t have taken based on the model.

I’m leaving the official tracked record as-is because (a) the whole point of tracking is accountability, and (b) the bigger lesson is I need a better process so “human error” doesn’t sneak into a system that’s supposed to be systematic.

Breakdown: spreads vs totals

Splitting the official record by market (the two groups together make up all 101 tracked bets):

Spreads: 59 bets → 33–26
Totals: 42 bets → 27–15

Both categories are profitable so far, but it’s hard not to notice that totals have been the stronger performer in this early sample.

That’s not proof totals are “easier” (they’re not), but it might hint that the underlying possession simulation is doing something useful for pace/efficiency that shows up cleaner in totals than sides.
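As a rough cross-check on the per-market split, here is a small Python sketch that turns each record into flat-bet profit. It assumes every bet was priced at exactly -110 with a $100 stake and no pushes; Betstamp uses the actual prices taken, so the overall number this produces won't match the official +12.5% exactly. The helper names are illustrative, not part of any tool.

```python
# Flat-bet profit and ROI per market, ASSUMING every bet was priced at exactly -110
# with a $100 stake and no pushes (Betstamp uses the real prices, so it will differ).

def american_to_profit(odds: int, stake: float = 100.0) -> float:
    """Profit on a winning bet of `stake` at American odds."""
    if odds < 0:
        return stake * 100 / -odds
    return stake * odds / 100


def flat_bet_summary(wins: int, losses: int, odds: int = -110, stake: float = 100.0):
    """Return (win_rate, net_profit, roi) for flat bets at a single price."""
    n = wins + losses
    profit = wins * american_to_profit(odds, stake) - losses * stake
    return wins / n, profit, profit / (n * stake)


records = {"spreads": (33, 26), "totals": (27, 15), "all": (60, 41)}
for market, (w, l) in records.items():
    rate, profit, roi = flat_bet_summary(w, l)
    print(f"{market:8s} {w}-{l}  win rate {rate:.1%}  profit ${profit:,.0f}  ROI {roi:+.1%}")
```

At a uniform -110 this works out to roughly +6.8% ROI on spreads versus +22.7% on totals (about +13.4% overall), which is the gap described above.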

It’s not just that the model is winning overall — it’s where it’s winning that’s informative.

The most obvious observation: the model loves Unders and Underdogs

This surprised me enough that I think it deserves its own section.

Spread bets skew heavily toward underdogs

Of the 59 sides the model liked:

50 were underdogs
9 were favorites

Total bets skew heavily toward unders

Of the 42 totals the model liked:

31 were unders
11 were overs

This could mean a few different things:

Possibility A: The market is shaded toward favorites/overs (public preference, narrative bias, “overs are more fun,” etc.) and the model is exploiting that shade.

Possibility B: The model has its own internal bias that’s pushing it toward “ugly” bets: dogs and unders. That bias could be real edge… or it could be a modeling artifact (for example, shrinkage toward league averages that gives favorites too little credit and underdogs too much).

Possibility C: Selection effects. If my threshold is “only bet when the edge is large enough,” and the market tends to offer the largest pricing errors in certain regions of the bet space, my filtered bet set will naturally cluster there.
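To make Possibilities B and C concrete, here is a toy Python sketch (not my actual model) of how simple shrinkage can manufacture an underdog lean. It assumes a hypothetical model whose predicted margin is `shrink` times the market spread, and that the final margin is normally distributed around that prediction with an assumed standard deviation of 12 points. Both numbers are illustrative only.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def model_dog_cover_prob(market_spread: float, shrink: float, sigma: float = 12.0) -> float:
    """P(underdog covers) for a HYPOTHETICAL model that shrinks the market spread.

    market_spread: favorite's expected margin implied by the market (points, > 0).
    shrink: hypothetical factor; values < 1 pull predictions toward a pick'em.
    sigma: ASSUMED std dev of the final margin around the model's prediction.
    """
    predicted_margin = shrink * market_spread
    # The dog covers when the favorite's actual margin falls short of the spread.
    return normal_cdf((market_spread - predicted_margin) / sigma)


THRESHOLD = 0.57  # the post's minimum model-implied win probability
for spread in (2.5, 5.5, 8.5, 11.5, 14.5):
    p_dog = model_dog_cover_prob(spread, shrink=0.8)
    if p_dog >= THRESHOLD:
        side = "dog"
    elif 1 - p_dog >= THRESHOLD:
        side = "fav"
    else:
        side = "pass"
    print(f"spread {spread:>4}: model P(dog covers) = {p_dog:.3f} -> {side}")
```

Because any shrink below 1 pulls the prediction toward a pick'em, the model's dog probability is above 50% in every game, and with these made-up settings only the largest spreads clear 57%. That is B and C working together: a mild bias plus an edge filter yields a bet set that is almost all dogs. The same logic would push toward unders if the model shrinks pace or efficiency toward league average in games the market prices high.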

Right now I’m not ready to declare which one it is. But I am ready to treat it as a diagnostic signal:

If this bias persists at 250+ bets, it’s probably structural.
Then the job becomes figuring out whether it’s structural alpha or structural model error.

60% feels amazing… but why I’m not declaring victory yet

Let’s be honest: 60–41 feels like I’m locked in. If someone told you they were hitting ~60% on NBA spreads/totals, you’d assume they’re either elite or, more likely, lying.

But statistically, ~100 bets is still a small sample when you’re trying to prove a thin edge.

Why breakeven isn’t 50%

Most of these bets are in the -110-ish range, so the rough breakeven point is about 52.4%.
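The breakeven figure comes straight from the price: at -110 you risk $110 to win $100, so you need to win 110 / 210 ≈ 52.4% of the time. A minimal Python sketch of that conversion (ignoring pushes), plus the expected ROI at the ~57% signal threshold if the model's probability were exactly right, which is an optimistic assumption:

```python
def implied_breakeven(odds: int) -> float:
    """Win rate needed to break even at American odds (pushes ignored)."""
    if odds < 0:
        return -odds / (-odds + 100)
    return 100 / (odds + 100)


def expected_roi(win_prob: float, odds: int) -> float:
    """Expected profit per $1 risked, given a true win probability."""
    payout = 100 / -odds if odds < 0 else odds / 100  # profit per $1 on a win
    return win_prob * payout - (1 - win_prob)


print(f"{implied_breakeven(-110):.4f}")    # 0.5238, i.e. 110 / 210
print(f"{expected_roi(0.57, -110):+.3f}")  # +0.088 at the 57% threshold
```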

The uncomfortable part: my p-value isn’t “significant” yet

Using a one-sided hypothesis test against the ~52.4% breakeven, I get a p-value in the 0.05–0.10 range depending on exact assumptions. In my own calculations so far, it lands at ~0.078.

That’s not “bad.” It’s just not the traditional “statistically certified” stamp of approval (p < 0.05).
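For anyone who wants to reproduce the test, here is a minimal standard-library Python sketch that computes both a one-sided normal-approximation z-test and the exact binomial tail against 110/210 breakeven. Treating all 101 bets as -110 with no pushes is a simplifying assumption.

```python
import math


def normal_sf(z: float) -> float:
    """Upper-tail probability of a standard normal."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def one_sided_pvalues(wins: int, n: int, p0: float):
    """Return (normal-approx p, exact binomial p) for H0: p = p0 vs H1: p > p0."""
    se = math.sqrt(p0 * (1 - p0) / n)
    z = (wins / n - p0) / se
    p_normal = normal_sf(z)
    # Exact tail: P(X >= wins) for X ~ Binomial(n, p0)
    p_exact = sum(
        math.comb(n, k) * p0**k * (1 - p0) ** (n - k) for k in range(wins, n + 1)
    )
    return p_normal, p_exact


p_norm, p_exact = one_sided_pvalues(60, 101, 110 / 210)
print(f"normal approx: {p_norm:.3f}   exact binomial: {p_exact:.3f}")
```

The normal approximation lands right around the ~0.078 figure; the exact binomial tail is noticeably larger because it includes the full probability of exactly 60 wins (similar in effect to a continuity correction). That gap is a big part of why the answer depends on exact assumptions.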

Confidence intervals tell the story better

Based on this sample size, my estimated true win-rate range is still wide:

75% confidence interval: ~53.8% to ~65.1%
(slightly profitable even on the low end)
95% confidence interval: ~49.7% to ~69.1%
(the low end still includes “lighting money on fire”)

This is the reality of small samples: the “truth” could still be a lot less exciting than the point estimate.
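These ranges can be approximated with a simple normal (Wald) interval around the observed 60/101. The Python sketch below is one way to do it, not necessarily the exact method behind the numbers above; it lands within roughly a tenth of a percentage point of them. The Wilson interval is a common alternative that behaves better at small sample sizes.

```python
import math
from statistics import NormalDist


def wald_interval(wins: int, n: int, confidence: float):
    """Normal-approximation (Wald) interval for a win rate."""
    p_hat = wins / n
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided critical value
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


for conf in (0.75, 0.95):
    low, high = wald_interval(60, 101, conf)
    print(f"{conf:.0%} CI: {low:.1%} to {high:.1%}")
```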

A more intuitive lens

Instead of arguing about p-values: with a neutral prior, the probability that my true win rate is above breakeven is roughly 90% based on this data. That’s not a guarantee; it’s just a cleaner way to express what’s actually happening.

To summarize: the evidence is leaning my way, but it’s not a closed case.
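The “neutral prior” framing maps naturally onto a Beta-binomial model. Assuming a flat Beta(1, 1) prior as one reading of “neutral,” the posterior for the win rate is Beta(1 + wins, 1 + losses), and a quick Monte Carlo estimate of the probability that it sits above 110/210 could look like this in Python:

```python
import random


def prob_above(wins: int, losses: int, threshold: float,
               prior_a: float = 1.0, prior_b: float = 1.0,
               draws: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of P(true win rate > threshold) under a Beta posterior."""
    rng = random.Random(seed)
    a, b = prior_a + wins, prior_b + losses  # conjugate Beta-binomial update
    hits = sum(rng.betavariate(a, b) > threshold for _ in range(draws))
    return hits / draws


print(f"{prob_above(60, 41, 110 / 210):.3f}")
```

Under these assumptions it comes out a little above 0.9, consistent with the ~90% figure; a more skeptical prior centered near breakeven would pull it down.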

So… how many bets until I can be confident?

This is the part a lot of people skip over, but it’s the part that matters.

If I keep winning at ~59–60%

If the true win rate is around what we’ve seen so far, I likely need something like ~150 tracked bets before the results become “statistically significant” in the traditional sense.

If my win rate regresses to ~55%

If the true edge is more modest (and 55% is still excellent in NBA spreads/totals), I’m in “grind” territory:

I’d need ~1,000 tracked bets to get the same level of statistical confidence.

This is why I’m treating the first 100 as a checkpoint, not a finish line.
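The rough bet counts above can be reproduced by asking how many bets it takes for a sustained win rate to push a one-sided z-test past p = 0.05. Here is a minimal Python sketch under that framing (all bets at -110, no pushes). A proper power calculation that also wants, say, an 80% chance of detecting the edge would call for more bets than this.

```python
import math
from statistics import NormalDist


def bets_needed(win_rate: float, breakeven: float = 110 / 210, alpha: float = 0.05) -> int:
    """Bets needed for a sustained `win_rate` to reach one-sided significance at `alpha`.

    Solves (win_rate - breakeven) / sqrt(breakeven * (1 - breakeven) / n) = z_alpha for n,
    i.e. the same normal approximation as a one-sample z-test.
    """
    z = NormalDist().inv_cdf(1 - alpha)
    edge = win_rate - breakeven
    return math.ceil((z / edge) ** 2 * breakeven * (1 - breakeven))


for rate in (0.60, 0.59, 0.55):
    print(f"{rate:.0%} -> ~{bets_needed(rate)} bets")
```

With these assumptions, a sustained 59–60% needs roughly 115–155 bets, while 55% needs close to 1,000.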

CLV: why it’s low, and what I think is happening

My CLV is only +0.3%, and I’m not thrilled with that.

I think the main driver is this:

The model does not account for injuries yet
I often bet early
Injury news can swing NBA lines dramatically

So I’ll sometimes place what looks like a solid bet… and then by tip-off I’m sitting on -10% CLV because a player got ruled in/out, or the market repriced something my model doesn’t know.

Two additional observations (both still “anecdotal,” but consistent enough to track):

If I filter out games with major injury uncertainty, CLV tends to look better.
Sometimes the market appears to overreact to injuries, and the actual results don’t feel aligned with what CLV would imply.

I’m not ready to make a grand claim like “CLV doesn’t matter.” I am ready to say:

CLV is a strong proxy for edge, but NBA injury dynamics can create weird cases where “closing” isn’t purely “more efficient”; it’s “more informed,” and sometimes that information appears to be noisy or priced with fear.

For now my action item is simple: track CLV more carefully by injury context, instead of treating it as one monolithic number.
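As a starting point for splitting CLV by injury context, here is a hypothetical Python sketch. It assumes CLV is measured as the expected ROI of the price taken against the no-vig closing price (one common definition; Betstamp's exact formula may differ) and that each bet carries an injury/news tag. It only handles price movement: when the number itself moves (say +5.5 closing at +4.5), that point move would first have to be converted into a price, which this sketch does not attempt.

```python
from collections import defaultdict


def american_to_decimal(odds: int) -> float:
    """Decimal odds (stake included) from American odds."""
    return 1 + (100 / -odds if odds < 0 else odds / 100)


def no_vig_prob(side_odds: int, other_odds: int) -> float:
    """Fair probability of one side after removing the vig proportionally."""
    p_side = 1 / american_to_decimal(side_odds)
    p_other = 1 / american_to_decimal(other_odds)
    return p_side / (p_side + p_other)


def clv(bet_odds: int, close_side: int, close_other: int) -> float:
    """ASSUMED definition: expected ROI of the bet price vs the no-vig closing price."""
    return american_to_decimal(bet_odds) * no_vig_prob(close_side, close_other) - 1


def mean_clv_by_tag(bets):
    """bets: iterable of dicts with 'tag', 'bet_odds', 'close_side', 'close_other'."""
    totals, counts = defaultdict(float), defaultdict(int)
    for b in bets:
        totals[b["tag"]] += clv(b["bet_odds"], b["close_side"], b["close_other"])
        counts[b["tag"]] += 1
    return {tag: totals[tag] / counts[tag] for tag in totals}


# Made-up toy inputs to show the shape of the data, not tracked bets.
toy = [
    {"tag": "clean", "bet_odds": -110, "close_side": -130, "close_other": 110},
    {"tag": "injury", "bet_odds": -110, "close_side": 110, "close_other": -130},
]
print(mean_clv_by_tag(toy))
```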

Process and tracking execution improvements

Over the first 100 bets, I made a small handful of tracking errors. They weren’t catastrophic, but they do highlight the need for a better execution process so I don’t attribute human error to the model. Going forward I’m tightening the workflow:

double-check line + side before logging
record a quick “Significant Injury/news” tag for each bet so I can easily filter results

If I’m going to do this publicly, the process has to be as clean as the math.

Next steps + roadmap

Next milestone

I’m going to keep tracking exactly the same way and publish another update at 250 tracked bets.

That’s the point where trends start to feel less like noise and more like shape.

Model roadmap (v2.0 → v3.0)

Here’s the current plan:

v2.1: Improve Time of Possession (ToP) in the sim
- shot-clock-centric time buckets
- test Weibull vs lognormal distributions for ToP
v2.2: More robust data preprocessing
- align simulation possessions more tightly with official possession definitions
v2.3: Explicit fouling model in the sim
- late-game dynamics matter a lot for totals and spreads
v2.4: Richer possession outcomes
- incorporate shot zones + shot types into the sim
v3.0 (longer term): Player lineups + rotations
- player-specific outcome attribution (the “real” version of injury awareness)
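For the v2.1 Weibull-vs-lognormal test, here is a minimal standard-library Python sketch of how the comparison could work, assuming a list of possession durations in seconds. The data below is synthetic (drawn from a Weibull), so it only checks that the fitting code recovers what it was given. Both families have two parameters, so comparing maximized log-likelihood is equivalent to comparing AIC. It ignores the 24-second shot-clock cap and any shot-clock bucketing.

```python
import math
import random


def fit_lognormal(xs):
    """Closed-form lognormal MLE: returns (mu, sigma, log-likelihood)."""
    logs = [math.log(x) for x in xs]
    n = len(logs)
    mu = sum(logs) / n
    sigma = math.sqrt(sum((l - mu) ** 2 for l in logs) / n)
    norm = math.log(sigma * math.sqrt(2 * math.pi))
    ll = sum(-l - norm - (l - mu) ** 2 / (2 * sigma**2) for l in logs)
    return mu, sigma, ll


def fit_weibull(xs, lo=0.05, hi=20.0, iters=60):
    """Two-parameter Weibull MLE: returns (shape k, scale lam, log-likelihood).

    Finds k by bisection on the (increasing) profile-likelihood equation
    sum(x^k ln x) / sum(x^k) - 1/k - mean(ln x) = 0.
    """
    logs = [math.log(x) for x in xs]
    mean_log = sum(logs) / len(xs)

    def score(k):
        xk = [x**k for x in xs]
        return sum(a * l for a, l in zip(xk, logs)) / sum(xk) - 1 / k - mean_log

    for _ in range(iters):
        mid = (lo + hi) / 2
        if score(mid) > 0:
            hi = mid  # root lies below mid
        else:
            lo = mid
    k = (lo + hi) / 2
    lam = (sum(x**k for x in xs) / len(xs)) ** (1 / k)
    ll = sum(math.log(k / lam) + (k - 1) * math.log(x / lam) - (x / lam) ** k for x in xs)
    return k, lam, ll


rng = random.Random(42)
# Synthetic "possession durations" in seconds: Weibull with scale 14, shape 2.
durations = [rng.weibullvariate(14.0, 2.0) for _ in range(5000)]
k, lam, ll_w = fit_weibull(durations)
mu, sigma, ll_ln = fit_lognormal(durations)
print(f"Weibull k={k:.2f} lam={lam:.1f} LL={ll_w:.0f} | lognormal LL={ll_ln:.0f}")
```

On real possession data, the higher log-likelihood (and a visual check of the fitted tails against the shot-clock region) would be the signal for which family to carry into the sim.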

Closing thoughts

I’m happy with the first ~100 results. I’m also very aware that early heaters happen — especially when you’re selecting only “high-confidence” edges. The real test is whether the process holds up when the coin flips start landing the other way.

For now, the goal is consistency:

keep logging
keep publishing
keep improving the model
and keep being honest about what the results do and don’t prove.

If you’re following along, the next meaningful checkpoint is 250 bets.