A $185 Raspberry Pi Replaced a Failing Industrial Sensor

The packing line had a photoelectric sensor counting bags on the conveyor. It was failing — miscounting, drifting, and the replacement quote was steep. The line manager needed accurate counts for shift tracking and production reporting. We needed a solution that would just work, 19 hours a day, with zero operator input.

We put a Raspberry Pi 5 and a camera on it. Classical computer vision. No neural network, no cloud, no subscription. It hit 98.8% accuracy, validated against the packing machine’s own hardware counter. Total hardware cost: about $185.

This is the story of how and — more importantly — why.

The problem with “just count the bags”

Counting things on a conveyor sounds trivial until you actually watch the conveyor. Here’s what we were dealing with:

Bags stick together. They come off the packing machine close together and physically touch as they travel down the belt. Two bags become one blob of motion. A naive “count every moving thing” approach undercounts badly.

The belt stops constantly. Operator breaks, upstream jams, shift changes. When it stops, bags vibrate from motor jitter. When it restarts, those same bags are still in frame. You have to remember what you already counted and suppress the jitter — or you double-count everything.

Bags move fast. At full speed, a bag can travel 30-50 pixels between frames. It can jump clean over a narrow counting zone in a single frame.

False positives everywhere. Hands reaching in, people walking past, shadows from overhead equipment. Anything that moves is a candidate for “bag” if you’re not careful.

Why not a neural network?

We build AI systems. We like neural networks. But we think about what the problem actually needs, not what sounds impressive.

A YOLO model would work here. It would also require:

Collecting and labeling a few thousand training images
Training and exporting the model
Dealing with edge cases in training data (what does a merged blob look like to a labeler?)
Retraining when bag sizes or labels change
More compute, more power draw, more complexity

The bag counting problem has a very specific structure: objects move across a static background on a conveyor belt. That’s what background subtraction was invented for. We don’t need to classify anything — we need to detect motion, separate blobs, and count line crossings.

Classical CV gave us a working system in days instead of weeks. And it runs at 15 FPS on a Pi 5 with CPU headroom to spare.

The detection pipeline

The core is MOG2 background subtraction with distance transform blob splitting. About 200 lines of OpenCV that do the real work.

Step 1: Background subtraction

self._bg_sub = cv2.createBackgroundSubtractorMOG2(
    history=500,
    varThreshold=50,
    detectShadows=True,
)

# When belt stops, freeze learning so stationary bags
# don't fade into the background model
lr = 0 if self._belt_stopped else -1
fg_mask = self._bg_sub.apply(blurred, learningRate=lr)

MOG2 learns the empty conveyor as background. Anything that moves becomes foreground. The trick is freezing the learning rate when the belt stops — without this, bags gradually blend into the background during pauses, and you lose them entirely when the belt restarts.

Step 2: Morphological cleanup

# MORPH_CLOSE fills gaps WITHIN bags without bridging BETWEEN them
fg_mask = cv2.morphologyEx(fg_mask, cv2.MORPH_CLOSE, close_kernel, iterations=2)
# MORPH_OPEN removes noise specks
fg_mask = cv2.morphologyEx(fg_mask, cv2.MORPH_OPEN, open_kernel, iterations=1)

This was a key insight. Our early versions used separate dilate + erode, which bridged adjacent bags into one mega-blob. MORPH_CLOSE (dilation then erosion as a single operation) fills holes inside a bag’s silhouette without connecting neighboring bags. Small difference in approach, large difference in accuracy.

Step 3: The merged-bag problem

This is where it gets interesting. Even with good morphology, bags that are physically touching still form single contours. We attack this at two levels.

Level 1 — Distance transform splitting. For any blob larger than ~1.4x the expected single-bag area, we run a distance transform. Peaks in the distance map correspond to bag centers:

def _split_blob(self, contour, mask):
    blob_mask = np.zeros(mask.shape, dtype=np.uint8)
    cv2.drawContours(blob_mask, [contour], -1, 255, -1)

    # Distance transform — peaks indicate bag centers
    dist = cv2.distanceTransform(blob_mask, cv2.DIST_L2, 5)
    threshold = dist.max() * 0.25
    _, peak_mask = cv2.threshold(dist, threshold, 255, cv2.THRESH_BINARY)

    # Each connected component in the peak mask = one bag
    n_labels, labels, stats, centroids = cv2.connectedComponentsWithStats(
        peak_mask.astype(np.uint8), connectivity=8
    )

This works about 60% of the time on merged blobs. For the other 40%, we have a second layer.

Level 2 — Area-based counting at the line. When a tracked blob crosses the counting line, we estimate how many bags it contains from its area:

n_bags = max(1, round(blob_area / calibrated_single_bag_area))

The calibrated area comes from auto-calibration: the system collects area measurements from the first 8 clearly-solo detections and uses the median. No manual tuning needed. It adapts to whatever bag size is running.

Tracking across frames

Detection alone isn’t enough. We need to track each bag across frames so we count it exactly once. The tracker is a centroid-distance matcher — each frame, match new detections to existing tracks by nearest centroid:

# Greedy matching: assign closest pairs first
for dist, obj_idx, det_idx in sorted(pairs):
    if dist > self._max_match_distance:
        break
    if obj_idx in matched or det_idx in matched:
        continue
    # Match found

Belt stop detection uses jitter analysis: if all tracked centroids move less than 2 pixels for 5 consecutive frames, the belt is stopped. During stops, tracking freezes — no new tracks, no counting, no disappearance timeouts. The moment movement resumes, counting picks up instantly.

What $185 buys you

Component	Cost
Raspberry Pi 5 (2GB is sufficient)	~$60
IMX415 CSI camera module	~$30
Case, power supply, cables, mounting	~$60
SD card (32GB)	~$15
Miscellaneous (standoffs, cable ties, etc.)	~$20
Total	~$185

It’s a fraction of what an industrial sensor replacement costs, and it gives you something the sensor never did: data. Every count event is logged with timestamp, area, confidence score, and method. Belt stop durations are tracked. Hourly production rates are computed automatically.

The system syncs to a cloud API every 5 minutes, feeds a live dashboard, and survives reboots — counter state persists to disk and restores on startup. It runs as a systemd service with watchdog monitoring. It has been running in production since February with 98.8% accuracy validated over weeks of factory operation.

The real lesson

We could have spent weeks training a YOLO model. We could have pitched an “AI-powered vision system” and charged accordingly. Instead, we looked at the problem structure — objects moving across a known background — and picked the simplest tool that could solve it.

The right solution isn’t always the most sophisticated one. Background subtraction has been in OpenCV since 2004. Distance transforms are from the 1960s. But applied to the right problem, with attention to the real-world edge cases (belt stops, merged bags, jitter suppression), these techniques deliver industrial-grade results.

A $185 Pi with 200 lines of OpenCV beat an expensive sensor replacement. Not because the Pi is magic, but because a camera gives you information — shape, area, position, history — where a photoelectric sensor gives you a pulse. More information means more ways to solve the problem.

That’s the edge AI thesis in one sentence: put compute where the data is, use the simplest model that works, and let the physics of the problem guide the solution.