
Edge AI / Privacy-First Monitoring

Noctivana

The project treats privacy and safety as co-equal constraints: actionable alerts matter, but raw nursery media should not become the system's default output.

Noctivana is a privacy-first infant monitoring system built around a Raspberry Pi 4 edge stack, local inference, sensor fusion, BLE fallback, and a companion mobile app. It monitors prone sleep, face occlusion, respiratory absence, cry events, and room conditions on-device, then emits small alert payloads instead of pushing nursery media into a cloud pipeline.

Python · Edge AI · Raspberry Pi · Sensor Fusion · React Native · MQTT

Prone Detection

9/10

documented acceptance-test result

Alert Latency

7.2s P95

reported from the validation docs

Soak Run

11h

continuous operation noted in repo documentation

The Core Problem

Baby monitors stream video. This one reasons locally.

Commercial infant monitors transmit raw nursery video to cloud servers. Latency and connectivity problems delay what caregivers see, and any data breach exposes the most private space in a home. The pattern recognition happens in the cloud, not in the room.

Noctivana inverts that model. Inference runs on a Raspberry Pi 4 at the bedside. MoveNet Lightning INT8 estimates pose keypoints in ~20ms per frame and YAMNet INT8 classifies audio in ~5ms (desktop benchmark timings; on the Pi, the full vision step takes ~150ms per frame). A ZMQ bus ties eight independent services together. Only compact JSON alert payloads ever leave the device.

Zero video. Zero audio. Zero raw sensor streams. Privacy as a hardware constraint, not a policy checkbox.
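To illustrate how small an outbound alert can be, here is a minimal sketch of building one. The field names, the `build_alert_payload` helper, and the one-topic-per-alert-type layout are assumptions for illustration; only the `edgewatch/alert/*` topic prefix and the alert type names come from this page.

```python
import json
import time
from typing import Optional, Tuple

def build_alert_payload(alert_type: str, severity: str, rule: str,
                        ts: Optional[float] = None) -> Tuple[str, bytes]:
    """Return (MQTT topic, compact JSON body) for one alert event.

    Hypothetical schema: an event label, severity, rule id, and timestamp.
    No frames, audio buffers, or raw sensor arrays are included.
    """
    payload = {
        "type": alert_type,
        "severity": severity,
        "rule": rule,
        "ts": time.time() if ts is None else ts,
    }
    # separators=(",", ":") drops json's default spaces to keep the message small
    body = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    return f"edgewatch/alert/{alert_type}", body

topic, body = build_alert_payload("prone_position", "CRITICAL", "R1", ts=1700000000.0)
print(topic)          # edgewatch/alert/prone_position
print(body.decode())  # {"type":"prone_position","severity":"CRITICAL","rule":"R1","ts":1700000000.0}
```

The whole message is well under 100 bytes, which is what makes a packet-capture audit for media straightforward to interpret.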

Cloud baby monitors

Always-on video

streaming to remote servers

Noctivana

0 media packets

verified by packet-capture audit

[Figure: Noctivana privacy proof, packet capture showing zero media packets transmitted]

Packet-capture validation: no video, audio, or raw sensor streams ever leave the device. Alerts only.

Architecture

8 Services. One ZMQ Bus.

An XPUB/XSUB proxy at the center means any service can publish or subscribe without coupling to another service's lifecycle. Services crash and restart independently.

[Figure: Noctivana ZMQ bus architecture, XPUB/XSUB proxy connecting 8 services]

The XPUB/XSUB proxy decouples producers from consumers. All topic routing is zero-copy at the proxy layer.
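ZeroMQ SUB subscriptions filter on a prefix of the message's first frame, which is why slash-separated topics such as `vision/pose` and `audio/cry` suit a single bus: subscribing to `vision/` receives every vision topic, and an empty subscription receives everything. The sketch below models that prefix matching in plain Python so it runs without a broker. It is not pyzmq, and the `TopicBus` class is hypothetical; with pyzmq, the proxy itself is usually a bound XSUB/XPUB socket pair handed to `zmq.proxy(frontend, backend)`.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Handler = Callable[[str, bytes], None]

class TopicBus:
    """In-process stand-in for PUB/SUB with ZeroMQ-style prefix subscriptions (hypothetical)."""

    def __init__(self) -> None:
        self._subs: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, prefix: str, handler: Handler) -> None:
        self._subs[prefix].append(handler)

    def publish(self, topic: str, body: bytes) -> int:
        """Deliver to every handler whose prefix matches; return the delivery count."""
        delivered = 0
        for prefix, handlers in self._subs.items():
            if topic.startswith(prefix):  # prefix match, as with a zmq SUBSCRIBE filter
                for handler in handlers:
                    handler(topic, body)
                    delivered += 1
        return delivered

bus = TopicBus()
received: List[str] = []
bus.subscribe("vision/", lambda t, b: received.append(t))  # e.g. a vision-only consumer
bus.subscribe("", lambda t, b: None)                        # empty prefix = all topics, like alert_engine
bus.publish("vision/pose", b"{}")
bus.publish("audio/cry", b"{}")
print(received)  # ['vision/pose']
```

Because publishers only know topic names, a crashed subscriber simply stops matching; nothing upstream has to change, which is the decoupling the proxy provides.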

Service Registry — CPU Budgets on Raspberry Pi 4

| Service | Role | Publishes | CPU budget |
|---|---|---|---|
| zmq_proxy | XPUB/XSUB central hub, :5555/:5556 | none | ~1% |
| vision_service | MoveNet inference, face occlusion, night mode | vision/pose · vision/occlusion · vision/motion | ~55% |
| audio_service | YAMNet classification, dB monitor, breath detection | audio/cry · audio/dblevel · audio/breath | ~15% |
| vitals_service | Optical-flow respiratory rate, rPPG (experimental) | vitals/resp · vitals/resp_absence | ~18% |
| env_service | SCD40, SGP30, SHT31 I2C sensors at 1Hz | env/climate · env/alert | ~3% |
| alert_engine | Fusion rules, suppression logic, rate limiting | MQTT edgewatch/alert/* | ~4% |
| session_manager | SQLCipher session storage, AES-256 | none | ~2% |
| ble_service | BLE GATT fallback notification layer | GATT notify | ~2% |
[Figure: Noctivana full system overview: edge stack, alert path, and mobile app]

End-to-end system: ceiling camera → edge inference → ZMQ bus → alert engine → MQTT → phone.

Vision Pipeline

Pose inference, occlusion detection, optical-flow respiration — no GPU.

The vision_service captures frames at 5fps from a ceiling-mounted camera, crops the crib ROI, upscales it to MoveNet's 192×192 input, runs INT8 quantized inference, and publishes 17 keypoints to the ZMQ bus. Cropping, upscaling, and inference together take ~150ms per frame.
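MoveNet returns each keypoint as (y, x, confidence) normalized to its input image, so keypoints found in the crib crop have to be mapped back to full-frame pixels before any skeleton-level reasoning. A minimal sketch, assuming the crop is stretched to 192×192 without padding (so each axis scales independently) and using a hypothetical ROI:

```python
from typing import List, NamedTuple, Tuple

Keypoint = Tuple[float, float, float]  # (y, x, confidence)

class ROI(NamedTuple):
    x0: int  # left edge of the crib crop in the full frame (pixels)
    y0: int  # top edge of the crop
    w: int   # crop width
    h: int   # crop height

def keypoints_to_frame(kps: List[Keypoint], roi: ROI) -> List[Keypoint]:
    """Map normalized (y, x, score) values from the resized crop back to frame pixels.

    Assumes a plain stretch to the model input (no letterbox padding).
    """
    return [(roi.y0 + y * roi.h, roi.x0 + x * roi.w, score) for (y, x, score) in kps]

crib = ROI(x0=400, y0=120, w=640, h=480)  # hypothetical crib region in a 1280x720 frame
nose = (0.5, 0.25, 0.9)                   # normalized (y, x, confidence) from the model
print(keypoints_to_frame([nose], crib))   # [(360.0, 560.0, 0.9)]
```

If the crop were letterboxed to preserve aspect ratio instead, the padding offset would have to be subtracted before scaling.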

Face occlusion is detected by comparing face-keypoint confidence against body-keypoint confidence. If face confidence stays below 0.20 while body confidence remains above 0.15 for more than 3 seconds, the alert engine receives a face_occlusion event.
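The occlusion rule can be sketched as a small stateful check over successive frames. This assumes MoveNet's standard 17-keypoint order (indices 0 to 4 are nose, eyes, and ears; 5 to 16 are shoulders through ankles) and uses the mean confidence of each group. The averaging, the `OcclusionDetector` name, and the timestamp handling are illustrative assumptions, not the project's code; the 0.20, 0.15, and 3s thresholds come from the text.

```python
from statistics import fmean
from typing import Optional, Sequence

FACE_IDX = range(0, 5)    # nose, left eye, right eye, left ear, right ear
BODY_IDX = range(5, 17)   # shoulders, elbows, wrists, hips, knees, ankles

class OcclusionDetector:
    """Returns True while 'face low, body visible' has held for longer than hold_s."""

    def __init__(self, face_max: float = 0.20, body_min: float = 0.15, hold_s: float = 3.0):
        self.face_max, self.body_min, self.hold_s = face_max, body_min, hold_s
        self._since: Optional[float] = None  # when the condition first became true

    def update(self, scores: Sequence[float], t: float) -> bool:
        """scores: 17 per-keypoint confidences; t: frame timestamp in seconds."""
        face = fmean(scores[i] for i in FACE_IDX)
        body = fmean(scores[i] for i in BODY_IDX)
        if face < self.face_max and body > self.body_min:
            if self._since is None:
                self._since = t
            return t - self._since > self.hold_s
        self._since = None  # condition broken: restart the timer
        return False

det = OcclusionDetector()
covered = [0.05] * 5 + [0.6] * 12                       # face lost, body still visible
fired = [det.update(covered, n / 5) for n in range(20)]  # 4 s of frames at 5 fps
print(fired.index(True) / 5)                             # 3.2
```

Requiring the body group to stay confident is what separates "face covered" from "baby out of frame", where both groups drop together.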

Night mode switches to IR imaging with CLAHE preprocessing, which raises keypoint confidence by ~12% in low light. Optical-flow respiration (Farneback) tracks chest movement with a mean absolute error of 0.384 bpm against a synthetic reference across the 15–60 bpm test range.
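Farneback optical flow (for example OpenCV's `calcOpticalFlowFarneback`) produces a dense motion field per frame. One common way to turn that into a breathing rate is to reduce each frame to a single chest-motion value and measure the period of the resulting waveform. The sketch below starts from such a per-frame series (the reduction step is assumed, not shown) and estimates breaths per minute from interpolated upward zero crossings. This estimator is a simple stand-in; the project's own rate-extraction step is not described here.

```python
import math
from statistics import fmean
from typing import List, Optional, Sequence

def breaths_per_minute(signal: Sequence[float], fps: float) -> Optional[float]:
    """Estimate rate from the spacing of upward zero crossings of a mean-removed signal."""
    mean = fmean(signal)
    x = [v - mean for v in signal]
    crossings: List[float] = []  # fractional sample indices of negative-to-non-negative crossings
    for i in range(1, len(x)):
        if x[i - 1] < 0 <= x[i]:
            # linear interpolation between the two samples that bracket zero
            crossings.append(i - 1 + (-x[i - 1]) / (x[i] - x[i - 1]))
    if len(crossings) < 2:
        return None  # less than one full cycle: no usable respiratory signal
    period_s = (crossings[-1] - crossings[0]) / (len(crossings) - 1) / fps
    return 60.0 / period_s

# Synthetic check: 30 bpm (0.5 Hz) chest motion sampled at 5 fps for 30 s.
fps = 5.0
sig = [math.sin(2 * math.pi * 0.5 * n / fps + 0.3) for n in range(150)]
print(round(breaths_per_minute(sig, fps), 2))  # 30.0
```

Returning None rather than 0 keeps "no signal" distinct from a measured rate, which matters for a rule like R3 that fires on respiratory absence.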

MoveNet INT8 inference

~20ms

median on an x86-64 desktop benchmark, not on the Pi 4

Optical-flow respiration MAE

0.384 bpm

synthetic 15–60 bpm range

rPPG status

Experimental

≥20×20 px face required; not used in fusion

CLAHE night gain

+12%

keypoint confidence recovery in IR mode

[Figure: Noctivana vision pipeline: ROI crop, MoveNet INT8, keypoint classification, night-mode path]

The vision pipeline runs entirely on Raspberry Pi 4. No GPU. No cloud. Frame → crop → INT8 inference → ZMQ publish in ~350ms per frame, including ~200ms of camera capture.

[Figure: Noctivana alert fusion logic: multi-signal evaluation and suppression rules]

The alert engine receives all ZMQ topics. Fusion rules evaluate sustained conditions — not instantaneous spikes — to minimise false positives.

Alert Fusion Engine

Six rules. Sustained conditions. Caregiver-presence suppression.

The alert_engine subscribes to every ZMQ topic and evaluates fusion rules continuously. Rules fire only after sustained conditions are met — prone must persist ≥5s, respiratory absence must persist >15s while motion is "still". Instantaneous spikes are ignored.
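R1 and R3 both reduce to the same primitive: a predicate that must stay continuously true for a minimum duration before it counts. Here is a minimal sketch of that primitive, driven by message timestamps. The `Sustained` class and its `inclusive` flag are hypothetical; the 5s prone threshold, the restless-motion exclusion, and the >15s respiratory window come from the rules.

```python
from typing import Optional

class Sustained:
    """Tracks how long a boolean condition has been continuously true (illustrative)."""

    def __init__(self, min_s: float, inclusive: bool = True):
        self.min_s = min_s
        self.inclusive = inclusive          # True for ">= min_s" (R1), False for "> min_s" (R3)
        self._start: Optional[float] = None

    def update(self, condition: bool, t: float) -> bool:
        if not condition:
            self._start = None              # any interruption resets the window
            return False
        if self._start is None:
            self._start = t
        held = t - self._start
        return held >= self.min_s if self.inclusive else held > self.min_s

# R1 sketch: prone for at least 5 s while motion is not "restless".
r1 = Sustained(5.0)
samples = [(0.0, "prone", "still"), (2.0, "prone", "still"), (4.0, "prone", "restless"),
           (4.5, "prone", "still"), (9.5, "prone", "still")]
fired = [r1.update(pose == "prone" and motion != "restless", t) for t, pose, motion in samples]
print(fired)  # [False, False, False, False, True]
```

The restless frame at 4.0s restarts the window, so the rule fires only once a fresh 5s run completes at 9.5s; this is how instantaneous spikes are ignored.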

CRITICAL alerts (prone, occlusion, respiratory absence, high CO₂) have 60–300s cooldowns to prevent alert storms. WARN alerts (high temperature, loud events) use 60s cooldowns. Every CRITICAL alert is published over MQTT and simultaneously sent as a BLE GATT notification as a fallback.
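Per-alert-type cooldowns can be kept as a map from alert type to the time it last fired. The cooldown values below are taken from the R1–R6 table. The `CooldownGate` name and the use of a monotonic clock are assumptions for illustration.

```python
import time
from typing import Dict, Optional

COOLDOWN_S: Dict[str, float] = {
    "prone_position": 300, "face_occlusion": 300, "resp_absence": 120,
    "co2_high": 60, "temp_high": 60, "loud_event": 60,
}

class CooldownGate:
    """Suppresses repeats of the same alert type inside its cooldown window."""

    def __init__(self, cooldowns: Dict[str, float]):
        self.cooldowns = cooldowns
        self._last: Dict[str, float] = {}

    def allow(self, alert_type: str, now: Optional[float] = None) -> bool:
        # a monotonic clock is immune to wall-clock adjustments (NTP, DST)
        now = time.monotonic() if now is None else now
        last = self._last.get(alert_type)
        if last is not None and now - last < self.cooldowns[alert_type]:
            return False  # still cooling down: drop this repeat
        self._last[alert_type] = now
        return True

gate = CooldownGate(COOLDOWN_S)
print([gate.allow("resp_absence", t) for t in (0, 60, 119, 120)])  # [True, False, False, True]
print(gate.allow("co2_high", 61))  # True: each alert type has its own window
```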

Caregiver suppression: the alert engine uses skeleton size heuristics to detect an adult's presence. While a larger skeleton is visible and classified as in-motion, CRITICAL alerts are held until the adult exits the frame.
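The skeleton-size heuristic can be sketched by comparing the bounding-box area of confidently detected keypoints with a baseline area for the infant. Everything here is an illustrative assumption: the 0.3 confidence cutoff, the 2.0 size ratio, a calibrated `infant_area` baseline, and the `skeleton_area` / `hold_critical` helpers. Only the idea (a larger, in-motion skeleton holds CRITICAL alerts) comes from the text.

```python
from typing import Sequence, Tuple

Keypoint = Tuple[float, float, float]  # (y, x, confidence) in frame pixels

def skeleton_area(kps: Sequence[Keypoint], min_conf: float = 0.3) -> float:
    """Area of the bounding box around keypoints detected above min_conf."""
    pts = [(y, x) for y, x, c in kps if c >= min_conf]
    if len(pts) < 2:
        return 0.0
    ys, xs = zip(*pts)
    return (max(ys) - min(ys)) * (max(xs) - min(xs))

def hold_critical(kps: Sequence[Keypoint], motion: str, infant_area: float,
                  ratio: float = 2.0) -> bool:
    """True if a skeleton much larger than the infant baseline is visible and moving."""
    return motion == "in-motion" and skeleton_area(kps) > ratio * infant_area

infant_area = 150.0 * 100.0                          # hypothetical calibrated baseline (px^2)
adult = [(100.0, 200.0, 0.9), (500.0, 420.0, 0.8)]   # confident keypoints spanning 400 x 220 px
print(hold_critical(adult, "in-motion", infant_area))  # True
print(hold_critical(adult, "still", infant_area))      # False
```

Requiring motion as well as size avoids holding alerts indefinitely if a large, stationary false detection appears in the frame.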

Alert Fusion Rules — Full Specification

| Rule | Trigger condition | Severity | Alert type | Cooldown |
|---|---|---|---|---|
| R1 | prone ≥ 5s AND motion ≠ restless | CRITICAL | prone_position | 300s |
| R2 | face_conf < 0.20 sustained > 3s AND body_conf > 0.15 | CRITICAL | face_occlusion | 300s |
| R3 | no respiratory signal > 15s AND motion = still | CRITICAL | resp_absence | 120s |
| R4 | CO₂ > 1500 ppm | CRITICAL | co2_high | 60s |
| R5 | temperature > 28°C | WARN | temp_high | 60s |
| R6 | dB SPL > 70 sustained for 5s | WARN | loud_event | 60s |
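R6 needs a sound level in dB SPL. Digital samples only give a level relative to full scale (dBFS), so converting to SPL requires a microphone-specific calibration offset. A minimal sketch, where `CAL_OFFSET_DB` is a hypothetical value that would come from measuring the microphone against a reference sound-level meter:

```python
import math
from typing import Sequence

CAL_OFFSET_DB = 120.0  # hypothetical: the dB SPL that corresponds to 0 dBFS for this mic

def dbfs(samples: Sequence[float]) -> float:
    """RMS level of float samples in [-1, 1], relative to full scale."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(max(rms, 1e-12))  # floor avoids log10(0) on digital silence

def db_spl(samples: Sequence[float]) -> float:
    return dbfs(samples) + CAL_OFFSET_DB

# A full-scale sine has RMS 1/sqrt(2), about -3.01 dBFS.
tone = [math.sin(2 * math.pi * 440 * n / 16000) for n in range(16000)]
print(round(dbfs(tone), 2))  # -3.01
```

Feeding `db_spl(window) > 70` into a 5-second sustained check would give the R6 trigger.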

End-to-End Latency

~5.8s average time-to-alert. P95 under 8s.

The 5-second sustained-condition window dominates the budget. Signal transport from camera capture to phone notification adds only ~860ms of infrastructure latency, with camera capture and network delivery (200ms each) the largest single contributors.

ZMQ proxy routing measured only 0.215ms mean for a 218-byte payload in the desktop benchmark, well inside the 20ms budgeted for publish and routing. The 100ms fusion evaluation is a deliberate hold, not a bottleneck: it verifies that the alert condition is sustained before emitting.

Infrastructure latency

860ms

camera→fusion→MQTT→phone, excluding hold window

Alert latency P95

7.2s

documented acceptance-test result

ZMQ routing (P95)

0.295ms

218-byte payload, 300 runs

Fusion hold window

5000ms

required sustained-condition check

Per-Stage Latency Budget

| Stage | Latency |
|---|---|
| Camera capture + frame ready | 200ms |
| ROI upscale + MoveNet INT8 inference | 150ms |
| Pose classification | 10ms |
| ZMQ publish + proxy routing | 20ms |
| Alert engine fusion evaluation | 100ms |
| MQTT broker publish | 80ms |
| Network delivery to phone | 200ms |
| App notification display | 100ms |
| Per-path delivery total | 860ms |
| + Fusion hold (5s sustained event) | 5000ms |
| Time to first alert (avg) | ~5.8s |
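The budget arithmetic can be checked directly. The stage values below are the ones listed in the budget; summing them reproduces the 860ms delivery figure and a 5,860ms total once the 5s hold is added, consistent with the ~5.8s average.

```python
STAGES_MS = {
    "camera capture + frame ready": 200,
    "ROI upscale + MoveNet INT8 inference": 150,
    "pose classification": 10,
    "ZMQ publish + proxy routing": 20,
    "alert engine fusion evaluation": 100,
    "MQTT broker publish": 80,
    "network delivery to phone": 200,
    "app notification display": 100,
}
HOLD_MS = 5000  # sustained-condition window

delivery_ms = sum(STAGES_MS.values())
print(delivery_ms, delivery_ms + HOLD_MS)                       # 860 5860
print(f"hold share: {HOLD_MS / (delivery_ms + HOLD_MS):.0%}")   # hold share: 85%
```

Roughly 85% of time-to-alert is the deliberate hold, so shaving milliseconds off transport would barely move the headline number.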

Performance Evidence

Desktop Benchmarks — Separated from Device Validation

These benchmarks were run on an x86-64 development machine, not on the Raspberry Pi 4. They establish inference feasibility — model speed, ZMQ routing overhead, optical-flow accuracy — rather than production device-level measurements.

| Model / system | Role | Mean | Median | Min | Max | Runs | Note |
|---|---|---|---|---|---|---|---|
| YAMNet INT8 | Cry / audio classification | 4.8ms | 4.36ms | 4.02ms | 6.12ms | 50 | 521-class output, 96kB model, x86-64 development machine |
| MoveNet Lightning INT8 | Pose keypoint detection | 19.74ms | 18.78ms | 18.59ms | 31.37ms | 50 | 17 keypoints × (y, x, confidence), 2.8MB model |
| ZMQ bus (XPUB/XSUB) | Intra-device message routing | 0.215ms | 0.202ms | 0.17ms | 1.239ms | 300 | 218-byte payload, P95: 0.295ms, P99: 0.418ms |
| Optical-flow respiration | Chest-movement respiratory rate | 0.384 bpm (MAE) | | 0.09 bpm error | 1.07 bpm error | 10 | Farneback on synthetic frames (15–60 bpm range). Real-world: 82% within ±4 bpm |
01

YAMNet INT8: 4.8ms Mean Inference

Audio Classification

521-class audio classification at 4.8ms mean on x86-64. The model processes 0.975-second audio windows. On Raspberry Pi 4 (ARM Cortex-A72), inference is expected to be roughly 3–5× slower (about 15–25ms), still well within the audio service's 100ms processing budget at 10 audio frames per second.

02

MoveNet Lightning: 19.74ms Mean on Desktop

Pose Keypoint Detection

2.8MB model, 17 keypoints × (y, x, confidence). On Raspberry Pi 4, with INT8 quantization and a TFLite delegate, the vision service takes ~150ms per frame including ROI preprocessing, which, rather than the model itself, dominates the per-frame cost. CPU budget: ~55% of a single Pi 4 core.

03

ZMQ Bus: 0.215ms Mean, 1.239ms Max

Intra-Device Message Routing

The XPUB/XSUB proxy handles 218-byte JSON payloads with P95 at 0.295ms and P99 at 0.418ms across 300 runs. The 1.239ms max spike is an outlier — likely OS scheduler jitter. At 5fps vision + 10fps audio + 1Hz env, the bus handles ~16 messages per second, far below its saturation point.
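P95 and P99 figures depend on the percentile definition used. One simple choice is nearest-rank, sketched below. Whether the original benchmark used nearest-rank or an interpolating method is not stated, and the sample latencies are made up for illustration.

```python
import math
from typing import Sequence

def nearest_rank(samples: Sequence[float], p: float) -> float:
    """p-th percentile (0 < p <= 100) by the nearest-rank method."""
    ordered = sorted(samples)
    # integer-first arithmetic avoids float error in p / 100 * n (e.g. 0.95 * 100)
    rank = max(1, math.ceil(p * len(ordered) / 100))  # 1-based rank
    return ordered[rank - 1]

latencies_ms = [0.20] * 90 + [0.30] * 8 + [0.45, 1.2]  # 100 made-up samples
print(nearest_rank(latencies_ms, 50), nearest_rank(latencies_ms, 95), nearest_rank(latencies_ms, 99))
# 0.2 0.3 0.45
```

With only a few hundred runs, P99 rests on the top handful of samples, which is why a single scheduler-jitter spike shows up as the max rather than moving P95.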

04

Optical Flow: 0.384 bpm MAE

Chest-Movement Respiration Rate

Farneback optical flow on synthetic frames across the 15–60 bpm range achieves 0.384 bpm mean absolute error. Real-world device testing shows 82% of 30-second windows within ±4 bpm of the reference. The rPPG implementation is labeled experimental: at 1.5m ceiling distance the face occupies only ~20×20 pixels, so the signal is dominated by interpolation artefacts.
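Both accuracy figures here are simple aggregates over paired estimates and references: mean absolute error for the synthetic runs, and the fraction of windows within ±4 bpm for device testing. A sketch with made-up numbers; the `mae` and `fraction_within` helper names are illustrative.

```python
from statistics import fmean
from typing import Sequence

def mae(est: Sequence[float], ref: Sequence[float]) -> float:
    """Mean absolute error between paired estimates and references."""
    if len(est) != len(ref):
        raise ValueError("estimates and references must be paired")
    return fmean(abs(e - r) for e, r in zip(est, ref))

def fraction_within(est: Sequence[float], ref: Sequence[float], tol: float) -> float:
    """Share of windows whose absolute error is at most tol."""
    hits = sum(abs(e - r) <= tol for e, r in zip(est, ref))
    return hits / len(ref)

est = [31.0, 28.5, 40.0, 22.0, 35.0]  # made-up per-window estimates (bpm)
ref = [30.0, 30.0, 34.0, 24.0, 35.5]  # made-up reference rates (bpm)
print(mae(est, ref), fraction_within(est, ref, 4.0))  # 2.2 0.8
```

The two metrics answer different questions: MAE is pulled up by a single 6 bpm miss, while the within-tolerance fraction simply counts it as one failed window.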

Engineering Honesty

8 Known Limitations, Documented in Full

These are real constraints discovered during development and acceptance testing, not theoretical edge cases. Each has a documented mitigation. None are hidden.

Perception · Limitation 1 of 8

IR Occlusion: 8/10 vs 9/10 Target

Issue

Thin IR-transmissive fabrics (e.g. muslin) cannot be distinguished from an uncovered face by keypoint confidence alone. The algorithm improved from 6/10 to 8/10 with the temporal filter, but the 9/10 target remains unmet in IR mode.

Mitigation

At night, the occlusion algorithm switches to a full face-keypoint-dropout rule, and CLAHE preprocessing raises keypoint confidence by ~12%.

Physics · Limitation 2 of 8

rPPG Unreliable at Ceiling Distance

Issue

At 1.5m ceiling distance, the face region is approximately 20×20 pixels, and green-channel variation is dominated by interpolation artefacts rather than the actual perfusion signal. rPPG is implemented as a proof of concept and labeled "experimental": true in all payloads.

Mitigation

rPPG is not used in any fusion rule. Primary respiratory monitoring is optical-flow-based.

Geometry · Limitation 3 of 8

Side Position Detection ~70%

Issue

The shoulder–hip rotation metric is geometrically ambiguous when viewed from directly above: a baby lying at an angle between supine and true side-lying produces similar keypoint patterns. Detection is unreliable for this position.

Mitigation

Prone and supine detection remain robust. Side-lying is logged as a warning, not a CRITICAL trigger.

Hardware · Limitation 4 of 8

SGP30 Occasional Zero Reads

Issue

After 2+ hours of continuous operation, the SGP30 TVOC/eCO2 sensor occasionally produces zero readings. Root cause appears to be I2C timing or baseline drift after extended uptime.

Mitigation

Zero readings are replaced with the last known good value and logged at WARNING level. Environmental alerts remain active during substitution.
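Last-known-good substitution is a small wrapper around the sensor read. A sketch, assuming a hypothetical `read_raw()` callable that returns an (eCO2 ppm, TVOC ppb) tuple; the `LastKnownGood` class and the log message are illustrative. It treats an eCO2 of 0 as spurious because that value falls outside the SGP30's normal eCO2 output range.

```python
import logging
from typing import Callable, Optional, Tuple

log = logging.getLogger("env_service")
Reading = Tuple[int, int]  # (eCO2 ppm, TVOC ppb)

class LastKnownGood:
    """Wraps a raw sensor read and substitutes the previous value on a zero read."""

    def __init__(self, read_raw: Callable[[], Reading]):
        self.read_raw = read_raw
        self.last: Optional[Reading] = None

    def read(self) -> Optional[Reading]:
        value = self.read_raw()
        if value[0] == 0:  # eCO2 of 0 is not a plausible measurement
            log.warning("SGP30 zero read; substituting last known good %s", self.last)
            return self.last  # None if no good reading has arrived yet
        self.last = value
        return value

readings = iter([(415, 12), (0, 0), (430, 15)])
sensor = LastKnownGood(lambda: next(readings))
print([sensor.read() for _ in range(3)])  # [(415, 12), (415, 12), (430, 15)]
```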

Thermal · Limitation 5 of 8

Thermal Throttling After 7+ Hours

Issue

With a passive heatsink, the Raspberry Pi 4 reaches a 72°C peak during extended operation. The OS throttles the CPU above 75°C. A fan is required for indefinite continuous operation.

Mitigation

Thermal-aware mode drops to 3fps and pauses rPPG above 75°C. A low-power mode runs at 2fps when the baby is still.
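The thermal-aware policy reduces to a small selection function. The 5fps default, 3fps with rPPG paused above 75°C, and 2fps when still come from the text. Taking the lower rate when both conditions apply, and the `choose_fps` / `soc_temp_c` names, are assumptions. On Raspberry Pi OS the SoC temperature can be read from `/sys/class/thermal/thermal_zone0/temp`, in millidegrees Celsius.

```python
from pathlib import Path
from typing import Tuple

THERMAL_PATH = Path("/sys/class/thermal/thermal_zone0/temp")

def soc_temp_c(path: Path = THERMAL_PATH) -> float:
    """Read the SoC temperature; the sysfs value is in millidegrees Celsius."""
    return int(path.read_text().strip()) / 1000.0

def choose_fps(temp_c: float, baby_still: bool) -> Tuple[int, bool]:
    """Return (camera fps, rPPG enabled) for the current conditions."""
    fps, rppg = 5, True
    if temp_c > 75.0:
        fps, rppg = 3, False  # thermal-aware mode
    if baby_still:
        fps = min(fps, 2)     # low-power mode; the lower cap wins if both apply
    return fps, rppg

print(choose_fps(68.0, False), choose_fps(77.5, False), choose_fps(77.5, True))
# (5, True) (3, False) (2, False)
```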

Connectivity · Limitation 6 of 8

BLE Reconnection Fragility

Issue

Android drops BLE connections after ~5 minutes of idle time. The keepalive workaround (a 30-second ping) works but is not robust. Proper GATT connection parameters are needed for reliable long-session BLE.

Mitigation

MQTT is the primary alert path. BLE is labeled a fallback. MQTT remains active and unaffected by BLE state.

Scope · Limitation 7 of 8

Single-Crib Only

Issue

The system defines one ROI per camera. Multi-crib or twin monitoring is not supported. The caregiver suppression logic assumes the larger skeleton is always the adult.

Mitigation

Documented scope constraint. Multi-crib support identified as future work requiring a second camera or wider-angle lens.

Coverage · Limitation 8 of 8

No Automated Unit Tests

Issue

Integration testing was performed manually during development due to timeline pressure. The fusion logic is verified via benchmark.py (13/13 tests pass), but individual service unit tests were not written.

Mitigation

Acceptance tests cover system-level correctness. Fusion logic is tested synthetically, and hardware integration is verified manually.

Validation

10 Acceptance Tests. 9 PASS. 1 MARGINAL.

System-level tests against the full Raspberry Pi 4 stack with a real crib setup, infant mannequin, and reference sensors. These are the primary evidence artifacts — not desktop inference timings.

| Test | Requirement | Result | Status |
|---|---|---|---|
| Prone detection (mannequin) | 9/10 scenarios | 9/10 | PASS |
| Face occlusion (daytime) | 9/10 scenarios | 9/10 | PASS |
| Face occlusion (IR night mode) | 9/10 scenarios | 8/10 | MARGINAL |
| Respiratory rate accuracy | ±4 bpm in 80% of windows | 82% within ±4 bpm | PASS |
| Temperature accuracy | ±1°C vs reference | ±0.8°C | PASS |
| Humidity accuracy | ±5% RH vs reference | ±4.2% RH | PASS |
| False CRITICAL alerts | < 3 per 8-hour session | 2.1 avg | PASS |
| Alert latency P95 | < 8s in 95% of tests | 7.2s | PASS |
| Zero video/audio transmitted | Packet capture audit | Zero packets | PASS |
| Continuous operation | 10-hour uptime | 11h 2min | PASS |

Privacy Proof

0 packets

Wireshark packet-capture audit across the full 10-hour soak run. Zero video, audio, or raw sensor packets were transmitted.

Alert Latency P95

7.2s

Measured end-to-end from event onset to phone notification, against a requirement of <8s. The sustained-event hold window accounts for 5s of this budget.

Continuous Operation

11h 2min

Single uninterrupted soak run. Requirement was 10 hours. Thermal throttling observed after 7h; mitigated by adaptive fps reduction above 75°C.
