Agent Task: Generate MonStim Analyzer Demo Data for Portfolio Website

Context

This task is for a portfolio website demo. The goal is to create a standalone Python script that generates a realistic synthetic H-reflex EMG dataset and exports it as a single demo_data.json file. No real patient/animal data is used. The JSON will be consumed by a Plotly.js interactive demo page embedded in a static GitHub Pages site — no server, no Python in the browser.

The script should live at tools/generate_demo_data.py in the MonStim-Analyzer repo.


What You Need to Understand First

Before writing anything, read the following source files to understand existing signal processing APIs you should reuse:

  • monstim_signals/transform/filtering.py — bandpass filter implementation
  • monstim_signals/transform/amplitude.py — amplitude calculation methods (RMS, peak-to-trough, etc.)
  • monstim_signals/transform/plateau.py — M-max plateau detection
  • monstim_signals/domain/recording.py — Recording data model (scan_rate, stim_amplitude, channel_types, raw_view)

Read all four files in full before proceeding. The goal is to reuse these exact functions on synthetic raw waveform arrays rather than re-implementing them. This makes the demo data genuinely representative of what the real app computes.


Script Specification: tools/generate_demo_data.py

Purpose

Generate a synthetic but physiologically realistic single-session EMG H-reflex recruitment dataset and write it to tools/demo_data.json.

Physiological Parameters (do not change these — they reflect real H-reflex biology)

| Parameter | Value | Notes |
| --- | --- | --- |
| Sampling rate | 30,000 Hz | Typical for MonStim acquisitions |
| Recording window | 80 ms total | Pre-stim: 10 ms, post-stim: 70 ms |
| Number of sweeps / recordings | 35 | Stimulus intensities spanning the recruitment curve |
| Stimulus intensities | 0.5 mA → 12.0 mA, log-spaced | Covers sub-threshold through M-max saturation |
| Stimulus delivery time | 10 ms into the window | Index = 300 samples at 30 kHz |
| M-wave onset latency post-stim | ~5–6 ms | Corresponds to ~150–180 samples post-stim |
| M-wave peak latency post-stim | ~8–10 ms | |
| M-wave duration (window) | ~6 ms | Use 5–11 ms post-stim as analysis window |
| H-wave onset latency post-stim | ~25 ms | |
| H-wave peak latency post-stim | ~28–30 ms | |
| H-wave duration (window) | ~8 ms | Use 24–32 ms post-stim as analysis window |
| Background noise floor | ~0.05 mV RMS | Gaussian white noise |
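As a reference, the table above can be transcribed into a module-level constants block. This is only a sketch; the names are illustrative, not existing MonStim identifiers:

```python
# Synthesis constants mirroring the physiological-parameter table.
# All names here are illustrative choices for the generator script.
SCAN_RATE_HZ = 30_000                                        # sampling rate
WINDOW_MS = 80.0                                             # total recording window
NUM_SAMPLES = int(SCAN_RATE_HZ * WINDOW_MS / 1000)           # 2400 samples
STIM_ONSET_MS = 10.0                                         # stimulus delivery time
STIM_ONSET_IDX = int(SCAN_RATE_HZ * STIM_ONSET_MS / 1000)    # sample index 300
NUM_SWEEPS = 35                                              # recordings per session
STIM_MIN_MA, STIM_MAX_MA = 0.5, 12.0                         # log-spaced intensities
M_WINDOW_MS = (5.0, 11.0)                                    # M-wave window, post-stim
H_WINDOW_MS = (24.0, 32.0)                                   # H-wave window, post-stim
NOISE_RMS_MV = 0.05                                          # background noise floor
```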

Waveform Shape

Each recording sweep should be synthesized as follows:

  1. Baseline noise: Gaussian white noise at 0.05 mV RMS across all 2400 samples
  2. Stimulus artifact: A sharp biphasic spike at the stimulus sample (index 300):
    • Duration: 3 samples
    • Amplitude: scales slightly with stimulus intensity (1–3 mV peak), mimicking the amplitude clipping the real acquisition hardware applies
  3. M-wave (compound muscle action potential): A biphasic Gaussian waveform
    • Positive peak then negative trough (ratio ~1.5:1)
    • Peak at stim+9 ms; model as A_m * sin(2π * 200Hz * t) * exp(-t²/(2σ²)) with σ=1.2ms
    • Amplitude A_m follows a sigmoid as a function of stimulus intensity:
      • Threshold: ~2 mA; saturation: ~7 mA; max amplitude: ~1.2 mV peak-to-trough
      • Use: A_m = A_m_max / (1 + exp(-k*(stim - stim_m_threshold))) with k=1.2
  4. H-wave (H-reflex): A smaller biphasic waveform at a longer latency
    • Peak at stim+29 ms; same shape model as M-wave with σ=1.8ms, dominant frequency ~100 Hz
    • Amplitude A_h follows an inverted-U (bell curve) over stimulus intensity:
      • Appears at ~1.5 mA, peaks around 4–5 mA (~0.4 mV), disappears above ~8 mA
      • Use: A_h = A_h_max * exp(-((stim - stim_h_peak)**2) / (2 * sigma_h**2)) with stim_h_peak=4.0, sigma_h=1.5
      • Only include the H-wave when A_h > 0.02 mV; below that threshold it would sit under the noise floor, so omit it entirely
  5. Apply a bandpass filter to the full waveform using the existing monstim_signals.transform.filtering functions (100–3500 Hz Butterworth, order 4, filtfilt). If importing that module from tools/ is inconvenient, implement scipy.signal.butter + filtfilt inline with identical parameters.
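The five synthesis steps above can be sketched end to end as follows. This is a standalone illustration using the inline scipy fallback from step 5; the function name, the artifact shape, and the use of a numpy Generator (rather than the global np.random.seed(42) the spec asks for) are all illustrative choices:

```python
import numpy as np
from scipy.signal import butter, filtfilt

def synthesize_sweep(stim_ma, rng, scan_rate=30_000, n_samples=2400, stim_idx=300):
    """Sketch of one synthetic sweep following synthesis steps 1-5."""
    t = np.arange(n_samples) / scan_rate                      # time in seconds
    emg = rng.normal(0.0, 0.05, n_samples)                    # 1) noise, 0.05 mV RMS

    # 2) stimulus artifact: sharp 3-sample biphasic spike, grows with intensity
    artifact_amp = 1.0 + 2.0 * min(stim_ma / 12.0, 1.0)       # roughly 1-3 mV
    emg[stim_idx:stim_idx + 3] += artifact_amp * np.array([1.0, -1.0, 0.3])

    # 3) M-wave: sigmoid recruitment, Gabor-like shape peaking at stim + 9 ms
    a_m = 1.2 / (1.0 + np.exp(-1.2 * (stim_ma - 2.0)))
    tm = t - (stim_idx / scan_rate + 0.009)
    emg += a_m * np.sin(2 * np.pi * 200 * tm) * np.exp(-tm**2 / (2 * 0.0012**2))

    # 4) H-wave: inverted-U recruitment, broader wave peaking at stim + 29 ms
    a_h = 0.4 * np.exp(-((stim_ma - 4.0) ** 2) / (2 * 1.5**2))
    if a_h > 0.02:                                            # noise-floor gate
        th = t - (stim_idx / scan_rate + 0.029)
        emg += a_h * np.sin(2 * np.pi * 100 * th) * np.exp(-th**2 / (2 * 0.0018**2))

    # 5) zero-phase band-pass: 100-3500 Hz Butterworth, order 4, filtfilt
    b, a = butter(4, [100, 3500], btype="band", fs=scan_rate)
    return filtfilt(b, a, emg)
```

In the real script, prefer importing the monstim_signals.transform.filtering functions so the demo waveforms pass through exactly the code the app uses.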

Amplitude Extraction

After generating the filtered waveforms, compute for each sweep:

  • M-wave amplitude: RMS in the M-wave window (5–11 ms post-stim) — use the existing amplitude.py function
  • H-wave amplitude: RMS in the H-wave window (24–32 ms post-stim)
  • M-wave peak_to_trough: peak minus trough in window (store this too)
  • H-wave peak_to_trough: peak minus trough in window

Do not use the plateau/M-max detection for the demo — just store the raw per-sweep amplitudes.
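A minimal sketch of the window-based extraction, assuming the inline NumPy fallback rather than the amplitude.py helpers (the function name below is illustrative):

```python
import numpy as np

def window_amplitudes(emg_mv, scan_rate, stim_onset_ms, window_ms):
    """RMS and peak-to-trough amplitude within a post-stimulus window.

    window_ms is (start, stop) in milliseconds relative to the stimulus.
    """
    start = int((stim_onset_ms + window_ms[0]) * scan_rate / 1000)
    stop = int((stim_onset_ms + window_ms[1]) * scan_rate / 1000)
    seg = np.asarray(emg_mv[start:stop])
    rms = float(np.sqrt(np.mean(seg ** 2)))      # RMS amplitude in the window
    p2t = float(seg.max() - seg.min())           # peak minus trough
    return rms, p2t
```

In the real script, swap this for the corresponding function in monstim_signals/transform/amplitude.py so the stored values match what the app computes.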


Output Schema: tools/demo_data.json

Produce exactly this JSON structure (no extra nesting, no deviation):

{
  "meta": {
    "scan_rate": 30000,
    "num_samples": 2400,
    "stim_onset_ms": 10.0,
    "m_window_ms": [5.0, 11.0],
    "h_window_ms": [24.0, 32.0],
    "channel_name": "Tibialis Anterior (Synthetic)",
    "generated_at": "<ISO-8601 timestamp>"
  },
  "recordings": [
    {
      "index": 0,
      "stim_ma": 0.5,
      "time_ms": [/* 2400 floats, 2 decimal places, from 0.0 to 79.97 ms */],
      "emg_mv": [/* 2400 floats, filtered waveform, 5 significant figures */],
      "m_wave": {
        "window_ms": [5.0, 11.0],
        "amplitude_rms_mv": 0.0,
        "amplitude_p2t_mv": 0.0,
        "present": false
      },
      "h_wave": {
        "window_ms": [24.0, 32.0],
        "amplitude_rms_mv": 0.0,
        "amplitude_p2t_mv": 0.0,
        "present": false
      }
    }
    /* ... 34 more recording objects ... */
  ],
  "recruitment_curve": {
    "stim_ma": [/* 35 floats */],
    "m_wave_rms_mv": [/* 35 floats */],
    "h_wave_rms_mv": [/* 35 floats */],
    "m_wave_p2t_mv": [/* 35 floats */],
    "h_wave_p2t_mv": [/* 35 floats */]
  }
}

Encoding rules:

  • time_ms: round to 2 decimal places; emg_mv: round to 5 significant figures. This keeps the file size manageable.
  • present flag: true if the wave’s amplitude_rms_mv > 0.02 (i.e., above noise floor)
  • Use Python’s json module; do NOT serialize numpy types directly (call .tolist() on arrays and convert scalars with float(x) before rounding)
  • Target file size: under 1.5 MB. If the file exceeds this, reduce num_samples to 1200 (40 ms window, same relative timing) and recalculate accordingly. Update meta to match whatever you choose.
  • Set a fixed random seed (np.random.seed(42)) for reproducibility
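The rounding and serialization rules above could be handled with a small helper, sketched here. sig_round is an illustrative name, and the compact-separator choice is an assumption made to stay under the file-size budget:

```python
import json
import math

def sig_round(x, sig=5):
    """Round to `sig` significant figures; returns a plain Python float."""
    x = float(x)                      # strip any numpy scalar type first
    if x == 0.0 or not math.isfinite(x):
        return x
    return round(x, sig - 1 - int(math.floor(math.log10(abs(x)))))

# Example: encode a few samples with the rules from this section.
emg_mv = [0.0512345678, -0.00098765, 1.23456789]
payload = {
    "emg_mv": [sig_round(v) for v in emg_mv],          # 5 significant figures
    "time_ms": [round(i / 30.0, 2) for i in range(3)], # 2 decimals; 30 samples/ms
}
print(json.dumps(payload, separators=(",", ":")))      # compact output, no spaces
```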

Execution & Validation

After writing the script:

  1. Run it: python tools/generate_demo_data.py from the repo root (activate the alv_lab conda environment first)
  2. Verify the output file exists at tools/demo_data.json
  3. Run these validation checks and report the results:
    • File size in KB
    • Number of recordings in the JSON
    • Max M-wave RMS amplitude across all recordings
    • Max H-wave RMS amplitude across all recordings
    • Stimulus intensity at which H-wave is maximum
    • Stimulus intensity at which M-wave first exceeds 0.1 mV RMS (M-threshold)
    • Print the first recording’s emg_mv min and max values (sanity check on filter/noise)
  4. If any validation fails (e.g., H-wave never appears, M-wave never saturates, file > 1.5 MB), fix the synthesis parameters and re-run before reporting results.
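The checks in step 3 could be scripted roughly like this. validate_demo_json is an illustrative helper, not part of the repo:

```python
import json
import os

def validate_demo_json(path="tools/demo_data.json"):
    """Load the generated JSON and report the validation checks from step 3."""
    with open(path) as f:
        data = json.load(f)
    rc = data["recruitment_curve"]
    stims, m_rms, h_rms = rc["stim_ma"], rc["m_wave_rms_mv"], rc["h_wave_rms_mv"]
    emg0 = data["recordings"][0]["emg_mv"]
    report = {
        "file_size_kb": os.path.getsize(path) / 1024,
        "num_recordings": len(data["recordings"]),
        "max_m_rms_mv": max(m_rms),
        "max_h_rms_mv": max(h_rms),
        "stim_at_h_max_ma": stims[h_rms.index(max(h_rms))],
        # first intensity whose M-wave RMS exceeds 0.1 mV (M-threshold)
        "m_threshold_ma": next((s for s, m in zip(stims, m_rms) if m > 0.1), None),
        "first_sweep_min_mv": min(emg0),
        "first_sweep_max_mv": max(emg0),
    }
    for key, value in report.items():
        print(f"{key}: {value}")
    return report
```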

Deliverables

Return the following in your response:

  1. The complete source of tools/generate_demo_data.py
  2. The full terminal output from running it (including validation checks)
  3. The complete contents of tools/demo_data.json (paste the entire file — it should be under 1.5 MB)
  4. A brief note on any deviations from this spec (e.g., if you had to adjust physiological parameters to get realistic-looking curves, explain what and why)

The agent consuming this output will use the JSON to build a Plotly.js interactive page and needs the schema to be exactly as specified. Do not invent extra fields or change key names.