Agent Task: Generate MonStim Analyzer Demo Data for Portfolio Website

Context

This task is for a portfolio website demo. The goal is to create a standalone Python script that generates a realistic synthetic H-reflex EMG dataset and exports it as a single demo_data.json file. No real patient/animal data is used. The JSON will be consumed by a Plotly.js interactive demo page embedded in a static GitHub Pages site — no server, no Python in the browser.

The script should live at tools/generate_demo_data.py in the MonStim-Analyzer repo.


What You Need to Understand First

Before writing anything, read the following source files to understand existing signal processing APIs you should reuse:

  • monstim_signals/transform/filtering.py — bandpass filter implementation
  • monstim_signals/transform/amplitude.py — amplitude calculation methods (RMS, peak-to-trough, etc.)
  • monstim_signals/transform/plateau.py — M-max plateau detection
  • monstim_signals/domain/recording.py — Recording data model (scan_rate, stim_amplitude, channel_types, raw_view)

Read all four files in full before proceeding. The goal is to reuse these exact functions on synthetic raw waveform arrays rather than re-implementing them. This makes the demo data genuinely representative of what the real app computes.


Script Specification: tools/generate_demo_data.py

Purpose

Generate a synthetic but physiologically realistic single-session EMG H-reflex recruitment dataset and write it to tools/demo_data.json.

Physiological Parameters (do not change these — they reflect real H-reflex biology)

| Parameter | Value | Notes |
| --- | --- | --- |
| Sampling rate | 30,000 Hz | Typical for MonStim acquisitions |
| Recording window | 80 ms total | Pre-stim: 10 ms, post-stim: 70 ms |
| Number of sweeps / recordings | 35 | Stimulus intensities spanning the recruitment curve |
| Stimulus intensities | 0.5 mA → 12.0 mA, log-spaced | Covers sub-threshold through M-max saturation |
| Stimulus delivery time | 10 ms into the window | Index = 300 samples at 30 kHz |
| M-wave onset latency post-stim | ~5–6 ms | Corresponds to ~150–180 samples post-stim |
| M-wave peak latency post-stim | ~8–10 ms | |
| M-wave duration (window) | ~6 ms | Use 5–11 ms post-stim as analysis window |
| H-wave onset latency post-stim | ~25 ms | |
| H-wave peak latency post-stim | ~28–30 ms | |
| H-wave duration (window) | ~8 ms | Use 24–32 ms post-stim as analysis window |
| Background noise floor | ~0.05 mV RMS | Gaussian white noise |
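As a reference, the table above can be transcribed into a module-level constants block. This is only a sketch; the names are illustrative, not existing MonStim identifiers:

```python
# Synthesis constants mirroring the physiological-parameter table.
# All names here are illustrative choices for the generator script.
SCAN_RATE_HZ = 30_000                                        # sampling rate
WINDOW_MS = 80.0                                             # total recording window
NUM_SAMPLES = int(SCAN_RATE_HZ * WINDOW_MS / 1000)           # 2400 samples
STIM_ONSET_MS = 10.0                                         # stimulus delivery time
STIM_ONSET_IDX = int(SCAN_RATE_HZ * STIM_ONSET_MS / 1000)    # sample index 300
NUM_SWEEPS = 35                                              # recordings per session
STIM_MIN_MA, STIM_MAX_MA = 0.5, 12.0                         # log-spaced intensities
M_WINDOW_MS = (5.0, 11.0)                                    # M-wave window, post-stim
H_WINDOW_MS = (24.0, 32.0)                                   # H-wave window, post-stim
NOISE_RMS_MV = 0.05                                          # background noise floor
```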

Waveform Shape

Each recording sweep should be synthesized as follows:

  1. Baseline noise: Gaussian white noise at 0.05 mV RMS across all 2400 samples
  2. Stimulus artifact: A sharp biphasic spike at the stimulus sample (index 300):
    • Duration: 3 samples
    • Amplitude: scales slightly with stimulus intensity (1–3 mV peak), mimicking the amplitude clipping the real acquisition hardware applies
  3. M-wave (compound muscle action potential): A biphasic Gaussian waveform
    • Positive peak then negative trough (ratio ~1.5:1)
    • Peak at stim+9 ms; model as A_m * sin(2π * 200Hz * t) * exp(-t²/(2σ²)) with σ=1.2ms
    • Amplitude A_m follows a sigmoid as a function of stimulus intensity:
      • Threshold: ~2 mA; saturation: ~7 mA; max amplitude: ~1.2 mV peak-to-trough
      • Use: A_m = A_m_max / (1 + exp(-k*(stim - stim_m_threshold))) with k=1.2
  4. H-wave (H-reflex): A smaller biphasic waveform at a longer latency
    • Peak at stim+29 ms; same shape model as M-wave with σ=1.8ms, dominant frequency ~100 Hz
    • Amplitude A_h follows an inverted-U (bell curve) over stimulus intensity:
      • Appears at ~1.5 mA, peaks around 4–5 mA (~0.4 mV), disappears above ~8 mA
      • Use: A_h = A_h_max * exp(-((stim - stim_h_peak)**2) / (2 * sigma_h**2)) with stim_h_peak=4.0, sigma_h=1.5
      • Only include the H-wave when A_h > 0.02 mV; below that threshold it would sit under the noise floor, so omit it entirely
  5. Apply a bandpass filter to the full waveform using the existing monstim_signals.transform.filtering functions (100–3500 Hz Butterworth, order 4, filtfilt). If importing that module from tools/ is inconvenient, implement scipy.signal.butter + filtfilt inline with identical parameters.
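The five synthesis steps above can be sketched end to end as follows. This is a standalone illustration using the inline scipy fallback from step 5; the function name, the artifact shape, and the use of a numpy Generator (rather than the global np.random.seed(42) the spec asks for) are all illustrative choices:

```python
import numpy as np
from scipy.signal import butter, filtfilt

def synthesize_sweep(stim_ma, rng, scan_rate=30_000, n_samples=2400, stim_idx=300):
    """Sketch of one synthetic sweep following synthesis steps 1-5."""
    t = np.arange(n_samples) / scan_rate                      # time in seconds
    emg = rng.normal(0.0, 0.05, n_samples)                    # 1) noise, 0.05 mV RMS

    # 2) stimulus artifact: sharp 3-sample biphasic spike, grows with intensity
    artifact_amp = 1.0 + 2.0 * min(stim_ma / 12.0, 1.0)       # roughly 1-3 mV
    emg[stim_idx:stim_idx + 3] += artifact_amp * np.array([1.0, -1.0, 0.3])

    # 3) M-wave: sigmoid recruitment, Gabor-like shape peaking at stim + 9 ms
    a_m = 1.2 / (1.0 + np.exp(-1.2 * (stim_ma - 2.0)))
    tm = t - (stim_idx / scan_rate + 0.009)
    emg += a_m * np.sin(2 * np.pi * 200 * tm) * np.exp(-tm**2 / (2 * 0.0012**2))

    # 4) H-wave: inverted-U recruitment, broader wave peaking at stim + 29 ms
    a_h = 0.4 * np.exp(-((stim_ma - 4.0) ** 2) / (2 * 1.5**2))
    if a_h > 0.02:                                            # noise-floor gate
        th = t - (stim_idx / scan_rate + 0.029)
        emg += a_h * np.sin(2 * np.pi * 100 * th) * np.exp(-th**2 / (2 * 0.0018**2))

    # 5) zero-phase band-pass: 100-3500 Hz Butterworth, order 4, filtfilt
    b, a = butter(4, [100, 3500], btype="band", fs=scan_rate)
    return filtfilt(b, a, emg)
```

In the real script, prefer importing the monstim_signals.transform.filtering functions so the demo waveforms pass through exactly the code the app uses.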

Amplitude Extraction

After generating the filtered waveforms, compute for each sweep:

  • M-wave amplitude: RMS in the M-wave window (5–11 ms post-stim) — use the existing amplitude.py function
  • H-wave amplitude: RMS in the H-wave window (24–32 ms post-stim)
  • M-wave peak_to_trough: peak minus trough in window (store this too)
  • H-wave peak_to_trough: peak minus trough in window

Do not use the plateau/M-max detection for the demo — just store the raw per-sweep amplitudes.
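A minimal sketch of the window-based extraction, assuming the inline NumPy fallback rather than the amplitude.py helpers (the function name below is illustrative):

```python
import numpy as np

def window_amplitudes(emg_mv, scan_rate, stim_onset_ms, window_ms):
    """RMS and peak-to-trough amplitude within a post-stimulus window.

    window_ms is (start, stop) in milliseconds relative to the stimulus.
    """
    start = int((stim_onset_ms + window_ms[0]) * scan_rate / 1000)
    stop = int((stim_onset_ms + window_ms[1]) * scan_rate / 1000)
    seg = np.asarray(emg_mv[start:stop])
    rms = float(np.sqrt(np.mean(seg ** 2)))      # RMS amplitude in the window
    p2t = float(seg.max() - seg.min())           # peak minus trough
    return rms, p2t
```

In the real script, swap this for the corresponding function in monstim_signals/transform/amplitude.py so the stored values match what the app computes.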


Output Schema: tools/demo_data.json

Produce exactly this JSON structure (no extra nesting, no deviation):

{
  "meta": {
    "scan_rate": 30000,
    "num_samples": 2400,
    "stim_onset_ms": 10.0,
    "m_window_ms": [5.0, 11.0],
    "h_window_ms": [24.0, 32.0],
    "channel_name": "Tibialis Anterior (Synthetic)",
    "generated_at": "<ISO-8601 timestamp>"
  },
  "recordings": [
    {
      "index": 0,
      "stim_ma": 0.5,
      "time_ms": [/* 2400 floats, 2 decimal places, from 0.0 to 79.97 ms */],
      "emg_mv": [/* 2400 floats, filtered waveform, 5 significant figures */],
      "m_wave": {
        "window_ms": [5.0, 11.0],
        "amplitude_rms_mv": 0.0,
        "amplitude_p2t_mv": 0.0,
        "present": false
      },
      "h_wave": {
        "window_ms": [24.0, 32.0],
        "amplitude_rms_mv": 0.0,
        "amplitude_p2t_mv": 0.0,
        "present": false
      }
    }
    /* ... 34 more recording objects ... */
  ],
  "recruitment_curve": {
    "stim_ma": [/* 35 floats */],
    "m_wave_rms_mv": [/* 35 floats */],
    "h_wave_rms_mv": [/* 35 floats */],
    "m_wave_p2t_mv": [/* 35 floats */],
    "h_wave_p2t_mv": [/* 35 floats */]
  }
}

Encoding rules:

  • time_ms: round to 2 decimal places; emg_mv: round to 5 significant figures. This keeps the file size manageable.
  • present flag: true if the wave’s amplitude_rms_mv > 0.02 (i.e., above noise floor)
  • Use Python’s json module; do NOT serialize numpy types directly (call .tolist() on arrays and convert scalars with float(x) before rounding)
  • Target file size: under 1.5 MB. If the file exceeds this, reduce num_samples to 1200 (40 ms window, same relative timing) and recalculate accordingly. Update meta to match whatever you choose.
  • Set a fixed random seed (np.random.seed(42)) for reproducibility
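The rounding and serialization rules above could be handled with a small helper, sketched here. sig_round is an illustrative name, and the compact-separator choice is an assumption made to stay under the file-size budget:

```python
import json
import math

def sig_round(x, sig=5):
    """Round to `sig` significant figures; returns a plain Python float."""
    x = float(x)                      # strip any numpy scalar type first
    if x == 0.0 or not math.isfinite(x):
        return x
    return round(x, sig - 1 - int(math.floor(math.log10(abs(x)))))

# Example: encode a few samples with the rules from this section.
emg_mv = [0.0512345678, -0.00098765, 1.23456789]
payload = {
    "emg_mv": [sig_round(v) for v in emg_mv],          # 5 significant figures
    "time_ms": [round(i / 30.0, 2) for i in range(3)], # 2 decimals; 30 samples/ms
}
print(json.dumps(payload, separators=(",", ":")))      # compact output, no spaces
```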

Execution & Validation

After writing the script:

  1. Run it: python tools/generate_demo_data.py from the repo root (activate the alv_lab conda environment first)
  2. Verify the output file exists at tools/demo_data.json
  3. Run these validation checks and report the results:
    • File size in KB
    • Number of recordings in the JSON
    • Max M-wave RMS amplitude across all recordings
    • Max H-wave RMS amplitude across all recordings
    • Stimulus intensity at which H-wave is maximum
    • Stimulus intensity at which M-wave first exceeds 0.1 mV RMS (M-threshold)
    • Print the first recording’s emg_mv min and max values (sanity check on filter/noise)
  4. If any validation fails (e.g., H-wave never appears, M-wave never saturates, file > 1.5 MB), fix the synthesis parameters and re-run before reporting results.
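The checks in step 3 could be scripted roughly like this. validate_demo_json is an illustrative helper, not part of the repo:

```python
import json
import os

def validate_demo_json(path="tools/demo_data.json"):
    """Load the generated JSON and report the validation checks from step 3."""
    with open(path) as f:
        data = json.load(f)
    rc = data["recruitment_curve"]
    stims, m_rms, h_rms = rc["stim_ma"], rc["m_wave_rms_mv"], rc["h_wave_rms_mv"]
    emg0 = data["recordings"][0]["emg_mv"]
    report = {
        "file_size_kb": os.path.getsize(path) / 1024,
        "num_recordings": len(data["recordings"]),
        "max_m_rms_mv": max(m_rms),
        "max_h_rms_mv": max(h_rms),
        "stim_at_h_max_ma": stims[h_rms.index(max(h_rms))],
        # first intensity whose M-wave RMS exceeds 0.1 mV (M-threshold)
        "m_threshold_ma": next((s for s, m in zip(stims, m_rms) if m > 0.1), None),
        "first_sweep_min_mv": min(emg0),
        "first_sweep_max_mv": max(emg0),
    }
    for key, value in report.items():
        print(f"{key}: {value}")
    return report
```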

Deliverables

Return the following in your response:

  1. The complete source of tools/generate_demo_data.py
  2. The full terminal output from running it (including validation checks)
  3. The complete contents of tools/demo_data.json (paste the entire file — it should be under 1.5 MB)
  4. A brief note on any deviations from this spec (e.g., if you had to adjust physiological parameters to get realistic-looking curves, explain what and why)

The agent consuming this output will use the JSON to build a Plotly.js interactive page and needs the schema to be exactly as specified. Do not invent extra fields or change key names.