# Synthetic Data Simulation

## Why simulate?

Simulated traces with known ground truth let you:

- **Benchmark deconvolution** — measure spike detection accuracy against known spike trains.
- **Test edge cases** — vary SNR, kernel shape, drift, and saturation to see where algorithms break.
- **Test pipelines** — confirm that your analysis code handles common artifacts before running it on real data.

CaLab's simulation runs the heavy work in Rust for performance and exposes Pydantic configuration models in Python for full control.

## Basic usage

```python
import calab

result = calab.simulate()

print(result.traces.shape)              # (100, 27000) — 100 cells, 15 min at 30 Hz
print(len(result.ground_truth))         # 100 — one CellGroundTruth per cell
print(result.ground_truth[0].spikes)    # (27000,) spike counts at imaging rate
```

`simulate()` accepts an optional `SimulationConfig` and/or keyword overrides:

```python
# Override individual fields on the default config
result = calab.simulate(num_cells=50, seed=123)

# Pass a full config object
config = calab.SimulationConfig(num_cells=20)
result = calab.simulate(config)

# Combine a config with keyword overrides
result = calab.simulate(config, seed=99)
```

## Indicator presets

Each preset returns a `SimulationConfig` with approximate, indicator-appropriate kernel time constants and SNR. These are rough starting points for generating synthetic data, not validated fits to real indicator measurements. All presets accept `**overrides` to customize any field.

Available presets: `gcamp6f`, `gcamp6s`, `gcamp6m`, `jgcamp8f`, `ogb1`, and `clean` (minimal noise, for debugging).

```python
result = calab.simulate(calab.presets.gcamp6f(num_cells=20))
result = calab.simulate(calab.presets.jgcamp8f(num_cells=50))
result = calab.simulate(calab.presets.clean())
```

## Custom configuration

The simulation is configured with Pydantic models. Every field has a sensible default.

```python
from calab import SimulationConfig, KernelConfig, NoiseConfig, MarkovConfig

config = SimulationConfig(
    num_cells=20,
    num_timepoints=9000,       # 9000 samples = 5 min at 30 Hz
    fs_hz=30.0,
    kernel=KernelConfig(tau_rise_s=0.02, tau_decay_s=0.4, tau_decay_cv=0.15),
    spike_model=MarkovConfig(p_silent_to_active=0.01),
    noise=NoiseConfig(snr=5.0),
)

result = calab.simulate(config)
```

### SimulationConfig defaults

| Field            | Default                  | Description                                   |
| ---------------- | ------------------------ | --------------------------------------------- |
| `fs_hz`          | 30.0                     | Sampling rate (Hz)                            |
| `num_timepoints` | 27000                    | Number of timepoints (27000 / 30 Hz = 15 min) |
| `num_cells`      | 100                      | Number of cells                               |
| `kernel`         | `KernelConfig()`         | Double-exponential kernel                     |
| `spike_model`    | `MarkovConfig()`         | Spike generator                               |
| `noise`          | `NoiseConfig()`          | Noise model                                   |
| `drift`          | `RandomWalkDrift()`      | Baseline drift model                          |
| `photobleaching` | `PhotobleachingConfig()` | Photobleaching (disabled by default)          |
| `saturation`     | `SaturationConfig()`     | Indicator saturation (disabled by default)    |
| `alpha_mean`     | 1.0                      | Mean per-cell amplitude scaling factor        |
| `alpha_cv`       | 0.3                      | Per-cell log-normal CV on alpha               |
| `seed`           | 42                       | RNG seed (u32)                                |
| `spike_sim_hz`   | 300.0                    | Internal spike simulation rate (Hz)           |

### Spike models

Two spike generators are available:

- **MarkovConfig** — Two-state model (silent/active) with bursty firing. Default.
- **PoissonConfig** — Poisson process at a fixed rate (`rate_hz`, default 1.0).

```python
from calab import SimulationConfig, PoissonConfig

config = SimulationConfig(spike_model=PoissonConfig(rate_hz=2.0))
result = calab.simulate(config)
```

### Kernel

`KernelConfig` defines the double-exponential calcium response (rise and decay time constants).

| Field          | Default | Description                         |
| -------------- | ------- | ----------------------------------- |
| `tau_rise_s`   | 0.1     | Rise time constant (seconds)        |
| `tau_decay_s`  | 0.6     | Decay time constant (seconds)       |
| `tau_rise_cv`  | 0.0     | Per-cell log-normal CV on tau_rise  |
| `tau_decay_cv` | 0.0     | Per-cell log-normal CV on tau_decay |

### Noise and artifacts

```python
from calab import SimulationConfig, NoiseConfig, PhotobleachingConfig, SaturationConfig

config = SimulationConfig(
    noise=NoiseConfig(snr=3.0, shot_noise_enabled=True),
    photobleaching=PhotobleachingConfig(enabled=True, decay_time_constant_s=300.0),
    saturation=SaturationConfig(enabled=True, k_d=5.0),
)
```

**NoiseConfig** defaults: `snr=8.0`, `shot_noise_enabled=False`, `shot_noise_fraction=0.3`, `snr_spread=0.0`.

**PhotobleachingConfig** (disabled by default): `decay_time_constant_s=600.0`, `amplitude_fraction=0.15`, `amplitude_cv=0.0`.

**SaturationConfig** (disabled by default): `hill_coefficient=1.0`, `k_d=5.0`, `k_d_cv=0.0`.

### Drift models

```python
from calab import SimulationConfig, SinusoidalDrift, RandomWalkDrift

# Deterministic sinusoidal drift
config = SimulationConfig(drift=SinusoidalDrift(amplitude_fraction=0.1, cycles_min=2.0))

# Stochastic mean-reverting random walk (default)
config = SimulationConfig(drift=RandomWalkDrift(step_std_fraction=0.01))
```

**RandomWalkDrift** (default): `step_std_fraction=0.002`, `mean_reversion=0.001`, `step_std_cv=0.0`.

**SinusoidalDrift**: `amplitude_fraction=0.1`, `cycles_min=2.0`, `cycles_max=4.0`, `amplitude_cv=0.0`.

## Ground truth

Each cell's ground truth is a `CellGroundTruth` object with these fields:

```python
gt = result.ground_truth[0]

gt.spikes          # (num_timepoints,) spike counts at imaging rate
gt.clean_calcium   # (num_timepoints,) kernel * spikes, no noise
gt.alpha           # amplitude scaling factor for this cell
gt.snr             # actual SNR for this cell
gt.tau_rise_s      # actual rise time constant (seconds; varies if tau_rise_cv > 0)
gt.tau_decay_s     # actual decay time constant (seconds; varies if tau_decay_cv > 0)
```