cometspec.mcmc

MCMC fitting kernel for the multi-isotopologue fluorescence model.

cometspec.mcmc.mcmc_fitting(data=None, window=None, *, pumping=None, isotopologues='12C14N', systems=None, linelists=None, include_rotations=True, include_deltaJ0_parity_mix=True, require_X_only_for_rot=True, priors=None, nwalkers=50, nsteps=1000, n_cores=None, lsf=None, lsf_method=None, make_plots=False, progress=True, A_min=10000.0, a=3, velocity_kms=0.0, delta_lambda_A=0.0, init_logQ=None, init_logQ_by_iso=None, init_T=300.0, init_T_by_iso=None, init_v_kms=0.0, init_v_kms_by_iso=None, init_dlam=0.0, init_dlam_by_iso=None, init_logN=None, init_logN_by_iso=None, init_sigma=None, init_sigma1=None, init_sigma2=None, init_sigma_G=None, init_fwhm_L=None, init_ratio=None, fig_file=None, wave_col='WAVE', flux_col='FLUX_STACK', error_col='ERR_STACK', continuum_col='CONTINUUM', omega=None, verbose=True, pruning=True, N_Model=20000, config=None)[source]

Run MCMC fitting for the fluorescence model in a wavelength window.

This routine builds the model for one or more isotopologues, applies an optional line-spread function (LSF), and samples posterior distributions for the parameters provided in priors.

This is the key function for fitting. It can be used standalone, but is used by cometspec.fluorescence.FluorescenceModel as its default fitting method.

Defaults and selector behavior

  • isotopologues defaults to "12C14N".

  • systems defaults to None, which maps to ["BX00", "AX_dv1"]. Only used for CN; ignored for other species.

  • Accepted string selectors for systems include:
    • "both"/"bx+ax"/"bxax" -> ["BX00", "AX_dv1", "AX_dv2", "AX_dv3"]

    • "all" -> ["ALL"] Extremely high computation cost; not recommended.

    • "bx", "b-x", "bx(0,0)", "bx00", "bx_00", "b_x_00" -> ["BX00"]

    • "ax"/"a-x" -> ["AX_dv1", "AX_dv2"]

    • "ax(dv=1)"/"ax_dv1" -> ["AX_dv1"]

    • "ax(dv=2)"/"ax_dv2" -> ["AX_dv2"]

    • "ax(dv=3)"/"ax_dv3" -> ["AX_dv3"]

  • emcee nwalkers=50 and nsteps=1000 by default.

  • Collisions are gated by logQ: if neither a per-iso logQ_{iso} nor a shared logQ prior is given it will fall back to init_logQ, and if init_logQ or init_logQ_by_iso are also not given for an isotopologue, that isotopologue is treated as collisionless. Other collision controls default to include_deltaJ0_parity_mix=True (to allow \(\Delta J = 0\)) and require_X_only_for_rot=True (to enable just collisions to the lower electronic state found).

  • It is recommended to set explicitly include_rotations=False if no collisions is the desired behavior.

  • Pumping-shift controls default to velocity_kms=0.0 and delta_lambda_A=0.0. These values are used when computing \(J_\nu\) for the radiative rates and they are different from v_kms and dlam parameters of a model which are shifts of the output spectrum lines.

  • Fallback parameter values starts with init_ check the default behavior here if this function is used directly, check them in cometspec.fluorescence.FluorescenceModel if this function is used via the model’s default fitting method.

  • If parameters are not present in the priors, they will fall to the init_ default values (if provided) or to the hardcoded defaults in the model_flux function (e.g. T=300K, v_kms=0, etc). This means that if you want to fit for a parameter but don’t provide a prior for it, it will not be fitted and instead will use the fallback value. This allows you to control which parameters are fitted and which are fixed without having to change the code of this function.

Note

Priors consist on a dict mapping parameter name to a (min, max) tuple defining the uniform prior range for that parameter. The possible keys are: "logN", "logQ", "T", "dlam", "v_kms", ‘logN_<isotopologue>’, ‘logQ_<isotopologue>’, ‘T_<isotopologue>’, ‘dlam_<isotopologue>’, ‘v_kms_<isotopologue>’ (any iso not found falls back to <parameter>) and lsf priors, sigma, sigma1, sigma2, sigma_G, fwhm_L, ratio.

Parameters:
  • data (Any) – Observed spectrum table or DataFrame which is going to be fitted.

  • window (tuple[float, float]) – Wavelength fitting window (min_A, max_A).

  • pumping (Any) – Pumping spectrum with WAVE and FLUX columns.

  • isotopologues (str or Sequence[str], optional, default "12C14N") – One or more isotopologue labels.

  • systems (str or Sequence[str], optional, default None) – CN system selector(s). If None, uses ["BX00", "AX_dv1"].

  • linelists (pandas.DataFrame or dict[str, pd.DataFrame] or Sequence[pd.DataFrame], optional, default None) – Optional normalized line-list DataFrame or isotopologue mapping.

  • priors (dict[str, tuple[float, float]]) – Parameter prior ranges. Required.

  • nwalkers (int, optional, default 50) – Number of walkers.

  • nsteps (int, optional, default 1000) – Number of MCMC steps.

  • n_cores (int, optional, default None) – Number of CPU cores used to parallelize the per-step walker likelihood evaluations through the multiprocess library (a drop-in replacement for the standard multiprocessing that uses dill for serialization). None or 1 keeps the sampler single-threaded (the prior default behavior). Values >1 spawn a spawn-based pool that emcee uses to evaluate walkers in parallel; the value is clamped to os.cpu_count(). The spawn start method is used on every platform so the pool is safe on macOS (Apple Accelerate’s GCD-backed BLAS deadlocks or segfaults when a multi-threaded process forks) as well as on Linux and Windows. Using multiprocess (with dill) instead of stock multiprocessing means user-provided lsf callables defined as lambdas, closures, or top-level functions in a Jupyter notebook cell are serialized correctly — stock pickle would silently fail on those and hang the sampler. Each worker has its BLAS thread pool pinned to 1 via threadpoolctl to avoid thread oversubscription. The tqdm progress bar (when progress=True) keeps working under both modes.

    Important

    When calling this function with n_cores > 1 from a .py script (not a Jupyter notebook), the call must be guarded by if __name__ == "__main__":. Each spawn worker re-imports the user’s script, and without the guard the call would re-execute recursively; Python detects this and aborts with a RuntimeError. No guard is needed in notebooks or when n_cores is None/1.

    Note

    Whether n_cores>1 is faster than n_cores=1 depends on how expensive each likelihood evaluation is, which scales with the line-list size — driven by A_min, the number of isotopologues and systems, and the window width. Multiprocessing adds a fixed IPC overhead per emcee step (pickle + pipe roundtrip per walker half). When each lnprob call is slow (large line list — e.g. A_min=1e4 for CN with B-X + A-X, or multi-isotopologue runs) the per-step cost dwarfs IPC and n_cores=4 typically gives a ~3-4x speedup. When each call is cheap (small line list — e.g. A_min ≥ 1e6 with a single isotopologue, or a narrow window with few transitions) IPC overhead can dominate and n_cores=1 may end up equal or faster. If unsure, time one short run at each setting on your problem before committing to a long MCMC.

  • lsf (Callable[[numpy.ndarray], numpy.ndarray], optional, default None) – Optional custom LSF callable.

  • lsf_method (str, optional, default None) – Built-in LSF method name. It must be one of "2Gauss", "Gauss_Lorentz", "Gauss", or "Lorentz". Required if lsf is not provided.

  • include_rotations (bool, optional, default True) – Enable rotational collisions in the rate matrix.

  • omega (float, optional, default None) – Optional aperture solid angle in sr.

  • config (MCMCFitConfig, optional, default None) – Optional grouped configuration. Any field set on this object overrides the corresponding individual keyword argument. Fields left at cometspec.config.UNSET (the default) are ignored, so callers that pass individual kwargs not in the config are unaffected. See cometspec.config.MCMCFitConfig.

Other Parameters:
  • include_deltaJ0_parity_mix (bool, optional, default True) – Allow parity-changing Delta J = 0 collisions.

  • require_X_only_for_rot (bool, optional, default True) – Restrict collisions to the lower electronic state (auto-detected as the lower_es label with the smallest minimum E_lower_cm1; works for any spectroscopic notation).

  • make_plots (bool, optional, default False) – Generate diagnostic plots (fit, corner and traces).

  • progress (bool, optional, default True) – Show emcee progress output.

  • A_min (float, optional, default 1e4) – Minimum Einstein A threshold for line lists. User provided line lists are not filtered by A_ul

  • a (float, optional, default 3) – Stretch-move parameter for emcee.

  • velocity_kms (float, optional, default 0.0) – Velocity shift used when evaluating pumping J_nu.

  • delta_lambda_A (float, optional, default 0.0) – Additive wavelength shift used when evaluating pumping J_nu.

  • init_logQ (float, optional, default None) – Fallback logQ value used by every isotopologue when no logQ prior is sampled and no per-iso entry is given. None disables collisions for any isotopologue not covered by init_logQ_by_iso.

  • init_logQ_by_iso (dict[str, float or None], optional, default None) – Per-isotopologue fallback logQ map. Each value may be None to force that isotopologue to be collisionless. Isotopologues not present in the map fall back to init_logQ.

  • init_T (float, optional, default 300.0) – Global fallback temperature when T is not sampled and no per-iso entry is given.

  • init_T_by_iso (dict[str, float], optional, default None) – Per-isotopologue fallback temperature map. For each isotopologue, init_T_by_iso[iso] is used when neither T_{iso} nor T appears in the sampled parameters. Isotopologues not in this map fall back to init_T.

  • init_v_kms (float, optional, default 0.0) – Global fallback emission velocity shift when v_kms is not sampled and no per-iso entry is given.

  • init_v_kms_by_iso (dict[str, float], optional, default None) – Per-isotopologue fallback emission velocity shift. Falls back to init_v_kms for isotopologues not present in the map.

  • init_dlam (float, optional, default 0.0) – Global fallback emission wavelength shift when dlam is not sampled and no per-iso entry is given.

  • init_dlam_by_iso (dict[str, float], optional, default None) – Per-isotopologue fallback additive wavelength shift. Falls back to init_dlam for isotopologues not present in the map.

  • init_logN (float, optional, default None) – Global fallback logN when neither logN_{iso} nor logN appears in the sampled parameters and no per-iso entry is given.

  • init_logN_by_iso (dict[str, float], optional, default None) – Per-isotopologue fallback logN map. For each isotopologue, init_logN_by_iso[iso] is used when neither logN_{iso} nor logN are sampled. Falls back to init_logN for isotopologues not present in the map.

  • init_sigma (float, optional, default None) – Fallback Gaussian sigma when "sigma" is not sampled.

  • init_sigma1 (float, optional, default None) – Fallback sigma1 for "2Gauss" when not sampled.

  • init_sigma2 (float, optional, default None) – Fallback sigma2 for "2Gauss" when not sampled.

  • init_sigma_G (float, optional, default None) – Fallback sigma_G for "Gauss_Lorentz" when not sampled.

  • init_fwhm_L (float, optional, default None) – Fallback Lorentzian FWHM when not sampled.

  • init_ratio (float, optional, default None) – Fallback mixture ratio when not sampled.

  • fig_file (str, optional, default None) – Base path for output figures.

  • wave_col (str, optional, default "WAVE") – Wavelength column in data.

  • flux_col (str, optional, default "FLUX_STACK") – Flux column in data.

  • error_col (str, optional, default "ERR_STACK") – Uncertainty column in data.

  • continuum_col (str, optional, default "CONTINUUM") – Continuum column in data.

  • verbose (bool, optional, default True) – Print diagnostics.

  • pruning (bool, optional, default True) – Apply posterior pruning.

  • N_Model (int, optional, default 20000) – Number of elements in the model grid.

returns:

dict[str, Any] – Dictionary with posterior summaries, samples, and model envelopes. Keys:

  • param_keys (list of str) – Ordered list of sampled parameter names, matching the column order of samples_pruned.

  • median_params (dict [str, float]) – Posterior median for each parameter.

  • up_errors_params (dict [str, float]) – Upper 1-sigma error (p84 - p50) for each parameter.

  • low_errors_params (dict [str, float]) – Lower 1-sigma error (p50 - p16) for each parameter.

  • samples_pruned (numpy.ndarray of float, shape (N_samples, ndim)) – Pruned and burn-in-removed posterior samples.

  • lnprob_pruned (numpy.ndarray of float, shape (N_samples,)) – Log-probability for each pruned sample.

  • model_wave (numpy.ndarray of float, shape (N_Model,)) – Wavelength grid over the fitting window.

  • median_model (numpy.ndarray of float, shape (N_Model,)) – Model flux evaluated at the posterior median parameters.

  • model_p16 (numpy.ndarray of float, shape (N_Model,)) – 16th percentile of the model ensemble (lower 1-sigma envelope).

  • model_p84 (numpy.ndarray of float, shape (N_Model,)) – 84th percentile of the model ensemble (upper 1-sigma envelope).

  • best_model (numpy.ndarray of float, shape (N_Model,)) – Model flux evaluated at the highest-likelihood sample.

raises ValueError:

If priors or required parameters are inconsistent.

raises ValueError:

If required columns are missing from the data.

Notes

If extreme prior values cause the rate-matrix solver to fail (LinAlgError from a numerically degenerate matrix, typically due to NaN/Inf collision rates at very high Q), solve_with_normalization_fast returns NaN populations. lnlike then detects a non-finite model and returns -inf, so the walker step is rejected gracefully rather than crashing.

cometspec.mcmc.H_CGS

Planck constant, in erg*s.