`cometspec.linelist`

Line-list parsing and normalization routines.

Routines

normalize_cn_systems_arg() – Normalize user-friendly CN system selectors to canonical tokens.
from_user_linelist() – Convert a user line list into the normalized transition schema.
make_sym() – Build a symmetry label.
from_cn_brooke() – Convert a Brooke CN line list (e.g. from load_cn_linelist()) to the normalized schema.
filter_cn_systems() – Filter a Brooke CN line list by system, wavelength, and A (Einstein coefficient) threshold.
load_default_transitions() – Load and normalize packaged transitions per isotopologue.
resolve_linelists_with_defaults() – Resolve user-supplied linelists, filling in defaults for any missing isotopologues.
default_linelist_source() – Return the file path that would be loaded for a given isotopologue from packaged defaults.
linelist_origins() – Return a mapping of isotopologue to source description for a set of linelists.
attach_pumping_and_labels() – Attach pumping information and human-friendly labels to a transition table, so that the solar pumping flux is correctly associated with each transition.

cometspec.linelist.normalize_cn_systems_arg(systems)

Translate user-friendly CN band-system selectors into canonical tokens.

This is the input-parser for any function that needs to know which CN band system(s) to operate on. It accepts a variety of human-friendly spellings (case-insensitive, with or without dashes/parentheses) and maps each one to a fixed set of internal tokens used downstream. A sequence of selectors is also accepted; results are flattened and deduplicated while preserving order.

The canonical (output) tokens are:

"BX00" – \(B^{2}\Sigma^{+} \to X^{2}\Sigma^{+}\) violet system, \((v', v'') = (0, 0)\) band (~388 nm).
"AX_dv0" – \(A^{2}\Pi \to X^{2}\Sigma^{+}\) red system, \(\Delta v = |v' - v''| = 0\) sequence.
"AX_dv1" – \(A^{2}\Pi \to X^{2}\Sigma^{+}\) red system, \(\Delta v = |v' - v''| = 1\) sequence.
"AX_dv2" – A–X red system, \(\Delta v = 2\) sequence.
"AX_dv3" – A–X red system, \(\Delta v = 3\) sequence.
"XX" – All X–X transitions.
"ALL" – All transitions in the line list. Using this token leads to extremely long computation times.

Recognized input forms (all matched case-insensitively after stripping):

None – default selection, returns ["BX00", "AX_dv1"].
"both", "bx+ax", "bxax" – the violet \((0,0)\) band plus the \(\Delta v = 1, 2, 3\) red sequences, i.e. ["BX00", "AX_dv1", "AX_dv2", "AX_dv3"].
"all" – returns ["ALL"].
"bx", "b-x", "bx(0,0)", "bx00", "bx_00", "b_x_00" – the violet \((0,0)\) band.
"ax", "a-x" – the \(\Delta v = 1\) and \(\Delta v = 2\) red sequences.
"ax(dv=0)", "ax_dv0" – A–X \(\Delta v = 0\) only.
"ax(dv=1)", "ax_dv1" – A–X \(\Delta v = 1\) only.
"ax(dv=2)", "ax_dv2" – A–X \(\Delta v = 2\) only.
"ax(dv=3)", "ax_dv3" – A–X \(\Delta v = 3\) only.
"xx" – all X–X transitions.
Any other string – passed through unchanged as a single-element list, letting the caller handle (or reject) unknown tokens.
A sequence (list, tuple, …) of any of the above – each element is normalized recursively, results are concatenated, and duplicates are removed while preserving first-occurrence order.

Parameters: systems (str or sequence of str, optional) – Band-system selector(s). See the list of recognized forms above.
Returns: list of str – Canonical tokens in the order of the input, with duplicates removed.

Examples

normalize_cn_systems_arg(None)
['BX00', 'AX_dv1']
normalize_cn_systems_arg("both")
['BX00', 'AX_dv1', 'AX_dv2', 'AX_dv3']
normalize_cn_systems_arg("BX")
['BX00']
normalize_cn_systems_arg(["bx", "ax_dv1", "bx"])  # dedup, order preserved
['BX00', 'AX_dv1']
normalize_cn_systems_arg("unknown")
['unknown']
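
The matching rules above can be pictured as a lookup table plus a recursive flatten-and-deduplicate step. The following is a minimal sketch written only from the documented behavior; the name normalize_sketch and the _ALIASES table are hypothetical and are not the library's implementation.

```python
# Alias table reconstructed from the documented input forms (an assumption,
# not the library's actual data structure).
_RED_AND_VIOLET = ["BX00", "AX_dv1", "AX_dv2", "AX_dv3"]
_ALIASES = {
    "both": _RED_AND_VIOLET,
    "bx+ax": _RED_AND_VIOLET,
    "bxax": _RED_AND_VIOLET,
    "all": ["ALL"],
    "ax": ["AX_dv1", "AX_dv2"],
    "a-x": ["AX_dv1", "AX_dv2"],
    "xx": ["XX"],
}
for _name in ("bx", "b-x", "bx(0,0)", "bx00", "bx_00", "b_x_00"):
    _ALIASES[_name] = ["BX00"]
for _n in range(4):
    _ALIASES[f"ax(dv={_n})"] = [f"AX_dv{_n}"]
    _ALIASES[f"ax_dv{_n}"] = [f"AX_dv{_n}"]


def normalize_sketch(systems=None):
    """Hypothetical re-implementation of the documented normalization rules."""
    if systems is None:
        return ["BX00", "AX_dv1"]  # documented default selection
    if isinstance(systems, str):
        key = systems.strip().lower()
        # Unknown selectors pass through unchanged as a one-element list.
        return list(_ALIASES.get(key, [systems]))
    tokens = []
    for item in systems:  # sequence: normalize each element recursively
        for token in normalize_sketch(item):
            if token not in tokens:  # keep first occurrence only
                tokens.append(token)
    return tokens
```

A plain list is used for deduplication rather than a set so that first-occurrence order is preserved, which the token list is documented to guarantee.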

cometspec.linelist.from_user_linelist(df, *, lam_col, A_col, upper_id_col, lower_id_col, g_upper_col, g_lower_col, lower_es_col=None, lower_v_col=None, lower_J_col=None, lower_sym_col=None, E_lower_cm1_col=None)

Convert a user line list into the normalized transition schema.

Parameters:

df (pandas.DataFrame) – Input line list table.
lam_col (str) – Wavelength column in vacuum \(\AA\).
A_col (str) – Einstein \(A\) coefficient column in \(\mathrm{s}^{-1}\).
upper_id_col (str) – Upper-state identifier column.
lower_id_col (str) – Lower-state identifier column.
g_upper_col (str) – Upper-state degeneracy column.
g_lower_col (str) – Lower-state degeneracy column.
lower_es_col (str, optional, default None) – Optional lower electronic-state column.
lower_v_col (str, optional, default None) – Optional lower vibrational-level column.
lower_J_col (str, optional, default None) – Optional lower rotational-level column.
lower_sym_col (str, optional, default None) – Name of an optional column holding a composite lower-state spin-orbit/parity label. For Brooke-style line lists this is typically the concatenation of the lower-state \(F''\), \(p''\), and \(eS''\) columns, which together identify the fine-structure/parity sublevel within its electronic state.
E_lower_cm1_col (str, optional, default None) – Optional lower-state energy column in \(\mathrm{cm}^{-1}\). These values are used to compute the energy difference \(\Delta E\) between a pair of levels when treating collisions.

Returns:

pandas.DataFrame – Normalized transition table. The output contains E_cm1 and, optionally, E_lower_cm1, which are different quantities: E_cm1 is the energy of the transition itself (derived from the line wavelength), whereas E_lower_cm1 is the energy of the lower state relative to the ground state.

Raises:

ValueError – If required columns are missing or values are invalid.
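
To illustrate the difference between the two energy columns, here is a small sketch in plain Python. It assumes only the standard conversion between a vacuum wavelength in Å and a wavenumber in cm^-1; the helper name transition_energy_cm1 and all numerical values are illustrative and are not taken from the package or from any line list.

```python
ANGSTROM_PER_CM = 1.0e8  # 1 cm = 1e8 Angstrom


def transition_energy_cm1(lambda_vac_A):
    """Wavenumber (cm^-1) of a line with vacuum wavelength in Angstrom."""
    if lambda_vac_A <= 0:
        raise ValueError("wavelength must be positive")
    return ANGSTROM_PER_CM / lambda_vac_A


# E_cm1: energy carried by the transition itself.
E_cm1 = transition_energy_cm1(4000.0)

# E_lower_cm1: energy of the lower level above the ground state
# (hypothetical value for illustration).
E_lower_cm1 = 100.0

# The upper level lies one transition energy above the lower level.
E_upper_cm1 = E_lower_cm1 + E_cm1

# Delta E between two lower levels, as needed for collisions
# (hypothetical level energies).
delta_E_cm1 = abs(3.5 - 0.0)
```

The point of the sketch is that E_cm1 is a property of the line, while E_lower_cm1 is a property of a level; only the latter can be differenced between two levels to obtain \(\Delta E\).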

cometspec.linelist.make_sym(F, p, use_omega=False, es=None)

Build a compact CN-style symmetry label.

Parameters:

F (Any) – Spin component or branch label.
p (Any) – Parity label.
use_omega (bool, optional, default False) – Whether to emit Omega-style labels for A states.
es (str, optional, default None) – Electronic-state label.

Returns:

str – Compact symmetry token.

cometspec.linelist.from_cn_brooke(df, *, lam_col='lambda_vac_A_from_Cal', A_col='A', use_omega_labels=False, E_lower_col="E''")

Convert a Brooke CN line list (e.g. the output of load_cn_linelist()) to the normalized schema.

Parameters:

df (pandas.DataFrame) – Brooke-format CN line list.
lam_col (str, optional, default "lambda_vac_A_from_Cal") – Wavelength column in vacuum Angstrom.
A_col (str, optional, default "A") – Einstein A coefficient column.
use_omega_labels (bool, optional, default False) – Use Omega labels for A-state symmetry tags.
E_lower_col (str, optional, default "E''") – Lower-state energy column in cm^-1.

Returns:

pandas.DataFrame – Normalized transition table, with one rovibronic transition per row and the following columns:

lambda_vac_A (float) – Vacuum wavelength in Å.
A_ul (float) – Einstein \(A\) coefficient (spontaneous emission rate), in s^-1.
upper_id (str) – String key identifying the upper level, formatted as ES|v=V|J=J|sym=S.
lower_id (str) – String key identifying the lower level, formatted as ES|v=V|J=J|sym=S.
g_upper (float) – Upper-level degeneracy.
g_lower (float) – Lower-level degeneracy.
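E_cm1 (float) – Transition energy in cm^-1, computed as \(1/\lambda_{\mathrm{vac}}\) with \(\lambda_{\mathrm{vac}}\) in cm, i.e. \(10^{8}/\lambda_{\mathrm{vac}}\) for \(\lambda_{\mathrm{vac}}\) in Å.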
lower_es (str) – Lower electronic state label (e.g. X).
lower_v (float) – Lower vibrational quantum number \(v''\).
lower_J (float) – Lower rotational quantum number \(J''\).
lower_sym (str) – Lower-level symmetry tag (e-f parity and \(\Omega\) component).
E_lower_cm1 (float) – Lower-state energy in cm^-1, taken directly from E_lower_col. These values are used to compute the energy difference \(\Delta E\) between a pair of levels when treating collisions.

Raises:

ValueError – If required columns are missing or contain invalid values.
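
The level keys upper_id and lower_id follow the pattern ES|v=V|J=J|sym=S. As a rough sketch of how such keys can be built and taken apart, consider the helpers below. Both function names are hypothetical, the symmetry token "F1e" is a made-up placeholder, and the exact number formatting used by the package (for example 0 versus 0.0) is not specified here and is an assumption.

```python
def make_level_id(es, v, J, sym):
    """Build a key of the documented form ES|v=V|J=J|sym=S (hypothetical helper)."""
    return f"{es}|v={v}|J={J}|sym={sym}"


def parse_level_id(level_id):
    """Split a key back into its fields; all values are returned as strings."""
    es, *fields = level_id.split("|")
    parsed = {"es": es}
    for field in fields:
        # partition splits on the first "=" only, so a value containing "="
        # is kept intact.
        name, _, value = field.partition("=")
        parsed[name] = value
    return parsed


key = make_level_id("X", 0, 1.5, "F1e")  # "F1e" is a placeholder token
fields = parse_level_id(key)
```

Because the key is a plain string, two transitions that share a level can be matched by simple string equality on upper_id or lower_id. The sketch assumes that no field contains the "|" separator.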

cometspec.linelist.filter_cn_systems(df_all, *, systems=None, lambda_min_A=2990.001, lambda_max_A=10009.998, A_min=10000.0, lam_col='lambda_vac_A_from_Cal')

Filter a Brooke CN line list by system, wavelength, and A (Einstein coefficient) threshold.

Parameters:

df_all (pandas.DataFrame) – Full Brooke/Sneden CN table.
systems (str or Sequence[str], optional, default None) – System selector(s) accepted by normalize_cn_systems_arg().
lambda_min_A (float, optional, default 2990.001) – Minimum wavelength in Angstrom.
lambda_max_A (float, optional, default 10009.998) – Maximum wavelength in Angstrom.
A_min (float, optional, default 1e4) – Minimum Einstein A threshold, or None to disable.
lam_col (str, optional, default "lambda_vac_A_from_Cal") – Wavelength column name.

Returns:

pandas.DataFrame – Filtered CN line list.
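
The wavelength and \(A\) cuts can be sketched on plain dictionaries as follows. This is an illustration only: the function name filter_lines_sketch is hypothetical, the band-system selection is omitted, and treating the wavelength bounds as inclusive and keeping lines with \(A \ge\) A_min are assumptions about edge cases that are not specified above.

```python
def filter_lines_sketch(rows, *, lambda_min_A=2990.001, lambda_max_A=10009.998,
                        A_min=1.0e4, lam_col="lambda_vac_A_from_Cal", A_col="A"):
    """Keep rows inside the wavelength window and above the A threshold."""
    kept = []
    for row in rows:
        lam = row[lam_col]
        # Assumption: both wavelength bounds are inclusive.
        if not (lambda_min_A <= lam <= lambda_max_A):
            continue
        # A_min=None disables the Einstein-A cut, as documented.
        if A_min is not None and row[A_col] < A_min:
            continue
        kept.append(row)
    return kept


# Illustrative rows; the numbers are invented, not real CN data.
rows = [
    {"lambda_vac_A_from_Cal": 3880.0, "A": 1.5e7},  # passes both cuts
    {"lambda_vac_A_from_Cal": 2500.0, "A": 1.0e7},  # outside the window
    {"lambda_vac_A_from_Cal": 9000.0, "A": 5.0e2},  # below the default A_min
]
strong = filter_lines_sketch(rows)
no_a_cut = filter_lines_sketch(rows, A_min=None)
```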

cometspec.linelist.load_default_transitions(*, isotopologues='12C14N', systems=None, A_min=10000.0, lambda_min_A=2990.001, lambda_max_A=10009.998, use_omega_labels=False, line_paths=None)

Load and normalize packaged default transitions per isotopologue. The supported labels are “12C2”, “12C13C”, “13C2”, “12C14N”, “13C14N”, “12C15N”, and “Fe”. An unrecognized CN isotopologue falls back to “12C14N”, whereas an unrecognized C2 isotopologue fails with an error. Any label containing “Fe” loads the fe_normalized.csv file.

For CN, the band systems to include are chosen with the systems argument; the default is BX(0,0) plus AX(Δv=±1). systems accepts a string, or a list of strings, from the following:

“both” or “bx+ax”: BX(0,0), AX(Δv=±1), AX(Δv=±2) and AX(Δv=±3)
“all”: all systems in the Brooke line list (including minor ones; this leads to extremely long computation times)
“bx”, “b-x”, “bx(0,0)”, “bx00”, “bx_00” or “b_x_00”: BX(0,0) only
“ax” or “a-x”: AX(Δv=±1) and AX(Δv=±2)
“ax(dv=0)”, “ax_dv0”: AX(Δv=0) only
“ax(dv=1)”, “ax_dv1”: AX(Δv=±1) only
“ax(dv=2)”, “ax_dv2”: AX(Δv=±2) only
“ax(dv=3)”, “ax_dv3”: AX(Δv=±3) only
“xx”: all X–X transitions

The references for each line list are given at the end of this entry [1] [2] [3] [4] [5].

Note

Rows on the line lists with missing or invalid values in any of the necessary columns are dropped.

Important

The filters described below are intrinsic to the default line lists: lines outside them cannot be retrieved for the default isotopologues, whatever values are passed for the corresponding model or function parameters.

For \(\rm CN\), the default line lists are those from [1] and [2], which provide the systems described above; see those references for how they were built. No intrinsic \(A_{ul}\) cut was applied. To use the full line lists, set the corresponding parameters (such as A_min and the wavelength limits) when calling the function.
For \(\rm C_2\), the default line list is the recommended ExoMol compilation [3] [4]. The following selection criteria were applied: wavelengths in the range \(2000\)–\(10000\,\unicode{x212B}\), upper-level energies \(< 30\,000\ \mathrm{cm}^{-1}\), vibrational quantum number \(v < 5\), rotational quantum number \(N < 50\), and only the \(A\,^1\Pi_u - X\,^1\Sigma_g^+\), \(b\,^3\Sigma_g^- - a\,^3\Pi_u\), \(d\,^3\Pi_g - a\,^3\Pi_u\), \(d\,^3\Pi_g - c\,^3\Sigma_u^+\), \(a\,^3\Pi_u - X\,^1\Sigma_g^+\), and \(c\,^3\Sigma_u^+ - X\,^1\Sigma_g^+\) transitions. The minimum intrinsic \(A_{ul}\) is \(10^{3}\ \mathrm{s}^{-1}\) for \(^{12}\mathrm{C}^{13}\mathrm{C}\) and \(^{12}\mathrm{C}_2\), and \(10^{-10}\ \mathrm{s}^{-1}\) for \(^{13}\mathrm{C}_2\). Note that building models with a small \(A_{\min}\) (i.e. including most of the transitions) is computationally expensive.
For \(\rm Fe\), we adopt the line list of [5]. We retrieved all transitions in the \(2000\)–\(10000\,\unicode{x212B}\) range and retained those with upper-level energies below \(40\,000\ \mathrm{cm}^{-1}\) and \(A_{ul} > 10^{3}\ \mathrm{s}^{-1}\); the latter acts as an intrinsic \(A_{ul}\) cut.

Parameters:

isotopologues (str or Sequence[str], optional, default "12C14N") – One or more isotopologue labels.
systems (str or Sequence[str], optional, default None) – CN system selector(s).
A_min (float, optional, default 1e4) – Minimum Einstein A threshold.
lambda_min_A (float, optional, default 2990.001) – Minimum wavelength in Angstrom.
lambda_max_A (float, optional, default 10009.998) – Maximum wavelength in Angstrom.
use_omega_labels (bool, optional, default False) – Use Omega labels for A-state symmetry tags.
line_paths (dict[str, str], optional, default None) – Optional mapping of isotopologue to explicit file path.

Returns:

dict[str, pandas.DataFrame] – Dictionary mapping isotopologue label to normalized transition table. The keys are exactly the entries in isotopologues; the values are DataFrames with the same schema as described in from_cn_brooke() or from_user_linelist().

References

cometspec.linelist.resolve_linelists_with_defaults(linelists, iso_list, *, systems=None, A_min=10000.0, lambda_min_A=2990.001, lambda_max_A=10009.998, use_omega_labels=False, line_paths=None)

Match user-supplied line lists to isotopologues, loading the packaged defaults for any isotopologue without one. When line lists are given positionally and there are fewer of them than isotopologues, the remaining isotopologues are loaded from the defaults. To mix custom and default line lists in this way, order iso_list so that the isotopologues with user-supplied line lists come first and those relying on defaults come last; otherwise the positional matching will pair them incorrectly.

Resolution rules:

linelists is None -> every iso loaded from packaged defaults via load_default_transitions().
Single pandas.DataFrame -> assigned to iso_list[0]; the remaining isotopologues fall back to defaults.
dict mapping iso label to DataFrame -> entries used for matching labels in iso_list; any iso label not present in the dict falls back to defaults. Keys not in iso_list are ignored.
Sequence (list/tuple) of DataFrames -> positional pairing with the first len(linelists) entries of iso_list; the remainder fall back to defaults.

Loading a default for an isotopologue without a packaged file (e.g. "COH") raises ValueError from load_default_transitions().
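
The resolution rules above can be sketched in a few lines of plain Python. This is a simplified illustration, not the package's code: resolve_sketch is a hypothetical name, strings stand in for DataFrames, the DEFAULT marker stands in for a call to load_default_transitions(), and silently ignoring surplus positional line lists is an assumption.

```python
from collections.abc import Mapping, Sequence

DEFAULT = "<packaged default>"  # placeholder for a loaded default table


def resolve_sketch(linelists, iso_list):
    """Pair user-supplied tables with isotopologues, filling gaps with defaults."""
    if linelists is None:
        supplied = {}
    elif isinstance(linelists, Mapping):
        # dict: match by label; keys not in iso_list are ignored.
        supplied = {iso: linelists[iso] for iso in iso_list if iso in linelists}
    elif isinstance(linelists, Sequence) and not isinstance(linelists, (str, bytes)):
        # list/tuple: positional pairing with the leading entries of iso_list.
        supplied = dict(zip(iso_list, linelists))
    else:
        # single table: assigned to the first isotopologue.
        supplied = {iso_list[0]: linelists}
    # Build the result in iso_list order, falling back to the default marker.
    return {iso: supplied.get(iso, DEFAULT) for iso in iso_list}


isos = ["12C14N", "13C14N"]
by_position = resolve_sketch(["user table"], isos)
by_label = resolve_sketch({"13C14N": "user table", "12C15N": "unused"}, isos)
```

With positional input the single table lands on "12C14N", while the dict form attaches it to "13C14N" regardless of order, which is why the dict form is the safer choice when only some isotopologues have custom line lists.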

Parameters:

linelists (pandas.DataFrame or dict[str, pandas.DataFrame] or Sequence[pandas.DataFrame] or None) – User-supplied line list(s). See the resolution rules above.
iso_list (Sequence[str]) – Isotopologue labels, in the order they should be returned. Each label is matched against the user-supplied line lists (if any) according to the resolution rules above, and any isotopologue without a user-supplied line list is loaded from the packaged defaults.
systems (str or Sequence[str], optional, default None) – CN system selector(s) for default CN line lists. See normalize_cn_systems_arg() for accepted forms.
A_min (float, optional, default 1e4) – Minimum Einstein A threshold for default line lists, or None to disable.
lambda_min_A (float, optional, default 2990.001) – Minimum wavelength in Angstrom for default line lists.
lambda_max_A (float, optional, default 10009.998) – Maximum wavelength in Angstrom for default line lists.
use_omega_labels (bool, optional, default False) – Use Omega labels for A-state symmetry tags in default CN line lists.
line_paths (dict[str, str], optional, default None) – Optional mapping of isotopologue to explicit file path.

Returns:

dict[str, pandas.DataFrame] – {iso: DataFrame} ordered exactly as iso_list.

cometspec.linelist.default_linelist_source(iso)

Return the file path that would be loaded for iso from packaged defaults.

Parameters: iso (str) – Isotopologue label.
Returns: str – File path that would be loaded for iso from packaged defaults.
Raises: ValueError – If iso does not match any supported pattern for the packaged default line lists: CN-like (e.g. 12C14N, 13C14N, 12C15N), C2-like (e.g. 12C2, 13C2, 12C13C), or any label containing "Fe".

cometspec.linelist.linelist_origins(linelists, iso_list, *, line_paths=None)

Return a per-isotopologue origin string (file) for the configured line lists.

Mirrors the resolution rules of resolve_linelists_with_defaults():

Entries supplied by the user (DataFrame, dict entry, or positional list slot) are reported as "custom (user-provided)".
Entries with an explicit override in line_paths are reported as that path.
Otherwise the path returned by default_linelist_source() is used.

No data are loaded; only the origin of each line list is determined.

Parameters:

linelists (pandas.DataFrame or dict[str, pandas.DataFrame] or Sequence[pandas.DataFrame] or None) – User-supplied line list(s). See resolution rules in resolve_linelists_with_defaults().
iso_list (Sequence[str]) – Isotopologue labels, in the order they should be returned. Each label is matched against the user-supplied line lists (if any) according to the resolution rules above, and any isotopologue without a user-supplied line list is assigned the origin of the packaged default.
line_paths (dict[str, str], optional, default None) – Optional mapping of isotopologue to explicit file path, used for reporting the origin of any isotopologue without a user-supplied line list. If an isotopologue is present in this dict, its origin is reported as the corresponding path instead of the default path returned by default_linelist_source(). This is intended for cases where the user has provided a custom path.

Returns:

dict[str, str] – Mapping of isotopologue label to origin string (e.g. file path). The keys are exactly the entries in iso_list; the values are determined according to the resolution rules above.

cometspec.linelist.attach_pumping_and_labels(df, pumping, *, line_v_kms=0.0, line_dlam_A=0.0, lsf_for_Jnu=None, lam_col='lambda_vac_A')

Attach the solar flux incident on the comet at each line's wavelength to a transition table.

Parameters:

df (pandas.DataFrame) – Normalized transition DataFrame.
pumping (Any) – Pumping spectrum with WAVE and FLUX columns.
line_v_kms (float, optional, default 0.0) – Doppler velocity shift applied to line wavelengths, in km/s.
line_dlam_A (float, optional, default 0.0) – Additive wavelength shift in Angstrom.
lsf_for_Jnu (Callable[[numpy.ndarray], numpy.ndarray], optional, default None) – Optional kernel used to average flux around each line.
lam_col (str, optional, default "lambda_vac_A") – Input wavelength column name in df.

Returns:

astropy.table.Table – Astropy table with wavelength, frequency, flux-at-line, J_nu and original dataframe columns.
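
The core idea, shifting each line wavelength and reading the pumping flux at the shifted position, can be sketched as below. This is an assumption-laden illustration rather than the package's method: the helper names are hypothetical, the Doppler shift is taken as non-relativistic with positive velocities moving lines to longer wavelengths, the velocity shift is applied before the additive one, and the flux is linearly interpolated with clamping at the grid edges. The optional lsf_for_Jnu averaging is not modeled.

```python
from bisect import bisect_left

C_KMS = 299792.458  # speed of light in km/s


def shifted_wavelength(lam_A, line_v_kms=0.0, line_dlam_A=0.0):
    """Apply a velocity shift, then an additive shift (assumed order and sign)."""
    return lam_A * (1.0 + line_v_kms / C_KMS) + line_dlam_A


def flux_at(lam_A, wave, flux):
    """Linearly interpolate flux on an ascending wavelength grid."""
    if lam_A <= wave[0]:
        return flux[0]  # clamp below the grid
    if lam_A >= wave[-1]:
        return flux[-1]  # clamp above the grid
    i = bisect_left(wave, lam_A)  # wave[i - 1] < lam_A <= wave[i]
    w0, w1 = wave[i - 1], wave[i]
    t = (lam_A - w0) / (w1 - w0)
    return flux[i - 1] + t * (flux[i] - flux[i - 1])


# Invented pumping spectrum standing in for the WAVE and FLUX columns.
WAVE = [3000.0, 4000.0, 5000.0]
FLUX = [1.0, 3.0, 2.0]

lam_line = shifted_wavelength(3500.0)  # no shift applied
flux_line = flux_at(lam_line, WAVE, FLUX)
```

The shift matters because solar absorption lines make the flux vary sharply with wavelength, so the flux seen by a transition depends on the relative velocity used for line_v_kms.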
