{ "cells": [ { "cell_type": "markdown", "id": "d003df19-502d-42b2-94d6-fa9d909452ff", "metadata": {}, "source": [ "# PyLHE version 2.0.0 demo" ] }, { "cell_type": "markdown", "id": "5f29ceda", "metadata": {}, "source": [ "## The LHE File Format\n", "\n", "- standardized format to describe events generated in high-energy physics simulations\n", "- widely used in the context of Monte Carlo event generators\n", "- designed to facilitate the exchange of event information between different software packages\n", "\n", "### Key Features\n", "\n", "- **XML-Based with Fortran mixin**: both human-readable and machine-readable.\n", "- **Event Structure**: contains a series of events, each described by particles and their properties.\n", "\n", "### Basic Structure\n", "\n", "An LHE file typically consists of the following main components:\n", "\n", "1. **Header**: Contains metadata, version and generator information.\n", "2. **Initialization Block**: Describes the initial state of the event, including the incoming beams.\n", "3. **Event Blocks**: Each event is described in its own block, detailing the initial, intermediate, and final state particles.\n", "4. **Weights Block**: Optional block that provides additional information about the event weights.\n", "\n", "```xml\n", "\n", "
\n", "\n", "beam1id beam2id beam1energy beam2energy pdfg1 pdfg2 pdfs1 pdfs2 idweight nproc\n", "crosssection crosssectionerror crosssectionmaximum pid\n", "...\n", "\n", "\n", "nparticles pid weight scale aqed aqcd\n", "id status mother1 mother2 color1 color2 px py pz E m lifetime spin\n", "...\n", "\n", "...\n", "
\n", "```\n" ] }, { "cell_type": "markdown", "id": "9df3cfc4", "metadata": {}, "source": [ "## New since version 1.0.0\n", "\n", "- Strict typing checks with MyPy\n", "- Larger test suite\n", "- Sphinx documentation at https://pylhe.readthedocs.io/\n", "\n", " " ] }, { "cell_type": "markdown", "id": "0c7c4959-7553-4e45-a855-0180c33c3f97", "metadata": {}, "source": [ "### IO access via `@classmethod`s `LHEFile.fromstring/fromfile`" ] }, { "cell_type": "code", "execution_count": 1, "id": "c804dc49", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from pylhe import LHEFile\n", "\n", "mylhe = \"\"\"\n", "\n", "
\n", "\n", "11 -11 100.0 100.0 0 0 0 0 3 1\n", "3.783590 0.001676 0.001569 0\n", "\n", "\n", "6 0 3.783590e-06 200 0.007849 0.1075\n", "11 -1 0 0 0 0 0 0 100.0 100.0 0 0 9.0\n", "-11 -1 0 0 0 0 0 0 -100.0 100.0 0 0 9.0\n", "22 2 1 2 0 0 0 0 0 200 200 0 9.0\n", "2 1 3 0 0 0 48.253308 67.445271 -54.164510 99.050697 0 0 9.0\n", "21 1 3 0 0 0 -1.190913 -12.743630 5.613176 13.975913 0 0 9.0\n", "-2 1 3 0 0 0 -47.062395 -54.701640 48.551333 86.973389 0 0 9.0\n", "\n", "
\n", "\"\"\"\n", "lhef = LHEFile.fromstring(mylhe)\n", "lhef" ] }, { "cell_type": "code", "execution_count": 2, "id": "2187339b", "metadata": {}, "outputs": [ { "data": { "image/svg+xml": [ "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "0\n", "\n", "e\n", "-\n", "\n", "\n", "\n", "2\n", "\n", "γ\n", "\n", "\n", "\n", "0->2\n", "\n", "\n", "\n", "\n", "\n", "1\n", "\n", "e\n", "+\n", "\n", "\n", "\n", "1->2\n", "\n", "\n", "\n", "\n", "\n", "3\n", "\n", "u\n", "\n", "\n", "\n", "2->3\n", "\n", "\n", "\n", "\n", "\n", "4\n", "\n", "g\n", "\n", "\n", "\n", "2->4\n", "\n", "\n", "\n", "\n", "\n", "5\n", "\n", "\n", "\n", "\n", "\n", "2->5\n", "\n", "\n", "\n", "\n", "\n" ], "text/plain": [ "LHEEvent(eventinfo=LHEEventInfo(nparticles=6, pid=0, weight=3.78359e-06, scale=200.0, aqed=0.007849, aqcd=0.1075), particles=[LHEParticle(id=11, status=-1, mother1=0, mother2=0, color1=0, color2=0, px=0.0, py=0.0, pz=100.0, e=100.0, m=0.0, lifetime=0.0, spin=9.0), LHEParticle(id=-11, status=-1, mother1=0, mother2=0, color1=0, color2=0, px=0.0, py=0.0, pz=-100.0, e=100.0, m=0.0, lifetime=0.0, spin=9.0), LHEParticle(id=22, status=2, mother1=1, mother2=2, color1=0, color2=0, px=0.0, py=0.0, pz=0.0, e=200.0, m=200.0, lifetime=0.0, spin=9.0), LHEParticle(id=2, status=1, mother1=3, mother2=0, color1=0, color2=0, px=48.253308, py=67.445271, pz=-54.16451, e=99.050697, m=0.0, lifetime=0.0, spin=9.0), LHEParticle(id=21, status=1, mother1=3, mother2=0, color1=0, color2=0, px=-1.190913, py=-12.74363, pz=5.613176, e=13.975913, m=0.0, lifetime=0.0, spin=9.0), LHEParticle(id=-2, status=1, mother1=3, mother2=0, color1=0, color2=0, px=-47.062395, py=-54.70164, pz=48.551333, e=86.973389, m=0.0, lifetime=0.0, spin=9.0)], weights={}, scales={}, attributes={}, optional=[], _graph=)" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "theevent = next(lhef.events)\n", "theevent" ] }, { "cell_type": "markdown", "id": "9935a24a-3d4c-4c12-93b9-00ebfdaf74ae", 
"metadata": {}, "source": [ "### Structured dataclasses instead of deprecated dicts\n", "\n", "old dict way" ] }, { "cell_type": "markdown", "id": "28a9626a-583d-43ed-9cb6-a6425523ef7a", "metadata": {}, "source": [ "becomes simpler" ] }, { "cell_type": "code", "execution_count": 3, "id": "4ebd041d", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "11" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "lhef.init.initInfo.beamA" ] }, { "cell_type": "code", "execution_count": 4, "id": "7cca7914", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[LHEParticle(id=11, status=-1, mother1=0, mother2=0, color1=0, color2=0, px=0.0, py=0.0, pz=100.0, e=100.0, m=0.0, lifetime=0.0, spin=9.0),\n", " LHEParticle(id=-11, status=-1, mother1=0, mother2=0, color1=0, color2=0, px=0.0, py=0.0, pz=-100.0, e=100.0, m=0.0, lifetime=0.0, spin=9.0),\n", " LHEParticle(id=22, status=2, mother1=1, mother2=2, color1=0, color2=0, px=0.0, py=0.0, pz=0.0, e=200.0, m=200.0, lifetime=0.0, spin=9.0),\n", " LHEParticle(id=2, status=1, mother1=3, mother2=0, color1=0, color2=0, px=48.253308, py=67.445271, pz=-54.16451, e=99.050697, m=0.0, lifetime=0.0, spin=9.0),\n", " LHEParticle(id=21, status=1, mother1=3, mother2=0, color1=0, color2=0, px=-1.190913, py=-12.74363, pz=5.613176, e=13.975913, m=0.0, lifetime=0.0, spin=9.0),\n", " LHEParticle(id=-2, status=1, mother1=3, mother2=0, color1=0, color2=0, px=-47.062395, py=-54.70164, pz=48.551333, e=86.973389, m=0.0, lifetime=0.0, spin=9.0)]" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "theevent.particles" ] }, { "cell_type": "code", "execution_count": 5, "id": "27cf662b", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "11" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "theevent.particles[0].id" ] }, { "cell_type": "code", "execution_count": 13, "id": "d4e6f640", "metadata": {}, "outputs": [ { 
"data": { "text/plain": [ "[LHEParticle(id=22, status=2, mother1=1, mother2=2, color1=0, color2=0, px=0.0, py=0.0, pz=0.0, e=200.0, m=200.0, lifetime=0.0, spin=9.0)]" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "theevent.mothers(theevent.particles[-1])" ] }, { "cell_type": "code", "execution_count": 7, "id": "215c69f6", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "
\n", "\n", " 11 -11 1.0000000e+02 1.0000000e+02 0 0 0 0 3 1\n", " 3.7835900e+00 1.6760000e-03 1.5690000e-03 0\n", "\n", "\n", "\n" ] } ], "source": [ "print(lhef.tolhe())" ] }, { "cell_type": "markdown", "id": "b3151c22-4292-4324-8a1c-c32c06c00997", "metadata": {}, "source": [ "> ⚠️ **No Events?!:** We already consumed the events `yield`ed by the generator. The generator approach is great for large files or event generation streams, but what if we just want to modify some or all of the events in a list-like fashion? Passing `generator=False`, as in the next cell, keeps the events available for repeated access." ] }, { "cell_type": "code", "execution_count": 8, "id": "e08924b5-63d8-45bf-a5c2-5d8815c0b912", "metadata": {}, "outputs": [], "source": [ "lhef = LHEFile.fromstring(mylhe, generator=False)" ] }, { "cell_type": "markdown", "id": "d32a09b8-35e2-48ee-87fc-0a8ef05f80f4", "metadata": {}, "source": [ "### New output/write to file `LHEFile.tofile/tolhe`" ] }, { "cell_type": "code", "execution_count": 9, "id": "0dab87e5-e04c-48b1-a412-daef147450c3", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "
\n", "\n", " 11 -11 1.0000000e+02 1.0000000e+02 0 0 0 0 3 1\n", " 3.7835900e+00 1.6760000e-03 1.5690000e-03 0\n", "\n", "\n", "\n", " 6 0 3.7835900000e-06 2.0000000000e+02 7.8490000000e-03 1.0750000000e-01\n", " 11 -1 0 0 0 0 0.00000000e+00 0.00000000e+00 1.00000000e+02 1.00000000e+02 0.00000000e+00 0.0000e+00 9.0000e+00\n", " -11 -1 0 0 0 0 0.00000000e+00 0.00000000e+00 -1.00000000e+02 1.00000000e+02 0.00000000e+00 0.0000e+00 9.0000e+00\n", " 22 2 1 2 0 0 0.00000000e+00 0.00000000e+00 0.00000000e+00 2.00000000e+02 2.00000000e+02 0.0000e+00 9.0000e+00\n", " 2 1 3 0 0 0 4.82533080e+01 6.74452710e+01 -5.41645100e+01 9.90506970e+01 0.00000000e+00 0.0000e+00 9.0000e+00\n", " 21 1 3 0 0 0 -1.19091300e+00 -1.27436300e+01 5.61317600e+00 1.39759130e+01 0.00000000e+00 0.0000e+00 9.0000e+00\n", " -2 1 3 0 0 0 -4.70623950e+01 -5.47016400e+01 4.85513330e+01 8.69733890e+01 0.00000000e+00 0.0000e+00 9.0000e+00\n", "\n", "\n" ] } ], "source": [ "lhef.tofile(\"myevents.lhe.gz\")\n", "print(lhef.tolhe())" ] }, { "cell_type": "code", "execution_count": 10, "id": "ad521850-bcd8-41d2-8cbd-d950fcde1ae9", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "
\n", "\n", " 11 -11 1.0000000e+02 1.0000000e+02 0 0 0 0 3 1\n", " 3.7835900e+00 1.6760000e-03 1.5690000e-03 0\n", "\n", "\n", "\n", " 6 0 3.7835900000e-06 2.0000000000e+02 7.8490000000e-03 1.0750000000e-01\n", " 11 -1 0 0 0 0 0.00000000e+00 0.00000000e+00 1.00000000e+02 1.00000000e+02 0.00000000e+00 0.0000e+00 9.0000e+00\n", " -11 -1 0 0 0 0 0.00000000e+00 0.00000000e+00 -1.00000000e+02 1.00000000e+02 0.00000000e+00 0.0000e+00 9.0000e+00\n", " 22 2 1 2 0 0 0.00000000e+00 0.00000000e+00 0.00000000e+00 2.00000000e+02 2.00000000e+02 0.0000e+00 9.0000e+00\n", " 2 1 3 0 0 0 4.82533080e+01 6.74452710e+01 -5.41645100e+01 9.90506970e+01 0.00000000e+00 0.0000e+00 9.0000e+00\n", " 21 1 3 0 0 0 -1.19091300e+00 -1.27436300e+01 5.61317600e+00 1.39759130e+01 0.00000000e+00 0.0000e+00 9.0000e+00\n", " -2 1 3 0 0 0 -4.70623950e+01 -5.47016400e+01 4.85513330e+01 8.69733890e+01 0.00000000e+00 0.0000e+00 9.0000e+00\n", "\n", "" ] } ], "source": [ "!zcat myevents.lhe.gz" ] }, { "cell_type": "markdown", "id": "fcaa0fa1", "metadata": {}, "source": [ "## dict -> dataclass: Parse don't validate! 
[1]\n", "\n", "- Don’t just check whether data is valid; instead, transform it into a well-typed structure.\n", "- Make invalid states unrepresentable through type-safe parsing.\n", "\n", "[1] https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-validate/\n" ] }, { "cell_type": "markdown", "id": "cbc3e8ff", "metadata": {}, "source": [ "### Dataclasses and strict typing instead of loose dicts\n", "\n", "The old layout:\n", "- inherits from `dict`\n", "- weird `fieldnames`\n", "- `args` and `kwargs`\n", "- ugly hypothetical `dict` typing" ] }, { "cell_type": "code", "execution_count": 11, "id": "2ba8716d", "metadata": {}, "outputs": [], "source": [ "class LHEInit_old(dict):\n", "    \"\"\"Store the <init> block as dict.\"\"\"\n", "\n", "    # weightgroup : dict[str, dict[str, dict[str, Union[dict[str, str], str, int]]]]\n", "    fieldnames = [\"initInfo\", \"procInfo\", \"weightgroup\", \"LHEVersion\"]\n", "\n", "    def __init__(self, *args, **kwargs):\n", "        super().__init__(*args, **kwargs)" ] }, { "cell_type": "markdown", "id": "7d49eb1e", "metadata": {}, "source": [ "The new layout below:\n", "- members are strictly typed now\n", "- proper type hints => better LLM/coding agent integration\n", "- autocompletion works better in Jupyter Notebooks and IDEs\n", "- data is guaranteed to exist in the correct format, unlike with dicts\n", "- MyPy found bugs in the old code while doing this (e.g. 
missing `None` checks)\n", "- members have documentation strings\n", "- data is printed nicely without extra effort" ] }, { "cell_type": "code", "execution_count": 12, "id": "9b5e3e98", "metadata": {}, "outputs": [], "source": [ "from dataclasses import dataclass\n", "\n", "from pylhe import LHEInitInfo, LHEProcInfo\n", "\n", "\n", "@dataclass\n", "class LHEInit_new:\n", "    \"\"\"Store the <init> block as a dataclass.\"\"\"\n", "\n", "    initInfo: LHEInitInfo\n", "    \"\"\"Init information\"\"\"\n", "    procInfo: list[LHEProcInfo]\n", "    \"\"\"Process information\"\"\"" ] }, { "cell_type": "markdown", "id": "135f4c9c-c24e-4e0f-aac1-7e4bb9fdcfd8", "metadata": {}, "source": [ "### Next steps\n", "\n", "- release 2.0.0\n", "- publish JOSS paper" ] }, { "cell_type": "markdown", "id": "de6ae27d-8f82-47bf-b05e-6ac43d4b0e39", "metadata": {}, "source": [ "## Further interactive examples to explore\n", "\n", "- [Analyze LHE file and plot `hist`-ograms →](01_zpeak.ipynb)\n", "- [Filter LHE events based on kinematic cuts →](02_filter_events_example.ipynb)\n", "- [Simple Monte Carlo LHE event generator →](03_write_monte_carlo_example.ipynb)\n", "- [Conversion/interface to `awkward` arrays →](91_awkward_example.ipynb)\n", "- [Parallel processing of LHE files →](92_multiple_files.ipynb)\n", "- [Parquet Cache →](93_parquet_cache.ipynb)" ] }, { "cell_type": "markdown", "id": "6fbf43b5", "metadata": {}, "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.13.13" } }, "nbformat": 4, "nbformat_minor": 5 }