436 integrate ptms from evidence into alphafold predictions by tE3m · Pull Request #449 · cschlaffner/PROTzilla

tE3m · 2026-05-29T15:38:03Z

Description

fixes #436

Introduces a new step that inserts modified residues into a cif_df at the positions where they are detected in a supplied psm_df. Requires the outputs of an AlphaFold import step.

Changes

backend/user_data/external/ptm/*: location of the modified residue structure files
backend/protzilla/constants/cif_columns.py: add required columns and a KnownPTM enum, which holds all PTM combinations we currently have structure files for
backend/protzilla/constants/peptide_columns.py: refactor column lists to enums to allow reuse
- backend/protzilla/importing/peptide_import.py: slight adaptations due to change above
backend/protzilla/data_analysis/crosslinking_validation.py: refactor former local get_crosslink_positions_in_protein method to allow reuse outside of the crosslinking use case, rename to reflect more general use-case
backend/protzilla/data_integration/cif_ptm.py: add calc method for new step
backend/protzilla/importing/alphafold_protein_structure_load.py: move the mapping from y/n/. strings to native booleans to constants file
backend/protzilla/methods/importing.py: add a raw cif import for debugging purposes, to be extended by Integrate PTMs from evidence into conventional protein structure files #437
backend/protzilla/utilities/ptm_helpers.py: helper functions for parsing PTM names from a psm_df
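
For readers who have not opened the diff, the enum refactoring and the flag mapping listed above can be pictured roughly as follows. This is a minimal sketch, not the code from this PR: `PeptideColumn` and its members, `CIF_FLAG_TO_BOOL`, `required_columns`, and the two `KnownPTM` members are hypothetical names chosen for illustration, and mapping "." to `None` is an assumption. The component IDs ALY and SEP are the ones that come up later in this conversation.

```python
from enum import Enum


class PeptideColumn(str, Enum):
    """Hypothetical column-name enum; members compare equal to plain strings."""

    SEQUENCE = "Sequence"
    PROTEIN_ID = "Protein ID"
    MODIFICATIONS = "Modifications"


class KnownPTM(Enum):
    """Hypothetical enum tying a (modification, residue) pair to a component ID."""

    ACETYL_LYS = ("Acetyl", "K", "ALY")
    PHOSPHO_SER = ("Phospho", "S", "SEP")

    def __init__(self, modification: str, residue: str, component_id: str):
        self.modification = modification
        self.residue = residue
        self.component_id = component_id


# Hypothetical mapping from the y/n/. flags in CIF files to native values.
# Assumption: "." means "not set" and is mapped to None.
CIF_FLAG_TO_BOOL = {"y": True, "n": False, ".": None}


def required_columns() -> list:
    """Derive a reusable column list from the enum instead of a hand-written list."""
    return [column.value for column in PeptideColumn]
```

Because the column enum mixes in `str`, existing code that compares against or indexes with the raw column names should keep working while new code can iterate over the members.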

Testing

Import a monomer with (or a multimer containing) a protein ID for which we have PTMs in an evidence file. I used Q16555 because the evidence.txt from the MaxQuant_data in the nextcloud folder is well-formed, unlike the one from our example dataset (see Allow abbreviated PTM identifiers in evidence #438).
Import the evidence.txt mentioned above
Connect the relevant inputs to the new step
Observe the result in the visualisations tab, where the non-standard molecules can be highlighted by selecting "non-standard" from the components in the bottom right

PR checklist

Development

If necessary, I have updated the documentation (README, docstrings, etc.)
If necessary, I have created / updated tests.

Mergeability

main-branch has been merged into local branch to resolve conflicts
The tests and linter have passed AFTER local merge
The backend code has been formatted with black
The frontend code has been formatted with pnpm format and checked with pnpm lint

Code review

I have self-reviewed my code.
At least one other developer reviewed and approved the changes

why do we even bother with this

AnnaPolensky

Looks good overall. I especially liked your comments as they were very helpful for understanding what you do when you filter and change the dataframes.

But please provide a description on how to test your changes. I tried this:

And got this error:

Maybe I have uploaded a wrong evidence file? I used the one we got first right at the beginning of our project. For the AlphaFold step I used O43242.

Also, black fails.

AnnaPolensky · 2026-06-02T09:06:11Z

+    cleaned_mods = []
+    for mod in mod_list:
+        mod = str(mod)
+        cleaned_mods.append(mod.lstrip(digits).lstrip(" "))


This first removes leading digits and then leading spaces. If we had something like " 45something" only the spaces would be removed but not the digits. Is that ok/intended? If so, I might make the docstring a bit more specific about the order in which things are removed.

The data is taken straight from MQ and processed, so we don't have to deal with user input here. As such, the format is <amount if more than 1> , so while what you describe could be caught, there is no need for it.

AnnaPolensky · 2026-06-02T09:07:06Z

+        str: The simplified modification name (i.e. Oxidation).
+    """
+    mod_name = str(mod_name)
+    return mod_name.lstrip(digits).lstrip(" ").split(" ")[0]


Same as above.
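
The order dependence raised in the two threads above is easy to check in isolation. The following is a small sketch that reuses the expressions from the diff; the function names `clean_mod`, `simplify_mod`, and `clean_mod_any_order` are made up for this example, and the input "2 Oxidation (M)" is only an assumed example of a count-prefixed modification string.

```python
from string import digits


def clean_mod(mod) -> str:
    """Strip a leading count, then leading spaces (same order as in the diff)."""
    return str(mod).lstrip(digits).lstrip(" ")


def simplify_mod(mod) -> str:
    """Additionally keep only the first word of the cleaned name."""
    return clean_mod(mod).split(" ")[0]


def clean_mod_any_order(mod) -> str:
    """Alternative: strip leading digits and spaces in whatever order they appear."""
    return str(mod).lstrip(digits + " ")


print(clean_mod("2 Oxidation (M)"))         # Oxidation (M)
print(simplify_mod("2 Oxidation (M)"))      # Oxidation
print(clean_mod(" 45something"))            # 45something
print(clean_mod_any_order(" 45something"))  # something
```

If the input always has the count-then-space shape, both variants give the same result; the second one only differs for strings that start with a space.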

AnnaPolensky · 2026-06-02T09:08:33Z

Do we need to credit the source from where we got these PTMs? (same question for the other cif files as well)

tE3m · 2026-06-02T10:20:40Z

Looks good overall. I especially liked your comments as they were very helpful for understanding what you do when you filter and change the dataframes.

But please provide a description on how to test your changes. I tried this:

And got this error:

Maybe I have uploaded a wrong evidence file? I used the one we got first right at the beginning of our project. For the AlphaFold step I used O43242.

Also, black fails.

Sorry about that. As this PR is still marked as draft, I didn't expect you to get pinged. While writing up the testing, I encountered issues similar to yours (the one you show happens when there are no PTMs for the protein ID in the evidence), some of which only appeared after rebasing onto the current crosslinking state. As those issues are not yet fixed, I hadn't intended for the PR to be reviewed already. Sorry again!

github-actions · 2026-06-02T12:01:41Z

Coverage report

File	Statements	Missing	Coverage	Coverage (new stmts)	Lines missing
backend/protzilla
all_steps.py
backend/protzilla/data_analysis
crosslinking_validation.py
backend/protzilla/data_integration
cif_ptm.py					88, 205-247, 356-404
backend/protzilla/importing
alphafold_protein_structure_load.py					175
peptide_import.py					56-57
backend/protzilla/methods
data_integration.py					1008
importing.py					441
backend/protzilla/utilities
ptm_helpers.py					37-43, 87, 126-129, 175-176, 209-210
utilities.py					191-193
Project Total

This report was generated by python-coverage-comment-action

AnnaPolensky

Changes look good. Works now, results look good 👍

Elena-kal

The code has no major problems and looks really good overall. I left a few comments, mostly as suggestions. They are really nitpicky, which is why I mostly used suggestions, so hopefully there should not be too much work on your side.

There is only one thing I noticed of which I am really unsure:
What does selecting "the PTMs to consider" actually mean? For me it did not change which PTMs were loaded in the cif, at least not from what I saw in the table and from the success message I received.
Also I could not find the button where I could then select non-standard, so I could not test that part but I hope that is fine :)

Overall, good work! 👍

Elena-kal · 2026-06-06T15:31:01Z

+        )
+    )
+
+    return ids


Make sure to mark the uniprot ids as required in the import then. I think right now they are not and the format is not enforced the way you expect it here. We should probably enforce it in the import step already.

From the multimer upload method:

    if not uniprot_ids:
        msg = "Uniprot Ids cannot be empty or None."
        logger.error(msg)
        raise ValueError(msg)
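
One way the format could be enforced in the import step already, as suggested above, is sketched below. This is not code from the PR: `parse_uniprot_ids` is a hypothetical helper, and the comma-separated input format is an assumption. The regular expression is the accession pattern published in the UniProt help pages.

```python
import re
from typing import List, Optional

# Accession pattern as published in the UniProt help pages.
UNIPROT_ACCESSION = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def parse_uniprot_ids(raw: Optional[str]) -> List[str]:
    """Hypothetical helper: split a comma-separated string and validate each ID."""
    # Assumption: IDs arrive as one comma-separated string from the form field.
    ids = [part.strip() for part in (raw or "").split(",") if part.strip()]
    if not ids:
        raise ValueError("Uniprot Ids cannot be empty or None.")
    invalid = [uid for uid in ids if not UNIPROT_ACCESSION.fullmatch(uid)]
    if invalid:
        raise ValueError(f"Invalid UniProt accession(s): {', '.join(invalid)}")
    return ids


print(parse_uniprot_ids("Q16555, O43242"))  # ['Q16555', 'O43242']
```

Failing early here would mean that later steps can rely on a non-empty list of well-formed accessions.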

tE3m · 2026-06-09T18:14:08Z

There is only one thing I noticed of which I am really unsure:
What does selecting "the PTMs to consider" actually mean? For me it did not change which PTMs were loaded in the cif, at least not from what I saw in the table and from the success message I received.

@Elena-kal Are you sure about that? I just tested loading Q16555 and when only selecting Acetylation on Lysine, there are 5 insertions whereas when also considering e.g. Phosphorylation on Serine, 15 amino acids are altered.
edit: however, I've found something interesting. The Mol* viewer only shows ALY (Acetylation on Lysine) as being non-standard and applies the ball and stick representation. Even though other non-standard residues are present (e.g. SEP), they are not part of the non-standard group according to mol*. No clue why though
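
To illustrate why the number of insertions changes with the selection, here is a minimal sketch of filtering detected sites by the selected (modification, residue) pairs. The `Site` type, the `select_sites` function, and the example positions are all made up for illustration; they are not the PR's data structures and do not reproduce the counts reported for Q16555.

```python
from typing import List, NamedTuple, Set, Tuple


class Site(NamedTuple):
    """Hypothetical record of one PTM site detected in the evidence."""

    position: int
    residue: str
    modification: str


def select_sites(sites: List[Site], selected: Set[Tuple[str, str]]) -> List[Site]:
    """Keep only sites whose (modification, residue) pair was selected."""
    return [s for s in sites if (s.modification, s.residue) in selected]


# Made-up example data.
detected = [
    Site(12, "K", "Acetyl"),
    Site(40, "S", "Phospho"),
    Site(77, "K", "Acetyl"),
    Site(103, "S", "Phospho"),
]

only_acetyl = select_sites(detected, {("Acetyl", "K")})
both = select_sites(detected, {("Acetyl", "K"), ("Phospho", "S")})
print(len(only_acetyl), len(both))  # 2 4
```

With a filter of this kind, widening the selection can only keep the same sites or add more, which matches the behaviour described in the comment above.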

Elena-kal

lgtm

tE3m added 5 commits May 28, 2026 21:52

feat: convert evidence and peptide column constants to enums

85e2cd7

refactor: pull peptide location search within protein up

df6ef13

feat: add PTM cif files

8a04d03

feat: add insertion of PTMs into alphafold structures

f2a5ce2

chore: formatting

ecd4429

tE3m force-pushed the 436-integrate-ptms-from-evidence-into-alphafold-predictions branch from 7f1e50b to ecd4429 Compare June 1, 2026 11:20

chore: fix all steps test

3e1a357

why do we even bother with this

tE3m requested review from AnnaPolensky and Elena-kal June 1, 2026 11:43

AnnaPolensky requested changes Jun 2, 2026

tE3m added 2 commits June 2, 2026 13:28

chore: formatting

04f2d66

feat: add visualization output to PTM insertion

ebe0b2c

tE3m added 6 commits June 2, 2026 18:19

fix: multimer handling

ef981b6

rename step category

19a5cd4

fix: respect native boolean value

69d5c55

fix: return early if no PTMs are found

87af14b

fix: correctly re-index changed id column

41b0228

feat: add tests

0532c0f

tE3m marked this pull request as ready for review June 2, 2026 17:45

AnnaPolensky requested changes Jun 5, 2026

Comment thread backend/tests/protzilla/data_integration/test_cif_ptm.py Outdated

Elena-kal requested changes Jun 7, 2026

tE3m self-assigned this Jun 9, 2026

fix: test only for actual PTM path

90cc8bc

tE3m added 4 commits June 11, 2026 10:17

fix: add docstring

d94dca3

refactor: move utility method to utils

30e1572

chore: adapt incorrect docstring

b7312b4

refactor: rename cif_columns to cif_constants

aecd3e5

tE3m and others added 2 commits June 11, 2026 08:47

chore: adapt ptm_helpers docstring format

6335b05

Co-authored-by: Elena Kalbitzer <148279640+Elena-kal@users.noreply.github.com>

chore: formatting

080e20b

tE3m requested review from AnnaPolensky and Elena-kal June 11, 2026 13:56

tE3m linked an issue Jun 11, 2026 that may be closed by this pull request

Integrate PTMs from evidence into Alphafold predictions #436

Open

Elena-kal approved these changes Jun 11, 2026

AnnaPolensky approved these changes Jun 12, 2026
