436 integrate ptms from evidence into alphafold predictions #449

Open
tE3m wants to merge 21 commits into 361-include-ptms-in-existing-cifs from 436-integrate-ptms-from-evidence-into-alphafold-predictions

Conversation

@tE3m tE3m commented May 29, 2026

Collaborator

Description

fixes #436

Introduces a new step that inserts modified residues into a cif_df wherever they are detected in a supplied psm_df. Requires the outputs of an alphafold-import step.

Changes

  • backend/user_data/external/ptm/*: location of the modified residue structure files
  • backend/protzilla/constants/cif_columns.py: add required columns and a KnownPTM enum, which holds all PTM combinations we currently have structure files for
  • backend/protzilla/constants/peptide_columns.py: refactor column lists to enums to allow reuse
    • backend/protzilla/importing/peptide_import.py: slight adaptations due to change above
  • backend/protzilla/data_analysis/crosslinking_validation.py: refactor former local get_crosslink_positions_in_protein method to allow reuse outside of the crosslinking use case, rename to reflect more general use-case
  • backend/protzilla/data_integration/cif_ptm.py: add calc method for new step
  • backend/protzilla/importing/alphafold_protein_structure_load.py: move the mapping from y/n/. strings to native booleans to constants file
  • backend/protzilla/methods/importing.py: add a raw cif import for debugging purposes, to be extended by #437 (Integrate PTMs from evidence into conventional protein structure files)
  • backend/protzilla/utilities/ptm_helpers.py: helper functions for parsing PTM names from a psm_df
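
To illustrate the idea of an enum that holds the PTM combinations with available structure files, here is a minimal sketch. The member names, the tuple layout, and the `find_known_ptm` helper are assumptions made for illustration; the actual `KnownPTM` in cif_columns.py may be structured differently. ALY and SEP are the component IDs mentioned later in this conversation.

```python
from enum import Enum
from typing import Optional


class KnownPTM(Enum):
    """Hypothetical members; the real enum in the PR may differ."""

    # (modification name, one-letter residue, component ID of the modified residue)
    ACETYL_LYSINE = ("Acetyl", "K", "ALY")
    PHOSPHO_SERINE = ("Phospho", "S", "SEP")

    @property
    def modification(self) -> str:
        return self.value[0]

    @property
    def residue(self) -> str:
        return self.value[1]

    @property
    def component_id(self) -> str:
        return self.value[2]


def find_known_ptm(modification: str, residue: str) -> Optional[KnownPTM]:
    """Return the matching member, or None if no structure file is known."""
    for ptm in KnownPTM:
        if ptm.modification == modification and ptm.residue == residue:
            return ptm
    return None
```

Keeping the combinations in one enum means a lookup for an unsupported pair, such as `find_known_ptm("Oxidation", "M")` in this sketch, returns `None` instead of pointing at a missing file.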

Testing

  1. Import a monomer with (or a multimer containing) a protein ID for which we have PTMs in an evidence file. I used Q16555 because the evidence.txt from the MaxQuant_data in the nextcloud folder is well-formed, unlike the one from our example dataset (see #438, Allow abbreviated PTM identifiers in evidence).
  2. Import the evidence.txt mentioned above
  3. Connect the relevant inputs to the new step
  4. Observe the result in the visualisations tab, where the non-standard molecules can be highlighted by selecting "non-standard" from the components in the bottom right
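
Before running the steps above, it can help to check whether the evidence file contains any modified peptides for the chosen protein ID at all. The sketch below is a standalone check and not part of this PR. It assumes a tab-separated evidence.txt with `Proteins` (semicolon-separated IDs) and `Modifications` columns, where unmodified peptides carry the value `Unmodified`. The sample rows are made up for illustration.

```python
import csv
import io


def modified_rows_for_protein(evidence_file, protein_id):
    """Return evidence rows that mention protein_id and carry a modification."""
    reader = csv.DictReader(evidence_file, delimiter="\t")
    hits = []
    for row in reader:
        proteins = row["Proteins"].split(";")
        if protein_id in proteins and row["Modifications"] != "Unmodified":
            hits.append(row)
    return hits


# Made-up rows for illustration only; not taken from a real dataset.
SAMPLE = (
    "Sequence\tModifications\tProteins\n"
    "AELR\tUnmodified\tQ16555\n"
    "GKLS\tAcetyl (K)\tQ16555;P12345\n"
    "TSVR\tUnmodified\tO43242\n"
)

with_ptms = modified_rows_for_protein(io.StringIO(SAMPLE), "Q16555")
without_ptms = modified_rows_for_protein(io.StringIO(SAMPLE), "O43242")
print([row["Sequence"] for row in with_ptms])  # ['GKLS']
print(without_ptms)  # []
```

For a real file, pass an open file handle instead of the `io.StringIO` object, e.g. `open("evidence.txt", newline="")`.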

PR checklist

Development

  • If necessary, I have updated the documentation (README, docstrings, etc.)
  • If necessary, I have created / updated tests.

Mergeability

  • main-branch has been merged into local branch to resolve conflicts
  • The tests and linter have passed AFTER local merge
  • The backend code has been formatted with black
  • The frontend code has been formatted with pnpm format and checked with pnpm lint

Code review

  • I have self-reviewed my code.
  • At least one other developer reviewed and approved the changes

@tE3m tE3m force-pushed the 436-integrate-ptms-from-evidence-into-alphafold-predictions branch from 7f1e50b to ecd4429 Compare June 1, 2026 11:20
why do we even bother with this
@tE3m tE3m requested review from AnnaPolensky and Elena-kal June 1, 2026 11:43

@AnnaPolensky AnnaPolensky left a comment

Collaborator

Looks good overall. I especially liked your comments as they were very helpful for understanding what you do when you filter and change the dataframes.

But please provide a description on how to test your changes. I tried this:

Image

And got this error:

Image

Maybe I have uploaded a wrong evidence file? I used the one we got first right at the beginning of our project. For the AlphaFold step I used O43242.

Also, black fails.

Comment thread backend/protzilla/data_analysis/crosslinking_validation.py Outdated
Comment thread backend/protzilla/data_integration/cif_ptm.py
Comment thread backend/protzilla/data_integration/cif_ptm.py
Comment thread backend/protzilla/data_integration/cif_ptm.py
Comment thread backend/protzilla/data_integration/cif_ptm.py Outdated
Comment thread backend/protzilla/utilities/ptm_helpers.py
    cleaned_mods = []
    for mod in mod_list:
        mod = str(mod)
        cleaned_mods.append(mod.lstrip(digits).lstrip(" "))

Collaborator

This first removes leading digits and then leading spaces. If we had something like " 45something" only the spaces would be removed but not the digits. Is that ok/intended? If so, I might make the docstring a bit more specific about the order in which things are removed.
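
The ordering effect described here can be reproduced in isolation. The `clean` and `clean_any_order` helpers below are illustrative names, not functions from the PR. The second variant strips digits and spaces together and is shown only as a possible alternative, not as a required change.

```python
from string import digits


def clean(mod: str) -> str:
    # Same chain as in the quoted snippet: leading digits first, then spaces.
    return mod.lstrip(digits).lstrip(" ")


def clean_any_order(mod: str) -> str:
    # Alternative: strip leading digits and spaces regardless of their order.
    return mod.lstrip(digits + " ")


print(repr(clean("2 Oxidation (M)")))         # 'Oxidation (M)'
print(repr(clean(" 45something")))            # '45something'
print(repr(clean_any_order(" 45something")))  # 'something'
```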

Collaborator Author

The data is taken straight from MQ and processed, so we don't have to deal with user input here. As such, the format is <amount if more than 1> , so while what you describe could be caught, there is no need for it.

            str: The simplified modification name (i.e. Oxidation).
        """
        mod_name = str(mod_name)
        return mod_name.lstrip(digits).lstrip(" ").split(" ")[0]

Collaborator

Same as above.
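
For readers following the thread, the simplification quoted above can be tried on its own. This is a sketch: the function names are hypothetical because the quoted diff does not show them, and the comma-separated input cell is an assumption about how several modifications are listed in one MaxQuant entry.

```python
from string import digits
from typing import List


def simplify_mod_name(mod_name) -> str:
    """Drop a leading count and everything after the first space."""
    mod_name = str(mod_name)
    return mod_name.lstrip(digits).lstrip(" ").split(" ")[0]


def simplified_names(modifications_cell: str) -> List[str]:
    """Split a comma-separated cell and simplify each entry."""
    return [simplify_mod_name(part) for part in modifications_cell.split(",")]


print(simplified_names("Acetyl (Protein N-term),2 Oxidation (M)"))
# ['Acetyl', 'Oxidation']
```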

Comment thread backend/protzilla/utilities/ptm_helpers.py

Collaborator

Do we need to credit the source of these PTMs? (Same question for the other cif files as well.)

tE3m commented Jun 2, 2026

Collaborator Author

> Looks good overall. I especially liked your comments as they were very helpful for understanding what you do when you filter and change the dataframes.
>
> But please provide a description on how to test your changes. I tried this:
> Image
>
> And got this error:
> Image
>
> Maybe I have uploaded a wrong evidence file? I used the one we got first right at the beginning of our project. For the AlphaFold step I used O43242.
>
> Also, black fails.

Sorry about that. As this PR is still marked as a draft, I didn't expect you to get pinged. While writing up the testing steps, I ran into issues similar to yours (the one you show happens when there are no PTMs for the protein ID in the evidence), some of which only appeared after rebasing onto the current crosslinking state. As those issues are not yet fixed, I hadn't intended for the PR to be reviewed already. Sorry again!

github-actions Bot commented Jun 2, 2026

Coverage report

File: lines missing
  backend/protzilla/all_steps.py
  backend/protzilla/data_analysis/crosslinking_validation.py
  backend/protzilla/data_integration/cif_ptm.py: 88, 205-247, 356-404
  backend/protzilla/importing/alphafold_protein_structure_load.py: 175
  backend/protzilla/importing/peptide_import.py: 56-57
  backend/protzilla/methods/data_integration.py: 1008
  backend/protzilla/methods/importing.py: 441
  backend/protzilla/utilities/ptm_helpers.py: 37-43, 87, 126-129, 175-176, 209-210
  backend/protzilla/utilities/utilities.py: 191-193

This report was generated by python-coverage-comment-action

@tE3m tE3m marked this pull request as ready for review June 2, 2026 17:45

@AnnaPolensky AnnaPolensky left a comment

Collaborator

Changes look good. Works now, results look good 👍

Comment thread backend/tests/protzilla/data_integration/test_cif_ptm.py Outdated

@Elena-kal Elena-kal left a comment

Collaborator

The code has no major problems and looks really good overall. I left comments, most of them as suggestions. They are really nitpicky, which is why I mostly used suggestions, so hopefully there is not too much work on your side.

There is only one thing I noticed of which I am really unsure:
What does selecting "the PTMs to consider" actually mean? For me it did not change which PTMs were loaded in the cif, at least not from what I saw in the table and from the success message I received.
Also I could not find the button where I could then select non-standard, so I could not test that part but I hope that is fine :)

Overall, good work! 👍

Comment thread backend/protzilla/utilities/ptm_helpers.py
Comment thread backend/protzilla/utilities/ptm_helpers.py Outdated
Comment thread backend/protzilla/utilities/ptm_helpers.py Outdated
Comment thread backend/protzilla/utilities/ptm_helpers.py Outdated
Comment thread backend/protzilla/utilities/ptm_helpers.py Outdated
Comment thread backend/protzilla/utilities/ptm_helpers.py Outdated
Comment thread backend/protzilla/utilities/ptm_helpers.py Outdated
Comment thread backend/protzilla/utilities/ptm_helpers.py
Comment thread backend/protzilla/constants/cif_constants.py
)
)

return ids

Collaborator

Make sure to mark the uniprot ids as required in the import then. I think right now they are not and the format is not enforced the way you expect it here. We should probably enforce it in the import step already.

Collaborator Author

From the multimer upload method:

    if not uniprot_ids:
        msg = "Uniprot Ids cannot be empty or None."
        logger.error(msg)
        raise ValueError(msg)
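
If the import step should also enforce the ID format, as suggested above, one option is to extend the emptiness check with a pattern match. This is only a sketch and not code from the PR: the `validate_uniprot_ids` name, the assumption that the IDs arrive as a list of strings, and the second error message are all illustrative. The regular expression is the accession pattern published in the UniProt help pages.

```python
import re

# UniProt accession pattern as published in the UniProt help pages.
UNIPROT_ACCESSION = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def validate_uniprot_ids(uniprot_ids):
    """Reject empty input and IDs that are not well-formed accessions."""
    if not uniprot_ids:
        raise ValueError("Uniprot Ids cannot be empty or None.")
    # fullmatch ensures the whole string is an accession, not just a prefix.
    invalid = [uid for uid in uniprot_ids if not UNIPROT_ACCESSION.fullmatch(uid)]
    if invalid:
        raise ValueError(f"Malformed Uniprot Ids: {invalid}")
    return list(uniprot_ids)


print(validate_uniprot_ids(["Q16555", "O43242"]))  # ['Q16555', 'O43242']
```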

@tE3m tE3m self-assigned this Jun 9, 2026
tE3m commented Jun 9, 2026

Collaborator Author

> There is only one thing I noticed of which I am really unsure:
> What does selecting "the PTMs to consider" actually mean? For me it did not change which PTMs were loaded in the cif, at least not from what I saw in the table and from the success message I received.

@Elena-kal Are you sure about that? I just tested loading Q16555: when only selecting Acetylation on Lysine, there are 5 insertions, whereas when also considering e.g. Phosphorylation on Serine, 15 amino acids are altered.
Edit: however, I've found something interesting. The Mol* viewer only shows ALY (Acetylation on Lysine) as non-standard and applies the ball-and-stick representation to it. Even though other non-standard residues are present (e.g. SEP), they are not part of the non-standard group according to Mol*. No clue why, though.

tE3m and others added 2 commits June 11, 2026 08:47
Co-authored-by: Elena Kalbitzer <148279640+Elena-kal@users.noreply.github.com>
@tE3m tE3m requested review from AnnaPolensky and Elena-kal June 11, 2026 13:56
@tE3m tE3m linked an issue Jun 11, 2026 that may be closed by this pull request

@Elena-kal Elena-kal left a comment

Collaborator

lgtm

Development

Successfully merging this pull request may close these issues.

Integrate PTMs from evidence into Alphafold predictions

3 participants