436 integrate ptms from evidence into alphafold predictions #449
why do we even bother with this
Looks good overall. I especially liked your comments as they were very helpful for understanding what you do when you filter and change the dataframes.
But please provide a description on how to test your changes. I tried this:
And got this error:
Maybe I have uploaded a wrong evidence file? I used the one we got first right at the beginning of our project. For the AlphaFold step I used O43242.
Also, black fails.
cleaned_mods = []
for mod in mod_list:
    mod = str(mod)
    cleaned_mods.append(mod.lstrip(digits).lstrip(" "))
This first removes leading digits and then leading spaces. If we had something like " 45something" only the spaces would be removed but not the digits. Is that ok/intended? If so, I might make the docstring a bit more specific about the order in which things are removed.
The data is taken straight from MQ and processed, so we don't have to deal with user input here. As such, the format is <amount if more than 1> , so although what you describe could be caught, there is no need for it.
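To make the ordering question concrete, here is a small sketch of the chained `lstrip` calls from the diff next to a single-pass variant. The function names and the sample strings (including `2 Oxidation (M)`) are assumptions for illustration and are not taken from the PR.

```python
from string import digits


def strip_count_prefix(mod) -> str:
    """Same chain as in the diff: leading digits first, then leading spaces."""
    return str(mod).lstrip(digits).lstrip(" ")


def strip_count_prefix_any_order(mod) -> str:
    """Hypothetical alternative: strip digits and spaces in a single pass."""
    return str(mod).lstrip(digits + " ")


# Sample inputs are assumed, not real evidence data.
for raw in ["2 Oxidation (M)", "Oxidation (M)", " 45something"]:
    chained = strip_count_prefix(raw)
    single = strip_count_prefix_any_order(raw)
    print(repr(raw), "->", repr(chained), "|", repr(single))
```

For the space-first input, the chained version keeps the digits while the single-pass version removes them. If the input format is guaranteed as described in the reply, both behave the same on real data.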
        str: The simplified modification name (i.e. Oxidation).
    """
    mod_name = str(mod_name)
    return mod_name.lstrip(digits).lstrip(" ").split(" ")[0]
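As a rough illustration of what this return expression yields, the sketch below wraps it in a function with a hypothetical name (`simplify_mod_name`; the real name is not visible in the excerpt) and runs it on assumed inputs. It also shows one consequence of the `str()` cast: a missing value such as a float NaN does not raise but comes back as the string `nan`, which callers may want to filter out.

```python
from string import digits


def simplify_mod_name(mod_name) -> str:
    """Hypothetical wrapper around the expression shown in the diff."""
    mod_name = str(mod_name)
    # Drop a leading count, then the separating space, then keep the first word.
    return mod_name.lstrip(digits).lstrip(" ").split(" ")[0]


# Assumed example values; the exact strings in a real evidence file may differ.
examples = ["2 Oxidation (M)", "Acetyl (Protein N-term)", float("nan"), ""]
for value in examples:
    print(repr(value), "->", repr(simplify_mod_name(value)))
```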
Do we need to credit the source from where we got these PTMs? (same question for the other cif files as well)
Elena-kal
left a comment
The code has no major problems and looks really good overall. I left a few comments, mostly as suggestions. They are really nitpicky, which is why I made them suggestions, so there should not be too much work on your side, hopefully.
There is only one thing I noticed of which I am really unsure:
What does selecting "the PTMs to consider" actually mean? For me it did not change which PTMs were loaded in the cif, at least not from what I saw in the table and from the success message I received.
Also I could not find the button where I could then select non-standard, so I could not test that part but I hope that is fine :)
Overall, good work! 👍
)
)

return ids
Make sure to mark the uniprot ids as required in the import then. I think right now they are not and the format is not enforced the way you expect it here. We should probably enforce it in the import step already.
From the multimer upload method:
if not uniprot_ids:
    msg = "Uniprot Ids cannot be empty or None."
    logger.error(msg)
    raise ValueError(msg)
@Elena-kal Are you sure about that? I just tested loading |
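For the suggestion to enforce the ID format at import time, a minimal sketch could combine the emptiness check quoted above with a pattern check. The function name `validate_uniprot_ids` is hypothetical and not part of the PR; the regular expression follows the accession format published by UniProt, and whether the project wants this strictness is an assumption.

```python
import re

# UniProt accession pattern as documented by UniProt (6 or 10 characters).
UNIPROT_ACCESSION = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def validate_uniprot_ids(uniprot_ids):
    """Hypothetical import-time check: non-empty and well-formed accessions."""
    if not uniprot_ids:
        raise ValueError("Uniprot Ids cannot be empty or None.")
    invalid = [uid for uid in uniprot_ids if not UNIPROT_ACCESSION.fullmatch(uid)]
    if invalid:
        raise ValueError(f"Malformed Uniprot Ids: {invalid}")
    return list(uniprot_ids)


# The two accessions mentioned in this conversation pass the check.
print(validate_uniprot_ids(["O43242", "Q16555"]))
```

Raising early in the import step would let later steps rely on the format instead of re-checking it.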
Co-authored-by: Elena Kalbitzer <148279640+Elena-kal@users.noreply.github.com>


Description
fixes #436
Introduces a new step to enable insertion of modified residues into a cif_df where they are detected in a supplied psm_df. Requires outputs of an alphafold-import step.
Changes
- backend/user_data/external/ptm/*: location of the modified residue structure files
- backend/protzilla/constants/cif_columns.py: add required columns and a KnownPTM enum, which holds all PTM combinations we currently have structure files for
- backend/protzilla/constants/peptide_columns.py: refactor column lists to enums to allow reuse
- backend/protzilla/importing/peptide_import.py: slight adaptations due to change above
- backend/protzilla/data_analysis/crosslinking_validation.py: refactor former local get_crosslink_positions_in_protein method to allow reuse outside of the crosslinking use case, rename to reflect more general use-case
- backend/protzilla/data_integration/cif_ptm.py: add calc method for new step
- backend/protzilla/importing/alphafold_protein_structure_load.py: move the mapping from y/n/. strings to native booleans to constants file
- backend/protzilla/methods/importing.py: add a raw cif import for debugging purposes, to be extended by Integrate PTMs from evidence into conventional protein structure files #437
- backend/protzilla/utilities/ptm_helpers.py: helper functions for parsing PTM names from a psm_df
Testing
Q16555 because the evidence.txt from the MaxQuant_data in the nextcloud folder is well-formed, unlike the one from our example dataset (see Allow abbreviated PTM identifiers in evidence #438).
PR checklist
Development
Mergeability
black
pnpm format and checked with pnpm lint
Code review