prepmd package¶

Subpackages¶

prepmd.lib package

Submodules¶

prepmd.add_modeller_license module¶

Change the modeller license key

prepmd.add_modeller_license.setup_license(key)[source]¶

prepmd.add_modeller_license.entry_point()[source]¶

prepmd.align_together module¶

Trim the ends off two PDB files until their indices match. Note: this is a big pile of spaghetti; doing this in a robust way turns out to be quite difficult. For example, I went to all the effort of getting positions of residues relative to the start and end of each chain, rather than measuring them from the start, only to discover that Bio.PDB throws weird errors when you do 'for element in reversed(object)'.

prepmd.align_together.align_modeller(s1, s2, code1='AAAA', code2='BBBB', outfile='alignment.out')[source]¶

prepmd.align_together.get_gaps(alignment_fasta_path)[source]¶

prepmd.align_together.print_mask(mask)[source]¶

prepmd.align_together.mark_missing_ends(gaps)[source]¶

prepmd.align_together.mark_innermost_gap(gaps)[source]¶

prepmd.align_together.get_ends_deletion(missing_seq)[source]¶

prepmd.align_together.remove_from_model(structure, number_start, number_end, verbose=True)[source]¶

prepmd.align_together.align_together(seq1path, seq2path, seq1out, seq2out, code1='AAAA', code2='BBBB', verbose=True)[source]¶

prepmd.align_together.entry_point()[source]¶

CLI entry point function. Uses sys.argv and argparse args object.

prepmd.analysis module¶

Created on Mon May 11 14:39:36 2026

@author: rob

prepmd.analysis.get_selection_traj(universe, selection)[source]¶

prepmd.analysis.dist(a, b)[source]¶

prepmd.analysis.get_ligand_centroid_traj(top, traj, ignore=['UNK'], output_trajs=True, print_dists=True)[source]¶

prepmd.download module¶

Download data from the PDB and UNIPROT

prepmd.download.get_em_map(emdb_id, directory)[source]¶

Download an EM density map from the EMDB.

Args:

emdb_id: id of the em map to download, a string
directory: directory to download the file into, a string

Returns: path to the downloaded file.

prepmd.download.get_structure(pdb_id, directory, file_format='mmCif', redo=False)[source]¶

Download a structure from the PDB.

Args:

pdb_id: id of the pdb to download, a string
directory: directory to download the file into, a string
file_format: mmCif or pdb, a string

Returns: path to the downloaded file.
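As a rough illustration of what a download like this involves, the sketch below builds an RCSB download URL and saves the file into a directory. It is not necessarily how prepmd does it: the names structure_url and fetch_structure are hypothetical, and the use of the files.rcsb.org download endpoint is an assumption about the data source.

```python
import os
import urllib.request

# Public RCSB download endpoint; that prepmd uses it is an assumption.
RCSB_DOWNLOAD = "https://files.rcsb.org/download/{pdb_id}.{ext}"


def structure_url(pdb_id, file_format="mmCif"):
    """Build the download URL for a PDB entry (hypothetical helper)."""
    ext = {"mmCif": "cif", "pdb": "pdb"}[file_format]
    return RCSB_DOWNLOAD.format(pdb_id=pdb_id.upper(), ext=ext)


def fetch_structure(pdb_id, directory, file_format="mmCif"):
    """Download the entry into `directory` and return the local file path."""
    url = structure_url(pdb_id, file_format)
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, url.rsplit("/", 1)[-1])
    urllib.request.urlretrieve(url, path)  # needs network access
    return path


print(structure_url("1ubq"))
```

Keeping the URL construction separate from the network call makes the format handling easy to check without downloading anything.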

prepmd.download.get_uniprot_sequence(pdb_id, merge_sequence=True, write_to_file=None, verbose=False)[source]¶

For a given pdb id, find the fasta sequence for all chains from UNIPROT.

Args:

pdb_id: the id of the pdb, a string
merge_sequence: whether to merge the sequences together into a single fasta file, a bool
write_to_file: path of file to write to, a string
verbose: whether to write debug info out, a bool

Returns: the fasta sequences as a dictionary keyed by pdb id, or a string with the fasta sequence.

prepmd.fix module¶

Fix structures using PDBFixer

prepmd.fix.fix_nonstandard_cif(cif)[source]¶

Remove metadata written by MODELLER to cif files, which can prevent those files from being read by other software.

prepmd.fix.remove_hetatms_unk(pdb, out)[source]¶

prepmd.fix.fix(pdb, out, remove_heterogens=True, add_hydrogens=True, ph=7.0)[source]¶

Fix a pdb file with pdbfixer.

Args:

pdb: path to a pdb or mmcif file, a string
out: output file path, a string
remove_heterogens: if true, will remove heterogens
add_hydrogens: if true, will add hydrogens
ph: the ph to aim for when adding hydrogens, a float

prepmd.fix.restore_metadata_pdb(pdb, fixed_pdb)[source]¶

Copy metadata from one pdb file to another. Useful as the output of pdbfixer and other tools often doesn’t contain REMARKS and such.

Args:

pdb: original pdb file path containing metadata, a string
fixed_pdb: new pdb file path, containing no metadata, a string

Returns: nothing, but updates the contents of fixed_pdb
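The idea of carrying metadata across can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: copy_metadata is a hypothetical name, and the set of record types treated as metadata is a guess rather than the list prepmd actually uses.

```python
# Record names treated as "metadata" here; this selection is an assumption.
METADATA_RECORDS = ("HEADER", "TITLE", "COMPND", "SOURCE", "REMARK", "SEQRES")


def copy_metadata(original_path, fixed_path):
    """Prepend metadata records from original_path to fixed_path, in place."""
    with open(original_path) as handle:
        # str.startswith accepts a tuple of candidate prefixes
        metadata = [line for line in handle if line.startswith(METADATA_RECORDS)]
    with open(fixed_path) as handle:
        body = handle.readlines()
    with open(fixed_path, "w") as handle:
        handle.writelines(metadata + body)
```

The sketch rewrites the fixed file in place, mirroring the documented behaviour of returning nothing and updating fixed_pdb.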

prepmd.get_residues module¶

Read residue information from structure files

prepmd.get_residues.get_residues_pdb(pdb, code, get_hetatms=False)[source]¶

Get the fasta sequence of residues in the ATOM entries of a PDB or mmCif file.

Args:

pdb: path to pdb file, a string
code: PDB code

Returns: the fasta sequence as a string

prepmd.get_residues.get_fullseq_pdb(pdb, code, get_hetatms=False)[source]¶

Get the fasta sequence of residues in the SEQRES records of a PDB/mmCif file.

Args:

pdb: path to pdb/mmcif file, a string
code: PDB code

Returns: the fasta sequence as a string
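For readers unfamiliar with SEQRES records, here is a minimal sketch of reading them from PDB-format text. It relies on the fixed-column PDB layout (chain ID in column 12, residue names from column 20). The name seqres_to_sequences is hypothetical, the per-chain dictionary return is an illustrative choice, and mapping unknown residues to 'X' is an assumption; prepmd may handle all of these differently.

```python
CODES3 = ("ALA ARG ASN ASP CYS GLN GLU GLY HIS ILE "
          "LEU LYS MET PHE PRO SER THR TRP TYR VAL").split()
THREE_TO_ONE = dict(zip(CODES3, "ARNDCQEGHILKMFPSTWYV"))


def seqres_to_sequences(pdb_lines):
    """Return {chain_id: one-letter sequence} from SEQRES records."""
    chains = {}
    for line in pdb_lines:
        if not line.startswith("SEQRES"):
            continue
        chain_id = line[11]                   # column 12 of the record
        for resname in line[19:70].split():   # residue names start at column 20
            # Unknown residues become 'X' (an assumption for this sketch)
            chains.setdefault(chain_id, []).append(THREE_TO_ONE.get(resname, "X"))
    return {chain: "".join(seq) for chain, seq in chains.items()}


# Build a correctly aligned SEQRES line rather than typing the spaces by hand
example = f"SEQRES {1:>3} A {3:>4}  MET GLN ILE"
print(seqres_to_sequences([example]))
```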

prepmd.ligand module¶

Created on Fri Feb 27 16:49:54 2026

@author: rob

prepmd.ligand.load_universe(pdb_path)[source]¶

Create an mdanalysis universe from a structure file. This uses a quick and dirty hack - going via openmm - to get an mmcif file into mdanalysis, as mdanalysis does not natively support mmCif files, despite the fact that the pdb format has been deprecated for two years (!) at time of writing.

Args:

pdb_path: path to a pdb or cif file

Returns: an mdanalysis Universe object

prepmd.ligand.pdb_back_to_mmcif(inpdb, outcif)[source]¶

prepmd.ligand.split_pdb_ligand(pdb_path, new_pdb_output=None, neutralise_radicals=True)[source]¶

Extract ligands from a structure file (pdb or mmcif) and write out a new, ligand/hetatm-free structure file and sdf files containing the ligands.

Args:

pdb_path: path to input pdb or mmcif file, a string
new_pdb_output: filename for the new pdb file, a string. If not set, the existing pdb file will be overwritten.
neutralise_radicals: whether to remove free radicals, a bool

Returns: a list of strings, where each string is the filename of an sdf file that has been written. The files are named according to their residue.

prepmd.ligand.solvate_openff(interchange, protein, ligands, n_water=None, gutter_nm=1.5, positive_ion_smile='[Na+]', negative_ion_smile='[Cl-]', desired_charge_elementary=0, water_ff='tip3p', solvent_smiles='O', ff='openff-2.2.0.offxml', solvent_name='HOH')[source]¶

Solvate an openff system.

Args:

interchange: an openff interchange containing the protein AND ligand
protein: protein loaded as an openff topology
ligands: a list of openff molecule objects
n_water: number of water molecules, an int. Will be calculated if this is not set.
gutter_nm: size of the gap between the protein and box edge in nm, a float
positive_ion_smile: smiles string for the positive ions
negative_ion_smile: smiles string for the negative ions
desired_charge_elementary: desired charge in units of the elementary charge
water_ff: force field to use for the water. Can be tip3p or tip4pew.
solvent_smiles: smiles string for the solvent molecule
ff: force field to use, a string. Can be any openff force field.
solvent_name: name of solvent molecule (for topology!), a string

Returns: openff topology and an interchange for just the waters.

prepmd.ligand.create_ligand_system(protein_structure, ligand_structures, n_water=None, water_ff='tip3p', solvent_smiles='O', gutter_nm=1, positive_ion_smile='[Na+]', negative_ion_smile='[Cl-]', desired_charge_elementary=0, ff='ff14sb_off_impropers_0.0.4.offxml', ligand_ff='openff-2.2.0.offxml', output_topology=None, solvent_name='HOH', ligands_names=None)[source]¶

Set up an openmm system containing ligands using openff.

Args:

protein_structure: structure file path, pdb or mmcif, a string
ligand_structures: a list of strings where each string is a path to an sdf file
n_water: number of water molecules, an int. If None, it is calculated automatically.
positive_ion_smile: smiles string for the positive ions
negative_ion_smile: smiles string for the negative ions
desired_charge_elementary: desired charge in units of the elementary charge
water_ff: force field to use for the water. Can be tip3p or tip4pew.
solvent_smiles: smiles string for the solvent molecule
ff: force field. For now, this is ff14sb_off_impropers_0.0.4.offxml as it’s the only one openff properly supports.
ligand_ff: force field to use for the ligand, a string. Can be any small molecule force field supported by openff.
solvent_name: solvent name to appear in the topology, a string
ligands_names: names of ligands to be written to the topology, a list of strings

Returns: an openmm system, topology and positions object.

prepmd.metadynamics module¶

Run metadynamics simulations - used in run.py

exception prepmd.metadynamics.StopSimulation[source]¶

Bases: Exception

Exception raised in order to stop simulation

exception prepmd.metadynamics.RMSDIncrease[source]¶

Bases: Exception

class prepmd.metadynamics.RMSDReporter(file, reportInterval, ref_positions, threshold_nm=0.17, consecutive_frames=2)[source]¶

Bases: object

describeNextReport(simulation)[source]¶

report(simulation, state)[source]¶

prepmd.metadynamics.get_representative_rmsd_frames(rmsd_log, num_frames)[source]¶

prepmd.model module¶

Calls to the MODELLER API

prepmd.model.retitle_alignment(alignmentfile, title1, title2, metadata1='sequence:::::::::', metadata2='sequence:::::::::')[source]¶

Alignment files generated by BioPython aren’t formatted in a way that modeller can easily read, so this function reformats them with correct titles and header info.

Args:

alignmentfile: path to alignment file, a string
title1: title of the first sequence, a string
title2: title of the second sequence, a string
metadata1: metadata for the first sequence, a string
metadata2: metadata for the second sequence, a string

The metadata strings must contain at least the substring ‘:::::::::’.

Returns: none, but overwrites alignmentfile
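To show why the metadata string needs its run of colons, the sketch below formats a single entry in the PIR layout that MODELLER reads: a '>P1;' title line, a colon-separated header line with ten fields, and the sequence terminated by '*'. The name pir_entry is hypothetical, and how prepmd combines titles and metadata internally is not shown in this page, so treat this as an illustration of the file format only.

```python
# Ten colon-separated header fields means nine colons
FIELD_SEPARATORS = ":" * 9
DEFAULT_METADATA = "sequence" + FIELD_SEPARATORS


def pir_entry(title, sequence, metadata=DEFAULT_METADATA):
    """Format one sequence as a PIR entry (hypothetical helper)."""
    if FIELD_SEPARATORS not in metadata:
        raise ValueError("metadata must contain the substring " + FIELD_SEPARATORS)
    # Title line, header line, then the sequence terminated by '*'
    return f">P1;{title}\n{metadata}\n{sequence}*\n"


print(pir_entry("AAAA", "MQIFVKT"))
```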

prepmd.model.fasta(fasta_name, include_metadata=False)[source]¶

Read data from a fasta file.

Args:

fasta_name: filename of fasta file, a string
include_metadata: reads metadata if true, a bool

Returns: a list of lists. The first list is the sequence in the file; the second are the different blocks in the file: first the name/id, then the sequence, and the metadata (if requested).

prepmd.model.get_alignment_info(alignmentout)[source]¶

For a FASTA-formatted alignment file, get the number of residues filled, number of gaps, and largest gap.

Args:

alignmentout: path to an alignment file, a string

Returns:

total_resididues_filled: total number of residues to be added
total_gaps_filled: number of gaps to be filled
filled_residues: missing residues
filled_gaps: gaps
max_gap: largest gap
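The kind of bookkeeping described above can be sketched on a single aligned sequence, where '-' marks positions missing from the structure. The name gap_summary and the dictionary keys are hypothetical, and the real function reads a FASTA alignment file and returns more than this; the sketch only shows how gap counts and the largest gap fall out of the runs of '-'.

```python
import re


def gap_summary(aligned_structure_seq):
    """Summarise runs of '-' in one aligned sequence (hypothetical helper)."""
    # Each match is one contiguous gap; its length is the residues to fill
    gap_lengths = [m.end() - m.start()
                   for m in re.finditer(r"-+", aligned_structure_seq)]
    return {
        "total_residues": sum(gap_lengths),
        "total_gaps": len(gap_lengths),
        "max_gap": max(gap_lengths, default=0),
    }


print(gap_summary("MQ--IFV-KT"))
```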

prepmd.model.get_objective_functions(pdbs)[source]¶

For a list of PDB or mmCif files generated by modeller, get the objective function (which measures the quality of the model) and similarity (of the model sequence and the sequence used to fill the missing loops) for each pdb.

Args:

pdbs: a list of strings, paths to pdb files

Returns: scores, similarities, two dictionaries keyed by file path containing the scores and similarities.

prepmd.model.get_best_pdb(directory, exts=['pdb', 'cif', 'mmcif', 'mmCif'], em_map=None, em_contour_level=None)[source]¶

For a directory, find all of the pdbs generated by modeller and select the one with the highest objective function (arbitrary metric used by modeller).

Args:

directory: the directory to scan, a string
exts: file extensions to check for, a list of strings
em_map: path to an EM density map file, a string. If this is set, the best PDB will be picked based on similarity to the map.
em_contour_level: contour level for the EM map, a float

Returns: path to the file with the highest objective function, a string

prepmd.model.fix_missing_residues(code, fastafile, alignmentout, inmodel, outmodel, wdir, num_models=1, em_map=None, em_contour=None, keep_hetatms=False)[source]¶

For a given structure, fill in missing loops using modeller.

Args:

code: the pdb id of the structure, a string
fastafile: path to the fasta sequence, a string
alignmentout: filename of the temporary alignment output file, a string
inmodel: input structure. Note: due to the limitations of modeller, this must be named according to the pdb id!
outmodel: output structure file, a string
wdir: working directory, a string. Will be created if it doesn’t exist.
num_models: how many models to generate, an int

Returns: nothing, but writes out outmodel and wdir.

prepmd.pdb2pqr module¶

Created on Mon Feb 9 14:04:25 2026

@author: rob

prepmd.pdb2pqr.run_pdb2pqr(infile, outfile, ff='AMBER', ph=7)[source]¶

Run PDB2PQR.

Args:

infile: path to input file, a string
outfile: path to output PQR file, a string
ff: force field to use for calculations and residue naming

Returns: nothing, but writes ‘outfile’

prepmd.point_cloud module¶

Created on Fri Feb 13 15:41:18 2026

@author: rob

prepmd.point_cloud.to_point_cloud(mrcdata, voxel, contour_level)[source]¶

Convert an EM density map to a point cloud.

Args:

mrcdata: ndarray of size resolution x resolution x resolution containing MRC data
voxel: voxel size, from the mrcfile library. Should contain three member variables, x, y, and z (floats), for the voxel size in those dimensions.
contour_level: density above which to add a point to the point cloud, a float

Returns: point cloud as an ndarray with three columns (x, y, z) and a row for each point.
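The thresholding step can be illustrated without any dependencies. The sketch below uses nested lists in place of an ndarray and a SimpleNamespace in place of the mrcfile voxel object; the name density_to_points is hypothetical, and the (z, y, x) index order of the grid is an assumption about how the data is laid out.

```python
from types import SimpleNamespace


def density_to_points(grid, voxel, contour_level):
    """Return (x, y, z) coordinates of every voxel above contour_level."""
    points = []
    for k, plane in enumerate(grid):            # z index (assumed outermost)
        for j, row in enumerate(plane):         # y index
            for i, density in enumerate(row):   # x index
                if density > contour_level:
                    # Scale grid indices by the voxel size in each dimension
                    points.append((i * voxel.x, j * voxel.y, k * voxel.z))
    return points


grid = [[[0.0, 1.0], [0.0, 0.0]],
        [[0.0, 0.0], [0.0, 2.0]]]
voxel = SimpleNamespace(x=1.5, y=1.5, z=2.0)
print(density_to_points(grid, voxel, contour_level=0.5))
```

With real map data the same selection would normally be done with array operations rather than Python loops.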

prepmd.point_cloud.score_pdb_map(pdb, em_map, contour_level)[source]¶

For a given pdb and em_map, convert them to point clouds and score their similarity based on the error in an alignment between two point clouds.

Args:

pdb: path to a pdb file, a string
em_map: path to an em map file for the same structure as that pdb, a string
contour_level: density above which to add a point to the point cloud, a float

Returns: the error, a float
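To give a feel for what an alignment error between two point clouds means, here is a deliberately simplified stand-in: both clouds are centred on their centroids (translation only, no rotation), and the error is the mean distance from each point in one cloud to its nearest neighbour in the other. This is not the alignment prepmd performs, which is not described on this page; the names centre and mean_nearest_distance are hypothetical.

```python
import math


def centre(points):
    """Translate a cloud so that its centroid sits at the origin."""
    n = len(points)
    centroid = [sum(p[axis] for p in points) / n for axis in range(3)]
    return [tuple(p[axis] - centroid[axis] for axis in range(3)) for p in points]


def mean_nearest_distance(cloud_a, cloud_b):
    """Mean nearest-neighbour distance from cloud_a to cloud_b after centring."""
    a, b = centre(cloud_a), centre(cloud_b)
    return sum(min(math.dist(p, q) for q in b) for p in a) / len(a)


same_shape = mean_nearest_distance([(0, 0, 0), (2, 0, 0)],
                                   [(5, 5, 5), (7, 5, 5)])
print(same_shape)  # identical shapes differ only by a translation
```

A lower value means the two clouds overlap more closely, so a score of this kind is an error to minimise rather than a similarity to maximise.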

prepmd.prep module¶

Prepare structures from the PDB for molecular dynamics.

prepmd.prep.id_generator(size=6, chars='ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789')[source]¶

Generate a random ID (6 characters by default). Used for naming scratch directories.

Args:

size: length of string, an int
chars: chars to use, a string

Returns: the string
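A generator of this kind is typically a one-liner. The sketch below shows one plausible form using the standard library; make_id is a hypothetical name and this is not necessarily how prepmd implements it.

```python
import random
import string


def make_id(size=6, chars=string.ascii_uppercase + string.digits):
    """Return a random string of `size` characters drawn from `chars`."""
    return "".join(random.choice(chars) for _ in range(size))


scratch_name = make_id()
print(scratch_name)
```

IDs like this are fine for naming scratch directories, but random.choice is not suitable where unpredictability matters.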

prepmd.prep.prep(code, outmodel, workingdir, folder=None, fastafile=None, inmodel=None, alignmentout='alignment_out.fasta', download_format=None, quiet=False, fix_after=True, download_sequence=False, fix_missing_atoms=True, write_metadata='prepmeta.json', pqrff='AMBER', pqr_out=None, redo=False, num_models=1, em_map=None, em_contour=None, split_hetatms=False, no_modeller=False, ph=7.0)[source]¶

Prepare a PDB/MMCIF structure file for simulation.

Args:

code: pdb code, a string
outmodel: path to output file, a string
workingdir: path to working directory
folder: input folder, containing structure file/fasta sequences, a string. Used instead of the pdb code if provided.
fastafile: path to fasta-formatted text file containing the original sequence for the structure file, a string. If no fastafile is provided the sequence will instead be found from the structure file metadata.
inmodel: path to input structure file, a string
alignmentout: output file from the alignment of the sequence from the structure file and the original sequence
download_format: format of the structure file, a string. Can be ‘mmCif’ or ‘pdb’.
quiet: if true, don’t print anything, a bool
fix_after: if true, fix the pdb file after loop building. If false, fix it before. A bool
download_sequence: get the fasta sequence for the structure from UNIPROT instead of the pdb metadata or an external file, a bool. Note: the UNIPROT sequence is normally very different from the pdb sequence, so the alignment might fail.
fix_missing_atoms: whether to add missing atoms with pdbfixer, a bool
pqrff: force field to use for the PQR creation, a string. Can be AMBER or CHARMM. Note: when I tested, CHARMM was buggy.
pqr_out: output PQR filename, a string
redo: if True, will download PDB from PDB-REDO instead of the regular PDB
num_models: number of models to generate, an int. If >1, the best model will be selected based on MODELLER’s internal scoring metric.
em_map: path to an EM density map file, a string. If this is set, the best PDB will be picked based on similarity to the map.
em_contour: contour level for the EM map, a float
split_hetatms: if this is set to True, hetatms will be written to sdf files, otherwise they will be removed
ph: ph to target when adding hydrogens, a float

Returns: nothing, but writes out a file to outmodel.

prepmd.prep.entry_point()[source]¶

CLI entry point function. Uses sys.argv and argparse args object.

prepmd.run module¶

Use OpenMM to run simulations

prepmd.run.make_forcefield(ff, solvent=None)[source]¶

prepmd.run.check_platforms()[source]¶

prepmd.run.test_sim(pdb, no_output=False)[source]¶

prepmd.run.run(pdb, minimised_structure_out=None, traj_out=None, max_minimise_iterations=500, minimise_error=None, test_sim_steps=500, md_steps=None, md_timestep=None, forcefield_str=None, integrator_str='LangevinMiddleIntegrator', friction_coeff=1 /ps, temperature=300 K, minimise=True, test_run=True, fix_backbone=False, constraints_str='Default', solvent=None, strip_solvent=False, ionic_strength=0.0 M, pressure=None, step=1000, thermo_out_file='thermo.txt', non_bonded_method_str='Default', checkpoint_output='checkpoint.dat', verbose=True, write_params='params.json', metadynamics_morph=None, meta_rmsd_threshold_nm=0.17, no_output=False, ligands=[], ligands_names=None)[source]¶

Run an MD simulation from a pdb/mmcif structure created with prepmd.

Args:

minimised_structure_out: filename to write the final minimised structure to, a string
traj_out: output trajectory filename (xtc or dcd) if MD runs, a string
max_minimise_iterations: maximum iterations of the energy minimisation algorithm, an int
minimise_error: error tolerance for the variable langevin integrator, a float. The value is arbitrary; 0.001 is a good starting point. Increasing it will make the simulation run faster at the expense of accuracy.
test_sim_steps: how many steps of the test simulation to run, an int. This isn’t production MD, just the simulation that checks that your structure doesn’t have any steric clashes.
md_steps: number of steps for production md, an int
md_timestep: md simulation timestep, a float. Multiply by an openmm time unit.
forcefield_str: which MD forcefield to use, a string. Valid forcefields: charmm36, amber14, amber19 (if using a recent openmm version), amoeba.
integrator_str: which integrator to use, a string. Valid choices are LangevinMiddleIntegrator and VariableLangevinIntegrator. Starting with the middle integrator is probably best, as the variable langevin integrator will be used automatically if the test simulation crashes.
friction_coeff: the friction coefficient which couples the system to the heat bath, a float. Divide by an openmm time unit.
temperature: simulation temperature, a float. Multiply by an openmm temperature unit.
minimise: whether to run minimisation, a bool. You almost certainly want to turn this on, unless you are using an already minimised structure.
test_run: whether to run a short test MD run, a bool
fix_backbone: whether to fix the backbone in place (for example, if you’re resolving the positions of side chains), a bool
constraints_str: whether to constrain HBonds or not, a string. Possible values: None, HBonds. By default, HBonds will be constrained if the backbone is not fixed.
solvent: solvent to use, a string. Possible values: None (no solvent), tip3p, tip4pew, spce, implicit (gbn model, equivalent to AMBER igb=8).
write_solvent: whether to write solvent atoms to the minimised structure file, a bool. Solvent will always be written to the trajectory.
ionic_strength: ionic strength of the solvent, a float. Multiply by openmm molar.
pressure: pressure coupling via monte carlo barostat, a float. If None, no pressure coupling will be used, otherwise specify a pressure multiplied by an openmm pressure unit.
step: how often to write out traj/thermo information, an int
thermo_out_file: name of a file to write thermo information to, a string
non_bonded_method_str: method for calculating long-range interactions, a string. Can be one of: PME, CutoffPeriodic, CutoffNonPeriodic. If this is not set, it will be picked automatically.
checkpoint_output: name of checkpoint file to write to, a string
write_params: name of a file to write all the simulation params to, a string
ligands: list of ligand filenames (I think in sdf format)
ligands_names: a list of strings of names of ligands, used to output a ligand topology. Can run without this but ligands will show up as UNK.

prepmd.run.entry_point()[source]¶

CLI entry point function. Uses sys.argv and argparse args object.

prepmd.util module¶

Utility functions

prepmd.util.is_residue(resid)[source]¶

Check if a string is a valid residue.

Args:

resid: the string

Returns: a boolean that is true if the string is a valid residue

prepmd.util.is_residue_sequence(sequence)[source]¶

Check if a sequence only contains valid residues.

Args:

sequence: an iterable

Returns: true if every item in the iterable is a valid residue, otherwise false
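One plausible reading of these two checks is a membership test against the standard amino acid codes. The sketch below assumes three-letter codes and the 20 standard residues only, which may not match what prepmd accepts; looks_like_residue and all_residues are hypothetical names.

```python
# The 20 standard amino acids; treating only these as valid is an assumption.
STANDARD_RESIDUES = frozenset(
    "ALA ARG ASN ASP CYS GLN GLU GLY HIS ILE "
    "LEU LYS MET PHE PRO SER THR TRP TYR VAL".split()
)


def looks_like_residue(resid):
    """True if resid is a three-letter code for a standard amino acid."""
    return isinstance(resid, str) and resid.upper() in STANDARD_RESIDUES


def all_residues(sequence):
    """True if every item in the iterable passes looks_like_residue."""
    return all(looks_like_residue(item) for item in sequence)


print(looks_like_residue("ALA"), all_residues(["MET", "HOH"]))
```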

prepmd.util.three_to_one(resid, ignore_non_standard=False)[source]¶

Convert the three-character residue representation found in SEQRES records to the one-character FASTA representation.

Args:

resid: the residue, a string
ignore_non_standard: ignore non-standard residues instead of throwing an error, a bool

Returns: the fasta residue, a string

prepmd.util.three_to_one_sequence(resids)[source]¶

Convert the three-character residue representation found in SEQRES records to the one-character FASTA representation.

Args:

resids: an iterable of strings containing three-character residue codes

Returns: the residue sequence in FASTA format
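The conversion itself is a table lookup. The sketch below shows the idea for the 20 standard amino acids; to_one_letter and to_one_letter_sequence are hypothetical names, and returning an empty string for an ignored non-standard residue is an assumption, since this page does not say what prepmd returns in that case.

```python
CODES3 = ("ALA ARG ASN ASP CYS GLN GLU GLY HIS ILE "
          "LEU LYS MET PHE PRO SER THR TRP TYR VAL").split()
THREE_TO_ONE = dict(zip(CODES3, "ARNDCQEGHILKMFPSTWYV"))


def to_one_letter(resid, ignore_non_standard=False):
    """Map a three-letter residue code to its one-letter FASTA code."""
    try:
        return THREE_TO_ONE[resid.upper()]
    except KeyError:
        if ignore_non_standard:
            return ""  # assumption: skipped residues contribute nothing
        raise ValueError(f"non-standard residue: {resid!r}") from None


def to_one_letter_sequence(resids):
    """Convert an iterable of three-letter codes to a one-letter string."""
    return "".join(to_one_letter(resid) for resid in resids)


print(to_one_letter_sequence(["MET", "GLN", "ILE"]))
```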
