University College London
9 files

Protein structures predicted using DMPfold2, plus training data

Version 3 2022-01-27, 09:51
Version 2 2021-10-26, 16:06
Version 1 2021-07-22, 11:00
dataset
posted on 2022-01-27, 09:51 authored by Shaun Kandathil, Andy Lau, Joe Greener, David Jones
This dataset comprises predicted protein structures from the paper "Ultrafast end-to-end protein structure prediction enables high-throughput exploration of uncharacterized proteins". Structures were predicted using DMPfold2.

BFD_1.3M.hdf5 contains all of the models generated for the set of 1.3M targets. Individual models can be retrieved from this file using the provided hdf5_extract.py script together with the list of target IDs in bfdfold_1.3M_target_ids.csv.
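The internal layout of BFD_1.3M.hdf5 is not described on this page, and hdf5_extract.py remains the supported way to read it. As a rough sketch of what retrieval by ID involves, the Python below assumes, hypothetically, that each model is stored as a scalar text dataset at the file root named by its target ID, and that the IDs sit in the first column of bfdfold_1.3M_target_ids.csv. The names `read_target_ids`, `HDF5ModelStore` and `write_models` are illustrative only, and the third-party h5py package is needed only by the HDF5 adapter.

```python
import csv
import os


def read_target_ids(csv_path: str, skip_header: bool = False) -> list[str]:
    """Read target IDs from the first column of a CSV file.

    Assumption: one ID per row in column 0; pass skip_header=True if the
    file starts with a header row. Blank rows are ignored.
    """
    with open(csv_path, newline="") as fh:
        rows = csv.reader(fh)
        if skip_header:
            next(rows, None)
        return [row[0].strip() for row in rows if row and row[0].strip()]


class HDF5ModelStore:
    """Hypothetical adapter over an HDF5 file of models.

    Assumes one scalar text dataset per target ID at the file root; the real
    layout of BFD_1.3M.hdf5 may differ (hdf5_extract.py is authoritative).
    """

    def __init__(self, path: str):
        import h5py  # third-party; imported here so the other helpers work without it

        self._file = h5py.File(path, "r")

    def __contains__(self, target_id: str) -> bool:
        return target_id in self._file

    def __getitem__(self, target_id: str):
        # [()] reads the whole (assumed scalar) dataset into memory
        return self._file[target_id][()]

    def close(self) -> None:
        self._file.close()


def write_models(store, target_ids, out_dir: str) -> list[str]:
    """Write each model found in `store` to <out_dir>/<target_id>.pdb.

    `store` may be any object supporting `in` and `[]` (a plain dict works).
    The .pdb extension assumes the models are PDB-format text. IDs missing
    from the store are skipped. Returns the list of paths written.
    """
    os.makedirs(out_dir, exist_ok=True)
    written = []
    for target_id in target_ids:
        if target_id not in store:
            continue
        data = store[target_id]
        if isinstance(data, bytes):  # h5py returns bytes for stored strings
            data = data.decode("utf-8")
        path = os.path.join(out_dir, f"{target_id}.pdb")
        with open(path, "w") as fh:
            fh.write(data)
        written.append(path)
    return written
```

If the assumed layout holds, opening `HDF5ModelStore("BFD_1.3M.hdf5")`, passing it to `write_models` with `read_target_ids("bfdfold_1.3M_target_ids.csv")[:10]` and an output directory, then calling `close()` would write the first ten models. Models are looked up one at a time rather than loaded in bulk, since the file holds about 1.3M entries.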

Also provided are tarballs of the models and sequence alignments for the 5193 Pfam families modelled in the paper, and for the set of 255 Pfam families with released structures that was used for comparisons against DMPfold1 and C-I-TASSER.

train_data.tar.bz2 contains the data used to train the DMPfold2 neural network. Further scripts and instructions are available on the associated GitHub page: https://github.com/psipred/DMPfold2
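For readers who want to script the unpacking of train_data.tar.bz2 or the Pfam tarballs, here is a minimal sketch using only Python's standard-library tarfile module. Mode "r:*" lets tarfile detect the compression (bz2, gzip or xz), and the extraction helper refuses member paths that would land outside the destination directory. The helper names `list_members` and `safe_extract` are illustrative, and no particular layout inside the archives is assumed.

```python
import os
import tarfile


def list_members(archive_path: str, limit: int = 20) -> list[str]:
    """Return up to `limit` member names without extracting anything."""
    names = []
    with tarfile.open(archive_path, "r:*") as tar:  # auto-detects compression
        for member in tar:
            names.append(member.name)
            if len(names) >= limit:
                break
    return names


def safe_extract(archive_path: str, dest: str) -> int:
    """Extract regular files and directories into `dest`.

    Raises ValueError for any member whose path would escape `dest`;
    symlinks and other special members are skipped. Returns the number
    of members extracted.
    """
    dest_root = os.path.realpath(dest)
    # Use the "data" extraction filter where this Python version provides it
    extra = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    count = 0
    with tarfile.open(archive_path, "r:*") as tar:
        for member in tar:
            target = os.path.realpath(os.path.join(dest_root, member.name))
            if os.path.commonpath([dest_root, target]) != dest_root:
                raise ValueError(f"unsafe path in archive: {member.name}")
            if member.isfile() or member.isdir():
                tar.extract(member, dest_root, **extra)
                count += 1
    return count
```

Calling `list_members("train_data.tar.bz2")` first gives a quick preview of the archive's contents before committing to a full `safe_extract("train_data.tar.bz2", "train_data")`.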

Funding

Exploring new applications of amino acid covariation analysis in modelling proteins and their complexes

European Research Council

Department of Computer Science
