Mike's new attributes have been uploaded to the attribute server. The new attributes and their descriptions are given below the fold.
To access these attributes during testing, point your FIG_Config.pm file at the main attribute server by adding the following line.
$attrURL = "http://anno-3.nmpdr.org/attrib_server/AttribXMLRPC.cgi";
- CDD: Conserved domain data for a PEG. A PEG's domain indicates the shape of a protein's molecular units. A PEG can have one domain or several; if it has several, each one is coded as an individual value. The ID of the relevant entry in the public Conserved Domain Database is coded as a subkey, and the match score is the attribute value (an exact match is 0). Each value holds the score twice, separated by a semicolon. The part before the semicolon is designed for sorting; the part after the semicolon is essentially standard scientific notation without the "e" (1-200 instead of 1e-200).
- CELLO: Location of a PEG's protein in the cell as predicted by the CELLO tool. The subkey is the location name (e.g. membrane, cytoplasm, extracellular), and the value is the score, a higher score indicating more confidence in the placement.
- IPR: Protein domain information from the InterPro database. The subkey contains the ID of a protein domain and the value is a similarity score, encoded as a sortable string followed by the actual score in exponential notation without the "E".
- isoelectric_point: pH of the surrounding medium at which the protein carries no net charge. If the pH of the medium is lower than this value, the protein will have a net positive charge. If the pH of the medium is higher, the protein will have a net negative charge.
- molecular_weight: Molecular mass of a feature's protein, expressed in daltons (where 1 dalton = 1/12 the mass of a carbon-12 atom). The molecular mass is approximately equal to the number of protons and neutrons in the molecule.
- PDB: Protein Data Bank entry for a given Feature. The subkey is the four-character PDB identifier, and the value is a sortable similarity score followed by an optional location. In other words, if the PDB value is XXX.XXX;YYY-YYY, then XXX.XXX is a sortable version of the score that causes lower scores to sort higher, and YYY-YYY refers to a location on the contig.
- PFAM: Protein family data for a given Feature, taken from the protein family database maintained by Washington University in St. Louis. The subkey is the protein family ID, and the value is a sortable similarity score followed by the actual similarity score encoded as a floating-point number without the "E".
- SignalP: Signal peptide capability of a given Feature. The subkey is either "signal_peptide" or "cleavage_site". The value is a probability followed by the location of the peptide or cleavage site. The location is always between two nucleotides. Existence of a signal peptide means the protein will move toward the cell membrane. Existence of a cleavage site means that once the protein reaches the membrane, part of it will be cleaved off and secreted into the environment. A feature must have both if it needs to be expelled from the cell.
- TMPRED: Estimation of membrane-spanning regions in a given feature as computed by the TMPredict utility. The value is a number followed by a list of locations inside the feature.
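Several of the attributes above (CDD, IPR, PFAM) store the score as a sortable string, a semicolon, and then the score written without the "e". The following is a minimal Python sketch of how such a value could be decoded back into a number, and how a PDB value could be split into its score and optional location. The function names, the regular expression, and the "SORTKEY" placeholder are illustrative assumptions and not part of the attribute server; the exact layout of the sortable half is not described here, so the sketch ignores it.

```python
import re

# Assumed shape of the readable half: a mantissa followed by an optional
# signed exponent with the "e" dropped, e.g. "1-200" for 1e-200.
_SCORE_RE = re.compile(r"^(\d+(?:\.\d+)?)([-+]\d+)?$")


def decode_score(value):
    """Return the score from a 'sortable;readable' value as a float (hypothetical helper)."""
    # Only the part after the semicolon is decoded; the sortable half is ignored.
    _sortable, _, readable = value.partition(";")
    match = _SCORE_RE.match(readable.strip())
    if match is None:
        raise ValueError(f"unrecognized score encoding: {value!r}")
    mantissa, exponent = match.groups()
    # Put the missing "e" back before handing the text to float().
    return float(mantissa + "e" + exponent) if exponent else float(mantissa)


def split_pdb_value(value):
    """Split a PDB value into (sortable_score, location or None) (hypothetical helper)."""
    sortable, sep, location = value.partition(";")
    return sortable, (location if sep and location else None)
```

For example, `decode_score("SORTKEY;1-200")` would give the float 1e-200, and `split_pdb_value("XXX.XXX;YYY-YYY")` would give the pair `("XXX.XXX", "YYY-YYY")`. Note that the part after the semicolon in a PDB value is a location, not a score, so it should not be passed to `decode_score`.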