Keyword searching in Sprout is implemented using the text search capabilities of MySQL. Each feature has a keywords field that contains a space-delimited list of all the keywords for that feature. The keyword list contains:
- The FIG feature ID and all aliases.
- The functional assignment, if any.
- The names and classifications of the subsystems containing the feature.
- For each subsystem role performed by the feature, the role name and abbreviation.
- The genome ID and the complete taxonomy (including the names of the genus, species, and strain).
- The special attribute names relevant to the feature (currently essential, virulent, and/or iedb).
Words containing hyphens or underscores are included in their natural form as well as being broken up into pieces. Some folding is also applied: all letters are changed to lower-case, hyphens (-) and periods (.) are converted to underscores, and vertical bars (|) and colons (:) are converted to apostrophes ('). The same process is applied to the keyword phrase before it is matched against the keyword index. This is partly a technical requirement (apostrophes and underscores are the only punctuation marks allowed by MySQL in searchable text), but it also makes the search process more forgiving.
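The folding rules above can be sketched in a few lines of Python. This is only an illustration of the rules as described here, not Sprout's actual code: the names `fold` and `expand_word` are hypothetical, and the choice to split only on hyphens and underscores present in the original word is an assumption. Other punctuation, such as commas, is not handled.

```python
import re

# Character folding described in the text: hyphens and periods become
# underscores; vertical bars and colons become apostrophes.
_FOLD_TABLE = str.maketrans({"-": "_", ".": "_", "|": "'", ":": "'"})


def fold(text):
    """Lower-case the text and fold its punctuation (hypothetical helper)."""
    return text.lower().translate(_FOLD_TABLE)


def expand_word(word):
    """Return the folded word, followed by its folded pieces if the
    original word contained hyphens or underscores.

    Assumption: pieces are produced by splitting on hyphens and
    underscores in the original word only.
    """
    keywords = [fold(word)]
    pieces = [piece for piece in re.split(r"[-_]", word) if piece]
    if len(pieces) > 1:
        keywords.extend(fold(piece) for piece in pieces)
    return keywords


if __name__ == "__main__":
    print(fold("fig|100226.1.peg.1023"))
    print(" ".join(expand_word("alpha-xyloside")))
```

Because the same folding is applied to the search phrase, a user who types `FIG|100226.1.peg.1023` and one who types `fig'100226_1_peg_1023` would end up searching for the same keyword under this sketch.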
For example, the keyword list for fig|100226.1.peg.1023 is as follows.
- Genome ID: 100226_1
- FIG ID: fig'100226_1_peg_1023
- Taxonomy: bacteria actinobacteria actinobacteridae actinomycetales streptomycineae streptomycetaceae streptomyces streptomyces coelicolor a32
- Aliases: geneid'1096479, np_625350_1, sco1056, gi'21219571, kegg'sco'sco1056, tr'q9k443, uni'q9k443
- Functional Assignment: possible alpha_xyloside alpha xyloside transporter, substrate_binding substrate binding component
- Subsystem: xylose utilization, classification carbohydrates. monosaccharides
- Subsystem Role: possible alpha_xyloside alpha xyloside transporter substrate_binding substrate binding component, abbreviated ax_abca