About Sprout Archives

October 4, 2006

The NMPDR Web Sites

The NMPDR is divided into two sections: the cover pages and the database site. The cover pages are web pages that serve as a front end for the NMPDR content; they also include the template pages used to format NMPDR data. The database site contains the database and the scripts used to retrieve NMPDR content. The cover pages are maintained by NCSA personnel using Macromedia Dreamweaver. The database site is maintained by FIG personnel using the CVS source control system.

We currently support up to four NMPDR web sites at any given time.

  1. The Public Site is the version of the NMPDR available for public use. The cover pages for The Public Site are stored on web-3.nmpdr.org. The database site is stored on nmpdr-3.nmpdr.org.
  2. The Staging Site is generally only available for a few days. It contains a copy of the site that is about to be made public. The cover pages for The Staging Site are stored on web-3.nmpdr.org. The database site is stored on nmpdr-3.nmpdr.org.
  3. The Mirror Site is a copy of The Public Site used for testing and debugging when a problem is found on The Public Site. The cover pages for The Mirror Site are stored on web-1.nmpdr.org. The database site is stored on nmpdr-1.nmpdr.org.
  4. The Development Site is the version of the NMPDR currently in development. The cover pages for The Development Site are stored on web-1.nmpdr.org. The database site is stored on nmpdr-1.nmpdr.org.

Click here to see a diagram of the four sites and how they fit into the update process.

October 12, 2006

What Goes Into Sprout?

The Sprout contains a subset of the SEED data computed from a snapshot taken roughly once every two weeks. The subset is determined by a selection of genomes and a selection of subsystems. The default behavior is:

  • All complete genomes are loaded. A genome is considered complete if its organism directory contains a file named COMPLETE.
  • All NMPDR subsystems are loaded. A subsystem is considered NMPDR if it has a file named NMPDR in its directory. The $fig->nmpdr_subsystem method is used to make this determination. (Note that prior to 10/29/2006, a more inclusive criterion was used.)

This behavior can be overridden by creating special files listing the genomes and/or subsystems to be loaded.
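The marker-file rule above can be sketched in a few lines. This is only an illustration in Python, not the actual SEED/Sprout code: the function names and the assumption that each genome or subsystem is a subdirectory of one parent directory are hypothetical.

```python
from pathlib import Path
from typing import List


def has_marker(directory: Path, marker: str) -> bool:
    """Return True if `directory` contains a regular file named `marker`."""
    return (directory / marker).is_file()


def select_default(parent: Path, marker: str) -> List[str]:
    """List the subdirectories of `parent` that contain the marker file.

    Hypothetical layout: one subdirectory per genome (or per subsystem).
    """
    return sorted(
        entry.name
        for entry in parent.iterdir()
        if entry.is_dir() and has_marker(entry, marker)
    )
```

Under these assumptions, `select_default(organism_root, "COMPLETE")` would give the default genome list and `select_default(subsystem_root, "NMPDR")` the default subsystem list, where both root paths are placeholders for wherever the data actually lives.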

The Sprout load normally takes place over a period of two to three days after the previous version rolls over.

What Makes Sprout Special?

The Sprout uses data from the SEED to build a database that is optimized for searching and data mining. The database carries considerable data redundancy in order to ensure that searches are as fast as possible.

To add new data to the Sprout database, you first update an XML file that contains the database definition, then you add a new module to the Sprout loader. The next time the Sprout is loaded, the new data will immediately be available for use via calls to database methods implemented in the Sprout base module.

We are currently developing a high-powered search framework that can be used to add new search capabilities quickly. It is available on the as-yet-unpublished new search page. The search script automatically generates a list of search types from data in the FIG configuration file. If it is asked for a particular type of search, it displays the search form. When the form is filled in, it displays the search results.

Each search is implemented using a Search Helper module. All Search Helper modules are built on top of pre-existing code that handles the bookkeeping and formatting, so when we add a new search we only need to lay out the form and write the code to find the search targets. Most searches are for features, so there are built-in helpers for feature filtering and retrieval.
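The division of labor described above (shared bookkeeping and formatting in common code, with each search supplying only its form and its target-finding logic) can be illustrated with a small sketch. This is not the real Search Helper code: it is written in Python for illustration, and every class, method, and field name here is hypothetical.

```python
class SearchHelper:
    """Hypothetical base class: shared bookkeeping and result formatting."""

    def form_fields(self):
        """Names of the form fields this search needs (set by subclass)."""
        raise NotImplementedError

    def find(self, params, features):
        """Yield the features matching the form input (set by subclass)."""
        raise NotImplementedError

    def run(self, params, features):
        # Bookkeeping: make sure the form was filled in completely.
        missing = [name for name in self.form_fields() if name not in params]
        if missing:
            raise ValueError("missing form fields: " + ", ".join(missing))
        # Formatting: one numbered line per matching feature.
        hits = list(self.find(params, features))
        return [
            "{}. {}: {}".format(i, hit["id"], hit["function"])
            for i, hit in enumerate(hits, 1)
        ]


class KeywordSearch(SearchHelper):
    """A new search only lays out its form and finds its targets."""

    def form_fields(self):
        return ["keyword"]

    def find(self, params, features):
        needle = params["keyword"].lower()
        return (f for f in features if needle in f["keywords"].split())


# Stand-in for the list of search types read from configuration.
SEARCH_TYPES = {"keyword": KeywordSearch}
```

In this sketch a search script would look up the requested type in `SEARCH_TYPES`, show the fields from `form_fields()`, and call `run()` once the form comes back filled in.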

Using this framework, a new search module can be added to Sprout in less than a day of programming effort. The ultimate goal is to make the NMPDR the go-to site for finding genes.

November 5, 2006

Keyword Searching

Keyword searching in Sprout is implemented using the text search capabilities of MySQL. Each feature has a keywords field that contains a space-delimited list of all the keywords for that feature. The keyword list contains:

  • The FIG feature ID and all aliases.
  • The functional assignment, if any.
  • The names and classifications of the subsystems containing the feature.
  • For each subsystem role performed by the feature, the role name and abbreviation.
  • The genome ID and the complete taxonomy (including the names of the genus, species, and strain).
  • The special attribute names relevant to the feature (currently essential, virulent, and/or iedb).

Words containing hyphens or underscores are included in their natural form as well as being broken up into pieces. Some folding is also applied: all letters are changed to lower-case, hyphens (-) and periods (.) are converted to underscores, and vertical bars (|) and colons (:) are converted to apostrophes ('). The same process is applied to the keyword phrase before it is matched against the keyword index. There are technical requirements for this (apostrophes and underscores are the only punctuation marks allowed by MySQL in searchable text), but it also makes the search process more forgiving.
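The folding rules can be sketched as follows. This is a Python illustration rather than the Sprout implementation; the function names are hypothetical, and the choice to split on the original hyphens and underscores before folding each piece is an assumption, since the text does not say in which order the two steps happen.

```python
import re

# Folding from the text: hyphens and periods become underscores,
# vertical bars and colons become apostrophes.
_FOLD = str.maketrans({"-": "_", ".": "_", "|": "'", ":": "'"})


def fold(text: str) -> str:
    """Lower-case the text and apply the punctuation folding."""
    return text.lower().translate(_FOLD)


def keyword_forms(word: str) -> list:
    """Return the folded word, plus its pieces if it has hyphens/underscores.

    Assumption: pieces come from splitting the original word on '-' or '_'.
    """
    forms = [fold(word)]
    pieces = [fold(piece) for piece in re.split(r"[-_]", word) if piece]
    if len(pieces) > 1:
        for piece in pieces:
            if piece not in forms:
                forms.append(piece)
    return forms
```

For example, `fold("fig|83333.1.peg.4")` gives `fig'83333_1_peg_4`, and `keyword_forms("Beta-Lactamase")` gives `["beta_lactamase", "beta", "lactamase"]`. Running the user's search phrase through the same `fold` keeps the query and the index consistent.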

About About Sprout

This page contains an archive of all entries posted to the NMPDR Development Blog in the About Sprout category. They are listed from oldest to newest.
