October 2006 Archives

October 4, 2006

The NMPDR Web Sites

The NMPDR is divided into two sections: the cover pages and the database site. The cover pages are web pages that serve as a front end for the NMPDR content; they also include the template pages used to format NMPDR data. The database site contains the database and the scripts used to retrieve NMPDR content. The cover pages are maintained by NCSA personnel using Macromedia Dreamweaver. The database site is maintained by FIG personnel using the CVS source control system.

We currently support up to four NMPDR web sites at any given time.

  1. The Public Site is the version of the NMPDR available for public use. The cover pages for The Public Site are stored on web-3.nmpdr.org. The database site is stored on nmpdr-3.nmpdr.org.
  2. The Staging Site is generally only available for a few days. It contains a copy of the site that is about to be made public. The cover pages for The Staging Site are stored on web-3.nmpdr.org. The database site is stored on nmpdr-3.nmpdr.org.
  3. The Mirror Site is a copy of The Public Site used for testing and debugging when a problem is found on The Public Site. The cover pages for The Mirror Site are stored on web-1.nmpdr.org. The database site is stored on nmpdr-1.nmpdr.org.
  4. The Development Site is the version of the NMPDR currently in development. The cover pages for The Development Site are stored on web-1.nmpdr.org. The database site is stored on nmpdr-1.nmpdr.org.

Click here to see a diagram of the four sites and how they fit into the update process.

Version 15 Schedule

October 6, 2006

Development Server Status

So that development can proceed on version 15 while the load is taking place, the Development Site has been connected to the version 14 database. However, it still uses the latest version 15 web pages, templates, and code. Once the table load completes, I will switch back to the new database.

October 7, 2006

Subsystem Page Fix

The subsystem pages in NMPDR now stay in NMPDR mode when the user clicks on the Show Spreadsheet button. Previously, they would slip back into SEED mode. To test this fix on The Development Site, click here and then click the Show Spreadsheet button.

October 8, 2006

Version 15 Development Site is Now Available

The version 15 database has been loaded and the appropriate files generated for The Development Site. The difference report is below the fold.

This time there is not a lot of new stuff. Seventy-five property name/value pairs were deleted and 17 new ones were added.

Continue reading "Version 15 Development Site is Now Available" »

October 9, 2006

Drug Target Data Base

I have finished the design of the drug target section of the Sprout database. (The two sections connect via the Feature entity.) You can view the design diagram here. Clicking on an entity or relationship will bring you to the corresponding section of the database documentation.

October 10, 2006

Useful Web Page Debugging Tool

If you use Firefox, you should consider installing the Web Developer Toolbar at http://chrispederick.com/work/webdeveloper/. It allows you to view the current web page's cookies, form data, and style information (among other things). Plus, there's a little icon on the far right that turns red if there's an error in the page. This last feature is particularly helpful for tracking down JavaScript and style problems.

The Tracing System

The new tracing system has been updated so that it works on the bio* machines. To use the system, start at /FIG/Html/SetPassword.html on whichever version of SEED you're debugging. Contact me via email if you don't know the password. Type your name or something similar into the Tracing Key field, then submit the form. This will take you to the Sprout debugging console. The tracing form is at the bottom.

Click here for information on how to enable your web scripts for tracing and generate trace messages. When you activate tracing using the Emergency Tracing form on the debug page, it creates a file in the FIG temporary directory. Your tracing key is stored in a cookie as long as you stay on the debug form, and the web scripts can use the cookie to find the temporary file and turn on the type of tracing indicated. There are also buttons on the form for turning off tracing and for showing the tracing file.
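
The cookie-to-file lookup described above can be sketched roughly as follows. This is a Python illustration only, not the actual SEED tracing code: the cookie name, the file naming scheme, and the idea that the file holds a whitespace-separated list of settings are all assumptions made for the example.

```python
from http.cookies import SimpleCookie
from pathlib import Path

COOKIE_NAME = "tracing_key"  # hypothetical cookie name


def tracing_file(cookie_header: str, temp_dir: Path):
    """Return the tracing file selected by the cookie, or None if there is none."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    if COOKIE_NAME not in cookie:
        return None
    key = cookie[COOKIE_NAME].value
    # Accept only simple keys so the key cannot point outside temp_dir.
    if not key.isalnum():
        return None
    path = temp_dir / f"trace-{key}.txt"  # hypothetical naming scheme
    return path if path.is_file() else None


def trace_settings(cookie_header: str, temp_dir: Path) -> list[str]:
    """Return the settings stored in the tracing file, or [] if tracing is off."""
    path = tracing_file(cookie_header, temp_dir)
    return path.read_text().split() if path else []
```

With this arrangement, turning tracing off only requires deleting the file; the cookie can stay in place and the scripts simply find nothing to act on.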

Continue reading "The Tracing System" »

October 12, 2006

What Goes Into Sprout?

The Sprout contains a subset of the SEED data computed from a snapshot taken roughly once every two weeks. The subset is determined by a selection of genomes and a selection of subsystems. The default behavior is as follows:

  • All complete genomes are loaded. A genome is considered complete if its organism directory contains a file named COMPLETE.
  • All NMPDR subsystems are loaded. A subsystem is considered NMPDR if it has a file named NMPDR in its directory. The $fig->nmpdr_subsystem method is used to make this determination. (Note that prior to 10/29/2006, a more inclusive criterion was used.)

This behavior can be overridden by creating special files listing the genomes and/or subsystems to be loaded.
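
The two default rules amount to checking for a marker file in each directory. Here is a minimal Python sketch of that check, assuming one directory per genome and one per subsystem. The directory layout and the genome and subsystem names in the demo are made up for illustration, and this is not the code behind $fig->nmpdr_subsystem.

```python
from pathlib import Path
import tempfile


def complete_genomes(organism_root: Path) -> list[str]:
    """Return IDs of genomes whose directory contains a file named COMPLETE."""
    return sorted(
        d.name for d in organism_root.iterdir()
        if d.is_dir() and (d / "COMPLETE").is_file()
    )


def nmpdr_subsystems(subsystem_root: Path) -> list[str]:
    """Return names of subsystems whose directory contains a file named NMPDR."""
    return sorted(
        d.name for d in subsystem_root.iterdir()
        if d.is_dir() and (d / "NMPDR").is_file()
    )


def demo() -> tuple[list[str], list[str]]:
    """Build a throwaway directory tree and apply both rules to it."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # Made-up genome IDs; only the first one gets a COMPLETE marker.
        for genome, done in [("1111.1", True), ("2222.2", False)]:
            (root / "Organisms" / genome).mkdir(parents=True)
            if done:
                (root / "Organisms" / genome / "COMPLETE").touch()
        # Made-up subsystem names; only the first one gets an NMPDR marker.
        for name, flagged in [("Example_Subsystem", True), ("Scratch", False)]:
            (root / "Subsystems" / name).mkdir(parents=True)
            if flagged:
                (root / "Subsystems" / name / "NMPDR").touch()
        return (complete_genomes(root / "Organisms"),
                nmpdr_subsystems(root / "Subsystems"))
```

Calling demo() returns only the genome and the subsystem that carry their marker files.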

The Sprout load normally takes place over a period of two to three days after the previous version rolls over.

What Makes Sprout Special?

The Sprout uses data from the SEED to build a database that is optimized for searching and data mining. There is considerable data redundancy in order to ensure that the searches are as fast as possible.

To add new data to the Sprout database, you first update an XML file that contains the database definition, then you add a new module to the Sprout loader. The next time the Sprout is loaded, the new data will immediately be available for use via calls to database methods implemented in the Sprout base module.

We are currently developing a high-powered search framework that can be used to add new search capabilities quickly. It is available on the as-yet-unpublished new search page. The search script automatically generates a list of search types from data in the FIG configuration file. If it is asked for a particular type of search, it displays the search form. When the form is filled in, it displays the search results.

Each search is implemented using a Search Helper module. All Search Helper modules are built on top of pre-existing code that handles the bookkeeping and formatting, so when we add a new search we only need to lay out the form and write the code to find the search targets. Most searches are for features, so there are built-in helpers for feature filtering and retrieval.

Using this framework, a new search module can be added to Sprout in less than a day of programming effort. The ultimate goal is to make the NMPDR the go-to site for finding genes.
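
To give a feel for the division of labor, here is a small Python sketch of the pattern: a base class owns the bookkeeping and formatting, and each search supplies only its form layout and its lookup code. Every name here (the classes, the registry, the sample features) is hypothetical and chosen for the example; it is not the actual Search Helper code.

```python
class SearchHelper:
    """Shared bookkeeping and formatting; subclasses supply form and lookup."""

    title = "Search"

    def form_fields(self) -> list[str]:
        raise NotImplementedError

    def find(self, params: dict) -> list[dict]:
        raise NotImplementedError

    def run(self, params: dict) -> str:
        # No parameters yet: show the form. Otherwise show formatted results.
        if not params:
            return f"{self.title} form: " + ", ".join(self.form_fields())
        rows = self.find(params)
        lines = [f"{self.title}: {len(rows)} result(s)"]
        lines += [f"{row['id']}\t{row['function']}" for row in rows]
        return "\n".join(lines)


# Stand-in data; a real helper would query the database instead.
FEATURES = [
    {"id": "fig|1111.1.peg.1", "function": "Chaperone protein DnaK"},
    {"id": "fig|1111.1.peg.2", "function": "hypothetical protein"},
]


class FunctionWordSearch(SearchHelper):
    """A new search: only the form layout and the lookup are written here."""

    title = "Function word search"

    def form_fields(self) -> list[str]:
        return ["word"]

    def find(self, params: dict) -> list[dict]:
        word = params["word"].lower()
        return [f for f in FEATURES if word in f["function"].lower()]


# Stand-in for the list of search types read from the configuration file.
SEARCH_TYPES = {"function_word": FunctionWordSearch}


def run_search(search_type: str, params: dict) -> str:
    """Look up the helper for a search type and run it."""
    return SEARCH_TYPES[search_type]().run(params)
```

Adding another search in this arrangement means writing one more small subclass and registering it; the form-or-results logic in run() is never touched.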

Continue reading "What Makes Sprout Special?" »

downloadable data

There are apparently several different ways to make data downloadable and transportable to Excel. I believe the miscommunications between me, the users, and the coders stem from using different combinations of browser/OS and also from different expectations. What is frequently provided as a service called "download tab-delimited text" or "export file for Excel" is a link on an html page that takes text out of an html table and puts it on the screen so that it looks like a tab-delimited text file.

Actually putting this html into Excel from Firefox on a PC requires copying the text on the screen, opening Excel, choosing "paste special" from the edit menu, changing from html to text, and pasting. It doesn't work if you just paste. Thus, there is no reason to redraw the web page, because one can select and copy any html table and paste it into Excel the same way. Drawing an ugly html page does not facilitate viewing the info in Excel.

Most biologists will also interpret "download" or "export" as meaning a file will appear on the local computer, not that an ugly web page will appear and do nothing--not even prompt you to copy it. I can provide a help statement that can go on the template for this type of "exported" file that will explain what to do with it in the common browser/OS combinations. This will help a lot.

PC users who run Internet Explorer have the easiest time--any html table that is copied can be pasted directly into Excel, retaining the links. This is ultimately what I want any user to be able to do--transport the search results or the protein context table or the BBH into a table in Excel WITH the links to NMPDR.
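
For the "a file appears on the local computer" expectation, one standard approach is to send the tab-delimited text with a Content-Disposition: attachment header, which asks the browser to save the response as a file instead of drawing it. The Python sketch below only shows the shape of such a response. The function name, file name, and sample row are made up, and it is not how the NMPDR scripts currently work.

```python
import csv
import io


def tab_delimited_response(header, rows, filename="results.tsv"):
    """Return (HTTP headers, body) for a tab-delimited file download."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    headers = [
        ("Content-Type", "text/tab-separated-values; charset=utf-8"),
        # "attachment" asks the browser to save the file rather than show it.
        ("Content-Disposition", f'attachment; filename="{filename}"'),
    ]
    return headers, buffer.getvalue()
```

Note that plain tab-delimited text cannot carry hyperlinks, so this does not by itself deliver the "WITH the links" goal; the nearest equivalent would be an extra column holding each row's NMPDR URL.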

Continue reading "downloadable data" »

October 13, 2006

Keyword Searching Is Almost Here!

I have completed proof-of-concept testing for keyword searching in Sprout. Each feature will have a keyword list in the database that is processed by MySQL into a text-search index. In addition to simply listing keywords, it is possible to put modifiers on the words. For example, dnaK -hypothetical would return all dnaK features which are not hypothetical. A complete description of the operators is available here, but the point is we are already ahead of what Lucene can deliver, and it's better controlled. For example, if you ask for all hypothetical features for NMPDR genomes that belong to a specific subsystem, the search tool will combine the keyword search with the other criteria so that we get the full benefit of the database indexing.
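
As a rough sketch of what "combining the keyword search with the other criteria" can look like, the Python function below builds a single parameterized query that uses MySQL's boolean-mode full-text search alongside ordinary filters. The table and column names are hypothetical and are not Sprout's real schema; only the MATCH ... AGAINST (... IN BOOLEAN MODE) syntax is standard MySQL.

```python
def build_feature_query(keywords: str, genome_id: str, subsystem: str):
    """Return (sql, params) combining a boolean-mode text search with filters.

    Table and column names are hypothetical, chosen only for illustration.
    The %s placeholders follow the style used by common MySQL drivers.
    """
    sql = (
        "SELECT f.id FROM Feature f "
        "JOIN FeatureSubsystem fs ON fs.feature_id = f.id "
        "WHERE MATCH (f.keywords) AGAINST (%s IN BOOLEAN MODE) "
        "AND f.genome_id = %s AND fs.subsystem = %s"
    )
    return sql, (keywords, genome_id, subsystem)
```

For example, build_feature_query("dnaK -hypothetical", genome_id, subsystem_name) passes the modifier syntax straight through to the text index, while the genome and subsystem conditions remain ordinary indexed comparisons in the same statement.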

The keyword search is currently inoperative while I load the keywords into the feature table. However, once it's ready it will be automatically incorporated into all search tools that support feature filtering. There will also be a special keyword-only search tool designed to replace the ubiquitous NMPDR search box.

October 16, 2006

Lucene Search Fixes

A few fixes to the Lucene search have been posted to the Development NMPDR.

  • The search results page knows whether results were returned or the advanced form was displayed, and this information is used by the template to ensure the proper help text is displayed.
  • The organism names now display correctly with the strain included. Previously, the NMPDR group name was showing instead of the strain.

October 17, 2006

RNAs Now Work in Sprout

RNAs once again work correctly on the NMPDR Development version.

To test the fix,

  1. Go to the Development Server home page.
  2. Enter fig|100226.1.rna.10 into the search box.
  3. Select the NMPDR button on the results page to view the feature's data.

Continue reading "RNAs Now Work in Sprout" »

Keyword Searching Now Available

Keyword searching is now available on the unpublished Sprout Search Page. Not all the keywords we want are working yet, but you can still experiment with things like EC numbers and words found in functional assignments.

The feature filter has also been redone to make it less complicated. The only filters remaining are the subsystem name and the keyword search box. This change was made because too many of the other filtering choices (especially properties) tended to produce improper or confusing results.

Subsystem Diagram Link Fix

Subsystem diagrams now display as NMPDR pages when invoked from Sprout. To test this fix, go to this subsystem page, then click on the diagram link.

Continue reading "Subsystem Diagram Link Fix" »

October 18, 2006

Sig Tool Subsystem Links

Subsystems displayed in the signature genes tool now contain a link to the appropriate subsystem page. To test this, go here, then

  • Select Campylobacter jejuni RM1221 as the given gene.
  • Select Campylobacter jejuni RM1221, Campylobacter jejuni subsp. jejuni 260.94, Campylobacter jejuni subsp. jejuni 81-176, and Campylobacter jejuni subsp. jejuni 84-25 for set 1.
  • Select Campylobacter jejuni subsp. jejuni CF93-6, Campylobacter jejuni subsp. jejuni HB93-13, and Campylobacter jejuni subsp. jejuni NCTC 11168 for set 2.

Click Find Discriminating Proteins. You should see at least three linked subsystems in the discriminating set.

Essentiality Graph Works Again

The essentiality page works again. You can find it here.

Last Monday, after a phone conference, I removed the property and feature type controls from the feature filter in the new search system, both to ensure we only returned meaningful results and to reduce the problem that most property searches return no results. This greatly streamlined the code, but it broke the essentiality page, which was using the feature filter to search for genes with specific properties. To fix the problem, I added a new search that looks for the occurrence of specific property name/value pairs in a chosen genome. This is not a very useful search for users visiting the site, but it does make it possible to do the essentiality searches using the old essentiality properties. I am now leaning toward the idea that the generated search page (found here on the development server) will only be used by developers, and we can have another page for the user-friendly searches that can have more involved explanations and examples.

Anyway, the important thing is that essentiality is back.

October 20, 2006

Organism Searches Return

The organism page search boxes have been fixed. To test this, go to the main page on the Development Server and type 2.7.6.3 into the search box. Several hundred results will come back. Next, go to the campylobacter page and try the same search. You will get a much smaller set of results, and all of them will be for campylobacters.

Hopefully this will be the last Lucene fix and we'll be using the new search in version 16.

GBrowse setup

The GBrowse setup procedure has been changed. Previously, the group files were stored in CVS, and the Other.group was used to fill in the gaps. Unfortunately, it is possible that if the site is remade enough times, the Other.group would not have all the genomes in it. The group files have therefore been moved out of CVS and they will be generated automatically from the group information in the database. Before the database is loaded, special dummy group files will be put in place to prevent the make from failing. Hopefully, this will be the end of the GBrowse craziness.

October 21, 2006

Subsystem Fixes

I have changed the delimiter for subsystem classifications from space to colon. This fixes the problem with the classification names being truncated to one word. In addition, I updated the genome statistics page to show the name as well as the number of the genome.

To test this, go to the subsystem summaries page, select Listeria monocytogenes EGD-e, and click Show Subsystems.
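
The truncation is easy to reproduce in miniature. The Python sketch below assumes, purely for illustration, that a classification is stored as its levels joined by a delimiter and split apart again on the same delimiter; the real storage code may differ.

```python
def encode(levels: list[str], delimiter: str) -> str:
    """Join the classification levels into one stored string."""
    return delimiter.join(levels)


def decode(raw: str, delimiter: str) -> list[str]:
    """Split a stored string back into classification levels."""
    return raw.split(delimiter)


levels = ["Amino Acids and Derivatives", "Alanine, serine, and glycine"]

# With a space delimiter, the first level comes back as a single word.
space_result = decode(encode(levels, " "), " ")

# With a colon delimiter, both levels survive the round trip.
colon_result = decode(encode(levels, ":"), ":")
```

Here space_result[0] is just "Amino", while colon_result is identical to the original two levels. A colon works because it does not occur inside the classification names themselves.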

Version 15 is Now Live

Development on version 16 will begin later this weekend.

Version 16 Schedule

October 22, 2006

Version 16 now loading

The version 16 data is now loading into the Development Web Site. This load incorporates three changes to the handling of features:

  1. Subsystem roles are now included in the search keyword list.
  2. Complex hyphenated compound names are stored in the keyword list both in their original form and split on hyphen boundaries.
  3. The primary functional assignment for a non-peg feature is taken from the alias list instead of the annotations.
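
The second change can be pictured with a short Python sketch. It is an illustration of the rule as stated, not the loader's actual code, and whether very short pieces such as single digits are really kept is an assumption.

```python
def keyword_forms(name: str) -> list[str]:
    """Return the original name followed by its hyphen-separated parts."""
    forms = [name]
    if "-" in name:
        # Skip empty pieces produced by leading, trailing, or doubled hyphens.
        forms.extend(part for part in name.split("-") if part)
    return forms
```

For a made-up input such as "alpha-beta-gamma", this yields the full name plus "alpha", "beta", and "gamma", so a search on either the whole compound name or any one piece can find the feature.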

October 23, 2006

PSI-Blast Will Have Been Fixed Soon

PSI-Blast is sort of working on the NMPDR Development Site.

PSI-Blast was reported non-functional in v14. The reason had to do with the fact that the NCBI tools assume the first form on the web page is the NCBI tool's form, when in fact it was the third form when the results were loaded into an NMPDR template. All of the protein page tools use a single template, so the fix was to change that template to eschew the search and bug reporting forms normally present on every NMPDR page.
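
The "first form on the page" assumption is easy to check mechanically. The Python sketch below lists the forms on a page in document order using the standard html.parser module; the form names in the usage example are invented, since the real template's form names are not given here.

```python
from html.parser import HTMLParser


class FormLister(HTMLParser):
    """Collect the name attribute of every form, in document order."""

    def __init__(self):
        super().__init__()
        self.form_names = []

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            # Forms without a name attribute are recorded as "".
            self.form_names.append(dict(attrs).get("name") or "")


def form_names(html: str) -> list[str]:
    """Return the names of all forms found in the given HTML text."""
    parser = FormLister()
    parser.feed(html)
    parser.close()
    return parser.form_names
```

For a page shaped like the old template, with a search form and a bug report form ahead of the tool's form, form_names(page).index("psiblast") comes out as 2, meaning the third form. Once the other two forms are left out of the template, the tool's form is at index 0, which is where the NCBI tools expect it.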

Unfortunately, the Sprout database is still loading, so this fix can't be tested directly. Instead, you must use the URL you would get if you clicked on the PSI-Blast link at the bottom of a protein page once the protein page is working. For the infamous fig|83333.1.peg.4, click here to get the desired page. Clicking on the FORMAT button from this location used to take you to a blank search results page; now it takes you to the PSI-Blast results.

My intent is to test this as soon as the load completes and then slipstream the fix into the live site. If the load is still going on tomorrow, then I will slipstream without first testing on the development site, which is scary, but required if we're to be ready for a Wednesday demo.

October 25, 2006

Slipstream Fixes to Web Pages

Two web page changes have been slipstreamed into the live site.

  1. The tool template has been changed so that PSI-Blast works correctly on the NMPDR protein page.
  2. The essentials page has been fixed so that the two E. coli bars work. Previously, these had been left out of the image map and clicking them had no effect.

These changes have also been applied to the development site.

October 28, 2006

New Search is Now Installed

The new search has been installed on the Development Site. The old nmpdr_lucene_search.cgi script now redirects the request to the new search module so that the Mozilla search bar will continue to work. I've tested that as best I can without modifying the live site.

I have filled in the various help templates (all called SearchHelpsomething.inc in the template directory), but I suspect they'll need additional work. I'm also thinking of delaying the rollover while we work out the bugs.

October 29, 2006

Drug Target Data Now Available

The drug target data has been loaded into Sprout. You can review the database design here. I coded a quick display script so you can see what the data looks like. (PLEASE NOTE: It's just a data dump so that people can critique the data design, not a finished display.)

Subsystem Inclusion Criterion

The inclusion criterion for subsystems has been changed. Starting with the next load, only NMPDR subsystems will be included in the Sprout database. Previously, any usable subsystem was included. The total for this load would have been 549 under the old scheme; now it will be 393.

This is actually a reversal of a change made on July 13 of this year. The What Goes Into Sprout article has been updated accordingly.

Subsystem Classifications

The subsystem classification problem has been fixed in version 16. To test this, go to the Subsystem Summaries Page, select Vibrio cholerae O1 biovar eltor str. N16961 and click Show Subsystems. Alanine, serine, and glycine will now be a subclass of Amino Acids and Derivatives rather than part of the classification name.

New URLs for the NMPDR Development Blog

I have added subdomain URLs to make the Development Blog easier to navigate.

Revised Version 16 Schedule

The original schedule has been pushed back to allow more time for testing the new search facility and the new subsystem inclusion criterion.

October 30, 2006

Version 16 Reloaded

The data for version 16 has been reloaded. The difference report is substantial this time, since over 160 subsystems dropped out when we changed the inclusion criteria. I have included it below the fold (which is blogspeak for "on the other side of the Continue Reading link").

You can now search for keywords like essential and iedb. There are still some glitches in the searching. In particular, when you ask for essential, a hyperlinked list of essentiality values should show up in the results. I will investigate this further tomorrow (as noted on the current to-do list).

Continue reading "Version 16 Reloaded" »

About October 2006

This page contains all entries posted to NMPDR Development Blog in October 2006. They are listed from oldest to newest.
