
June 2007 Archives

June 4, 2007

Attribute Analysis

I have created a page on the Development Server that contains information useful in understanding the meaning and format of the various attributes. The report is, for now, restricted to attributes of features and attributes related to drug targets.

At the bottom of the page is a report showing how many of the NMPDR features and how many of the NMPDR Core features have which attributes. The report gives us an idea of which attributes may be practical for use in searching. For example, we have CELLO data on 68% of the NMPDR core features, so it would be reasonable to allow a user to search on or ask for CELLO data if he's restricting his attention to core genomes. On the other hand, we have molecular weight information on 18% of the total features and less than 2% of the core features, so an attempt to search on molecular weight would not provide any useful results.
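The coverage numbers in a report like this come down to simple counting. Here is a minimal sketch, assuming the attribute data can be read into a mapping from feature ID to the set of attribute keys that feature has. The function names, the toy feature IDs, and the 50% cutoff in `searchable_attributes` are all hypothetical, chosen for illustration, and not the actual report code.

```python
from collections import Counter


def attribute_coverage(features, core_ids):
    """Percent of all features and of core features that carry each attribute.

    features: dict mapping a feature ID to the set of attribute keys it has.
    core_ids: set of feature IDs that belong to core genomes.
    Returns {attribute: (percent_of_all, percent_of_core)}.
    """
    all_counts = Counter()
    core_counts = Counter()
    for fid, attrs in features.items():
        for key in attrs:
            all_counts[key] += 1
            if fid in core_ids:
                core_counts[key] += 1
    n_all = len(features)
    n_core = sum(1 for fid in features if fid in core_ids)
    return {
        key: (100.0 * all_counts[key] / n_all,
              100.0 * core_counts[key] / n_core if n_core else 0.0)
        for key in all_counts
    }


def searchable_attributes(report, min_core_pct=50.0):
    """Attributes whose core-genome coverage meets a (hypothetical) cutoff."""
    return sorted(key for key, (_, core_pct) in report.items()
                  if core_pct >= min_core_pct)


# Toy data: feat1 and feat2 are "core", feat3 and feat4 are not.
features = {
    "feat1": {"CELLO", "CDD"},
    "feat2": {"CELLO"},
    "feat3": {"molecular_weight"},
    "feat4": set(),
}
report = attribute_coverage(features, core_ids={"feat1", "feat2"})
print(searchable_attributes(report))  # ['CDD', 'CELLO']
```

Reporting core coverage separately from total coverage is what lets an attribute like CELLO qualify for core-genome searches even when its coverage across all features is lower.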

At this point it looks like the only practical things to add would be the CELLO and CDD attributes. Therefore, I am adding these to the database and they should appear on the Advanced Search Page by the end of the week.

June 7, 2007

Version 21 Loading, Version 20 Getting Help

Sprout version 21 is currently intended to be the first version from which a PPO database can be generated. Among the changes were:

  • Ripping out the whole PCH family of tables in favor of the PCH server
  • Adding a CDD table and putting CELLO data in the Feature table
  • Converting all of the keyed array fields (feature alias, compound name, role EC number) into separate entities

This last change was needed because PPO does not support keyed array fields. With it, we should be able to generate a working PPO database from the NMPDR database definition, which will give us a template to shoot for in accomplishing the integration.
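Turning a multi-valued field into a separate entity amounts to emitting one row per array element, each linked back to its owner by key. The sketch below assumes records are plain dictionaries with an ID field and a list-valued field. The field names (`id`, `alias`) and the function name are hypothetical, and this is not the actual Sprout loader code.

```python
def split_keyed_array(records, key_field, array_field):
    """Turn an array-valued field into one (owner key, value) row per element.

    Each row can then be loaded as an instance of a separate entity that is
    linked back to its owning record by the key.
    """
    rows = []
    for rec in records:
        # Records with no value for the field simply contribute no rows.
        for value in rec.get(array_field, []):
            rows.append((rec[key_field], value))
    return rows


features = [
    {"id": "feat1", "alias": ["alias1", "alias2"]},
    {"id": "feat2", "alias": []},
]
print(split_keyed_array(features, "id", "alias"))
# [('feat1', 'alias1'), ('feat1', 'alias2')]
```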

In the meantime, I am working on bug fixes to the live NMPDR using the mirror version of NMPDR. One big problem was that the Sprout attribute call did not support the full capabilities of the new attribute system. This caused a problem with incorrect literature counts on the subsystem display page. In addition, it was causing CDD codes to appear in the evidence column for the commentary of a pin page.

The final problem is an incompatibility in the subsystem diagrams. I hope to resolve this tomorrow and will then copy the mirror to the live site over the weekend.

It would be tricky, but possible, to reload the version 20 database to get the latest information. This would fix the outdated role abbreviations and other stale data in the subsystems. It would also give us an opportunity to add serotypes to the names of the core genomes and possibly get any core genome updates out of the pipeline. At present, however, I am assuming this will not be done. Please correct me if I'm wrong.

June 8, 2007

Mirror Site Diagrams Fixed

The subsystem diagrams in the Sprout now use the new diagram technology. There is, unfortunately, a slight glitch due to the fact that the roles stored in the version 20 Sprout do not necessarily match the ones in the SEED. The symptom is that some of the role tooltips don't work. This has been fixed on the Mirror Site. If you go to this subsystem page, the first diagram (De Novo &c) is an old-style diagram, and the second (Arginine &c) is a new-style diagram. Both diagrams will invoke the proper CGI script so that they display correctly.

This is the last of the problems reported by Olga in her email of a few days ago. The fixes will most likely be transferred to the live site some time on Sunday.

June 11, 2007

Version 21 Load Complete, Version 20 Becomes Favorite Child

I had to reload several table groups in Sprout version 21 because I had misunderstood how some relationships work. In particular, some RNA roles are still in the aliases, so the same alias can belong to several features, which makes the alias relationship many-to-many rather than one-to-many.

It is also the case that some CAS IDs and compound names belong to multiple compounds. These relationships are now many-to-many.
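Whether a relationship is one-to-many or many-to-many can be checked directly from the data before choosing a table layout. Here is a minimal sketch, assuming the relationship is available as a list of (owner, value) pairs such as (feature, alias) or (compound, CAS ID). The function names and sample pairs are hypothetical.

```python
from collections import defaultdict


def cardinality(pairs):
    """Classify (owner, value) pairs, e.g. (feature, alias), by relationship type."""
    values_per_owner = defaultdict(set)
    owners_per_value = defaultdict(set)
    for owner, value in pairs:
        values_per_owner[owner].add(value)
        owners_per_value[value].add(owner)
    owner_has_many = any(len(v) > 1 for v in values_per_owner.values())
    value_is_shared = any(len(o) > 1 for o in owners_per_value.values())
    if owner_has_many and value_is_shared:
        return "many-to-many"
    if owner_has_many:
        return "one-to-many"
    if value_is_shared:
        return "many-to-one"
    return "one-to-one"


def shared_values(pairs):
    """Values (e.g. aliases) attached to more than one owner, with those owners."""
    owners = defaultdict(set)
    for owner, value in pairs:
        owners[value].add(owner)
    return {value: sorted(o) for value, o in owners.items() if len(o) > 1}


pairs = [("feat1", "alias1"), ("feat1", "alias2"), ("feat2", "alias3")]
print(cardinality(pairs))  # one-to-many
# A single shared value, such as an RNA role used as an alias, changes the answer.
pairs += [("feat2", "rna-role"), ("feat3", "rna-role")]
print(cardinality(pairs))  # many-to-many
print(shared_values(pairs))  # {'rna-role': ['feat2', 'feat3']}
```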

There is an EC number with a tab inside it: 2.6.1.2. I don't know which subsystem it's in, but the role name is putative alanine transaminase (glutamyc pyruvic transaminase). I've added code in the Sprout loader to fix this, but it's probably better to fix the actual subsystem spreadsheet if someone can find it.
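A loader-side fix for this kind of problem can strip embedded whitespace and then check that what remains looks like an EC number. The sketch below is an illustration only, not the code actually added to the Sprout loader. The pattern is a deliberately simplified check (four dot-separated parts, each digits or "-") that would, for example, reject preliminary EC numbers, and the position of the tab in the sample input is made up.

```python
import re

# Simplified check: four dot-separated parts, each either digits or "-".
EC_PATTERN = re.compile(r"\d+\.(?:\d+|-)\.(?:\d+|-)\.(?:\d+|-)")


def clean_ec_number(raw):
    """Strip embedded whitespace (e.g. a stray tab) from an EC number and validate it."""
    cleaned = re.sub(r"\s+", "", raw)
    if not EC_PATTERN.fullmatch(cleaned):
        raise ValueError(f"not a valid EC number: {raw!r}")
    return cleaned


print(clean_ec_number("2.6.1.\t2"))  # 2.6.1.2
```

Raising an error on anything that still fails the check, rather than silently loading it, makes it easier to trace other bad values back to their subsystem spreadsheets.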

There is also a feature somewhere that has a right curly brace in its alias list. Because of the way the aliases are generated, the right curly brace does not have any features attached to it in Sprout, so it's not going to affect anything; however, the downside is that I have no idea which feature has the bogus alias.
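Because the loaded alias no longer points at its feature, the offender would have to be found in the data before loading. Here is a minimal sketch, assuming the raw per-feature alias lists can be read into a dictionary keyed by feature ID. That input format, the function name, and the sample data are all hypothetical.

```python
def find_suspicious_aliases(alias_lists, bad_chars="{}"):
    """Return (feature ID, alias) pairs where the alias contains a suspicious character.

    alias_lists: dict mapping a feature ID to its raw list of alias strings.
    """
    hits = []
    for fid, aliases in alias_lists.items():
        for alias in aliases:
            if any(ch in alias for ch in bad_chars):
                hits.append((fid, alias))
    return hits


raw = {"feat1": ["alias1", "}"], "feat2": ["alias2"]}
print(find_suspicious_aliases(raw))  # [('feat1', '}')]
```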

It will be several days before version 21 is ready for testing. Because version 21 contains a major database change, I am currently targeting changes to version 20. The plan is to slipstream in a version 20a in the next few days. The development copy of version 20a is on the mirror site. The major change for 20a is the incorporation of the scan-for-matches search into the BLAST search page. There are also some bug reports from Leslie and Claudia that I am still researching.

The Mirror Site is currently all messed up, but the plan is to have it ready for hammering in a few hours. (Ready for hammering means the bug fixes are in but the new search is not yet done.) I will do another blog post when it's hammer time, at which time we need to find any bugs so I can fix them before we post to the live site.

Version 20a Available on Mirror Site

Version 20a is now available for testing on The Mirror Site. The only missing ingredient at this time is the new, improved BLAST/pattern search. Everything else, however, should be checked to make sure we're ready for the upcoming conference-like thing.

June 22, 2007

Reload in Progress

The Development Site is going to be completely unusable for the next few days. First and foremost, I am reloading the subsystems table to fix the Role abbreviation problem. Previously, abbreviations were properties of the Role: each Role used the same abbreviation in all subsystems. This has been changed so that the Role can have a different abbreviation in each subsystem, which more closely matches the SEED data structures.
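The difference between the old and new abbreviation schemes comes down to what the abbreviation is keyed on. Here is a minimal sketch, assuming each subsystem spreadsheet can be read as a list of (role, abbreviation) pairs. The subsystem and role names and both function names are hypothetical placeholders, not the actual Sprout tables.

```python
from collections import defaultdict


def build_abbreviations(subsystems):
    """Map (subsystem, role) to abbreviation, so a role may differ per subsystem.

    subsystems: dict mapping a subsystem name to a list of (role, abbreviation) pairs.
    """
    return {
        (subsystem, role): abbrev
        for subsystem, rows in subsystems.items()
        for role, abbrev in rows
    }


def conflicting_roles(subsystems):
    """Roles that use different abbreviations in different subsystems.

    Under a one-abbreviation-per-role scheme, all but one of these would be lost.
    """
    seen = defaultdict(set)
    for rows in subsystems.values():
        for role, abbrev in rows:
            seen[role].add(abbrev)
    return {role: sorted(a) for role, a in seen.items() if len(a) > 1}


subsystems = {
    "Subsystem A": [("Role X", "RX"), ("Role Y", "RY")],
    "Subsystem B": [("Role X", "X1")],
}
abbrevs = build_abbreviations(subsystems)
print(abbrevs[("Subsystem B", "Role X")])  # X1
print(conflicting_roles(subsystems))  # {'Role X': ['RX', 'X1']}
```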

There are also numerous search problems which result from the search code being in an incomplete state at the time I started the load. I will get back to the search fixes as soon as I finish doing the laundry.

June 28, 2007

Attribute Report for v21

I passed out before taking my medications last night. I've been told it's Thursday, but I'm not yet sure which Thursday that is.

In the meantime, the latest attribute report is now available here. For NMPDR core organisms, we have 92% coverage on CELLO data and 50% coverage on TMPRED, which is much better than we had in the previous report.

About June 2007

This page contains all entries posted to NMPDR Development Blog in June 2007. They are listed from oldest to newest.

May 2007 is the previous archive.

July 2007 is the next archive.

