The version 15 database has been loaded and the appropriate files generated for The Development Site. The difference report is below the fold.
This time there is not a lot of new stuff. Seventy-five property name/value pairs were deleted and 17 new ones were added.
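The deleted and added counts in a difference report like this come down to set differences between two versions of the data. Here is a minimal Python sketch. It assumes properties are compared as exact (name, value) pairs. The function name `diff_properties` is hypothetical and is not the actual difference-report code.

```python
from typing import Set, Tuple

# A property is treated here as an exact (name, value) pair.
Pair = Tuple[str, str]


def diff_properties(old: Set[Pair], new: Set[Pair]) -> Tuple[Set[Pair], Set[Pair]]:
    """Return (deleted, added) name/value pairs between two database versions."""
    deleted = old - new  # present only in the old version
    added = new - old    # present only in the new version
    return deleted, added


# Hypothetical data: one pair dropped, one pair added.
deleted, added = diff_properties({("p", "1"), ("q", "2")}, {("q", "2"), ("r", "3")})
print(len(deleted), len(added))  # 1 1
```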
Development on version 16 will begin later this weekend.
The version 16 data is now loading into the Development Web Site. This load incorporates three changes to the handling of features:
I have added subdomain URLs to make the Development Blog easier to navigate.
The data for version 16 has been reloaded. The difference report is substantial this time, since over 160 subsystems dropped out when we changed the inclusion criteria. I have included it below the fold (which is blogspeak for "on the other side of the Continue Reading link").
You can now search for keywords like essential and iedb. There are still some glitches in the searching. In particular, when you ask for essential, a hyperlinked list of essentiality values should show up in the results. I will investigate this further tomorrow (as noted on the current to-do list).
Bruce went to the dentist today and is feeling much better; however, he was unable to return to work because of a mishap involving a tube of superglue and a broken car mirror.
Respectfully submitted,
Ferdinand T. Cat
I reloaded the NMPDR 16 database. The new difference report is below the fold.
After much delay, heartache, and gnashing of teeth, version 16 of the NMPDR has been moved to the Staging Site. Please give it a once-over to make sure your favorite features still work. If there are no problems, it will go live some time on Thursday.
The new drug target data is not present in this version. It will be added for the next version, which is scheduled to go live on December 4.
Version 17 of the NMPDR is now available on the development server. It includes the new drug targets pages, which can be seen at http://web-1.nmpdr.org/next/FIG/targets.cgi, though the information there has not yet been completely curated.
Not all of the attributes we want are available. Once the attribute system is fixed (hopefully in a day or two), I will reload the NMPDR property table.
NMPDR version 17 is now on the staging server.
Bigger, faster, and richer in content, version 17 is now live.
To celebrate, I will be spending the next few hours huddled in a corner whimpering and shaking uncontrollably.
This morning I will begin building version 18. Several important changes have to be made before version 18 can be loaded, so for a while there will be no official data in the Development NMPDR. I don't want to make radical code changes to version 17 now that it's live, however, so we will have to limp along for a while.
The holiday is over and NMPDR version 18 is now officially loaded on the development server. In addition, the keyword search now supports three-letter words. For example, in the old system, adenine RNA and adenine returned the same result set because the three-letter word RNA was ignored. In the new system, adenine returns 1702 records and adenine RNA returns only 44.
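A short Python sketch shows why a minimum word length makes two different queries behave the same. It assumes the query is split on whitespace and that words shorter than a minimum are dropped. It also assumes the old minimum was four characters; the source only says three-letter words were ignored. `significant_words` and `min_len` are hypothetical names, not the NMPDR search code.

```python
from typing import List


def significant_words(query: str, min_len: int) -> List[str]:
    """Return the query words a (hypothetical) keyword index would keep.

    Words shorter than min_len are silently dropped, so with min_len=4
    the query "adenine RNA" degenerates to "adenine".
    """
    return [word.lower() for word in query.split() if len(word) >= min_len]


print(significant_words("adenine RNA", min_len=4))  # ['adenine']
print(significant_words("adenine RNA", min_len=3))  # ['adenine', 'rna']
```

With the lower minimum, both terms survive and must both match. That is why adding RNA narrows the result set instead of leaving it unchanged.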
The difference report for version 18 is here.
I am currently reloading the Feature table to ensure that the RNAs have the correct assignments. Once that is done, I will run the standard post-load scripts to rebuild the cover pages. My goal is to have the staging site up some time tomorrow and then bring up version 18 on the following Tuesday (February 27). Leslie's serotype data will be lost, but I hope to have this information available on the attribute server next week so it can be made part of the load process in version 19.
NMPDR version 18 is now available for preview on the Staging Server. A complete comparison of the data differences between version 17 and version 18 is available in the revised difference report.
The current plan is to roll v18 into production on Tuesday, February 27. In the meantime, you can have some fun by typing riboswitch into the search box.
Version 19 of the NMPDR is now set up on the Development server. At this point no data has been loaded into the database, however, so nothing works.
The goal for this release is to make the NMPDR compatible with the SEED viewer. This is a significant task that may require several database reloads, but it is very different from actually making the SEED viewer available on the NMPDR. All that has to be accomplished for this release is that the Sprout software support the requirements of the viewer.
The search system has been retooled so that while the search is in progress, status messages are sent to the user in real time. This means that if the search is taking a long time, then every so often text will be presented to the user explaining what is happening. This makes a long search more palatable, and it also prevents the long searches from timing out and presenting the user with an internal server or proxy error. When the search is complete, the search results page will immediately pop up. For fast searches, the results come up so quickly that the status messages never show up. I have not tested this change thoroughly, but I hope to later today. The word search and the signature genes search should work if you want to see how the new system operates.
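Here is one way to structure a search so that it reports progress while it runs, sketched in Python as a generator. This is an assumption about the general technique, not the actual NMPDR search code. The names `search_with_progress`, `matches`, and `report_every` are hypothetical. A web handler could write and flush each status event to the browser as it arrives, so output keeps flowing during a long search. A small search finishes before the first status event is produced, so only the result is sent.

```python
from typing import Callable, Iterable, Iterator, List, Tuple, Union

# Each event is ("status", message) or ("result", list_of_hits).
Event = Tuple[str, Union[str, List[str]]]


def search_with_progress(
    records: Iterable[str],
    matches: Callable[[str], bool],
    report_every: int = 1000,
) -> Iterator[Event]:
    """Scan records, yielding periodic status events and then the final hits."""
    hits: List[str] = []
    for count, record in enumerate(records, start=1):
        if matches(record):
            hits.append(record)
        if count % report_every == 0:
            # In a CGI/web context this is where a status line would be flushed.
            yield ("status", f"Scanned {count} records, {len(hits)} matches so far")
    yield ("result", hits)


for kind, payload in search_with_progress(
    ["dnaK", "dnaJ", "recA"], lambda r: r.startswith("dna"), report_every=2
):
    print(kind, payload)
```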
There have been three additions to the Signature Genes Tool.
The NMPDR organism pages have every genome marked as new, which means the counts on the front page are almost certainly wrong. I will investigate this when I wake up this afternoon.
The Annotation button is still in place because there are still some fixes I need to make to the SEED Viewer support. The next thing I have on my list is fixing the BLAST search.
I have been told there are nine incorrect genomes in the NMPDR. As soon as they are fixed, I will reload, run the difference report, and begin the cutover and testing process. The next item after the BLAST fixes is the drug target objects. Whether those get into v19 or v20 will depend on how long I have before the reload starts.
I have added a link to the search results page that allows you to download the entire search output as a tab-delimited file. I am adding features one at a time and testing them. Suggestions are welcome, but because it is a work in progress there is no need to worry that any anomalies are cast in stone.
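Producing a tab-delimited download is mostly a matter of writing a header row followed by one row per result. Here is a minimal Python sketch using the standard `csv` module. The column names and the `results_to_tsv` function are hypothetical and are not taken from the NMPDR code.

```python
import csv
import io
from typing import Dict, List


def results_to_tsv(columns: List[str], rows: List[Dict[str, str]]) -> str:
    """Serialize search results as tab-delimited text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(columns)
    for row in rows:
        # A missing value becomes an empty cell instead of raising KeyError.
        writer.writerow([row.get(col, "") for col in columns])
    return buffer.getvalue()


# Hypothetical result rows.
print(results_to_tsv(["id", "function"], [{"id": "feature-1", "function": "chaperone DnaK"}]))
```

Using `csv.writer` rather than joining strings by hand means any value that happens to contain a tab or newline gets quoted instead of silently breaking the column layout.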
NMPDR version 19 has now been moved to the staging server. There was a slight delay in the original schedule due to a problem Leslie discovered with the organism pages. Bob fixed it this morning, however, so now we are back on track.
I have pushed the cutover date forward to April 17 so that there will still be two full business days for testing.
I am in the midst of applying the following changes.
Once these fixes are in and I've tested them on the development server, I will copy them to the staging server, test them there, and then roll the version.
The version 20 web site is now available on the Development Server. Nothing has been tested, but if you don't look at it too hard it should work fine.
Over the weekend, the power will be turned off somewhere because of something, with the result that nmpdr-3 and web-3 will be unavailable. Therefore, this coming Friday (May 18), we will be redirecting http://www.nmpdr.org to point to the development server. There will therefore be no development server over the weekend. Since this is also my development sandbox, I will not be working on the 18th and 19th. On Monday the 20th, I will meet with Bob and Bill again to restore the normal order of things.
The key thing is that this time there will not be a Staging Server. Instead, we have to hammer on the Development Server as much as we can from now until then.
NMPDR version 20, the first to use our new single-server technology, is now live. There is now a much shorter delay before the search progress page shows up, and this makes the whole thing seem snappier and more responsive. A word search for Vibrio presented the progress page after 10 seconds and completed the search (47912 results) in only 28 seconds. Searches with smaller result sets (e.g., dnaK) respond in only a few seconds.
Bill is currently working on the configuration to reduce the 10-second delay, but now that the entire NMPDR web site is on a single server, we have a lot more options than we did before.
Version 21 of the NMPDR is now loading on the Development Server. The current target date to go live is June 11. The load is expected to finish around May 28.
I have created a page on the Development Server that contains information useful in understanding the meaning and format of the various attributes. The report is, for now, restricted to attributes of features and attributes related to drug targets.
At the bottom of the page is a report showing how many of the NMPDR features and how many of the NMPDR Core features have which attributes. The report gives us an idea of which attributes may be practical for use in searching. For example, we have CELLO data on 68% of the NMPDR core features, so it would be reasonable to allow a user to search on or ask for CELLO data if he's restricting his attention to core genomes. On the other hand, we have molecular weight information on 18% of the total features and less than 2% of the core features, so an attempt to search on molecular weight would not provide any useful results.
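The coverage figures in the report are the percentage of a feature set that carries each attribute. Here is a minimal Python sketch of that calculation. It assumes each feature's attributes are available as a set of attribute names. The toy feature IDs and the `attribute_coverage` function are hypothetical, and the numbers below do not reproduce the real report.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping, Set


def attribute_coverage(
    attributes_by_feature: Mapping[str, Set[str]],
    feature_ids: Iterable[str],
) -> Dict[str, float]:
    """Percentage of the given features that carry each attribute."""
    ids = list(feature_ids)
    if not ids:
        return {}
    counts: Counter = Counter()
    for fid in ids:
        # Each attribute is counted at most once per feature.
        counts.update(attributes_by_feature.get(fid, set()))
    return {attr: 100.0 * n / len(ids) for attr, n in counts.items()}


# Toy data: f1 and f2 stand in for core features.
attrs = {"f1": {"CELLO", "CDD"}, "f2": {"CELLO"}, "f3": {"molecular_weight"}, "f4": set()}
print(attribute_coverage(attrs, attrs))         # all features
print(attribute_coverage(attrs, ["f1", "f2"]))  # core subset only
```

Running the same function over the whole feature set and over the core subset gives the two columns of the report. A low percentage in the subset a user would search is a sign that the attribute is not worth offering as a search field.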
At this point it looks like the only practical things to add would be the CELLO and CDD attributes. Therefore, I am adding these to the database, and they should appear on the Advanced Search Page by the end of the week.
Sprout version 21 is currently intended to be the first version from which a PPO database can be generated. Among the changes were
This last was because PPO does not support keyed array fields. Because of this change, we should be able to generate a working PPO database from the NMPDR database definition, and this will give us a template to shoot for in accomplishing the integration.
In the meantime, I am working on bug fixes to the live NMPDR using the mirror version of NMPDR. One big problem was that the Sprout attribute call did not support the full capabilities of the new attribute system. This caused a problem with incorrect literature counts on the subsystem display page. In addition, it was causing CDD codes to appear in the evidence column for the commentary of a pin page.
The final problem has to do with an incompatibility of the diagrams. I hope to resolve this tomorrow and will then copy the mirror to the live site over the weekend.
It would be tricky, but possible, to reload the version 20 database to get the latest information. This would fix the problem with outdated abbreviations and stuff in the subsystems. It would also give us an opportunity to add serotypes to the names of the core genomes and possibly get any core genome updates out of the pipeline. At the current time, however, I am assuming this would not be done. Please correct me if I'm wrong.
The subsystem diagrams in the Sprout now use the new diagram technology. There is, unfortunately, a slight glitch due to the fact that the roles stored in the version 20 Sprout do not necessarily match the ones in the SEED. The symptom is that some of the role tooltips don't work. This has been fixed on the Mirror Site. If you go to this subsystem page, the first diagram (De Novo &c) is an old-style diagram, and the second (Arginine &c) is a new-style diagram. Both diagrams will invoke the proper CGI script so that they display correctly.
This is the last of the problems reported by Olga in her email of a few days ago. The fixes will most likely be transferred to the live site some time on Sunday.
I had to reload several table groups in Sprout version 21 due to misunderstandings on my part as to how some relationships work. In particular, some RNA roles are still in the aliases, so aliases are a many-to-many relationship rather than one-to-many.
It is also the case that some CAS IDs and compound names belong to multiple compounds. These relationships are now many-to-many.
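A many-to-many relationship is essentially a table of (left, right) link rows that can be read in either direction. A one-to-many relationship would force each alias or compound name to belong to exactly one owner. Here is a minimal Python sketch of indexing such link rows both ways. It is an illustration only, not the Sprout database definition. The compound IDs and names are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, Set, Tuple


def build_many_to_many(
    pairs: Iterable[Tuple[str, str]],
) -> Tuple[Dict[str, Set[str]], Dict[str, Set[str]]]:
    """Index (left, right) link rows in both directions."""
    left_to_right: DefaultDict[str, Set[str]] = defaultdict(set)
    right_to_left: DefaultDict[str, Set[str]] = defaultdict(set)
    for left, right in pairs:
        left_to_right[left].add(right)
        right_to_left[right].add(left)
    return dict(left_to_right), dict(right_to_left)


# Hypothetical links: the name "n1" belongs to two different compounds.
by_compound, by_name = build_many_to_many([("cpdA", "n1"), ("cpdA", "n2"), ("cpdB", "n1")])
print(sorted(by_name["n1"]))  # ['cpdA', 'cpdB']
```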
There is an EC number with a tab inside it: 2.6.1.2. I don't know which subsystem it's in, but the role name is putative alanine transaminase (glutamyc pyruvic transaminase). I've added code in the Sprout loader to fix this, but it's probably better to fix the actual subsystem spreadsheet if someone can find it.
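The actual fix in the Sprout loader is not shown here. One plausible approach is to strip embedded whitespace from an EC number and then check that what remains looks like four dot-separated fields. Here is a Python sketch of that idea. The function name `clean_ec_number` is hypothetical. The pattern accepts digits or "-" (for unspecified positions) in the last three fields, and it is deliberately simple.

```python
import re

_WHITESPACE = re.compile(r"\s+")
# Four dot-separated fields; the last three may be "-" when unspecified.
_EC_NUMBER = re.compile(r"\d+\.(?:\d+|-)\.(?:\d+|-)\.(?:\d+|-)")


def clean_ec_number(raw: str) -> str:
    """Remove stray whitespace (such as an embedded tab) from an EC number."""
    cleaned = _WHITESPACE.sub("", raw)
    if not _EC_NUMBER.fullmatch(cleaned):
        raise ValueError(f"not an EC number: {raw!r}")
    return cleaned


print(clean_ec_number("2.6.1.\t2"))  # 2.6.1.2
```

Rejecting anything that still fails the pattern after cleaning makes bad spreadsheet entries visible during the load instead of letting them through silently.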
There is also a feature somewhere that has a right curly brace in its alias list. Because of the way the aliases are generated, the right curly brace does not have any features attached to it in Sprout, so it's not going to affect anything; however, the downside is that I have no idea which feature has the bogus alias.
It will be several days before version 21 is ready for testing. Because version 21 contains a major database change, I am currently targeting changes to version 20. The plan is to slipstream in a version 20a in the next few days. The development copy of version 20a is on the mirror site. The major change for 20a is the incorporation of the scan-for-matches search into the BLAST search page. There are also some bug reports from Leslie and Claudia that I am still researching.
The Mirror Site is currently all messed up, but the plan is to have it ready for hammering in a few hours. (Ready for hammering means the bug fixes are in but the new search is not yet done.) I will do another blog post when it's hammer time, at which time we need to find any bugs so I can fix them before we post to the live site.
The new version 20 is now available for testing at The Mirror Site. The missing ingredient at this time is the new, improved Blast/Pattern search. Everything else, however, should be investigated to make sure we're ready for the upcoming conference-like thing.
The Development Site is going to be completely unusable for the next few days. First and foremost, I am reloading the subsystems table to fix the Role abbreviation problem. Previously, abbreviations were properties of the Role: each Role used the same abbreviation in all subsystems. This has been changed so that the Role can have a different abbreviation in each subsystem, which more closely matches the SEED data structures.
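The change amounts to moving the abbreviation key from the Role alone to the (subsystem, role) pair. Here is a minimal Python sketch of that data structure. It is illustrative only, not the Sprout schema. The class name and the subsystem, role, and abbreviation values are hypothetical.

```python
from typing import Dict, Optional, Tuple


class RoleAbbreviations:
    """Role abbreviations scoped to a subsystem (hypothetical model)."""

    def __init__(self) -> None:
        # Keyed by (subsystem, role) rather than by role alone, so one Role
        # can carry a different abbreviation in each subsystem.
        self._by_pair: Dict[Tuple[str, str], str] = {}

    def add(self, subsystem: str, role: str, abbrev: str) -> None:
        self._by_pair[(subsystem, role)] = abbrev

    def lookup(self, subsystem: str, role: str) -> Optional[str]:
        return self._by_pair.get((subsystem, role))


table = RoleAbbreviations()
table.add("Subsystem A", "Some role", "SR1")
table.add("Subsystem B", "Some role", "SRx")
print(table.lookup("Subsystem A", "Some role"), table.lookup("Subsystem B", "Some role"))
```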
There are also numerous search problems which result from the search code being in an incomplete state at the time I started the load. I will get back to the search fixes as soon as I finish doing the laundry.
I passed out before taking my medications last night. I've been told it's Thursday, but I'm not yet sure which Thursday that is.
In the meantime, the latest attribute report is now available here. For NMPDR core organisms, we have 92% coverage on CELLO data and 50% coverage on TMPRED, which is much better than we had in the previous report.
Mark and I spent the afternoon tuning the pattern scan and the compare regions on the development NMPDR. The entry point for the pattern scan / blast search is here. From the results page, you can get to a standard protein page using the NMPDR button and Mark's context display using the Context button. We were consistently able to get the response time under 20 seconds for both DNA and protein searches.
I am currently working on the code that allows the user to decide which ID (uniprot, locus tag, etc.) should be displayed in the results. Once that's done, I will fold it into the display code for the various feature searches as part of getting them fixed and adding rollover hints to the search forms. (This is all based on a discussion with Folker and Liz last week.) In the meantime, I am running an attribute report to find out which attributes can be added to the searching. Once that's done, I will need to reload the feature table to get the new attributes, at which point the site will be mostly working except for the help pages and stuff.
Things are slowly coming together for NMPDR version 21. The word search is mostly working, and I am finishing the fixes to the other feature-based searches. Once this is done, the remaining tasks are
There was a bug in the pattern search because the formatter was removing spaces from the pattern string. This has been fixed.
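The difference between deleting spaces and normalizing them fits in one line. Here is a Python sketch of the idea. It assumes, as the fix implies, that spaces separate elements of the pattern and must survive formatting. The function names and the sample pattern are made up for illustration.

```python
def strip_spaces_buggy(pattern: str) -> str:
    """The kind of formatting that breaks a pattern: all whitespace is deleted."""
    return "".join(pattern.split())


def normalize_pattern(pattern: str) -> str:
    """Collapse runs of whitespace to single spaces instead of deleting them."""
    return " ".join(pattern.split())


sample = "  AAAA   3...8\tTTTT "  # made-up pattern string
print(repr(strip_spaces_buggy(sample)))  # 'AAAA3...8TTTT'
print(repr(normalize_pattern(sample)))   # 'AAAA 3...8 TTTT'
```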
I am currently running the difference report. I added coupling and BBH statistics, both of which have slowed the report considerably, which is why it is still going on. Since this report is not run very frequently, I don't consider that a big problem. Once the report is done, the data will appear here.
This weekend I will re-run the attribute report, which will give me the information I need to begin implementing a drug target search (as opposed to a docking result report). I don't want to start the attribute report until the difference report is finished.
Finally, I am meeting with Ross on Monday to discuss implementing a close-strain comparison tool. This has become a little more urgent because one of the site users requested it.
Version 21 is now considered stable enough for testing. Among the new features that will be available in this release are:
There was also a major upgrade to the underlying code for the search. Previously, the search was heavily biased toward searches that return features. Under the new system, the type of result is decoupled from the type of search, which makes it easier to create searches for new types of objects (such as subsystems and genomes) in the future.
The SEED Viewer has been re-activated, and can be found here. There is also a link to it in the sidebar on the front page of the Development Blog. This is the old viewer, and it is currently being used to ensure that the NMPDR can support the functions needed by the SEED Viewer. The idea here is that when the new viewer is ready, we will be in a better position to couple it to the rest of the site. In the meantime, because there is no direct link to it on the website, we can test and modify it without disrupting anything on the real web site.
NMPDR version 21 is officially live. In addition, I have added SOPs to the development wiki for staging an NMPDR for testing, propagating corrections to the staged site, and rolling over a new version.
I will probably not be able to come in on Monday, as that is the day the plumbers are coming to pump the water out of the basement and fix the damage.
I have been in the process of setting up anno-2.nmpdr.org as the new NMPDR development server. This is a faster server, and nmpdr-1 is slated for some different sort of thing that I don't fully understand. For a while, things will be a little crazy as I sort out which links point to where.
Two fixes have been slipstreamed into the production NMPDR.
The next step is to start setting up version 22 on the new development server.
Version 22 of the NMPDR has been created on the New Development Server at http://anno-2.nmpdr.org/next/. Note that at this time, no data has been loaded into the databases. The intent is to bring the Sprout database one step closer to looking like a PPO database before we load it.
This page contains an archive of all entries posted to the NMPDR Development Blog in the Web Site Status Reports category. They are listed from oldest to newest.