Attributes Archives

November 3, 2006

Attribute Methodology

Today, Bob and I worked out the architecture for the new attribute server. The server itself will be on the protected anno-3 machine, and the control panel that allows you to define and delete attribute keys will be available only on that machine. Access to attributes (inserting, deleting, and retrieving) will be performed using XML-RPC, which allows a Perl object to be used like a normal local object even though it is running on a remote server. Initially, the RPC interface will provide just the minimal set of methods necessary to implement attributes, but the same scheme with a larger set of methods could allow us to put multiple full-function ERDB databases on remote servers with very little coding effort.
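For readers who have not seen XML-RPC before, here is a rough sketch of what "a remote object that looks local" boils down to: the method name and arguments are marshalled into an XML document, sent over HTTP, and unpacked on the other side. The sketch uses Python's standard xmlrpc.client module rather than the Perl code on the server, and the method name and arguments in it are purely illustrative.

```python
import xmlrpc.client


def marshal_call(method, args):
    """Client side: build the XML body that would be POSTed to the server."""
    return xmlrpc.client.dumps(tuple(args), methodname=method)


def unmarshal_call(request_xml):
    """Server side: recover the method name and arguments from the XML body."""
    args, method = xmlrpc.client.loads(request_xml)
    return method, args


# Hypothetical call; the object ID and key are sample values only.
request_xml = marshal_call("get_attributes", ["fig|100226.1.peg.1", "PUBMED"])
method, args = unmarshal_call(request_xml)
```

Nothing here touches the network. A real client would hand the same XML to the server URL and unmarshal the response in the same way.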

Initially, the new attribute methods will be implemented as replacements for the current methods in FIG.pm. There will be two diminutions of functionality: we can no longer search on URLs, and there is no provision for controlled-vocabulary attributes. Hopefully, this will not be a big problem.

A list of the methods being implemented is given below the fold.

Continue reading "Attribute Methodology" »

November 10, 2006

New Attribute System in Place

The new attribute system is now in place. The four major attribute methods (get_attributes, add_attribute, delete_attribute, and change_attribute) in FIG.pm have been modified so they will process attributes according to how the system is configured.

  1. If FIG_Config.pm contains a value for $attrURL, then FIG.pm will look for the attributes at the specified URL. Currently, a set of live attributes is located at http://anno-3.nmpdr.org/attrib_server/AttribXMLRPC.cgi. (That address is an XML service script, not a real web page.)
  2. If FIG_Config.pm contains a value for $attrDBD, then FIG.pm will look for the attributes in a MySQL database. Currently, there are MySQL databases for this purpose set up on anno-3 and nmpdr-1.
  3. If neither of those values is found in FIG_Config.pm, the legacy attribute system will be used.
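The three rules above amount to a simple fallback chain. The following is a minimal sketch of that chain in Python, not the actual FIG.pm code. The function name and the dictionary standing in for FIG_Config.pm are made up for illustration, and checking $attrURL before $attrDBD when both are set is an assumption based on the order of the list.

```python
def choose_attribute_backend(config):
    """Pick an attribute source from a dict standing in for FIG_Config.pm.

    Returns a (backend, setting) pair. The backend labels are hypothetical.
    """
    # Rule 1: a configured URL means the remote attribute server is used.
    if config.get("attrURL"):
        return ("server", config["attrURL"])
    # Rule 2: a configured DBD means a MySQL attribute database is used.
    if config.get("attrDBD"):
        return ("mysql", config["attrDBD"])
    # Rule 3: with neither value set, fall back to the legacy system.
    return ("legacy", None)
```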

Eventually, we want a local attribute database (method 2) set up on the bioseed servers for use as a scratch database, and public SEEDs would use the attribute server (method 1). There is a timing issue with the conversion, and the messy details are given below the fold.

Continue reading "New Attribute System in Place" »

November 16, 2006

Attribute Server Now Live

The attribute server is now live. In addition, there is a new script, ExportAttributes, for writing attributes to flat files.

To use the attribute server, simply put

$attrURL = "http://anno-3.nmpdr.org/attrib_server/AttribXMLRPC.cgi"

in your FIG_Config.pm file. This server should not be used for testing code that adds and deletes attributes. To do that, use

$attrURL = "http://nmpdr-1.nmpdr.org/next/FIG/AttribXMLRPC.cgi"

Attribute keys are now a controlled vocabulary, and you can only attach attributes to genomes and features that are known to the server. If you need to create a new attribute key, you must use the Attribute Maintenance Page. (For the testing server, use this page instead.)

Currently, only Genomes and Features can have attributes, but the possibility exists of adding attributes to subsystems and other objects if needed.

November 28, 2006

New Attribute System

I ran into a problem loading the attributes, so the new attribute system is still offline; however, I have a list of the keys and my attempt at a description of what each one means. They are arranged in a table below the fold.

Please let me know if I am describing any of the attributes incorrectly.

Continue reading "New Attribute System" »

December 1, 2006

The Attribute Situation

Attributes now has its own category on the development blog.

The Even Newer Attribute System is now operating on the SEED instance running on the Development Server. It has been tested to make sure all the little bits work, and will be moved to the attribute server and integrated into the annotator SEED next month. In the meantime, you can see the attribute console at http://web-1.nmpdr.org/next/FIG/Attributes.cgi.

The most important change that still needs to be made is getting the select.cgi collections converted. This is a self-inflicted wound, because I wanted the subsystem attributes to be attached to the actual subsystems instead of a thing called "Subsystem". In order to do that, I need to make changes to select.cgi.

The Even Newer Attribute System differs from the New Attribute System in several ways. There is now a single table of attributes implemented as a relationship between TargetObjects and AttributeKeys. Target objects are identified by ID only, rather than ID and type, which makes the system more like the Old Attribute System.

The TargetObject entity is virtual, which means that there is no data in the TargetObject table. There is, however, another entity called AttributeGroup that allows arbitrary grouping of attribute keys. There is only one level of grouping, but an attribute can belong to many different groups.

The AttrDBRefresh script is used to do batch attribute processing. It has options for backing up attributes to a tab-delimited file, loading attributes from a tab-delimited file, and migrating attributes from an instance of the SEED.

The attribute backup and load files are expected to contain an object ID, an attribute key name, and one or more values in each line. There is also a facility for uploading a single attribute from the web. In this case, the file must still be tab-delimited, but you specify the columns containing the object ID and the attribute value in the upload form.
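As a rough illustration of the backup and load file layout described above, here is a Python sketch that reads and writes one line of such a file. It is not the AttrDBRefresh code; the function names are invented, and it assumes only what the description says: tab-delimited fields with the object ID first, the key name second, and at least one value after that.

```python
def parse_attribute_line(line):
    """Split one tab-delimited line into (object_id, key, [values])."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) < 3:
        raise ValueError("expected object ID, key name, and at least one value")
    object_id, key, *values = fields
    return object_id, key, values


def format_attribute_line(object_id, key, values):
    """Inverse of parse_attribute_line (no trailing newline)."""
    if not values:
        raise ValueError("at least one value is required")
    return "\t".join([object_id, key, *values])
```

The sample IDs and values used to exercise this are made up. A line with only an object ID and a key is rejected because the format calls for one or more values.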

December 14, 2006

Strangeness is Due Soon

I have finished testing the Even Newer attribute system, and I will be putting it in place on the annotator seed tonight. Things may be a bit slow while this task is in progress.

December 15, 2006

Attribute Server has Cut Over

The new attribute server is now running, and is being used as the attribute repository on the annotator seed.

The biggest difference between the old system and the new one is the fact that you must create an attribute key before you can assign any values to it. New attributes can be defined at http://anno-3.nmpdr.org/attrib_server/Attributes.cgi. To use the attribute server, you need the following line in your FIG_Config.pm file.

$attrURL = "http://anno-3.nmpdr.org/attrib_server/AttribXMLRPC.cgi";

Experience has shown us that for best results you need to reduce the number of calls made to the server. For this reason, in your get_attributes call, you may specify a list reference for either of the first two parameters, and it will return values that match anything in the list.

December 20, 2006

Update to Attribute Methods

It is now possible to specify a regular expression as a value pattern in get_attributes.

Unlike the Original Attribute System or the New Attribute System, the Even Newer Attribute System does not allow general SQL wildcards. Instead, a sort of generic search is provided: if the last character of a pattern is %, then it will be treated as an SQL wildcard character. So, you can specify fig|100226.1.peg.% as an object ID to retrieve attributes for all the PEGs of Streptomyces coelicolor, but you cannot do %.rna.% to retrieve attributes for all the RNAs in the system. There are two reasons for this. First, in order to satisfy the latter query, MySQL will end up reading every single row in the attribute table. Second, the underscore is a wildcard character in SQL, and we have them all over the place. Only recognizing a percent sign at the end made things much less messy.

The values are filtered in-memory instead of via SQL, so it is possible to allow fancier capabilities for them. Thus, for attribute values, and only for attribute values, you can specify a regular expression in addition to the single-percent generic pattern.

Some examples may help clarify all this.

  • $fig->get_attributes("fig|$genomeID.peg%", "PUBMED%") will retrieve all the PUBMED attributes for the PEGs of the given genome. There are three PUBMED attributes that will match the second operand: PUBMED, PUBMED_CURATED_RELEVANT, and PUBMED_CURATED_NOTRELEVANT.
  • $fig->get_attributes(undef, "PUBMED_CURATED%", "/^[^,]+,$id,/") will retrieve all curated PUBMED attributes for a given document number. In a curated PUBMED attribute, the document information consists of multiple comma-separated fields. The second field is the document number, so the Perl pattern is designed to match only if the specified number is between the first and second commas of the value field.
  • $fig->get_attributes([$genomeID, "fig|$genomeID%"], undef, "/^http:\/\/\w*\.?nih\.gov/i") will return all attributes related to the given genome that have an associated URL pointing to the NIH web site. Note that we can't use the Perl m operator: the attribute engine can only recognize a regular expression if it's enclosed in slashes. It does, however, allow modifiers at the end. In this case the i modifier is used to make the match case-insensitive. This call also uses a list in the first operand to ask for attributes of the genome itself (first operand in the list) as well as all the genome's features (second operand in the list).
  • $fig->get_attributes("/$genomeID/", 'PUBMED') is an attempt to get all PUBMED attributes related to the specified genome, but it will not work because regular expressions are only allowed for values, not for object IDs (or attribute keys for that matter). To get this effect, you must use the list-based approach shown above.
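The matching rules in these examples can be modelled in a few lines. What follows is a Python sketch of the rules as described in this entry, not the server's Perl implementation. The function names, the in-memory list of (object ID, key, value) rows, and the set of regex modifiers it accepts are all assumptions made for illustration.

```python
import re

# Assumed subset of Perl-style modifiers, mapped to Python regex flags.
_FLAGS = {"i": re.IGNORECASE, "m": re.MULTILINE, "s": re.DOTALL, "x": re.VERBOSE}


def matches_generic(pattern, text):
    """None matches anything; a trailing % is a prefix match; else exact."""
    if pattern is None:
        return True
    if pattern.endswith("%"):
        return text.startswith(pattern[:-1])
    return text == pattern


def matches_any(patterns, text):
    """Object IDs and keys take one generic pattern or a list of them."""
    if isinstance(patterns, (list, tuple)):
        return any(matches_generic(p, text) for p in patterns)
    return matches_generic(patterns, text)


def matches_value(pattern, value):
    """Values also accept a /regex/ with optional trailing modifiers."""
    if pattern is not None:
        m = re.fullmatch(r"/(.*)/([imsx]*)", pattern, re.DOTALL)
        if m:
            flags = 0
            for ch in m.group(2):
                flags |= _FLAGS[ch]
            return re.search(m.group(1), value, flags) is not None
    return matches_generic(pattern, value)


def get_attributes(rows, object_id=None, key=None, value=None):
    """Filter (object_id, key, value) rows using the rules above."""
    return [r for r in rows
            if matches_any(object_id, r[0])
            and matches_any(key, r[1])
            and matches_value(value, r[2])]
```

In this model, "/100226/" passed as an object ID is compared as a literal string and so matches nothing, which mirrors the last example in the list. Only the value operand gets the regular expression treatment.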

January 4, 2007

Testing in Progress

I am testing a change to the attribute server code, so attributes on the Development Server will be unpredictable for a while. The idea is to permit full-text searching on attribute keys and values. Here is the plan.

  • Update the attribute DBD to support full-text searching. This includes creating a searchable keyword field in the HasValueFor table. The keyword field will contain a cleaned copy of the key name and value.
  • Update CustomAttributes.pm to support the new searchable field.
  • Reload the development server's test attribute database.
  • Test adding and deleting attributes.
  • Test the performance of the ev_code_cron job in the new environment.
  • Convert the FIG.pm text search to use the new attribute-searching method.
  • Add the new attribute-searching method to the XMLRPC support module.
  • Move the changes to the real attribute server.

Attribute Progress Report

Full-text searching has now been implemented on the Development Server. A dry run of the trna search took about 5 seconds to find the values, which is much better than a minute.

The development server, therefore, is once again fully functional; however, additional testing is needed before this change can be ported to the attribute server itself. In particular, we need to see if the new index is going to significantly slow the ev_code_cron job.

January 27, 2007

Hopefully the Last Major Attribute Rewrite

The new attribute system has been ported to the attribute server and now appears to be running fairly well. A few things have changed.

  • The find_attributes method, which allowed searching for substrings inside keywords and values, is no longer supported.
  • Please note that the hundred-by-hundred attributes have changed. There is now a collection attribute that specifies the names of the collections in which an object participates. So, for example, to get all objects in the Higher Plants collection, you would call $fig->get_attributes(undef, 'collection', 'higher_plants').
  • The erase_attribute_entirely method deletes all values for a keyword but does not erase the notion of the keyword itself. That's a good thing, because if you erase the notion in the new system you can't add values back. To add or remove a keyword, you must use the Attribute Control Panel at http://anno-3.nmpdr.org/attrib_server/Attributes.cgi.
  • There are currently 3.1 million attributes in the database.

February 6, 2007

Attribute Update

I have restored the missing attributes and implemented a logging facility to make it easier to track major changes to the database.

In the background, I am reloading the NMPDR data because it has been lying fallow for so long. My goal when it's done is to get version 18 out the door as quickly as possible, which means major changes will be put off until version 19.

February 22, 2007

Attribute Server Update

Mike's new attributes have been uploaded to the attribute server. The new attributes and their descriptions are given below the fold.

To access these attributes during testing, you need to connect to the main attribute server in your FIG_Config.pm file. The required text is as follows.

$attrURL = "http://anno-3.nmpdr.org/attrib_server/AttribXMLRPC.cgi";

Continue reading "Attribute Server Update" »

April 23, 2007

Attribute Server Security

Somehow when I updated the attribute server last night a bogus .htaccess file got pulled in. This file caused an Internal Server Error each time the attribute server was accessed.

The goal of the file was to prevent unauthorized access to the Attribute Control Center. The problem was that it used a directive which works on the Annotator SEED, but not on the attribute server, so every use of a CGI script on the attribute server failed with a message about an error in the .htaccess file. The tricky part of fixing it was allowing general access to the attribute data without allowing general access to the control center. I accomplished this using a FilesMatch directive, a process which I believe will be causing me nightmares for several weeks.

Anyway, the attribute server is back, and is considerably more secure than it used to be.

April 27, 2007

Important New Attribute Methods

There are now methods in FIGRules.pm for converting BLAST scores to and from the sortable form used in the attribute database. EncodeScore converts a floating-point score to a sortable string. DecodeScore converts the sortable string back to its floating-point value.
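To show why a "sortable form" is useful, here is one possible order-preserving encoding, sketched in Python. This is not the scheme used by EncodeScore and DecodeScore in FIGRules.pm, which this entry does not describe. It is an assumed stand-in that relies on the fact that non-negative IEEE 754 doubles sort the same way as their big-endian bytes, and it assumes scores are never negative.

```python
import math
import struct


def encode_score(score):
    """Encode a non-negative float as a 16-character hex string that
    sorts (as text) in the same order as the numbers themselves."""
    if math.isnan(score) or score < 0:
        raise ValueError("only non-negative scores are supported in this sketch")
    # abs() only matters for -0.0, whose sign bit would break the ordering.
    return struct.pack(">d", abs(score)).hex()


def decode_score(text):
    """Recover the floating-point value from the sortable string."""
    return struct.unpack(">d", bytes.fromhex(text))[0]
```

With an encoding like this, a range restriction on scores can be expressed as a plain string comparison in the database, and the round trip through decode_score returns the original value exactly.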

The new query_attributes method in FIG.pm allows you to specify an SQL-style filter string when searching for attributes. This capability is necessary for situations where we want to restrict an attribute retrieval to a particular range of values, as is required to get the best docking results for a drug target PDB.

Finally, version 20 of Sprout, which will start loading this weekend, has been converted to use the attribute server instead of its own Property tables. The Property tables will only contain attributes used to speed certain types of searches.

May 6, 2007

Attribute Control Panel Fix for Safari/Mac

The attribute control panel has now been fixed so that it will display properly in the Safari browser.

About Attributes

This page contains an archive of all entries posted to NMPDR Development Blog in the Attributes category. They are listed from oldest to newest.
