It is now possible to specify a regular expression as a value pattern in get_attributes.
Unlike the Original Attribute System or the New Attribute System, the Even Newer Attribute System does not allow general SQL wildcards. Instead, a sort of generic search is provided: if the last character of a pattern is %, then it will be treated as an SQL wildcard character. So, you can specify fig|100226.1.peg.% as an object ID to retrieve attributes for all the PEGs of Streptomyces coelicolor, but you cannot use %.rna.% to retrieve attributes for all the RNAs in the system. There are two reasons for this. First, in order to satisfy the latter query, MySQL would end up reading every single row in the attribute table. Second, the underscore is a wildcard character in SQL, and we have underscores all over the place. Recognizing a percent sign only at the end made things much less messy.
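The trailing-percent rule can be sketched roughly as follows. This is only an illustration, not the attribute system's actual code: the helper name id_pattern_to_sql is hypothetical, and escaping the embedded underscores with a backslash is an assumption about how a literal match could be enforced in MySQL.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the attribute system): turn an object ID
# or key pattern into a SQL comparison plus a bind value. Only a trailing %
# is treated as a wildcard; any other % or _ is escaped (an assumption) so
# that it matches literally.
sub id_pattern_to_sql {
    my ($pattern) = @_;
    # No trailing percent sign: plain equality test.
    return ('= ?', $pattern) unless $pattern =~ /%$/;
    # Drop the trailing %, then escape backslashes and LIKE wildcards.
    my $prefix = substr($pattern, 0, -1);
    $prefix =~ s/([\\%_])/\\$1/g;
    return ('LIKE ?', "$prefix%");
}

my ($op, $value) = id_pattern_to_sql('fig|100226.1.peg.%');
print "object ID comparison: $op (bind value: $value)\n";
```

Because the wildcard can only appear at the end, the comparison stays a prefix search, which is the kind of query an index on the column can normally serve without reading every row.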
The values are filtered in-memory instead of via SQL, so it is possible to allow fancier capabilities for them. Thus, for attribute values, and only for attribute values, you can specify a regular expression in addition to the single-percent generic pattern.
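As a rough sketch of what in-memory value filtering could look like, the code below builds a matcher from a value pattern. The helper name value_matcher is hypothetical, the set of accepted modifiers (i, m, s, x) is an assumption, and treating an undefined pattern as "match everything" is also an assumption; the real attribute engine may differ on all three.

```perl
use strict;
use warnings;

# Hypothetical helper: return a code reference that tests one attribute value
# against a value pattern.
sub value_matcher {
    my ($pattern) = @_;
    # Assumption: no pattern means every value is accepted.
    return sub { 1 } unless defined $pattern;
    # Enclosed in slashes, with optional trailing modifiers: a regular expression.
    if ($pattern =~ m{^/(.*)/([imsx]*)$}s) {
        my ($body, $mods) = ($1, $2);
        my $re = length($mods) ? qr/(?$mods)$body/ : qr/$body/;
        return sub { $_[0] =~ $re };
    }
    # Trailing percent sign: generic (prefix) match.
    if ($pattern =~ /^(.*)%$/s) {
        my $prefix = $1;
        return sub { index($_[0], $prefix) == 0 };
    }
    # Anything else must match exactly.
    return sub { $_[0] eq $pattern };
}

# Usage: keep only the values whose second comma-separated field is $id.
my $id      = 12345;
my $matcher = value_matcher("/^[^,]+,$id,/");
my @values  = ("curator,12345,relevant", "curator,99999,relevant");
my @kept    = grep { $matcher->($_) } @values;
print "kept: @kept\n";
```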
Some examples may help clarify all this.
$fig->get_attributes("fig|$genomeID.peg%", "PUBMED%") will retrieve all the PUBMED attributes for the PEGs of the given genome. There are three PUBMED attributes that will match the second operand: PUBMED, PUBMED_CURATED_RELEVANT, and PUBMED_CURATED_NOTRELEVANT.
$fig->get_attributes(undef, "PUBMED_CURATED%", "/^[^,]+,$id,/") will retrieve all curated PUBMED attributes for a given document number. In a curated PUBMED attribute, the document information consists of multiple comma-separated fields. The second field is the document number, so the Perl pattern is designed to match only if the specified number is between the first and second commas of the value field.
$fig->get_attributes([$genomeID, "fig|$genomeID%"], undef, '/^http:\/\/\w*\.?nih\.gov/i') will return all attributes related to the given genome that have an associated URL pointing to the NIH web site. The pattern is single-quoted here so that Perl passes the backslashes through to the attribute engine unchanged. Note that we can't use the Perl m operator: the attribute engine can only recognize a regular expression if it's enclosed in slashes. It does, however, allow modifiers at the end. In this case the i modifier is used to make the match case-insensitive. This call also uses a list in the first operand to ask for attributes of the genome itself (first operand in the list) as well as all the genome's features (second operand in the list).
$fig->get_attributes("/$genomeID/", 'PUBMED') is an attempt to get all PUBMED attributes related to the specified genome, but it will not work because regular expressions are only allowed for values, not for object IDs (or attribute keys, for that matter). To get this effect, you must use the list-based approach shown above.