ERDBTable of Contents

Entity-Relationship Database PackageIntroductionXML Database DescriptionData TypesEntitiesFieldsGlobal TagsIndexesObject and Field NamesRelationshipsPublic MethodsBeginTranCommitTranComputeObjectSentenceCreateIndexCreatePPOCreateSearchIndexCreateTableCreateTablesDeleteDeleteLikeDeleteRowDeleteValueDigestFieldsDigestKeyDisconnectDisplayMetaDataDropRelationDumpMetaDataDumpRelationsEstimateRowSizeExistsFindIndexForEntityGetGetAllGetChoicesGetCountGetDataTypesGetEntityGetEntityFieldHashGetEntityTypesGetEntityValuesGetFieldRelationNameGetFieldTableGetFlatGetListGetSecondaryFieldsGetTableNamesHTMLNoteInsertObjectInsertValueIsEntityLoadTableLoadTablesMatchSqlPatternnewReadMetaXMLRollbackTranSearchShowMetaDataSortNeededSpecialFieldsSplitKeywordsUpdateEntityUpdateFieldValidateFieldNameVerifyFieldsWriteMetaXMLData Mining MethodsFindColumnGetUsefulCrossValuesParseColumnsVirtual Methods_CreatePPOField_CreatePPOIndexCleanKeywordsGetSourceObjectInternal Utility Methods_AddField_AddIndex_CreateRelationshipIndex_DumpRelation_FindRelation_FixName_FixNames_FixupFields_GetLoadStats_GetRelationTable_GetStatementHandle_GetStructure_IsPrimary_LoadMetaData_LoadRelation_ProcessIndexes_RelationMap_ReOrderRelationTable_SetupSQL_ValidateFieldNamesHTML Documentation Utility Methods_CloseTable_ComputeRelationshipHeading_ComputeRelationshipSentence_OpenFieldTable_OpenTable_ShowField_ShowRelationTableEntity-Relationship Database PackageIntroductionThe Entity-Relationship Database Package allows the client to create an easily-configurabl database of Entities connected by Relationships. Each entity is represented by one or mor relations in an underlying SQL database. Each relationship is represented by a singl relation that connects two entities.

Although this package is designed for general use, all examples are derived from th Sprout database, which is the first database implemented using this package.

Each entity has at least one relation, the primary relation, that has the same name a the entity. The primary relation contains a field named id that contains the uniqu identifier of each entity instance. An entity may have additional relations that contai fields which are optional or can occur more than once. For example, the FEATURE entit has a feature-type attribute that occurs exactly once for each feature. This attribut is implemented by a feature_type column in the primary relation Feature. In addition however, a feature may have zero or more aliases. These are implemented using a FeatureAliasid) and the alias name (alias) The FEATURE entity also contains an optional virulence number. This is implemente as a separate relation FeatureVirulence which contains an ID (id) and a virulence numbe (virulence). If the virulence of a feature ABC is known to be 6, there will be one row i the FeatureVirulence relation possessing the value ABC as its ID and 6 as its virulenc number. If the virulence of ABC is not known, there will not be any rows for it i FeatureVirulence.

Entities are connected by binary relationships implemented using single relations possessing th same name as the relationship itself and that has an arity of 1-to-1 (11), 1-to-many (1M) or many-to-many (MM). Each relationship's relation contains a from-link field that contains th ID of the source entity and a to-link field that contains the ID of the target entity. The nam of the relationship is generally a verb phrase with the source entity as the subject and th target entity as the object. So, for example, the ComesFrom relationship connects the GENOMESOURCE entities, and indicates that a particular source organization participated in th mapping of the genome. A source organization frequently participates in the mappin of many genomes, and many source organizations can cooperate in the mapping of a single genome, s this relationship has an arity of many-to-many (MM). The relation that implements the ComesFromComesFrom and contains two fields-- from-link, which contains a genome ID and to-link, which contains a source ID.

A relationship may itself have attributes. These attributes, known as intersection data attributes are implemented as additional fields in the relationship's relation. So, for example, th IsMadeUpOf relationship connects the Contig entity to the Sequence entity, and is use to determine which sequences make up a contig. The relationship has as an attribute th start-position, which indicates where in the contig that the sequence begins. This attribut is implemented as the start_position field in the IsMadeUpOf relation.

The database itself is described by an XML file. In addition to all the data required to defin the entities, relationships, and attributes, the schema provides space for notes describin the data and what it means. These notes are used by ShowMetaData to generate documentatio for the database.

Special support is provided for text searching. An entity field can be marked as <em>searchable</em> in which case it will be used to generate a text search index in which the user searches for word in the field instead of a particular field value.

Finally, every entity and relationship object has a flag indicating if it is new or old. The objec is considered old if it was loaded by the LoadTables method. It is considered new if i was inserted by the InsertObject method.

XML Database DescriptionData TypesThe ERDB system supports the following data types. Note that there are numerous strin types depending on the maximum length. Some database packages limit the total number o characters you have in an index key; to insure the database works in all environments the type of string should be the shortest one possible that supports all the known values.

charsingle ASCII character

int32-bit signed integer

counter32-bit unsigned integer

date64-bit unsigned integer, representing a PERL date/time value

textlong string; Text fields cannot be used in indexes or sorting and do not support th normal syntax of filter clauses, but can be up to a billion character in length

floatdouble-precision floating-point number

booleansingle-bit numeric value; The value is stored as a 16-bit signed integer (fo compatability with certain database packages), but the only values supported ar 0 and 1.

id-stringvariable-length string, maximum 25 characters

key-stringvariable-length string, maximum 40 characters

name-stringvariable-length string, maximum 80 characters

medium-stringvariable-length string, maximum 160 characters

stringvariable-length string, maximum 255 characters

hash-stringvariable-length string, maximum 22 characters

The hash-string data type has a special meaning. The actual key passed into the loader wil be a string, but it will be digested into a 22-character MD5 code to save space. Although th MD5 algorithm is not perfect, it is extremely unlikely two strings will have the sam digest. Therefore, it is presumed the keys will be unique. When the database is actuall in use, the hashed keys will be presented rather than the original values. For this reason they should not be used for entities where the key is meaningful.

Global TagsThe entire database definition must be inside a Database tag. The display name o the database is given by the text associated with the Title tag. The display nam is only used in the automated documentation. It has no other effect. The entities an relationships are listed inside the Entities and Relationships tags respectively. None of these tags have attributes.

Entities, relationships, indexes, and fields all allow a text tag called Notes The text inside the Notes tag contains comments that will appear when the databas documentation is generated. Within a Notes tag, you may use [i] and [/i] fo italics, [b] and [/b] for bold, and [p] for a new paragraph.

FieldsBoth entities and relationships have fields described by Field tags. A FieldNotes associated with it. The complete set of Field tags for a object mus be inside Fields tags.

The attributes for the Field tag are as follows.

nameName of the field. The field name should contain only letters, digits, and hyphens (-) and the first character should be a letter. Most underlying databases are case-insensitiv with the respect to field names, so a best practice is to use lower-case letters only. Finally the name search-relevance has special meaning for full-text searches and should not b used as a field name.

typeData type of the field. The legal data types are given above.

relationName of the relation containing the field. This should only be specified for entit fields. The ERDB system does not support optional fields or multi-occurring field in the primary relation of an entity. Instead, they are put into secondary relations So, for example, in the Genome entity, the group-name field indicates a specia grouping used to select a subset of the genomes. A given genome may not be in an groups or may be in multiple groups. Therefore, group-name specifies a relatio value. The relation name specified must be a valid table name. By convention, it i usually the entity name followed by a qualifying word (e.g. GenomeGroup). In a entity, the fields without a relation attribute are said to belong to th primary relation. This relation has the same name as the entity itself.

searchableIf specified, then the field is a candidate for full-text searching. A single full-tex index will be created for each relation with at least one searchable field in it For best results, this option should only be used for string or text fields.

specialThis attribute allows the subclass to assign special meaning for certain fields The interpretation is up to the subclass itself. Currently, only entity field can have this attribute.

IndexesAn entity can have multiple alternate indexes associated with it. The fields in a index must all be from the same relation. The alternate indexes assist in searchin on fields other than the entity ID. A relationship has at least two indexes-- a to-index and from-index that order the results when crossing the relationship. Fo example, in the relationship HasContig from Genome to Contig, th from-index would order the contigs of a ganome, and the to-index would orde the genomes of a contig. In addition, it can have zero or more alternat indexes. A relationship's index must specify only fields i the relationship.

The alternate indexes for an entity or relationship must be listed inside the Indexes tag The from-index of a relationship is specified using the FromIndex tag; the to-index i specified using the ToIndex tag.

Each index can contain a Notes tag. In addition, it will have an IndexFieldsIndexField tags. These specify, in order, the fields used i the index. The attributes of an IndexField tag are as follows.

nameName of the field.

orderSort order of the field-- ascending or descending.

The FromIndex, and ToIndex tags have no attributes. The Index tag ca have a Unique attribute. If specified, the index will be generated as a uniqu index.

Object and Field NamesBy convention entity and relationship names use capital casing (e.g. Genome o HasRegionsIn. Most underlying databases, however, are aggressively case-insensitiv with respect to relation names, converting them internally to all-upper case o all-lower case.

If syntax or parsing errors occur when you try to load or use an ERDB database, th most likely reason is that one of your objects has an SQL reserved word as its name The list of SQL reserved words keeps increasing; however, most are unlikely to sho up as a noun or declarative verb phrase. The exceptions are Group, User Table, Index, Object, Date, Number, Update, Time, Percent Memo, Order, and Sum. This problem can crop up in field names as well.

Every entity has a field called id that acts as its primary key. Every relationshi has fields called from-link and to-link that contain copies of the relevan entity IDs. These are essentially ERDB's reserved words, and should not be use for user-defined field names.

EntitiesAn entity is described by the Entity tag. The entity can contain Notes, a Indexes tag containing one or more secondary indexes, and a Fields ta containing one or more fields. The attributes of the Entity tag are as follows.

nameName of the entity. The entity name, by convention, uses capital casing (e.g. GenomeGroupBlock) and should be a noun or noun phrase.

keyTypeData type of the primary key. The primary key is always named id.

RelationshipsA relationship is described by the Relationship tag. Within a relationship there can be a Notes tag, a Fields tag containing the intersection dat fields, a FromIndex tag containing the from-index, a ToIndex tag containin the to-index, and an Indexes tag containing the alternate indexes.

The Relationship tag has the following attributes.

nameName of the relationship. The relationship name, by convention, uses capital casin (e.g. ContainsRegionIn or HasContig), and should be a declarative ver phrase, designed to fit between the from-entity and the to-entity (e.g Block ContainsRegionIn Genome).

fromName of the entity from which the relationship starts.

toName of the entity to which the relationship proceeds.

arityRelationship type: 1M for one-to-many and MM for many-to-many.

Public Methodsnewmy $database = ERDB->new($dbh, $metaFileName);

Create a new ERDB object.

dbhDBKernel database object for the target database.

metaFileNameName of the XML file containing the metadata.

ShowMetaData$erdb->ShowMetaData($fileName);

This method outputs a description of the database. This description can be used to help users creat the data to be loaded into the relations.

filenameThe name of the output file.

DisplayMetaDatamy $html = $erdb->DisplayMetaData();

Return an HTML description of the database. This description can be used to help users creat the data to be loaded into the relations and form queries. The output is raw includable HTM without any HEAD or BODY tags.

filenameThe name of the output file.

DumpMetaData$erdb->DumpMetaData();

Return a dump of the metadata structure.

CreatePPOERDB::CreatePPO($erdbXMLFile, $ppoXMLFile);

Create a PPO XML file from an ERDB data definition XML file. At th current time, the PPO XML file can be used to create a database wit similar functionality. Eventually, the PPO will be able to use th created XML to access the live ERDB database.

erdbXMLFileName of the XML data definition file for the ERDB database. Thi file must exist.

ppoXMLFileOutput file for the PPO XML definition. If this file exists, i will be overwritten.

FindIndexForEntitymy $indexFound = ERDB::FindIndexForEntity($xml, $entityName, $attributeName);

This method locates the entry in an entity's index list that begins with th specified attribute name. If the entity has no index list, one will b created. This method works on raw XML, not a live ERDB object.

xmlThe raw XML structure defining the database.

entityNameThe name of the relevant entity.

attributeNameThe name of the attribute relevant to the search.

RETURNThe numerical index in the index list of the index entry for the specified entity an attribute, or undef if no such index exists.

CreateTables$erdb->CreateTables();

This method creates the tables for the database from the metadata structure loaded by th constructor. It is expected this function will only be used on rare occasions, when th user needs to start with an empty database. Otherwise, the LoadTables method can b used by itself with the truncate flag turned on.

CreateTable$erdb->CreateTable($tableName, $indexFlag, $estimatedRows);

Create the table for a relation and optionally create its indexes.

relationNameName of the relation (which will also be the table name).

indexFlagTRUE if the indexes for the relation should be created, else FALSE. If FALSE CreateIndexes must be called later to bring the indexes into existence.

estimatedRows (optional)If specified, the estimated maximum number of rows for the relation. Thi information allows the creation of tables using storage engines that ar faster but require size estimates, such as MyISAM.

VerifyFieldsmy $count = $erdb->VerifyFields($relName, \@fieldList);

Run through the list of proposed field values, insuring that all the character fields ar below the maximum length. If any fields are too long, they will be truncated in place.

relNameName of the relation for which the specified fields are destined.

fieldListReference to a list, in order, of the fields to be put into the relation.

RETURNReturns the number of fields truncated.

DigestFields$erdb->DigestFields($relName, $fieldList);

Digest the strings in the field list that correspond to data type hash-string in th specified relation.

relNameName of the relation to which the fields belong.

fieldListList of field contents to be loaded into the relation.

DigestKeymy $digested = $erdb->DigestKey($keyValue);

Return the digested value of a symbolic key. The digested value can then be plugged into key-based search into a table with key-type hash-string.

Currently the digesting process is independent of the database structure, but that may no always be the case, so this is an instance method instead of a static method.

keyValueKey value to digest.

RETURNDigested value of the key.

CreateIndex$erdb->CreateIndex($relationName);

Create the indexes for a relation. If a table is being loaded from a large source file (a is the case in LoadTable), it is sometimes best to create the indexes after the load If that is the case, then CreateTable should be called with the index flag set t FALSE, and this method used after the load to create the indexes for the table.

GetSecondaryFieldsmy %fieldTuples = $erdb->GetSecondaryFields($entityName);

This method will return a list of the name and type of each of the secondar fields for a specified entity. Secondary fields are stored in two-column table in addition to the primary entity table. This enables the field to have no valu or to have multiple values.

entityNameName of the entity whose secondary fields are desired.

RETURNReturns a hash mapping the field names to their field types.

GetFieldRelationNamemy $name = $erdb->GetFieldRelationName($objectName, $fieldName);

Return the name of the relation containing a specified field.

objectNameName of the entity or relationship containing the field.

fieldNameName of the relevant field in that entity or relationship.

RETURNReturns the name of the database relation containing the field, or undef i the field does not exist.

DeleteValuemy $numDeleted = $erdb->DeleteValue($entityName, $id, $fieldName, $fieldValue);

Delete secondary field values from the database. This method can be used to delete al values of a specified field for a particular entity instance, or only a single value.

Secondary fields are stored in two-column relations separate from an entity's primar table, and as a result a secondary field can legitimately have no value or multipl values. Therefore, it makes sense to talk about deleting secondary fields where i would not make sense for primary fields.

entityNameName of the entity from which the fields are to be deleted.

idID of the entity instance to be processed. If the instance is not found, thi method will have no effect. If undef is specified, all values for all o the entity instances will be deleted.

fieldNameName of the field whose values are to be deleted.

fieldValue (optional)Value to be deleted. If not specified, then all values of the specified fiel will be deleted for the entity instance. If specified, then only the values whic match this parameter will be deleted.

RETURNReturns the number of rows deleted.

LoadTablesmy $stats = $erdb->LoadTables($directoryName, $rebuild);

This method will load the database tables from a directory. The tables must already have been create in the database. (This can be done by calling CreateTables.) The caller passes in a directory name all of the relations to be loaded must have a file in the directory with the same name as the relatio (optionally with a suffix of .dtx). Each file must be a tab-delimited table of field values. Eac line of the file will be loaded as a row of the target relation table. The field values should be i the same order as the fields in the relation tables generated by ShowMetaData. The old data i erased before the new data is loaded in.

A certain amount of translation automatically takes place. Ctrl-M characters are deleted, an tab and new-line characters inside a field are escaped as \t and \n, respectively. Dates mus be entered as a Unix timestamp, that is, as an integer number of seconds since the base epoch.

directoryNameName of the directory containing the relation files to be loaded.

rebuildTRUE if the tables should be dropped and rebuilt, else FALSE. This is, unfortunately, th only way to erase existing data in the tables, since the TRUNCATE command is not supporte by all of the DB engines we use.

RETURNReturns a statistical object describing the number of records read and a list of the error messages.

GetTableNamesmy @names = $erdb->GetTableNames;

Return a list of the relations required to implement this database.

GetEntityTypesmy @names = $erdb->GetEntityTypes;

Return a list of the entity type names.

GetDataTypesmy %types = ERDB::GetDataTypes();

Return a table of ERDB data types. The table returned is a hash of hashes The keys of the big hash are the datatypes. Each smaller hash has severa values used to manage the data. The most interesting is the SQL type (ke sqlType) and the descriptive node (key notes).

Note that changing the values in the smaller hashes will seriously brea things, so this data should be treated as read-only.

IsEntitymy $flag = $erdb->IsEntity($entityName);

Return TRUE if the parameter is an entity name, else FALSE.

entityNameObject name to be tested.

RETURNReturns TRUE if the specified string is an entity name, else FALSE.

Getmy $query = $erdb->Get(\@objectNames, $filterClause, \@params);

This method returns a query object for entities of a specified type using a specified filter The filter is a standard WHERE/ORDER BY clause with question marks as parameter markers and eac field name represented in the form objectName(fieldName). For example, th following call requests all Genome objects for the genus specified in the variabl $genus.

$query = $erdb->Get(['Genome'], "Genome(genus) = ?", [$genus]);

The WHERE clause contains a single question mark, so there is a single additiona parameter representing the parameter value. It would also be possible to code

$query = $erdb->Get(['Genome'], "Genome(genus) = \'$genus\'");

however, this version of the call would generate a syntax error if there were any quot characters inside the variable $genus.

The use of the strange parenthesized notation for field names enables us to distinguis hyphens contained within field names from minus signs that participate in the computatio of the WHERE clause. All of the methods that manipulate fields will use this same notation.

It is possible to specify multiple entity and relationship names in order to retrieve more tha one object's data at the same time, which allows highly complex joined queries. For example,

$query = $erdb->Get(['Genome', 'ComesFrom', 'Source'], "Genome(genus) = ?", [$genus]);

If multiple names are specified, then the query processor will automatically determine join path between the entities and relationships. The algorithm used is very simplistic In particular, if a relationship is recursive, the path is determined by the order in whic the entity and the relationship appear. For example, consider a recursive relationshi IsParentOf which relates People objects to other People objects. If the join path i coded as ['People', 'IsParentOf'], then the people returned will be parents. If, however the join path is ['IsParentOf', 'People'], then the people returned will be children.

If an entity or relationship is mentioned twice, the name for the second occurrence wil be suffixed with 2, the third occurrence will be suffixed with 3, and so forth. So for example, if we have ['Feature', 'HasContig', 'Contig', 'HasContig'], then th to-link field of the first HasContig is specified as HasContig(to-link), whil the to-link field of the second HasContig is specified as HasContig2(to-link).

objectNamesList containing the names of the entity and relationship objects to be retrieved.

filterClauseWHERE clause (without the WHERE) to be used to filter and sort the query. The WHERE clause ca be parameterized with parameter markers (?). Each field used in the WHERE clause must b specified in the standard form objectName(fieldName). Any parameters specifie in the filter clause should be added to the parameter list as additional parameters. Th fields in a filter clause can come from primary entity relations, relationship relations or secondary entity relations; however, all of the entities and relationships involved mus be included in the list of object names.

The filter clause can also specify a sort order. To do this, simply follow the filter strin with an ORDER BY clause. For example, the following filter string gets all genomes for particular genus and sorts them by species name.

"Genome(genus) = ? ORDER BY Genome(species)"

Note that the case is important. Only an uppercase "ORDER BY" with a single space wil be processed. The idea is to make it less likely to find the verb by accident.

The rules for field references in a sort order are the same as those for field references in th filter clause in general; however, odd things may happen if a sort field is from a secondar relation.

Finally, you can limit the number of rows returned by adding a LIMIT clause. The LIMIT mus be the last thing in the filter clause, and it contains only the word "LIMIT" followed b a positive number. So, for example

"Genome(genus) = ? ORDER BY Genome(species) LIMIT 10"

will only return the first ten genomes for the specified genus. The ORDER BY clause is no required. For example, to just get the first 10 genomes in the Genome table, you coul use

"LIMIT 10"

paramsReference to a list of parameter values to be substituted into the filter clause.

RETURNReturns a DBQuery that can be used to iterate through all of the results.

Searchmy $query = $erdb->Search($searchExpression, $idx, \@objectNames, $filterClause, \@params);

Perform a full text search with filtering. The search will be against a specified objec in the object name list. That object will get an extra field containing the searc relevance. Note that except for the search expression, the parameters of this method ar the same as those for Get and follow the same rules.

searchExpressionBoolean search expression for the text fields of the target object. The default mode fo a Boolean search expression is OR, but we want the default to be AND, so we wil add a + operator to each word with no other operator before it.

idxIndex in the $objectNames list of the table to be searched in full-text mode.

objectNamesList containing the names of the entity and relationship objects to be retrieved.

filterClauseWHERE clause (without the WHERE) to be used to filter and sort the query. The WHERE clause ca be parameterized with parameter markers (?). Each field used in the WHERE clause must b specified in the standard form objectName(fieldName). Any parameters specifie in the filter clause should be added to the parameter list as additional parameters. Th fields in a filter clause can come from primary entity relations, relationship relations or secondary entity relations; however, all of the entities and relationships involved mus be included in the list of object names.

paramsReference to a list of parameter values to be substituted into the filter clause.

RETURNReturns a query object for the specified search.

GetFlatmy @list = $erdb->GetFlat(\@objectNames, $filterClause, \@parameterList, $field);

This is a variation of GetAll that asks for only a single field per record an returns a single flattened list.

objectNamesList containing the names of the entity and relationship objects to be retrieved.

filterClauseWHERE/ORDER BY clause (without the WHERE) to be used to filter and sort the query. The WHERE clause ca be parameterized with parameter markers (?). Each field used must be specified in the standard for objectName(fieldName). Any parameters specified in the filter clause should be added to th parameter list as additional parameters. The fields in a filter clause can come from primar entity relations, relationship relations, or secondary entity relations; however, all of th entities and relationships involved must be included in the list of object names.

parameterListList of the parameters to be substituted in for the parameters marks in the filter clause.

fieldName of the field to be used to get the elements of the list returned.

RETURNReturns a list of values.

SpecialFieldsmy %specials = $erdb->SpecialFields($entityName);

Return a hash mapping special fields in the specified entity to the value of thei special attribute. This enables the subclass to get access to the special fiel attributes without needed to plumb the internal ERDB data structures.

entityNameName of the entity whose special fields are desired.

RETURNReturns a hash. The keys of the hash are the special field names, and the value are the values from each special field's special attribute.

Deletemy $stats = $erdb->Delete($entityName, $objectID, %options);

Delete an entity instance from the database. The instance is deleted along with all entity an relationship instances dependent on it. The definition of dependence is recursive.

An object is always dependent on itself. An object is dependent if it is a 1-to-many or many-to-man relationship connected to a dependent entity or if it is the "to" entity connected to a 1-to-man dependent relationship.

entityNameName of the entity type for the instance being deleted.

objectIDID of the entity instance to be deleted. If the ID contains a wild card character (%) then it is presumed to by a LIKE pattern.

optionsA hash detailing the options for this delete operation.

RETURNReturns a statistics object indicating how many records of each particular table wer deleted.

The permissible options for this method are as follows.

testModeIf TRUE, then the delete statements will be traced, but no changes will be made to the database.

keepRootIf TRUE, then the entity instances will not be deleted, only the dependent records.

Disconnect$erdb->Disconnect($relationshipName, $originEntityName, $originEntityID);

Disconnect an entity instance from all the objects to which it is related. Thi will delete each relationship instance that connects to the specified entity.

relationshipNameName of the relationship whose instances are to be deleted.

originEntityNameName of the entity that is to be disconnected.

originEntityIDID of the entity that is to be disconnected.

DeleteRow$erdb->DeleteRow($relationshipName, $fromLink, $toLink, \%values);

Delete a row from a relationship. In most cases, only the from-link and to-link ar needed; however, for relationships with intersection data values can be specifie for the other fields using a hash.

relationshipNameName of the relationship from which the row is to be deleted.

fromLinkID of the entity instance in the From direction.

toLinkID of the entity instance in the To direction.

valuesReference to a hash of other values to be used for filtering the delete.

DeleteLikemy $deleteCount = $erdb->DeleteLike($relName, $filter, \@parms);

Delete all the relationship rows that satisfy a particular filter condition. Unlike a norma filter, only fields from the relationship itself can be used.

relNameName of the relationship whose records are to be deleted.

filterA filter clause (Get-style) for the delete query.

parmsReference to a list of parameters for the filter clause.

RETURNReturns a count of the number of rows deleted.

SortNeededmy $parms = $erdb->SortNeeded($relationName);

Return the pipe command for the sort that should be applied to the specifie relation when creating the load file.

For example, if the load file should be sorted ascending by the firs field, this method would return

If the first field is numeric, the method would return

Unfortunately, due to a bug in the sort command, we cannot eliminate duplicat keys using a sort.

relationNameName of the relation to be examined.

Returns the sort command to use for sorting the relation, suitable for piping.

GetListmy @dbObjects = $erdb->GetList(\@objectNames, $filterClause, \@params);

Return a list of object descriptors for the specified objects as determined by th specified filter clause.

This method is essentially the same as Get except it returns a list of objects rathe than a query object that can be used to get the results one record at a time.

objectNamesList containing the names of the entity and relationship objects to be retrieved.

filterClauseWHERE clause (without the WHERE) to be used to filter and sort the query. The WHERE clause ca be parameterized with parameter markers (?). Each field used in the WHERE clause must b specified in the standard form objectName(fieldName). Any parameters specifie in the filter clause should be added to the parameter list as additional parameters. Th fields in a filter clause can come from primary entity relations, relationship relations or secondary entity relations; however, all of the entities and relationships involved mus be included in the list of object names.

The filter clause can also specify a sort order. To do this, simply follow the filter strin with an ORDER BY clause. For example, the following filter string gets all genomes for particular genus and sorts them by species name.

"Genome(genus) = ? ORDER BY Genome(species)"

The rules for field references in a sort order are the same as those for field references in th filter clause in general; however, odd things may happen if a sort field is from a secondar relation.

paramsReference to a list of parameter values to be substituted into the filter clause.

RETURNReturns a list of ERDBObjects that satisfy the query conditions.

GetCountmy $count = $erdb->GetCount(\@objectNames, $filter, \@params);

Return the number of rows found by a specified query. This method woul normally be used to count the records in a single table. For example, in genetics database

my $count = $erdb->GetCount(['Genome'], 'Genome(genus-species) LIKE ?', ['homo %']);would return the number of genomes for the genus homo. It is conceivable, however to use it to return records based on a join. For example,

my $count = $erdb->GetCount(['HasFeature', 'Genome'], 'Genome(genus-species) LIKE ?',['homo %']);would return the number of features for genomes in the genus homo. Note tha only the rows from the first table are counted. If the above command were

my $count = $erdb->GetCount(['Genome', 'Feature'], 'Genome(genus-species) LIKE ?',['homo %']);it would return the number of genomes, not the number of genome/feature pairs.

objectNamesReference to a list of the objects (entities and relationships) included in th query.

filterA filter clause for restricting the query. The rules are the same as for the GetparamsReference to a list of the parameter values to be substituted for the parameter mark in the filter.

RETURNReturns a count of the number of records in the first table that would satisf the query.

ComputeObjectSentencemy $sentence = $erdb->ComputeObjectSentence($objectName);

Check an object name, and if it is a relationship convert it to a relationship sentence.

objectNameName of the entity or relationship.

RETURNReturns a string containing the entity name or a relationship sentence.

DumpRelations$erdb->DumpRelations($outputDirectory);

Write the contents of all the relations to tab-delimited files in the specified directory Each file will have the same name as the relation dumped, with an extension of DTX.

outputDirectoryName of the directory into which the relation files should be dumped.

InsertValue$erdb->InsertValue($entityID, $fieldName, $value);

This method will insert a new value into the database. The value must be on associated with a secondary relation, since primary values cannot be inserted they occur exactly once. Secondary values, on the other hand, can be missin or multiply-occurring.

entityIDID of the object that is to receive the new value.

fieldNameField name for the new value-- this includes the entity name, sinc field names are of the format objectName(fieldName).

valueNew value to be put in the field.

InsertObject$erdb->InsertObject($objectType, \%fieldHash);

Insert an object into the database. The object is defined by a type name and then a has of field names to values. Field values in the primary relation are represented by scalars (Note that for relationships, the primary relation is the only relation. Field values for the other relations comprising the entity are always list references. Fo example, the following line inserts an inactive PEG feature named fig|188.1.peg.1 with aliase ZP_00210270.1 and gi|46206278.

$erdb->InsertObject('Feature', { id => 'fig|188.1.peg.1', active => 0, feature-type => 'peg', alias => ['ZP_00210270.1', 'gi|46206278']});

The next statement inserts a HasProperty relationship between feature fig|158879.1.peg.1 an property 4 with an evidence URL of http://seedu.uchicago.edu/query.cgi?article_id=142.

$erdb->InsertObject('HasProperty', { 'from-link' => 'fig|158879.1.peg.1', 'to-link' => 4, evidence => 'http://seedu.uchicago.edu/query.cgi?article_id=142'});

newObjectTypeType name of the object to insert.

fieldHashHash of field names to values.

UpdateEntity$erdb->UpdateEntity($entityName, $id, \%fields);

Update the values of an entity. This is an unprotected update, so it should only b done if the database resides on a database server.

entityNameName of the entity to update. (This is the entity type.)

idID of the entity to update. If no entity exists with this ID, an error will be thrown.

fieldsReference to a hash mapping field names to their new values. All of the fields name must be in the entity's primary relation, and they cannot any of them be the ID field.

LoadTablemy $results = $erdb->LoadTable($fileName, $relationName, $truncateFlag);

Load data from a tab-delimited file into a specified table, optionally re-creating the tabl first.

fileNameName of the file from which the table data should be loaded.

relationNameName of the relation to be loaded. This is the same as the table name.

truncateFlagTRUE if the table should be dropped and re-created, else FALSE

RETURNReturns a statistical object containing a list of the error messages.

CreateSearchIndex$erdb->CreateSearchIndex($objectName);

Check for a full-text search index on the specified entity or relationship object, an if one is required, rebuild it.

objectNameName of the entity or relationship to be indexed.

DropRelation$erdb->DropRelation($relationName);

Physically drop a relation from the database.

relationNameName of the relation to drop. If it does not exist, this method will hav no effect.

MatchSqlPatternmy $matched = ERDB::MatchSqlPattern($value, $pattern);

Determine whether or not a specified value matches an SQL pattern. An SQ pattern has two wild card characters: % that matches multiple characters and _ that matches a single character. These can be escaped using backslash (\). We pull this off by converting the SQL pattern to PERL regular expression. As per SQL rules, the match is case-insensitive.

valueValue to be matched against the pattern. Note that an undefined or empt value will not match anything.

patternSQL pattern against which to match the value. An undefined or empty pattern wil match everything.

RETURNReturns TRUE if the value and pattern match, else FALSE.

GetEntitymy $entityObject = $erdb->GetEntity($entityType, $ID);

Return an object describing the entity instance with a specified ID.

entityTypeEntity type name.

IDID of the desired entity.

RETURNReturns a ERDBObject representing the desired entity instance, or an undefined value if n instance is found with the specified key.

GetChoicesmy @values = $erdb->GetChoices($entityName, $fieldName);

Return a list of all the values for the specified field that are represented in th specified entity.

Note that if the field is not indexed, then this will be a very slow operation.

entityNameName of an entity in the database.

fieldNameName of a field belonging to the entity. This is a raw field name withou the standard parenthesized notation used in most calls.

RETURNReturns a list of the distinct values for the specified field in the database.

GetEntityValuesmy @values = $erdb->GetEntityValues($entityType, $ID, \@fields);

Return a list of values from a specified entity instance. If the entity instanc does not exist, an empty list is returned.

entityTypeEntity type name.

IDID of the desired entity.

fieldsList of field names, each of the form objectName(fieldName).

RETURNReturns a flattened list of the values of the specified fields for the specified entity.

GetAllmy @list = $erdb->GetAll(\@objectNames, $filterClause, \@parameters, \@fields, $count);

Return a list of values taken from the objects returned by a query. The first thre parameters correspond to the parameters of the Get method. The final parameter i a list of the fields desired from each record found by the query. The field nam syntax is the standard syntax used for fields in the ERDB system- objectName(fieldName)-- where objectName is the name of the relevant entit or relationship and fieldName is the name of the field.

The list returned will be a list of lists. Each element of the list will contai the values returned for the fields specified in the fourth parameter. If one of th fields specified returns multiple values, they are flattened in with the rest. Fo example, the following call will return a list of the features in a particula spreadsheet cell, and each feature will be represented by a list containing th feature ID followed by all of its essentiality determinations.

@query = $erdb->Get(['ContainsFeature', 'Feature'], "ContainsFeature(from-link) = ?", [$ssCellID], ['Feature(id)', 'Feature(essential)']);

objectNamesList containing the names of the entity and relationship objects to be retrieved.

filterClauseWHERE/ORDER BY clause (without the WHERE) to be used to filter and sort the query. The WHERE clause ca be parameterized with parameter markers (?). Each field used must be specified in the standard for objectName(fieldName). Any parameters specified in the filter clause should be added to th parameter list as additional parameters. The fields in a filter clause can come from primar entity relations, relationship relations, or secondary entity relations; however, all of th entities and relationships involved must be included in the list of object names.

parameterListList of the parameters to be substituted in for the parameters marks in the filter clause.

fieldsList of the fields to be returned in each element of the list returned.

countMaximum number of records to return. If omitted or 0, all available records will be returned.

RETURNReturns a list of list references. Each element of the return list contains the values for th fields specified in the fields parameter.

Existsmy $found = $sprout->Exists($entityName, $entityID);

Return TRUE if an entity exists, else FALSE.

entityNameName of the entity type (e.g. Feature) relevant to the existence check.

entityIDID of the entity instance whose existence is to be checked.

RETURNReturns TRUE if the entity instance exists, else FALSE.

EstimateRowSizemy $rowSize = $erdb->EstimateRowSize($relName);

Estimate the row size of the specified relation. The estimated row size is computed by addin up the average length for each data type.

relNameName of the relation whose estimated row size is desired.

RETURNReturns an estimate of the row size for the specified relation.

GetFieldTablemy $fieldHash = $self->GetFieldTable($objectnName);

Get the field structure for a specified entity or relationship.

objectNameName of the desired entity or relationship.

RETURNThe table containing the field descriptors for the specified object.

SplitKeywordsmy @keywords = ERDB::SplitKeywords($keywordString);

This method returns a list of the positive keywords in the specifie keyword string. All of the operators will have been stripped off and if the keyword is preceded by a minus operator (-), it wil not be in the list returned. The idea here is to get a list of th keywords the user wants to see. The list will be processed to remov duplicates.

It is possible to create a string that confuses this method. For example

would return both frog and toad. If this is a problem we can dea with it later.

keywordStringThe keyword string to be parsed.

RETURNReturns a list of the words in the keyword string the user wants t see.

ValidateFieldNamemy $okFlag = ERDB::ValidateFieldName($fieldName);

Return TRUE if the specified field name is valid, else FALSE. Valid field names mus be hyphenated words subject to certain restrictions.

fieldNameField name to be validated.

RETURNReturns TRUE if the field name is valid, else FALSE.

ReadMetaXMLmy $rawMetaData = ERDB::ReadDBD($fileName);

This method reads a raw database definition XML file and returns it Normally, the metadata used by the ERDB system has been processed an modified to make it easier to load and retrieve the data; however this method can be used to get the data in its raw form.

fileNameName of the XML file to read.

RETURNReturns a hash reference containing the raw XML data from the specified file.

GetEntityFieldHashmy $fieldHashRef = ERDB::GetEntityFieldHash($structure, $entityName);

Get the field hash of the named entity in the specified raw XML structure The field hash may not exist, in which case we need to create it.

structureRaw XML structure defininng the database. This is not the run-time XML used b an ERDB object, since that has all sorts of optimizations built-in.

entityNameName of the entity whose field structure is desired.

RETURNReturns the field hash used to define the entity's fields.

WriteMetaXMLERDB::WriteMetaXML($structure, $fileName);

Write the metadata XML to a file. This method is the reverse of ReadMetaXML, and i used to update the database definition. It must be used with care, however, since i will only work on a raw structure, not on the processed structure created by an ERD constructor.

structureXML structure to be written to the file.

fileNameName of the output file to which the updated XML should be stored.

HTMLNoteConvert a note or comment to HTML by replacing some bulletin-board codes with HTML. The code supported are [b] for bold, [i] for italics, and [p] for a new paragraph Except for [p], all the codes are closed by slash-codes. So, fo example, [b]Feature[/b] displays the string Feature in boldface.

my $realHtml = ERDB::HTMLNote($dataString);

dataStringString to convert to HTML.

RETURNAn HTML string derived from the input string.

BeginTran$erdb->BeginTran();

Start a database transaction.

CommitTran$erdb->CommitTran();

Commit an active database transaction.

RollbackTran$erdb->RollbackTran();

Roll back an active database transaction.

UpdateFieldmy $count = $erdb->UpdateField($objectNames, $fieldName, $oldValue, $newValue, $filter, $parms);

Update all occurrences of a specific field value to a new value. The number of rows changed will b returned.

fieldNameName of the field in standard objectName(fieldName) format.

oldValueValue to be modified. All occurrences of this value in the named field will be replaced by th new value.

newValueNew value to be substituted for the old value when it's found.

filterA standard ERDB filter clause (see Get). The filter will be applied before any substitutions take place.

parmsReference to a list of parameter values in the filter.

RETURNReturns the number of rows modified.

Data Mining MethodsGetUsefulCrossValuesmy @attrNames = $sprout->GetUsefulCrossValues($sourceEntity, $relationship);

Return a list of the useful attributes that would be returned by a Cross cal from an entity of the source entity type through the specified relationship. Thi means it will return the fields of the target entity type and the intersection dat fields in the relationship. Only primary table fields are returned. In other words the field names returned will be for fields where there is always one and only on value.

sourceEntityName of the entity from which the relationship crossing will start.

relationshipName of the relationship being crossed.

RETURNReturns a list of field names in Sprout field format (objectName(fieldName).

FindColumnmy $colIndex = ERDB::FindColumn($headerLine, $columnIdentifier);

Return the location a desired column in a data mining header line. The dat mining header line is a tab-separated list of column names. The colum identifier is either the numerical index of a column or the actual colum name.

headerLineThe header line from a data mining command, which consists of a tab-separate list of column names.

columnIdentifierEither the ordinal number of the desired column (1-based), or the name of th desired column.

RETURNReturns the array index (0-based) of the desired column.

ParseColumnsmy @columns = ERDB::ParseColumns($line);

Convert the specified data line to a list of columns.

lineA data mining input, consisting of a tab-separated list of columns terminated by new-line.

RETURNReturns a list consisting of the column values.

Virtual Methods_CreatePPOIndexmy $index = ERDB::_CreatePPOIndex($indexObject);

Convert the XML for an ERDB index to the XML structure for a PP index.

ERDB XML structure for an index.

RETURNPPO XML structure for the same index.

_CreatePPOFieldmy $fieldXML = ERDB::_CreatePPOField($fieldName, $fieldObject);

Convert the ERDB XML structure for a field to a PPO scalar XML structure.

fieldNameName of the scalar field.

fieldObjectERDB XML structure describing the field.

RETURNReturns a PPO XML structure for the same field.

CleanKeywordsmy $cleanedString = $erdb->CleanKeywords($searchExpression);

Clean up a search expression or keyword list. This is a virtual method that ma be overridden by the subclass. The base-class method removes extra space and converts everything to lower case.

searchExpressionSearch expression or keyword list to clean. Note that a search expression ma contain boolean operators which need to be preserved. This includes leadin minus signs.

RETURNCleaned expression or keyword list.

GetSourceObjectmy $source = $erdb->GetSourceObject($entityName);

Return the object to be used in loading special attributes of the specified entity. Th algorithm for loading special attributes is stored in the DataGen elements of th XML

Internal Utility Methods_RelationMapmy @relationMap = _RelationMap($mappedNameHashRef, $mappedNameListRef);

Create the relation map for an SQL query. The relation map is used by ERDBObjectmappedNameHashRefReference to a hash that maps modified object names to real object names.

mappedNameListRefReference to a list of modified object names in the order they appear in th SELECT list.

RETURNReturns a list of 2-tuples. Each tuple consists of an object name as used in th query followed by the actual name of that object. This enables the ERDBObject t determine the order of the tables in the query and which object name belongs to eac mapped object name. Most of the time these two values are the same; however, if relation occurs twice in the query, the relation name in the field list and WHER clause will use a mapped name (generally the actual relation name with a numeri suffix) that does not match the actual relation name.

_SetupSQLProcess a list of object names and a filter clause so that they can be used t build an SQL statement. This method takes in a reference to a list of object name and a filter clause. It will return a corrected filter clause, a list of mappe names and the mapped name hash.

This is an instance method.

objectNamesReference to a list of the object names to be included in the query.

filterClauseA string containing the WHERE clause for the query (without the WHERE) and als optionally the ORDER BY and LIMIT clauses.

matchClauseAn optional full-text search clause. If specified, it will be inserted at th front of the WHERE clause. It should already be SQL-formatted; that is, th field names should be in the form table.fieldName.

RETURNReturns a three-element list. The first element is the SQL statement suffix, beginnin with the FROM clause. The second element is a reference to a list of the names to b used in retrieving the fields. The third element is a hash mapping the names to th objects they represent.

_GetStatementHandleThis method will prepare and execute an SQL query, returning the statement handle The main reason for doing this here is so that everybody who does SQL queries get the benefit of tracing.

This is an instance method.

commandCommand to prepare and execute.

paramsReference to a list of the values to be substituted in for the parameter marks.

RETURNReturns a prepared and executed statement handle from which the caller can extrac results.

_GetLoadStatsReturn a blank statistics object for use by the load methods.

This is a static method.

_DumpRelationDump the specified relation to the specified output file in tab-delimited format.

This is an instance method.

outputDirectoryDirectory to contain the output file.

relationNameName of the relation to dump.

relationStructure describing the relation to be dumped.

_GetStructureGet the data structure for a specified entity or relationship.

This is an instance method.

objectNameName of the desired entity or relationship.

RETURNThe descriptor for the specified object.

_GetRelationTableGet the list of relations for a specified entity or relationship.

This is an instance method.

objectNameName of the desired entity or relationship.

RETURNA table containing the relations for the specified object.

_ValidateFieldNamesDetermine whether or not the field names are valid. A description of the problems with the name will be written to the standard error output. If there is an error, this method will abort. This i a static method.

metadataMetadata structure loaded from the XML data definition.

_LoadRelationLoad a relation from the data in a tab-delimited disk file. The load will only take place if a dis file with the same name as the relation exists in the specified directory.

This is an instance method.

dbhDBKernel object for accessing the database.

directoryNameName of the directory containing the tab-delimited data files.

relationNameName of the relation to load.

rebuildTRUE if the table should be dropped and re-created before loading.

RETURNReturns a statistical object describing the number of records read and a list of error messages.

_LoadMetaDatamy $metadata = ERDB::_LoadMetaData($filename);

This method loads the data describing this database from an XML file into a metadata structure The resulting structure is a set of nested hash tables containing all the information needed t load or use the database. The schema for the XML file is ERDatabase.xml.

This is a static method.

filenameName of the file containing the database definition.

RETURNReturns a structure describing the database.

_CreateRelationshipIndexCreate an index for a relationship's relation.

This is a static method.

indexKeyType of index: either "From" or "To".

relationshipNameName of the relationship.

relationshipStructureStructure describing the relationship that the index will sort.

_ProcessIndexesERDB::_ProcessIndexes($indexList, $relation);

Build the data structures for the specified indexes in the specified relation.

indexListReference to a list of indexes. Each index is a hash reference containing an optiona Notes value that describes the index and an IndexFields value that is a referenc to a list of index field structures. An index field structure, in turn, is a referenc to a hash that contains a name attribute for the field name and an orderascending or descending. In this sense th index list encapsulates the XML Indexes structure in the database definition.

relationThe structure that describes the current relation. The new index descriptors wil be stored in the structure's Indexes member. Any previous data in the structur will be lost.

_AddIndexAdd an index to a relation structure.

This is a static method.

indexNameName to give to the new index.

relationStructureRelation structure to which the new index should be added.

newIndexNew index to add.

_FixupFieldsThis method fixes the field list for an entity or relationship. It will add the caller-specifie relation name to fields that do not have a name and set the PrettySort value as specified.

This is a static method.

structureEntity or relationship structure to be fixed up.

defaultRelationNameDefault relation name to be added to the fields.

prettySortValuePrettySort value for the relation's normal fields.

textPrettySortValuePrettySort value for the relation's text fields. This value can be set to one greater than th normal pretty sort value so that text fields go at the end of each relation.

_FixNameFix the incoming field name so that it is a legal SQL column name.

This is a static method.

fieldNameField name to fix.

RETURNReturns the fixed-up field name.

_FixNamesFix all the field names in a list.

This is a static method.

field1, field2, field3, ... fieldnList of field names to fix.

RETURNReturns a list of fixed-up versions of the incoming field names.

_AddFieldAdd a field to a field list.

This is a static method.

structureStructure (usually an entity or relationship) that is to contain the field.

fieldNameName of the new field.

fieldDataStructure containing the data to put in the field.

_ReOrderRelationTableThis method will take a relation table and re-sort it according to the implicit ordering of th PrettySort property. Instead of a hash based on field names, it will return a list of fields This requires creating a new hash that contains the field name in the name property but doesn' have the PrettySort property, and then inserting that new hash into the field list.

This is a static method.

relationTableRelation hash to be reformatted into a list.

RETURNA list of field hashes.

_IsPrimaryReturn TRUE if a specified relation is a primary relation, else FALSE. A relation is primar if it has the same name as an entity or relationship.

This is an instance method.

relationNameName of the relevant relation.

RETURNReturns TRUE for a primary relation, else FALSE.

_FindRelationReturn the descriptor for the specified relation.

This is an instance method.

relationNameName of the relation whose descriptor is to be returned.

RETURNReturns the object that describes the relation's indexes and fields.

HTML Documentation Utility Methods_ComputeRelationshipSentenceThe relationship sentence consists of the relationship name between the names of th two related entities and an arity indicator.

This is a static method.

relationshipNameName of the relationship.

relationshipStructureRelationship structure containing the relationship's description and properties.

RETURNReturns a string containing the entity names on either side of the relationship name and a indicator of the arity.

_ComputeRelationshipHeadingThe relationship heading is the relationship sentence with the entit names hyperlinked to the appropriate entity sections of the document.

This is a static method.

relationshipNameName of the relationship.

relationshipStructureRelationship structure containing the relationship's description and properties.

RETURNReturns a string containing the entity names on either side of the relationship name with the entit names hyperlinked.

_ShowRelationTableGenerate the HTML string for a particular relation. The relation's data will be formatted as an HTM table with three columns-- the field name, the field type, and the field description.

This is a static method.

relationNameName of the relation being formatted.

relationDataHash containing the relation's fields and indexes.

RETURNReturns an HTML string that can be used to display the relation name and all of its fields.

_OpenFieldTableThis method creates the header string for the field table generated by ShowMetaData.

This is a static method.

tablenameName of the table whose fields will be displayed.

RETURNReturns a string containing the HTML for a field table's header.

_OpenTableThis method creates the header string for an HTML table.

This is a static method.

tablenameTitle of the table.

colName1, colName2, ..., colNameNList of column names.

RETURNReturns a string containing the HTML for the desired table's header.

_CloseTableThis method returns the HTML for closing a table.

This is a static method.

_ShowFieldThis method returns the HTML for displaying a row of field information in a field table.

This is a static method.

fieldDataTable of data about the field.

RETURNReturns an HTML string for a table row that shows the field's name, type, and description.