We are pleased to introduce a new, XML-based format for the complete
dataset file. It conforms to the specification defined in the
xin.xsd
schema file. Note, however, that the format defined there is very
flexible and defines only the overall structure of a document that can
be used to describe an arbitrary annotated graph. The DIP dataset
file is just one particular application of this format.
The overall structure of a XIN document is tripartite. The first section
(<attributes>) defines names, types and default values of the
attributes that are assigned to all the nodes and edges of the graph.
It is followed by a list of nodes and then by a list of edges. The from
and to attributes specified for each edge refer to the unique id
attribute of each node element and thus define the connectivity of the
network.
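The way edges refer to nodes can be illustrated with a short Python sketch. The sample document is invented for illustration: only the id, from and to attribute names come from the description above, while the root element, the nodes/edges wrappers and the node/edge tag names are assumptions that should be checked against xin.xsd.

```python
import xml.etree.ElementTree as ET

# Hypothetical XIN-like fragment. Only the id/from/to attribute names are
# taken from the text; every tag name here is an assumption, not xin.xsd.
SAMPLE = """
<xin>
  <nodes>
    <node id="n1"/>
    <node id="n2"/>
    <node id="n3"/>
  </nodes>
  <edges>
    <edge id="e1" from="n1" to="n2"/>
    <edge id="e2" from="n2" to="n3"/>
  </edges>
</xin>
"""

def build_adjacency(xml_text):
    """Return {node id: set of neighbour ids}, treating edges as undirected."""
    root = ET.fromstring(xml_text)
    adjacency = {node.get("id"): set() for node in root.iter("node")}
    for edge in root.iter("edge"):
        src, dst = edge.get("from"), edge.get("to")
        # Both endpoints must refer to the id of a declared node.
        if src not in adjacency or dst not in adjacency:
            raise ValueError(f"edge {edge.get('id')} refers to an unknown node")
        adjacency[src].add(dst)
        adjacency[dst].add(src)
    return adjacency

adjacency = build_adjacency(SAMPLE)
```

Treating the edges as undirected is a choice made for this sketch; the from/to naming by itself does not say whether direction is meaningful.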
Each element of the graph is annotated with a set of attributes defined
in the <attributes> section. The values of the attributes for
each node/edge are specified with <att> entries; if an entry is not
present, the value is set to the default specified in the appropriate
<attributes> section entry.
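The default-value rule can be sketched in Python as follows. This is an illustration under assumptions: the attribute child element and the name, type, default and value XML attributes are hypothetical spellings, as are the sample attribute names and values; only <attributes> and <att> come from the description above.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: the <attribute> tag and the name/type/default/value
# attribute spellings are assumptions, not taken from xin.xsd.
SAMPLE = """
<xin>
  <attributes>
    <attribute name="class" type="string" default="none"/>
    <attribute name="organism" type="string" default="unknown"/>
  </attributes>
  <nodes>
    <node id="n1">
      <att name="organism" value="S. cerevisiae"/>
    </node>
    <node id="n2"/>
  </nodes>
</xin>
"""

def resolve_attributes(element, defaults):
    """Start from the declared defaults, then apply explicit <att> entries."""
    values = dict(defaults)
    for att in element.findall("att"):
        values[att.get("name")] = att.get("value")
    return values

root = ET.fromstring(SAMPLE)
defaults = {a.get("name"): a.get("default") for a in root.find("attributes")}
resolved = {n.get("id"): resolve_attributes(n, defaults) for n in root.iter("node")}
```

Here node n1 overrides one attribute and inherits the other, while n2 carries no <att> entries and receives both defaults.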
In the particular case of the DIP dataset file, the class
attribute assigned to each edge defines the CORE subset of DIP - a set
of the most reliable S. cerevisiae interactions
identified computationally as described in
Deane et al.
In addition to the attributes, each node/edge can be annotated with any
number of <feature> entries. Currently these are used to specify
cross-references to other protein databases and to list references to
the experimental data describing interactions.
Note that, for brevity, we list only PMID numbers, as the full information
about each article can be retrieved directly from
PubMed
by searching for a particular PMID number.
The class attribute specified for each experiment is used to distinguish
between small-scale (exp:s) and genome-wide (exp:g) approaches,
whereas the <val> element specifies the experimental method
used to detect the interaction.
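As a rough illustration, experiment entries could be grouped by scale with a few lines of Python. The exp:s and exp:g class values and the <feature> and <val> element names come from the description above; the nesting of <val> directly inside <feature>, the edge tag name and the placeholder method names are assumptions made for this sketch.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical edge fragment; the nesting and the method names are invented.
SAMPLE = """
<edge id="e1" from="n1" to="n2">
  <feature class="exp:s"><val>method A</val></feature>
  <feature class="exp:g"><val>method B</val></feature>
  <feature class="exp:g"><val>method C</val></feature>
</edge>
"""

SCALE = {"exp:s": "small-scale", "exp:g": "genome-wide"}

def methods_by_scale(edge):
    """Group the <val> text of experiment features by experiment scale."""
    grouped = defaultdict(list)
    for feature in edge.findall("feature"):
        scale = SCALE.get(feature.get("class"))
        if scale is None:
            continue  # skip features that are not experiment entries
        grouped[scale].append(feature.findtext("val", default="").strip())
    return dict(grouped)

grouped = methods_by_scale(ET.fromstring(SAMPLE))
```

Features whose class is neither exp:s nor exp:g, such as cross-references to other databases, are simply skipped by this sketch.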
We hope the new format provides a better, more flexible way of distributing
DIP data. We'd be
happy to hear
any comments and suggestions for improving or extending the format.