

Format specification for .crt files

A .crt data file represents some part of a crystal structure: one or more particular molecules, the contents of a unit cell, or some other part of the structure chosen by the file’s creator.  The contents describe atoms, the bonds between them, and optionally the unit cell dimensions and symmetry operations of the structure from which the data are drawn.  Atom positions are expressed as Cartesian x, y, and z coordinates (in ångströms), hence the .crt extension.  Symmetry and unit cell information are referred to the same Cartesian system.  Among other uses, .crt files may be visualized using the JaMM family of applets included with the Reciprocal Net site software.  The file format was developed by Indiana University in the 1990s; it is plain text, so .crt files can readily be manipulated with a standard text editor.

Basic definitions

1.     Each octet in the file must represent an ASCII character (in the range 0-127).  Except as otherwise specified by this document, character codes 32 or less are prohibited.

2.     A line break may consist of a carriage return character (ASCII code 13), a line feed (ASCII code 10), or a carriage return followed immediately by a line feed.

3.     The characters space (ASCII code 32) and tab (ASCII code 9) are considered white space.  Tokens are elements of the file separated by one or more consecutive white space characters.  The precise number of white space characters separating two tokens is not significant.  Additionally, a line break character may separate two tokens, optionally with white space on either side, when the specification calls for it.

4.     The pound sign character (ASCII code 35, ‘#’) signifies a comment.  All characters from the pound sign to the next line break (or end of file, if there is no next line break) are ignored by parsers.

5.     A numeric value consists of a string of decimal digits (characters ‘0’ through ‘9’, ASCII codes 48 through 57), the decimal point (ASCII code 46), and the negative sign (ASCII code 45).  If a negative sign is present then it must be the first character of the value.  Real number values may contain at most one decimal point and are stored with at least the precision of the IEEE “single precision” standard format; this guarantees at least six significant decimal digits of precision.  Integer values may not contain a decimal point and are stored with at least the precision of 32-bit two’s-complement format.

6.     A textual value consists of between one and thirty-one sequential characters in the range 33-127, except that double quote (ASCII code 34), pound sign (ASCII code 35), and backslash (ASCII code 92) characters are not permitted.
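
The definitions above can be illustrated with a minimal tokenizer sketch in Python.  It is not part of the format or of any existing parser: the function names are hypothetical, and the regular expressions assume that a numeric value must contain at least one digit, which the text implies but does not state.

```python
import re

# Numeric syntax from definition 5: optional leading '-', decimal digits,
# and (for reals only) at most one decimal point.
# Assumption: at least one digit is required.
_INTEGER = re.compile(r"-?[0-9]+\Z")
_REAL = re.compile(r"-?(?:[0-9]+\.?[0-9]*|\.[0-9]+)\Z")


def split_lines(text):
    """Split text on CR LF, CR, or LF line breaks (definition 2)."""
    return re.split(r"\r\n|\r|\n", text)


def tokenize_line(line):
    """Drop a trailing comment (definition 4), then split on spaces and tabs (definition 3)."""
    content = line.split("#", 1)[0]
    return [tok for tok in re.split(r"[ \t]+", content) if tok]


def parse_integer(token):
    """Convert an integer token, rejecting anything outside the numeric syntax."""
    if not _INTEGER.match(token):
        raise ValueError(f"not an integer value: {token!r}")
    return int(token)


def parse_real(token):
    """Convert a real-number token, rejecting exponents, signs other than a leading '-', etc."""
    if not _REAL.match(token):
        raise ValueError(f"not a real value: {token!r}")
    return float(token)
```

Python's float is double precision, which satisfies the "at least single precision" requirement; the explicit syntax check matters because float() on its own would also accept forms such as 1e5 that the definition does not allow.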

Sections

The file is divided into sections, some of which are optional.  Order of sections is significant, and those present MUST appear in the order they are listed below.  At present, three sections are defined, but parsers SHOULD be tolerant of additional (unrecognized) sections that may be present at the bottom of a .crt file.

The CARTESIAN section

This section enumerates the atoms in the asymmetric unit of the structure, locates them in 3-space, and identifies any bonds between them.  It is the only required section, and it must be first in the file.  It is composed of three subsections:

1.     A header line consisting of the word CARTESIAN, an integer count of the atoms that will be described by this section (subsection 2), an integer count of the bonds that will be described by this section (subsection 3), and a textual label for the file.  Each value is separated from its neighbor by white space.  The subsection is terminated by a line break.  Atom and bond counts listed in this section SHOULD be considered advisory rather than prescriptive.

2.     The ATOMS subsection contains zero or more atom definitions, one per line, starting on the line immediately after the one line of subsection 1.  Each atom description consists of a textual label, a real number that represents the atom’s x-coordinate, a real number that represents the atom’s y-coordinate, a real number that represents the atom’s z-coordinate, and an integer that represents the atomic number of the atom.  Version 0.9.0 adds an additional, optional site code tag at the end of each atom line (see below).  Each value is separated from its neighbor by white space.  Any additional tokens on an atom line SHOULD be ignored by parsers.  The order of atom descriptions within this subsection is not externally significant, but the bond definitions in subsection 3 refer to this order.  This subsection is terminated by a line starting with the word ENDATOMS; the remainder of this line SHOULD be ignored by parsers.

Site codes describe the relationship of atom records in the CRT file to crystallographic sites described in some external file, normally a CIF.  They have the form of an atom site label, bar (‘|’, ASCII code 124) character, and CIF-style symmetry code (see the CIF dictionary) concatenated together, with any internal whitespace characters removed.  Example: C10|2_455.  Specific interpretation of these codes relies on reference to a corresponding external file, and unless the appropriate such file can be identified (normally by means of some local convention) it is safest to ignore these codes.

3.     The BONDS subsection contains zero or more bond descriptions, one per line, each connecting two atoms in the model as identified by 1-based indices into the atom list of subsection 2.  Bond descriptions consist simply of the two integer indices, separated by white space.  Parsers SHOULD ignore the remainder of each line.  Order of atom indices within a bond description and the order of bond descriptions within the subsection are not significant.  Duplicate bond descriptions (in the same or the reverse order of atom indices) are permitted but not meaningful; they SHOULD be accommodated by parsers but not expected.  This subsection is terminated by a line starting with the word ENDBONDS; the remainder of this line SHOULD be ignored by parsers.
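
The three subsections can be read with a short parser along the following lines.  This is a sketch under stated assumptions, not a reference implementation: the names Atom, parse_cartesian, and split_site_code are hypothetical, blank lines are skipped, a sixth token is treated as a site code only if it contains a bar character, a missing terminator is tolerated, and the sample data are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Atom:
    label: str
    x: float
    y: float
    z: float
    atomic_number: int
    site_code: Optional[str] = None  # optional sixth token (version 0.9.0)


def _tokens(line):
    # Discard any comment, then split on white space.
    return line.split("#", 1)[0].split()


def parse_cartesian(lines):
    """Return (label, atoms, bonds) for the CARTESIAN section in a list of lines."""
    rows = [t for t in map(_tokens, lines) if t]  # assumption: blank lines are skipped
    if not rows or rows[0][0] != "CARTESIAN":
        raise ValueError("expected a CARTESIAN header line")
    header = rows[0]
    label = header[3] if len(header) > 3 else ""
    # The header counts are advisory, so the terminators decide where lists end.
    atoms, bonds, seen = [], [], set()
    pos = 1
    # Assumption: a missing ENDATOMS/ENDBONDS simply ends the list at end of input.
    while pos < len(rows) and rows[pos][0] != "ENDATOMS":
        t = rows[pos]
        site = t[5] if len(t) > 5 and "|" in t[5] else None
        atoms.append(Atom(t[0], float(t[1]), float(t[2]), float(t[3]), int(t[4]), site))
        pos += 1
    pos += 1  # step over the ENDATOMS line
    while pos < len(rows) and rows[pos][0] != "ENDBONDS":
        # Index order is not significant, so store each bond in sorted order
        # and drop duplicates.
        pair = tuple(sorted((int(rows[pos][0]), int(rows[pos][1]))))
        if pair not in seen:
            seen.add(pair)
            bonds.append(pair)
        pos += 1
    return label, atoms, bonds


def split_site_code(code):
    """Split a site code such as 'C10|2_455' into (site label, symmetry code)."""
    site_label, _, symmetry = code.rpartition("|")
    return site_label, symmetry


# Hypothetical example data, not taken from a real structure.
SAMPLE = """\
CARTESIAN 3 2 water
O1  0.000  0.000  0.000  8  O1|1_555
H1  0.757  0.586  0.000  1
H2 -0.757  0.586  0.000  1
ENDATOMS
1 2
1 3
3 1   # duplicate of the previous bond, reversed
ENDBONDS
"""

label, atoms, bonds = parse_cartesian(SAMPLE.splitlines())
```

Bond indices are kept 1-based, as in the file; a caller that stores atoms in a Python list would subtract one before indexing.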

The CELL section

This optional section describes the origin and unit cell vectors of the structure in terms of the Cartesian system to which the coordinates of the atoms in the CARTESIAN section are referred.  This section is recognized by the keyword CELL as the first word of a line; the remainder of this line SHOULD be ignored by parsers.  The rest of the section consists of four lines, each containing three real numbers constituting the coordinates of a point or vector in Cartesian space, separated by white space: the origin of the crystallographic coordinate system on the first line, then the $\mathbf{a}$, $\mathbf{b}$, and $\mathbf{c}$ vectors, respectively, on each of the other three lines.  Parsers MAY require that the last four lines of this section be devoid of other tokens.  There is no explicit terminator for this section; the end is implicit in the fixed number of lines the section contains.
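
As an illustration, the following Python sketch reads a CELL section and derives the conventional cell lengths and angles from the three cell vectors.  The function names are hypothetical, the sample values are invented, and the sketch takes the stricter option of rejecting extra tokens on the four data lines; the angle convention (alpha between b and c, beta between a and c, gamma between a and b) is the usual crystallographic one rather than something this format defines.

```python
import math


def parse_cell(lines):
    """Return (origin, a, b, c) from the lines of a CELL section, keyword line first."""
    if not lines or lines[0].split("#", 1)[0].split()[:1] != ["CELL"]:
        raise ValueError("expected the keyword CELL as the first word of the section")
    vectors = []
    for line in lines[1:5]:  # exactly four data lines follow the keyword line
        tokens = line.split("#", 1)[0].split()
        if len(tokens) != 3:
            raise ValueError(f"expected three real numbers, got: {line!r}")
        vectors.append(tuple(float(tok) for tok in tokens))
    if len(vectors) != 4:
        raise ValueError("CELL section must contain four data lines")
    return tuple(vectors)


def cell_parameters(a, b, c):
    """Return (|a|, |b|, |c|, alpha, beta, gamma) with angles in degrees."""
    def norm(v):
        return math.sqrt(sum(p * p for p in v))

    def angle(u, v):
        cosine = sum(p * q for p, q in zip(u, v)) / (norm(u) * norm(v))
        # Clamp to guard against rounding slightly outside [-1, 1].
        return math.degrees(math.acos(max(-1.0, min(1.0, cosine))))

    return norm(a), norm(b), norm(c), angle(b, c), angle(a, c), angle(a, b)


# Hypothetical orthorhombic cell with its origin at the Cartesian origin.
CELL_SAMPLE = """\
CELL
0.0 0.0 0.0
5.0 0.0 0.0
0.0 6.0 0.0
0.0 0.0 7.0
"""

origin, a, b, c = parse_cell(CELL_SAMPLE.splitlines())
params = cell_parameters(a, b, c)
```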

The SYMMETRY section

This optional section describes symmetry operations of the structure from which the model described in this CRT file is drawn, referred to the same Cartesian system in which the atomic coordinates are expressed.  The identity operation is implied and MUST NOT be included in this section.  This section is identified by the keyword SYMMETRY as the first word of a line.  An integer count of the symmetry operations that will be described by this section follows the keyword on the same line, separated by white space; this count SHOULD be treated as advisory, not prescriptive.  The remainder of the line SHOULD be ignored by parsers.

The body of the section consists of zero or more symmetry operation descriptions, each comprising four lines of three whitespace-delimited real numbers.  The first three lines of each description contain the three rows of a 3-by-3 rotation matrix M (elements left to right, rows top to bottom), and the fourth line contains the elements of a translation vector $\mathbf{t}$.  The transformation represented by M and $\mathbf{t}$, as applied to a Cartesian coordinate triple expressed as column vector $\mathbf{x}$ to produce a Cartesian coordinate triple $\mathbf{x}'$, is represented by the equation $\mathbf{x}' = M\mathbf{x} + \mathbf{t}$.

The section is terminated by a line starting with the word ENDSYMM; the remainder of this line SHOULD be ignored by parsers.
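
A minimal Python sketch of reading this section and applying an operation follows.  It assumes the transformation has the usual form x' = Mx + t (rotation matrix applied to the column vector, then the translation added); the function names and the sample operations are hypothetical.

```python
def parse_symmetry(lines):
    """Return a list of (M, t) pairs from SYMMETRY section lines, keyword line first."""
    if not lines or lines[0].split("#", 1)[0].split()[:1] != ["SYMMETRY"]:
        raise ValueError("expected the keyword SYMMETRY as the first word of the section")
    rows = []
    for line in lines[1:]:
        tokens = line.split("#", 1)[0].split()
        if not tokens:
            continue  # assumption: blank lines are skipped
        if tokens[0] == "ENDSYMM":
            break
        rows.append([float(tok) for tok in tokens[:3]])
    # The count on the keyword line is advisory; ENDSYMM decides where the list ends.
    if len(rows) % 4 != 0:
        raise ValueError("each symmetry operation needs four lines of three numbers")
    return [(rows[i:i + 3], rows[i + 3]) for i in range(0, len(rows), 4)]


def apply_operation(M, t, x):
    """Compute x' = M x + t for a Cartesian coordinate triple x."""
    return [sum(M[r][k] * x[k] for k in range(3)) + t[r] for r in range(3)]


# Hypothetical operations: inversion through the origin, then a two-fold
# rotation about z combined with a translation of 1.5 along z.
SYMM_SAMPLE = """\
SYMMETRY 2
-1.0  0.0  0.0
 0.0 -1.0  0.0
 0.0  0.0 -1.0
 0.0  0.0  0.0
-1.0  0.0  0.0
 0.0 -1.0  0.0
 0.0  0.0  1.0
 0.0  0.0  1.5
ENDSYMM
"""

operations = parse_symmetry(SYMM_SAMPLE.splitlines())
images = [apply_operation(M, t, [1.0, 2.0, 3.0]) for M, t in operations]
```

Because the identity operation is implied rather than listed, a caller generating all symmetry-equivalent positions would include the original coordinates alongside these images.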

