Member "citadel/docs/databaselayout.txt" (5 Jun 2021, 24050 Bytes) of package /linux/www/citadel.tar.gz:


The totally incomplete guide to Citadel internals
-------------------------------------------------

Citadel has evolved quite a bit since its early days, and the data structures
have evolved with it.  This document provides a rough overview of how the
system works internally.  For details you're going to have to dig through the
code, but this'll get you started.


DATABASE TABLES
---------------
As you probably already know by now, Citadel uses a group of tables stored
with a record manager (usually Berkeley DB).  Since we're using a record
manager rather than a relational database, all record structures are managed
by Citadel.  Here are some of the tables we keep on disk:


USER RECORDS
------------
This table contains all user records.  It's indexed by user name (translated
to lower case for indexing purposes).  The records in this table look
something like this:

    struct ctdluser {                   /* User record                      */
        int version;                    /* Cit vers. which created this rec */
        uid_t uid;                      /* Associate with a unix account?   */
        char password[32];              /* password (for Citadel-only users)*/
        unsigned flags;                 /* See US_ flags below              */
        long timescalled;               /* Total number of logins           */
        long posted;                    /* Number of messages posted (ever) */
        CIT_UBYTE axlevel;              /* Access level                     */
        long usernum;                   /* User number (never recycled)     */
        time_t lastcall;                /* Last time the user called        */
        int USuserpurge;                /* Purge time (in days) for user    */
        char fullname[64];              /* Name for Citadel messages & mail */
    };

Most fields here should be fairly self-explanatory.  The ones that might
deserve some attention are:

 - *uid* -- if uid is not the same as the *unix uid* Citadel is running as,
   then the account is assumed to belong to the user on the underlying Unix
   system with that uid.  This allows us to require the user's OS password
   instead of having a separate Citadel password.

 - *usernum* -- these are assigned sequentially, and **NEVER REUSED**.  This is
   important because it allows us to use this number in other data structures
   without having to worry about users being added/removed later on, as you'll
   see later in this document.


ROOM RECORDS
------------
These are room records.  There is a room record for every room on the
system, whether public, private, or mailbox.  The table is indexed by room
name (also in lower case for easy indexing) and it contains records which
look like this:

    struct ctdlroom {
        char QRname[ROOMNAMELEN];       /* Name of room                     */
        char QRpasswd[10];              /* Only valid if it's a private rm  */
        long QRroomaide;                /* User number of room aide         */
        long QRhighest;                 /* Highest message NUMBER in room   */
        time_t QRgen;                   /* Generation number of room        */
        unsigned QRflags;               /* See flag values below            */
        char QRdirname[15];             /* Directory name, if applicable    */
        long QRinfo;                    /* Info file update relative to msgs*/
        char QRfloor;                   /* Which floor this room is on      */
        time_t QRmtime;                 /* Date/time of last post           */
        struct ExpirePolicy QRep;       /* Message expiration policy        */
        long QRnumber;                  /* Globally unique room number      */
        char QRorder;                   /* Sort key for room listing order  */
        unsigned QRflags2;              /* Additional flags                 */
        int QRdefaultview;              /* How to display the contents      */
    };

Again, mostly self-explanatory.  Here are the interesting ones:

*QRnumber* is a globally unique room ID, while *QRgen* is the "generation
number" of the room (it's actually a timestamp).  The two combined produce a
unique value which identifies the room.  The reason for two separate fields
will be explained below when we discuss the visit table.  For now just
remember that *QRnumber* remains the same for the duration of the room's
existence, and *QRgen* is timestamped once during room creation but may be
restamped later on under certain circumstances.

FLOORTAB
--------
Floors.  This is so simple it's not worth going into detail about, except
to note that we keep a reference count of the number of rooms on each floor.

MSGLISTS
--------
Each record in this table consists of a bunch of message numbers
which represent the contents of a room.  A message can exist in more than one
room (for example, a mail message with multiple recipients -- 'single instance
store').  This table is never, ever traversed in its entirety.  When you do
any type of read operation, the server fetches the msglist for the room you're
in (using the room's ID as the index key) and then you can go ahead and read
those messages one by one.

Each room is basically just a list of message numbers.  Each time
we enter a new message in a room, its message number is appended to the end
of the list.  If an old message is to be expired, we must delete it from the
message base.  Reading a room is just a matter of looking up the messages
one by one and sending them to the client for display, printing, or whatever.


VISIT
-----
This is the tough one.  Put on your thinking cap and grab a fresh cup of
coffee before attempting to grok the visit table.

This table contains records which establish the relationship between users
and rooms.  Its index is a hash of the user and room combination in question.
When looking for such a relationship, the record in this table can tell the
server things like "this user has zapped this room," "this user has access to
this private room," etc.  It's also where we keep track of which messages
the user has marked as "old" and which are "new" (which are not necessarily
contiguous; contrast with older Citadel implementations which simply kept a
"last read" pointer).

Here's what the records look like:

    struct visit {
        long v_roomnum;
        long v_roomgen;
        long v_usernum;
        long v_lastseen;
        unsigned int v_flags;
        char v_seen[SIZ];
        int v_view;
    };

    #define V_FORGET        1       /* User has zapped this room        */
    #define V_LOCKOUT       2       /* User is locked out of this room  */
    #define V_ACCESS        4       /* Access is granted to this room   */

This table is indexed by a concatenation of the first three fields.  Whenever
we want to learn the relationship between a user and a room, we feed that
data to a function which looks up the corresponding record.  The record is
designed in such a way that an "all zeroes" record (which is what you get if
the record isn't found) represents the default relationship.

With this data, we now know which private rooms we're allowed to visit: if
the *V_ACCESS* bit is set, the room is one which the user knows, and it may
appear in the user's known rooms list.  Conversely, we also know which rooms
the user has zapped: if the *V_FORGET* flag is set, we relegate the room to the
zapped list and don't bring it up during new message searches.  It's also
worth noting that the *V_LOCKOUT* flag works in a similar way to
administratively lock users out of rooms.

Implementing the "cause all users to forget room" command, then, becomes very
simple: we simply change the generation number of the room by putting a new
timestamp in the *QRgen* field.  This causes all relevant visit records to
become irrelevant, because they appear to point to a different room.  At the
same time, we don't lose the messages in the room, because the msglists table
is indexed by the room number (*QRnumber*), which never changes.

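To make the generation trick concrete, here is a minimal sketch of a visit
key built from the three leading fields.  The names `visit_key`,
`make_visit_key()` and `same_visit_record()` are hypothetical and are not
taken from the Citadel source; the point is only that restamping the
generation yields a key that no longer matches any stored record.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical key layout: the first three fields of struct visit. */
struct visit_key {
    long roomnum;   /* QRnumber of the room                 */
    long roomgen;   /* QRgen of the room at lookup time     */
    long usernum;   /* usernum of the user                  */
};

/* Fill in a key.  The memset keeps any padding bytes comparable. */
static void make_visit_key(struct visit_key *k, long roomnum,
                           long roomgen, long usernum)
{
    assert(k != NULL);
    memset(k, 0, sizeof *k);
    k->roomnum = roomnum;
    k->roomgen = roomgen;
    k->usernum = usernum;
}

/* Two keys address the same visit record only if all three fields match. */
static int same_visit_record(const struct visit_key *a,
                             const struct visit_key *b)
{
    return memcmp(a, b, sizeof *a) == 0;
}
```
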
*v_seen* contains a string which represents the set of messages in this room
which the user has read (marked as 'seen' or 'old').  It follows the same
syntax used by IMAP and NNTP.  When we search for new messages, we simply
return any messages that are in the room that are **not** represented by this
set.  Naturally, when we do want to mark more messages as seen (or unmark
them), we change this string.  Citadel BBS client implementations are naive
and think linearly in terms of "everything is old up to this point," but IMAP
clients want to have more granularity.

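As a rough illustration of how such a set can be tested, the sketch below
checks one message number against a string like `1:50,52,55:60`.  It assumes
the IMAP-style form (comma-separated numbers and `lo:hi` ranges, with `*`
standing for "the highest number"); the function names are made up for this
example and the real server code may well differ.  A message is "new" when
this check returns 0.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>

/* Parse one bound: a decimal number, or '*' meaning "highest possible". */
static long parse_bound(const char *p, const char **end)
{
    char *stop;
    long value;

    if (*p == '*') {
        *end = p + 1;
        return LONG_MAX;
    }
    value = strtol(p, &stop, 10);
    *end = stop;            /* equals p if no digits were consumed */
    return value;
}

/* Return 1 if msgnum is covered by the set, 0 otherwise (or if malformed). */
static int msg_in_seen_set(const char *set, long msgnum)
{
    const char *p = set;

    assert(set != NULL);
    while (*p != '\0') {
        const char *end;
        long lo = parse_bound(p, &end);
        long hi = lo;

        if (end == p) return 0;             /* not a number: give up */
        if (*end == ':') {                  /* a lo:hi range */
            p = end + 1;
            hi = parse_bound(p, &end);
            if (end == p) return 0;
        }
        if (lo > hi) { long t = lo; lo = hi; hi = t; }
        if (msgnum >= lo && msgnum <= hi) return 1;
        if (*end != ',') return 0;          /* end of set */
        p = end + 1;
    }
    return 0;
}
```
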

DIRECTORY
---------
This table simply maps Internet e-mail addresses to Citadel network addresses
for quick lookup.  It is generated from data in the Global Address Book room.

USETABLE
--------
This table keeps track of message IDs of messages arriving over a network,
to prevent duplicates from being posted if someone misconfigures the network
and a loop is created.  This table goes unused on a non-networked Citadel.

THE MESSAGE STORE
-----------------
This is where all message text is stored.  It's indexed by message number:
give it a number, get back a message.  Messages are numbered sequentially, and
the message numbers are never reused.

We also keep a "metadata" record for each message.  This record is also stored
in the msgmain table, using the index (0 - msgnum).  We keep in the metadata
record, among other things, a reference count for each message.  Since a
message may exist in more than one room, it's important to keep this reference
count up to date, and to delete the message from disk when the reference count
reaches zero.

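The two rules in that paragraph (negated key, delete at zero references) fit
in a few lines.  This is only a sketch: `meta_sketch`, `metadata_key()` and
`release_reference()` are hypothetical names, and the real metadata record
holds more than a counter.

```c
#include <assert.h>
#include <stddef.h>

/* Key under which the metadata for msgnum is stored: (0 - msgnum). */
static long metadata_key(long msgnum)
{
    return 0L - msgnum;
}

/* Hypothetical, stripped-down metadata record. */
struct meta_sketch {
    long msgnum;
    int refcount;   /* number of rooms that still list this message */
};

/* Drop one reference.  Returns 1 when the count has reached zero and the
 * caller should delete the message from disk, 0 otherwise. */
static int release_reference(struct meta_sketch *m)
{
    assert(m != NULL);
    if (m->refcount > 0) {
        m->refcount--;
    }
    return m->refcount == 0;
}
```
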
Here's the format for the message itself:

 - Each message begins with an 0xFF 'start of message' byte.

 - The next byte denotes whether this is an anonymous message.  The codes
   available are *MES_NORMAL*, *MES_ANON*, or *MES_AN2* (defined in citadel.h).

 - The third byte is a "message type" code.  The following codes are defined:
    - 0 - "Traditional" Citadel format.  Message is to be displayed "formatted."
    - 1 - Plain pre-formatted ASCII text (otherwise known as text/plain)
    - 4 - MIME formatted message.  The text of the message which follows is
          expected to begin with a "Content-type:" header.

 - After these three opening bytes, the remainder of
   the message consists of a sequence of character strings.  Each string
   begins with a type byte indicating the meaning of the string and is
   ended with a null.  All strings are printable ASCII: in particular,
   all numbers are in ASCII rather than binary.  This is for simplicity,
   both in implementing the system and in implementing other code to
   work with the system.  For instance, a database driven off Citadel archives
   can do wildcard matching without worrying about unpacking binary data such
   as message IDs first.  To preserve compatibility with later versions,
   all software should be written to IGNORE fields not currently defined.

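The layout above can be walked with a very small loop.  The following is a
sketch under the stated format only (three leading bytes, then type byte plus
null-terminated string, repeated); `find_field()` is a hypothetical helper,
not a function from the Citadel source.  Fields it isn't asked about are
simply skipped, which is the "IGNORE unknown fields" rule in practice.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return a pointer to the string of the first field whose type byte is
 * `which`, or NULL if the buffer is malformed or has no such field. */
static const char *find_field(const unsigned char *msg, size_t len, char which)
{
    size_t pos = 3;     /* skip 0xFF, anonymity byte, message type byte */

    assert(msg != NULL);
    if (len < 3 || msg[0] != 0xFF) {
        return NULL;
    }
    while (pos < len) {
        char type = (char)msg[pos++];
        const unsigned char *nul = memchr(msg + pos, 0, len - pos);

        if (nul == NULL) {
            return NULL;            /* unterminated string */
        }
        if (type == which) {
            return (const char *)(msg + pos);
        }
        pos = (size_t)(nul - msg) + 1;  /* step over the string and its null */
    }
    return NULL;
}
```

For example, `find_field(buf, buflen, 'A')` would hand back the author string
if the message carries an A field.
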

The type bytes currently defined are:


| BYTE  |       Enum        | NW   | Mnemonic       |  Comments
|-------|-------------------|------|----------------|---------------------------------------------------------
| A     |    eAuthor        | from | Author         |  The display name of the Author of the message.
| B     |    eBig_message   |      | Big message    |  This is a flag which indicates that the message is
|       |                   |      |                |  big, and Citadel is storing the body in a separate
|       |                   |      |                |  record.  You will never see this field because the
|       |                   |      |                |  internal API handles it.
| E     |    eExclusiveID   | exti | Exclusive ID   |  A persistent alphanumeric Message ID used for
|       |                   |      |                |  replication control.  When a message arrives that
|       |                   |      |                |  contains an Exclusive ID, any existing messages which
|       |                   |      |                |  contain the same Exclusive ID and are *older* than this
|       |                   |      |                |  message should be deleted.  If there exist any messages
|       |                   |      |                |  with the same Exclusive ID that are *newer*, then this
|       |                   |      |                |  message should be dropped.
| F     |    erFc822Addr    | rfca | rFc822 address |  Email address or user principal name of the message
|       |                   |      |                |  author.
| I     |    emessageId     | msgn | Message ID     |  An RFC822-compatible message ID for this message.
|       |                   |      |                |  
| J     |    eJournal       | jrnl | Journal        |  The presence of this field indicates that the message
|       |                   |      |                |  is disqualified from being journaled, perhaps because
|       |                   |      |                |  it is itself a journalized message and we wish to
|       |                   |      |                |  avoid double journaling.
| K     |    eReplyTo       | rep2 | Reply-To       |  The Reply-To header for mailing list outbound messages.
| L     |    eListID        | list | List-ID        |  Mailing list identification, as per RFC 2919.
| M     |    eMesageText    | text | Message Text   |  Normal ASCII, newlines separated by CR's or LF's,
|       |                   |      |                |  null terminated as always.
| O     |    eOriginalRoom  | room | Room           |  Room of origin.
| P     |    eMessagePath   | path | Path           |  Complete path of message, as in the UseNet news
|       |                   |      |                |  standard.  A user should be able to send Internet mail
|       |                   |      |                |  to this path.  (Note that your system name will not be
|       |                   |      |                |  tacked onto this until you're sending the message to
|       |                   |      |                |  someone else.)
| R     |    eRecipient     | rcpt | Recipient      |  Only present in Mail messages.
| T     |    eTimestamp     | time | date/Time      |  Unix timestamp containing the creation date/time of
|       |                   |      |                |  the message.
| U     |    eMsgSubject    | subj | sUbject        |  Message subject.  Optional.
|       |                   |      |                |  Developers may choose whether they wish to
|       |                   |      |                |  generate or display subject fields.
| V     |    eenVelopeTo    | nvto | enVelope-to    |  The recipient specified in incoming SMTP messages.
| W     |    eWeferences    | wefw | Wefewences     |  Previous message IDs for conversation threading.  When
|       |                   |      |                |  converting from RFC822 we use References: if present, or
|       |                   |      |                |  In-Reply-To: otherwise.
|       |                   |      |                |  (Who in extnotify spool messages which don't need to know
|       |                   |      |                |  other message ids)
| Y     |    eCarbonCopY    | cccc | carbon copY    |  Carbon copy (CC) recipients.
|       |                   |      |                |  Optional, and only in Mail messages.
| %     |    eHeaderOnly    | nhdr | oNlyHeader     |  We will just be sending headers.  Wire protocol only.
| %     |    eFormatType    | type | type           |  Type of Citadel message (wire protocol only):
|       |                   |      |                |     FMT\_CITADEL     0   Citadel vari-format (proprietary)
|       |                   |      |                |     FMT\_FIXED       1   Fixed format (proprietary)
|       |                   |      |                |     FMT\_RFC822      4   Standard (headers are in M field)
| %     |    eMessagePart   | part | emessagePart   |  eMessagePart is the id of this part in the MIME hierarchy.
| %     |    eSubFolder     | suff | eSubFolder     |  Descend into a MIME sub container.
| %     |    ePevious       | pref | ePevious       |  Exit a MIME sub container.
| 0     |    eErrorMsg      |      | Error          |  This field is typically never found in a message on
|       |                   |      |                |  disk or in transit.  Message scanning modules are
|       |                   |      |                |  expected to fill in this field when rejecting a message
|       |                   |      |                |  with an explanation as to what happened (virus found,
|       |                   |      |                |  message looks like spam, etc.)
| 1     |    eSuppressIdx   |      | suppress index |  The presence of this field indicates that the message is
|       |                   |      |                |  disqualified from being added to the full text index.
| 2     |    eExtnotify     |      | extnotify      |  Used internally by the serv_extnotify module.
| 3     |    eVltMsgNum     |      | msgnum         |  Used internally to pass the local message number in the
|       |                   |      |                |  database to after-save hooks.  Discarded afterwards.
|       |                   | locl |                |  The presence of this field indicates that the message
|       |                   |      |                |  is believed to have originated on the local Citadel node,
|       |                   |      |                |  not as an inbound email or some other outside source.

EXAMPLE
-------
Let *<FF>* be a *0xFF* byte, and *<0>* be a null *(0x00)* byte.  Then a message
which prints as...

    Apr 12, 1988 23:16 From Test User In Network Test> @lifesys (Life Central)
    Have a nice day!

might be stored as...

    <FF><40><0>I12345<0>Pneighbor!lifesys!test_user<0>T576918988<0>    (continued)
    -----------|Mesg ID#|--Message Path---------------|--Date------
    
    AThe Test User<0>ONetwork Test<0>Nlifesys<0>HLife Central<0>MHave a nice day!<0>
    |-----Author-----|-Room name-----|-nodename-|Human Name-|--Message text-----

Weird things can happen if fields are missing, especially if you use the
networker.  The date, author, room, and nodename may be in any order, but the
leading fields and the message text must remain in the same place.  The H
field looks better when it is placed immediately after the N field.


EUID (EXCLUSIVE MESSAGE IDS)
----------------------------
This is where the groupware magic happens.  Any message in any room may have
a field called the Exclusive message *ID*, or *EUID*.  We keep an index in the
table *CDB_EUIDINDEX* which knows the message number of any item that has an
*EUID*.  This allows us to do two things:

 - If a subsequent message arrives with the same *EUID*, it automatically
   *deletes* the existing one, because the new one is considered a replacement
   for the existing one.
 - If we know the *EUID* of the item we're looking for, we can fetch it by *EUID*
   and get the most up-to-date version, even if it's been updated several times.

This functionality is made more useful by server-side hooks.  For example,
when we save a vCard to an address book room, or an iCalendar item to a
calendar room, our server modules detect this condition, and automatically set
the *EUID* of the message to the *UUID* of the *vCard* or *iCalendar* item.
Therefore when you save an updated version of an address book entry or
a calendar item, the old one is automatically deleted.

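The replace-on-save behaviour can be pictured with a toy in-memory index.
This is a sketch only: the real index lives in *CDB_EUIDINDEX*, and the
names `euid_index`, `euid_lookup()` and `euid_store()` as well as the fixed
table sizes are assumptions made for the example.

```c
#include <assert.h>
#include <string.h>

#define MAX_ENTRIES 16      /* arbitrary limits for the sketch */
#define EUID_LEN    64

struct euid_entry { char euid[EUID_LEN]; long msgnum; };
struct euid_index { struct euid_entry entries[MAX_ENTRIES]; int count; };

/* Return the current message number for an EUID, or 0 if it is unknown. */
static long euid_lookup(const struct euid_index *idx, const char *euid)
{
    int i;

    assert(idx != NULL && euid != NULL);
    for (i = 0; i < idx->count; i++) {
        if (strcmp(idx->entries[i].euid, euid) == 0) {
            return idx->entries[i].msgnum;
        }
    }
    return 0;
}

/* Record that msgnum now carries this EUID.  Returns the message number
 * that was replaced (which the caller would delete), 0 if the EUID is new,
 * or -1 if the entry does not fit in this toy table. */
static long euid_store(struct euid_index *idx, const char *euid, long msgnum)
{
    int i;

    assert(idx != NULL && euid != NULL);
    for (i = 0; i < idx->count; i++) {
        if (strcmp(idx->entries[i].euid, euid) == 0) {
            long old = idx->entries[i].msgnum;
            idx->entries[i].msgnum = msgnum;
            return old;
        }
    }
    if (idx->count == MAX_ENTRIES || strlen(euid) >= EUID_LEN) {
        return -1;
    }
    strcpy(idx->entries[idx->count].euid, euid);
    idx->entries[idx->count].msgnum = msgnum;
    idx->count++;
    return 0;
}
```

Saving a calendar item twice with the same UUID would thus return the first
message number from the second `euid_store()` call, telling the caller which
message to delete.
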
NETWORKING (REPLICATION)
------------------------
Citadel nodes network by sharing one or more rooms.  Any Citadel node
can choose to share messages with any other Citadel node, through the sending
of spool files.  The sending system takes all messages it hasn't sent yet, and
spools them to the receiving system, which posts them in the rooms.

The *EUID* discussion above is extremely relevant, because *EUID* is carried over
the network as well, and the replacement rules are followed over the network
as well.  Therefore, when a message containing an *EUID* is saved in a networked
room, it replaces any existing message with the same *EUID* *on every node in
the network*.

Complexities arise primarily from the possibility of densely connected
networks: one does not wish to accumulate multiple copies of a given
message, which can easily happen.  Nor does one want to see old messages
percolating indefinitely through the system.

This problem is handled by keeping track of the path a message has taken over
the network, like the UseNet news system does.  When a system sends out a
message, it adds its own name to the bang-path in the *<P>* field of the
message.  If no path field is present, it generates one.

With the path present, all the networker has to do to ensure that it doesn't
send another system a message it's already received is check the <P>ath field
for that system's name somewhere in the bang path.  If it's present, the system
has already seen the message, so we don't send it.

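The bang-path check boils down to a whole-element string match.  A minimal
sketch follows; `path_contains_node()` is a hypothetical helper, matching is
assumed to be exact and case-sensitive, and the final path element (the user
name in the example `neighbor!lifesys!test_user`) is not treated specially.

```c
#include <assert.h>
#include <string.h>

/* Return 1 if `node` appears as a complete element of the '!'-separated
 * path, 0 otherwise.  "life" does not match "lifesys". */
static int path_contains_node(const char *path, const char *node)
{
    const char *p = path;
    size_t nlen;

    assert(path != NULL && node != NULL);
    nlen = strlen(node);
    while (*p != '\0') {
        size_t elen = strcspn(p, "!");      /* length of this element */

        if (elen == nlen && strncmp(p, node, nlen) == 0) {
            return 1;
        }
        p += elen;
        if (*p == '!') {
            p++;                            /* step over the separator */
        }
    }
    return 0;
}
```
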
We also keep a small database, called the "use table," containing the IDs of
all messages we've seen recently.  If the same message arrives a second or
subsequent time, we will find its ID in the use table, indicating that we
already have a copy of that message.  It will therefore be discarded.

The above discussion should make the function of the fields reasonably clear:

 - Travelling messages need to carry original message-id, system of origin,
   date of origin, author, and path with them, to keep reproduction and
   cycling under control.

(Uncoincidentally) the format used to transmit messages for networking
purposes is precisely that used on disk, serialized.  The current
distribution includes serv_network.c, which is basically a database replicator;
please see network.txt on its operation and functionality (if any).

PORTABILITY ISSUES
------------------
Citadel is 64-bit clean and architecture-independent.  The software is
developed and primarily run on the Linux operating system (which uses the
Linux kernel) but it should compile and run on any reasonably POSIX
compliant system.

On the client side, it's also POSIX compliant.  The client even seems to
build ok on non-POSIX systems with porting libraries (such as Cygwin and
WSL).

SUPPORTING PRIVATE MAIL
-----------------------
Can one have an elegant kludge?  This must come pretty close.

Private mail is sent and received in the *Mail>* room, which otherwise
behaves pretty much as any other room.  To make this work, we have a
separate Mail> room for each user behind the scenes.  The actual room name
in the database looks like *"0000001234.Mail"* (where *'1234'* is the user
number) and it's flagged with the *QR_MAILBOX* flag.  The user number is
stripped off by the server before the name is presented to the client.  This
provides the ability to give each user a separate namespace for mailboxes
and personal rooms.

This requires a little fiddling to get things just right.  For example,
*make_message()* has to be kludged to ask for the name of the recipient
of the message whenever a message is entered in *Mail>*.  But basically
it works pretty well, keeping the code and user interface simple and
regular.

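As an illustration of the naming scheme, the sketch below builds and strips
such a room name.  The ten-digit zero-padded width is inferred from the
*"0000001234.Mail"* example; both function names are hypothetical, and a
real server would presumably consult the *QR_MAILBOX* flag rather than guess
from the name alone as `display_room_name()` does here.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Build "0000001234.Mail" from user number 1234 and the name "Mail".
 * Returns 0 on success, -1 if the result does not fit in `out`. */
static int mailbox_room_name(char *out, size_t outlen,
                             long usernum, const char *name)
{
    int n;

    assert(out != NULL && name != NULL);
    n = snprintf(out, outlen, "%010ld.%s", usernum, name);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}

/* Return the part of a stored room name that a client would see: skip a
 * leading "<ten digits>." prefix if present, else return the name as is. */
static const char *display_room_name(const char *stored)
{
    int i;

    assert(stored != NULL);
    if (strlen(stored) < 11) {
        return stored;
    }
    for (i = 0; i < 10; i++) {
        if (!isdigit((unsigned char)stored[i])) {
            return stored;
        }
    }
    return (stored[10] == '.') ? stored + 11 : stored;
}
```
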
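PASSWORDS AND NAME VALIDATION
-----------------------------
This has changed a couple of times over the course of Citadel's history.  At
this point it's very simple, again due to the fact that record managers are
used for everything.  The user file (user) is indexed using the user's
name, converted to all lower-case.  Searching for a user, then, is easy.  We
just lowercase the name we're looking for and query the database.  If no
match is found, it is assumed that the user does not exist.

This makes it difficult to forge messages from an existing user.  (Fine
point: nonprinting characters are converted to printing characters, and
leading, trailing, and double blanks are deleted.)
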
  412 name, converted to all lower-case.  Searching for a user, then, is easy.  We
  413 just lowercase the name we're looking for and query the database.  If no
  414 match is found, it is assumed that the user does not exist.
  415 
  416 This makes it difficult to forge messages from an existing user.  (Fine
  417 point: nonprinting characters are converted to printing characters, and
  418 leading, trailing, and double blanks are deleted.)