% -*- mode: noweb; noweb-default-code-mode: lisp-mode; -*- \documentclass[a4paper,11pt]{article} \usepackage[text={6.75in,10in},centering]{geometry} \usepackage{graphicx} \usepackage{microtype} \DisableLigatures[-]{family=tt*} \usepackage[T1]{fontenc} \usepackage[utf8]{inputenc} \usepackage{csquotes} \usepackage{babel} \usepackage{etoolbox} \AtBeginEnvironment{quote}{\small} \AtBeginEnvironment{verbatim}{\small} %% \usepackage[backend=biber]{biblatex} %% \addbibresource{refs.bib} %% \renewcommand{\cite}{\parencite} % \usepackage[hyperref]{xcolor} \usepackage[colorlinks=true,citecolor=]{hyperref} % linkcolor=red \usepackage{amsmath,amsthm,amssymb} \allowdisplaybreaks \usepackage{lmodern} \usepackage{noweb} \noweboptions{longchunks,smallcode} \DeclareMathOperator{\mdc}{mdc} \DeclareMathOperator{\gcdext}{gcdext} \DeclareMathOperator{\remainder}{remainder} \DeclareMathOperator{\quotient}{quotient} \DeclareMathOperator{\diff}{diff} \def\nwendcode{\endtrivlist \endgroup} \let\nwdocspar=\par %% Popular words. \newcommand{\lxxp}{{\tt loop}} \newcommand{\Lp}{{\tt LOOP}} \newcommand{\lp}{\Lp} \newcommand{\bug}{{\em bug}} \newcommand{\symlink}{{\em symbolic link}} \newcommand{\symlinks}{\symlink s} \title{\Lp\\ {a circle out of fashion}} \date{January 2024} \begin{document} \pdfbookmark[1]{Introduction}{intro} \fontfamily{cmr}\selectfont \maketitle %\setlength{\parskip}{3pt} %\setlength{\parindent}{0pt} \Lp\ is an out-of-fashion program, used as medium of communication by antiquated people. \Lp\ members insist that technical communication be made in writing and not in a hurry. To give you an idea, they write \Lp\ in jurassic technology. You wouldn't pay them any attention. % \begin{verbatim} Drunk and dressed in their best brown baggies and their platform soles They don't give a damn about any trumpet-playing band It ain't what they call rock and roll -- Mark Knopfler, 1978. \end{verbatim} It's easy to make a conference on the Internet. E-mail works. When we write an e-mail to various friends to discuss a certain subject, we form a circle. When we decide to add another person to this quickly-made conference by e-mail, sometimes we use a well-known idiom---``adding John to the loop''. We add John's e-mail address to the list of destinaries. So long as everyone replies to everyone, John, too, will start getting all the messages. If anyone violates this rule of replying to everyone involved, the loop is broken. \begin{figure}[!htb] \centering \includegraphics[width=0.8\linewidth]{images/gnus-summary.png} \caption{Gnus, a news reader embedded in the GNU EMACS text editor.} \label{fg:gnus} \end{figure} \begin{figure}[!htb] \centering \includegraphics[width=0.8\linewidth]{images/tbird-summary.png} \caption{Thunderbird, a news reader produced by the Mozilla Foundation.} \label{fg:bird} \end{figure} \begin{figure}[!htb] \centering \includegraphics[width=0.8\linewidth]{images/sylpheed-summary.png} \caption{Sylpheed, a news reader produced by Hiroyuki Yamamoto.} \label{fg:sylpheed} \end{figure} There are surely inconveniences in using e-mail as conference medium. For example, after John has been added to the loop, he is not able to leave by his own account. He needs to ask everyone involved to stop writing to him. This is usually easy to do, but instead people tend to ask for more technology such as mailing lists. A mailing list is nothing by an automated version of this idea of writing to various people at once. When the mailing address is written to, a program resends the message to all subscribers of the mailing list and, this way, the conference takes place. But \lp\ has nothing to do with e-mail. \Lp\ uses a communication strategy---called a ``protocol''---that is even older than the web itself. The web started out around 1989--1990 and the protocol \lp\ uses begins its history in 1979, the year in which AT\&T released UNIX V7, which was UUCP-ready, an acronym that stands for for UNIX to UNIX copy. It's both a protocol and a set of programs for copying files between UNIX systems. People begun exchanging messages by copying files between machines. The idea eventually evolved into a protocol called NNTP---Network News Transfer Protocol---, which is the protocol used by \lp. (The protocol used to be called Netnews Transfer Protocol.) Since e-mail was already daily practice of members of the Internet back then, many things from e-mail were taken by the NNTP designers. So, an NNTP message looks a lot like an e-mail message and the two---NNTP and SMTP (the protocol used by e-mail)---can often mingle seamlessly. The impression we get from using NNTP is that we're sending e-mail to a certain group of people. It's as though the message goes into a collective mailbox and anyone interested in that mailbox reads the messages there. If anyone would like to reply to a message, they do so and, this way, communication flows among the interested crowd. If anyone would like to leave the group, nothing is needed---the person just doesn't go back to read any more messages. Unlike mailing lists, there is no need to formally commit to reading one of these collective mailboxes and no need to formally notify anyone or any system that you're not interested in that group any longer. These collective mailboxes are called ``news groups'' and are often written as ``newsgroups''. And the messages posted to these newsgroups are called ``articles''. Just like e-mail and the web, network news is an open protocol. Anyone could write a program capable of speaking NNTP. There are many NNTP-aware programs. You could write your own. Figures \ref{fg:gnus}--\ref{fg:sylpheed} show a few programs for reading network news via NNTP. \section*{Principles for a discussion group}\label{principles} \pdfbookmark[1]{Principles for a discussion group}{principles} We believe a discussion group should be small and grow slowly. By ``slowly'', we mean that each member comes in through an invitation. This way, the group being closed by definition, we keep spam out and give members a certain sense of privilege. A discussion group should be formed by interested people. If a participant doesn't log-in for a certain period of time, \lp\ locks the participant's account---see Section \ref{sec:inactive-users}. The account can be reactivated, but it will take asking another participant (with an active account) to do so. In other words, there's an encouragement for an uninterested member not to come back to the \lp. The idea is to keep a certain cohesion in the discussion groups. When an account is locked or unlocked, an article is posted to the group {\tt local.control.news}, so everyone knows who is leaving and arriving. This way, participants get to have an idea of who is reading them. Each invitation comes with a certain responsibility: it's possible to see who invited who. If {\tt BOB} misbehaves, everyone gets to see that {\tt ALICE} doesn't have nice friends. The {\tt USERS} command shows the relationship graph: % \begin{verbatim} USERS 200 List of current users: ANONYMOUS, last seen on Fri Mar 8 19:01:56 2024, invited (ALICE) ALICE, last seen on Sun Mar 10 21:25:45 2024, invited (BOB CARLA) BOB, last seen on Sun Mar 10 21:17:30 2024, invited nobody CARLA, last seen on Sun Mar 10 18:30:48 2024, invited nobody \end{verbatim} We conjecture that a discussion group tends to prosper when each member feels as important as any other member. So we think each member should have as much managerial power as any other. In an attempt to realize this ideal, each member is able to not only invite other people---see the [[CREATE-ACCOUNT]] command---, but also to create new groups---see the [[CREATE-GROUP]] command. Despite this introduction, this is not a tutorial on the history of the Internet or how NNTP works. This is the source code of \lp. Hereafter, our conversation continues in Lisp. Understanding how \lp\ is made is only necessary if you intend to modify it. If you just want to use the system, you probably should stop right here. \section*{How to install} See \href{https://git.antartida.xyz/loop/srv/raw/branch/main/README}{[[README]]} in \lp's source code. \section{Implementation strategy}\label{sec:design} Anything a user sends to the \lp\ is wrapped in a [[request]] and any command processing must produce a [[response]]: % \begin{verbatim} request ---> process ---> response. \end{verbatim} % The request arrives, \lp\ interprets it, finds out which command should look at the request and then dispatches the the request to the right command. The command process the request and returns a response. Then \lp\ takes this response and sends it out back to the user through the network using [[send-response!]]. That's all there is to \lp. Everything else is just the details of this strategy. Before you investigate the source code of \lp, you should have {\tt SBCL} installed---the Common Lisp compiler we've been using---and the package [[:loop]]. This way you can interact with the system: % \begin{verbatim} CL-USER> (ql:quickload :loop) To load "loop": Load 1 ASDF system: loop ; Loading "loop" .................. (:LOOP) CL-USER> (in-package :loop) # LOOP> (make-request "HELP") #S(REQUEST :VERB "HELP" :ARGS NIL :SAID NIL) \end{verbatim} % The procedure [[make-request]] constructs a [[request]]. When a user connects to the \lp, each command sent by the user is packaged inside a [[request]] structure like this one above. To see how \lp\ answers this request for help, we can invoke the procedure [[dispatch]], which is responsible to find the correct procedure to reply to the request. \begin{verbatim} LOOP> (dispatch (make-request :verb "HELP")) #S(RESPONSE :CODE 400 :DATA "unrecognized command" :REQUEST #S(REQUEST :VERB "HELP" :ARGS NIL :SAID NIL) :MULTI-LINE NIL) \end{verbatim} It turns that \lp\ doesn't recognize the command. That's because we only loaded the [[:loop]] package. The reason it doesn't recognize the command is that the command table has not been built. Let's build it and try again: \begin{verbatim} LOOP> (set-up-tables!) (("ARTICLE" . #S(COMMAND :FN # :VERB "ARTICLE" :DESCRIPTION "fetches full articles")) [...] ("HELP" . #S(COMMAND :FN # :VERB "HELP" :DESCRIPTION "displays this menu")) [...]) LOOP> (dispatch (make-request :verb "HELP")) #S(RESPONSE :CODE 200 :DATA (87 104 97 [...] 112) :REQUEST #S(REQUEST :VERB "HELP" :ARGS NIL :SAID NIL) :MULTI-LINE YES) \end{verbatim} That's better. These numbers we see in the response are the bytes in the response. You can get a string version of these numbers: \begin{verbatim} LOOP> (bytes->string (response-data (dispatch (make-request :verb "HELP")))) "What's on the menu today? ARTICLE fetches full articles AUTHINFO makes me trust you BODY fetches an article body GROUP sets the current group HEAD fetches article headers HELP displays this menu LIST lists all groups MODE handles the mode request from clients NEXT increments the article pointer POST posts your article QUIT politely says good-bye XDD displays your state of affairs XOVER fetches the overview database of a group" \end{verbatim} This is the text that the user sees when they ask for {\tt HELP}. In other words, the field {\tt data} in a [[response]] stores the data to be delivered back to the user. The program {\tt nc}---for ``netcat''---that we use below is capable of opening a TCP connection and handling that connection to our keyboard so that we can interact with the \lp. You can effectively achieve the same thing using a program such as {\tt telnet}. \begin{verbatim} C:\>nc antartida.xyz 119 200 Welcome! Say ``help'' for a menu. help 200 What's on the menu today? ARTICLE fetches full articles AUTHINFO makes me trust you BODY fetches an article body GROUP sets the current group HEAD fetches article headers HELP displays this menu LIST lists all groups MODE handles the mode request from clients NEXT increments the article pointer POST posts your article QUIT politely says good-bye XDD displays your state of affairs XOVER fetches the overview database of a group . quit 205 Good-bye. \end{verbatim} Despite \lp\ being made to talk to computer programs such as the ones illustrated in Figures~\ref{fg:gnus}--\ref{fg:sylpheed}, it's perfectly possible for a user to interact with \lp\ directly using a keyboard and a command-line tool such as {\tt nc} or {\tt telnet}. In fact, \lp\ takes advantage of that to be {\em hackable}---see also Section~\ref{sec:repl}. Commands such as [[CREATE-ACCOUNT]], [[CREATE-GROUP]], [[PASSWD]] are not part of the NNTP protocol, so users need to know how to use {\tt nc} or {\tt telnet} to take advantage of all of \lp's capabilities. \section{NNTP protocol} An Internet protocol is usually defined by a document whose tradition calls RFC---for ``request for comments''---, an acronym suggested by Stephen D.~Crocker, who wrote RFC 1 back in 1969, when ARPANET connected four computers. NNTP is defined by RFCs 977, 2980, 3977, 4643 and 5536. RFC 977 was the first and replaced by 3977. Still, reading RFC 977 is interesting precisely because it gives us a historical account of the protocol, making it easier to understand the evolution of the system. The objective of RFC 2980 was to implement new ideas to the NNTP protocol---to extend the protocol. RFC 3977 adopts some of these extensions. RFC 4643 also extends RFC 2980---addressing concerns with authentication. \section{It's a network server} The \lp\ program is a network server, that is, it serves connections on the network. However, instead of implementing the handling of TCP connections, we use what is called a ``TCP superserver'' or ``TCP wrappers'' or perhaps just ``TCP server''. The idea is---a program called ``TCP server'' waits for connections from the network. When a client arrives, the TCP server handles the network connection to \lp. From \lp's perspective, the client is just another keyboard talk to it directly. This strategy simplifies the implementation of \lp. Both programs---the TCP server and \lp---have nothing to do with one another, but their work together make the system work. This implementation strategy is typically found in UNIX programs. % \begin{quote}\small This is the Unix philosophy. Write programs that do one thing and do it well. Write programs to work together. Write programs that handle text streams, because that is a universal interface.\\ --- Doug McIlroy, 1989, interviewed by Michael S.~Mahoney. \end{quote} % The TCP server just just one thing---listens for new connections and handles them to the interested program. It does the handling and does it well. \Lp, on the other hand, concerns itself with the NNTP protocol and does not worry about handling network connections. This way, they work together. And \lp\ handles only a text stream, which is why it's so easy to connect a keyboard to it and interact with it through the command line as illustrated in Section~\ref{sec:design}. \section{REPL for \lp}\label{sec:repl} There's a REPL to \lp's procedures---always useful. \begin{verbatim} %./loop --repl LOOP> (+ 1 1) 2 LOOP> (list-groups) (local.control.news local.test) \end{verbatim} The REPL can also be use in the middle of an NNTP conversation, which is useful for debugging. All authenticated users can use it, making passwords only a barrier to spammers, not to your circle of friends, since everyone can find a way to log in with anyone's account, change their passwords or just do whatever they want to \lxxp\ process or databases. As the comment below explains, we need to [[clear-output]] in the [[*standard-input*]] to avoid the following problem: % \begin{verbatim} repl LOOP> quit 200 Okay, no more REPL hacking. 400 I beg your pardon? \end{verbatim} % What happens is that [[read]] consumes just the symbol \verb|quit|, so there remains a \verb|\n| or \verb|\r\n| in the [[*standard-input*]], which we need to wipe out, or [[nntp-read-line]] will consume an empty line, which is what causes that 400 response. <>= (defun repl (r) (in-package :loop) (loop (print/finish "LOOP> ") (handler-case (let ((expr (read))) (if (eq 'quit expr) (progn ;; At this point, there's \r\n still in the stdin due ;; to read having returned a symbol followed by a line ;; termination, so we must wipe that out or nntp-read-line ;; will find an empty line after this. (clear-input) (return (make-response :code 200 :request r :data "Okay, that's enough hacking for today."))) (println "~a" (eval expr)))) (end-of-file () (print/finish "^D~%") (uiop:quit 0)) (interactive-interrupt () (print/finish "^C~%") (uiop:quit 0)) (t (c) (print/finish "Oops: ~a~%" (str:collapse-whitespaces (str:replace-all (string #\linefeed) " " (fmt "~a" c)))))))) @ %def repl \section{Representation of a client} How do we represent a client? A client is typically reading a group and an article; it's has authenticated itself or not yet. So we need a global structure to annonate the client's state. <>= (defstruct client group (article 1) (username "ANONYMOUS") (auth? 'no)) (defparameter *client* (make-client)) @ %def client *client* \section{Representation of a command} What does a client typically tell \lp? A client typically sends commands. Commands typically need arguments. Each command is dispatched to a procedure that answers it---it's the purpose of \verb|fn| in the command. Together, all commands make up a table of commands, which is essentially what the user sees when ask for \verb|HELP|. (Be aware that some clients use the output of \verb|HELP|. For example, Gnus v5.13.) <>= (defstruct command fn verb description) (defparameter *commands-assoc* nil) @ <>= (defun table-of-commands () `(("GROUP" ,#'cmd-group "sets the current group") ("NEXT" ,#'cmd-next "increments the article pointer") ("HELP" ,#'cmd-help "displays this menu") ("LIST" ,#'cmd-list "lists all groups") ("AUTHINFO" ,#'cmd-authinfo "makes me trust you") ("LOGIN" ,#'cmd-login "shorter interface to AUTHINFO") ("HEAD" ,#'cmd-head "fetches article headers") ("MODE" ,#'cmd-mode "handles the mode request from clients") ("BODY" ,#'cmd-body "fetches an article body") ("POST" ,#'cmd-post "posts your article") ("ARTICLE" ,#'cmd-article "fetches full articles") ("XOVER" ,#'cmd-xover "fetches the overview database of a group") ("CREATE-GROUP" ,#'cmd-create-group "creates a new group so you can discuss your favorite topic") ("CREATE-ACCOUNT",#'cmd-create-account "creates an account so you can invite a friend") ("PASSWD" ,#'cmd-passwd "changes your password") ("USERS" ,#'cmd-list-users "lists all users") ("DD" ,#'cmd-dd "[d]isplays [d]ata: your state of affairs") ("QUIT" ,#'cmd-quit "politely says good-bye") ("DATE" ,#'cmd-date "displays the current date at this server") ("UNLOCK-ACCOUNT" ,#'cmd-unlock-account "unlocks an account") ("REPL" ,#'cmd-repl "lets you hack away"))) (defun set-up-tables! () (labels ((build-commands-assoc (ls) (if (null ls) nil (cons (apply #'make-command-pair (car ls)) (build-commands-assoc (cdr ls))))) (make-command-pair (name fn desc) (cons name (make-command :fn fn :verb name :description desc)))) (setf *commands-assoc* (sort (build-commands-assoc (table-of-commands)) #'string-lessp :key #'car)))) (defun get-command (key) (let ((cmd (assoc key *commands-assoc* :test #'string=))) (labels ((unrecognized-command () (make-command :fn #'(lambda (r) (make-response :code 400 :data "unrecognized command" :request r)) :verb 'unrecognized :description "a command for all commands typed wrong"))) (or (cdr cmd) (unrecognized-command))))) @ %def *commands-assoc* set-up-tables! get-command \section{Representation of requests and responses} Each command is given through a text line written by the user. Let's call this text line the [[request]]. When \lp\ parses the request, it will extract (from the request) a verb and some arguments. We will take a verbatim copy of everything the user has said, possibly for debugging purposes. How do we represent a [[response]]? A [[response]] is always a reaction to a [[request]]. The NNTP protocol always specifies an integer as the code to a response, which is what we call the \verb|code| in the response. Long responses end with a period and we mark such responses with [[multi-line]]. <>= (defstruct request verb args said) (defstruct response code data request multi-line) @ <>= (defun empty-response () (make-response :code 400 :data "I beg your pardon?")) (defun prepend-response-with (message r) (make-response :code (response-code r) :data (data message (crlf) (response-data r)) :multi-line (response-multi-line r) :request (response-request r))) @ %def request response make-response make-request empty-response prepend-response-with Here's how to send a [[response]] to a client. <>= (defun append-crlf-if-needed (seq) (cond ((stringp seq) (append-crlf-if-needed (string->bytes seq))) ((listp seq) (append seq (when (not (= (car (last seq)) 10)) (list 13 10)))) (t (error (format nil "append-crlf-if-needed: unsupported type: ~a" (type-of seq)))))) (defun send-response! (r) (let ((bs (data (integer->string (response-code r)) " " (append-crlf-if-needed (response-data r))))) (my-write bs *standard-output*) (stderr ">>> ~a" (bytes->string (ucs-2->ascii bs)))) (when (response-multi-line r) (let ((bs (data "." (crlf)))) (my-write bs *standard-output*) (stderr ">>> ~a" (bytes->string (ucs-2->ascii bs))))) (force-output) r) @ %def send-response! The Windows Console---the one we use when we run {\tt cmd.exe}---makes the Lisp procedure [[write-sequence]] produce UCS-2. Windows will do what we need of [[write-sequence]] if we're not operating \lp\ on the Windows Console. Recall that we do not run \lp\ on Windows Console, so whatever happens because of it is of little importance to us. However, we implement \lp\ on Windows and so it's convenient for us that \lp\ and Windows play nicely with each other. A cheap solution here is to simply convert the bytes to a string if \lp\ is directly connected to an interactive Lisp stream. This way we effectively eliminate the UCS-2 encoding used by Windows. It is perfectly fine for us to destroy the encoding of articles while we're writing \Lp. It is not fine, however, when it's running in production. But, in production, {\tt (interactive-stream-p s)} will always be false. How else should we handle this? <>= (defun my-write (ls-of-bytes s) (if (interactive-stream-p s) (write-sequence (mapcar #'code-char ls-of-bytes) s) (write-sequence ls-of-bytes s))) @ %def my-write \section{Main loop} Every command consumes a [[request]] and produces a [[response]]. If any procedure always produces a [[response]], then delivering a [[response]] to the user is a matter of sending a string composed of the [[response]] code concatenated with the [[response]] data. What does \lp\ do? It repetitively reads a line from the user, processes that line and always replies something back. Then \lp\ is back at waiting for the user to say something else again. If the user says {\tt QUIT}, then we should identify it and terminate \lp's execution. That's even [[send-response!]] returns the [[request]] itself---so we can cascade actions based on a user's request. <
>= (defun main-loop () (loop (let* ((bs (nntp-read-line)) (ln (bytes->string (ucs-2->ascii bs)))) (let ((r (send-response! (dispatch-line ln)))) (when (response-quit? r) (return)))))) (defun request-quit? (r) (and r (string= 'quit (request-verb r)))) (defun response-quit? (r) (and r (request-quit? (response-request r)))) (defun server-start () (set-up-tables!) (send-banner!) (main-loop)) (defun main () (let ((app (cli/command))) (clingon:run app))) (defun send-banner! () (send-response! (make-response :code 200 :data "<>"))) @ %def main main-loop \noindent It's always useful to know which version exactly we're dealing with: % \begin{verbatim} %./loop.exe 200 Welcome! I am LOOP 9575ac2. Say ``help'' for a menu. \end{verbatim} % So we put a release tag on \lp's banner. <>= Welcome! I am <> <>. Say ``help'' for a menu. @ \noindent We take the opportunity and describe \lp's package, information which we also use in [[loop.asd]]. <>= LOOP @ <>= An NNTP server for a circle of friends. @ <>= a89e088 @ \section{Parsing of requests} The commands themselves we call {\tt verbs} and everything else the user types we call {\tt args}. Observe that upper and lower case letters are equivalent in request verbs. <>= (defun parse-request (r) (let* ((collapsed-s (str:collapse-whitespaces (request-said r))) (ls (str:split " " collapsed-s :omit-nulls 'please))) ;; What are we going to do with a null request? (cond ((null ls) (make-request :said (request-said r))) (t (let ((verb (car ls)) (args (cdr ls))) (make-request :said (request-said r) :verb (str:upcase verb) :args args)))))) @ %def parse-request \section{Parsing of command-line arguments} We're using the clingon library as per Vincent Dardel suggestion in ``The Common Lisp Cookbook''. We begin with writing a description of the program and options it understands. XXX: notice I don't know how to support a two-argument option, so I hacked a solution away. What we need to is to implement a new option. The library is extensible. <>= (defun cli/options () (list (clingon:make-option :string :description "creates a new account" :long-name "create-account" :key :create-account) (clingon:make-option :string :description "changes password" :long-name "change-passwd" :key :change-passwd) (clingon:make-option :flag :description "lists accounts" :short-name #\l :long-name "list-accounts" :key :list-accounts) (clingon:make-option :flag :description "runs a REPL right now" :short-name #\r :long-name "repl" :key :repl) (clingon:make-option :flag :description "disables the NNTP REPL" :long-name "disable-nntp-repl" :key :disable-nntp-repl) (clingon:make-option :flag :description "logging (on stderr)" :long-name "logging" :key :logging))) @ We implement first the procedures that handle options that represent an entire program. For example, saying [[--list-accounts]] is like running a program [[./list-accounts]]. <>= (defun cli/list-accounts () (println (str:join (crlf-string) (list-users)))) (defun cli/create-account (username args) (let ((invited-by (car args))) (cond ((null invited-by) (println "Must specify who invites the new account.")) ((get-account username) (println "Username account ``~a'' already exists." username)) ((not (get-account invited-by)) (println "Invited-by account ``~a'' doesn't exist." invited-by)) (t (multiple-value-bind (okay? pass-or-error) (new-account! username invited-by) (if okay? (progn (println "Okay, account ``~a'' created with password ``~a''." username pass-or-error) (notify-user-created username)) (println "Sorry, ~a." pass-or-error))))))) (defun cli/change-passwd (username args) (let* ((random-passwd (random-string 6)) (given-passwd (car args)) (new-passwd (or given-passwd random-passwd))) (if (not (get-account username)) (println "No such account ``~a''." username) (multiple-value-bind (okay? problem) (change-passwd! username new-passwd) (if okay? (println "Okay, account ~a now has password ``~a''." username new-passwd) (println "Sorry, could not change password: ~a." problem)))))) @ %def cli/change-passwd cli/list-accounts cli/create-account Now let's write the main procedure in command-line parsing. Notice that because of the design of the [[clingon]] library, command-line parsing becomes the main procedure of \lp. In other words, \lp's service starts with [[server-start]]. <>= (defun cli/main-with-handlers (cmd) (handler-case (cli/main cmd) (end-of-file () (print/finish "^D~%") (uiop:quit 0)) (interactive-interrupt () (print/finish "^C~%") (uiop:quit 0)))) (defun cli/main (cmd) (read-accounts!) (connect-index! "message-id.db") (create-index!) (let ((args (clingon:command-arguments cmd)) (run-server t) (repl (clingon:getopt cmd :repl)) (ca (clingon:getopt cmd :create-account)) (pa (clingon:getopt cmd :change-passwd)) (la (clingon:getopt cmd :list-accounts)) (logging (clingon:getopt cmd :logging)) (disable-nntp-repl (clingon:getopt cmd :disable-nntp-repl))) (setf *debug* logging) (when (or ca pa la) (setf run-server nil)) (when la (cli/list-accounts)) (when ca (cli/create-account ca args)) (when pa (cli/change-passwd pa args)) (when repl (repl (make-request :verb "repl" :args '(command-line)))) (when disable-nntp-repl (setq *enable-nntp-repl* nil)) (when run-server (server-start)))) (defun cli/command () (clingon:make-command :name "loop" :description "<>" :version "<>" :license "GPL v3" :options (cli/options) :handler #'cli/main-with-handlers)) @ %def cli/options cli/command \section{Request dispatching mechanism} Dispatching requests means consuming one and invoking the correct procedure that will process the request. The invoked procedure must produce a [[response]]. The work of dispatching is just delivering the task to an response-producing procedure and then raising the response to whoever needs to catch it. For example, [[response-quit?]] is used by [[main-loop]] to identify when the user has issued {\tt QUIT}, in which case we terminate [[main-loop]]. <>= (defun dispatch (r) (let* ((verb (request-verb r))) (if (null verb) (empty-response) (funcall (command-fn (get-command verb)) r)))) (defun dispatch-line (ln) (dispatch (parse-request (make-request :said ln)))) @ %def dispatch dispatch-line \section{Representation and parsing of articles} An article is made of two parts, the head and the body. We do need to parse the head, but we never parse the body: we don't want to intervene in anything that a user might be doing in the body of the article. The headers, however, are mostly under the jurisdiction of the server. This decision is due to the fact servers must read headers. For example, how do we know to which groups an article was posted? We look at the header {\tt newsgroups}. So, the server must understand the encoding of headers. Therefore, we assume ASCII encoding of all headers that we need to parse. The member [[headers]] of the structure [[article]] is just a string, while body is a vector of bytes. To get a list of pairs out of the set of all headers of an article, we can ask [[parse-headers]]. Yes, I should've called the member [[headers]] as [[head]] and not [[headers]] because both the word ``headers'' and its plural used here suggest a list of parsed headers. We're going to rename this in due time. %% TODO <>= (defstruct article headers body) @ <>= (defun parse-article (v) (let ((parts (split-vector (vector 13 10 13 10) v nil :limit 2))) (make-article :headers (map 'string #'code-char (car parts)) :body (cadr parts)))) (defun hs-space-collapsed (hs) (cl-ppcre:regex-replace-all (format nil "~a[ ~a]+" (crlf-string) #\tab) hs " ")) (defun hs-lines (lines) (str:split (crlf-string) lines)) (defun parse-header (header) (let* ((h (str:collapse-whitespaces header)) (pos (search ":" h))) (when (null pos) (throw 'article-syntax-error (values nil (format nil "missing colon in header |~a|" h)))) (when (<= (length h) (+ 2 pos)) (throw 'article-syntax-error (values nil (format nil "empty header ~a" h)))) (multiple-value-bind (key val) (values (subseq h 0 pos) (subseq h (+ 2 pos))) (cons (str:downcase key) val)))) (defun parse-headers (hs) (let ((ls (hs-lines (hs-space-collapsed hs)))) (mapcar #'(lambda (h) (parse-header h)) ls))) (defun string-integer? (s) (ignore-errors (parse-integer s))) @ %def parse-article parse-headers We now write some procedures that we use when we're build the {\em overview} of the command \verb|XOVER|. <>= (defun get-header-from-article (h a) (get-header h (parse-headers (article-headers (parse-article a))))) (defun get-header (key hs) (let ((pair (assoc key hs :test #'string=))) (if pair (cdr pair) ""))) (defun fetch-headers (g i) (let* ((a-string (fetch-article g i)) (a-parsed (parse-article a-string)) (headers (parse-headers (article-headers a-parsed)))) (enrich-headers headers a-string))) (defun enrich-headers (hs a) (append hs `(("line-count" . ,(format nil "~a" (nlines (article-body (parse-article a))))) ("byte-count" . ,(format nil "~a" (length a)))))) (defun nlines (v) (length (split-vector (crlf) v nil))) @ %def get-header fetch-headers \section{How to extract articles from the database} Notice that we do not care about which encoding is used in articles. We just read the article bytes and handle them to the client. It's the article viewer's---the NNTP client's, that is---responsibility of interpreting such bytes. That's why we call [[read-sequence]] here. <>= (defun fetch-article (g i) (in-groups (read-file-raw (format nil "~a/~a" g i)))) (defun read-file-raw (path) (with-open-file (in path :element-type '(unsigned-byte 8) :if-does-not-exist nil) (when in (let* ((size (sb-posix:stat-size (sb-posix:stat path))) (a (make-array size))) (read-sequence a in) a)))) (defun fetch-body (g i) (article-body (parse-article (fetch-article g i)))) @ %def fetch-article The purpose of [[encode-body]] is to eventually worry about the appearance of a period on a line by itself in the middle of an article. Since ``\verb|.\r\n|'' is part of the NNTP protocol, we must handle this gracefully---but notice we have not done anything about that so far. So we are essentially writing a bug right here. <>= (defun encode-body (a) a) @ %def encode-body The procedures [[extract-mid]] and [[lookup]] also belong belong in this section. Notice that I also wrote [[mid-by-name]], which should merge with [[extract-mid]]. I think I also wrote more redundancies---perhaps in the implementatio nof [[xover]]---for not using [[lookup]]. I need to seek out all such places and organize. %% TODO <>= (defun extract-mid (a) (lookup "message-id" (parse-headers (article-headers (parse-article a))))) (defun lookup (key table) (cdr (assoc key table :test #'string=))) @ %def extract-mid lookup \section{Commands} \subsection{{\tt HELP}} When someone asks for help, we present a table of commands. The table construction is made by [[menu]]. The procedure [[menu]] was one of the first things I wrote. (This is my first program written in Common Lisp.) I didn't want to get involved with the famous [[loop]] macro, so I used recursion in [[menu]]\footnote{I'd like to isolate these auxiliary procedures inside a single function that uses them. Common Lisp offers me [[labels]], but [[labels]] don't seem so helpful when I'm at the REPL. When I use [[defun]], I'm able to always invoke the procedure at the REPL, but that's not so with [[labels]]. I guess the use of [[labels]] is when the procedure is so trivial that we have no reason to think we're doing to debug it.} <>= (defun cmd-help (r) (let ((lines (menu *commands-assoc*))) (prepend-response-with "What's on the menu today?" (make-response :code 200 :multi-line 'yes :request r :data (str:join (crlf-string) lines))))) (defun menu (ls) (loop for item in ls collect (display-fn item))) (defun display-fn (cmd-pair) (let ((cmd (cdr cmd-pair))) (format nil "~A ~A" (command-verb cmd) (command-description cmd)))) @ \subsection{{\tt AUTHINFO}}\label{sec:authinfo} The implementation of {\tt AUTHINFO}. When we connect to \lp\ directly from a keyboard, it's a bit painful to authenticate with two commands---{\tt AUTHINFO user} and {\tt AUTHINFO pass}. So we also implemented the command {\tt LOGIN}---see Section~\ref{sec:login}. To check the user's password, we use the procedure [[pass?]] that's defined in the implementation of {\tt PASSWD}. Perhaps we should have called it {\tt is-password-correct?} or something more obvious. <>= Syntax error. Say ``authinfo USER /your-name/'' then ``authinfo PASS /your-pass/''. @ <>= (defun cmd-authinfo (r) (let* ((args (mapcar #'str:upcase (request-args r)))) (cond ((not (= (length args) 2)) (bad-input r "No, no: I take exactly two arguments.")) (t (multiple-value-bind (cmd arg) (apply #'values args) (cond ((string= cmd "USER") (setf (client-username *client*) arg) (make-response :code 381 :request r :data (format nil "Hey, ~a, please tell us your password." arg))) ((string= cmd "PASS") (if (authinfo-check (client-username *client*) arg) (progn (log-user-in!) (make-response :code 281 :request r :data (fmt "Welcome, ~a." (client-username *client*)))) (make-response :code 400 :request r :data "Sorry. Wrong password."))) (t (make-response :code 400 :request r :data "<>")))))))) (defun authinfo-check (username passwd) (pass? username passwd)) (defun auth? () (eq 'yes (client-auth? *client*))) (defun log-user-in! () (setf (client-auth? *client*) 'yes) (let ((u (get-account (client-username *client*)))) (setf (account-seen u) (get-universal-time))) (write-accounts!)) @ %def auth? log-user-in! \subsection{{\tt CREATE-ACCOUNT}} We allow authenticated members to invite their friends, which creates a tree of people. ({\em An idea}. We could envision that each tree trunk manages the rest of the tree underneath it. So I invite you, I could change your password, say, or handle any problems you might have. This decentralizes system administration, easing the support burden.) The name of each user must conform to the expression <
>= ^[^\\s]+$ @ Same as in @<>. We'll let users create whatever complicated user names they want. If they can type it up, it's their problem. <>= (defun user-name-conforms? (u) (conforms-to? u "<>")) <>= (defun cmd-create-account (r) (with-auth (with-n-args 1 r (let* ((args (mapcar #'str:upcase (request-args r))) (username (car args))) (multiple-value-bind (username pass-or-error) (new-account! username (client-username *client*)) (if (not username) (make-response :code 400 :request r :data (fmt "~a. Choose a new name." pass-or-error)) (progn (notify-user-created username) (make-response :code 200 :request r :data (fmt "Okay, account ~a created with password ``~a''." username pass-or-error))))))))) @ <>= (defstruct account username seen last-post friends pass pass-locked pass-locked-why creation) (defparameter *accounts* nil) @ %def *accounts* <>= (defun read-accounts! () (let ((*package* (find-package '#:loop))) (with-open-file (s "accounts.lisp" :direction :input) (setq *accounts* (read s)))) *accounts*) (defun string->array (s) (make-array (length s) :element-type '(unsigned-byte 8) :initial-contents (map 'vector #'char-code s))) (defun string->sha256 (s) (let ((d (ironclad:make-digest :sha256))) (ironclad:produce-digest (ironclad:update-digest d (string->array s))))) (defun new-account! (username invited-by) (let* ((u (str:upcase username)) (p (random-string 6)) (a (make-account :username u :pass (string->sha256 (str:upcase p)) :creation (get-universal-time)))) (multiple-value-bind (okay? reason) (user-name-conforms? u) (declare (ignore reason)) (cond ((not okay?) (values nil (fmt "username must conform to <>"))) ((get-account u) (values nil (fmt "account ~a already exists" u))) (t (push u (account-friends (get-account invited-by))) (push a *accounts*) (write-accounts!) (values (str:upcase username) p)))))) @ %def CREATE-ACCOUNT new-account! Notice that we have a race condition in [[write-accounts]]. What is the problem? Two processes in parallel may ask for the writing of [[accounts.lisp]]. The process that loses the race will have its modifications lost. What do we need to do? Either we use file locking or we do something smarter without a real file locking mechanism. It's not clear to me what is possible here, but this is definitely a problem that we need to solve. <>= (defun write-accounts! () (let ((name (loop (let* ((tmp (random-string 10)) (name (format nil "~a.tmp" tmp))) (when (ignore-errors (with-open-file (s name :direction :output :if-exists :error :if-does-not-exist :create) (write *accounts* :stream s))) (return name)))))) (if (ignore-errors (rename-file name "accounts.lisp")) (values t *accounts*) (values nil (format nil "could not rename ~a to accounts.lisp" name))))) (defun get-account (username) (loop for u in *accounts* do (when (string= (str:upcase username) (account-username u)) (return u)))) @ %def read-accounts! write-accounts! get-account \subsection{{\tt UNLOCK-ACCOUNT}} Inactive accounts are removed or locked---see Section \ref{sec:inactive-users}. When an account is locked, any member can unlock it. <>= (defun cmd-unlock-account (r) (with-auth (with-n-args 1 r (let* ((args (mapcar #'str:upcase (request-args r))) (username (car args))) (cond ((not (get-account username)) (make-response :code 400 :request r :data (fmt "No such account ~a." username))) ((not (locked? username)) (make-response :code 400 :request r :data (fmt "Can't unlock ~a because it's not locked." username))) (t (unlock-account! username) (notify-user-unlocked username) (make-response :code 200 :request r :data (fmt "Okay, account ~a unlocked." username)))))))) (defun unlock-account! (username) (let ((u (get-account username))) (cond ((not u) (values nil "no such account")) ((not (locked? username)) (values nil "account isn't locked")) (t (setf (account-pass u) (account-pass-locked u)) (setf (account-pass-locked u) nil) (setf (account-pass-locked-why u) nil))))) @ %def unlock-account! \subsection{{\tt LOGIN}}\label{sec:login} Besides {\tt AUTHINFO} in Section~\ref{sec:authinfo}, we also implement a more convenient command for authenticationwhen we interact with \lp\ through a command-line interface. Instead of having to say two commands, we can just say {\tt login user password}. <>= (defun cmd-login (r) (let* ((args (mapcar #'str:upcase (request-args r)))) (cond ((not (= (length args) 2)) (bad-input r "Usage: login your-username your-password")) (t (multiple-value-bind (name pass) (apply #'values args) (cond ((pass? name pass) (log-user-in-as! name) (make-response :code 200 :request r :data (fmt "Welcome, ~a." name))) (t (make-response :code 400 :request r :data (fmt "Wrong password."))))))))) (defun log-user-in-as! (name) (setf (client-username *client*) name) (log-user-in!)) @ %def log-user-in-as! \subsection{{\tt PASSWD}}\label{sec:passwd} A change of password is made with {\tt PASSWD current new}. Observe that we are duplicating code from other command procedures. I think there is a macro emerging here called [[with-upcase-args]]. %% TODO <>= (defun cmd-passwd (r) (with-auth (let* ((args (mapcar #'str:upcase (request-args r)))) (cond ((not (= (length args) 2)) (bad-input r "Usage: passwd current-password new-password")) (t (multiple-value-bind (cur new) (apply #'values args) (cond ((pass? (client-username *client*) cur) (multiple-value-bind (okay? problem) (change-passwd! (client-username *client*) new) (if okay? (make-response :code 200 :request r :data "You got it. Password changed.") (make-response :code 500 :request r :data (fmt "Sorry: ~a" problem))))) (t (make-response :code 400 :request r :data (fmt "Sorry. Wrong password.")))))))))) (defun pass? (username pass) (let ((u (get-account username))) (and u (cond ((null (account-pass u)) nil) ((integerp (account-pass u)) (eq (sxhash pass) (account-pass u))) ((arrayp (account-pass u)) (equalp (string->sha256 pass) (account-pass u))) (t (error "pass? expected to find INTEGERP or ARRAYP but found ~a" (type-of (account-pass u)))))))) (defun change-passwd! (username newpass) (let ((u (get-account username))) (when (not u) (error "I could not find account ~a." username)) (setf (account-pass u) (string->sha256 (str:upcase newpass))) (write-accounts!))) @ %def PASSWD pass? change-passwd! \subsection{{\tt USERS}}\label{sec:users} The tree of users and their friends is public. Anyone can know who invited who. <>= (defun cmd-list-users (r) (with-auth (prepend-response-with "List of current users:" (make-response :code 200 :request r :multi-line 'yes :data (str:join (crlf-string) (list-users)))))) (defun size-of-longest-username () (loop for u in *accounts* maximizing (length (account-username u)))) (defun list-users () (mapcar #'(lambda (row) (cadr row)) (sort (loop for u in *accounts* collect (list (account-username u) (fmt "~v@a~a, ~a, invited ~a" (size-of-longest-username) (account-username u) (if (locked? (account-username u)) (fmt " (account locked: ~a)" (account-pass-locked-why u)) "") (if (last-time-seen (account-username u)) (fmt "last seen on ~a" (last-time-seen (account-username u))) "never logged in") (or (account-friends u) "nobody")))) #'string<= :key #'(lambda (row) (car row))))) (defun universal-to-human (s) (format-timestring nil (universal-to-timestamp s) :format +asctime-format+)) (defun last-time-seen (username) (let ((u (get-account username))) (if u (let ((s (account-seen u))) (if s (universal-to-human s)))))) @ %def list-users last-time-seen size-of-longest-username \subsection{{\tt LIST}}\label{sec:list} The database of groups and articles is a UNIX directory. We just need to discover which directories exist and produce a listing. The heavy work here is finding the index interval of articles in the group. (I think we should already be optimizing this by merely caching the information in a file that is read at start-up. I think we should even cache the overview of the group.) %% TODO <>= (defstruct group name high low) (defun cmd-list (r) (prepend-response-with "Get in the loop! Lots to choose from." (make-response :code 215 :multi-line 'yes :data (str:join (crlf-string) (build-groups-lines (build-groups-structs))) :request r))) (defun build-groups-lines (ls) (reverse (mapcar #'(lambda (g) (format nil "~a ~a ~a y" (group-name g) (group-high g) (group-low g))) ls))) (defun build-groups-structs () (let ((ret-ls nil)) (dolist (g (list-groups) ret-ls) (multiple-value-bind (low high len) (group-high-low g) (declare (ignore len)) (setf ret-ls (cons (make-group :name g :high high :low low) ret-ls)))))) (defun between? (x from to) (<= from x to)) (declaim (inline between?)) (defun filesize (path) (sb-posix:stat-size (sb-posix:stat path))) (defun zero-file? (path) (= (filesize path) 0)) (defun temporary-article? (path) (or (zero-file? path) (cl-ppcre:scan "\.tmp$" (namestring path)))) (defun article-ready? (path) (not (temporary-article? path))) <> (defun get-articles (g &optional from to) (in-groups ;; We might want to optimize this some day. That's a ;; problem to be studied. (let ((as (articles->integers (remove-if #'temporary-article? (loop-list-files (truename g)))))) (sort (remove-if-not #'(lambda (x) (between? x (or from x) (or to x))) as) #'<)))) (defun group-high-low (g) (let* ((articles (get-articles g)) (sorted-ints (sort articles #'<))) (values (or (car sorted-ints) 0) (or (car (last sorted-ints)) 0) (length sorted-ints)))) (defun articles->integers (ls) (remove-if #'null (mapcar #'(lambda (g) (ignore-errors (parse-integer (basename (uiop:unix-namestring g))))) ls))) (defun list-groups () (let ((groups (in-groups (loop-list-directories (truename "."))))) (sort (mapcar #'(lambda (g) (basename (uiop:unix-namestring g))) groups) #'string-lessp))) (defun last-char (s) (char s (1- (length s)))) @ In [[basename]], what we want to do is---if the path ends with a slash, we invoke [[pathname]]. Otherwise, we invoke {\tt file-namestring}. <>= (defun basename (path) (let ((s (str:collapse-whitespaces path))) (if (char= #\/ (last-char s)) (car (last (pathname-directory s))) (file-namestring s)))) @ %def get-articles group-high-low To list directories and files, I have been using \href{https://github.com/Shinmera/filesystem-utils/tree/master}{{\tt filesystem-utils}} by Yukari Hafner. I found an issue with both [[list-directories]] and [[list-files]] in a fresh install of FreeBSD 14.2 and in a Debian 8.11 codename jessie. The issue is that the [[#+cffi]] chunk of the source code incorrectly produced [[NIL]]. (Dramatically, the same was not true in a FreeBSD 14.1.) The source code had an alternative chunk of code for [[#-cffi]] and I discovered that this alternative worked on these systems I tested. So, as a workaround, I incorporate these procedures below using the chunk [[#-cffi]] to get \Lp\ working on these systems. <>= (defun loop-directory* (directory &rest args &key &allow-other-keys) #+allegro (apply #'directory directory :directories-are-files NIL :follow-symbolic-links NIL args) #+(or clozure digitool) (apply #'directory directory :follow-links NIL args) #+clisp (apply #'directory directory :circle T :if-does-not-exist :ignore args) #+(or cmucl scl) (apply #'directory directory :follow-links NIL :truenamep NIL args) #+lispworks (apply #'directory directory :link-transparency NIL args) #+sbcl (apply #'directory directory :resolve-symlinks NIL args) #-(or allegro clozure digitool clisp cmucl scl lispworks sbcl) (apply #'directory directory args)) (defun loop-list-files (directory) (let ((directory (pathname-utils:to-directory directory))) (let* ((directory (pathname-utils:pathname* directory)) (entries (ignore-errors (loop-directory* (merge-pathnames pathname-utils:*wild-file* directory))))) (remove-if #'directory-p entries)))) (defun loop-list-directories (directory) (let ((directory (pathname-utils:to-directory directory))) (let* (#-(or abcl cormanlisp genera xcl) (wild (merge-pathnames #-(or abcl allegro cmucl lispworks sbcl scl xcl) pathname-utils:*wild-directory* #+(or abcl allegro cmucl lispworks sbcl scl xcl) "*.*" directory)) (dirs #+(or abcl xcl) (system:list-directory directory) #+cormanlisp (cl::directory-subdirs directory) #+genera (handler-case (loop for (p . k) in (fs:directory-list directory) when (eql :directory k) collect p) (fs:directory-not-found () nil)) #+clozure (ignore-errors (directory* wild :directories T :files NIL)) #+mcl (ignore-errors (directory* wild :directories T)) #-(or abcl xcl cormanlisp genera clozure mcl) (loop-directory* wild))) (loop for path in dirs when (directory-p path) collect (pathname-utils:force-directory path))))) @ %def loop-list-directories loop-list-files \subsection{{\tt GROUP}}\label{sec:group} We just need to verify if the group exists and modify [[*client*]]. <>= (defun cmd-group (r) (with-auth (with-n-args 1 r (let ((g (car (request-args r)))) (with-group g r (set-group! g) (multiple-value-bind (low high len) (group-high-low g) (let ((ln (format nil "~a ~a ~a ~a" len low high g))) (setf (client-article *client*) low) (make-response :code 211 :request r :data ln)))))))) (defun group? (g) (in-groups (ignore-errors (directory-p g)))) (defun xgroup? (g) (directory-p g)) (defun set-group! (g) (setf (client-group *client*) g)) @ %def group? Why have I written {\tt group?} and {\tt xgroup?}? There's probably a clean-up task here. %% TODO \subsection{{\tt BODY}, {\tt HEAD}, {\tt ARTICLE} e {\tt NEXT}} \label{sec:typical-cmds} Here we have an illustration of the expressive power of first-class procedures. We want to implement the commands {\tt BODY}, {\tt HEAD}, {\tt ARTICLE} and {\tt NEXT}. The ways to invoke them are (1) with no argument, (2) with a single integer argument and (3) with a text argument. In case (1), we use the number of the current article---which is kept in [[*client*]]. In case (2), the NNTP client specifies the article number. In case (3), the NNTP client specifies the exact {\tt message-id}, which forces us to query the index---see Section~\ref{sec:index} for the implementation of the index. <>= (defun typical-cmd-head-body-article (r fn-name) (with-auth (with-group-set (let ((args (request-args r))) (cond ((null args) (funcall fn-name r (client-group *client*) (client-article *client*))) ((= 1 (length args)) (let* ((n-or-mid (car args))) (cond ((string-integer? n-or-mid) (funcall fn-name r (client-group *client*) n-or-mid)) (t (multiple-value-bind (group n-str) (lookup-index n-or-mid) (if (and group n-str) (funcall fn-name r group n-str) (bad-input r (format nil "Unknown article ~a." n-or-mid)))))))) (t (bad-input r "No, no: it takes at most two arguments."))))))) (defun cmd-head (r) (typical-cmd-head-body-article r #'head-response)) (defun cmd-body (r) (typical-cmd-head-body-article r #'body-response)) (defun cmd-article (r) (typical-cmd-head-body-article r #'article-response)) (defun article-response (r g i) (typical-cmd-response 220 r g i #'(lambda (a) (encode-body a)))) (defun head-response (r g i) (typical-cmd-response 221 r g i #'(lambda (a) (article-headers (parse-article a))))) (defun body-response (r g i) (typical-cmd-response 222 r g i #'(lambda (a) (encode-body (article-body (parse-article a)))))) @ In processing such commands, we need to verify the existence of files {\em et cetera}. The procedure [[fetch-article]] returns [[nil]] when the article requested doesn't exist and it throws [[sb-posix:syscall-error]] due to [[sb-posix:stat-size]], which we need to find the size of the article. We need to know the file size not only to allocate an array at the right size, but also because we must provide the size when producing the \verb|OVERVIEW|. If a problem such as [[sb-posix:syscall-error]] appears, we just inform the client and terminate the request---nothing else to do. XXX: instead of only catching [[sb-posix:syscall-error]], we should catch anything else, reporting the error. Otherwise, we will blow up in case of some unexpected error, which might not be a bad idea---as long as we can log these errors and get a report later on of what's going on so we can improve the code. I still don't really know what to do here. Let's leave it as it is. The original idea is to put a [[t]]-case in the [[handler-case]] below and just log the error instead of crashing completely. We can simulate the catching of an unexpected condition by signaling it from fetch-article as a test. This type of situation should have a testing routine as well. So, yeah, first give yourself another read of the [[lisp-unit]] documentation, then how to handle conditions properly and then come back to this to-do item. <>= (defun typical-cmd-response (code r g i get-data) (handler-case (let ((a (fetch-article g i))) (cond ((null a) (make-response :code 400 :request r :data (format nil "article ~a/~a does not exist" g i))) (t (prepend-response-with (format nil "~a ~a" i (extract-mid a)) (make-response :multi-line 'yes :code code :request r :data (funcall get-data a)))))) (sb-posix:syscall-error (c) (make-response :code 400 :request r :data (format nil "article ~a/~a: ~a" g i c))))) @ %def typical-cmd-response The command \verb|NEXT| has a slight different semantics. <>= (defun cmd-next (r) (with-auth (let ((g (client-group *client*)) (n-cur (client-article *client*))) (cond ((not g) (bad-input :code 412 r "must say GROUP first")) (t (multiple-value-bind (low high len) (group-high-low g) (declare (ignore low len)) (cond ((= n-cur high) (bad-input r "you are at the last article already")) (t (article-next! r g))))))))) (defun article-next! (r g) (setf (client-article *client*) (1+ (client-article *client*))) (let ((cur (client-article *client*))) (make-response :code 223 :request r :data (format nil "~a ~a" cur (mid-by-name g cur))))) (defun mid-by-name (g name) (extract-mid (fetch-article g name))) @ %def cmd-next article-next! mid-by-name \subsection{{\tt XOVER}}\label{sec:xover} The procedure [[cmd-xover]] is used to figure out what the user said. Once we have that figured out, we invoke [[xover]], which finishes the work. Notice that when the argument [[to]] from [[xover]] is [[NIL]], then the user is asking for articles indexed from the integer [[fr]] to the last one. <>= (defun cmd-xover (r) (with-auth (with-group-set (let ((args (request-args r))) (cond ((null args) (xover r (client-article *client*) (client-article *client*))) ((= 1 (length args)) (multiple-value-bind (s v) (cl-ppcre:scan-to-strings "([0-9]+)([-]?)([0-9]*)" (car args)) (cond ((not s) (make-response :code 502 :request r :data "bad syntax")) (t (let ((fr (parse-integer (aref v 0))) (hifen (aref v 1)) (to (ignore-errors (parse-integer (aref v 2))))) (when (not (string= hifen "-")) (setq to fr)) (xover r fr to)))))) (t (make-response :code 502 :request r :data "bad syntax"))))))) (defun xover (r from to) (assert (client-group *client*)) (let* ((g (client-group *client*)) (ls (get-articles g from to))) (cond ((= 0 (length ls)) (make-response :code 420 :request r :data "no articles in the range")) (t (prepend-response-with "Okay, your overview follows..." (make-response :code 224 :request r :multi-line 'yes :data (str:join (crlf-string) (loop for i in ls collect (xover-format-line i (remove-if-not #'(lambda (h) (member (car h) (xover-headers) :test #'string=)) (fetch-headers g i))))))))))) (defun xover-format-line (i hs) (str:concat (format nil "~a~a" i #\tab) (str:join #\tab (mapcar #'(lambda (h) (get-header h hs)) (xover-headers))))) (defun xover-headers () '("subject" "from" "date" "message-id" "references" "line-count" "byte-count")) @ \subsection{{\tt MODE READER}}\label{sec:mode-reader} So, we're always in reader mode, so we just ignore this command. <>= (defun cmd-mode (r) ;; Whatever. (make-response :code 200 :request r :data "Sure thing.")) @ %def \subsection{{\tt DATE}}\label{sec:date} It's always useful to know the time and date at a computer. We should surely format it a bit better than what {\tt now} does. <>= (defun cmd-date (r) (make-response :code 201 :request r :data (format-timestring nil (now)))) @ %def \subsection{{\tt QUIT}}\label{sec:quit} The use of {\tt QUIT} has a conection to [[main-loop]]: when the user says {\tt QUIT}, [[main-loop]] must terminate. <>= (defun cmd-quit (r) (make-response :code 205 :data "Good-bye." :request r)) @ %def \subsection{{\tt DD}}\label{sec:dd} The command {\tt DD} means ``[d]isplay client [d]ata structures''. It shows to the client the internal state of how the server sees it. I've used only for debugging and it's not really useful any longer. I'm going to remove this very soon. <>= (defun cmd-dd (r) (make-response :code 200 :data (format nil "state: ~a" *client*) :request r)) @ %def \subsection{{\tt POST}}\label{sec:post} If the client says {\tt POST}, then we continue to read line after line until we find \verb|".\r\n"|. Having done that, we must check whether we have a conformant article at hands. The definition of conformant is given by [[conforms?]]. In a few words, the article must have \verb|message-id|, \verb|subject|, \verb|from|, \verb|newsgroups|. If the client doesn't provide us with a \verb|message-id|, then \lp\ adds one. (Similarly for \verb|date|.) <>= <> (defun suggest-message-id (&optional (n 20)) (format nil "<~a@loop>" (random-string n))) (defun random-string (size) (let* ((universe "abcdefghijklmnopqrstuvwxyz") (len (length universe)) (state (make-random-state t)) mid) (dotimes (c size) (setq mid (cons (char universe (random len state)) mid))) (coerce mid 'string))) @ Sometimes we parse an article and sometimes we want to undo that parsing. Am I doing something wrong? I wonder. <>= (defun unparse-article (parsed) (data (let ((ls)) (dolist (h (parse-headers (article-headers parsed))) (setq ls (cons (data (str:capitalize (car h)) ": " (cdr h) (crlf)) ls))) (nreverse ls)) (crlf) (article-body parsed))) @ %def unparse-article If an article being posted has no \verb|message-id| or \verb|date|, \lp\ provides these headers. We kill these two rabbits with [[ensure-header]], but we should probably make a table of headers and procedures that would generate such headers if they're missing. Right now, however, we have only these two to worry about. <>= (defun ensure-header (h fn bs) (let* ((headers (parse-headers (article-headers (parse-article bs))))) (if (lookup h headers) bs (unparse-article (make-article :headers (str:join (crlf-string) (mapcar #'(lambda (h) (format nil "~a: ~a" (car h) (cdr h))) (cons (cons h (funcall fn)) headers))) :body (article-body (parse-article bs))))))) (defun get-date () (multiple-value-bind (s m h day mon year dow dst-p tz) (get-decoded-time) (declare (ignore dow dst-p)) (format nil "~4,'0d-~2,'0d-~2,'0d ~2,'0d:~2,'0d:~2,'0d GMT~a" year mon day h m s (- tz)))) (defun ensure-mid (bs) (ensure-header "message-id" #'suggest-message-id bs)) (defun ensure-date (bs) (ensure-header "date" #'get-date bs)) @ %def ensure-mid ensure-date Now it's time to look at the header \verb|newsgroups|. XXX: we need to rewrite this because we have that plan of verifying everything there is to verify up front in [[conforms?]]. So when we invoke [[post]], there's nothing else to verify. We're verifying in two places at the same time. The name of each group must conform to the expression <>= ^[^\\s/]+$ @ In other words, let group names go wild. They cannot contain a slash or a space of any kind anywhere on the name---more literally: they must begin with any character that's not a space and must have at least one character. The problem wish slashes is that each group will be a directory on a UNIX file system, so we cannot let slashes appear. People should have total freedom in naming groups. If users create groups that mess up the local organization, then people should discuss the matter and find a solution. Let's let people mess it up instead of trying to stop them. <>= (defun cmd-post (r) (with-auth (send-response! (make-response :code 340 :data (format nil "Okay, go ahead. Suggested message-id ~a." (suggest-message-id)))) (let* ((bs (nntp-read-article))) (multiple-value-bind (okay? error) (conforms? bs) (cond ((not okay?) (make-response :code 400 :request r :data (format nil "Sorry. Your article doesn't conform: ~a." error))) (t (multiple-value-bind (code reply) (post bs) (make-response :code code :request r :data reply)))))))) @ It's time to write the action of posting. One thing to keep in mind is cross-posting. First, notice that we're---so far---duplicating articles on the file system. (We will undo that once we reimplement our index. More to follow.) More importantly, we cannot let the user post to any group if one of the groups is incorrectly named---for example, when the group doesn't exist. Why don't we post to the ones that are correct and warn the user of the ones that are incorrect? Because that is not prudent. The user could be trying to publish news to be received at the same time by various groups. We would make such plans all go down the drain. We collect a list of newsgroups that don't exist (or whose names do not conform for any reason). If we find any such group, then we refuse posting and return a 400 code with a message describing which group names failed. Otherwise we save the article. <>= (defun post (bs) (let ((ngs (newsgroups-header->list (get-header "newsgroups" (parse-headers (article-headers (parse-article bs)))))) (ngs-dont-exist)) (dolist (g ngs) (if (or (not (group-name-conforms? g)) (not (group? g))) (push g ngs-dont-exist))) (if (zerop (length ngs-dont-exist)) (progn (dolist (ng ngs) (let ((a (ensure-date (ensure-mid bs)))) (save-article-insist ng (get-next-article-id ng) a (extract-mid a)) (update-last-post-date! (client-username *client*)))) (values 240 (data "Thank you! Your article has been saved."))) (values 400 (data "Sorry. We did not post your article to any newsgroup because " "the " (word-plural (length ngs-dont-exist) "newsgroup") " " (str:join ", " (sort ngs-dont-exist #'string<)) " just " (word-plural (length ngs-dont-exist) "doesn't") " exist."))))) (defun newsgroups-header->list (s) (mapcar #'(lambda (n) (str:trim (string-downcase n))) (str:split "," s))) @ %def post XXX: notice we parse the article again to extract information from it that we need during [[post]]. That's not only a waste of time---because we already did that---, but it makes [[post]] a lot less generic. Perhaps [[conforms?]] should return a data structure that contains all that [[post]] needs. Then [[post]] consumes that and saves the article more easily. That's a better idea. I think [[post]] should not even use variables such as [[*client*]]. The username to which to update the last-seen date should be included in the data structure. <>= (defun update-last-post-date! (username) (let ((u (get-account username))) (setf (account-last-post u) (get-universal-time)))) @ %def update-last-post-date! The job of [[save-article-try]] is to atomically allocate an article name in a group. The atomicity is achieved with [[:if-exists]] in the [[with-open-file]] macro. When [[if:-exists]] is used, the {\tt open} system call uses the flag \verb|O_EXCL|, given us an atomic operation. So, the first [[with-open-file]] allocates the name. If successful, we take our time writing the article to the temporary file and we atomically rename it at the end. We should close [[name]] before trying to [[rename-no-extention]]---on UNIX systems, renaming an open target might be okay, but that's not allowed on Windows systems. Even though we have no interest in running \Lp\ on Windows, closing before renaming it sounds more like the Right Thing to do. <>= (defun rename-no-extension (old new) (rename-file old (make-pathname :name new :type :unspecific))) (defun save-article-try (name-try bs) (let ((name (format nil "~a" name-try)) (tmp (format nil "~a.tmp" name-try))) (with-open-file (s name :direction :output :if-exists nil ;; an atomic operation :if-does-not-exist :create) (when (null s) (progn (stderr "warning: save-article-try: ~a exists~%" name) (return-from save-article-try 'name-exists)))) (with-open-file (s tmp :direction :output :if-exists :error :if-does-not-exist :create :element-type '(unsigned-byte 8)) (write-sequence bs s)) (rename-no-extension tmp name))) @ %def save-article-try The procedure [[save-article-insist]] insists on calling [[save-article-try]] until it finds an article name that has not been allocated. Notice that the argument [[name]] is an integer, so [[name]] is incremented at each iteration. <>= (defun save-article-insist (g name a message-id) (loop for name from name do (in-dir (format nil "groups/~a/" g) (when (not (eql 'name-exists (save-article-try name a))) (return (values name (insert-index message-id g (fmt "~a" name)))))))) (defun get-next-article-id (g) (multiple-value-bind (low high len) (group-high-low g) (declare (ignore low len)) (1+ high))) @ %def save-article-insist get-next-article-id {\bf How to read lines in the NNTP protocol?} We've implemented the most trivial strategy possible. It's also the slowest. What I think we need to do here is to use [[vector-push-extend]]. But this is to be done in [[nntp-read-line]]. I hope to be able to get a faster procedure in [[nntp-read-line]] and keep [[nntp-read-article]] as it is. This is important to speed up posting. For instance, if we allow attachments (which we don't), the performance penalty is clearly noticeable. %%TODO <>= (defun nntp-read-article (&optional acc) ;; Returns List-of Byte. (let* ((ls (ucs-2->ascii (nntp-read-line)))) (cond ;; 46 == (byte #\.) ((equal (list 46) ls) (flatten (add-crlf-between acc))) (t (nntp-read-article (append acc (list ls))))))) @ %def nntp-read-article The NNTP protocol establishes that line termination is done with \verb|\r\n|, but it's useful to support UNIX line terminations, too, because we are using the command-line and interacting directly with the server using tools that will not always send \verb|\r\n| as line termination. For example, when someone is typing directly from the keyboard and insert an empty line, we need the {\tt (and acc ...)} because sometimes the list [[acc]] comes out empty, but an empty line never comes from the NNTP protocol because there's is always a {\tt CR} before {\tt LF}. <>= (defun nntp-read-line (&optional (s *standard-input*) acc) ;; Returns List-of Byte. (let ((x (read-byte s))) (cond ((or (null x) (= x 10)) (let ((bs (and acc (nreverse (if (= (car acc) 13) (cdr acc) acc))))) (stderr "<<< ~a~%" (bytes->string (ucs-2->ascii bs))) bs)) (t (nntp-read-line s (cons x acc)))))) (defun list->bytes (ls) (mapcar #'data->bytes ls)) (defun vector->bytes (v) (mapcar #'data->bytes (coerce v 'list))) (defun data->bytes (d) (cond ((null d) nil) ((integerp d) (list d)) ((stringp d) (string->bytes d)) ((consp d) (list->bytes d)) ((vectorp d) (vector->bytes d)) (t (error (format nil "type ~a is not supported" (type-of d)))))) (defun add-crlf-between (ls-of-ls) ;; Add \r\n to each ``line''. Returns List-of Byte. (mapcar #'(lambda (ls) (append ls (list 13 10))) ls-of-ls)) (defun string->bytes (s) (map 'list #'char-code s)) (defun bytes->string (ls) (map 'string #'code-char ls)) @ %def nntp-read-line nntp-read-article {\bf Does the article conform?} RFC 5536 obliges every article to have exactly \verb|date|, \verb|from|, \verb|message-id|, \verb|newsgroups|, \verb|path| and \verb|subject| headers. The headers \verb|path| is \Lp's responsibility, but it's used only in a network of servers---so we're leaving that out for now. The header \verb|message-id| is added by \Lp\ if the client doesn't write it itself. Let's criticize the writing of [[conforms?]]. We have a [[catch]] here and a [[throw]] in [[parse-headers]]. We also have a [[return]] here. It's getting hard to read this procedure because it's not easy to know that a procedure has to return a certain value to match the expectation of another procedure. I don't remember what [[catch]] does. I need to review this and then add the explanation for myself. If I don't remember how this works, other beginners won't know it either. %% TODO <>= (defun conforms? (bs) (catch 'article-syntax-error ;; parse-headers might throw (let ((headers (parse-headers (article-headers (parse-article bs))))) (let ((result (dolist (h (headers-required-from-clients)) (when (not (lookup h headers)) (return (format nil "missing the /~a/ header" h))))) (content-type (get-header "content-type" headers))) (cond ((stringp result) (values nil result)) ((not (text/plain? content-type)) (values nil (format nil "content-type must be plain/text, but it's ~a" content-type))) (t (values t nil))))))) (defun text/plain? (header-s) ;; I say T when S begins with "text/plain" or when S is "". (let* ((s (str:collapse-whitespaces header-s)) (needle "text/plain") (len (min (length needle) (length s)))) (or (zerop len) (and (<= (length needle) (length s)) (string= needle s :end1 len :end2 len))))) (defun headers-required-from-clients () '("from" "newsgroups" "subject")) @ %def conforms? Notice that up to this point we've only verified if the necessary headers are present. The \verb|newsgroups| header is a direct influence to the article storage. For instance, \verb|newsgroups| must mention only groups that exist. When we save the article, we check each group. If at least one group exists, we save the article; if at least one group doesn't exist, we report to the user all groups that don't exist, but we do save the article if at least one does exist. That's probably not the best thing to do. We should probably warn the user that one group doesn't exist because that could make all the difference to the user. For instance, someone might decide not to post at all if they can't cross post to all the groups they wish to. One typo in one group name and the article would be posted to some groups, but not to the misstyped one. We need to change this. %% TODO Also, do notice that to simplify matters we're duplicating articles cross-posted. What we should do is write the article to the first group in the list of \verb|newsgroups| and then make a symbolic link to all others. The problem is that I don't know how to do that on Windows. I'm not sure if Windows supports symbolic links at all. We could perhaps duplicate articles only when on Windows. %% TODO \subsection{{\tt CREATE-GROUP}} We allow every user to create their own groups. (Are we crazy?) When someone craetes a group, we post an article to {\tt local.control.news} notifying everyone that a new group has been created. People then have a chance to subscribe to the new group. We assume that when someone creates a group, it's either no problem at all or it has been discussed with the community beforehand. <>= (defun cmd-create-group (r) (with-n-args 1 r (let ((g (string-downcase (car (request-args r))))) (multiple-value-bind (okay? reason) (group-name-conforms? g) (if (not okay?) (make-response :code 580 :request r :data (format nil "group name does not conform: ~a" reason)) (progn (multiple-value-bind (path created?) (in-groups (ensure-directories-exist (concatenate 'string g "/"))) (declare (ignore created?)) (if (not path) (make-response :code 581 :request r :data (format nil "could not create group ~a" (if (group? g) "because it already exists" "but we don't know why---sorry!"))) (progn (notify-group-created g) (make-response :code 280 :request r :data (format nil "group ~a created" g))))))))))) (defun group-name-conforms? (g) (conforms-to? g "<>")) @ %def CREATE-GROUP group-name-conforms? \subsection{{\tt REPL}} \lp\ is totally {\em hackable}. Users can say {\tt repl} to have complete control over their \lxxp\ process. <>= (defun cmd-repl (r) (if *enable-nntp-repl* (with-auth (repl r)) (make-response :code 400 :data "The REPL has been *explicitly* disabled by the sysadmin. :(" :request r))) @ If your users are not the hacker-type, you can disable the NNTP REPL with the command-line option [[--disable-nntp-repl]]. We decide not to hide the command in the list of commands given by saying {\tt HELP} to \lp\ because this way users are advertised about the commands that exist---they could be having fun, but their sysadmin doesn't think they're skilled enough. <>= (defparameter *enable-nntp-repl* t) @ \section{Publication of news} If you're interested in being notified about what's going on in the \lp, then subscribe to {\tt local.control.news}. Group creation, invitations {\em et cetera} are published there. <>= (defun notify-group-created (g) (post-notification :subject (fmt "new group ~a by ~a" g (client-username *client*)) :body (fmt "Blame ~a for the group ~a just created." (client-username *client*) g))) (defun notify-user-created (u) (post-notification :subject (fmt "new account ~a by ~a" u (client-username *client*)) :body (fmt "Blame ~a for inviting ~a." (client-username *client*) u))) (defun notify-user-unlocked (u) (let ((guilty (client-username *client*))) (post-notification :subject (fmt "account ~a unlocked by ~a" u guilty) :body (fmt "Blame ~a for unlocking ~a." guilty u)))) (defun post-notification (&key subject body) (in-groups (ensure-directories-exist "local.control.news/")) (when (group? "local.control.news") (let ((a (make-news :subject subject :body body))) (post (concatenate 'vector (article-headers a) (crlf) (article-body a) (crlf)))))) (defun make-news (&key subject body) (make-article :headers (data (add-crlf-between (mapcar #'(lambda (p) (data (format nil "~a: ~a" (car p) (cdr p)))) `(("from" . "Loop") ("subject" . ,subject) ("newsgroups" . "local.control.news"))))) :body (data body))) @ %def notify-group-created notify-user-created \section{Algorithm of {\tt split-vector}} How should we describe the algorithm of [[split-vector]]? The objective is to consume lists of bytes such as [[(1 2 3 13 10 4 5 6 13 10 7 8 9)]] and produce lists of bytes such as [[((1 2 3) (4 5 6) (7 8 9))]]. We use [[search]] to find [[13 10]]. When we find this line termination, we have its position in the consumed list. With the position, we collect the line and we iterate searching for the next line. <>= (defun split-vector (delim v acc &key limit (so-far 1)) (let ((len (length v))) (split-vector-helper delim v len acc limit so-far 0))) (defun split-vector-helper (delim v len acc limit so-far start) (if (zerop len) acc (let ((pos (search delim v :start2 start :end2 len))) (cond ((or (not pos) (and limit (= so-far limit))) (nreverse (cons (subseq v start len) acc))) (t (split-vector-helper delim v len (cons (subseq v start (or pos len)) acc) limit (1+ so-far) (+ pos (length delim)))))))) @ %def split-vector \section{Article index}\label{sec:index} Every NNTP server needs to have an index of articles. Each article is indexed by its message-id. For example, the article % \begin{verbatim} From: Eva Lu Newsgroups: comp.unix Subject: xv6: an introduction Date: Wed, 24 Jul 2024 22:42:50 -0300 Message-ID: <87plr25js5.fsf@tor.soy> Cancel-Lock: sha1:82E8hqL1D1WQ9xc+CfnEYbFAaJo= MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit \end{verbatim} % is indexed by the header {\tt message-id}. If you ask for article {\tt <87plr25js5.fsf@tor.soy>}, \Lp\ will tell you that you can find the article in group {\tt comp.unix} and its numeric ID is 37. Note that the number 37 is not inside the article, but only in the name of the file stored in the file system. The index, therefore, knows in which file is each {\tt message-id}. This fact implies that you cannot rename files in the file system---of course not: you'd changing identifiers inside a database. If you have rename a file, you will need to rebuild the index. Given that the index is an SQL table, you can adjust the index relative to the any files you may need to rename for whatever reason. You can also rebuild the entire index by reading the file system---unless you have a lot of files, that's probably the easiest thing to do. The use of [[*default-database*]] by the library [[clsql]] is very convenient for us: we don't need to specify with which database we're working. Since we work with only one, we pretty much never need to specify anything. <>= (defparameter *default-database* nil) @ %def *default-database* <>= (defun connect-index! (filename) (setq *default-database* (clsql:connect (list filename) :database-type :sqlite3))) (defun create-index! () (clsql:execute-command "create table if not exists indices (id varchar(1000), grp varchar(1000), article varchar(300))") (clsql:execute-command "create unique index if not exists idx_id_1 on indices (id)")) (defun drop-create-index! () (clsql:execute-command "drop table if exists indices") (create-index!)) @ %def create-index! drop-create-index! connect-index! Of course, the creation and connection of the index must occur before [[main-loop]], so it takes place in [[main]]. When someone requests an article, it's either by its numeric index but then the client has already chosen a group, or it's by its {\tt message-id}. We don't need to tell the client to which groups the article belongs; we just give the entire article to the client. It is, therefore, the client's responsibility what to do with the article. However, to fetch an article, we need to know where in the database (in the file system) is the article; in other words, we must know one group in which the article was stored. This implies that the index must know at least one group. We've decided to always index the first group in the {\tt newsgroups} header. So the index's anatomy is $(m, g, i)$, where $m$ is the {\tt message-id}, $g$ is the name of the group and $i$, is the name of the article in the file system. This also defines the anatomy of the SQL table. Should we store more information in the index? Not really. If we need anything about an article, we can get it after we fetch it from the file system. For example, suppose that a search command wishes to display the fact that article was posted in various groups. Suppose further the command has already located in the index an article to be displayed. This means the command has the {\tt message-id} and one of the groups in which the article was posted. The command is then able to fetch the entire article from the file system. Now it's a matter of reading the article itself to know almost everything there is to know about it. (It's also interesting that we keep the index thin because we need to allow it to grow to great sizes.) %% (clsql:create-table "INDICES" '(([id] (string 1000)) ([grp] (string 1000)) ([article] (string 300)))) %% (clsql:execute-command "create table if not exists indices (id varchar(1000), grp varchar(1000), article varchar(300))") %% LOOP> (clsql:create-index "idx_id_1" :on "indices" :attributes "id" :unique "id") %% ; No value %% (clsql:execute-command "CREATE UNIQUE INDEX if not exists idx_id_1 ON INDICES (ID)") %% CL-USER> (clsql:list-attributes "indices") %% ("ID" "GRP" "ARTICLE") %% CL-USER> (clsql:insert-records :into "indices" :attributes '(id grp article) :values '("<87plr25js5.fsf@tor.soy>" "comp.unix" 37)) %% ; No value %% CL-USER> (clsql:select 'id 'grp 'article :from "indices") %% (("<87plr25js5.fsf@tor.soy>" "comp.unix")) %% ("ID" "GRP") \section{Essential operations relative to the index} Here's how to query the index or how to insert a new article into it. If [[insert-index]] returns [[nil]], then it's because it found no errors at all. The return value, therefore, indicates which error ocurred. We don't consider an error that an article has already been added to the index. For instance, we write multiple messages to the storage when someone cross-posts, but we'll add just a single record to the index, of course. So, an article already indexed is normal situation. Sure---in the future, we will not duplicate articles in storage; we will make symbolic links. We don't do that right now because Windows doesn't really support symbolic links. <>= (defun insert-index (m g i) (handler-case (clsql:insert-records :into "indices" :attributes '(id grp article) :values (list (str:trim m) (str:trim g) (str:trim i))) (clsql-sys:sql-database-data-error (c) (cond ((= (slot-value c 'clsql-sys::error-id) 19) 'already-indexed) (t ; We should log this error. ;(format t "other error: ~a" (slot-value c 'clsql-sys::database-message)) 'sql-error))) (:no-error () nil))) (defun lookup-index (mid) (let* ((found (clsql:select [grp] [article] :from [indices] :where [= [id] (str:trim mid)])) (article (first found)) (grp (first article)) (art (second article))) (when found (values grp art)))) @ %def insert-index lookup-index \section{Procedure to import the index from the file system} %% get group: %% (first (last (pathname-directory (car (in-groups (directory "**/*")))))) %% get article name: %% (pathname-name (car (in-groups (directory "**/*")))) %% get all articles %% LOOP> (in-groups (directory "**/*")) %% get the newsgroup header from the article %% LOOP> (get-header-from-article "newsgroups" (fetch-article "comp.lang.lisp" "1")) When we build the index from scratch, it's important to have a procedure capable of reading all articles in the file system and index them one by one. This is what we implement right here. For each news group, we get the name of the file relative to the article---that's the numeric ID of the article---and its {\tt message-id}. That's all we need to index it. <>= (defun index-from-fs! () (loop for path in (in-groups (directory "**/*")) do (let* ((g (str:trim (first (last (pathname-directory path))))) (i (str:trim (pathname-name path))) (m (str:trim (extract-mid (fetch-article g i))))) (when (> (length m) 0) (format t "article ~a/~a indexed by ~a~%" g i m) (insert-index m g i))))) (defun remake-index-from-fs () (drop-create-index!) (index-from-fs!)) @ Here's a program to build the index from a UNIX shell. <>= <> (ql:quickload :loop :silent t) (in-package #:loop) (connect-index! "message-id.db") (drop-create-index!) (index-from-fs!) (format t "Index built.~%") @ Be careful when using this program: it will build the database [[message-id.db]], which is an operation that needs to be done only once. Here's how to use it: % \begin{verbatim} %pwd /home/dbastos/loop %sbcl --script build-index-from-fs.lisp Index built. %ls -l message-id.db -rw-r--r-- 1 dbastos wheel 65536 Aug 26 13:32 message-id.db \end{verbatim} \section{Deletion and locking of inactive accounts}\label{sec:inactive-users} We now implement some of the \hyperref[principles]{principles} exposed earlier on page~\pageref{principles}. The program @<> would be run by {\tt cron} every day (at midnight, say). It checks all accounts that are inactive and either locks them (to be deleted later) or deletes them {\em for good}. If you want to keep accounts forever, just don't run the program. XXX: our idea is to also delete {\em for good} all accounts that are locked (by the same period of time), but we have not yet done that. <>= <> (ql:quickload :loop :silent t) (in-package #:loop) (read-accounts!) (connect-index! "message-id.db") (remove-inactive-users!) (write-accounts!) @ %def cron-remove-inactive-users.lisp The entire program is really [[remove-inactive-users!]]. <>= (defun remove-inactive-users! () (loop for u in *accounts* do (let ((username (account-username u))) (cond ((and (not (locked? username)) (inactive-from-never-logged-in? username)) (post-notification :subject (fmt "account ~a removed by Loop" username) :body (fmt "~a didn't log in a first time (for ~a month~a) since account creation." username *months-never-logged-in* (plural *months-never-logged-in* "s"))) (remove-account! username) (format t "Removed ~a due to never logging in.~%" username)) ((and (not (locked? username)) (inactive-from-last-seen? username)) (post-notification :subject (fmt "account ~a locked by Loop" username) :body (fmt "~a disappeared for over ~a month~a." username *months-inactive-allowed* (plural *months-inactive-allowed* "s"))) (lock-account! username (fmt "disappeared for over ~a months" *months-inactive-allowed*)) (format t "Locked ~a due to long-time-no-see.~%" username)))))) @ %def remove-inactive-users! To remove an account, we need to first remove the username (to be removed) from anyone's list of friends. So, this involves scanning the entire list of accounts. Also, notice that delete ``may modify {\em sequence}''. More important is to understand that we really must {\tt setf} the return, otherwise we might find the deletion did not take effect---for example, when deleting the first element of a list. (XXX: this deserves a better explanation, but if you know how linked lists are implemented in C, say, then you're likely well aware of how it works.) <>= (defun remove-account! (username) (loop for u in *accounts* do (setf (account-friends u) (delete username (account-friends u) :test #'equal))) (setf *accounts* (delete-if #'(lambda (a) (equal (account-username a) username)) *accounts*))) (defun lock-account! (username why) (let ((u (get-account username))) (setf (account-pass-locked u) (account-pass u)) (setf (account-pass u) "locked") (setf (account-pass-locked-why u) why))) @ %def remove-account! Accounts that do not have a creation date up until today---Tue Sep 17 21:37:18 ESAST 2024---will have its creation dates migrated to the \Lp\ epoch, which is January 1st 2024, the exact month in which \Lp\ was written. But notice that this migration is done only once. New system administrators of \Lp\ will never need to run this. (We do not remove this set of source code chunks because they serve as an example of how to a migration like that.) <>= (defun loop-epoch () (encode-timestamp 0 0 0 0 1 1 2024)) (defun migrate-add-creation-and-post-date! () (read-accounts!) (loop for u in *accounts* do (if (not (account-creation u)) (setf (account-creation u) (timestamp-to-universal (loop-epoch))) (setf (account-last-post u) (account-seen u)))) (write-accounts!)) @ %def migrate-add-creation-and-post-date! loop-epoch Here's a program to run the migration in a UNIX shell. <>= <> (ql:quickload :loop :silent t) (in-package #:loop) (migrate-add-creation-and-post-date!) (format t "Accounts rewritten.~%") @ %def migrate-add-creation-date.lisp Now we write the procedures that discover what accounts are inactive. <>= (defparameter *months-inactive-allowed* 3) (defparameter *months-never-logged-in* 1) @ <>= (defun user-inactive? (username) (or (inactive-from-never-logged-in? username) (inactive-from-last-seen? username))) (defun inactive-from-never-logged-in? (username) (let ((u (get-account username))) (if (ever-logged-in? username) NIL (inactive-from? username *months-never-logged-in* #'(lambda () (account-creation u)))))) (defun locked? (username) (equal "locked" (account-pass (get-account username)))) (defun inactive-from-last-post? (username) (let ((last-post (account-last-post (get-account username))) (creation (account-creation (get-account username)))) (inactive-from? username *months-inactive-allowed* (if last-post #'(lambda () last-post) #'(lambda () creation))))) (defun inactive-from-last-seen? (username) (let* ((u (get-account username)) (last-seen (account-seen u)) (creation (account-creation u))) (inactive-from? username *months-inactive-allowed* (if last-seen #'(lambda () last-seen) #'(lambda () creation))))) (defun inactive-from? (username months timestamp-source) (declare (ignore username)) (timestamp< (timestamp+ (universal-to-timestamp (funcall timestamp-source)) months :month) (now))) (defun ever-logged-in? (username) (account-seen (get-account username))) (defun never-logged-in? (username) (not (ever-logged-in? username))) (defun list-inactive-users () (loop for u in *accounts* do (format t "Username ~a is inactive? ~a~%" (account-username u) (user-inactive? (account-username u))))) @ %def list-inactive-users \section{Macros to ease writing}\label{sec:dsl} These macros make up a tiny language to ease the writing of \lp. For example, when we need to access the group database, we use [[in-groups]]. When a certain command demands authentication, we use [[with-auth]]. <>= (defmacro in-dir (dir &rest body) `(let ((*default-pathname-defaults* (truename ,dir))) (uiop:with-current-directory (,dir) ,@body))) @ Notice that we set [[*default-pathname-defaults*]] and we set the process' current working directory. That's not necessary because Lisp always uses [[*default-pathname-defaults*]] and does not care about the current working directory. We did this out of the fact that we used to invoke [[renameat2]] through the [[cffi]], but we don't use it any more. <>= (defmacro in-groups (&rest body) `(in-dir "groups/" ,@body)) (defun in-group-lambda (g fn) (in-dir g (funcall fn))) (defmacro in-group (g &rest body) `(in-group-lambda ,(fmt "groups/~a/" g) #'(lambda () ,@body))) (defmacro with-group (g r &rest body) (let ((g-var (gensym)) (r-var (gensym))) `(let ((,g-var ,g) (,r-var ,r)) (if (not (group? ,g-var)) (make-response :code 411 :request ,r-var :data (format nil "no such group ``~a''" ,g-var)) (progn ,@body))))) (defmacro with-n-args (n r &rest body) (let ((args-var (gensym)) (message-var (gensym)) (n-var n)) `(let ((,args-var (request-args r)) (,message-var ,(fmt "bad arguments: needs exactly ~a" n-var))) (if (not (= ,n-var (length ,args-var))) (make-response :code 400 :request ,r :data ,message-var) (progn ,@body))))) (defmacro with-group-set (&rest body) (let ((g-var (gensym))) `(let ((,g-var (client-group *client*))) (if (not ,g-var) (bad-input r "must say GROUP first") ,@body)))) (defmacro with-auth (&rest body) `(if (not (auth?)) (make-response :code 400 :data "You must authenticate first.") (progn ,@body))) @ %def in-groups with-group with-n-args with-group-set with-auth \section{Other procedures} Of small importance, they have nothing. Notice that [[ucs-2->ascii]] is iseful only in Windows systems---and just for development. The procedure destructively converts UCS-2 to ASCII, so it's only really useful when we're converting an implicitly ASCII-content in the form of UCS-2. Despite the name UCS-2, notice it is UTF-16. The name UCS stands for ``Universal Character Set'' and I speculate the number 2 means 2 bytes. So our conversion is just removing the first byte. <>= (defun conforms-to? (s re &optional error-msg) "Does string S conform to regular expression RE?" (let ((okay? (cl-ppcre:scan-to-strings re s))) (if okay? (values t nil) (values nil (or error-msg (fmt "must match ~a" re)))))) (defun print/finish (&rest args) (apply #'format (cons t args)) (finish-output)) (defun word-plural (n word) (let ((table '(("doesn't" . "don't") ("newsgroup" . "newsgroups")))) (let ((w (assoc word table :test #'string=))) (when (not w) (error "word not found")) (if (< n 2) (car w) (cdr w))))) (defun plural (v suffix) (if (> v 1) suffix "")) (eval-when (:compile-toplevel :load-toplevel :execute) (defun fmt (cstr &rest args) (apply #'format nil (list* cstr args)))) (defun out (stream &rest args) (apply #'format (cons stream args))) (defun stderr (&rest args) (when *debug* (apply #'out (cons *error-output* args)) (finish-output *error-output*))) (defun stdout (&rest args) (apply #'out (list* *standard-output* args))) (defun println (&rest args) (apply #'out (list* *standard-output* (str:concat (car args) "~%") (cdr args)))) (defun enumerate (ls &optional (first-index 0)) (loop for e in ls and i from first-index collect (cons i e))) (defun ucs-2->ascii (bs) ;; I'm a Windows user. #-win32 bs #+win32 (remove-if #'zerop bs)) (defun bad-input (r msg &key code) (make-response :code (or code 400) :data msg :request r)) (defun integer->string (n) (format nil "~a" n)) (defun mkstr (&rest args) ;; a utility (with-output-to-string (s) (dolist (a args) (princ a s)))) (defun data (&rest args) ;; a utility (flatten (map 'list #'data->bytes args))) (defun crlf () (vector 13 10)) (defun crlf-string () (format nil "~c~c" #\return #\linefeed)) (defun flatten (obj) (do* ((result (list obj)) (node result)) ((null node) (delete nil result)) (cond ((consp (car node)) (when (cdar node) (push (cdar node) (cdr node))) (setf (car node) (caar node))) (t (setf node (cdr node)))))) (defmacro mac (&rest body) `(macroexpand-1 ,@body)) @ %def bad-input crlf mkstr data crlf-string flatten ucs-2->ascii enumerate \section{Tests} I studied the minimum to be able to add these tests as we comprehend better the direction in which we're going. A test system is essential for us to trust we can move forward without breaking important past decisions in the code. XXX: we should not include these tests in production code as we are doing right now. Divert them to a tests package or something that makes more sense. To run the tests, you need to invoke [[lisp-unit:run-tests]]. I believe that invoking it in a script will make all tests run. Oh, that should be included the binary, too, so that we can always test an archived project. That's a good idea. Make it a clingon option on the command line. Of course, by including tests in the executable, we should isolate all the tests here. I know how---just define a test package and isolate it all in it. Packages are for namespace isolation. Lisp's ``systems'' are the type of packaging meant for loading. Lisp's ``packages'' are merely namespace isolation. Our test package should use \lp's package, so that we can use any of the procedures under testing. When we define a variable, it will be defined in the test package, not in \lp's, but when we are in \lp's package, we see no names of the test package. It's simple and a good solution. When testing, it's important for us not to clutter a production system---we will want to run tests on production systems. So what we need to do is to wrap any file system modification to a certain other directory in which \lp's tests will find the groups directory in place. Making that happen is as simple as changing [[*default-pathname-defaults*]]. <>= (setq lisp-unit:*print-failures* t) (define-test dispatching (assert-true (equalp (empty-response) (dispatch (make-request))))) (defun unix->nntp (s) "I substitute \n for \r\n" (str:replace-all (fmt "~a" #\linefeed) (crlf-string) s)) (defvar a-post (unix->nntp "From: root Message-id: Subject: test Newsgroups: local.test Quickest test of the West. ")) (defvar a-bad-post (unix->nntp "From: root Message-id: Subject: a bad test Newsgroups: local.test, bad.newsgroup A bad test from the biggest mouth of the south. ")) (define-test post-wrong-newsgroup (multiple-value-bind (code msg) (post (string->bytes a-bad-post)) (declare (ignore msg)) (assert-true (equal code 400)))) (define-test post-okay (read-accounts!) (connect-index! "test.db") (create-index!) (setq *client* (make-client :username "ROOT" :auth? 'yes)) (multiple-value-bind (code msg) (post (string->bytes a-post)) (declare (ignore msg)) (assert-true (equal code 240))) (clsql:disconnect)) @ %def XXX: we got a problem with test [[post-okay]]. We're getting an execution error, but we can't see any error message. The posting is taking place---here in the REPL at least. \section{How to produce the binary executable} Just say {\tt make loop} to your shell. <>= <> (ql:quickload :loop) (sb-ext:save-lisp-and-die #P"loop" :toplevel #'loop:main :executable t :save-runtime-options t) @ <>= #-quicklisp (let ((quicklisp-init (merge-pathnames "quicklisp/setup.lisp" (user-homedir-pathname)))) (when (probe-file quicklisp-init) (load quicklisp-init))) @ %def quicklisp \section{How to get a log of \lp's communication} If you invoke \lxxp\ with option [[--logging]], you get logging on [[stderr]]: % \begin{verbatim} $ ./loop -s --logging > /dev/null @>>> 200 Welcome! Say ``help'' for a menu. quit @<<< quit @>>> 205 Good-bye. \end{verbatim} But remember that \lp\ runs various processes in parallel, so the question now is how to unify all processes' logs into a single one. We can't use \href{https://cr.yp.to/daemontools/multilog.html}{multilog}, for instance. The easiest solution now is to send all messages to {\tt syslog}. FreeBSD and GNU systems come with a program called {\tt logger}, which is able to write messages to {\tt syslog}. If you're using \href{https://cr.yp.to/daemontools/svscan.html}{{\tt svscan}}, then you can use a program such as % \begin{verbatim} $ cat log/run #!/bin/sh exec /usr/bin/logger -i -t loop \end{verbatim} % as your {\tt log/run} script. See {\tt logger(1)} for more information. Using {\tt logger(1)} means you need to set up {\tt syslog}, too. By the default, {\tt logger(1)} will use the {\em facility} {\tt user} and the {\em level} {\tt notice}. So you can specify in {\tt syslog.conf} the selector {\tt user.notice} and specify a log file such as {\tt /var/log/loop.log}. See {\tt syslog.conf(5)} and {\tt newsyslog(1)} for more information. \section{Package {\tt loop.lisp} as the compiler needs it} We now put together all source code chunks in the order the compiler needs to read it. One thing to keep in mind here is---I wonder if people that might read this source code would read the literate programming \LaTeX\ output or would they read [[loop.lisp]] directly. For literate programmers, it doesn't matter how [[loop.lisp]] turns out because only the compiler reads [[loop.lisp]]. But if we care about anyone who might read [[loop.lisp]], then we should perhaps tell our literate programming tools to generate a nice-looking file. For instance, I declare global variables in the chunks where it's used. But for someone reading [[loop.lisp]] directly, it is perhaps better if they would see all global variables at the top of the file. That's something to think about. <>= ;;; -*- mode: LISP; syntax: COMMON-LISP; package: LOOP; -*- (eval-when (:compile-toplevel :load-toplevel :execute) (ql:quickload '(<>) :silent t)) (clsql:file-enable-sql-reader-syntax) (defpackage #:loop (:use :common-lisp :local-time) (:import-from :lisp-unit define-test assert-true) (:import-from :org.shirakumo.filesystem-utils directory-p list-directories list-files) (:import-from :sb-sys interactive-interrupt) (:export :main)) (in-package #:loop) <> <
> <> <> <> <> <> <> <> <> <> <> <> <> <> <> <> <> <> <> <> <> <> <> <> <> <> <> <> <> <> <> <> <> <
> <> @ %def <>= (defparameter *debug* nil) <> <> <> <> <> <> <> <> @ On which packages do we depend? <>= :lisp-unit :str :uiop :cl-ppcre :local-time :clsql-sqlite3 :clingon :filesystem-utils :ironclad/digest/sha256 @ The \lp\ system definition: <>= ;;; -*- Mode: LISP; Syntax: COMMON-LISP; -*- (asdf:defsystem :<> :version "<>" :description "<>" :depends-on (<>) :components ((:file "loop"))) @ \section{Other source code} The shell script {\tt format-def} is invoked whenever we build any lisp source code. That's to format the source code a bit better for readers that will be reading it directly. It is not what we do. We read the documentation in PDF format and we work on the NOWEB file {\tt loop.nw}. But we know that potential readers will not do the same and will hack {\tt loop.lisp} directly. Paying respect to these readers, we try to format Lisp source code as best as possible. So we do two things: first, we produce the final source code in an order that should produce no warnings during compilation; second, we make sure there's one and only one blank line between procedure or macro definition. We don't add a blank line between global variables. The following shell script does the job. The first {\tt sed} program finds our definitions of interest and inserts a new blank line before the definition. Such action makes function definitions separated by two blank lines in some cases. We then remove the excess with the second program. Notice we need the {\tt -E} option because we're using the {\tt |} metacharacter. The second program find a blank line as its first step. Then we say {\tt N} to expand the pattern space to include the next line. Then we delete the {\em first} blank line and not the second---that's what the {\tt D} command does. This strategy is explained by Dale Dougherty and Arnold Robbins in ``sed \& awk'' second edition, pages 112--114. <>= #!/bin/sh usage() { printf 'usage: %s [file.lisp]\n' $0 exit 1 } sed -E '/^\(defun |\(defmacro /{ i\ }' "$@" | \ sed '/^[ \t]*$/{ N /^[ \t]*\n$/D }' @ When we make a new release of \lp, we like to name its version as the tip of the source code repository. We get the information usually with a command line such as % \begin{verbatim} $ git log --oneline | head -1 | awk '{print $1}' 52663d1 \end{verbatim} % To include this version string in the executable, we need to make it part of the source code. We get help from {\tt sed} once again. As the usage explains, we invoke it as {\tt ./make-release 52663d1 loop.nw}. The script then rewrites {\tt loop.nw} with the string in the body of the chunk @<>. The {\tt sed} program is straightforward: locate the chunk definition, move down a line, change that line and that's all. <>= #!/bin/sh usage() { printf 'usage: %s tag file\n' $0 exit 1 } test $# -lt 2 && usage tag="$1"; shift sed "/<>=/ { n; c\\ $tag }" "$@" @ \section*{Index of chunks} \nowebchunks \section*{Index of names} \nowebindex \end{document}