Coding and Commit Conventions

Code modularity and interface visibility

Subversion's code and header files are segregated along a couple of key lines: library-specific vs. inter-library; public vs. private. This separation is done primarily because of our focus on proper modularity and code organization, but also because of our promises as providers and maintainers of a widely adopted public API. As you write new functions in Subversion, you'll need to carefully consider these things, asking some questions of yourself as you go along:

"Are the consumers of my new code local to a particular source code file in a single library?" If so, you probably want a static function in that same source file.

"Is my new function of the sort that other source code within this library will need to use, but nothing *outside* the library?" If so, you want to use a non-static, double-underscore-named function (such as svn_foo__do_something), with its prototype in the appropriate library-specific header file.

"Will my code need to be accessed from a different library?" Here you have some additional questions to answer, such as "Should my code live in the original library I was going to put it in, or should it live in a more generic utility library such as libsvn_subr?" Either way, you're now looking at using an inter-library header file. But see the next question before you decide which one...

"Is my code such that it has a clean, maintainable API that can reasonably be maintained in perpetuity and brings value to the Subversion public API offering?" If so, you'll be adding prototypes to the public API, immediately inside subversion/include/. If not, double-check your plans -- maybe you haven't chosen the best way to abstract your functionality. But sometimes it just happens that libraries need to share some function that is arguably of no use to other software besides Subversion itself. In those cases, use the private header files in subversion/include/private/.
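The first three visibility levels above can be sketched in a single file. This is only an illustration: count_slashes, svn_foo__path_depth and svn_foo_is_nested are hypothetical names invented for the example, and the header locations mentioned in the comments are assumptions about where such prototypes would go.

```c
#include <stddef.h>

/* 1. File-local helper: static, no library prefix, invisible outside
      this source file. */
static size_t
count_slashes(const char *path)
{
  size_t n = 0;

  for (; *path; ++path)
    if (*path == '/')
      ++n;
  return n;
}

/* 2. Library-internal (hypothetical): non-static, double underscore
      after the library prefix.  Its prototype would live in a
      library-specific header. */
size_t
svn_foo__path_depth(const char *path)
{
  return count_slashes(path);
}

/* 3. Public API (hypothetical): plain library prefix.  Its prototype
      would live in a header directly inside subversion/include/. */
int
svn_foo_is_nested(const char *path)
{
  return svn_foo__path_depth(path) > 0;
}
```

A function shared between libraries but not fit for the public API would look like case 2, with its prototype in subversion/include/private/ instead.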

Coding style

Subversion uses ANSI C, and follows the GNU coding standards, except that we do not put a space between the name of a function and the opening parenthesis of its parameter list. Emacs users can just load svn-dev.el to get the right indentation behavior (most source files here will load it automatically, if `enable-local-eval' is set appropriately).

Read https://www.gnu.org/prep/standards.html for a full description of the GNU coding standards. Below is a short example demonstrating the most important formatting guidelines, including our no-space-before-param-list-paren exception:

   char *                                     /* func type on own line */
   argblarg(char *arg1, int arg2)             /* func name on own line */
   {                                          /* first brace on own line */
     if ((some_very_long_condition && arg2)   /* indent 2 cols */
         || remaining_condition)              /* new line before operator */
       {                                      /* brace on own line, indent 2 */
         arg1 = some_func(arg1, arg2);        /* NO SPACE BEFORE PAREN */
       }                                      /* close brace on own line */
     else
       {
         do                                   /* format do-while like this */
           {
             arg1 = another_func(arg1);
           }
         while (*arg1);
       }

     return arg1;                             /* non-void: return a value */
   }

In general, be generous with parentheses even when you're sure about the operator precedence, and be willing to add spaces and newlines to avoid "code crunch". Don't worry too much about vertical density; it's more important to make code readable than to fit that extra line on the screen.

Using page breaks

We're using page breaks (the Ctrl-L character, ASCII 12) for section boundaries in both code and plaintext prose files. Each section starts with a page break, and immediately after the page break comes the title of the section.

This helps out people who use the Emacs page commands, such as `pages-directory' and `narrow-to-page'. Such people are not as scarce as you might think, and if you'd like to become one of them, then add (require 'page-ext) to your .emacs and type C-x C-p C-h sometime.

Error message conventions

For error messages the following conventions apply:

Provide specific error messages only when there is information to add to the general error message found in subversion/include/svn_error_codes.h.
Messages start with a capital letter.
Try keeping messages below 70 characters.
Don't end the error message with a period (".").
Don't include newline characters in error messages.
Quoting information is done using single quotes (e.g. "'some info'").
Don't include the name of the function where the error occurs in the error message. If Subversion is compiled using the '--enable-maintainer-mode' configure-flag, it will provide this information by itself.
When including paths or filenames in the error string, be sure to quote them (e.g. "Can't find '/path/to/repos/userfile'").
When including paths or filenames in the error string, be sure to convert them using svn_dirent_local_style() before inclusion (since paths passed to and from Subversion APIs are assumed to be in canonical form).
Don't use Subversion-specific abbreviations (e.g. use "repository" instead of "repo", "working copy" instead of "wc").

If you want to add an explanation to the error, report it followed by a colon and the explanation like this:

       "Invalid " SVN_PROP_EXTERNALS " property on '%s': "
       "target involves '.' or '..'".

Suggestions or other additions can be added after a semi-colon, like this:

       "Can't write to '%s': object of same name already exists; remove "
       "before retrying".

Please stay within these conventions: avoid separating the parts of an error message with other separators such as '--'.
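The mechanical parts of these conventions can be checked in code. The following is a minimal sketch, assuming a hypothetical helper named message_follows_conventions (nothing like it is claimed to exist in Subversion). It treats the 70-character guideline as a hard limit, which is stricter than the "try" in the list above.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: return 1 if MSG does not start with a lowercase
   letter, is shorter than 70 characters, does not end with a period and
   contains no newline; return 0 otherwise. */
int
message_follows_conventions(const char *msg)
{
  size_t len = strlen(msg);

  if (len == 0 || len >= 70)
    return 0;
  if (islower((unsigned char)msg[0]))
    return 0;
  if (msg[len - 1] == '.')
    return 0;
  if (strchr(msg, '\n') != NULL)
    return 0;
  return 1;
}
```

The check rejects a leading lowercase letter rather than requiring an uppercase one, so that messages beginning with a quoted item still pass.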

APR pool usage conventions

(This assumes you already basically understand how APR pools work; see apr_pools.h for details.)

Applications using the Subversion libraries must call apr_initialize() before calling any Subversion functions.

Subversion's general pool usage strategy can be summed up in two principles:

The call level that created a pool is the only place to clear or destroy that pool.

When iterating an unbounded number of times, create a subpool before entering the iteration, use it inside the loop and clear it at the start of each iteration, then destroy it after the loop is done, like so:

         apr_pool_t *iterpool = svn_pool_create(scratch_pool);

         for (i = 0; i < n; ++i)
           {
             svn_pool_clear(iterpool);
             do_operation(..., iterpool);
           }

         svn_pool_destroy(iterpool);

Supporting the above rules, we use the following pool names as conventions to represent various pool lifetimes:

result_pool: The pool in which the output of a function should be allocated. A result pool declaration should always be found in a function argument list, and never inside a local block. (But not all functions need or have result pools.)
scratch_pool: The pool in which all function-local data should be allocated. This pool is also provided by the caller, who may choose to clear this pool immediately when control returns to it.
iterpool: An iteration pool, used inside loops, as per the above example.

(Note: Some legacy code uses a single pool function argument, which operates as both the result and scratch pools.)

By using an iterpool for loop-bounded data, you ensure O(1) instead of O(N) memory leakage should the function return abruptly from within the loop (say, due to error). That's why you shouldn't make a subpool for data which persists throughout a function, but instead should use the pool passed in by the caller. That memory will be reclaimed when the caller's pool is cleared or destroyed. If the caller is invoking the callee in a loop, then trust the caller to take care of clearing the pool on each iteration. The same logic propagates all the way up the call stack.
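To make the O(1) versus O(N) claim concrete, here is a self-contained sketch that does not use APR at all. toy_pool_t, toy_palloc, toy_pool_clear and peak_usage are hypothetical stand-ins written for this illustration; the toy pool only tracks how many bytes it currently holds.

```c
#include <stdlib.h>

/* Toy stand-in for an APR pool (hypothetical, not the APR API): it
   remembers its allocations so they can be released together. */
typedef struct toy_block_t
{
  struct toy_block_t *next;
} toy_block_t;

typedef struct toy_pool_t
{
  toy_block_t *blocks;   /* everything allocated since the last clear */
  size_t live_bytes;     /* bytes currently held by this pool */
  size_t peak_bytes;     /* high-water mark of live_bytes */
} toy_pool_t;

static void *
toy_palloc(toy_pool_t *pool, size_t size)
{
  toy_block_t *block = malloc(sizeof(*block) + size);

  if (block == NULL)
    abort();
  block->next = pool->blocks;
  pool->blocks = block;
  pool->live_bytes += size;
  if (pool->live_bytes > pool->peak_bytes)
    pool->peak_bytes = pool->live_bytes;
  return block + 1;
}

static void
toy_pool_clear(toy_pool_t *pool)
{
  while (pool->blocks != NULL)
    {
      toy_block_t *next = pool->blocks->next;

      free(pool->blocks);
      pool->blocks = next;
    }
  pool->live_bytes = 0;
}

/* Allocate 100 bytes of per-iteration data N times and return the
   pool's peak usage.  Clearing at the start of each iteration keeps the
   peak at one iteration's worth; otherwise it grows with N. */
size_t
peak_usage(int n, int clear_each_iteration)
{
  toy_pool_t iterpool = { NULL, 0, 0 };
  size_t peak;
  int i;

  for (i = 0; i < n; ++i)
    {
      if (clear_each_iteration)
        toy_pool_clear(&iterpool);
      (void)toy_palloc(&iterpool, 100);
    }

  peak = iterpool.peak_bytes;
  toy_pool_clear(&iterpool);   /* plays the role of svn_pool_destroy() */
  return peak;
}
```

With 50 iterations, the cleared variant peaks at 100 bytes and the uncleared one at 5000, which is the amount that would stay allocated if the function returned early from the last iteration.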

The pool you use also helps readers of the code understand object lifetimes. Is a given object used only during one iteration of the loop, or will it need to last beyond the end of the loop? For example, pool choices indicate a lot about what's going on in this code:

      apr_hash_t *persistent_objects = apr_hash_make(result_pool);
      apr_pool_t *iterpool = svn_pool_create(scratch_pool);

      for (i = 0; i < n; ++i)
        {
          const char *intermediate_result;
          const char *key, *val;

          svn_pool_clear(iterpool);
          SVN_ERR(do_something(&intermediate_result, ..., iterpool));
          SVN_ERR(get_result(intermediate_result, &key, &val, ...,
                             result_pool));
          apr_hash_set(persistent_objects, key, APR_HASH_KEY_STRING, val);
        }

      svn_pool_destroy(iterpool);
      return persistent_objects;

Except for some legacy code, which was written before these principles were fully understood, virtually all pool usage in Subversion follows the above guidelines.

One such legacy pattern is a tendency to allocate an object inside a pool, store the pool in the object, and then free that pool (either directly or through a close_foo() function) to destroy the object.

For example:

   /*** Example of how NOT to use pools.  Don't be like this. ***/

   static foo_t *
   make_foo_object(arg1, arg2, apr_pool_t *pool)
   {
      apr_pool_t *subpool = svn_pool_create(pool);
      foo_t *foo = apr_palloc(subpool, sizeof(*foo));

      foo->field1 = arg1;
      foo->field2 = arg2;
      foo->pool   = subpool;

      return foo;
   }

   [...]

   [Now some function calls make_foo_object() and returns, passing
   back a new foo object.]

   [...]

   [Now someone, at some random call level, decides that the foo's
   lifetime is over, and calls svn_pool_destroy(foo->pool).]

This is tempting, but it defeats the point of using pools, which is to not worry so much about individual allocations, but rather about overall performance and lifetime groups. Instead, foo_t generally should not have a `pool' field. Just allocate as many foo objects as you need in the current pool — when that pool gets cleared or destroyed, they will all go away simultaneously.

See also the Exception handling section, for details of how resources associated with a pool are cleaned up when that pool is destroyed.

In summary:

Objects should not have their own pools. An object is allocated into a pool defined by the constructor's caller. The caller knows the lifetime of the object and will manage it via the pool.
Functions should not create/destroy pools for their operation; they should use a pool provided by the caller. Again, the caller knows more about how the function will be used, how often, how many times, etc.; thus, it should be in charge of the function's memory usage.

For example, think of a function that is being called multiple times in a tight loop. The caller clears the scratch pool on each iteration. Thus, the creation of an internal subpool is unnecessary and could be a significant overhead; instead, the function should just use the passed-in pool.
Whenever an unbounded iteration occurs, an iteration subpool should be used.
Given all of the above, it is pretty well mandatory to pass a pool to every function. Since objects are not recording pools for themselves, and the caller is always supposed to be managing memory, then each function needs a pool, rather than relying on some hidden magic pool. In limited cases, objects may record the pool used for their construction so that they can construct sub-parts, but these cases should be examined carefully.

See also Tracking down memory leaks for tips on diagnosing pool usage problems.

APR status codes

Always check for APR status codes (except APR_SUCCESS) with the APR_STATUS_IS_...() macros, not by direct comparison. This is required for portability to non-Unix platforms.

Exception handling

OK, here's how to use exceptions in Subversion.

Exceptions are stored in svn_error_t structures:

   typedef struct svn_error_t
   {
     apr_status_t apr_err;      /* APR error value, possibly SVN_ custom err */
     const char *message;       /* details from producer of error */
     struct svn_error_t *child; /* ptr to the error we "wrap" */
     apr_pool_t *pool;          /* place to generate message strings from */
     const char *file;          /* Only used iff SVN_DEBUG */
     long line;                 /* Only used iff SVN_DEBUG */
   } svn_error_t;

If you are the original creator of an error, you would do something like this:
```
return svn_error_create(SVN_ERR_FOO, NULL, 
                        "User not permitted to write file");
    
```
Notice the NULL argument: it indicates that this error has no child, i.e. it is the bottom-most error.

See also the section on writing error messages.

Subversion internally uses UTF-8 to store its data. This also applies to the 'message' string. APR is assumed to return its data in the current locale, so any text returned by APR needs conversion to UTF-8 before inclusion in the message string.

If you receive an error, you have three choices:
1. Handle the error yourself. Use either your own code, or just call the primitive svn_handle_error(err). (This routine unwinds the error stack and prints out messages, converting them from UTF-8 to the current locale.)
  
  When your routine receives an error which it intends to ignore or handle itself, be sure to clean it up using svn_error_clear(). Any time such an error is not cleared constitutes a memory leak.
  
  A function that returns an error is not required to initialize its output parameters.
2. Throw the error upwards, unmodified:
```
        error = some_routine(foo);
        if (error)
          return svn_error_trace(error);
        
```
  Actually, a better way to do this would be with the SVN_ERR() macro, which does the same thing:
```
        SVN_ERR(some_routine(foo));
        
```
3. Throw the error upwards, wrapping it in a new error structure by including it as the "child" argument:
```
        error = some_routine(foo);
        if (error)
          {
           svn_error_t *wrapper = svn_error_create(SVN_ERR_FOO, error,
                                                   "Authorization failed");
           return wrapper;
          }
        
```
  Of course, there's a convenience routine which creates a wrapper error with the same fields as the child, except for your custom message:
```
        error = some_routine(foo);
        if (error)
          {
           return svn_error_quick_wrap(error, 
                                       "Authorization failed");
          }
        
```
  The same can (and should) be done by using the SVN_ERR_W() macro:
```
          SVN_ERR_W(some_routine(foo), "Authorization failed");
        
```
In cases 2 and 3 it is important to know that resources allocated by your routine which are associated with a pool are automatically cleaned up when the pool is destroyed. This means that there is no need to clean up these resources before passing the error. There is therefore no reason not to use the SVN_ERR() and SVN_ERR_W() macros. Resources associated with pools are:
- Memory
- Files
  
  All files opened with apr_file_open are closed at pool cleanup. Subversion uses this function in its svn_io_file_* API, which means that files opened with svn_io_file_* or apr_file_open will be closed at pool cleanup.
  
  Some files (lock files for example) need to be removed when an operation is finished. APR has the APR_DELONCLOSE flag for this purpose. The following functions create files which are removed on pool cleanup:
  - apr_file_open and svn_io_file_open (when passed the APR_DELONCLOSE flag)
  - svn_io_open_unique_file (when passed TRUE in its delete_on_close)
  Locked files are unlocked if they were locked using svn_io_file_lock.
The SVN_ERR() macro will create a wrapped error when SVN_ERR__TRACING is defined. This helps developers determine what caused the error, and can be enabled with the --enable-maintainer-mode option to configure.

Sometimes, you just want to return whatever a called function returns, usually at the end of your own function. Avoid the temptation to directly return the result:
```
    /* Don't do this! */
    return some_routine(foo);
```
Instead, use the svn_error_trace meta-function to return the value. This ensures that stack tracing happens correctly when enabled.
```
    return svn_error_trace(some_routine(foo));
```
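The propagate, wrap and clear choices can be tried out without Subversion using a toy error chain. The sketch below is an assumption-laden illustration: toy_error_t, toy_error_create, toy_error_clear, toy_error_depth and TOY_ERR are hypothetical stand-ins that use malloc() instead of pools, and they only mimic the shape of svn_error_t, svn_error_create(), svn_error_clear() and SVN_ERR().

```c
#include <stdlib.h>

/* Toy stand-in for svn_error_t (hypothetical): a code, a message and
   the error being wrapped. */
typedef struct toy_error_t
{
  int code;
  const char *message;
  struct toy_error_t *child;
} toy_error_t;

toy_error_t *
toy_error_create(int code, toy_error_t *child, const char *message)
{
  toy_error_t *err = malloc(sizeof(*err));

  if (err == NULL)
    abort();
  err->code = code;
  err->message = message;
  err->child = child;
  return err;
}

/* Free ERR and every error it wraps; safe to call with NULL. */
void
toy_error_clear(toy_error_t *err)
{
  while (err != NULL)
    {
      toy_error_t *child = err->child;

      free(err);
      err = child;
    }
}

/* Number of errors in the chain starting at ERR. */
int
toy_error_depth(const toy_error_t *err)
{
  int depth = 0;

  for (; err != NULL; err = err->child)
    ++depth;
  return depth;
}

/* Return a non-NULL error to the caller, in the manner of SVN_ERR(). */
#define TOY_ERR(expr)                          \
  do {                                         \
    toy_error_t *toy_err__temp = (expr);       \
    if (toy_err__temp)                         \
      return toy_err__temp;                    \
  } while (0)

static toy_error_t *
open_file(int should_fail)
{
  if (should_fail)
    return toy_error_create(2, NULL, "Can't open file");
  return NULL;
}

/* Choice 2: throw the error upwards, unmodified. */
toy_error_t *
read_config(int should_fail)
{
  TOY_ERR(open_file(should_fail));
  return NULL;
}

/* Choice 3: wrap the error, keeping the original as the child. */
toy_error_t *
load_settings(int should_fail)
{
  toy_error_t *err = read_config(should_fail);

  if (err)
    return toy_error_create(err->code, err, "Can't load settings");
  return NULL;
}
```

A caller that decides to handle the failure itself (choice 1) receives a two-level chain from load_settings(1) and must pass it to toy_error_clear() once it is done, mirroring the svn_error_clear() rule above.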

Secure coding guidelines

Just like almost any other programming language, C has undesirable features which enable an attacker to make your program fail in predictable ways, often to the attacker's benefit. The goal of these guidelines is to make you aware of the pitfalls of C as they apply to the Subversion project. You are encouraged to keep these pitfalls in mind when reviewing code of your peers, as even the most skilled and paranoid programmers make occasional mistakes.

Input validation is the act of defining legal input and rejecting everything else. The code must perform input validation on all untrusted input.

Security boundaries:

A security boundary in the Subversion server code must be identified as such, as this enables auditors to quickly determine the quality of the boundary. Security boundaries exist where the running code has access to information the user does not, or where the code runs with privileges above those of the user making the request. Typical examples are code that does access control, or an application with the SUID bit set.

Functions which make calls to a security boundary must include validation checks of the arguments passed. Functions which themselves are security boundaries should audit the input received and alarm when invoked with improper values.

[### todo: need some examples from Subversion here...]

String operations:

Use the string functions provided in apr_strings.h instead of standard C library functions that write to strings. The APR functions are safer because they do bounds-checking and dest allocation automatically. Although there may be circumstances where it's theoretically safe to use plain C string functions (such as when you already know the lengths of the source and dest), please use the APR functions anyway, so the code is less brittle and more reviewable.

Password storage:

Help users keep their passwords secret: when the client reads or writes a password locally, it should ensure that the file is mode 0600. If the file is readable by other users, the client should exit with a message that tells the user to change the filemode due to the risk of exposure.
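On POSIX systems the permission check could look roughly like the sketch below. Both function names are hypothetical, the wording of the messages is only an example, and the test is slightly looser than "exactly 0600": it accepts any mode that grants nothing to group or others.

```c
#include <stdio.h>
#include <sys/stat.h>

/* Hypothetical helper: return 1 if PATH is a regular file with no
   permission bits set for group or others, 0 if it is a regular file
   that others can access, and -1 if it cannot be examined or is not a
   regular file. */
int
password_file_is_private(const char *path)
{
  struct stat st;

  if (stat(path, &st) != 0 || !S_ISREG(st.st_mode))
    return -1;
  return (st.st_mode & (S_IRWXG | S_IRWXO)) == 0;
}

/* Hypothetical caller: return 0 if PATH may be used; otherwise print a
   message to stderr and return 1, which a client could use as its exit
   status. */
int
check_password_file(const char *path)
{
  int is_private = password_file_is_private(path);

  if (is_private == 1)
    return 0;

  if (is_private == 0)
    fprintf(stderr,
            "'%s' is accessible to other users; change its mode to 0600\n",
            path);
  else
    fprintf(stderr, "Can't examine '%s'\n", path);
  return 1;
}
```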

Destruction of stacked resources

Some resources need destruction to ensure correct functioning of the application. Such resources include files, especially since open files cannot be deleted on Windows.

When writing an API which creates and returns a stream, in the background this stream may be stacked on a file or other stream. To ensure correct destruction of the resources the stream is built upon, it must correctly call the destructors of the stream(s) it is built upon (owns).

This was discussed first in https://svn.haxx.se/dev/archive-2005-12/0487.shtml and later in https://svn.haxx.se/dev/archive-2005-12/0633.shtml, in more general terms covering files, streams, editors and window handlers.

As Greg Hudson put it:

On consideration, here is where I would like us to be:

Streams which read from or write to an underlying object own that object, i.e. closing the stream closes the underlying object, if applicable.

The layer (function or data type) which created a stream is responsible for closing it, except when the above rule applies.

Window handlers are thought of as an odd kind of stream, and passing the final NULL window is considered closing the stream.

If you think of apply_textdelta as creating a window handler, then I don't think we're too far off. svn_stream_from_aprfile isn't owning its subsidiary file, svn_txdelta_apply is erroneously taking responsibility for closing the window stream it is passed, and there may be some other deviations.

There is one exception to the rules above though. When a stream is passed to a function as an argument (for example: the 'out' parameter of svn_client_cat2()), that routine can't call the stream's destructor, since it did not create that resource.

If svn_client_cat2() creates a stream, it must also call the destructor for that stream. By the above model, that stream will call the destructor for the 'out' parameter. This is however wrong, because the responsibility to destruct the 'out' parameter lies elsewhere.

To solve this problem, at least in the stream case, svn_stream_disown() has been introduced. This function wraps a stream, making sure it's not destroyed, even though any streams stacked upon it may try to do so.
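The ownership problem and the disown wrapper can be modelled with a toy stream type. Everything in this sketch is hypothetical (toy_stream_t and its functions are not the svn_stream_t API); it only shows how a non-propagating wrapper protects a stream that belongs to the caller.

```c
#include <stddef.h>

/* Toy stand-in for a stream (hypothetical). */
typedef struct toy_stream_t
{
  struct toy_stream_t *inner;  /* stream this one is stacked on, or NULL */
  int closes_inner;            /* does closing this stream close INNER? */
  int closed;                  /* set to 1 once this stream is closed */
} toy_stream_t;

/* Close STREAM and, if it owns the stream below it, that one too. */
void
toy_stream_close(toy_stream_t *stream)
{
  stream->closed = 1;
  if (stream->inner != NULL && stream->closes_inner)
    toy_stream_close(stream->inner);
}

/* Stack a new stream on INNER; the new stream owns INNER. */
toy_stream_t
toy_stream_stack(toy_stream_t *inner)
{
  toy_stream_t stream = { NULL, 1, 0 };

  stream.inner = inner;
  return stream;
}

/* Wrap INNER in a stream that never closes it, in the spirit of
   svn_stream_disown(). */
toy_stream_t
toy_stream_disown(toy_stream_t *inner)
{
  toy_stream_t stream = { NULL, 0, 0 };

  stream.inner = inner;
  return stream;
}

/* Plays the role of a function such as svn_client_cat2(): OUT belongs
   to the caller, so the function stacks its own stream on a disowned
   wrapper and closes only what it created. */
void
write_through_filter(toy_stream_t *out)
{
  toy_stream_t disowned = toy_stream_disown(out);
  toy_stream_t filter = toy_stream_stack(&disowned);

  toy_stream_close(&filter);
}
```

After write_through_filter() returns, the caller's stream is still open; stacking directly on it with toy_stream_stack() and closing the result would have closed it as well.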

Variadic argument lists

When you call a function that accepts a variable number of arguments and expects the list to be terminated with a null pointer constant (an example of such a function is apr_pstrcat), do not use the NULL symbol to terminate the list. Depending on compiler and platform, NULL may or may not be a pointer-sized constant; if it isn't, the function may end up reading data beyond the end of the argument list.

Instead, use SVN_VA_NULL (defined since 1.9 in svn_types.h), which is guaranteed to be a null pointer constant. For example:

   return apr_pstrcat(cmd->temp_pool, "Cannot parse expression '",
                      arg2, "' in SVNPath: ", expr_err, SVN_VA_NULL);
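The receiving side of such a call shows why the terminator must have pointer type. The sketch below is self-contained and uses hypothetical names: TOY_VA_NULL is a stand-in written for this example in the spirit of SVN_VA_NULL, and total_length is an invented variadic function that reads const char * arguments until it sees a null pointer.

```c
#include <stdarg.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sentinel: the cast guarantees that the terminator is
   passed as a pointer, whatever NULL expands to. */
#define TOY_VA_NULL ((void *)NULL)

/* Return the summed length of a list of strings terminated by a null
   pointer.  Each va_arg() call reads one pointer-sized argument, so a
   terminator passed as a plain int could be misread. */
size_t
total_length(const char *first, ...)
{
  va_list ap;
  const char *s;
  size_t total = 0;

  va_start(ap, first);
  for (s = first; s != NULL; s = va_arg(ap, const char *))
    total += strlen(s);
  va_end(ap);
  return total;
}
```

A call then looks like total_length("ab", "cde", TOY_VA_NULL).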

Other coding conventions

In addition to the GNU standards, Subversion uses these conventions:

When using a path or file name as input to most Subversion APIs, be sure to convert it to Subversion's internal/canonical form using the svn_dirent_internal_style() API. Alternately, when receiving a path or file name as output from a Subversion API, convert it into the expected form for your platform using the svn_dirent_local_style() API.
Use only spaces for indenting code, never tabs. Tab display width is not standardized enough, and anyway it's easier to manually adjust indentation that uses spaces.
Restrict lines to 79 columns, so that code will display well in a minimal standard display window. (There can be exceptions, such as when declaring a block of 80-column text with a few extra columns taken up by indentation, quotes, etc., if splitting each line in two would be unreasonably messy.)
All published functions, variables, and structures must be signified with the corresponding library name - such as libsvn_wc's svn_wc_adm_open. All library-internal declarations made in a library-private header file (such as libsvn_wc/wc.h) must be signified by two underscores after the library prefix (such as svn_wc__ensure_directory). All declarations private to a single file (such as the static function get_entry_url inside of libsvn_wc/update_editor.c) do not require any additional namespace decorations. Symbols that need to be used outside a library, but still are not public are put in a shared header file in the include/private/ directory, and use the double underscore notation. Such symbols may be used by Subversion core code only.

To recap:
```
         /* Part of published API: subversion/include/svn_wc.h */
         svn_wc_adm_open()            
         #define SVN_WC_ADM_DIR_NAME ...
         typedef enum svn_wc_schedule_t ...

         /* For use within one library only: subversion/libsvn_wc/wc.h */
         svn_wc__ensure_directory()   
         #define SVN_WC__BASE_EXT ... 
         typedef struct svn_wc__compat_notify_baton_t ...

         /* For use within one file: subversion/libsvn_wc/update_editor.c */ 
         get_entry_url()
         struct handler_baton ...

         /* For internal use in svn core code only:
            subversion/include/private/svn_wc_private.h */
         svn_wc__entry_versioned()
      
```
Pre-Subversion 1.5, private symbols which needed to be used outside of a library were put into public header files, using the double underscore notation. This practice has been abandoned, and any such symbols are legacy, maintained for backwards compatibility.
In text strings that might be printed out (or otherwise made available) to users, use only forward quotes around paths and other quotable things. For example:
```
         $ svn revert foo
         svn: warning: svn_wc_is_wc_root: 'foo' is not a versioned resource
         $
      
```
There used to be a lot of strings that used a backtick for the first quote (`foo' instead of 'foo'), but that looked bad in some fonts, and also messed up some people's auto-highlighting, so we settled on the convention of always using forward quotes.

If you use Emacs, put something like this in your .emacs file, so you get svn-dev.el and svnbook.el when needed:

         ;;; Begin Subversion development section
         (defun my-find-file-hook ()
           (let ((svn-tree-path (expand-file-name "~/projects/subversion"))
                 (book-tree-path (expand-file-name "~/projects/svnbook")))
             (cond
              ((string-match svn-tree-path buffer-file-name)
               (load (concat svn-tree-path "/tools/dev/svn-dev")))
              ((string-match book-tree-path buffer-file-name)
               ;; Handle load exception for svnbook.el, because it tries to
               ;; load psgml, and not everyone has that available.
               (condition-case nil
                   (load (concat book-tree-path "/src/tools/svnbook"))
                 (error
                  (message "(Ignored problem loading svnbook.el.)")))))))

         (add-hook 'find-file-hooks 'my-find-file-hook)
         ;;; End Subversion development section

You'll need to customize the path for your setup, of course. You can also make the regexp to string-match more selective; for example, one developer says:

      > Here's the regexp I'm using:
      > 
      >     "src/svn/[^/]*/\\(subversion\\|tools\\|build\\)/"
      >
      > Two things to notice there: (1) I sometimes have several
      > working copies checked out under ...src/svn, and I want the
      > regexp to match all of them; (2) I want the hook to catch only
      > in "our" directories within the working copy, so I match
      > "subversion", "tools" and "build" explicitly; I don't want to
      > use GNU style in the APR that's checked out into my repo. :-)

We have a tradition of not marking files with the names of individual authors (i.e., we don't put lines like "Author: foo" or "@author foo" in a special position at the top of a source file). This is to discourage territoriality — even when a file has only one author, we want to make sure others feel free to make changes. People might be unnecessarily hesitant if someone appears to have staked a personal claim to the file.
Put two spaces between the end of one sentence and the start of the next. This helps readability, and allows people to use their editors' sentence-motion and -manipulation commands.
There are many other unspoken conventions maintained throughout the code, that are only noticed when someone unintentionally fails to follow them. Just try to have a sensitive eye for the way things are done, and when in doubt, ask.

Writing log messages

Every commit needs a log message.

The intended audience for a log message is a developer who is already familiar with Subversion, but not necessarily familiar with this particular commit. Usually when someone goes back and reads a change, he no longer has in his head all the context around that change. This is true even if he is the author of the change! All the discussions and mailing list threads and everything else may be forgotten; the only clue to what the change is about comes from the log message and the diff itself. People revisit changes with surprising frequency, too: for example, it might be months after the original commit and now the change is being ported to a maintenance branch.

The log message is the introduction to the change.

If you are working on a branch, prefix your log message with:

   On the 'name-of-branch' branch: (Start of your log message)

Start your log message with one line indicating the general nature of the change, and follow that with a descriptive paragraph if necessary.

This helps put developers in the right frame of mind for reading the rest of the log message.

If the commit is just one simple change to one file, then you can dispense with the general description and simply go straight to the detailed description, in the standard filename-then-symbol format shown below.

Throughout the log message, use full sentences, not sentence fragments. Fragments are more often ambiguous, and it takes only a few more seconds to write out what you mean. Certain fragments like "Doc fix", "New file", or "New function" are acceptable because they are standard idioms, and all further details should appear in the source code.

The log message should name every affected function, variable, macro, makefile target, grammar rule, etc, including the names of symbols that are being removed in this commit. This helps people searching through the logs later. Don't hide names in wildcards, because the globbed portion may be what someone searches for later. For example, this is bad:

   * subversion/libsvn_ra_pigeons/twirl.c
     (twirling_baton_*): Removed these obsolete structures.
     (handle_parser_warning): Pass data directly to callees, instead
      of storing in twirling_baton_*.

   * subversion/libsvn_ra_pigeons/twirl.h: Fix indentation.

Later on, when someone is trying to figure out what happened to `twirling_baton_fast', they may not find it if they just search for "_fast". A better entry would be:

   * subversion/libsvn_ra_pigeons/twirl.c
     (twirling_baton_fast, twirling_baton_slow): Removed these
      obsolete structures. 
     (handle_parser_warning): Pass data directly to callees, instead
      of storing in twirling_baton_*. 

   * subversion/libsvn_ra_pigeons/twirl.h: Fix indentation.

The wildcard is okay in the description for `handle_parser_warning', but only because the two structures were mentioned by full name elsewhere in the log entry.

You should also include property changes in your log messages. For example, if you were to modify the "svn:ignore" property on the trunk, you might put something like this in your log:

   * trunk/ (svn:ignore): Ignore 'build'.

The above only applies to properties you maintain, not those maintained by Subversion, like "svn:mergeinfo".

Note how each file gets its own entry prefixed with an "*", and the changes within a file are grouped by symbol, with the symbols listed in parentheses followed by a colon, followed by text describing the change. Please adhere to this format, even when only one file is changed — not only does consistency aid readability, it also allows software to colorize log entries automatically.

As an exception to the above, if you make exactly the same change in several files, list all the changed files in one entry. For example:

   * subversion/libsvn_ra_pigeons/twirl.c,
     subversion/libsvn_ra_pigeons/roost.c:
     Include svn_private_config.h.

If all the changed files are deep inside the source tree, you can shorten the file name entries by noting the common prefix before the change entries:

   [in subversion/bindings/swig/birdsong]

   * dialects/nightingale.c (get_base_pitch): Allow 3/4-tone
     pitch variation to account for trait variability amongst
     isolated populations of Erithacus megarhynchos.

   * dialects/gallus_domesticus.c: Remove. Unreliable due to
     extremely low brain-to-body mass ratio.

If your change is related to a specific issue in the issue tracker, then include a string like "issue #N" in the log message, but make sure you still summarize what the change is about. For example, if a patch resolves issue #1729, then the log message might be:

   Fix issue #1729: Don't crash because of a missing file.

   * subversion/libsvn_ra_ansible/get_editor.c
     (frobnicate_file): Check that file exists before frobnicating.
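Keeping the "issue #N" string in a consistent form also makes it easy to find mechanically. A minimal sketch, assuming issue references always take the form shown above (the names `ISSUE_RE` and `issue_numbers` are hypothetical):

```python
import re

# Matches "issue #N", ignoring case; N is captured as digits.
ISSUE_RE = re.compile(r"\bissue\s+#(\d+)", re.IGNORECASE)


def issue_numbers(log_message):
    """Return the issue numbers mentioned in a log message, in order."""
    return [int(number) for number in ISSUE_RE.findall(log_message)]
```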

Try to put related changes together. For example, if you create svn_ra_get_ansible2(), deprecating svn_ra_get_ansible(), then those two things should be near each other in the log message:

   * subversion/include/svn_ra.h
     (svn_ra_get_ansible2): New prototype, obsoletes svn_ra_get_ansible.
     (svn_ra_get_ansible): Deprecate.

For large changes or change groups, group the log entry into paragraphs separated by blank lines. Each paragraph should be a set of changes that accomplishes a single goal, and each group should start with a sentence or two summarizing the change. Truly independent changes should be made in separate commits, of course.

See Crediting for how to give credit to someone else if you are committing their patch, or committing a change they suggested.

One should never need the log entries to understand the current code. If you find yourself writing a significant explanation in the log, you should consider carefully whether your text doesn't actually belong in a comment, alongside the code it explains. Here's an example of doing it right:

   (consume_count): If `count' is unreasonable, return 0 and don't
    advance input pointer.

And then, in `consume_count' in `cplus-dem.c':

   while (isdigit((unsigned char)**type))
     {
       count *= 10;
       count += **type - '0';
       /* A sanity check.  Otherwise a symbol like
         `_Utf390_1__1_9223372036854775807__9223372036854775'
         can cause this function to return a negative value.
         In this case we just consume until the end of the string.  */
       if (count > strlen(*type))
         {
           *type = save;
           return 0;
         }

This is why a new function, for example, needs only a log entry saying "New Function"; all the details should be in the source.

You can make common-sense exceptions to the need to name everything that was changed. For example, if you have made a change which requires trivial changes throughout the rest of the program (e.g., renaming a variable), you needn't name all the functions affected; you can just say "All callers changed". When renaming any symbol, please remember to mention both the old and new names, for traceability; see r861020 for an example.

In general, there is a tension between making entries easy to find by searching for identifiers, and wasting time or producing unreadable entries by being exhaustive. Use the above guidelines and your best judgment, and be considerate of your fellow developers. (Also, use "svn log" to see how others have been writing their log entries.)

Log messages for documentation or translation have somewhat looser guidelines. The requirement to name every symbol obviously does not apply, and if the change is just one more increment in a continuous process such as translation, it's not even necessary to name every file. Just briefly summarize the change, for example: "More work on Malagasy translation." Please write your log messages in English, so everybody involved in the project can understand the changes you made.

If you're using a branch to "checkpoint" your code, and don't feel it's ready for review, please put some sort of notice at the top of the log message, after the "On the 'xxx' branch" notice, such as:

   *** checkpoint commit -- please don't waste your time reviewing it ***

If a later commit on that branch should be reviewed, please supply the appropriate 'svn diff' command in the log message. The diff will likely span two non-adjacent commits on that branch, and reviewers shouldn't have to spend time figuring out which ones they are.

Crediting ¶

It is very important to record code contributions in a consistent and parseable way. This allows us to write scripts to figure out who has been actively contributing — and what they have contributed — so we can spot potential new committers quickly. The Subversion project uses human-readable but machine-parseable fields in log messages to accomplish this.

When committing a patch written by someone else, use "Patch by: " at the beginning of a line to indicate the author:

   Fix issue #1729: Don't crash because of a missing file.

   * subversion/libsvn_ra_ansible/get_editor.c
     (frobnicate_file): Check that file exists before frobnicating.

   Patch by: J. Random <jrandom@example.com>

If multiple individuals wrote the patch, list them each on a separate line — making sure to start each continuation line with whitespace. Non-committers should be listed by name, if known, and e-mail. Full and partial committers should be listed by their canonical usernames from COMMITTERS (the leftmost column in that file). Additionally, "me" is an acceptable shorthand for the person actually committing the change.

   Fix issue #1729: Don't crash because of a missing file.

   * subversion/libsvn_ra_ansible/get_editor.c
     (frobnicate_file): Check that file exists before frobnicating.

   Patch by: J. Random <jrandom@example.com>
             Enrico Caruso <codingtenor@codingtenor.com>
             jcommitter
             me

If someone found the bug or pointed out the problem, but didn't write the patch, indicate their contribution with "Found by: " (or "Reported by: "):

   Fix issue #1729: Don't crash because of a missing file.

   * subversion/libsvn_ra_ansible/get_editor.c
     (frobnicate_file): Check that file exists before frobnicating.

   Found by: J. Random <jrandom@example.com>

If someone suggested something useful, but didn't write the patch, indicate their contribution with "Suggested by: ":

   Extend the Contribulyzer syntax to distinguish finds from ideas.

   * www/hacking.html (crediting): Adjust accordingly.

   Suggested by: dlr

If someone test-drove a patch, use "Tested by: ":

   Fix issue #23: random crashes on FreeBSD 3.14.

   Tested by: Old Platformer
   (I couldn't reproduce the problem, but Old hasn't seen any crashes since
   he applied the patch.)

   * subversion/libsvn_fs_sieve/obliterate.c
     (cover_up): Account for sieve(2) returning 6.

If someone reviewed the change, use "Review by: " (or "Reviewed by: " if you prefer):

   Fix issue #1729: Don't crash because of a missing file.

   * subversion/libsvn_ra_ansible/get_editor.c
     (frobnicate_file): Check that file exists before frobnicating.

   Review by: Eagle Eyes <eeyes@example.com>

A field may have multiple lines, and a log message may contain any combination of fields:

   Fix issue #1729: Don't crash because of a missing file.

   * subversion/libsvn_ra_ansible/get_editor.c
     (frobnicate_file): Check that file exists before frobnicating.

   Patch by: J. Random <jrandom@example.com>
             Enrico Caruso <codingtenor@codingtenor.com>
             me
   Found by: J. Random <jrandom@example.com>
   Review by: Eagle Eyes <eeyes@example.com>
              jcommitter

Further details about a contribution should be listed in a parenthetical aside immediately after the corresponding field. Such an aside always applies to the field right above it; in the following example, the fields have been spaced out for readability, but note that the spacing is optional and not necessary for parseability:

   Fix issue #1729: Don't crash because of a missing file.

   * subversion/libsvn_ra_ansible/get_editor.c
     (frobnicate_file): Check that file exists before frobnicating.

   Patch by: J. Random <jrandom@example.com>
   (Tweaked by me.)

   Review by: Eagle Eyes <eeyes@example.com>
              jcommitter
   (Eagle Eyes caught an off-by-one error in the basename extraction.)

Currently, these fields

   Patch by:
   Suggested by:
   Found by:
   Review by:
   Tested by:

are the only officially supported crediting fields (where "supported" means scripts know to look for them), and they are widely used in Subversion log messages. Future fields will probably be of the form "VERB by: ", and from time to time someone may use a field that sounds official but really is not; for example, there are a few instances of "Inspired by: ". These are okay, but try to use an official field, or a parenthetical aside, in preference to creating your own. Also, don't use "Reported by: " when the reporter is already recorded in an issue; instead, simply refer to the issue.
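To show why the syntax is machine-parseable, here is a minimal Python sketch that extracts the five fields listed above, their continuation lines, and any parenthetical aside. It is not the project's own script: `FIELD_RE` and `parse_credits` are hypothetical names, and the sketch assumes the log message is not indented, that a blank line ends a field, and that every aside is closed by a line ending in ")".

```python
import re

# The five officially supported fields listed above.
FIELD_RE = re.compile(r"^(Patch|Suggested|Found|Review|Tested) by:\s*(.*)$")


def parse_credits(log_message):
    """Return a list of dicts with keys "field", "names" and "aside"."""
    credits = []
    current = None    # field still open for continuation lines or an aside
    in_aside = False  # True while inside a multi-line parenthetical aside
    for line in log_message.splitlines():
        stripped = line.strip()
        match = FIELD_RE.match(stripped)
        if in_aside:
            # Keep collecting the aside until a line ends with ")".
            current["aside"] += " " + stripped
            if stripped.endswith(")"):
                in_aside, current = False, None
        elif match:
            current = {"field": match.group(1) + " by", "names": [], "aside": ""}
            credits.append(current)
            if match.group(2):
                current["names"].append(match.group(2))
        elif not stripped:
            current = None  # a blank line closes the field
        elif current is not None and stripped.startswith("("):
            current["aside"] = stripped
            if stripped.endswith(")"):
                current = None
            else:
                in_aside = True
        elif current is not None and line[:1].isspace():
            current["names"].append(stripped)  # continuation line
        else:
            current = None  # ordinary log text
    return credits
```

Applied to the combined example above, the sketch would report one "Patch by" entry and one "Review by" entry, each with its names and its aside attached.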

Look over Subversion's existing log messages to see how to use these fields in practice. This command from the top of your trunk working copy will help:

$ contrib/client-side/search-svnlog.pl "(Patch|Review|Suggested) by: "

The script runs 'svn log' internally. You can customize the invocation of 'svn log' by piping its output to the script; the '-f -' arguments are required for this. For example, to search only the last 100 log messages:

$ svn log -l 100 | contrib/client-side/search-svnlog.pl -f - "(Patch|Review|Suggested) by: "

Note: The "Approved by: " field seen in some commit messages is totally unrelated to these crediting fields, and is generally not parsed by scripts. It is simply the standard syntax for indicating either who approved a partial committer's commit outside their usual area, or (in the case of merges to release branches) who voted for the change to be merged.

GitHub ¶

The Subversion repository is mirrored to GitHub at https://github.com/apache/subversion/.

Some users might create pull requests on GitHub. If the code from a pull request is committed to the Subversion repository, make sure to include the following text in the log message to automatically close the pull request:

   This fixes #NNN in GitHub

To manage pull requests without committing code, you must have a GitHub account connected to your ASF ID, and ASF Infra must have assigned the triager role to that account.