
A good Git repository

Note

This list is inspired by the JOSS review criteria and by the reproducibility recommendations of the Data Carpentry project.

This page lists the important parts of a good Git repository, that is, a repository that others will want to use, replicate, or browse. If you work alone in your own private repository, these tips are not necessarily useful. They apply to the final code that accompanies your MSc thesis.

General

  • The repository should be publicly available

Documentation

  • There should be a README file that indicates

    • The purpose of the project (what the code is for)
    • Installation instructions
    • Example usage
    • API documentation (if this applies)
  • The README should be a plain-text file (readme.md or readme.txt), not a binary file such as a Word document

  • There should be a LICENSE file. (Help with choosing a license)
  • There should be a CITATION file that tells users how to cite the project, data, and code
  • A changelog.md file detailing the changes between releases should be available (help with changelog)
  • There should be clear guidelines for third parties wishing to: (1) contribute to the software; (2) report issues or problems with the software; (3) seek support
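
To check a repository against the documentation items above, a short Python script can look for the expected files at the repository root. This is a minimal sketch: the function name `missing_documentation` and the list of expected file names are assumptions, and it only checks that the files exist, not what they contain.

```python
from pathlib import Path

# Assumed checklist: file names (without extension) expected at the root of
# the repository. Matching ignores case and extension, so README.md,
# readme.txt, LICENSE and CITATION.cff all count.
EXPECTED_STEMS = ["readme", "license", "citation", "changelog"]


def missing_documentation(repo: Path) -> list[str]:
    """Return the expected documentation files that are absent from `repo`."""
    # Collect the lowercase stems of all regular files at the top level.
    present = {p.stem.lower() for p in repo.iterdir() if p.is_file()}
    return [stem for stem in EXPECTED_STEMS if stem not in present]
```

Run from the repository root, `print(missing_documentation(Path(".")))` prints the names of the files that still need to be written, or an empty list.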

Organization

  • Folders should be used to separate data, code, documentation, and results

    • By convention, all source code goes in /src
    • and unit tests go in /test
    • However, each language has its own layouts and habits
    • The files should use a consistent naming scheme that indicates what they contain
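
A minimal Python sketch of such a layout follows. It creates the folders under a given root; the folder names (including the split of data into data/raw and data/processed) and the function name `create_layout` are illustrative assumptions, not a required standard.

```python
from pathlib import Path

# Illustrative layout: code, tests, raw and processed data, documentation
# and results each get their own folder. Adapt the names to your language.
LAYOUT = ["src", "test", "data/raw", "data/processed", "docs", "results"]


def create_layout(root: Path) -> None:
    """Create the folder skeleton under `root`, keeping existing folders."""
    for folder in LAYOUT:
        # parents=True creates data/ before data/raw; exist_ok=True makes
        # the function safe to call on a repository that already has them.
        (root / folder).mkdir(parents=True, exist_ok=True)
```

Calling `create_layout(Path("."))` from the repository root creates any missing folders and leaves existing ones untouched. Git does not track empty folders, so they only appear in the repository once they contain at least one file.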

Files that should not be added to the repository

Files that are generated by compiling or running code should not be added to the Git repository.

You can configure your repository to ignore such files with a .gitignore file, so that Git no longer reports them as uncommitted changes.

Examples of files to ignore:

  • *.pyc for Python
  • *.aux for LaTeX
  • *.pdf for LaTeX
  • /build (the whole build folder) for C++
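
These patterns go in a file named .gitignore at the root of the repository. The Python sketch below writes a starter .gitignore containing the patterns listed above; the function name `write_gitignore` is hypothetical, and it overwrites any existing .gitignore, so merge by hand if you already have one.

```python
from pathlib import Path

# Patterns from the list above. Lines starting with "#" are comments.
IGNORE_PATTERNS = [
    "# Python bytecode",
    "*.pyc",
    "# LaTeX auxiliary files and compiled output",
    "*.aux",
    "*.pdf",
    "# C++ build folder: leading slash = only at the repository root,",
    "# trailing slash = match directories only",
    "/build/",
]


def write_gitignore(repo: Path) -> Path:
    """Write a starter .gitignore at the root of `repo` (overwriting it)."""
    path = repo / ".gitignore"
    path.write_text("\n".join(IGNORE_PATTERNS) + "\n", encoding="utf-8")
    return path
```

Because *.pdf ignores every PDF in the repository, a negation pattern such as !figures/*.pdf can re-include PDFs that are genuine inputs, and `git check-ignore -v <file>` shows which pattern, if any, ignores a given file.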

Software

  • There should be releases to package the software (how to create a release)
  • Is a container available to run the project (e.g. Docker)?
  • Are unit tests available for the code?
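
As an illustration of unit tests, here is a sketch using Python's built-in unittest module. The function `polygon_area` (shoelace formula) is a hypothetical stand-in for your own code; it is defined in the test file only to keep the example self-contained, whereas in a real project it would live in /src and be imported by the test in /test.

```python
import unittest


def polygon_area(points):
    """Hypothetical function under test: area of a simple polygon whose
    vertices are given in order as (x, y) tuples (shoelace formula)."""
    n = len(points)
    twice_area = 0.0
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]  # wrap around to the first vertex
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


class TestPolygonArea(unittest.TestCase):
    def test_unit_square(self):
        square = [(0, 0), (1, 0), (1, 1), (0, 1)]
        self.assertAlmostEqual(polygon_area(square), 1.0)

    def test_right_triangle(self):
        self.assertAlmostEqual(polygon_area([(0, 0), (4, 0), (0, 3)]), 6.0)

    def test_orientation_does_not_matter(self):
        triangle = [(0, 0), (4, 0), (0, 3)]
        self.assertAlmostEqual(polygon_area(triangle),
                               polygon_area(triangle[::-1]))
```

Saved as test/test_area.py, the tests can be run from the repository root with `python -m unittest discover -s test`.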

Data

  • If your project has data, are they included, or is a link provided?
  • If data are not included, is this because they are not necessary or because they are generated as part of the project?
  • Are your raw data files (if any) kept separate from the processed data files?

Others

  • The code should be well documented
  • Does the repository use continuous integration tools to ensure internal reproducibility?