Skip to content

Adding codelists to a project

Adding a codelist🔗

Your example research template doesn't include any codelists but the folder structure and text files that are needed to include codelists already exist. Take a look at the codelists/codelists.txt file in the repo, this file is currently empty but any codelists that you add to your project will appear here.

You can add a codelist from OpenCodelists to your project by manually editing the codelists.txt file or by using the opensafely codelists add command.

For example, running the following command in your terminal:

opensafely codelists add https://www.opencodelists.org/codelist/opensafely/covid-identification/2020-06-03/

will add the OpenSAFELY COVID Identification codelist to codelists.txt and also download and add opensafely-covid-identification.csv to your project.

Manually editing codelists.txt🔗

The naming convention of the line that you need to add to the codelists/codelists.txt file follows this structure: a <codelist-id>is followed by / and a <version-id>. Note that the version ID is a sequence of 8 characters. Some codelists may also have a version tag in the form of a date (YYYY-MM-DD) or a version number (e.g., v1.2) that can be used in place of the version ID.

<codelist-id>/<version-id>

To find this information on the page for each of the codelists on OpenCodelists, see orange boxes in the screenshot below.

Finding the codelist ID and version ID on OpenCodelists.

You need to add each line into a new line of the codelists.txt file. The next time you run the command opensafely codelists update in your terminal, the codelists you specified earlier will be added to the the codelists/ subfolder in your project automatically so you don't need to add these files manually to your project.

For example, a codelists.txt file of a project may consist of four different lines:

opensafely/aplastic-anaemia/58ac196d
opensafely/asplenia/3ce9e642
opensafely/current-asthma/2020-05-06
primis-covid19-vacc-uptake/bmi_stage/v1.2

After running the command opensafely codelists update the following four .csv files will be added to your project:

opensafely-aplastic-anaemia.csv
opensafely-asplenia.csv
opensafely-current-asthma.csv
primis-covid19-vacc-uptake-bmi_stage.csv

A codelist may be owned by an individual user, rather than an organisation. In this case, the entry in codelists.txt follows this structure: user/<username>/<codelist-id>/<version-id>.

If necessary, during initial development you can even import codelists this way before they are published (provided they have been put "under review", not in "draft" state), but ensure they are finalised and updated in your study before running in the real data.

Adding/updating a codelist CSV file🔗

Once you have listed the codelists you need from OpenCodelists in the codelists.txt file, you can download the specified files into the codelists/ folder using the opensafely program by running

opensafely codelists update

This command should be re-run every time a codelist is added or removed from the codelists.txt file. Running this command will automatically generate a file called codelists.json, which contains a dictionary of codelists files, the URLs they have been downloaded from, download dates, and secure hash algorithms (SHA). The file should not be manually edited; however, you will need to add and commit the change and push to GitHub. If you don't, or a newer version is available than that committed, the automated tests will fail with an error message. Beware that in Windows, if one or more of these codelist files is open then this command won't be able to run; close them first.

If necessary, you can also import CSVs not via OpenCodelists - just manually copy the CSV files into codelists/. However, we would recommend uploading these to OpenCodelists to import them as above. Note, if you are also using some codelists from OpenCodelists, any manually imported codelists should be stored in a local_codelists folder so that they are not overwritten in the next step, as manual changes to CSV files will be clobbered the next time the command is run.

See more on using Codelists in your study definition in Working with codelists.