Encoding Models

A project reproducing and replicating the encoding models published by LeBel et al. (2023).

We documented our results in this Report.

Reproducing the published figures

This guide shows how to reproduce the figures from the precomputed model performance scores (model-brain correlations) of all experiments, using the provided code. It requires installing the relevant dependencies and downloading the correlation results of the encoding model runs.

1. Clone repository, navigate into created repository directory

git clone git@github.com:GabrielKP/enc.git
cd enc

2. Set up a virtual environment and install dependencies

Next, create a virtual environment with Python 3.12 using your preferred manager.

For example, using conda:

# conda environment
conda create -n enc python=3.12
conda activate enc

Or using uv (in project root directory):

uv venv --python 3.12 --seed # creates .venv environment folder and installs python and pip
source .venv/bin/activate # activate the environment

Now that you have Python and pip ready, install the project code and its dependencies:

# install package
pip install .
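
To confirm that the activated environment provides the expected interpreter, a minimal sketch along these lines can be run. The function name describe_environment is hypothetical and not part of the project; the check only compares against the Python 3.12 version named above.

```python
import os
import sys

REQUIRED = (3, 12)  # Python version named in the setup step above


def describe_environment() -> dict:
    """Collect basic facts about the interpreter that is currently running."""
    return {
        "python": sys.version_info[:2],
        "version_ok": sys.version_info[:2] == REQUIRED,
        # venv/uv environments change sys.prefix; conda environments
        # instead export CONDA_DEFAULT_ENV.
        "in_venv": sys.prefix != sys.base_prefix,
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If version_ok is False, the environment was probably not activated or was created with a different Python version.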

3. Install git-annex and datalad (required to download data from LeBel et al. 2023)

4. Download the subject-specific cortical surface data

You can use our wrapper script, which downloads the data from the original fMRI data repository with DataLad:

# Only download data required for plotting results
python src/encoders/download_data.py --figures [--data_dir DATA_DIR]

Without --data_dir DATA_DIR, the data is downloaded into the folder ds003020 in the project directory. To download the data into a custom directory, specify --data_dir DATA_DIR (we recommend naming the last folder ds003020, as that is the default dataset name).

5. Set up/check `config.yaml`.

The config.yaml should be created automatically when you run the download script in step 4. If it was not, you can copy the example config file. Make sure that DATA_DIR points to the folder into which you downloaded the data in step 4:

CACHE_DIR: .cache
DATA_DIR: ds003020 # <-- should point to the download location
RUNS_DIR: runs
INKSCAPE_PATH: /path/to/inkscape
INKSCAPE_VERSION: X.Y.Z
TR_LEN: 2.0
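
As a quick sanity check of the config, the following sketch reads the flat KEY: value lines shown above and tests whether DATA_DIR exists. It is an illustration only: the helper names read_flat_config and check_data_dir are hypothetical, the parser is a deliberately simple stand-in for a real YAML library, and resolving a relative DATA_DIR against the project root is an assumption.

```python
from pathlib import Path


def read_flat_config(path):
    """Parse simple `KEY: value` lines; text after '#' is treated as a comment."""
    config = {}
    for line in Path(path).read_text().splitlines():
        line = line.split("#", 1)[0].strip()
        if not line or ":" not in line:
            continue
        key, value = line.split(":", 1)
        config[key.strip()] = value.strip()
    return config


def check_data_dir(config, project_root="."):
    """Return the resolved DATA_DIR and whether it exists on disk."""
    data_dir = Path(config["DATA_DIR"]).expanduser()
    if not data_dir.is_absolute():
        # Assumption: relative paths are relative to the project root.
        data_dir = Path(project_root) / data_dir
    return data_dir, data_dir.is_dir()


if __name__ == "__main__":
    resolved, exists = check_data_dir(read_flat_config("config.yaml"))
    print(f"DATA_DIR -> {resolved} (exists: {exists})")
```

Note that this simple parser does not support values containing a '#' character or nested YAML structures.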

[!NOTE] If you get the error ImportError: cannot import name 'getargspec' from 'inspect', try upgrading datalad: python -m pip install datalad --upgrade
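
For background on this error: inspect.getargspec was removed from the standard library in Python 3.11, so library releases that still import it fail under Python 3.12. The sketch below shows the maintained replacement, inspect.getfullargspec, applied to a hypothetical function that exists only for illustration.

```python
import inspect


def example(subject, feature="eng1000", *extra):
    """Hypothetical function, used only for illustration."""


# getargspec no longer exists on Python 3.11 and later.
print("getargspec available:", hasattr(inspect, "getargspec"))

# getfullargspec is the maintained replacement.
spec = inspect.getfullargspec(example)
print(spec.args, spec.varargs, spec.defaults)
```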

6. Download the pre-computed experiment results (model performance scores)

Download runs.zip and unzip it so that you have a runs directory containing the results of the individual experiments in separate subfolders:

runs/extension_ridgeCV
runs/replication_ridge_huth
runs/replication_ridgeCV
runs/reproduction

[!NOTE] The runs folder should be placed in the project root (i.e. at the same level as data, src etc.)
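
To verify that the archive was unpacked in the right place, a small check like the following can be run from the project root. The function name missing_runs is hypothetical; the folder names are the four listed above.

```python
from pathlib import Path

EXPECTED_RUNS = (
    "extension_ridgeCV",
    "replication_ridge_huth",
    "replication_ridgeCV",
    "reproduction",
)


def missing_runs(runs_dir="runs"):
    """Return the expected run subfolders that are absent from runs_dir."""
    root = Path(runs_dir)
    return [name for name in EXPECTED_RUNS if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_runs()
    print("all run folders present" if not missing else f"missing: {missing}")
```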

7. Install inkscape (required for plotting):

Open the config.yaml and set the following values accordingly.

INKSCAPE_PATH: path/to/inkscape/binary
INKSCAPE_VERSION: X.Y.Z

On macOS, you can usually find Inkscape as described here.

8. Configure pycortex (required for plotting):

Using the provided script

You can use the script in the repository to configure pycortex:

python src/encoders/update_pycortex_config.py

Configure manually

Find the location of your pycortex config using the Python interpreter. With the virtual environment activated, type python on the command line, then execute the following commands:

import cortex
cortex.options.usercfg

This should give you the path to the config file. Copy it and exit the interpreter.

# open the file with an editor of choice (e.g. vim)
vim path/to/options.cfg

Set the filestore entry to DATA_DIR/derivative/pycortex-db, where DATA_DIR is the directory of the LeBel et al. data repository. For example, if you did not specify a custom data directory, DATA_DIR is /path/to/this/repository/ds003020.
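
The manual edit can also be scripted. The sketch below is not the repository's update_pycortex_config.py; it is a minimal illustration that assumes the pycortex config is an INI file whose filestore key lives in a section named basic. The function name set_filestore is hypothetical.

```python
import configparser
from pathlib import Path


def set_filestore(cfg_path, data_dir):
    """Point the pycortex filestore at DATA_DIR/derivative/pycortex-db."""
    # interpolation=None keeps '%' characters in existing values untouched
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(cfg_path)
    if not parser.has_section("basic"):  # assumed section name
        parser.add_section("basic")
    filestore = Path(data_dir) / "derivative" / "pycortex-db"
    parser.set("basic", "filestore", str(filestore))
    with open(cfg_path, "w") as handle:
        parser.write(handle)
    return filestore


# Usage, with the path printed by cortex.options.usercfg:
# set_filestore("path/to/options.cfg", "/path/to/this/repository/ds003020")
```

Be aware that configparser drops comments when it rewrites a file, so back up the config first if you want to keep them.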

9. Run the script that plots the data:

You can now run the plotting script. From the project root folder call:

# will create plots for all figures
python src/encoders/plot.py

The script creates a plots folder in the project root folder with three subfolders, one for each published figure:

./plots
├── figure1
│   ├── colorbar.pdf
│   ├── colorbar.png
│   ├── colorbar.svg
│   ├── replication_ridgeCV_semantic_performance.pdf
│   ├── replication_ridgeCV_semantic_performance.png
│   ├── replication_ridgeCV_semantic_performance.svg
│   ├── reproduction_semantic_performance.pdf
│   ├── reproduction_semantic_performance.png
│   ├── reproduction_semantic_performance.svg
│   ├── training_curve_replication_ridgeCV.pdf
│   ├── training_curve_replication_ridgeCV.png
│   ├── training_curve_replication_ridgeCV.svg
│   ├── training_curve_reproduction.pdf
│   ├── training_curve_reproduction.png
│   └── training_curve_reproduction.svg
├── figure2
│   ├── training_curve_ridge_huth.pdf
│   ├── training_curve_ridge_huth.png
│   ├── training_curve_ridge_huth.svg
│   ├── training_curve_ridgeCV.pdf
│   ├── training_curve_ridgeCV.png
│   └── training_curve_ridgeCV.svg
└── figure3
    ├── colorbar.pdf
    ├── colorbar.png
    ├── colorbar.svg
    ├── semantic_performance_extension_ridgeCV.pdf
    ├── semantic_performance_extension_ridgeCV.png
    ├── semantic_performance_extension_ridgeCV.svg
    ├── training_curve_extension_ridgeCV.pdf
    ├── training_curve_extension_ridgeCV.png
    └── training_curve_extension_ridgeCV.svg
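
Each plot in the tree above is written in three formats (pdf, png, svg). As a rough completeness check after plotting, the following sketch reports plots for which one of the formats is missing. The function name incomplete_plots is hypothetical, and the check only inspects files that were actually written, so it cannot detect a plot that is absent in all three formats.

```python
from collections import defaultdict
from pathlib import Path

FORMATS = {".pdf", ".png", ".svg"}


def incomplete_plots(plots_dir="plots"):
    """Map each plot (figure folder, file stem) to the formats it is missing."""
    found = defaultdict(set)
    for path in Path(plots_dir).glob("figure*/*"):
        if path.suffix in FORMATS:
            found[(path.parent.name, path.stem)].add(path.suffix)
    return {
        plot: sorted(FORMATS - suffixes)
        for plot, suffixes in found.items()
        if suffixes != FORMATS
    }


if __name__ == "__main__":
    for (figure, stem), missing in incomplete_plots().items():
        print(f"{figure}/{stem}: missing {', '.join(missing)}")
```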

Reproduce correlation results

Follow steps 1 and 2 from above.
Download data

# all stories for the 3 subjects in our analysis
python src/encoders/download_data.py --stories all --subjects UTS01 UTS02 UTS03

Run test regression

python src/encoders/run_all.py \
  --cross_validation simple \
  --subject UTS02 \
  --feature eng1000 \
  --n_train_stories 2 \
  --n_repeats 3 \
  --ridge_implementation ridgeCV \
  --run_folder_name example

Run regressions reproducing our results

# Reproduction
python -m lebel_encoding.run_all_replication \
--subject UTS01 UTS02 UTS03 \
--feature eng1000 \
--test_story wheretheressmoke \
--n_train_stories 1 2 3 5 7 9 11 13 15 17 19 21 23 25 \
--n_repeats 15 \
--trim 5 \
--ndelays 4 \
--nboots 20 \
--chunklen 10 \
--nchunks 10 \
--use_corr \
--run_folder_name reproduction


# Replication ridgeCV
python -m encoders.run_all \
--subject UTS01 UTS02 UTS03 \
--feature eng1000 \
--ndelays 4 \
--interpolation 'lanczos' \
--ridge_implementation 'ridgeCV' \
--n_train_stories 1 2 3 5 7 9 11 13 15 17 19 21 23 25 \
--test_story 'wheretheressmoke' \
--cross_validation 'simple' \
--n_repeats 15 \
--no_keep_train_stories_in_mem \
--run_folder_name replication_ridgeCV


# Replication ridge_huth
python -m encoders.run_all \
--subject UTS01 UTS02 UTS03 \
--feature eng1000 \
--n_train_stories 1 2 3 5 7 9 11 13 15 17 19 21 23 25 \
--test_story wheretheressmoke \
--cross_validation 'simple' \
--interpolation 'lanczos' \
--ridge_implementation 'ridge_huth' \
--n_repeats 15 \
--ndelays 4 \
--nboots 20 \
--chunklen 10 \
--nchunks 10 \
--use_corr \
--no_keep_train_stories_in_mem \
--run_folder_name replication_ridge_huth


# Extension
python -m encoders.run_all \
--subject UTS01 UTS02 UTS03 \
--feature 'envelope' \
--do_shuffle \
--ndelays 4 \
--interpolation 'lanczos' \
--ridge_implementation 'ridgeCV' \
--n_train_stories 1 2 3 5 7 9 11 13 15 17 19 21 23 25 \
--test_story 'wheretheressmoke' \
--cross_validation 'simple' \
--n_repeats 15 \
--no_keep_train_stories_in_mem \
--run_folder_name extension_ridgeCV

You will likely need to run the analyses on an HPC system due to the RAM requirements.

For examples of how we deployed the scripts on a cluster, see the hpc folder.
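
When preparing cluster jobs, it can be convenient to generate one command per subject. The sketch below does this for the "Replication ridgeCV" run, using only the flags shown above. It is not taken from the hpc folder, and it rests on assumptions: that a single subject can be passed to --subject (as in the test regression), and that separate per-subject runs can share one run folder. Verify both before relying on it. The function name replication_ridgecv_command is hypothetical.

```python
import shlex

SUBJECTS = ["UTS01", "UTS02", "UTS03"]
# 1, 2, then every odd number from 3 to 25, as in the commands above
N_TRAIN_STORIES = [1, 2] + list(range(3, 26, 2))


def replication_ridgecv_command(subject):
    """Build the 'Replication ridgeCV' call for a single subject."""
    return [
        "python", "-m", "encoders.run_all",
        "--subject", subject,
        "--feature", "eng1000",
        "--ndelays", "4",
        "--interpolation", "lanczos",
        "--ridge_implementation", "ridgeCV",
        "--n_train_stories", *map(str, N_TRAIN_STORIES),
        "--test_story", "wheretheressmoke",
        "--cross_validation", "simple",
        "--n_repeats", "15",
        "--no_keep_train_stories_in_mem",
        "--run_folder_name", "replication_ridgeCV",
    ]


# Print the commands instead of executing them, e.g. to paste into job scripts.
for subject in SUBJECTS:
    print(shlex.join(replication_ridgecv_command(subject)))
```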

Development setup

Install poetry (tested with poetry==2.1.2), then run the following commands:

# set up conda environment (optional)
conda create -n enc python=3.12
conda activate enc

# install dependencies
poetry install

# install pre-commit
pre-commit install

# download some data for testing
python src/encoders/download_data.py # subject 2 & few stories

# Setup the config with the editor of your choice
nano config.yaml
