
Encoding Models

A project reproducing and replicating the encoding models published by LeBel et al. 2023.

We documented our results in this Report.



Reproduce figures

Reproducing the figures requires the relevant dependencies and the correlation results of the encoding model runs. We provide these results in an online repository. See Reproduce correlation results below for how to regenerate them yourself.

  1. Clone the repository and change into the repository directory
git clone git@github.com:GabrielKP/enc.git
cd enc
  2. Set up virtual environment
# conda environment
conda create -n enc python=3.12
conda activate enc

# install package
pip install .

  3. Install git-annex and datalad (required to download data from LeBel et al. 2023)

  4. Download repository data

# Only download data required for plotting results
python src/encoders/download_data.py --figures [-d DATA_DIR]

Without -d DATA_DIR, the data is downloaded into the folder ds003020 inside the project directory. To download into a custom directory, specify -d DATA_DIR; we recommend naming the last folder ds003020, as that is the default dataset name.
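The directory rule above can be sketched in a few lines of Python. This is only an illustration: the function name resolve_data_dir and its arguments are hypothetical and not taken from download_data.py; only the default folder name ds003020 comes from the text.

```python
from pathlib import Path
from typing import Optional

DEFAULT_DATASET_NAME = "ds003020"  # default dataset folder name, as stated above


def resolve_data_dir(project_dir: str, data_dir: Optional[str] = None) -> Path:
    """Return the download target directory (hypothetical helper, not the project's API)."""
    if data_dir is None:
        # No -d DATA_DIR given: fall back to <project_dir>/ds003020.
        return Path(project_dir) / DEFAULT_DATASET_NAME
    # -d DATA_DIR given: use the custom directory as provided.
    return Path(data_dir).expanduser()
```

Whatever directory is used here is the one that should end up as DATA_DIR in config.yaml.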

  5. Set up or check config.yaml. The download script should create it automatically; if it does not, copy it from config.example.yaml. The most important entry is DATA_DIR.
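If you prefer to do the fallback copy from Python, a minimal sketch could look like the following. The helper name ensure_config is hypothetical; the two file names are the ones mentioned in this step, and the sketch assumes both live in the project root.

```python
import shutil
from pathlib import Path


def ensure_config(project_dir: str) -> Path:
    """Copy config.example.yaml to config.yaml if config.yaml is missing.

    Hypothetical helper; assumes both files sit in the project root.
    """
    root = Path(project_dir)
    config = root / "config.yaml"
    if not config.exists():
        # Never overwrite an existing config; only create a missing one.
        shutil.copyfile(root / "config.example.yaml", config)
    return config
```

After copying, DATA_DIR still has to be checked by hand, since the example file cannot know where the data was downloaded.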

[!NOTE] If you get the error ImportError: cannot import name 'getargspec' from 'inspect', try upgrading datalad: python -m pip install datalad --upgrade

  6. Download correlation results

Download the runs.zip file and unzip it so that you have a runs directory with the experiment folders as subdirectories:

runs/extension_ridgeCV
runs/replication_ridge_huth
runs/replication_ridgeCV
runs/reproduction
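To confirm the archive was unpacked at the right level, a quick check like the one below can help. It is a sketch, not part of the package: the function name missing_run_folders is hypothetical, and the folder names are the four listed above.

```python
from pathlib import Path

# Experiment folders expected inside runs/, as listed above.
EXPECTED_RUNS = (
    "extension_ridgeCV",
    "replication_ridge_huth",
    "replication_ridgeCV",
    "reproduction",
)


def missing_run_folders(runs_dir: str = "runs") -> list[str]:
    """Return the expected experiment folders that are absent from runs_dir."""
    root = Path(runs_dir)
    return [name for name in EXPECTED_RUNS if not (root / name).is_dir()]
```

An empty list means the layout matches. If all four names come back, the zip was probably extracted one level too deep (for example into runs/runs).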
  7. Install Inkscape (required for plotting)

Then open config.yaml and set the following values accordingly.

INKSCAPE_PATH: path/to/inkscape/binary
INKSCAPE_VERSION: X.Y.Z

On macOS, you can usually find the Inkscape binary as described here.
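If Inkscape is on your PATH, the two config values can be looked up programmatically. The sketch below is an assumption-laden helper, not part of this package: detect_inkscape and parse_inkscape_version are hypothetical names, and the parser assumes the output of inkscape --version starts with "Inkscape" followed by the version number.

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_inkscape_version(output: str) -> Optional[str]:
    """Extract a dotted version number from `inkscape --version` output."""
    match = re.search(r"Inkscape\s+(\d+(?:\.\d+)*)", output)
    return match.group(1) if match else None


def detect_inkscape() -> Optional[tuple[str, Optional[str]]]:
    """Return (binary path, version) or None if inkscape is not on PATH."""
    path = shutil.which("inkscape")
    if path is None:
        return None
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=False
    )
    return path, parse_inkscape_version(result.stdout)
```

The returned path and version are candidates for INKSCAPE_PATH and INKSCAPE_VERSION. On macOS the binary inside the application bundle is often not on PATH, in which case detect_inkscape returns None and the path has to be located manually.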

  8. Configure pycortex (required for plotting):

Script

python src/encoders/update_pycortex_config.py

Manual

Find the location of your pycortex config using the Python interpreter. With the virtual environment activated, type python on the command line, then execute the following commands:

import cortex
cortex.options.usercfg

This should print the path to the config file. Copy it and exit the interpreter.

# open the file with an editor of choice (e.g. vim)
vim path/to/options.cfg

Set the filestore entry to DATA_DIR/derivative/pycortex-db, where DATA_DIR is the directory of the LeBel et al. data repository. For example, if you did not specify a custom data directory, DATA_DIR is /path/to/this/repository/ds003020.
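The manual edit can also be expressed with Python's configparser. This is a rough sketch and not the contents of update_pycortex_config.py: the function name set_filestore is hypothetical, and it assumes that options.cfg is an INI file whose filestore key sits in a [basic] section.

```python
import configparser
from pathlib import Path


def set_filestore(options_cfg: str, data_dir: str) -> str:
    """Point the pycortex filestore at DATA_DIR/derivative/pycortex-db.

    Assumes the key lives in a [basic] section (assumption, verify in your file).
    """
    # interpolation=None keeps any '%' characters in existing values untouched.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(options_cfg)
    if not parser.has_section("basic"):
        parser.add_section("basic")
    filestore = str(Path(data_dir) / "derivative" / "pycortex-db")
    parser.set("basic", "filestore", filestore)
    with open(options_cfg, "w") as handle:
        parser.write(handle)
    return filestore
```

Note that configparser drops comments when it rewrites a file, so keep a backup of options.cfg if you rely on them.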

  9. Reproduce the plots in the figures:
# will create plots for all figures
python src/encoders/plot.py

Reproduce correlation results

  1. Follow steps 1 and 2 from above.

  2. Download data

# all stories for the 3 subjects in our analysis
python src/encoders/download_data.py --stories all --subjects UTS01 UTS02 UTS03
  3. Run test regression
python src/encoders/run_all.py \
  --cross_validation simple \
  --subject UTS02 \
  --feature eng1000 \
  --n_train_stories 2 \
  --n_repeats 3 \
  --ridge_implementation ridgeCV \
  --run_folder_name example
  4. Run regressions reproducing our results
# Reproduction
python -m lebel_encoding.run_all_replication \
--subject UTS01 UTS02 UTS03 \
--feature eng1000 \
--test_story wheretheressmoke \
--n_train_stories 1 2 3 5 7 9 11 13 15 17 19 21 23 25 \
--n_repeats 15 \
--trim 5 \
--ndelays 4 \
--nboots 20 \
--chunklen 10 \
--nchunks 10 \
--use_corr \
--run_folder_name reproduction


# Replication ridgeCV
python -m encoders.run_all \
--subject UTS01 UTS02 UTS03 \
--feature eng1000 \
--ndelays 4 \
--interpolation 'lanczos' \
--ridge_implementation 'ridgeCV' \
--n_train_stories 1 2 3 5 7 9 11 13 15 17 19 21 23 25 \
--test_story 'wheretheressmoke' \
--cross_validation 'simple' \
--n_repeats 15 \
--no_keep_train_stories_in_mem \
--run_folder_name replication_ridgeCV


# Replication ridge_huth
python -m encoders.run_all \
--subject UTS01 UTS02 UTS03 \
--feature eng1000 \
--n_train_stories 1 2 3 5 7 9 11 13 15 17 19 21 23 25 \
--test_story wheretheressmoke \
--cross_validation 'simple' \
--interpolation 'lanczos' \
--ridge_implementation 'ridge_huth' \
--n_repeats 15 \
--ndelays 4 \
--nboots 20 \
--chunklen 10 \
--nchunks 10 \
--use_corr \
--no_keep_train_stories_in_mem \
--run_folder_name replication_ridge_huth


# Extension
python -m encoders.run_all \
--subject UTS01 UTS02 UTS03 \
--feature 'envelope' \
--do_shuffle \
--ndelays 4 \
--interpolation 'lanczos' \
--ridge_implementation 'ridgeCV' \
--n_train_stories 1 2 3 5 7 9 11 13 15 17 19 21 23 25 \
--test_story 'wheretheressmoke' \
--cross_validation 'simple' \
--n_repeats 15 \
--no_keep_train_stories_in_mem \
--run_folder_name extension_ridgeCV

You will likely need to run the analyses on an HPC system due to RAM requirements.

For examples of how we deployed the scripts on a cluster, see the hpc folder.
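As an illustration of how the long commands above might be generated for separate cluster jobs, the sketch below builds one replication_ridgeCV command per subject. It is not taken from the hpc folder: build_command, SUBJECTS and TRAIN_SIZES are hypothetical names, and the source does not state whether separate invocations can safely share one run folder, so treat that as an assumption to verify. All flags are copied from the commands above.

```python
import shlex

SUBJECTS = ["UTS01", "UTS02", "UTS03"]
# Training-set sizes used above: 1 2 3, then every odd number from 5 to 25.
TRAIN_SIZES = [1, 2, 3] + list(range(5, 26, 2))


def build_command(subject: str, run_folder_name: str = "replication_ridgeCV") -> list[str]:
    """Assemble the ridgeCV replication command for one subject (hypothetical helper)."""
    return [
        "python", "-m", "encoders.run_all",
        "--subject", subject,
        "--feature", "eng1000",
        "--ndelays", "4",
        "--interpolation", "lanczos",
        "--ridge_implementation", "ridgeCV",
        "--n_train_stories", *[str(n) for n in TRAIN_SIZES],
        "--test_story", "wheretheressmoke",
        "--cross_validation", "simple",
        "--n_repeats", "15",
        "--no_keep_train_stories_in_mem",
        "--run_folder_name", run_folder_name,
    ]


# Print one shell-ready line per subject, e.g. to paste into a job script.
for subject in SUBJECTS:
    print(shlex.join(build_command(subject)))
```

Building the argument list in code keeps the flag values in one place, which helps when the same settings must be repeated across several job submissions.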

Development setup

  1. Install poetry (tested with poetry==2.1.2)
  2. Run the following commands:
# set up conda environment (optional)
conda create -n enc python=3.12
conda activate enc

# install dependencies
poetry install

# install pre-commit
pre-commit install

# download some data for testing
python src/encoders/download_data.py # subject 2 & a few stories

# set up the config with the editor of your choice
nano config.yaml

Team

License