patkit

Configuration files

NOTE: Parts of the description haven’t yet been implemented for version 0.16.

All (or almost all) PATKIT configuration should be stored with the data. This is a philosophical choice: even GUI settings (such as which frequencies are shown on a spectrogram and with which color map) can affect analysis results. The idea is that standardising analysis and annotation settings across a project or research group should be easy. If the current way PATKIT attempts to do this does not work for you, please get in touch and we’ll see how things can be improved.

There are some very simple ways of circumventing this system, and PATKIT will not try to prevent them. However, things might not work quite as intended if the assumptions about where files are stored are broken. Please refer to Data Management for how PATKIT expects recorded/imported data to be kept separate from PATKIT-derived/saved data and configuration files.

At the time of the v0.16.0 release, the GitHub repository contains an example of configuration files in the scenarios folder and an example of a simulation configuration file in example_configs/ultrafest2024.

TODO: 0.20 Make full examples of data and configuration of various complexity.

Default settings and user preferences

PATKIT does contain default parameter files, and you can save your own preferred settings, which PATKIT will place in the ~/.patkit (on Linux/macOS) or %userprofile%\.patkit (on Windows) folder. However, whenever PATKIT uses settings with data, it stores those settings with the data. This means that default settings and user preferences are copied to the data directories as needed.
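The copy-on-use behaviour described above could look roughly like the following minimal sketch. The helper names `user_config_dir` and `ensure_local_config` are hypothetical and not part of PATKIT's API; the sketch assumes that an existing local copy always wins over the user-level file.

```python
import shutil
from pathlib import Path


def user_config_dir() -> Path:
    # Path.home() is ~ on Linux/macOS and %USERPROFILE% on Windows.
    return Path.home() / ".patkit"


def ensure_local_config(filename: str, source_dir: Path, data_dir: Path) -> Path:
    """Copy a settings file next to the data unless a local copy already exists.

    Hypothetical helper for illustration only, not PATKIT's implementation.
    """
    local_file = data_dir / filename
    if not local_file.exists():
        data_dir.mkdir(parents=True, exist_ok=True)
        # copy2 also preserves file timestamps.
        shutil.copy2(source_dir / filename, local_file)
    return local_file
```

Because the local copy is never overwritten, later changes to user preferences do not silently change the settings already stored with a dataset.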

Command history

By default, the command history of the interactive command line is stored with the global (user-specific) configuration, in the history file inside the .patkit folder. It is plain text in the same format as the .python_history file. In fact, with a bit of tweaking you could use .python_history instead, because the PATKIT interactive interpreter is just a Python interpreter with PATKIT data preloaded. However, this is only recommended if you know what you are doing.

Configuration files

PATKIT has a number of configuration files. Briefly they are

patkit_data.yaml defines how data should be processed and which metrics, statistics (in the PATKIT-specific sense of an aggregate statistic such as the mean image of an ultrasound recording), and annotations should be generated.
patkit_gui.yaml configures the PATKIT annotator GUI. It specifies what data is displayed and how. Essentially these are the settings that should be standardised for different annotators working on the same project or dataset.
patkit-publish.yaml specifies graphs, exported data and other outputs that are intended for publication.
patkit-simulation.yaml specifies simulated data and analysis of that data.
patkit-manifest.yaml is autogenerated by PATKIT. It lists the Scenarios that refer to the recorded data and is stored with the recorded data. Users can add to this file if a Scenario is copied from one computer or user to another. This requires some care but will save a lot of time if the PATKIT data in question takes a lot of computation to generate.
session-config.yaml specifies the data sources in a session and the path structure, that is, where to find files of different types.
spline-config.yaml tells PATKIT how to import a given set of splines (under import_config) and how to trim the splines before analysis (under data_config).

Below you can find what is intended to be a comprehensive list of configuration fields available in the different config files.

`patkit_data.yaml`

The following parameters are set globally so that they may be omitted from local configuration files. They can be overridden locally, though, and should be when, for example, different parts of the data have been recorded in areas with different mains frequencies.

# What precision is treated as equal when comparing annotation boundaries.
epsilon: 0.00001

# Used in filtering sound signals before beep detection.
mains_frequency: 50
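As an illustration of how `epsilon` might be used, here is a minimal sketch of a tolerance-based comparison of two annotation boundaries. The function name is hypothetical, and the sketch assumes boundary times are in seconds (as in TextGrids) and that a difference exactly equal to `epsilon` still counts as equal.

```python
EPSILON = 0.00001  # the value of `epsilon` in patkit_data.yaml


def boundaries_equal(a: float, b: float, epsilon: float = EPSILON) -> bool:
    # Hypothetical helper: two boundary times count as the same boundary
    # when they differ by at most epsilon (assumed to be seconds).
    return abs(a - b) <= epsilon
```

A tolerance is needed because boundary times that pass through floating point arithmetic rarely compare exactly equal.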

Path to the recorded data. This should be given relative to the `patkit_data.yaml` file.

recorded_data_path: "../../recorded_data/tongue_data_1_1"
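Resolving such a relative path amounts to joining it onto the directory that holds `patkit_data.yaml`. The sketch below is a minimal illustration with a hypothetical function name; the example directory layout is assumed.

```python
import os
from pathlib import Path


def resolve_recorded_data_path(config_file: Path, recorded_data_path: str) -> Path:
    # Interpret the path relative to the directory containing patkit_data.yaml.
    # normpath collapses '..' lexically, without touching the file system.
    return Path(os.path.normpath(config_file.parent / recorded_data_path))
```

For a config file at `/projects/study/derived_data/tongue_data_1_1/patkit_data.yaml`, the value above resolves to `/projects/study/recorded_data/tongue_data_1_1`. Note that lexical normalisation can give surprising results if the path passes through symbolic links.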

Various flags are gathered in one group.

detect_beep specifies whether PATKIT should attempt to detect a 1 kHz beep at the beginning of recordings.
If test is set to True, PATKIT will process only the first 10 recordings. This is handy when testing out settings.

flags:
  detect_beep: True # Should onset beep detection be run?
  test: False # Run on only the first 10 recordings.
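The effect of the test flag can be sketched as follows. The function name and the list of recordings are hypothetical; only the limit of 10 recordings comes from the description above.

```python
TEST_RECORDING_LIMIT = 10  # the documented number of recordings in test mode


def select_recordings(recordings: list, flags: dict) -> list:
    # Hypothetical helper: with `test: True`, keep only the first ten
    # recordings; otherwise (or if the flag is missing) keep all of them.
    if flags.get("test", False):
        return recordings[:TEST_RECORDING_LIMIT]
    return recordings
```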

Aggregate images can be generated from ultrasound images with different metrics. By default the image is generated from raw data, but this can be overridden with run_on_interpolated_data.

Also by default, the images will be preloaded, that is, they will be generated when PATKIT is started rather than when a recording is opened in the annotator.

Finally, it is generally a good idea to release data memory. This means that ultrasound videos are not kept in the computer's memory after a recording has been processed, unless that recording is being displayed. Only set this option to False if you know that you have a very large amount of RAM at your disposal. This option recurs in many other contexts below.

aggregate_image_arguments:
  metrics:
    - 'mean'
  run_on_interpolated_data: False
  preload: True
  release_data_memory: True
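The 'mean' metric can be understood as a per-pixel average over all frames of a recording. The following minimal NumPy sketch illustrates the idea; it is not PATKIT's implementation, and the `(n_frames, n_scanlines, n_samples)` array layout is an assumption.

```python
import numpy as np


def mean_image(frames: np.ndarray) -> np.ndarray:
    """Average an ultrasound recording over time, pixel by pixel.

    frames is assumed to have shape (n_frames, n_scanlines, n_samples).
    """
    return frames.astype(float).mean(axis=0)
```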

Pixel difference (PD) can be calculated with several different norms, and these can be specified in parallel. The same is true of the timesteps used in the calculations. mask_images is used to optionally mask either the top or the bottom of the image from analysis. If pd_on_interpolated_data is True, PD will be calculated on the interpolated, human-readable fanned images. preload and release_data_memory work as described above.

pd_arguments:
  'norms':
    - 'l1'
    - 'l2'
  'timesteps':
    - 1
  mask_images: False
  pd_on_interpolated_data: False
  preload: True
  release_data_memory: True
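To show what the norms and timesteps mean, here is a minimal NumPy sketch of a generic PD calculation: the difference between frames `t - timestep` and `t` is flattened into a vector and its l1 or l2 norm is taken. This is an illustration only. The function name is hypothetical, and masking, interpolation and any normalisation PATKIT may apply are left out.

```python
import numpy as np

NORM_ORDERS = {"l1": 1, "l2": 2}


def pixel_difference(frames: np.ndarray, norm: str = "l2", timestep: int = 1) -> np.ndarray:
    """Return one PD value per frame pair (t - timestep, t)."""
    frames = frames.astype(float)
    diffs = frames[timestep:] - frames[:-timestep]
    # Flatten each difference image into a vector and take its l1 or l2 norm.
    flat = diffs.reshape(diffs.shape[0], -1)
    return np.linalg.norm(flat, ord=NORM_ORDERS[norm], axis=1)
```

With the configuration above, PD would be computed once for each combination of the listed norms and timesteps, for example `pixel_difference(frames, "l1", 1)` and `pixel_difference(frames, "l2", 1)`.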

Spline metric arguments are very similar to PD arguments in syntax and meaning, except that they specify the metrics calculated on tongue splines. The available metrics include annd (Average Nearest Neighbour Distance), mpbpd (Median Point-by-Point Distance), modified_curvature (Modified Curvature Index), and fourier (Fourier coefficients).

spline_metric_arguments:
 'metrics':
   - 'annd'
   - 'mpbpd'
   - 'modified_curvature'
   - 'fourier'
 'timesteps':
   - 3
 'release_data_memory': False
 'preload': True
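For orientation, the following sketch shows common textbook forms of two of these metrics for a single pair of splines given as (x, y) points. It is not PATKIT's implementation: the symmetric form of ANND and the requirement of equal point counts for MPBPD are assumptions, and PATKIT may normalise differently. With a timestep of 3, such a comparison would be made between the spline at frame t and the spline at frame t - 3.

```python
import math
from statistics import median
from typing import Sequence, Tuple

Point = Tuple[float, float]


def annd(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Symmetric average nearest neighbour distance between two splines."""
    a_to_b = [min(math.dist(p, q) for q in b) for p in a]
    b_to_a = [min(math.dist(q, p) for p in a) for q in b]
    return (sum(a_to_b) + sum(b_to_a)) / (len(a) + len(b))


def mpbpd(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Median distance between corresponding points of two splines."""
    if len(a) != len(b):
        raise ValueError("MPBPD needs splines with the same number of points")
    return median(math.dist(p, q) for p, q in zip(a, b))
```

ANND does not need point correspondence, while MPBPD does, which is why the latter assumes both splines have the same number of points.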

TODO 0.24: Verify that all these work. Especially the exclusion list.

Distance matrices are used for evaluating ultrasound probe alignment. They have their own exclusion list (give it as a relative path) and metrics (currently only mean_squared_error).

slice_max_step simulates rotating the probe by slicing incrementally so that the sector always stays the same size. This parameter determines how many steps of size one to take. slice_max_step and slice_step_to are mutually exclusive: slice_step_to instead generates a pair of maximally distant sectors for each step size from one to slice_step_to.
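One possible reading of these two options, expressed as index ranges over the scanlines of a frame, is sketched below. This interpretation and the function names are assumptions for illustration; PATKIT may slice a different dimension or define the sectors differently.

```python
from typing import List, Tuple

Sector = Tuple[int, int]  # (first scanline, one past the last scanline)


def max_step_sectors(n_scanlines: int, slice_max_step: int) -> List[Sector]:
    # Equal-width sectors, each shifted one scanline further than the previous.
    width = n_scanlines - slice_max_step
    return [(start, start + width) for start in range(slice_max_step + 1)]


def step_to_sector_pairs(n_scanlines: int, slice_step_to: int) -> List[Tuple[Sector, Sector]]:
    # For each step size s, the two equal-width sectors furthest apart:
    # one flush with the first scanline and one flush with the last.
    return [((0, n_scanlines - s), (s, n_scanlines)) for s in range(1, slice_step_to + 1)]
```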

Setting sort to True sorts the recordings by their prompts. If sort_criteria is also set, the prompts are matched against the criteria in the order given and then sorted alphabetically within each match group. Recordings that match none of the criteria are placed in a final group, which is also sorted alphabetically.

preload and release_data_memory work as above.

distance_matrix_arguments:
 exclusion_list: "alignment/data/patkit_exclusion_list.yaml"
 metrics:
   - 'mean_squared_error'
 preload: True
 release_data_memory: False
#  slice_max_step: 6
 slice_step_to: 6
 sort: True
 sort_criteria:
   - 'i'
   - 'o'
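The grouping described for sort and sort_criteria can be sketched as follows. The function name is hypothetical, and two details are assumptions: a criterion matches when it occurs as a substring of the prompt, and a prompt goes into the group of the first criterion it matches.

```python
from typing import List, Sequence


def sort_prompts(prompts: Sequence[str], sort_criteria: Sequence[str] = ()) -> List[str]:
    """Group prompts by the first criterion they contain, then sort each group."""
    groups: List[List[str]] = [[] for _ in sort_criteria]
    unmatched: List[str] = []
    for prompt in prompts:
        for index, criterion in enumerate(sort_criteria):
            if criterion in prompt:
                groups[index].append(prompt)
                break
        else:
            # No criterion matched: the prompt goes to the final group.
            unmatched.append(prompt)
    result: List[str] = []
    for group in groups + [unmatched]:
        result.extend(sorted(group))
    return result
```

With the criteria `'i'` and `'o'` from the example, prompts containing "i" come first, then those containing "o", then everything else, each group in alphabetical order.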

`patkit_gui.yaml`

Most of these parameters deal with data display.

General parameters for axes in the main plot

Height ratio of the data display area vs the TextGrid tier display area. This does not directly control the height of individual data displays or individual tier displays. Instead, it controls the ratio between the combined height of all data displays and the combined height of all tier displays.

data_and_tier_height_ratios: 
  data: 2
  tier: 1
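To illustrate how a group ratio differs from per-axes heights, the sketch below turns `data: 2` and `tier: 1` into one height fraction per axes. The function name is hypothetical, and it assumes each group's share is divided equally among its axes. The resulting list could, for example, be passed (as floats) as `height_ratios` to `matplotlib.gridspec.GridSpec`.

```python
from fractions import Fraction
from typing import List


def axes_height_ratios(n_data: int, n_tiers: int, data: int = 2, tier: int = 1) -> List[Fraction]:
    """Split the figure height so that all data axes : all tier axes = data : tier."""
    if n_tiers == 0:
        # Without tiers the data axes take the whole height.
        return [Fraction(1, n_data)] * n_data
    total = data + tier
    data_each = Fraction(data, total) / n_data
    tier_each = Fraction(tier, total) / n_tiers
    return [data_each] * n_data + [tier_each] * n_tiers
```

With one data axes and two tiers this gives 2/3, 1/6 and 1/6, so the data area stays twice as tall as the tier area however many tiers there are.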

Shared configuration for data axes and tier axes.

general_axes_params:
  data_axes:
    sharex: True
    auto_ylim: True
  tier_axes:
    sharex: True

Axes definitions for the main plot

data_axes:
  PD l1:
    modalities:
      - PD l1 on RawUltrasound
#    modality_names:
#      - l1
    sharex: True
    auto_ylim: True
  PD l2:
    modalities:
      - PD l2 on RawUltrasound
#    modality_names:
#      - l2
    sharex: True
    ylim:
      - 100
      - 2000
  PD normalised:
    modalities:
      - PD l1 on RawUltrasound
      - PD l2 on RawUltrasound
    modality_names:
      - l1
      - l2
    sharex: True
    normalisation: both # none, peak, bottom, both
  spectrogram2:
    sharex: True
  wav:
    sharex: True
  # density:
  #   sharex: False

TextGrid display parameters

# Tiers drawn on the data axes. Ignored if the tiers are not found in the TextGrid.  
pervasive_tiers:
  - Segment
  - Segments
  - segment
  - segments
  - phoneme

x (time) axis parameters

You can either set the limits or set auto_xlim to True, which means that the whole recording will be displayed. This is implemented as a greedy ‘all’: if some modalities extend further in time than others, the limits are widened to include all of them.

# Initial limits for x-axis
#xlim:
#  - -.25
#  - 1.5
auto_xlim: True
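The greedy ‘all’ can be sketched as taking the earliest start and the latest end over all modalities. The function name and the `(start, end)` input format are hypothetical.

```python
from typing import Sequence, Tuple


def auto_xlim(time_ranges: Sequence[Tuple[float, float]]) -> Tuple[float, float]:
    """Greedy x limits: the earliest start and the latest end of any modality."""
    starts, ends = zip(*time_ranges)
    return min(starts), max(ends)
```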

General display style parameters

# Font parameters
default_font_size: 10

Dark vs light mode. Accepted values are dark, follow_system, and light.

color_scheme: dark

`patkit-publish.yaml`

This file will be documented in a later release.

TODO 1.0: Document this.

`patkit-simulation.yaml`

Simulations are run on mock-up tongue splines/contours extracted manually from Peter Ladefoged’s Vowels and Consonants. A commented version of the configuration walked through below can be found in the example_configs folder of PATKIT’s GitHub repository.

The first parameters define where to save the resulting plots, if existing files should be overwritten, and what message prefix should be used in logging messages. If the overwrite_plots parameter is omitted, overwriting will be confirmed individually for each existing plot file.

output_directory: "ultrafest2024/"
overwrite_plots: True
logging_notice_base: "Ultrafest 2024 simulation: "

Sound/contour selection is defined with IPA characters:

sounds: 
  - 'æ'
  - 'i'

This parameter defines the point-wise perturbations used, in millimeters.

perturbations: 
  - -2 
  - -1 
  - -.5 
  - .5 
  - 1 
  - 2
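A minimal sketch of point-wise perturbation, changing one contour point at a time by each listed amount, is shown below. The function name is hypothetical, and the sketch assumes the contour is given as radii in polar coordinates and that each point is perturbed radially; the actual direction of perturbation in PATKIT's simulation may differ.

```python
from typing import List, Sequence

PERTURBATIONS = [-2, -1, -.5, .5, 1, 2]  # millimeters, as in the configuration


def perturbed_contours(radii: Sequence[float], perturbations: Sequence[float]) -> List[List[float]]:
    """Return one contour per (point, perturbation) pair, changing one point at a time."""
    contours = []
    for index in range(len(radii)):
        for delta in perturbations:
            contour = list(radii)
            contour[index] += delta
            contours.append(contour)
    return contours
```

For a contour with n points this yields n × 6 perturbed contours with the configuration above, which is what allows the change in a metric to be plotted per perturbed point.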

This parameter group defines the spline distance metric simulation.

contour_distance:
  metric: "annd"
  timestep: 1
  sound_pair_params:
    sounds:
      - 'æ'
      - 'i'
    perturbed:
      - 'second'
      - 'first'
    combinations: full_cartesian # also accepted: only_cross, only_self

This (rather simple) parameter group defines the spline shape metric simulation.

spline_shape_params:
  metric: 'mci'

Finally, the result figures are defined with the following parameter groups. To omit a figure from the final plotting, comment out or delete its group in the configuration file.

# This produces a plot with change in the distance metric plotted
# as a ray on the perturbed point of the contour.
distance_metric_ray_plot:
  figure_size:
    - 10.1
    - 4.72
  scale: 200
  color_threshold:
    - .1
    - -.1

# Same as distance_metric_ray_plot but for shape metrics.
shape_metric_ray_plot:
  figure_size:
    - 7
    - 3.35
  scale: 20
  color_threshold:
    - 2
    - .5

# Two panel plot that demonstrates how the perturbations are applied.
# See the Ultrafest 2024 extended abstract for an example.
demonstration_contour_plot:
  filename: "demonstration_contour_plot.pdf"
  figure_size:
    - 6.4
    - 4.8
  sounds:
    - 'æ'
    - 'i'

`patkit-manifest.yaml`

These files are generated by PATKIT and may occasionally need to be amended by users when Scenarios and Exercises get moved around between users and computers. The file format is shown below.

Paths should be given relative to the manifest file. That way, if the subtree containing the recorded data and the PATKIT data is moved, the links do not break.

Scenarios:
  ../../derived_data/tongue_data_1_1
  ../../derived_data/tongue_data_1_1_exclusions
  ../../derived_data/tongue_data_1_1_splines
Exercises:
  ../../exercises/tongue_data_1_1_Pertti
  ../../exercises/tongue_data_1_1_Phoebe_Phonetician
  ../../exercises/tongue_data_1_1_Participant_1
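When amending a manifest by hand, the relative entry for a copied Scenario or Exercise can be computed from two absolute paths. The sketch below uses a hypothetical helper name and an assumed directory layout in which the manifest sits in `recorded_data/tongue_data_1_1`. It uses `posixpath` so that entries use forward slashes on every platform.

```python
import posixpath


def manifest_entry(manifest_dir: str, target_dir: str) -> str:
    """Express a Scenario or Exercise directory relative to the manifest's directory."""
    return posixpath.relpath(target_dir, start=manifest_dir)
```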

`session-config.yaml`

TODO 0.27: Update the below description.

Until the data structure update in v0.19.0, only one data source is allowed. Accepted names will be AAA, RASL, EVA (once flow data reading is included), and WAV for plain audio recordings with possible accompanying TextGrids.

# This tells PATKIT which metadata importer to use.
data_source_name: AAA

Paths detail where different types of data can be found.

# Paths to data, metadata and instructions.
paths:
  # This is where we read the data and metadata from. Leave the
  # data-type-specific directories empty if everything is in one directory.
  # The dataset's root directory is determined at run time.
  wav:
  textgrid: 
  ultrasound:

  # Where to find the spline import data. This is assumed to be relative to the
  # data path.
  spline_config: spline_config.yaml

The spline config specifies the spline format (formats can vary quite a lot) and gives additional instructions on how to trim splines (that is, which spline points are unreliable) before processing.

`spline-config.yaml`

TODO 0.24: This will be updated in the next configuration update.

import_config:
  # Single spline file for all recordings (True) or one for each recording
  # (False).
  single_spline_file: True

  # Only one of the following will be in use.
  # If there is a single spline file, what is it called?
  spline_file: File003_splines.csv
  # If there is not a single spline file, what glob pattern should be used to
  # find the splines? E.g. '*.csv'
  spline_file_extension: '_splines.csv'

  # Do the files have a header row?
  # Please note that possible header row information is ignored.
  headers: True

  # What delimiter does the file use? If left empty, this defaults to a tab.
  delimiter:

  # Either 'polar' or 'Cartesian' 
  coordinates: polar

  # Are the coordinates interleaved?
  #  interleaved format (True): point1/x point1/y point2/x point2/y
  #  or non-interleaved (False): point1/x point2/x ... point1/y point2/y
  interleaved_coords: False

  # These are listed in order of appearance in the file. 
  # Please note that possible header row information is ignored.
  # Accepted values:
    # - ignore: marks a column to be ignored, unlike the others below, 
    #   can be used several times
    # - id: used to identify the speaker, 
    #   often contained in a csv field called 'family name'
    # - given names: appended to 'id' if not marked 'ignore'
    # - date and time: date and time of recording
    # - prompt: prompt of recording, used to identify the recording with 'id'
    # - annotation label: optional field containing annotation information
    # - time in recording: timestamp of the frame this spline belongs to
    # - number of spline points: number of sample points in the spline used 
    #   to parse the coordinates and possible confidence information
  meta_columns:
    - id
    - date and time
    - time in recording
    - prompt
    - number of spline points

  # These will be either interleaved or not as specified by interleaved_coords.
  # Confidence values are always assumed to be non-interleaved.
  # Accepted values: 'r' with 'phi', 'x' with 'y', and 'confidence'
  data_columns:
    - r
    - phi
    - confidence
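The difference between interleaved and non-interleaved coordinates can be sketched as splitting the data columns of one row into two coordinate lists plus confidence values. The function name is hypothetical, and the sketch assumes the meta columns have already been removed and that the confidence values, when present, follow the coordinates.

```python
from typing import List, Sequence, Tuple


def split_coordinates(
    values: Sequence[float], n_points: int, interleaved: bool
) -> Tuple[List[float], List[float], List[float]]:
    """Split one row's data columns into (r, phi) or (x, y) plus confidence."""
    coords = list(values[: 2 * n_points])
    if interleaved:
        # point1/x point1/y point2/x point2/y ...
        first, second = coords[0::2], coords[1::2]
    else:
        # point1/x point2/x ... point1/y point2/y ...
        first, second = coords[:n_points], coords[n_points:]
    # Confidence values are always non-interleaved and follow the coordinates.
    confidence = list(values[2 * n_points : 3 * n_points])
    return first, second, confidence
```

The number of spline points read from the meta columns is what makes this split possible, since rows of different recordings may contain different numbers of points.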

Finally, ignore_points specifies how many points should be trimmed from the front and how many from the back of the spline (in this order). To trim no points, give the value 0.

data_config:
  # How many points should be ignored at the front and at the end of a spline.
  # All of the data is always read in. This setting is used in plotting and
  # processing the splines. Defaults to 0 at front and 0 at back if not specified:
  # ignore_points:
  #   - 0
  #   - 0
  ignore_points:
  - 11
  - 0
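Trimming with ignore_points amounts to slicing off a number of points from each end of the spline. A minimal sketch with a hypothetical function name:

```python
from typing import List, Sequence


def trim_spline(points: Sequence, ignore_points: Sequence[int] = (0, 0)) -> List:
    """Drop `front` points from the start and `back` points from the end."""
    front, back = ignore_points
    # len(points) - back avoids the points[front:-0] pitfall, which would
    # return an empty list when back is 0.
    return list(points[front : len(points) - back])
```

With the values above (11 and 0), a spline of 20 points keeps its last 9 points.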
