NOTE: Parts of the description haven’t yet been implemented for version 0.16.
All (or almost all) PATKIT configuration should be stored with the data. This is a philosophical choice that stems from the fact that even GUI settings (such as which frequencies are shown on a spectrogram and with which color map) potentially affect analysis results. The idea is that standardising analysis and annotation settings across a project or research group should be easy. If the current way PATKIT attempts to do this does not work for you, please get in touch and we’ll see how things can be improved.
There are some very simple ways of circumventing this system and PATKIT will not try to prevent that. However, things might not work quite as intended if the assumptions about where files are stored are broken. Please refer to Data Management for how PATKIT assumes recorded/imported data is separated from PATKIT/derived/saved data and configuration files.
At the time of releasing v0.16.0, the GitHub repository contains example
configuration files in the scenarios folder and an example simulation
configuration file in example_configs/ultrafest2024.
TODO: 0.20 Make full examples of data and configuration of various complexity.
PATKIT does contain default parameter files and you can save your own preferred
settings which PATKIT will place in the ~/.patkit (on Linux/macOS) or
%userprofile%\.patkit (on Windows) folder. However, when PATKIT uses given
settings with data, it will store those settings with the data. This means that
default settings and user preferences get copied to the data directories as
needed.
By default, the command history of the interactive command line is stored with
the global (user-specific) configuration, in the history file inside the
.patkit folder. It is plain text and has the same format as the .python_history
file. In fact, with a bit of tweaking, a user could use .python_history instead
because the PATKIT interactive interpreter is actually just a Python interpreter
with PATKIT data preloaded. However, this is only recommended if you know what
you are doing.
PATKIT has a number of configuration files. Briefly, they are:
- patkit_data.yaml defines how data should be processed and which metrics, statistics (in the PATKIT-specific sense of an aggregate statistic such as the mean image of an ultrasound recording), and annotations should be generated.
- patkit_gui.yaml configures the PATKIT annotator GUI. It specifies what data is displayed and how. Essentially, these are the settings that should be standardised for different annotators working on the same project or dataset.
- patkit-publish.yaml specifies graphs, exported data, and other outputs that are intended for publication.
- patkit-simulation.yaml specifies simulated data and the analysis of that data.
- patkit-manifest.yaml is autogenerated by PATKIT. It is a list of Scenarios that refer to recorded data, and it is stored with the recorded data. This file can be added to by humans if a Scenario gets copied from one computer or user to another. This will require some care but will save a lot of time if the PATKIT data in question requires a lot of computation to generate.
- session-config.yaml specifies the data sources in a session and the path structure, that is, where to find files of different types.
- spline-config.yaml tells PATKIT, under import_config, how to import a given set of splines and, under data_config, how to trim the splines before analysis.
Below you can find what is intended to be a comprehensive list of the configuration fields available in the different config files.
patkit_data.yaml
The following parameters are set globally so that they may be omitted locally. They can be overridden locally, though, and should be when, for example, different parts of the data have been recorded in areas with different mains frequencies.
# What precision is treated as equal when comparing annotation boundaries.
epsilon: 0.00001
# Used in filtering sound signals before beep detection.
mains_frequency: 50
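As a rough illustration of what epsilon means in practice, the sketch below treats two annotation boundary times as equal when they differ by less than epsilon. The function name boundaries_equal is hypothetical and the strict less-than comparison is an assumption; PATKIT's own comparison may differ.

```python
EPSILON = 0.00001  # same value as the epsilon setting shown above


def boundaries_equal(first: float, second: float, epsilon: float = EPSILON) -> bool:
    """Treat two boundary times as equal if they differ by less than epsilon.

    Hypothetical helper for illustration only, not part of PATKIT's API.
    """
    return abs(first - second) < epsilon
```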
Path to recorded data. This should be relative to the patkit_data.yaml file.
recorded_data_path: "../../recorded_data/tongue_data_1_1"
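Resolving a path relative to the configuration file, rather than to the current working directory, could look like the following minimal sketch. The function name and the directory layout in the usage note are hypothetical, and this is not necessarily how PATKIT resolves the path internally.

```python
import os.path
from pathlib import Path


def resolve_recorded_data_path(config_file: Path, recorded_data_path: str) -> Path:
    """Interpret recorded_data_path relative to the directory of the config file.

    Hypothetical helper for illustration. The path is normalised lexically,
    so the directories do not need to exist.
    """
    joined = config_file.parent / recorded_data_path
    return Path(os.path.normpath(joined))
```

For example, with a configuration file at project/derived_data/tongue_data_1_1/patkit_data.yaml (an assumed layout), the value "../../recorded_data/tongue_data_1_1" points to project/recorded_data/tongue_data_1_1.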
Various flags are gathered in one group.
- detect_beep specifies if PATKIT should attempt to detect a 1 kHz beep at the beginning of recordings.
- If test is set to True, PATKIT will process only the first 10 recordings. This is handy when testing out settings.
flags:
  detect_beep: True # Should onset beep detection be run?
  test: False # Run on only the first 10 recordings.
Aggregate images can be generated with different metrics from ultrasound images. The default is to generate the image from raw data, but this can be overridden.
Also by default, the images will be preloaded; that is, they will be generated when PATKIT is started rather than when a recording is opened in the annotator.
Finally, it is generally a good idea to release data memory. This means that
ultrasound videos are not kept in the computer's memory after a recording has
been processed unless that recording is being displayed. Only set this option
to False if you know that you have a very large amount of RAM at your disposal.
This option recurs in many other contexts below.
aggregate_image_arguments:
  metrics:
    - 'mean'
  run_on_interpolated_data: False
  preload: True
  release_data_memory: True
Pixel difference (PD) can be generated with many different norms, and these can be
specified in parallel. The same is true of the timesteps used in the calculations.
mask_images is used to optionally mask either the top or the bottom of the image from
analysis. If pd_on_interpolated_data is True, then PD will be calculated on
the interpolated, human-readable fanned images. See above for what preload and release_data_memory do.
pd_arguments:
  'norms':
    - 'l1'
    - 'l2'
  'timesteps':
    - 1
  mask_images: False
  pd_on_interpolated_data: False
  preload: True
  release_data_memory: True
Spline metric arguments are very similar to PD arguments in syntax and meaning,
except that they specify the metrics calculated on tongue splines. The available
metrics are annd for Average Nearest Neighbour Distance, mpbpd for
Median Point-by-Point Distance, modified_curvature for the Modified Curvature Index, and fourier for the Fourier coefficients.
spline_metric_arguments:
  'metrics':
    - 'annd'
    - 'mpbpd'
    - 'modified_curvature'
    - 'fourier'
  'timesteps':
    - 3
  'release_data_memory': False
  'preload': True
TODO 0.24: Verify that all these work. Especially the exclusion list.
Distance matrices are used for evaluating ultrasound probe alignment. They have
their own exclusion list (give it as a relative path) and metrics (currently
only mean_squared_error).
slice_max_step simulates rotating the probe by slicing incrementally so that
the sector is always the same size. This parameter determines how many steps of
size one to take. slice_max_step is mutually exclusive with slice_step_to, which
generates a pair of maximally distant sectors for each step size ranging from
one to slice_step_to.
Setting sort to True sorts the recordings by their prompts. Additionally
setting sort_criteria will match the prompts in the order given and then sort
alphabetically within each match group. Non-matching recordings are added to a
final group, which is also sorted alphabetically.
preload and release_data_memory work as above.
distance_matrix_arguments:
  exclusion_list: "alignment/data/patkit_exclusion_list.yaml"
  metrics:
    - 'mean_squared_error'
  preload: True
  release_data_memory: False
  # slice_max_step: 6
  slice_step_to: 6
  sort: True
  sort_criteria:
    - 'i'
    - 'o'
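The grouping behaviour of sort and sort_criteria described above can be sketched as follows. This is only an illustration: the function name is hypothetical, and it assumes that a criterion matches when it occurs as a substring of the prompt and that the first matching criterion wins. PATKIT's actual matching rule may differ.

```python
def sort_prompts(prompts: list[str], sort_criteria: list[str]) -> list[str]:
    """Group prompts by the first matching criterion, then sort within groups.

    Assumption: a criterion 'matches' if it is a substring of the prompt.
    Prompts that match nothing go into a final group of their own.
    """
    groups: list[list[str]] = [[] for _ in sort_criteria]
    unmatched: list[str] = []
    for prompt in prompts:
        for group, criterion in zip(groups, sort_criteria):
            if criterion in prompt:
                group.append(prompt)
                break
        else:
            # No criterion matched this prompt.
            unmatched.append(prompt)

    ordered: list[str] = []
    for group in groups + [unmatched]:
        ordered.extend(sorted(group))
    return ordered
```

With the criteria 'i' and 'o' from the configuration above, prompts containing 'i' come first, then those containing 'o', and everything else comes last, each group in alphabetical order.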
patkit_gui.yaml
Most of these parameters deal with data display.
Height ratio of the data display area vs the textgrid tier display area. This does not directly control the height of individual data displays or individual tier displays. Instead, it controls the ratio between the combined height of the data displays and the combined height of the tier displays.
data_and_tier_height_ratios:
  data: 2
  tier: 1
Shared configuration for data axes and tier axes.
general_axes_params:
  data_axes:
    sharex: True
    auto_ylim: True
  tier_axes:
    sharex: True
data_axes:
  PD l1:
    modalities:
      - PD l1 on RawUltrasound
    # modality_names:
    #   - l1
    sharex: True
    auto_ylim: True
  PD l2:
    modalities:
      - PD l2 on RawUltrasound
    # modality_names:
    #   - l2
    sharex: True
    ylim:
      - 100
      - 2000
  PD normalised:
    modalities:
      - PD l1 on RawUltrasound
      - PD l2 on RawUltrasound
    modality_names:
      - l1
      - l2
    sharex: True
    normalisation: both # none, peak, bottom, both
  spectrogram2:
    sharex: True
  wav:
    sharex: True
  # density:
  #   sharex: False
# Tiers drawn on the data axes. Ignored if the tiers are not found in the TextGrid.
pervasive_tiers:
- Segment
- Segments
- segment
- segments
- phoneme
You can either set the limits or set auto_xlim to True which means that the
whole recording will be displayed. This is implemented as a greedy ‘all’ in
case some modalities extend further in time than others.
# Initial limits for x-axis
#xlim:
# - -.25
# - 1.5
auto_xlim: True
# Font parameters
default_font_size: 10
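The greedy 'all' behaviour of auto_xlim mentioned above amounts to taking the earliest start and the latest end over all displayed modalities. A minimal sketch, assuming each modality's time range is available as a (start, end) pair in seconds; the function name and the input format are hypothetical.

```python
def greedy_xlim(time_ranges: dict[str, tuple[float, float]]) -> tuple[float, float]:
    """Return x-axis limits that cover every modality's full time range.

    time_ranges maps a modality name to its (start, end) times.
    Hypothetical helper for illustration only.
    """
    starts = [start for start, _ in time_ranges.values()]
    ends = [end for _, end in time_ranges.values()]
    return min(starts), max(ends)
```

If the audio runs from 0.0 to 2.5 and an ultrasound-derived modality from 0.1 to 2.3, the resulting limits are 0.0 and 2.5, so no modality is cut off.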
Dark vs light mode. Accepted values are dark, follow_system, and light.
color_scheme: dark
patkit-publish.yaml
This file will be documented in a later release.
TODO 1.0: Document this.
patkit-simulation.yaml
Simulations are run on mock-up tongue splines/contours extracted manually from
Peter Ladefoged’s Vowels and Consonants. A commented version of the
configuration walked through below can be found in PATKIT’s GitHub repository
in the example_configs folder.
The first parameters define where to save the resulting plots, if existing
files should be overwritten, and what message prefix should be used in logging
messages. If the overwrite_plots parameter is omitted, overwriting will be
confirmed individually for each existing plot file.
output_directory: "ultrafest2024/"
overwrite_plots: True
logging_notice_base: "Ultrafest 2024 simulation: "
Sound/contour selection is defined with IPA characters:
sounds:
- 'æ'
- 'i'
This parameter defines the point-wise perturbations to apply, in millimetres.
perturbations:
- -2
- -1
- -.5
- .5
- 1
- 2
This parameter group defines the spline distance metric simulation.
contour_distance:
  metric: "annd"
  timestep: 1
  sound_pair_params:
    sounds:
      - 'æ'
      - 'i'
    perturbed:
      - 'second'
      - 'first'
    combinations: full_cartesian # also accepted: only_cross, only_self
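The three accepted values of combinations can be read as different subsets of the Cartesian product of the sound list with itself. The sketch below shows one plausible interpretation, which is an assumption rather than documented behaviour: full_cartesian gives every ordered pair, only_cross gives pairs of different sounds, and only_self pairs each sound with itself. The function name is hypothetical.

```python
from itertools import product


def sound_pairs(sounds: list[str], combinations: str = "full_cartesian") -> list[tuple[str, str]]:
    """Build ordered sound pairs under an assumed reading of 'combinations'."""
    pairs = list(product(sounds, repeat=2))
    if combinations == "full_cartesian":
        return pairs
    if combinations == "only_cross":
        # Assumed meaning: only pairs of two different sounds.
        return [(first, second) for first, second in pairs if first != second]
    if combinations == "only_self":
        # Assumed meaning: only pairs of a sound with itself.
        return [(first, second) for first, second in pairs if first == second]
    raise ValueError(f"Unknown combinations value: {combinations}")
```

For the sounds 'æ' and 'i', this reading gives four pairs for full_cartesian and two pairs each for only_cross and only_self.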
This (rather simple) parameter group defines the spline shape metric simulation.
spline_shape_params:
  metric: 'mci'
And finally the result figures are defined with the following parameter groups. To omit a figure from the final plotting just comment out or delete the group from the configuration file.
# This produces a plot with change in the distance metric plotted
# as a ray on the perturbed point of the contour.
distance_metric_ray_plot:
  figure_size:
    - 10.1
    - 4.72
  scale: 200
  color_threshold:
    - .1
    - -.1
# Same as distance_metric_ray_plot but for shape metrics.
shape_metric_ray_plot:
  figure_size:
    - 7
    - 3.35
  scale: 20
  color_threshold:
    - 2
    - .5
# Two panel plot that demonstrates how the perturbations are applied.
# See the Ultrafest 2024 extended abstract for an example.
demonstration_contour_plot:
  filename: "demonstration_contour_plot.pdf"
  figure_size:
    - 6.4
    - 4.8
  sounds:
    - 'æ'
    - 'i'
patkit-manifest.yaml
These files are generated by PATKIT and may occasionally need to be amended by users when Scenarios and Exercises get moved around between users and computers. The file format is shown below.
Paths should be given as relative to the manifest file. That way if the subtree containing the recorded and the PATKIT data gets moved, the links do not break.
Scenarios:
../../derived_data/tongue_data_1_1
../../derived_data/tongue_data_1_1_exclusions
../../derived_data/tongue_data_1_1_splines
Exercises:
../../exercises/tongue_data_1_1_Pertti
../../exercises/tongue_data_1_1_Phoebe_Phonetician
../../exercises/tongue_data_1_1_Participant_1
session-config.yaml
TODO 0.27: Update the below description.
Until the data structure update in v0.19.0, only one data source is allowed.
Accepted names will be AAA, RASL, EVA (once flow data reading is
included), and WAV for plain audio recordings with possible accompanying
TextGrids.
# This tells PATKIT which metadata importer to use.
data_source_name: AAA
Paths detail where different types of data can be found.
# Paths to data, metadata and instructions.
paths:
  # This is where we read the data and metadata from. Leave the data type
  # specific directories empty if everything is in one directory. Dataset's
  # root directory is determined at run time.
  wav:
  textgrid:
  ultrasound:
  # Where to find the spline import data. This is assumed to be relative to data
  # path.
  spline_config: spline_config.yaml
Spline config specifies spline format (these can vary quite a lot) and gives additional instructions on how to trim splines (which spline points are unreliable) before processing.
spline-config.yaml
TODO 0.24: This will be updated in the next configuration update.
import_config:
  # Single spline file for all recordings (True) or one for each recording
  # (False).
  single_spline_file: True
  # Only one of the following will be in use.
  # If a single spline file, what is it called.
  spline_file: File003_splines.csv
  # If not a single spline file, what glob pattern should be used to find the splines.
  # E.g. '*.csv'
  spline_file_extension: '_splines.csv'
  # Do the files have a header row?
  # Please note that possible header row information is ignored.
  headers: True
  # What delimiter does the file use. If left empty, this defaults to a tab.
  delimiter:
  # Either 'polar' or 'Cartesian'
  coordinates: polar
  # Are the coordinates interleaved in
  # interleaved format (True): point1/x point1/y point2/x point2/y
  # or non-interleaved (False): point1/x point2/x ... point1/y point2/y
  interleaved_coords: False
  # These are listed in order of appearance in the file.
  # Please note that possible header row information is ignored.
  # Accepted values:
  #   - ignore: marks a column to be ignored, unlike the others below,
  #     can be used several times
  #   - id: used to identify the speaker,
  #     often contained in a csv field called 'family name'
  #   - given names: appended to 'id' if not marked 'ignore'
  #   - date and time: date and time of recording
  #   - prompt: prompt of recording, used to identify the recording with 'id'
  #   - annotation label: optional field containing annotation information
  #   - time in recording: timestamp of the frame this spline belongs to
  #   - number of spline points: number of sample points in the spline used
  #     to parse the coordinates and possible confidence information
  meta_columns:
    - id
    - date and time
    - time in recording
    - prompt
    - number of spline points
  # These will be either interleaved or not as specified by 'interleaved_coords'.
  # Confidence values are always assumed to be non-interleaved.
  # Accepted values: 'r' with 'phi', 'x' with 'y', and 'confidence'
  data_columns:
    - r
    - phi
    - confidence
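The difference between the two layouts selected by interleaved_coords can be illustrated with a small sketch that splits one row of coordinate values into two lists, either r and phi or x and y. The function name is hypothetical and the sketch ignores confidence values; it is not PATKIT's importer.

```python
def split_coordinates(
    values: list[float], n_points: int, interleaved: bool
) -> tuple[list[float], list[float]]:
    """Split a flat row of coordinates into first and second coordinate lists.

    interleaved=True:  point1/x point1/y point2/x point2/y ...
    interleaved=False: point1/x point2/x ... point1/y point2/y ...
    """
    if interleaved:
        first = values[0:2 * n_points:2]
        second = values[1:2 * n_points:2]
    else:
        first = values[:n_points]
        second = values[n_points:2 * n_points]
    return first, second
```

Both layouts yield the same pair of lists for the same spline, so the rest of the processing does not need to know how the file was organised.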
Finally, ignore_points specifies how many points should be trimmed from the
front and how many from the back of the spline (in this order). To trim no
points, give the value 0.
data_config:
  # How many points should be ignored at the front and at the end of a spline.
  # All of the data is always read in. This setting is used in plotting and
  # processing the splines. Defaults to 0 at front and 0 at back if not specified:
  # ignore_points:
  #   - 0
  #   - 0
  ignore_points:
    - 11
    - 0
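The effect of ignore_points can be pictured as a slice applied to the list of spline points after it has been read in full. A minimal sketch, with a hypothetical function name, that drops the given number of points from the front and from the back:

```python
def trim_spline(points: list, ignore_points: tuple[int, int] = (0, 0)) -> list:
    """Drop ignore_points[0] points from the front and ignore_points[1] from the back.

    Hypothetical helper for illustration. The default (0, 0) keeps every point.
    """
    front, back = ignore_points
    return points[front:len(points) - back]
```

With the values 11 and 0 from the example above, a spline with 20 points would be reduced to its last 9 points.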