Please note that not all of the Python classes supporting the structure described below will necessarily be part of the 1.0 release of PATKIT.
It is possible to work with many different kinds of data directory structures when using PATKIT. The description below closely matches PATKIT's internal representation of data and the way PATKIT saves data and metadata when saving and/or exporting results.
Following this kind of structure when first saving the data therefore makes data management easier: the original data and the corresponding files generated by PATKIT are kept in the same place.
The directory tree organisation follows the hierarchy of the Database classes. PATKIT uses two directory structures, depending on whether a Trial level is needed, that is, whether there is more than one datasource. Here is the one with only one datasource (which may have produced more than one kind of data):
└── dataset
    └── participant
        └── session
            └── [file types]
With two or more datasources (systems with their own internal synchronisation), it is easy to default to a structure like the one below:
└── dataset
    └── datasource
        └── participant
            └── session
                └── [file types]
However, since we want to cross-synchronise the datasources, it is better to use the following structure, which keeps files that correspond to each other close together in the directory tree:
└── dataset
    └── participant
        └── session
            └── datasource
                └── [file types]
Specifically, the extra level for datasource after session helps keep shared file types from clashing (most systems will have one or more wav files in the saved data). Below is a more detailed example in which the different file types have been sorted into subdirectories.
└── dataset
    ├── participant 1
    │   ├── session 1
    │   │   ├── AAA
    │   │   │   ├── wav           # including TextGrids
    │   │   │   ├── ultrasound    # including .param files
    │   │   │   ├── prompts       # .txt files
    │   │   │   └── video         # .avi files
    │   │   └── EVA
    │   │       ├── wav           # including TextGrids, if needed
    │   │       │                 # given they are already in AAA
    │   │       └── oral_airflow  # .oaf files
    │   └── session 2
    │       └── [etc]
    └── participant 2
        └── [etc]
Some sources, like RASL, produce this sort of directory structure by default. Others, like AAA, by default put all saved files in the same directory, leaving out the final level in this example.
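As a minimal sketch of the layout described above, the following builds the path to a file-type directory with `pathlib`. The function name `recording_dir` and its parameters are hypothetical and only illustrate the ordering of the levels; they are not part of PATKIT's API. The datasource and file-type levels are optional, which covers both the single-datasource layout and the flat, AAA-style layout.

```python
from pathlib import Path
from typing import Optional


def recording_dir(
    dataset: Path,
    participant: str,
    session: str,
    datasource: Optional[str] = None,
    file_type: Optional[str] = None,
) -> Path:
    """Build dataset/participant/session[/datasource][/file_type].

    Hypothetical helper for illustration only, not a PATKIT function.
    """
    path = dataset / participant / session
    if datasource is not None:
        # Extra level that keeps shared file types (e.g. wav) apart.
        path = path / datasource
    if file_type is not None:
        # Optional level when files are sorted into subdirectories by type.
        path = path / file_type
    return path


# Directory names taken from the example tree above.
dataset = Path("dataset")
aaa_wav = recording_dir(dataset, "participant 1", "session 1", "AAA", "wav")
eva_wav = recording_dir(dataset, "participant 1", "session 1", "EVA", "wav")
flat = recording_dir(dataset, "participant 1", "session 1")
```

Because the datasource level sits below the session, `aaa_wav` and `eva_wav` are distinct directories under the same session, so identically named wav files from the two systems cannot overwrite each other.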
Another optional decision is whether individual datatypes are stored in subdirectories of a session. This depends on how difficult it would be to browse a directory's contents if all files for a session were in it.
If files are divided into subdirectories by type, it is still a good idea to keep .wav files and TextGrid files in the same directory. This makes it easier to work with them in Praat.
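One way to check that wav and TextGrid files have stayed together is sketched below. It assumes that a TextGrid shares its base name with the corresponding wav file and uses the `.TextGrid` extension; the function name `wavs_without_textgrid` and the file names in the demonstration are hypothetical and not part of PATKIT.

```python
import tempfile
from pathlib import Path
from typing import List


def wavs_without_textgrid(session_dir: Path) -> List[Path]:
    """Return .wav files with no same-named .TextGrid in the same directory.

    Hypothetical helper for illustration only, not a PATKIT function.
    """
    missing = []
    for wav in sorted(session_dir.rglob("*.wav")):
        # with_suffix keeps the directory and base name, swapping the extension.
        if not wav.with_suffix(".TextGrid").exists():
            missing.append(wav)
    return missing


# Demonstration on a throwaway session directory with made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    wav_dir = Path(tmp) / "wav"
    wav_dir.mkdir()
    for name in ("trial_1.wav", "trial_1.TextGrid", "trial_2.wav"):
        (wav_dir / name).touch()
    unannotated = [p.name for p in wavs_without_textgrid(Path(tmp))]
```

A wav file listed here is not necessarily an error, since it may simply not have been annotated yet, but a long list can indicate that the TextGrids were saved somewhere else.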
It is tempting to put individual trials in their own subdirectories. However, grouping files by trial rather than by type mostly makes it more difficult to find a file when looking for one manually.