LIG-AIKUMA used to enrich a child language acquisition corpus collected in Bolivia

We have been using daylong recordings (i.e., recordings gathered with a device worn by the child as he or she goes about a normal day and night) for several years, including with children learning languages as diverse as Tsimane’ and Ju|’hoan. One of the most challenging aspects of annotating these data is figuring out the “cast of characters”: deciding which voice is the mother’s, the siblings’, and sometimes even the child’s can be difficult because the annotator doesn’t know the family (and frequently doesn’t speak the language). (We rely on foreigners because locals who do know the family would have access to the family’s private conversations, which seems problematic.)

This summer, we figured out how to surmount this obstacle, thanks to Lig Aikuma. When we picked up the device, we processed the recording with DiViMe (divime.readthedocs.io) to perform basic speech detection over samples extracted throughout the recording (15 seconds every 10 minutes). Using the respeak function of Lig Aikuma, we then played back the sections that contained speech and asked the participating family to identify the voices. We are confident that this is the best solution because it gives us naturalistic samples of how the most talkative people sound, including young children (siblings and friends), who may be too shy to speak when we are around.
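To make the sampling scheme concrete, here is a minimal sketch in Python of how the 15-second clips could be laid out over a recording. It is an illustration only, not the actual DiViMe or Lig Aikuma code: the function name `sample_windows` and its parameters are hypothetical, and it assumes each clip starts at the beginning of its 10-minute interval, which the description above does not specify.

```python
def sample_windows(total_s, clip_s=15.0, period_s=600.0):
    """Return (start, end) times in seconds, one clip per period.

    Hypothetical helper: assumes each clip starts at the beginning of
    its period and that incomplete clips at the end are dropped.
    """
    if clip_s <= 0 or period_s < clip_s:
        raise ValueError("need 0 < clip_s <= period_s")
    windows = []
    start = 0.0
    # Keep only clips that fit entirely inside the recording.
    while start + clip_s <= total_s:
        windows.append((start, start + clip_s))
        start += period_s
    return windows


if __name__ == "__main__":
    # Example: a 16-hour recording, duration given in seconds.
    clips = sample_windows(16 * 3600)
    print(len(clips), "clips; first:", clips[0], "last:", clips[-1])
```

Each window could then be cut from the audio file and passed to speech detection, with only the clips that contain speech played back to the family.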

Thanks, Lig Aikuma team, for this terrific tool!