Recordings
We recorded spontaneous cries from 15 boys and 13 girls (we aimed to record between 12 and 15 children of each sex within the study period), aged 4 months on average (M = 116 ± 21 days), while their parents bathed them at home. Recordings were made with a microphone (Sennheiser MD42) positioned approximately 30 cm from the baby’s face and connected to a Marantz PMD690/W1B recorder. To limit pseudo-replication, we recorded each baby during three independent sessions.
Sound analyses
We isolated two sequences of crying from each recording session, resulting in a total of six crying sequences for each baby (mean sequence duration = 7.8 ± 1.1 s), and extracted a set of 15 temporal and spectral variables from each sequence. To describe the acoustic structure of cries, we used a dedicated batch-processing script in PRAAT [6], which contained four distinct procedures. These procedures have been applied successfully to the characterisation of acoustic variation in previous studies of babies’ cries [20].
The first procedure of the script characterized the fundamental frequency (F0 or pitch) and the intonation (F0 contour variation) of the cries. The F0 contour was extracted using the To Pitch (cc) command. The experimenter systematically inspected the extracted Pitch contour and verified it against a narrow-band spectrogram displaying the first 2000 Hz of the signal. Spurious octave jumps were manually corrected by selecting the appropriate F0 candidate values in the edited pitch object. In the relatively rare segments including double vibration (where a weak subharmonic equal to half the fundamental frequency is present), the F0 was systematically preferred over the subharmonic. Each extracted F0 contour (pitch object) was saved as a text file for future reference. These numerical representations were used to derive the following parameters: %voiced (percentage of the signal that is characterized by a detectable pitch), mean F0, max F0, min F0 (respectively the mean, maximum and minimum F0 calculated over the duration of the signal) and F0CV (coefficient of variation of F0 over the duration of the signal). In a second step, two distinct smoothing procedures (Smooth… command in Praat) were applied to the pitch contour: the first allowed a relatively broad bandwidth (Smooth… command parameter = 25), to suppress very short-term frequency fluctuation while preserving minor intonation events (such as bleat-like frequency modulation), and the second allowed only a narrow bandwidth (Smooth… command parameter = 2), to characterize only strong F0 modulation (major intonation events). Inflection points were counted (as each change in the sign of the contour’s derivative) after each smoothing procedure, and divided by the total duration of the voiced segments in each recording, resulting in two distinct indexes of F0 variation (inflex25 and inflex2).
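As an illustration of how the F0 descriptors and the inflection indexes can be derived from an exported contour, the following minimal Python sketch may help. It is not the PRAAT script used in the study. It assumes that the contour is available as one F0 value per analysis frame (with None for unvoiced frames), that the smoothed contour passed to the inflection count is a single voiced stretch sampled at a fixed frame step, and that the coefficient of variation uses the sample standard deviation; all function names are hypothetical.

```python
from statistics import mean, stdev

def f0_descriptors(f0_hz):
    """Summary descriptors of an F0 contour.

    f0_hz holds one value per analysis frame, with None where no pitch
    was detected (assumed export format, not PRAAT's own file layout).
    """
    voiced = [f for f in f0_hz if f is not None]
    m = mean(voiced)
    return {
        "pct_voiced": 100.0 * len(voiced) / len(f0_hz),
        "meanF0": m,
        "maxF0": max(voiced),
        "minF0": min(voiced),
        # Assumption: sample standard deviation divided by the mean.
        "F0CV": stdev(voiced) / m,
    }

def count_inflections(contour):
    """Count changes in the sign of the first difference of the contour.

    Flat steps (zero difference) are skipped, so a plateau between a rise
    and a fall counts as a single inflection.
    """
    count = 0
    prev_sign = 0
    for a, b in zip(contour, contour[1:]):
        diff = b - a
        sign = (diff > 0) - (diff < 0)
        if sign != 0:
            if prev_sign != 0 and sign != prev_sign:
                count += 1
            prev_sign = sign
    return count

def inflection_rate(contour, frame_step_s):
    """Inflections per second of voiced signal (assumes a fixed frame step)."""
    voiced_duration_s = len(contour) * frame_step_s
    return count_inflections(contour) / voiced_duration_s
```

Applied once to the contour smoothed with parameter 25 and once to the contour smoothed with parameter 2, inflection_rate would yield values analogous to inflex25 and inflex2.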
The second procedure focused on the intensity contour and characterized the variability of the cries’ intensity by calculating intCV, the coefficient of variation of the intensity contour estimated using the To Intensity command in PRAAT.
A third procedure focused on the periodic quality of the signal and measured the harmonicity (harm, degree of acoustic periodicity, measured as the ratio of harmonics to noise in the signal and expressed in dB), an index of jitter (jitter, small fluctuation in periodicity measured as the average of ‘local’, ‘rap’ and ‘ppq5’ measures in PRAAT) and an index of shimmer (shimmer, small variation in amplitude between consecutive periods, measured as the average of ‘local’, ‘apq5’ and ‘apq11’ parameters in PRAAT).
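The perturbation measures named above can be sketched from their textbook definitions. The Python functions below are hedged illustrations, not PRAAT’s implementation (which additionally screens out implausible periods): local perturbation is the mean absolute difference between consecutive values divided by the mean value, and the k-point perturbation quotient compares each value with the average of its k-point neighbourhood (k = 3 for ‘rap’, 5 for ‘ppq5’ and ‘apq5’, 11 for ‘apq11’). Function names and the example periods are hypothetical.

```python
def local_perturbation(values):
    """Mean absolute difference between consecutive values, relative to the mean.

    With glottal periods as input this corresponds to 'local' jitter;
    with peak amplitudes it corresponds to 'local' shimmer.
    """
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return (sum(diffs) / len(diffs)) / (sum(values) / len(values))

def perturbation_quotient(values, k):
    """k-point perturbation quotient (k odd).

    Each interior value is compared with the mean of the k values centred
    on it; the average absolute deviation is divided by the overall mean.
    """
    half = k // 2
    deviations = []
    for i in range(half, len(values) - half):
        window = values[i - half:i + half + 1]
        deviations.append(abs(values[i] - sum(window) / k))
    return (sum(deviations) / len(deviations)) / (sum(values) / len(values))

def jitter_index(periods):
    """Average of the local, rap (3-point) and ppq5 (5-point) measures."""
    parts = [
        local_perturbation(periods),
        perturbation_quotient(periods, 3),
        perturbation_quotient(periods, 5),
    ]
    return sum(parts) / len(parts)

# Made-up glottal periods in milliseconds, for illustration only.
example_periods_ms = [2.0, 2.2, 2.0, 2.2, 2.0, 2.2, 2.0]
example_jitter = jitter_index(example_periods_ms)
```

An analogous shimmer index would average local_perturbation with the 5-point and 11-point quotients computed on period amplitudes, which requires at least 11 consecutive periods.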
The final procedure characterized the spectral envelope of the cry by applying a cepstral smoothing procedure (bandwidth: 900 Hz) to each crying sequence, followed by the extraction of the first four spectral prominences (fsp1, fsp2, fsp3, fsp4) of the resulting smoothed spectrum. Because babies’ cries can be strongly nasalized [41], and can contain biphonation phenomena [42] that can create resonance-independent broadband components, the measured spectral peaks cannot be safely considered accurate measures of formant frequencies and are therefore termed spectral prominences. However, the observed values (1.2, 3.1, 5.7 and 8.6 kHz) are consistent with the newborn/infant vocal tract length (~7.5 cm between 2 and 6 months [46]), which predicts vocal tract resonances at about 1.1, 3.3, 5.5 and 7.7 kHz.
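The predicted resonances quoted above are what one obtains by modelling the vocal tract as a uniform tube closed at the glottis and open at the lips, for which the n-th resonance is (2n − 1)c / (4L). The short Python sketch below makes this arithmetic explicit. The tube model and the speed of sound of 330 m/s are assumptions chosen because they reproduce the quoted values; the text itself does not state them.

```python
def tube_resonances_hz(length_m, speed_of_sound_m_s=330.0, n_resonances=4):
    """Resonances of a uniform tube closed at one end and open at the other.

    F_n = (2n - 1) * c / (4 * L). The default speed of sound is an
    assumption, not a value given in the text.
    """
    return [
        (2 * n - 1) * speed_of_sound_m_s / (4.0 * length_m)
        for n in range(1, n_resonances + 1)
    ]

# Vocal tract length of about 7.5 cm, as cited for 2 to 6 month-old infants.
predicted_hz = tube_resonances_hz(0.075)
predicted_khz = [round(f / 1000.0, 1) for f in predicted_hz]
```

With these assumptions the four values come out at approximately 1.1, 3.3, 5.5 and 7.7 kHz.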
Statistical analysis of the acoustic structure
To investigate acoustic differences between boys’ and girls’ cries (study 1), we first performed a Principal Component Analysis to collapse the 15 acoustic parameters into two composite scores (principal components PC1 and PC2). We then used two linear mixed-effects models with PC1 and PC2 as dependent measures (fixed effect: “sex”; random effect: “baby identity”). P values were obtained from a likelihood-ratio test comparing the fit of each full model with that of a null model lacking the sex effect. We also compared each of the 15 acoustic parameters between sexes using a mixed model analysis with “sex” as fixed effect, “baby identity” as random effect, and “age” and “weight” as covariates. Finally, we used a cross-validated and permuted Discriminant Function Analysis (pDFA) to assess whether the two sexes could be discriminated. A training data set (2/3 of the cries from each individual) was used to generate linear discriminant functions on the basis of the 15 acoustic features describing the cries. The remaining 1/3 of the cries were used as a cross-validation set to measure the percentage of correctly classified cries. The mean effect size was calculated from 100 random iterations. To obtain the statistical significance of the effect size, we compared the percent correct obtained in the analysis to the distribution of percent correct values obtained by randomly assigning the sex to each baby. This distribution was obtained from 1000 randomly created data sets in which the sex of each individual was permuted (permuted DFA) [26, 31]. All data were analyzed using R [39].
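The logic of the cross-validated permutation procedure can be sketched in a few lines of Python. This is an illustration of the resampling scheme only, not the R analysis used in the study. To keep the example self-contained, a nearest-centroid classifier stands in for the linear discriminant functions, and the p value uses the common (k + 1) / (N + 1) correction, which the text does not specify. All names are hypothetical.

```python
import random

def centroid(rows):
    """Column-wise mean of a list of feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def percent_correct(cries_by_baby, sex_by_baby, rng, n_splits=100):
    """Mean cross-validated accuracy (%) over random 2/3 vs 1/3 splits.

    Each baby contributes 2/3 of its cries to training and 1/3 to testing.
    A nearest-centroid rule replaces the discriminant functions (assumption).
    """
    scores = []
    for _ in range(n_splits):
        train = {}
        test = []
        for baby, cries in cries_by_baby.items():
            shuffled = cries[:]
            rng.shuffle(shuffled)
            cut = (2 * len(shuffled)) // 3
            train.setdefault(sex_by_baby[baby], []).extend(shuffled[:cut])
            test.extend((c, sex_by_baby[baby]) for c in shuffled[cut:])
        cents = {sex: centroid(rows) for sex, rows in train.items()}
        hits = 0
        for cry, sex in test:
            predicted = min(cents, key=lambda s: sq_dist(cry, cents[s]))
            hits += predicted == sex
        scores.append(100.0 * hits / len(test))
    return sum(scores) / len(scores)

def permutation_test(cries_by_baby, sex_by_baby, rng, n_perm=1000, n_splits=100):
    """Compare observed accuracy with accuracies under permuted sex labels.

    Labels are shuffled between babies, not between cries, so that cries
    from one baby always share a label.
    """
    observed = percent_correct(cries_by_baby, sex_by_baby, rng, n_splits)
    babies = list(sex_by_baby)
    labels = [sex_by_baby[b] for b in babies]
    at_least = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        permuted = dict(zip(babies, labels))
        if percent_correct(cries_by_baby, permuted, rng, n_splits) >= observed:
            at_least += 1
    return observed, (at_least + 1) / (n_perm + 1)
```

Permuting at the level of the baby rather than the cry keeps the non-independence of cries from the same individual intact in the null distribution, which is the point of the permuted design.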
Sound re-synthesis
One randomly selected cry from each of 24 babies (13 boys and 11 girls whose recordings were already available at the time of the playback experiment) was re-synthesised using the PSOLA algorithm (“Change Gender” command in PRAAT) [30]. PSOLA re-synthesis enables the independent rescaling of the fundamental frequency (F0, affecting the perceived pitch) while leaving all other parameters of the signal unchanged. PSOLA is a well-established method for independently manipulating acoustic features in animal vocalisations (e.g. [40]) as well as human speech signals (e.g. [9, 16]). From each natural cry, we created a set of stimuli varying in their mean F0 only. We chose mean F0 values of 310 Hz, 375 Hz, 440 Hz, 505 Hz and 570 Hz, to match the mean cry pitch ± n SD (with n = 0, 1 and 2) as measured in our sample (Fig. 1).
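The five target values follow directly from the mean ± n SD rule. The Python sketch below reproduces them, taking a mean of 440 Hz and an SD of 65 Hz; these two figures are inferred from the spacing of the listed targets rather than quoted from the text. The semitone helper is a hypothetical addition showing how large a shift a given natural cry would undergo.

```python
from math import log2

MEAN_F0_HZ = 440  # inferred from the middle target value
SD_F0_HZ = 65     # inferred from the spacing between target values

# Mean F0 targets at mean - 2 SD, - 1 SD, mean, + 1 SD and + 2 SD.
target_f0_hz = [MEAN_F0_HZ + n * SD_F0_HZ for n in (-2, -1, 0, 1, 2)]

def shift_in_semitones(source_hz, target_hz):
    """Size of the pitch shift from the natural mean F0 to a target, in semitones."""
    return 12.0 * log2(target_hz / source_hz)
```

Here target_f0_hz evaluates to [310, 375, 440, 505, 570], matching the values listed above.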
Playback experiments
The experiment testing the effect of cry pitch on adult listeners’ attribution of sex to natural cries (study 2) was performed using a Marantz PMD690/W1B recorder and Sennheiser HD 25-1 headphones. All participants were parents of 3-month-old babies (25 fathers and 27 mothers; we aimed to include between 25 and 30 parents of each sex, depending on recruiting availability). These participants were the parents of the babies whose recordings were used in the sound analyses and subsequent listening experiments. Each adult rated two successive experimental sets of cries, with 5 minutes separating the two sets. Each set included 12 cries: three different cries from each of four babies unfamiliar to the parent (two boys and two girls). The order of presentation of the cries was randomized and the adult listeners were unaware of the number of babies and of the sex ratio in the set of cries. Listeners were given the option to answer that they could not guess the sex of the baby. The playback test was conducted as a double-blind experiment.
The remaining psycho-acoustic experiments were performed in quiet rooms at the University Jean Monnet/Saint-Etienne or at the University of Sussex, on Dell (desktop) or Apple (laptop) computers using the Experiment Multiple Forced Choice tool in PRAAT. Stimuli were played via Sennheiser HD 201 Closed Back Headphones or Dynamode DH-660 headsets. Stimulus presentation was randomized and participants were invited to pause after every 12 ratings. Participants first entered each rating by clicking on the chosen button on the screen; they could then confirm their choice (“OK” button), replay the sound (replay button) or change their rating (“oops” button).
To test if the pitch of the cry affected listeners’ sex attributions (study 3), we played back re-synthesised cry variants (120 stimuli) to 32 adult listeners (21 women and 11 men; 18 French parents followed by 14 undergraduate students in Psychology at the University of Sussex, attending a final year module; recruitment was terminated when the participation of the 14 undergraduate students brought the sample above our target of 30 participants). Participants were asked to identify the sex of the baby from listening to one of its cries.
All the subsequent experiments (studies 4 and 5) involved second year undergraduate students in Psychology at the University of Sussex following the Cognitive Psychology module. Participants only performed one of the four experiments (one type of rating), and were assigned to a given experiment by splitting the full sample into four groups of approximately equal size, based on the initial of their name (listed in alphabetical order). Groups of participants were tested simultaneously during several practical sessions. The minimum sample of 30 participants was reached for all experiments. All tested participants who completed the experiment and provided an output data file were included in the analysis.
To test the hypothesis that cry pitch affects perceived gender attributes (masculinity in male babies and femininity in female babies, study 4), we used our set of natural cries and associated re-synthesised pitch variants in listening experiments where adult participants were asked to rate gender attributes of babies from listening to their cries. Thirty listeners (25 women and 5 men) were told that the cries were from 4-month-old baby boys and asked to rate their masculinity (on a Likert scale of 1 to 7: 1 = extremely low, 4 = average, 7 = extremely high). The question was: “Please rate the masculinity of this baby boy on a scale of 1 to 7: 1 = extremely feminine, 4 = neither feminine nor masculine, 7 = extremely masculine”. Thirty-eight different listeners (26 women and 12 men) were told that these cries originated from 4-month-old baby girls and asked to rate their femininity (also on a scale of 1 to 7). The question was: “Please rate the femininity of this baby girl on a scale of 1 to 7: 1 = extremely masculine, 4 = neither masculine nor feminine, 7 = extremely feminine”. Each adult rated a total of 24 natural cries, from 13 boys and 11 girls, and 120 re-synthesised cries, corresponding to the 5 pitch variants for the 24 exemplars (the presentation of natural and re-synthesised cries was randomized throughout).
To test the effect of cry pitch and declared baby sex on the perception of discomfort (study 5), different adult listeners were asked to rate the level of discomfort expressed by each cry, here too using a seven-point Likert scale. Two groups of participants were asked to rate our sets of 24 natural and 120 re-synthesised cries: one set of participants (30 women and 6 men) was told that the cries originated from boys, and the other (30 women and 11 men) that they originated from girls.
Statistical analysis of the results of playback experiments
The effects of natural and artificial variation in acoustic parameters on listeners’ ratings were tested using Linear Mixed Models (for the continuous outcome variables femininity, masculinity and discomfort) and Generalized Linear Mixed Models (with a logit link for the binary variable sex) in SPSS 21 for Mac. Reported statistics correspond to fully factorial models. Model structures are detailed in the footnotes of the Supplementary Tables. The sizes of main effects (fixed mean F0) or correlations (naturally varying mean F0) were estimated using R coefficients derived from simple linear regressions between mean F0 and the ratings averaged by exemplar and/or listener (sex, femininity, masculinity and discomfort).
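The effect-size step described above (averaging ratings by exemplar, then relating the averages to mean F0) can be illustrated with a short Python sketch. It assumes ratings are available as (exemplar, rating) pairs and computes the Pearson coefficient, which equals the R of a simple linear regression up to sign. The data in the example are made up for illustration, and the function names are hypothetical.

```python
from collections import defaultdict
from math import sqrt

def mean_rating_by_exemplar(ratings):
    """Average the ratings given to each exemplar across listeners.

    ratings is an iterable of (exemplar_id, rating) pairs.
    """
    grouped = defaultdict(list)
    for exemplar, rating in ratings:
        grouped[exemplar].append(rating)
    return {ex: sum(vals) / len(vals) for ex, vals in grouped.items()}

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Made-up example: three exemplars, two listeners each.
example_mean_f0 = {"cry1": 310, "cry2": 440, "cry3": 570}
example_ratings = [
    ("cry1", 5), ("cry1", 6),
    ("cry2", 4), ("cry2", 4),
    ("cry3", 2), ("cry3", 3),
]
averaged = mean_rating_by_exemplar(example_ratings)
exemplars = sorted(example_mean_f0)
r = pearson_r(
    [example_mean_f0[e] for e in exemplars],
    [averaged[e] for e in exemplars],
)
```

Averaging before correlating removes listener-level noise, so the resulting coefficient describes the relationship at the level of the exemplar rather than of the individual rating.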