The Voice of Bats: How Greater Mouse-eared Bats Recognize Individuals Based on Their Echolocation Calls

Yossi Yovel; Mariana Laura Melcon; Matthias O. Franz; Annette Denzinger; Hans-Ulrich Schnitzler

doi:10.1371/journal.pcbi.1000400

Abstract

Echolocating bats use the echoes from their echolocation calls to perceive their surroundings. The ability to use these continuously emitted calls, whose main function is not communication, for recognition of individual conspecifics might facilitate many of the social behaviours observed in bats. Several studies of individual-specific information in echolocation calls found some evidence for its existence but did not quantify or explain it. We used a direct paradigm to show that greater mouse-eared bats (Myotis myotis) can easily discriminate between individuals based on their echolocation calls and that they can generalize their knowledge to discriminate new individuals that they were not trained to recognize. We conclude that, despite their high variability, broadband bat-echolocation calls contain individual-specific information that is sufficient for recognition. An analysis of the call spectra showed that formant-related features are suitable cues for individual recognition. As a model for the bat's decision strategy, we trained nonlinear statistical classifiers to reproduce the behaviour of the bats, namely to repeat correct and incorrect decisions of the bats. The comparison of the bats with the model strongly implies that the bats are using a prototype classification approach: they learn the average call characteristics of individuals and use them as a reference for classification.

Author Summary

Animals must recognize each other in order to engage in social behaviour. Vocal communication signals could be helpful for recognizing individuals, especially in nocturnal organisms such as bats. Echolocating bats continuously emit special vocalizations, known as echolocation calls, and perceive their surroundings by analyzing the returning echoes. In this work we show that bats can use these vocalizations for the recognition of individuals, despite the fact that their main function is not communication. We used a statistical approach to analyze how the bats could do so. We created a computer model that reproduces the recognition behaviour of the bats. Our model suggests that the bats learn the average calls of other individuals and recognize individuals by comparing their calls with the learnt average representations.

Citation: Yovel Y, Melcon ML, Franz MO, Denzinger A, Schnitzler H-U (2009) The Voice of Bats: How Greater Mouse-eared Bats Recognize Individuals Based on Their Echolocation Calls. PLoS Comput Biol 5(6): e1000400. https://doi.org/10.1371/journal.pcbi.1000400

Editor: Karl J. Friston, University College London, United Kingdom

Received: January 13, 2009; Accepted: May 4, 2009; Published: June 5, 2009

Copyright: © 2009 Yovel et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Funding: This work was funded by SFB 550 and by the Graduiertenkolleg Neurobiologie. It was supported in part by the IST Program of the European Community, under the PASCAL network of excellence, IST-2002-506778. This work was also supported by the human resources and mobility activity Marie Curie host fellowships for early stage research training under contract MEST-CT-2004-504321 PERACT by the European Union. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Introduction

Voice is defined as the entirety of all acoustic signals produced by the vocal organs of an organism and its ability to produce them. Vocalizations are mostly used for communication. They can contain information about identity, gender, maturity, health, behavioural context, etc. [1]–[3]. Specific properties of the sound production and articulation apparatus are responsible for the individual-specific spectral properties of vocalizations. The human voice, for instance, reveals the identity of individuals, and it has recently been shown that other animals can also recognize individuals according to their social vocalizations [4]–[10]. Social vocalizations constitute an important part of the vocal repertoire of bats. These vocalizations have been characterized for many species and contexts and were shown to contain individual signatures [11]–[23]. In addition to social vocalizations, microchiropteran bats constantly emit echolocation calls and use the returning echoes to perceive their surroundings [24]. These echolocation calls are tonal signals with a structured change in frequency over time that is normally less variable than that of social vocalizations. The ability to recognize individuals based on echolocation calls might explain many of the social behaviours observed in bats [e.g., 16]. Several studies have searched for individual-specific cues in bat echolocation calls [2], [25]–[28]. Recently, the response of bats to the echolocation calls of different individuals has been tested, and the results suggested that they could recognize individuals according to their echolocation calls [29].

The echolocation calls of the greater mouse-eared bats (Myotis myotis) used in this study are ∼3 ms long frequency-modulated (FM) down-sweeps ranging from ∼100 kHz to ∼30 kHz. The exact spectral-temporal structure of the calls changes depending on the task. We hypothesize that, despite this variability, the echolocation signals might contain individual-specific characteristics, generated by the bats' vocal apparatus, which are sufficient for individual recognition. We first tested whether bats can distinguish between individuals according to their echolocation calls using the most direct approach applied to date: we trained greater mouse-eared bats to classify echolocation calls of other individuals played back to them in a two-alternative forced choice (2-AFC) experiment. After showing that the bats can clearly recognize their conspecifics, we used a statistical approach, new in this field, and trained statistical classifiers to reproduce the bats' behaviour, namely to make correct and incorrect decisions similar to those of the bats. Our approach offers two main advantages over former unsuccessful attempts to statistically identify individual bats according to their echolocation calls [30]. First, our method is almost unlimited in the number of parameters that can be fed into it. This enabled us to use raw representations of the calls rather than limiting ourselves to a set of parameters, as was always the case before. Second, we used a large data set containing ca. 800 calls per bat. Such a large data set enables us to create a good model of an individual's calls despite their large variability. We used the statistical classifier as a model of the bat's underlying decision process, to show how classification is statistically possible and to understand how the bats might be able to recognize other individuals.

Results

Echolocation calls

All bats emitted calls typical for flying in confined spaces, with a very characteristic spectral-temporal structure. Despite this repeating pattern, the spectral content of the calls varied considerably among individuals, for both behavioral and technical reasons (see Materials and Methods and Figure 1). There was also some intra-individual variability in the sweep rate (Table 1), reflecting differences in the temporal structure of the calls. Finally, the signal-to-noise ratio (SNR) of the calls varied dramatically (Table 1) as a result of the bats' varying distance from the microphone.

Figure 1. Normalized spectra (means and SD) of four of the bats used in the experiments.

Each spectrum was normalized to have a maximum of 1. Note the overall similar shape and the high variability.

https://doi.org/10.1371/journal.pcbi.1000400.g001

Table 1. Basic Call Parameters.

https://doi.org/10.1371/journal.pcbi.1000400.t001

Behavioral classification experiments

The bats required 15–24 days of training before they consistently recognized the correct individual in more than 75% of the trials. The learning curves (Figure 2) fluctuated between days. After training, all bats recognized S+ (a single call of the bat they had learned to recognize) with accuracy well above chance level (Table 2).
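
The excerpt does not state which statistical test underlies the comparison with chance in Table 2. As a minimal sketch, assuming independent trials, one common check for a two-alternative forced-choice score is an exact one-sided binomial test against the 50% chance level. The function name and the trial counts in the example are hypothetical, not the bats' data.

```python
from math import comb


def binomial_p_above_chance(correct: int, trials: int, chance: float = 0.5) -> float:
    """Exact one-sided binomial p-value: P(X >= correct) if every choice were a guess."""
    if not 0 <= correct <= trials:
        raise ValueError("correct must lie between 0 and trials")
    # Sum the binomial probabilities of scoring at least `correct` by chance.
    return sum(
        comb(trials, k) * chance**k * (1 - chance) ** (trials - k)
        for k in range(correct, trials + 1)
    )


# Hypothetical session: 60 correct choices in 80 two-alternative trials (75%).
p_value = binomial_p_above_chance(60, 80)
print(f"p = {p_value:.1e}")
```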

Figure 2. The learning curves.

The percentage of correct decisions is presented as a function of the training day for each bat. Training was stopped once a bat made correct decisions in 75% or more of the trials on three consecutive days.

https://doi.org/10.1371/journal.pcbi.1000400.g002
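
The stopping rule in the Figure 2 caption (75% or more correct on three consecutive days) can be expressed as a small function. This is an illustrative sketch; the name `days_to_criterion` and the example scores are hypothetical.

```python
from typing import Optional, Sequence


def days_to_criterion(
    daily_scores: Sequence[float], threshold: float = 0.75, run_length: int = 3
) -> Optional[int]:
    """Return the 1-based training day on which the criterion is first met, else None.

    daily_scores holds the fraction of correct trials on each training day.
    """
    streak = 0
    for day, score in enumerate(daily_scores, start=1):
        # Extend the run of qualifying days, or reset it after a weaker day.
        streak = streak + 1 if score >= threshold else 0
        if streak == run_length:
            return day
    return None


# Hypothetical learning curve with day-to-day fluctuations.
print(days_to_criterion([0.52, 0.61, 0.78, 0.70, 0.76, 0.80, 0.79]))  # prints 7
```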

Table 2. Bat Performance.

https://doi.org/10.1371/journal.pcbi.1000400.t002

Test of generality

Bats were able to generalize from the learned task to recognize S+ or avoid S− (a single call of the bat that they learned to avoid) when presented with calls of new bats that they never heard during training (Table 2). Most of the bats showed both a preference for S+ and an avoidance of S−. The higher percentage of approaches to S+ when presented with S0 (a single call of a bat that they did not encounter during training) may be because the S+ calls in these experiments were taken from the training set, so the bats might have already heard them during training. The lower avoidance of S− when presented with S0 could result from the fact that they were familiar to the bats and that the bats were even rewarded when approaching them during the test phase.

Machine classification

A linear classifier (Support Vector Machine, SVM) learned to classify the calls with high accuracy (81–90% correct decisions). This held for both representations of the calls, i.e. the temporal-spectral spectrograms and the power spectral densities (PSDs, Table 3), although performance was slightly lower with the PSDs (77–84%). This indicates that individual-specific information is abundant in the calls. The overall performance of the linear machines was similar to that of the bats.
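
The excerpt does not show which SVM implementation was used. The sketch below is a dependency-free stand-in: a linear SVM trained with Pegasos-style stochastic subgradient descent on the hinge loss, with the input vectors standing in for flattened spectrograms or PSDs. The function names, the regularization constant `lam`, the deterministic sweep order and the toy data are all assumptions for illustration.

```python
from typing import List, Sequence


def train_linear_svm(
    samples: Sequence[Sequence[float]],
    labels: Sequence[int],
    lam: float = 0.1,
    epochs: int = 300,
) -> List[float]:
    """Linear SVM via Pegasos-style subgradient descent (hinge loss + L2 penalty).

    samples: feature vectors, e.g. flattened spectrograms or PSDs.
    labels:  +1 for calls of one bat, -1 for calls of the other.
    A constant 1.0 is appended to every vector so the last weight acts as a bias
    (unlike a textbook SVM, this bias is also regularized).
    """
    data = [list(x) + [1.0] for x in samples]
    w = [0.0] * len(data[0])
    t = 0
    for _ in range(epochs):
        # Deterministic sweep over the data instead of random sampling.
        for x, y in zip(data, labels):
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1.0 - eta * lam) * wi for wi in w]  # gradient step on the L2 term
            if margin < 1.0:  # hinge loss active: move toward y * x
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def decision_value(w: Sequence[float], x: Sequence[float]) -> float:
    """Signed score w . [x, 1]; its sign is the predicted class."""
    return sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))


# Toy 2-D "calls" from two hypothetical bats.
calls = [[2, 2], [3, 3], [2, 3], [-2, -2], [-3, -3], [-2, -3]]
bats = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(calls, bats)
print([1 if decision_value(w, c) > 0 else -1 for c in calls])
```

The sign of `decision_value` gives the predicted bat; its magnitude divided by the norm of the non-bias weights is the distance to the hyperplane, the kind of metric the comparison with the bats relies on.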

Table 3. The Performance of Linear and Non-linear SVM Classifiers.

https://doi.org/10.1371/journal.pcbi.1000400.t003

Comparison of the metrics

Our main goal was to model the behavior of the bats. Therefore, more than overall performance, we were interested in finding a classifier that behaves like the bat, in the sense that the bat makes more errors in trials that the model considers more difficult, and vice versa. We assessed the similarity between the bat and its model by measuring the correlation between the performance of the bat and the performance of the model on the same test set (see Materials and Methods). The performance of the model was measured indirectly by calculating the distances between the pairs of calls in the test set, which reflect the metric of the model. A high correlation between the two indicates that the bat made more errors in trials that the machine considers difficult, and vice versa. Except for a single case (using the PSD for the classification task of bat 6 vs. bat 1), the metrics (distances to the hyperplane) of the linear classifiers were actually negatively correlated with the performance of the bats, implying that the bats were using different features than the model to classify the calls (Table 3). We were, however, able to train non-linear SVMs whose metrics correlated with the bats' behavior in each of the classification tasks. This was true for both the spectrograms and the PSDs, although the correlation seems slightly weaker for the PSDs (Figure 3). The overall performance of the non-linear SVMs behaving most similarly to the bats was very close to that of the bats when using the spectrograms, and slightly lower when using the PSDs (Table 3). In one case (classification of bat 5 vs. bat 2), performance with the PSDs was much lower.

Figure 3. The bats' mean performance as a function of the non-linear classifiers' metric, the distance to the hyperplane.

The performance of each bat was normalized to a maximum of 1 for the distance class with the highest performance. The distance classes are ordered by increasing distance from the hyperplane: class 4 is the farthest from the hyperplane (easiest to classify), while class 1 is the closest (most difficult to classify). The positive correlation implies that the model behaves similarly to the bat.

https://doi.org/10.1371/journal.pcbi.1000400.g003
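
Figure 3 groups trials into four distance classes and normalizes each bat's performance to its best class. A sketch, assuming equal-sized classes formed by sorting the unsigned distances (the caption does not say how the class boundaries were chosen); the function name and example values are hypothetical.

```python
from typing import List, Sequence


def performance_by_distance_class(
    distances: Sequence[float], outcomes: Sequence[int], n_classes: int = 4
) -> List[float]:
    """Mean performance per distance class, normalized so the best class equals 1.

    Trials are sorted by unsigned distance and cut into n_classes groups of
    (nearly) equal size; class 1 is the group closest to the hyperplane.
    """
    if len(distances) != len(outcomes) or len(distances) < n_classes:
        raise ValueError("need matching inputs and at least one trial per class")
    order = sorted(range(len(distances)), key=lambda i: abs(distances[i]))
    n = len(order)
    # Integer class boundaries; every class receives at least one trial.
    bounds = [c * n // n_classes for c in range(n_classes + 1)]
    rates = []
    for lo, hi in zip(bounds, bounds[1:]):
        members = order[lo:hi]
        rates.append(sum(outcomes[i] for i in members) / len(members))
    peak = max(rates)
    if peak == 0:
        raise ValueError("no correct trials: normalization undefined")
    return [r / peak for r in rates]


# Hypothetical trials: errors cluster near the hyperplane.
dists = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
print(performance_by_distance_class(dists, [0, 0, 0, 1, 1, 1, 1, 1]))
```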

Single cue comparisons

To rule out the possibility that a single simple cue was sufficient for classification, we analyzed the commonly used call parameters (starting, terminal and maximum-energy frequencies, bandwidth and call duration; Table 1) and tested the performance obtained when relying solely on each of them. We used exactly the same pairs of calls that were presented to the bats in the testing phase and measured the percentage of correct decisions a bat would achieve if it relied on one of these parameters (e.g., always choosing the call with the lower, or the higher, terminal frequency). In almost all cases, relying on any single cue resulted in chance-level performance (45–55%). For the classification task of bat 2 vs. bat 5, either of two single cues (the bandwidth or the initial frequency) was sufficient to correctly classify 60–65% of the calls: higher than chance, but much lower than the observed performance.
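
The single-cue control amounts to scoring a fixed rule on the test pairs. A sketch, assuming each trial is summarized as the cue value of the S+ call and of the S− call, and that ties count as a coin flip (the text does not say how ties were handled); the function name and the terminal frequencies in the example are hypothetical.

```python
from typing import Sequence, Tuple


def single_cue_accuracy(
    trials: Sequence[Tuple[float, float]], prefer_lower: bool = True
) -> float:
    """Fraction of trials a fixed single-cue rule would get right.

    Each trial is (cue value of the S+ call, cue value of the S- call); the rule
    always picks the call with the lower (or higher) value. Ties count as a
    coin flip (0.5).
    """
    score = 0.0
    for s_plus, s_minus in trials:
        if s_plus == s_minus:
            score += 0.5
        elif (s_plus < s_minus) == prefer_lower:
            score += 1.0
    return score / len(trials)


# Hypothetical terminal frequencies (kHz) of the two calls in four test trials.
pairs = [(30.0, 35.0), (32.0, 31.0), (29.0, 33.0), (34.0, 34.0)]
print(single_cue_accuracy(pairs), single_cue_accuracy(pairs, prefer_lower=False))  # prints 0.625 0.375
```

Scoring both directions and keeping the better one gives the most favorable performance a single cue could offer.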

Discussion

The voice of individual greater mouse-eared bats is specific enough that they can distinguish between the echolocation calls of conspecifics despite the calls' extremely short duration and highly situation-dependent variability. The bats were able to generalize their knowledge to recognize the rewarded individual (S+) and avoid the unrewarded one (S−) when presented with the calls of new individuals that they had not heard during training (S0). A standard linear classifier (SVM) can be trained to perform the recognition task with an overall performance similar to that of the bats. The linear models, however, did not reproduce the decision metrics of the bats, implying that the discriminative features they used were not the ones used by the bats. The linear model can be extended (after a nonlinear transformation of the data with a radial basis function (RBF) kernel) to reproduce the behavior of the bats; in other words, the bats made more errors in trials that the model considered difficult. Thus, the analysis of these classifiers provides candidate discriminative features, derived from the call statistics, that might be used by the bats to distinguish between individuals.
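
For reference, the standard form of the RBF kernel and of the resulting SVM decision function is shown below. The kernel width γ, the dual coefficients α_i (non-zero only for support vectors x_i with labels y_i = ±1) and the bias b were fitted by the authors and are not given in this excerpt. The distance to the hyperplane used as the model's metric is |f(x)| divided by the norm of the weight vector w in the kernel-induced feature space.

```latex
\begin{aligned}
k(\mathbf{x}, \mathbf{x}') &= \exp\!\left(-\gamma\,\lVert \mathbf{x} - \mathbf{x}' \rVert^{2}\right), \quad \gamma > 0 \\
f(\mathbf{x}) &= \sum_{i} \alpha_i y_i\, k(\mathbf{x}_i, \mathbf{x}) + b, \quad \hat{y} = \operatorname{sign} f(\mathbf{x}) \\
d(\mathbf{x}) &= \frac{\lvert f(\mathbf{x}) \rvert}{\lVert \mathbf{w} \rVert}, \quad \lVert \mathbf{w} \rVert^{2} = \sum_{i,j} \alpha_i \alpha_j y_i y_j\, k(\mathbf{x}_i, \mathbf{x}_j)
\end{aligned}
```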

Examining the PSDs of the calls is a straightforward approach to searching for spectral individual-specific features. The PSDs of two bats (Figure 4A) reveal a general bimodal pattern in both bats, with energy peaks around ∼65 kHz and ∼45 kHz. Bat 1 (black), however, tends to have higher average energy than bat 3 in the ∼65 kHz peak, while bat 3 (blue) tends to have higher energy in the ∼45 kHz peak.
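
The excerpt does not specify how the PSDs were estimated (the relevant part of Materials and Methods is not included). As a minimal sketch, a raw periodogram computed with a direct DFT and scaled to a maximum of 1, as in Figure 1, illustrates the representation; a real analysis would use an FFT and, typically, windowing and averaging. The 480 kHz sampling rate is taken from Materials and Methods; the function names and the test tone are hypothetical.

```python
import cmath
import math
from typing import List, Sequence


def normalized_psd(signal: Sequence[float]) -> List[float]:
    """Periodogram |X[k]|^2 / N for bins k = 0..N//2, scaled to a maximum of 1."""
    n = len(signal)
    power = []
    for k in range(n // 2 + 1):
        # Direct DFT coefficient for bin k (O(N^2) overall; use an FFT in practice).
        coeff = sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        power.append(abs(coeff) ** 2 / n)
    peak = max(power)
    if peak == 0:
        raise ValueError("silent signal: normalization undefined")
    return [p / peak for p in power]


def bin_frequencies_hz(n: int, sample_rate_hz: float = 480_000.0) -> List[float]:
    """Frequency of each non-negative DFT bin."""
    return [k * sample_rate_hz / n for k in range(n // 2 + 1)]


# Synthetic 64-sample tone placed exactly on bin 8 (60 kHz at 480 kHz sampling).
n = 64
tone = [math.sin(2 * math.pi * 8 * t / n) for t in range(n)]
psd = normalized_psd(tone)
freqs = bin_frequencies_hz(n)
print(freqs[psd.index(max(psd))])
```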

Figure 4. Normalized PSDs (mean and SEM) of the calls of bat 1 (black) and bat 3 (blue).

(A) The mean of all calls; (B) the mean of calls misclassified by the bats; (C) the mean of the 15 calls closest to the hyperplane; (D) the mean of the 15 calls farthest from the hyperplane; (E) the difference between the mean PSDs of bat 1 and bat 3 for the four groups of calls shown in A–D; (F) linear correlation coefficients (a measure of similarity) between the curves in panel E, which represent the differences between the average PSDs in panels A–D.

https://doi.org/10.1371/journal.pcbi.1000400.g004

An extremely over-simplified classification rule could be: “The call with lower energy at ∼65 kHz and higher energy at ∼45 kHz belongs to Bat 3 (S+).” An SVM, however, does not use a single feature, such as the energy at 65 kHz, to classify, but rather takes advantage of all possible cues and their combinations. Examining the PSDs according to the decision rule learned by the SVM can provide some insight into the relative importance of different features (Figure 4). The most obvious observation is that the average difference between the PSDs of calls near the hyperplane is most similar to the average difference between the misclassified calls, as supported by a high correlation coefficient (0.62, Figure 4F). This means that the calls that are difficult for the bats to classify are also difficult for the machine, and vice versa. An even more interesting observation is that the average difference between calls far from the hyperplane is very similar to the average difference between all calls, as supported by a very high correlation coefficient (0.83). In fact, it can be described as an emphasized version of the average difference between all calls.

Prototype classification

This last similarity implies that the decisions of the bats can be modeled as a prototype classifier [31], in the sense that the bat learns the mean calls of the bat pair as prototypes for the two classes (S+/S−). To test this hypothesis we applied a simple prototype classifier to our data. We used the nearest-class-mean prototype classifier, in which each class is represented by its mean and each call is assigned to the class whose mean PSD is closest to its PSD in Euclidean distance. The means were calculated exclusively from the training data. Since the bats heard two calls in each trial, we calculated the sum of distances between the PSDs of these calls and the mean PSDs for both the correct and the incorrect assignment. We counted any case in which the correct sum of distances was smaller than the incorrect sum of distances as a correct decision of the classifier. We repeated this for the spectrograms as well.
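
The pair-wise decision rule of the nearest-class-mean classifier described above can be sketched as follows. The vectors stand for PSDs (or flattened spectrograms), the class means would be computed from training calls only, and the function names and toy values are hypothetical.

```python
from math import dist  # Euclidean distance (Python 3.8+)
from typing import List, Sequence, Tuple


def class_mean(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Prototype of one class: the bin-by-bin mean of its training vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def prototype_pair_decision(
    s_plus_call: Sequence[float],
    s_minus_call: Sequence[float],
    mean_plus: Sequence[float],
    mean_minus: Sequence[float],
) -> Tuple[bool, float]:
    """Nearest-class-mean rule applied to the two calls of one trial.

    Returns (decision is correct, summed distance for the correct assignment).
    """
    correct_sum = dist(s_plus_call, mean_plus) + dist(s_minus_call, mean_minus)
    incorrect_sum = dist(s_plus_call, mean_minus) + dist(s_minus_call, mean_plus)
    return correct_sum < incorrect_sum, correct_sum


# Hypothetical 2-bin PSDs; the means come from training calls only.
mean_plus = class_mean([[0.9, 1.0], [1.1, 1.0]])
mean_minus = class_mean([[0.0, 0.1], [0.2, -0.1]])
print(prototype_pair_decision([0.9, 1.1], [0.1, 0.0], mean_plus, mean_minus))
```

The second return value is one candidate for the "sum of prototype distances" of Figure 5A; the caption does not state whether the correct or the incorrect assignment was used, so this is an assumption.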

Despite its simplicity, the prototype classifier achieved a classification performance significantly higher than chance level for both the PSDs and the spectrograms (Table 4). Its lower performance compared to the non-linear SVM is not surprising given the simplicity of this classifier. The overall performance, however, is less important in our case; it could probably be increased by a more sophisticated prototype classifier, for instance one that only learns the means of features with a large inter-bat variability. Much more important is the very high correlation between the distance metric of this classifier (sum of prototype distances) and the bat performance: the bats tend to make more errors when the calls presented to them are farther from the mean calls (Figure 5A).

Figure 5. Testing the prototype hypothesis.

(A) The mean normalized performance of the bats as a function of the sum of prototype distances. The performance of each bat was normalized to a maximum of 1 for the distance class with the highest performance. The distance used was the sum of Euclidean distances from the pair of calls to the means of the classes. The distance classes are ordered according to distance from the prototype: class 4 is the farthest from the prototype, while class 1 is the closest. In contrast to the distances from the SVM hyperplane, for the prototype classifier a large distance means far from the prototype and therefore difficult to classify. We thus expected a negative correlation between performance and distance, which is what we observed. (B) The similarity between the test call pairs of bat 1 and bat 3 and the mean difference between spectrograms. The x axis depicts the distance between the calls according to the SVM metric. The strong positive correlation (linear coefficient C ≈ 0.6) implies that pairs more similar to the mean are considered easier to classify by the model.

https://doi.org/10.1371/journal.pcbi.1000400.g005

Table 4. The Performance of a Prototype Classifier.

https://doi.org/10.1371/journal.pcbi.1000400.t004

An interpretation of the SVM decision rule for the spectrograms is not easy due to their high dimensionality, but the above analysis suggests a prototype classifier as well (Figure 5 and Table 4). To validate this idea, we ranked the presented call pairs of Bat 1 and Bat 3 according to the distance between their spectrograms (based on the non-linear SVM metric). The closer the two spectrograms are to each other, the more difficult they should be to classify. To test the prototype hypothesis we next measured how similar each spectrogram pair is to the pair formed by the two class means: we calculated the linear correlation between a) the difference between the spectrograms of a pair and b) the difference between the mean spectrograms. We found a strong positive correlation between these similarity values and the SVM distances, which shows that the more similar the difference between two spectrograms is to the mean difference, the easier the pair is for the trained SVM to classify. As this SVM was trained to imitate the bat's behavior, this again supports the hypothesis that the bats use some sort of prototype classifier (Figure 5B).

In summary, for both PSDs and spectrograms, we found evidence that the bats use a prototype classifier, in which they evaluate the mean difference between the calls of the bat pair as a reference to which they compare the difference between any new pair of calls they hear. This hypothesis is strengthened by the results of the generalization experiments, which suggest that the bats use both S+ and S− to classify (Table 2). We did not examine the exact PSDs of all classification tasks, mainly because the number of errors in the other tasks was very small. The results of the prototype classifier (Table 4 and Figure 5A), however, imply that the bats were using some form of prototype classifier in all of them.

Conclusions

Researchers have long been fascinated by the social behaviors exhibited by bats. There are, for instance, some reports of bats leaving the roost and flying to and between foraging sites in groups of two to six individuals [16],[22]. Little is known about how bats might perform the strenuous task of remaining in a group when flying at high speeds in darkness, or about how they avoid interference between each other's echolocation calls. The finding that bats can recognize their conspecifics based on their echolocation calls might have significant implications in this context.

Despite their stereotyped spectrograms, echolocation calls show a large task-dependent variability that obscures possible features that might facilitate the recognition of individual bats [30]. For this reason, we used statistical classifiers, a new method of analysis in this context, which requires only a minimal set of restrictive assumptions about candidate discriminative features. The results pointed strongly towards a prototype strategy, which now enables us to design additional behavioral experiments to test this hypothesis. One could, for instance, divide the calls of one of the bats into two subgroups selected such that their prototypes (means) are very different. The tested bat would then be trained using calls from one subgroup and tested using calls from the other. If the prototype hypothesis holds, the bat would be expected to have a very high error rate. An alternative approach could be to use the hyperplane learnt by the SVM to simulate artificial calls at known distances from the hyperplane, and therefore of known difficulty [see 32 for more details].
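
One possible way to form the two subgroups proposed above, which the text does not prescribe, is 2-means clustering (Lloyd's algorithm) of one bat's call vectors (PSDs or flattened spectrograms), initialized with the two most dissimilar calls. It tends to produce subgroups with well-separated means, but it does not explicitly maximize the distance between them. The sketch, its function names and the toy calls are hypothetical.

```python
from math import dist  # Python 3.8+
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def mean_vector(vectors: Sequence[Vector]) -> List[float]:
    """Bin-by-bin mean, i.e. the prototype of a subgroup."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def split_into_two_prototypes(
    calls: Sequence[Vector], max_iter: int = 50
) -> Tuple[Tuple[List[Vector], List[Vector]], List[List[float]]]:
    """2-means split of one bat's calls; returns the two subgroups and their means."""
    if len(calls) < 2:
        raise ValueError("need at least two calls")
    # Initialize the prototypes with the two most distant calls.
    i, j = max(
        ((a, b) for a in range(len(calls)) for b in range(a + 1, len(calls))),
        key=lambda pair: dist(calls[pair[0]], calls[pair[1]]),
    )
    centers = [list(calls[i]), list(calls[j])]
    groups: Tuple[List[Vector], List[Vector]] = ([], [])
    for _ in range(max_iter):
        groups = ([], [])
        for call in calls:
            # Assign each call to the nearer of the two current prototypes.
            nearer = 0 if dist(call, centers[0]) <= dist(call, centers[1]) else 1
            groups[nearer].append(call)
        if not groups[0] or not groups[1]:
            break  # degenerate split (e.g. identical calls); keep current centers
        new_centers = [mean_vector(groups[0]), mean_vector(groups[1])]
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return groups, centers


# Hypothetical 2-D call features forming two loose clusters.
calls = [[0, 0], [0, 1], [10, 10], [10, 11]]
(subgroup_a, subgroup_b), prototypes = split_into_two_prototypes(calls)
print(subgroup_a, subgroup_b, prototypes)
```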

Comparing the performance of the tested classifiers on the PSDs and on the spectrograms reveals that performance with the PSDs does not drop as much as would be expected from the loss of temporal information (Table 3). This implies that most of the information necessary for classification already exists in the frequency domain. Together with the above analysis of the PSDs, this suggests that the filtering properties of the individuals' vocal tracts, which reflect vocal tract resonances (formants), provide sufficient acoustic cues for individual recognition. These findings are in line with recent evidence supporting the presence of formants in animal calls [8]–[10], [33]–[35]. For the classification of the complete repertoire of M. myotis calls, including calls emitted in different behavioral situations that show a much higher variation of temporal-spectral relations, the PSDs might even be advantageous compared to the spectrograms, since they provide a time-independent set of cues.

Materials and Methods

Animals

We conducted the experiments using five adult male M. myotis (Borkhausen, 1797), captured in Bulgaria (license from the Ministry of Environment and Waters, 34/04.07.2005, Sofia, Bulgaria) and housed under standardized conditions (16:8 h light:dark cycle, 24±2°C and 65±5% humidity). Bats were fed on mealworms (larvae of Tenebrio molitor) only during training and experimental sessions. The diet was supplemented with minerals (Korvimin®, WDT) and vitamins (Nutrical©, Albrecht), and fresh water was available at all times. The animals used in the experiments were kept together for a few months in a flight cage that enabled them to fly regularly.

Data acquisition

Five bats were recorded separately while flying freely in a flight room (3.6×6.0×2.8 m) covered with acoustic foam to reduce echoes from the walls and floor. The flight behavior consisted of two patterns: the animals either circled in the room ca. 2 m above the ground, or they flew to one of the walls and hung on it. In the latter case we encouraged them to fly again by clapping our hands or gently poking them with a butterfly net. The sound recordings were performed with custom-made equipment (Universität Tübingen, Germany), including an ultrasonic microphone (flat response ±3 dB between 18 and 200 kHz) in a stationary position at one end of the room pointing 45° upwards, and a digital recorder (PCTape) with a sampling rate of 480 kHz. The recording order of the animals was determined using a Latin square design [36] to mitigate undesired effects of recording order or time of day.
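
Reference [36] is not reproduced here, so the exact square used is unknown. A minimal sketch of a cyclic Latin square shows the property the design relies on: every bat occurs exactly once in each session (row) and once in each position within a session (column). The function name and bat labels are hypothetical; in practice rows and columns would typically also be shuffled.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def cyclic_latin_square(items: Sequence[T]) -> List[List[T]]:
    """Row r, column c holds items[(r + c) % n]: each item appears once per row and column."""
    n = len(items)
    return [[items[(row + col) % n] for col in range(n)] for row in range(n)]


# Hypothetical labels for the five recorded bats; rows = sessions, columns = slots.
bat_labels = ["bat1", "bat2", "bat3", "bat4", "bat5"]
square = cyclic_latin_square(bat_labels)
for session in square:
    print(session)
```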

The recordings lasted 20 minutes in total and were collected on two consecutive days. This procedure provided us with a large data set of over 2000 calls per bat. The characteristics of the calls varied greatly within each individual even though they were emitted under the same conditions. This variability had at least two causes: 1) behavioral: the bats were constantly changing their distance from the walls, especially when approaching them to land, and adjusted their echolocation accordingly [37],[38]; 2) acoustical: the calls were recorded when the bats were at different distances from the microphone and at different aspect angles to it, which resulted in substantial changes in the signal-to-noise ratio (SNR; see Results for more details). We discarded all calls shorter than 2 ms, since they were severely affected by the directionality of the microphone (i.e. calls with strong attenuation at high frequencies). This procedure left us with approximately 800 calls per bat.
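
The 2 ms exclusion criterion translates into a minimum length in samples at the 480 kHz recording rate (0.002 s × 480,000 samples/s = 960 samples). A sketch, assuming calls have already been segmented into lists of samples; the call-detection step itself is not described in this excerpt, and the function names are hypothetical.

```python
from typing import List, Sequence

SAMPLE_RATE_HZ = 480_000  # recorder sampling rate given in the text
MIN_DURATION_S = 0.002    # calls shorter than 2 ms were discarded


def min_call_samples(
    sample_rate_hz: int = SAMPLE_RATE_HZ, min_duration_s: float = MIN_DURATION_S
) -> int:
    """Minimum call length in samples (0.002 s x 480,000 samples/s = 960)."""
    return round(sample_rate_hz * min_duration_s)


def keep_long_calls(calls: Sequence[Sequence[float]]) -> List[Sequence[float]]:
    """Drop segmented calls shorter than the 2 ms threshold."""
    threshold = min_call_samples()
    return [call for call in calls if len(call) >= threshold]


# Hypothetical segmented calls of 1.5 ms, 2 ms and 3 ms.
calls = [[0.0] * 720, [0.0] * 960, [0.0] * 1440]
print([len(c) for c in keep_long_calls(calls)])  # prints [960, 1440]
```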