Conceived and designed the experiments: YY MLM MOF AD HUS. Performed the experiments: YY MLM. Analyzed the data: YY MLM. Contributed reagents/materials/analysis tools: YY MLM MOF. Wrote the paper: YY MLM.
The authors have declared that no competing interests exist.
Echolocating bats use the echoes from their echolocation calls to perceive their
surroundings. The ability to use these continuously emitted calls, whose main
function is not communication, for recognition of individual conspecifics might
facilitate many of the social behaviours observed in bats. Several studies of
individual-specific information in echolocation calls found some evidence for
its existence but did not quantify or explain it. We used a direct paradigm to
show that greater mouse-eared bats (
Animals must recognize each other in order to engage in social behaviour. Vocal communication signals could be helpful for recognizing individuals, especially in nocturnal organisms such as bats. Echolocating bats continuously emit special vocalizations, known as echolocation calls, and perceive their surroundings by analyzing the returning echoes. In this work we show that bats can use these vocalizations for the recognition of individuals, despite the fact that their main function is not communication. We used a statistical approach to analyze how the bats could do so. We created a computer model that reproduces the recognition behaviour of the bats. Our model suggests that the bats learn the average calls of other individuals and recognize individuals by comparing their calls with the learnt average representations.
Voice is defined as the entirety of all acoustic signals produced by the vocal organs
of an organism and its ability to produce them. Vocalizations are mostly used for
communication. They can contain information about identity, gender, maturity,
health, behavioural context, etc
The echolocation calls of the greater mouse-eared bats (
All bats emitted calls typical for flying in confined spaces with a very
characteristic spectral-temporal structure. Despite this repeating pattern, the
spectral content of the calls varied largely among individuals for both
behavioral and technical reasons (see
Each spectrum was normalized to have a maximum of 1. Note the overall similar shape and the high variability.
Parameter | Bat 1 | Bat 2 | Bat 3 | Bat 5 | Bat 6 |
Call Duration (ms) | 2.5±0.2 | 2.6±0.3 | 2.6±0.2 | 2.5±0.3 | 2.7±0.3 |
Starting Frequency (kHz) | 96±8 | 98±10 | 96±6 | 93±7 | 95±9 |
Terminal Frequency (kHz) | 34±3 | 34±3 | 37±3 | 36±3 | 34±3 |
Maximum Energy Frequency (kHz) | 56±11 | 54±11 | 57±11 | 63±12 | 56±13 |
Sweep Rate (kHz/ms) | 25±4 | 25±4 | 24±4 | 23±3 | 23±4 |
SNR | 31±36 | 26±32 | 39±40 | 35±40 | 38±41 |
Basic call parameters (mean±SD) for the bats whose calls were used in the experiments. The onset and end of each call were defined as the points 25 dB below the maximum. SNR was calculated as the ratio between the maximum call amplitude and the maximum noise amplitude determined from the spectrogram background.
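The onset/end criterion in the caption above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the 25 dB drop is measured on an amplitude envelope (a factor of 10^(−25/20)), and the function name `call_bounds` and the synthetic Gaussian envelope are hypothetical.

```python
import numpy as np

def call_bounds(envelope, fs, drop_db=25.0):
    """Return (onset_s, end_s): first and last time the envelope is
    within `drop_db` of its maximum (amplitude dB assumed)."""
    env = np.asarray(envelope, dtype=float)
    threshold = env.max() * 10.0 ** (-drop_db / 20.0)
    above = np.flatnonzero(env >= threshold)
    return above[0] / fs, above[-1] / fs

# Synthetic example: Gaussian envelope sampled at 480 kHz (hypothetical data).
fs = 480_000
t = np.arange(int(0.005 * fs)) / fs
envelope = np.exp(-0.5 * ((t - 0.0025) / 0.0005) ** 2)

onset_s, end_s = call_bounds(envelope, fs)
duration_ms = (end_s - onset_s) * 1e3
```

If the threshold were instead applied to power, the exponent would use a divisor of 10 rather than 20; the source does not say which convention was used.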
The bats required 15–24 days of training before they reliably
recognized the individuals in more than 75% of the trials. The
learning curves (
The percentage of correct decisions is presented as a function of the day of training for each bat. Training was stopped once a bat performed 75% or more of the trials correctly on three consecutive days.
Experimental task | S+ vs. S−: S+ approach percentage | S+ vs. S0: S+ approach percentage | S0 vs. S−: S− avoidance percentage |
Bat 3 vs. bat 1 | 77% (125) | 95% (40) | 70% (40) |
Bat 2 vs. bat 6 | 83% (96) | 95% (40) | 80% (40) |
Bat 6 vs. bat 1 | 90% (126) | 85% (40) | 75% (40) |
Bat 5 vs. bat 2 | 91% (122) | 95% (40) | 60% (40) |
Overall percent of correct decisions in the test phase. Numbers in brackets depict the number of trials. The S+ and S− columns present the behavior for the controls with calls of the new bats. 95% for S+ means that the bat approached S+ in 95% of the trials when played along with S0 and 70% for S− means that the bat avoided S− in 70% of trials when played with S0.
Bats were able to generalize from the learned task to recognize S+ or
avoid S− (a single call of the bat that they learned to avoid) when
presented with calls of new bats that were never heard during training (
A linear classifier (Support Vector Machine – SVM) learned to classify
the calls with high accuracy (correct decision rates of
81–90%). This was the case for both types of
representations of the calls, i.e. the temporal-spectral spectrograms and the
spectral power spectrum densities (PSD,
Classification task\Information used | Bat 5 vs. bat 2 | Bat 6 vs. bat 1 | Bat 2 vs. bat 6 | Bat 1 vs. bat 3 |
Linear SVM |
Time+Frequency | 90±11% | 84±16% | 81±11% | 87±7% |
C | 10 | 100 | 1 | 1 |
Correlation with bat performance | −0.12±0.40 | −0.15±0.30 | −0.53±0.15 | −0.15±0.08 |
Identical decisions | 90±6% (82) | 83±3% (77) | 74±8% (71) | 69±2% (70) |
Frequency | 79±11% | 77±9% | 84±8% | 84±5% |
C | 0.1 | 50 | 10 | 1 |
Correlation with bat performance | −0.05±0.18 | 0.55±0.44 | −0.05±0.12 | −0.05±2% |
Identical decisions | 72±4% (74) | 85±12% (78) | 74±4% (72) | 70±2% (69) |
Non-linear SVM |
Time+Frequency | 94±3% | 91±1% | 77±1% | 82±3% |
C, σ | 10,5 | 20,5 | 50,100 | 1,5 |
Correlation with bat performance | 0.45±0.05 | 0.15±0.08 | 0.16±0.09 | 0.51±0.10 |
Identical decisions | 88±3% (86) | 85±1% (82) | 68±2% (68) | 70±2% (67) |
Frequency | 62±6% | 78±2% | 84±2% | 72±2% |
C, σ | 5,5 | 1,1 | 10,10 | 5,1 |
Correlation with bat performance | 0.36±0.05 | 0.61±0.11 | 0.11±0.18 | 0.60±0.14 |
Identical decisions | 77±3% (60) | 79±4% (72) | 70±1% (72) | 60±3% (62) |
Overall performance of the linear and non-linear SVMs when using
either the spectrograms (time and frequency information) or the PSDs
(only frequency information) of the calls. The C and σ
parameters of the best classifiers are presented. The correlation
with the bats' performance is the linear correlation
coefficient between the bats' performance and the distances
from the hyperplane, as explained in the
Our main goal was to model the behavior of the bats. Therefore, more than the
overall performance, we were interested in finding a classifier that behaves like
the bat in the sense that it makes more errors in trials that the model
considers to be more difficult and vice versa. We assessed the similarity
between the bat and its model by measuring the correlation between the
performance of the bat and the performance of the model on the same test set
(see
The performance of each bat was normalized to a maximum of 1 for the distance class with the highest performance. The distance classes are organized in increasing distances from the hyperplane - i.e., 4 is the class farthest from the hyperplane (easiest to classify), while 1 is the closest (most difficult to classify). The positive correlation implies that the model behaves similarly to the bat.
To eliminate the possibility that a single simple cue was sufficient for
classification we analyzed the commonly used call parameters
(starting/terminal/maximum energy frequencies, bandwidth and call duration,
The voice of individual greater mouse-eared bats is specific enough that they can distinguish between the echolocation calls of conspecifics despite their extremely short duration and highly situation-dependent variability. The bats were able to generalize their knowledge to recognize the rewarded individual (S+) and avoid the unrewarded one (S−) when presented with the calls of new individuals that they had not heard during training (S0). A standard linear classifier (SVM) can be trained to fulfill the recognition task with an overall performance similar to that of the bats. The linear models, however, did not reproduce the decision metrics of the bats, implying that the discriminative features they were using were not the ones used by the bats. The linear model can be extended (after a nonlinear transformation of the data with an RBF kernel) to reproduce the behavior of the bats. In other words, the bats made more errors in trials that were considered difficult by the model. Thus, the analysis of these classifiers provides candidate discriminative features derived from the call statistics that might be used by the bats to distinguish between individuals.
Examining the PSDs of the calls is a straightforward way of searching for
spectral individual-specific features. The PSDs of two bats (
(A) The mean of all calls; (B) The mean of calls misclassified by the bats;
(C) The mean of the 15 calls closest to the hyperplane; (D) The mean of the
15 calls farthest from the hyperplane. (E) Difference between the mean PSDs
of bat 1 and bat 3 for the four groups of calls shown in A–D. (F)
Linear correlation coefficients (a measure of similarity) between the curves
presented in
An extremely over-simplified classification rule could be: “The call with
lower energy at ∼65 kHz and higher energy at ∼45 kHz belongs to Bat
3 (S+).” An SVM, however, does not use a single feature, such as
the energy at 65 kHz, to classify, but rather takes advantage of all possible cues
and their combinations. Examining the PSDs according to the decision rule learned by
the SVM can provide some insights about the relative importance of different
features (
This last similarity implies that the decisions of the bats can be modeled as a
prototype classifier
Despite its simplicity, the prototype classifier achieved a classification
performance significantly higher than chance level for both the PSDs and the
spectrograms (
(A) The mean normalized performance of the bats as a function of the sum of prototype distances. The performance of each bat was normalized to a maximum of 1 for the distance class with the highest performance. The distance used was the sum of Euclidean distances from the pair of calls to the means of the classes. The distance classes are organized according to the distances from the prototype: 4 is the farthest class from the prototype, while 1 is the closest. In contrast to the distances from the SVM hyperplane, for the prototype classifier far means far from the prototype and therefore difficult to classify. We thus expected to find a negative correlation between performance and distance, which is what we observed. (B) The similarity between the test call pairs of bat 1 and bat 3 and the mean difference between spectrograms. The X axis depicts the distance between the calls according to the SVM metric. The strong positive correlation (linear correlation coefficient of ∼0.6) implies that the pairs that are more similar to the mean are considered easier to classify by the model.
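The prototype classifier and the pair distance described in the caption above can be illustrated with a short sketch. It is a minimal example under stated assumptions: each call is a feature vector (a PSD or a PCA-reduced spectrogram), each call in a pair is compared with the mean of its own class, and the function names and synthetic data are hypothetical.

```python
import numpy as np

def fit_prototypes(X, y):
    """Prototype of each class = mean feature vector of its training calls."""
    return {label.item(): X[y == label].mean(axis=0) for label in np.unique(y)}

def predict(prototypes, X):
    """Assign each call to the class whose prototype is nearest (Euclidean)."""
    labels = np.array(list(prototypes))
    dists = np.stack(
        [np.linalg.norm(X - prototypes[l], axis=1) for l in labels.tolist()], axis=1
    )
    return labels[dists.argmin(axis=1)]

def pair_distance(prototypes, call_plus, call_minus, plus=1, minus=-1):
    """Sum of Euclidean distances of a call pair to the class means
    (assumed: each call is compared with its own class mean)."""
    return (np.linalg.norm(call_plus - prototypes[plus])
            + np.linalg.norm(call_minus - prototypes[minus]))

# Synthetic, well-separated classes standing in for S+ (label 1) and S− (label -1).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(1.0, 0.1, (50, 8)), rng.normal(0.0, 0.1, (50, 8))])
y = np.array([1] * 50 + [-1] * 50)

prototypes = fit_prototypes(X, y)
accuracy = np.mean(predict(prototypes, X) == y)
d = pair_distance(prototypes, X[0], X[50])
```

Under this metric a large `d` marks a pair far from the learnt averages, which the caption treats as a difficult trial.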
Classification task\Information used | Bat 2 vs. bat 5 | Bat 1 vs. bat 6 | Bat 2 vs. bat 6 | Bat 1 vs. bat 3 |
Time+Frequency | 70±3% | 70±4% | 70±5% | 64±3% |
Correlation with bat performance | −0.65±0.31 | −0.32±0.33 | −0.27±0.33 | −0.67±0.50 |
Identical decisions | 65±3% | 62±4% | 62±3% | 61±3% |
Frequency | 73±1% | 62±3% | 69±5% | 59±4% |
Correlation with bat performance | −0.58±0.12 | −0.85±0.13 | −0.18±0.33 | −0.92±0.10 |
Identical decisions | 68±1% | 60±3% | 64±3% | 52±2% |
The performance of a prototype classifier for the different tasks
when using the spectrograms (time+frequency information) or
PSDs (frequency only). The identical decisions are as in
An interpretation of the SVM decision rule regarding the spectrograms is not easy
due to their high dimensionality, but the above analysis suggests a prototype
classifier as well (
In summary, for both PSDs and spectrograms, we found evidence that the bats use a
prototype classifier in which they evaluate the mean difference between the
calls of the bat couple as a reference to which they compare the difference
between any new pair of calls they hear. This hypothesis is strengthened by the
results of the generalization experiments, which suggest that the bats are using
both S+ and S− to classify (
Researchers have always been fascinated by the social behaviors exhibited by bats.
There are, for instance, some reports of bats leaving the roost and flying to
and between foraging sites in groups of between two and six individuals
Despite their stereotyped spectrograms, echolocation calls show a large
task-dependent variability that obscures possible features in the calls that
might facilitate the recognition of individual bats
Comparing the performance of the tested classifiers on the PSDs or on the
spectrograms reveals that the performance when using the PSDs does not drop as
we would expect given the loss of information (
We conducted the experiments using five adult male
Five bats were recorded separately while freely flying in a flight room
(3.6×6.0×2.8 m) covered with acoustic foam to reduce echoes
from the walls and floor. The flight behavior consisted of two patterns: The
animals either circled in the room ca. 2 m above ground, or they flew to one of
the walls and hung on it. In the latter case we encouraged them to fly again by
clapping our hands or gently poking them with a butterfly net. The sound
recordings were performed with custom-made equipment (Universität
Tübingen, Germany) including an ultrasonic microphone (flat response
±3 dB between 18 and 200 kHz) in a stationary position pointing
45° upwards at one end of the room and a digital recorder (PCTape), with
a sampling rate of 480 kHz. The order of the animals was selected using the
Latin squares method
The recordings lasted 20 minutes in total, collected on two consecutive days.
This procedure provided us with a large data set of over 2000 calls per bat. The
characteristics of the calls varied greatly within each individual even though
they were emitted under the same conditions. This variability had at least two
causes: 1) Behavioral - the bats were constantly changing their distance from
the walls, especially when approaching them to land, and adjusted their
echolocation accordingly
In the behavioral experiments each bat was trained to distinguish between two other specific bats in a 2-AFC paradigm. Each experimental bat was assigned two other bats between whose calls it had to distinguish. We will refer to the bat it had to approach as S+ and to the other one as S−. The bats had to sit on a Y-shaped platform and crawl to the side where the calls of S+ were played. The stimuli consisted of alternately playing a single call of S+ on one side of the platform and a single call of S− on the other side, with a 0.5 s pause between them, until the bat made a decision. All calls were normalized in the time domain to have the same maximum amplitude. We used custom-made equipment (Universität Tübingen, Germany) to play back the calls with a sampling rate of 480 kHz. The loudspeakers (Thiel Diamond Driver D2 20-6) were positioned 1.35 m from the platform and 1.35 m apart from each other, forming an equilateral triangle together with the platform. The side on which S+ was presented varied randomly between the trials.

The experiments were divided into a training phase and a testing phase. In the training phase the bats were trained to perform the task using a randomly chosen subset of the data composed of 80% of the calls (the training set). During training, when the bat crawled to S+, it was rewarded with a mealworm. The bats needed ∼4 days to get used to sitting on the Y-platform (they were fed on it). They needed another ∼3 days to learn to crawl to one of the sides of the Y-platform to get the reward. To teach them this, we placed the bat in the starting arm and played back S+ from one side and S− from the other, showing the mealworm at the end of the correct arm and rewarding the bat for crawling towards it. The next step (the training phase) consisted of training on the task itself. S+ and S− were played back as described above and the bats were rewarded for crawling to the correct side. When they made an error the trial was repeated up to 3 times. If the bat continued misclassifying, we moved to the next pair of calls. Once a bat made more than 75% correct decisions on 3 days in a row it was transferred to the testing phase. The training phase lasted ∼20 days on average and each bat performed ∼25 trials per day, so that in total the bats heard ∼500 calls of each bat before starting the testing phase.

In the testing phase, we used the remaining 20% of the calls, which the bats had never heard before. Each pair of calls was played back during a single trial. The decision of the bats was always rewarded, so that the experimenter could not give the bats a hint about the correct answer (a double blind paradigm). The assignment of bat pairs (S+ vs. S−) was as follows: bat1–bat2 vs. bat6, bat3–bat6 vs. bat1, bat4–bat5 vs. bat2 and bat5–bat3 vs. bat1. We used four different pairs of bats (rather than testing all bats on the same task), assuming that all tasks were more or less equally hard and thus that a high performance in all of them would imply high performance for any chosen pair of bats.
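The random 80%/20% division of the calls and the criterion for moving a bat to the testing phase can be sketched as below. This is an illustrative sketch only: the function names, the seed, and the example daily scores are hypothetical, and the criterion is coded as strictly more than 75% on 3 days in a row, following the description above.

```python
import numpy as np

def split_calls(n_calls, train_fraction=0.8, seed=0):
    """Randomly split call indices into a training and a testing set."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_calls)
    n_train = int(round(train_fraction * n_calls))
    return order[:n_train], order[n_train:]

def reached_criterion(daily_correct, threshold=0.75, run_length=3):
    """True once the fraction correct exceeds `threshold` on
    `run_length` consecutive days."""
    run = 0
    for fraction in daily_correct:
        run = run + 1 if fraction > threshold else 0
        if run >= run_length:
            return True
    return False

train_idx, test_idx = split_calls(2000)
done = reached_criterion([0.60, 0.80, 0.80, 0.70, 0.80, 0.80, 0.80])
not_done = reached_criterion([0.80, 0.80, 0.70, 0.80])
```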
We recorded the calls that were played back by the speakers to validate that the system was working properly with the same recording equipment mentioned above.
To test the ability of the bats to generalize and to estimate whether they learned to recognize S+ or to avoid S−, we conducted another set of control experiments. Here S+ or S− was presented on one side and S0, which consisted of a call of one of two novel bats never played back to that animal before, on the other side. The S+/S− calls were randomly selected from the training set, since the bats had recently heard all of the testing calls and had not been exposed to training calls for at least 2 weeks. The order of presentation of S+ or S− and S0 was random, as was the side on which they were played. The rest of the procedure was the same as in the testing session.
We used Support Vector Machines
We tested the performance of the classifier using two different representations
of the calls: spectrograms and power spectral densities (PSD). The spectrograms
are a time-frequency decomposition of the calls and therefore represent both
types of information the bats possess after the basic filtering in the ear
We restricted the spectrograms to the frequency range between 21 and 140
kHz, which contains the entire frequency range of the calls. This left us with
very high-dimensional data (4200 dimensions: 60 frequencies times 70 time
points). We aligned all spectrograms in the time axis such that in all calls the
maximal energy at 30 kHz was at the same time instant of the spectrogram. We
used Principal Component Analysis (PCA) to reduce the dimensionality of the
data. Each data point (representing a single call) was projected on the 300
eigenvectors with the highest eigenvalues. This reduced the dimensionality of
the data to 300 dimensions. In a spectrogram of a frequency-modulated
The PSD contains only the frequency information of the calls, leading to a classification that is independent of temporal information (e.g., call duration, sweep rate), which tends to vary widely in nature. Throughout the paper, the PSDs will sometimes be referred to as spectra. The PSDs were calculated with Welch's method with a 2 ms window with 0.5 overlap. We then under-sampled the PSDs so that their frequency resolution was identical to that of the spectrograms, ensuring that they contained the same spectral information as the spectrograms but no temporal information. All data points (spectrograms after PCA and PSDs) were normalized (divided by the maximum) so that each of them had a maximum of 1 before they were used for classification.
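The PSD computation described above can be sketched in a few lines. This is a minimal NumPy version of Welch's method, not the authors' implementation: the Hann window, the function name `welch_psd`, and the synthetic 50 kHz test tone are assumptions, and absolute scaling is omitted because the result is divided by its maximum anyway.

```python
import numpy as np

def welch_psd(x, fs, window_s=0.002, overlap=0.5):
    """Average windowed periodograms over overlapping segments and
    normalize the result to a maximum of 1."""
    x = np.asarray(x, dtype=float)
    nperseg = int(round(window_s * fs))            # 2 ms -> 960 samples at 480 kHz
    step = int(round(nperseg * (1.0 - overlap)))   # 0.5 overlap -> hop of 480 samples
    if len(x) < nperseg:
        raise ValueError("signal is shorter than one analysis window")
    window = np.hanning(nperseg)                   # window type is an assumption
    starts = range(0, len(x) - nperseg + 1, step)
    periodograms = [np.abs(np.fft.rfft(window * x[s:s + nperseg])) ** 2 for s in starts]
    psd = np.mean(periodograms, axis=0)
    freqs = np.fft.rfftfreq(nperseg, d=1.0 / fs)
    return freqs, psd / psd.max()

# Synthetic 5 ms snippet containing a 50 kHz tone (hypothetical data).
fs = 480_000
t = np.arange(2400) / fs
freqs, psd = welch_psd(np.sin(2 * np.pi * 50_000 * t), fs)
peak_hz = freqs[np.argmax(psd)]
```

With a 2 ms window the frequency bins are 500 Hz apart; the subsequent under-sampling to the spectrogram resolution is not shown because the source does not give its details.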
SVMs are state-of-the-art learning algorithms based on statistical learning theory. A linear SVM uses a training data set to learn a hyperplane (a multidimensional decision boundary) that divides the data set into two classes. It does so by minimizing the classification error and at the same time by maximizing the distance between the hyperplane and the data points that are closest to it. A non-linear SVM is used when the data cannot be separated linearly. It first transforms the data non-linearly into a higher-dimensional space (feature space) and then finds a hyperplane that divides the data into the two classes in this space. In both cases the hyperplane is simply a geometrical multidimensional plane either in the original or in the feature space. Since in many cases a perfect separation of the data into two classes is not possible, the learning algorithm is adjusted to enable a certain amount of misclassification. This is controlled by a constant (C) that defines the penalty for misclassified points. This constant is known as the free parameter of the SVM.
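The description above corresponds to the standard soft-margin SVM formulation, written out here for reference. The notation (w, b, ξ_i, φ) is ours rather than the source's, and the σ-parameterization of the Gaussian kernel is an assumption, since conventions differ between software packages.

```latex
% Soft-margin SVM: trade off a wide margin against misclassification, weighted by C
\min_{w,\,b,\,\xi}\ \tfrac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{N}\xi_i
\quad\text{subject to}\quad
y_i\bigl(w\cdot\phi(x_i) + b\bigr) \ge 1 - \xi_i,\qquad \xi_i \ge 0

% Signed distance of a call x from the hyperplane (linear case: \phi(x) = x)
d(x) = \frac{w\cdot\phi(x) + b}{\lVert w\rVert}

% Gaussian (RBF) kernel defining the feature space implicitly (assumed convention)
k(x, x') = \phi(x)\cdot\phi(x') = \exp\!\left(-\frac{\lVert x - x'\rVert^{2}}{2\sigma^{2}}\right)
```

A large C penalizes misclassified training points heavily, whereas a small C tolerates more of them in exchange for a wider margin.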
We applied SVM classifiers on both types of data (i.e., spectrograms and PSDs).
We used the same training set of calls that was used to train the bats in order
to train the classification machines and the same test set to test them. We
tested both linear and non-linear SVMs. For the non-linear SVMs, we trained
non-linear machines using the radial basis Gaussian kernel
There are several possibilities to optimize the model such that it behaves like a bat. The overall performance (error rate) is not a sufficient criterion since it does not provide any information about the classification strategy: the bat and the model could, for example, err on entirely different trials and still have the same error rate. An exact comparison between the decisions of the bat and the decisions of the model (percent of identical right/wrong decisions) is a better criterion, but it is also limited since it divides the trials into identical and non-identical decisions but provides no information about how difficult each decision was.

We therefore chose a different criterion, one which is, to our understanding, more informative. For each model (linear/non-linear SVM) we computed the distances between the pairs of test calls the bat had to classify according to the model. This can be done by computing the distance of each call from the hyperplane. The distance from the hyperplane can be thought of as an estimation of how difficult the call is to classify. The closer a call is to the hyperplane, the more difficult it is to classify, since it is closer to the boundary between the two classes. We refer to this measure as the metric of the model; it reflects how difficult or easy each trial is considered to be according to the model. We assumed that if the machine captured the features used by the bats for classification, the distance between the calls should positively correlate with the performance of the bats, meaning that the farther apart the two calls presented to the bat were, the easier it should be for the bats to classify them correctly. In practice we divided the entire distance range into 4 distance classes, each containing an equal number of calls, and plotted the error rate of the bats for each of these distance ranges. We then calculated the correlation between the performance of the bat and the difficulty of the trials it performed, represented by the average distances of the group of trials. We searched for the parameters that yielded a classifier that maximizes this correlation.

To choose the best parameters we divided the test set into 3 equally sized subsets. We then used only two thirds of the test set to choose the best model (this set is called the validation set) and measured the results on the unused third. This process was repeated three times and ensured that the data used for testing did not influence our decision. This procedure also provided us with an estimation of the variance of the model's performance.
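The correlation criterion can be illustrated with a short sketch. It assumes one model distance per trial is already available and that the bat's decision in each trial is coded as 1 (correct) or 0 (error); the function names and the synthetic data are hypothetical.

```python
import numpy as np

def performance_by_distance_class(distances, bat_correct, n_classes=4):
    """Sort trials by model distance, split them into equally sized classes,
    and return the mean distance and the bat's fraction correct per class."""
    distances = np.asarray(distances, dtype=float)
    bat_correct = np.asarray(bat_correct, dtype=float)
    groups = np.array_split(np.argsort(distances), n_classes)
    mean_dist = np.array([distances[g].mean() for g in groups])
    performance = np.array([bat_correct[g].mean() for g in groups])
    return mean_dist, performance

def metric_correlation(distances, bat_correct, n_classes=4):
    """Linear correlation between class distance and bat performance."""
    mean_dist, performance = performance_by_distance_class(
        distances, bat_correct, n_classes)
    return np.corrcoef(mean_dist, performance)[0, 1]

# Synthetic example: 40 trials, the bat errs more often on small distances.
distances = np.arange(1, 41)
correct_counts = [5, 7, 9, 10]  # correct trials per class of 10
bat_correct = np.concatenate(
    [np.r_[np.ones(k), np.zeros(10 - k)] for k in correct_counts])

mean_dist, performance = performance_by_distance_class(distances, bat_correct)
r = metric_correlation(distances, bat_correct)
```

In a parameter search this correlation would be computed for each candidate setting on two thirds of the test trials and reported on the remaining third, repeated for the three possible splits.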
We implemented the SVM classifier using the free “spider”
software (