Inferring colocation and conversation networks from privacy-sensitive audio with implications for computational social science

Published: 24 January 2011

Abstract

New technologies have made it possible to collect information about social networks as they are acted and observed in the wild, instead of as they are reported in retrospective surveys. These technologies offer opportunities to address many new research questions: How can meaningful information about social interaction be extracted from automatically recorded raw data on human behavior? What can we learn about social networks from such fine-grained behavioral data? And how can all of this be done while protecting privacy? With the goal of addressing these questions, this article presents new methods for inferring colocation and conversation networks from privacy-sensitive audio. These methods are applied in a study of face-to-face interactions among 24 students in a graduate school cohort during an academic year. The resulting analysis shows that networks derived from colocation and conversation inferences are quite different. This distinction can inform future research in computational social science, especially work that only measures colocation or employs colocation data as a proxy for conversation networks.

Published In

ACM Transactions on Intelligent Systems and Technology, Volume 2, Issue 1
January 2011
187 pages
ISSN: 2157-6904
EISSN: 2157-6912
DOI: 10.1145/1889681
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 24 January 2011
Accepted: 01 October 2010
Revised: 01 October 2010
Received: 01 August 2010
Published in TIST Volume 2, Issue 1

Author Tags

  1. Social networks
  2. mobile sensing

Qualifiers

  • Research-article
  • Research
  • Refereed

Article Metrics

  • Downloads (Last 12 months): 24
  • Downloads (Last 6 weeks): 3
Reflects downloads up to 01 Jan 2025

Cited By

  • (2024) Nature, Buildings, and Humans: Residents’ Perceptions of Well-Being in Permanent Supportive Housing. Environment and Behavior 56:7-8 (577-613). DOI: 10.1177/00139165241305485. Online publication date: 16-Dec-2024
  • (2024) Survey on Objective Measurement and Sensor-Based Detection of Physical and Social Activities. 2024 IEEE 48th Annual Computers, Software, and Applications Conference (COMPSAC) (568-577). DOI: 10.1109/COMPSAC61105.2024.00083. Online publication date: 2-Jul-2024
  • (2023) When Good Turns Evil: Encrypted 5G/4G Voice Calls Can Leak Your Identities. 2023 IEEE Conference on Communications and Network Security (CNS) (1-9). DOI: 10.1109/CNS59707.2023.10288900. Online publication date: 2-Oct-2023
  • (2023) A photoplethysmography-based system for talking detection in bedridden patients. Biomedical Signal Processing and Control 81 (104477). DOI: 10.1016/j.bspc.2022.104477. Online publication date: Mar-2023
  • (2022) Privacy Sensitive Speech Analysis Using Federated Learning to Assess Depression. ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (6272-6276). DOI: 10.1109/ICASSP43922.2022.9746827. Online publication date: 23-May-2022
  • (2022) Networks never rest: An investigation of network evolution in three species of animals. Social Networks 68 (356-373). DOI: 10.1016/j.socnet.2021.09.002. Online publication date: Jan-2022
  • (2021) What Does Social Support Sound Like? Challenges and Opportunities for Using Passive Episodic Audio Collection to Assess the Social Environment. Frontiers in Public Health 9. DOI: 10.3389/fpubh.2021.633606. Online publication date: 29-Mar-2021
  • (2021) Creation, evolution, and dissolution of social groups. Scientific Reports 11:1. DOI: 10.1038/s41598-021-96805-7. Online publication date: 1-Sep-2021
  • (2020) A human data-driven interaction estimation using IoT sensors for workplace design. Automation in Construction 119 (103352). DOI: 10.1016/j.autcon.2020.103352. Online publication date: Nov-2020
  • (2020) Sounds of Healthy Aging: Assessing Everyday Social and Cognitive Activity from Ecologically Sampled Ambient Audio Data. Personality and Healthy Aging in Adulthood (111-132). DOI: 10.1007/978-3-030-32053-9_8. Online publication date: 29-Feb-2020
