DOI: 10.1145/3091478.3091482
research-article

Analyzing the Keystroke Dynamics of Web Identifiers

Published: 25 June 2017

Abstract

Web identifiers such as usernames, hashtags, and domain names serve important roles in online navigation, communication, and community building. The entities that choose such names must therefore ensure that end users can enter them quickly and accurately in applications. Uniqueness requirements, a desire for short strings, and an absence of delimiters often constrain this name selection process.
To gain perspective on the speed and correctness of name entry, we crowdsource the typing of 51,000+ web identifiers. Surface-level analysis reveals, for example, that typing speed is generally a linear function of identifier length. Examining keystroke dynamics at finer granularity proves more interesting. First, we identify features predictive of typing time and accuracy, finding two to be most indicative: (1) the commonality of character bigrams inside a name, and (2) the degree of ambiguity when tokenizing a name. A machine-learning model built over 10 such features exhibits moderate predictive capability. Second, we evaluate our hypothesis that users subconsciously insert pauses in their typing cadence where text delimiters (e.g., spaces) would exist, if permitted. The data generally support this claim, suggesting its application alongside algorithmic tokenization methods, and possibly in name suggestion frameworks.
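The pause hypothesis can be illustrated with a minimal sketch. This is not the paper's method: the function name `segment_by_pauses`, the per-keystroke timestamps in milliseconds, the median-based threshold, and the factor of 1.8 are all assumptions chosen for illustration. The idea is simply to place a token boundary wherever the gap between two consecutive keystrokes is unusually long compared with the typist's typical gap.

```python
from statistics import median


def segment_by_pauses(name, key_times_ms, factor=1.8):
    """Split `name` wherever the inter-key gap is unusually long.

    `key_times_ms[i]` is the (hypothetical) time at which character i
    was typed. A gap counts as a pause when it exceeds `factor` times
    the median gap; both choices are illustrative assumptions.
    """
    if len(name) != len(key_times_ms):
        raise ValueError("need exactly one timestamp per character")
    if not name:
        return []
    if len(name) == 1:
        return [name]

    # Gap i-1 is the delay between character i-1 and character i.
    gaps = [b - a for a, b in zip(key_times_ms, key_times_ms[1:])]
    threshold = factor * median(gaps)

    tokens, start = [], 0
    for i, gap in enumerate(gaps, start=1):
        if gap > threshold:
            # A long pause before character i: close the current token.
            tokens.append(name[start:i])
            start = i
    tokens.append(name[start:])
    return tokens


# Made-up timings with one long pause between "blue" and "sky".
print(segment_by_pauses("bluesky", [0, 110, 220, 330, 640, 750, 860]))
```

With these made-up timings the only gap above the threshold falls before the "s", so the name splits into "blue" and "sky". A relative threshold is used here so that fast and slow typists are treated alike; a real analysis would likely need something more robust.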

Cited By

  • (2020) Aspects of Continuous User Identification Based on Free Texts and Hidden Monitoring. Programming and Computing Software 46:1 (12-24). DOI: 10.1134/S036176882001003X. Online publication date: 1-Jan-2020

Published In

WebSci '17: Proceedings of the 2017 ACM on Web Science Conference
June 2017
438 pages
ISBN:9781450348966
DOI:10.1145/3091478

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. domain names
  2. hashtags
  3. keyboards
  4. keystroke dynamics
  5. typeability
  6. typos
  7. usernames
  8. web identifier

Qualifiers

  • Research-article

Conference

WebSci '17: ACM Web Science Conference
June 25 - 28, 2017
Troy, New York, USA

Acceptance Rates

WebSci '17 Paper Acceptance Rate 30 of 85 submissions, 35%;
Overall Acceptance Rate 245 of 933 submissions, 26%
