Research Article | Open Access
DOI: 10.1145/2766462.2767754

Impact of Surrogate Assessments on High-Recall Retrieval

Published: 09 August 2015

Abstract

We are concerned with the effect of using a surrogate assessor to train a passive (i.e., batch) supervised-learning method to rank documents for subsequent review, where the effectiveness of the ranking will be evaluated using a different assessor deemed to be authoritative. Previous studies suggest that surrogate assessments may be a reasonable proxy for authoritative assessments for this task. Nonetheless, concern persists in some application domains, such as electronic discovery, that errors in surrogate training assessments will be amplified by the learning method, materially degrading performance. We demonstrate, through a re-analysis of data used in previous studies, that, with passive supervised-learning methods, using surrogate assessments for training can substantially impair classifier performance, relative to using the same deemed-authoritative assessor for both training and assessment. In particular, using a single surrogate to replace the authoritative assessor for training often yields a ranking that must be traversed to a much greater depth to achieve the same level of recall as the ranking that would have resulted had the authoritative assessor been used for training. We also show that steps can be taken to mitigate, and sometimes overcome, the impact of surrogate assessments for training: relevance assessments may be diversified through the use of multiple surrogates, and a more liberal view of relevance can be adopted by having the surrogate label borderline documents as relevant. By taking these steps, rankings derived from surrogate assessments can match, and sometimes exceed, the performance of the ranking that would have been achieved had the authority been used for training. Finally, we show that our results still hold when the roles of surrogate and authority are interchanged, indicating that the results may simply reflect differing conceptions of relevance between surrogate and authority, as opposed to the authority having special skill or knowledge that the surrogate lacks.
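
As a concrete illustration of the experimental setting (an assumed sketch, not the authors' implementation), the Python fragment below trains a passive (batch) classifier on one assessor's training labels, ranks the review collection, and measures how deep a reviewer must go in the ranking to reach a target recall as judged by the authoritative assessor. The TF-IDF features, the logistic-regression learner, the 75% recall target, and the union rule for pooling multiple surrogates' labels are all assumptions made for this example.

```python
# Illustrative sketch only (assumed setup, not the paper's code): train on one
# assessor's labels, rank the collection, and measure the review depth needed
# to reach a target recall under the authoritative assessor's labels.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression


def rank_with_training_labels(train_texts, train_labels, pool_texts):
    """Fit a passive (batch) classifier and score the unreviewed pool."""
    vectorizer = TfidfVectorizer(sublinear_tf=True)
    x_train = vectorizer.fit_transform(train_texts)
    x_pool = vectorizer.transform(pool_texts)
    model = LogisticRegression(max_iter=1000)
    model.fit(x_train, train_labels)
    return model.predict_proba(x_pool)[:, 1]  # higher score = more likely relevant


def depth_for_recall(scores, authority_labels, target_recall=0.75):
    """Smallest number of top-ranked documents containing target_recall of
    the documents the authoritative assessor deems relevant."""
    order = np.argsort(-np.asarray(scores))
    relevant = np.asarray(authority_labels)[order]
    needed = np.ceil(target_recall * relevant.sum())
    return int(np.argmax(np.cumsum(relevant) >= needed)) + 1


def union_of_surrogates(surrogate_label_sets):
    """Liberal pooling of several surrogates' labels: a document counts as
    relevant for training if any surrogate marked it relevant."""
    return (np.sum(surrogate_label_sets, axis=0) > 0).astype(int)


# Hypothetical comparison: a ratio depth_surrogate / depth_authority greater
# than 1 means the surrogate-trained ranking must be reviewed more deeply to
# reach the same recall.
# depth_authority = depth_for_recall(
#     rank_with_training_labels(train_texts, authority_train_labels, pool_texts),
#     authority_pool_labels)
# depth_surrogate = depth_for_recall(
#     rank_with_training_labels(train_texts, surrogate_train_labels, pool_texts),
#     authority_pool_labels)
```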

Published In

SIGIR '15: Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval
August 2015
1198 pages
ISBN: 9781450336215
DOI: 10.1145/2766462

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

Published: 09 August 2015

Author Tags

1. assessor error
2. ediscovery
3. electronic discovery
4. evaluation
5. recall
6. relevance ranking
7. supervised learning

Conference

SIGIR '15

Acceptance Rates

SIGIR '15 Paper Acceptance Rate: 70 of 351 submissions, 20%
Overall Acceptance Rate: 792 of 3,983 submissions, 20%

Cited By

• (2024) Unbiased Validation of Technology-Assisted Review for eDiscovery. Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 2677-2681. DOI: 10.1145/3626772.3657903. Online publication date: 10-Jul-2024.
• (2024) Beyond the Bar: Generative AI as a Transformative Component in Legal Document Review. 2024 IEEE International Conference on Big Data (BigData), pp. 4779-4788. DOI: 10.1109/BigData62323.2024.10826089. Online publication date: 15-Dec-2024.
• (2024) Comparison of Tools and Methods for Technology-Assisted Review. Information Management, pp. 106-126. DOI: 10.1007/978-3-031-64359-0_9. Online publication date: 18-Jul-2024.
• (2024) Limitations of the Utility of Categorization in eDiscovery Review Efforts. Information Management, pp. 301-311. DOI: 10.1007/978-3-031-64359-0_24. Online publication date: 18-Jul-2024.
• (2019) A Regularization Approach to Combining Keywords and Training Data in Technology-Assisted Review. Proceedings of the Seventeenth International Conference on Artificial Intelligence and Law, pp. 153-162. DOI: 10.1145/3322640.3326713. Online publication date: 17-Jun-2019.
• (2019) Variations in Assessor Agreement in Due Diligence. Proceedings of the 2019 Conference on Human Information Interaction and Retrieval, pp. 243-247. DOI: 10.1145/3295750.3298945. Online publication date: 8-Mar-2019.
• (2018) A Dataset and an Examination of Identifying Passages for Due Diligence. The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, pp. 465-474. DOI: 10.1145/3209978.3210015. Online publication date: 27-Jun-2018.
• (2016) Total Recall. Proceedings of the 2016 ACM International Conference on the Theory of Information Retrieval, pp. 45-48. DOI: 10.1145/2970398.2970430. Online publication date: 12-Sep-2016.
• (2016) Impact of Review-Set Selection on Human Assessment for Text Classification. Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval, pp. 861-864. DOI: 10.1145/2911451.2914709. Online publication date: 7-Jul-2016.
• (2016) Retrieving patents with inverse patent category frequency. Proceedings of the 2016 International Conference on Big Data and Smart Computing (BigComp), pp. 109-114. DOI: 10.1109/BIGCOMP.2016.7425808. Online publication date: 18-Jan-2016.
