DOI: 10.1145/1133219.1133223
Article

Transforming web pages to become standard-compliant through reverse engineering

Published: 22 May 2006

Abstract

Developing Web pages following established standards can make the information more accessible, their rendering more efficient, and their processing by computer applications easier. Unfortunately, more than 95% of the existing Web pages today are not "valid" in that they do not follow some of the recommendations (standards) of the World Wide Web Consortium (W3C). Fixing any Web page to make it standard-compliant is a major undertaking. There is now an open-source tool called HTML Tidy which will attempt to fix the invalid HTML code automatically. However, Tidy often changes the Web page's appearance after processing. It is not an effective tool to transform existing Web pages to make them standard-compliant.

In this paper we report the design and implementation of PURE, a tool that cleans up an HTML document through reverse engineering. PURE starts with the rendering result of a given Web page and generates valid HTML code and CSS automatically to produce the same appearance. It is found to be effective for many existing Web pages. A prototype is now available for public testing and comments.
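
The idea of starting from the rendering result rather than from the original markup can be illustrated with a minimal sketch. This is not PURE's implementation, which the abstract does not describe in detail. It assumes, purely for illustration, that a rendering engine has already reported each visible text box with its pixel geometry. The names `RenderedBox` and `regenerate_page` are hypothetical, and the use of absolute positioning is an assumption. The sketch emits a fresh HTML 4.01 Strict document plus CSS from those boxes.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class RenderedBox:
    """One visible text box as a rendering engine might report it (hypothetical)."""
    x: int               # left offset in pixels
    y: int               # top offset in pixels
    width: int           # box width in pixels
    height: int          # box height in pixels
    text: str            # visible text content
    font_size: int = 16  # font size in pixels


def regenerate_page(title: str, boxes: list[RenderedBox]) -> str:
    """Build a new HTML document whose CSS reproduces the reported geometry.

    Assumption: each box becomes an absolutely positioned <div>. A real tool
    would also have to recover images, colours, links, and document structure.
    """
    rules = []
    elements = []
    for i, box in enumerate(boxes):
        rules.append(
            f"#b{i} {{ position: absolute; left: {box.x}px; top: {box.y}px; "
            f"width: {box.width}px; height: {box.height}px; "
            f"font-size: {box.font_size}px; }}"
        )
        # Escape text so characters such as & and < cannot break the markup.
        elements.append(f'<div id="b{i}">{escape(box.text)}</div>')

    return "\n".join([
        '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" '
        '"http://www.w3.org/TR/html4/strict.dtd">',
        "<html>",
        "<head>",
        f"<title>{escape(title)}</title>",
        '<style type="text/css">',
        *rules,
        "</style>",
        "</head>",
        "<body>",
        *elements,
        "</body>",
        "</html>",
    ])


if __name__ == "__main__":
    demo = [RenderedBox(10, 20, 200, 30, "Tom & Jerry")]
    print(regenerate_page("Demo", demo))
```

Because the markup is generated from scratch instead of being patched, errors in the original source never reach the output; the trade-off in this sketch is that the original document structure is lost.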

Cited By

  • (2018) The quality of the XML Web. Web Semantics: Science, Services and Agents on the World Wide Web 19, 59-68. DOI: 10.1016/j.websem.2012.12.001. Online publication date: 20-Dec-2018
  • (2012) A Framework for Detecting and Diagnosing Configuration Faults in Web Applications. Advances in Computers Volume 86, 137-181. DOI: 10.1016/B978-0-12-396535-6.00005-3. Online publication date: 2012
  • (2011) The quality of the XML web. Proceedings of the 20th ACM international conference on Information and knowledge management, 1719-1724. DOI: 10.1145/2063576.2063824. Online publication date: 24-Oct-2011

    Published In

    W4A '06: Proceedings of the 2006 international cross-disciplinary workshop on Web accessibility (W4A): Building the mobile web: rediscovering accessibility?
    May 2006
    153 pages
    ISBN:159593281X
    DOI:10.1145/1133219

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. HTML
    2. HTML tidy
    3. W3C recommendations
    4. browser
    5. cascade style sheets
    6. rendering engine
    7. web page

    Qualifiers

    • Article

    Acceptance Rates

    Overall Acceptance Rate 171 of 371 submissions, 46%
