Information Processing & Management
Volume 43, Issue 2, March 2007, Pages 431-444
Special issue on AIRS2005: Information Retrieval Research in Asia
doi:10.1016/j.ipm.2006.07.019
Copyright © 2006 Elsevier Ltd All rights reserved.

Supervised categorization of JavaScript™ using program analysis features

Wei Lu (a, corresponding author) and Min-Yen Kan (b)

(a) Singapore-MIT Alliance, E4-04-10, 4 Engineering Drive 3, Singapore 117576, Singapore
(b) Department of Computer Science, School of Computing, National University of Singapore, Singapore 117543, Singapore

Received 16 May 2006; accepted 25 July 2006. Available online 18 October 2006.


Abstract

Web pages often embed scripts for a variety of purposes, including advertising and dynamic interaction. Understanding what an embedded script does can help interpret a web page or reveal crucial information about it. We have developed a functionality-based categorization of JavaScript, the most widely used web page scripting language, and cast the understanding of embedded scripts as a text categorization problem. We show how traditional information retrieval methods can be augmented with features distilled from domain knowledge of JavaScript and from software analysis to improve classification performance. In experiments on the standard WT10G web page corpus, our techniques eliminate over 50% of the errors made by a standard text classification baseline.
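To illustrate the general idea of augmenting a text representation with program analysis features (this is not the authors' feature set or code), the minimal Python sketch below turns a script into a sparse feature dictionary. It combines bag-of-words counts over identifier subwords with three simple code metrics. The camelCase splitting rule, the `DOM_APIS` list, and the metric names are hypothetical choices made for this example. The crude regex tokenizer also picks up keywords and words inside string literals.

```python
import re
from collections import Counter

# Hypothetical set of browser/DOM names used as a domain-knowledge cue
# (an illustrative assumption, not taken from the paper).
DOM_APIS = {"document", "window", "location", "navigator",
            "setTimeout", "alert", "open", "write"}


def split_identifier(token):
    """Split a camelCase identifier into lowercase subwords,
    e.g. 'getElementById' -> ['get', 'element', 'by', 'id']."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", token)
    return [p.lower() for p in parts] or [token.lower()]


def extract_features(source):
    """Return a sparse feature dict for a JavaScript source string.

    Keys prefixed 'w:' are bag-of-words subword counts; keys prefixed
    '__' are hypothetical program-analysis features added on top.
    """
    # Crude lexical scan: every identifier-like token, including keywords
    # and words inside string literals.
    identifiers = re.findall(r"[A-Za-z_$][A-Za-z0-9_$]*", source)
    features = Counter()
    for ident in identifiers:
        for word in split_identifier(ident):
            features["w:" + word] += 1
    # Simple code metrics (illustrative choices only).
    non_blank = [line for line in source.splitlines() if line.strip()]
    features["__num_lines"] = len(non_blank)
    features["__num_functions"] = len(re.findall(r"\bfunction\b", source))
    features["__dom_api_refs"] = sum(1 for i in identifiers if i in DOM_APIS)
    return dict(features)


if __name__ == "__main__":
    script = 'function showAd() {\n  document.write("<img src=ad.gif>");\n}\n'
    print(extract_features(script))
```

The resulting dictionaries could be vectorized and passed to any standard text classifier. Keeping the extra features in the same sparse space as the word counts is one simple way to let a baseline text categorizer exploit code structure without changing the learner.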

Keywords: Information retrieval; Machine learning; JavaScript; ECMAScript; Program comprehension; Source clone; Program pattern; Software metrics; Program classification; Automated code classification

Article Outline

1. Introduction
2. Background
3. JavaScript categorization
4. Methods
4.1. Using language features for improved tokenization
4.2. Code metrics
4.2.1. Standard metrics
4.2.2. JavaScript language-specific metrics
4.2.3. Code reuse using edit distance
4.3. Program comprehension using the document object model
4.3.1. Static analysis
4.3.2. Dynamic analysis
5. Evaluation
5.1. Lexical analysis
5.2. Metrics
5.3. Program comprehension
6. Annotation evaluation
7. Conclusion and future work
Acknowledgements
References