Copyright © 2006 Elsevier Ltd All rights reserved.
Supervised categorization of JavaScript™ using program analysis features
Received 16 May 2006;
Abstract
Web pages often embed scripts for a variety of purposes, including advertising and dynamic interaction. Understanding embedded scripts and their purpose can often help to interpret a web page or provide crucial information about it. We have developed a functionality-based categorization of JavaScript, the most widely used web page scripting language. We then treat the understanding of embedded scripts as a text categorization problem. We show how traditional information retrieval methods can be augmented with features distilled from domain knowledge of JavaScript and from software analysis to improve classification performance. We perform experiments on the standard WT10G web page corpus and show that our techniques eliminate over 50% of the errors made by a standard text classification baseline.
Keywords: Information retrieval; Machine learning; JavaScript; ECMAScript; Program comprehension; Source clone; Program pattern; Software metrics; Program classification; Automated code classification
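As a rough illustration of the idea in the abstract, the sketch below augments a bag-of-words representation of a script with a few program-derived features. It is a minimal sketch under stated assumptions, not the authors' feature set: the function names, the `tok:` and `metric:` prefixes, and the three metrics are all hypothetical and chosen only for illustration.

```python
import re
from collections import Counter

# Matches JavaScript-style identifiers and keywords (a simplification).
TOKEN_RE = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")


def bag_of_words(script: str) -> Counter:
    """Baseline text features: counts of lowercased identifier-like tokens."""
    return Counter(tok.lower() for tok in TOKEN_RE.findall(script))


def code_metrics(script: str) -> dict:
    """Illustrative program-derived features (assumed, not from the paper)."""
    non_blank = [ln for ln in script.splitlines() if ln.strip()]
    return {
        "metric:lines": len(non_blank),
        "metric:functions": len(re.findall(r"\bfunction\b", script)),
        "metric:dom_writes": script.count("document.write"),
    }


def features(script: str) -> dict:
    """Merge token counts and metrics into one feature dictionary."""
    feats = {f"tok:{tok}": n for tok, n in bag_of_words(script).items()}
    feats.update(code_metrics(script))
    return feats
```

A dictionary of this shape could then be passed to any standard supervised classifier; the prefixes simply keep token features and metric features from colliding.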
Article Outline
- 1. Introduction
- 2. Background
- 3. JavaScript categorization
- 4. Methods
- 4.1. Using language features for improved tokenization
- 4.2. Code metrics
- 4.2.1. Standard metrics
- 4.2.2. JavaScript language-specific metrics
- 4.2.3. Code reuse using edit distance
- 4.3. Program comprehension using the document object model
- 4.3.1. Static analysis
- 4.3.2. Dynamic analysis
- 5. Evaluation
- 5.1. Lexical analysis
- 5.2. Metrics
- 5.3. Program comprehension
- 6. Annotation evaluation
- 7. Conclusion and future work
- Acknowledgements
- References














