InfoTracker
Rapidly detect sensitive text reuse and uncover hidden overlaps across massive document collections.
InfoTracker, developed by Stottler Henke, is a powerful system for performing rapid, scalable, and intelligent full-text comparisons between two large sets of documents. Designed for mission-critical environments, InfoTracker enables analysts and organizations to identify and rank meaningful textual similarities—whether the content has been paraphrased, partially obscured, or reused without attribution.
Behind its high performance is a purpose-built text indexing and comparison architecture, optimized for the properties of natural language text and real-world document collections. This architecture enables InfoTracker to efficiently index and compare millions of documents, delivering linear-time full-text matching while screening out false matches caused by boilerplate language and other common phrases.
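To give a feel for how this kind of matching can work, here is a minimal Python sketch. It is not InfoTracker's actual algorithm, which is not described on this page. It assumes a generic approach: index overlapping word sequences ("shingles") from one collection, look up the shingles of the other collection in that index, and ignore any shingle that appears in too many documents to be meaningful. All function names, the shingle length `n`, and the `max_doc_fraction` threshold are hypothetical.

```python
from collections import defaultdict


def shingles(text, n=5):
    """Return the set of overlapping n-word sequences in a text."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def build_index(docs, n=5):
    """Map each shingle to the ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for sh in shingles(text, n):
            index[sh].add(doc_id)
    return index


def find_overlaps(source_docs, target_docs, n=5, max_doc_fraction=0.5):
    """Count shared shingles for each (source, target) document pair.

    Shingles found in more than max_doc_fraction of the source documents
    are treated as boilerplate and skipped (an illustrative heuristic).
    """
    index = build_index(source_docs, n)
    limit = max(1, int(max_doc_fraction * len(source_docs)))
    overlaps = defaultdict(int)
    for target_id, text in target_docs.items():
        for sh in shingles(text, n):
            holders = index.get(sh, ())
            if len(holders) > limit:
                continue  # too common to be a meaningful match
            for source_id in holders:
                overlaps[(source_id, target_id)] += 1
    return dict(overlaps)
```

Because each shingle is hashed once when indexing and once when looking up, the work grows roughly in proportion to the total amount of text, apart from the per-match bookkeeping. A shared cover-page notice that appears in most documents is skipped, while a distinctive passage that appears in only one source document is counted.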
InfoTracker is engineered to detect meaningful text reuse—even when content is deliberately obfuscated. Its tunable sensitivity uncovers everything from direct copying to subtle paraphrasing. Unlike keyword-based systems, InfoTracker automatically highlights non‑chance overlaps, reducing analyst workload and improving investigative precision.
Whether the task is detecting potential information leaks, verifying originality, or redacting sensitive content, InfoTracker helps analysts and organizations quickly find and prioritize the textual overlaps that matter.
Key Capabilities
- Full-Text Comparison at Scale
Efficiently compares two massive document collections with linear-time performance, even at multi-million-document scale.
- Sensitive Text Detection
Identifies potentially leaked or plagiarized content by spotting non-chance overlaps that suggest content reuse or unauthorized dissemination.
- False Positive Reduction
Automatically filters out boilerplate text and common phrases to minimize noise and surface only meaningful matches.
- Obfuscation-Aware Detection
Recognizes suspicious similarity even when content is reworded or paraphrased, covering cases that range from direct copying to subtle paraphrasing.
- Intelligent Ranking
Matches are scored and ranked to reflect the likelihood of meaningful reuse, supporting triage and focused investigation.
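As an illustration of what scoring and ranking matches can look like, the short Python sketch below weights each shared phrase by how rare it is across the collection, so that a pair sharing one distinctive phrase outranks a pair sharing several common ones. This inverse-document-frequency weighting is an assumption chosen for the example. InfoTracker's real scoring method is not described here, and the function and parameter names are hypothetical.

```python
import math


def rank_matches(shared, doc_freq, total_docs):
    """Rank document pairs by the summed rarity of their shared phrases.

    shared:     {(source_id, target_id): iterable of shared phrases}
    doc_freq:   {phrase: number of documents that contain the phrase}
    total_docs: number of documents in the collection
    """
    scored = []
    for pair, phrases in shared.items():
        # Rare phrases contribute a large weight; ubiquitous ones near zero.
        score = sum(math.log(total_docs / doc_freq[p]) for p in phrases)
        scored.append((pair, score))
    # Highest score first, so the most suspicious pairs are reviewed first.
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

With a ranking like this, an analyst can work from the top of the list and stop once the scores fall to the level expected from chance overlap.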
Use Cases
- Plagiarism Detection – Flag potential reuse across large academic or corporate repositories
- Redaction & Declassification – Identify and remove sensitive material from public disclosures
- Intelligence Leak Detection – Trace sensitive content spills or insider threats across datasets
- Document Comparison & Differencing – Understand what changed across versions or revisions
Status
InfoTracker was originally developed for the Intelligence Advanced Research Projects Activity (IARPA) and is currently deployed and in daily use by users within the U.S. Intelligence Community (i.e., TRL 9).
This work was supported by the ARDA Advanced Information Assurance program under contract NBCHC030077.
To schedule a demo or discuss how InfoTracker can support your organization’s needs, please contact: info@stottlerhenke.com.
Publications
Goan, T., Fujioka, E., Kaneshiro, R., & Gasch, L. (2006). Identifying information provenance in support of intelligence analysis, sharing, and protection. In International Conference on Intelligence and Security Informatics (pp. 692-693). Berlin, Heidelberg: Springer Berlin Heidelberg.
Goan, T., & Broadhead, M. (2005). Detecting the Misappropriation of Sensitive Information through Bottleneck Monitoring. Workshop on Secure Knowledge Management (SKM 2004).
Related Work
TBD…