Drag & drop your .md, .txt, .json, .srt, .docx, .epub, or .org file here
Please upload a file or process a URL to begin analyzing your corpus.
Censor Studio
Click a token to replace it with a privacy placeholder. Suggested phone numbers and ID numbers appear in the sidebar. Download the censored JSON when finished.
Export
Download JSON with privacy placeholders ([CENSOR_*])
Upload or process a corpus, then open this tab to censor tokens.
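The replacement step described above can be sketched roughly as follows. This is only an illustration, not the tool's actual code: the phone-number pattern, the function names, and the PHONE label inside the [CENSOR_*] placeholder are all assumptions made for the example.

```python
import re

# Hypothetical pattern: the tool's real detection rules are not documented here.
PHONE_RE = re.compile(r"^\+?\d[\d\-\s]{6,}\d$")

def suggest_phone_tokens(tokens):
    """Return the indices of tokens that look like phone numbers."""
    return [i for i, tok in enumerate(tokens) if PHONE_RE.match(tok)]

def censor(tokens, indices, label="PHONE"):
    """Return a copy of tokens with the chosen indices replaced by a placeholder."""
    chosen = set(indices)
    return [f"[CENSOR_{label}]" if i in chosen else tok
            for i, tok in enumerate(tokens)]

tokens = ["Call", "me", "at", "555-123-4567", "tomorrow"]
suggested = suggest_phone_tokens(tokens)   # plays the role of the sidebar suggestions
censored = censor(tokens, suggested)
print(censored)
```

Returning a new list rather than editing in place keeps the original tokens available, which would make an undo or reset straightforward.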
Metadata Studio
Edit the top-level metadata object in your tokenized JSON. Use multiple rows with the same key to store an array of values. Apply changes to the loaded corpus, then download the full JSON.
Corpus metadata
Changes apply to the in-memory corpus until you reset or reload.
Upload or process a corpus to edit metadata.
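One plausible reading of the "multiple rows with the same key" rule is sketched below. The function name, the sample rows, and the corpus layout (a top-level "metadata" object next to a "tokens" list) are assumptions for illustration; the actual JSON schema may differ.

```python
import json

def rows_to_metadata(rows):
    """Collapse (key, value) rows into a dict; repeated keys become arrays."""
    metadata = {}
    for key, value in rows:
        if key not in metadata:
            metadata[key] = value                  # first occurrence: plain value
        elif isinstance(metadata[key], list):
            metadata[key].append(value)            # third and later: extend the array
        else:
            metadata[key] = [metadata[key], value] # second occurrence: start an array
    return metadata

# Hypothetical editor rows with string values.
rows = [("title", "Sample"), ("author", "A. Writer"), ("author", "B. Editor")]

# Assumed corpus shape; only the top-level metadata object is replaced.
corpus = {"metadata": {}, "tokens": ["hello", "world"]}
corpus["metadata"] = rows_to_metadata(rows)
print(json.dumps(corpus, indent=2))
```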
Key Word In Context (KWIC) Concordances
KWIC concordances display search terms with surrounding context, enabling pattern analysis and collocation studies. This view helps identify how words are used in different contexts and reveals linguistic patterns in your corpus.
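As a minimal sketch of how a concordance line can be built, assuming the corpus is already a flat list of tokens and that matching is case-insensitive (the function name and the window parameter are illustrative, not the tool's API):

```python
def kwic(tokens, keyword, window=3):
    """Return (left context, match, right context) for each hit of keyword."""
    keyword = keyword.lower()
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, tok, right))
    return lines

tokens = "the cat sat on the mat and the dog sat on the rug".split()
results = kwic(tokens, "sat", window=2)
for left, match, right in results:
    # Right-align the left context so the keyword lines up in a column.
    print(f"{left:>12} | {match} | {right}")
```

Aligning the keyword in a fixed column is what makes recurring patterns on either side easy to scan.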
Search Options
N-gram Analysis
N-gram analysis extracts contiguous sequences of n tokens, revealing frequent word combinations and linguistic patterns in your corpus. This method is fundamental to corpus linguistics for identifying collocations, phraseology, and lexical patterns.
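The extraction itself is a sliding window over the token list. A short sketch, assuming pre-tokenized input (the function name is illustrative):

```python
from collections import Counter

def ngrams(tokens, n):
    """Return every contiguous sequence of n tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "to be or not to be".split()
bigram_counts = Counter(ngrams(tokens, 2))

# List bigrams from most to least frequent.
for gram, count in bigram_counts.most_common():
    print(" ".join(gram), count)
```

A text of N tokens yields N - n + 1 n-grams, so the list is empty whenever n exceeds the text length.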
Filter Options
Wordcloud Visualization
Wordcloud visualization represents word frequency through size and prominence, providing an intuitive overview of lexical distribution in your corpus. More frequent words appear larger, enabling quick identification of key vocabulary.
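The size mapping can be illustrated with a simple linear scale. This is an assumption: the tool may use a different scaling (for example logarithmic), and the function name and the 12 to 48 size range are made up for the example.

```python
from collections import Counter

def font_sizes(tokens, min_size=12, max_size=48):
    """Map each word to a font size that grows linearly with its frequency."""
    counts = Counter(tokens)
    if not counts:
        return {}
    lo, hi = min(counts.values()), max(counts.values())
    span = (hi - lo) or 1  # avoid dividing by zero when all counts are equal
    return {word: min_size + (count - lo) * (max_size - min_size) / span
            for word, count in counts.items()}

sizes = font_sizes("a a a b b c".split())
print(sizes)
```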
Filter Options
Collocation Analysis
Collocation analysis identifies words that frequently co-occur with a keyword, revealing lexical patterns and phraseology. This view uses log-likelihood (G²) to measure the strength of association between words.
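For reference, the standard log-likelihood statistic over a 2x2 contingency table is G² = 2 Σ O ln(O / E), where O is an observed count and E is the count expected if the two words were independent. The sketch below computes it under an assumed cell layout; how the tool defines co-occurrence (window size, sentence boundaries) is not stated here, and the function name is illustrative.

```python
import math

def log_likelihood(o11, o12, o21, o22):
    """G² for a 2x2 table of observed counts.

    Assumed layout:
      o11: keyword with collocate      o12: keyword without collocate
      o21: collocate without keyword   o22: neither word
    """
    n = o11 + o12 + o21 + o22
    rows = (o11 + o12, o21 + o22)
    cols = (o11 + o21, o12 + o22)
    observed = ((o11, o12), (o21, o22))
    total = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # a zero cell contributes nothing to the sum
                expected = rows[i] * cols[j] / n
                total += o * math.log(o / expected)
    return 2 * total

independent = log_likelihood(10, 10, 10, 10)   # observed equals expected everywhere
associated = log_likelihood(10, 0, 0, 10)      # the words only ever occur together
print(independent, associated)
```

A score near zero means the pair co-occurs about as often as chance predicts; larger scores indicate a stronger departure from independence. Note that G² alone does not say whether the pair is attracted or repelled, so the observed and expected counts in the first cell are usually compared as well.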