CorpusSense

Features

A complete toolkit for linguistic corpus analysis — from search to AI-powered insights, all in one place.

01

Semantic, lexical & pattern search

Search your corpus three ways: semantic search uses embeddings to find conceptually related content, lexical search matches exact terms, and pattern search uses regular expressions. Combine modes for more precise results in any of the 23 supported languages.

Semantic / Lexical / Regex
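To make the three modes concrete, here is a minimal Python sketch. It is an illustration, not CorpusSense's actual implementation. Lexical search is modeled as a case-insensitive whole-word match, pattern search as a regular-expression match, and semantic search as cosine-similarity ranking over embedding vectors. The vectors in the hypothetical `DOCS` list are hand-written toy values standing in for the output of a real embedding model, and `combined_search` shows only one assumed way of combining modes: filter lexically, then rank semantically.

```python
import math
import re

# Toy corpus. ASSUMPTION: the 3-d "vec" values are hand-written placeholders;
# a real system would compute embeddings with a trained embedding model.
DOCS = [
    {"id": 1, "text": "The central bank raised interest rates.", "vec": [0.9, 0.1, 0.0]},
    {"id": 2, "text": "Inflation worries pushed rates higher.", "vec": [0.8, 0.3, 0.1]},
    {"id": 3, "text": "The river bank flooded after the storm.", "vec": [0.1, 0.2, 0.9]},
]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def lexical_search(docs, term):
    """Exact, case-insensitive whole-word match on the term."""
    pattern = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
    return [d for d in docs if pattern.search(d["text"])]

def pattern_search(docs, regex):
    """Regular-expression match over the raw text."""
    pattern = re.compile(regex)
    return [d for d in docs if pattern.search(d["text"])]

def semantic_search(docs, query_vec, k=2):
    """Rank documents by cosine similarity to a query embedding; keep the top k."""
    ranked = sorted(docs, key=lambda d: cosine(d["vec"], query_vec), reverse=True)
    return ranked[:k]

def combined_search(docs, term, query_vec, k=2):
    """Hypothetical combination: lexical filter first, then semantic ranking."""
    return semantic_search(lexical_search(docs, term), query_vec, k)

hits = combined_search(DOCS, "bank", query_vec=[1.0, 0.0, 0.0], k=1)
print([d["id"] for d in hits])  # [1]
```

The lexical filter keeps both senses of "bank", and the semantic ranking then puts the financial sentence first. Note that the whole-word lexical match for "rate" would not find "rates", which is exactly the kind of gap a regex such as `rates?\b` covers.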
02

Word analysis & frequencies

Analyze individual words through frequency distributions, concordances, and contextual usage patterns. Detailed statistical breakdowns and visualizations show how language is used across your corpus.

Frequencies / Concordances / Distributions
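As a rough illustration of what frequency counts and concordances compute, the Python sketch below counts word frequencies, normalizes them per million tokens, and builds KWIC (Key Word In Context) concordance lines. The regex tokenizer is a deliberately simple assumption; the function names are hypothetical and do not describe CorpusSense's internals, which use a full NLP pipeline.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase word tokenizer; a simplistic stand-in for a real NLP pipeline."""
    return re.findall(r"\w+", text.lower())

def relative_frequency(counts, total, per=1_000_000):
    """Normalize raw counts to a rate per `per` tokens (per million by default)."""
    return {word: n * per / total for word, n in counts.items()}

def kwic(tokens, node, window=3):
    """Key Word In Context: a (left context, node, right context) tuple per hit."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, tok, right))
    return lines

tokens = tokenize("The cat sat on the mat, and the dog sat on the cat.")
counts = Counter(tokens)
print(counts.most_common(2))  # [('the', 4), ('cat', 2)]

# Align the node word in a column, as concordance views usually do.
for left, node, right in kwic(tokens, "sat", window=2):
    print(f"{left:>12} [{node}] {right}")
```

Per-million normalization is what makes frequencies comparable across subcorpora of different sizes.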
03

Topic modeling with BERTopic

Automatically discover themes and topics using state-of-the-art BERTopic. Visualize topic distributions, explore relationships between themes, and understand the thematic structure of your data.

BERTopic / Clustering
04

AI-powered insights across 15 dimensions

Use Qwen 2.5 for automated in-depth analysis across dimensions including sentiment, argumentation, bias detection, stance, emotions, irony, hate speech, persuasion techniques, and topic relevance.

Sentiment / Argumentation / Bias
05

Grammatical collocations

Discover word associations and co-occurrence patterns using configurable statistical measures.
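For readers new to association measures, here is a minimal Python sketch of how a collocation score can be computed. It is an illustration, not CorpusSense's method. It scores adjacent word pairs (bigrams) with either pointwise mutual information (PMI) or the t-score, both standard measures in corpus linguistics. Two simplifications are assumptions: "grammatical" collocations would normally be extracted from dependency relations rather than plain adjacency, and the same token count `n` is used as the denominator for both unigram and bigram probabilities. The `measure` and `min_freq` parameters are hypothetical names for the kind of settings a configurable tool exposes.

```python
import math
from collections import Counter

def bigram_collocations(tokens, measure="pmi", min_freq=1):
    """Score adjacent word pairs; return (pair, score) sorted by descending score."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n = len(tokens)
    scores = {}
    for (w1, w2), observed in bigrams.items():
        if observed < min_freq:
            continue  # frequency threshold: drop pairs seen too rarely
        # Co-occurrence count expected if w1 and w2 were independent.
        expected = unigrams[w1] * unigrams[w2] / n
        if measure == "pmi":
            scores[(w1, w2)] = math.log2(observed / expected)
        elif measure == "t":
            scores[(w1, w2)] = (observed - expected) / math.sqrt(observed)
        else:
            raise ValueError(f"unknown measure: {measure}")
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

tokens = "strong tea strong tea strong coffee weak tea".split()
print(bigram_collocations(tokens, measure="pmi")[0])  # (('coffee', 'weak'), 3.0)
print([pair for pair, _ in bigram_collocations(tokens, min_freq=2)])  # [('strong', 'tea'), ('tea', 'strong')]
```

The first result shows why measures need to be configurable: PMI rewards pairs of rare words seen only once, so a minimum-frequency threshold (or a measure such as the t-score, which tends to favour frequent combinations) is usually applied alongside it.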

06

Subcorpora & XML metadata

Create and manage subcorpora: upload XML-annotated data, filter it by metadata tags, and organize it into collections.
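The idea of building a subcorpus by filtering on XML metadata can be sketched with Python's standard `xml.etree.ElementTree` module. The `<corpus>`/`<text>` layout and the `genre` and `year` attributes below are hypothetical; CorpusSense's expected XML schema is not described here.

```python
import xml.etree.ElementTree as ET

# ASSUMPTION: a hypothetical annotation scheme with metadata stored as attributes.
CORPUS_XML = """
<corpus>
  <text id="t1" genre="news" year="2021">Markets rallied today.</text>
  <text id="t2" genre="blog" year="2021">My thoughts on markets.</text>
  <text id="t3" genre="news" year="2023">Rates were held steady.</text>
</corpus>
"""

def subcorpus(xml_string, **filters):
    """Return every <text> element whose attributes match all given filters."""
    root = ET.fromstring(xml_string.strip())
    return [
        el for el in root.iter("text")
        if all(el.get(key) == value for key, value in filters.items())
    ]

news = subcorpus(CORPUS_XML, genre="news")
print([t.get("id") for t in news])  # ['t1', 't3']

news_2023 = subcorpus(CORPUS_XML, genre="news", year="2023")
print([t.text for t in news_2023])  # ['Rates were held steady.']
```

Attribute values are compared as strings, so a numeric range filter (for example, all years after 2020) would need an explicit conversion.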

07

Snapshot & corpus overview

Get instant overviews with automated snapshots. Key metrics, distributions, and trends at a glance.
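The exact metrics in a CorpusSense snapshot are not listed here. As an assumption, a minimal Python sketch of typical corpus-overview numbers looks like this: document count, token count, number of distinct word types, type-token ratio, and the most frequent words. The `snapshot` function and its simple regex tokenizer are hypothetical.

```python
import re
from collections import Counter

def snapshot(texts, top_n=3):
    """Compute basic corpus-overview metrics for a list of document strings."""
    tokens = [tok for text in texts for tok in re.findall(r"\w+", text.lower())]
    counts = Counter(tokens)
    return {
        "documents": len(texts),
        "tokens": len(tokens),
        "types": len(counts),
        # Type-token ratio: lexical variety; sensitive to corpus size.
        "type_token_ratio": len(counts) / len(tokens) if tokens else 0.0,
        "top_words": counts.most_common(top_n),
    }

overview = snapshot(["The cat sat.", "The dog sat down."], top_n=2)
print(overview["tokens"], overview["types"])  # 7 5
print(overview["top_words"])  # [('the', 2), ('sat', 2)]
```

Because the type-token ratio falls as a corpus grows, it is most meaningful when comparing subcorpora of similar size.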

Built on proven technology

State-of-the-art NLP stack for reliable, accurate linguistic analysis.

spaCy: NLP pipeline & tokenization
Qwen 2.5: LLM-powered analysis engine
BERTopic: Topic modeling & clustering
Embeddings: Semantic vector search