Understand language at scale
CorpusSense combines semantic search, topic modeling, and LLM-powered analysis across 23 languages. Built by the Tecnolengua research group at Universidad de Málaga.
Why CorpusSense
Traditional corpus tools are fragmented and technical. We built the platform we wished existed — one that lets you focus on discovery, not infrastructure.
See all features →
Three search modes in one interface. Find what you need by meaning, exact match, or regex.
Sentiment, argumentation, bias, stance, emotions, irony — analyzed automatically by Qwen 2.5.
Discover hidden themes automatically. Visualize distributions and explore topic relationships.
“From Sentitext to Lingmotif to CorpusSense — each tool we build pushes the boundary of what's possible in computational linguistics.”
What you can do
Semantic, lexical & pattern search
Find content by meaning, exact terms, or regex patterns. Combine modes for maximum precision.
15 analytical dimensions powered by AI
Sentiment, argumentation, bias, stance, emotions, irony, hate speech, and more.
BERTopic modeling
Discover hidden themes. Visualize distributions and relationships.
Frequencies, concordances & distributions
Understand how language is used across your corpus with statistical precision.
Word associations
Grammatical collocations with configurable statistical measures.
Slice, filter & organize with XML metadata
Create subcorpora from metadata tags. Upload XML-annotated data. Manage collections for targeted analysis across your entire research workflow.
Corpus snapshots at a glance
Backed by two decades of NLP research
CorpusSense is the latest in a lineage of tools built by the Tecnolengua group (TIC-219), following Sentitext, Lingmotif, and DisParSA. Each iteration has advanced what's possible in the computational analysis of language.
View publications →
Start understanding your corpus today.
Get in touch
Whether for research collaboration, enterprise licensing, or general inquiries — we'd love to hear from you.