Linguistic corpus analysis, reimagined

Understand language at scale

CorpusSense combines semantic search, topic modeling, and LLM-powered analysis across 23 languages. Built by the Tecnolengua research group at Universidad de Malaga.

Start analyzing →

Read the paper

23 languages supported

15 analytical dimensions

3 search modes

10 thematic corpora

Why CorpusSense

Traditional corpus tools are fragmented and technical. We built the platform we wished existed — one that lets you focus on discovery, not infrastructure.

See all features →

Semantic + lexical + pattern search

Three search modes in one interface. Find what you need by meaning, exact match, or regex.

15 LLM-powered insight dimensions

Sentiment, argumentation, bias, stance, emotions, irony — analyzed automatically by Qwen 2.5.

Topic modeling with BERTopic

Discover hidden themes automatically. Visualize distributions and explore topic relationships.

“From Sentitext to Lingmotif to CorpusSense — each tool we build pushes the boundary of what's possible in computational linguistics.”

Dr. Antonio Jesus Moreno Ortiz — Principal Investigator, Universidad de Malaga

What you can do

Search

Semantic, lexical & pattern search

Find content by meaning, exact terms, or regex patterns. Combine modes for maximum precision.

Insights

15 analytical dimensions powered by AI

Sentiment, argumentation, bias, stance, emotions, irony, hate speech, and more.

Topics

BERTopic modeling

Discover hidden themes. Visualize distributions and relationships.

Words

Frequencies, concordances & distributions

Understand how language is used across your corpus with statistical precision.

Collocations

Word associations

Grammatical collocations with configurable statistical measures.

Subcorpora

Slice, filter & organize with XML metadata

Create subcorpora from metadata tags. Upload XML-annotated data. Manage collections for targeted analysis across your entire research workflow.

Overview

Corpus snapshots at a glance

Backed by two decades of NLP research

CorpusSense is the latest in a lineage of tools built by the Tecnolengua group (TIC-219): Sentitext, Lingmotif, DisParSA. Each iteration has advanced what's possible in computational analysis of language.

View publications →

2024: CorpusSense

2021: DisParSA

2017: Lingmotif

2014: Sentitext

Start understanding your corpus today.

Try free → or contact the team

Get in touch

Whether for research collaboration, enterprise licensing, or general inquiries — we'd love to hear from you.