Introducing CorpusSense 2.0: semantic search across 23 languages
Our biggest update yet brings multilingual semantic search powered by sentence embeddings, a redesigned analysis dashboard, and expanded LLM insight capabilities.
When we first started building CorpusSense, we knew that search would be at the heart of the experience. Researchers need to find exactly what they're looking for — whether that's a specific word, a grammatical pattern, or a broader concept spread across thousands of documents.
With version 2.0, we're introducing semantic search: the ability to find text by meaning, not just by exact terms. For researchers working with large multilingual corpora, that means a single query can surface relevant passages even when they use different wording or a different language.
How semantic search works
Traditional corpus search relies on exact string matching or regular expressions. These are powerful tools, but they miss conceptual connections. If you search for "climate anxiety", you won't find passages about "eco-grief" or "environmental dread", even though they express the same idea.
Semantic search uses sentence embeddings — dense vector representations of text generated by transformer models — to find passages that are conceptually similar to your query, regardless of the specific words used.
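To make "conceptually similar" concrete, here is a minimal sketch of how embedding-based ranking is commonly done: each passage becomes a vector, and passages are ordered by cosine similarity to the query vector. The three-dimensional vectors below are hand-made toy values chosen for illustration, not output from any real model, and the function names are hypothetical rather than part of the CorpusSense API.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hand-made toy vectors (assumption): real sentence embeddings have
# hundreds of dimensions and come from a transformer model.
toy_embeddings = {
    "eco-grief": [0.9, 0.1, 0.0],
    "environmental dread": [0.8, 0.2, 0.1],
    "quarterly earnings": [0.0, 0.1, 0.9],
}

# Stands in for the embedding of the query "climate anxiety".
query_vector = [1.0, 0.0, 0.0]

def rank(query, embeddings):
    """Return (text, score) pairs, most similar first."""
    scored = [(text, cosine_similarity(query, vec)) for text, vec in embeddings.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

ranked = rank(query_vector, toy_embeddings)
for text, score in ranked:
    print(f"{score:.3f}  {text}")
```

With these toy values, the two passages about environmental distress rank above the unrelated one even though none of them shares a word with the query.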
Semantic search doesn't replace lexical search — it complements it. The real power comes from combining both modes in a single query workflow.
— Dr. Moreno Ortiz
The embedding pipeline
Each text in your corpus is processed through our embedding pipeline at upload time. We use multilingual sentence transformers to generate 768-dimensional vectors for every segment, which are then indexed for fast approximate nearest-neighbor search.
Fig. 1 — The semantic embedding pipeline processes texts at upload time, enabling sub-second search across millions of segments.
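The upload-time and query-time steps described above can be sketched roughly as follows. This is a toy illustration under loud assumptions, not the CorpusSense implementation: `placeholder_embed` is a hashed bag-of-words stand-in that only produces vectors of the right size (768) and does not capture meaning the way a sentence transformer does, segmentation is a naive split on periods, and the search is an exact linear scan where a production system would use an approximate nearest-neighbor index. All function names are hypothetical.

```python
import hashlib
import math

DIM = 768  # vector size mentioned in the text

def split_into_segments(text):
    """Naive segmentation on periods (an assumption for this sketch)."""
    return [s.strip() for s in text.split(".") if s.strip()]

def placeholder_embed(segment, dim=DIM):
    """Hashed bag-of-words vector, L2-normalized. NOT a transformer embedding."""
    vec = [0.0] * dim
    for token in segment.lower().split():
        bucket = int(hashlib.md5(token.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec

def build_index(documents):
    """Upload-time step: embed every segment once and store the vector."""
    index = []
    for doc_id, text in documents.items():
        for segment in split_into_segments(text):
            index.append((doc_id, segment, placeholder_embed(segment)))
    return index

def search(index, query, top_k=3):
    """Query-time step: exact scan by dot product of normalized vectors."""
    q = placeholder_embed(query)
    scored = [
        (sum(a * b for a, b in zip(q, vec)), doc_id, segment)
        for doc_id, segment, vec in index
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_k]

documents = {
    "doc1": "Rising seas threaten coastal towns. Farmers fear longer droughts.",
    "doc2": "The council approved the new budget.",
}
index = build_index(documents)
hits = search(index, "coastal towns", top_k=1)
```

The design point the sketch preserves is that embedding happens once per segment at upload, so a query only needs to embed the query text and compare it against stored vectors.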
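# Example: semantic search API call
# `corpus` is assumed to be an already-loaded CorpusSense corpus object.
results = corpus.search(
    query="climate anxiety",       # free-text query, matched by meaning
    mode="semantic",               # semantic rather than exact-term search
    languages=["en", "es", "fr"],  # restrict results to these languages
    top_k=50,                      # return the 50 most similar segments
)
What's new in 2.0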
— Semantic search across all 23 supported languages
— Cross-lingual retrieval (search in English, find results in Spanish)
— Combined mode: semantic + lexical + pattern in one query
— Redesigned results interface with similarity scores
— Expanded LLM insights: 15 analytical dimensions (up from 10)
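As a rough picture of what a combined query could look like, the sketch below uses a regular-expression pattern as a hard filter and then ranks the surviving passages by a weighted blend of a semantic score and a simple lexical-overlap score. This is only one plausible design: the post does not describe how CorpusSense merges the three modes, and the function names, the `weight` parameter, and the precomputed semantic scores are all hypothetical.

```python
import re

def lexical_score(query, text):
    """Fraction of query words that appear verbatim in the text."""
    query_words = set(query.lower().split())
    text_words = set(re.findall(r"\w+", text.lower()))
    return len(query_words & text_words) / len(query_words) if query_words else 0.0

def combined_search(query, pattern, candidates, weight=0.5):
    """Rank (text, semantic_score) pairs that match `pattern`.

    `weight` is the share given to the semantic score; the rest goes
    to the lexical score. Semantic scores are assumed precomputed.
    """
    regex = re.compile(pattern, re.IGNORECASE)
    results = []
    for text, semantic_score in candidates:
        if not regex.search(text):
            continue  # the pattern acts as a hard filter
        score = weight * semantic_score + (1 - weight) * lexical_score(query, text)
        results.append((text, score))
    return sorted(results, key=lambda pair: pair[1], reverse=True)

# Made-up passages and made-up semantic scores, for illustration only.
candidates = [
    ("Young people report growing climate anxiety.", 0.95),
    ("Many young adults describe a sense of eco-grief.", 0.80),
    ("Young investors watch the market.", 0.10),
    ("Climate anxiety is discussed in the report.", 0.90),
]
results = combined_search("climate anxiety", r"\byoung\b", candidates)
```

Here the last passage is dropped because it does not match the pattern, while the eco-grief passage still ranks second on its semantic score alone despite sharing no words with the query.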
We're excited to see what researchers discover with these new capabilities. Semantic search opens up entirely new workflows — from cross-lingual comparative studies to exploratory analysis of themes you didn't know to look for. Try it today at corpus-sense.app.