Blog
Updates on CorpusSense development, research findings, and NLP insights.
Introducing CorpusSense 2.0: semantic search across 23 languages
Our biggest update yet brings multilingual semantic search powered by sentence embeddings, a redesigned analysis dashboard, and expanded LLM insight capabilities.
BERTopic integration: automatic theme discovery in your corpora
How we integrated BERTopic for unsupervised topic modeling, enabling researchers to discover latent themes across large text collections without manual annotation.
15 dimensions of text analysis: what Qwen 2.5 reveals about your data
A deep dive into the 15 analytical aspects — from sentiment and argumentation to irony and hate speech — and how the LLM-powered pipeline extracts them.
Getting started with XML metadata and subcorpora
A step-by-step guide to uploading annotated corpora, defining metadata schemas, and creating targeted subcorpora for focused analysis.
CorpusSense at LREC-COLING 2024: our first public presentation
We presented CorpusSense at the Language Resources and Evaluation Conference in Torino. Here's what we shared and what we learned.
Building a multilingual collocation engine with spaCy
How we implemented grammatical collocation extraction across 23 languages using spaCy's dependency parsing and configurable statistical measures.