Blog

Updates on CorpusSense development, research findings, and NLP insights.


Featured · January 15, 2025

Introducing CorpusSense 2.0: semantic search across 23 languages

Our biggest update yet brings multilingual semantic search powered by sentence embeddings, a redesigned analysis dashboard, and expanded LLM insight capabilities.


December 2024 · Development

BERTopic integration: automatic theme discovery in your corpora

How we integrated BERTopic for unsupervised topic modeling, enabling researchers to discover latent themes across large text collections without manual annotation.

November 2024 · Research

15 dimensions of text analysis: what Qwen 2.5 reveals about your data

A deep dive into the 15 analytical aspects — from sentiment and argumentation to irony and hate speech — and how the LLM-powered pipeline extracts them.

October 2024 · Tutorial

Getting started with XML metadata and subcorpora

A step-by-step guide to uploading annotated corpora, defining metadata schemas, and creating targeted subcorpora for focused analysis.

September 2024 · Announcement

CorpusSense at LREC-COLING 2024: our first public presentation

We presented CorpusSense at the Language Resources and Evaluation Conference in Torino. Here's what we shared and what we learned.

July 2024 · Development

Building a multilingual collocation engine with spaCy

How we implemented grammatical collocation extraction across 23 languages using spaCy's dependency parsing and configurable statistical measures.