
Release Notes 6.0

We're excited to share the latest developments in our library: more embedding choices (Cohere and SigLIP), a new chunking method (late chunking), and a new processor crate that makes our Rust codebase more modular and maintainable. Let's dive in.

Late Chunking

The new 0.5.6 version adds Late Chunking to EmbedAnything, a technique introduced by Jina AI and Weaviate. Here's how we've implemented Late Chunking in EA:

๐—•๐—ฎ๐˜๐—ฐ๐—ต ๐—ฎ๐˜€ ๐—–๐—ต๐˜‚๐—ป๐—ธ ๐—š๐—ฟ๐—ผ๐˜‚๐—ฝ: In EmbedAnything, with late chunking enabled, the batch size determines the number of neighboring chunks that will be processed together.

๐—๐—ผ๐—ถ๐—ป๐˜ ๐—˜๐—บ๐—ฏ๐—ฒ๐—ฑ๐—ฑ๐—ถ๐—ป๐—ด: The grouped chunks are fed into the embedding model as a single, larger input. This allows the model to capture relationships and dependencies between adjacent chunks.

๐—˜๐—บ๐—ฏ๐—ฒ๐—ฑ๐—ฑ๐—ถ๐—ป๐—ด ๐—ฆ๐—ฝ๐—น๐—ถ๐˜: After embedding, the combined output is divided back into the embeddings for the original, individual chunks.

๐— ๐—ฒ๐—ฎ๐—ป ๐—ฃ๐—ผ๐—ผ๐—น๐—ถ๐—ป๐—ด (๐—ฝ๐—ฒ๐—ฟ ๐—–๐—ต๐˜‚๐—ป๐—ธ): Mean pooling is then applied to each individual chunk's embedding, incorporating the contextual information learned during the joint embedding phase.

Key Benefits:

Context-Aware Embeddings: By embedding neighboring chunks together, we capture crucial contextual information that would be lost with independent chunking.

Optimized Retrieval Performance: Because each chunk's embedding reflects its surrounding context, search results are more accurate and relevant.

from embed_anything import EmbedData, EmbeddingModel, TextEmbedConfig, WhichModel

# Load a Jina embedding model from its ONNX weights on the Hugging Face Hub.
model: EmbeddingModel = EmbeddingModel.from_pretrained_onnx(
    WhichModel.Jina, hf_model_id="jinaai/jina-embeddings-v2-small-en", path_in_repo="model.onnx"
)

# With late_chunking=True, each batch of neighboring chunks is embedded jointly.
config = TextEmbedConfig(
    chunk_size=1000,
    batch_size=8,
    splitting_strategy="sentence",
    late_chunking=True,
)

# Embed a single file
data: list[EmbedData] = model.embed_file("test_files/attention.pdf", config=config)
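
Each EmbedData item carries the chunk's text, its embedding vector, and metadata. A quick way to inspect the output, assuming the text and embedding fields exposed by the Python bindings:

for item in data[:3]:
    print(item.text[:60], "->", len(item.embedding), "dims")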

Cohere Embed 4:

🧊 Single embedding per document, even for multimodal inputs
📚 Handles up to 128K tokens – perfect for long-form business documents
🗃️ Supports compressed vector formats (int8, binary) for real-world scalability
🌍 Multilingual across 100+ languages

The catch? It's not open source, and even if it were, the model would be quite hefty to run locally. But if you're already using cloud-based embeddings like OpenAI's, Embed v4 is worth testing.

from embed_anything import EmbeddingModel, WhichModel

# Initialize the model once (a Cohere API key is required, typically provided
# via an environment variable).
model: EmbeddingModel = EmbeddingModel.from_pretrained_cloud(
    WhichModel.CohereVision, model_id="embed-v4.0"
)
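
Embedding then works like the local models. A minimal sketch, assuming embed_file accepts the cloud vision model the same way it accepts the ONNX model earlier, with an illustrative file path:

from embed_anything import EmbedData

# Embed a PDF as a multimodal document; one embedding per chunk of the file.
data: list[EmbedData] = model.embed_file("test_files/attention.pdf")
print(len(data), "chunks embedded")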

SigLIP

We already had CLIP support, but many of you asked for SigLIP. It outperforms CLIP on zero-shot classification at smaller batch sizes, and it is also more memory-efficient.

import embed_anything

# Load the model (SigLIP checkpoints are loaded through the Clip variant).
model = embed_anything.EmbeddingModel.from_pretrained_hf(
    embed_anything.WhichModel.Clip,
    model_id="google/siglip-base-patch16-224",
)
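
As a quick zero-shot illustration, you can embed images and candidate labels with the same model and rank labels by cosine similarity. A sketch assuming the embed_image_directory and embed_query helpers from the Python bindings, with illustrative paths and labels:

import numpy as np
import embed_anything

# Embed all images in a directory, and a few candidate labels as text.
images = embed_anything.embed_image_directory("test_files", embedder=model)
labels = embed_anything.embed_query(["a photo of a dog", "a photo of a cat"], embedder=model)

# Rank labels for the first image by cosine similarity.
img = np.array(images[0].embedding)
for lab in labels:
    vec = np.array(lab.embedding)
    print(lab.text, float(img @ vec / (np.linalg.norm(img) * np.linalg.norm(vec))))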

Processor Crate:

This crate contains various "processors" that accept files and produce a chunked, metadata-rich document description. This is especially helpful for retrieval-augmented generation!

We have also received some cool feature requests on GitHub that we would like to implement. If you want to help out, please check out EmbedAnything on GitHub. We would love your contributions. 🚀