OpenAI-Compatible Server
This server provides an OpenAI-compatible API for generating embeddings using the EmbedAnything library. We chose Actix for the server because it is:
- Blazing fast: Consistently ranks among the fastest web frameworks in benchmarks like TechEmpower.
- Asynchronous by default: Built on Rust's async/await, enabling efficient I/O-bound workloads.
- Lightweight & modular: Minimal core with extensible middleware, plugins, and integrations.
- Type-safe: Strong type guarantees ensure fewer runtime surprises.
- Production-ready: Stable, mature, and already used in industries like fintech, IoT, and SaaS platforms.
For benchmarks comparing Python and Rust servers, you can check out this blog post: https://www.jonvet.com/blog/benchmarking-python-rust-web-servers
Features
- OpenAI-compatible /v1/embeddings endpoint
- Support for multiple embedding models (Jina, BERT, etc.)
- Health check endpoint
Running the Server
The server will start on http://0.0.0.0:8080.
API Usage
Create Embeddings
Endpoint: POST /v1/embeddings
Request:
{
  "model": "sentence-transformers/all-MiniLM-L12-v2",
  "input": ["The quick brown fox jumps over the lazy dog"]
}
Response:
{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [0.0023064255, -0.009327292, ...]
    }
  ],
  "model": "sentence-transformers/all-MiniLM-L12-v2",
  "usage": {
    "prompt_tokens": 9,
    "total_tokens": 9
  }
}
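As a minimal sketch of consuming this response shape, the helper below pulls the vectors out of `data` and orders them by `index`, so they line up with the order of the `input` list. The function name `extract_embeddings` and the sample payload are illustrative only; the vectors are shortened placeholders, not real model output.

```python
from typing import Any


def extract_embeddings(response: dict[str, Any]) -> list[list[float]]:
    """Return the embedding vectors ordered by their `index` field."""
    items = sorted(response["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


# Hand-written sample in the documented response shape. The entries are
# deliberately out of order to show that sorting by `index` restores the
# input order. Values are placeholders, not real embeddings.
sample = {
    "object": "list",
    "data": [
        {"object": "embedding", "index": 1, "embedding": [0.3, 0.4]},
        {"object": "embedding", "index": 0, "embedding": [0.1, 0.2]},
    ],
    "model": "sentence-transformers/all-MiniLM-L12-v2",
    "usage": {"prompt_tokens": 4, "total_tokens": 4},
}

vectors = extract_embeddings(sample)
print(len(vectors), vectors[0])
```

Sorting is a defensive choice: it costs little and keeps the result aligned with the inputs even if a server returns items in a different order.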
Health Check
Endpoint: GET /health_check
Returns a 200 OK status if the server is running.
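A client or deployment script may want to wait for the server to come up before sending requests. The following is a rough sketch using only the Python standard library; the helper names `is_healthy` and `wait_until_ready`, the retry count, and the delay are assumptions chosen for illustration, not part of the server.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def is_healthy(url: str = "http://localhost:8080/health_check",
               timeout: float = 2.0) -> bool:
    """Return True if a GET on the health endpoint answers with 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status: not ready.
        return False


def wait_until_ready(probe: Callable[[], bool] = is_healthy,
                     attempts: int = 10,
                     delay: float = 0.5) -> bool:
    """Call `probe` up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

Calling `wait_until_ready()` with the defaults polls the local server for roughly five seconds. The `probe` parameter exists so the retry loop can be exercised without a running server.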
Error Handling
The API returns OpenAI-compatible error responses.
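The exact error body is not shown here. As a hedged sketch, the helper below assumes the server mirrors the error envelope used by the OpenAI API, a top-level `error` object with `message`, `type`, `param`, and `code` fields. The sample body and its message text are hypothetical, so check them against what the server actually returns.

```python
from typing import Any, Optional


def describe_error(body: Any) -> Optional[str]:
    """Return a one-line summary if `body` looks like an OpenAI-style error, else None."""
    if not isinstance(body, dict):
        return None
    error = body.get("error")
    if not isinstance(error, dict):
        return None
    kind = error.get("type") or "unknown_error"
    message = error.get("message") or "no message provided"
    return f"{kind}: {message}"


# Hypothetical error body following the OpenAI envelope; the message and
# type values are made up for illustration.
sample_error = {
    "error": {
        "message": "Model not found",
        "type": "invalid_request_error",
        "param": None,
        "code": None,
    }
}

print(describe_error(sample_error))
```

Returning `None` for anything that does not match the envelope lets callers fall back to printing the raw response body.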
Example Usage with curl
# Create embeddings
curl -X POST http://localhost:8080/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "model": "sentence-transformers/all-MiniLM-L12-v2",
    "input": ["Hello world", "How are you?"]
  }'

# Health check
curl http://localhost:8080/health_check
Example Usage with Python
import requests

# Create embeddings
response = requests.post(
    "http://localhost:8080/v1/embeddings",
    json={
        "model": "sentence-transformers/all-MiniLM-L12-v2",
        "input": ["The quick brown fox jumps over the lazy dog"]
    }
)

if response.status_code == 200:
    data = response.json()
    print(f"Generated {len(data['data'])} embeddings")
    print(f"First embedding dimension: {len(data['data'][0]['embedding'])}")
else:
    print(f"Error: {response.json()}")