10 databases. Real specs. No hype. Pick the right store for your RAG pipeline in under 5 minutes.
| Database | Deployment | Free Tier | Free Vectors | Paid Starts | Hybrid Search | Filtering | SDKs | Notable Users | Best For |
|---|---|---|---|---|---|---|---|---|---|
| Pinecone | Managed | Yes | 2M (1 index) | $70/mo (Standard) | Yes (sparse+dense) | Yes (metadata) | Python, JS, Go, Java | Notion, Shopify, Zapier | Zero-ops RAG at scale |
| Weaviate | Hybrid | Yes | 1M (Sandbox) | $25/mo (Serverless) | Yes (BM25+HNSW) | Yes (GraphQL) | Python, JS, Go, Java | Stack Overflow, Cisco | GraphQL-native multi-modal search |
| Qdrant | Hybrid | Yes | 1M (Cloud free) | $0.014/hr per cluster | Yes (sparse+dense) | Yes (payload filters) | Python, JS, Go, Rust | Uber, Microsoft, Grammarly | High-performance self-hosted RAG |
| Chroma | Self-host | Yes (OSS) | Unlimited (local) | Cloud beta (invite) | Partial (full-text) | Yes (metadata) | Python, JS | LangChain, LlamaIndex | Local prototyping & LangChain default |
| pgvector | Self-host | Yes (Postgres ext) | Unlimited (your DB) | Your Postgres cost | Partial (full-text + HNSW) | Yes (SQL WHERE) | Python, JS, Go, Rust | Supabase, Neon, Tembo | Teams already on Postgres |
| Milvus | Hybrid | Yes (OSS) | Unlimited (self-host) | Zilliz Cloud from $65/mo | Yes (sparse+dense) | Yes (scalar filters) | Python, JS, Go, Java | Salesforce, Walmart, PayPal | Billion-scale enterprise workloads |
| Vespa | Hybrid | Yes (Vespa Cloud trial) | ~5M (trial) | Pay-as-you-go | Yes (BM25+ANN native) | Yes (YQL) | Python, Java | Yahoo, Spotify, OkCupid | Complex ranking + real-time serving |
| LanceDB | Self-host | Yes (OSS) | Unlimited (local) | LanceDB Cloud (beta) | Partial (FTS planned) | Yes (SQL-like) | Python, JS, Rust | Roboflow, Replicate | Multimodal + embedded/edge use cases |
| Turbopuffer | Managed | Yes (free writes) | Pay on reads only | $0.20/GB stored/mo | Partial (BM25 beta) | Yes (attribute filters) | Python, JS | Braintrust, Cursor | Serverless cold-start with low cost |
| MongoDB Atlas Vector Search | Managed | Yes (M0 cluster) | ~512MB total (M0) | M10 cluster ~$57/mo | Yes (Atlas Search + vector) | Yes (MQL filters) | Python, JS, Go, Java | Forbes, Toyota, Square | Teams already on MongoDB Atlas |
Dataset: 1M vectors, 1536 dims (OpenAI ada-002). Managed tiers where applicable. Sources: ANN-Benchmarks, vendor docs, independent community runs.
| Database | Latency p50 | Latency p95 | Recall@10 | Cost / 1M vectors/mo | Index Type |
|---|---|---|---|---|---|
| Qdrant (self-host) | 2.1 ms | 5.8 ms | 99.2% | ~$18 (own infra) | HNSW |
| Pinecone (pod s1) | 3.4 ms | 9.2 ms | 98.8% | ~$70 (1 pod) | Pinecone proprietary |
| Weaviate (cloud) | 4.0 ms | 12 ms | 98.5% | ~$25–80 | HNSW |
| pgvector (HNSW) | 5.2 ms | 18 ms | 97.9% | Your Postgres cost | HNSW (v0.6+) |
| Milvus (self-host) | 2.8 ms | 7.4 ms | 99.0% | ~$30 (own infra) | HNSW / IVF_FLAT |
| Turbopuffer | 38 ms | 120 ms | 97.5% | ~$0.20/GB | Flat (object storage) |
| LanceDB (local) | 1.8 ms | 4.2 ms | 98.1% | $0 (local) | IVF + HNSW |
| Vespa | 3.1 ms | 8.6 ms | 98.9% | ~$60+ (cloud) | HNSW |
| MongoDB Atlas | 8.5 ms | 28 ms | 96.8% | ~$57+ (M10) | HNSW |
| Chroma (local) | 2.4 ms | 6.1 ms | 97.2% | $0 (local) | HNSW (hnswlib) |
Benchmarks are indicative. Production results vary by dimensionality, query load, hardware, and index parameters. Always run your own benchmark before choosing.
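To put the cost column in context, the raw storage footprint of the benchmark dataset (1M vectors at 1536 float32 dimensions) is easy to estimate. The sketch below counts raw vector data only and ignores index overhead, which for HNSW adds graph links on top:

```python
# Back-of-envelope storage for the benchmark dataset:
# 1,000,000 vectors x 1536 dims x 4 bytes (float32), raw data only.
NUM_VECTORS = 1_000_000
DIMS = 1536
BYTES_PER_FLOAT32 = 4

raw_bytes = NUM_VECTORS * DIMS * BYTES_PER_FLOAT32
raw_gb = raw_bytes / 1e9  # ~6.1 GB before index overhead

# At the $0.20/GB/mo storage rate listed above, raw storage alone is:
monthly_storage_cost = raw_gb * 0.20  # ~$1.23/mo

print(f"{raw_gb:.2f} GB raw, ~${monthly_storage_cost:.2f}/mo at $0.20/GB")
```

This is why storage-priced options look dramatically cheaper per vector: the raw data is small; what you pay dedicated services for is the always-on index and query compute.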
One paragraph per database. No fluff.
Choose Pinecone when your team has no desire to manage infrastructure and needs a battle-tested, globally available service. It's the fastest path from OpenAI embeddings to production. The free tier (2M vectors, 1 index) covers most side projects. Costs rise quickly at scale — budget carefully before committing.
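Pinecone's metadata filtering uses Mongo-style operators. The sketch below shows the shape of a filter dict as it would be passed to a query; the field names (`source`, `year`) are hypothetical, while the operator syntax (`$eq`, `$gte`, `$and`) follows Pinecone's filter language:

```python
# Shape of a Pinecone metadata filter (Mongo-style operators).
# Field names "source" and "year" are illustrative stand-ins.
query_filter = {
    "$and": [
        {"source": {"$eq": "docs"}},   # exact match on a string field
        {"year": {"$gte": 2023}},      # numeric range condition
    ]
}
# This dict is passed as the `filter` argument to a query call,
# alongside the query vector and top_k.
```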
Weaviate is the strongest choice when you need native hybrid search (BM25 + dense) and a rich GraphQL query interface. It supports text, image, and multi-modal objects natively. The Serverless Cloud tier is affordable for production. Self-host via Docker or Kubernetes if you need data residency.
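A sketch of what a Weaviate hybrid query looks like in GraphQL, held here as a Python string. The class name `Article` and the `title` property are hypothetical; the `hybrid` operator with its `alpha` blend (0 = pure BM25, 1 = pure vector) is the documented pattern:

```python
# Hedged sketch of a Weaviate GraphQL hybrid search query.
# "Article" and "title" are illustrative; adapt to your schema.
hybrid_query = """
{
  Get {
    Article(
      hybrid: { query: "vector databases for RAG", alpha: 0.5 }
      limit: 5
    ) {
      title
      _additional { score }
    }
  }
}
"""
```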
Qdrant delivers the best raw query performance among open-source options, with Rust internals, HNSW indexing, and first-class payload filtering. Use it when latency under 5 ms matters, you want to self-host, or you need sparse+dense hybrid search without paying managed fees. The Python SDK is excellent.
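Qdrant's payload filters are expressed as JSON over the REST API (the Python SDK mirrors the same structure with `Filter`/`FieldCondition` models). A sketch of the shape, with hypothetical field names `lang` and `stars`:

```python
# Shape of a Qdrant payload filter as sent over the REST API.
# "lang" and "stars" are illustrative payload field names; the
# must/key/match/range structure follows Qdrant's filtering docs.
payload_filter = {
    "must": [
        {"key": "lang", "match": {"value": "en"}},    # exact match
        {"key": "stars", "range": {"gte": 100}},      # numeric range
    ]
}
# Attached to a search request so filtering happens during HNSW
# traversal rather than as a post-filter pass.
```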
Chroma is the default in LangChain tutorials for a reason — zero-config, in-process, and works locally in minutes. It is ideal for prototypes, notebooks, and internal tooling. Do not use Chroma in high-traffic production today; the managed cloud is still in beta and the local server lacks auth.
If you already run Postgres (via Supabase, Neon, RDS, or self-hosted), pgvector is the obvious choice — no new service, no new billing, and SQL filtering "just works." The HNSW index (added in v0.6) brought recall to par with dedicated vector stores. Latency degrades above 10M+ vectors unless you shard carefully.
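pgvector exposes cosine distance through the `<=>` operator (typically `ORDER BY embedding <=> $1 LIMIT 10`). As a sanity check on what that operator returns, here is the same quantity in pure Python: cosine distance is 1 minus cosine similarity, so identical vectors score 0 and orthogonal vectors score 1:

```python
import math

def cosine_distance(a, b):
    """Pure-Python equivalent of pgvector's <=> operator:
    cosine distance = 1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

# Identical vectors -> distance 0; orthogonal vectors -> distance 1.
print(cosine_distance([1.0, 0.0], [1.0, 0.0]))  # 0.0
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))  # 1.0
```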
Milvus is built for billion-scale. It supports multiple index types (HNSW, IVF_FLAT, DiskANN), multi-tenancy, and role-based access control. The Zilliz Cloud managed layer provides the ops surface. Use Milvus when your vector counts are in the hundreds of millions and you need horizontal scale-out.
Vespa is a full search and serving engine that predates the vector DB wave. It excels when you need complex multi-stage ranking pipelines, real-time document updates, and hybrid retrieval in a single system. It's overkill for simple RAG but exceptional for recommendation engines and e-commerce search.
LanceDB stores data in the Lance columnar format (built on Apache Arrow) and can run entirely embedded — no server process. It is the go-to for multimodal (images, video) workloads and edge deployments. Perfect for building locally-run AI apps or when you want to skip the network overhead entirely.
Turbopuffer stores vectors in object storage (S3-compatible) and computes on read. Latency is higher (30–120 ms) but cost per vector is extremely low. It's a strong choice for RAG pipelines where queries are infrequent and cost-per-query matters more than speed — think batch document processing or async search.
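The cost trade-off can be made concrete with the numbers from the tables above. The sketch below finds the rough break-even between the $0.20/GB/mo storage-priced model and a fixed $70/mo pod-based plan, deliberately ignoring Turbopuffer's per-read charges and any per-pod capacity limits:

```python
# Rough break-even between storage pricing ($0.20/GB stored/mo, per
# the table) and a fixed $70/mo plan. Ignores per-read charges on
# the storage-priced side and pod capacity limits on the fixed side.
STORAGE_PRICE_PER_GB_MO = 0.20
FIXED_PLAN_PER_MO = 70.0
BYTES_PER_VECTOR = 1536 * 4  # float32 at ada-002 dimensionality

breakeven_gb = FIXED_PLAN_PER_MO / STORAGE_PRICE_PER_GB_MO   # 350 GB
breakeven_vectors = breakeven_gb * 1e9 / BYTES_PER_VECTOR    # ~57M vectors

print(f"Storage pricing stays cheaper below ~{breakeven_gb:.0f} GB "
      f"(~{breakeven_vectors / 1e6:.0f}M raw 1536-dim vectors)")
```

In other words, on raw storage alone you would need tens of millions of vectors before the fixed plan wins, which is why read volume, not vector count, is the deciding factor here.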
If your application already uses MongoDB Atlas, enabling Vector Search costs nothing extra and keeps your embeddings co-located with your documents. The HNSW index is solid, and Atlas Search (BM25) can be combined with vector search for hybrid retrieval. Avoid it if you're not already on Atlas — it adds complexity for no gain.
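Atlas exposes vector search as a `$vectorSearch` aggregation stage. A hedged sketch of the pipeline shape: the index name, vector field path, and the `category` filter field are hypothetical, while the stage keys (`queryVector`, `numCandidates`, `limit`, `filter`) and the `vectorSearchScore` meta follow the Atlas docs:

```python
# Shape of an Atlas $vectorSearch aggregation stage with a metadata
# pre-filter. "embeddings_index", "embedding", and "category" are
# illustrative names; run via collection.aggregate(pipeline) in PyMongo.
query_vector = [0.1] * 1536  # stand-in embedding

pipeline = [
    {
        "$vectorSearch": {
            "index": "embeddings_index",        # hypothetical index name
            "path": "embedding",                # field holding the vector
            "queryVector": query_vector,
            "numCandidates": 100,               # ANN candidates to score
            "limit": 5,                         # results returned
            "filter": {"category": {"$eq": "docs"}},  # metadata pre-filter
        }
    },
    {"$project": {"title": 1, "score": {"$meta": "vectorSearchScore"}}},
]
```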