Name
Humongous - Mongo
a type of NoSQL database
Structure
Document
Uses a JSON-like document model: data is represented as JSON-style documents (stored internally as BSON), which keeps the schema flexible
Different from relational schema
Collection
a group of documents
Database
a group of collections
Node
a group of databases
Cluster
a group of nodes
Data Modeling in MongoDB
- Is your app read or write heavy?
- What data is frequently accessed together?
- What are your performance considerations?
- How will your data set grow and scale?
Embedding vs referencing
| | One side | Many side |
|---|---|---|
| Embedding | Embedding on the “one” side | Embedding on the “many” side |
| Referencing | Referencing on the “one” side | Referencing on the “many” side |
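As a rough illustration of two cells of the table, the sketch below models an author and their books as plain Python dictionaries: once with the books embedded on the "one" side, and once with each book referencing its author from the "many" side. The book titles, the `author_id` field, and the `books_for` helper are hypothetical and exist only for this example.

```python
# Embedding on the "one" side: the author document carries its books inline.
author_embedded = {
    "_id": "author-1",
    "name": "Jane Austen",
    "books": [
        {"title": "Book A", "pages": 300},  # hypothetical values
        {"title": "Book B", "pages": 250},
    ],
}

# Referencing on the "many" side: each book points back to its author.
author = {"_id": "author-1", "name": "Jane Austen"}
books = [
    {"_id": "book-1", "title": "Book A", "author_id": "author-1"},
    {"_id": "book-2", "title": "Book B", "author_id": "author-1"},
    {"_id": "book-3", "title": "Book C", "author_id": "author-2"},
]


def books_for(author_doc, book_docs):
    """Resolve the reference: an extra lookup that embedding avoids."""
    return [b for b in book_docs if b["author_id"] == author_doc["_id"]]


embedded_titles = [b["title"] for b in author_embedded["books"]]
referenced_titles = [b["title"] for b in books_for(author, books)]
print(embedded_titles, referenced_titles)
```

Embedding tends to suit data that is read together, while referencing keeps documents small when the "many" side can grow large.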
MongoDB Atlas
- Database-as-a-service (DBaaS)
- Automatically manages MongoDB for you, including:
- Deployment
- Monitoring
- Backup & restoration
- Archiving
- Security
- Built-in Replication
CRUD
MongoDB Query API Syntax Structure
db (keyword) → collection (coll name) → operator (command) → query/filter (criteria) → options (settings)
Example MongoDB find command
// In mongosh, the second argument of find() is the projection; sort and limit are applied as cursor methods
db.collection.find(
  { age: { $gte: 25 } },   // query/filter: find documents where age is greater than or equal to 25
  { name: 1, age: 1 }      // projection: include only the name and age fields (plus _id) in the results
)
  .sort({ age: -1 })       // option: sort the results by age in descending order
  .limit(10);              // option: limit the number of results to 10
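To make the effect of each part of the find command concrete, here is a plain-Python emulation over an in-memory list. It is not PyMongo and does not talk to a database; the `people` data and the `find_adults` helper are made up for illustration, and, unlike MongoDB, the emulated projection does not return `_id`.

```python
people = [
    {"name": "Ana", "age": 31, "city": "Lima"},
    {"name": "Ben", "age": 22, "city": "Oslo"},
    {"name": "Cy", "age": 25, "city": "Pune"},
]


def find_adults(docs, min_age=25, fields=("name", "age"), limit=10):
    # query/filter: keep documents whose age is >= min_age (like $gte)
    matched = [d for d in docs if "age" in d and d["age"] >= min_age]
    # sort: age descending (like sort {age: -1})
    matched.sort(key=lambda d: d["age"], reverse=True)
    # limit, then projection: keep only the requested fields
    return [{k: d[k] for k in fields if k in d} for d in matched[:limit]]


results = find_adults(people)
print(results)
```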
Create
db.authors.insertOne({
  "name": "Jane Austen",
  "books": [ "0141439688", "0375757813", "1551114798" ],
  "aliases": [ "Austen, Jane", "Jane Austen" ]
})
Read
db.authors.find({
  "name": "Jane Austen"
})
Update
db.authors.updateOne(
  { name: "Jane Austin" },
  { $set: { name: "Jane Austen" } }
)
Delete
db.authors.deleteOne(
  { name: "Jane Austin" }
)
Hands-on
MongoDB Songs Playlist
| Title | Artist | Genre | Duration | Plays |
|---|---|---|---|---|
| Sucks To Be Ex-QL | $Avg $Max | Pop/Dance | 3:28 | 4 |
| Changes Stream | 2.Pack | Hip Hop | 3:33 | 8 |
| Primary’s Gonna Be Me | MSYNC | Pop | 2:40 | 4 |
| I Haven’t Met AI Yet | MiCachel Bubléson | Jazz | 3:36 | 6 |
| Where Is The Log? | Backed By DBs | Pop | 3:10 | 6 |
| All About that Database | Schema Trainor | Pop | 2:34 | 20 |
| What About $toDate | Doc-Tree | Rock | 3:48 | 18 |
| NoSQL Paradise | CLI-io | Hip Hop | 2:57 | 26 |
| NoS8L Boy | Docuesenece | Rock/Punk | 3:27 | 26 |
| Back or Front | Michael JSON | Pop | 3:11 | 26 |
| Free-Tier Styler | D-BomFunktion MCs | Hip Hop/Electro | 2:55 | 20 |
| Thnks Fr th MdbMemrriz | Doc Out Boy | Rock/Punk | 3:15 | 18 |
| Mocking Data Board | SlimShardy | Hip Hop | 3:31 | 20 |
| My _id, My Ride | Spring Boots | Country | 3:48 | 24 |
| S One | NULLy | Hip Hop | 3:12 | 16 |
| $moreLikeThis | Dolla $earch | Hip Hop/Trap | 2:53 | 32 |
| Let It Beat | The Bugs | 70s/Pop | 3:56 | 32 |
| Cluster Busy | connecTsean Pool | Dancehall | 2:55 | 22 |
| Index of Change (ESR) | Crystal Clusters | Techno | 3:20 | 22 |
| Code Vibin' | Masked WiredTiger | Hip Hop/Trap | 3:01 | 52 |
| URI Hero | IndexBack | Rock | 3:40 | 20 |
| Tiering Up My App | MSYNC | Pop | 3:23 | 34 |
| 3 Little Nodes | DB Marley | Reggae | 3:45 | 50 |
| Am I in the Void | Relational Migrators | Industrial Modern/Rock | 3:45 | 24 |
| I Believe I Can Shard | R Klusterly | R&B/Soul | 3:38 | 38 |
| Harder, Better, Faster, Secured | US_West_2 | Hip Hop/Trap | 3:12 | 34 |
| No SCRAM | TLS | R&B | 3:07 | 52 |
| Colores en la Cloud | J. Cloudin | Reggaeton | 3:09 | 40 |
| No Sequel Needed | SaaS Girls | Pop | 3:11 | 40 |
| Clave De Shard | Sharde Dezona | Reggaeton | 2:06 | 48 |
The badges on MLH are associated with the official MongoDB credentials - finish on
VectorSearch: Beginner to Pro
What is vector search?
Do you ever find yourself…
- Looking for something but you don’t quite have the words?
- Remembering some characteristics of a book but not the title?
- Trying to get another sweatshirt just like the one you had back in the day, but you don’t know how to search for it?
In these cases, lexical search (keyword search) no longer works.
Lexical search vs Vector search
Lexical search
What?
- Keyword search
When?
- Your text corpus closely matches how users search
- First pass at text-based relevancy
Vector search
What?
- Semantic similarities
When?
- “Vocabulary gap” between corpus and how users search
- Text, image, audio, video search
Vector example
Think of a store (e.g., Home Depot): an item's location is [aisle, bin], a 2-dimensional vector that tells you where to find it.
Embeddings
Definition:
Numeric, multi-dimensional representation of a piece of information
Key Points:
- Capture semantic qualities of data
- Semantically similar data ends up close together in vector space
Example (Vector Space Representation):
- dog → [0.243, 0.765, …]
- cat → [0.293, 0.774, …]
- apple → [0.443, 0.965, …]
- orange → [0.493, 0.9774, …]
How to embed data
Flow:
- Data → Raw input data (text, image, audio, etc.)
- Embedding model → Processed by an embedding model
- Vector → Converted into a vector representation, e.g.
[0.3, 0.1, 0.2, ..., 0.4]
Adding embeddings to existing data
Before:
{
  "_id": "0028608488",
  "title": "David Copperfield's Tales of the Impossible",
  "cover": "https://images.isbndb.com/covers/22/86/9780061052286.jpg",
  "year": 1995,
  "pages": 385,
  "synopsis": "David Copperfield, Arguably The Greatest Illusionist–magician..."
}
After (with embeddings):
{
  "_id": "0028608488",
  "title": "David Copperfield's Tales of the Impossible",
  "cover": "https://images.isbndb.com/covers/22/86/9780061052286.jpg",
  "year": 1995,
  "pages": 385,
  "synopsis": "David Copperfield, Arguably The Greatest Illusionist–magician...",
  "embedding": [
    0.03898080065846443,
    -0.05879044095304909,
    0.04323239979442215,
    ...,
    0.034243063451233547
  ]
}
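One way to get from the "before" document to the "after" document is to compute the embedding from an existing text field and write it back with a `$set` update. The sketch below only builds the filter and update documents that PyMongo's `update_one` would accept. The `embedding_update` helper is hypothetical, and `toy_embed` is a placeholder, not a real embedding model; a real one returns hundreds of dimensions.

```python
def toy_embed(text):
    # Placeholder only: a real embedding model would be called here.
    return [len(text) / 100.0, text.count(" ") / 10.0]


def embedding_update(doc, embed, source_field="synopsis", target_field="embedding"):
    """Build (filter, update) arguments for an in-place update of one document."""
    vector = embed(doc[source_field])
    return {"_id": doc["_id"]}, {"$set": {target_field: vector}}


book = {
    "_id": "0028608488",
    "title": "David Copperfield's Tales of the Impossible",
    "synopsis": "David Copperfield, Arguably The Greatest Illusionist...",
}

query_filter, update = embedding_update(book, toy_embed)
print(query_filter, update)
# With a PyMongo collection (assumed, not run here):
#   collection.update_one(query_filter, update)
```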
Recap
- Embeddings are an array of numbers that capture semantic qualities of data
- Embeddings are generated by specialized ML models
- Embeddings can be added in-place into existing MongoDB documents
Vector search
Definition:
Search based on intent/meaning using embeddings
Process:
- User submits a query
- Query is processed by an Embedding Model
- The model converts the query into a Query vector
- Perform similarity search (e.g., k=3 nearest neighbors)
- Return the most relevant results in vector space
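The process above can be sketched as an exact, brute-force nearest-neighbor search in plain Python. This is only an illustration of the steps, not how MongoDB executes the search. The two-component vectors are the visible start of the example embeddings, and `toy_embed` with its lookup table is a hypothetical stand-in for an embedding model.

```python
import math

corpus = [
    {"title": "dog", "embedding": [0.243, 0.765]},
    {"title": "cat", "embedding": [0.293, 0.774]},
    {"title": "apple", "embedding": [0.443, 0.965]},
    {"title": "orange", "embedding": [0.493, 0.9774]},
]


def toy_embed(text):
    # Hypothetical stand-in for the embedding model.
    lookup = {"puppy": [0.25, 0.77], "citrus": [0.49, 0.97]}
    return lookup[text]


def vector_search(query, docs, embed, k=3):
    query_vector = embed(query)  # query -> query vector
    # Rank every document by Euclidean distance to the query vector.
    ranked = sorted(docs, key=lambda d: math.dist(query_vector, d["embedding"]))
    return [d["title"] for d in ranked[:k]]  # k nearest neighbors


print(vector_search("puppy", corpus, toy_embed, k=2))
print(vector_search("citrus", corpus, toy_embed, k=2))
```

Comparing the query against every document is fine for a handful of vectors but does not scale to large collections.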
How vector search works in MongoDB
Hierarchical Navigable Small Worlds (HNSW)
- Creates layered, connected graphs with vectors as nodes, edges created based on distance in vector space
- Coarse search at top layers, refinement at lower layers
- Efficient way of searching through large datasets
Source: Towards Data Science
Calculating distance in vector space
Euclidean Distance
- Measures absolute distance between vectors
Example:
- dog → [0.243, 0.765, …]
- cat → [0.293, 0.774, …]
- Euclidean Distance = √((0.243 - 0.293)² + (0.765 - 0.774)² + …)
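A minimal sketch of the same calculation in Python. The word vectors are cut to the two components visible in the Embeddings example (an assumption, since the remaining components are elided), so the resulting numbers are illustrative only.

```python
import math


def euclidean_distance(a, b):
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


print(euclidean_distance([0, 0], [3, 4]))  # 3-4-5 right triangle: 5.0

dog, cat, apple = [0.243, 0.765], [0.293, 0.774], [0.443, 0.965]
dog_cat = euclidean_distance(dog, cat)
dog_apple = euclidean_distance(dog, apple)
print(dog_cat < dog_apple)  # dog is closer to cat than to apple
```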
Dot product
- Vector multiplication as a measure of alignment
Example:
- Vectors: [4, 0, 1] and [3, 1, 2]
- Calculation:
(4 × 3) + (0 × 1) + (1 × 2)
= 12 + 0 + 2
= 14
Cosine similarity
- Measures the cosine of the angle between vectors (direction only, independent of magnitude)
Example:
- dog → [0.243, 0.765, …]
- cat → [0.293, 0.774, …]
Formula:
Cosine Similarity = (A · B) / (||A|| × ||B||)
Where:
- A · B = dot product of vectors A and B
- ||A||, ||B|| = magnitude (length) of each vector
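The dot product and cosine similarity formulas translate almost directly into Python. This sketch reuses the vectors from the dot product example; the helper names are my own.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    # (A · B) / (||A|| × ||B||), where ||A|| = sqrt(A · A)
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a, b = [4, 0, 1], [3, 1, 2]
print(dot(a, b))  # 14, matching the dot product example
print(cosine_similarity(a, b))
# Scaling a vector changes its dot product but not its cosine similarity.
print(dot(a, [8, 0, 2]), cosine_similarity(a, [8, 0, 2]))
```

The last line shows why cosine similarity is insensitive to vector length: doubling a vector doubles the dot product but leaves the angle unchanged.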
Recap
- Vector search retrieves documents closest to the query embedding in vector space
- Use the same embedding model to embed the data you want to search on, and the user queries
- Distance in vector space is calculated using mathematical functions
- Cosine similarity works well with most embedding models
Vector Search in MongoDB
Create a vector search index
{
  "fields": [
    {
      "type": "vector",
      "path": "embedding",
      "numDimensions": 512,
      "similarity": "cosine"
    }
  ]
}
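If the index definition is built in code, a small helper can catch obvious mistakes before the definition is sent anywhere. The sketch below is an assumption about how one might structure this; `vector_index_definition` is a hypothetical helper, and the accepted similarity values are the three that Atlas Vector Search documents.

```python
VALID_SIMILARITIES = {"cosine", "euclidean", "dotProduct"}


def vector_index_definition(path, num_dimensions, similarity="cosine"):
    """Build a vector search index definition like the one above."""
    if similarity not in VALID_SIMILARITIES:
        raise ValueError(f"unsupported similarity: {similarity}")
    if num_dimensions <= 0:
        raise ValueError("num_dimensions must be positive")
    return {
        "fields": [
            {
                "type": "vector",
                "path": path,
                # Must match the output size of the embedding model in use.
                "numDimensions": num_dimensions,
                "similarity": similarity,
            }
        ]
    }


definition = vector_index_definition("embedding", 512)
print(definition)
# With PyMongo (assumed, not run here), the definition could be wrapped in
# pymongo.operations.SearchIndexModel(definition=definition,
#     name="vector_index", type="vectorSearch")
# and passed to collection.create_search_index(...).
```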
Send a vector search query
pipeline = [
    {
        "$vectorSearch": {
            "index": "vector_index",
            "path": "embedding",
            "queryVector": [0.02421053, -0.022372592, ...],
            "numCandidates": 150,
            "limit": 10
        }
    },
    {
        "$project": {
            "_id": 0,
            "title": 1,
            "score": { "$meta": "vectorSearchScore" }
        }
    }
]
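In application code the query vector changes with every request, so the pipeline is usually built by a function. The sketch below is a hypothetical helper that mirrors the pipeline above; it also checks that `numCandidates` is not smaller than `limit`, which `$vectorSearch` requires. The short query vector is a placeholder.

```python
def vector_search_pipeline(query_vector, limit=10, num_candidates=150,
                           index="vector_index", path="embedding"):
    if num_candidates < limit:
        raise ValueError("numCandidates must be >= limit")
    return [
        {
            "$vectorSearch": {
                "index": index,
                "path": path,
                "queryVector": list(query_vector),
                "numCandidates": num_candidates,
                "limit": limit,
            }
        },
        {
            "$project": {
                "_id": 0,
                "title": 1,
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]


pipeline = vector_search_pipeline([0.02421053, -0.022372592])  # placeholder vector
print(pipeline[0]["$vectorSearch"]["numCandidates"])
# With a PyMongo collection (assumed, not run here):
#   for doc in collection.aggregate(pipeline):
#       print(doc["title"], doc["score"])
```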
MongoDB Atlas Vector Search
Integrated platform that simplifies your application architecture
- Data is automatically synchronized between the database and vector index
- Developers work with database and vector search via the unified MongoDB Query API
- Fully managed for you so you can focus on your application
- Search nodes scale your search workloads independent of the operational database
Benefits
- Vector search simplified
- Avoid the synchronization tax
- Remove operational heavy lifting
Recap
- To perform vector search in MongoDB, you need to generate embeddings, create a vector search index and send a query
- The number of dimensions in the vector search index depends on the embedding model used
- $vectorSearch must be the first stage of any aggregation pipeline that uses it
Optimizing Vector Search
Pre-filtering (saves time)
Apply a filter condition first to narrow down the documents that the vector search has to consider
Adding pre-filters to vector search
{
  "fields": [
    {
      "type": "vector",
      "path": "embedding",
      "numDimensions": 1536,
      "similarity": "cosine"
    },
    {
      "type": "filter",
      "path": "pages"
    }
  ]
}
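Once `pages` is indexed as a filter field, the query side can pass a `filter` inside the `$vectorSearch` stage. The sketch below builds such a stage; the helper name, the 200-page threshold, and the short query vector are assumptions for illustration.

```python
def filtered_vector_search_stage(query_vector, max_pages, limit=10, num_candidates=150):
    """Build a $vectorSearch stage that only considers books up to max_pages."""
    return {
        "$vectorSearch": {
            "index": "vector_index",
            "path": "embedding",
            # Pre-filter: applied to the indexed "pages" field before the
            # nearest-neighbor comparison.
            "filter": {"pages": {"$lte": max_pages}},
            "queryVector": list(query_vector),
            "numCandidates": num_candidates,
            "limit": limit,
        }
    }


stage = filtered_vector_search_stage([0.02421053, -0.022372592], max_pages=200)
print(stage["$vectorSearch"]["filter"])
```

A field used in `filter` has to be declared with type `filter` in the index definition, as `pages` is above.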