Troy Cheng / 2025-09-12

Name

Humongous → Mongo (the name MongoDB comes from "humongous")

A type of NoSQL database

Structure

Document

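Uses a JSON-like document model: data is represented as JSON-style documents (stored as BSON, a binary form of JSON), which keeps the schema flexible

Different from a relational schema: documents in the same collection don't need to have identical fields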

Collection

A group of documents (similar to a table in a relational database)

Database

A group of collections

Node

A single MongoDB server instance, which hosts a group of databases

Cluster

A group of nodes (for example, a replica set or a sharded cluster)

Data Modeling in MongoDB

Is your app read or write heavy?
What data is frequently accessed together?
What are your performance considerations?
How will your data set grow and scale?

Embedding vs referencing

	One side	Many side
Embedding	Embedding on the “one” side	Embedding on the “many” side
Referencing	Referencing on the “one” side	Referencing on the “many” side
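Embedding means storing related data inside the same document. Referencing means storing an `_id` that points to a document in another collection. Below is a minimal in-memory sketch of both shapes, using plain Python dicts in place of collections. The author `_id` "a1" and the `author_id` field are hypothetical. The ISBNs are the ones from the Create example in this note. In a real database, following a reference costs an extra query or a `$lookup` stage, while an embedded document comes back in one read.

```python
# Hypothetical in-memory "collections"; a real app would query MongoDB instead.
author = {
    "_id": "a1",  # hypothetical _id
    "name": "Jane Austen",
    # Referencing on the "one" side: the author stores the _ids of its books.
    "books": ["0141439688", "0375757813", "1551114798"],
}
books = [
    # Referencing on the "many" side: each book stores its author's _id.
    {"_id": "0141439688", "author_id": "a1"},
    {"_id": "0375757813", "author_id": "a1"},
    {"_id": "1551114798", "author_id": "a1"},
]


def books_via_one_side(author: dict, book_docs: list[dict]) -> list[dict]:
    """Follow the array of _ids stored on the author (a second lookup)."""
    by_id = {b["_id"]: b for b in book_docs}
    return [by_id[i] for i in author["books"] if i in by_id]


def books_via_many_side(author: dict, book_docs: list[dict]) -> list[dict]:
    """Find books whose author_id points back at the author (a second lookup)."""
    return [b for b in book_docs if b["author_id"] == author["_id"]]


def embed_books(author: dict, book_docs: list[dict]) -> dict:
    """Embedding on the "one" side: copy the book subdocuments into the author,
    so a single read returns everything, at the cost of a larger document."""
    embedded = {k: v for k, v in author.items() if k != "books"}
    embedded["books"] = [
        {k: v for k, v in b.items() if k != "author_id"}
        for b in books_via_many_side(author, book_docs)
    ]
    return embedded


print(embed_books(author, books)["books"][0])  # {'_id': '0141439688'}
```

Which shape fits best depends on the data modeling questions above, especially what is accessed together and how the "many" side grows.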

MongoDB Atlas

Database-as-a-service (DBaaS)
Automatically manages MongoDB for you, including:
- Deployment
- Monitoring
- Backup & restoration
- Archiving
- Security
- Built-in replication

CRUD

MongoDB Query API Syntax Structure

db (keyword) → collection (collection name) → operator (command) → query/filter (criteria) → options (settings)

Example MongoDB find command

db.collection.find(
  { age: { $gte: 25 } }, // query/filter: find documents where age is greater than or equal to 25
  { name: 1, age: 1 }    // projection: include only the name and age fields (_id is included by default)
)
  .sort({ age: -1 })     // options/settings: sort the results by age in descending order
  .limit(10);            // options/settings: limit the number of results to 10
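The find command above asks the server to filter, then sort, then limit, then project. Here is a pure-Python sketch of that order of operations. It is not the pymongo or mongosh API. `run_find` and the sample `people` documents are hypothetical, and only the `$gte` operator is emulated.

```python
from typing import Any

Doc = dict[str, Any]


def run_find(docs: list[Doc], field: str, min_value: Any,
             include: list[str], sort_field: str, limit: int) -> list[Doc]:
    # query/filter: like {field: {$gte: min_value}}; docs missing the field don't match
    matched = [d for d in docs if field in d and d[field] >= min_value]
    # sort: like .sort({sort_field: -1}), i.e. descending
    matched = sorted(matched, key=lambda d: d[sort_field], reverse=True)
    # limit: applied after the sort, like .limit(limit)
    matched = matched[:limit]
    # projection: like {name: 1, age: 1}; _id is kept unless explicitly excluded
    keep = ["_id", *include]
    return [{k: d[k] for k in keep if k in d} for d in matched]


people = [  # hypothetical sample documents
    {"_id": 1, "name": "Ada", "age": 36, "city": "London"},
    {"_id": 2, "name": "Bo", "age": 22, "city": "Oslo"},
    {"_id": 3, "name": "Cy", "age": 25, "city": "Lima"},
    {"_id": 4, "name": "Di", "age": 41, "city": "Rome"},
]

print(run_find(people, "age", 25, ["name", "age"], "age", 2))
# [{'_id': 4, 'name': 'Di', 'age': 41}, {'_id': 1, 'name': 'Ada', 'age': 36}]
```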

Create

db.authors.insertOne({
  "name": "Jane Austen",
  "books": [ "0141439688", "0375757813", "1551114798" ],
  "aliases": [ "Austen, Jane", "Jane Austen" ]
})

Read

db.authors.find({
  "name": "Jane Austen"
})

Update

db.authors.updateOne(
  { name: "Jane Austin" },
  { $set: { name: "Jane Austen" } }
)

Delete

db.authors.deleteOne(
  { name: "Jane Austin" }
)

Hands-on

MongoDB Songs Playlist

Title	Artist	Genre	Duration	Plays
Sucks To Be Ex-QL	$Avg $Max	Pop/Dance	3:28	4
Changes Stream	2.Pack	Hip Hop	3:33	8
Primary’s Gonna Be Me	MSYNC	Pop	2:40	4
I Haven’t Met AI Yet	MiCachel Bubléson	Jazz	3:36	6
Where Is The Log?	Backed By DBs	Pop	3:10	6
All About that Database	Schema Trainor	Pop	2:34	20
What About $toDate	Doc-Tree	Rock	3:48	18
NoSQL Paradise	CLI-io	Hip Hop	2:57	26
NoS8L Boy	Docuesenece	Rock/Punk	3:27	26
Back or Front	Michael JSON	Pop	3:11	26
Free-Tier Styler	D-BomFunktion MCs	Hip Hop/Electro	2:55	20
Thnks Fr th MdbMemrriz	Doc Out Boy	Rock/Punk	3:15	18
Mocking Data Board	SlimShardy	Hip Hop	3:31	20
My _id, My Ride	Spring Boots	Country	3:48	24
S One	NULLy	Hip Hop	3:12	16
$moreLikeThis	Dolla $earch	Hip Hop/Trap	2:53	32
Let It Beat	The Bugs	70s/Pop	3:56	32
Cluster Busy	connecTsean Pool	Dancehall	2:55	22
Index of Change (ESR)	Crystal Clusters	Techno	3:20	22
Code Vibin'	Masked WiredTiger	Hip Hop/Trap	3:01	52
URI Hero	IndexBack	Rock	3:40	20
Tiering Up My App	MSYNC	Pop	3:23	34
3 Little Nodes	DB Marley	Reggae	3:45	50
Am I in the Void	Relational Migrators	Industrial Modern/Rock	3:45	24
I Believe I Can Shard	R Klusterly	R&B/Soul	3:38	38
Harder, Better, Faster, Secured	US_West_2	Hip Hop/Trap	3:12	34
No SCRAM	TLS	R&B	3:07	52
Colores en la Cloud	J. Cloudin	Reggaeton	3:09	40
No Sequel Needed	SaaS Girls	Pop	3:11	40
Clave De Shard	Sharde Dezona	Reggaeton	2:06	48

The badges on MLH are associated with the official MongoDB credentials - finish on

VectorSearch: Beginner to Pro

What is vector search?

Do you ever find yourself…

Looking for something but you don’t quite have the words?
Remembering some characteristics of a book but not the title?
Trying to get another sweatshirt just like the one you had back in the day, but you don’t know how to search for it?

In these cases, lexical search (keyword search) no longer works well.

Lexical search vs Vector search

Lexical search

What?

Keyword search

When?

Your text corpus closely matches how users search
First pass at text-based relevancy

Vector search

What?

Semantic similarities

When?

“Vocabulary gap” between corpus and how users search
Text, image, audio, video search
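A toy keyword scorer makes the vocabulary gap visible: it only counts query words that literally appear in a document. The catalog strings and the `tokens`/`keyword_score` helpers are hypothetical. Real lexical engines, such as BM25-based ranking, are far more sophisticated, but they share the same dependence on matching terms.

```python
import re


def tokens(text: str) -> set[str]:
    # Lowercase and split into alphanumeric words.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def keyword_score(query: str, doc: str) -> int:
    # Lexical relevance: number of query words that literally appear in the doc.
    return len(tokens(query) & tokens(doc))


catalog = [  # hypothetical product descriptions
    "Grey cotton hoodie with front pocket",
    "Navy crew-neck sweatshirt, vintage 90s style",
]

for query in ["vintage sweatshirt", "retro pullover"]:
    print(query, [keyword_score(query, doc) for doc in catalog])
# vintage sweatshirt [0, 2]
# retro pullover [0, 0]
```

"retro pullover" expresses roughly the same intent as "vintage sweatshirt" but shares no words with the catalog, so keyword search scores every document 0. That is the case vector search is meant to handle.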

Vector example

Example: a store such as Home Depot

Each item's location is a 2-dimensional vector [aisle, bin]; items stored near each other have similar vectors

Embeddings

Definition:
Numeric, multi-dimensional representation of a piece of information

Key Points:

Capture semantic qualities of data
Semantically similar data ends up close together in vector space

Example (Vector Space Representation):

dog → [0.243, 0.765, …]
cat → [0.293, 0.774, …]
apple → [0.443, 0.965, …]
orange → [0.493, 0.9774, …]

How to embed data

Flow:

Data → Raw input data (text, image, audio, etc.)
Embedding model → Processed by an embedding model
Vector → Converted into a vector representation, e.g. [0.3, 0.1, 0.2, ..., 0.4]

Adding embeddings to existing data

Before:

{
  "_id": "0028608488",
  "title": "David Copperfield's Tales of the Impossible",
  "cover": "https://images.isbndb.com/covers/22/86/9780061052286.jpg",
  "year": 1995,
  "pages": 385,
  "synopsis": "David Copperfield, Arguably The Greatest Illusionist–magician..."
}

After (with embeddings):

{
  "_id": "0028608488",
  "title": "David Copperfield's Tales of the Impossible",
  "cover": "https://images.isbndb.com/covers/22/86/9780061052286.jpg",
  "year": 1995,
  "pages": 385,
  "synopsis": "David Copperfield, Arguably The Greatest Illusionist–magician...",
  "embedding": [
    0.03898080065846443,
    -0.05879044095304909,
    0.04323239979442215,
    ...,
    0.034243063451233547
  ]
}
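Here is a sketch of the embed-and-store flow (Data → Embedding model → Vector). It uses a stand-in for the embedding model so the example runs offline. `toy_embed` (hypothetical) hashes words into a small fixed-length vector. It is not a semantic model and only mimics the input/output shape: text in, list of floats out. A real pipeline would call an actual embedding model, which typically returns hundreds or thousands of dimensions.

```python
import hashlib
import math


def toy_embed(text: str, dims: int = 8) -> list[float]:
    """Hypothetical stand-in for an embedding model (feature hashing).
    Deterministic, but NOT semantic: it only mimics the output shape."""
    vec = [0.0] * dims
    for word in text.lower().split():
        bucket = hashlib.md5(word.encode("utf-8")).digest()[0] % dims
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def add_embedding(doc: dict, source_field: str = "synopsis",
                  target_field: str = "embedding") -> dict:
    """Return a copy of doc with a vector computed from source_field."""
    enriched = dict(doc)  # shallow copy: the original document is not modified
    enriched[target_field] = toy_embed(doc[source_field])
    return enriched


book = {  # the "Before" document, with "cover" omitted for brevity
    "_id": "0028608488",
    "title": "David Copperfield's Tales of the Impossible",
    "year": 1995,
    "pages": 385,
    "synopsis": "David Copperfield, Arguably The Greatest Illusionist–magician...",
}

enriched = add_embedding(book)
print(len(enriched["embedding"]))  # 8
```

To store the vector in place with pymongo, one option is `collection.update_one({"_id": doc["_id"]}, {"$set": {"embedding": vector}})`.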

Recap

Embeddings are arrays of numbers that capture the semantic qualities of data
Embeddings are generated by specialized ML models
Embeddings can be added in-place into existing MongoDB documents

Vector search

Definition:
Search based on intent/meaning using embeddings

Process:

User submits a query
Query is processed by an Embedding Model
The model converts the query into a Query vector
Perform similarity search (e.g., k=3 nearest neighbors)
Return the most relevant results in vector space
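The similarity-search step can be sketched as an exact (brute-force) k-nearest-neighbor search: compare the query vector with every stored vector and keep the k closest. The vectors below are the first two components of the dog/cat/apple/orange example; the remaining dimensions are elided there. The query vector for a word like "puppy" is hypothetical, and the distance function (Euclidean, via `math.dist`) is one choice among those described later. Brute force is exact but scans every vector, which is why large datasets use an approximate index instead.

```python
import math

docs = [  # first two dimensions only, from the embeddings example
    {"title": "dog", "embedding": [0.243, 0.765]},
    {"title": "cat", "embedding": [0.293, 0.774]},
    {"title": "apple", "embedding": [0.443, 0.965]},
    {"title": "orange", "embedding": [0.493, 0.9774]},
]


def knn(query_vec: list[float], docs: list[dict], k: int = 3) -> list[str]:
    # Rank every document by Euclidean distance to the query (smaller = closer).
    ranked = sorted(docs, key=lambda d: math.dist(query_vec, d["embedding"]))
    return [d["title"] for d in ranked[:k]]


query_vector = [0.25, 0.77]  # hypothetical embedding of the query "puppy"
print(knn(query_vector, docs, k=3))  # ['dog', 'cat', 'apple']
```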

How vector search works in MongoDB

Hierarchical Navigable Small Worlds (HNSW)

Builds layered graphs in which vectors are nodes and edges connect vectors that are close in vector space
Coarse search at the sparse top layers, refinement at the denser lower layers
Efficient approximate nearest-neighbor (ANN) search through large datasets

Source: Towards Data Science
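The core idea of an HNSW search is greedy routing. At each layer, keep moving to whichever neighbor is closer to the query, then drop to the next layer starting from where you stopped. Here is an illustration using a tiny hand-built two-layer graph over hypothetical 2-D points. It keeps only one candidate at a time. Real HNSW assigns nodes to layers randomly while inserting vectors and tracks a larger candidate list, so treat this as a sketch of the routing idea, not an implementation.

```python
import math

points = {  # hypothetical 2-D vectors
    "A": (0, 0), "B": (10, 0), "C": (0, 10), "D": (10, 10),
    "E": (5, 5), "F": (9, 1), "G": (1, 9), "H": (6, 4),
}
top_layer = {"A": ["D"], "D": ["A"]}  # sparse: few nodes, long-range edges
bottom_layer = {  # every node, with edges between nearby vectors
    "A": ["B", "C", "E"], "B": ["A", "D", "F"], "C": ["A", "D", "G"],
    "D": ["B", "C", "E"], "E": ["A", "D", "H"], "F": ["B", "H"],
    "G": ["C"], "H": ["E", "F"],
}


def greedy_search(graph: dict, entry: str, query: tuple) -> str:
    """Move to the closest neighbor until no neighbor is closer than the current node."""
    current = entry
    while True:
        best = min(graph[current], key=lambda n: math.dist(points[n], query))
        if math.dist(points[best], query) >= math.dist(points[current], query):
            return current
        current = best


def layered_search(layers: list[dict], entry: str, query: tuple) -> str:
    # Coarse search in the top layer, then refine in each lower layer.
    for graph in layers:
        entry = greedy_search(graph, entry, query)
    return entry


query = (9, 2)
print(layered_search([top_layer, bottom_layer], "A", query))  # F
```

For this query, the top layer moves from A to D, and the bottom layer refines D → B → F, which matches the brute-force nearest point.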

Calculating distance in vector space

Euclidean Distance

Measures the straight-line distance between two vectors (smaller distance = more similar)

Example:

dog → [0.243, 0.765, …]
cat → [0.243, 0.774, …]
Euclidean Distance = √((0.243 - 0.243)² + (0.765 - 0.774)² + …)
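Here is the formula as a small Python function, applied to the two dimensions the example actually shows. The remaining dimensions are elided in the note, so the full-vector distance would differ. The standard library's `math.dist` computes the same value.

```python
import math


def euclidean_distance(a: list[float], b: list[float]) -> float:
    # Square root of the sum of squared differences, dimension by dimension.
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


dog = [0.243, 0.765]  # first two dimensions from the example
cat = [0.243, 0.774]
print(round(euclidean_distance(dog, cat), 6))  # 0.009
```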

Dot product

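Sum of the element-wise products, used as a measure of alignment: a larger value means the vectors point in more similar directions, but vector length also affects the result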

Example:

Vectors: [4, 0, 1] and [3, 1, 2]
Calculation:
(4 × 3) + (0 × 1) + (1 × 2)
= 12 + 0 + 2
= 14
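This reproduces the worked example in Python. The second call uses a hypothetical vector with twice the length of [4, 0, 1] to show that magnitude also changes the dot product.

```python
def dot_product(a: list[float], b: list[float]) -> float:
    # Sum of element-wise products: a[0]*b[0] + a[1]*b[1] + ...
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    return sum(x * y for x, y in zip(a, b))


print(dot_product([4, 0, 1], [3, 1, 2]))  # 14
print(dot_product([8, 0, 2], [3, 1, 2]))  # 28: doubling a vector doubles the score
```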

Cosine similarity

Measures the angle between vectors (through its cosine), ignoring their length; 1 means the vectors point in the same direction

Example:

dog → [0.243, 0.765, …]
cat → [0.293, 0.774, …]

Formula:
Cosine Similarity = (A · B) / (||A|| × ||B||)

Where:

A · B = dot product of vectors A and B
||A||, ||B|| = magnitude (length) of each vector
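Here is the formula as Python, using the dot-product example vectors. The code also checks a fact that follows from the formula: when both vectors are scaled to unit length, the plain dot product equals cosine similarity. The helper `normalize` is illustrative.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))      # A · B
    norm_a = math.sqrt(sum(x * x for x in a))   # ||A||
    norm_b = math.sqrt(sum(y * y for y in b))   # ||B||
    return dot / (norm_a * norm_b)


def normalize(v: list[float]) -> list[float]:
    # Scale a vector to length 1.
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


a, b = [4, 0, 1], [3, 1, 2]
print(cosine_similarity(a, b))          # 14 / sqrt(17 * 14), about 0.907
print(cosine_similarity(a, [8, 0, 2]))  # same direction, different length: 1.0 up to rounding
# On unit-length vectors, the plain dot product equals cosine similarity.
print(sum(x * y for x, y in zip(normalize(a), normalize(b))))
```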

Recap

Vector search retrieves documents closest to the query embedding in vector space
Use the same embedding model to embed both the data you want to search and the user queries
Distance in vector space is calculated using mathematical functions
Cosine similarity works well with most embedding models

Vector Search in MongoDB

Create a vector search index

{
  "fields": [
    {
      "type": "vector",
      "path": "embedding",
      "numDimensions": 512,
      "similarity": "cosine"
    }
  ]
}
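`numDimensions` must equal the length of the vectors your embedding model produces. The `similarity` field picks the distance function: `euclidean`, `cosine`, or `dotProduct`. A quick local check can catch dimension mismatches before you build the index, for example documents embedded with a different model. `dimension_problems` is a hypothetical helper, not part of any MongoDB API, and it only handles top-level field paths.

```python
index_definition = {  # same shape as the index definition in this note
    "fields": [
        {"type": "vector", "path": "embedding", "numDimensions": 512, "similarity": "cosine"}
    ]
}


def dimension_problems(docs: list[dict], index_def: dict) -> list[tuple]:
    """Hypothetical pre-flight check: report documents whose vector field is
    missing or whose length differs from numDimensions (top-level paths only)."""
    problems = []
    for field in index_def["fields"]:
        if field["type"] != "vector":
            continue
        path, dims = field["path"], field["numDimensions"]
        for doc in docs:
            vec = doc.get(path)
            if vec is None or len(vec) != dims:
                problems.append((doc["_id"], path, None if vec is None else len(vec)))
    return problems


docs = [  # hypothetical documents
    {"_id": 1, "embedding": [0.0] * 512},
    {"_id": 2, "embedding": [0.0] * 1536},  # e.g. embedded with a different model
    {"_id": 3},                            # not embedded yet
]
print(dimension_problems(docs, index_definition))
# [(2, 'embedding', 1536), (3, 'embedding', None)]
```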

Send a vector search query

pipeline = [
  {
    "$vectorSearch": {
      "index": "vector_index",
      "path": "embedding",
      "queryVector": [0.02421053, -0.022372592, ...],
      "numCandidates": 150,
      "limit": 10
    }
  },
  {
    "$project": {
      "_id": 0,
      "title": 1,
      "score": { "$meta": "vectorSearchScore" }
    }
  }
]

MongoDB Atlas Vector Search

Integrated platform that simplifies your application architecture

Data is automatically synchronized between the database and vector index
Developers work with database and vector search via the unified MongoDB Query API
Fully managed for you so you can focus on your application
Search nodes scale your search workloads independently of the operational database

Benefits

Vector search simplified
Avoid the synchronization tax (no separate vector database to keep in sync)
Remove operational heavy lifting

Recap

To perform vector search in MongoDB, you need to generate embeddings, create a vector search index and send a query
The number of dimensions in the vector search index depends on the embedding model used
$vectorSearch must always be the first stage of the aggregation pipeline

Optimizing Vector Search

Pre-filtering (saves time)

Apply a filter condition first to narrow down the documents the vector search has to compare

Adding pre-filters to vector search

{
  "fields": [
    {
      "type": "vector",
      "path": "embedding",
      "numDimensions": 1536,
      "similarity": "cosine"
    },
    {
      "type": "filter",
      "path": "pages"
    }
  ]
}
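With `pages` indexed as a filter field, a query can pass a `filter` inside `$vectorSearch` so that only matching documents are considered. This sketch builds that pipeline as a Python dict. The 300-page cutoff and the 2-D vectors are hypothetical, and a real query vector for this index would have 1536 dimensions. Because the sketch cannot run against Atlas here, it then emulates the effect in memory with an exact nearest-neighbor search over the pre-filtered documents.

```python
import math

query_vector = [0.9, 0.1]  # hypothetical; real vectors for this index have 1536 dimensions

pipeline = [
    {
        "$vectorSearch": {
            "index": "vector_index",
            "path": "embedding",
            "filter": {"pages": {"$lte": 300}},  # pre-filter on the indexed "pages" field
            "queryVector": query_vector,
            "numCandidates": 150,
            "limit": 10,
        }
    }
]

docs = [  # hypothetical documents
    {"_id": "long", "pages": 385, "embedding": [0.9, 0.1]},
    {"_id": "short1", "pages": 120, "embedding": [0.8, 0.3]},
    {"_id": "short2", "pages": 250, "embedding": [0.1, 0.9]},
]


def prefilter_then_knn(docs: list[dict], query_vec: list[float], max_pages: int, k: int) -> list[str]:
    # 1) Pre-filter: drop documents that fail the filter before any distance math.
    candidates = [d for d in docs if d["pages"] <= max_pages]
    # 2) Rank the survivors by distance to the query vector (exact search here).
    candidates.sort(key=lambda d: math.dist(d["embedding"], query_vec))
    return [d["_id"] for d in candidates[:k]]


max_pages = pipeline[0]["$vectorSearch"]["filter"]["pages"]["$lte"]
print(prefilter_then_knn(docs, query_vector, max_pages, k=2))  # ['short1', 'short2']
```

"long" is the closest vector but never reaches the similarity step because it fails the filter. Against a real cluster, the pipeline would be sent with pymongo's `collection.aggregate(pipeline)`.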

MongoDB 101

Name

Structure

Document

Collection

Database

Node

Cluster

Data Modeling in MongoDB

MongoDB Atlas

CRUD

MongoDB Query API Syntax Structure

Example MongoDB find command

Create

Read

Update

Delete

Hands-on

MongoDB Songs Playlist

VectorSearch: Beginner to Pro

What is vector search?

Lexical search vs Vector search

Lexical search

Vector search

Vector example

Embeddings

Recap

How vector search works in MongoDB

Hierarchical Navigable Small Worlds (HNSW)

Calculating distance in vector space

Euclidean Distance

Dot product

Cosine similarity

Recap

Vector Search in MongoDB

Create a vector search index

Send a vector search query

MongoDB Atlas Vector Search

Integrated platform that simplifies your application architecture

Benefits

Recap

Optimizing Vector Search

pre-filtering (save time)

Adding pre-filters to vector search