raggedy-ann
How it works
raggedy-ann simplifies handling datasets for RAG applications. It loads, chunks, embeds and stores them in a vector database, making it easier to retrieve information and to manage and process data.
You can use the embed function to start processing datasets and the query function to find the most relevant information from your dataset.
The embed function handles everything from loading to chunking, embedding and storing the dataset, but you can also use specific functionality on its own, e.g. loading and chunking without embedding.
Limitations
- raggedy-ann currently only supports web URLs as a data source.
- You can only add one dataset at a time.
- You can't customize the model that handles your embeddings. The default right now is sentence-transformers.
- The default vectordb is chromadb.
- Doesn't support web pages that would amount to over ten pages.
You can follow the roadmap to see when new features and functionalities are added.
Features
- Data Loading: Easily load your data into your application. Currently only data from web pages is supported.
- Chunking: Automatic context-aware data chunking into manageable sizes for processing and retrieval.
- Query and Retrieval: Efficiently query and retrieve relevant data for your RAG applications.
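To give a rough feel for what chunking means in practice, here is a minimal sketch that groups paragraphs into chunks under a size limit. It is a generic illustration only: chunkByParagraph and its maxChars parameter are hypothetical, and this is not a description of how raggedy-ann chunks data internally.

```javascript
// Illustrative only (not raggedy-ann's implementation): groups paragraphs
// into chunks of at most maxChars characters, splitting only on blank lines.
// A single paragraph longer than maxChars is kept whole as its own chunk.
function chunkByParagraph(text, maxChars = 500) {
  const paragraphs = text
    .split(/\n\s*\n/)
    .map((p) => p.trim())
    .filter((p) => p.length > 0);

  const chunks = [];
  let current = '';
  for (const paragraph of paragraphs) {
    const candidate = current ? `${current}\n\n${paragraph}` : paragraph;
    if (candidate.length <= maxChars || current === '') {
      // Still fits, or this is the first paragraph of a new chunk.
      current = candidate;
    } else {
      // Adding this paragraph would exceed the limit: close the chunk.
      chunks.push(current);
      current = paragraph;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```

Splitting on paragraph boundaries keeps related sentences together, which is one simple way to preserve context within a chunk.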
Prerequisites
Before installing raggedy-ann, please ensure the following prerequisites are met:
- Node.js: A current version of Node.js must be installed. You can download Node.js from nodejs.org.
- Python: A Python 3.x installation is required. You can download Python from python.org. Please ensure Python is added to your system's PATH.
- pip: Ensure pip is installed and updated. pip is included with Python 3.x installations. You can update pip using the following command:
python -m pip install --upgrade pip
Installation
Step 1: Install raggedy-ann
Install raggedy-ann using npm:
npm install @purple-labs/raggedy-ann
Step 2: Set Up Python Environment
Create a virtual environment for your project (optional but recommended):
python -m venv myenv
source myenv/bin/activate # On Windows use `myenv\Scripts\activate`
Step 3: Install Python requirements
Run this in your terminal, replacing the placeholder path with the location of your own project:
pip install -r "C:\your-user-directory\repo-name\node_modules\@purple-labs\raggedy-ann\requirements.txt"
Step 4: Set up configurations
Run npx setup, which creates a config file for your application.
Add environment variables
Copy the following to a new file named .env in your project root and adjust the variables as needed:
SCRAPING_BEE_API=
SUPABASE_BUCKET=
SUPABASE_URL=
SUPABASE_KEY=
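Before calling embed or query, it can help to confirm that these variables are actually set. The sketch below is only an illustration: missingEnvVars is a hypothetical helper, not part of raggedy-ann. It assumes that all four variables are needed and that the .env file has already been loaded into process.env (for example by a package such as dotenv).

```javascript
// Variable names copied from the .env template above.
const REQUIRED_VARS = [
  'SCRAPING_BEE_API',
  'SUPABASE_BUCKET',
  'SUPABASE_URL',
  'SUPABASE_KEY',
];

// Hypothetical helper (not part of raggedy-ann): returns the names of
// required variables that are unset or blank in the given environment.
function missingEnvVars(env = process.env) {
  return REQUIRED_VARS.filter((name) => !env[name] || env[name].trim() === '');
}

const missing = missingEnvVars();
if (missing.length > 0) {
  console.warn('Missing environment variables:', missing.join(', '));
}
```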
Usage
You can use the main functions of raggedy-ann directly by importing them into your project. The embed function takes in a web URL, then loads, chunks, embeds and stores it:
const { embed } = require('@purple-labs/raggedy-ann');

embed('web url')
  .then(response => {
    console.log('Output:', response);
  })
  .catch(error => {
    console.error('Error:', error.message);
  });
// Other functions
The query function takes in your query text and returns the most relevant chunk of your dataset:
const { query } = require('@purple-labs/raggedy-ann');

query('query text')
  .then(response => {
    console.log('Output:', response);
  })
  .catch(error => {
    console.error('Error:', error.message);
  });
// Other functions
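The two calls can also be chained with async/await. The sketch below assumes, as the examples above suggest, that embed and query each return a promise. embedThenQuery is a hypothetical helper, not part of raggedy-ann. The functions are passed in as an argument so the flow can be tried with stand-ins before wiring in the real package.

```javascript
// Hypothetical helper: embeds a URL, then runs a query once embedding is done.
// `api` is any object exposing promise-returning embed(url) and query(text),
// e.g. require('@purple-labs/raggedy-ann').
async function embedThenQuery(api, url, queryText) {
  const embedResult = await api.embed(url); // load, chunk, embed, store
  const queryResult = await api.query(queryText); // most relevant chunk
  return { embedResult, queryResult };
}

// Example wiring (uncomment once the package is installed and configured):
// embedThenQuery(require('@purple-labs/raggedy-ann'), 'https://example.com', 'query text')
//   .then(({ queryResult }) => console.log('Output:', queryResult))
//   .catch((error) => console.error('Error:', error.message));
```

Awaiting embed before calling query ensures the query never runs against a dataset that is still being stored.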
Alternatively, you can select and use specific functionalities, such as loaders or chunkers, according to your needs.
const { webLoader, webChunker } = require('@purple-labs/raggedy-ann');

// Swap in webChunker to get chunks instead of the raw page content.
webLoader('web url') // e.g. https://example.com
  .then(response => {
    console.log('Output:', response);
  })
  .catch(error => {
    console.error('Error:', error.message);
  });
The webLoader function simply loads the data from the URL you provide and returns the web content as a string, while the webChunker function loads the data, chunks it and returns the chunks. You can then use your preferred models to embed the chunks and store them in a vectordb.
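If you take this route, retrieval is up to you. As a minimal sketch, assuming you already have an embedding vector for each chunk and for the query from a model of your choice, cosine similarity can be used to rank the chunks. cosineSimilarity and mostRelevantChunk are hypothetical helpers, not part of raggedy-ann, and a real vectordb would replace the linear scan shown here.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i += 1) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid dividing by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// items: [{ chunk: string, vector: number[] }]
// Returns the item whose vector is most similar to queryVector,
// or null when items is empty.
function mostRelevantChunk(items, queryVector) {
  let best = null;
  let bestScore = -Infinity;
  for (const item of items) {
    const score = cosineSimilarity(item.vector, queryVector);
    if (score > bestScore) {
      best = item;
      bestScore = score;
    }
  }
  return best;
}
```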
Contributing
Contributions are welcome! If you'd like to contribute, please fork the repository and create a pull request, or open an issue with the tag "enhancement".
License
raggedy-ann is released under the GPL-3.0 License. See the LICENSE file for more details.