@purple-labs/raggedy-ann

0.1.9 • Public • Published

raggedy-ann

How it works

raggedy-ann simplifies dataset handling for RAG applications. It loads, chunks, and embeds your data, then stores it in a vector database so that relevant information is easier to retrieve, manage, and process.

You can use the embed function to start processing datasets and the query function to find the most relevant information from your dataset.

The embed function handles everything from loading to chunking, embedding, and storing the dataset, but you can also use individual steps on their own. For example, you can load and chunk a dataset without embedding it.

Limitations

  • raggedy-ann currently supports only web URLs as a data source.
  • You can only add one dataset at a time.
  • You can't customize the model that handles your embeddings. The current default is sentence-transformers.
  • The default vector database is chromadb.
  • Web pages whose content would amount to more than ten pages are not supported.
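
Because only one dataset can be added at a time, several URLs have to be processed one after another. The following is a minimal sketch of that pattern, not part of raggedy-ann itself: embedSequentially is a hypothetical helper, and it assumes the function passed in behaves like embed (takes a URL and returns a promise). Whether successive embed calls add to or replace previously stored data is not stated here, so check that before relying on this approach.

```javascript
// Hypothetical helper: processes URLs one at a time, since only one
// dataset can be added per call. `embedFn` is expected to behave like
// raggedy-ann's embed (takes a URL, returns a promise).
async function embedSequentially(urls, embedFn) {
  const results = [];
  for (const url of urls) {
    try {
      // Wait for each dataset to finish before starting the next one.
      const output = await embedFn(url);
      results.push({ url, ok: true, output });
    } catch (error) {
      // Record the failure and continue with the remaining URLs.
      results.push({ url, ok: false, error: error.message });
    }
  }
  return results;
}

// Usage (assumes the package is installed):
// const { embed } = require('@purple-labs/raggedy-ann');
// embedSequentially(['https://example.com'], embed).then(console.log);
```

Collecting a per-URL result instead of stopping at the first failure makes it easier to retry only the URLs that failed.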

You can follow the roadmap to see when new features and functionalities are added.

Features

  • Data Loading: Load your data into your application. Currently, only web pages are supported as a source.
  • Chunking: Automatic context-aware data chunking into manageable sizes for processing and retrieval.
  • Query and Retrieval: Efficiently query and retrieve relevant data for your RAG applications.

Prerequisites

Before installing raggedy-ann, please ensure the following prerequisites are met:

  • Node.js: A current version of Node.js must be installed. You can download Node.js from nodejs.org.
  • Python: A Python 3.x installation is required. You can download Python from python.org. Please ensure Python is added to your system's PATH.
  • pip: Ensure pip is installed and updated. pip is included with Python 3.x installations. You can update pip using the following command:
python -m pip install --upgrade pip

Installation

Step 1: Install raggedy-ann

Install raggedy-ann using npm:

npm install @purple-labs/raggedy-ann

Step 2: Set Up Python Environment

Create a virtual environment for your project (optional but recommended):

python -m venv myenv
source myenv/bin/activate  # On Windows use `myenv\Scripts\activate`

Step 3: Install Python requirements

From your project root, run:

pip install -r node_modules/@purple-labs/raggedy-ann/requirements.txt

Step 4: Set up configurations

Run npx setup, which creates a config file for your application.

Add environment variables

Copy the following to a new file named .env in your project root and adjust the variables as needed:

SCRAPING_BEE_API=
SUPABASE_BUCKET=
SUPABASE_URL=
SUPABASE_KEY=
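
A quick startup check can catch a missing or blank variable before any request is made. The sketch below is illustrative only: missingEnvVars is a hypothetical helper, it assumes the variables have already been loaded into process.env (how raggedy-ann itself reads the .env file is not stated here), and it treats all four names from the template as required, which may not hold for every setup.

```javascript
// Variable names are taken from the .env template above.
const REQUIRED_ENV_VARS = [
  'SCRAPING_BEE_API',
  'SUPABASE_BUCKET',
  'SUPABASE_URL',
  'SUPABASE_KEY',
];

// Hypothetical helper: returns the names of required variables that are
// unset or blank in the given environment object.
function missingEnvVars(env = process.env) {
  return REQUIRED_ENV_VARS.filter(
    (name) => !env[name] || env[name].trim() === ''
  );
}

// Usage: fail fast at startup.
// const missing = missingEnvVars();
// if (missing.length > 0) {
//   throw new Error(`Missing environment variables: ${missing.join(', ')}`);
// }
```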

Usage

You can use the main functions of raggedy-ann directly by importing them into your project. The embed function takes a web URL, then loads, chunks, embeds, and stores its content:

const { embed } = require('@purple-labs/raggedy-ann');

embed('web url')
  .then(response => {
    console.log('Output:', response);
  })
  .catch(error => {
    console.error('Error:', error.message);
  });

// Other functions

The query function takes in your query text and returns the most relevant chunk of your dataset:

const { query } = require('@purple-labs/raggedy-ann');

query('query text')
  .then(response => {
    console.log('Output:', response);
  })
  .catch(error => {
    console.error('Error:', error.message);
  });

// Other functions
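
The two calls above can also be chained with async/await so that the query only runs once embedding has finished. This is a minimal sketch rather than an official pattern: embedThenQuery is a hypothetical wrapper, and it assumes only what is described above, namely that embed and query each return a promise.

```javascript
// Hypothetical wrapper: `lib` is any object exposing embed and query
// with the promise-based behaviour described above.
async function embedThenQuery(lib, url, queryText) {
  // Finish loading, chunking, embedding and storing before querying.
  await lib.embed(url);
  return lib.query(queryText);
}

// Usage (assumes the package is installed):
// const raggedyAnn = require('@purple-labs/raggedy-ann');
// embedThenQuery(raggedyAnn, 'https://example.com', 'query text')
//   .then(chunk => console.log('Most relevant chunk:', chunk))
//   .catch(error => console.error('Error:', error.message));
```

Passing the library in as an argument keeps the wrapper easy to exercise with a stub in tests.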

Alternatively, you can select and use specific functionalities, such as loaders or chunkers, according to your needs.

const { webLoader, webChunker } = require('@purple-labs/raggedy-ann');

// webLoader resolves with the page content as a string.
// Swap in webChunker to get the chunks instead.
webLoader('web url') // e.g. https://example.com
  .then(response => {
    console.log('Output:', response);
  })
  .catch(error => {
    console.error('Error:', error.message);
  });

The webLoader function loads the data from the URL you provide and returns the web content as a string, while webChunker loads the data, chunks it, and returns the chunks. You can then use your preferred model to embed the chunks and store them in a vector database.
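
To illustrate what embedding and storing chunks yourself might look like, here is a small sketch that keeps vectors in memory and ranks them by cosine similarity. It is an assumption-heavy stand-in, not raggedy-ann's implementation: buildIndex and findMostRelevant are hypothetical helpers, the chunks are assumed to be an array of strings, embedFn stands for whatever embedding model you choose (a function from a string to a promise of a numeric vector), and the in-memory array takes the place of a real vector database.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid dividing by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical helper: embeds each chunk with a caller-supplied embedFn
// (string -> Promise<number[]>) and keeps the results in memory.
async function buildIndex(chunks, embedFn) {
  const index = [];
  for (const text of chunks) {
    index.push({ text, vector: await embedFn(text) });
  }
  return index;
}

// Returns the stored chunk most similar to the query, or null if the
// index is empty.
async function findMostRelevant(index, queryText, embedFn) {
  const queryVector = await embedFn(queryText);
  let best = null;
  for (const entry of index) {
    const score = cosineSimilarity(queryVector, entry.vector);
    if (best === null || score > best.score) {
      best = { text: entry.text, score };
    }
  }
  return best;
}

// Usage (assumes the package is installed and myEmbed is your model):
// const { webChunker } = require('@purple-labs/raggedy-ann');
// const chunks = await webChunker('https://example.com');
// const index = await buildIndex(chunks, myEmbed);
// const hit = await findMostRelevant(index, 'query text', myEmbed);
```

A linear scan like this is only practical for small datasets; a dedicated vector database becomes worthwhile as the number of chunks grows.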

Contributing

Contributions are welcome! If you'd like to contribute, please fork the repository and create a pull request, or open an issue with the tag "enhancement".

License

raggedy-ann is released under the GPL-3.0 License. See the LICENSE file for more details.

Collaborators

  • tammilore