randomness-score-generator

Version 1.0.2

String randomness score generator

A lightweight, zero-dependency package that generates a randomness score for a string. The score helps identify whether a string is gibberish or word-like. Typical applications include:

  • Identifying whether a user is typing something meaningful or just banging on the keyboard
  • Determining whether a string is an API key, access token, etc.
  • Checking whether a string was randomly generated by a computer

Usage

The tool returns a randomness score for a string. You can tune the threshold for your use case, but as a rule of thumb a score above 4 indicates that the input string is random.

NPM package

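  • Install the npm package: npm i randomness-score-generator
  • Import and use it in your code:
// './Model' is the path inside the package source. When installed from npm,
// require the package by name instead (assuming its entry point exports the same object).
const Model = require('./Model');

// Remember to load the model before using it
Model.loadModel();

const score = Model.score("helloWorld");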
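
The rule of thumb above (a score above 4 suggests a random string) can be wrapped in a small helper. This is only a sketch: makeRandomnessCheck and lengthAsScore are hypothetical names that are not part of the package, and the stand-in scorer exists purely so that the snippet runs on its own. In real use you would pass something like (s) => Model.score(s) after loading the model.

```javascript
// Threshold taken from the README's rule of thumb; tune it for your own data.
const DEFAULT_THRESHOLD = 4;

// Hypothetical helper (not part of the package): turns any scoring function
// into a yes/no check against a threshold.
function makeRandomnessCheck(scoreFn, threshold = DEFAULT_THRESHOLD) {
  return (input) => scoreFn(input) > threshold;
}

// Stand-in scorer so the sketch is self-contained. It just returns the string
// length and is NOT a real randomness score.
const lengthAsScore = (s) => s.length;

const looksRandom = makeRandomnessCheck(lengthAsScore);
console.log(looksRandom('abc'));    // stand-in score 3 is not above 4
console.log(looksRandom('abcdef')); // stand-in score 6 is above 4
```

Keeping the threshold as a parameter makes it easy to adjust once you have measured scores on your own inputs.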

How does it work?

Training

  • At its core, the tool uses a bigram model to estimate the probability of the next character given the current one. (An n-gram model would likely give better results, but it is still a work in progress.)
  • We parse a comprehensive list of English words to build a 2D table that stores how often each character follows each other character.
  • While generating this table, we also add a special <.> character at the start and end of each word, which captures how often words start and end with each character. The table is then row-normalized so that each row sums to 1, which turns the counts into the probability of a character following the current character. These probabilities are used in the score calculation.
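
The training steps above can be sketched in a few lines of JavaScript. This is an illustration under assumptions, not the project's actual generator (which the README says lives in a Python notebook): the function names countBigrams and normalizeRows, the use of nested Maps, and the three-word list are all made up for the example.

```javascript
// Boundary marker added before and after every word (the README's <.> character).
const BOUNDARY = '.';

// Count how often each character follows another across a word list.
function countBigrams(words) {
  const counts = new Map(); // Map<prev, Map<next, count>>
  for (const word of words) {
    const chars = [BOUNDARY, ...word.toLowerCase(), BOUNDARY];
    for (let i = 0; i < chars.length - 1; i++) {
      const row = counts.get(chars[i]) ?? new Map();
      row.set(chars[i + 1], (row.get(chars[i + 1]) ?? 0) + 1);
      counts.set(chars[i], row);
    }
  }
  return counts;
}

// Row-normalize: divide each count by its row total so every row sums to 1.
function normalizeRows(counts) {
  const probs = new Map();
  for (const [prev, row] of counts) {
    let total = 0;
    for (const c of row.values()) total += c;
    const probRow = new Map();
    for (const [next, c] of row) probRow.set(next, c / total);
    probs.set(prev, probRow);
  }
  return probs;
}

// Tiny made-up word list, only for illustration.
const probs = normalizeRows(countBigrams(['cat', 'car', 'cot']));
// 'c' is followed by 'a' twice and by 'o' once, so P(a|c) = 2/3 and P(o|c) = 1/3.
console.log(probs.get('c'));
```

With a real dictionary the same two passes produce the full table; the boundary marker means the '.' row holds the probability of each starting letter.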

Score Calculation

  • We first normalize the word by converting it to lowercase and removing any extra characters.
  • Then, since we have a bigram model, we break the word down into pairs of adjacent characters (including the special start and end <.> character).
  • Next, we take the log of the probability of each pair. (Because these probabilities are very small, their logs are easier to work with and compare.)
  • We add up these log values for all the pairs in the word.
  • As this sum is a negative number, we negate it to get a positive value.
  • We divide this value by the number of characters in the word to get the final score.
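
As a rough illustration of the steps above, here is a minimal scoring sketch. Several details are assumptions rather than facts about the package: the toy probability table, the natural logarithm, the floor probability used for pairs that never appeared in training, treating "extra characters" as anything outside a-z, and returning 0 for an empty word. The function name scoreWord is also hypothetical.

```javascript
// Hypothetical toy table: TOY_PROBS[prev][next] = P(next | prev); '.' marks word boundaries.
const TOY_PROBS = {
  '.': { h: 0.5, z: 0.5 },
  h: { i: 1.0 },
  i: { '.': 1.0 },
};

// Assumed floor probability for pairs missing from the table (not from the README).
const UNSEEN_PROB = 1e-6;

function scoreWord(input, probs = TOY_PROBS) {
  // Lowercase and keep only a-z (one possible reading of "remove any extra characters").
  const word = input.toLowerCase().replace(/[^a-z]/g, '');
  if (word.length === 0) return 0; // assumption: nothing to score

  // Pad with the boundary marker, then walk over adjacent pairs.
  const chars = ['.', ...word, '.'];
  let logSum = 0;
  for (let i = 0; i < chars.length - 1; i++) {
    const p = probs[chars[i]]?.[chars[i + 1]] ?? UNSEEN_PROB;
    logSum += Math.log(p); // natural log, assumed
  }

  // Negate the negative sum and divide by the number of characters.
  return -logSum / word.length;
}

console.log(scoreWord('Hi!')); // only the (., h) pair costs anything: ln(2) / 2
console.log(scoreWord('zq'));  // two unseen pairs push the score far higher
```

The absolute values depend on the log base and the floor probability, so a threshold such as 4 only makes sense for the model the package actually ships.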

Contribute

  • Create a fork and clone it.
  • To contribute to the model generation part, navigate to the modelGenerator/ folder. It contains a Python notebook used for generating the model. Feel free to suggest improvements to the model.
  • To contribute to the npm package, go into the modelGenerator/ directory, which contains the source code for the npm package as well as the latest model used for calculation.

Bug Reporting

Report your issues at https://github.com/Pranav2612000/string_randomness_score_generator/issues

Gotchas & Improvements

  • The model is trained on English words and may not work for other languages.
  • To reduce training complexity, the model is case-insensitive.
  • The current model is not very accurate for very short strings.
  • The dataset the model is built on does not have first-class support for numbers and some special characters, so scores for strings containing them can be inaccurate.
  • The dataset does not include keyboard-common strings like "qwerty", so the results may not be correct for strings of this kind.
  • The current model is a bigram model. Replacing it with a higher-order n-gram model, possibly using deep learning, could give better results.

Maintainer

  • Pranav Joglekar

License

This project is licensed under the terms of the MIT open source license. Please refer to LICENSE.md for the full terms.
