This package has been deprecated

Author message:

Package no longer supported. Use @stdlib/datasets-spam-assassin instead or check out https://github.com/stdlib-js/datasets-spam-assassin for pre-built bundles.

@stdlib/dist-datasets-spam-assassin

0.0.96 • Public

Spam Assassin

The Spam Assassin public mail corpus.

Usage

var corpus = require( '@stdlib/dist-datasets-spam-assassin' ).SPAM_ASSASSIN;

corpus()

Returns the Spam Assassin public mail corpus.

var data = corpus();
// returns [{...},{...},...]

Each array element has the following fields:

  • id: message id (relative to message group)
  • group: message group
  • checksum: object containing checksum info
  • text: message text (including headers)
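Because the text field includes the headers, a common first step is to separate the header block from the body. The following is a minimal sketch, assuming each message follows the usual e-mail convention of ending its headers at the first empty line. The splitMessage helper and the sample record are hypothetical and not part of this package.

```javascript
'use strict';

/**
* Splits raw message text into a header block and a body.
*
* Hypothetical helper (not part of the package). Assumes the headers end at
* the first empty line and that line endings are either "\n" or "\r\n".
*/
function splitMessage( text ) {
    var match = /\r?\n\r?\n/.exec( text );
    if ( match === null ) {
        // No empty line found: treat the entire text as headers.
        return {
            'headers': text,
            'body': ''
        };
    }
    return {
        'headers': text.slice( 0, match.index ),
        'body': text.slice( match.index + match[ 0 ].length )
    };
}

// Made-up record shaped like a corpus element (values are illustrative only):
var sample = {
    'id': '00001',
    'group': 'easy-ham-1',
    'text': 'From: a@example.com\nSubject: Hello\n\nBody line 1\nBody line 2\n'
};

var parts = splitMessage( sample.text );
console.log( parts.headers );
console.log( parts.body );
```

With the real corpus, the same helper could be applied to data[ i ].text for each element returned by corpus().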

The message group may be one of the following:

  • easy-ham-1: easier to detect non-spam e-mails (2500 messages)
  • easy-ham-2: easier to detect non-spam e-mails collected at a later date (1400 messages)
  • hard-ham-1: harder to detect non-spam e-mails (250 messages)
  • spam-1: spam e-mails (500 messages)
  • spam-2: spam e-mails collected at a later date (1396 messages)
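The group field can be used to tally messages or to derive a binary spam/ham label. The sketch below assumes group names match the list above exactly. The countByGroup and isSpam helpers are hypothetical, and the sample array stands in for the much larger array returned by corpus().

```javascript
'use strict';

/**
* Counts how many records belong to each message group.
* Hypothetical helper (not part of the package).
*/
function countByGroup( data ) {
    var counts = {};
    var g;
    var i;
    for ( i = 0; i < data.length; i++ ) {
        g = data[ i ].group;
        counts[ g ] = ( counts[ g ] || 0 ) + 1;
    }
    return counts;
}

/**
* Returns true for the spam groups (spam-1, spam-2) and false otherwise.
* Assumes spam group names begin with "spam".
*/
function isSpam( group ) {
    return group.indexOf( 'spam' ) === 0;
}

// Made-up records; only the group field matters here:
var sample = [
    { 'id': '1', 'group': 'easy-ham-1', 'text': '' },
    { 'id': '2', 'group': 'easy-ham-1', 'text': '' },
    { 'id': '1', 'group': 'hard-ham-1', 'text': '' },
    { 'id': '1', 'group': 'spam-2', 'text': '' }
];

var counts = countByGroup( sample );
console.log( counts );
console.log( sample.filter( function onRecord( r ) {
    return isSpam( r.group );
}).length );
```

Run against the full dataset, the tallies can be compared with the message counts listed for each group.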

The checksum object contains the following fields:

  • type: checksum type (e.g., MD5)
  • value: checksum value
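One possible use of the checksum object is an integrity check. The sketch below rests on two assumptions that this README does not confirm: that the checksum is computed over the message text encoded as UTF-8, and that the value is a hexadecimal string. If either assumption does not hold for the real data, the comparison will fail even for intact messages. The verifyChecksum helper and the sample record are hypothetical.

```javascript
'use strict';

var crypto = require( 'crypto' );

/**
* Recomputes a record's digest and compares it to the stored value.
*
* Hypothetical helper. ASSUMPTIONS: the checksum covers `text` as UTF-8
* and `value` is hex-encoded; neither is stated in the package docs.
*/
function verifyChecksum( record ) {
    var algo = record.checksum.type.toLowerCase(); // e.g., 'MD5' => 'md5'
    var digest = crypto.createHash( algo )
        .update( record.text, 'utf8' )
        .digest( 'hex' );
    return digest === record.checksum.value.toLowerCase();
}

// Made-up record whose checksum is built to satisfy the assumptions above:
var text = 'Subject: Hello\n\nHello, world.\n';
var record = {
    'id': '00001',
    'group': 'easy-ham-1',
    'checksum': {
        'type': 'MD5',
        'value': crypto.createHash( 'md5' ).update( text, 'utf8' ).digest( 'hex' )
    },
    'text': text
};

console.log( verifyChecksum( record ) );
```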

Notes

  • This package contains distributable files for use in browser environments or as shared ("vendored") libraries in server environments. Each distributable file is a standalone UMD bundle which, if no recognized module system is present, will expose bundle contents to the global scope.

  • Each minified bundle has a corresponding gzip-compressed bundle. Each compressed bundle uses gzip compression level 9, the highest level, which typically yields the smallest files. Whether to use uncompressed or compressed bundles depends on the application and on whether compression is already handled elsewhere in the application stack (e.g., nginx, a CDN, et cetera).

  • While you are strongly encouraged to vendor bundles and host them with a CDN or provider that offers availability guarantees, especially for production applications, bundles are available via unpkg for quick demos, proofs of concept, and instructional material. For example,

    <script type="text/javascript" src="https://unpkg.com/@stdlib/dist-datasets-spam-assassin"></script>

    Please be mindful that unpkg is a free, best-effort service relying on donated infrastructure which does not provide any availability guarantees. Under no circumstances should you abuse or misuse the service. You have been warned.

  • If you intend to embed a standalone bundle within another bundle, you may need to rename require calls within the standalone bundle before bundling in order to maintain scoped module resolution. For example, if you plan to use browserify to generate a bundle containing embedded bundles, browserify plugins exist to "de-require" those bundles prior to bundling.

  • The bundles in this package expose the following stdlib packages:


Examples

var corpus = require( '@stdlib/dist-datasets-spam-assassin' ).SPAM_ASSASSIN;

var data;
var i;

data = corpus();
for ( i = 0; i < data.length; i++ ) {
    console.log( 'Character Count: %d', data[ i ].text.length );
}

To include the bundle in a webpage,

<script type="text/javascript" src="/path/to/@stdlib/dist-datasets-spam-assassin/build/bundle.min.js"></script>

If no recognized module system is present, access bundle contents via the global scope.

<script type="text/javascript">
    // If no recognized module system present, exposed to global scope:
    var dataset = stdlib_datasets_spam_assassin.SPAM_ASSASSIN;
    console.log( dataset() );
</script>
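The Notes mention that compressed bundles use gzip level 9. As a rough illustration of what that setting means, the sketch below compresses a buffer at level 9 with Node's built-in zlib module and restores it. The repeated string is a made-up stand-in for a bundle's contents, and this is not the build process used to produce the package's files.

```javascript
'use strict';

var zlib = require( 'zlib' );

// Made-up, highly repetitive stand-in for a bundle (200 copies of one line):
var source = Buffer.from( new Array( 201 ).join( 'var x = "spam assassin";\n' ), 'utf8' );

// Level 9 is exposed as zlib.constants.Z_BEST_COMPRESSION:
var compressed = zlib.gzipSync( source, {
    'level': zlib.constants.Z_BEST_COMPRESSION
});

// Decompress to confirm that the round trip is lossless:
var restored = zlib.gunzipSync( compressed );

console.log( 'Original: %d bytes, gzipped: %d bytes', source.length, compressed.length );
console.log( 'Round trip OK: %s', restored.equals( source ) );
```

If a server or CDN already applies gzip on the fly, serving the uncompressed minified bundle and letting that layer compress it avoids handling compression twice.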

Install

npm i @stdlib/dist-datasets-spam-assassin

Version

0.0.96

License

Apache-2.0

Unpacked Size

44.6 MB

Total Files

5

Collaborators

  • kgryte
  • planeshifter
  • rreusser
  • stdlib-bot