I've invented something dubbed
module harvesting. It's a method of automated Node.js module creation. What does that mean?
Do you ever find yourself creating a project and needing functions you've already built in other projects? If the functions are a couple of lines long, you can find yourself copying and pasting them into your project every time you use them. Why didn't you put them on npm the first time? We're told that monolithic apps are bad, but they need to take shape that way, and then down the road you're supposed to refactor. I absolutely hate this paradigm. The reason you didn't put that three-line function on npm the first time is because there's way too much overhead in creating the repo, publishing, documentation, tests, and setting dependencies. Module harvesting is here to help.
Here's a great quote:

> If some component is reusable enough to be a module then the maintenance gains are really worth the overhead of making a new project with separate tests and docs. Splitting out a reusable component might take 5 or 10 minutes to set up all the package overhead but it's much easier to test and document a piece that is completely separate from a larger project. When something is easy, you do it more often.
The question I wanted to address was "What if there was no overhead?".
A file is provided to the terminal command. For instance, the following code is what I used to harvest:
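The original snippet didn't survive in this copy; judging from the invocation shown later in this post, it was presumably something like:

```sh
./bin/module-harvest.js ./module-harvest.js
```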
This code does a myriad of things. Here's a rundown of the most important:

- Creates a `local_modules` directory in the root folder.
- Collects the file's local dependencies (minus anything in a `.gitignore` that might exist) and hard links them (`ln`) to the new module directory.
- Adds the `init` code to the new repo.
At the end of which I have a folder that looks like this:

```
thomas@workstation:local_modules$ tree -I node_modules
.
└── module-harvest
    ├── arr-extract.js
    ├── assimilate.js
    ├── bin
    │   └── module-harvest.js
    ├── faux-project.js
    ├── github-create-repo.js
    ├── module-harvest.js
    ├── package-deps.js
    ├── package.json
    ├── promise-props-series.js
    ├── readme.md
    ├── recursive-deps.js
    ├── test
    │   ├── arr-extract.js
    │   ├── data-project-definitions.js
    │   ├── package-deps.js
    │   ├── promise-props-series.js
    │   └── recursive-deps.js
    └── test-chdir-temp.js
```
You can see that it takes in the file `module-harvest.js` and from there extracts all of its local deps, shown here:
```
# root files
bin/module-harvest.js
module-harvest.js

# deps
arr-extract.js
assimilate.js
faux-project.js
github-create-repo.js
package-deps.js
promise-props-series.js
recursive-deps.js
test-chdir-temp.js
```
`module-harvest` takes all of these files and checks for test files, and finds these:
`module-harvest`'s settings are completely customizable; however, there is a set of defaults that you append to by default, and an option to overwrite all the defaults is available as well. Because everyone has such a different way of crafting their Node modules, I wanted to bake in as much support for versatility as possible (also I change my mind a lot).
Because the options are so complex, a `harvest.config.js` file can be used at the root of the project to provide `module-harvest` with arguments.
All of the options have a 1:1 relationship with the arguments of the module function:
```javascript
var moduleHarvest = require('./module-harvest')

moduleHarvest(
  moduleFile,
  moduleName,
  moduleDesc,
  moduleVersion,
  packageSrc,
  localModulesDirName,
  directory,
  buildLinks,
  trackDeps,
  trackDevDeps,
  postBuildReverseLinks,
  githubAccessToken,
  githubRepoPrefix,
  preventMerge
)
```
Here's an example of what this might look like in `harvest.config.js`; in fact, this is the default. Unless the `preventMerge` argument is true, these options are used. If you add your own options, you append to these. There's no harm in having links to possible files that don't exist.
Note here that `trackDevDeps` is a function, not an array. This is because, as mentioned earlier, a module can have many local file references. The `trackDevDeps` function is run recursively for every local file found in the tree.
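The actual default config isn't reproduced in this copy, so here is a hypothetical sketch of the shape such a file could take. The keys mirror the `moduleHarvest()` argument names, but the values are illustrative, not the real defaults:

```javascript
// harvest.config.js — a hypothetical sketch, not module-harvest's real defaults
var config = {
  localModulesDirName: 'local_modules',
  buildLinks: true,
  // trackDevDeps is a function, not an array: it is run recursively
  // for every local file found in the dependency tree
  trackDevDeps: function (file) {
    var name = file.replace(/\.js$/, '')
    return ['test/' + name + '.js', 'docs/' + name + '.md']
  }
}

module.exports = config
```

A function here means each harvested file can derive its own companion test and doc paths, even for paths that turn out not to exist.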
Note: You can also set variables from a static `harvest.config.json` file.
If you're smart, you might have realized that there's a nefarious `githubAccessToken` argument. This can't go in a publicly used place, so you can use a `harvest.secret.json` file that looks like this:
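The example file itself is missing from this copy; a minimal sketch, with field names assumed from the function arguments and the placeholder value left for you to fill in, might be:

```json
{
  "githubAccessToken": "<your-github-access-token>",
  "githubRepoPrefix": "node-"
}
```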
You'll wanna add `harvest.secret.json` to your `.gitignore`.
For module-specific settings, specifically `package.json` contents, you're not gonna be able to pass them in via the command line every time. So I baked in JSDoc support. Let's start with an example.
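The original example block isn't reproduced in this copy; as a hypothetical sketch (the key names below are assumptions, not confirmed by the source), such a comment block could look like:

```javascript
/**
 * Harvests package dependencies and builds module from file.
 *
 * (hypothetical sketch — each `@package.*` key maps straight into package.json)
 * @package.name module-harvest
 * @package.version 0.1.0
 * @package.private false
 */
```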
This comment block is at the beginning of `./module-harvest.js`; it sets some values that will be used by `module-harvest`. You can see that there are `@package` declarations. These aren't valid JSDoc; they're specific to `module-harvest`. Upon running, the `@package.*` declarations will be formed into an object and placed into `package.json`, so if you want to specify a unique `version`, the package will be created with it.
If `@package.private` is set, this will prevent the GitHub repo creation, the initial git commit + push, and the npm publish. It will still allow the module to be created.
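The tool's actual parsing code isn't shown here, but the idea of collecting `@package.*` declarations into an object destined for `package.json` can be sketched in a few lines (a hypothetical helper, not `module-harvest`'s real parser):

```javascript
// collect `@package.key value` declarations from a comment block
// into a plain object (a sketch, not module-harvest's real parser)
function packageFromComment (comment) {
  var pkg = {}
  var re = /@package\.([\w.]+)\s+(.+)/g
  var match
  while ((match = re.exec(comment)) !== null) {
    pkg[match[1]] = match[2].trim()
  }
  return pkg
}

var block = [
  '/**',
  ' * @package.name module-harvest',
  ' * @package.version 0.1.0',
  ' */'
].join('\n')

console.log(packageFromComment(block))
// { name: 'module-harvest', version: '0.1.0' }
```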
Maintaining file state is a high priority after the initial build of a module. It's apparent that the hard links alone will not cut it; there's gonna need to be a watch script that builds the module(s) whenever a dependency is saved. The problem with that is, without a map of which local files are being used in which project, I'd have to build every module every time something is saved. The map would help a ton, even when a module doesn't yet `require()` a file, because a file that is already connected to a module would need to be updated to include it, at which time `module-harvest` would run and the new file would be added to the map watch list. A `-w` flag would be appropriate.
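The map described above could be as simple as an inverted index from dependency file to the modules that use it. A sketch, with module and file names taken from this post but the helper itself hypothetical:

```javascript
// invert a module -> dependencies map into a dependency -> modules map,
// so a watch script knows exactly which modules to rebuild on save
function invertDeps (moduleDeps) {
  var watchMap = {}
  Object.keys(moduleDeps).forEach(function (mod) {
    moduleDeps[mod].forEach(function (dep) {
      watchMap[dep] = watchMap[dep] || []
      watchMap[dep].push(mod)
    })
  })
  return watchMap
}

var map = invertDeps({
  'module-harvest': ['./arr-extract.js', './package-deps.js'],
  'module-bin': ['./arr-extract.js']
})

// saving ./arr-extract.js should rebuild both modules, not everything
console.log(map['./arr-extract.js']) // [ 'module-harvest', 'module-bin' ]
```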
gitslave is a lovely piece of software that makes it possible to use one supermodule like `reggi/node-reggi` and distribute the packaged modules from there. Let's say you change a dependency `./arr-camelize.js` and three of your `local_modules` are using it. Because it's linked (and ideally there's a watch script involved), when it's time to commit the code across all four repos (including the supermodule), you can commit the change to all the repos using the following command. Note that `gits` is used here instead of `git`.
```
gits add -A
gits commit -m "updated arr-camelize"
gits push
```
Something I find the need for more and more is hooks, ideally in the style of npm, something like `postbuild`. Right now this can all be done with a more complex command.
```
# prebuild
# adds a TOC to a readme file
jsdoc2md <docs>
doctoc ./docs --github
```
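If `module-harvest` grew npm-style hooks, the same step could live in config instead of a one-off command. A hypothetical sketch borrowing npm's own `scripts` naming convention (the script contents here are illustrative):

```json
{
  "scripts": {
    "prebuild": "doctoc ./docs --github",
    "postbuild": "npm publish"
  }
}
```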
I'd love the ability to write code using
es6 and have my modules get rendered out in
es5. I haven't started writing my modules in
es6 just yet, so I'm not exactly keen on its challenges. But this feature is something to keep in mind.
A lot of people organize their files by setting a
lib folder and all the single files live within there instead of the root directory.
`module-harvest` fully supports a main file within a `lib` dir. Running `module-harvest ./lib/alpha.js` would create a module
alpha with the file in the folder
`lib/alpha.js`. Part of why I created `module-harvest` is because I value consistency in how modules are created; it makes code easier to write.
I have two main problems with using `lib`. The first is that in most cases the main file references straight to a file in a dir; there's no root module `alpha.js` file. The second is that if you want to call a file from within a module, you have to call `require('reggi/lib/arr-camelize')` instead of the much cleaner `require('reggi/arr-camelize')`. `module-harvest` by nature solves both of these problems (kinda).
Another pro to not using `lib` and having all your files in the root is that it allows them to be viewed more easily on GitHub.
The idea I had was: what if you could customize the root, so `lib` could act as the root for the newly created module? However, this would add inconsistency in how modules operate, plus you would not be able to `require` anything one directory up from `lib`. To illustrate this:
```
# note the `lib` dir wasn't imported
/lib/hello.js -> /local_modules/hello/hello.js
```
This feature might be helpful if one wanted to organize their supermodule in a different way than the submodules. However, I believe it adds too many inconsistencies.
Currently, if two modules share the same source, they both link it into their projects, which means that the source is duplicated, and not truly modularized. That shared source file gets created into a module too, and there's no reason why the deps that use it can't legitimately install it as a module after it's been published.
In order to change this I'd need to stop using hard links and start copying files directly into the modules. I'd have to track down all the references to a local module call and change them to module calls.
For instance, if the reference `./arr-extend.js` was used, I'd need to map the location in the file string and convert it to `arr-extend` (if placed on npm or github). This of course would alter the source code.
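That rewriting step can be sketched as a simple source transform. This is a hypothetical helper illustrating the idea, not something `module-harvest` currently does:

```javascript
// rewrite local file references like require('./arr-extend.js')
// into bare module calls like require('arr-extend')
function rewriteLocalRequires (source) {
  return source.replace(
    /require\(\s*['"]\.\/([\w-]+)\.js['"]\s*\)/g,
    "require('$1')"
  )
}

var input = "var arrExtend = require('./arr-extend.js')"
console.log(rewriteLocalRequires(input))
// var arrExtend = require('arr-extend')
```

A regex is fine for a sketch; a robust version would walk the AST instead, since string matching can't tell a `require` call from a comment or string literal.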
```
# creates github repo `reggi/node-module-bin`
thomas@workstation:node-reggi$ ./bin/module-bin.js ./github-create-repo.js <github-access-token> node-module-bin --type=promise
.. response from github

# harvests `module-bin`
thomas@workstation:node-reggi$ ./bin/module-harvest.js ./module-bin.js

# init's the new repo
thomas@workstation:node-reggi$ git -C ./local_modules/module-bin/ init
Initialized empty Git repository in /Users/thomas/Desktop/labratory/node-reggi/local_modules/module-bin/.git/

# add all files
thomas@workstation:node-reggi$ git -C ./local_modules/module-bin/ add -A

# commit initial
thomas@workstation:node-reggi$ git -C ./local_modules/module-bin/ commit -m 'init'
[master (root-commit) 130b8aa] init
 4 files changed, 174 insertions(+)
 create mode 100755 bin/module-bin.js
 create mode 100644 module-bin.js
 create mode 100644 package.json
 create mode 100644 test/module-bin.js

# add the github origin repo
thomas@workstation:node-reggi$ git -C ./local_modules/module-bin/ remote add origin

# push the module
thomas@workstation:node-reggi$ git -C ./local_modules/module-bin/ push origin master
```
Then it seems the only way to continue is to remove the new module and clone it again with gitslave:
```
thomas@workstation:node-reggi$ rm -rf ./local_modules/module-bin
thomas@workstation:node-reggi$ gits attach local_modules/module-bin
Cloning into 'local_modules/module-bin'...
```
At the end of all this, you have a slave repo to the superproject and everything is in sync.
I've created this super-project, `reggi/node-reggi`.
Consequently, one of these files is called `./module-harvest.js`, alongside its executable counterpart in `bin` and a `package.json` file. `module-harvest` consumes itself when running the following command:
```
./bin/module-harvest.js ./module-harvest.js --desc=':corn: Harvests package dependencies and builds module from file.'
```
This command creates the module `module-harvest`.
If you have a `harvest.json` file present in the working directory, as I do, and it contains a GitHub access token / prefix, the command will also create the repo and commit it to GitHub. (These options can also be added to the command.)
Here it is:
Now if you run the following, it's up on npm.
```
cd ./local_modules/module
npm publish
```
And it seems that if I want it to be a part of gitslave for the project, I need to do this:
```
rm -rf ./local_modules/module-harvest.js
gits attach local_modules/module-harvest.js
```