Process a dataset using a Groovy script. The script is not trusted and is executed in its own secure sandbox.
Input: A Dataset of molecules
Output: A Dataset of molecules
|Script||Groovy script to execute that processes the dataset|
Being able to execute programming code greatly increases the flexibility and functionality of the Squonk Computational Notebook. However, this is potentially unsafe: the user controls what code gets executed, and that code could be malicious and compromise the system for all users.
To accommodate this we execute the code in a temporary Docker container, which provides a secure sandbox where the code can execute. The container is created purely for the purpose of executing the code and is destroyed after it completes. The code only has access to the data it needs for its execution, and cannot access other parts of the system. At worst, malicious code could affect the temporary container it is running in.
This specific cell allows for execution of a Groovy script. Groovy is closely related to Java, the main language used in the Squonk Computational Notebook, and can be thought of as an extension of it. As such it is probably the best general choice, though we also plan to provide similar execution environments for other programming languages such as Python and R.
The Docker container that is used to execute the Groovy script is adapted from the webratio/groovy image from Docker Hub. It allows additional libraries to be pulled in from Maven Central using Groovy’s “Grab” annotations (see below for details). In addition, the core Squonk libraries are accessible from a local Maven repository. Together these allow a wide range of additional functionality to be pulled in from external sources.
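As a rough sketch of how a library can be pulled in from Maven Central with @Grab: the library used here (Apache Commons Lang) and its version are illustrative assumptions only and are not part of Squonk. The coordinates of the Squonk libraries and the location of the local repository are not shown.

```groovy
// Illustrative only: commons-lang3 is a stand-in for any library on Maven Central.
@Grab('org.apache.commons:commons-lang3:3.12.0')
import org.apache.commons.lang3.StringUtils

// Use a class from the grabbed library.
def padded = StringUtils.leftPad('42', 5, '0')
println padded
```

The dependency is resolved when the script is compiled, so the import directly below the annotation can refer to classes from the grabbed library.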
- Simplest example that processes a dataset, allowing the molecule’s properties to be updated:
@GrabResolver line: makes the local Maven repository with the core Squonk libraries available. This is not required, but these libraries can be very useful, and are used here.
@Grab line: grabs the “common” Squonk library.
import lines: import the necessary classes, including a static import of the methods from the MoleculeObjectUtils class, which provides some of the helper methods that are used.
processDataset method: specifies where the input is read from and where the output is written to. These are built-in conventions that must be followed.
mo.putValue() line: this is where the actual processing takes place. You would replace this line with something more useful.
- as Consumer: makes Groovy cast the closure to a Consumer, which is the type needed here.
In most cases you only need to replace the mo.putValue() line with something that meets your needs.
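The closure cast described above can be illustrated in isolation. This is a minimal sketch, not the Squonk API: the Mol class and the forEachMolecule method are hypothetical stand-ins, and only the putValue() call and the "as Consumer" cast mirror the text.

```groovy
import java.util.function.Consumer

// Hypothetical stand-in for Squonk's MoleculeObject; only putValue() mirrors the text.
class Mol {
    String source
    Map<String, Object> props = [:]
    void putValue(String key, Object value) { props[key] = value }
}

// Hypothetical stand-in for a method that requires a java.util.function.Consumer.
void forEachMolecule(List<Mol> mols, Consumer<Mol> consumer) {
    mols.each { consumer.accept(it) }
}

def mols = [new Mol(source: 'CCO'), new Mol(source: 'c1ccccc1')]

// The closure is coerced to a Consumer with "as Consumer".
// The putValue() line is the part you would replace with your own processing.
forEachMolecule(mols, { mo -> mo.putValue('length', mo.source.length()) } as Consumer)

mols.each { println "${it.source} -> ${it.props}" }
```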
- Slightly more complex example that allows any operation on the Stream of MoleculeObjects, including additional operations such as filtering:
This is very similar to the previous example, but as you will see you get access to the Stream
The closure needs to be cast to a Function.
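A minimal sketch of the same idea, again with hypothetical stand-ins (the Mol class and the transformMolecules method are assumptions for illustration, not Squonk classes): the closure receives the whole Stream, so it can filter as well as update values, and it is cast with "as Function".

```groovy
import java.util.function.Function
import java.util.stream.Collectors
import java.util.stream.Stream

// Hypothetical stand-in for Squonk's MoleculeObject.
class Mol {
    String source
    Map<String, Object> props = [:]
    void putValue(String key, Object value) { props[key] = value }
}

// Hypothetical stand-in for a method that requires a Function from Stream to Stream.
List<Mol> transformMolecules(List<Mol> mols, Function<Stream<Mol>, Stream<Mol>> fn) {
    fn.apply(mols.stream()).collect(Collectors.toList())
}

def input = [new Mol(source: 'C'), new Mol(source: 'CCO'), new Mol(source: 'c1ccccc1')]

// The closure gets the Stream itself: drop single-character entries, then annotate the rest.
def result = transformMolecules(input, { Stream<Mol> stream ->
    stream.filter { Mol mo -> mo.source.length() > 1 }
          .map { Mol mo -> mo.putValue('length', mo.source.length()); mo }
} as Function)

result.each { println "${it.source} -> ${it.props}" }
```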
- More verbose example providing further flexibility:
Similar to the above, but in this case the script handles the reading and writing itself. Usually there is no need for this.
- As above, but using “lazy” Groovy-style typing rather than strong Java-style typing:
As above, but fewer imports are needed.
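To illustrate why “lazy” typing reduces imports, here is a small sketch that uses plain strings rather than Squonk classes (the data is made up for illustration). Because variables are declared with def, types such as Stream or List are never named and so need no import.

```groovy
// No imports are needed: with "def", the types are never written out.
def names = ['ethanol', 'benzene']

// Java-style typing would be: Stream<String> stream = names.stream()
def stream = names.stream()

// Java-style typing would be: List<String> upper = ...
def upper = stream.map { it.toUpperCase() }.collect(java.util.stream.Collectors.toList())

println upper
```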