Stored procedures in JavaScript? (My Drizzle repository can do it)

Just wanted to record for the history books that:

drizzle> select js_eval('var d = new Date(); "Drizzle started running JavaScript at: " + d;')\g
+----------------------------------------------------------------------------------+
| js_eval('var d = new Date(); "Drizzle started running JavaScript at: " + d;') |
+----------------------------------------------------------------------------------+
| Drizzle started running JavaScript at: Mon Aug 29 2011 00:23:31 GMT+0300 (EEST) |
+----------------------------------------------------------------------------------+
1 row in set (0.001792 sec)

I will push this onto launchpad tomorrow, after a good nights sleep and final code cleanups. (Update: It's now available here: https://code.launchpad.net/~hingo/drizzle/drizzle-js_eval See comments for discussion on bugs. Update 2: I changed the name of the function to just JS() and also the plugin name is just "js". So all examples in below comments now work with js(...) and there is no function called js_eval(...).) This is actually follow up work to Stewart's earlier work on a JSON over HTTP API. The fact that you can execute any JavaScript code is just a cool side effect. (Note that I haven't actually exported any Drizzle functionality, so it's just a pure EcmaScript environment for now.) How this is related to Stewart's work might be not at all intuitive at the moment, unless you're really into JSON stuff. Speaking of which, thanks to Roland Bouman for this insightful idea/solution. (No, I don't actually pass in any additional parameters yet...) Even if I did it purely to manipulate JSON objects, it seems like adding JavaScript support to an RDBMS is a trendy thing to do. I went to a Node.js tutorial at OSCON (with fellow Drizzler Mark Atwood) and I started believing they're onto something with server side JavaScript. (I unsuccessfully tried to dub it as "Erlang for dummies" at work :-) I'd also like to point out that even if PostgreSQL is famous for allowing you to write stored procedures in pretty much any language, it doesn't seem to support JavaScript yet - making Drizzle the first RDBMS to support embedded JavaScript! (I'm so trendy...)

Usual Suspect. (not verified)

Mon, 2011-08-29 00:17

Actually, already done it several years ago (2007) with the Rhino Javascript interpreter compiled and linked into the same process as mysqld. The code may still be on the bk-internal.mysql.com hard disks,... if the machine still exists.

So it's the usual story with MySQL being able to absorb new features... Well, this time I aim to get it merged into trunk (Drizzle) and am reasonably optimistic it will happen. Once I add things like tests and comments of course :-)

Hi Henrik,

nice work! I got a very crude version like this working in a MySQL UDF last week too, but a few IMO essential features that I want to implement keep crashing so I was wondering how you fared in that respect.

Right now, my UDF takes the first (string) argument, and in the row-level function, it compiles it (each call again), executes the script, and returns the value.
I create the v8 context just once in the init function, so I am not getting overhead from that.

There are two things that I'd like to do that I just can't get to work:

1) if the script argument is constant, it compile should occur just once in the init function, and a handle to the script should be stored so it can be called repeatedly.
2) All arguments beyond the first one should be made available to the script, either as variables or as an array (which seems the better way to go, since you wouldn't need to invent variable names, and since the Ecmascript standard already defines a built-in "arguments" array for functions, so this would be the same thing but for the script snippet)

This is an interesting variant of pair-programming then: Choose a friend and implement the same feature as MySQL plugin and Drizzle plugin :-)

The code is now up at https://code.launchpad.net/~hingo/drizzle/drizzle-js_eval for you to enjoy

I spent a few hours in GDB myself, for instance you really need to catch those v8 exceptions, otherwise drizzled of course will crash :-). I took the code from samples/shell.cc and it worked just fine. For instance, I used "return 'hello'" first, but you can't use return since it's not inside a function, it's just a script. This resulted in an exception.

My approach to programming is a piece-by-piece bootstrapping: All code is within the Drizzle function method, optimization will happen later. Also I don't actually use the parameters yet, it's just the execution of javascript that works.

As I understand v8 contexts, you shouldn't do it in the init function of the plugin. The context shouldn't be global to drizzled/mysqld instance. It can probably be something like per-session.

Caching compiled snippets of javascript code is an obvious optimization, but did you check whether v8 already does that? It would be a nice service for it to do it for us...

Henrik,

"As I understand v8 contexts, you shouldn't do it in the init function of the plugin. "

I was talking about the init function of the udf - this is called once for each call to the function at the SQL level. I am pretty sure this is appropriate.

Anyway, i just took a look at your code and it's pretty much what I got. Have you tested what happens if multiple sessions are concurrently calling js_eval? v8 documentation is sparse but I got the impression you have to explicitly "isolate" handles in a multi-threaded environment.

Also, you might want to take a look again at the cases where you prematurely exit the function due to an error. I think you forget to clean up the context in those cases, and since it is a persistent one it will remain allocated.

Otherwise, interesting first stab. Would be interested to know when you get those arguments to work, I'm having no luck with it.

Ok, then it's not the same as Drizzle init. Or it might be, it's not like there is a lot of documentation or code comments :-) But no, in Drizzle the init function is called on startup, I'm sure.

Thanks for the hints, I'll take a look at it. I think the error conditions should be fine, all errors are properly caught and function runs to the end and the context and handle_scope are cleaned.

I didn't try multiple concurrent sessions yet, that's a good idea. But as I've understood, these v8 resources are allocated onto the stack, so in two different sessions the context and all other handles should be completely separate. Remains to be seen...

Hi. I tested this now. When just running from a single thread, error conditions do not make v8 unusable. When I step through it with GDB, I can see how at the return, control jumps back to the line with TryCatch and then to HandleScope. I think it's the job of the HandleScope object to close and delete all other objects that were created in this scope. Hence, I don't need to actually call close on the context object, it is closed anyway when I return from this function. (Symbian had a similar system called CleanupStack - essentially it's garbage collection manually implemented.)

On the other hand, using js_eval from 2 different drizzle clients gives me a "Maximum call stack exceeded". This happens always, even for '"hello world"' or any other JS code that runs fine in a single thread.

I've been googling this exception and it seems v8 really has some issues in a multi-threaded environment :-(
http://stackoverflow.com/questions/2802573/google-javascript-v8-multith…
http://groups.google.com/group/v8-users/browse_thread/thread/22910dc14f…
Will continue reading...

There seems to be very little information and much misinformation about this topic. Turns out reading through v8.h has all the answers though - imagine that!

This is what I learned:

- In the basic case (such as all examples in documentation and samples/*.cc) a v8 engine can only be accessed from a single thread - ever. Accessing it from a second thread gives an exception. This is by design.

- In a multi-threaded environment you must use v8::Locker at the beginning of any v8 related code, in a similar fashion as one uses HandleScope. There's also v8::Unlocker and methods for pre-emption (ie scheduling between multiple long running JavaScript objects.

- It is possible to explicitly create more than one instance of the v8 engine. This is done with the Isolate class. So each thread could have it's own v8::Isolate (and then no locking is required).

Presumably running many Isolates will have a lot of overhead too, but given the complexities of locking and that Drizzle is so much anti-locking anyway, this seems like a simple next step to take. An optimized solution would be to first use locking and once contention is observed (taking a lock takes too many milliseconds) creating more Isolates on demand.

I was disappointed that none of the mailing list messages and blogs I found by googling said anything about Isolate.

Turns out Isolate class doesn't exist in older versions of libv8, such as 2.5.9.9 that is installed in newest (K)Ubuntu 11.04. Once I have more of the "first stab" tasks done, I can create a proxy class that will do the v8 initialization (including binding the ObjectTemplate), and it can use Isolate depending on the v8 version that is available.

I've now pushed a commit that works with multiple threads - it's a single line fix really :-)

I also added the context->Exit() calls, I still don't know if they're needed or not.

Now pushed results of today's work:


drizzle> select js_eval("'hello ' + argv[0]", "world");
+----------------------------------------+
| js_eval("'hello ' + argv[0]", "world") |
+----------------------------------------+
| hello world |
+----------------------------------------+
1 row in set (0.005235 sec)
 
drizzle> select js_eval("'my arguments are: ' + argv", "foo", "bar", "baz", 1, 2, 3) as 'js_eval(...)';
+-------------------------------------+
| js_eval(...) |
+-------------------------------------+
| my arguments are: foo,bar,baz,1,2,3 |
+-------------------------------------+
1 row in set (0.004099 sec)
 
drizzle> select js_eval("Math.max(argv[2], argv[1], argv[0])", 1, 2, 3) as 'js_eval(...)';
+--------------+
| js_eval(...) |
+--------------+
| 3 |
+--------------+
1 row in set (0.004505 sec)
 
drizzle> select js_eval("var res = argv[0]; for (i in argv) { if(i>res) res=i;} res;", 1,2,3,4,3,2,1) as 'js_eval(...)';
+--------------+
| js_eval(...) |
+--------------+
| 6 |
+--------------+
1 row in set (0.004067 sec)
 
drizzle> select js_eval("var res = argv[0]; for (i in argv) { if(argv[i]>res) res=argv[i];} res;", 1,2,3,4) as 'js_eval(...)';
+--------------+
| js_eval(...) |
+--------------+
| 4 |
+--------------+
1 row in set (0.004614 sec)
 
drizzle> select js_eval("argv[0]+argv[1]", 1, 2);
+----------------------------------+
| js_eval("argv[0]+argv[1]", 1, 2) |
+----------------------------------+
| 12 |
+----------------------------------+
1 row in set (0.004715 sec)
 
drizzle> select js_eval("eval(argv[0])+eval(argv[1])", 1, 2);
+----------------------------------------------+
| js_eval("eval(argv[0])+eval(argv[1])", 1, 2) |
+----------------------------------------------+
| 3 |
+----------------------------------------------+
1 row in set (0.004323 sec)
 
drizzle> select js_eval("eval('var jsondoc = ' + argv[0]); jsondoc['firstname'];", "{'firstname': 'Henrik', 'lastname': 'Ingo'}") as 'js_eval(...)';
+--------------+
| js_eval(...) |
+--------------+
| Henrik |
+--------------+
1 row in set (0.004305 sec)
 
drizzle> select js_eval("var jsondoc = JSON.parse(argv[0]); jsondoc['firstname'];", "{'firstname': 'Henrik', 'lastname': 'Ingo'}") as 'js_eval(...)';
+--------------+
| js_eval(...) |
+--------------+
| NULL |
+--------------+
1 row in set (0.006932 sec)

Result 4 is unintuitive but apparently this is how for (... in ...) behaves in JavaScript (result 5 is what you want). Result 6 is not correct, it's just that I'm binding all parameters (in drizzled space) with val_str(). I need to add different line of code for different types. Result 7 demonstrates correct addition of integers, but will not be needed in a final implementation.

Anyway, that brings us back to the original problem, I can now pass a JSON object as a string, and eval() it and I have a JavaScript object I can manipulate. Mission accomplished! (Kind of, I actually want to return serialized JSON too...)

I don't know what's wrong with JSON.parse().

Nice! So do you still compile for each row? Or have you managed to compile just once before handling the rows?

The strange result you see with the for..in loop is just expected. When you run it over an array, the loop variable assumes the index number - not the value. I always avoid for..in over arrays, it just doesn't make sense. It's very useful for looping over the properties of an object though.

With regard to JSON.parse: your document just isn't valid json. The keys must be double quoted, not single quoted. This will be a lot easier if you use proper single quotes to delimit your SQL strings (since you won't have to escape the double quotes for javascript strings)

To return serialized JSON, you should simply call

JSON.stringify(object);

Final note: maybe you think it's verbose or something like that, but argv as name for the arguments is really awkward if you're used to javascript. As per the Ecmascript standard "arguments" is the standard name to refer to function arguments, it seems natural to re-use that for the imported script variables.

Thanks for the comments! Your knowledge here is very helpful.

And here are the results from Helsinki... (Netherlands, 12 points.)


drizzle> select js_eval("var jsondoc = JSON.parse(arguments[0]); jsondoc['firstname'];", '{"firstname" : "Henrik", "lastname" : "Ingo"}') as 'js_eval(...)';
+--------------+
| js_eval(...) |
+--------------+
| Henrik |
+--------------+
1 row in set (11.7978 sec)
 
drizzle> select js_eval("var jsondoc = JSON.parse(arguments[0]); jsondoc['name'];", '{ "name" : {"firstname" : "Henrik", "lastname" : "Ingo"} }') as 'js_eval(...)';
+-----------------+
| js_eval(...) |
+-----------------+
| [object Object] |
+-----------------+
1 row in set (0.004471 sec)
 
drizzle> select js_eval("var jsondoc = JSON.parse(arguments[0]); jsondoc['name']['firstname'];", '{ "name" : {"firstname" : "Henrik", "lastname" : "Ingo"} }') as 'js_eval(...)';
+--------------+
| js_eval(...) |
+--------------+
| Henrik |
+--------------+
1 row in set (0.024345 sec)
 
drizzle> select js_eval("var jsondoc = JSON.parse(arguments[0]); JSON.stringify(jsondoc['name']['firstname']);", '{ "name" : {"firstname" : "Henrik", "lastname" : "Ingo"} }') as 'js_eval(...)';
+--------------+
| js_eval(...) |
+--------------+
| "Henrik" |
+--------------+
1 row in set (0.008467 sec)
 
drizzle> select js_eval("var jsondoc = JSON.parse(arguments[0]); JSON.stringify(jsondoc['name']);", '{ "name" : {"firstname" : "Henrik", "lastname" : "Ingo"} }') as 'js_eval(...)';
+------------------------------------------+
| js_eval(...) |
+------------------------------------------+
| {"firstname":"Henrik","lastname":"Ingo"} |
+------------------------------------------+
1 row in set (0.006754 sec)
 
drizzle> select js_eval("var jsondoc = JSON.parse(jsondata); jsondoc['name']['firstname'];", '{ "name" : {"firstname" : "Henrik", "lastname" : "Ingo"} }' as jsondata ) as 'js_eval(...)';
+--------------+
| js_eval(...) |
+--------------+
| Henrik |
+--------------+
1 row in set (0.004395 sec)
 
drizzle> select js_eval("arguments[0] + arguments[1] + arguments[2];", 1, 2, 3) as 'js_eval(...)';
+--------------+
| js_eval(...) |
+--------------+
| 6 |
+--------------+
1 row in set (0.039269 sec)
 
drizzle> select js_eval("arguments[0] + arguments[1] + arguments[2];", 1, 2.0, 3.5) as 'js_eval(...)';
+--------------+
| js_eval(...) |
+--------------+
| 6.5 |
+--------------+
1 row in set (0.022482 sec)
 
drizzle> select js_eval("first + second + third;", 1 AS 'first', 2.0 AS 'second', 3.5 AS 'third') as 'js_eval(...)';
+--------------+
| js_eval(...) |
+--------------+
| 6.5 |
+--------------+
1 row in set (0.0045 sec)

So, implemented your suggestions, correctly pass also integer and decimal/number types, and support for named arguments, the name becoming a global variable name in javascript.

And to answer your standing question, no I'm not caching the compiled script nor are there any other optimizations. But functionality-wise it's getting quite good!

And here's an example of using this against a table. This is in fact pretty much the use case I had in mind. (Or at least a first step in a larger picture involving triggers and indexes...)


drizzle> create table t (k INT PRIMARY KEY auto_increment, v TEXT);Query OK, 0 rows affected (0.004813 sec)
 
drizzle> INSERT INTO t (v) VALUES ('{ "person" : { "firstname" : "Roland", "lastname" : "Bouman" } }');
Query OK, 1 row affected (0.002912 sec)
 
drizzle> INSERT INTO t (v) VALUES ('{ "person" : { "firstname" : "Henrik", "lastname" : "Ingo" } }');
Query OK, 1 row affected (0.001796 sec)
 
drizzle> INSERT INTO t (v) VALUES ('{ "person" : { "firstname" : "Brian", "lastname" : "Aker" } }');
Query OK, 1 row affected (0.002644 sec)
 
drizzle> SELECT js_eval('var person = JSON.parse(jsondoc); person["person"]["firstname"];', v as jsondoc) AS 'js_eval(...)' FROM t WHERE k=2;
+--------------+
| js_eval(...) |
+--------------+
| Henrik |
+--------------+
1 row in set (0.005119 sec)
 
drizzle> SELECT k, js_eval('var person = JSON.parse(jsondoc); person["person"]["firstname"];', v as jsondoc) AS 'firstname', js_eval('var person = JSON.parse(jsondoc); person["person"]["lastname"];', v as jsondoc) AS 'lastname' FROM t;
+---+-----------+----------+
| k | firstname | lastname |
+---+-----------+----------+
| 1 | Roland | Bouman |
| 2 | Henrik | Ingo |
| 3 | Brian | Aker |
+---+-----------+----------+
3 rows in set (0.017192 sec)

And finally, to pay homage to the title of this blog post:

- But Henrik, this is not stored procedure. You just implemented a function to run javascript snippets.
- Ok, so let's store the javascript code in the database:


drizzle> CREATE TABLE sp (name VARCHAR(255) PRIMARY KEY, script TEXT);
Query OK, 0 rows affected (0.003578 sec)
 
drizzle> INSERT INTO sp (name, script) VALUES ('get_person_property', 'var person = JSON.parse(jsondoc); person["person"][property];');
Query OK, 1 row affected (0.002341 sec)
 
drizzle> SELECT k, js_eval(sp.script, v as jsondoc, 'firstname' as 'property') AS 'firstname', js_eval(sp.script, v as jsondoc, 'lastname' as 'property') AS 'lastname' FROM t JOIN sp WHERE sp.name='get_person_property';
+---+-----------+----------+
| k | firstname | lastname |
+---+-----------+----------+
| 1 | Roland | Bouman |
| 2 | Henrik | Ingo |
| 3 | Brian | Aker |
+---+-----------+----------+
3 rows in set (0.028613 sec)

Now it's a *stored* script :-)

About the bookAbout this siteAcademicAccordAmazonAppleBeginnersBooksBuildBotBusiness modelsbzrCassandraCloudcloud computingclsCommunitycommunityleadershipsummitConsistencycoodiaryCopyrightCreative CommonscssDatabasesdataminingDatastaxDevOpsDistributed ConsensusDrizzleDrupalEconomyelectronEthicsEurovisionFacebookFrosconFunnyGaleraGISgithubGnomeGovernanceHandlerSocketHigh AvailabilityimpressionistimpressjsInkscapeInternetJavaScriptjsonKDEKubuntuLicensingLinuxMaidanMaker cultureMariaDBmarkdownMEAN stackMepSQLMicrosoftMobileMongoDBMontyProgramMusicMySQLMySQL ClusterNerdsNodeNoSQLNyrkiöodbaOpen ContentOpen SourceOpenSQLCampOracleOSConPAMPParkinsonPatentsPerconaperformancePersonalPhilosophyPHPPiratesPlanetDrupalPoliticsPostgreSQLPresalespresentationsPress releasesProgrammingRed HatReplicationSeveralninesSillySkySQLSolonStartupsSunSybaseSymbiansysbenchtalksTechnicalTechnologyThe making ofTransactionsTungstenTwitterUbuntuvolcanoWeb2.0WikipediaWork from HomexmlYouTube

Search

Recent blog posts

Recent comments