# thinking-together
r
Interesting analysis of "Computational notebooks, such as Jupyter Notebooks, Azure Notebooks, and Databricks". These observations seem spot on to me, any theories about why these problems haven't been fixed yet? Or are they not problems at all? http://web.eecs.utk.edu/~azh/blog/notebookpainpoints.html
p
I think a lot of this is a case of path-dependence in the architecture. The notebooks we have now (IPython/Jupyter and descendants) have an architecture where the client-side notebook is basically a glorified interface to a REPL that runs on a backend with very little state shared between them. The simplicity makes it very easy to write new kernels or new frontends (nteract, VSCode, etc.) but it also means that those architectural choices persist. Some side effects of this architecture have just become accepted as how notebooks "work": tab-complete doesn't work when code is running; if you lose your network connection you lose output of running code; etc.
👍 3
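A toy model of the "frontend talks to a single REPL loop" design described above may make the tab-complete side effect concrete. This is only a sketch: `toy_kernel`, the request tuples, and the event log are hypothetical names made up for illustration, not the Jupyter messaging protocol. The completion request is queued while the slow cell runs and is only answered once that cell has finished.

```python
import queue
import threading


def toy_kernel(requests, log):
    """Serve requests strictly one at a time, like a single-threaded REPL loop."""
    namespace = {}
    while True:
        kind, payload = requests.get()
        if kind == "shutdown":
            break
        if kind == "execute":
            log.append("execute:start")
            exec(payload, namespace)  # blocks the loop until the cell finishes
            log.append("execute:done")
        elif kind == "complete":
            # Completion can only see names once the loop gets around to it.
            matches = sorted(n for n in namespace if n.startswith(payload))
            log.append(("complete", matches))


def run_demo():
    """Queue a slow cell, then a completion request, and return the event log."""
    requests = queue.Queue()
    log = []
    worker = threading.Thread(target=toy_kernel, args=(requests, log))
    worker.start()
    requests.put(("execute", "import time\ntime.sleep(0.1)\nresult = 42"))
    requests.put(("complete", "res"))  # sent while the cell is still running
    requests.put(("shutdown", None))
    worker.join()
    return log


if __name__ == "__main__":
    for event in run_demo():
        print(event)
```

Because there is one loop and one queue, the completion reply can never overtake the running cell, however quickly the frontend sends it.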
I also think part of the problem is how languages like Python, R, and Julia have become the lingua franca of data science, but all are too dynamic to support static type checking or static analysis of the dataflow graph, which would make a whole lot more possible.
☝🏼 1
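To illustrate the point about static analysis of the dataflow graph, here is a minimal sketch using Python's standard `ast` module. The helpers `cell_names` and `dependencies` are hypothetical and deliberately simplistic (they only look at plain variable names, not imports or function definitions). The sketch shows that a purely syntactic pass recovers dependencies between simple cells, but loses track as soon as a cell defines a variable dynamically.

```python
import ast


def cell_names(source):
    """Return (defined, used) variable names that appear syntactically in a cell."""
    defined, used = set(), set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                defined.add(node.id)
            else:
                used.add(node.id)
    return defined, used


def dependencies(cells):
    """Map each cell index to the earlier cells whose definitions it reads."""
    infos = [cell_names(source) for source in cells]
    deps = {}
    for i, (_, used) in enumerate(infos):
        deps[i] = [j for j in range(i) if infos[j][0] & used]
    return deps


# Plain assignments: the dependency of cell 2 on cells 0 and 1 is visible.
static_cells = ["price = 10", "qty = 3", "total = price * qty"]

# Dynamic definition: cell 0 still creates `price` at runtime,
# but no assignment to the name `price` appears in the syntax tree.
dynamic_cells = ["globals()['price'] = 10", "total = price * 2"]

if __name__ == "__main__":
    print(dependencies(static_cells))
    print(dependencies(dynamic_cells))
```

In the second case the analysis reports no dependency at all, even though running the cells out of order would fail.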
s
Notebooks have always felt like a step backwards in a lot of ways to me, especially considering how the "great" recent innovation in structuring information is hypertext, and nonlinear presentation and exploration of information, while notebooks impose a linear narrative on you. I think part of it is the lack of time-reversibility in the languages in question: there's no easy "undo button". In a lot of ways, a Smalltalk-style language with versioned image-based code distribution focused on working with data could solve a lot of these issues.
👍 3
Additionally, the lack of decent crossplatform gui toolkits has forced any serious project involving graphics onto the web, and that limits the extent of how well the ux can be engineered and integrated, which is unfortunate.
👍 1
@Paul Butler You can do quite a lot with a "simple" REPL and a client-server model. slime+swank for Common Lisp, for example, solved most of the issues you mentioned a long time ago, and provides a sophisticated integrated debugger and more.
a
I really like writing .Rmarkdown documents in the RStudio IDE, which avoids some of these issues. https://yihui.org/en/2018/09/notebook-war/
p
@S.M Mukarram Nainar thanks, I wasn't familiar with slime+swank. Is there a good demo of the functionality you're referring to? I've tried but haven't found much.
s
Depends, how familiar are you with common lisp? @Paul Butler
p
I've written some clojure and scheme at various points in my career, but am pretty rusty these days
s
Okay, honestly I would just jump in and use it. Portacle, https://portacle.github.io/ , is probably the easiest way to get a fully set-up environment, and Practical Common Lisp, http://www.gigamonkeys.com/book/ , is a good reference for Common Lisp essentials. As for debugging, https://malisper.me/debugging-lisp-part-1-recompilation/ is a good intro. You can inspect the inner state of slime+swank using its own debugging tools, which is kinda fun, and using (essentially) standard Common Lisp APIs like the Bordeaux Threads API, since slime+swank is written entirely in Common Lisp on the server side and Emacs Lisp on the client side. Swank has its own REPL thread and compiles code in separate threads so interactions aren't blocked, which is quite nice, and if slime disconnects from swank, all your stuff is still there when you reconnect. I think that covers the issues you raised?
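The "separate threads so interactions aren't blocked" idea can be sketched in Python as an analogy. This is not how Swank is implemented; `run_demo` and the `release` event are hypothetical, and the event simply stands in for a slow computation so that the ordering is deterministic. The slow evaluation runs on a worker thread, so the thread that talks to the client can answer a quick request before the evaluation finishes.

```python
import threading


def run_demo():
    """Run a slow evaluation on a worker thread while the main thread stays responsive."""
    log = []
    release = threading.Event()

    def long_evaluation():
        release.wait()  # stands in for a slow computation
        log.append("long evaluation done")

    worker = threading.Thread(target=long_evaluation)
    worker.start()

    # The thread handling client requests is not blocked by the evaluation,
    # so it can respond to something else in the meantime.
    log.append("quick request answered")

    release.set()  # let the slow evaluation finish
    worker.join()
    return log


if __name__ == "__main__":
    for event in run_demo():
        print(event)
```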
If you have any questions feel free to ask
That thought I had earlier is interesting, I never really considered that computational notebooks are largely reinventing image-based programming but without a lot of the features.
❤️ 1
c
Joel Grus gave a talk 2 years back about how he felt about Jupyter (slides and video attached): https://twitter.com/joelgrus/status/1033035196428378113?lang=en Based on my experience working with newer analysts and in workshops, the "statefulness"/dependence on execution order/"permissiveness of sloppiness" were the most problematic issues. As far as concerns around IDE-like tooling go, VSCode has native support for opening Jupyter notebooks and Atom has the "Hydrogen" extension, which help a bit.
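The dependence on execution order is easy to reproduce outside any notebook. In this minimal sketch the three "cells" and the `run_cells` helper are hypothetical; each cell is executed against one shared namespace, which is roughly what a kernel does, and the final result depends on which cells were run and how many times.

```python
def run_cells(cells, order):
    """Execute cells in the given order against one shared namespace."""
    namespace = {}
    for index in order:
        exec(cells[index], namespace)
    return namespace


cells = [
    "values = [1, 2, 3]",
    "values = [v * 2 for v in values]",  # mutates shared state in place
    "total = sum(values)",
]

if __name__ == "__main__":
    print(run_cells(cells, [0, 1, 2])["total"])     # top-to-bottom run
    print(run_cells(cells, [0, 1, 1, 2])["total"])  # cell 1 re-run once
    print(run_cells(cells, [0, 2])["total"])        # cell 1 skipped
```

The saved notebook looks identical in all three cases; only the hidden kernel state differs.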
k
@S.M Mukarram Nainar The main limitation of swank+slime is the main limitation of Emacs: no graphics. Data science relies heavily on visualization. Perhaps the main attraction of notebooks is the possibility to store code and visualization in a single document. But yes, the execution model is extremely limiting. For a demo of a notebook based on Smalltalk (Pharo) with GToolkit, see http://blog.khinsen.net/posts/2019/05/09/the-computational-notebook-of-the-future-part-2/. That’s a much nicer environment, but the price to pay is a serious lack of scientific libraries.
s
@Konrad Hinsen Yeah, I agree with that, emacs is great—for a text-mode ui with no security considerations. Thanks for the link, I'll give it a spin later. It's been a couple years since I've played with pharo.
Though I should point out, you can largely replicate computational notebooks in emacs with org-mode, since emacs can in fact show images. What's missing is interactivity with the visualizations. Though there's not much point as in the end you're left with the same semantic issues as other notebooks, like out-of-order execution, and forced linear progression.

https://www.youtube.com/watch?v=CGnt_PWoM5Y

k
Org-Mode is indeed a nice notebook tool, if a notebook tool is what you want. It's even better in permitting Knuth-style literate programming as well. Unfortunately, if you combine all those features the result is brittle and hard to understand for a reader. And for the author, the big limitation of Emacs is synchronous execution of code snippets. If you run a one-minute computation, Emacs freezes for one minute.
s
Async execution has been possible for a while, but yeah, it's quite bad w.r.t accessibility and scaling up with code complexity.
Thanks for your pointer to gtoolkit btw, very interesting. I had no idea the Pharo community had gotten this far.