# thinking-together
Paul Butler
Is anyone familiar with past work on typed dataframes? For example, being able to statically check a series of data pipeline operations and ensure that columns exist and are of the right type without executing the code?
👍 1
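(As a rough illustration of what the question is asking for, here is a minimal, framework-free sketch in Scala: each stage's schema is modeled as an ordinary case class, so the compiler can verify that the columns a stage reads exist and have the expected types without running the pipeline. All names here, RawEvent, CleanEvent and so on, are invented for illustration; no dataframe library is involved.)

```scala
// Sketch only: each stage's schema is an ordinary case class, so reading a
// column that does not exist, or treating it as the wrong type, is a compile
// error rather than a runtime failure.
final case class RawEvent(userId: String, amountCents: Long, country: String)
final case class CleanEvent(userId: String, amountDollars: Double)

object TypedPipelineSketch {
  // Each pipeline stage is a plain function between row types.
  def clean(rows: List[RawEvent]): List[CleanEvent] =
    rows.map(r => CleanEvent(r.userId, r.amountCents / 100.0))

  def totalByUser(rows: List[CleanEvent]): Map[String, Double] =
    rows.groupBy(_.userId).view.mapValues(_.map(_.amountDollars).sum).toMap

  def main(args: Array[String]): Unit = {
    val raw = List(RawEvent("u1", 1250L, "CA"), RawEvent("u2", 300L, "US"))
    // Referring to a nonexistent column inside clean() (e.g. r.missingColumn)
    // would fail to compile, which is the "check without executing" property
    // in miniature.
    println(totalByUser(clean(raw)))
  }
}
```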
Breck Yunits
I'm very curious about this area. The language I'm working on, called Maia, passes typed dataframes between nodes; the column types are determined at runtime. A concept I've become enamored with recently is the "Data Usability Score" (https://www.kaggle.com/product-feedback/93922). I think that if it becomes standard for datasets to be accompanied by type information, we have a clear path to statically analyzing the flow of those types through a data pipeline.
i
Curious too! If you could map your dataframes to tables in a SQL engine, do you think a valid query plan would imply the code is type-correct?
a
I came across Spark's new Dataset approach, which is relevant: https://jaceklaskowski.gitbooks.io/mastering-spark-sql/spark-sql-Dataset.html
Paul Butler
That's useful, thanks! It exposed me to a bunch of literature I hadn't seen because I didn't know the terminology to search for.
a
You're welcome, Paul 🙂
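(For reference, the typed Dataset approach from the link above can be sketched roughly like this, assuming a local Spark setup and an invented Trade schema. Because the schema is an ordinary case class, column access and types in map-style operations are checked by the Scala compiler; columns referenced through untyped expressions or SQL strings are still only checked when the plan is analyzed.)

```scala
import org.apache.spark.sql.SparkSession

// The schema is an ordinary case class, so column names and types are
// visible to the Scala compiler.
final case class Trade(symbol: String, price: Double, quantity: Long)

object TypedDatasetSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("typed-dataset-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    val trades = Seq(
      Trade("AAPL", 190.5, 10L),
      Trade("MSFT", 410.0, 3L)
    ).toDS() // Dataset[Trade], not an untyped DataFrame

    // Operations written against the case class are checked at compile time:
    // referring to a missing column, or treating price as a String, is a
    // compile error rather than a runtime failure.
    val notional = trades.map(t => t.price * t.quantity)
    notional.show()

    spark.stop()
  }
}
```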
Breck Yunits
@Paul Butler I just came across this: http://tomasp.net/academic/papers/inforich/inforich-msr.pdf. There seems to be a decent amount of work in this area in F#.
Paul Butler
Thanks @Breck Yunits, I'll check it out.