# thinking-together
Paul Butler
Is anyone familiar with past work on typed dataframes? For example, being able to statically check a series of data pipeline operations and ensure that columns exist and are of the right type without executing the code?
👍 1
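(As a rough illustration of what the question is asking for, here is a minimal, framework-free sketch in Scala: each stage's schema is modeled as an ordinary case class, so the compiler can verify that the columns a stage reads exist and have the expected types without running the pipeline. All names here, RawEvent, CleanEvent and so on, are invented for illustration; no dataframe library is involved.)

```scala
// Sketch only: each stage's schema is an ordinary case class, so reading a
// column that does not exist, or treating it as the wrong type, is a compile
// error rather than a runtime failure.
final case class RawEvent(userId: String, amountCents: Long, country: String)
final case class CleanEvent(userId: String, amountDollars: Double)

object TypedPipelineSketch {
  // Each pipeline stage is a plain function between row types.
  def clean(rows: List[RawEvent]): List[CleanEvent] =
    rows.map(r => CleanEvent(r.userId, r.amountCents / 100.0))

  def totalByUser(rows: List[CleanEvent]): Map[String, Double] =
    rows.groupBy(_.userId).view.mapValues(_.map(_.amountDollars).sum).toMap

  def main(args: Array[String]): Unit = {
    val raw = List(RawEvent("u1", 1250L, "CA"), RawEvent("u2", 300L, "US"))
    // Referring to a nonexistent column inside clean() (e.g. r.missingColumn)
    // would fail to compile, which is the "check without executing" property
    // in miniature.
    println(totalByUser(clean(raw)))
  }
}
```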
Breck Yunits
I'm very curious about this area. The language I'm working on, called Maia, passes typed dataframes between nodes; the column types are determined at runtime. A concept I've become enamored with recently is the "Data Usability Score" (https://www.kaggle.com/product-feedback/93922). I think that if it becomes standard for datasets to be accompanied by type information, we have a clear path to statically analyzing the flow of those types through a data pipeline.
i
Curious too! If you could map your dataframes to tables in a SQL engine, do you think a valid query plan would imply the code is type-correct?
a
I came across Spark's new Dataset approach, which is relevant: https://jaceklaskowski.gitbooks.io/mastering-spark-sql/spark-sql-Dataset.html
Paul Butler
That's useful, thanks! It exposed me to a bunch of literature I hadn't seen because I didn't know the terminology to search for.
a
You're welcome, Paul 🙂
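(For reference, the typed Dataset approach from the link above can be sketched roughly like this, assuming a local Spark setup and an invented Trade schema. Because the schema is an ordinary case class, column access and types in map-style operations are checked by the Scala compiler; columns referenced through untyped expressions or SQL strings are still only checked when the plan is analyzed.)

```scala
import org.apache.spark.sql.SparkSession

// The schema is an ordinary case class, so column names and types are
// visible to the Scala compiler.
final case class Trade(symbol: String, price: Double, quantity: Long)

object TypedDatasetSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("typed-dataset-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    val trades = Seq(
      Trade("AAPL", 190.5, 10L),
      Trade("MSFT", 410.0, 3L)
    ).toDS() // Dataset[Trade], not an untyped DataFrame

    // Operations written against the case class are checked at compile time:
    // referring to a missing column, or treating price as a String, is a
    // compile error rather than a runtime failure.
    val notional = trades.map(t => t.price * t.quantity)
    notional.show()

    spark.stop()
  }
}
```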
Breck Yunits
@Paul Butler I just came across this: http://tomasp.net/academic/papers/inforich/inforich-msr.pdf. There seems to be a decent amount of work in this area in F#.
Paul Butler
Thanks @Breck Yunits, I'll check it out.