Thursday, August 9 • 3:45pm - 3:57pm
New in DataStreams.jl: Type flexibility, querying, and parallelism, oh my!
Abstract
What’s new in the data framework package powering some of the most popular data packages in Julia? Come learn about advancements in flexibly typed schemas, querying functionality directly integrated with IO, and automatic data transfer parallelism.
Description
The DataStreams.jl framework is behind a number of key packages in the Julia data ecosystem. At its core, it defines the “Source” and “Sink” interfaces that various formats can implement to automatically integrate with other formats that also implement the interfaces. This solves the one-to-many interop problem that always plagues data formats (“what? it only takes CSV files??”). With DataStreams, it’s quick and easy to implement the interface and automatically hook into the rest of the Julia data ecosystem.
So what’s new and noteworthy in DataStreams?
  • Flexibly typed schemas: a long-standing issue with any sort of data transfer is how to align expected types between source and sink; Base Julia itself has pioneered a flexible yet performant solution in its implementation of map over collections. This same approach has been applied to DataStreams to allow dynamic, type-inference-independent transfer from sources to sinks.
  • IO-integrated querying functionality: how many times have you thought, “sheesh, I wish there was a way to only parse a few columns, filter out certain values, and apply a transformation to this CSV file all at the same time!” Ok, maybe not those words exactly, but with DataStreams, now you can! A fully integrated query planner can now take any number of transformations, apply them at the IO level so that no more data is transferred than absolutely necessary, and fuse them all together into tightly compiled Julia code.
  • Data transfer parallelism: Sinks can now signal to Sources that they support parallel streaming; this can lead to massive improvements in data throughput and make full use of a system’s resources.
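
To make the first point concrete, here is a minimal sketch in plain Julia of the widening idea: start with a narrowly typed column and widen its element type only when a value arrives that does not fit. This is an illustration of the general technique, not DataStreams.jl code; the names push_widen and collect_column are hypothetical, and the sketch assumes the input is non-empty.

```julia
# Hypothetical helper (not part of DataStreams.jl): append `val` to `col`,
# widening the column's element type only if `val` does not already fit.
function push_widen(col::Vector, val)
    T, S = eltype(col), typeof(val)
    if S <: T
        return push!(col, val)          # fast path: the type already fits
    end
    # Slow path: allocate a wider column and copy what was streamed so far.
    wider = Vector{Union{T, S}}(undef, length(col))
    copyto!(wider, col)
    return push!(wider, val)
end

# Hypothetical driver: build a column from values whose types are not known
# up front. Assumes `values` is non-empty.
function collect_column(values)
    col = typeof(first(values))[]       # start from the first value's type
    for v in values
        col = push_widen(col, v)
    end
    return col
end

col = collect_column(Any[1, 2, missing, 4])
@assert eltype(col) == Union{Missing, Int}
@assert length(col) == 4
```

The column stays a concretely typed Vector{Int} until the missing value shows up, so the common case keeps a tight loop and the cost of widening is paid only when the data actually requires it.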

Speakers
Jacob Quinn

Senior Engineer, Domo
Attended Carnegie Mellon for a master's degree in data science; active Julia contributor for 6 years now.


Thursday August 9, 2018 3:45pm - 3:57pm BST
LT 106