Wednesday, August 8 • 4:20pm - 5:00pm
Scalable Data Science with JuliaDB and OnlineStats

Abstract
JuliaDB integrates with OnlineStats to provide scalable single-pass algorithms, which can also run in parallel, for statistics and modeling on big data. This integration allows you to turn small-scale analyses into out-of-core computations on huge datasets without changing your code.
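To make the "without changing your code" idea concrete, here is a minimal sketch in Python rather than Julia. It does not use the JuliaDB or OnlineStats APIs; the names `running_mean` and `read_column` are hypothetical and chosen only for illustration. It shows a single-pass function that accepts any iterable, so the same call works on a small in-memory list and on a lazily read stream.

```python
import io
from typing import Iterable, Iterator


def running_mean(values: Iterable[float]) -> float:
    """Single-pass mean that stores only a count and the current mean."""
    n = 0
    mean = 0.0
    for x in values:
        n += 1
        mean += (x - mean) / n  # incremental update, no list of values is kept
    return mean


def read_column(lines: Iterable[str]) -> Iterator[float]:
    """Lazily yield one number per non-empty line (hypothetical helper)."""
    for line in lines:
        line = line.strip()
        if line:
            yield float(line)


# Small-scale analysis on data already in memory.
small = [1.0, 2.0, 3.0, 4.0]
in_memory_result = running_mean(small)

# Same function on a stream; io.StringIO stands in for a large file on disk.
stream = io.StringIO("1.0\n2.0\n3.0\n4.0\n")
streamed_result = running_mean(read_column(stream))
```

Because `running_mean` never materializes its input, swapping the list for a file-backed generator changes the data source but not the analysis code.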
Description
JuliaDB is a distributed database for high-performance analytics that allows you to load large multi-file datasets across multiple processes, index the data for fast queries, and save tables to disk for quick reloading. JuliaDB is written entirely in Julia, so user-defined functions are compiled and you can efficiently store any data type.
OnlineStats is a Julia package that provides online algorithms for statistics, machine learning, and big data visualization. Online algorithms update estimators one observation at a time in a single pass, so your data need not fit in memory. JuliaDB interfaces with OnlineStats to provide a scalable analytical framework that does the heavy lifting for you. The same operations used on toy datasets can therefore be executed efficiently on a cluster without changing your code.
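As a rough illustration of what an online, mergeable estimator looks like, the following Python sketch implements Welford's single-pass variance update together with the standard pairwise merge of partial results. It is not the OnlineStats implementation or API; the class name `RunningVariance` and its methods are assumptions made for this example. The merge step is what would let separate workers each summarize their own chunk and combine the summaries afterwards.

```python
from dataclasses import dataclass


@dataclass
class RunningVariance:
    """Hypothetical online estimator of mean and sample variance."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        """Welford update: absorb one observation in O(1) memory."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def merge(self, other: "RunningVariance") -> None:
        """Combine with a summary computed independently on another chunk."""
        if other.n == 0:
            return
        total = self.n + other.n
        delta = other.mean - self.mean
        self.m2 += other.m2 + delta * delta * self.n * other.n / total
        self.mean += delta * other.n / total
        self.n = total

    @property
    def variance(self) -> float:
        """Sample variance; NaN until at least two observations are seen."""
        return self.m2 / (self.n - 1) if self.n > 1 else float("nan")


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# One pass over everything.
single = RunningVariance()
for x in data:
    single.update(x)

# Two "workers", each seeing half of the data, merged at the end.
left, right = RunningVariance(), RunningVariance()
for x in data[:4]:
    left.update(x)
for x in data[4:]:
    right.update(x)
left.merge(right)
```

Each estimator holds three numbers regardless of how many observations it has seen, which is why this style of algorithm suits data that does not fit in memory.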
This combination provides a powerful workflow for dealing with both large and small datasets. This talk will demonstrate how you can take advantage of these packages to implement scalable analytics over your own datasets.

Speakers
Josh Day

Julia Computing
I am a statistician/data scientist who enjoys working on difficult optimization and machine learning problems. I recently received my PhD in statistics from NC State where I researched on-line algorithms for streaming and big data. Before coming to NC State, I earned a B.S. in Mathematics/Statistics...


Wednesday August 8, 2018 4:20pm - 5:00pm BST
Darwin LT B40