Polars is an OLAP query engine that focusses on the DataFrame use case.
We have done innovative research on the prediction of changes within 3D-shapes over time, a subject with limited prior exploration due to the scarcity of publicly available 3D data with temporal dimensions. This presentation aims to address the challenges of transitioning from 2D models to a 3D context. To facilitate audience comprehension, we will illustrate our points using practical examples.
Natural AI aspires to create intelligent agents that draw inspiration from the human brain. This paradigm merges cognitive science with the Free Energy Principle to develop adaptable and human-like AI systems. These low-power agents excel in understanding context, learning from experience, and intuitive interactions, promising a future of energy-efficient, intelligent problem-solving.
Jupyter Notebooks revolutionized Data Science - Reactive Notebooks build on top of this giant. Reactivity solves a couple of problems:
-
Improved safety: It is impossible to get into an invalid state (where you would hit "rerun all" in Jupyter).
-
Simplified interactivity: Changing a value also updates those cells which depend on it.
-
Simplified deployment: The notebook itself is already a dashboard (with fully customizable html frontend).
In addition, the new notebook allows for self-updates, i.e. listening for some remote events and rerunning notebook cells automatically with the new updated value. This upgrades your notebook to a real-time analytics process.
In this talk I will present the new notebook and go through all its powerful features.
The best way to learn how to secure a system is to know how it breaks! In this talk we will take on the role of the Red Team, try to break Large Language Models using prompt injection, and learn about the kind of havoc we can cause. Learn how to safeguard your LLM applications by understanding their weaknesses.
Whether through Cython, Rcpp, or other interfaces, many of the common packages in the Python and R ecosystems contain large amounts of C code for the package internals. The reason for this is performance: C is just a faster language than Python and R. However, Julia is also as fast as C while having higher level semantics similar to Python and R. Could one build Python and R packages using Julia? In this talk we will discuss how diffeqpy/diffeqr were built as packages to allow Python/R users to use Julia's DifferentialEquations.jl in a simple way. We will discuss topics from automating GPU compatibility, packaging precompiled binaries of Julia code, and automating interface wrappers which enable doing this efficiently. The audience will leave with a clear idea of how to use Julia as an alternative to C for package internals.
Healtphlus.ai is developping PERISCOPE© as a decision support system for surgical departments. With PERICOPE we train a machine learning model on data in the electronic health records. In this talk we will show you our mission, the design choices we have made and the resulting architecture as well as shed some light on some of the challenges we face.
This talk describes the development and deployment of a reinforcement model to reduce food waste in a supermarket chain. Markdowns are dynamically increased through the day on short shelf-life products following a particular policy. The goal is to minimize food waste without significantly increasing the cost of the markdowns. The sequence of choices of markdown levels is modelled as a Markov Decision Process and offline Q-learning is used on historic data to learn a policy. The talk introduces the context of the problem, how a reinforcement model was applied, and the challenges faced with offline and online evaluation.
In this talk, we will delve into the various techniques and tools for ML model optimization during both training and inference stages. We will trace the journey of an ML model from its beginnings in a Jupyter notebook to its final deployment with a high-performance inference runtime. Along the way, valuable insights will be shared that you can seamlessly incorporate into your own workflow.
Tasks related to computer vision don’t have an easy learning curve - there are many nuances, it’s in some cases subjective and the data is not always there and ready to be used.
In this talk, you will see an example of how to use your own domain-specific data to outperform pre-built market solutions. We will explore some approaches for generating ground truth data for a background removal task, and how to use it for fine-tuning open source models with limited computational power.
Onboard the LLM hype-train! 🚂 Are you curious about what LLM's (Large Language Models) can do for you? One of their use cases is to enrich datasets, by converting text into structured data. This can have huge benefits. Useful facts that were previously hidden inside a large piece of text can now be unveiled, allowing you to more accurately filter and query your data.
Scale your Data Science with Reactive Notebooks.
From Monitoring and Dashboards, to Big Data, Real-Time and High Performance Computing — everything is cleanly controlled using a simple interactive notebook.
Jolin.io Cloud is built on top of Kubernetes and can be deployed into your own infrastructure. Or use our hosted service at cloud.jolin.io.
Do you find it takes too long to deploy your ML models responsibly in production? Say goodbye to complex integrations of containerized training and inference, preprocessing pipelines, model registries, monitoring frameworks, feature stores, and prediction stores. As demonstrated in this talk, all of these concepts can be easily implemented within dbt using only Python and SQL, the two languages data scientists truly love to write in.
Many Python frameworks are suitable for creating basic dashboards, but struggle with more complex ones. Though many teams default to splitting into separate frontend and backend divisions when faced with increasing dashboard complexity, this approach introduces its own set of challenges, like reduced personnel interchangeability and cumbersome refactoring due to REST API changes.
Solara, our new web framework, addresses these challenges. We use the foundational principles of ReactJS, yet maintain the ease of writing only Python. Solara has a declarative API, designed for dynamic and complex UI's, yet easy to write. Reactive variables power our state management which automatically trigger rerenders. Our component-centric architecture stimulates code reusability, and hot reloading promotes efficient workflows. Together with our rich set of UI and data-focused components, Solara spans the entire spectrum from rapid prototyping to robust, complex dashboards.
Without modification your application and components will work in Jupyter, Voilà and on our standalone server for high scalability. Our server can run along existing FastAPI, Starlette, Flask and even Django servers to integrate with existing web services. We prioritize code quality and developer friendliness by including strong typing and first class support for unit and integration testing.
Working with time series data can be special - so let's talk about things we have learned deploying models dealing with this type of data at scale.
Want a dataset for ML? Internet says you should use ... active learning!
It's not a bad idea. When you're creating your own training data you typically want to focus on examples that can teach a machine learning algorithm the most. That's why active learning techniques typically fetch examples with the lowest confidence scores to annotate first. The thinking is that low confidence regions represent the areas where the algorithm might learn more than regions where the algorithm seems sure of itself.
Again, it's not a bad idea. But it's an approach that can be improved by rethinking some parts. Maybe it would be better for the human to understand the mistakes that the model makes and uses this information to actively teach the model on how to improve.
This talk is all about exploring this idea.
In this talk, we will explore the different types of quantization techniques and discuss how they can be applied to deep learning models - everything in a Jupyter Notebook, which allows you to try it at home.
Large language models are all the rage, but building scalable applications with them can be costly and difficult. In this talk, we give you a glimpse at the emerging ecosystem of LLM apps beyond just ChatGPT. In particular, we focus on OSS alternatives, like the Llama model family, and show you how to use them in your own projects. We discuss how to leverage services like Anyscale Endpoints in Python to get LLM apps up and running quickly. To demonstrate this, we showcase two application we built ourselves, namely a GitHub bot that helps you with your pull requests, and an "Ask AI" chatbot that we integrated into our project documentation.
Royal FloraHolland is the biggest flower and plant auction in the world. We sell products on behalf of the growers. Every day we receive new supply and we want to ensure quality standards. In our talk we will present a project that aimed to revolutionize the quality control process within our organization. With a substantial inflow of supply, far surpassing the quality control team's capacity, the traditional approach of random checking proved impractical. Clearly, a smarter solution was needed. We will showcase how we developed and implemented a multi-model system, which utilizes a dynamic risk threshold for optimal selection of supply items. This is an informative, yet engaging talk with practical examples of techniques and infrastructure used.
Great news! You have just joined a new cool data science project. The client wants you to deliver a maintainable, reproducible and production-ready data pipeline. Where do you best start?
In this talk we demonstrate how Kedro, an open-source Python framework was created to do just that: build production-ready data science code from the get-go. We will demonstrate with a concrete example how we leveraged Kedro to built an analytics solution for energy cost optimization at a large Dutch agricultural player.
Probabilistic energy price forecasts can help balance the electrical grid in the face of volatile renewable energy sources, especially when the forecasts are well-calibrated. Conformal prediction can calibrate probabilistic forecasts, producing a distribution with valid coverage in finite samples. This presentation will delve deeper into probabilistic time series forecasting and how to calibrate your forecast.
This talk will introduce PyScript – a framework that enables rich Python applications in the browser. PyScript aims to give users a first-class programming language that has consistent styling rules, is more expressive, and is easier to learn than JavaScript. This talk will give demos and show how to write and host PyScript applications.
SHAP (SHapley Additive exPlanations) is a model agnostic AI explainability framework that can be used for global and local explainability. Starting from scratch, the theory of SHAP values will be explained and the usage of the Python framework will be illustrated on a classification example in the transaction monitoring domain. After the presentation, you will have learned how to use SHAP to investigate feature importance, feature sensitivity and how to explain individual prediction in a human readable output.
pixi
is changing the field for software environment management. pixi
bridges the gap between conda
and pip
with a single binary tool that automatically generates lock-files and can be configured with a single configuration file. While we build on the conda ecosystem, we’re also integrating PyPi (pip
) into pixi
. This way you’ll only need one tool across all platforms.
"You should never, ever deal with time zones if you can help it" Tom Scott
Instead, you should let your software deal with time zones for you.
Polars is a Dataframe library with full support for time zones - come and learn how to leverage it to its full potential!
Setting up an ML application by creating some ad-hoc dataset and training a model with an unversioned Python script just does not cut it anymore. The MLOps process helps to structure the development and life cycle of any ML application to make sure that data is traceable, and performance is reproducible. In this talk, an automated visual inspection application is used as an example to show a data definition and labeling platform using Django, and an image ingress and streaming service using gRPC. This system can serve as an example for any future ML application as the same principles can be applied. A demonstration of this system is also shown at the end of the session. Having some basic knowledge of Django and gRPC is preferred, but it is not required.