KKMIN

CSV Analyzer with Python and Streamlit

Introduction

Python is a great language for data analysis: libraries like pandas and NumPy make it easy to work with data, and visualization libraries like matplotlib and seaborn create beautiful graphs and charts.

That's great, but sometimes you just want quick and easy visuals without first wrangling and cleaning the data. In such cases, I found myself rewriting the same boilerplate code again and again just to get a quick glance at my data. It can be a pain, especially if you aren't a data scientist and have to look up the same old functions every time.

CSV Analyzer

So I made CSV Analyzer, a simple tool that lets you upload a CSV file and plots simple charts using the file's headers. I used Streamlit, a frontend framework for Python that seems to be aimed specifically at data-oriented applications. It's a great tool, but it can be a little confusing for those without frontend experience.
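
To make "using the headers" a bit more concrete, here is a minimal sketch of how a tool like this might decide which columns can be plotted. It is not the actual CSV Analyzer code: the function name `numeric_columns` and the sample data are made up for illustration, and it uses only the standard library rather than pandas.

```python
import csv
import io


def numeric_columns(csv_text: str) -> dict[str, list[float]]:
    """Return {header: values} for every column whose cells all parse as numbers."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns: dict[str, list[float]] = {name: [] for name in reader.fieldnames or []}
    rejected: set[str] = set()
    for row in reader:
        for name in columns:
            if name in rejected:
                continue
            try:
                columns[name].append(float(row[name]))
            except (TypeError, ValueError):
                rejected.add(name)  # non-numeric, empty, or missing cell
    return {
        name: values
        for name, values in columns.items()
        if name not in rejected and values
    }


# Hypothetical sample data, for illustration only.
sample = "day,temp,city\n1,20.5,Oslo\n2,21.0,Rome\n"
print(list(numeric_columns(sample)))  # ['day', 'temp']
```

In a Streamlit script, the text could come from the file returned by `st.file_uploader`, and the resulting header names could be offered to the user through something like `st.selectbox`.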

Streamlit

If you're making a data-oriented application, chances are Streamlit's API provides almost everything you need, without having to reinvent the wheel.

Streamlit provides a variety of charts and also integrates with popular graphing libraries like matplotlib. In that sense, making graphs from CSV files wasn't a challenge at all. What is more interesting is Streamlit itself and how it renders the interface, which is important if we want to build applications with more complex interactivity.

How Streamlit creates the interface

When you run a Streamlit app, the script runs from top to bottom and creates UI elements as it goes. But the more crucial concept is how Streamlit updates the UI: whenever you interact with a widget, the entire script re-runs from top to bottom. This means that we cannot simply assign values to ordinary variables and expect them to persist across each top-to-bottom run.

For example:


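import streamlit as st

titleVar = 'Hello'  # executed again on every rerun

def changeTitle():
    # Rebind the module-level name; without `global` this would only create a local.
    global titleVar
    titleVar = 'Goodbye'

st.title(titleVar)
st.button('Change Title', on_click=changeTitle)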

We would expect the title to change to 'Goodbye' when the button is clicked, but that doesn't happen. Consider the following flow:

  1. The script is run from top to bottom for the first time, and titleVar = 'Hello' is initialized and displayed.
  2. The button is clicked, and changeTitle() is called.
  3. titleVar is set to 'Goodbye'. Streamlit does not track titleVar or bind it to the title element; the click itself is what triggers a rerun.
  4. The script is run from top to bottom again, and titleVar = 'Hello' is reinitialized and displayed.
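
The flow above can be mimicked without Streamlit at all. The sketch below is a toy model, not how Streamlit is implemented: `run_script` and `on_click` are made-up names, the `namespace` dict stands in for the script's ordinary variables, and the `session_state` dict stands in for a store that outlives each run.

```python
def run_script(session_state: dict) -> dict:
    """Simulate one top-to-bottom run and return that run's fresh variables."""
    namespace = {}                       # ordinary variables are rebuilt on every run
    namespace["titleVar"] = "Hello"
    if "titleVar" not in session_state:  # the persistent value is initialised only once
        session_state["titleVar"] = "Hello"
    return namespace


def on_click(namespace: dict, session_state: dict) -> None:
    """Simulated button callback: updates both kinds of storage."""
    namespace["titleVar"] = "Goodbye"
    session_state["titleVar"] = "Goodbye"


session_state = {}
namespace = run_script(session_state)  # steps 1: first run
on_click(namespace, session_state)     # steps 2 and 3: the callback runs
namespace = run_script(session_state)  # step 4: the rerun rebuilds the variables

print(namespace["titleVar"], session_state["titleVar"])  # Hello Goodbye
```

The ordinary variable is back to 'Hello' after the rerun, while the value kept in the longer-lived store still says 'Goodbye'.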

State Management

This brings us to the idea of state, which refers to information about the application that must be stored and persisted across top-to-bottom runs. In the above example, we would want titleVar to persist across runs, and that's where state management comes in. Streamlit provides a way to manage state using the st.session_state object. Any variables stored inside will survive script reruns.


import streamlit as st

# Initialization for the first script run:
if 'titleVar' not in st.session_state:
    st.session_state.titleVar = 'Hello'

def changeTitle():
    st.session_state.titleVar = 'Goodbye'

st.title(st.session_state.titleVar)
st.button('Change Title', on_click=changeTitle)

Now, clicking the button will change the title as expected.

Coming from a React background, this was very natural to me, but it might be useful for others without frontend experience to understand the concept of state management and how it relates to the UI rendering for more interactive applications.

For example, in an application like the CSV Analyzer, we would want to store the user-uploaded file and the columns selected for plotting in the session state, so that we can plot the correct graphs with the correct data every time the script is rerun.
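
One way to keep those values together is to group them in a single object stored under one key. This is only a sketch of the idea, not what CSV Analyzer does: `AnalyzerState`, its fields, `get_state`, and the file and column names are all hypothetical, and a plain dict stands in for st.session_state here so the snippet runs without Streamlit.

```python
from collections.abc import MutableMapping
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AnalyzerState:
    """Values the app wants to remember between reruns (hypothetical fields)."""
    file_name: Optional[str] = None
    x_column: Optional[str] = None
    y_columns: list[str] = field(default_factory=list)


def get_state(store: MutableMapping, key: str = "analyzer") -> AnalyzerState:
    """Return the AnalyzerState kept in `store`, creating it on first use."""
    if key not in store:
        store[key] = AnalyzerState()
    return store[key]


store = {}  # stand-in for st.session_state
get_state(store).file_name = "sales.csv"       # made-up file name
get_state(store).y_columns.append("revenue")   # made-up column name

print(get_state(store))
# AnalyzerState(file_name='sales.csv', x_column=None, y_columns=['revenue'])
```

Since st.session_state supports the same `in` check and item assignment as a dict, it should be possible to pass it in place of `store`, though I'd treat that as an assumption to verify in a real app.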

Closing Thoughts

Streamlit is a great tool if you want clean and simple interfaces for your data-oriented applications. It's also a great foray into frontend development for those who come from a data science background. However, a drawback I see is that customizability is rather limited, since you are most likely to be stuck using Streamlit's built-in components.

If you want to build a more complex application, state management might also become a chore, since you have to keep track of many variables in st.session_state without any built-in way to organize them. You might want to consider using a more powerful frontend framework like React or Vue in such cases. Nevertheless, Streamlit does its job well in terms of allowing us to build Python apps and is definitely worth checking out if you work with data.
