<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/"><channel><title>Izzuddin Ahsanujunda</title><link>https://iahsanujunda.me/</link><description>Recent content on Izzuddin Ahsanujunda</description><generator>Hugo</generator><language>en-us</language><lastBuildDate>Fri, 17 Apr 2026 17:07:58 +0900</lastBuildDate><atom:link href="https://iahsanujunda.me/index.xml" rel="self" type="application/rss+xml"/><item><title>Java Never Stuck with Me</title><link>https://iahsanujunda.me/2026/04/17/java-never-stuck-with-me/</link><pubDate>Fri, 17 Apr 2026 17:07:58 +0900</pubDate><guid>https://iahsanujunda.me/2026/04/17/java-never-stuck-with-me/</guid><description>&lt;p>It might come as a surprise to anyone who knows me personally, but I used to despise programming. I am what can be considered a late bloomer. My first introduction to proper programming was in my first year of university, in the Algorithms and Data Structures class. So I never developed a natural interest in the topic; I needed good grades as my initial motivation. We used C back then, and I can&amp;rsquo;t tell you how much I hated it. Not the class itself, but making the C language do what I wanted it to do. Google was not as helpful back then, I don&amp;rsquo;t even remember whether Stack Overflow was around, and there were no LLMs for obvious reasons. I fought this battle armed only with the good old &amp;ldquo;Introduction to Algorithms&amp;rdquo; book.&lt;/p></description></item><item><title>What a Three Years</title><link>https://iahsanujunda.me/2026/04/17/what-a-three-years/</link><pubDate>Fri, 17 Apr 2026 01:59:17 +0900</pubDate><guid>https://iahsanujunda.me/2026/04/17/what-a-three-years/</guid><description>&lt;p>It&amp;rsquo;s been three years since my last post. What a difference three years make.
I relocated to Japan, wife started school, we lost her father, wife got a job, I trained for a half-marathon, ran the half-marathon, ran a second half-marathon, wife quit her job, our son was born.&lt;/p></description></item><item><title>Run R in Google Colab</title><link>https://iahsanujunda.me/2022/08/21/run-r-in-google-colab/</link><pubDate>Sun, 21 Aug 2022 00:33:13 +0900</pubDate><guid>https://iahsanujunda.me/2022/08/21/run-r-in-google-colab/</guid><description>&lt;p>Google Colab offers a way to easily run R commands. It is really useful for someone like me who simply wants to try R but does not want to download and set up the tools needed to run it. There are two approaches: one is to create a new notebook with an R runtime, the other is to execute a single cell with an R script inside a pre-existing Python runtime notebook.&lt;/p>
&lt;h2 id="new-r-runtime">New R Runtime&lt;/h2>
&lt;p>To create a new notebook with R, simply go to the following URL: &lt;a href="http://colab.to/r">http://colab.to/r&lt;/a>. It will resolve to &lt;code>https://colab.research.google.com/notebook#create=true&amp;amp;language=r&lt;/code>, which is the URL for creating a Google Colab notebook with an R runtime.&lt;/p></description></item><item><title>Create Florence Nightingale's Coxcomb Diagram with Python</title><link>https://iahsanujunda.me/2022/08/09/create-florence-nightingales-coxcomb-diagram-with-python/</link><pubDate>Tue, 09 Aug 2022 21:26:04 +0900</pubDate><guid>https://iahsanujunda.me/2022/08/09/create-florence-nightingales-coxcomb-diagram-with-python/</guid><description>&lt;p>&lt;img alt="Nightingale Coxcomb" loading="lazy" src="https://drive.google.com/uc?export=view&amp;id=1SC6wFmma_vObK9ABNvUkxF8TO7AogFVJ">&lt;/p>
&lt;p>This is my implementation of the famous coxcomb diagram of excess deaths from the Crimean War, &lt;a href="https://www.florence-nightingale.co.uk/coxcomb-diagram-1858/">created by Florence Nightingale in 1858&lt;/a>. It highlights the portion of deaths that occurred from preventable causes such as disease and infection rather than from direct battle-inflicted wounds. This diagram has been used to illustrate the power of visualization to empower decision making.&lt;/p>
&lt;p>I realized that using a computer to resolve the proportions makes the chart emphasize the excess deaths even further than what Florence Nightingale presented.&lt;/p></description></item><item><title>Integrate Utterance to Blog Posts</title><link>https://iahsanujunda.me/2022/08/08/integrate-utterance-to-blog-posts/</link><pubDate>Mon, 08 Aug 2022 21:29:58 +0900</pubDate><guid>https://iahsanujunda.me/2022/08/08/integrate-utterance-to-blog-posts/</guid><description>&lt;p>&lt;a href="https://utteranc.es/">Utterance&lt;/a> is a comment widget that leverages GitHub issues to bring discussion to our blog. I have been looking for a discussion tool for a while now, but none seemed suitable. Lots of hexo-powered blogs use Disqus for discussion; however, I felt overwhelmed by the many configuration options of Disqus. The appearance is very cluttered as well. I just need a simple discussion widget that lets users identify themselves before commenting, but hexo as a static page generator doesn&amp;rsquo;t make use of any kind of database storage, so authentication needs to be handled by a third party.&lt;/p></description></item><item><title>Read Google Spreadsheet as Pandas DataFrame</title><link>https://iahsanujunda.me/2022/08/08/read-google-spreadsheet-as-pandas-dataframe/</link><pubDate>Mon, 08 Aug 2022 17:26:21 +0900</pubDate><guid>https://iahsanujunda.me/2022/08/08/read-google-spreadsheet-as-pandas-dataframe/</guid><description>&lt;p>We can read our Google spreadsheets via pandas to use them in our analysis. We just have to get this part of the sharing URL &lt;code>https://docs.google.com/spreadsheets/d/[sheet_id]/edit?usp=sharing&lt;/code> and the sheet name, then put them into this format: &lt;code>https://docs.google.com/spreadsheets/d/[sheet_id]/gviz/tq?tqx=out:csv&amp;amp;sheet=[sheet_name]&lt;/code>.&lt;/p>
&lt;p>We can then put this URL into &lt;code>pandas.read_csv()&lt;/code>.&lt;/p>
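&lt;p>As a sketch, the URL rewrite can be wrapped in a small helper; the function name &lt;code>gsheet_csv_url&lt;/code> is my own illustration, not from the original post:&lt;/p>

```python
from urllib.parse import urlencode

def gsheet_csv_url(sheet_id, sheet_name):
    """Build the gviz CSV-export URL from a sheet id and a sheet name."""
    # urlencode joins the parameters with an ampersand and encodes the
    # sheet name in case it contains spaces; safe=":" keeps "out:csv" intact.
    params = urlencode({"tqx": "out:csv", "sheet": sheet_name}, safe=":")
    return f"https://docs.google.com/spreadsheets/d/{sheet_id}/gviz/tq?{params}"

# The returned URL can then be fed straight to pandas:
#   import pandas as pd
#   df = pd.read_csv(gsheet_csv_url("[sheet_id]", "[sheet_name]"))
```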
&lt;p>And of course I built a React app for this. &lt;a href="http://google-spreadsheet-url-convert.herokuapp.com/">It&amp;rsquo;s this one&lt;/a>. Because why not.&lt;/p></description></item><item><title>About Me</title><link>https://iahsanujunda.me/about/</link><pubDate>Sun, 07 Aug 2022 21:57:45 +0900</pubDate><guid>https://iahsanujunda.me/about/</guid><description>&lt;!-- Greeting -->
&lt;h1 id="greetings-im-junda-">Greetings, I&amp;rsquo;m Junda! 👋&lt;/h1>
&lt;!--Introduction -->
&lt;p>I&amp;rsquo;m a &lt;strong>Senior Data Scientist&lt;/strong> at &lt;a href="https://global.rakuten.com/corp/innovation/technology/">Rakuten&lt;/a>, a leading e-commerce platform in Japan with a global presence. Outside of work, I run freelance web development projects.&lt;/p>
&lt;ul>
&lt;li>📚 I’m currently learning causal inference extensively for work&lt;/li>
&lt;li>🐝 I have worked on several web development side projects using the MERN stack, deployed to Heroku&lt;/li>
&lt;li>🔭 My wife is obsessed with a zero-waste household, so I&amp;rsquo;m trying to help her by developing a device with a bit of deep-learning-on-edge capability for waste management. I mainly use the &lt;a href="https://software.intel.com/content/www/us/en/develop/tools/openvino-toolkit.html">Intel OpenVINO toolkit&lt;/a> with models built in PyTorch&lt;/li>
&lt;li>🌱 I&amp;rsquo;m adept at using Alibaba Cloud and GCP, courtesy of professional experience.&lt;/li>
&lt;li>🍰 Fun fact: my wife and I are surprisingly good bakers, a hobby that we both picked up during the COVID-19 quarantines&lt;/li>
&lt;/ul>
</description></item><item><title>Coffee Analytics 1 - How to Tell if Changing Grind Size Results in Better Brew</title><link>https://iahsanujunda.me/2022/08/04/coffee-analytics-1-how-to-tell-if-changing-grind-size-results-in-better-brew/</link><pubDate>Thu, 04 Aug 2022 21:53:09 +0900</pubDate><guid>https://iahsanujunda.me/2022/08/04/coffee-analytics-1-how-to-tell-if-changing-grind-size-results-in-better-brew/</guid><description>&lt;blockquote>
&lt;p>TL;DR: the smaller grind size behaves pretty much the same as the larger grind size. There might be other, more important factors at play, or there are too few observations to draw a confident conclusion.&lt;/p>&lt;/blockquote>&lt;p>It&amp;rsquo;s now been two years since I seriously started making my own coffee. The COVID-19 pandemic really brought out my inner barista. Before the pandemic and the ensuing restrictions hit, I always bought my coffee. But now I weigh my coffee, hand grind beans right before brewing, 3D print custom tools to help me brew better, and watch &lt;a href="https://www.youtube.com/channel/UCMb0O2CdPBNi-QqPk5T3gsQ">James Hoffmann&lt;/a> videos to know the latest poison to buy.&lt;/p></description></item><item><title>How to Embed Images Hosted on Google Drive</title><link>https://iahsanujunda.me/2022/08/01/how-to-embed-images-hosted-on-google-drive/</link><pubDate>Mon, 01 Aug 2022 23:12:02 +0900</pubDate><guid>https://iahsanujunda.me/2022/08/01/how-to-embed-images-hosted-on-google-drive/</guid><description>
&lt;p>It&amp;rsquo;s now been two years since I seriously started making my own coffee. COVID-19 pandemic really brings out my inner barista. Before the pandemic and the ensuing restriction hits, I always buy my coffee. But now I weigh my coffee, hand grind beans right before brewing, 3D print custom tools to help me brew better, and watched &lt;a href="https://www.youtube.com/channel/UCMb0O2CdPBNi-QqPk5T3gsQ">James Hoffmann&lt;/a> videos to know the latest poison to buy.&lt;/p></description></item><item><title>How to Embed Images Hosted on Google Drive</title><link>https://iahsanujunda.me/2022/08/01/how-to-embed-images-hosted-on-google-drive/</link><pubDate>Mon, 01 Aug 2022 23:12:02 +0900</pubDate><guid>https://iahsanujunda.me/2022/08/01/how-to-embed-images-hosted-on-google-drive/</guid><description>&lt;p>Embedding image URL from google drive is actually easy. We just have to convert the URL we got from google drive from this format &lt;code>https://drive.google.com/file/d/[image_id]/view?usp=sharing&lt;/code> to this format &lt;code>https://drive.google.com/uc?export=view&amp;amp;id=[image_id]&lt;/code>.&lt;/p>
&lt;p>Even easier, just use this tool I made: &lt;a href="https://gdrive-url-convert.herokuapp.com">gdrive-url-converter&lt;/a>&lt;/p>
&lt;p>I built this tool using React, as I had not gotten a chance to play with React in a long time. It was a good exercise to flex my front-end muscles.&lt;/p></description></item><item><title>Does FromSoft explicitly program Malenia to skip waterfowl dance?</title><link>https://iahsanujunda.me/2022/07/30/does-fromsoft-explicitly-program-malenia-to-skip-waterfowl-dance/</link><pubDate>Sat, 30 Jul 2022 21:05:32 +0900</pubDate><guid>https://iahsanujunda.me/2022/07/30/does-fromsoft-explicitly-program-malenia-to-skip-waterfowl-dance/</guid><description>&lt;p>Hidetaka Miyazaki and FromSoft have a portfolio of hard and challenging bosses in their games. However, in their latest game, Elden Ring, one particular boss has been making a scene online, so much so that videos of her popped up in my YouTube recommendations before I even purchased the game. She heals with every connecting hit, her move set is unforgiving, she attacks as soon as we try to heal, and one particular move of hers is near-impossible to dodge or block. The only way to avoid this attack is to &lt;code>git gud&lt;/code>!&lt;/p></description></item><item><title>MLOps Part 3 - Evaluation and Optimization</title><link>https://iahsanujunda.me/2021/10/21/mlops-part-3-evaluation-and-optimization/</link><pubDate>Thu, 21 Oct 2021 11:57:21 +0900</pubDate><guid>https://iahsanujunda.me/2021/10/21/mlops-part-3-evaluation-and-optimization/</guid><description>&lt;p>Now that we have trained a model and stored it as a reusable artifact, we are ready to evaluate the model on unseen data. As with usual training practice, we are going to pull out the test portion of our split data, run it through the trained model, and record the score we get from the test data.
For good measure, we will also re-run the training process with an mlflow-powered hyperparameter sweep to discover the optimal hyperparameters that give us the best generalization between training and testing data.&lt;/p></description></item><item><title>MLOps Part 2 - Feature Engineering and Training</title><link>https://iahsanujunda.me/2021/09/14/mlops-part-2-feature-engineering-and-training/</link><pubDate>Tue, 14 Sep 2021 23:00:18 +0900</pubDate><guid>https://iahsanujunda.me/2021/09/14/mlops-part-2-feature-engineering-and-training/</guid><description>&lt;p>Previously, we set up the main skeleton of our training pipeline using an mlflow project and implemented a &lt;code>download&lt;/code> step component. Now let&amp;rsquo;s continue building the training pipeline.&lt;/p>
&lt;p>&lt;img alt="image" loading="lazy" src="https://drive.google.com/uc?export=view&amp;id=1KVqCU7TUzuln1CPufvR60X9_a4Evqf1r">&lt;/p>
&lt;p>Right now we are going to develop the feature engineering and training part. For the sake of simplicity, we are going to implement a bare-minimum feature engineering step for our model, because we want to focus our work on MLOps. It is very possible to develop a more rigorous feature engineering step that results in much better model performance.&lt;/p></description></item><item><title>MLOps Part 1 - Intro to MLflow Project and Setting-up Our First Component</title><link>https://iahsanujunda.me/2021/08/02/mlops-part-1-intro-to-mlflow-project-and-setting-up-our-first-component/</link><pubDate>Mon, 02 Aug 2021 21:49:38 +0900</pubDate><guid>https://iahsanujunda.me/2021/08/02/mlops-part-1-intro-to-mlflow-project-and-setting-up-our-first-component/</guid><description>&lt;p>MLflow is a very nice tool to handle our MLOps needs. It covers several important features for doing MLOps, namely a tracking server, a model registry, and source code packaging. Here we are going to focus on &lt;a href="https://mlflow.org/docs/latest/projects.html">MLFlow Projects&lt;/a>, the source code packaging feature that can help us develop a reproducible machine learning pipeline.&lt;/p>
&lt;p>MLFlow projects enable us to run source code in a consistent way by encapsulating the runtime environment together with the source code, so that we can develop our source code on macOS and have it run on Linux with the same reproducible result, if we so need.&lt;/p></description></item><item><title>Intuition to Recommender System for Implicit Feedback Dataset</title><link>https://iahsanujunda.me/2021/05/17/intuition-to-recommender-system-for-implicit-feedback-dataset/</link><pubDate>Mon, 17 May 2021 20:59:40 +0900</pubDate><guid>https://iahsanujunda.me/2021/05/17/intuition-to-recommender-system-for-implicit-feedback-dataset/</guid><description>&lt;p>I have been tinkering with recommender systems at work for a few months now in order to gain a deeper understanding of how the model works, how the training process learns from observation data, and how to make recommendations from a learned model. This post is an overview of what I&amp;rsquo;ve learnt and will be divided into several parts; this is the first.&lt;/p>
&lt;p>This post will rely heavily on the paper by Yifan Hu, Yehuda Koren, and Chris Volinsky titled &lt;a href="http://yifanhu.net/PUB/cf.pdf">&amp;ldquo;Collaborative Filtering for Implicit Feedback Datasets&amp;rdquo;&lt;/a>. The theory laid out in the paper has been incorporated into several open source tools for building recommender systems, perhaps most prominently &lt;a href="https://spark.apache.org/docs/2.2.0/ml-collaborative-filtering.html">Apache Spark&amp;rsquo;s ALS&lt;/a> package.&lt;/p></description></item><item><title>Setting Up Unit Test for Your Apache Spark Job Using Scalatest</title><link>https://iahsanujunda.me/2020/11/28/setting-up-unit-test-for-your-apache-spark-job-using-scalatest/</link><pubDate>Sat, 28 Nov 2020 08:15:10 +0900</pubDate><guid>https://iahsanujunda.me/2020/11/28/setting-up-unit-test-for-your-apache-spark-job-using-scalatest/</guid><description>&lt;p>By nature, machine learning models that run in production need to deal with&amp;hellip; well&amp;hellip; data, presumably lots of it. Among the many data points our model needs to deal with, there will be bad ones. In such cases, machine learning models tend to either stop processing data immediately, or continue processing and produce a smelly result. The impact of both is bad.&lt;/p>
&lt;p>Unit testing ML models equips us as developers with extra confidence to put models in production, by giving isolated modules in an ML pipeline a way to face various edge cases and handle them accordingly.&lt;/p></description></item><item><title>Understanding the Data - Exploring CO2 Emissions, Internet Usage, GDP per Capita, and Oil Consumption between Countries</title><link>https://iahsanujunda.me/2020/06/07/understanding-the-data-exploring-co2-emissions-internet-usage-gdp-per-capita-and-oil-consumption-between-countries/</link><pubDate>Sun, 07 Jun 2020 09:45:28 +0900</pubDate><guid>https://iahsanujunda.me/2020/06/07/understanding-the-data-exploring-co2-emissions-internet-usage-gdp-per-capita-and-oil-consumption-between-countries/</guid><description>&lt;p>One often overlooked aspect of a data analysis project is keeping track of the data that we are working on. Our data will evolve over the course of an analysis project: sometimes new variables are introduced, sometimes we redefine an old variable, and sometimes we drop a variable that is deemed no longer relevant. Whatever the reason, it makes good sense to keep track of the changes in our data. This is where a code book comes in handy. A code book is simply a document where we put information about our data.
At the very least we want to keep track of our variable names, their descriptions, and their units of measurement.&lt;/p></description></item><item><title>Starting Analytics Project - What is the Connection Between CO2 Emissions and Internet Usage Across Countries</title><link>https://iahsanujunda.me/2020/06/06/starting-analytics-project-what-is-the-connection-between-co2-emissions-and-internet-usage-across-countries/</link><pubDate>Sat, 06 Jun 2020 19:05:25 +0900</pubDate><guid>https://iahsanujunda.me/2020/06/06/starting-analytics-project-what-is-the-connection-between-co2-emissions-and-internet-usage-across-countries/</guid><description>&lt;p>Every analytics project MUST start from a question. I have always been curious about the explosive growth of the Internet, especially as someone who built his career on enabling wider adoption of the Internet in fields that traditionally do not rely on it. I want to know if Internet usage is having a bad effect on CO2 emissions - &lt;a href="https://www.climate.gov/news-features/understanding-climate/climate-change-atmospheric-carbon-dioxide#:~:text=Without%20carbon%20dioxide%2C%20Earth's%20natural,causing%20global%20temperature%20to%20rise.">one of the variables most strongly linked to global warming&lt;/a>.&lt;/p>
&lt;p>The Internet - and the digital age at large - has been viewed as mainly bringing a net positive. &lt;a href="https://www.internetforall.gov/why">It powers education and the economy. It supports our health and well-being. It also connects individuals to their community and their loved ones&lt;/a>. However, as I got older, I kept finding myself contemplating whether the overwhelming positives overshadow a potentially serious downside. For me, environmental impact is one area that I believe will grow in urgency as we keep witnessing the impact of a changing climate on our everyday lives. For this analytics project, not only do I want to know whether every country is making more CO2 emissions with higher internet usage, but also whether there is an anomaly out there: a country that managed to power its internet growth from sustainable sources.&lt;/p></description></item></channel></rss>