Hey there 👋#
I’m Drazen.
I’m an engineer, currently focusing on machine learning. Find me on LinkedIn.
On the side, I hack indie projects and sometimes write.
Side Projects#
Blogboard is a curated index of engineering blogs with a search engine and weekly newsletter.
The tech behind it: a fine-tuned DistilBERT model to classify articles into topics, Elasticsearch for full-text search and finding similar documents, Python/Flask/PostgreSQL for the backend, and VueJS/Vuetify for the frontend. It all runs on a single $20 virtual server.
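As an illustration of the classification piece, here’s a minimal sketch using the Hugging Face transformers pipeline; the model path is a placeholder, not the actual Blogboard classifier.

```python
from transformers import pipeline

# Hypothetical path to a fine-tuned DistilBERT topic classifier
classifier = pipeline("text-classification", model="./distilbert-blog-topics")

article = "How we scaled our PostgreSQL cluster to handle peak write traffic"
print(classifier(article))
# e.g. [{'label': 'databases', 'score': 0.97}]
```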
A simple polling app for Slack, with a slick UI and poll analytics features.
Writing#
Some posts from the Blogboard blog, written for SEO and my amusement.
A/B Testing for Practical Significance
Code Review Best Practices
Marketing Data Science - Case Studies from Airbnb, Lyft, Doordash
Kano Model Examples - Build Great Products With a Simple Mental Model
Priority Queue in Python
Calculating Variance in Python
Probability Interview Questions
Python Enumerate from Scratch
gitroot – a Simple Command to Navigate to the Root of a Git Repo
Better Heatmaps and Correlation Matrix Plots in Python, published in Towards Data Science
Fun facts#
The intuition
When deriving the cross-entropy loss, we’ve shown how entropy plays a central role in the optimization of softmax models (i.e., multi-class classification models).
All large language models (LLMs) are exactly that: softmax models that, for an input sequence of \(t\) tokens \(x=[x_1, x_2, \ldots, x_t]\), output a conditional probability distribution \(P(w|x)\) over the vocabulary \(V\) of all tokens. This distribution gives us the most likely next token(s) to continue the input sequence.
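A minimal sketch of that view, using a small causal LM from Hugging Face (gpt2 here is only a convenient stand-in): take the logits at the last position and apply softmax over the vocabulary to get \(P(w|x)\).

```python
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("The capital of France is", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, t, |V|)

# Softmax over the vocabulary gives P(w | x) for the next token
next_token_probs = F.softmax(logits[0, -1], dim=-1)
top = torch.topk(next_token_probs, k=5)
print(tokenizer.convert_ids_to_tokens(top.indices), top.values)
```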
...
Binary logistic regression, binary cross-entropy loss
We start with binary logistic regression and define our task as follows.
We’re given a dataset of inputs and targets (labels):
$$ \begin{array}{c} (x^{(1)}, y^{(1)})\\ (x^{(2)}, y^{(2)})\\ \vdots\\ (x^{(m)}, y^{(m)}) \end{array} $$
and a logistic model:
$$\hat{y}^{(i)} = h(f(x^{(i)}; \theta)) = \frac{1}{1 + e^{-f(x^{(i)}; \theta)}}$$
Each target \(y^{(i)}\) is either 0 or 1. That is, we’re doing binary classification, for example fraud/not fraud, churn/not churn, disease/not disease, cat/dog.
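To make the loss concrete, here’s a small NumPy sketch of the binary cross-entropy between targets \(y^{(i)}\) and predictions \(\hat{y}^{(i)}\); the numbers are made up for illustration.

```python
import numpy as np

def binary_cross_entropy(y, y_hat, eps=1e-12):
    """Mean binary cross-entropy loss over m examples."""
    y_hat = np.clip(y_hat, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

# Toy targets and sigmoid outputs of f(x; theta)
y = np.array([1, 0, 1, 1])
y_hat = np.array([0.9, 0.2, 0.6, 0.95])
print(binary_cross_entropy(y, y_hat))  # ≈ 0.22
```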
...
Say we want to do post-training quantization of an LLM.
For PyTorch models, we’ll usually have an implementation defaulting to bfloat16 and torch.nn layers, such as torch.nn.Linear and torch.nn.Embedding.
We’ll also have pretrained weights. For a HuggingFace model they’ll come in a bunch of .safetensors files, accompanied by model configs.
To get a quantized model, we can simply:
Load the pretrained model into memory (CPU or GPU).
Do this with the default, non-quantized dtype, usually bfloat16 (see the sketch below).
...
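A minimal sketch of the loading step above, assuming a Hugging Face causal LM (gpt2 is only a placeholder for the model being quantized):

```python
import torch
from transformers import AutoModelForCausalLM

model_name = "gpt2"  # placeholder; substitute the LLM to be quantized

# Load the pretrained weights in the default, non-quantized dtype
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
# model = model.to("cuda")  # optionally move to GPU

# The model is built from standard torch.nn layers (Linear, Embedding);
# these are what a post-training quantization pass would later convert.
print(model)
```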