2025 - Year In Review
A summary of what I did in 2025.
A year is over in the blink of an eye. However fast it went, a lot of things got done this year. In this article, I will continue the tradition of the last two years (here and here) and summarize what I remember of the things I’ve done this year.
The AI trend continues this year as well, carrying on from last year. What’s predominantly called “AI” is LLMs and image, audio, and video generators, which can be guided toward a goal using multimodal prompts. While AI is affecting much of society, tech caught the trend first and makes the most use of it. I was involved in a few initiatives related to this trend.
- an MCP server to enable the hybrid search capabilities that we developed last year
- data normalization using LLMs
- building in-house UIs using code generation models
In the following sections, I go into a little more depth about each initiative, and the fruits and bitterness of them.
1. MCP
MCP stands for Model Context Protocol, a standard developed by Anthropic to unify how LLMs interact with external systems. Most mainstream LLM providers and applications provide MCP support today. We wanted to revamp the Product Search Chatbot that we built last year and refactor the tools part out. MCP is a perfect solution for this.
We took the tools (the functions that contain business logic) away from the chatbot and plugged them into MCP. We had to write detailed documentation for the tools and their parameters. When a tool successfully executes, we also send an instructions field in the response to help the LLM present the results in a natural way.
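For a rough idea of what this looks like, here is a minimal sketch of a tool registration with the TypeScript SDK. The tool name, its parameter, and the searchProducts helper are hypothetical stand-ins rather than our real business logic:

```ts
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { z } from "zod";

const server = new McpServer({ name: "product-search", version: "1.0.0" });

// Hypothetical tool: name, parameter, and searchProducts are illustrative only
server.tool(
  "search_products",
  "Search the product catalog. Returns matching products with fit information.",
  { query: z.string().describe("Free-text search query from the user") },
  async ({ query }) => {
    const results = await searchProducts(query);
    return {
      content: [
        {
          type: "text",
          text: JSON.stringify({
            results,
            // The instructions field nudges the LLM to present results naturally
            instructions:
              "Summarize the top results conversationally; do not dump raw JSON.",
          }),
        },
      ],
    };
  },
);
// (connecting the server to an HTTP transport is omitted here)

declare function searchProducts(query: string): Promise<unknown[]>;
```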
Even after providing all these instructions and SCREAMING at the LLMs, we found them occasionally going off-script, which I think is to be expected with a stochastic parrot.
Since our MCP needs to persist user context across chat messages (i.e., it is stateful), we had to rely on MCP sessions. The server sets a Session ID on the initial handshake and expects the client (the LLM provider’s application) to send it along with future messages. We store this Session ID alongside our internal user identifiers in a Redis server.
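A minimal sketch of that mapping, assuming the node-redis client; the key naming and TTL here are illustrative choices, not our actual schema:

```ts
import { createClient } from "redis";

const redis = createClient({ url: process.env.REDIS_URL });
await redis.connect();

// Assumption for illustration: sessions expire after an hour of inactivity
const SESSION_TTL_SECONDS = 60 * 60;

// Store the MCP Session ID -> internal user identifier mapping
export const saveSession = (sessionId: string, userId: string) =>
  redis.set(`mcp:session:${sessionId}`, userId, { EX: SESSION_TTL_SECONDS });

// Resolve the user for a follow-up message carrying the same Session ID
export const resolveSession = (sessionId: string) =>
  redis.get(`mcp:session:${sessionId}`);
```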
While this session persistence works great in theory, we found that Claude repeatedly drops the Session ID in the middle of the chat, causing a bad UX. We still have to investigate whether the problem is on Claude’s end or in our distributed MCP server and Redis Cluster setup.
The @modelcontextprotocol/sdk library provides a great DX, although it is quite bare-bones. Since our MCP server is HTTP-based (and not stdio), we had to rely on ngrok port forwarding to expose our MCP server to the internet and try it with the OpenAI platform and Claude web.
Now we use the MCP for the same chatbot that we built last year. This chatbot is a part of the True Fit app (available in the US for Android and iOS — go check it out!) and helps our customers discover the best-fitting products for them.
2. Data Normalization
Since LLMs are trained on large corpora of web data, they encode common knowledge. This makes them good at zero-shot and few-shot classification or generation tasks. Paired with their structured output capability, they can serve as efficient data classification or transformation models.
We use LLMs heavily to normalize data, map one data point to another, and derive attributes. While we invest heavily in prompts and local testing, these models reveal their quirks when running in production with a diversity of inputs. We observed LLM output changing vastly and sometimes going into an infinite token loop, even when we change the input slightly. We now catch such errors and retry the input, slightly adjusting it.
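The retry idea, roughly sketched; normalizeWithLlm is a hypothetical wrapper around the provider call, and the exact adjustment we apply is different in practice:

```ts
// Hypothetical wrapper around the LLM provider call
declare function normalizeWithLlm(prompt: string): Promise<string>;

export const normalizeWithRetry = async (input: string, maxAttempts = 3) => {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    // Perturb the prompt slightly on retries so the model does not repeat itself
    const adjusted = attempt === 0 ? input : `${input}\n\n(retry #${attempt})`;
    try {
      return JSON.parse(await normalizeWithLlm(adjusted)); // throws on broken JSON
    } catch (err) {
      if (attempt === maxAttempts - 1) throw err;
    }
  }
};
```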
When we want to generate large JSON outputs, we also hit the output token limit. This clips the LLM output and results in incomplete JSON. To handle that, we now have a chunkify<T>(array: T[], length: number): T[][] utility. It returns smaller chunks of the original array, and we invoke the LLM on each chunk individually.
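A minimal implementation of that utility could look like this:

```ts
// Split an array into chunks of at most `length` items each
export const chunkify = <T>(array: T[], length: number): T[][] => {
  const chunks: T[][] = [];
  for (let i = 0; i < array.length; i += length) {
    chunks.push(array.slice(i, i + length));
  }
  return chunks;
};

// Example: 10 items in chunks of 3 -> [[0,1,2], [3,4,5], [6,7,8], [9]]
console.log(chunkify([...Array(10).keys()], 3));
```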
Even after chunking and the randomized retries, we see LLMs hallucinating a lot in our tasks. While a model upgrade solves some of the issues, it is not a silver bullet.
I personally feel restricted token generation in JSON mode limits some of the creativity of the LLM. I plan to use a new object notation such as TOON (Token Oriented Object Notation), which also saves token cost as a bonus.
3. AI-UIs
Vibecoding became popular this year, with large companies reporting that a good chunk of their code is AI-generated. We wanted to build an internal UI to debug one of our internal applications, and planned to give AI a try, since we had limited resources to build it on our own.
We turned to Vercel’s v0 for help and got started quickly. We shared our UI idea, API endpoints, and request/response schemas, and watched it magically generate the UI.
Since we did not want to host Next.js and were content with react-router-dom, we manually converted the API calls to fit its loader and action abstractions. We also found that the code generated by v0 needed a lot of editing to match our requirements.
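As an illustration of that conversion, here is a rough sketch of moving a fetch call into a loader; the route, endpoint, and response shape are hypothetical:

```tsx
import { createBrowserRouter, useLoaderData } from "react-router-dom";
import type { LoaderFunctionArgs } from "react-router-dom";

// Hypothetical debug endpoint and response shape, for illustration only
type DebugInfo = { id: string; logs: string[] };

const debugLoader = async ({ params }: LoaderFunctionArgs): Promise<DebugInfo> => {
  const res = await fetch(`/api/debug/${params.id}`);
  if (!res.ok) throw new Response("Failed to load debug info", { status: res.status });
  return res.json();
};

const DebugView = () => {
  const data = useLoaderData() as DebugInfo;
  return <pre>{data.logs.join("\n")}</pre>;
};

export const router = createBrowserRouter([
  { path: "/debug/:id", element: <DebugView />, loader: debugLoader },
]);
```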
In the end, we started using v0 in a very different way. We ask it to generate isolated components, which we then integrate into our codebase. This is our default strategy now, and it’s working great.
AI code generation has come a long way since we first tried v0. We plan to try it in future and current projects to get a speed boost.
I did work on a lot of other things apart from AI this year. Battling with LLMs via prompts is bound to make someone mad. Prompts do not behave like code; their outputs are stochastic. In the following sections, I will give a walkthrough of the non-AI things I did this year.
4. Distributed Systems
We host all of our systems in Kubernetes, gradually deprecating our in-house Kubernetes rival. Since we need to run more than one pod to manage the load, our systems naturally become distributed.
Many times, we keep the pods operationally independent and stateless, delegating the state to the database. There are still cases where we want these pods to collaborate with each other. For example, in our LLM data processing pipeline, we did not want the LLM to process the same input twice. While storing the result in a DB and checking it before we process a new input is a solution, it falls short when these inputs arrive at the same time.
To solve this, we implemented distributed locking via Redis using the redlock algorithm. Redlock intelligently handles the Redis cluster case as well, making it an ideal candidate for our use case. We also use Redis to stay within the LLM provider’s token and request limits using the bottleneck library. Now our distributed system is backed by a distributed set of Redis nodes in the cluster, making it truly distributed!
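Here is a minimal sketch of the locking pattern, assuming ioredis and the redlock package; processWithLlm and the lock TTL are illustrative placeholders:

```ts
import Redis from "ioredis";
import Redlock from "redlock";

// Hypothetical worker function that calls the LLM for one input
declare function processWithLlm(inputId: string): Promise<void>;

const redis = new Redis(process.env.REDIS_URL ?? "redis://localhost:6379");
// retryCount: 0 -> fail fast instead of waiting when another pod holds the lock
const redlock = new Redlock([redis], { retryCount: 0 });

export const processOnce = async (inputId: string) => {
  let lock;
  try {
    // Hold the lock for at most 60s while the LLM call runs (illustrative TTL)
    lock = await redlock.acquire([`locks:llm:${inputId}`], 60_000);
  } catch {
    return; // another pod is already processing this input
  }
  try {
    await processWithLlm(inputId);
  } finally {
    await lock.release();
  }
};
```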
5. MQ Shenanigans Again
When we take a lock as described above, we don’t want to make our pod busy-wait on the lock. Instead, we planned to requeue the messages so they get another chance to be processed. We could not simply drop the messages in our case, as every input can hold some extra information that might be useful.
We use RabbitMQ, and I implemented requeueing using AMQP’s requeue mechanism, thinking it would do the job. However, it quickly brought the system down, including the RabbitMQ broker. It turns out that native requeue makes the messages immediately available to consumers, as opposed to putting them at the end of the queue. This caused high load on the brokers and the pods alike.
Enlightened, I changed the requeue mechanism to publish back to the same queue, thinking it would solve the problem. However, message delivery slowed down a lot, as the queue was getting busy handling incoming retry messages. We had to switch to a proper dead-letter queue setup to overcome this. The setup works like this:
- We create a main queue work.queue and an associated exchange work.exchange.
- We create a retry queue retry.queue with a dead-letter exchange set to work.exchange and a message TTL set. We also create an associated exchange retry.exchange.
- For the message that we want to retry, we manually publish to retry.exchange, which gets routed to retry.queue.
- The retry.queue sends the message back to work.exchange once the message outlives the TTL.
Many setups avoid step 3 by setting up a dead-letter exchange for work.queue as well, which makes all nacked messages go to the dead-letter exchange. However, we could not redefine work.queue since we already had it set up. This setup now gives us maximum control over when to retry, while still preserving backward compatibility.
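For reference, a minimal sketch of that topology using amqplib; the routing key, TTL, and durability settings are illustrative assumptions, not our exact configuration:

```ts
import amqp from "amqplib";

const RETRY_TTL_MS = 30_000; // assumption: retry after 30 seconds

export const setupTopology = async () => {
  const conn = await amqp.connect(process.env.AMQP_URL ?? "amqp://localhost");
  const ch = await conn.createChannel();

  await ch.assertExchange("work.exchange", "direct", { durable: true });
  await ch.assertExchange("retry.exchange", "direct", { durable: true });

  await ch.assertQueue("work.queue", { durable: true });
  await ch.bindQueue("work.queue", "work.exchange", "work");

  // Messages that outlive the TTL are dead-lettered back to work.exchange
  await ch.assertQueue("retry.queue", {
    durable: true,
    arguments: {
      "x-dead-letter-exchange": "work.exchange",
      "x-dead-letter-routing-key": "work",
      "x-message-ttl": RETRY_TTL_MS,
    },
  });
  await ch.bindQueue("retry.queue", "retry.exchange", "work");

  return ch;
};

// To retry a message, publish it to retry.exchange instead of nacking it:
// ch.publish("retry.exchange", "work", Buffer.from(JSON.stringify(payload)));
```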
6. A tale of the Databases
All the data pipelines and enrichment that I mentioned above need to go to some destination. The destination database can vary depending on the application it is used for. We use PostgreSQL and Scylla for real-time OLTP databases, and BigQuery for system-wide insights, where we can join tables from multiple databases. For the MCP and the mobile app, we had another requirement: fast analytical and search queries.
Traditionally, search queries were performed using full-text search, and there are a lot of specialized databases to handle them, with Elasticsearch being a popular one. With the advent of embedding models, we can also do similarity search using vector distances. When we want to combine full-text search, similarity search, and optional filters, the problem becomes hybrid search, which we solved using ClickHouse last year.
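One common way to fuse the two rankings, and a useful mental model for hybrid search (this is not a description of our ClickHouse queries), is reciprocal rank fusion:

```ts
// Conceptual sketch: merge a full-text ranking and a vector-similarity ranking
// into one list using reciprocal rank fusion (RRF).
type Ranked = { id: string; rank: number };

export const rrf = (lists: Ranked[][], k = 60): { id: string; score: number }[] => {
  const scores = new Map<string, number>();
  for (const list of lists) {
    for (const { id, rank } of list) {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    }
  }
  return [...scores.entries()]
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
};

// Example: product p2 ranks well in both lists and ends up on top
console.log(
  rrf([
    [{ id: "p1", rank: 1 }, { id: "p2", rank: 2 }],
    [{ id: "p2", rank: 1 }, { id: "p3", rank: 2 }],
  ]),
);
```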
To handle B2C search traffic for the app, we started exploring a proper search-optimized database this year. Elasticsearch was an obvious choice, but we refrained from using it at first to explore other alternatives. Since our team is SQL-native, we gave Manticore Search a try, which has excellent hybrid search capabilities and runs a MySQL-compatible protocol.
All the infrastructure and cluster setup for Manticore went smoothly. However, we found the cluster becoming unstable as a result of the Galera synchronization algorithm that it uses, and due to our heavy ingest + vector index load. We ended up moving ahead with an Elasticsearch deployment using their Kubernetes operator.
Translating an SQL-based ranking algorithm to Elasticsearch is a big task. We could not complete that rewrite due to a time crunch, so our ClickHouse setup is still happily serving app search and MCP searches. We are also considering the possibility of going all-in on ClickHouse for search, as we greatly enjoy its SQL support.
7. Self-Service Infrastructure
We use GitOps-based Kubernetes workload management with FluxCD and Kustomization. Along with our applications, we also manage self-hosted infrastructure such as Elasticsearch and ClickHouse with FluxCD. FluxCD now helps people create and deploy their microservices in a matter of a day or two.
FluxCD supports Kustomization overlays, allowing us to override config for each environment if needed. We have base, prod, and dev overlays. To release a microservice, we had to patch the deployment.yaml in the prod or dev overlay, which was cumbersome to do quickly. To mitigate that, I introduced Kustomization components called prod-images and dev-images, which override the image version in the respective overlays. This has centralized image tag management and made the process smoother.
We still rely on Terraform to manage secrets for these applications. We are currently exploring secret management solutions such as Vault, which can provide custom Kubernetes resources to automatically generate secrets that can be injected into pods. Encrypted secrets in git through sealed-secrets is also an option. If you have any suggestions on how to manage secrets with automatic propagation, please do reach out!
8. Doing Math in JS
Last year, I wrote about doing efficient tensor operations in JS using TensorFlow.js. We continue to use TensorFlow.js heavily, and it powers our core engines.
While working with TensorFlow, we now make use of broadcasting instead of worrying about making tensor shapes equal in many cases. For example, if we have a matrix A and want to add 1 to all elements, we can do it in the following ways—with and without broadcasting:
```ts
import * as tf from "@tensorflow/tfjs-node"

const A = tf.tensor([
  [1, 2, 3],
  [4, 5, 6],
  [7, 8, 9],
])

const APlusOneWithoutBroadcasting = A.add(tf.onesLike(A))
const APlusOneWithBroadcasting = A.add(1)
```

Broadcasting makes our code easy to read and follow. It can also make refactoring easier.
When we started with TensorFlow.js, we did not know that tensors were not garbage-collected. Since Kubernetes managed the deployment, it was automatically restarting pods with OOMKilled when they hit their memory limits. The problem surfaced when load and restarts coincided, triggering an increase in service response latency and causing alerts.
We now use a combination of tf.tidy and tf.dispose to manage memory. Our usage pattern looks like this:
```ts
const computeUsingTensors = (
  inputs: number[] /* can be any number of dimensions */,
  others: any[],
) => {
  const resultTensor = tf.tidy(() => {
    const inputTensor = tf.tensor(inputs)
    // Do tensor ops here, for example:
    const outputTensor = inputTensor.square()
    // All intermediate tensors (like inputTensor) are disposed once tidy exits;
    // only the returned tensor survives
    return outputTensor
  })
  const result = resultTensor.arraySync()
  // Manually dispose the result tensor now
  resultTensor.dispose()
  return result
}
```

If you carefully read the code above, you’ll notice that .arraySync() is a blocking operation. It blocks the main thread until its results are ready. On top of that, all tensor operations like .add() in a Node.js backend are blocking as well. We learned this the hard way when our pods started failing the readiness probe HTTP checks and were being restarted by Kubernetes, again causing latency spikes and alerts.
We could either convert .arraySync() to .array(), coloring all functions async, or delegate the tensor operations to dedicated worker threads and implement message-passing between the main and worker threads. Both approaches would have made the code convoluted and/or introduced a lot of thread management code.
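For completeness, the async variant we decided against would have looked roughly like this; the data copy stops blocking, but the tensor ops themselves still run synchronously on the Node backend:

```ts
import * as tf from "@tensorflow/tfjs-node"

// Sketch of the .array() alternative: the result copy is awaited, so this
// colors every calling function async
const computeUsingTensorsAsync = async (inputs: number[]) => {
  const resultTensor = tf.tidy(() => tf.tensor(inputs).square())
  const result = await resultTensor.array()
  resultTensor.dispose()
  return result
}
```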
We decided to pursue a simpler path instead: we scale our deployment up to keep the target CPU utilization at 40%. This is a number we found experimentally, and it is working great in production now. We give each pod 1 vCPU and 1 GB RAM, as Node.js cannot use more than 1 vCPU since it is single-threaded. At peak load, our service can scale to ~256 pods, while still consuming fewer resources than our legacy Scala-based JVM engine.
9. Work in Progress: Configurable UI
When you are a B2B2C business, with the end-users of your business partners seeing and interacting with your UI, customization becomes paramount. After all, one size does not fit all. UI customization will be driven by a configuration that is different for each of your business partners, according to their needs. We can think of customizations in three levels:
- Basic customization: Theming through CSS variables (for example: colors, radii, etc.). Turning a feature on or off. The code has if or switch conditions everywhere to accommodate these customizations. Many companies provide this level of customization.
- Everything is customizable: You provide a set of APIs, maybe a JS client to encapsulate them. Your business partners define how they want the UI to be. Maybe you can provide them with several pre-built templates. Shopify allows this level of customization through their liquid templating language.
- The Middle Way: Build a set of customizable components, and allow your business partners to mix and match them. This is the approach we chose, as we wanted more flexibility than approach #1, and #2 was too much to build.
The idea in the “Middle Way” is to give business partners flexibility to customize the UI according to their needs, but they still need our assistance if they want too much flexibility. In that case, we will build a new component tailored to their needs.
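To make the idea concrete, here is a hypothetical sketch of config-driven composition; the component names, config shape, and theming are illustrative, not our actual schema:

```tsx
import React from "react";

type BlockConfig = { component: string; props: Record<string, unknown> };
type PartnerConfig = { theme: { primaryColor: string }; blocks: BlockConfig[] };

// Registry of the pre-built, customizable components we ship
const registry: Record<string, React.ComponentType<any>> = {
  SizeRecommendation: ({ showConfidence }) => (
    <p>Recommended size: M {showConfidence ? "(high confidence)" : null}</p>
  ),
  ReviewSummary: ({ maxReviews }) => <p>Showing top {maxReviews} reviews</p>,
};

export const PartnerUi = ({ config }: { config: PartnerConfig }) => (
  <div style={{ color: config.theme.primaryColor }}>
    {config.blocks.map((block, i) => {
      const Block = registry[block.component];
      // Unknown component names are skipped; a genuinely new need means we
      // build a new component for that partner
      return Block ? <Block key={i} {...block.props} /> : null;
    })}
  </div>
);
```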
This project is currently a work in progress. Expect an article from me early next year on how far we can go with the Middle Way.
10. Getting things done, fast
Software engineering is not just about programming and closing tickets. It involves greater collaboration with other stakeholders and teams to get things going smoothly. One also needs to be visionary and think about the future to set goals, design, and architect a system.
We built a data onboarding system three years ago. While building the system and getting it working was easy, moving our business partners to it was hard. We started a migration effort soon after we got the system into a solid state, but the migration stalled soon after due to data format incompatibilities. This year, we decided to revisit the migration, as the cost of maintaining two systems was straining us a lot.
Lots of things have changed in the last two years, including data formats. We had a team of three: me for technical assistance and fixing bugs as needed, one person for doing the actual groundwork, and another to prioritize and keep things in check. We completed the migration in a miraculous two months, migrating 96% of our partners’ data into the new system. The remaining ones pose a business challenge, since their technical development cycles run slow. We are planning to onboard them to an end-user-side data collection mechanism that I mentioned in another article.
While this story is about teamwork and getting things done, it also reflects another core aspect: mindset. If one is not willing to adapt to new things, systems can become pretty rusty, pretty fast. That rust shows up everywhere, from operational efficiency to monetary cost. I was recommended the book The Five Dysfunctions of a Team by one of my colleagues. It’s a great story about how mindset and building trust can reduce a lot of friction, uncertainty, and fear. I recommend that everyone working in a team setting give the book a read!
11. Thoughts on React
I will dedicate this section to one of the primary frontend technologies that most of the world, including us, uses.
I was introduced to React when I was doing the setup for my personal blog. I was running on Jekyll and wanted to build reusable “templates” that could be parameterized with arguments, hopefully typed. When I saw React, it was a perfect fit for me. It was still in the class-based era. While I disliked classes, it also had support for functional components, which was exactly what I needed.
When working on real-world applications, we certainly need state management. The useState hook made perfect sense. The useEffect hook took some time to adjust to, but the article “You Might Not Need an Effect” by Dan Abramov cleared everything up. All was good until then.
When we started using React heavily in our stack, and were doing data fetching inside components (yeah, we shouldn’t have done that; now we use react-router-dom’s loader and action abstractions for that), the need for useCallback and useMemo became evident. The non-preservation of referential equality for objects across re-renders was something to adjust to as well. But it was not bad. Maybe the React compiler will help us here.
We made a conscious decision not to go with React context for global state management, instead opting for fine-grained reactivity with recoil, a project by Meta that is now deprecated. However, when we interface with non-React JS libraries, we still have to deal with useRefs and call their APIs to move them out of React’s re-render cycle.
Now React has entered the server side as well, replacing the JSON-API style of wire communication with streaming components and deltas. We have not adapted to this model yet, and might not do that in the near future. We got saved from useVulnerabilities this way!
Our way of React is now sticking to the basics: client-side only, fine-grained reactivity with jotai, data fetch via React Router, and styles via Tailwind (or a component library if needed). We are sustainable with this approach for now.
Having seen a lot of changes in the React ecosystem makes me think: is it really needed? I’m now willing to try alternatives like Solid.js (which keeps JSX) or Svelte when evaluating frontend tech choices. Given the current state of React Server Components, it might be worthwhile to try HTMX, Laravel, or Ruby on Rails too.
React’s ecosystem keeps us with React, and we sustain it by not using it too much. One wise person recommended that I keep all business logic in plain JS and use UI libraries only for the UI. That is what we are doing now.
12. Personal Projects and Ideas
Apart from work, I keep myself involved with a bunch of personal projects to satisfy needs as they come. In this section, I will summarize some of them, their status, and where I might need your help.
- Blog migration: The blog that you are reading now is powered by Astro. An earlier version of the blog used Next.js, which I thought was too much for simple static site generation. I do not run any trackers on this website, so it is fine for me now. I used Tufte.css for styling, as it reads pretty much like a classic book.
- Rain Group: There is a local farmers’ group that shares daily rainfall details in a WhatsApp group. I used whatsapp-web.js to collect these messages and store them in a database, before attempting to normalize the data using LLMs. The project is now in limbo as WhatsApp’s APIs break the libraries. I also have to reuse a place name if it is already in the database, instead of asking the LLM to provide a new one. If you have interest in this project, do get in touch.
- Year of Linux MacBook: I run Asahi Linux dual-boot on my personal MacBook. Asahi Fedora works great and ticks all the boxes, only lacking in battery performance, which prevents it from becoming my daily driver. With UI enshittification happening in macOS now, I think Asahi Linux has the real advantage.
- Self-hosted photos with Immich: I use Immich to host photos, as described in a past article. This setup is now working smoothly and reliably, especially after the v2 stable release of Immich. I had my SD card fill up with Docker overlays and had no luck clearing them, so I opted to format the disk since the Postgres backup was safe on the hard disk. Restoring the backup proved challenging due to an incompatible version of pgvector. I got the restore sorted out with the help of ChatGPT by downgrading and upgrading the databases during the process.
Looking forward
Although this year moved fast, like the ones before it, I got a lot of good things done, thanks to the support of my amazing team, friends, and family. Today (21 Dec 2025) is the Winter Solstice, and I’m looking forward to the next year with hope and optimism.
Happy New Year in advance! Summer is coming.
TL;DR
I discuss what I did this year in 12 sections, summarized as:
- MCP: We built an MCP for the search engine we developed last year. I go through the technical details and challenges here.
- LLM data normalization: Usual challenges, and dealing with token limits and hallucinations.
- AI UIs: Our attempt at “vibecoding” an internal UI using Vercel’s v0, and the lessons learned.
- Distributed systems: Synchronization, locks, and bottlenecks!
- MQ shenanigans again: Did you think RabbitMQ would let us sleep peacefully?
- A tale of the databases: A lot of data—where to store it? And we need a fault-tolerant cluster too!
- Self-service infrastructure: How Flux is shining, and how we optimize it for our use cases.
- Doing math in JS: High-performance tensor operations, and caveats learned along the way.
- Configurable UI: When your customers ask for a lot of customization, what if you let them do it instead of changing the code yourself?!
- Getting things done, fast: As usual, a story of a positive outlook and team spirit.
- Thoughts on React: Or, is React becoming too complicated?
- Personal projects and ideas: What am I doing outside of work?