2025 - Year In Review

A summary of what I did in 2025.

TL;DR

A year is over in the blink of an eye. However fast it went, a lot still got done this year. In this article, I continue the tradition of the last two years (here and here) and summarize what I remember of the things I did this year.


The AI trend continues this year as well, carrying on from last year. What’s predominantly called “AI” is LLMs and image, audio, and video generators, which can be guided to achieve a goal using multimodal prompts. While AI affects much of society, the tech industry catches the trend first and uses it the most. I was involved in a few initiatives related to this trend.

In the following sections, I go into a little more depth about each initiative, and the fruits and the bitterness they brought.

1. MCP

MCP stands for Model Context Protocol, a standard developed by Anthropic to unify how LLMs interact with external systems. Most mainstream LLM providers and applications provide MCP support today. We wanted to revamp the Product Search Chatbot that we built last year and refactor the tools part out. MCP is a perfect solution for this.

We moved the tools (the functions that contain business logic) out of the chatbot and exposed them through an MCP server. We had to write detailed documentation for the tools and their parameters. When a tool executes successfully, we also send an instructions field in the response to help the LLM present the results in a natural way.
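
As a rough illustration, here is a minimal sketch of how such a tool could be registered with the TypeScript SDK. The tool name, parameters, and the instructions text below are hypothetical placeholders, not our actual implementation, and the transport/handshake setup is omitted.

import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js"
import { z } from "zod"

const server = new McpServer({ name: "product-search", version: "1.0.0" })

// Placeholder for the business logic we moved out of the chatbot.
const searchProducts = async (query: string, maxResults: number) =>
  [] as Array<{ id: string; name: string; fitScore: number }>

// Hypothetical tool: name, parameters, and texts are illustrative only.
server.tool(
  "search_products",
  "Search the product catalog and return the best-fitting products for the user.",
  {
    query: z.string().describe("Free-text description of what the user is looking for"),
    maxResults: z.number().int().positive().default(5).describe("Maximum number of products to return"),
  },
  async ({ query, maxResults }) => {
    const products = await searchProducts(query, maxResults)
    return {
      content: [
        {
          type: "text" as const,
          text: JSON.stringify({
            products,
            // Hint for the LLM on how to present the results.
            instructions:
              "Summarize each product in one sentence and mention its fit in plain language.",
          }),
        },
      ],
    }
  },
)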

Even after providing all these instructions and SCREAMING at the LLMs, we found them occasionally going off the rails, which I think is to be expected with a stochastic parrot.

Since our MCP needs to persist user context across chat messages (i.e., it is stateful), we had to rely on MCP sessions. The server sets a Session ID on the initial handshake and expects the client (the LLM provider’s application) to send it along with future messages. We store this Session ID alongside our internal user identifiers in a Redis server.
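
A minimal sketch of that bookkeeping with ioredis follows; the key layout, TTL, and helper names are assumptions for illustration, not our exact implementation.

import Redis from "ioredis"

const redis = new Redis(process.env.REDIS_URL ?? "redis://localhost:6379")

// Hypothetical key layout: session:<mcpSessionId> -> internal user id.
const SESSION_TTL_SECONDS = 60 * 60 * 24 // assume sessions expire after a day

export const saveSession = async (mcpSessionId: string, userId: string) => {
  await redis.set(`session:${mcpSessionId}`, userId, "EX", SESSION_TTL_SECONDS)
}

export const resolveUser = async (mcpSessionId: string) => {
  // Returns null when the client sends an unknown (or dropped) Session ID.
  return redis.get(`session:${mcpSessionId}`)
}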

While this session persistence works great in theory, we found that Claude repeatedly drops the Session ID in the middle of the chat, causing a bad UX. We still have to investigate whether the problem is on Claude’s end or in our distributed MCP server and Redis Cluster setup.

The @modelcontextprotocol/sdk library provides a great DX, although it is quite bare-bones. Since our MCP server is HTTP-based (and not stdio), we had to rely on ngrok port forwarding to expose our MCP server to the internet and try it with the OpenAI platform and Claude web.

Now we use the MCP for the same chatbot that we built last year. This chatbot is a part of the True Fit app (available in the US for Android and iOS — go check it out!) and helps our customers discover the best-fitting products for them.

2. Data Normalization

Since LLMs are trained on large corpora of web data, they encode common knowledge. This makes them good at zero-shot and few-shot classification or generation tasks. Paired with their structured output capability, they can serve as efficient data classification or transformation models.

We use LLMs heavily to normalize data, map one data point to another, and derive attributes. While we invest heavily in prompts and local testing, these models reveal their quirks in production, where inputs are far more diverse. We observed LLM output changing drastically, and sometimes entering an infinite token loop, even when the input changes only slightly. We now catch such errors and retry with a slightly adjusted input.
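
The retry itself is nothing fancy. A simplified sketch of the pattern (the perturbation and error handling here are placeholders for whatever adjustment works in a given pipeline):

// Hypothetical helper: retries an LLM call, slightly perturbing the input each time.
const normalizeWithRetry = async (
  input: string,
  callLlm: (prompt: string) => Promise<string>,
  maxAttempts = 3,
): Promise<string> => {
  let lastError: unknown
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    // A tiny, harmless perturbation (e.g. trailing whitespace) to nudge the model
    // out of degenerate outputs such as infinite token loops.
    const prompt = attempt === 0 ? input : `${input}${" ".repeat(attempt)}`
    try {
      return await callLlm(prompt)
    } catch (error) {
      lastError = error
    }
  }
  throw lastError
}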

When we generate large JSON outputs, we also hit the output token limit. This clips the LLM output and results in incomplete JSON. To handle that, we now have a chunkify<T>(array: T[], n: number): T[][] utility. It returns smaller chunks of the original array, and we invoke the LLM on each chunk individually.
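
A minimal sketch of that helper:

// Split an array into chunks of at most n elements each.
const chunkify = <T>(array: T[], n: number): T[][] => {
  const chunks: T[][] = []
  for (let i = 0; i < array.length; i += n) {
    chunks.push(array.slice(i, i + n))
  }
  return chunks
}

// Illustrative usage: invoke the LLM per chunk and merge the partial outputs.
// const partials = await Promise.all(chunkify(items, 50).map(normalizeChunk))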

Even with chunking and randomized retries, we still see LLMs hallucinating a lot on our tasks. While a model upgrade solves some of the issues, it is not a silver bullet.

I personally feel restricted token generation in JSON mode limits some of the creativity of the LLM. I plan to use a new object notation such as TOON (Token Oriented Object Notation), which also saves token cost as a bonus.

3. AI-UIs

Vibecoding became popular this year, with large companies reporting that a good chunk of their code is AI-generated. We wanted to build an internal UI to debug one of our internal applications, and planned to give AI a try, since we had limited resources to build it on our own.

We turned to Vercel’s v0 for help and got started quickly. We shared our UI idea, API endpoints, and request/response schemas, and watched it magically generate the UI.

Since we did not want to host Next.js and were content with react-router-dom, we manually converted the API calls to fit its loader and action abstractions. We also found that the code generated by v0 needed a lot of editing to match our requirements.
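
For reference, a stripped-down sketch of what one of those converted routes looks like with react-router-dom; the route path, endpoint, and data shape are made up for illustration.

import { createBrowserRouter, useLoaderData, type LoaderFunctionArgs } from "react-router-dom"

// Hypothetical debug route: the endpoint and response shape are placeholders.
type DebugInfo = { jobId: string; status: string }

const debugLoader = async ({ params }: LoaderFunctionArgs) => {
  const response = await fetch(`/api/debug/${params.jobId}`)
  return (await response.json()) as DebugInfo[]
}

const DebugPage = () => {
  const rows = useLoaderData() as DebugInfo[]
  return (
    <ul>
      {rows.map((row) => (
        <li key={row.jobId}>
          {row.jobId}: {row.status}
        </li>
      ))}
    </ul>
  )
}

export const router = createBrowserRouter([
  { path: "/debug/:jobId", element: <DebugPage />, loader: debugLoader },
])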

In the end, we started using v0 in a very different way. We ask it to generate isolated components, which we then integrate into our codebase. This is our default strategy now, and it’s working great.

AI code generation has come a long way since we first tried v0. We plan to try it in future and current projects to get a speed boost.


I did work on a lot of other things apart from AI this year. Battling with LLMs via prompts is bound to make someone mad. Prompts do not behave like code — they are always stochastic. In the following sections, I will give a walkthrough of the non-AI things I did this year.

4. Distributed Systems

We host all of our systems in Kubernetes, gradually deprecating our in-house alternative to it. Since we need to run more than one pod to handle the load, our systems naturally become distributed.

Most of the time, we keep the pods operationally independent and stateless, delegating state to the database. There are still cases where we want these pods to collaborate. For example, in our LLM data processing pipeline, we did not want the LLM to process the same input twice. Storing the result in a DB and checking it before processing a new input is a solution, but it falls short when duplicate inputs arrive at the same time.

To solve this, we implemented distributed locking via Redis using the redlock algorithm. Redlock intelligently handles the Redis cluster case as well, making it an ideal candidate for our use case. We also use Redis to stay within the LLM provider’s token and request limits using the bottleneck library. Now our distributed system is backed by a distributed set of Redis nodes in the cluster, making it truly distributed!
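
A condensed sketch of the locking pattern, assuming redlock v5's acquire/release API; the resource key and TTL are illustrative, and the bottleneck rate limiting is not shown.

import Redis from "ioredis"
import Redlock from "redlock"

// One client per Redis node; a single node also works for local testing.
const redlock = new Redlock([new Redis("redis://localhost:6379")], {
  retryCount: 0, // do not busy-wait on the lock; we requeue instead (see next section)
})

// Hypothetical guard: process each LLM input at most once at a time.
const processOnce = async (inputId: string, process: () => Promise<void>) => {
  const lock = await redlock.acquire([`locks:llm-input:${inputId}`], 30_000) // 30s TTL
  try {
    await process()
  } finally {
    await lock.release()
  }
}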

5. MQ Shenanigans Again

When we take a lock as described above, we don’t want to make our pod busy-wait on the lock. Instead, we planned to requeue the messages so they get another chance to be processed. We could not simply drop the messages in our case, as every input can hold some extra information that might be useful.

We use RabbitMQ, and I implemented requeueing using AMQP’s requeue mechanism, thinking it would do the job. However, it quickly brought the system down, including the RabbitMQ broker. It turns out that native requeue makes the messages immediately available to consumers, as opposed to putting them at the end of the queue. This caused high load on the brokers and the pods alike.

Enlightened, I changed the requeue mechanism to publish back to the same queue, thinking it would solve the problem. However, message delivery slowed down a lot, as the queue was getting busy handling incoming retry messages. We had to switch to a proper dead-letter queue setup to overcome this. The setup works like this:

  1. We create a main queue work.queue and an associated exchange work.exchange.
  2. We create a retry queue retry.queue with a dead-letter exchange set to work.exchange and a message TTL set. We also create an associated exchange retry.exchange.
  3. For the message that we want to retry, we manually publish to retry.exchange, which gets routed to retry.queue.
  4. The retry.queue sends the message back to work.exchange once the message outlives the TTL.

Many setups avoid step 3 by setting up a dead-letter exchange for work.queue as well, which makes all nacked messages go to the dead-letter exchange. However, we could not redefine work.queue since we already had it set up. This setup now gives us maximum control over when to retry, while still preserving backward compatibility.
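
A sketch of that topology with amqplib; the TTL, routing keys, and durability flags are assumptions, while the queue and exchange names match the ones above.

import amqp from "amqplib"

const setupRetryTopology = async (url: string) => {
  const connection = await amqp.connect(url)
  const channel = await connection.createChannel()

  // 1. Main queue and its exchange.
  await channel.assertExchange("work.exchange", "direct", { durable: true })
  await channel.assertQueue("work.queue", { durable: true })
  await channel.bindQueue("work.queue", "work.exchange", "work")

  // 2. Retry queue: expired messages are dead-lettered back into work.exchange.
  await channel.assertExchange("retry.exchange", "direct", { durable: true })
  await channel.assertQueue("retry.queue", {
    durable: true,
    deadLetterExchange: "work.exchange",
    deadLetterRoutingKey: "work",
    messageTtl: 30_000, // assumed retry delay of 30 seconds
  })
  await channel.bindQueue("retry.queue", "retry.exchange", "retry")

  // 3. To retry a message, publish it to retry.exchange instead of nacking it.
  const scheduleRetry = (payload: Buffer) =>
    channel.publish("retry.exchange", "retry", payload, { persistent: true })

  return { channel, scheduleRetry }
}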

6. A tale of the Databases

All the data pipelines and enrichment that I mentioned above need to go to some destination. The destination database can vary depending on the application it is used for. We use PostgreSQL and Scylla for real-time OLTP databases, and BigQuery for system-wide insights, where we can join tables from multiple databases. For the MCP and the mobile app, we had another requirement: fast analytical and search queries.

Traditionally, search queries were performed using full-text search, and there are a lot of specialized databases to handle them, with Elasticsearch being a popular one. With the advent of embedding models, we can also do similarity search using vector distances. When we want to combine full-text search, similarity search, and optional filters, the problem becomes hybrid search, which we solved using ClickHouse last year.

To handle B2C search traffic for the app, we started exploring a proper search-optimized database this year. Elasticsearch was an obvious choice, but we refrained from using it at first to explore other alternatives. Since our team is SQL-native, we gave Manticore Search a try, which has excellent hybrid search capabilities and runs a MySQL-compatible protocol.

All the infrastructure and cluster setup for Manticore went smoothly. However, we found the cluster becoming unstable as a result of the Galera synchronization algorithm that it uses, and due to our heavy ingest + vector index load. We ended up moving ahead with an Elasticsearch deployment using their Kubernetes operator.

Translating an SQL-based ranking algorithm to Elasticsearch is a big task. We could not complete that rewrite due to a time crunch, so our ClickHouse setup is still happily serving app search and MCP searches. We are also considering going all-in on ClickHouse for search, as we greatly enjoy its SQL support.

7. Self-Service Infrastructure

We use GitOps-based Kubernetes workload management with FluxCD and Kustomization. Along with our applications, we also manage self-hosted infrastructure such as Elasticsearch and ClickHouse with FluxCD itself. With this setup, people can now create and deploy a new microservice in a day or two.

FluxCD supports Kustomization overlays, allowing us to override config per environment if needed. We have base, prod, and dev overlays. To release a microservice, we used to patch the deployment.yaml in the prod or dev overlay, which is cumbersome to do quickly. To mitigate that, I introduced Kustomization components called prod-images and dev-images, which override the image version in the respective overlays. This has centralized image tag management and made releases smoother.

We still rely on Terraform to manage secrets for these applications. We are currently exploring secret management solutions such as Vault, which can provide custom Kubernetes resources to automatically generate secrets that can be injected into pods. Encrypted secrets in git through sealed-secrets is also an option. If you have any suggestions on how to manage secrets with automatic propagation, please do reach out!

8. Doing Math in JS

Last year, I wrote about doing efficient tensor operations in JS using TensorFlow.js. We continue to use TensorFlow.js heavily, and it powers our core engines.

While working with TensorFlow, we now make use of broadcasting instead of worrying about making tensor shapes equal in many cases. For example, if we have a matrix A and want to add 1 to all elements, we can do it in the following ways—with and without broadcasting:

import * as tf from "@tensorflow/tfjs-node"

const A = tf.tensor([
  [1, 2, 3],
  [4, 5, 6],
  [7, 8, 9],
])
const APlusOneWithoutBroadcasting = A.add(tf.onesLike(A))
const APlusOneWithBroadcasting = A.add(1)

Broadcasting makes our code easy to read and follow. It can also make refactoring easier.

When we started with TensorFlow.js, we did not know that tensors are not garbage-collected. Since Kubernetes managed the deployment, it kept restarting pods with OOMKilled whenever they hit their memory limits. The problem surfaced when load and restarts coincided, which drove up service response latency and triggered alerts.

We now use a combination of tf.tidy and tf.dispose to manage memory. Our usage pattern looks like this:

const computeUsingTensors = (
  inputs: number[] /* can be any number of dimensions */,
  others: any[],
) => {
  const resultTensor = tf.tidy(() => {
    const inputTensor = tf.tensor(inputs)
    // Do tensor ops here and return the final tensor, e.g.:
    const output = inputTensor.add(1)
    // All intermediate tensors except the returned one are disposed
    // once tidy exits
    return output
  })
  const result = resultTensor.arraySync()
  // Manually dispose the result tensor now
  resultTensor.dispose()
  return result
}

If you carefully read the code above, you’ll notice that .arraySync() is a blocking operation. It blocks the main thread until its results are ready. On top of that, all tensor operations like .add() in a Node.js backend are blocking as well. We learned this the hard way when our pods started failing the readiness probe HTTP checks and were being restarted by Kubernetes, again causing latency spikes and alerts.

We could either convert .arraySync() to .array(), coloring all functions async, or delegate the tensor operations to dedicated worker threads and implement message-passing between the main and worker threads. Both approaches would have made the code convoluted and/or introduced a lot of thread management code.
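
For completeness, the async variant we considered would have looked roughly like this (a sketch, not what we shipped):

import * as tf from "@tensorflow/tfjs-node"

// .array() returns a promise instead of blocking like .arraySync().
// The tensor ops themselves still run synchronously on the tfjs-node backend.
const computeUsingTensorsAsync = async (inputs: number[]) => {
  const resultTensor = tf.tidy(() => tf.tensor(inputs).add(1))
  const result = await resultTensor.array()
  resultTensor.dispose()
  return result
}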

We decided to pursue a simpler path instead: we scale our deployment up to keep the target CPU utilization at 40%. This is a number we found experimentally, and it is working great in production now. We give each pod 1 vCPU and 1 GB RAM, as Node.js cannot use more than 1 vCPU since it is single-threaded. At peak load, our service can scale to ~256 pods, while still consuming fewer resources than our legacy Scala-based JVM engine.

9. Work in Progress: Configurable UI

When you are a B2B2C business, where the end users of your business partners see and interact with your UI, customization becomes paramount. After all, one size does not fit all. UI customization is driven by a configuration that differs per business partner, according to their needs. We can think of customization at three levels:

  1. Basic customization: Theming through CSS variables (for example: colors, radii, etc.). Turning a feature on or off. The code has if or switch conditions everywhere to accommodate these customizations. Many companies provide this level of customization.
  2. Everything is customizable: You provide a set of APIs, maybe a JS client to encapsulate them. Your business partners define how they want the UI to be. Maybe you can provide them with several pre-built templates. Shopify allows this level of customization through their liquid templating language.
  3. The Middle Way: Build a set of customizable components, and allow your business partners to mix and match them. This is the approach we chose, as we wanted more flexibility than approach #1, and #2 was too much to build.

The idea of the “Middle Way” is to give business partners the flexibility to customize the UI to their needs; if they need something beyond what the existing components allow, we step in and build a new component tailored to them.
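
To make the idea concrete, here is a very rough sketch of what a per-partner configuration could look like. The component names and fields are hypothetical; the real schema is still being designed.

// Hypothetical per-partner UI configuration for the "Middle Way" approach.
type PartnerUiConfig = {
  theme: {
    // Level 1: basic theming via CSS variables.
    primaryColor: string
    borderRadius: string
  }
  // Level 3: an ordered list of pre-built components the partner wants to render.
  layout: Array<
    | { component: "size-recommendation"; showConfidence: boolean }
    | { component: "fit-summary"; tone: "concise" | "detailed" }
    | { component: "similar-products"; maxItems: number }
  >
}

const exampleConfig: PartnerUiConfig = {
  theme: { primaryColor: "#0a7", borderRadius: "8px" },
  layout: [
    { component: "size-recommendation", showConfidence: true },
    { component: "similar-products", maxItems: 4 },
  ],
}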

This project is currently a work in progress. Expect an article from me early next year on how far we can go with the Middle Way.

10. Getting things done, fast

Software engineering is not just programming and closing tickets. It involves close collaboration with other stakeholders and teams to keep things moving smoothly. One also needs to think ahead, setting goals and designing and architecting systems for the future.

We built a data onboarding system three years ago. While building the system and getting it working was easy, moving our business partners onto it was hard. We started a migration effort soon after we got the system into a solid state, but it stalled shortly afterwards due to data format incompatibilities. This year, we decided to revisit the migration, as the cost of maintaining two systems was straining us a lot.

Lots of things have changed in the last two years, including data formats. We had a team of three: me for technical assistance and fixing bugs as needed, one person for doing the actual groundwork, and another to prioritize and keep things in check. We completed the migration in a miraculous two months, migrating 96% of our partners’ data into the new system. The remaining ones pose a business challenge, since their technical development cycles run slow. We are planning to onboard them to an end-user-side data collection mechanism that I mentioned in another article.

While this story is about teamwork and getting things done, it also reflects another core aspect: mindset. If one is not willing to adapt to new things, systems become rusty pretty fast, and that rust shows up everywhere, from operational efficiency to monetary cost. A colleague recommended the book The Five Dysfunctions of a Team to me. It’s a great story about how mindset and building trust can remove a lot of friction, uncertainty, and fear. I recommend that everyone working in a team give it a read!

11. Thoughts on React

I will dedicate this section to one of the primary frontend technologies that much of the world, including us, uses.

I was introduced to React while setting up my personal blog. I was running on Jekyll and wanted reusable “templates” that could be parameterized with arguments, hopefully typed ones. React was a perfect fit. It was still in its class-based era; while I disliked classes, it also supported functional components, which was exactly what I needed.

When working on real-world applications, we certainly need state management. The useState hook made perfect sense. The useEffect hook took some time to adjust to, but the article “You Might Not Need an Effect” by Dan cleared everything up. All was good until then.

When we started using React heavily in our stack and were fetching data inside components (yeah, we shouldn’t have done that; we now use react-router-dom’s loader and action abstractions for that), the need for useCallback and useMemo became evident. The fact that object references are not stable across renders took some adjusting to as well. But it was not bad. Maybe the React Compiler will help us here.

We made a conscious decision not to use React context for global state management, instead opting for fine-grained reactivity with Recoil, a project by Meta that is now deprecated. However, when interfacing with non-React JS libraries, we still have to deal with useRefs and call their APIs to keep them out of React’s re-render cycle.

Now React has entered the server side as well, replacing the JSON-API style of wire communication with streaming components and deltas. We have not adapted to this model yet, and might not do that in the near future. We got saved from useVulnerabilities this way!

Our way of React now sticks to the basics: client-side only, fine-grained reactivity with jotai, data fetching via React Router, and styling via Tailwind (or a component library if needed). This approach is sustainable for us for now.
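
For illustration, this is the kind of fine-grained state we mean; a toy example, not one of our actual atoms.

import { atom, useAtom } from "jotai"

// Only components that read selectedSizeAtom re-render when it changes.
const selectedSizeAtom = atom<string | null>(null)

const SizePicker = ({ sizes }: { sizes: string[] }) => {
  const [selected, setSelected] = useAtom(selectedSizeAtom)
  return (
    <div>
      {sizes.map((size) => (
        <button key={size} onClick={() => setSelected(size)}>
          {size === selected ? `[x] ${size}` : size}
        </button>
      ))}
    </div>
  )
}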

Having seen so many changes in the React ecosystem makes me wonder: is it all really needed? I’m now willing to try JSX-based libraries like Solid.js, or even Svelte, when evaluating frontend tech choices. Given the current state of React Server Components, it might be worthwhile to try HTMX, Laravel, or Ruby on Rails too.

React’s ecosystem keeps us with React, and we keep that sustainable by not using it too much. A wise person once recommended keeping all business logic in plain JS and using UI libraries only for the UI. That is what we do now.

12. Personal Projects and Ideas

Apart from work, I keep myself involved with a bunch of personal projects to satisfy needs as they come. In this section, I will summarize some of them, their status, and where I might need your help.


Looking forward

Although this year moved fast, like the others, I got a lot of good things done, thanks to the support of an amazing team, friends, and family. Today (21 Dec 2025) is the Winter Solstice, and I’m looking forward to the next year with hope and optimism.

Happy New Year in advance! Summer is coming.

TL;DR

I discuss what I did at work this year in 12 sections, summarized as:

  1. MCP: We built an MCP for the search engine we developed last year. I go through the technical details and challenges here.
  2. LLM data normalization: Usual challenges, and dealing with token limits and hallucinations.
  3. AI UIs: Our attempt at “vibecoding” an internal UI using Vercel’s v0, and the lessons learned.
  4. Distributed systems: Synchronization, locks, and bottlenecks!
  5. MQ shenanigans again: Did you think RabbitMQ would let us sleep peacefully?
  6. A tale of the databases: A lot of data—where to store it? And we need a fault-tolerant cluster too!
  7. Self-service infrastructure: How Flux is shining, and how we optimize it for our use cases.
  8. Doing math in JS: High-performance tensor operations, and caveats learned along the way.
  9. Configurable UI: When your customers ask for a lot of customization, what if you let them do it instead of changing the code yourself?!
  10. Getting things done, fast: As usual, a story of a positive outlook and team spirit.
  11. Thoughts on React: Or, is React becoming too complicated?
  12. Personal projects and ideas: What am I doing outside of work?