Project Morpheus

How I built a fluent interface to transform spreadsheet data into JSON that can be imported via the APIs.

TL;DR

B2B companies often face the problem of data migration during customer onboarding. While the company has a standard data model defined, it cannot expect customers to bring in data in the same format. To handle this problem, one can use a one-time script to transform the data. However, this solution has a lot of drawbacks:

In this article, we shall explore how I built a simple solution to handle these problems, one that is easy to extend.

Go-To Solutions

Before building software to solve a problem ourselves, it is always good to look at the options available in the market and check whether we can trade money for engineering effort. Since B2B customer onboarding is a common problem, there were already a few established SaaS companies providing solutions for it.

After exploring some other solutions such as Dromo, I decided to go with a custom solution.

The Solution

The solution was named Morpheus after a “Matrix” character and the Greek god of dreams. The interesting part of the word was “morph” (to transform), which is exactly what the project would do to the spreadsheet input data.

The project consisted of two parts — one to import JSON data that is already in the format required by the POST APIs, and another to convert the incoming spreadsheet data into that standard JSON format. A validator was placed between these steps to ensure that the data was in the format needed by the API.

Importer

The importer tool is responsible for uploading the data present in the JSON files to the APIs. The challenge here comes from the references between entities. To give a concrete example, let’s assume the following scenario:

We do not know the UUID of the Student resource until it exists in the backend; thus, the JSON input file has to contain some other unique field of the student. The code must then call the GET endpoint and return the UUID of the resource.

To handle such scenarios, I created a Model class in the code, responsible for keeping track of dependencies and resolving them. The class looked like this:

type Reference = {
  model: Model
  field: string
  oneToMany: boolean
}

type SearchParams = Record<string, string>

class Model {
  constructor(
    public name: string,
    private args: {
      getEndpoint: string
      postEndpoint: string
      references: Reference[]
    },
  ) {
    // initialize all properties
  }

  async getOne(params: SearchParams): Promise<string> {
    // return the ID of the object matching the search criteria
  }

  async createOne(obj: {}): Promise<string> {
    // create one object and return its ID
  }

  async resolveDependencies(obj: {}): Promise<{}> {
    // takes in a value from JSON, returns the object with resolved dependencies
    // the returned object can be used in `createOne()`
  }
}

The getOne() method converts SearchParams into a query string, calls the GET API, and handles the cases where multiple objects or no objects are returned.

Similarly, the createOne() method calls the POST API to persist the object and returns its ID once it is created.
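As a sketch, getOne() and createOne() might be implemented along these lines. The endpoint response shapes here (a JSON array for searches, an `id` field on created objects) are assumptions for illustration, not the actual API:

```typescript
type SearchParams = Record<string, string>

// Pure helper: turn SearchParams into a query string such as "?email=a%40b.com"
function buildQueryString(params: SearchParams): string {
  const qs = new URLSearchParams(params).toString()
  return qs ? `?${qs}` : ""
}

async function getOne(getEndpoint: string, params: SearchParams): Promise<string> {
  const response = await fetch(`${getEndpoint}${buildQueryString(params)}`)
  const results: { id: string }[] = await response.json()
  // Guard against empty or ambiguous search results
  if (results.length === 0)
    throw new Error(`no object matches ${JSON.stringify(params)}`)
  if (results.length > 1)
    throw new Error(`multiple objects match ${JSON.stringify(params)}`)
  return results[0].id
}

async function createOne(postEndpoint: string, obj: object): Promise<string> {
  const response = await fetch(postEndpoint, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(obj),
  })
  const created: { id: string } = await response.json()
  return created.id
}
```

Throwing on ambiguous matches keeps the importer honest: a search field that is not actually unique fails loudly instead of silently picking the first result.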

The method of interest here is resolveDependencies(). It walks through all references, reads the value from the field, and replaces it with IDs obtained from the getOne() method. For this method to work, the resources have to be imported in dependency order (here, Course must be imported after Student). I used jsonpath for extra flexibility when accessing the field. Using the Model class, we can define the Student and Course models as:

const student = new Model("Student", {
  getEndpoint: "https://example.com/api/student",
  postEndpoint: "https://example.com/api/student",
  references: [],
})

const course = new Model("Course", {
  getEndpoint: "https://example.com/api/course",
  postEndpoint: "https://example.com/api/course",
  references: [
    {
      model: student,
      field: "$.students",
      oneToMany: true,
    },
  ],
})
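A minimal sketch of the dependency-resolution step. For brevity it only handles top-level "$.field" paths (the real code used jsonpath for this), and it takes a lookup function as a parameter instead of a full Model, so the shapes here are illustrative rather than the actual implementation:

```typescript
type SearchParams = Record<string, string>

type Reference = {
  field: string // e.g. "$.students"
  oneToMany: boolean
  // stand-in for Model.getOne(): resolves search params to an ID
  lookup: (params: SearchParams) => Promise<string>
}

async function resolveDependencies(
  obj: Record<string, unknown>,
  references: Reference[],
): Promise<Record<string, unknown>> {
  const resolved = { ...obj }
  for (const ref of references) {
    const key = ref.field.replace(/^\$\./, "") // "$.students" -> "students"
    const value = resolved[key]
    if (ref.oneToMany) {
      // Replace an array of search objects with an array of IDs
      resolved[key] = await Promise.all(
        (value as SearchParams[]).map((params) => ref.lookup(params)),
      )
    } else {
      // Replace a single search object with its ID
      resolved[key] = await ref.lookup(value as SearchParams)
    }
  }
  return resolved
}
```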

Now, we can start importing the objects. Let’s say we have the following data.

const students = [
  {
    email: "alex@uni.edu",
    name: "Alex Jones",
  },
  {
    email: "bob@uni.edu",
    name: "Bob Harris",
  },
  {
    email: "chris@uni.edu",
    name: "Chris Muller",
  },
]

const courses = [
  {
    name: "Intro to CS",
    id: "CS101",
    students: [
      {
        email: "alex@uni.edu",
      },
      {
        email: "bob@uni.edu",
      },
    ],
  },
  {
    name: "Fundamentals of Programming",
    id: "CS102",
    students: [
      {
        email: "chris@uni.edu",
      },
    ],
  },
]

Creating the students is straightforward. We call student.createOne() for all student objects.

await Promise.all(
  students.map((studentObject) => student.createOne(studentObject)),
)

Before creating the courses, however, we need to resolve student IDs.

const resolvedCourses = await Promise.all(
  courses.map((courseObject) => course.resolveDependencies(courseObject)),
)

The resolvedCourses array looks like this:

[
  {
    name: "Intro to CS",
    id: "CS101",
    students: ["uuid1", "uuid2"],
  },
  {
    name: "Fundamentals of Programming",
    id: "CS102",
    students: ["uuid3"],
  },
]

Now, we can import the resolvedCourses as usual.

await Promise.all(
  resolvedCourses.map((courseObject) => course.createOne(courseObject)),
)

Note: The Model class only takes care of handling references and creating and getting objects. It is still your responsibility to ensure that the other fields are valid.

Transformer

Once the importer tool was ready, it was time to build the Transformer tool. Assuming we have student names in student.xlsx in a sheet called names, and an API to query their email addresses by name (a contrived example, please bear with me :sweat_smile:), the Transformer would look as follows:

const studentTransformer = new Transformer()
  .field("name", (data) => data["Student Name"])
  .field("email", async (data) =>
    fetch(`https://uni.edu/api/email?name=${data["Student Name"]}`)
      .then((response) => response.json())
      .then((res) => res.email),
  )
  .input("student.xlsx", { sheet: "names" })
  .output("student.json")

await studentTransformer.transform()

The field(name: string, action: FieldAction) method defines the JSON field name and the action that computes the value for that field. Actions can return objects and arrays, making nested structures possible. The action handlers support both sync and async operations, and have the following type:

type DataRow = Record<string, string>

type Output<T> = T | Promise<T>

type FieldAction<T> = (dataRow: DataRow) => Output<T>
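To make the mechanics concrete, here is a minimal, hypothetical sketch of the fluent core: field() collects (name, action) pairs and transform() applies them to each row. The real tool reads rows from .xlsx and writes a JSON file; this sketch takes rows directly and uses a non-generic FieldAction for brevity:

```typescript
type DataRow = Record<string, string>
type FieldAction = (dataRow: DataRow) => unknown | Promise<unknown>

class MiniTransformer {
  private fields: Array<[string, FieldAction]> = []

  field(name: string, action: FieldAction): this {
    this.fields.push([name, action])
    return this // returning `this` is what makes the interface fluent
  }

  async transform(rows: DataRow[]): Promise<Record<string, unknown>[]> {
    return Promise.all(
      rows.map(async (row) => {
        const out: Record<string, unknown> = {}
        for (const [name, action] of this.fields) {
          out[name] = await action(row) // works for both sync and async actions
        }
        return out
      }),
    )
  }
}
```

Awaiting every action uniformly is what lets the same `field()` accept both plain and async handlers without two code paths.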

When working with real data, I came across the need to skip some rows based on a condition. This led to the addition of a filter(predicate: Predicate) method in the fluent interface, with Predicate defined as:

type Predicate = (dataRow: DataRow) => Output<boolean>
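As an illustration of what filter() does internally, rows whose (possibly async) predicate resolves to false are dropped before the field actions run. The "Status" column below is a hypothetical spreadsheet header, and applyFilter is a sketch, not the actual implementation:

```typescript
type DataRow = Record<string, string>
type Predicate = (dataRow: DataRow) => boolean | Promise<boolean>

// Keep only the rows for which the predicate resolves to true
async function applyFilter(rows: DataRow[], predicate: Predicate): Promise<DataRow[]> {
  const keep = await Promise.all(rows.map((row) => predicate(row)))
  return rows.filter((_, i) => keep[i])
}

// Hypothetical usage: skip students marked inactive in a "Status" column
const activeOnly: Predicate = (data) => data["Status"] !== "inactive"
```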

Continuing the Students and Courses example, let’s assume courses.xlsx contains a long list of Course IDs and Course Names, duplicated for every student taking the course. To generate courses.json, we need a grouping rather than a 1-to-1 mapping. For this, I created a groupBy(action: GroupByAction, aggregateFields: string[]) method, with GroupByAction defined as:

type GroupByAction = (dataRow: DataRow) => Output<string[]>

The GroupByAction returns a list of strings, ideally picked from the dataRow. The transformer then uses the returned strings as keys, grouping the results returned by the FieldActions of the fields in aggregateFields into an array. Fields not in aggregateFields are computed only once per group. With the groupBy() method in place, we can write the transformer for Course:

const courseTransformer = new Transformer()
  .groupBy((data) => [data["Course ID"]], ["students"])
  .field("name", (data) => data["Course Name"])
  .field("id", (data) => data["Course ID"])
  .field("students", async (data) => ({
    email: await fetch(`https://uni.edu/api/email?name=${data["Student Name"]}`)
      .then((response) => response.json())
      .then((res) => res.email),
  }))
  .input("courses.xlsx", { sheet: "course" })
  .output("courses.json")

await courseTransformer.transform()

In the above example, the FieldAction for the "id" field runs only once per course, and the results from the FieldAction for "students" are aggregated into an array.
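The grouping step can be sketched as a standalone function (illustrative, not the actual implementation): rows sharing a key are merged, the aggregateFields become arrays, and the remaining fields keep the value from the first row of the group.

```typescript
type Row = Record<string, unknown>

function groupRows(
  rows: Row[],
  keyOf: (row: Row) => string, // stand-in for GroupByAction, single-key for brevity
  aggregateFields: string[],
): Row[] {
  const groups = new Map<string, Row>()
  for (const row of rows) {
    const key = keyOf(row)
    const existing = groups.get(key)
    if (!existing) {
      // First row of the group: wrap aggregated fields in arrays
      const fresh: Row = { ...row }
      for (const f of aggregateFields) fresh[f] = [row[f]]
      groups.set(key, fresh)
    } else {
      // Later rows only contribute to the aggregated fields
      for (const f of aggregateFields) (existing[f] as unknown[]).push(row[f])
    }
  }
  return [...groups.values()]
}
```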

Validator

Once the transformer outputs the JSON file, and before importing it, it was necessary to check whether the objects follow the API specification, since the importer only takes care of resolving references. I chose zod, which I had come across on GitHub Trending, to define the schemas and validate the JSON files. With zod, the schema definitions for our Student and Course would look like:

import z from "zod"

const student = z.object({
  name: z.string(),
  email: z.string().email(),
})

const course = z.object({
  name: z.string(),
  id: z.string().regex(/CS\d{3}/),
  students: z.array(student.partial()).nonempty(),
})

And we can do validation as:

const result = student.safeParse(obj)

The Interface

The project Morpheus had a CLI interface, built using commander.

For the Import tool, I defined all backend models in models.ts and exported them in a Map<string, Model>. The same was done for transformers, with transformer.ts exporting a Map<string, Transformer>; each customer requires its own transformer.ts. The maps are keyed by model name, which made it possible to choose a model by name in the CLI and get the corresponding transformer or importer.

If an error occurs during processing, the program does not halt; instead, it writes the error-causing object and the error message to an error file (specified on the command line) and continues working.

Once the process has finished, one can inspect the error file, fix the errors, and extract the objects back using the clean subcommand. These objects can then be fed into the tool again, or sent back to the customer for clarification.

Impacts

This project had a huge impact on customer onboarding. Specifically, it

Apart from customer onboarding, I used the Import tool heavily in combination with Mockaroo to create dummy data for product demos to potential customers. Thanks to the Import tool, I created around 10 such environments, incrementally gathering feedback from the business :smile:.

Learnings

Since I handled this project from initialization to the first milestone, I learned how to run a project, prioritize features based on business requirements, and communicate them to all the stakeholders. Apart from coding, I documented the project and gathered feedback from other developers on code quality and interface usability, which helped me understand the project from different perspectives.
