A guide to validating data (with Joi)

When and how to use Joi as a validation library for JS/TS based projects

When do you need validation?

Whenever you deal with large structured data (think JS objects or JSON payloads with more than three or four fields), you will often want to verify that the data has the shape you expect.

For a real-world use-case, say you are building a server with an HTTP POST endpoint for storing information about all the pets in your neighborhood. After researching, say you ended up with the following structure:

{
  "name": "Popo",
  "age": 4,
  "species": "DOG",
  "breed": "POMERANIAN",
  "color": ["White", "Brown"],
  "food": {
    "name": "Milk",
    "type": "Vegetarian"
  },
  "comments": "Certainly the cutest dog in the next 5-mile radius"
}

If you wanted to build a website with a form for entering this information, you could map each field to a form element: name as a text input, age as a number input, species as a dropdown, breed as a dropdown that depends on the selected species, color as one or more values, food as a nested object, and so on. On submitting the form, the website calls your HTTP server with this data, and the server finally stores it in your database.

What could go wrong?

As you might have expected, quite a lot. The code running on your website can be modified by any visitor, so besides validating the form inputs before sending them, you also need to validate what actually arrives at the server. Similarly, if you are building an open API on which others build a website or a native app (Android/iOS/macOS/Windows), you need to validate the data sent by these client apps before storing it in your database. These validations can be loose structural checks, such as making sure that keys like breed and age are present in the JSON, or much tighter ones, such as checking for every key you expect and requiring that age is a number below a sane upper limit, say 100.
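To make the difference between loose and tight validation concrete, here is a minimal plain-JavaScript sketch. The function names looseCheck and tightCheck are made up for this illustration, and the rules are only the ones mentioned above (breed and age present, age a number below 100).

```javascript
// Loose check: only confirms that the expected keys exist.
function looseCheck(pet) {
  return typeof pet === 'object' && pet !== null && 'breed' in pet && 'age' in pet;
}

// Tight check: also verifies the type and range of each value.
function tightCheck(pet) {
  return (
    looseCheck(pet) &&
    typeof pet.breed === 'string' &&
    typeof pet.age === 'number' &&
    pet.age >= 0 &&
    pet.age < 100
  );
}

const suspiciousPet = { breed: 'POMERANIAN', age: 'four' };
console.log(looseCheck(suspiciousPet)); // true, both keys are present
console.log(tightCheck(suspiciousPet)); // false, age is not a number
```

The loose check is cheap to write but lets bad values through; the tight check catches them at the cost of more conditions to maintain.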

A rudimentary approach

If you only had to deal with a few (2-3) fields in your project, you might already be doing this using conditionals.

Something like:

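if (typeof pet !== 'object' || pet === null) {
  console.log('Please provide a pet'); // Throw an error instead of logging if needed
} else if (typeof pet.name !== 'string' || pet.name.length === 0) {
  // Only reached when pet is an object, so reading pet.name is safe
  console.log('Please provide a name');
}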

While this is easy to understand and fast to prototype, it won't scale once the number of fields increases or the conditions on those fields become complex. For example, what if you want to check that age is a number less than 100?

if (!('age' in pet)) {
  console.log('Please provide an age');
} else {
  if (typeof pet.age !== 'number') {
    console.log('Age should be a number');
  } else {
    if (pet.age >= 100) {
      console.log('Pets cannot be that old (normally)!');
    }
  }
}

With more and more checks, you end up with a pile of nested if-else clauses that are hard to understand and maintain. So how do we solve this problem?

Validating using a library

This is when a validation library like Joi might just make your day. Here's a simple implementation if I wanted to check and validate the keys and their data types for my pets data.

I install and import the Joi library which I am then going to use to create a schema for the data that I am expecting to receive.

In Joi you declare schemas for the data that you are expecting. A schema is a JS object like any other (except that it is immutable; we will explore this later). For now, just know that it is generated by Joi and has a validate function that checks any data passed to it against the schema. If the data conflicts with the schema, the returned result carries an error object with a message property.

Let's check a basic use case:

const Joi = require('joi'); // install first with: npm install joi

const ageSchema = Joi.number().min(0).max(100).required();

const result = ageSchema.validate('A');

console.log(result.error?.message); // prints an error message because 'A' is not a number

Here we create a schema to check for an age value. We want it to be a number and between 0 and 100. Once the schema for this is defined, we pass a sample age to our schema and check for errors.

Seems like a lot for just an age, right? We could handle this much more simply with a plain if-else. But what if we want to validate our whole pet JSON data?

Let's declare a schema for our pet. We know that we are expecting an object (not a string, array, or any other type). And say we need at least a name and an age. The corresponding Joi schema would look like the following.

const schema = Joi.object({
  name: Joi.string().required(),
  age: Joi.number().required(),
});

Now if I forget to pass the name for my pet, I will get a validation error.

const partialPetSchema = Joi.object({
  name: Joi.string().required(),
  age: Joi.number().required(),
});

const result = partialPetSchema.validate({ age: 4 });
console.log(result.error.message); // "name" is required

Similarly, if I pass a non-numeric string as the age, I get an appropriate error message.

const partialPetSchema = Joi.object({
  name: Joi.string().required(),
  age: Joi.number().required(),
});

const result = partialPetSchema.validate({ name: 'Popo', age: 'DOG' });
console.log(result.error.message); // "age" must be a number

And any valid data will pass through.

const partialPetSchema = Joi.object({
  name: Joi.string().required(),
  age: Joi.number().required(),
});

const validPartialPet = { name: 'Popo', age: 4 }; // age is a real number here, not a string
const error = partialPetSchema.validate(validPartialPet).error;
if (!error) {
  console.log('All OK');
}

What are the advantages of doing it this way? Sure, we have got rid of the nested if-else clauses. But more than that, we now have a declarative way of specifying our validations. Our code is much easier to understand and reason about, which makes it easier to extend with further constraints as new requirements arrive (you will see this in the next section).

Declarative vs imperative

Something you might have noticed is that this style is declarative rather than imperative. While the imperative style gives us power, the declarative style gives us readability and code that is easy to manage. This is the reason SQL is so prevalent in the world of data analysis: you may reach the limits of its syntax when trying to do something complex, but for simple prototyping it is the easiest to write and understand.
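The contrast can be sketched without any library. The if-else snippets earlier in this article are the imperative version: they spell out how each field is checked. The version below only lists what must hold for each field and leaves the checking to a small generic loop. The names rules and validateWithRules are made up for this illustration, and this is not how Joi is implemented.

```javascript
// Declarative part: each rule states WHAT must hold for one field.
const rules = [
  {
    field: 'name',
    test: (value) => typeof value === 'string' && value.length > 0,
    message: 'Please provide a name',
  },
  {
    field: 'age',
    test: (value) => typeof value === 'number' && value < 100,
    message: 'Age should be a number below 100',
  },
];

// Generic part: decides HOW the rules are applied and collects the failures.
function validateWithRules(data, ruleList) {
  return ruleList
    .filter((rule) => !rule.test(data[rule.field]))
    .map((rule) => rule.message);
}

console.log(validateWithRules({ name: 'Popo', age: 4 }, rules)); // []
console.log(validateWithRules({ name: '', age: 400 }, rules));
// [ 'Please provide a name', 'Age should be a number below 100' ]
```

Adding a new constraint now means adding one entry to the list rather than another branch, which is roughly the convenience a schema library offers in a much more complete form.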

Adding constraints

The validation schema above is actually what you would call a loose validation. You can certainly stop right here and call it a day. But sometimes you might want to strive for better quality data as input and/or show relevant error messages when some incorrect data is being sent to you. Here's how we can add some more constraints to our pets input data.

As usual, say we got some new requirements. Fun, right? Say the requirements are as follows:

  • age can be optional, same for all other fields at the root level except name
  • only a few species are supported right now - say dogs and cats
  • color can be a combination of one or more colors, depending on the pet's coat (primary/secondary)
  • if a pet food is supplied, its name and type must be supplied

Whoa! That's a lot of changes. Think about implementing them using if-else clauses!

Well, this is where Joi helps. We can chain our conditions in an easy-to-understand manner, so that any other developer can later pick up where we left off without much trouble.

const petSchema = Joi.object({
  name: Joi.string().required(), // name must be a string and is a mandatory field, all other fields are optional
  age: Joi.number(), // age should be a number
  species: Joi.string().valid('DOG', 'CAT'), // only certain species are allowed
  color: Joi.array().min(1).items(Joi.string()), // one or more colors are accepted
  food: Joi.object({
    // food is a nested object that must have a name and type
    name: Joi.string().required(),
    type: Joi.string().required(),
  }),
});

let result = petSchema.validate({ name: 'Popo', species: 'PIGEON' });
console.log(result.error?.message); // "species" must be one of [DOG, CAT]

result = petSchema.validate({ name: 'Popo', color: [] });
console.log(result.error?.message); // "color" must contain at least 1 items

result = petSchema.validate({ name: 'Popo', food: [] });
console.log(result.error?.message); // "food" must be of type object

result = petSchema.validate({ name: 'Popo', food: { name: 'Milk' } });
console.log(result.error?.message); // "food.type" is required

Configuration options and custom error messages

As you might have noticed above, the error messages that Joi produces can sometimes be a little cryptic for your end user.

For example, the double quotes (\") wrapped around every field name are not very user-friendly.

Also, in the last invalid example above, where we tested the nested food property, the error message mentioning food.type reveals more than it should. The end user does not need to know that we store the type of food inside a food object.

We can tackle the first case by changing the configuration options that Joi provides us. For the second use case, we can provide a custom error message.

const joiOptions = { errors: { wrap: { label: "'" } } }; // We want a single quote instead of double quote

// Using Joi - configuration and custom error messages
const petSchema = Joi.object({
  name: Joi.string().required(),
  food: Joi.object({
    name: Joi.string().required().messages({ 'any.required': 'Food name is required' }),
    type: Joi.string().required().messages({ 'any.required': 'Food type is required' }),
  }),
});

// Checking for name
let result = petSchema.validate({}, joiOptions);
console.log(result.error?.message); // 'name' is required

// Checking for food -> type
result = petSchema.validate({ name: 'Popo', food: { name: 'Milk' } }, joiOptions);
console.log(result.error?.message); // Food type is required

Worried about the any.required part of the message? That is just Joi's way of saying which message to show when the required condition is not fulfilled. We can target other conditions in the same way. The key for each condition can be found in the list of error types in the Joi documentation.

const petSchema = Joi.object({
  age: Joi.number().min(0).max(100).required().messages({
    'any.required': 'Age is required',
    'number.base': 'Age must be a number',
    'number.min': 'Age must be a positive number',
    'number.max': 'Age cannot be more than 100',
  }),
});

console.log(petSchema.validate({ age: 1000 }).error?.message); // prints "Age cannot be more than 100"
console.log(petSchema.validate({ age: 'ABC' }).error?.message); // prints "Age must be a number"
console.log(petSchema.validate({ age: -1 }).error?.message); // prints "Age must be a positive number"
console.log(petSchema.validate({}).error?.message); // prints "Age is required"

if-else clauses

What if we want to validate the breed of a pet? The breed depends on the selected species, right? So we need an if-dog-then-x-else-y type of validation. Joi has just the thing for that.

const validDogBreeds = ['POMERANIAN', 'DOBERMAN', 'BULLDOG']; // Not an exhaustive list in any manner
const validCatBreeds = ['SIAMESE', 'BENGAL', 'MUNCHKIN'];
const petSchema = Joi.object({
  species: Joi.string().valid('DOG', 'CAT').required(),
  breed: Joi.string().when('species', {
    is: 'DOG',
    then: Joi.string().valid(...validDogBreeds),
    otherwise: Joi.string().valid(...validCatBreeds),
  }),
});

console.log(petSchema.validate({ species: 'DOG', breed: 'SPHYNX' }).error?.message); // "breed" must be one of [POMERANIAN, DOBERMAN, BULLDOG]
console.log(petSchema.validate({ species: 'CAT', breed: 'POODLE' }).error?.message); // "breed" must be one of [SIAMESE, BENGAL, MUNCHKIN]
console.log(petSchema.validate({ species: 'CAT', breed: 'SIAMESE' }).error?.message); // undefined - no error

Here we are declaring a list of valid breeds for dogs and cats. This list can then be provided conditionally to Joi for checking the breed in case the pet is a dog/cat. As you can see, the then and otherwise clauses are Joi schemas themselves. Thus you can nest other if-else clauses within either of these sections if needed.
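If it helps to see what the when clause is expressing, here is the same species-to-breed dependency written as a plain JavaScript lookup. This is only an illustration of the logic, not how Joi evaluates conditions; breedsBySpecies and checkBreed are hypothetical names.

```javascript
// Hypothetical lookup table: species -> breeds accepted for that species.
const breedsBySpecies = new Map([
  ['DOG', ['POMERANIAN', 'DOBERMAN', 'BULLDOG']],
  ['CAT', ['SIAMESE', 'BENGAL', 'MUNCHKIN']],
]);

// Returns an error string, or undefined when the breed fits the species.
function checkBreed(pet) {
  const allowedBreeds = breedsBySpecies.get(pet.species);
  if (!allowedBreeds) {
    return 'species must be one of [' + [...breedsBySpecies.keys()].join(', ') + ']';
  }
  // breed is optional, so only check it when it was supplied
  if (pet.breed !== undefined && !allowedBreeds.includes(pet.breed)) {
    return 'breed must be one of [' + allowedBreeds.join(', ') + ']';
  }
  return undefined;
}

console.log(checkBreed({ species: 'DOG', breed: 'SPHYNX' })); // breed must be one of [POMERANIAN, DOBERMAN, BULLDOG]
console.log(checkBreed({ species: 'CAT', breed: 'SIAMESE' })); // undefined
```

The Joi version states the same rule inside the schema itself, so the dependency stays next to the field it constrains.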

Reusing schemas

Ok, so now you know how to use Joi for validating. Soon enough you will face the issue of how to reuse your schemas.

Say you are A/B testing, and for one version of your API you want to keep the age optional but for another version, you want to keep it mandatory.

Your non-mandatory schema would look like:

const allOptionalPetSchema = Joi.object({
  name: Joi.string(),
  age: Joi.number(),
  species: Joi.string(),
  // other properties left for brevity
});

You don't want to repeat yourself and write a whole new schema with all the fields just to make age required, right? Here's how you do it: you call keys on the existing schema and overwrite the age key.

// keys() returns a new schema; allOptionalPetSchema itself is left unchanged
const ageRequiredPetSchema = allOptionalPetSchema.keys({
  age: Joi.number().required(),
});

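What if you wanted one version with all keys optional and another with all of them mandatory? Joi has a handy fork function to handle it. You give fork a list of keys and an adjuster function, and it returns a new schema in which the adjuster has been applied to each listed key. Here the adjuster makes every key required.

// describe() exposes the schema's own keys; Object.keys() on the schema object itself would list Joi internals instead
const petKeys = Object.keys(allOptionalPetSchema.describe().keys);
const allRequiredPetSchema = allOptionalPetSchema.fork(petKeys, schema => schema.required());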

How does Joi handle all this overwriting? Well, immutability. Every Joi schema object is immutable. So when a new function like keys is chained at the end, it creates a new schema object.
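To get a feel for what this immutability means in practice, here is a toy sketch in plain JavaScript. It is not Joi's implementation; numberSchema and its rules property are invented for this illustration. Each chained call builds a fresh frozen object instead of changing the one it was called on.

```javascript
// Toy schema: a frozen object whose chained methods return NEW frozen objects.
function numberSchema(rules = {}) {
  return Object.freeze({
    rules: Object.freeze({ ...rules }),
    required() {
      return numberSchema({ ...rules, required: true });
    },
    max(limit) {
      return numberSchema({ ...rules, max: limit });
    },
  });
}

const optionalAge = numberSchema().max(100);
const requiredAge = optionalAge.required(); // does not touch optionalAge

console.log(optionalAge.rules); // { max: 100 }
console.log(requiredAge.rules); // { max: 100, required: true }
console.log(optionalAge === requiredAge); // false
```

Because the original object never changes, it is safe to share one base schema across many variants, which is what makes the reuse patterns above predictable.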

Cheers, now you have learned how to be a data validator! Happy coding!