Alok Menghrajani

Currently: security consulting and lecturer.

Previously: security engineering at ~~Square~~ Block. Co-author of Hack (the programming language) and put the 's' in https at ~~Facebook~~ Meta. Maker of various CTF puzzles.

This blog does not use any tracking cookies and does not serve any ads. Enjoy your anonymity; I have no idea who you are, where you came from, and where you are headed to. Keep the dream of an Internet from times past alive.

A less error prone HMAC-based hash construction

or how to avoid shooting yourself in the foot with HMAC

Sep 25, 2014

tags: crypto | security | engineering

HMAC is a cryptographic construction that combines a hash function (e.g. md5 or sha256) and a key. The result of an HMAC function cannot be guessed by an attacker. Web applications therefore typically use HMACs for tokens in emails, tamper proof sessions or hidden form values, csrf protection nounces, OTP, etc.

Over the years, I have seen three common types of mistakes when computing/validating HMACs. I will go over each one of these mistakes and propose a construction which is less error prone when used by engineers with insufficient security knowledge.

The ideas here are applicable to all MAC functions, I picked HMAC because it's the most commonly used function in web applications.

Credits to Erling for helping me on the first iteration of this hash construction.

I have seen keyed hashes being built with code like hash = md5(secret + data), which is prone to length extension attacks. Marginally better is hash = md5(secret + ':' + data + ':' + secret) at which point why not just use an HMAC and benefit from the years of cryptographic research that has gone into trying to break the hash construction?

The most common error I have seen is code using string comparison to validate the HMAC: if (input.hash != hmac(algo, secret, input.data) { throw some exception }.

The problem with using a string comparison function is that it does not run in constant time and leaks knowledge about how many bytes in the hash are correct. An attacker can deduct the value for the HMAC by sending a handful of requests to the web server.

There are cases where an attacker can control the type of the hash. The simplest case is when the hash is extracted from a piece of JSON data: {"hash":"fc4696cc79...","data":"hello world"}. An attacker can trick the web application to treat the hash as an integer, boolean, or other data types by sending: {"hash":0,"data":"hello world"}.

In some weakly typed programming languages (e.g. PHP), when a string is compared to a boolean, the string is converted to a boolean (in PHP, most strings are true). An attacker could therefore simply provide {"hash":true,...} and bypass whatever was being protected by the HMAC.

Weakly typed languages usually offer a type preserving comparison function (e.g. === in PHP).

The third issue is not a security issue, but if at some point you want to rotate the keys used to generate HMACs, it is not easy to tell which calls to HMAC are being used to generate hashes vs which ones are being used to validate them.

Here is a proposal for a pair of keyed-hash functions (one function to generate the hash, the other to validate it). The pair of functions does not suffer from the three issues mentioned above. The idea is to salt the HMAC function and fail attempts to use a string comparison function (with a probability of 99.6%). A nice side effect of using a pair of functions is to contain the hash validation logic in a single place, instead of having potential string comparisons in multiple places in the codebase.

The salt acts as a forcing factor for engineers to do the right thing.

Pseudo-code:

/**
 * Generates a salted HMAC for a given piece of data.
 *
 * Note: the string returned by this function is one byte longer
 *       than the result of hmac.
 */
function generate_salted_hmac (algo, key, data) {
  h = hmac(algo, key, data);
  // Use an IV to "salt" the hmac result
  // The IV does not need to be cryptographically strong since it does not
  // affect the strenght of the hmac
  iv = random(0, 255);
  r = String.fromCharCode(iv);
  for (i=0; i<h.length; i++) {
    r += String.fromCharCode(h.charCodeAt(i) ^ iv);
  }
  return r;
}

/**
 * Validates if a given piece of data and its
 * salted HMACs have been tampered with.
 */
function validate_salted_hmac(algo, key, data, hash) {
  // Recreate the expected HMAC value
  h = hmac(algo, key, data);
  iv = hash.charCodeAt(0);
  r = String.fromCharCode(iv);
  for (i=0; i<h.length; i++) {
    r += String.fromCharCode(h.charCodeAt(i) ^ iv);
  }
  if (hash.length != r.length) {
    // Assumption is that an attacker knows the length of a valid hash.
    return false;
  }
  // Compare r with hash in constant time
  ok = 0;
  for (i=0; i<r.length & i<hash.length; i++) {
    ok = ok | r.charCodeAt(i) ^ hash.charCodeAt(i);
  }
  return ok == 0;
}

Alok Menghrajani

A less error prone HMAC-based hash construction

Why use HMAC in the first place?

Non-constant time comparison

Weakly typed programming languages

Difficulty to manage keys

Creating a pair of keyed-hash functions

Links