TensorFlow | NLP | Sentence similarity using TensorFlow cosine function

This code snippet uses TensorFlow 2.0. Some of the code may not be compatible with earlier versions, so make sure you have upgraded to TensorFlow 2.0 before running it.

The tf.keras.losses.cosine_similarity function in TensorFlow computes the negative of the cosine similarity between labels and predictions. The result is a number between -1 and 1: values closer to -1 indicate greater similarity, 0 indicates orthogonality (no similarity), and values closer to 1 indicate greater dissimilarity.
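
To make the sign convention concrete, here is a minimal pure-Python sketch of the same quantity: the dot product of two vectors divided by the product of their lengths, with the sign flipped. The helper name negative_cosine_similarity and the toy vectors are illustrative assumptions, not part of TensorFlow, and the zero-vector handling is a simplification.

```python
import math

def negative_cosine_similarity(a, b):
    """Return -cos(a, b), i.e. cosine similarity with the sign flipped."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Assumption for this sketch: a zero-length vector yields 0.0
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return -dot / (norm_a * norm_b)

# Same direction: result is close to -1
same = negative_cosine_similarity([1.0, 2.0], [2.0, 4.0])
# Perpendicular vectors: result is 0
orthogonal = negative_cosine_similarity([1.0, 0.0], [0.0, 1.0])
# Opposite direction: result is close to 1
opposite = negative_cosine_similarity([1.0, 2.0], [-1.0, -2.0])
print(same, orthogonal, opposite)
```

Only the direction of the vectors matters here, not their length, which is why [1, 2] and [2, 4] count as maximally similar.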

In this code we use transfer learning, loading the pre-trained token-based text embedding model "https://tfhub.dev/google/tf2-preview/nnlm-en-dim128/1" from TensorFlow Hub. It outputs a 128-dimensional embedding vector for each input sentence.

"ex_sentence" is a list of 5 sentences. We will calculate sentence similarity using TensorFlow's cosine similarity function.


import tensorflow as tf
import tensorflow_hub as hub

ex_sentence = [
    "this is test",
    "this is second test",
    "this is third test",
    "not similar to others in this list",
    "this is test",
]


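# Load the pre-trained embedding model from TensorFlow Hub
embed = hub.load("https://tfhub.dev/google/tf2-preview/nnlm-en-dim128/1")
# Embed all sentences at once: one 128-dimensional vector per sentence
embeddings = embed(ex_sentence)
print(embeddings.shape)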

# The output should be very close to -1, because ex_sentence[0] and
# ex_sentence[4] are identical.
print("Sentences are having greater similarity")
print(tf.keras.losses.cosine_similarity(
    embeddings[0],
    embeddings[4],
    axis=-1
))

# The output should be closer to 0 than the previous one, because
# ex_sentence[0] and ex_sentence[3] are different.
print("Sentences are having less similarity")
print(tf.keras.losses.cosine_similarity(
    embeddings[0],
    embeddings[3],
    axis=-1
))

Output of similarity between sentences:
    (5, 128)
    Sentences are having greater similarity
    tf.Tensor(-0.99999994, shape=(), dtype=float32)
    Sentences are having less similarity
    tf.Tensor(-0.3791005, shape=(), dtype=float32)
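
The example above compares two hand-picked pairs. To compare every sentence with every other one, a possible approach is to normalize each vector to unit length and take all pairwise dot products. The sketch below does this in pure Python on small made-up vectors. toy_vectors, pairwise_negative_cosine and most_similar are hypothetical names for illustration. With the real embeddings, the rows could be obtained with embeddings.numpy().tolist().

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0.0 else list(vec)

def pairwise_negative_cosine(vectors):
    """Return a matrix whose [i][j] entry is -cos(vectors[i], vectors[j])."""
    unit = [l2_normalize(v) for v in vectors]
    return [[-sum(x * y for x, y in zip(u, w)) for w in unit] for u in unit]

def most_similar(scores, i):
    """Index of the row most similar to row i (lowest score, excluding i)."""
    candidates = [j for j in range(len(scores)) if j != i]
    return min(candidates, key=lambda j: scores[i][j])

# Hypothetical stand-ins for embedding rows, not real model output
toy_vectors = [[1.0, 0.0, 1.0], [1.0, 0.1, 0.9], [0.0, 1.0, 0.0]]
scores = pairwise_negative_cosine(toy_vectors)
print(most_similar(scores, 0))
```

Because the sign is flipped, the most similar sentence is the one with the lowest score, so the lookup uses min rather than max.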
    
