This code snippet uses TensorFlow 2.0; some of the code may not be compatible with earlier versions, so make sure to upgrade to TF 2.0 before executing it.
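You can confirm the installed version before running anything; this quick check simply prints the version string (it assumes nothing beyond a working TensorFlow install):

import tensorflow as tf
print(tf.__version__)  # should print 2.x for the snippet below to work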
The tf.keras.losses.cosine_similarity function in TensorFlow computes the (negated) cosine similarity between labels and predictions. The result is a quantity between -1 and 0, where 0 indicates orthogonality (no similarity) and values closer to -1 indicate greater similarity.
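To make the sign convention concrete, here is a minimal sketch with hand-made vectors (a, b, and c are illustrative assumptions, not part of the snippet below):

import tensorflow as tf

a = tf.constant([1.0, 0.0])
b = tf.constant([1.0, 0.0])  # identical direction to a
c = tf.constant([0.0, 1.0])  # orthogonal to a

# Identical direction -> result of -1
print(tf.keras.losses.cosine_similarity(a, b, axis=-1))  # -1.0
# Orthogonal vectors -> result of 0
print(tf.keras.losses.cosine_similarity(a, c, axis=-1))  # (-)0.0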
In this code we will use transfer learning: the pre-trained token-based embedding model "https://tfhub.dev/google/tf2-preview/nnlm-en-dim128/1" from TensorFlow Hub, which outputs embedding vectors with 128 dimensions.
"ex_sentence" is a list of 5 sentences; we will calculate sentence similarity using TensorFlow's cosine similarity function.
import tensorflow as tf
import tensorflow_hub as hub
ex_sentence = ["this is test", "this is second test",
"this is third test", "not similar to others in this list",
"this is test"]
# Load the pre-trained NNLM token-based embedding module from TF Hub
embed = hub.load("https://tfhub.dev/google/tf2-preview/nnlm-en-dim128/1")
# Embed all 5 sentences at once; the result has shape (5, 128)
embeddings = embed(ex_sentence)
print(embeddings.shape)
'''
The output should be very close to -1, since
ex_sentence[0] and ex_sentence[4] are identical.
'''
print("Sentences are having greater similarity")
print(tf.keras.losses.cosine_similarity(
embeddings[0],
embeddings[4],
axis=-1
))
'''
The output should be much closer to 0, since
ex_sentence[0] and ex_sentence[3] are different.
'''
print("Sentences are having less similarity")
print(tf.keras.losses.cosine_similarity(
embeddings[0],
embeddings[3],
axis=-1
))
Output:
(5, 128)
Sentences have greater similarity
tf.Tensor(-0.99999994, shape=(), dtype=float32)
Sentences have less similarity
tf.Tensor(-0.3791005, shape=(), dtype=float32)
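If you want every pairwise score at once instead of calling the loss per pair, one option (a sketch, not part of the original snippet) is to L2-normalize the embeddings and take their matrix of dot products, which yields plain, non-negated cosine similarities:

# Sketch: pairwise cosine-similarity matrix for all 5 sentences.
# Assumes `embeddings` (shape (5, 128)) from the code above.
normalized = tf.math.l2_normalize(embeddings, axis=1)
similarity_matrix = tf.matmul(normalized, normalized, transpose_b=True)
# Entry [i, j] is cos(sentence i, sentence j). The sign is NOT negated
# here, so identical sentences score close to +1 rather than -1.
print(similarity_matrix[0, 4])  # near +1: ex_sentence[0] == ex_sentence[4]
print(similarity_matrix[0, 3])  # noticeably lower: dissimilar sentences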