This post explains how to create word embedding vectors in TensorFlow using pre-trained models.
TensorFlow Hub
TensorFlow Hub is a repository of trained machine learning models that are ready for fine-tuning and deployable anywhere. In this tutorial we will use a pre-trained text-embedding model from TensorFlow Hub to create word embedding vectors.
What are Embeddings
A word embedding in machine learning is a way to represent text with embedding vectors. An embedding is a representation of a word for text analysis, typically a real-valued vector that encodes the meaning of the word such that words that are closer together in the vector space are expected to be similar in meaning.
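To make "closer in the vector space" concrete, here is a minimal sketch using cosine similarity on toy, hand-picked 3-dimensional vectors (these are not real embeddings, just an illustration): vectors pointing in similar directions get a similarity close to 1, while unrelated ones score much lower.
import tensorflow as tf

# toy, hand-picked 3-dimensional vectors (not real embeddings)
cat = tf.constant([0.9, 0.8, 0.1])
dog = tf.constant([0.8, 0.9, 0.2])
car = tf.constant([-0.7, 0.1, 0.9])

# cosine similarity: dot product of the L2-normalized vectors
def cosine_similarity(a, b):
    return tf.reduce_sum(tf.nn.l2_normalize(a, -1) * tf.nn.l2_normalize(b, -1))

print(cosine_similarity(cat, dog).numpy())  # close to 1 -> similar direction
print(cosine_similarity(cat, car).numpy())  # much lower -> dissimilar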
Create word embeddings using TensorFlow pre-trained models
pip install tensorflow-hub
import tensorflow_hub as hub
# load the pre-trained text-embedding model from TensorFlow Hub
embed = hub.load("https://tfhub.dev/google/tf2-preview/gnews-swivel-20dim/1")
There are many word embedding models available in TensorFlow Hub; details of these models are available
here. For this demo we are using
https://tfhub.dev/google/tf2-preview/gnews-swivel-20dim/1
This model is trained on the English Google News 130GB corpus and produces embedding vectors with 20 dimensions.
# sample list of sentences
train_examples = ["this is first sentence",
                  "post on transfer learning"]
# create embeddings
embeddings = embed(train_examples)
print(embeddings.shape)
print("Embedding for sentence 1")
print(embeddings[0])
print(embeddings[0].shape)
print("Embeddings for sentence 2")
print(embeddings[1])
print(embeddings[1].shape)
(2, 20)
Embedding for sentence 1
tf.Tensor(
[-1.0672672 -1.1043963 0.4294139 0.56906974 0.00213611 -0.65244186
-0.78387123 -0.52442074 0.99164987 -0.7627919 -0.25447875 -0.4340241
0.1354623 0.20142603 -0.6427126 1.4065914 0.00409365 -0.87753415
-0.48709276 -0.27135906], shape=(20,), dtype=float32)
(20,)
Embeddings for sentence 2
tf.Tensor(
[-0.09212524 -0.9034263 0.99333376 1.2055938 0.27041954 -0.15681976
0.6756444 -0.3102063 -0.53932023 -0.5933851 -1.2455785 0.5302402
-0.68134886 0.16809608 0.42006266 0.13688701 1.0491474 -0.76284564
-0.36677927 0.10497689], shape=(20,), dtype=float32)
(20,)
Complete code snippet to create embeddings in TensorFlow
import tensorflow_hub as hub
# load the pre-trained text-embedding model from TensorFlow Hub
embed = hub.load("https://tfhub.dev/google/tf2-preview/gnews-swivel-20dim/1")
# sample list of sentences
train_examples = ["this is first sentence",
                  "post on transfer learning"]
# create embeddings
embeddings = embed(train_examples)
# check the overall embeddings shape: (number of sentences, embedding dimension)
print(embeddings.shape)
# check the embedding and shape for each sentence
print("Embedding for sentence 1")
print(embeddings[0])
print(embeddings[0].shape)
print("Embeddings for sentence 2")
print(embeddings[1])
print(embeddings[1].shape)
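Because TensorFlow Hub models are also ready for fine-tuning, the same pre-trained embedding can be wrapped as a Keras layer and trained further as part of a downstream model. The sketch below is only an illustration under assumed settings (a binary text classifier with an arbitrary 16-unit hidden layer), not part of the snippet above.
import tensorflow as tf
import tensorflow_hub as hub

# wrap the same pre-trained embedding as a Keras layer;
# trainable=True allows the 20-dimensional embeddings to be fine-tuned
hub_layer = hub.KerasLayer(
    "https://tfhub.dev/google/tf2-preview/gnews-swivel-20dim/1",
    input_shape=[], dtype=tf.string, trainable=True)

# assumed downstream model: a simple binary text classifier
model = tf.keras.Sequential([
    hub_layer,                                     # string -> (20,) embedding
    tf.keras.layers.Dense(16, activation="relu"),  # arbitrary hidden layer
    tf.keras.layers.Dense(1)                       # binary classification logit
])
model.compile(optimizer="adam",
              loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
              metrics=["accuracy"])
model.summary()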