
Elastic Stack In A Day 2017 is the third edition of the Italian event dedicated to Elastic technologies. The event is organized by Seacom in collaboration with Elastic and will take place on June 6, 2017 at the Hotel Michelangelo in Milan.

During the event the latest news about Elasticsearch and X-Pack will be presented, and in the afternoon there will be technical talks held by developers and engineers (some of them from Elastic).

I will be speaking about Machine learning with TensorFlow and Elasticsearch: Image Recognition.

In this post we are going to see how to build a machine learning system to perform an image recognition task. Image recognition is the process of identifying and detecting an object or a feature in a digital image or video. The tools we will use are the following:

Amazon S3 bucket

Amazon Simple Queue Service

Google TensorFlow machine learning library

Elasticsearch

The idea is to build a system that will run the image recognition task against images stored in an S3 bucket and index the results to Elasticsearch. The library used for the image recognition task is TensorFlow. TensorFlow is an open source software library for numerical computation using data flow graphs. Nodes in the graph represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) communicated between them. The flexible architecture allows you to deploy computation to one or more CPUs or GPUs in a desktop, server, or mobile device with a single API. TensorFlow was originally developed by researchers and engineers working on the Google Brain Team within Google’s Machine Intelligence research organization for the purposes of conducting machine learning and deep neural networks research, but the system is general enough to be applicable in a wide variety of other domains as well. You can read more about it here.

These are the main steps performed in the process:

Upload image to S3 bucket

Event notification from S3 to a SQS queue

Event consumed by a consumer

Image recognition on the image by TensorFlow

The result of the classification is indexed in Elasticsearch

Search in Elasticsearch by tags

This image shows the main steps of the process:

Event notifications

When an image is uploaded to the S3 bucket, a message will be stored to an Amazon SQS queue. To configure the S3 bucket and to read the queue programmatically you can read my previous post: Amazon S3 event notifications to SQS.

Consume messages from Amazon SQS queue

Now that the S3 bucket is configured, when an image is uploaded to the bucket an event notification will be stored to the SQS queue. We are going to build a consumer that reads this notification, downloads the image from the S3 bucket and performs the image classification using TensorFlow.

With this code you can read the messages from an SQS queue, download the image from the S3 bucket and store it locally (ready for the image classification task):
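A minimal sketch of such a consumer using boto3 follows; the queue URL and local directory are placeholders, and the message parsing assumes the standard S3 event notification layout:

```python
import json
import os

def parse_s3_event(message_body):
    """Extract (bucket, key) pairs from an S3 event notification body."""
    event = json.loads(message_body)
    return [(r['s3']['bucket']['name'], r['s3']['object']['key'])
            for r in event.get('Records', [])]

def consume_and_download(queue_url, local_dir='/tmp'):
    """Poll the SQS queue once and download any newly uploaded images."""
    import boto3  # deferred import; credentials come from the usual AWS config
    sqs = boto3.client('sqs')
    s3 = boto3.client('s3')
    response = sqs.receive_message(QueueUrl=queue_url,
                                   MaxNumberOfMessages=1,
                                   WaitTimeSeconds=10)
    local_paths = []
    for message in response.get('Messages', []):
        for bucket, key in parse_s3_event(message['Body']):
            local_path = os.path.join(local_dir, os.path.basename(key))
            s3.download_file(bucket, key, local_path)
            local_paths.append(local_path)
        # delete the message once its images have been downloaded
        sqs.delete_message(QueueUrl=queue_url,
                           ReceiptHandle=message['ReceiptHandle'])
    return local_paths
```

The long polling (WaitTimeSeconds) avoids busy-waiting on an empty queue; in a real deployment you would wrap consume_and_download in a loop.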

Image recognition task

Now that the image (originally uploaded to S3) has been downloaded, we can use TensorFlow to run the image recognition task. The model used by TensorFlow for the image recognition task is Inception-v3, which achieved a 3.46% top-5 error rate in the ImageNet competition. You can read more about it here: Inception-v3 and here: TensorFlow image recognition.

So, starting from the classify_image.py code (you can find it on GitHub: classify_image.py), I created a Python module that, given the local path of an image (the one previously downloaded from S3), returns a dictionary with the result of the classification. The result consists of a set of tags (the objects recognized in the image) and scores (each score represents the probability of a correct classification; the scores sum to one).

Calling the function run_image_recognition with the image path as argument will return a dictionary with the result of the classification:

def run_inference_on_image(image):
    if not tf.gfile.Exists(image):
        tf.logging.fatal('File does not exist %s', image)
    image_data = tf.gfile.FastGFile(image, 'rb').read()

    # Creates graph from saved GraphDef.
    create_graph()

    with tf.Session() as sess:
        softmax_tensor = sess.graph.get_tensor_by_name('softmax:0')
        predictions = sess.run(softmax_tensor,
                               {'DecodeJpeg/contents:0': image_data})
        predictions = np.squeeze(predictions)

        node_lookup = NodeLookup()

        top_k = predictions.argsort()[-NUM_TOP_PREDICTIONS:][::-1]
        dict_results = {}
        # create a dictionary with the recognized tags and their scores
        for node_id in top_k:
            human_string = node_lookup.id_to_string(node_id)
            score = predictions[node_id]
            dict_results[human_string] = score
        return dict_results


def run_image_recognition(image_path):
    maybe_download_and_extract()
    return run_inference_on_image(image_path)

In the code above, the definitions of the TensorFlow helper functions (create_graph, NodeLookup, maybe_download_and_extract) are not reported; you can find them in the GitHub repository I linked. The first time you run the image classification task, the model (Inception-v3) will be downloaded and stored on your file system (it is around 300MB).
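Given the tag-to-score dictionary that run_image_recognition returns, you can inspect the classification result like this (the result values below are made up for illustration):

```python
def top_tags(results, n=5):
    """Sort a tag -> score dictionary by descending score."""
    return sorted(results.items(), key=lambda kv: kv[1], reverse=True)[:n]

# hypothetical output of run_image_recognition('/tmp/waterfall.jpg')
results = {'waterfall': 0.85, 'lakeside': 0.08, 'valley': 0.04}
for tag, score in top_tags(results):
    print('%s (score: %.2f)' % (tag, score))
```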

Index to Elasticsearch

So, given an image, we now have a set of tags that classify it. We want to index these tags to Elasticsearch. To do that I created a new index called imagerepository and a new type called image.

The image type we are going to create will have the following properties:

title: the title of the image

s3_location: the link to the S3 resource

tags: field that will contain the result of the classification task

For the tags property I used the Nested datatype, which allows arrays of objects to be indexed and queried independently of each other.
You can read more about it here: Nested datatype, Nested query.

We will not store the image to Elasticsearch but just the URL of the image within the S3 bucket.

New Index:

curl -XPUT '192.168.193.132:9200/imagerepository/' -d '{
    "settings" : {
        "index" : {
            "number_of_shards" : 1,
            "number_of_replicas" : 0
        }
    }
}'

New Type:

curl -XPUT "192.168.193.132:9200/imagerepository/image/_mapping" -d '{
    "image" : {
        "properties" : {
            "title" : { "type" : "string" },
            "s3_location" : { "type" : "string" },
            "tags" : { "type" : "nested" }
        }
    }
}'

You can now try to post a test document:

curl -XPOST '192.168.193.132:9200/imagerepository/image/' -d '
{
    "title" : "test",
    "s3_location" : "http://mybucket/test.jpg",
    "tags" : [
        {"tag": "test_tag1", "score": 0.5},
        {"tag": "test_tag2", "score": 0.38},
        {"tag": "test_tag3", "score": 0.12}
    ]
}'

We can index a new document using the Elasticsearch Python SDK:

from elasticsearch import Elasticsearch

es = Elasticsearch(['192.168.193.132:9200'])

# dictionary with the result (tags and scores)
result_dictionary = {}
result_dictionary['tag_1'] = 0.3
result_dictionary['tag_2'] = 0.5
result_dictionary['tag_3'] = 0.2

img_s3_location = "https://yourbucket/demo.jpg"
img_title = "demo.jpg"

def index_new_document(img_title, img_s3_location, result_dictionary):
    # convert the dictionary to a list of nested objects
    result_nested_obj = []
    for key, value in result_dictionary.items():
        result_nested_obj.append({"tag": key, "score": value})
    doc = {
        "title": img_title,
        "s3_location": img_s3_location,
        "tags": result_nested_obj
    }
    res = es.index(index='imagerepository', doc_type='image', body=doc)

# Call the function
index_new_document(img_title, img_s3_location, result_dictionary)

Search

Now that we have indexed our documents in Elasticsearch we can search for them.
These are some examples of queries we can run:

Give me all the images that represent this object (searching by tag = object_name)

What does this image (given the title) represent?

Give me all the images that represent this object with at least 90% probability (search by tag = object_name and score >= 0.9)

I wrote some Sense queries.

Images that represent a waterfall:

POST imagerepository/_search
{
    "query": {
        "nested": {
            "path": "tags",
            "query": {
                "bool": {
                    "must": [
                        {
                            "match": {
                                "tags.tag": "waterfall"
                            }
                        }
                    ]
                }
            }
        }
    }
}
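For the second kind of query (what does a given image represent?), a simple match on the title field is enough. This is a sketch; since the mapping above analyzes the title as a string, exact-match behaviour may require a not_analyzed field:

```
POST imagerepository/_search
{
    "query": {
        "match": {
            "title": "demo.jpg"
        }
    }
}
```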

Images that represent a pizza with at least 90% probability:

POST imagerepository/_search
{
    "query": {
        "nested": {
            "path": "tags",
            "query": {
                "bool": {
                    "must": [
                        {
                            "match": {
                                "tags.tag": "pizza"
                            }
                        },
                        {
                            "range": {
                                "tags.score": {
                                    "gte": 0.90
                                }
                            }
                        }
                    ]
                }
            }
        }
    }
}

In this post we have seen how to combine the powerful machine learning library TensorFlow, used to perform an image recognition task, with the search power of Elasticsearch, used to index the image classification results. The pipeline also includes an S3 bucket (where the images are stored) and an SQS queue used to receive event notifications when a new image is stored to S3 (and is ready for the image classification task).
