A Blog About D4T4 & M47H

16 September ’18

This is part 3 in a 3-part series (part 1, part 2) on building and deploying a deep learning model for the popular ACL 2011 IMDB dataset. In this part, I host the model on Cloud ML Engine and make it accessible via a simple HTTP Cloud Function. Give it a try!

Expose Model with a Cloud Function

# gets predictions from cloud ml engine
def classify_movie_reviews(request):
    import flask
    import json
    import re
    import math
    import googleapiclient.discovery
    import google.auth

    headers = {
        'Access-Control-Allow-Origin': '*',
        'Access-Control-Allow-Methods': 'POST',
        'Access-Control-Allow-Headers': 'Content-Type'
    }

    # handle pre-flight options request
    if request.method == 'OPTIONS':
        return flask.make_response(('', 204, headers))

    _, project = google.auth.default()
    request_json = request.get_json()

    # this pulls out our proper nouns and treats them as single words
    def preprocessing(review):
        proper = r"([A-Z]([a-z]+|\.)(?:\s+[A-Z]([a-z]+|\.))*(?:\s+[a-z][a-z\-]+){0,2}\s+[A-Z]([a-z]+|\.)(?:\s+[0-9]+)?)"
        space_between_brackets = r"[\.\s]+(?=[^\[\]]*]])"
        brackets = r"(?:[\[]{2})(.*?)(?:[\]]{2})"
        review = re.sub(proper, '[[\\1]]', review)
        review = re.sub(space_between_brackets, '~', review)
        review = re.sub(brackets, '\\1', review)
        return review

    model = 'movie_reviews'
    version = request_json['version']
    instances = [preprocessing(i) for i in request_json['instances']]

    service = googleapiclient.discovery.build('ml', 'v1')
    name = 'projects/{}/models/{}/versions/{}'.format(project, model, version)

    response = service.projects().predict(
        name=name,
        body={'instances': instances}
    ).execute()

    if 'error' in response:
        raise RuntimeError(response['error'])

    # clear out nan if they exist
    for r in response['predictions']:
        if all([math.isnan(i) for i in r['prob']]):
            r['prob'] = []
            r['class'] = -1

    return flask.make_response((json.dumps(response['predictions']), 200, headers))

NOTE: Additional preprocessing for grouping movie names and proper nouns is replicated here since it could not be embedded in the TF input serving function.
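For reference, here is a sketch of what a client sends and what the NaN cleanup at the end of the function does. The version name and review texts below are made up for illustration; the predictions list is a mock of the Cloud ML Engine response shape the function works with.

```python
import math

# The request body the function expects: a deployed model version
# and a list of raw review strings ("v1" is a made-up version name).
payload = {
    'version': 'v1',
    'instances': ['What a film. Tom Hanks shines.',
                  'Two hours I will never get back.'],
}

# The function's NaN cleanup, applied to a mock predictions list:
# an all-NaN probability vector becomes an empty list with class -1,
# so the response stays JSON-serializable.
predictions = [
    {'class': 1, 'prob': [0.1, 0.9]},
    {'class': 0, 'prob': [float('nan'), float('nan')]},
]
for r in predictions:
    if all(math.isnan(i) for i in r['prob']):
        r['prob'] = []
        r['class'] = -1
```

The cleanup matters because `json.dumps` would otherwise emit bare `NaN` tokens, which are not valid JSON for most clients.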

Export

def serving_input_fn():
    review = tf.placeholder(dtype=tf.string)
    label = tf.zeros(dtype=tf.int64, shape=[1, 1])  # just a placeholder
    transformed_features = tft_output.transform_raw_features(
        {'review': review, 'label': label})
    return tf.estimator.export.ServingInputReceiver(
        transformed_features, {'review': review})


export_path = classifier.export_savedmodel(
    export_dir_base='exports',
    serving_input_receiver_fn=serving_input_fn,
    checkpoint_path=best_ckpt)
export_path = export_path.decode('utf-8')

02 September ’18

This is part 1 in a 3-part series (part 2, part 3) on building and deploying a deep learning model for the popular ACL 2011 IMDB dataset. In this part, I tackle data preprocessing.

===

The sklearn.preprocessing module has some great utility functions and transformer classes (e.g. scaling, encoding categorical features) for converting raw data into a numeric representation that can be modelled. How do we do this in the context of Tensorflow? And how do we ensure that serving-time preprocessing transformations exactly match those performed during training? The solution: tf.Transform.

You can use tf.Transform to construct preprocessing pipelines that can be run as part of a Tensorflow graph. tf.Transform prevents skew by ensuring that the data seen during serving is consistent with the data seen during training. Furthermore, you can execute tf.Transform pipelines at scale with Apache Beam, a huge advantage when preparing large datasets for training. Currently, you can only use tf.Transform in Python 2 since Apache Beam doesn't yet have Python 3 support.
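To make the skew point concrete, here is a minimal pure-Python analogy (not actual tf.Transform code): a statistic computed in a full pass over the training data, like tf.Transform's analyze phase, is saved and then reapplied verbatim at serving time, so both paths see identical preprocessing.

```python
train = [2.0, 4.0, 6.0]

# "Analyze" phase: a full pass over the training data computes a
# statistic, analogous to a tf.Transform analyzer such as tft.mean.
mean = sum(train) / len(train)

# "Transform" phase: the saved statistic is baked into the transform,
# so training and serving apply exactly the same preprocessing.
def center(x):
    return x - mean

training_features = [center(x) for x in train]
serving_feature = center(5.0)  # same saved `mean`, never recomputed
```

In tf.Transform the analyze step runs in a Beam pipeline over the whole dataset, and the resulting constants are embedded directly into the serving graph.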

Here is the code that I used to preprocess my data. I start by converting raw data into TFRecords files, then I transform those TFRecords files with tf.Transform.

# this pulls out our proper nouns and treats them as single words
def proper_preprocessing(review):
    proper = r"([A-Z]([a-z]+|\.)(?:\s+[A-Z]([a-z]+|\.))*(?:\s+[a-z][a-z\-]+){0,2}\s+[A-Z]([a-z]+|\.)(?:\s+[0-9]+)?)"
    space_between_brackets = r"[\.\s]+(?=[^\[\]]*]])"
    brackets = r"(?:[\[]{2})(.*?)(?:[\]]{2})"
    review = re.sub(proper, '[[\\1]]', review)
    review = re.sub(space_between_brackets, '~', review)
    review = re.sub(brackets, '\\1', review)
    return review

NOTE: RE2 does not support constructs for which only backtracking solutions are known to exist. Thus, backreferences and look-around assertions are not supported! As a result, I can't put my logic for identifying movie names / proper nouns into tf.regex_replace(...).
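To see what those three regex passes actually produce, here is a quick check (the function is repeated so the snippet runs on its own; the example sentences are my own):

```python
import re

# repeated from above so this snippet is self-contained
def proper_preprocessing(review):
    proper = r"([A-Z]([a-z]+|\.)(?:\s+[A-Z]([a-z]+|\.))*(?:\s+[a-z][a-z\-]+){0,2}\s+[A-Z]([a-z]+|\.)(?:\s+[0-9]+)?)"
    space_between_brackets = r"[\.\s]+(?=[^\[\]]*]])"
    brackets = r"(?:[\[]{2})(.*?)(?:[\]]{2})"
    review = re.sub(proper, '[[\\1]]', review)       # wrap proper nouns in [[...]]
    review = re.sub(space_between_brackets, '~', review)  # join words inside [[...]]
    review = re.sub(brackets, '\\1', review)         # strip the brackets
    return review

# a multi-word proper noun collapses into one ~-joined token
print(proper_preprocessing('I loved Star Wars a lot'))
# -> I loved Star~Wars a lot

# up to two lowercase words may sit inside the name
print(proper_preprocessing('The Lord of the Rings is long'))
# -> The~Lord~of~the~Rings is long
```

The lookahead in `space_between_brackets` is exactly the kind of construct RE2 rejects, which is why this has to run in Python rather than inside the graph.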

While I have found tf.Transform super-useful, we are still constrained to preprocessing that can be expressed with native TF ops! tf.py_func lets you insert a Python function as a TF op. However, a documented limitation is that it is not serialized in the GraphDef, so it cannot be used for serving, which requires serializing the model and restoring it in a different environment. This has prevented me from doing more complicated text preprocessing steps like Porter stemming. Nevertheless, I still love tf.Transform, an unsung hero of the TF ecosystem!