Crazy? Probably, but it won’t be the first time that has been suggested.

First, let me offer some background. Recently I have had the opportunity to experience what content management systems do and how they are used. Products like Documentum and Alfresco are meant for general use. By their nature these systems are less efficient and more complex than something built for a specific purpose. For some agencies this works out well: they don't have IT organizations that could develop a system in house, and when the ECM is their system of record (SOR) it is a good solution. When the system of record lies outside the ECM there is less to be gained. There may be an existing workflow that doesn't match the general flow the ECM defines. I felt there had to be a simpler way. How difficult could it be? I am not suggesting building for general use; instead, build only what is needed.

Note: The model I used is an EHR, Electronic Health Record.

The core

Basically there are only three pieces: a database to store metadata, a file system to store the content, and processes for creating, retrieving, updating and deleting (CRUD) the information.

Other stuff

Thumbnails: For an EHR system this would not likely be needed. There is not a great variety of document types where a “preview” would be useful. This could be a requirement in other applications.

Transformation: EHR systems use a small number of standard file formats. The HL7 data is required to be shown in its native format; converting the data to an image or PDF is not done. But this could be a requirement in other applications.

Version: This could be useful in any content store.

Starting out old school

My first thought was to go with what I know: Java, Spring Boot and JPA. Start with a database. Since this will be on AWS, MariaDB is a good place to start. It's MySQL compatible and free to start with. For an EHR system the content is the patient data, stored in HL7 format. Since the system of record is the content, the database doesn't have to be very complex. Two or three tables is more than enough.

Create an app

Using Eclipse I created a new Spring Boot JPA application (including Hibernate). Eclipse also generated the entities from the database and some of the support code. A few hours later I had a Spring Boot CRUD app that could read and write to a database. S3 would be the choice for the content since the app would be running on AWS. Fortunately AWS offers nice Java support for S3.

With this done I had a basic content management system. AWS suggests Elastic Beanstalk for deploying applications. It's not the simplest thing but it does work. My REST service was very simple: a JSON file for metadata and the HL7 (XML) file for content.

This was not a ready for production system but with AWS it was pretty quick and simple to get something working.

But….

Something didn't feel right. This is the same process/framework that everyone is using; it's not new. Since I am studying for the AWS exams, shouldn't I consider this from the AWS point of view?

Lambda

If you want to know what AWS thinks is the future, think Lambda. “Run code without thinking about servers. Pay for only the compute time you consume” (https://aws.amazon.com/lambda/). Lambda is what powers Lex and Alexa. I won't repeat everything AWS says about Lambda, but AWS is putting a lot of effort into this.

Building a content management system based around Lambda

I still need a database (or do I?) and a file system to store the content. I already have the database and S3 from the Java project, so there is no need to start over. What is missing is the CRUD app that I built with Java.

Since they are going to sit idle until needed, Lambda functions should be lightweight and quick to start up. AWS allows Lambda functions to be written in JavaScript (Node.js), Python, Java or C#. Java and C# seem too heavy; Spring and Hibernate don't fit into this picture. I felt this left two options, JavaScript or Python. Both have their advantages (I use both). I went with Python. I learned later that JavaScript is the only choice for some third-party tools. As in the Java application, I chose to write the content and the metadata to S3. The metadata is written to the database as well. S3 has an option to add “metadata” to the object. By writing the data as a file I could leverage Solr to search content and metadata! In theory this eliminates the need for a database.

AWS has support and examples for creating Lambda functions in Python. “pymysql” and “boto3” are Python libraries for MySQL and S3. boto3 is already included in the Lambda Python runtime; pymysql is pure Python and easy to include in the deployment package.

Python is deployed to Lambda as a deployment package. This is simply a zip file with your Python code and any external libraries not already supplied by AWS. The trick to this is getting the Python file and Lambda handler correct. I used contentHealthLambda.py as the file and contentHealthLambdaHandler as the handler function. Below is how they are used in the Lambda configuration.
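As a rough sketch of that wiring (the request fields and return shape here are assumptions, not the exact code I deployed), the handler named in the Lambda configuration is just `<file name without .py>.<function name>`:

```python
# contentHealthLambda.py
# The Lambda console handler setting would be:
#   contentHealthLambda.contentHealthLambdaHandler
import json

def contentHealthLambdaHandler(event, context):
    # 'event' carries the request payload; 'context' carries runtime info.
    # The "metaData" field is a made-up example key.
    meta_data = event.get("metaData", "")
    return {
        "statusCode": 200,
        "body": json.dumps({"received": len(meta_data)}),
    }
```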

The code

Note: The code I am including is basic. Almost all of the error handling has been ignored.

Store the data to S3. In this case the bucket is fixed but it could be passed as a parameter. (The snippet assumes metaData and createDateTime are already set by the handler.)

import boto3
from io import StringIO

s3 = boto3.resource('s3')

# The target bucket; fixed here, but it could be a parameter.
target_bucket = "com.contentstore"

# Create a file name.
target_file = metaData + "_md" + str(createDateTime)

# StringIO wraps the string so temp_handle.read() reads like a file handle.
temp_handle = StringIO(metaData)

# Create or get the bucket.
bucket = s3.Bucket(target_bucket)

# Write to the bucket.
result = bucket.put_object(Key=target_file, Body=temp_handle.read())

That is all there is: six lines of code (error handling not included). Six more lines are required for storing the content to S3. I did not test with a very large file and there may be more effort required in those cases, but I have not noticed anyone talking about additional issues.

Write to the database

The connect string is familiar to anyone who has done Java database coding before.

My schema contains two tables, a base document and a patient document. Since the patient document has a foreign key to the base document I have to store them separately. There are Python ORMs that would probably handle this, but it's so simple that basic SQL will suffice. All of the database code is wrapped in a try-except clause. If any of the executes fail, the commit will never happen.

CloudWatch logs all of the output so you can easily see what happened.

Rest service

In order to use the Lambda function it needs to be exposed as a REST endpoint. This is done using the API Gateway. The process is well documented so I won't go into it. The API Gateway can be configured separately, as I did, or at the time the Lambda function is created.

The result is less than 1 second per POST call. The data is small (3K), and this was done from my home laptop into AWS. I would expect better rates in a “real” environment.

The content data in S3

AWS Maria DB

One issue with Lambda is that it is slower to respond the first time since it has to spin up. I am not clear on the time window where the function is active vs. idle. It's something I need to look into.

Transformation

This involves converting various document formats into one standard format, likely PDF. Other ECMs use third-party tools to do this work. Using Lambda would not prevent using a similar third-party tool, but I would prefer that conversion be done beforehand, in the code calling the REST service; it's not an integral part of the content store. Another way to achieve this is to use a Lambda trigger to start the transformation. ImageMagick or LibreOffice can be used to convert the files as they are written to S3.
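The trigger side of that idea can be sketched like this. The event shape is the standard S3 notification; convert() is a placeholder for whatever tool (ImageMagick, LibreOffice) would do the actual work:

```python
# Sketch of a Lambda triggered by an S3 "object created" event.
def transformTrigger(event, context):
    converted = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        # convert(bucket, key) would fetch the object and write the PDF;
        # here we just note which objects would be transformed.
        converted.append((bucket, key))
    return converted
```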

Thumbnails

This is also where third-party tools come into play. Lambda has a great way to handle this: triggers. A function is set up to trigger when a file is added to S3. The function handles the process of creating the image or images. The examples of this use something like ImageMagick. The only issue I found is that it is currently only usable in Lambda with JavaScript. It's not a big deal, but I'd have to part from Python for a while.

Versions

S3 can version documents automatically. AWS lifecycle rules can use versions to move data to other storage options such as Glacier. “boto3” supports S3 versions, so it's possible to filter and return information based on versions.
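A small sketch of that filtering, with the client passed in so the function can be exercised without AWS (in Lambda you would create it with boto3.client("s3"); the bucket and key names are assumptions):

```python
# List the (version id, is-latest) pairs for one key using the
# standard list_object_versions response shape.
def versions_of(s3_client, bucket, key):
    resp = s3_client.list_object_versions(Bucket=bucket, Prefix=key)
    return [
        (v["VersionId"], v["IsLatest"])
        for v in resp.get("Versions", [])
        if v["Key"] == key
    ]
```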

Conclusion

Lambda is becoming AWS's path forward. They continue to improve it and add features.

This effort was for educational purposes. But it shows how the tools we have today can make building great software so much simpler.

There is a lot of talk about Solr these days. The engine that drives Solr is Lucene, which I have used indirectly in Neo4j but never directly. Maybe it's time to see how it works.

Lucene

Lucene is a full-text search library originally written in Java. A key feature is its use of an inverted index: instead of storing pages it stores keyword indexes to pages. This fact would dictate how to proceed. Tools like R and Python's NLTK are used in text mining, where the interest is in text analysis and how words are related to each other. Lucene tends to focus on search at the page level: what is the frequency of a term within a set of pages, or the similarity between pages. That is one reason why Lucene is so popular with web searches.
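The inverted-index idea itself is simple enough to sketch in a few lines (a toy illustration, not Lucene's actual data structure, which also keeps frequencies, positions and offsets):

```python
# Toy inverted index: map each term to the set of pages containing it,
# instead of storing the pages themselves.
from collections import defaultdict

def build_index(pages):
    index = defaultdict(set)
    for page_id, text in pages.items():
        for term in text.lower().split():
            index[term].add(page_id)
    return index
```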
In order to make the best use of Lucene I'd need data that is split into pages and not one big document. A bit of searching led me to Shakespeare; he has a good deal of text (even if he didn't write it all). A lot of the work done with this writing is text mining/analysis, and as such most sources use a single document. MIT is a good source for this, but after looking at it I felt it was not going to fit my needs. The single document has all of his writings, but there wasn't an obvious way to delineate it into separate texts. I found one site that had all of the works in separate HTML files.

Process the files

I started with Python to get the files and remove all of the HTML and punctuation, leaving plain text. I hoped to continue with Python but… Lucene has a great Python library only if you are on a Mac or Linux system. It was tempting to switch to a Raspberry Pi since it's Linux, but I went with Java on Windows. Just too lazy I guess. In the end I had 196 separate text files (there are so many sonnets!).
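The cleanup step can be sketched with the standard library alone (this is a minimal illustration, not my exact script):

```python
# Strip HTML tags with the stdlib parser, then drop punctuation,
# leaving plain text.
import string
from html.parser import HTMLParser

class TextOnly(HTMLParser):
    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        # Called only for text between tags, never for the tags themselves.
        self.parts.append(data)

def clean(html_text):
    parser = TextOnly()
    parser.feed(html_text)
    text = " ".join(parser.parts)
    return text.translate(str.maketrans("", "", string.punctuation))
```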

Getting started with Lucene

The first thing I learned was that Lucene changes quite a bit between versions. I was using 6.6 and the book I had covered 3.0. When searching for examples, make sure you note the version.

There are two stages with Lucene: indexing and searching. There are a lot of ‘switches’ and ‘levers’ that can be applied depending on the goal. I wanted to index all of the documents with frequencies and positions. You need to define a field to be indexed. This can be specific, like ‘fileName’ or ‘Title’, or it can be ‘contents’, which in this case is the entire file. Each Field needs a FieldType that defines how you want Lucene to handle it. One of these is Tokenized(). When set to true, Lucene will break up a string into tokens; otherwise it treats the string as a single entity. Searching still works on non-tokenized data, but the details are only at the string level. I wanted frequency and position calculated, which meant I needed TermVectors. They record frequency, position and offset. Position is where the term lies in the document. Offset is the start and end position of the term. In this sentence: “the quick brown fox jumps over the lazy dog”, the term vectors are:
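The same frequency/position/offset bookkeeping can be recomputed with a short sketch (plain Python, not the Lucene API). Position is the term's index in the token stream; offset is the (start, end) character span:

```python
# Build per-term frequency, positions and character offsets for a text.
import re

def term_vectors(text):
    vectors = {}
    for pos, m in enumerate(re.finditer(r"\S+", text)):
        info = vectors.setdefault(
            m.group(), {"freq": 0, "positions": [], "offsets": []}
        )
        info["freq"] += 1
        info["positions"].append(pos)
        info["offsets"].append((m.start(), m.end()))
    return vectors
```

For the example sentence, “the” appears twice, at token positions 0 and 6 and character offsets (0, 3) and (31, 34).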

Search

This part can be simple or very complex depending on the goal. The first three questions are simple. The fourth, similarity, is something I am still working on, primarily because I am not sure how or what I want to measure.

Questions:

1. How many documents contain a term?

2. In what documents is a term found?

3. How frequent is a term?

4. How similar are documents?

Question 1. This is the code that most searches will start with. In this case the term is “Midsummer”.

One thing we don't know is how these results compare. Was the term found once or many times? If it's only once, maybe we don't care. Clearly the play A Midsummer Night's Dream should rank higher than the rest. In order to find this out, Lucene has a Highlighter feature which will return fragments of the text surrounding the term.

In this first case we want to know where the term occurs more than once. Using the Highlighter to return surrounding text can help evaluate whether this is relevant. Only in the first play does the term “Midsummer” occur more than once.

The file name (in the text file): Midsummer Nights Dream Play.txt

The title: Midsummer Nights Dream Play (leftover text from the original HTML Shakespeare homepage)

Midsummer Nights Dream

Entire play ACT I SCENE I Athens

The last fragment shows that the text near the term is the start of ACT I. Maybe “Midsummer” wasn't a great term to start with?

Highlighter code. This code is within the loop over all documents. “scoreDoc.doc” is the document id for the current document in the loop.

The second and third questions are answered using the term vectors. Each TermVector will indicate the number of documents the term was found in and the number of times the term was found in a document. Instead of Midsummer, which was disappointing, I tried searching for something that should give more interesting results. Either romeo or juliet (heard of them?) should work.

The term ‘romeo’ was only found in one document, ‘Romeo and Juliet’ (surprise!). ‘juliet’ was found in two, ‘Romeo and Juliet’ and ‘Measure for Measure’.

When processing the data I did not do any ‘stemming’. This process reduces words to their root: juliets or julietta become juliet. This is a common practice and I could go back and clean up the data.
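As a deliberately tiny illustration of the idea (real stemmers such as the Porter stemmer, available in NLTK, apply many more rules), stripping a trailing plural "s" is enough to turn "juliets" into "juliet":

```python
# Simplified stemmer for illustration only: strip a trailing "s"
# unless the word ends in "ss" (so "dress" stays "dress").
def simple_stem(word):
    w = word.lower()
    if w.endswith("s") and not w.endswith("ss"):
        return w[:-1]
    return w
```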

The fourth question, similarity, is difficult because I am not sure how I would consider the various documents to be similar. The basic similarity measure in Lucene is cosine similarity. The diagram below shows two vectors, one for each document.

The terms in question are transparency and supply. The closer the two vectors are to each other, the more ‘similar’ the documents could be, at least for these two terms. In the case of Shakespeare, using romeo and juliet would not tell us much since one term only appears in one file. The other side of this might be medical or insurance documents. I suspect that these contain a lot of the same wording and it wouldn't be hard to find documents that are similar. I'll have to experiment with different terms and see what, if anything, falls out.
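Cosine similarity over term-frequency vectors can be sketched directly (plain Python, not Lucene's scoring code): 1.0 means the documents use terms in identical proportions, 0.0 means they share no terms at all.

```python
# Cosine similarity between the term-frequency vectors of two texts.
import math
from collections import Counter

def cosine_similarity(doc_a, doc_b):
    a = Counter(doc_a.lower().split())
    b = Counter(doc_b.lower().split())
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0
```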

Conclusion

Lucene offers much more than I have seen so far, and its search capabilities are interesting. Content management systems store files on a file system and metadata (information about the content) in a database. For AWS the content could be on S3 and the metadata in Aurora. Lucene could be used to search content and metadata if both were stored on S3. This seems like a much simpler design…

Really it is a Tesla.. Four electric motors, batteries, sensors, and two cameras.

Okay…kinda like a Tesla.

I have been fascinated with neural networks going back to the early 90's when I was doing work on forms recognition and handwriting analysis. The idea lost appeal for a long time but has had a resurgence as “deep learning” is being used for processing large amounts of data. Self-driving cars are one area where they are making gains. Recently I watched the series by Dr. Lex Fridman (MIT 6.S094: Introduction to Deep Learning and Self-Driving Cars). Besides covering a lot about neural networks, he talked about how Tesla instruments their cars to “learn”.

Is the car really learning to drive? Not exactly. By driving the car around it gathers data that can be used later on. The data includes images, sound, temperature, GPS and driver reactions. All of this data is fed into a neural network built with a framework such as TensorFlow. The car knows only what it has “seen” before; the system is memorizing every possible situation. Anything that occurs out of context from what it knows could cause trouble, but the more data that is gathered the less likely it is that something unforeseen will occur. Of course AI is always improving and at some point will be able to make better choices: zero-shot learning.

I wanted a way to experiment with this myself. Buying a Tesla is out of the question. I could add sensors to my car, but that is just asking for distracted driving. Also, I work remote and don't drive a lot. The next best thing would be to create a small ‘car’ that I could use to gather data.

This car is not going to be on a road, which means it won't have things like lane lines to guide it. I might build a track where it could be driven. Or just wander around the house and scare the dog and cat.

The picture above shows a small RC car. It has a motor for each wheel. Steering is similar to tank driving: turns are done by slowing down one set of wheels while speeding up the other. Sharper turns can be done by reversing the wheels instead of just slowing them down. It's not very smooth but it gets the job done.
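That tank-style mixing can be sketched in a few lines (an illustration of the idea, not the Arduino code on the car): a turn slows one side and speeds up the other, and a hard enough turn drives one side in reverse.

```python
# Mix a throttle and turn command into left/right wheel speeds,
# clamped to the motors' -1..1 range.
def mix(throttle, turn):
    clamp = lambda v: max(-1.0, min(1.0, v))
    left = clamp(throttle + turn)
    right = clamp(throttle - turn)
    return left, right
```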

The car is first configured for data gathering. I am using an Arduino with Bluetooth for communications, and I wrote a simple app for my Android tablet. There are six ultrasonic sensors for determining distance. I have two cameras (only one in the picture) mounted on the front. These will record stereo images, which will help determine depth. The first thing I learned is that the ultrasonic sensors will only see a small portion of what is in front of them. On the first trial run they completely missed the table and chair legs. The sensors need to sweep the area in front so as to create a point cloud. For this I am adding a pan-and-tilt control to the sensor mount; two servos will move the sensor array. Data is being recorded to a flash drive.

Data

I am recording the data at one-second intervals. The car doesn't move very fast, so this rate should be sufficient. The value at each sensor, the two camera images and the drive command are recorded.
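One way to picture a recorded sample is a CSV row per second; the field names here are assumptions, not the actual on-disk format:

```python
# Serialize recorded samples (six sensor readings, two image file
# names, and the drive command) as CSV.
import csv
import io

FIELDS = ["timestamp", "s1", "s2", "s3", "s4", "s5", "s6",
          "left_image", "right_image", "command"]

def write_samples(samples):
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(samples)
    return buf.getvalue()
```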

Currently I am in practice mode, refining the app to better control the car. I found some sensors for the wheels to detect the speed of rotation. I think I'd need to upgrade the Arduino to add any more devices, so I'll leave them off for the time being.

The original Pong game was not much compared to what would come later. For many it was amazing that something like this was available for use in the home. I have seen others make Pong games in Unity and thought it might be fun to try. Pictures of the console show controls for two players. For this project I'll make the second player the computer. Yeah, it will be hard to beat but…

Using Unity 5.5, start out with a basic camera.

Using a graphics tool I made a paddle and a ball (PNG format). Create a folder named Sprites and drop in the two images. Create an empty game object named ‘player’.

Drag the paddle sprite onto the player game object and notice that the Sprite Renderer shows up as a component of the player. Select the player object and then select Component->Physics->RigidBody from the menu. The paddle will need this in order to bounce the ball.

Create a new folder named Physics. Select Asset->Create->PhysicsMaterial and name it bounce. When applied to the paddle, it will cause the ball to bounce back.

In order for the paddle to react to the ball hitting it we need to add a collider component. A BoxCollider will do. Set the material of the collider to the bounce material.

Create a new game object called Ball and add in the ball sprite. Add a RigidBody and collider as well.

So far there is one paddle and a ball. Not so good?

There needs to be some code in order to make this work. Create a new folder named Scripts and add a new C# file named paddle.cs. Below is what the code should look like (or close).

The Update function is part of the core Unity MonoBehaviour class. It is called once per frame. The variable ‘gameObject’ refers to the object to which this script is attached.

Input.GetAxis("Vertical") will return a value between -1 and 1 depending on whether the down or up arrow key is pressed. This value times the speed will be used to increment the position of the game object.

playerPosition = new Vector2(-20, Mathf.Clamp(yPosition, -13, 13));

This line creates a new 2D vector with a fixed X location of -20 (where I placed the paddle on the screen) and a new value of Y based on the new yPosition. Mathf.Clamp() restricts the Y value to between -13 and 13. These values were determined by experimentation.

The last line transforms the object to a new position. Since the x value is always -20 the paddle will only move up or down.

… in a museum. You walk by a painting and suddenly your phone becomes the voice of the artist and begins to speak to you about the piece… Bringing art to the guest.

In 2008 I developed an application that used RFID to trigger events on a mobile device (a PDA). The main purpose was to be an Electronic Docent, a museum guide: exhibit information delivered directly to the guest.

Unfortunately RFID never became a consumer-friendly technology. Fast forward to 2016: smartphones are prevalent and Bluetooth Low Energy (BLE) devices are becoming ever more popular. In January, two others and I began development on a new version of the application.

The PDA has been replaced by smartphones and tablets. Both iOS and Android hold major positions in this area, and both support standard Bluetooth as well as BLE.

How it works

The application running on the device is designed to look for BLE tags. When one is located, a request is made to a server to search the database for the tag id. If the id is found, information about the media is returned and the user can select whether they want to view the media.

The tags and media have to be associated. This is done by personnel managing the location; they understand both the content and how they would like it displayed to the visitor.

One of the biggest decisions was how to develop the mobile portion of the application.

Until a few years ago, mobile applications were required to be developed in Java or Objective-C. Apple refused to accept applications cross-compiled or interpreted into Objective-C. The drawback is that an application had to be developed twice. Maintenance was much harder since it required twice the effort in coding and QA.

On the other side, native applications had the ability to interact with the device's hardware: sound, touch, GPS and accelerometer.

Cross platform framework

Frameworks such as Xamarin and Qt let the developer write one application and deploy it to multiple mobile platforms.

Xamarin: Based around C# and created by the team that created Mono. Xamarin takes C# code and creates native code for iOS or Android. Microsoft now owns Xamarin and has integrated it into its Visual Studio IDE.

Qt: This has long been a popular framework for developing applications for Windows, OS X and Linux. When mobile support was added there were license issues, and Qt has less of a native look and feel.

JavaScript/HTML5 framework: Tools such as Ionic use the Angular.js framework and Cordova libraries to create cross-platform applications. The key to their success has been the Cordova (PhoneGap) libraries, which provide access to the device hardware and let the application behave more like native code.

We chose Ionic. There were too many issues with either Xamarin or Qt, and developing two separate native applications was out of the question.

Serving up data

Once the mobile application finds a BLE tag it needs to get the information associated with the tag. This means an application server. This was a simple choice: Java, Hibernate, MySQL and Tomcat. This combination is proven, solid and will work well on something like AWS. One advantage to MySQL is that AWS's Aurora database is MySQL compatible and an easy replacement if very high performance is required.

Server side

Using Java and Hibernate makes the server work pretty straightforward. The code is built in layers: Entities, DAO, Service, Controller.

Entity

Each entity represents a table in the database.

ExhibitTag

This represents a single BLE tag.

ExhibitTagMedia

This represents the media associated with a tag. A tag could have more than one media component.

The templates folder holds the HTML files for the various screens. Since we started with a tabbed Ionic project we have two core HTML templates, tab.html and tab-dash.html. The tab format allows tabbed pages as navigation. We are not using this format, so these will be renamed later on.

The other screens are used to represent the media types: text, video and audio.html. Here is an example of a text view.

The app.js file is loaded first and sets up the basic structure. The application uses the Ionic Bluetooth Low Energy (BLE) Central plugin for Apache Cordova. If the app is running on a real mobile device (not in a browser on a PC) the object ‘ble’ will be defined; on a PC it will not be valid. The app.js run function checks for this.

Once the list of devices is returned, the REST service is called to check the tags against the database. The server returns:

Organization Name

Location Name

Exhibit TagName

Exhibit TagId

Exhibit Tag

Exhibit Tag MimeType

The user selects which exhibit they want to view.

Testing the app locally

Ionic can run the app locally by using the command ‘ionic serve’ from the project folder.

C:\Users\rickerg0\workspace\Tundra>ionic serve
******************************************************
The port 35729 was taken on the host localhost - using port 35730 instead
Running live reload server: http://localhost:35730
Watching: www/**/*, !www/lib/**/*, !www/**/*.map
√ Running dev server: http://localhost:8100
Ionic server commands, enter:
restart or r to restart the client app from the root
goto or g and a url to have the app navigate to the given url
consolelogs or c to enable/disable console log output
serverlogs or s to enable/disable server log output
quit or q to shutdown the server and exit

The basic screen as viewed in FireFox.

Deploy the app to an Android device from Windows

Make sure the device is connected via the USB port, and enable the developer option on the device. If you don't do this last step, the device will not allow Ionic to connect. From the terminal, issue the command ‘ionic run android’. This will build the APK file and install it on the device.