Pharma Manufacturing with Google Cloud

Kyle Ziegler
4 min read · Nov 24, 2020


[Image: Scientist working in a laboratory]

Designing and deploying a scalable, low-latency architecture in the cloud poses several challenges. Add the regulated nature of healthcare and the low margins in manufacturing, and we had a difficult problem on our hands.

In this short read, we’ll go over our architectural decisions and how we implemented a very low latency architecture on Google Cloud Platform for a manufacturing use case. I’ll assume that you have a basic understanding of cloud technologies and software development.

Criteria For Success

We needed an architecture that could at a minimum achieve the following requirements:

  • Control on-prem devices with a cloud computing platform
  • Send MQTT messages to the cloud for inference and return a decision to adjust on-prem manufacturing devices in less than 4 seconds with a low standard deviation.
  • Latency and processing time will be measured from when a message leaves the on-prem VPC to when the response returns; a dedicated interconnect links the two environments.
  • Messages from devices may be sent at different frequencies and will need to be windowed through event time windowing.
  • Message data will need to be processed by ML model(s), and the results sent back to the on-prem device controller.
  • Use GCP managed services wherever possible
  • Python language support wherever possible
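The latency requirement above (under 4 seconds with a low standard deviation) is easiest to reason about with concrete numbers. Here is a minimal sketch of how round-trip measurements might be summarized against that budget; the function name and sample values are illustrative, not from our production monitoring:

```python
import statistics

def latency_stats(roundtrips_ms):
    """Summarize round-trip latencies (in milliseconds) against a 4 s budget."""
    ordered = sorted(roundtrips_ms)
    p95 = ordered[int(0.95 * (len(ordered) - 1))]
    return {
        "mean_ms": statistics.mean(roundtrips_ms),
        "stdev_ms": statistics.stdev(roundtrips_ms),
        "p95_ms": p95,
        "within_slo": p95 < 4000,  # the 4-second budget from our requirements
    }

# Hypothetical round-trip samples, on-prem VPC exit to response arrival:
samples = [1800, 1950, 2100, 1700, 2300, 1900]
print(latency_stats(samples))
```

Tracking the standard deviation alongside the mean matters here: a control loop that is usually fast but occasionally stalls can be worse for a manufacturing device than one that is consistently a little slower.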

Version 1: Dataflow Architecture

In this architecture, we have a few services that are serverless by design. We’ll have little control over response times and scaling speed on Dataflow, Pub/Sub, and IoT Core. This is important to note, as we have to weigh the benefit of a managed offering against building all services on Google Compute Engine (GCE). If we went the GCE route, we could hypothesize that we’d see lower and more consistent response times, but we wouldn’t have health and uptime checks on our IoT devices, for example. A pure GCE implementation would also give you the most flexibility in application design and language options, with the trade-off of the highest management overhead.

The main benefit of Dataflow (managed Apache Beam) is that it lets you perform highly parallel computations very quickly and provides useful mechanisms for event-time windowing. These benefits, along with a serverless infrastructure, may appeal to you more than the alternative of setting up your own subscriber service against the Cloud Pub/Sub API. A custom subscriber service would require self-managing things like message acknowledgment and deduplication, which Beam’s Pub/Sub connector handles for you. Note that implementing your Beam workflow in Java does give you an MQTT connector, but we would lose our IoT device management and the Cloud Pub/Sub message queue.
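Because devices publish at different frequencies, messages must be grouped by the timestamp the device attached at publish time (event time), not by when they arrive. Beam does this for us, but the core idea can be sketched in plain Python; this is an illustration of fixed event-time windows, not Beam’s actual implementation:

```python
from collections import defaultdict

def assign_fixed_windows(messages, window_secs=5):
    """Group messages into fixed event-time windows, keyed by window start.

    Each message is a (event_time_secs, payload) pair. We bucket by the
    device's publish timestamp, so late-arriving messages still land in
    the window they logically belong to.
    """
    windows = defaultdict(list)
    for event_time, payload in messages:
        window_start = int(event_time // window_secs) * window_secs
        windows[window_start].append(payload)
    return dict(windows)

# Two devices publishing at different rates; timestamps are event times.
msgs = [(100.2, "a"), (101.7, "b"), (105.1, "c"), (109.9, "d")]
print(assign_fixed_windows(msgs))  # windows starting at 100 and 105
```

Beam adds what this sketch omits: watermarks to decide when a window is complete, triggers for late data, and parallel execution across workers.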

Version 2: Cloud Functions Architecture

Using Cloud Functions allows us to push from Pub/Sub to an HTTP endpoint instead of having Beam periodically pull messages from Pub/Sub. This took some latency out of the round trip by getting messages to the business layer faster. In this version, we also called Cloud AI Platform from Cloud Functions to obtain online predictions on messages instead of including inference in the Beam workflow. To optimize latency on Cloud AI Platform, we set a minimum number of prediction nodes to keep our endpoint warm. Depending on your use case and budget, you can instead let it autoscale to zero; the endpoint then stays warm for around 10 minutes after the latest request. It is also important to match the prediction endpoint’s machine type to the model you’d like to deploy and your latency SLOs.
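The business layer in this version is a background Cloud Function subscribed to the Pub/Sub topic. A minimal sketch of its shape is below: Pub/Sub delivers the device payload base64-encoded in `event["data"]`, and the function decodes it and runs inference. The `predict` function here is a hypothetical stand-in for the AI Platform online prediction call, and the `sensor_value` field and 0.8 threshold are invented for illustration:

```python
import base64
import json

def handle_device_message(event, context):
    """Background Cloud Function triggered by a Pub/Sub message.

    Pub/Sub places the published bytes, base64-encoded, in event["data"].
    We decode the JSON payload, run inference, and return the decision
    that is sent back to the on-prem device controller.
    """
    payload = json.loads(base64.b64decode(event["data"]).decode("utf-8"))
    return predict(payload)

def predict(payload):
    # Placeholder for the AI Platform online prediction request:
    # flag the device for adjustment if the reading drifts past a threshold.
    return {"adjust": payload["sensor_value"] > 0.8}

# Simulate the Pub/Sub envelope locally:
event = {"data": base64.b64encode(json.dumps({"sensor_value": 0.9}).encode()).decode()}
print(handle_device_message(event, None))
```

Because Pub/Sub pushes each message as it arrives, the function starts work immediately rather than waiting on a pull cycle, which is where the latency savings over Version 1 came from.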

Best Practices

  • Keep all services in the same region or zone wherever possible to minimize latency.
  • To achieve the lowest latency, you may want to switch to Compute Engine, which allows for the most granular control over network, compute, and disk.
  • AI Platform prediction endpoints allow for low-latency inference while remaining a managed service. Model versioning lets us deploy and test multiple versions side by side.

Conclusion

  1. Cloud Functions serves as a high-performance business layer for low-latency control loops. Our metrics showed round trips under 2 seconds with Cloud Functions, whereas with Dataflow and Apache Beam we saw round trips under 5 seconds for similar data and ML models.
  2. Dataflow with the Java SDK allows you to remove Pub/Sub and go straight to the business layer (via MQTT), but it lacks the device management and publish/subscribe model you get with Cloud IoT Core and Pub/Sub.

Check out all the tools and guides we used in this article below!

About the Authors

  • Kyle Ziegler — Solution Engineer, Google Cloud
  • Om Patel — Data and Analytics Cloud Architect, Google Cloud
