Reducing the size of an Azure Web Role deployment package

If you’ve been working with Azure Web Roles and have deployed them to an Azure subscription, you have probably noticed how large even a simple web role deployment package is. Even with the vanilla ASP.NET sample website the deployment package is quite bloated. This is not much of a problem if you have decent upload bandwidth, but in Australia bandwidth is as scarce as water in the desert, so let’s see if we can shrink this deployment package a little. We’ll also look at the consequences of this large package within the actual Web Role instances, and how we can reduce the footprint of a Web Role application.

To demonstrate the package size I have created a new Azure cloud service project with a standard ASP.NET web role:

1

Packaging up this Azure Cloud Service project results in a ‘CSPKG’ file and service configuration file:

2

As you can see, the package size for a standard ASPX web role is around 14MB. The CSPKG is a ZIP archive, so we can open it up and take a closer look at what’s actually deployed to our Azure web role:

3

The ApplicationWebRole_….. file is a ZIP file itself and contains the following:

4

The approot and sitesroot folders are of significant size, and if we have a closer look they both contain the complete WebRole application including all content and DLL files! This content is copied to the local storage disk within the web role instances. When you’re dealing with large web applications this can lead to issues, because the local disk space within web role instances is limited to around the 1.45 GB mark.
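The duplication can be checked without unpacking anything by hand. The following is a minimal sketch, not part of the Azure SDK: it assumes only that the package and the role file inside it are ordinary ZIP archives, as described above. The function names and the inner file name you pass in are hypothetical; use whatever name your own package contains.

```python
import io
import zipfile
from collections import defaultdict


def folder_sizes(archive):
    """Return {top-level entry: total uncompressed bytes} for a ZIP archive.

    `archive` may be a file path or a file-like object.
    """
    totals = defaultdict(int)
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            # Everything before the first slash is the top-level folder.
            top = info.filename.split("/", 1)[0]
            totals[top] += info.file_size
    return dict(totals)


def nested_folder_sizes(package, inner_name):
    """Open the ZIP stored as `inner_name` inside `package` and summarise it."""
    with zipfile.ZipFile(package) as outer:
        inner_bytes = outer.read(inner_name)
    return folder_sizes(io.BytesIO(inner_bytes))
```

Calling `nested_folder_sizes` with the path of the CSPKG and the name of the role file should report roughly the same number of bytes for approot and sitesroot when the web content is packaged twice.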

So why do we have these duplicate folders? The approot is used during role start-up by the Windows Azure Host Bootstrapper and can contain a class derived from RoleEntryPoint. In this web role you can also include a start-up script to perform any customisations within the web role environment, for example registering assemblies in the GAC.

The sitesroot contains the actual content that is served by IIS from within the web role instances. If you have defined multiple virtual directories or virtual applications these will also be contained in the sitesroot folder.

So is there any need for all the website content to be packaged up in the approot folder? No, absolutely not. The only reason we have this duplicate content is that the Azure SDK packages the web role content into both the approot and the sitesroot folders, because of the way the Azure Web Role Bootstrapper behaves.

The solution to this is to tailor the deployment package a little bit and get rid of the redundant web role content. Let’s create a new solution with a brand new web role:

5

This web role will hold just the RoleEntryPoint derived class (WebRole.cs), so we can safely remove all other content, NuGet packages and unnecessary referenced assemblies. The web role will not contain any of the web application bits that we want to host in Azure. As a result, the StartupWebRole looks like this:

6

Now we can add or include the web application that we want to publish to an Azure Web Role into the Visual Studio solution. The key point is not to include this as a role in the Azure Cloud Service project, but to add it as a ‘plain web application’ to the solution. The only web role we’re publishing to Azure is the ‘StartupWebRole’, and we’re going to package up the actual web application in a slightly different way:

7

The ‘MyWebApplication’ project does not need to contain a RoleEntryPoint derived class, since this is already present in the StartupWebRole. Next, we open up the ServiceDefinition.csdef in the Cloud Service project and make some modifications in order to publish our web application alongside the StartupWebRole:

8

There are a few changes that need to be made:

  1. The name attribute of the Site element is set to the name of the web role containing the actual web application, which is ‘MyWebApplication’ in this instance.
  2. The physicalDirectory attribute is added and refers to the location where the ‘MyWebApplication’ will be published prior to creating the Azure package.
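If you would rather script these two changes than edit the file by hand, they can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name is made up, the publish directory you pass in is whatever location you chose for the published ‘MyWebApplication’, and the script expects the Site element to sit under Sites within the WebRole element of the csdef.

```python
import xml.etree.ElementTree as ET

# XML namespace used by ServiceDefinition.csdef files.
CSDEF_NS = "http://schemas.microsoft.com/ServiceDefinition/2008/10/ServiceDefinition"


def point_site_at(csdef_xml, role_name, site_name, physical_directory):
    """Return csdef XML with the role's first Site renamed and redirected.

    role_name          -- the web role to modify, e.g. 'StartupWebRole'
    site_name          -- new value for the Site name attribute
    physical_directory -- folder the web application was published to
    """
    # Keep the default namespace unprefixed when serialising again.
    ET.register_namespace("", CSDEF_NS)
    root = ET.fromstring(csdef_xml)
    ns = {"sd": CSDEF_NS}
    for role in root.findall("sd:WebRole", ns):
        if role.get("name") != role_name:
            continue
        site = role.find("sd:Sites/sd:Site", ns)
        if site is None:
            raise ValueError("role %r has no Site element" % role_name)
        site.set("name", site_name)
        site.set("physicalDirectory", physical_directory)
        return ET.tostring(root, encoding="unicode")
    raise ValueError("no WebRole named %r" % role_name)
```

The function works on the XML text rather than on a file path, so it is up to the caller to read ServiceDefinition.csdef and write the result back before packaging.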

Although this introduces the additional step of publishing the web application to a separate physical directory, we immediately notice the reduced size of the deployment package:

9

When you’re dealing with larger web applications that contain numerous referenced assemblies the savings in size can add up quickly.

The Internet of Things with Arduino, Azure Event Hubs and the Azure Python SDK

In the emerging world of the Internet of Things (IoT) we see more and more hardware manufacturers releasing development platforms to connect devices and sensors to the internet. Traditionally these kinds of platforms are built around microcontrollers, and the Arduino platform can be considered the standard in (consumer) physical computing, home automation, DIY and the ‘makers community’.

Most Arduinos come with an 8-bit AVR RISC-based microcontroller running at 16 MHz with the modest amount of 2 kilobytes of memory. These devices are perfectly capable of calling REST services with the Ethernet library and Arduino Ethernet shield. However, we do face some challenges when it comes to encrypting data, generating Azure shared access signatures and communicating over HTTPS due to a lack of processing power and memory. The Arduino platform has no SSL libraries and therefore cannot securely transmit data over HTTPS. This article shows a solution to this problem by using a secondary device as a bridge to the Internet.

Microsoft Azure allows us to store and process high volumes of sensor data through Event Hubs, currently still in preview. More information on Event Hubs, its architecture and how to publish and consume event data can be found here. In this article I focus on how to publish sensor data from these ‘things’ to an Azure Event Hub using a microcontroller with a field gateway that is capable of communicating over HTTPS using the Azure Python SDK.

Azure Event Hubs

Before we start we need to create an Azure Service Bus Namespace and an Event Hub. This can be done in the Azure management portal:

Creating an Azure Event Hub

When creating the event hub we need to specify the number of partitions. The link provided earlier describes partitioning in detail, but in summary partitions help us spread the load of publishing devices across our event hub.

Event Hub Partitioning

We can also define policies that can be used to generate a shared access signature on our devices that will be sending event data to the hub:

Event Hub Policies

Arduino Yun

The Arduino Yun combines a microcontroller and ‘Wi-Fi System on Chip’ (WiSOC) on a single board. The microcontroller allows us to ‘sense’ the environment through its analogue input sensors, whereas the WiSOC runs a full Linux distribution with rich programming and connectivity capabilities. The WiSOC can be considered as the field gateway for the microcontroller and is able to send data to the Azure Event Hub. For other Arduino development boards that only have a microcontroller you can for example use a Raspberry Pi as the field gateway.
For the purpose of this demo we’ll keep the schematics simple and use just a temperature sensor and some LEDs to report back a status:

Yun temperature drawing_bb

The Arduino sketch reads the voltage signal from the temperature sensor and converts it to a temperature in degrees Celsius, our unit of measurement. The microcontroller communicates with the Yun Linux distribution via the bridge library, and blinks either the green or the red LED depending on the HTTP status that is returned from the Linux distribution.
The complete Arduino sketch looks like this:

The Arduino bridge library is used to run a Python script within the Linux environment by executing a shell command to send the temperature to the Azure Event Hub. Next we’ll have a look at how this Python script actually works.

Python SDK

The Microsoft Azure Python SDK provides a set of Python packages for easy access to Azure storage services, service bus queues, topics and the service management APIs. Unfortunately there is no support for Event Hubs at this stage. Luckily Microsoft is embracing the open source community these days and is hosting the Python SDK on GitHub for public contribution, so hopefully this will be added soon. Details on how to install the Azure Python SDK in a Linux environment can be found at http://azure.microsoft.com/en-us/documentation/articles/python-how-to-install/. You can use a package manager like pip or easy_install to install the Python package ‘azure’.

The complete Python script to send event data to an Azure Event Hub is as follows:

The script can be called with a series of sensor values in the following format:

python script.py temperature:22,humidity:20 deviceid

Multiple key-value pairs with a sensor type and sensor value can be provided, and these will be nicely stored in the JSON message.
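To make the argument format concrete, here is a small sketch of how such a string could be turned into a JSON message. It is not the script from this article: the function name, the ‘deviceid’ field name and the choice to store every value as a number are assumptions made for illustration.

```python
import json


def build_message(readings, device_id):
    """Turn 'temperature:22,humidity:20' plus a device id into a JSON string."""
    body = {"deviceid": device_id}  # field name is an assumption
    for pair in readings.split(","):
        # Split each pair on the first colon into sensor type and value.
        sensor, _, value = pair.partition(":")
        if not sensor or not value:
            raise ValueError("expected sensor:value, got %r" % pair)
        body[sensor.strip()] = float(value)
    return json.dumps(body, sort_keys=True)
```

In a command-line script the two arguments would come from `sys.argv[1]` and `sys.argv[2]`, matching the call shown above.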

By using the ServiceBusSASAuthentication class from the Python SDK we can easily generate the shared access signature token that is required by the various services in the Azure ServiceBus, including Event Hubs.
Sending the actual event data is done with a simple REST call. Although Event Hubs allow any arbitrary message to be sent, we send JSON data, which is easy for potential consumers to parse. Event data will be sent to a specific partition in the Event Hub. The hostname of the Arduino Yun is used as the partition key. Azure takes care of assigning an actual Event Hub partition, but by using the hostname as the partition key it’s more likely that traffic from different devices is spread across the Event Hub for better throughput. The Python script will create the appropriate REST HTTP request according to the Azure Event Hub REST API:
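For readers who want to see what goes into such a request, the sketch below builds the token and the HTTPS POST with the Python standard library only. Note that this swaps the SDK’s ServiceBusSASAuthentication class for a hand-rolled HMAC-SHA256 signature, so it is an illustration rather than the script used here. The function names and arguments are hypothetical, and the header names (in particular BrokerProperties carrying the partition key) reflect my understanding of the REST API and should be verified against the official reference.

```python
import base64
import hashlib
import hmac
import json
import time
import urllib.parse
import urllib.request


def sas_token(resource_uri, key_name, key, ttl_seconds=3600, now=None):
    """Build a Service Bus shared access signature token for resource_uri."""
    expiry = str(int((time.time() if now is None else now) + ttl_seconds))
    encoded_uri = urllib.parse.quote(resource_uri, safe="")
    # The string to sign is the URL-encoded URI, a newline and the expiry.
    to_sign = (encoded_uri + "\n" + expiry).encode("utf-8")
    digest = hmac.new(key.encode("utf-8"), to_sign, hashlib.sha256).digest()
    signature = urllib.parse.quote(base64.b64encode(digest).decode("ascii"), safe="")
    return "SharedAccessSignature sr=%s&sig=%s&se=%s&skn=%s" % (
        encoded_uri, signature, expiry, key_name)


def build_request(namespace, hub, key_name, key, partition_key, body):
    """Prepare (but do not send) the HTTPS POST that publishes one event."""
    uri = "https://%s.servicebus.windows.net/%s" % (namespace, hub)
    headers = {
        "Authorization": sas_token(uri, key_name, key),
        "Content-Type": "application/atom+xml;type=entry;charset=utf-8",
        # Assumed mechanism for passing the partition key over REST.
        "BrokerProperties": json.dumps({"PartitionKey": partition_key}),
    }
    return urllib.request.Request(uri + "/messages", data=body.encode("utf-8"),
                                  headers=headers, method="POST")
```

Passing the returned object to `urllib.request.urlopen` would send the event; the HTTP status of that response is what the sketch on the microcontroller side can use to decide which LED to blink.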

When we deploy the Arduino sketch it will start sending the temperature to the Azure Event Hub continuously at one-second intervals. We can confirm successful transmission by consuming the Event Hub data with Service Bus Explorer:

Service Bus Explorer

Conclusion

I’ve demonstrated how we can combine the Arduino microcontroller platform, which reads the sensor data, with a more powerful computing environment that runs Python. Together these allow our ‘things’ to leverage Azure Event Hubs for event processing, with the potential to scale to millions of devices.