Thread hanging when using Azure client API

There’s already a good Stack Overflow post which covers why the storage client hangs when uploading / downloading and the network cable is removed (there is an issue with the .NET streaming classes it uses under the hood).

This problem isn’t limited to blobs; we’ve seen it when talking to queues too, and I’m pretty sure it would affect all calls.  We did speak to some folk (informally) and they think that the root cause may be resolved in .NET 4.5.  We haven’t tested that ourselves yet (it will be a while before we upgrade our production environment).

It is important you understand this issue, even if you think you won’t have transient networking issues.  It’s a given that if you’re on a train the network connection will be up and down a fair bit, but it can still happen when you’re on a server in Azure.  We run at large scale in Azure and we notice it from time to time.  The problem is that if you don’t address it, a thread which you thought was doing useful stuff, let’s say de-queueing and processing messages from a queue, is actually stuck on a network transfer somewhere.  It’s very nasty because that thread will never come back and you get no errors reported.  You’ll then be wondering why that bit of your app isn’t working, and spend a while peering through log files trying to work out what happened to poor thread 33.

Where you monitor for this problem is up to you, but the only way to really solve it is to monitor how long things take and cancel them if they take too long.  If you’re using TPL you could start two tasks, the work and a timeout, and wait on either of them; if the timeout task completes first then fail (remember to cancel the blocked task just in case it does actually complete).  We opted not to use TPL and instead abort the blocked thread in an attempt to rescue some resources.
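For the TPL route described above, here is a minimal sketch rather than anything we run in production. It assumes .NET 4.5 or later (for `Task.Run` and `Task.Delay`), and the `TaskTimeout` class and its `Run` method are made-up names for illustration. Bear in mind that cancellation is cooperative: a call that is stuck inside a network read and never checks the token stays stuck, so this reports the timeout but does not free the blocked thread.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper, not part of any Azure or .NET library.
public static class TaskTimeout
{
    // Runs 'work' on the thread pool and throws TimeoutException
    // if it has not finished within 'timeout'.
    public static void Run(Action<CancellationToken> work, TimeSpan timeout)
    {
        // Deliberately not disposed: the work may still hold the token after a timeout.
        var cts = new CancellationTokenSource();

        Task workTask = Task.Run(() => work(cts.Token));
        Task timeoutTask = Task.Delay(timeout);

        // Blocks until one of the two tasks completes and returns its index.
        int finished = Task.WaitAny(workTask, timeoutTask);

        if (finished == 1)
        {
            // The timeout won: ask the work to stop in case it is still able to notice.
            cts.Cancel();
            throw new TimeoutException("Work did not complete within " + timeout);
        }

        // The work finished first: rethrow its exception, if it failed.
        workTask.GetAwaiter().GetResult();
    }
}
```

A caller would wrap each storage call, for example `TaskTimeout.Run(ct => ProcessNextMessage(), TimeSpan.FromSeconds(30))`, where `ProcessNextMessage` stands in for whatever the thread is supposed to be doing.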

Here’s the code we use:

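using System;
using System.Collections.Concurrent;
using System.Threading;

// Assumption: the post does not show this type; a plain Exception subclass is enough.
public class WorkerTimeoutException : Exception { }

public class MonitoredWorkerPool
{
    // Idle workers waiting to be reused.
    private readonly ConcurrentQueue<Worker> _workers = new ConcurrentQueue<Worker>();

    public void DoWork(Action work, TimeSpan timeout)
    {
        var worker = _GetWorker();
        worker.DoWork(work); // assumed step: hand the action to the worker thread
        if (!worker.Join(timeout))
        {
            worker.Abort(); // assumed step: give up on the stuck thread; it is never pooled again
            throw new WorkerTimeoutException();
        }
        worker.Complete(); // assumed step: rethrow any failure and recycle the worker
    }

    private Worker _GetWorker()
    {
        Worker worker;
        return !_workers.TryDequeue(out worker) ? new Worker(this) : worker;
    }

    private class Worker
    {
        readonly MonitoredWorkerPool _ownerPool;
        readonly Thread _thread;
        readonly AutoResetEvent _workerWaitEvent;
        readonly AutoResetEvent _workCompleted;
        Action _work;
        bool _isRunning = true;

        public Worker(MonitoredWorkerPool ownerPool)
        {
            _ownerPool = ownerPool;
            _workerWaitEvent = new AutoResetEvent(false);
            _workCompleted = new AutoResetEvent(false);
            _thread = new Thread(_WaitForWork);
            _thread.IsBackground = true; // assumed: idle workers should not keep the process alive
            _thread.Start();             // assumed: the thread has to be started somewhere
        }

        private void _WaitForWork(object state)
        {
            while (_isRunning)
            {
                _workerWaitEvent.WaitOne(); // assumed: sleep until DoWork supplies an action
                try { _work(); }
                catch (Exception ex) { Exception = ex; } // kept so Complete() can rethrow it
                _workCompleted.Set(); // assumed: wake the caller blocked in Join()
            }
        }

        Exception Exception { get; set; }

        public void DoWork(Action action)
        {
            _work = action;
            _workerWaitEvent.Set(); // assumed: wake the worker thread
        }

        public bool Join(TimeSpan timeout)
        {
            // False means the work did not finish in time.
            return _workCompleted.WaitOne(timeout);
        }

        public void Abort()
        {
            _isRunning = false;
            // Assumed from the prose ("abort the blocked thread"). .NET Framework only:
            // Thread.Abort throws PlatformNotSupportedException on .NET Core and .NET 5+.
            _thread.Abort();
        }

        public void Complete()
        {
            var exception = Exception;
            _work = null;
            Exception = null;
            _ownerPool._workers.Enqueue(this); // assumed: return the worker to the pool
            if (exception != null)
                throw exception;
        }
    }
}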
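The timeout in `Join` hangs on one detail: `WaitOne(timeout)` on an `AutoResetEvent` returns `false` if nobody signalled the event in time. Here is that mechanism on its own, as a rough sketch with no pooling and no thread abort, so it also runs on runtimes where `Thread.Abort` is unavailable. `WaitOneTimeoutDemo` and `RunWithTimeout` are illustrative names, not part of the pool above.

```csharp
using System;
using System.Threading;

// Illustrative only: isolates the "signal or time out" handshake.
public static class WaitOneTimeoutDemo
{
    // Returns true if 'work' finished within 'timeout', false otherwise.
    public static bool RunWithTimeout(Action work, TimeSpan timeout)
    {
        var completed = new AutoResetEvent(false);

        var thread = new Thread(() =>
        {
            try { work(); }
            catch (Exception) { /* a real worker would keep this to rethrow later */ }
            finally { completed.Set(); } // signal the waiting caller
        });
        thread.IsBackground = true; // a stuck thread must not keep the process alive
        thread.Start();

        // Blocks for at most 'timeout'; false means the signal never arrived.
        return completed.WaitOne(timeout);
    }
}
```

When this returns `false` the background thread is still running (or still stuck); the caller has only stopped waiting for it. That is the gap the abort in our pool is trying to close.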

About Tom Peplow

C# .Net developer based in London and the South Coast