Security ain’t simple, and it will never be

Every few months or so, we get a message from a customer that sounds like this:

I am looking to integrate JWT to my app. I found this tutorial and trying to follow it in my code. I am now trying to encrypt the signature with an RSA public key and decrypt it later with my private key to compare the hashes, but for some reasons my encryption results are always different.

If you don’t follow what’s happening, and I think most of my readers don’t, here’s what.

First, one guy publishes a tutorial that explains to the townsfolk the general process of building a space rocket. Just take some titanium for the body, solder a guidance system (shouldn’t be that much harder than soldering that SatNav chip to your Arduino board), get some rocket fuel – just be careful, it is a bit super-deadly – and in a few months tops you’ll be able to check for yourself whether the Great Wall can really be seen from space.

This makes Mick, an honest town lad, interested (he was a bit into rockets himself back in Y7), and he decides to launch a space travel business, using that tutorial as a guide for building his own space rocket. Mick decides to replace titanium with aluminum (as it is cheaper that way), but his aluminum doesn’t stay in shape as per the instructions because the feathering is too heavy for it. He feels frustrated and decides to get rid of some of the feathering.

Meanwhile, the town is getting interested in the project, and Mick’s bookings are growing steadily.

* * *

When my friend got her first car, her mum said to her: I’m super happy for you, darling. Could you please promise me that you will always bear in mind one important thing: it may not always look like that, but you are about to take care of a 3-tonne killing machine. Please be careful.

My friend recalls these words every time she turns the key.

We need to grow up. We need to understand that security is serious. We need to bear in mind that by integrating security into a product we are taking care, well, not of a killing machine, but of something of a very similar scale. Taking it lightly is extremely dangerous.

And I think Mick is as much of a victim here as his customers are. Tutorials like the one mentioned at the beginning of this post make complex things look simple. They make high-risk systems appear risk-free. They say, ah look at this funny thing here, it is called security and even you can do it. Go ahead!

I have actually been a Mick numerous times myself. I love doing things with my hands and consider myself a capable DIY’er – something of an orange or even green belt. And yet, dozens of times I have let YouTube DIY videos delude me into thinking that a job is not as complex as I thought it was. Hey, just look how easy it was for that young couple to build a patio. Surely it can’t be that hard?

The outcome? I don’t want to talk about it.

And that’s why I stopped writing any manuals, guidance, todo’s, instructions, or whitepapers on security topics unless I am absolutely certain that the audience is capable of following them. Even when I do, I warn my readers that the job they are looking to embark on requires excellent technical competence, and I do so boldly and unambiguously. Security engineering is one of the largest surfaces for the dropped washers, and by directing irresponsibly you are playing your own part in creating the future chaos.

So, let’s reiterate it one last time:


WARNING:

Security is complex and can be dangerous if approached irresponsibly. Please, do not make it look simple.


Picture credit: FDA

The Dropped Washer Effect

One of these buildings can melt your car down. Can you identify the culprit?

Have you ever come across a situation where something utterly negligible and minor became the cause of a major disruption or even an accident? Such as a small crack in an underground water pipe, dripping inconspicuously for a couple of years, and eventually causing a landslide after accumulating a critical mass of water? Or a seemingly ordinary glass building capable of focusing the sunlight so that it melts the bodywork of cars parked nearby?

If so, chances are high that you observed an example of the Dropped Washer effect. Named after a Boeing 737 accident in Okinawa, Japan, the dropped washer effect describes large-scale adverse events triggered by a cause of incomparably lower significance. The unfortunate Boeing ended up burning out completely because of a missing slat mechanism washer, 0.625 inches wide, that the engineering crew forgot to replace after the aircraft’s last service.

One characteristic of potential dropped-washer features that makes them particularly naughty is their zero perceived value for the business. Offering no added opportunities and presenting no apparent risks for the product, they often do not even exist in the minds of the product stakeholders. This important peculiarity makes it all too easy for them to slip past every safety measure employed in modern production flows – from risk assessment to quality control.

Happily, in many cases there are techniques that can help increase our chances of spotting and eliminating the dropped washers from our projects.

Check out my new paper here.

Picture credit: Reuters

Check Your Backups, Now

Last week, a number of services hosted in Google Cloud suffered a dramatic outage. Following a maintenance glitch, services like YouTube, Shopify, Snapchat, and thousands of others became unavailable or very slow to respond. Overall, the services were down for more than four hours, before the availability of the platform was finally restored.

The curious thing about this incident was not the outage itself (sweet happens), but the circumstances behind it that made it last that long. Cloud service providers, as a rule, aim for the highest levels of availability, which are carved in their SLAs. So how could it happen that one of the leading global computing platforms was taken down for more than four hours? Happily, Google is very good at debriefing its failures, so we can have a sneak peek at what actually happened behind the scenes.

It all started with a few computing nodes which needed to undergo routine maintenance and thus had to be temporarily removed from the cloud – a common day-to-day activity. And then something went wrong. Due to a glitch in the internal task scheduler, many more worker nodes were mistakenly dismissed – drastically reducing the total throughput of the platform, and causing a Chertsey-style gridlock.

Ironically, Google did everything right, exceptionally right. They considered that risk at the design stage. They had a smart recovery mechanism in place that should have kicked in to recover from the glitch and provide the necessary continuity. The problem was that the recovery mechanism itself was supposed to be run by the faulty scheduler. Being a system management task with a lower priority than the affected production services, it was pushed far back in the execution queue. And since the queue was miles long by that time, the recovery service in the choking cloud never made its way to its time slice.
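The failure mode described above, a recovery job starved by the very scheduler it is meant to rescue, can be reproduced in a toy model. The sketch below is Python and entirely hypothetical: the task names, the two priority levels, and the `run_scheduler` helper are illustrative assumptions and say nothing about how Google's scheduler is actually built.

```python
import heapq
from itertools import count


def run_scheduler(tasks, time_slices):
    """Run a strict-priority queue for a limited number of time slices.

    `tasks` is a list of (priority, name) pairs; a lower number means a
    higher priority. Returns the names of the tasks that actually ran.
    """
    order = count()  # tie-breaker: keeps FIFO order within one priority
    queue = [(priority, next(order), name) for priority, name in tasks]
    heapq.heapify(queue)

    executed = []
    for _ in range(time_slices):
        if not queue:
            break
        _, _, name = heapq.heappop(queue)
        executed.append(name)
    return executed


PRODUCTION, MANAGEMENT = 0, 1  # hypothetical priority levels

# A choking platform: 1,000 production jobs are queued ahead of the one
# low-priority management task that would restore the lost capacity.
backlog = [(PRODUCTION, f"prod-{i}") for i in range(1000)]
backlog.append((MANAGEMENT, "recover-nodes"))

# With capacity for only 100 slices, the recovery task never gets a turn.
starved = run_scheduler(backlog, time_slices=100)
print("recover-nodes" in starved)  # False

# Queued at production priority (and ahead of the backlog), it runs.
fixed = [(PRODUCTION, "recover-nodes")] + backlog[:-1]
rescued = run_scheduler(fixed, time_slices=100)
print("recover-nodes" in rescued)  # True
```

Raising the priority is only the smallest possible change to the model; the more robust design is to run recovery outside the affected scheduler altogether.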

Any lessons we can learn from this incident? There are myriads; the deeper your knowledge about cloud infrastructures is, the more conclusions you can draw from it. A security architect can draw at least the following two:

1. Backing up systems is a process, not a one-off task. Your backup routine might have worked at the time you set it up, but things break, media dies, and passwords change. Don’t risk it: go and test your backups now – emulate a disaster, pull that cord, and see if your arrangements are capable of providing continuity. Don’t be tempted just to check the scripts – try the actual process in the field. Put this check on your schedule and make it a routine.

2. When designing a backup or recovery system, take extra care to minimize its dependencies on the system being recovered. It is worth remembering that modern digital environments are very complex, and you might need to be quite imaginative to recognise all possible interdependencies. The recovery system should live in its own world, with its own operating environment, connectivity, and power supply.
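The first point can be turned into a scheduled drill. Here is a minimal sketch in Python, assuming the backup is a plain tar archive of a single directory; the function names, file names, and archive layout are all hypothetical. A real drill should also honour the second point and restore onto separate infrastructure rather than into a local scratch directory.

```python
import hashlib
import tarfile
import tempfile
from pathlib import Path


def file_digests(root: Path) -> dict:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        str(path.relative_to(root)): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def make_backup(source: Path, archive: Path) -> None:
    """Write a gzip-compressed tar archive of `source`."""
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(source, arcname=".")


def restore_and_verify(source: Path, archive: Path) -> bool:
    """Restore the archive into a scratch directory and compare contents."""
    with tempfile.TemporaryDirectory() as scratch:
        with tarfile.open(archive, "r:gz") as tar:
            tar.extractall(scratch)
        return file_digests(Path(scratch)) == file_digests(source)


with tempfile.TemporaryDirectory() as work:
    data = Path(work) / "data"
    data.mkdir()
    (data / "ledger.txt").write_text("balance=42\n")
    archive = Path(work) / "backup.tar.gz"

    make_backup(data, archive)
    ok_fresh = restore_and_verify(data, archive)  # restore matches live data

    # The live data moves on, but the backup job has silently stopped...
    (data / "ledger.txt").write_text("balance=43\n")
    ok_stale = restore_and_verify(data, archive)  # ...and the drill notices

print(ok_fresh, ok_stale)  # True False
```

The drill performs an actual restore and compares the result with the live data, which is the difference between testing the process and merely checking that the script exists.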

It is very easy to get caught in this trap, as it gives us the imaginary peace of mind we’re craving. We know that the system is there for us, and we sleep well at night. We know that should a bad thing happen, it will give us its shoulder. We only realise it is not going to when it’s too late to do anything to make it right.

Just as I was writing this, my friend called me with a story. She went on an overseas trip, and, while there, wanted to Skype home. Skype, however, having realised her IP was unusual, applied extra security and sent her a verification e-mail. It all would have ended there, if only her Skype account hadn’t been bound to a very old e-mail account at an ISP that was blocked in the country for political reasons – so she couldn’t get to her inbox to confirm her identity. Luckily it was just Skype and luckily she knew about VPN – but things might have become way more complex with a different, life-critical service.

So, really, you will never know how a cow catches a hare. There are way too many factors that may kick in unexpectedly, and, worst of all, unknown unknowns are among them. Still, by using the above two approaches wisely and persistently, you may reduce the risks to a negligible level, which is well worth the effort.

Picture credit: danielcheong1974

A Bag of Contention

A lot has been said about passengers stopping to collect their cabin bags while escaping the blazing Aeroflot aircraft in Moscow last week. Some media went as far as blaming them for the deaths of those trapped behind them, with certain Russian politicians even calling for criminal proceedings against those who stopped to pick their bags up (yes, in Russia, party still goes to you © Yakov Smirnoff).

The moral perspective of this complicated matter is unlikely to ever have any kind of satisfactory resolution. It goes deep into the pre-social parts of our brain, mostly cared for by instincts, reflexes, and fight-or-flight responses – and nature is extremely difficult to judge. In the moments of extreme stress and imminent risk of death, few people would think of anything other than their own salvation. The extent of that few depends on many factors, with different social groups balancing fight, flight, and collaborate differently, but it is crystal clear that we can’t blame people for acting selfishly when their lives are in danger.

That being said, there is no doubt that the problem must be dealt with, first of all for the sake of our own future. Obviously, the collection of cabin bags did delay the evacuation (though the extent of its contribution is yet to be assessed – and I hope it will be assessed). Yet, more importantly, should a similar accident happen again, in whatever town or country it might take place, the behaviour patterns of the escaping passengers are very likely to be highly similar to what we’ve observed in Sheremetyevo.

The fact is, the safety rules around hand luggage, both written and unwritten, are quite relaxed. Effectively, you can do whatever you like with your bags while on board as long as they fit into the airline’s allowance and don’t contain prohibited items. While pre-flight safety briefings advise you against taking your cabin bags with you during evacuation, this is hardly enforced. It might be hard to resist the temptation to grab the bag that contains valuables such as your passport, phone, or laptop.

One of the reasons for that is that over the last few decades the role and concept of cabin luggage have changed significantly – while the rules governing it have remained largely the same. For the vast share of today’s passengers, their cabin bags are their primary and only luggage, especially on short-haul flights. It differs drastically from what it used to be twenty years ago, when most carry-on items were jackets, overcoats, clutches, and the odd duty-free bag, with the principal luggage checked into the aircraft hold. The hold itself acted as a physical security control: in case of an emergency, there was no way for the passengers to retrieve their bags. The small or useless carry-on items didn’t pose any risk of a slowdown during the evacuation. Conversely, most hand luggage items today are stuffed-to-capacity purpose-made ‘cabin bags’, designed and manufactured specifically to ‘just fit’ into the measuring cages. This makes a huge difference, and this is the problem that must be addressed in the safety rules.

The abundance of bulky personal items on board the aircraft is further complicated by the fact that with many airlines you can’t bring two cabin bags on board, however small the second one is. This forces you to fit everything you need to take with you into that single piece, mixing items of low and high value in one huge cabin suitcase. Should you need to evacuate, even if you only intended to grab the high-value items, you would have no other option but to take the bulky low-value ones with you too.

So we need to find a convenient way to address those matters. We can’t make people not care about what they value (e.g. their passport) – but we can totally help them with leaving whatever they value less behind. For example, we could give the cabin crew the powers to lock the overhead cabin bag compartments for the whole duration of the flight, and at the same time extend the hand luggage policy to include a [much] smaller second bag. This second bag could be as small as a clutch, a belt bag, or a neck pouch – just enough to accommodate your passport, phone, and wallet.

Such an approach would let passengers separate their items of importance (which in most cases are quite compact in size) from the less significant ones. It would introduce a security control in the form of a lockable overhead compartment, yet give passengers peace of mind that the items they value won’t be lost or destroyed should they need to evacuate.

One way or another, one thing that can be said for sure is that the question of aircraft evacuation and the role of hand luggage in it should not be shelved. The lessons of the Aeroflot crash should be learned, in particular in respect of hand luggage policies and procedures. We would be complete fools if we failed to admit the obvious and simply transferred the blame onto the survivors – since this would mean transferring the punishment onto our future selves.

Moby-Dick; or, the Threat

Norwegian fishermen caught a white beluga whale carrying a harness with surveillance equipment attached to it. Marine experts believe that the whale had been trained by the Russian navy, before escaping from its base in Murmansk and heading west through the waters of the Arctic Ocean.

I doubt the whale had anything to do with the Russian navy for a number of reasons (and it’s not for the ‘St Petersburg’ label on its harness, which, despite its absurdity, counts towards the opposite), but, really, there is nothing that would have prevented the navy from being the actual origin of the animal. For many years the Russian military has been experimenting with training marine mammals to guard its military bases in the Arctic, not to mention that one of its first initiatives in Ukrainian Crimea after temporarily anschluß’ing the peninsula in 2014 was restoring a long-dismissed Soviet dolphin training facility in Sevastopol.

What’s worth noting about this curious occasion is that we have got used to believing that attacks, intrusions, and security compromises that originate from man-made sources normally rely on man-made technologies. The Norwegian story illustrates that it is a mistake to underestimate the risks posed by nature’s own creations, in particular due to their natural ability to disguise, and our own, very human, propensity to think of ourselves as being above nature, and, conversely, of nature being well below us.

Trained animals, while probably among the most significant, are not the only man-aided source of security threats having their origins in the natural environment. There are certain geological threats: man-provoked floods, rainfalls, earthquakes, and tsunamis. There are biological threats: inflicted invasions of vermin, planted insect-spread diseases, and distribution of weed species capable of taking over large areas of land. Those threats are very hard to recognise, very hard to investigate, and very hard to mitigate.

Apart from direct risks of proactive exploitation of geological and biological opportunities, nature opens up a huge number of covert channels which can be used to spy on an opponent’s activities. One example is that excessive waste from a highly concealed military base can lead to an increase in the population of foxes and other scavengers in surrounding areas. However small those deviations may be, modern monitoring and data mining facilities are likely to be capable of detecting them. Modern AI (let’s just call it that way) is exceptional at detecting and matching patterns, and nature provides countless possibilities for it to learn what the right way of things should look like – and what it should not.

Undoubtedly, crafting attacks involving nature is quite demanding, and brain- and labour-intensive. Setting them up requires a lot of investment and effort, which are only affordable for the richest of this world. Still, it’s all about ‘Il fine giustifica i mezzi’, in the end, isn’t it?

Picture credit: Guardian

That is no question

Back in 1854 the renowned mathematician George Boole was the first to describe the concepts of algebra and logic over a binary field, which were eventually named after him and are now regarded as one of the pillars of the information age.

The power and universality of the foundations given to IT engineers and scholars by the works of Boole had one adverse effect though. Boolean landed such a major role in software development tools and in developers’ minds that the concept started to be abused and misused by being employed in scenarios for which it wasn’t exactly fit.

For as long as software programming was primarily a transcription of logical chains into English words and consisted largely of unequivocal instructions like ‘is the value stored in CX greater than zero?’ everything worked well.

And then everything went out of sync. From around the 1970s, software programming started making its way up to higher, much higher abstraction layers. C arrived, followed by OOP and C++, and then Java, Python, and Ruby. Complexity levels of programming tasks skyrocketed. No-one cared about the contents of CX anymore. Questions answered by programmers in their code started resembling non-trivial day-to-day questions that we come across in real life. Yet the tools in the box, despite looking smart, shiny, and new, remained largely the same.

Let me ask you a simple question.

Can the outcome of a friend-or-foe identification – e.g. that of an aircraft – be represented with a Boolean type?

What could be easier? At first glance, the aircraft is either friend or foe, right?

Wrong. There are at least two more possible outcomes: “the aircraft has not been positively identified (can be either friend or foe),” and “no aircraft has ultimately been found.” Those two outcomes are of no less importance than the ‘primary’ ones, and, being ignored, may lead to erroneous or even catastrophic decisions.
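To make the four outcomes concrete, here is a minimal sketch in Python in which the result of the identification is an enumeration rather than a Boolean. The type name, member names, and the actions returned are hypothetical and chosen purely for illustration.

```python
from enum import Enum, auto


class Identification(Enum):
    FRIEND = auto()
    FOE = auto()
    UNIDENTIFIED = auto()  # a contact exists, but could be either
    NO_CONTACT = auto()    # no aircraft was found at all


def decide(outcome: Identification) -> str:
    """Return an action for every outcome; nothing falls through silently."""
    if outcome is Identification.FRIEND:
        return "stand down"
    if outcome is Identification.FOE:
        return "engage"
    if outcome is Identification.UNIDENTIFIED:
        return "hold fire and re-interrogate"
    if outcome is Identification.NO_CONTACT:
        return "keep scanning"
    raise ValueError(f"unhandled outcome: {outcome!r}")


for outcome in Identification:
    print(outcome.name, "->", decide(outcome))
```

With four named members there is no way to write a bare negation that quietly lumps 'unidentified' and 'no contact' together with 'foe'; each outcome has to be addressed by name.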

If you answered yes, don’t be too hard on yourself. The human brain is a skilful optimizer. Despite often being referred to as ‘intelligent’, when left to its own devices it actually does everything in its power to think less. It operates an impressive arsenal of corner-cutting techniques, such as question substitution, simplification, framing, priming, and around a hundred others, to avoid actual thinking in favour of pattern-based decisions.

And this doesn’t marry well with the Boolean type. The problem with Boolean is that it offers an illusion of an obvious answer, suggesting a simple choice between two options where there is no actual choice, or where there might be something besides that choice.

Working hard on optimizing its decision making process, our brain celebrates the chance to substitute the whole set of outcomes with an easier choice between two opposites: yes-or-no, friend-or-foe, right-or-left, good-or-bad, a-boy-or-a-girl. Inspired by simplicity of the answer, the analytic part of our brain gives up and accepts the choice – even if the opposites together only comprise so much of the whole variety of the outcomes.

Development environments kindly assist the irrational part of our brain by providing the tools. I find it amusing that in line with the evolution of programming languages Boolean was given an increasingly significant presence: from none in assembly language, through int-emulated surrogate in C, to a dedicated type in C# and Java. That is, as software developers had to deal with questions more and more vague, the development frameworks kindly offered answers more and more simple.

“Wait,” a smart programmer would say, “and what about exceptions? What about nullable types? Aren’t those supposed to deal with everything that goes beyond true and false?”

In some scenarios they do – and in others they don’t. Exceptions may work well where there is clearly a yes-or-no choice that falls in, and a marginal alternative that falls out. The problem is that in many instances there is no yes-or-no choice at all, but our little grey cells would tell us there is. Apart from that, exceptions are an opt-in technique for our brain: something that needs to be considered proactively – and therefore they will be among the first to be ‘optimized’ and neglected. How many programmers do you personally know that do exception handling right? And how many do you know that don’t?
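Nullable types illustrate the opt-in problem well. In the Python sketch below (the function names are hypothetical), the third state exists in the type, yet a bare negation silently folds it back into one of the two 'obvious' answers unless the programmer remembers to treat it separately.

```python
from typing import Optional


def naive_should_launch(friend: Optional[bool]) -> bool:
    # A bare negation: None ("not identified") is falsy, so it is
    # silently treated the same as a confirmed foe.
    return not friend


def strict_should_launch(friend: Optional[bool]) -> bool:
    # Launch only on an explicit, positive identification of a foe.
    return friend is False


for value in (True, False, None):
    print(value, naive_should_launch(value), strict_should_launch(value))
# True False False
# False True True
# None True False
```

The last row is the dangerous one: the naive check launches on an unidentified contact, while the strict one holds fire.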

And so it goes. It’s Friday, well after 7pm. It’s only a programmer and a QA guy in the deserted office. Their deadline passed a few days ago. They are rushing to finish the system tonight. The programmer starts typing, ‘if…’ and stops for a moment. He quickly glances at the bottom-right corner of his screen: 7:53pm. He sighs, takes a sip of his cooled down tea, and completes the line:

if (!friend) { missile.launch(); }

His code is complete now. He commits the changes, writes a quick note to the client, and drives home to join his family at a late dinner. The QA chap runs a quick round of positive tests and follows his fellow.

You already know what happened next.

* * *

This story is not about negligent programmers. Rather, it is about the dangerous mix of the peculiarities of the human mind and the perks offered by modern development environments, which together give rise to serious logical errors in programs.

Most real-life questions that arise on the uneven ground under our feet have no black-or-white answers. Yet, for many of them, it is way too easy to get caught in the trap of narrowing the whole set of answers down to two mutually exclusive absolutes. The narrower the gap between the programmer’s way of thinking and the human’s becomes, the more clearly this problem exposes itself in the software development profession.

So the next time you are tempted to think of some characteristic as a boolean, do make an effort to ask yourself: does this choice really have only two possible options? Have I neglected any important outcomes? Is my mind trying to cut corners and take advantage of me?

Because it most certainly will.

Pic credit: mindfulyourownbusiness.com

7 Security Mistakes Boeing Made

The story of the two recent Boeing 737 MAX crashes is packed with questions we are yet to find answers to, yet it is already clear that the distinctive feature of the double tragedy is the overwhelming number of gross blunders – a lot more than you would expect in a field so extremely attentive to security and safety as commercial aviation.

While we don’t know all the details of the crashes so far, what we do know points to a number of grievous security flaws:

  • security feature as a paid option, not by default: Boeing charged airlines extra for sensor discrepancy detectors; neither LionAir nor Ethiopian aircraft had them installed;
  • hiding information: Boeing hid from 737 pilots that their new aircraft featured a new MCAS system, which could quietly intervene and override the pilots’ control of the aircraft;
  • ignoring feedback: MAX pilots complained to the FAA about issues with the aircraft’s in-flight performance, but those complaints were largely silenced or ignored;
  • no safeguards for MCAS failure: this has not been officially confirmed, but it looks like pilots wouldn’t be able to switch off MCAS if they needed to, effectively being unable to fly the aircraft fully manually to recover from MCAS or sensor failure;
  • creating workarounds rather than fixing bugs: the MCAS system was introduced to balance the MAX’s tendency to raise its nose up due to changes in the aircraft’s aerodynamics as a result of its bigger engines. In other words, the essence of MCAS is effectively adding a ton of BBQ sauce on to your overpeppered steak, rather than cooking a well-peppered steak from the very start.
  • conflict of interest: it appears that a great deal of safety tests of the new aircraft were performed by its very creators;
  • trust compromise: this is by far the grossest mistake made by Boeing and the FAA; something that might well affect the success of the whole MAX family and of its freshest 777X machine, which was quietly (guess why) introduced two days ago. While the whole world was grounding its MAX fleets, Boeing chose the tactics of silencing the matter, denying any allegations, and refusing to admit similarities between the LionAir and Ethiopian crashes. The only statement from them that made sense was about introducing a vague ‘software update.’ A matter of utmost importance is that, as per Boeing’s own words, the prospective change was in the works well before the second crash.

I feel incredibly sorry for those who lost their friends and relatives in the crashes, and I feel sorry for the designers of the MAX, which is without doubt a great aircraft. I only hope that the investigation goes smoothly (with Boeing bosses apparently being quite reluctant for it to), and discovers the full truth about the crashes. Being sensible humans, the best we can do for those who lost their lives in the tragedy is to learn our lessons and write down all the mistakes we made, and then do everything in our power to prevent anything similar from happening in future.

Picture credit: Boeing

Facepalm

Facebook, once again, shows that it prefers to learn from its own mistakes rather than someone else’s. This time, it’s about storing passwords in plain text: a textbook security negligence, at different times stepped on by Equifax, Adobe, and Sony.
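For contrast, the textbook alternative to plain-text storage is to keep only a random salt and a deliberately slow hash derived from the password. Below is a minimal Python sketch using the standard library’s PBKDF2. The function names and the iteration count are assumptions made for illustration, and this is emphatically not a recipe: a production system should rely on a vetted library and current guidance.

```python
import hashlib
import hmac
import os

ITERATIONS = 600_000  # assumed work factor; tune to current guidance


def hash_password(password: str) -> tuple:
    """Return (salt, derived_key); only these are stored, never the password."""
    salt = os.urandom(16)
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, key


def verify_password(password: str, salt: bytes, key: bytes) -> bool:
    """Re-derive the key from the candidate and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, key)


salt, key = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, key))  # True
print(verify_password("Tr0ub4dor&3", salt, key))                   # False
```

Anyone who gets to read the stored values, whether an insider or an intruder, sees a salt and a derived key rather than the password itself.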

And this really doesn’t help in building confidence in the social network. We entrust them with our most personal pieces of information, and they don’t give a damn about keeping them safe.

“We have found no evidence to date that anyone internally abused or improperly accessed them,” said Pedro Canahuati, Facebook’s vice president of engineering, security, and privacy. Given all the recent breaches in this company’s security, I can’t help translating this to human language as “we didn’t bother to put any access control audit mechanisms in place, so whoever saw your passwords, there is no (and can’t be any) real evidence of that.”

Just a couple of days ago I was asked to send money via Facebook’s payment service. In the middle of the payment process I realized it was not possible to make the payment – which would have been a one-off for me – without having Facebook remember either my card or PayPal details. I stopped, closed the Facebook tab, and paid with a different method. Glad I did.

Picture credit: Alex E. Proimos

North Korean hacker held responsible for major attacks

Last Thursday the US Department of Justice accused a North Korean hacker, Park Jin Hyok, of hacking Sony Pictures in 2014, stealing $80m from the Bangladesh Central Bank in 2016, launching the WannaCry malware in 2017, and carrying out a series of attacks on Lockheed Martin.

This is undoubtedly quite an impressive list of achievements for a single individual, especially taking into account the education and career opportunities his country of residence is capable of providing.

Yet, there are two more thoughts about this matter that come to my mind.

First, it looks like a vast share of North Korean state hacking efforts has been concentrated in the hands of a single individual or a small group of individuals. Not only is it quite amusing to compare that effectively family business with the anthill-like underground syndicate, a good few orders of magnitude bigger, carefully instilled in our heads by a crowd of politicians and journalists; it is also intriguing to speculate whether we shall see the fall of the North Korean hacking programme now that Park is out of the game.

Second, all the above attacks have become known to us because they involved a straightforward and noticeable loss for their victims. However, a successful attack doesn’t always imply immediate and tangible damage. Many hacks are performed by ‘black’ and ‘white’ enthusiasts doing that for all sorts of self-satisfaction; a lot are only performed to plant a time bomb and trigger it after a while; finally, a huge share of attacks target private and sensitive data stored at the victims’ premises, only to sell it later anonymously on the dark web. All such attacks rarely get publicity. The bottom line here is, if a small group of NK hackers could do damage on that scale, what is the actual potential of all the hackers out there?

The Greatest Backdoor

The greatest backdoor of all time might be running right before your eyes.

Earlier today we were quite surprised to discover that our Windows build server rebooted after installing another set of automatic updates. This looked weird, as automated reboots without an administrator’s approval have never been part of our security policy. Still, given that we had just upgraded our Windows Server from 2012 to 2016, we believed it to be a misconfiguration issue and embarked on correcting it.

Surprisingly, disabling automated restarts in Windows Server 2016 turned out to be no easy task. Believe it or not, unlike in Server 2012, there is no direct setting in Server 2016 to disable the reboots. You have to employ awkward workarounds, like always having someone logged in, to stop your server from rebooting. Otherwise, it will always reboot automatically, every time yet another bunch of updates is downloaded and installed.

This looks very worrying. Many server administrators quite reasonably prefer to be in control of reboots of their servers to harmonise them with their working hours, system load, backup and maintenance schedules, and myriad other factors. A mission-critical server that reboots out of the blue in the middle of the night may (and will) lead to all sorts of problems – from a local DoS after failing to complete the restart, to a gaping hole in the company’s network if a third-party IPS fails to co-operate with the updated version of some Windows component.

From a more distant perspective, by removing the possibility to disable automated reboots, Microsoft has acquired a gigantic ‘power switch’, which it can use to force thousands of servers across the world into rebooting by simply sending them a specific ‘update’ package. This puts the owners of those servers in the uncomfortable position of hostages. Even if we do believe in the good intentions of the Seattle company, how can we be sure that someone won’t break into their update delivery environment one day, and use the legitimate update procedure to send a deadly restart command to all the Windows servers out there?

Image credit: pngtree.com