Technology

Excuse me, what just happened? Resilience is tough when your failure is due to a ‘sequence of events that was almost impossible to foresee’

Voice Of EU

Published

on

Feature: When designing systems that our businesses will rely on, we do so with resilience in mind.

Twenty-five years ago, technologies like RAID and server mirroring were novel and, in some ways, non-trivial to implement; today this is no longer the case and it is a reflex action to procure multiple servers, LAN switches, firewalls, and the like to build resilient systems.

This does not, of course, guarantee us 100 per cent uptime. The law of Mr Murphy applies from time to time: if your primary firewall suffers a hardware failure, there is a tiny, but non-zero, chance that the secondary will also collapse before you finish replacing the primary.

If you have a power failure, there is a similarly micro-tangible likelihood that the generator you have tested weekly for years will choose this moment to cough stubbornly rather than roaring into life. We accept such odds unless we are (or, more accurately, the nature of our business is) so risk-averse that we can justify spending on more levels of resilience to reduce the chance of an outage even further (but never, of course, to nothing).

There are occasions, though, where planning for failure becomes hard.

Let us look at a recent example. In July 2020, the main telco in Jersey had a major outage because of a problem with a device providing time service to the organisation’s network. The kicker in this event was that the failed device did not fail in the way we are all used to – by making a “bang” noise and emitting smoke; had it done so, in fact, all would have been well as the secondary unit would have taken over.

Impossible

No, this was a more devious kind of time server which only part-failed. It kept running but started serving times from about 20 years in the past (by no coincidence at all this was the factory default time setting), thus confusing network infrastructure devices and causing traffic to stop flowing.

Customer dissatisfaction was palpable, of course, but as an IT specialist one does have to feel something for the company’s technical team: how many of us would ever consider, as a possible failure case, something that the technical chief described quite correctly as a “sequence of events that was almost impossible to foresee”?

(Incidentally, in a somewhat more good-news story, stepping back a moment to our point about extra layers of resilience, the same company had previously survived three offshore cables being severed… by having a fourth).

Could monitoring tools have been put in place to spot issues like this when they happen? Yes, absolutely, but the point is that to do so one would first need to identify the scenario as something that could happen. In risk management terms, this type of failure – very high impact but infinitesimally unlikely – is the worst possible kind. There are theories and books about how one can contemplate and deal with such risks, the best-known of which is probably Nassim Nicholas Taleb’s The Black Swan, which discusses just this kind of risk. If you want to try to defend against the unexpected, though, then at the very least you need to sit down with a significant number of people in a highly focused way, preferably with an expert in the field to guide and moderate, and work on identifying such possible “black swan” events.
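As an illustration of the kind of monitoring that could catch a part-failed time source, here is a minimal Python sketch. It is not what the telco ran: the source names, the floor date, the skew limit and the sample readings are all assumptions made up for the example. The idea is simply to reject any source that reports a time before a known floor or that disagrees with the consensus of its peers.

```python
from datetime import datetime, timedelta, timezone
from statistics import median

# Hypothetical floor: no legitimate timestamp should predate the deployment.
BUILD_FLOOR = datetime(2020, 1, 1, tzinfo=timezone.utc)
# Hypothetical tolerance for disagreement between healthy sources.
MAX_SKEW = timedelta(seconds=5)


def plausible_sources(readings):
    """Return the sorted names of sources whose time passes both checks.

    readings maps a source name to the timezone-aware datetime it reported.
    """
    consensus = median(r.timestamp() for r in readings.values())
    good = []
    for name, reported in readings.items():
        after_floor = reported >= BUILD_FLOOR
        near_consensus = (
            abs(reported.timestamp() - consensus) <= MAX_SKEW.total_seconds()
        )
        if after_floor and near_consensus:
            good.append(name)
    return sorted(good)


# Arbitrary illustrative instant, not the real time of the outage.
now = datetime(2020, 7, 15, 12, 0, tzinfo=timezone.utc)
readings = {
    # Assumed factory-default time served by the part-failed unit.
    "ntp-a": datetime(2000, 1, 1, tzinfo=timezone.utc),
    "ntp-b": now + timedelta(milliseconds=40),
    "ntp-c": now - timedelta(milliseconds=15),
}
suspect = sorted(set(readings) - set(plausible_sources(readings)))
print("suspect sources:", suspect)
```

A check like this only helps if someone has already imagined a time server that keeps answering with the wrong time, which is exactly the difficulty described above.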

While the black swan concept is most definitely a thing to bear in mind, there is in fact a far more common problem with systems that we consider resilient – a failure to understand how the resilience works.

One particular installation at a company with an office and two data centres had point-to-point links forming a triangle between the three premises, and each data centre had an internet connection. The two firewalls, one in each data centre, were configured as a resilient pair, and worked as such for years. One day internet service went down, and investigation showed that the secondary unit had lost track of the primary and had switched itself to become the primary. Having two active primaries caused split traffic flows, and hence an outage.

Predictable

In hindsight, this was completely predictable. The way the primary/secondary relationship was maintained between the devices was for the primary to send a “heartbeat” signal to the secondary every few seconds; if the secondary failed to receive the heartbeat three times, it woke up and acted as a primary. Because the devices were in separate data centres, they were connected through various pieces of technology: a LAN patch cord into a switch, into a fibre transceiver, into a telco fibre, then the same in reverse at the other end.

A fault on any one of those elements could cause the network devices to reconfigure their topology to switch data over the other way around the fibre triangle – with the change causing a network blip sufficiently long to drop three heartbeats. In fact, the only approved configuration for the primary/secondary interconnection was a crossover Ethernet cable from one device to the other: the failover code was written with the assumption that, aside perhaps from a highly unlikely sudden patch cord fault, the primary becoming invisible to the secondary meant that the former had died.
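The failure mode can be reproduced with a toy model. The Python sketch below is an assumption-laden simplification, not the vendor's failover code: the class and function names are invented, and the only detail taken from the example is the three-missed-heartbeats rule. The primary stays healthy throughout; only the link between the two units is interrupted.

```python
MISSED_LIMIT = 3  # from the example: three missed heartbeats trigger takeover


class Secondary:
    """Standby unit that promotes itself after MISSED_LIMIT silent intervals."""

    def __init__(self):
        self.role = "secondary"
        self.missed = 0

    def tick(self, heartbeat_received):
        """Process one heartbeat interval."""
        if heartbeat_received:
            self.missed = 0
        else:
            self.missed += 1
            if self.missed >= MISSED_LIMIT:
                # The design assumes silence means the peer has died.
                self.role = "primary"


def simulate(link_up_per_interval):
    """Return [role of primary unit, role of standby unit] after the run.

    The primary never fails in this model; only the link state varies.
    """
    standby = Secondary()
    for link_up in link_up_per_interval:
        standby.tick(heartbeat_received=link_up)
    return ["primary", standby.role]


# A topology reconvergence that swallows three consecutive heartbeats.
blip = [True, True, False, False, False, True, True]
# A shorter interruption that drops only two.
short_blip = [True, False, False, True]

print(simulate(blip))        # both units claim the primary role
print(simulate(short_blip))  # the standby stays a standby
```

With a direct crossover cable, a run of three missed heartbeats is strong evidence that the peer is dead. With switches, transceivers and telco fibre in the path, the same silence is ambiguous, and the model promotes the standby anyway.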

Many of us have come across similar instances, where something we expected to fail over has not done so. It’s equally common, too, to come across instances where the failover works OK but then there are issues with the failback, which can be just as problematic. I recall a global WAN I once worked on where, for whatever reason, failovers from primary to secondary were so quick that you didn’t notice any interruption (the only clue was the alert from the monitoring console) but there was a pause of several seconds when failing back.

In the firewall example, even when connectivity was restored the devices would not re-synch without a reboot: remember, the only supported failure scenario was the primary dying completely, which meant that it was only at boot time that it would check to see which role its partner was playing so it could act accordingly. Until someone turned it off and back on again, there was no chance that the problem would go away.
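A small Python sketch makes the stuck state concrete. The unit names and method names are hypothetical; the only behaviour modelled is the one described above, namely that a unit consults its partner's role at boot and at no other time.

```python
class Unit:
    """Hypothetical HA unit that negotiates its role only when it boots."""

    def __init__(self, name, role):
        self.name = name
        self.role = role

    def on_link_restored(self, peer):
        # No renegotiation here: the design assumes the peer died outright,
        # so a peer reappearing mid-run is not a case the code handles.
        pass

    def reboot(self, peer):
        # Boot is the only point at which the partner's role is consulted.
        self.role = "secondary" if peer.role == "primary" else "primary"


a = Unit("fw-a", "primary")
b = Unit("fw-b", "primary")  # promoted itself while the link was down

a.on_link_restored(b)
b.on_link_restored(a)
stuck = (a.role, b.role)      # still two primaries

b.reboot(a)
recovered = (a.role, b.role)  # the rebooted unit yields to its partner

print(stuck, recovered)
```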

To make our resilient systems truly resilient, then, we need to do three things.

First, we should give some thought to those “black swan” events. It may be that we cannot afford masses of time and effort to consider such low-probability risks, but at the very least we should take a conscious decision on how much or how little we will do in that respect: risk management is all about reasoning and making conscious decisions like that.

Expertise

Second, if we don’t have the knowledge of the precise way our systems and their failover mechanisms work, we must engage people who do and get the benefit of their expertise and experience… and while we’re at it, we should read the manual: nine times out of ten it will tell us how to configure things, even if it doesn’t explain why.

Finally, though, we need to test things – thoroughly and regularly. In our firewall example all potential failure modes should have been considered: if a failure of one of a handful of components could cause an outage, why not test all of them? And when we test, we need to do it for real: we don’t just test failover in the lab and then install the kit in a production cabinet, we test it once it’s in too.
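One way to make sure no component is skipped is to generate the test list rather than write it from memory. The Python sketch below does this for the path elements named in the firewall example; the site labels and the wording of each test step are assumptions for illustration.

```python
from itertools import product

# Path elements named in the firewall example; the site labels are hypothetical.
PER_SITE = ["LAN patch cord", "switch", "fibre transceiver"]
SITES = ["DC1", "DC2"]
SHARED = ["telco fibre"]
PHASES = ["failover", "failback"]


def build_test_plan():
    """List one case per (component, phase) so that none is skipped."""
    components = [f"{site} {part}" for site, part in product(SITES, PER_SITE)]
    components += SHARED
    return list(product(components, PHASES))


plan = build_test_plan()
for component, phase in plan:
    print(f"interrupt {component}; verify {phase}")
print(len(plan), "cases")
```

Seven components and two phases give 14 cases, which is a small price compared with discovering the fifteenth in production.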

This may need us to persuade the business that we need downtime – or at least potential downtime to cater for the test being unsuccessful – but if management have any sense, they will be persuadable that an approved outage during a predictable time window with the technical team standing by and watching like hawks is far better than an unexpected but entirely foreseeable outage when something breaks for real and the resilience turns out not to work.

Testing

Oh, and when you test failover and failback, run for several days in a failed-over state if you can: many problems don’t manifest instantly, and you will always learn more in a multi-day failover than in one that lasts only a couple of minutes. Bear in mind also the word “regularly” that I used alongside “thoroughly”. Even if we know there has been no change to a particular component, there may well be some knock-on effect from a change to something else. Something that used to be resilient may have become less resilient or even non-resilient because something else changed and we didn’t realise the implication – so regular resilience testing is absolutely key.

Because if something isn’t resilient, this will generally not be because of some esoteric potential failure mode that is next to impossible to anticipate and/or difficult or impossible to test. Most of the time it will be because something went wrong – or something was configured wrongly – in a way you could have emulated in a test. ®

Netflix employees join wave of tech activism with walkout over Chappelle controversy | Netflix

Employees at Netflix halted work on Wednesday and staged a protest outside the company’s Los Gatos, California, headquarters to condemn the streaming platform’s handling of complaints against Dave Chappelle’s new special.

The actions – which hundreds participated in – are the latest in a string of highly visible organizing efforts in the tech sector, as workers increasingly take their grievances about company policies and decisions public.

“Three years ago, a worker walkout at a major tech company would have been unthinkable,” said Veena Dubal, a labor law professor at the University of California, Hastings. “White-collar workers across the world now understand their labor power, and their ability to change the unethical practices of their employer by withholding their labor.”

On Monday, the transgender employee resources group behind the walkout released a list of specific demands of Netflix, including more funding for trans creators, recruiting more diverse employees and flagging anti-trans content on the platform.

Tensions at Netflix started in early October, when Netflix leaders doubled down on their support for the comedian Dave Chappelle following criticism from viewers, the queer media watchdog Glaad as well as some employees that Chappelle’s new show contained jokes that were anti-trans.

As internal criticism grew, Netflix leaders continued to defend the special. Reed Hastings, the co-chief executive, reportedly said on an internal message board: “I do believe that our commitment to artistic expression and pleasing our members is the right long-term choice for Netflix, and that we are on the right side, but only time will tell.”

Ted Sarandos, the other co-CEO, claimed in an email obtained by Variety: “While some employees disagree, we have a strong belief that content on screen doesn’t directly translate to real-world harm.” He added: “Adults can watch violence, assault and abuse – or enjoy shocking standup comedy – without it causing them to harm others.”

The Sarandos memo in particular fueled the walkout, according to the Hollywood Reporter. “The memo was very disrespectful,” a staffer told the outlet on the condition of anonymity. “It didn’t invite a robust conversation about this hard topic, and that’s normally how things go.”

Ted Sarandos, co-CEO of Netflix. Photograph: Vickie Flores/EPA

Meanwhile, Netflix temporarily suspended Terra Field, a trans employee, who had tweeted that Chappelle “attacks the trans community, and the very validity of transness” and tied such comments to real-world violence. The company said Field was suspended because she had attended a meeting she was not invited to, but it later conceded she had “no ill intent”.

Netflix fired another trans worker who had been involved in organizing the walkout on allegations of leaking internal documents to the press.

“We understand this employee may have been motivated by disappointment and hurt with Netflix, but maintaining a culture of trust and transparency is core to our company,” a Netflix spokesperson told the Guardian about that decision last week.

The employee on Tuesday identified themself as B Pagels-Minor in an interview with the New York Times and denied “leaking sensitive information to the press”.

Social media event pages for the walkout have advertised a rally outside the Netflix headquarters in Los Angeles featuring public figures and speakers.

Staffers participating in the virtual walkout have vowed to halt work and focus on efforts to support the trans community.

‘A wave of worker walkouts’

In this week alone, there are protests at Netflix, the grocery delivery platform Instacart and at Facebook by its content moderators. Uber drivers globally went on strike in 2019. Hundreds of Amazon workers walked out to protest against the company’s climate policies in 2019.

Walkouts have become an increasingly common tactic among tech employees. “We are seeing a wave of them,” said Jess Kutch, executive director of the Solidarity Fund, which raises money to support employees engaged in workplace organizing – including at Netflix.

Google employees were among the first to deploy the strategy on a large scale in 2018, when more than 20,000 workers around the world walked out over the news that the company had given a $90m severance package to an executive who was forced to step down over sexual misconduct allegations (which he has denied).

The incensed workers decried a culture of silence about sexual harassment and systemic racism and demanded Google make concrete changes to address such issues within the company. In particular, they targeted Google’s use of forced arbitration – a practice common in the tech industry in which workers settle legal disputes in a private forum, making it almost impossible for workers to sue their bosses in court and keep repeat offenders from being publicly recognized.

Google employees stage a walkout in Mountain View, California, in 2018. Photograph: Stephen Lam/Reuters

The November 2018 action changed the way workers in the tech industry organize, experts said. “Workers are observing their peers to see what is effective in moving decision makers, and replicating that in their own companies,” Kutch said.

Kutch noted tech employees studied other protest movements to determine the most effective forms of action, learning, for example, to release specific demands tied to their walkouts. “There is a degree of depth, commitment and planning that was not present even just a few years ago,” she said.

Organizers have particularly taken aim at the tools tech companies had long used to keep dissent internal. Faced with employee pressure, companies such as Google, Airbnb, Facebook and eBay were compelled to end forced arbitration practices.

Employees have also fought companies’ use of non-disclosure agreements, or NDAs, which were initially meant to protect trade secrets, but later allowed companies to keep accusations of wrongdoing from becoming public.

Last month, California passed a law that makes it illegal for firms to prevent employees from speaking out about such issues through the use of NDAs.

Organizing gained another boost when the Black Lives Matter movement and protests laid bare some of the huge inequities in tech and revealed the power of protest to change them.

“Workers woke up at that moment to the fact that if employers are able to discriminate against any one part of the workforce, it hurts everyone,” said Anastasia Christman, senior policy analyst at the National Employment Law Project.

“There have been isolated examples of this kind of thing for years, but employees are increasingly using the leverage of their labor to stand up for diversity and equity,” she added.

The price of whistleblowing

For some employees, the price of speaking out has been steep. Leaked memos showed that in early 2020, Amazon discussed smearing a warehouse worker who spoke out against the company’s Covid-19 practices and was later fired. (Amazon said the employee was fired for putting other employees at risk of Covid-19.) In September 2021, Amazon reached a settlement with two other employees who said they had been fired over their climate activism within the company.

Other whistleblowers have narrated how their lives were upended by speaking out against major tech companies. The worker behind the walkouts at Google, Claire Stapleton, left the company after 12 years of working there, due to perceived retaliation for her role in organizing.

Netflix told the Guardian in an email that it “respect[s] the decision of any employee who chooses to walk out” and recognizes “we have much more work to do both within Netflix and in our content”.

“We value our trans colleagues and allies, and understand the deep hurt that’s been caused,” the spokesperson said.

In a public blogpost, Field outlined much of the vitriol she has sustained for speaking out about the special. She said she did not necessarily want the show removed from the platform, but wanted accountability from Netflix to its workers and viewers.

“We’ve spent years building out the company’s policies and benefits so that it would be a great place for trans people to work,” she wrote. “A place can’t be a great place to work if someone has to betray their community to do so.”

Netflix co-CEO Sarandos told the Hollywood Reporter on Tuesday that he handled the situation poorly, but that he remains supportive of Chappelle’s work. He said that his previous memos “lacked humanity”, and did not acknowledge that “a group of our employees were in pain”, but said that his stance “hadn’t changed”.

Raspberry Pi 4 in price rise first, chip shortage blamed • The Register

The price of a 2GB Raspberry Pi 4 single-board computer is going up $10, and its supply is expected to be capped at seven million devices this year due to the ongoing global chip shortage.

Demand for components is outstripping manufacturing capacity at the moment; pre-pandemic, assembly lines were being red-lined as cloud giants and others snapped up parts fresh out of the fabs, and the COVID-19 coronavirus outbreak really threw a spanner in the works, so to speak, exacerbating the situation.

Everything from cars to smartphones has felt the effects of supply constraints, and Raspberry Pis have too, it appears. Stock is especially tight for the Raspberry Pi Zero and the 2GB Raspberry Pi 4 models, we’re told. As the semiconductor crunch shows no signs of letting up, the Raspberry Pi project is going to bump up the price for one particular model.

The 2GB Raspberry Pi 4 will now once again set you back $45, an increase of $10 from its previous retail price. It used to be $45, then was brought down to $35 early last year when the 1GB model was discontinued. Now it’s back up again. This is the first time the project has hiked its prices, the trading arm of the Raspberry Pi Foundation said.

Don’t worry, however, the bump is said to be temporary and the module will eventually return to its original price of $35, company CEO Eben Upton announced on Wednesday.

The 4GB Raspberry Pi 4 and 8GB Raspberry Pi 4 versions will remain at $55 and $75, respectively. For those relying on a supply of $35 2GB boards, the project will bring back those 1GB Raspberry Pi 4 modules, priced $35.

“This provides a degree of choice: less memory at the same price; or the same memory at a higher price,” said Upton. 2GB for $45 or 1GB for $35. A choice, but not one people might expect.

“As many of you know,” he continued, “global supply chains are in a state of flux as we (hopefully) emerge from the shadow of the COVID-19 pandemic. In our own industry, semiconductors are in high demand, and in short supply: the upsurge of demand for electronic products for home working and entertainment during the pandemic has descended into panic buying, as companies try to secure the components that they need to build their products … At Raspberry Pi, we are not immune to this.”

The project is expected to make around seven million of its computer boards total this year, maintaining the same level of production as last year as the pandemic took hold of the world. This is unlikely to increase much next year either, Upton said. Judging from his explanation, this figure is lower than hoped: “Despite significantly increased demand, we’ll only end up making around seven million units in 2021.”

Pis containing 40nm chips will feel the chip crunch the hardest over the next year, meaning there will be limited supplies of devices older than the current generation of Raspberry Pi 4, Raspberry Pi 400, or Compute Module 4.

“In allocating our limited stocks of 40nm silicon, we will prioritise Compute Module 3, Compute Module 3+, and Raspberry Pi 3B, and deprioritise Raspberry Pi 3B+ … Our guidance to industrial and embedded users of Raspberry Pi 3B+ who wish to optimise availability in 2022 is to begin migrating your designs to the 1GB variant of Raspberry Pi 4,” Upton said.

The biz expects to be able to make enough systems using 28nm silicon – namely the Raspberry Pi 4 and Compute Module 4 – over the next 12 months to hold their price… bar the aforementioned 2GB model.

“These changes in pricing are not here to stay. As global supply chain issues moderate, we’ll keep revisiting this issue, and we want to get pricing back to where it was as fast as we can,” Upton concluded. ®

Irish fintech Swoop secures £2.5m from major UK bank firm’s bailout fund

UK-headquartered Swoop was one of three finance companies to receive funding from RBS, which previously gave the start-up £5m in 2019.

Irish start-up Swoop Finance has received £2.5m from a fund established by banking giant RBS.

In 2019, it was awarded £5m by the banking firm, which accepted a £45bn bailout from the UK government at the height of the financial crisis in 2008. The bailout programme came with the condition that RBS would set up a £775m fund to boost competition in the region’s finance sector.

Swoop is one of three companies to have benefitted from that fund, with the others being UK finance companies Codat and Cashplus. The three start-ups will receive a combined £12.5m in grants from RBS.

Codat and Cashplus will both receive £5m from the fund.

Swoop was founded in 2017 by former KPMG chartered accountant and corporate financier Andrea Reynolds along with Ciarán Burke. Reynolds spoke at Silicon Republic’s Future Human event last year about the process of launching Swoop. She said she founded it after she spotted a gap in the market for a virtual “finance buddy” aimed at SMEs seeking financial advisers and lenders.

Today, Swoop is headquartered in the UK and it employs around 60 people. It recently launched in Canada, adding to its existing locations in Dublin, London and Sydney.

The fintech’s backers include Enterprise Ireland and Velocity. It has raised around €1.6m so far. Speaking last year, Reynolds said the pandemic’s digitisation of the finance industry – and most other industries – had benefitted the company.

She added that the ongoing changes in the industry would hopefully “democratise finance” and “open up opportunities” to companies seeking funding no matter where they are located.

“The future is that you won’t need to know who the lender is,” Reynolds said.

“All decisions will be made through your data and you’ll get those decisions instantly. So you could have a lender in Barcelona lending to a business in Ballyjamesduff, for example. It won’t matter where you are. It’s what your profile is and does it match to their algorithm.

“This means it’ll open up opportunities. It’ll democratise finance further because businesses, regardless of where they’re located, will not be disadvantaged. Everybody will have this at their fingertips,” she added.

Reynolds said she had seen “a 30pc increase in businesses moving online” during the Covid-19 pandemic.

Swoop also recently announced its partnership with UK automated cashflow and credit management company Itsettled.
