32 stories

WebAssembly from Scratch: From FizzBuzz to DooM

1 Share

Page 2

You can’t perform that action at this time.

You signed in with another tab or window. Reload to refresh your session. You signed out in another tab or window. Reload to refresh your session.

Read the whole story
7 days ago
Share this story

Automatic Remediation of Kubernetes Nodes

1 Comment
Automatic Remediation of Kubernetes Nodes
Automatic Remediation of Kubernetes Nodes

We use Kubernetes to run many of the diverse services that help us control Cloudflare’s edge. We have five geographically diverse clusters, with hundreds of nodes in our largest cluster. These clusters are self-managed on bare-metal machines which gives us a good amount of power and flexibility in the software and integrations with Kubernetes. However, it also means we don’t have a cloud provider to rely on for virtualizing or managing the nodes. This distinction becomes even more prominent when considering all the different reasons that nodes degrade. With self-managed bare-metal machines, the list of reasons that cause a node to become unhealthy include:

  • Hardware failures
  • Kernel-level software failures
  • Kubernetes cluster-level software failures
  • Degraded network communication
  • Software updates are required
  • Resource exhaustion1
Automatic Remediation of Kubernetes Nodes

Unhappy Nodes

We have plenty of examples of failures in the aforementioned categories, but one example has been particularly tedious to deal with. It starts with the following log line from the kernel:

unregister_netdevice: waiting for lo to become free. Usage count = 1

The issue is further observed with the number of network interfaces on the node owned by the Container Network Interface (CNI) plugin getting out of proportion with the number of running pods:

$ ip link | grep cali | wc -l

This is unexpected as it shouldn't exceed the maximum number of pods allowed on a node (we use the default limit of 110). While this issue is interesting and perhaps worthy of a whole separate blog, the short of it is that the Linux network interfaces owned by the CNI are not getting cleaned up after a pod terminates.

Some history on this can be read in a Docker GitHub issue. We found this seems to plague nodes with a longer uptime, and after rebooting the node it would be fine for about a month. However, with a significant number of nodes, this was happening multiple times per day. Each instance would need rebooting, which means going through our worker reboot procedure which looked like this:

  1. Cordon off the affected node to prevent new workloads from scheduling on it.
  2. Collect any diagnostic information for later investigation.
  3. Drain the node of current workloads.
  4. Reboot and wait for the node to come back.
  5. Verify the node is healthy.
  6. Re-enable scheduling of new workloads to the node.

While solving the underlying issue would be ideal, we needed a mitigation to avoid toil in the meantime — an automated node remediation process.

Existing Detection and Remediation Solutions

While not complicated, the manual remediation process outlined previously became tedious and distracting, as we had to reboot nodes multiple times a day. Some manual intervention is unavoidable, but for cases matching the following, we wanted automation:

  • Generic worker nodes
  • Software issues confined to a given node
  • Already researched and diagnosed issues

Limiting automatic remediation to generic worker nodes is important as there are other node types in our clusters where more care is required. For example, for control-plane nodes the process has to be augmented to check etcd cluster health and ensure proper redundancy for components servicing the Kubernetes API. We are also going to limit the problem space to known software issues confined to a node where we expect automatic remediation to be the right answer (as in our ballooning network interface problem). With that in mind, we took a look at existing solutions that we could use.

Node Problem Detector

Node problem detector is a daemon that runs on each node that detects problems and reports them to the Kubernetes API. It has a pluggable problem daemon system such that one can add their own logic for detecting issues with a node. Node problems are distinguished between temporary and permanent problems, with the latter being persisted as status conditions on the Kubernetes node resources.2

Draino and Cluster-Autoscaler

Draino as its name implies, drains nodes but does so based on Kubernetes node conditions. It is meant to be used with cluster-autoscaler which then can add or remove nodes via the cluster plugins to scale node groups.


Kured is a daemon that looks at the presence of a file on the node to initiate a drain, reboot and uncordon of the given node. It uses a locking mechanism via the Kubernetes API to ensure only a single node is acted upon at a time.


The Kubernetes cluster-lifecycle SIG has been working on the cluster-api project to enable declaratively defining clusters to simplify provisioning, upgrading, and operating multiple Kubernetes clusters. It has a concept of machine resources which back Kubernetes node resources and furthermore has a concept of machine health checks. Machine health checks use node conditions to determine unhealthy nodes and then the cluster-api provider is then delegated to replace that machine via create and delete operations.

Proof of Concept

Interestingly, with all the above except for Kured, there is a theme of pluggable components centered around Kubernetes node conditions. We wanted to see if we could build a proof of concept using the existing theme and solutions. For the existing solutions, draino with cluster-autoscaler didn’t make sense in a non-cloud environment like our bare-metal set up. The cluster-api health checks are interesting, however they require a more complete investment into the cluster-api project to really make sense. That left us with node-problem-detector and kured. Deploying node-problem-detector was simple, and we ended up testing a custom-plugin-monitor like the following:

apiVersion: v1
kind: ConfigMap
  name: node-problem-detector-config
  check_calico_interfaces.sh: |
    set -euo pipefail
    count=$(nsenter -n/proc/1/ns/net ip link | grep cali | wc -l)
    if (( $count > 150 )); then
      echo "Too many calico interfaces ($count)"
      exit 1
      exit 0
  cali-monitor.json: |
      "plugin": "custom",
      "pluginConfig": {
        "invoke_interval": "30s",
        "timeout": "5s",
        "max_output_length": 80,
        "concurrency": 3,
        "enable_message_change_based_condition_update": false
      "source": "calico-custom-plugin-monitor",
      "metricsReporting": false,
      "conditions": [
          "type": "NPDCalicoUnhealthy",
          "reason": "CalicoInterfaceCountOkay",
          "message": "Normal amount of interfaces"
      "rules": [
          "type": "permanent",
          "condition": "NPDCalicoUnhealthy",
          "reason": "TooManyCalicoInterfaces",
          "path": "/bin/bash",
          "args": [
          "timeout": "3s"

Testing showed that when the condition became true, a condition would be updated on the associated Kubernetes node like so:

kubectl get node -o json worker1a | jq '.status.conditions[] | select(.type | test("^NPD"))'
  "lastHeartbeatTime": "2020-03-20T17:05:17Z",
  "lastTransitionTime": "2020-03-20T17:05:16Z",
  "message": "Too many calico interfaces (154)",
  "reason": "TooManyCalicoInterfaces",
  "status": "True",
  "type": "NPDCalicoUnhealthy"

With that in place, the actual remediation needed to happen. Kured seemed to do most everything we needed, except that it was looking at a file instead of Kubernetes node conditions. We hacked together a patch to change that and tested it successfully end to end — we had a working proof of concept!

Revisiting Problem Detection

While the above worked, we found that node-problem-detector was unwieldy because we were replicating our existing monitoring into shell scripts and node-problem-detector configuration. A 2017 blog post describes Cloudflare’s monitoring stack, although some things have changed since then. What hasn’t changed is our extensive usage of Prometheus and Alertmanager.

For the network interface issue and other issues we wanted to address, we already had the necessary exported metrics and alerting to go with them. Here is what our already existing alert looked like3:

- alert: CalicoTooManyInterfaces
  expr: sum(node_network_info{device=~"cali.*"}) by (node) >= 200
  for: 1h
    priority: "5"
    notify: chat-sre-core chat-k8s

Note that we use a “notify” label to drive the routing logic in Alertmanager. However, that got us asking, could we just route this to a Kubernetes node condition instead?

Introducing Sciuro

Automatic Remediation of Kubernetes Nodes

Sciuro is our open-source replacement of node-problem-detector that has one simple job: synchronize Kubernetes node conditions with currently firing alerts in Alertmanager. Node problems can be defined with a more holistic view and using already existing exporters such as node exporter, cadvisor or mtail. It also doesn’t run on affected nodes which allows us to rely on out-of-band remediation techniques. Here is a high level diagram of how Sciuro works:

Automatic Remediation of Kubernetes Nodes

Starting from the top, nodes are scraped by Prometheus, which collects those metrics and fires relevant alerts to Alertmanager. Sciuro polls Alertmanager for alerts with a matching receiver, matches them with a corresponding node resource in the Kubernetes API and updates that node’s conditions accordingly.

In more detail, we can start by defining an alert in Prometheus like the following:

- alert: CalicoTooManyInterfacesEarly
  expr: sum(node_network_info{device=~"cali.*"}) by (node) >= 150
    priority: "6"
    notify: node-condition-k8s

Note the two differences from the previous alert. First, we use a new name with a more sensitive trigger. The idea is that we want automatic node remediation to try fixing the node first as soon as possible, but if the problem worsens or automatic remediation is failing, humans will still get notified to act. The second difference is that instead of notifying chat rooms, we route to a target called “node-condition-k8s”.

Sciuro then comes into play, polling the Altertmanager API for alerts matching the “node-condition-k8s” receiver. The following shows the equivalent using amtool:

$ amtool alert query -r node-condition-k8s
Alertname                 	Starts At            	Summary                                                               	 
CalicoTooManyInterfacesEarly  2021-05-11 03:25:21 UTC  Kubernetes node worker1a has too many Calico interfaces  

We can also check the labels for this alert:

$ amtool alert query -r node-condition-k8s -o json | jq '.[] | .labels'
  "alertname": "CalicoTooManyInterfacesEarly",
  "cluster": "a.k8s",
  "instance": "worker1a",
  "node": "worker1a",
  "notify": "node-condition-k8s",
  "priority": "6",
  "prometheus": "k8s-a"

Note the node and instance labels which Sciuro will use for matching with the corresponding Kubernetes node. Sciuro uses the excellent controller-runtime to keep track of and update node sources in the Kubernetes API. We can observe the updated node condition on the status field via kubectl:

$ kubectl get node worker1a -o json | jq '.status.conditions[] | select(.type | test("^AlertManager"))'
  "lastHeartbeatTime": "2021-05-11T03:31:20Z",
  "lastTransitionTime": "2021-05-11T03:26:53Z",
  "message": "[P6] Kubernetes node worker1a has too many Calico interfaces",
  "reason": "AlertIsFiring",
  "status": "True",
  "type": "AlertManager_CalicoTooManyInterfacesEarly"

One important note is Sciuro added the AlertManager_ prefix to the node condition type to prevent conflicts with other node condition types. For example, DiskPressure, a kubelet managed condition, could also be an alert name. Sciuro will also properly update heartbeat and transition times to reflect when it first saw the alert and its last update. With node conditions synchronized by Sciuro, remediation can take place via one of the existing tools. As mentioned previously we are using a modified version of Kured for now.

We’re happy to announce that we’ve open sourced Sciuro, and it can be found on GitHub where you can read the code, find the deployment instructions, or open a Pull Request for changes.

Managing Node Uptime

While we began using automatic node remediation for obvious problems, we’ve expanded its purpose to additionally keep node uptime low. Low node uptime is desirable to further reduce drift on nodes, keep the node initialization process well-oiled, and encourage the best deployment practices on the Kubernetes clusters. To expand on the last point, services that are deployed with best practices and in a high availability fashion should see negligible impact when a single node leaves the cluster. However, services that are not deployed with best practices will most likely have problems especially if they rely on singleton pods. By draining nodes more frequently, it introduces regular chaos that encourages best practices. To enable this with automatic node remediation the following alert was defined:

- alert: WorkerUptimeTooHigh
  expr: |
              max by(node) (kube_node_role{role="worker"})
            - on(node) group_left()
              (max by(node) (kube_node_role{role!="worker"}))
          or on(node)
            max by(node) (kube_node_role{role="worker"})
        ) == 1
    * on(node) group_left()
        (time() - node_boot_time_seconds) > (60 * 60 * 24 * 7)
    priority: "9"
    notify: node-condition-k8s

There is a bit of juggling with the kube_node_roles metric in the above to isolate the alert to generic worker nodes, but at a high level it looks at node_boot_time_seconds, a metric from prometheus node_exporter. Again the notify label is configured to send to node conditions which kicks off the automatic node remediation. One further detail is the priority here is set to “9” which is of lower precedence than our other alerts. Note that the message field of the node condition is prefixed with the alert priority in brackets. This allows the remediation process to take priority into account when choosing which node to remediate first, which is important because Kured uses a lock to act on a single node at a time.

Wrapping Up

In the past 30 days, we’ve used the above automatic node remediation process to action 571 nodes. That has saved our humans a considerable amount of time. We’ve also been able to reduce the time to repair for some issues as automatic remediation can act at all times of the day and with a faster response time.

As mentioned before, we’re open sourcing Sciuro and its code can be found on GitHub. We’re open to issues, suggestions, and pull requests. We do have some ideas for future improvements. For Sciuro, we may look to reduce latency which is mainly due to polling and potentially add a push model from Altermanager although this isn’t a need we’ve had yet.  For the larger node remediation story, we hope to do an overhaul of the remediating component. As mentioned previously, we are currently using a fork of kured, but a future replacement component should include the following:

  • Use out-of-band management interfaces to be able to shut down and power on nodes without a functional operating system.
  • Move from decentralized architecture to a centralized one that can integrate more complicated logic. This might include being able to act on entire failure domains in parallel.
  • Handle specialized nodes such as masters or storage nodes.

Finally, we’re looking for more people passionate about Kubernetes to join our team. Come help us push Kubernetes to the next level to serve Cloudflare’s many needs!

1Exhaustion can be applied to hardware resources, kernel resources, or logical resources like the amount of logging being produced.
2Nearly all Kubernetes objects have spec and status fields. The status field is used to describe the current state of an object. For node resources, typically the kubelet manages a conditions field under the status field for reporting things like if the node is ready for servicing pods.
3The format of the following alert is documented on Prometheus Alerting Rules.

Read the whole story
8 days ago
neat: custom k8s-native health check
Share this story

Useful Front-End Boilerplates And Starter Kits


Today, we’re shining the spotlight on boilerplates and starter kits for all kinds of projects, from static site templates and React/Vue starter kits to favicon and accessibility templates and emergency site templates. This collection is by no means complete, but rather a selection of things that the team at Smashing found useful and hope will make your day-to-day work more productive and efficient.

If you’re interested in more tools like these ones, please do take a look at our lovely email newsletter, so you can get tips like these drop right into your inbox!

Table of Contents

Below you’ll find quick jumps to specific boilerplates and guides you may need. Scroll down for a general overview or skip the table of contents.

Accessibility Boilerplate

To prevent accessibility from becoming an afterthought, it is a good idea to already lay the foundation when beginning a web project. So if you’re looking for a boilerplate solution to kickstart your project responsibly, the Accessibility Boilerplate is for you.

The template uses plain old semantic markup to correctly structure your content, making it easily accessible by search engines and assistive technologies. The HTML5 syntax is further enhanced with ARIA roles and Microdata. An oldie but goodie.

In case you need a little bit of help with WAI-ARIA, this collection of accessible snippets will sure come in handy. The snippets include all WAI-ARIA attributes and descriptions to help make your content more accessible. To help you resolve existing accessibility errors, Jacob Lett put together a collection of snippets for reducing redundant title text, dealing with empty links used for JavaScript behavior, and bringing meaning to visual elements like icons.

ASP.NET Boilerplate

The ASP.NET Boilerplate uses familiar tools to provide you with a solid developer experience when building modern web applications. Based on domain-driven design, it provides a strong infrastructure and development model for modularity, multi-tenancy, caching, background jobs, data filters, setting management, domain events, unit and integration testing, and everything else you’ll need to have more time to focus on your business code. Startup templates help you get started — either with an Angular single-page application or classic MVC & jQuery architecture.

Another fully-fledged boilerplate is the ASP.NET Core Hero Boilerplate. It enables you to run a single line of CLI on your Console to get a complete implementation. The template includes both WebAPI and MVC. A perfect starting point to learn about various essential packages and architecture.

Browser Extensions Boilerplate

Do you plan to build a browser extension? The Browser Extension Webpack Boilerplate has got your back. Designed for creating WebExtensions API-based browser extensions using Webpack, the extensions are, in theory, compatible with Chrome, Chromium, Firefox, Firefox for Android, Opera, and Microsoft Edge. Actual compatibility will depend on the APIs you used.

Modern Cross-Browser Extensions Boilerplate

Things that seem trivial in the web development world can turn out to be surprisingly hard in a web extension context. Especially when it comes to cross-browser extensions. To give you the experience you know from building cross-browser web apps when developing cross-browser web extensions, Cezar Augusto built extension-create.

extension-create helps you develop cross-browser extensions with built-in support for module import/exports, auto-reload, and more. There’s no build configuration necessary: To create an extension, a new browser instance (for now, Chrome) will open up, and you’re ready to dive right in. Each command and major feature works as a standalone module which is particularly useful if you have your extension setup but want to benefit from specific features, such as the browser launcher with default auto-reload.

Modern CSS Resets And Their Alternatives

With CSS browser compatibility issues being much less likely today, CSS resets have mostly become redundant. However, there are instances when a modern CSS reset might still make sense. Box sizing, body styles, links, fluid image styles, fonts, and a @media query for reduced motion, these are things you might want to reset, as Andy Bell shows. A modern reset of sensible defaults, so to say.

Another modern alternative to CSS resets is Normalize.css. It normalizes styles for a wide range of elements, corrects bugs and browser inconsistencies, improves usability with subtle modifications, and it uses detailed comments to explain what code does.

CSS Boilerplates And Snippets

Are you embarking on a smaller project or do you feel that a larger framework is overkill for your needs? Barebones only styles a handful of standard HTML elements and CSS Grid, and, as it turns out, that’s often more than enough to get started. With its approximately 400 lines, the boilerplate is light as a feather, and there’s no compiling or installing necessary to get you started.

The CSS snippet collection by 30 seconds of code contains utilities and interactive examples for CSS3. Whether it’s custom checkboxes, menu overlays, or button animations, the collection has got you covered with useful snippets for layouts, styling and animating elements, and handling user interactions. CodeMyUI also features a collection of pure CSS code snippets for user interfaces — some of them with quite fancy effects.

Color Themes For Your Dev Environment

Have you ever wished for a streamlined color theme across your entire development environment? One that you feel is pleasant for the eyes and that stays the same when you switch from your code editor to the terminal across to Slack? themer helps you achieve just that.

themer takes a set of colors and generates themes for your development environment based on them. You can either start with a pre-built color set or create one from scratch by entering two main shades for background color and foreground text and accent colors for syntax highlighting, errors, warnings, and success messages. Once you’re happy with the result, you can download the themes you want to generate from the palette — different terminals and text editors are supported, just like Slack, Alfred, Chrome, Prism, and other tools and services. To make the color coordination complete, there are matching wallpapers based on your theme, too.

Bootstrap Your Dotfiles

Dotbot helps you install dotfiles with just one short command, even on a freshly installed system. It is designed to be lightweight and self-contained (no external dependencies or installation required) and can be used as a replacement for any other tool you were using to manage your dotfiles. Dotbot uses YAML or JSON-formatted configuration files to let you specify how you set your dotfiles and it knows how to link files and folders, create folders, execute shell commands, and clean directories of broken symbolic links. User plugins are supported for custom commands.

If you want to dive deeper into dotfiles, the Awesome Dotfiles list features helpful articles and tutorials, as well as example dotfile repos and frameworks, tools, and more.

Electron Boilerplate

A minimalist boilerplate application for Electron runtime comes from Jakub Szwacz. To provide you with an easy-to-understand base that you can build upon, it only includes the bare minimum of tooling and dependencies that are needed for a fully-functional Electron environment. The boilerplate doesn’t impose any front-end technologies on you, so you are free to pick your favorite.

Emergency Site Kit

In case of emergency, many organizations need a quick way to publish critical information. However, existing CMS websites are often unable of handling sudden traffic spikes and, depending on the kind of emergency, the local network infrastructure might even be damaged, leaving people with poor mobile connections out. Max Boeck’s Emergency Site Kit is here to provide people with the information they need in such cases, no matter the circumstances.

The kit helps you quickly publish a simple website that is fast, accessible, and that can withstand large amounts of traffic. Built on the rule of least power, it uses simple technologies to ensure maximum resilience: The static files are optimized for first roundtrip, there’s only basic styling and one critical request, and service workers ensure offline support. One for the bookmarks.

How To Favicon In 2021

Sometimes, it’s a good idea to re-evaluate best practices. When it comes to favicons, for example — particularly given the fact that front-end developers have to deal with more than 20 static PNG files to display a simple favicon these days. To make the process more straightforward, Andrey Sitnik came up with a smarter solution that requires just five icons and one JSON file to fit most modern needs.

Inspired by Andrey’s approach, Chris Coyier went even a step further and went ultra-minimalist for the CSS Tricks favicon. He explains how it works in his post “How to Favicon in 2021”. An SVG concept to get your favicons ready for dark mode is also included.

A Boilerplate For Forms

Let’s be honest, forms can be a pain. Luckily, there’s a little HTML and CSS boilerplate to change that: Boilerform. Providing baseline BEM-structured CSS and appropriate attributes on elements, the little boilerplate gives your forms a head start.

Designed to be straightforward to implement, you can, in its most basic form, drop a CSS file into your head with a short snippet and wrap your elements in a boilerform wrapper. To give you more control, there’s also a Sass and Patterns Library to work with. Whether it’s a contact form, card payment, or user signup, Boilerform has got you covered.

All-In-One Front-End Boilerplates

The Modern Front-End Development Boilerplate is an all-in-one starter kit to develop, build, and deploy your next web project. Features include multiple front-end SCSS frameworks, an easy-to-manage folder structure, a centralized place to manage project-related settings like images, fonts, and JavaScript, hassle-free font-face generation, an integrated backup feature, and much more.

Another modern front-end boilerplate comes from the team at digital product studio tonik: the HTML Frontend Boilerplate is a modern solution for building fast, organized, and standardized web apps and sites.

GitHub Template Guidelines

No matter if it’s a private repository you share with your team or an open-source tool intended for the community: the first thing people usually see of your project is the Readme on GitHub. But what goes into the Readme that actually provides value to the user? Cezar Augusto put together guidelines for building GitHub templates. Handy!

Create .gitignore Files For Your Git Repositories

Another little detail that can be automated to save you some precious time are .gitignore files. gitignore.io does exactly that. The site has a graphical and a command line method of creating .gitignore files for your operating system, programming language, or IDE.

You can either enter the system and language you want to ignore directly on the site or copy the snippet that fits your shell from the documents to create an alias and, finally, the .gitignore file with the help of the command line.

Hackathon Starter

If you have ever attended a hackathon, you know how much time it takes to get a project started: Once you’ve decided what to build, you need to pick a programming language, a web framework, a CSS framework, and you need to set up an initial project that team members can contribute to.

Hackathon Starter is here to help you set the base for your Node.js web applications so that you can focus on what really matters: the hackathon project itself. The boilerplate features local authentication with email and password, authentication via Twitter, Facebook, Google, GitHub, LinkedIn, and Instagram, flash notifications, MVC project structure, account management, API examples, and much more to help you get started.

HTML Boilerplate Explained

How do you start a new project? Do you copy the HTML structure of the last site you built or maybe a boilerplate from HTML5 Boilerplate? Manuel Matuzović usually does the same, but recently, he encountered a situation where copying and pasting wasn’t an option: To document the structure he and his team are using at work, he had to understand the choices that have been made.

The task took up quite some time to research, so Manuel published the boilerplate on his blog for everyone to use, along with detailed explanations for each line of code so that you know exactly what you’re dealing with. A great opportunity to dive deeper into the underlying structure of a page.

Mobile-First Boilerplates

Do you need a lightweight, mobile-first boilerplate that includes only the essentials? Then Kraken might be for you. Kraken is not supposed to be a finished product but rather a starting point that you can adapt to any project. The base structure is a fully-fluid, single column layout, and an object-oriented approach to CSS lets you mix, match, and reuse classes throughout a project.

Another great little helper if you feel you don’t need all the utility of larger frameworks is Skeleton. It only styles a handful of standard HTML elements and includes a grid. The boilerplate gets by with only 400 lines and there’s no installation and zero compiling necessary to get started.

HTML5 Boilerplate

One of the most popular (if not the most popular) boilerplate to help you build fast, robust, and adaptable web apps or sites, is HTML5 Boilerplate. It bundles up the combined knowledge and effort of 100s of developers in one little package.

What’s in it? A lean, mobile-friendly HTML template, with optimized Google Analytics snippet, a placeholder touch device icon, and docs with extra tips and tricks. The boilerplate also includes Normalize.css, a modern, HTML5-ready alternative to CSS resets, and further base styles, helpers, media queries, and print styles. Perfect to give your project a head-start.

An alternative worth looking into is Igor Agapov’s Modern HTML Starter Template which was built with a focus on performance.

Boilerplates For Responsive HTML Emails

We all know about the challenges that come with formatting HTML emails. A handy boilerplate for sending out nicely formatted messages while avoiding some of the major pitfalls comes from Sean Powell: HTML Email Boilerplate.

Sean’s template is available in two versions — with and without comments — and consists of a header with global styles and a body section with more specific fixes and guidance to use where needed in your design. Whether you want to create your own template based on the snippets or cherry-pick the ones that fix your specific rendering issues, the boilerplate has got you covered.

Another email boilerplate worth looking into is Mark Robbins’ Good Email Code template, a simple stripped-back template that you can use for every email you send. If you’re interested in learning why each part of the code is where it is, Mark breaks it down in more detail.

Developed to help you build responsive HTML emails with confidence, the Email Framework provides you with pre-built grid options for responsive/fluid and hybrid layouts as well as with common components. The framework supports over 60 email clients and has been thoroughly tested using Litmus.

Last but not least, for those occasions when all you need is a simple responsive HTML template with a clear call-to-action button, you might also want to check out Lee Munroe’s template. It’s tested on all major email clients, on mobile, desktop, and web. Happy emailing!

A Complete Guide To HTML <head>

The head of a web page can get quite full, especially in large pages. But what do you actually need? And how to organize the head to prevent implications on performance? Josh Buchea put together a handy guide that dives deep into HTML <head> elements.

The guide covers everything from the recommended minimum and including elements for how a document should be rendered to links and references, favicons, social media, just like browser-depended information for things like smart app banners or “add to homescreen” features. A nice bonus: The guide is available in 11 languages. One for the bookmarks.

PHP Boilerplates

If you’re looking for a simple yet powerful PHP framework with a very small footprint, CodeIgniter has got you covered. CodeIgniter encourages MVC without forcing it on you, it has exceptional performance, and comes with built-in protection against CSRF and CSS attacks. There’s nearly zero configuration required to get you up and running.

A PHP framework that was also built with simplicity, performance, and security in mind is the PHP Microsite Boilerplate. As the name implies, it is perfect for building a rather small website without complex code structure. Key features include easy routing, intelligent serviceworker cache, and it’s SEO-optimized, and prepared for Accelerated Mobile Pages as well as for Progressive Web Apps.

Create Projects From Cookiecutters

A command-line utility that creates projects from cookiecutters (i.e, project templates)? Cookiecutter does just that. It takes a source directory tree, copies it into your new project, and replaces all the names that are surrounded by templating tags {{ and }} with names it finds in cookiecutter.json. These can be file names, directory names, and strings inside files. This enables you to bootstrap a new project from a standard form, skipping all the mistakes that are often involved when starting a new project. Project templates can be in any language or markup format and you can use both local cookiecutters or remote ones from Git or Mercurial repos.

Quick Snippets

Sometimes you come across a small tip that turns out to be true gold: Maybe it’s a solution to a problem you’ve been tinkering with for some time or a short code snippet that makes your workflow a lot more efficient. The site QuickSnippets collects little nuggets like these.

Currently, the collection features almost 1,300 snippets by 296 authors to help you in your everyday work. The snippets cover everything from browsers, tools, and editors to CSS, HTML, JavaScript, Laravel, PHP, React, UI/UX, and Vue.js. A treasure chest just waiting to be opened.

React Boilerplates

When it comes to React, there are several community-created boilerplates out there that are bound to save you time. One of them is the React Boilerplate. The highly-scalable, offline-first foundation was created with a focus on performance, best practices, and developer experience and shines with features such as quick scaffolding, instant feedback, predictable change management, and internationalization support, among other things.

Another boilerplate worth looking into comes from the team at Infinite Red: Ignite is the culmination of five years of constant React Native development and was created for both Expo and bare React Native. It comes with a CLI, component/model generators, and more.

The Electron React Boilerplate is another great foundation for scalable cross-platform apps. Fast iteration, incremental typing, and code optimization and minification out of the box are the three pillars it’s built upon.

The React Starterkit by Konstantin Tarkus is a front-end starter kit using React, Relay, GraphQL, and JAMstack architecture. It’s optimized for serverless deployment to CDN edge locations and comes pre-configured with CSS-in-JS styling, code quality tools like ESLint, Prettier, TypeScript, and Jest, as well as VSCode snippets and settings to make your workflow more efficient.

Speaking of VS Code: The React + Redux Snippets extension makes sure you always have the snippets you need available in your editor. It’s designed taking maximum advantage of code completion — perfect for power users.

Last but not least, if you want to use the best of all worlds to create your own, unique React boilerplate, Leonardo Maldonado’s tutorial is for you. He takes you step by step through building your own boilerplate from scratch with the main dependencies used in the React community today.

A Snippet For Loading Responsive WebP Images

It has always been complicated to load images in the best sizes and formats, and with new image formats like WebP and AVIF gaining popularity, things don’t get any easier. If you want to ship WebP already today, you’ll need a loading strategy that also providess a fallback for browsers that don’t support the new format yet. Stefan Judis shows how to do it.

Stefan’s solution for loading a responsive WebP image uses the picture element, and even though it it involves quite a lot of lines of code, it’s worth it as the snippet not only loads the image in the best format but also the best sizes. One for the bookmarks.

Also, in case you missed it, Stefan has started publishing his Web Weekly newsletter this year. Every Monday, you’ll find a colorful mix of resources all around frontend, productivity and web development learnings paired with handy tools and GitHub projects in your inbox. Stefan’s goal: make it the best email to start your week.

SaaS Boilerplate

User authentication, cookie sessions, subscription payments, billing management, team management, GraphQL API, transactional emails — when you’ve built a SaaS product before, you know how much time it takes to make all the different tools involved play well together to offer the functionality you need. To change that, Max Stoiber created Bedrock.

The modern full-stack Next.js and GraphQL boilerplate combines the best tools the JavaScript ecosystem has to offer into one solid foundation for your SaaS product. No need to master all of the technologies involved, if you know Next.js and GraphQL, you can start coding almost immediately.

Static Site Boilerplate

Automated build processes, a local development server, production minification and optimizations, and the latest standards for static websites. Eric Alli’s Static Site Boilerplate uses the latest tech to make the process of building static websites more straightforward.

The built-in development server will get you up and running in seconds, your HTML, styles, and scripts will be automatically linted, changes to files are monitored in real time, images are compressed for your production build, and sitemap.xml and robots.txt files are automatically included with your production build. A real timesaver.

Style Guide Boilerplates

What do you need to consider when building a style guide that, well, works? Brad Frost’s Styleguide Guide takes you step by step through each and every section and what goes into it — from the homepage to guidelines, styles, components, utilities, page templates, downloads, and even support, and contributions. A very complete overview.

Brett Jankord’s Style Guide Boilerplate is a great starting point for crafting living styleguides. You can create a directory for it on your site to see how your live site’s CSS affects the base elements and start customizing the patterns and modules to your liking.

Typographic Starter Kit

Do you need a little bit of help with typography? Not in terms of aesthetic design choices, but regarding markup? The typographic starter kit Typeplate has got your back. It defines proper markup with extensible styling for common typographic patterns.

Typeplate is available as a stripped-down Sass or CSS library of your choosing (including Bower and CDNJS) and is primarily concerned with technically implementing design patterns, not their looks: from typographic scale and word-wrap to indenting, hyphenation, small and drop capitals, small print, code blocks, quotes, footnotes, lists, and more.

VS Code Snippets To Streamline Your Workflow

Are you using VS Code? We came across some useful extensions that handle the React, Vue, and Angular snippets you might need to type frequently for you. For Vue, be sure to check out Sarah Drasner’s extension. It was built for real-world use and focuses on developer ergonomics instead of cataloguing API definitions.

Burke Holland provides you with a collection of essential React snippets and commands that he selected from his day-to-day React use. And if you’re looking for Angular snippets, John Papa has got you covered. His extension adds snippets for Angular for TypeScript and HTML to your VS Code setup.

Speaking of VS Code setup: Have you heard of the “VS Code Can Do That Workshop” already? From customizing the editor to using Git and source control, it features eight exercises to enhance your VS Code skills.

Vue Boilerplates

Do you plan to build a Progressive Web App with Vue.js? Vuesion has got your back. Described as the “most complete boilerplate for production-ready PWAs”, Vuesion focuses on performance, development speed, and best practices. The code is all yours, ready to be modified and build upon, so that you can implement the things you actually need, without being limited by the template itself.

If you’re looking for a solution to achieve a consistent user experience across your applications, CION might be for you. The design system utilizes design tokens, a living styleguide with integrated code playgrounds, and reusable components for common UI tasks. A great starting point that can be extended to your project’s needs.

To improve prototyping in Vue, there’s the prototyping tool OverVue. It allows developers to dynamically create and visualize a Vue application, implementing a real-time intuitive tree display of component hierarchy and a live-generated code preview. The resulting boilerplate can be exported as a template for further development.

Have you ever tinkered with the idea of using Vue to power a blog? Ben Hong did, and created a dev environment to help you do the same. Optimized for blogging, the VuePress Blog Boilerplate includes default features like RSS feed generation, a list of recent posts, etc. The minimal setup and Markdown-centered project structure help you focus on writing, and, thanks to the Vue-powered engine, you can use Vue components in Markdown and develop your theme in Vue, too.

For handy Vue snippets, little tips, tricks, useful directives, and nice practices, be sure to also check out Vue Snippets collection. A small but mighty collection.

WordPress Plugin Boilerplate Generator

No one likes to repeat unnecessary tasks. That’s why WordPress developer Enrique Chavez built the WordPress Plugin Boilerplate Generator. Every time he started working on a new plugin, he found himself renaming file names, packages, subpackages. The generator automates the task.

All you need to do is type your plugin details in a short form containing plugin name, slug, uri, autor name, email, and uri, and the generator will generate a ZIP file for you with the correctly-named file structure. A great little timesaver.

WordPress Starter Theme

Do you plan to build your own WordPress theme? The starter theme Underscores helps you get started. It’s not meant to be used as a parent theme but as a stable base to kickstart your theme development adventures.

Underscores comes with only minimal CSS so that there’s less stuff getting in your way when building your own theme. It shines with lean, well-commented HTML5, a helpful 404 template, an optional sample custom header implementation, custom template tags that keep your code clean, a mobile-friendly dropdown, and some other nifty features.

Read the whole story
21 days ago
especially handy for tech that I do not use daily, such as PHP or electron
Share this story
1 public comment
42 days ago
Sharing is caring
San Luis Obispo, CA
39 days ago
This kind of thing is super hard on my Paralysis of Choice

How Facebook deals with PCIe faults to keep our data centers running reliably

1 Share

Peripheral component interconnect express (PCIe) hardware continues to push the boundaries of computing thanks to advances in transfer speeds, the number of available lanes for simultaneous data delivery, and a comparatively small footprint on motherboards. Today, PCIe connectivity-based hardware delivers faster data transfers and is one of the de facto methods to connect components to servers.

Our data centers contain millions of PCIe-based hardware components — including ASIC-based accelerators for video and inference, GPUs, NICs, and SSDs — connected either directly into a PCI slot on a server’s motherboard or through a PCIe switch like a carrier card.

As with any hardware, PCIe-based components are susceptible to different types of hardware-, firmware-, or software-related failures and performance degradation. The variety of components and vendors, array of failures, and the challenges of scale make monitoring, collecting data, and performing fault isolation for PCIe-based components challenging.

We’ve developed a solution to detect, diagnose, remediate, and repair these issues. Since we’ve implemented it, this methodology has helped make our hardware fleet more reliable, resilient, and performant. And we believe the wider industry can benefit from the same information, strategies, and help build industry standards around this common problem.

Our tools for addressing PCIe faults

First, let’s outline the tools we use:

  • PCIcrawler: An open source, Python-based command line interface tool that can be used to display, filter, and export information about PCI or PCIe buses and devices, including PCI topology and PCIe Advanced Error Reporting (AER) errors. This tool produces visually appealing, treelike outputs for easy debugging as well as machine parsable json output that can be consumed by tools for deployment at scale.
  • MachineChecker: An in-house tool for quickly evaluating the production worthiness of servers from a hardware standpoint. MachineChecker helps detect and diagnose hardware problems. It can be run as a command line input tool. It also lives as a library and a service.
  • An in-house tool for taking a snapshot of the target host’s hardware configuration along with hardware modeling.
  • An in-house utility service used to parse the custom dmesg and SELs to detect and report PCIe errors on millions of servers. This tool parses the logs on the server at regular intervals and records the rate of correctable errors on a file on the corresponding server. The rate is recorded per 10 minutes, per 30 minutes, per hour, per six hours, and per day. This rate is used to decide which servers have exceeded the configured tolerable PCIe-corrected error rate threshold depending on the platform and the service.
  • IPMI Tool: An open source utility for managing and configuring devices that support the Intelligent Platform Management Interface (IPMI). IPMI is an open standard for monitoring, logging, recovery, inventory, and control of hardware that is implemented independent of the main CPU, BIOS, and OS. It’s mainly used to manually extract System Event Logs (SELs) for inspection, debugging, and study.
  • The OpenBMC Project: A Linux distribution for embedded devices that have a baseboard management controller (BMC).
  • Facebook auto remediation (FBAR): A system and a set of daemons that execute code automatically in response to detected software and hardware signals on individual servers. Every day, without human intervention, FBAR takes faulty servers out of production and sends requests to our data center teams to perform physical hardware repairs, making isolated failures a nonissue.
  • Scuba: A fast, scalable, distributed, in-memory database built at Facebook. It is the data management system we use for most of our real-time analysis.

How we studied PCIe faults

The sheer variety of PCIe hardware components (ASICs, NICs, SSDs, etc.) makes studying PCIe issues a daunting task. These components can have different vendors, firmware versions, and different applications running on them. On top of this, the applications themselves might have different compute and storage needs, usage profiles, and tolerances.

By leveraging the tools listed above, we’ve been conducting studies to ameliorate these challenges and ascertain the root cause of PCIe hardware failures and performance degradation.

Some of the issues were obvious. PCIe fatal uncorrected errors, for example, are definitely bad, even if there is only one instance on a particular server. MachineChecker can detect this and mark the faulty hardware (ultimately leading to it being replaced).

Depending on the error conditions, uncorrectable errors are further classified into nonfatal errors and fatal errors. Nonfatal errors are ones that cause a particular transaction to be unreliable, but the PCIe link itself is fully functional. Fatal errors, on the other hand, cause the link to be unreliable. Based on our experience, we’ve found that for any uncorrected PCIe error, swapping the hardware component (and sometimes the motherboard) is the most effective action.

Other issues can seem innocuous at first. PCIe-corrected errors, for example, are correctable by definition and are mostly corrected well in practice. Correctable errors are supposed to pose no impact on the functionality of the interface. However, the rate at which correctable errors occur matters. And if the rate is beyond a particular threshold, it leads to a degradation in performance that is not acceptable for certain applications.

We conducted an in-depth study to correlate the performance degradation and system stalls to PCIe-corrected error rates. Determining the threshold is another challenge, since different platforms and different applications have different profiles and needs. We rolled out the PCIe Error Logging Service, observed the failures in the Scuba tables, and correlated events, system stalls, and PCIe faults to determine the thresholds for each platform. We’ve found that swapping hardware is the most effective solution when PCIe-corrected error rates cross a particular threshold.

PCIe defines two error-reporting paradigms: The baseline capability and the AER capability. The baseline capability is required of all PCIe components and provides a minimum defined set of error reporting requirements. The AER capability is implemented with a PCIe AER extended capability structure and provides more robust error reporting. The PCIe AER driver provides the infrastructure to support PCIe AER capability and we leveraged PCIcrawler to take advantage of this.

We recommend that every vendor adopt the PCIe AER functionality and PCIcrawler rather than relying on custom vendor tools, which lack generality. Custom tools are hard to parse and even harder to maintain. Moreover, integrating new vendors, new kernel versions, or new types of hardware requires a lot of time and effort.

Bad (down-negotiated) link speed (usually running at 1/2 or 1/4 of the expected speed) and bad (down-negotiated) link width (running at 1/2, 1/4, or even 1/8 of the expected link width) were other concerning PCIe faults. These faults can be difficult to detect without some sort of automated tool because the hardware is working, just not as optimally as it could.

Based on our study at scale, we found that most of these faults could be corrected by reseating hardware components. This is why we try this first before marking the hardware as faulty.

Since a small minority of these faults can be fixed by a reboot, we also record historical repair actions. We have special rules to identify repeat offenders. For example, if the same hardware component on the same server fails a predefined number of times in a predetermined time interval, after a predefined number of reseats, we automatically mark it as faulty and swap it out. In cases where the component swap does not fix it, we will have to resort to a motherboard swap.

We also keep an eye on the repair trend to identify nontypical failure rates. For example, in one case, by using data from custom Scuba tables and their illustrative graphs and timelines, we root-caused a down-negotiation issue to a specific firmware release from a specific vendor. We then worked with the vendor to roll out new firmware that fixed the issue.

It’s also important to rate-limit remediations and repairs as a safety net to prevent bugs in the code from mass draining and unprovisioning, which can result in service outages if not handled properly.

Using this overall methodology, we’ve been able to add hardware health coverage and fix several thousand servers and server components. Every week, we’ve been able to detect, diagnose, remediate, and repair various PCIe faults on hundreds of servers.

Our PCIe fault workflow

Here’s a step-by-step breakdown of our process for identifying and fixing PCIe faults:

  1. MachineChecker runs periodically as a service on the millions of hardware servers and switches in our production fleet. Some of the checks include PCIe link speed, PCIe link width, as well as PCIe-uncorrected and PCIe-corrected error rate checks.
  2. For a particular PCIe endpoint, we find its parent called upstream using PCIcrawler’s PCIe topology information. We consider both ends of a PCIe link.
  3. We leverage PCIcrawler’s output, which in turn depends on the generic registers LnkSta, LnkSta2, LnkCtl, and LnkCtl2.
  4. We calculate expected speed as:
    expected_speed = min (upstream_target_speed, endpoint_capable_speed, upstream_capable_speed).
  5. We calculate current_speed as:
    current_speed = min (endpoint_current_speed, upstream_current_speed).
  6. current_speed must be equal to expected_speed.
    In other words, we should have the current speed of either end be equal to the minimum of the capable speeds, upstream capable, downstream capable, and upstream target speed.
  7. For PCIe link width, we calculate expected_width as:
    expected_width = min(pcie_upstream_device capable_width, pcie_endpoint_device capable width).
  8. If the expected_width is less than the current width of the upstream, we flag this as a bad link.
  9. The PCIe Error Logging Service independently runs on our hardware servers and independently records the rate of corrected and uncorrectable errors and their rates in a predetermined format (json).
  10. MachineChecker checks for uncorrected errors. Even a single uncorrected error event qualifies a server as faulty.
  11. During its periodic run, MachineChecker also looks up the generated files on the servers and checks them against a prerecorded source of truth in Configerator (our configuration management system) for a threshold per platform. If the rate exceeds a preset threshold, the hardware is marked as faulty. These thresholds are easily adjustable per platform.
  12. We also leverage PCIcrawler, which is also preinstalled on all our hardware servers, to check for PCIe AER issues.
  13. We leverage our in-house tool’s knowledge of hardware configuration to associate a PCIe address to a given hardware part.
  14. MachineChecker uses PCIcrawler (for link width, link speed, and AER information) and the PCIe Error Parsing Service (which in turn uses SEL and dmesg) to identify hardware issues and create alerts or alarms. MachineChecker leverages information from our in-house tool to identify the hardware components associated with the PCIe addresses and assists data center operators (who may need to swap out the hardware) by supplying additional information, such as the component’s location, model information, and vendor name.
  15. Application production engineers can subscribe to these alerts or alarms and customize workflows for monitoring, alerting, remediation, and custom repair.
  16. A subset of all the alerts can undergo a particular remediation. We can also fine-tune the remediation and add special casing, restricting the remediation to, for example, a firmware upgrade if a particular case is well known.
  17. If the remediation fails sufficiently, a hardware repair ticket is automatically created so that the data center operators can swap the bad hardware component or server with a tested good one.
  18. We have rate limiting in several places as a safety net to prevent bugs in the code or mass draining and unprovisioning, which can result in service outages if not handled properly.

We’ve added hardware health coverage and fixed several thousand servers and server components with this methodology. We continue to detect, diagnose, remediate, and repair hundreds of servers every week with various PCIe faults. This has made our hardware fleet more reliable, resilient, and performant.

We’d like to thank Aaron Miller, Aleksander Książek, Chris Chen, Deomid Ryabkov, Wren Turkal and many others who contributed to this work in different aspects.

Read the whole story
25 days ago
Share this story

heise+ | So unterscheiden sich Firmware-Architekturen von Flash-Speichern

1 Comment

Schrumpfende Strukturen, ­höhere 3D-Stapel und mehr Bits pro Speicherzelle machen Flash-Speicher immer günstiger, aber auch weniger robust.

Read the whole story
25 days ago
cache > flash
Share this story

RFC 9049: Path Aware Networking: Obstacles to Deployment (A Bestiary of Roads Not Taken)

1 Comment
This document is a product of the Path Aware Networking Research Group (PANRG). At the first meeting of the PANRG, the Research Group agreed to catalog and analyze past efforts to develop and deploy Path Aware techniques, most of which were unsuccessful or at most partially successful, in order to extract insights and lessons for Path Aware networking researchers. This document contains that catalog and analysis.
Read the whole story
25 days ago
Interesting read on failed networking protocols
Share this story
Next Page of Stories