
How To Create Multi-Step Forms With Vanilla JavaScript And CSS


Multi-step forms are a good choice when your form is large and has many controls. No one wants to scroll through a super-long form on a mobile device. By grouping controls on a screen-by-screen basis, we can improve the experience of filling out long, complex forms.

But when was the last time you developed a multi-step form? Does that even sound fun to you? There’s so much to think about and so many moving pieces that need to be managed that I wouldn’t blame you for resorting to a form library or even some type of form widget that handles it all for you.

But doing it by hand can be a good exercise and a great way to polish the basics. I’ll show you how I built my first multi-step form, and I hope you’ll not only see how approachable it can be but maybe even spot areas to make my work even better.

We’ll walk through the structure together. We’ll build a job application form, which I think many of us can relate to these days. I’ll scaffold the baseline HTML, CSS, and JavaScript first, and then we’ll look at considerations for accessibility and validation.

I’ve created a GitHub repo for the final code if you want to refer to it along the way.

Our job application form has four sections, the last of which is a summary view, where we show the user all their answers before they submit them. To achieve this, we divide the HTML into four sections, each identified with an ID, and add navigation at the bottom of the page. I’ll give you that baseline HTML in the next section.

Since the user will move through the sections, we’ll also include a visual indicator of which step they are on and how many steps are left. This indicator can be simple dynamic text that updates according to the active step, or a fancier progress-bar-style indicator. We’ll do the former to keep things simple and focused on the multi-step nature of the form.

We’ll focus more on the logic, but I will provide the code snippets and a link to the complete code at the end.

Let’s start by creating a folder to hold our pages. Then, create an index.html file and paste the following into it:

Looking at the code, you can see three sections and the navigation group. The sections contain form inputs but no native form validation. This gives us better control over displaying error messages, because native form validation is only triggered when you click the submit button.

Next, create a styles.css file and paste this into it:

Open up the HTML file in the browser, and you should get something like the two-column layout in the following screenshot, complete with the current page indicator and navigation.

Now, create a script.js file in the same directory as the HTML and CSS files and paste the following JavaScript into it:

This script defines a method that shows and hides sections depending on the formSteps values, which correspond to the IDs of the form sections. It also updates stepInfo with the currently active section of the form. This dynamic text acts as a progress indicator for the user.

It then adds logic that waits for the page to load and attaches click events to the navigation buttons to enable cycling through the different form sections. If you refresh your page, you will see that the multi-step form works as expected.

Let’s dive deeper into what the JavaScript code above is doing. In the updateStepVisibility() function, we first hide all the sections to have a clean slate:

formSteps.forEach((step) => {
  document.getElementById(step).style.display = "none";
});

Then, we show the currently active section:

document.getElementById(formSteps[currentStep]).style.display = "block";

Next, we update the text that indicates progress through the form:

stepInfo.textContent = `Step ${currentStep + 1} of ${formSteps.length}`;

Finally, we hide the Previous button if we are at the first step and hide the Next button if we are at the last section:

navLeft.style.display = currentStep === 0 ? "none" : "block";
navRight.style.display = currentStep === formSteps.length - 1 ? "none" : "block";

Let’s look at what happens when the page loads. We first hide the Previous button as the form loads on the first section:

document.addEventListener("DOMContentLoaded", () => {
  navLeft.style.display = "none";
  updateStepVisibility();
  // The navigation click handlers below are registered inside this same
  // listener, which is closed after they have been added.

Then we grab the Next button and add a click event that conditionally increments the current step count and calls the updateStepVisibility() function, which displays the new section:

navRight.addEventListener("click", () => {
  if (currentStep < formSteps.length - 1) {
    currentStep++;
    updateStepVisibility();
  }
});

Finally, we grab the Previous button and do the same thing but in reverse. Here, we are conditionally decrementing the step count and calling the updateStepVisibility():

navLeft.addEventListener("click", () => {
  if (currentStep > 0) {
    currentStep--;
    updateStepVisibility();
  }
});

Have you ever spent a good 10+ minutes filling out a form only to submit it and get vague errors telling you to correct this and that? I prefer it when a form tells me right away that something’s amiss so that I can correct it before I ever get to the Submit button. That’s what we’ll do in our form.

Our principle is to clearly indicate which controls have errors, give meaningful error messages, and clear the errors as the user takes the necessary actions. Let’s add some validation to our form. First, let’s grab the necessary input elements and add this to the existing ones:

const nameInput = document.getElementById("name");
const idNumInput = document.getElementById("idNum");
const emailInput = document.getElementById("email");
const birthdateInput = document.getElementById("birthdate");
const documentInput = document.getElementById("document");
const departmentInput = document.getElementById("department");
const termsCheckbox = document.getElementById("terms");
const skillsInput = document.getElementById("skills");

Then, add a function to validate the steps:

Here, we check whether each required input has a value and whether the email input contains a valid email address, and we set the isValid boolean accordingly. We also call a showError() function, which we haven’t defined yet.
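To make that concrete, here is a minimal sketch of what validateStep() could look like. The grouping of inputs per step follows the section layout described above, but the exact checks and error messages are my assumptions; the full version is in the linked repository.

// A sketch of validateStep(); exact messages and checks are assumptions.
function validateStep(step) {
  let isValid = true;

  if (step === 0) {
    // Personal details: name, ID number, email, birthdate
    if (nameInput.value.trim() === "") {
      showError(nameInput, "Name is required");
      isValid = false;
    }
    if (idNumInput.value.trim() === "") {
      showError(idNumInput, "ID number is required");
      isValid = false;
    }
    if (!/^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(emailInput.value.trim())) {
      showError(emailInput, "Please enter a valid email address");
      isValid = false;
    }
    if (birthdateInput.value === "") {
      showError(birthdateInput, "Birthdate is required");
      isValid = false;
    }
  } else if (step === 1) {
    // Experience: CV upload and department
    if (documentInput.files.length === 0) {
      showError(documentInput, "Please upload your CV");
      isValid = false;
    }
    if (departmentInput.value === "") {
      showError(departmentInput, "Please select a department");
      isValid = false;
    }
  } else if (step === 2) {
    // Final step: the skills field is optional, the terms checkbox is not
    if (!termsCheckbox.checked) {
      showError(termsCheckbox, "You must accept the terms");
      isValid = false;
    }
  }

  return isValid;
}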

Paste this code above the validateStep() function:

function showError(input, message) {
  const formControl = input.parentElement;
  const errorSpan = formControl.querySelector(".error-message");
  input.classList.add("error");
  errorSpan.textContent = message;
}

Now, add the following styles to the stylesheet:

If you refresh the form, you will see that the buttons do not take you to the next section till the inputs are considered valid:
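For that to happen, the Next button's click handler needs to call validateStep() before advancing. One way to wire that up (the exact wiring in the final code may differ) is to adjust the earlier handler:

navRight.addEventListener("click", () => {
  // Only advance when the current step's inputs pass validation.
  if (currentStep < formSteps.length - 1 && validateStep(currentStep)) {
    currentStep++;
    updateStepVisibility();
  }
});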

Finally, we want to add real-time error handling so that the errors go away when the user starts inputting the correct information. Add this function below the validateStep() function:
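As a minimal sketch of what that function could look like (the function name and the way validity is re-checked are my assumptions):

function addClearErrorListeners(input) {
  // Re-check the control on input and change events and clear its error
  // once it has a value again (or is checked, for the terms checkbox).
  ["input", "change"].forEach((eventType) => {
    input.addEventListener(eventType, () => {
      const hasValue =
        input.type === "checkbox" ? input.checked : input.value.trim() !== "";
      if (hasValue) {
        clearError(input);
      }
    });
  });
}

const fieldsToWatch = [
  nameInput, idNumInput, emailInput, birthdateInput,
  documentInput, departmentInput, termsCheckbox,
];
fieldsToWatch.forEach(addClearErrorListeners);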

This function listens for input and change events on each control and clears the error once the input is no longer invalid, by calling a clearError() function. Paste the clearError() function below the showError() one:

function clearError(input) {
  const formControl = input.parentElement;
  const errorSpan = formControl.querySelector(".error-message");
  input.classList.remove("error");
  errorSpan.textContent = "";
}

And now the errors clear when the user types in the correct value:

The multi-step form now handles errors gracefully. If you do decide to keep the errors until the end of the form, then at the very least jump the user back to the first erroring form control and show some indication of how many errors they need to fix, as sketched below.
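If you went that route, a rough sketch of that fallback could look like this (assuming the steps are section elements whose IDs match the entries in formSteps, and that showError() has tagged invalid controls with the .error class):

function jumpToFirstError() {
  const invalidInputs = document.querySelectorAll(".error");
  if (invalidInputs.length === 0) return;

  // Switch to the step that contains the first invalid control...
  const section = invalidInputs[0].closest("section");
  if (section && formSteps.includes(section.id)) {
    currentStep = formSteps.indexOf(section.id);
    updateStepVisibility();
  }

  // ...then focus it and tell the user how much is left to fix.
  invalidInputs[0].focus();
  stepInfo.textContent += ` (${invalidInputs.length} fields need attention)`;
}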

In a multi-step form, it is valuable to show the user a summary of all their answers at the end before they submit and to offer them an option to edit their answers if necessary. The person can’t see the previous steps without navigating backward, so showing a summary at the last step gives assurance and a chance to correct any mistakes.

Let’s add a fourth section to the markup to hold this summary view and move the submit button within it. Paste this just below the third section in index.html:

Then update the formSteps array in your JavaScript to read:

const formSteps = ["one", "two", "three", "four"];

Finally, add the following classes to styles.css:

.summary-section {
  display: flex;
  align-items: center;
  gap: 10px;
}

.summary-section p:first-child {
  width: 30%;
  flex-shrink: 0;
  border-right: 1px solid var(--secondary-color);
}

.summary-section p:nth-child(2) {
  width: 45%;
  flex-shrink: 0;
  padding-left: 10px;
}

.edit-btn {
  width: 25%;
  margin-left: auto;
  background-color: transparent;
  color: var(--primary-color);
  border: .7px solid var(--primary-color);
  border-radius: 5px;
  padding: 5px;
}

.edit-btn:hover {
  border: 2px solid var(--primary-color);
  font-weight: bolder;
  background-color: transparent;
}

Now, add the following to the top of the script.js file where the other consts are:

const nameVal = document.getElementById("name-val");
const idVal = document.getElementById("id-val");
const emailVal = document.getElementById("email-val");
const bdVal = document.getElementById("bd-val");
const cvVal = document.getElementById("cv-val");
const deptVal = document.getElementById("dept-val");
const skillsVal = document.getElementById("skills-val");
const editButtons = {
  "name-edit": 0,
  "id-edit": 0,
  "email-edit": 0,
  "bd-edit": 0,
  "cv-edit": 1,
  "dept-edit": 1,
  "skills-edit": 2
};

Then add this function in script.js:

function updateSummaryValues() {
  nameVal.textContent = nameInput.value;
  idVal.textContent = idNumInput.value;
  emailVal.textContent = emailInput.value;
  bdVal.textContent = birthdateInput.value;

  const fileName = documentInput.files[0]?.name;
  if (fileName) {
    const extension = fileName.split(".").pop();
    const baseName = fileName.split(".")[0];
    const truncatedName = baseName.length > 10 ? baseName.substring(0, 10) + "..." : baseName;
    cvVal.textContent = `${truncatedName}.${extension}`;
  } else {
    cvVal.textContent = "No file selected";
  }

  deptVal.textContent = departmentInput.value;
  skillsVal.textContent = skillsInput.value || "No skills submitted";
}

This dynamically inserts the input values into the summary section of the form, truncates the file names, and offers a fallback text for the input that was not required.

Then update the updateStepVisibility() function to call the new function:

function updateStepVisibility() {
  formSteps.forEach((step) => {
    document.getElementById(step).style.display = "none";
  });

  document.getElementById(formSteps[currentStep]).style.display = "block";
  stepInfo.textContent = `Step ${currentStep + 1} of ${formSteps.length}`;
  if (currentStep === 3) {
    updateSummaryValues();
  }

  navLeft.style.display = currentStep === 0 ? "none" : "block";
  navRight.style.display = currentStep === formSteps.length - 1 ? "none" : "block";
}

Finally, add this to the DOMContentLoaded event listener:

Object.keys(editButtons).forEach((buttonId) => {
  const button = document.getElementById(buttonId);
  button.addEventListener("click", (e) => {
    currentStep = editButtons[buttonId];
    updateStepVisibility();
  });
});

Running the form, you should see that the summary section shows all the inputted values and allows the user to edit any before submitting the information:

And now, we can submit our form:

form.addEventListener("submit", (e) => {
  e.preventDefault();

  if (validateStep(2)) {
    alert("Form submitted successfully!");
    form.reset();
    currentStep = 0;
    updateStepVisibility();
  }
});

Our multi-step form now allows the user to edit and see all the information they provide before submitting it.

Making multi-step forms accessible starts with the basics: using semantic HTML. This is half the battle. It is closely followed by using appropriate form labels.

Other ways to make forms more accessible include giving enough room to elements that must be clicked on small screens and giving meaningful descriptions to the form navigation and progress indicators.

Offering feedback to the user is an important part of it; it’s better not to auto-dismiss user feedback after a certain amount of time, but to allow the user to dismiss it themselves. Paying attention to contrast and font choice is important, too, as they both affect how readable your form is.

Let’s make the following adjustments to the markup for more technical accessibility:

  1. Add aria-required="true" to all inputs except the skills one. This lets screen readers know the fields are required without relying on native validation.
  2. Add role="alert" to the error spans. This helps screen readers know to give it importance when the input is in an error state.
  3. Add role="status" aria-live="polite" to the .stepInfo element. This helps screen readers understand that the step info tracks a changing state, and setting aria-live to polite indicates that changes do not need to be announced immediately.

In the script file, replace the showError() and clearError() functions with the following:

function showError(input, message) {
  const formControl = input.parentElement;
  const errorSpan = formControl.querySelector(".error-message");
  input.classList.add("error");
  input.setAttribute("aria-invalid", "true");
  input.setAttribute("aria-describedby", errorSpan.id);
  errorSpan.textContent = message;
}

function clearError(input) {
  const formControl = input.parentElement;
  const errorSpan = formControl.querySelector(".error-message");
  input.classList.remove("error");
  input.removeAttribute("aria-invalid");
  input.removeAttribute("aria-describedby");
  errorSpan.textContent = "";
}

Here, we programmatically add and remove attributes that explicitly tie the input with its error span and show that it is in an invalid state.

Finally, let’s move focus to the first input of each section; add the following code to the end of the updateStepVisibility() function:

const currentStepElement = document.getElementById(formSteps[currentStep]);
const firstInput = currentStepElement.querySelector(
  "input, select, textarea"
);

if (firstInput) {
  firstInput.focus();
}

And with that, the multi-step form is much more accessible.

There we go, a four-part multi-step form for a job application! As I said at the top of this article, there’s a lot to juggle — so much so that I wouldn’t fault you for looking for an out-of-the-box solution.

But if you have to hand-roll a multi-step form, hopefully now you see it’s not a death sentence. There’s a happy path that gets you there, complete with navigation and validation, without turning away from good, accessible practices.

And this is just how I approached it! Again, I took this on as a personal challenge to see how far I could get, and I’m pretty happy with it. But I’d love to know if you see additional opportunities to make this even more mindful of the user experience and considerate of accessibility.

Here are some relevant links I referred to when writing this article:


seddonym/import-linter: Import Linter allows you to define and enforce rules for the internal and external imports within your Python project.


Publishing a simple client-side JavaScript package to npm with GitHub Actions


Here's what I learned about publishing a single file JavaScript package to NPM for my Prompts.js project.

The code is in simonw/prompts-js on GitHub. The NPM package is prompts-js.

A simple single file client-side package

For this project, I wanted to create an old-fashioned JavaScript file that you could include in a web page using a <script> tag. No TypeScript, no React JSX, no additional dependencies, no build step.

I also wanted to ship it to NPM, mainly so it would be magically available from various CDNs.

I think I've boiled that down to about as simple as I can get. Here's the package.json file:

{
  "name": "prompts-js",
  "version": "0.0.4",
  "description": "async alternatives to browser alert() and prompt() and confirm()",
  "main": "index.js",
  "homepage": "https://github.com/simonw/prompts-js",
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "author": "Simon Willison",
  "license": "Apache-2.0",
  "repository": {
    "type": "git",
    "url": "git+https://github.com/simonw/prompts-js.git"
  },
  "keywords": [
    "alert",
    "prompt",
    "confirm",
    "async",
    "promise",
    "dialog"
  ],
  "files": [
    "index.js",
    "README.md",
    "LICENSE"
  ]
}

That "scripts.test" block probably isn't necessary. The keywords are used when you deploy to NPM, and the files block tells NPM which files to include in the package.

The "repository" block is used by NPM's provenance statements. Don't worry too much about these - they're only needed if you use the npm publish --provenance option later on.

Really the three most important keys here are "name", which needs to be a unique name on NPM, "version" and that "main" key. I set "main" to index.js.

All that's needed now is that index.js file - and optionally the README.md and LICENSE files if we want to include them in the package. The README.md ends up displayed on the NPM listing page so it's worth including.

Here's my index.js file. It starts and ends like this (an IIFE, an immediately invoked function expression):

const Prompts = (function () {
  // ...
  return { alert, confirm, prompt };
})();
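Each of the returned functions resolves a Promise when the user dismisses the dialog. The real implementation lives in the repo; here is a hedged sketch of the general shape one of them can take using a <dialog> element (the details are my assumptions, not the library's exact code):

function alert(message) {
  return new Promise((resolve) => {
    // Build a modal <dialog> with the message and an OK button.
    const dialog = document.createElement("dialog");
    const text = document.createElement("p");
    text.textContent = message;
    const form = document.createElement("form");
    form.method = "dialog"; // submitting the form closes the dialog
    const ok = document.createElement("button");
    ok.textContent = "OK";
    form.appendChild(ok);
    dialog.append(text, form);

    // Resolve once the dialog closes, then clean it up.
    dialog.addEventListener("close", () => {
      dialog.remove();
      resolve();
    });

    document.body.appendChild(dialog);
    dialog.showModal();
  });
}

Calling code can then await Prompts.alert("Hello!") and continue once the user clicks OK.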

Publishing to NPM

With these pieces in place, running npm publish in the root of the project will publish the package to NPM - after first asking you to sign into your NPM account.

Automating this with GitHub Actions

I use GitHub Actions that trigger on any release to publish all of my Python projects to PyPI. I wanted to do the same for this JavaScript project.

I found this example in the GitHub documentation which gave me most of what I needed. This is in .github/workflows/publish.yml:

name: Publish Package to npmjs
on:
  release:
    types: [published]
jobs:
  build:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      id-token: write
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20.x'
          registry-url: 'https://registry.npmjs.org'
      - run: npm publish --provenance --access public
        env:
          NODE_AUTH_TOKEN: ${{ secrets.NPM_TOKEN }}

There's that --provenance option which only works if you have the repository block set up in your package.json.

This needs a secret called NPM_TOKEN to be set up in the GitHub repository settings.

It took me a few tries to get this right. It needs to be a token created on the NPM website using the Access Tokens menu item, then Generate New Token -> Classic Token. As far as I can tell the new "Granular Access Token" format doesn't work for this as it won't allow you to create a token that never expires, and I never want to have to remember to update the secret in the future.

An "Automation" token should do the trick here - it bypasses 2-factor authentication when publishing.

Set that in GitHub Actions as a secret called NPM_TOKEN and now you can publish a new version of your package to NPM by doing the following:

  1. Update the version number in package.json
  2. Create a new release on GitHub with a tag that matches the version number

Simple trick to save environment and money when using GitHub Actions


We recently onboarded Nikita Sivukhin as a new member of our Engineering team at Turso. He immediately started making meaningful contributions to our Native Vector Search, but something else prompted me to write this article. In addition to working on his main task, Nikita started to poke around our codebase and fix anything he found worth tackling. This is a great proactive approach which I highly recommend to any software engineer. One thing Nikita improved was our GitHub Actions setup, to avoid running jobs that are no longer needed. This is great because GitHub Actions not only consume electricity when they run, but also either cost money when used for private repositories or count against a usage quota for open source projects.

What's the problem

We use GitHub Actions for our CI/CD at Turso, both on open source projects and on private ones. Among other things, we run GitHub Actions on our Pull Requests. Some of those actions are pretty heavy and can take a considerable amount of time. Rust compilation has its share, but we also run all sorts of tests, from unit tests to end-to-end tests. It isn't uncommon for a Pull Request to be updated before CI/CD has finished for the previous version. Unfortunately, GitHub does not cancel GitHub Actions runs for a stale version of the code, and those tasks keep running until they either fail or fully finish. This is a problem because those old CI/CD runs consume resources like electricity and GitHub Actions runners even though no one is interested in the outcome of the run any more.

Solution

This problem can be easily solved in a universal way. If you're running your GitHub Actions on the pull_request trigger, you just need to add the following snippet to the definition of your GitHub workflow:

concurrency:
  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
  cancel-in-progress: true

And voilà, GitHub will start to cancel all old GitHub Actions runs that become stale after a new version of the Pull Request is uploaded. You can see the solution in a wider context in Nikita's Pull Request that added this to the LibSQL GitHub repository.

Effects

As a consequence of this change, you will start seeing a new result type on your GitHub Actions summary page. There will be not only a green circle with a tick and a red circle with an X, but also a grey octagon with an exclamation point, which means a task was cancelled. Below is a screenshot from the GitHub Actions summary page of the LibSQL repository.

During the first week after Nikita's Pull Request had been merged, 56 tasks were cancelled in LibSQL repository alone.

Conclusion

I hope that this short article was able to convince you that if you're using GitHub Actions for your CI/CD, you can easily become more environmentally friendly and possibly save some money on your GitHub bill.


Brendan Gregg's Blog


Imagine halving the resource costs of AI and what that could mean for the planet and the industry -- based on extreme estimates such savings could reduce the total US power usage by over 10% by 2030¹. At Intel we've been creating a new analyzer tool to help reduce AI costs called AI Flame Graphs: a visualization that shows an AI accelerator or GPU hardware profile along with the full software stack, based on my CPU flame graphs. Our first version is available to customers in the Intel Tiber AI Cloud as a preview for the Intel Data Center GPU Max Series (previously called Ponte Vecchio). Here is an example:


Simple example: SYCL matrix multiply microbenchmark

(Click for interactive SVG.) The green frames are the actual instructions running on the AI or GPU accelerator, aqua shows the source code for these functions, and red (C), yellow (C++), and orange (kernel) show the CPU code paths that initiated these AI/GPU programs. The gray "-" frames just help highlight the boundary between CPU and AI/GPU code. The x-axis is proportional to cost, so you look for the widest things and find ways to reduce them.


Layers

This flame graph shows a simple program for SYCL (a high-level C++ language for accelerators) that tests three implementations of matrix multiply, running them with the same input workload. The flame graph is dominated by the slowest implementation, multiply_basic(), which doesn't use any optimizations, consumes 72% of stall samples, and is shown as the widest tower. On the right are two thin towers for multiply_local_access() at 21%, which replaces the accessor with a local variable, and multiply_local_access_and_tiling() at 6%, which also adds matrix tiling. The towers are getting smaller as optimizations are added.

This flame graph profiler is a prototype based on Intel EU stall profiling for hardware profiling and eBPF for software instrumentation. It's designed to be easy and low-overhead, just like a CPU profiler. You should be able to generate a flame graph of an existing AI workload whenever you want, without having to restart anything or launch additional code via an interposer.

Instruction-offset Profiling

This is not the first project to build an AI profiler, or even something called an AI Flame Graph; however, others I've seen focus on tracing CPU stacks and timing accelerator execution, but don't profile the instruction offsets running on the accelerator, or do profile them but via expensive binary instrumentation. I wanted to build AI flame graphs that work like CPU flame graphs: easy to use, negligible cost, production safe, and showing everything. A daily tool for developers, with most of the visualization in the language of the developer: source code functions.

This has been an internal AI project at Intel for the past year. Intel was already investing in this space, building the EU stall profiler capability for the Intel Data Center GPU Max Series that provides an approximation of HW instruction sampling. I was lucky to have Dr. Matthew (Ben) Olson, an Intel AI engineer who has also worked on eBPF performance tooling (processwatch) as well as memory management research, join my team and do most of the development work. His background has helped us power through difficulties that seemed insurmountable. We've also recently been joined by Dr. Brandon Kammerdiener (coincidentally another graduate of the University of Tennessee, like Ben), who also has eBPF and memory internals experience, and has been helping us take on harder and harder workloads. And Gabriel Muñoz just joined today to help with releases. Now that our small team has shown that this is possible, we'll be joined by other teams at Intel to develop this further.

We could have built a harder-to-use and higher-overhead version months ago using Intel GTPin but for widespread adoption it needs minimal overhead and ease of use so that developers don't hesitate to use this daily and to add it to deployment pipelines.

What's a Flame Graph?

A flame graph is a visualization I invented in 2011 for showing sampled code stack traces. It has become the standard for CPU profiling and analysis, helping developers quickly find performance improvements and eliminate regressions. A CPU flame graph shows the "big picture" of running software, with x-axis proportional to CPU cost. The example picture on the right summarizes how easy it can be to go from compute costs to responsible code paths. Prior to flame graphs, it could take hours to understand a complex profile by reading through hundreds of pages of output. Now it takes seconds: all you have to do is look for the widest rectangles.

Flame graphs have had worldwide adoption. They have been the basis for five startups so far, have been adopted in over thirty performance analysis products, and have had over eighty implementations.

My first implementation of flame graphs took a few hours on a Wednesday night after work. The real effort has been in the decade since, where I worked with different profilers, runtimes, libraries, kernels, compilers, and hypervisors to get flame graphs working properly in different environments, including fixing stack walking and symbolization. Earlier this year I posted about the final missing piece: Helping distros enable frame pointers so that profiling works across standard system libraries.

Similar work is necessary for AI workloads: fixing stacks and symbols and getting profiling to work for different hardware, kernel drivers, user-mode drivers, frameworks, runtimes, languages, and models. A lot more work, too, as AI analysis has less maturity than CPU analysis.

Searching Samples

If you are new to flame graphs, it's worth mentioning the built-in search capability. In the earlier example, most of the stall samples are caused by sbid: software scoreboard dependency. As that may be a unique search term, you can run search (Ctrl-F, or click "Search") on "sbid" and it will highlight it in magenta:

Search also shows the total number of stack samples that contained sbid in the bottom right: 78.4%. You can search for any term in the flame graph: accelerator instructions, source paths, function names, etc., to quickly calculate the percentage of stacks where it is present (excluding vertical overlap) helping you prioritise performance work.

Note that the samples are EU stall-based, which means theoretical performance wins can take the percentages down to zero. This is different to the timer-based samples typically used in CPU profiling. Stalls mean you focus on the pain, the parts of the code that aren't making forward progress, but you aren't seeing resource usage by unstalled instructions. I'd like to support timer-based samples in the future as well, so we can have both views.

Who will use this?

At a recent golang conference, I asked the audience of 200+ to raise their hands if they were using CPU flame graphs. Almost every hand went up. I know of companies where flame graphs are a daily tool that developers use to understand and tune their code, reducing compute costs. This will become a daily tool for AI developers.

My employer will use this as well for evaluation analysis, to find areas to tune to beat competitors, as well as to better understand workload performance to aid design.

Why is AI profiling hard?

Consider CPU instruction profiling: This is easy when the program and symbol table are both in the file system and in a standardized file format (such as ELF) as is the case with native compiled code (C). CPU profiling gets hard for JIT-compiled code, like Java, as instructions and symbols are dynamically generated and placed in main memory (the process heap) without following a universal standard. For such JITted code we use runtime-specific methods and agents to retrieve snapshots of the heap information, which is different for each runtime.

AI workloads also have different runtimes (and frameworks, languages, user-mode drivers, compilers, etc.), any of which can require special tinkering to get their CPU stacks and symbols to work. These CPU stacks are shown as the red, orange, and yellow frames in the AI Flame Graph. For some AI workloads it's easy to get these frames working; others (like PyTorch) are a lot more work.

But the real challenge is instruction profiling of actual GPU and AI accelerator programs -- shown as the aqua and green frames -- and correctly associating them with the CPU stacks beneath them. Not only may these GPU and AI programs not exist in the file system, but they may not even exist in main memory! Even for running programs. Once execution begins, they may be deallocated from main memory and only exist in special accelerator memory, beyond the direct reach of OS profilers and debuggers. Or within reach, but only through a prohibitively high-overhead HW-specific debugger interface.

There's also no /proc representation for these programs either (I've been proposing building an equivalent) so there's no direct way to even tell what is running and what isn't, and all the other /proc details. Forget instruction profiling, even ps(1) and all the other process tools do not work.

It's been a mind-bending experience, revealing what gets taken for granted because it has existed in CPU land for decades: A process table. Process tools. Standard file formats. Programs that exist in the file system. Programs running from main memory. Debuggers. Profilers. Core dumping. Disassembling. Single stepping. Static and dynamic instrumentation. Etc. For GPUs and AI, this is all far less mature. It can make the work exciting at times, when you think something is impossible and then find or devise a way.

Fortunately we have a head start as some things do exist. Depending on the runtime and kernel driver, there are debug interfaces where you can list running accelerator programs and other statistics, as used by tools like intel_gpu_top(1). You can kill -9 a GPU workload using intel_gpu_abrt(1). Some interfaces can even generate basic ELF files for the running accelerator programs that you can try to load in a debugger like gdb(1). And there is support for GPU/AI program disassembly, if you can get your hands on the binary. It feels to me like GPU/AI debugging, OS style, is about two years old. Better than zero, but still early on, and lots more ahead of us. A decade, at least.

What do AI developers think of this?

We've shown AI Flame Graphs to other AI developers at Intel and a common reaction is to be a bit puzzled, wondering what to do with it. AI developers think about their bit of code, but with AI Flame Graphs they can now see the entire stack for the first time, including the HW, and many layers they don't usually think about or don't know about. It basically looks like a pile of gibberish with their code only a small part of the flame graph.


CPU Flame Graph Implementations

This reaction is similar to people's first experiences with CPU flame graphs, which show parts of the system that developers and engineers typically don't work on, such as runtime internals, system libraries, and kernel internals. Flame graphs are great at highlighting the dozen or so functions that matter the most, so it becomes a problem of learning what those functions do across a few different code bases, which are typically open source. Understanding a dozen such functions can take a few hours or even a few days -- but if this leads to a 10% or 2x cost win, it is time well spent. And the next time the user looks at a flame graph, they start saying "I've seen that function before" and so on. You can get to the point where understanding the bulk of a CPU flame graph takes less than a minute: look for the widest tower, click to zoom, read the frames, done.

I'm encouraged by the success of CPU flame graphs, with over 80 implementations and countless real world case studies. Sometimes I'm browsing a performance issue I care about on github and hit page down and there's a CPU flame graph. They are everywhere.

I expect AI developers will also be able to understand AI Flame Graphs in less than a minute, but to start with people will be spending a day or more browsing code bases they didn't know were involved. Publishing case studies of found wins will also help people learn how to interpret them, and also help explain the value.

What about PyTorch?

Another common reaction we've had is that AI developers are using PyTorch, and initially we didn't support it as it meant walking Python stacks, which isn't trivial. But prior work has been done there (to support CPU profiling) and after a lot of tinkering we now have the first PyTorch AI Flame Graph:


PyTorch frames in pink

(Click for interactive SVG.) The PyTorch functions are at the bottom and are colored pink. This example runs oneDNN kernels that are JIT-generated, and don't have a source path, so that layer just reads "jit". Getting all the other layers included was a real pain to get going, but an important milestone. We think if we can do PyTorch we can do anything.

In this flame graph, we show PyTorch running the Llama 2 7B model using the Intel Extensions for PyTorch (IPEX). This flame graph shows the origin of the GPU kernel execution all the way back to the Python source code shown in pink. Most samples are from a stack leading up to a gemm_kernel (matrix multiply) shown in aqua, which like the previous example has many stalls due to software scoreboarding.

There are two instructions (0xa30 and 0xa90) that combined are 27% of the entire profile. I expect someone will ask: Can't we just click on instructions and have it bring up a disassembly view with full source? Yes, that should be possible, but I can't answer how we're going to provide this yet. Another expected question I can't yet answer: Since there are now multiple products providing AI auto-tuning of CPU workloads using CPU flame graphs (including Intel Granulate), can't we have AI auto-tuning of AI workloads using AI Flame Graphs?

First Release: Sometimes hard and with moderate overhead

Getting AI Flame Graphs to work with some workloads is easy, but others are currently hard and cost moderate overhead. It's similar to CPU profiling, where some workloads and languages are easy to profile, whereas others need various things fixed. Some AI workloads use many software dependencies that need various tweaks and recompilation (e.g., enabling frame pointers so that stack walking works) making setup time consuming. PyTorch is especially difficult and can take over a week of OS work to be ready for AI Flame Graphs. We will work on getting these tweaks changed upstream in their respective repositories, something involving teams inside and outside of Intel, and is a process I'd expect to take at least a year. During that time AI workloads will gradually become easier to flame graph, and with lower-overhead as well.

I'm reminded of eBPF in the early days: You had to patch and recompile the kernel and LLVM and Clang, which could take multiple days if you hit errors. Since then all the eBPF dependency patches have been merged, and default settings changed, so that eBPF "just works." We'll get there with AI Flame Graphs too, but right now it's still those early days.

The changes necessary for AI Flame Graphs are really about improving debugging in general, and are a requirement for Fast by Friday: A vision where we can root-cause analyze anything in five days or less.

Availability

AI Flame Graphs will first become available on the Intel Tiber AI Cloud as a preview feature for the Intel Data Center GPU Max Series. If you are currently deployed there you can ask through the Intel service channel for early access. As for if or when it will support other hardware types, be in other Intel products, be officially launched, be open source, etc., these involve various other teams at Intel and they need to make their own announcements before I can discuss them here.

Conclusions

Finding performance improvements for AI data centers of just fractions of a percent can add up to planetary savings in electricity, water, and money. If AI flame graphs have the success that CPU flame graphs have had, I'd expect finding improvements of over 10% will be common, and 50% and higher will eventually be found*. But it won't be easy in these early days as there are still many software components to tweak and recompile, and software layers to learn about that are revealed in the AI flame graph.

In the years ahead I imagine others will build their own AI flame graphs that look the same as this one, and there may even be startups selling them, but if they use more difficult-to-use and higher-overhead technologies I fear they could turn companies off the idea of AI flame graphs altogether and prevent them from finding sorely needed wins. This is too important to do badly. AI flame graphs should be easy to use, cost negligible overhead, be production safe, and show everything. Intel has proven it's possible.

Disclaimer

* This is a personal blog post that makes personal predictions but not guarantees of possible performance improvements. Feel free to take any claim with a grain of salt, and feel free to wait for an official publication and public launch by Intel on this technology.

¹ Based on halving the Arm CEO Rene Haas' estimate of 20-25% quoted in Taking a closer look at AI's supposed energy apocalypse by Kyle Orland of ArsTechnica.

Thanks

Thanks to everyone at Intel who have helped us make this happen. Markus Flierl has driven this project and made it a top priority, and Greg Lavender has expressed his support. Special thanks to Michael Cole, Matthew Roper, Luis Strano, Rodrigo Vivi, Joonas Lahtinen, Stanley Gambarin, Timothy Bauer, Brandon Yates, Maria Kraynyuk, Denis Samoylov, Krzysztof Raszknowski, Sanchit Jain, Po-Yu Chen, Felix Degrood, Piotr Rozenfeld, Andi Kleen, and all of the other coworkers that helped clear things up for us, and thanks in advance for everyone else who will be helping us in the months ahead.

My final thanks is to the companies and developers who do the actual hands-on work with flame graphs, collecting them, examining them, finding performance wins, and applying them.
You are helping save the planet.


BadRAM: Historic side channel undermines Confidential Computing in the cloud
