HTML5 Drag & Drop — Not the API You’re Looking For

31 Jan 2024

A “regretrospective” from our app

Thanks to apps like Trello, Figma & Notion, users are accustomed to seamless drag and drop (DnD) experiences on the web. Nowadays, if there's something to reorder, rearrange or edit, users expect to drag it.

But DnD is tricky to implement. How do you tell a drag apart from a regular click? How do you ensure dragging doesn't cause unexpected hover states on other elements? How will your layout respond when the user is over a drop target? What about touchscreens?

Modern web APIs are usually great, so when building Relume Site Builder my first inclination was to use the HTML Drag and Drop API. I’ll take you on my journey with this and hopefully spare you the pain and disappointment that using it caused us.

The API that gaslights you

DnD means something different in every application. On the simple end is an app that only supports dropping - e.g. file uploads. The HTML DnD api works great in those cases.

I was building something more complex - an app where you both drag and drop things, primarily within the same app. We can break this down into 5 parts:

Starting a drag
Rendering a drag preview
Rendering a drop preview
Interacting during the drag
Processing a drop

At every turn, the HTML DnD api baits you into thinking it’ll be easy. After building them out, I realized it provided little value for any of these parts.

1 - Starting a drag

Starting a drag in Canva. Notice the odd drag cursor?

With the HTML DnD api, an element is made draggable by setting the draggable attribute on the element and handling its dragstart event. (Calling event.preventDefault() belongs in the drop target's dragover handler; in dragstart it would cancel the drag.)

Here we run into our first theme: significant implementation differences between browsers. Given the api is a decade old, I was surprised by how many differences remain. For example, Safari will not start a drag unless mime data (event.dataTransfer.items) has also been set during the dragstart event handler.
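To make the setup concrete, here is a minimal sketch of starting a drag. The MIME type, the makeDraggable helper and the small "Like" interfaces are all assumptions for illustration; the interfaces only mirror the parts of HTMLElement, DragEvent and DataTransfer that the sketch touches, so it can be read without a browser.

```typescript
// Minimal structural types (assumptions) mirroring the parts of
// HTMLElement, DragEvent and DataTransfer that this sketch touches.
interface DataTransferLike {
  effectAllowed: string;
  setData(format: string, data: string): void;
}
interface DragStartEventLike {
  dataTransfer: DataTransferLike | null;
}
interface DraggableLike {
  draggable: boolean;
  addEventListener(
    type: "dragstart",
    listener: (event: DragStartEventLike) => void,
  ): void;
}

// Hypothetical MIME type; choose one that is specific to your app.
const CARD_MIME = "application/x-example-card";

function makeDraggable(element: DraggableLike, cardId: string): void {
  // Equivalent to writing draggable="true" in the markup.
  element.draggable = true;
  element.addEventListener("dragstart", (event) => {
    if (event.dataTransfer === null) return;
    // Set some data unconditionally so the drag also starts in Safari.
    event.dataTransfer.setData(CARD_MIME, cardId);
    event.dataTransfer.effectAllowed = "move";
  });
}
```

In a real page you would call this once per draggable element, e.g. for every card in a list.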

But how does a user drag a draggable element? The spec states (emphasis mine):

This specification does not define exactly what a drag-and-drop operation actually is.

On a visual medium with a pointing device, a drag operation could be the default action of a mousedown event that is followed by a series of mousemove events, and the drop could be triggered by the mouse being released.

Despite the spec's ambiguity, browsers are consistent. Chrome, Firefox, and Safari all require a mouse to start a drag; none will trigger drag and drop events for touchscreen users.

The lack of touch support makes the DnD api a non-starter for most use cases. Mobile is eating the world and has been for the last decade. Nevertheless, I foolishly stuck with the DnD api as our app is desktop-targeted and descended further into madness…

2 - Rendering a drag preview

The DnD api will render a "drag image" while the user is dragging. The default image differs by browser:

Chrome: the drag image is a full opacity version of the dragged object. But it doesn't handle border-radius correctly.
Firefox: the drag image is a translucent version of the dragged object.
WebKit: the image is artistically offset from the cursor. What even?

We wanted a consistent experience, with correct border-radiuses, to meet our app’s design goals. However, the api to change the drag image has similar limitations:

You can set the image to a regular HTML element, but that’ll still have the cross-browser differences.
You can set the image to an img or canvas element and get consistent results across browsers. But, you need to supply an img or canvas! This will likely involve re-implementing your styling & layout as there's no api to render a DOM element onto a canvas.
Either way, you can only set the drag image at the start of the drag - the preview cannot change as the drag moves over a drop target.

Luckily, you can hack around the drag image’s limitations by avoiding it altogether:

A rabbit hole: Faking a drag preview

You can hide the native drag preview image and render DOM nodes to follow the cursor, creating the illusion of a preview. Of course, there are pitfalls with this approach:

Clipping / scroll containers. We can’t simply transform: translate(x, y) the element being dragged and call it a day. If the element is affected by clipping (e.g. a parent has overflow: scroll), it would disappear as you drag it outside. So we need to create a new DOM node for the element being dragged, which will be outside of any clipping parents. To avoid seeing double we should also hide the original element, which in itself is non-trivial:
- Removing the element that started the drag from the DOM will cancel the drag.
- We observed Chrome would abort the drag if the element resized to 0x0 too quickly, so we needed to setTimeout(..., 1) before updating the element size.
Empty preview images need to be a very specific type - my other post explores how the wrong image can silently break things on other browsers.
While the Element::drag event sounds logical to get the drag position, Firefox always reports the position as 0,0 in this event. Use Window::dragover instead. However...
DataTransfer::getData returns an empty string during the dragover handler in Chrome & Safari. The data is only readable on drop. This means you'll need to maintain the state of "what is being dragged" in your own javascript and keep it in sync with the browser.
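The last two pitfalls boil down to "track the drag yourself". Below is a rough sketch of what that bookkeeping might look like. DragStateStore and previewTransform are hypothetical names, not part of any browser api, and the sketch assumes the fake preview node is position: fixed at the top-left of the viewport, outside any clipping parents.

```typescript
interface Point {
  x: number;
  y: number;
}

interface DragSession<T> {
  payload: T; // what is being dragged
  position: Point; // last known cursor position (client coordinates)
  grabOffset: Point; // cursor position relative to the element's top-left at drag start
}

class DragStateStore<T> {
  private session: DragSession<T> | null = null;

  // Call from the dragstart handler.
  start(payload: T, position: Point, grabOffset: Point): void {
    this.session = { payload, position, grabOffset };
  }

  // Call from a window-level dragover handler with clientX / clientY.
  move(position: Point): void {
    if (this.session !== null) this.session.position = position;
  }

  // Call from drop or dragend; returns whatever was being dragged, if anything.
  end(): T | null {
    const payload = this.session === null ? null : this.session.payload;
    this.session = null;
    return payload;
  }

  getSession(): DragSession<T> | null {
    return this.session;
  }
}

// CSS transform that keeps the fake preview under the cursor at the
// same spot where the user originally grabbed the element.
function previewTransform(session: DragSession<unknown>): string {
  const x = session.position.x - session.grabOffset.x;
  const y = session.position.y - session.grabOffset.y;
  return `translate(${x}px, ${y}px)`;
}
```

Because the payload lives in the store, drop targets can read it during dragover without ever touching DataTransfer::getData.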

3 - Rendering a drop preview

Trello (pictured above) has the thoughtful detail of showing gray placeholders to indicate where your card will drop. We wanted a similar detail, which I call a “drop preview”.

The HTML DnD api doesn’t help you here. I’d even say it works against this goal:

The drop target likely needs to know what is over it, to render a correctly sized preview. As mentioned above, DataTransfer::getData returns an empty string during the dragover handler in Chrome & Safari. This is another reason to maintain the state in your own javascript - effectively reimplementing part of the HTML DnD api yourself.
You have limited styling control of OS cursors. The HTML DnD api overrides any css cursor styling you have set. Instead, you’re limited to a set of 4 cursors (via dataTransfer.dropEffect) - “copy”, “move”, “link” or “none” (not allowed). These look kind of ugly, at least in my designer colleague’s opinion.
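For the first point, once you know what is being dragged, placing the placeholder is mostly arithmetic. Here is a small sketch for a vertical list, assuming you feed it the rects of the list's items (for example from getBoundingClientRect()) in document order. dropIndex and RectLike are names made up for this example.

```typescript
// Only the fields of a DOMRect that the calculation needs.
interface RectLike {
  top: number;
  height: number;
}

// Returns the index at which the dragged item would be inserted:
// before the first item whose vertical midpoint is below the cursor,
// or at the end of the list if the cursor is past every midpoint.
function dropIndex(itemRects: readonly RectLike[], cursorY: number): number {
  const index = itemRects.findIndex(
    (rect) => cursorY < rect.top + rect.height / 2,
  );
  return index === -1 ? itemRects.length : index;
}
```

The placeholder itself can then be rendered at that index, sized from the dragged item's dimensions that you stored at drag start.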

4 - Interacting during a drag

On most mice the scroll wheel is separate from the primary button. So users expect to be able to scroll while they are dragging something. Even touchpad users might try dragging & scrolling by using multiple fingers at once.

Sadly, the HTML DnD api rules out this possibility. While dragging, no mouse events are dispatched and the browser won't scroll either. For a canvas-style app (like Figma, pictured above), this would mean losing a table-stakes feature.

5 - Processing a drop

This is fine, or at least it wasn’t memorably bad. One win for the HTML DnD api. 🎉

Is the HTML DnD api fit for purpose?

By this point, I felt like I’d implemented so many workarounds that I was almost re-implementing the HTML DnD api itself. Worse still, we discovered api limitations that cramped my app’s style - from small things like ugly cursors to bigger issues like no touch support.

So we bit the bullet and re-implemented our app’s drag and drop without the HTML DnD api. It looked very similar to the HTML DnD api - instead of using draggable we were using data-sam-draggable. Modern web apis made our input-handling code a breeze too:

With the Pointer Events api you can setPointerCapture after drag starts. That way the dragged element can continue receiving the mouse events regardless of where the cursor moves - you don't need to muck around with window event handlers or outrunning the browser hit test.
With document.elementsFromPoint you can find all the DOM elements at the cursor's position, ordered from topmost to bottommost. This allows us to insert a transparent div that “catches” all the pointer events (to stop hover states being triggered during a drag) while still knowing which drop-target element the cursor “hits” underneath.
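As a rough illustration of those two points (not our actual code), the core of a pointer-based drag can be sketched as two small helpers. The 4px threshold and the data-example-drop-target attribute are assumptions made up for this example, and ElementLike only mirrors the one Element method the sketch needs.

```typescript
interface Point {
  x: number;
  y: number;
}

// Assumed threshold: movement below this is treated as a click, not a drag.
const DRAG_THRESHOLD_PX = 4;

function exceedsDragThreshold(
  start: Point,
  current: Point,
  threshold: number = DRAG_THRESHOLD_PX,
): boolean {
  return Math.hypot(current.x - start.x, current.y - start.y) >= threshold;
}

// Only the part of Element that the lookup needs.
interface ElementLike {
  hasAttribute(name: string): boolean;
}

// Hypothetical marker attribute for drop targets.
const DROP_TARGET_ATTR = "data-example-drop-target";

// Given the result of document.elementsFromPoint(x, y) (topmost first),
// skip overlays and return the first element marked as a drop target.
function findDropTarget<E extends ElementLike>(hits: readonly E[]): E | null {
  return hits.find((el) => el.hasAttribute(DROP_TARGET_ATTR)) ?? null;
}

// Browser wiring, shown as comments because it needs a real DOM:
//
// el.addEventListener("pointerdown", (e) => {
//   start = { x: e.clientX, y: e.clientY };
// });
// el.addEventListener("pointermove", (e) => {
//   const here = { x: e.clientX, y: e.clientY };
//   if (!dragging && start && exceedsDragThreshold(start, here)) {
//     dragging = true;
//     el.setPointerCapture(e.pointerId); // keep receiving events wherever the cursor goes
//   }
//   if (dragging) {
//     target = findDropTarget(document.elementsFromPoint(e.clientX, e.clientY));
//   }
// });
```

The threshold check is also one answer to "how do you tell a drag apart from a regular click?": nothing counts as a drag until the pointer has actually travelled a few pixels.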

Now our app is free from past limitations - users can scroll the canvas while they drag!

Why would you use the HTML DnD api?

To drag or drop things between different windows. That's it.

With the html api, you can accept drops from other windows or even native applications. It’s not obvious how you’d do that any other way.

Supporting cross-window DnD should be a no-brainer. Almost every app supports copy-paste across tabs… why not drag and drop? But the state of the HTML DnD api forces you to choose between a horrifically broken cross-window experience or a flawless single window experience. The latter usually wins.

Conclusion

Don't use the HTML5 DnD api, unless you're implementing a file upload widget or have a passion for frustration.

This post is about work I've been doing at Relume.
Relume helps agencies and freelancers design websites faster.
Learn more at relume.io.

I hope you enjoyed this article. Contact me if you have any thoughts or questions.
