Hempstick is an OpenSource USB HID joystick/controller firmware designed for Atmel’s ARM MicroControllers. This site serves as the official software and information hub.


///////////////////////////////// CAUTION /////////////////////////////////

This web site and its contents are not intended for consumption by EU citizens and residents. If you are an EU citizen and/or resident, please go away.

I hereby certify that all copyrighted contents here, other than those cited as coming from another copyrighted source with permission or under a blanket grant of use such as Creative Commons, are mine and mine alone. All contents hosted here and copyrighted by an EU citizen have been taken down, and links to them have also been removed.

I also certify that the server hosting this website and all of its contents is located in the United States of America, and contains no components or services located in European Union member territories.

I am unable and unwilling to comply with the newly passed GDPR or with Articles 11 and 13 of the new Directive on Copyright in the Digital Single Market passed by the EU Parliament.

In addition, all EU citizens and residents are hereby not granted use of the copyrighted material on this site, and any right of use they had is hereby revoked.

///////////////////////////////// CAUTION /////////////////////////////////

Hempstick is an OpenSource USB joystick controller framework/library.

It uses FreeRTOS to achieve a fully multi-threaded, modular design that isolates timing-sensitive code, so that extensions can be added easily and in isolation. It also uses advanced techniques such as event-based interrupt handlers to process sensor readings, and Direct Memory Access (DMA) wherever possible to move sensor data into memory (even the USB module uses DMA to send reports to the host). Whenever possible, it lets the hardware do the work instead of falling back on old standby software techniques like bit-banging.

Most importantly, Hempstick NEVER uses the unforgivable busy polling loops!

Currently supported boards are:

  1. Arduino Due/X, and

  2. Atmel SAM4S XPLAINED Pro.


Features

  1. Up to 16 analog-to-digital converter (ADC) channels (DirectInput itself supports at most 8 axes).

  2. Currently supports 64 buttons, and can be configured for 128.

  3. Able to read the buttons on ThrustMaster’s Cougar/Warthog sticks through the original 5-wire PS2 connector.

  4. 12-bit hardware ADC, which can be software-enhanced to 14-bit with oversampling.

  5. Software running-average noise filter for the ADC.

  6. No ghosting and no shadowing, because there is no button matrix. We simply connect each button directly to an MCU pin. That is, we brute force the problem away with bigger MCUs.

  7. Built-in hardware debouncers and pull-up/pull-down resistors.

  8. Software debouncers for TM sticks.

  9. Fully multi-threaded on top of a Real-Time Operating System (FreeRTOS), so it can accommodate more complex features and a more modular design. The modular design abstracts the configuration out, so that end users can modify it and produce a custom joystick controller without deep embedded programming knowledge.

  10. Developers can easily change configuration such as the USB VID/PID, manufacturer, device name, and button/ADC assignments to produce controllers that are truly their own.
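As a rough illustration of features 4 and 5, here is a minimal host-side sketch in C. It is not Hempstick's actual code: the function names, the 16-sample count, and the 8-sample averaging window are assumptions made for the example. It relies on the standard oversample-and-decimate rule, where summing 4^n samples and shifting right by n bits yields n extra bits, so 12-bit to 14-bit takes 16 samples and a shift by 2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Oversample-and-decimate: 4^n samples, shifted right by n, give n extra bits.
 * For 12 -> 14 bits, n = 2, so 16 samples are needed. */
#define OVERSAMPLE_EXTRA_BITS 2u
#define OVERSAMPLE_COUNT      16u   /* 4^2 */

/* raw12 must hold OVERSAMPLE_COUNT readings, each in the range 0..4095. */
uint16_t oversample_to_14bit(const uint16_t raw12[OVERSAMPLE_COUNT])
{
    uint32_t sum = 0;
    for (size_t i = 0; i < OVERSAMPLE_COUNT; i++) {
        assert(raw12[i] <= 4095u);
        sum += raw12[i];
    }
    return (uint16_t)(sum >> OVERSAMPLE_EXTRA_BITS);   /* result is 0..16380 */
}

/* Running (moving) average over the last AVG_WINDOW samples.
 * The window size is an assumption for this sketch. */
#define AVG_WINDOW 8u

typedef struct {
    uint16_t samples[AVG_WINDOW];
    uint32_t sum;     /* sum of the valid samples in the window */
    size_t   next;    /* slot that the next sample overwrites */
    size_t   count;   /* number of valid samples, at most AVG_WINDOW */
} running_avg_t;

void running_avg_init(running_avg_t *f)
{
    *f = (running_avg_t){0};
}

/* Push one sample and return the average of the samples currently held. */
uint16_t running_avg_push(running_avg_t *f, uint16_t sample)
{
    if (f->count == AVG_WINDOW) {
        f->sum -= f->samples[f->next];   /* drop the oldest sample */
    } else {
        f->count++;
    }
    f->samples[f->next] = sample;
    f->sum += sample;
    f->next = (f->next + 1u) % AVG_WINDOW;
    return (uint16_t)(f->sum / f->count);
}
```

Keeping a running sum means each new sample costs one subtraction, one addition, and one division, regardless of the window size.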

Planned Features

  1. Read and program MLX90363 Hall effect sensors in pure digital SPI mode (Hempstick currently works well with analog-output Hall effect sensors through the ADC).

  2. Support the SAM4E XPLAINED Pro board with Ethernet.

  3. Implement a UDP-based DeviceNet to send control signals to Hempstick for driving devices like LEDs and motors, instead of relying on custom USB descriptors and drivers.

  4. PWM output for LED dimming.

  5. Hardware-driven NeoPixel control.

Why Write/Use Hempstick?

I want a USB controller that is extendable, so I can add more functionality beyond just reading a couple of ADCs and buttons, and perhaps flashing a couple of LEDs.

Technical Reasonings

The number one question people are going to ask is why write yet another USB firmware when there are already commercial products, like the Bodnar boards and the OpenCockpit board, and even OpenSource Arduino-based firmware available. I will be honest with you: besides the fact that those on the market don’t do what I want, the main reasons I wrote the Hempstick firmware are technical.

  1. Those available just use inferior MCUs. Come on... PIC vs. ARM? ATMega vs. ARM? Need I say more?

    Some might argue that it does not matter whether ARMs are in general more powerful than the PICs and ATMegas: as long as they do their jobs, who cares? But do they do their jobs? See, it depends on how you define the “job.” If you are designing a joystick with fixed sensors, buttons, etc., that would be true. You know precisely what the “job” is. But I am not talking about a fixed configuration, I am talking about “unspecified” jobs. I know somebody writing a USB firmware on a PIC, and his PIC is actually struggling to keep up with the task. It still does the job. Since he’s writing it specifically for one particular device, it’s OK. But putting more extensions on it... kind of dicey.

  2. The ones that I found, at least the OpenSource ones, use a simple dumb busy polling loop to read sensors, buttons, etc.

    Again, who cares, you might ask? I do. You see, a dumb busy polling loop does this: loop forever, and in each iteration read all the sensors one after another, wait for a short period of time, and repeat, all in a single execution flow.

    Now, take reading the ADCs. An ADC needs time to do a conversion. So, for a single reading, you have to tell it to switch on the ADC channels, then tell it to start the conversion. Then you have to keep going back to read a register to see if it’s done (are we there yet, are we there yet, are we there yet...). Once the done flag is detected, you move the data out of the ADC result register, and then start the next conversion for another channel. While you are asking “are we there yet,” the MCU cannot be doing anything else. Sure, you can squeeze in a couple of button reading routines while waiting for the ADC conversion to complete, but that violates the modular design principle, and you will end up with spaghetti code! And what if there are a lot of ADC channels and buttons to read?

    Worse yet, even though reading a button is a very quick operation, a dumb busy polling loop reads each button’s current state only once per iteration. So, if you aim to capture any button press that lasts longer than 10ms, then every iteration must complete within 10ms, otherwise you might miss a press. Imagine somebody presses and releases a button within 1ms while each of your loop iterations takes 12ms -- you may miss that button press entirely because you were busy polling the ADC conversion status. What makes this bad is that you don’t know precisely how long each iteration will take, and adding code that reads more (or fewer) sensors changes the timing. So, it might happen that everything works fine, but as soon as you add one more sensor, some button presses start going missing, and it is a runtime behavior that only shows up occasionally. This kind of timing-dependent design should be avoided at all costs because debugging it is extremely difficult: not only is it hard to reproduce the timing that triggers the fault, but attaching a debugger changes the timing and can make the fault impossible to reproduce (even adding a simple print line can throw off the timing).

    A simple dumb busy polling loop may be good enough for a fixed configuration with limited resources (like an MCU with only 2048 bytes of RAM), but it is a bad design choice for an extendable framework. I want it to be extendable so that developers (particularly me) can add more sensors without having to worry about the timing problem. I mean, if I am going to support different boards that run at different speeds... a timing-dependent main loop is just not acceptable. I can’t go test every board in every configuration whenever I add a new line of code. That’s mission impossible. You just don’t do that kind of thing in a good software design.
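To make the missed-press scenario concrete, here is a small host-side simulation in C. It is only a sketch under stated assumptions: time is modeled in 1ms ticks, a press is a single interval, and all names (`press_t`, `presses_seen_by_polling`, `presses_seen_by_edge_irq`) are invented for the example and do not come from Hempstick. The polling version samples the pin once per loop iteration, while the edge-triggered version models GPIO hardware that watches the pin continuously and records every rising edge.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A single button press: the pin is high for start_ms <= t < end_ms. */
typedef struct {
    uint32_t start_ms;
    uint32_t end_ms;
} press_t;

static bool level_at(const press_t *p, uint32_t t_ms)
{
    return t_ms >= p->start_ms && t_ms < p->end_ms;
}

/* Polling loop: the pin is sampled only once per iteration, every loop_ms.
 * Anything that happens between two samples is invisible to the loop. */
unsigned presses_seen_by_polling(const press_t *p, uint32_t loop_ms,
                                 uint32_t total_ms)
{
    assert(loop_ms > 0u);
    unsigned seen = 0;
    bool prev = false;
    for (uint32_t t = 0; t < total_ms; t += loop_ms) {
        bool now = level_at(p, t);
        if (now && !prev) {
            seen++;
        }
        prev = now;
    }
    return seen;
}

/* Edge-triggered: the simulated GPIO hardware checks the pin on every 1ms
 * tick and latches each rising edge, as an interrupt handler would. */
unsigned presses_seen_by_edge_irq(const press_t *p, uint32_t total_ms)
{
    unsigned latched = 0;
    bool prev = false;
    for (uint32_t t = 0; t < total_ms; t++) {
        bool now = level_at(p, t);
        if (now && !prev) {
            latched++;   /* what the interrupt handler would record */
        }
        prev = now;
    }
    return latched;
}
```

With a press of `{5, 6}` (1ms long) and a 12ms loop over 100ms, the polling version reports 0 presses and the edge-triggered version reports 1. Move the press to `{12, 13}` and polling happens to catch it, which is exactly the "works until the timing shifts" behavior described above.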

  3. The ones I found just don’t do multi-threading, nor do they contain an RTOS. Well, at least nobody claims so.

    Why is an RTOS important? See, I want to take advantage of modern MCUs’ highly integrated event-based systems, and to be ready for the eventuality that even MCUs will be multi-core (some already are). For instance, with Atmel’s SAM (ARM) MCUs, each General Purpose Input/Output (GPIO) pin can be configured to generate an interrupt when its voltage level changes. That is, when a button is pressed, an interrupt is generated, the corresponding interrupt handler is executed by the MCU, and the regular non-interrupt task gets preempted.


    Also, because interrupt handlers preempt regular threads, you don’t want interrupt handlers to run too long and monopolize the CPU. The traditional tried and true solution is to create a Semaphore and a regular thread/task. The thread/task blocks on the Semaphore, waiting for an event to unblock it. When something happens, the interrupt handler runs, quickly grabs the needed data out of the hardware, and then signals the Semaphore. This unblocks the regular task waiting on the Semaphore. However, even though the task (thread) is unblocked, it is not necessarily scheduled to run yet. We would prefer that this particular task/thread be scheduled to run immediately, while still allowing other tasks/threads to preempt it once it starts to run.

    That means I need independent threads running, each one taking care of its own “thing.” Say, one thread takes care of button events: a button is pressed or released, causing a voltage level change; the on-die hardware issues an interrupt; an interrupt handler runs, preempting the currently running thread; and then the button reading thread is scheduled to process the button change event, recording it somewhere in the back. When no button is pressed (or released), no CPU cycles are used. A TMStick task waits for the hardware to raise a data event, and then it needs to lock the button data so that the button task and the TMStick task do not step on each other’s toes. The ADCs have their own task running in a different thread, moving the ADC conversion results out of the ADC register into RAM somewhere. While the ADC thread is waiting for a conversion, no CPU cycles are used. The thread just goes to sleep, yielding the CPU to other threads in the system. When the ADC conversion completes, the ADC thread gets scheduled to run, moves the ADC result out, starts the next conversion, and goes back to sleep. The same goes for other “modules” that read sensors or drive outputs.
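The interrupt-to-task handoff described above can be modeled on a desktop machine with plain C. This is a single-threaded sketch, not FreeRTOS code and not Hempstick's implementation: the `pending` counter merely plays the role of the semaphore, and every name here (`button_module_t`, `button_isr`, `button_task_step`) is hypothetical. On a real FreeRTOS target, the "give" would typically be `xSemaphoreGiveFromISR()` inside the interrupt handler and the "take" would be `xSemaphoreTake()` in the task.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EVENT_QUEUE_LEN 8u

typedef struct {
    uint32_t pin_snapshots[EVENT_QUEUE_LEN]; /* data grabbed by the "ISR" */
    unsigned head;          /* next slot the "ISR" writes */
    unsigned tail;          /* next slot the task reads */
    unsigned pending;       /* stands in for a counting semaphore */
    uint32_t button_state;  /* owned by the task, never touched by the "ISR" */
    unsigned task_runs;     /* how many events the task has processed */
} button_module_t;

/* "Interrupt handler": keep it short. Copy the hardware state, signal the
 * task, return. Returns false if the queue is full and the event is lost. */
bool button_isr(button_module_t *m, uint32_t pin_register)
{
    if (m->pending == EVENT_QUEUE_LEN) {
        return false;
    }
    m->pin_snapshots[m->head] = pin_register;
    m->head = (m->head + 1u) % EVENT_QUEUE_LEN;
    m->pending++;                      /* "give" the semaphore */
    return true;
}

/* One pass of the task body. A real task would block on the semaphore;
 * here a false return means "would be asleep, using no CPU". */
bool button_task_step(button_module_t *m)
{
    assert(m->pending <= EVENT_QUEUE_LEN);
    if (m->pending == 0u) {
        return false;
    }
    m->pending--;                      /* "take" the semaphore */
    m->button_state = m->pin_snapshots[m->tail];
    m->tail = (m->tail + 1u) % EVENT_QUEUE_LEN;
    m->task_runs++;
    return true;
}
```

The point of the split is that the handler only copies data and signals, while all real processing happens in the task, which does no work at all when no event has arrived.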

    Now, with this kind of event-based multi-threaded architecture, it doesn’t matter how long the buttons are pressed: the presses all get captured, as long as the GPIO module is capable of detecting the button press event (that’s another story about debouncers that we will not go into right now) and the CPU is not overloaded.

    Not only does this save a lot of CPU cycles, it also greatly reduces the timing issues we talked about earlier. It does not completely eliminate them, but they are now very much isolated within each module/thread/task. As long as there are enough CPU cycles to go around, adding code or more sensor reading in one module will not affect other modules’ timing. And the code is forced to be more modular and independent. Button reading code no longer commingles with ADC reading, nor with Hall sensor reading, etc.

    This also makes supporting different boards much easier than with the dumb polling loop. Look, each board might have different hardware for its ADC. One might have a 16-channel 12-bit ADC, another might have an 8-channel 10-bit ADC. By forcing the design to be modular, the ADC code can take care of its own part of supporting different boards without having to worry about buttons and other modules.
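One way to picture how an ADC module could hide board differences from everything else is a small per-board capability record plus a scaling function. The sketch below is an assumption for illustration only: the `adc_caps_t` type, its fields, and the choice of a 16-bit axis range are invented here and are not taken from Hempstick.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-board description of the ADC hardware. */
typedef struct {
    const char *board_name;      /* label for illustration only */
    uint8_t     channels;        /* number of ADC channels available */
    uint8_t     resolution_bits; /* native ADC resolution, 1..16 */
} adc_caps_t;

/* Scale a raw reading to a full-range 16-bit axis value, so that other
 * modules never need to know the board's native resolution.
 * Readings above the native maximum are clamped. */
uint16_t adc_to_axis16(const adc_caps_t *caps, uint16_t raw)
{
    assert(caps->resolution_bits >= 1u && caps->resolution_bits <= 16u);
    uint32_t max_raw = (1u << caps->resolution_bits) - 1u;
    if (raw > max_raw) {
        raw = (uint16_t)max_raw;
    }
    return (uint16_t)(((uint32_t)raw * 65535u) / max_raw);
}
```

With this shape, a 16-channel 12-bit board and an 8-channel 10-bit board both report full scale as 65535, and only the ADC module has to know which one it is running on.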

    For production code, a dumb busy polling loop should be avoided at all costs, except for trivial problems! For toys or artists, a dumb polling loop might be easier to understand and still perform acceptably. But for a joystick controller, failing to send the sensor reports in time could mean you get missiles up your rear end! So, it had better work ALL THE TIME! NO MISS! But the reality is that there is no such thing as never missing. What we are really doing here is reducing the chance of a miss to an astronomically small one, so that it is practically zero.

    Furthermore, think about the software development process. One guy writes the code, then he leaves for a higher paying job. Another guy gets hired to take over. Production code always goes through many development cycles under many generations of teams. A dumb busy polling loop architecture is just not maintainable except for trivial programs. You just don’t do that outside of the toy programs you write at school.

    But I guarantee you that some of the closed-source commercial firmware does that too. Very often, to reduce cost, they choose a low-end MCU, say with 2048 bytes of RAM. Given such a low-spec MCU, the developers are forced to work within its confines -- NO OS! Without an OS for multi-threading support, there is not much choice but to use a dumb busy polling loop. Plus, the goal is usually very simple: read a damn TMStick, and then read one MLX90333 in SPI mode, in the case of the TM Warthog. You know what? It’s trivial! Is that evil? Hell no. The problem domain is trivial, thus a trivial solution suffices!

    What is unforgivable are those who are working on controllers for cockpit panels, but still use Arduino’s trivial solution for everything. Dude, you need to analyze your problem domain before you start applying solutions! Even if you have a solution (Arduino) looking for a problem to solve, you still need to analyze your problem to see if the solution fits. From my personal experience, a lot of the much younger programmers I have babysat have this particular problem (and so do some much older industry veterans who never grew past this stage): they are so eager to apply a new technology they are excited about that they apply it everywhere, instead of analyzing the problem and then finding appropriate solution(s) to apply.

    How do you do multi-threading and all that task scheduling? That’s one of the facilities an Operating System provides. Not only does an OS provide thread creation and scheduling, it also provides multi-threading synchronization primitives like Semaphore, Mutex, Queue, etc. to help prevent race conditions and coordinate synchronization, i.e. thread safety. That is, in order to take advantage of the event-based system, either I write these multi-threading primitives and the task scheduling myself, or I use an existing OS. Why write one when there are already some in existence, perhaps some tried and true ones? I am not an RTOS expert! Let the experts write it, and we can just use it! The only question left is which one to use... After some extensive research and tests, I chose FreeRTOS. It’s simple, it’s free, I get the source code, and the Atmel Software Framework supports it. That is, I can pretty much do everything I need with it, and all I have to do is learn how to use it. Learn, I did.

These are just some of the technical reasons why I chose to write Hempstick anew instead of just using an existing firmware.
