
How I built a real-time Android vision system from scratch using YOLO, DeepSORT, and uinput.

AimBuddy: Building a 60 FPS on-device tracking and touch injection system
TIP

I have decided to open source AimBuddy. Everything discussed in this post (the full native pipeline, training scripts, and docs) is now freely available on GitHub.

AimBuddy started as an experiment to see if a phone could run a full real-time vision pipeline entirely on-device. Screen capture, YOLO inference, multi-target tracking, and programmatic touch injection, all natively on a mobile GPU at 60 FPS with no PC tethering. It can, but the interesting problems weren’t where I expected them.

Running YOLO on a phone was the easy part. NCNN with Vulkan gives you GPU compute shaders and FP16 ALUs for free. The problems that actually ate months of dev time were in the glue between components. How do you keep latency honest when the SoC thermally throttles and your inference time doubles? How do you make a tracker that doesn’t flicker every time a detection disappears for a frame? How do you inject touch events that feel like human input and not a machine gun?

This post covers the full technical stack with every design decision, actual code, and the problems that were painful to debug.

IMPORTANT

This is a research and educational project. All testing was done in controlled environments.

What AimBuddy actually is

There are two runtime modes, and the split between them is deliberate:

  • Visual Assist (no root required) runs screen capture, YOLO inference, target tracking, and an ESP overlay. Works on any Android 11+ device.
  • Assisted Input (root required) adds low-latency touch injection via Linux uinput on top of the visual pipeline.

Root failure doesn’t crash the app. If /dev/uinput isn’t available or the grab fails, the visual pipeline keeps running and the touch layer just never starts. This matters during development when you’re constantly switching between root and non-root test devices.

The stack is Kotlin + Jetpack Compose for the Android UI layer, and C++ via JNI for everything on the hot path. The inference model is yolo26n, a nano-sized single-class detector from the YOLO26 family, running on NCNN with Vulkan compute.

The architecture

Architecture diagram for the full AimBuddy pipeline

Four threads at runtime. The inference thread is pinned to the Cortex-X1 big core and the render thread to a Cortex-A78 core via sched_setaffinity. This is done through an RAII ESP::Thread wrapper that takes an affinity parameter at start:

thread.h
bool start(int cpuAffinity = -1) {
    cpuAffinity_ = cpuAffinity;
    int result = pthread_create(&thread_, nullptr, threadEntry, this);
    // ...
}
// Inside threadEntry:
cpu_set_t cpuset;
CPU_ZERO(&cpuset);
CPU_SET(thread->cpuAffinity_, &cpuset);
sched_setaffinity(0, sizeof(cpu_set_t), &cpuset);

Pinning to specific cores on a big.LITTLE SoC is not optional for consistent timing. Without affinity, the scheduler freely migrates the inference thread between fast and slow cores, and your inference time oscillates wildly. That variance breaks the adaptive crop controller, which relies on stable EMA measurements to make decisions.

The inference and render threads don’t share a lock for frames. Data flows through a lock-free SPSC ring buffer from capture to inference, and through a std::mutex-protected copy from inference to render. The aim loop reads from the tracker under its own mutex. There’s no single choke point.
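For reference, a minimal SPSC ring looks like the sketch below. This is the standard single-producer/single-consumer pattern, not AimBuddy’s actual FrameBuffer; the names and capacity handling are illustrative.

#include <atomic>
#include <cstddef>

// Lock-free SPSC ring (sketch). One thread calls push, one calls pop.
template <typename T, size_t N> // N must be a power of two
class SpscRing {
    T slots_[N];
    std::atomic<size_t> head_{0}; // advanced by the producer only
    std::atomic<size_t> tail_{0}; // advanced by the consumer only
public:
    bool push(const T& v) {
        const size_t head = head_.load(std::memory_order_relaxed);
        if (head - tail_.load(std::memory_order_acquire) == N)
            return false; // full: the capture side drops the frame
        slots_[head & (N - 1)] = v;
        head_.store(head + 1, std::memory_order_release);
        return true;
    }
    bool pop(T& out) {
        const size_t tail = tail_.load(std::memory_order_relaxed);
        if (head_.load(std::memory_order_acquire) == tail)
            return false; // empty
        out = slots_[tail & (N - 1)];
        tail_.store(tail + 1, std::memory_order_release);
        return true;
    }
};

The acquire/release pairing is what makes this safe without a mutex: the consumer only reads a slot after it observes the producer’s head store, and the producer only reuses a slot after it observes the consumer’s tail store.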

Capture: MediaProjection and HardwareBuffer

Android’s MediaProjection API gives you a VirtualDisplay you can attach an ImageReader to. Each frame arrives as an AHardwareBuffer, which is a reference to GPU memory you can pass directly to native code without copying:

esp_jni.cpp
AHardwareBuffer* buffer = AHardwareBuffer_fromHardwareBuffer(env, hardwareBuffer);
AHardwareBuffer_acquire(buffer);
ESP::Frame frame;
frame.hardwareBuffer = buffer;
frame.timestamp = timestamp;
frame.width = g_captureWidth;
frame.height = g_captureHeight;
if (!g_frameBuffer->push(frame)) {
    AHardwareBuffer_release(buffer);
    // drop count tracked in FrameBuffer for periodic telemetry
}

Capture runs at 1280x720. Full 1080p doubles the preprocessing cost for no detection benefit since the model input is only 256x256. The pixels you’d gain are thrown away during the center crop and resize anyway.

The ring buffer has 8 slots, giving about 200ms of buffering headroom at 40+ FPS capture. You need this slack because inference occasionally takes longer than a single frame period, and you can’t let the capture thread block.

One thing I got burned by early on was the ImageReader buffer count. It’s configured with 3 max images:

settings.h
constexpr int IMAGE_READER_MAX_IMAGES = 3;

With 2 buffers, if inference is holding one and capture is writing another, the producer stalls. That tanks you from 60+ FPS to a lumpy ~30. Three buffers break the stall. It’s a classic producer-consumer problem, and it’s annoying to debug because the symptom looks like slow inference when it’s actually a buffer allocation bottleneck.
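On the native side, the equivalent setup with the NDK’s AImageReader looks roughly like this. A sketch only; AimBuddy configures ImageReader from Kotlin, and the callback body here is abbreviated.

#include <media/NdkImageReader.h>
#include <android/hardware_buffer.h>

constexpr int kCaptureW = 1280, kCaptureH = 720;
constexpr int kMaxImages = 3; // 2 stalls the producer, as described above

static void onImageAvailable(void* /*ctx*/, AImageReader* reader) {
    AImage* image = nullptr;
    if (AImageReader_acquireLatestImage(reader, &image) != AMEDIA_OK) return;
    AHardwareBuffer* buffer = nullptr;
    AImage_getHardwareBuffer(image, &buffer); // zero-copy handle into GPU memory
    // Real code must AHardwareBuffer_acquire(buffer) before AImage_delete,
    // then push an ESP::Frame into the ring as shown earlier.
    AImage_delete(image);
}

AImageReader* makeReader() {
    AImageReader* reader = nullptr;
    AImageReader_newWithUsage(kCaptureW, kCaptureH, AIMAGE_FORMAT_RGBA_8888,
                              AHARDWAREBUFFER_USAGE_GPU_SAMPLED_IMAGE,
                              kMaxImages, &reader);
    static AImageReader_ImageListener listener{nullptr, onImageAvailable};
    AImageReader_setImageListener(reader, &listener);
    return reader;
}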

The inference loop: drain to latest

The inference thread doesn’t process frames in order. It drains the ring buffer to the newest available frame every iteration, deliberately dropping stale work:

esp_jni.cpp
if (g_frameBuffer && g_frameBuffer->pop(frame)) {
    ESP::Frame newer;
    uint64_t drainedThisIteration = 0;
    while (g_frameBuffer->pop(newer)) {
        if (frame.hardwareBuffer) {
            AHardwareBuffer_release(frame.hardwareBuffer);
        }
        frame = newer;
        drainedThisIteration++;
    }
    // run inference on freshest frame only
}

If the GPU is slow and frames pile up, processing them in order means you’re always behind reality. Dropping frames to stay current feels smoother and produces better tracking because the tracker’s velocity estimates are based on real-time deltas, not stale data.

When the inference loop has no frames, it doesn’t busy-wait. It uses exponential backoff starting at 200 microseconds and topping out at 2ms:

esp_jni.cpp
const auto sleepDuration = std::min(kNoFrameSleepMin * (1u << noFrameBackoffLevel), kNoFrameSleepMax);
std::this_thread::sleep_for(sleepDuration);
if (noFrameBackoffLevel < 4) ++noFrameBackoffLevel;

When a frame arrives, noFrameBackoffLevel resets to 0 so the loop immediately returns to tight polling. This keeps CPU usage low when idle without adding latency when frames are flowing.

I track both average and EMA inference time per window of 120 frames, and the telemetry logs to logcat:

Pipeline stats: avg infer=7.2ms avg e2e=14.1ms ema infer=7.8ms ema e2e=15.3ms crop=352 drained=1 dropped_push=0

If drained is consistently > 2 per window, something’s under pressure. If dropped_push is nonzero, the ring buffer is overflowing and you’re losing frames at the capture side.

Adaptive crop: treating crop size as a control variable

This is probably the most interesting optimization in the codebase. The center crop size going into inference is not fixed. It adjusts at runtime based on two pressure signals.

esp_jni.cpp
const bool backlogPressure = (drainedThisIteration > 0);
const bool latencyPressure = (emaInferMs > kTargetCycleMs) || (emaEndToEndMs > kE2ePressureMs);
if (latencyPressure || backlogPressure) {
    adaptiveCropSize = std::max(kMinAdaptiveCrop, adaptiveCropSize - kDownscaleStep);
} else if (adaptiveCropSize < cachedCropSize) {
    adaptiveCropSize = std::min(cachedCropSize, adaptiveCropSize + kUpscaleStep);
}

Under load the crop shrinks quickly per iteration. When pressure clears it grows back slowly toward the FOV-derived target. The asymmetric step sizes prevent oscillation. Fast shrink, slow grow is the same idea behind TCP congestion control: respond to overload quickly but recover cautiously so you don’t immediately re-enter overload.

The crop size also adapts to the user’s configured FOV radius. When the FOV setting changes, the system recomputes the target crop by mapping FOV pixels through the screen-to-capture resolution ratio:

esp_jni.cpp
int targetSize = static_cast<int>(fovRadius * 2.0f);
targetSize = std::max(256, std::min(targetSize, safeScreenWidth));
const float scaleToCapture = static_cast<float>(Config::CAPTURE_WIDTH) / static_cast<float>(safeScreenWidth);
int dynamicCropSize = static_cast<int>(targetSize * scaleToCapture);

This means a small FOV setting automatically gives you a smaller crop and faster inference. The adaptive controller then further adjusts within that range based on runtime pressure.

NCNN and Vulkan: getting inference under 10ms

NCNN is Tencent’s mobile inference framework. I use it instead of TFLite because it has first-class Vulkan support, which means I can run compute shaders on the GPU instead of the CPU. The difference is roughly 3x throughput and significantly less thermal output.

The NCNN configuration for Adreno GPUs:

yolo_detector.cpp
net.opt.use_vulkan_compute = true;
net.opt.use_fp16_packed = true;
net.opt.use_fp16_storage = true;
net.opt.use_fp16_arithmetic = true;
net.opt.use_packing_layout = true;
net.opt.lightmode = true;
net.opt.num_threads = 4; // CPU fallback threads

The FP16 flags are the important ones for Adreno GPUs. They have native FP16 ALUs, and you need all three flags to actually use them. Without them you’re doing FP32 compute and losing roughly half the throughput. The lightmode flag tells NCNN to release intermediate blob memory after each layer, which keeps the memory footprint under control.

The model input is 256x256, not the standard 640x640. The preprocessing chain from HardwareBuffer:

Preprocessing pipeline from HardwareBuffer to model input
yolo_detector.cpp
const float normVals[3] = {1/255.f, 1/255.f, 1/255.f};
input.substract_mean_normalize(nullptr, normVals);
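For completeness, the crop + resize ahead of that normalize can be collapsed into a single NCNN call once the frame is locked to a CPU-visible RGBA pointer. A sketch with assumed variable names; the actual pipeline may do this stage on the GPU instead:

// Center-crop + resize into the 256x256 model input (sketch).
const int roiW = adaptiveCropSize, roiH = adaptiveCropSize;
const int roiX = (frameW - roiW) / 2;
const int roiY = (frameH - roiH) / 2;
ncnn::Mat input = ncnn::Mat::from_pixels_roi_resize(
    pixels, ncnn::Mat::PIXEL_RGBA2RGB, frameW, frameH,
    roiX, roiY, roiW, roiH, 256, 256);
const float normVals[3] = {1 / 255.f, 1 / 255.f, 1 / 255.f};
input.substract_mean_normalize(nullptr, normVals);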

One thing that bit me was model export format differences. Depending on how you export from Ultralytics, the NCNN blob names may or may not be present in the param file. I handle this with a name-first, index-fallback strategy:

yolo_detector.cpp
int ret = -1;
if (!useInputIndex_ && !inputBlobName_.empty()) {
    ret = ex.input(inputBlobName_.c_str(), input);
}
if (ret != 0) {
    useInputIndex_ = true;
    ret = ex.input(0, input); // index fallback
}

Once fallback is triggered, useInputIndex_ is cached so the name path isn’t retried every frame.

Training the model

The model is yolo26n, a single-class detector. The training pipeline enforces yolo26n.pt as a hard contract in both train.py and download_base_model.py. Passing a different base model name errors out immediately:

download_base_model.py
if base_model.name.lower() != "yolo26n.pt":
    print("ERROR: base_model must be yolo26n.pt for this repository contract")
    return 2

I enforce this because the NCNN export output filenames, the inference layer names, and the model input dimensions are all downstream assumptions. Swapping the base model breaks the contract silently if you let it through.

Training runs on Windows with Ultralytics + PyTorch. The dataset is frames extracted from screen recordings, auto-labeled with a pre-trained detector, then manually reviewed to fix mistakes.

Model export pipeline from PyTorch to NCNN
Training curves showing clean convergence with no overfitting
Precision-Recall curve at 0.5 IoU threshold
Validation predictions showing detection across different poses and occlusion levels

NMS and postprocessing

YOLO outputs thousands of candidate boxes at multiple scales. Most overlap. NMS filters them to the best non-overlapping set by computing Intersection over Union between every pair and suppressing lower-confidence boxes that overlap above a threshold:

bounding_box.h
float iou(const BBox& a, const BBox& b) {
    float x1 = std::max(a.left(), b.left());
    float y1 = std::max(a.top(), b.top());
    float x2 = std::min(a.right(), b.right());
    float y2 = std::min(a.bottom(), b.bottom());
    float inter = std::max(0.f, x2 - x1) * std::max(0.f, y2 - y1);
    return inter / (a.area() + b.area() - inter);
}

After NMS, coordinates are remapped from model crop-space back to screen-space. This remapping is where coordinate system bugs hide. Off-by-one errors in the crop offset calculation show up as boxes that are consistently shifted by a few pixels in one direction, and it’s infuriating to track down because the detection itself looks correct.
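The remap is the inverse of the preprocessing transform: undo the 256x256 resize, add the crop origin, then undo the capture downscale. A sketch of the shape of it, with assumed names; the crop offset is exactly where those off-by-one bugs live:

// Crop-space -> screen-space (sketch; names are illustrative).
const float cropScale = static_cast<float>(cropSize) / 256.0f; // undo model resize
float capX = boxX * cropScale + cropOffsetX; // add crop origin (capture space)
float capY = boxY * cropScale + cropOffsetY;
const float screenScale = static_cast<float>(screenWidth)
                        / static_cast<float>(Config::CAPTURE_WIDTH); // undo capture downscale
float screenX = capX * screenScale;
float screenY = capY * screenScale;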

The postprocessor also handles both transposed and non-transposed NCNN output layouts, since the format changed between Ultralytics export versions.

DeepSORT-style tracking

Raw YOLO detections are noisy. Boxes jump a few pixels each frame, sometimes disappear for a frame or two during partial occlusion. Reacting directly to raw detections produces jittery output. The tracker smooths this into stable identities.

I use a DeepSORT-inspired matching cascade. Instead of matching all detections to all tracks simultaneously, tracks are processed in order of increasing age (younger first). This prevents old occluded tracks from stealing detections that belong to recently-confirmed targets:

target_tracker.cpp
// Match tracks in order of increasing age (younger first)
for (int currentAge = 0; currentAge <= maxAge; currentAge++) {
for (int t = 0; t < numTracks; t++) {
if (trkMatched[t]) continue;
if (track.age != currentAge) continue;
// ... find best detection match
}
}

The matching score is a weighted combination of three signals:

target_tracker.cpp
float score = iou * 0.70f + centerScore * 0.22f + areaScore * 0.08f;
if (isLockedTrack) score += 0.06f; // bias toward current lock

70% IoU, 22% center distance, 8% area similarity. The locked target gets a small bonus, which makes the system sticky to its current target without being so sticky that it ignores a clearly better match.

Before matching, there’s also a spatial gate. If a detection’s center is too far from the track’s predicted position, it’s rejected without computing IoU at all. This prevents a track on the left of the screen from matching a detection that appeared on the right.
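The gate is just a squared-distance check ahead of the scoring math, something like this sketch (the radius heuristic is assumed, not the repo’s exact formula):

// Spatial gate (sketch): reject far-away pairs before any IoU math.
const float gateRadius = track.bbox.width() * 2.5f + 64.0f; // assumed heuristic
const float gdx = det.centerX() - track.predictedX;
const float gdy = det.centerY() - track.predictedY;
if (gdx * gdx + gdy * gdy > gateRadius * gateRadius)
    continue; // not a candidate; skip scoring entirely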

The real-time dt measurement is critical. A fixed timestep assumption breaks on Android because scheduling jitter is real:

target_tracker.cpp
float dt = 1.0f / 60.0f; // default
if (m_lastUpdateNs > 0 && nowNs > m_lastUpdateNs) {
    dt = static_cast<float>(nowNs - m_lastUpdateNs) / 1'000'000'000.0f;
    dt = AimbotMath::clamp(dt, 1.0f / 120.0f, 1.0f / 20.0f);
}

Clamping dt between 1/120 and 1/20 prevents velocity estimates from exploding when scheduling hiccups cause a long gap between updates.

Track lifecycle state transitions

One-frame spurious detections never reach CONFIRMED state, so they never influence the controller. Three matches at 60 FPS is 50ms, short enough to feel responsive but long enough to filter garbage. Tentative tracks that miss even one frame are immediately removed (they never proved themselves), while confirmed tracks get a grace period.
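In code form, the lifecycle rules from the diagram reduce to a few branches per track per frame. A sketch using the constants from the prose (3 matches to confirm, immediate removal for tentative misses); the state names are assumed:

// Per-frame lifecycle update (sketch).
if (matched) {
    track.consecutiveMatches++;
    track.missedFrames = 0;
    if (track.state == TrackState::Tentative && track.consecutiveMatches >= 3)
        track.state = TrackState::Confirmed; // ~50ms of evidence at 60 FPS
} else if (track.state == TrackState::Tentative) {
    removeTrack(track); // never proved itself; drop immediately
} else if (++track.missedFrames > kMaxMissedFrames) {
    removeTrack(track); // confirmed tracks get a grace period
} else {
    track.predictForward(dt); // coast on the EMA velocity meanwhile
}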

Target selection has hysteresis. The locked target needs to be beaten by a significant margin before a switch happens, and there’s a cooldown on switches. The lock also needs to have matured for at least a few frames before a switch is even considered:

target_tracker.cpp
const bool cooldownReady = (m_switchCooldownFrames <= 0);
const bool lockMatured = (m_lockFrameCount >= 4);
bool canSwitch = cooldownReady && lockMatured;

This prevents identity bouncing when two targets are at similar distances.
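With the margin folded in, the full switch condition looks roughly like this (the margin constant is assumed for illustration, not the repo’s tuning):

// Target switch with hysteresis (sketch; kSwitchMargin assumed).
constexpr float kSwitchMargin = 1.15f; // challenger must be clearly better
if (canSwitch && challengerScore > lockedScore * kSwitchMargin) {
    m_lockedTrackId = challengerId;
    m_switchCooldownFrames = kSwitchCooldownFrames; // restart the cooldown
    m_lockFrameCount = 0;                           // lock must re-mature
}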

Velocity estimation and prediction

When a track goes unmatched, I predict where it should be using its EMA-smoothed velocity:

P_new = P_old + v_old * dt

The velocity EMA has confidence-aware blending. High-confidence detections get more influence on the velocity estimate. Mature tracks (many consecutive matches) use a slightly faster blending factor because they’ve proven stable:

target_tracker.cpp
const float conf = AimbotMath::clamp(detection.confidence, 0.0f, 1.0f);
const float maturity = AimbotMath::clamp(static_cast<float>(track.consecutiveMatches) / 8.0f, 0.0f, 1.0f);
const float dynamicSmoothing = AimbotMath::clamp(smoothing + (1.0f - conf) * 0.20f - maturity * 0.10f, 0.15f, 0.92f);

There’s also a sub-pixel wobble suppression gate. If the detection center moved less than 0.9px from the previous frame, the velocity is forced to zero. Without this, detector quantization noise creates phantom velocity on stationary targets, which makes the lead prediction drift.

Velocity resets on large spatial jumps. If a detection appears far from where the predicted track should be, it’s almost certainly a different target, not the same one teleporting. When this happens, the EMA and Kalman filter states are also reset so the filters don’t try to interpolate across the discontinuity.
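Both gates sit in the same velocity-update path, and the order matters: check the jump before the wobble. A sketch with the thresholds from the prose and assumed names elsewhere:

// Velocity update gates (sketch).
const float mdx = det.centerX() - track.lastCenterX;
const float mdy = det.centerY() - track.lastCenterY;
const float moved = std::sqrt(mdx * mdx + mdy * mdy);
if (moved > kJumpThreshold) {          // assumed constant
    track.velocity = {0.0f, 0.0f};     // different target: reset everything
    track.resetKalman();
    track.resetEma();
} else if (moved < 0.9f) {             // sub-pixel wobble gate
    track.velocity = {0.0f, 0.0f};     // quantization noise, not motion
} else {                               // normal EMA blend toward dx/dt
    track.velocity.x += (1.0f - dynamicSmoothing) * (mdx / dt - track.velocity.x);
    track.velocity.y += (1.0f - dynamicSmoothing) * (mdy / dt - track.velocity.y);
}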

Aim control: three modes, a PD controller, and a lot of clamping

The controller reads from the tracker with a validated settings snapshot:

aimbot_controller.cpp
UnifiedSettings settingsSnapshot = g_settings;
settingsSnapshot.validate();

Shared settings can change mid-run from the ImGui menu on the render thread. A snapshot + validate gives each aim iteration a coherent, bounds-checked parameter set. Without this, you get undefined behavior from reading a struct that’s being partially written on another thread.

Three aim modes:

Mode      Behavior                                                             Best for
Smooth    PD controller with convergence damping                               General use, natural feel
Snap      Gain-capped proportional (never exceeds 82% of distance per frame)   Fast acquisition
Magnetic  Distance-proportional pull (gentle near, stronger far)               Precision, minimal overshoot

All three modes enforce an invariant: the movement vector can never point away from the target. This sounds obvious but it’s easy to violate with a derivative term. The controller checks this at multiple points in the pipeline:

aimbot_controller.cpp
// Never move away from the target direction
if (outX * dx < 0.0f) outX = 0.0f;
if (outY * dy < 0.0f) outY = 0.0f;

The smooth mode uses a PD controller. I killed the integral term entirely:

u[n] = K_p * e[n] + K_d * (e[n] - e[n-1]) / dt

Integral windup is a real problem here. If the target is briefly occluded, the integral accumulates error during that period. When the target reappears you overshoot badly because the integral is trying to make up for all the “missed” time. PD without integral is more stable for a system where the target disappears unpredictably.

The smooth mode also has convergence damping: when the crosshair is close to the target, the proportional gain is squared and scaled down to a minimum of 20%. This prevents the characteristic oscillation you get from a fast PD controller at small error. Without it, the output bounces back and forth across the target at sub-pixel amplitude, which looks terrible at 60 FPS.
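Putting those pieces together, one smooth-mode step looks roughly like the sketch below. The gains and the damping radius are assumed values, not the repo’s tuning:

// One PD step with convergence damping (sketch).
float ex = dx, ey = dy; // error = vector from crosshair to target
float kp = settings.kp;
const float dist = std::sqrt(ex * ex + ey * ey);
if (dist < kDampingRadius) {                // close to target:
    const float t = dist / kDampingRadius;  // 0..1
    kp *= std::max(0.20f, t * t);           // squared falloff, floored at 20%
}
float outX = kp * ex + kd * (ex - m_prevEx) / dt; // P + D, no integral
float outY = kp * ey + kd * (ey - m_prevEy) / dt;
m_prevEx = ex; m_prevEy = ey;
// Invariant from above: never move away from the target direction.
if (outX * ex < 0.0f) outX = 0.0f;
if (outY * ey < 0.0f) outY = 0.0f;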

The derivative term has distance-dependent clamping:

aimbot_controller.cpp
const float derivativeClamp = AimbotMath::clamp(distance * 0.18f + 5.0f, 5.0f, 20.0f);
derivativeX = AimbotMath::clamp(derivativeX, -derivativeClamp, derivativeClamp);
derivativeY = AimbotMath::clamp(derivativeY, -derivativeClamp, derivativeClamp);

At close range the clamp is tight so single-frame jitter can’t produce a large correction. At long range it opens up so the derivative can actually contribute to tracking moving targets.

Motion-gated lead prediction

The controller applies predictive lead based on the tracker’s velocity estimate, but only when the target is actually moving. There’s a three-part gate, sketched after the list:

  1. Distance gate: lead scales from zero at close range to full at long range. No lead at point-blank because you don’t need it.
  2. Confidence gate: lead scales with detection confidence. Low-confidence detections produce noisy velocity, so don’t trust them for prediction.
  3. Motion speed gate: lead only kicks in when the target is actually moving above a minimum speed threshold. This is the critical one, because without it stationary targets drift due to detector quantization noise being fed through the velocity estimator.
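A sketch of the gate chain, with the ramp constants assumed:

// Three-part lead gate (sketch; kLeadFullDistance, kMinLeadSpeed,
// kLeadTimeSec are assumed constants).
float leadScale = AimbotMath::clamp(dist / kLeadFullDistance, 0.0f, 1.0f); // 1. distance
leadScale *= AimbotMath::clamp(det.confidence, 0.0f, 1.0f);                // 2. confidence
const float speed = std::sqrt(vx * vx + vy * vy);
if (speed < kMinLeadSpeed) leadScale = 0.0f;                               // 3. motion
targetX += vx * kLeadTimeSec * leadScale; // lead along the velocity vector
targetY += vy * kLeadTimeSec * leadScale;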

Jitter suppression and movement smoothing

Small movements when already locked are suppressed with a quadratic ramp:

aimbot_controller.cpp
if (m_isAiming) {
    const float moveMag = std::sqrt(moveX * moveX + moveY * moveY);
    if (moveMag < 1.5f && moveMag > EPSILON) {
        const float jitterScale = moveMag / 1.5f;
        moveX *= jitterScale * jitterScale;
        moveY *= jitterScale * jitterScale;
    }
}

A 0.5px movement becomes 0.5 * (0.5/1.5)^2 = 0.056px, essentially zero. A 1.4px movement becomes 1.4 * (1.4/1.5)^2 = 1.22px, nearly unchanged. The quadratic curve gives a smooth transition between “kill this noise” and “let it through.”

Movement is also EMA-blended between frames and direction reversals under a small threshold are halved. On the first frame after touch-down, movement is dampened to prevent the initial acquisition from looking too snappy.

Touch radius clamping

The touch position is constrained to a circular region around the configured center:

aimbot_controller.cpp
if (distFromCenterSq > touchRadius * touchRadius) {
    const float distFromCenter = std::sqrt(distFromCenterSq);
    const float scale = touchRadius / distFromCenter;
    m_touchX = touchCenterX + distFromCenterX * scale;
    m_touchY = touchCenterY + distFromCenterY * scale;
}

If the accumulated touch position drifts too far from center, it gets projected back onto the circle boundary. This prevents the virtual finger from wandering off-screen during long tracking sequences.

The FOV gating has entry/exit hysteresis:

aimbot_controller.cpp
const float exitFovMultiplier = 1.2f;
const float fovThreshold = m_isAiming
    ? (settings.fovRadius * exitFovMultiplier)
    : settings.fovRadius;

Entry is at the configured FOV. Exit is 20% wider. Without this, a target on the FOV boundary makes the controller flicker on and off every frame.

The control loop in action. Smooth tracking from far to near, with deadzone behavior near center.

Touch injection via uinput

This is the rootiest part of the system. The Linux kernel’s uinput driver lets you create a virtual input device that the OS treats identically to real hardware.

The grab + replay is what makes this work transparently. Real user touches still work because the reader thread forwards them. Injected touches are mixed in on a reserved slot so they don’t collide with real finger contacts.
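For context, here is what creating a virtual multitouch device through uinput looks like. This is a minimal sketch of the standard kernel API, not the repo’s touch_helper, which additionally opens the real panel, takes exclusive access with ioctl(realFd, EVIOCGRAB, 1), and replays its events through the virtual device:

#include <fcntl.h>
#include <unistd.h>
#include <cstring>
#include <sys/ioctl.h>
#include <linux/uinput.h>

int createVirtualTouch(int xMax, int yMax) {
    int fd = open("/dev/uinput", O_WRONLY | O_NONBLOCK);
    if (fd < 0) return -1; // no root/uinput: the visual pipeline keeps running

    ioctl(fd, UI_SET_EVBIT, EV_ABS);
    ioctl(fd, UI_SET_EVBIT, EV_KEY);
    ioctl(fd, UI_SET_KEYBIT, BTN_TOUCH);
    ioctl(fd, UI_SET_ABSBIT, ABS_MT_SLOT);
    ioctl(fd, UI_SET_ABSBIT, ABS_MT_TRACKING_ID);
    ioctl(fd, UI_SET_ABSBIT, ABS_MT_POSITION_X);
    ioctl(fd, UI_SET_ABSBIT, ABS_MT_POSITION_Y);

    struct uinput_user_dev udev;
    std::memset(&udev, 0, sizeof(udev));
    std::strncpy(udev.name, "virtual-touch", UINPUT_MAX_NAME_SIZE - 1);
    udev.absmax[ABS_MT_POSITION_X] = xMax;
    udev.absmax[ABS_MT_POSITION_Y] = yMax;
    write(fd, &udev, sizeof(udev)); // legacy setup; UI_DEV_SETUP also works
    ioctl(fd, UI_DEV_CREATE);       // device now appears as real hardware
    return fd;
}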

One subtle detail: the application runs in landscape but the device’s touch panel reports in portrait coordinates. The touch helper does a 90-degree rotation with axis inversion:

touch_helper.cpp
// Game X (landscape long axis) -> Device Y (portrait long axis)
long deviceY = gameX * (long)(g_touchDevice.touchYMax - g_touchDevice.touchYMin) / g_displayWidth;
// Game Y (landscape short axis) -> Device X (portrait short axis)
long deviceX = gameY * (long)(g_touchDevice.touchXMax - g_touchDevice.touchXMin) / g_displayHeight;
// Y axis is inverted
finalY = (g_touchDevice.touchYMax - deviceY);
finalX = deviceX + g_touchDevice.touchXMin;

Getting this mapping right took several iterations. The first version sent touch events to the wrong quadrant because I had the Y inversion backwards.

Without a cooldown on injections, rapid successive events queue up inside the kernel and create a phantom input storm that looks like drift. The injection rate is clamped to prevent this.
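The clamp itself is just a minimum-interval check per injection cycle; a sketch, with the interval value and helper names assumed:

// Injection rate clamp (sketch; the interval is an assumed value).
constexpr int64_t kMinInjectIntervalNs = 2'000'000; // ~500 Hz ceiling
const int64_t now = nowNanos(); // hypothetical monotonic-clock helper
if (now - m_lastInjectNs < kMinInjectIntervalNs)
    return; // skip this cycle; the next movement absorbs the delta
m_lastInjectNs = now;
emitTouchEvents(fd, x, y); // hypothetical: writes the EV_ABS/EV_SYN batch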

Zero-allocation hot paths

Android’s garbage collector can pause for 50ms+. At 60 FPS that’s 3 full frames. The entire hot path avoids heap allocation.

Detections and tracks use a fixed-capacity stack-allocated array:

utils/aimbot_types.h
template <typename T, int N>
class FixedArray {
    T data[N];
    int size = 0;
public:
    bool push(const T& v) {
        if (size >= N) return false;
        data[size++] = v;
        return true;
    }
    void removeAt(int i) {
        data[i] = data[size - 1]; // swap-remove: O(1)
        size--;
    }
};

The removeAt swap-remove is O(1) and order doesn’t matter for either detections or tracks at this point in the pipeline. In practice frames rarely have more than 5-10 detections, so the capacity limits are conservative.

The NCNN input mat is pre-allocated and reused every frame. The frame buffer ring is statically sized at startup. There are zero heap allocations in the inference, tracker, controller, and injection path.

Real-time detection overlay running at 60 FPS. Red boxes are CONFIRMED tracks, not raw detections.

Settings: validation before hot-path use

All runtime settings live in a UnifiedSettings struct, serialized to disk with a magic number check. The validate() method clamps everything before use:

aimbot_types.h
fovRadius = (fovRadius < 50.0f) ? 50.0f : (fovRadius > 600.0f) ? 600.0f : fovRadius;
if (aimFovRadius > fovRadius) {
    aimFovRadius = fovRadius; // semantic constraint, not just a numeric clamp
}

aimFovRadius <= fovRadius is a system contract. The aiming FOV can’t be wider than the detection FOV. If it were, the controller would try to target things that the detection pipeline can’t see, producing phantom movements toward nothing. Treating that as a logic rule rather than a UI constraint keeps the render overlay and targeting math in sync.

The ImGui settings menu shows measured overlay FPS, not assumed. I measure the real frame timing from the native tick cadence with EMA smoothing, rejecting pathological gaps from Android lifecycle events (app backgrounded then foregrounded).
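The measurement is a per-tick delta fed into an EMA, with a sanity window that throws away implausible gaps. A sketch; the window bounds and member names are assumed:

// Measured overlay FPS (sketch; gap-rejection bounds are assumed).
void onRenderTick(int64_t nowNs) {
    if (m_lastTickNs > 0) {
        const double dtMs = (nowNs - m_lastTickNs) / 1e6;
        // Only plausible frame times update the EMA; lifecycle gaps
        // (backgrounded then foregrounded) and timer glitches are dropped.
        if (dtMs > 1.0 && dtMs < 500.0) {
            constexpr double kAlpha = 0.1;
            m_emaFrameMs = (m_emaFrameMs <= 0.0)
                ? dtMs
                : m_emaFrameMs + kAlpha * (dtMs - m_emaFrameMs);
        }
    }
    m_lastTickNs = nowNs;
}
double measuredFps() const { return m_emaFrameMs > 0.0 ? 1000.0 / m_emaFrameMs : 0.0; }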

Build configuration

The native layer compiles with C++17, -O3, LTO, and hidden symbol visibility. ARM64-specific flags:

CMakeLists.txt
target_compile_options(aimbuddy PRIVATE
    -march=armv8-a+fp+simd
    -O3
    -fvisibility=hidden
)

NCNN is linked statically. Vulkan is linked conditionally based on NDK availability. On a big.LITTLE SoC the core layout matters: pinning inference to the performance core gives the most consistent timing and the highest single-thread throughput, while the render thread on a mid-tier core is fast enough for ImGui + overlay drawing without stealing cycles from inference.

Measured performance

Metric                 Value
Average inference      ~7ms
P99 inference          ~12ms
End-to-end latency     ~15ms
Sustained framerate    60 FPS
Memory footprint       ~80 MB

Inference is the bottleneck. Tracking, control, injection, and rendering are rounding error by comparison. Thermal throttling pushes inference toward 12-15ms sustained, and the adaptive crop kicks in to manage it. Under sustained thermal load the crop automatically shrinks and inference stays within budget.

Things I’d change

The ring buffer capacity is probably double what’s needed. The drain-to-latest behavior means you almost never have more than 2-3 buffered frames in practice. I sized it conservatively and it works, but it wastes memory.

The tracker’s O(n²) matching works for 5-10 detections per frame. For a crowded scene with 50+ detections it’d start to hurt. KD-tree spatial indexing would fix that but I never hit the problem so I never bothered.

The landscape-to-portrait coordinate rotation in touch_helper.cpp is hardcoded. It works for my test device but would need a proper orientation detection system for portability. Right now if you run it on a device with different axis mapping, the touch injection sends events to the wrong quadrant.

Killing the integral term was pragmatic. The tracker already has optional Kalman filtering for position smoothing, so combining a Kalman-filtered aim point with a full PID controller might give the best of both worlds.

NOTE

This project was built for educational and research purposes only.

Further reading

Author: 1337XCode
Post: AimBuddy: Building a 60 FPS on-device tracking and touch injection system
License: CC BY-NC 4.0