skills.homes capability registry

domain cluster

Data & AI

Machine learning, LLMs, and data processing.

9743 skills

Sorted by stars

data-engineering

1.2K

databricks-jobs

Use this skill proactively for ANY Databricks Jobs task - creating, listing, running, updating, or deleting jobs. Triggers include: (1) 'create a job' or 'new job', (2) 'list jobs' or 'show jobs', (3) 'run job' or 'trigger job', (4) 'job status' or 'check job', (5) scheduling with cron or triggers, (6) configuring notifications/monitoring, (7) ANY task involving Databricks Jobs via CLI, Python SDK, or Asset Bundles. ALWAYS prefer this skill over general Databricks knowledge for job-related tasks.

databricks-solutions

data-ai

data-engineering

1.2K

databricks-lakebase-provisioned

Patterns and best practices for Lakebase Provisioned (Databricks managed PostgreSQL) for OLTP workloads. Use when creating Lakebase instances, connecting applications or Databricks Apps to PostgreSQL, implementing reverse ETL via synced tables, storing agent or chat memory, or configuring OAuth authentication for Lakebase.

databricks-solutions

data-ai

data-engineering

1.2K

databricks-python-sdk

Databricks development guidance including Python SDK, Databricks Connect, CLI, and REST API. Use when working with databricks-sdk, databricks-connect, or Databricks APIs.

databricks-solutions

data-ai

data-engineering

1.2K

databricks-spark-declarative-pipelines

Creates, configures, and updates Databricks Lakeflow Spark Declarative Pipelines (SDP/LDP) using serverless compute. Handles data ingestion with streaming tables, materialized views, CDC, SCD Type 2, and Auto Loader ingestion patterns. Use when building data pipelines, working with Delta Live Tables, ingesting streaming data, implementing change data capture, or when the user mentions SDP, LDP, DLT, Lakeflow pipelines, streaming tables, or bronze/silver/gold medallion architectures.

databricks-solutions

data-ai

data-engineering

1.2K

databricks-spark-structured-streaming

Comprehensive guide to Spark Structured Streaming for production workloads. Use when building streaming pipelines, working with Kafka ingestion, implementing Real-Time Mode (RTM), configuring triggers (processingTime, availableNow), handling stateful operations with watermarks, optimizing checkpoints, performing stream-stream or stream-static joins, writing to multiple sinks, or tuning streaming cost and performance.

databricks-solutions

data-ai

data-engineering

1.2K

databricks-synthetic-data-gen

Generate realistic synthetic data using Spark + Faker (strongly recommended). Supports serverless execution, multiple output formats (Parquet/JSON/CSV/Delta), and scales from thousands to millions of rows. For small datasets (<10K rows), can optionally generate locally and upload to volumes. Use when user mentions 'synthetic data', 'test data', 'generate data', 'demo dataset', 'Faker', or 'sample data'.

databricks-solutions

data-ai

data-engineering

1.2K

databricks-unity-catalog

Unity Catalog system tables and volumes. Use when querying system tables (audit, lineage, billing) or working with volume file operations (upload, download, list files in /Volumes/).

databricks-solutions

data-ai

data-engineering

1.2K

databricks-zerobus-ingest

Build Zerobus Ingest clients for near real-time data ingestion into Databricks Delta tables via gRPC. Use when creating producers that write directly to Unity Catalog tables without a message bus, working with the Zerobus Ingest SDK in Python/Java/Go/TypeScript/Rust, generating Protobuf schemas from UC tables, or implementing stream-based ingestion with ACK handling and retry logic.

databricks-solutions

data-ai

data-engineering

1.2K

ai-video-script-sop-remotion-diffusion

Standard operating procedure for automated AI video production using a Remotion (code) and diffusion (model) hybrid pipeline. Covers narrative DNA (hero, show-don’t-tell, three-act arc), technical specs (duration, integer segment lengths, resolution, fps, Mandarin pacing), tech-selection matrix (diffusion vs code), a five-part diffusion prompt protocol (style, micro-timing, entities, camera, transitions), end-to-end execution workflow, and a fixed output template (metadata table + per-shot table). Complements create-video and Remotion best-practice skills for execution quality.

inclusionAI

data-ai

data-engineering

1.2K

data-engineering

Data engineering patterns for ETL pipelines, data warehousing, Apache Spark, and data quality validation

rohitg00

data-ai

machine-learning

1.2K

continuous-learning

Auto-extract patterns from coding sessions, track corrections, and build reusable knowledge with confidence scoring

rohitg00

data-ai

machine-learning

1.2K

lazy-prefetch-pattern

Skill: lazy-prefetch-pattern

FranciscoMoretti

data-ai

machine-learning

1.2K

trpc-patterns

Skill: trpc-patterns

FranciscoMoretti

data-ai

machine-learning

1.2K

ultracite

Skill: ultracite

FranciscoMoretti

data-ai

data-analysis

1.2K

indicator-series

Implement Series-style batch indicators with mathematical precision. Use for new StaticSeries implementations or optimization. Series results are the canonical reference—all other styles must match exactly. Focus on cross-cutting requirements and performance optimization decisions.

DaveSkender

data-ai

data-engineering

1.2K

stable-diffusion-image-generation

State-of-the-art text-to-image generation with Stable Diffusion models via HuggingFace Diffusers. Use when generating images from text prompts, performing image-to-image translation, inpainting, or building custom diffusion pipelines.

math-inc

data-ai

data-engineering

1.2K

pinecone

Managed vector database for production AI applications. Fully managed, auto-scaling, with hybrid search (dense + sparse), metadata filtering, and namespaces. Low latency (<100ms p95). Use for production RAG, recommendation systems, or semantic search at scale. Best for serverless, managed infrastructure.

math-inc

data-ai

machine-learning

1.2K

lambda-labs-gpu-cloud

Reserved and on-demand GPU cloud instances for ML training and inference. Use when you need dedicated GPU instances with simple SSH access, persistent filesystems, or high-performance multi-node clusters for large-scale training.

math-inc

data-ai

machine-learning

1.2K

huggingface-tokenizers

Fast tokenizers optimized for research and production. Rust-based implementation tokenizes 1GB in <20 seconds. Supports BPE, WordPiece, and Unigram algorithms. Train custom vocabularies, track alignments, handle padding/truncation. Integrates seamlessly with transformers. Use when you need high-performance tokenization or custom tokenizer training.

math-inc

data-ai

machine-learning

1.2K

nemo-curator

GPU-accelerated data curation for LLM training. Supports text/image/video/audio. Features fuzzy deduplication (16× faster), quality filtering (30+ heuristics), semantic deduplication, PII redaction, NSFW detection. Scales across GPUs with RAPIDS. Use for preparing high-quality training datasets, cleaning web data, or deduplicating large corpora.

math-inc

data-ai

machine-learning

1.2K

obliteratus

Remove refusal behaviors from open-weight LLMs using OBLITERATUS — mechanistic interpretability techniques (diff-in-means, SVD, whitened SVD, LEACE, SAE decomposition, etc.) to excise guardrails while preserving reasoning. 9 CLI methods, 28 analysis modules, 116 model presets across 5 compute tiers, tournament evaluation, and telemetry-driven recommendations. Use when a user wants to uncensor, abliterate, or remove refusal from an LLM.

math-inc

data-ai

machine-learning

1.2K

clip

OpenAI's model connecting vision and language. Enables zero-shot image classification, image-text matching, and cross-modal retrieval. Trained on 400M image-text pairs. Use for image search, content moderation, or vision-language tasks without fine-tuning. Best for general-purpose image understanding.

math-inc

data-ai

machine-learning

1.2K

segment-anything-model

Foundation model for image segmentation with zero-shot transfer. Use when you need to segment any object in images using points, boxes, or masks as prompts, or automatically generate all object masks in an image.

math-inc

data-ai

machine-learning

1.2K

axolotl

Expert guidance for fine-tuning LLMs with Axolotl - YAML configs, 100+ models, LoRA/QLoRA, DPO/KTO/ORPO/GRPO, multimodal support

math-inc

data-ai
