Add ios-application-dev and shader-dev skills

Powered by MiniMax
This commit is contained in:
akai
2026-03-17 17:25:23 +08:00
parent cfddb22ea3
commit 1706804893
91 changed files with 41607 additions and 4 deletions


@@ -0,0 +1,178 @@
---
name: ios-application-dev
description: |
  iOS application development guide covering UIKit, SnapKit, and SwiftUI. Includes touch targets, safe areas, navigation patterns, Dynamic Type, Dark Mode, accessibility, collection views, common UI components, and SwiftUI design guidelines. For detailed references on specific topics, see the reference files.
  Use when: developing iOS apps, implementing UI, reviewing iOS code, working with UIKit/SnapKit/SwiftUI layouts, building iPhone interfaces, Swift mobile development, Apple HIG compliance, iOS accessibility implementation.
license: MIT
metadata:
  author: MiniMax-OpenSource
  version: "1.0.0"
  category: mobile
  sources:
    - Apple Human Interface Guidelines
    - Apple Developer Documentation
---
# iOS Application Development Guide
A practical guide for building iOS applications using UIKit, SnapKit, and SwiftUI. Focuses on proven patterns and Apple platform conventions.
## Quick Reference
### UIKit
| Purpose | Component |
|---------|-----------|
| Main sections | `UITabBarController` |
| Drill-down | `UINavigationController` |
| Focused task | Sheet presentation |
| Critical choice | `UIAlertController` |
| Secondary actions | `UIContextMenuInteraction` |
| List content | `UICollectionView` + `DiffableDataSource` |
| Sectioned list | `DiffableDataSource` + `headerMode` |
| Grid layout | `UICollectionViewCompositionalLayout` |
| Search | `UISearchController` |
| Share | `UIActivityViewController` |
| Location (once) | `CLLocationButton` |
| Feedback | `UIImpactFeedbackGenerator` |
| Linear layout | `UIStackView` |
| Custom shapes | `CAShapeLayer` + `UIBezierPath` |
| Gradients | `CAGradientLayer` |
| Modern buttons | `UIButton.Configuration` |
| Dynamic text | `UIFontMetrics` + `preferredFont` |
| Dark mode | Semantic colors (`.systemBackground`, `.label`) |
| Permissions | Contextual request + `AVCaptureDevice` |
| Lifecycle | `UIApplication` notifications |
### SwiftUI
| Purpose | Component |
|---------|-----------|
| Main sections | `TabView` + `tabItem` |
| Drill-down | `NavigationStack` + `NavigationPath` |
| Focused task | `.sheet` + `presentationDetents` |
| Critical choice | `.alert` |
| Secondary actions | `.contextMenu` |
| List content | `List` + `.insetGrouped` |
| Search | `.searchable` |
| Share | `ShareLink` |
| Location (once) | `LocationButton` |
| Feedback | `UIImpactFeedbackGenerator` |
| Progress (known) | `ProgressView(value:total:)` |
| Progress (unknown) | `ProgressView()` |
| Dynamic text | `.font(.body)` semantic styles |
| Dark mode | `.primary`, `.secondary`, `Color(.systemBackground)` |
| Scene lifecycle | `@Environment(\.scenePhase)` |
| Reduce motion | `@Environment(\.accessibilityReduceMotion)` |
| Dynamic type | `@Environment(\.dynamicTypeSize)` |
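A minimal sketch wiring several of these rows together — `NavigationStack`, `.searchable`, and semantic fonts (the `Fruit` model and view names are illustrative, not from the guide; assumes iOS 16+):

```swift
import SwiftUI

struct Fruit: Identifiable, Hashable {
    let id = UUID()
    let name: String
}

struct FruitListView: View {
    @State private var query = ""
    private let fruits = [Fruit(name: "Apple"), Fruit(name: "Pear"), Fruit(name: "Plum")]

    // Pure helper so the search behavior is testable apart from the view.
    static func filter(_ fruits: [Fruit], query: String) -> [Fruit] {
        query.isEmpty ? fruits : fruits.filter { $0.name.localizedCaseInsensitiveContains(query) }
    }

    var body: some View {
        NavigationStack {
            List(Self.filter(fruits, query: query)) { fruit in
                NavigationLink(fruit.name, value: fruit)
            }
            .navigationDestination(for: Fruit.self) { fruit in
                Text(fruit.name).font(.title2)  // semantic style scales with Dynamic Type
            }
            .navigationTitle("Fruits")
            .searchable(text: $query)
        }
    }
}
```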
## Core Principles
### Layout
- Touch targets >= 44pt
- Content within safe areas (SwiftUI respects by default, use `.ignoresSafeArea()` only for backgrounds)
- Use 8pt spacing increments (8, 16, 24, 32, 40, 48)
- Primary actions in thumb zone
- Support all screen sizes (iPhone SE 375pt to Pro Max 430pt)
### Typography
- UIKit: `preferredFont(forTextStyle:)` + `adjustsFontForContentSizeCategory = true`
- SwiftUI: semantic text styles `.headline`, `.body`, `.caption`
- Custom fonts: `UIFontMetrics` / `Font.custom(_:size:relativeTo:)`
- Adapt layout at accessibility sizes (minimum 11pt)
### Colors
- Use semantic system colors (`.systemBackground`, `.label`, `.primary`, `.secondary`)
- Asset catalog variants for custom colors (Any/Dark Appearance)
- No color-only information (pair with icons or text)
- Contrast ratio >= 4.5:1 for normal text, 3:1 for large text
### Accessibility
- Labels on icon buttons (`.accessibilityLabel()`)
- Reduce motion respected (`@Environment(\.accessibilityReduceMotion)`)
- Logical reading order (`.accessibilitySortPriority()`)
- Support Bold Text, Increase Contrast preferences
### Navigation
- Tab bar (3-5 sections) stays visible during navigation
- Back swipe works (never override system gestures)
- State preserved across tabs (`@SceneStorage`, `@State`)
- Never use hamburger menus
### Privacy & Permissions
- Request permissions in context (not at launch)
- Custom explanation before system dialog
- Support Sign in with Apple
- Respect ATT denial
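The in-context request flow can be sketched as a small state machine (illustrative names; in a real app the status would come from `AVCaptureDevice.authorizationStatus(for: .video)` or the equivalent API for the permission in question):

```swift
// Hypothetical stand-in for the system authorization status.
enum CameraAuthStatus { case notDetermined, granted, denied }

enum PermissionStep {
    case showExplanation   // custom pre-prompt: why the camera is needed, in context
    case useCamera         // already authorized
    case offerAlternative  // denied: keep the rest of the feature usable
}

/// Called when the user taps a camera-dependent control -- not at launch.
func stepForCameraTap(status: CameraAuthStatus) -> PermissionStep {
    switch status {
    case .notDetermined: return .showExplanation  // system dialog comes only after this
    case .granted:       return .useCamera
    case .denied:        return .offerAlternative
    }
}
```

The custom explanation fires only from `.notDetermined`, so the single system prompt is never spent on an unprimed user, and `.denied` routes to a degraded-but-usable path instead of a dead end.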
## Checklist
### Layout
- [ ] Touch targets >= 44pt
- [ ] Content within safe areas
- [ ] Primary actions in thumb zone (bottom half)
- [ ] Flexible widths for all screen sizes (SE to Pro Max)
- [ ] Spacing aligns to 8pt grid
### Typography
- [ ] Semantic text styles or UIFontMetrics-scaled custom fonts
- [ ] Dynamic Type supported up to accessibility sizes
- [ ] Layouts reflow at large sizes (no truncation)
- [ ] Minimum text size 11pt
### Colors
- [ ] Semantic system colors or light/dark asset variants
- [ ] Dark Mode is intentional (not just inverted)
- [ ] No color-only information
- [ ] Text contrast >= 4.5:1 (normal) / 3:1 (large)
- [ ] Single accent color for interactive elements
### Accessibility
- [ ] VoiceOver labels on all interactive elements
- [ ] Logical reading order
- [ ] Bold Text preference respected
- [ ] Reduce Motion disables decorative animations
- [ ] All gestures have alternative access paths
### Navigation
- [ ] Tab bar for 3-5 top-level sections
- [ ] No hamburger/drawer menus
- [ ] Tab bar stays visible during navigation
- [ ] Back swipe works throughout
- [ ] State preserved across tabs
### Components
- [ ] Alerts for critical decisions only
- [ ] Sheets have dismiss path (button and/or swipe)
- [ ] List rows >= 44pt tall
- [ ] Destructive buttons use `.destructive` role
### Privacy
- [ ] Permissions requested in context (not at launch)
- [ ] Custom explanation before system permission dialog
- [ ] Sign in with Apple offered with other providers
- [ ] Basic features usable without account
- [ ] ATT prompt shown if tracking, denial respected
### System Integration
- [ ] App handles interruptions gracefully (calls, background, Siri)
- [ ] App content indexed for Spotlight
- [ ] Share Sheet available for shareable content
## References
| Topic | Reference |
|-------|-----------|
| Touch Targets, Safe Area, CollectionView | [Layout System](references/layout-system.md) |
| TabBar, NavigationController, Modal | [Navigation Patterns](references/navigation-patterns.md) |
| StackView, Button, Alert, Search, ContextMenu | [UIKit Components](references/uikit-components.md) |
| CAShapeLayer, CAGradientLayer, Core Animation | [Graphics & Animation](references/graphics-animation.md) |
| Dynamic Type, Semantic Colors, VoiceOver | [Accessibility](references/accessibility.md) |
| Permissions, Location, Share, Lifecycle, Haptics | [System Integration](references/system-integration.md) |
| Metal Shaders & GPU | [Metal Shader Reference](references/metal-shader.md) |
| SwiftUI HIG, Components, Patterns, Anti-Patterns | [SwiftUI Design Guidelines](references/swiftui-design-guidelines.md) |
| Optionals, Protocols, async/await, ARC, Error Handling | [Swift Coding Standards](references/swift-coding-standards.md) |
---
Swift, SwiftUI, UIKit, SF Symbols, Metal, and Apple are trademarks of Apple Inc. SnapKit is a trademark of its respective owners.


@@ -0,0 +1,259 @@
# Accessibility
iOS accessibility guide covering Dynamic Type, semantic colors, VoiceOver, and motion adaptation.
## Dynamic Type
### Using System Fonts
```swift
private func setupLabels() {
    let titleLabel = UILabel()
    titleLabel.font = .preferredFont(forTextStyle: .headline)
    titleLabel.adjustsFontForContentSizeCategory = true

    let bodyLabel = UILabel()
    bodyLabel.font = .preferredFont(forTextStyle: .body)
    bodyLabel.adjustsFontForContentSizeCategory = true
    bodyLabel.numberOfLines = 0
}
```
### Custom Font Scaling
```swift
extension UIFont {
    static func scaled(_ name: String, size: CGFloat, for style: TextStyle) -> UIFont {
        guard let font = UIFont(name: name, size: size) else {
            return .preferredFont(forTextStyle: style)
        }
        return UIFontMetrics(forTextStyle: style).scaledFont(for: font)
    }
}

let customFont = UIFont.scaled("Avenir-Medium", size: 16, for: .body)
```
### Text Style Reference
| Style | Default Size | Usage |
|-------|--------------|-------|
| `.largeTitle` | 34pt | Screen titles |
| `.title1` | 28pt | Primary headings |
| `.title2` | 22pt | Secondary headings |
| `.title3` | 20pt | Tertiary headings |
| `.headline` | 17pt (semibold) | Important information |
| `.body` | 17pt | Body text |
| `.callout` | 16pt | Explanatory text |
| `.subheadline` | 15pt | Subtitles |
| `.footnote` | 13pt | Footnotes |
| `.caption1` | 12pt | Labels |
| `.caption2` | 11pt | Small labels |
### Adapting Layout for Large Text
```swift
override func traitCollectionDidChange(_ previous: UITraitCollection?) {
    super.traitCollectionDidChange(previous)
    let isLargeText = traitCollection.preferredContentSizeCategory.isAccessibilityCategory
    contentStack.axis = isLargeText ? .vertical : .horizontal
    if isLargeText {
        iconImageView.snp.remakeConstraints { make in
            make.size.equalTo(64)
        }
    } else {
        iconImageView.snp.remakeConstraints { make in
            make.size.equalTo(44)
        }
    }
}
```
## Semantic Colors
Use system semantic colors for automatic Dark Mode adaptation:
```swift
view.backgroundColor = .systemBackground
containerView.backgroundColor = .secondarySystemBackground
cardView.backgroundColor = .tertiarySystemBackground
titleLabel.textColor = .label
subtitleLabel.textColor = .secondaryLabel
hintLabel.textColor = .tertiaryLabel
placeholderLabel.textColor = .placeholderText
separatorView.backgroundColor = .separator
borderView.layer.borderColor = UIColor.separator.cgColor
```
### System Color Reference
| Color | Light Mode | Dark Mode | Usage |
|-------|------------|-----------|-------|
| `.systemBackground` | White | Black | Main background |
| `.secondarySystemBackground` | Light gray | Dark gray | Card/grouped background |
| `.tertiarySystemBackground` | Lighter gray | Medium gray | Nested content background |
| `.label` | Black | White | Primary text |
| `.secondaryLabel` | Gray | Light gray | Secondary text |
| `.tertiaryLabel` | Light gray | Dark gray | Auxiliary text |
### Custom Color Adaptation
```swift
extension UIColor {
    static let customAccent = UIColor { traitCollection in
        switch traitCollection.userInterfaceStyle {
        case .dark:
            return UIColor(red: 0.4, green: 0.8, blue: 1.0, alpha: 1.0)
        default:
            return UIColor(red: 0.0, green: 0.5, blue: 0.8, alpha: 1.0)
        }
    }
}
```
## VoiceOver
### Basic Labels
```swift
let cartButton = UIButton(type: .system)
cartButton.setImage(UIImage(systemName: "cart.badge.plus"), for: .normal)
cartButton.accessibilityLabel = "Add to cart"
let ratingView = UIView()
ratingView.accessibilityLabel = "Rating: 4 out of 5 stars"
let closeButton = UIButton()
closeButton.accessibilityLabel = "Close"
closeButton.accessibilityHint = "Dismisses this dialog"
```
### Custom Accessibility
```swift
class ProductCell: UICollectionViewCell {
    override var accessibilityLabel: String? {
        get {
            return "\(product.name), \(product.price), \(product.isAvailable ? "In stock" : "Out of stock")"
        }
        set {}
    }

    override var accessibilityTraits: UIAccessibilityTraits {
        get {
            var traits: UIAccessibilityTraits = .button
            if product.isSelected {
                traits.insert(.selected)
            }
            return traits
        }
        set {}
    }
}
```
### Accessibility Container
```swift
class CustomContainerView: UIView {
    override var isAccessibilityElement: Bool {
        get { false }
        set {}
    }

    override var accessibilityElements: [Any]? {
        get {
            return [titleLabel, actionButton, detailLabel]
        }
        set {}
    }
}
```
### VoiceOver Notifications
```swift
func didLoadContent() {
    UIAccessibility.post(notification: .screenChanged, argument: headerLabel)
}

func didUpdateStatus() {
    UIAccessibility.post(notification: .announcement, argument: "Download complete")
}
```
## Reduce Motion
```swift
func animateTransition() {
    let duration: TimeInterval = UIAccessibility.isReduceMotionEnabled ? 0 : 0.3
    UIView.animate(withDuration: duration) {
        self.cardView.alpha = 1
    }
}

func showPopup() {
    if UIAccessibility.isReduceMotionEnabled {
        popupView.alpha = 1
    } else {
        popupView.transform = CGAffineTransform(scaleX: 0.8, y: 0.8)
        popupView.alpha = 0
        UIView.animate(withDuration: 0.3, delay: 0, usingSpringWithDamping: 0.7, initialSpringVelocity: 0, options: []) {
            self.popupView.transform = .identity
            self.popupView.alpha = 1
        }
    }
}
```
### Observing Setting Changes
```swift
NotificationCenter.default.addObserver(
    self,
    selector: #selector(reduceMotionChanged),
    name: UIAccessibility.reduceMotionStatusDidChangeNotification,
    object: nil
)

@objc func reduceMotionChanged() {
    updateAnimationSettings()
}
```
## Accessibility Checklist
### Basic Requirements
- [ ] All icon buttons have `accessibilityLabel`
- [ ] Custom controls have correct `accessibilityTraits`
- [ ] Images have `accessibilityLabel` or marked as decorative
- [ ] Forms have clear error messages
### Dynamic Type
- [ ] Using `preferredFont(forTextStyle:)`
- [ ] Set `adjustsFontForContentSizeCategory = true`
- [ ] Layout adapts at accessibility sizes
- [ ] Text is not truncated
### Color Contrast
- [ ] Body text contrast >= 4.5:1
- [ ] Large text contrast >= 3:1
- [ ] Information not conveyed by color alone
### Motion
- [ ] Respect Reduce Motion setting
- [ ] No flashing or rapid animation
- [ ] Auto-playing animations can be paused
### Interaction
- [ ] Touch targets >= 44x44pt
- [ ] Gestures have alternative actions
- [ ] Timeouts can be extended
---
*UIKit, VoiceOver, Dynamic Type, and Apple are trademarks of Apple Inc.*


@@ -0,0 +1,350 @@
# Graphics & Animation
iOS graphics and animation guide covering CAShapeLayer, CAGradientLayer, UIBezierPath, and Core Animation.
## CAShapeLayer
For custom shapes, paths, and animations:
```swift
class CircularProgressView: UIView {
    private let trackLayer = CAShapeLayer()
    private let progressLayer = CAShapeLayer()

    var progress: CGFloat = 0 {
        didSet { updateProgress() }
    }

    override init(frame: CGRect) {
        super.init(frame: frame)
        setupLayers()
    }

    required init?(coder: NSCoder) {
        super.init(coder: coder)
        setupLayers()
    }

    private func setupLayers() {
        trackLayer.strokeColor = UIColor.systemGray5.cgColor
        trackLayer.fillColor = UIColor.clear.cgColor
        trackLayer.lineWidth = 10
        trackLayer.lineCap = .round
        layer.addSublayer(trackLayer)

        progressLayer.strokeColor = UIColor.systemBlue.cgColor
        progressLayer.fillColor = UIColor.clear.cgColor
        progressLayer.lineWidth = 10
        progressLayer.lineCap = .round
        progressLayer.strokeEnd = 0
        layer.addSublayer(progressLayer)
    }

    override func layoutSubviews() {
        super.layoutSubviews()
        // Recompute only the path when bounds change; rerunning full setup
        // here would reset strokeEnd and lose the current progress.
        let center = CGPoint(x: bounds.midX, y: bounds.midY)
        let radius = min(bounds.width, bounds.height) / 2 - 10
        let startAngle = -CGFloat.pi / 2
        let circularPath = UIBezierPath(
            arcCenter: center,
            radius: radius,
            startAngle: startAngle,
            endAngle: startAngle + 2 * CGFloat.pi,
            clockwise: true
        )
        trackLayer.path = circularPath.cgPath
        progressLayer.path = circularPath.cgPath
    }

    private func updateProgress() {
        progressLayer.strokeEnd = progress
    }

    func animateProgress(to value: CGFloat, duration: TimeInterval = 0.5) {
        let animation = CABasicAnimation(keyPath: "strokeEnd")
        animation.fromValue = progressLayer.strokeEnd
        animation.toValue = value
        animation.duration = duration
        animation.timingFunction = CAMediaTimingFunction(name: .easeInEaseOut)
        progressLayer.strokeEnd = value
        progressLayer.add(animation, forKey: "progressAnimation")
    }
}
```
## UIBezierPath
### Common Shapes
```swift
let roundedRect = UIBezierPath(
    roundedRect: bounds,
    cornerRadius: 12
)

let customCorners = UIBezierPath(
    roundedRect: bounds,
    byRoundingCorners: [.topLeft, .topRight],
    cornerRadii: CGSize(width: 16, height: 16)
)

let triangle = UIBezierPath()
triangle.move(to: CGPoint(x: bounds.midX, y: 0))
triangle.addLine(to: CGPoint(x: bounds.maxX, y: bounds.maxY))
triangle.addLine(to: CGPoint(x: 0, y: bounds.maxY))
triangle.close()

let circle = UIBezierPath(
    arcCenter: CGPoint(x: bounds.midX, y: bounds.midY),
    radius: bounds.width / 2,
    startAngle: 0,
    endAngle: .pi * 2,
    clockwise: true
)
```
### Custom Paths
```swift
let customPath = UIBezierPath()
customPath.move(to: CGPoint(x: 0, y: bounds.height))
customPath.addCurve(
    to: CGPoint(x: bounds.width, y: 0),
    controlPoint1: CGPoint(x: bounds.width * 0.3, y: bounds.height),
    controlPoint2: CGPoint(x: bounds.width * 0.7, y: 0)
)
## CAGradientLayer
### Linear Gradient Button
```swift
class GradientButton: UIButton {
    private let gradientLayer = CAGradientLayer()

    override init(frame: CGRect) {
        super.init(frame: frame)
        setupGradient()
    }

    required init?(coder: NSCoder) {
        super.init(coder: coder)
        setupGradient()
    }

    private func setupGradient() {
        gradientLayer.colors = [
            UIColor.systemBlue.cgColor,
            UIColor.systemPurple.cgColor
        ]
        gradientLayer.startPoint = CGPoint(x: 0, y: 0.5)
        gradientLayer.endPoint = CGPoint(x: 1, y: 0.5)
        gradientLayer.cornerRadius = 12
        layer.insertSublayer(gradientLayer, at: 0)
    }

    override func layoutSubviews() {
        super.layoutSubviews()
        gradientLayer.frame = bounds
    }
}
```
### Gradient Background View
```swift
class GradientBackgroundView: UIView {
    private let gradientLayer = CAGradientLayer()

    override init(frame: CGRect) {
        super.init(frame: frame)
        setupGradient()
    }

    required init?(coder: NSCoder) {
        super.init(coder: coder)
        setupGradient()
    }

    private func setupGradient() {
        gradientLayer.colors = [
            UIColor.systemBackground.cgColor,
            UIColor.secondarySystemBackground.cgColor
        ]
        gradientLayer.locations = [0.0, 1.0]
        gradientLayer.startPoint = CGPoint(x: 0.5, y: 0)
        gradientLayer.endPoint = CGPoint(x: 0.5, y: 1)
        layer.insertSublayer(gradientLayer, at: 0)
    }

    override func layoutSubviews() {
        super.layoutSubviews()
        gradientLayer.frame = bounds
    }

    override func traitCollectionDidChange(_ previousTraitCollection: UITraitCollection?) {
        super.traitCollectionDidChange(previousTraitCollection)
        // cgColor values don't adapt automatically; re-resolve them on appearance changes
        gradientLayer.colors = [
            UIColor.systemBackground.cgColor,
            UIColor.secondarySystemBackground.cgColor
        ]
    }
}
```
### Gradient Types
| Type | Configuration |
|------|---------------|
| Linear (horizontal) | `startPoint: (0, 0.5)`, `endPoint: (1, 0.5)` |
| Linear (vertical) | `startPoint: (0.5, 0)`, `endPoint: (0.5, 1)` |
| Diagonal | `startPoint: (0, 0)`, `endPoint: (1, 1)` |
| Radial | Use `CAGradientLayer.type = .radial` |
## Core Animation
### Basic Animation
```swift
func animateScale() {
    let animation = CABasicAnimation(keyPath: "transform.scale")
    animation.fromValue = 1.0
    animation.toValue = 1.2
    animation.duration = 0.3
    animation.autoreverses = true
    animation.timingFunction = CAMediaTimingFunction(name: .easeInEaseOut)
    layer.add(animation, forKey: "scaleAnimation")
}

func animatePosition() {
    let animation = CABasicAnimation(keyPath: "position")
    animation.fromValue = layer.position
    animation.toValue = CGPoint(x: 200, y: 200)
    animation.duration = 0.5
    layer.add(animation, forKey: "positionAnimation")
}
```
### Keyframe Animation
```swift
func animateAlongPath() {
    let path = UIBezierPath()
    path.move(to: CGPoint(x: 50, y: 50))
    path.addCurve(
        to: CGPoint(x: 250, y: 250),
        controlPoint1: CGPoint(x: 150, y: 50),
        controlPoint2: CGPoint(x: 50, y: 250)
    )
    let animation = CAKeyframeAnimation(keyPath: "position")
    animation.path = path.cgPath
    animation.duration = 2.0
    animation.timingFunction = CAMediaTimingFunction(name: .easeInEaseOut)
    layer.add(animation, forKey: "pathAnimation")
}
```
### Animation Group
```swift
func animateMultiple() {
    let scaleAnimation = CABasicAnimation(keyPath: "transform.scale")
    scaleAnimation.fromValue = 1.0
    scaleAnimation.toValue = 1.5

    let opacityAnimation = CABasicAnimation(keyPath: "opacity")
    opacityAnimation.fromValue = 1.0
    opacityAnimation.toValue = 0.0

    let group = CAAnimationGroup()
    group.animations = [scaleAnimation, opacityAnimation]
    group.duration = 0.5
    group.fillMode = .forwards
    group.isRemovedOnCompletion = false
    layer.add(group, forKey: "multipleAnimations")
}
```
### Spring Animation
```swift
func springAnimation() {
    let spring = CASpringAnimation(keyPath: "transform.scale")
    spring.fromValue = 0.8
    spring.toValue = 1.0
    spring.damping = 10
    spring.stiffness = 100
    spring.mass = 1
    spring.initialVelocity = 5
    spring.duration = spring.settlingDuration
    layer.add(spring, forKey: "springAnimation")
}
```
## UIView Animation
### Basic UIView Animation
```swift
UIView.animate(withDuration: 0.3) {
    self.view.alpha = 1.0
    self.view.transform = .identity
}

UIView.animate(withDuration: 0.3, delay: 0, options: [.curveEaseInOut]) {
    self.cardView.frame.origin.y = 100
} completion: { _ in
    self.didFinishAnimation()
}
```
### Spring Animation
```swift
UIView.animate(
    withDuration: 0.6,
    delay: 0,
    usingSpringWithDamping: 0.7,
    initialSpringVelocity: 0.5,
    options: []
) {
    self.popupView.transform = .identity
}
```
### Keyframe Animation
```swift
UIView.animateKeyframes(withDuration: 1.0, delay: 0) {
    UIView.addKeyframe(withRelativeStartTime: 0, relativeDuration: 0.25) {
        self.view.transform = CGAffineTransform(scaleX: 1.2, y: 1.2)
    }
    UIView.addKeyframe(withRelativeStartTime: 0.25, relativeDuration: 0.25) {
        self.view.transform = CGAffineTransform(rotationAngle: .pi / 4)
    }
    UIView.addKeyframe(withRelativeStartTime: 0.5, relativeDuration: 0.5) {
        self.view.transform = .identity
    }
}
```
## Timing Functions
| Name | Description |
|------|-------------|
| `.linear` | Constant speed |
| `.easeIn` | Slow start |
| `.easeOut` | Slow end |
| `.easeInEaseOut` | Slow start and end |
| `.default` | System default |
---
*UIKit, Core Animation, and Apple are trademarks of Apple Inc.*


@@ -0,0 +1,199 @@
# Layout System
iOS layout system guide covering touch targets, safe areas, UICollectionView, and Compositional Layout.
## Touch Targets
Interactive elements need adequate tap areas. The recommended minimum is 44x44 points.
```swift
let actionButton = UIButton(type: .system)
actionButton.setTitle("Submit", for: .normal)
view.addSubview(actionButton)

actionButton.snp.makeConstraints { make in
    make.height.greaterThanOrEqualTo(44)
    make.leading.trailing.equalToSuperview().inset(16)
    make.bottom.equalTo(view.safeAreaLayoutGuide).offset(-16)
}
Use 8-point increments for spacing (8, 16, 24, 32, 40, 48) to maintain visual consistency.
## Safe Area
Always constrain content to the safe area to avoid the notch, Dynamic Island, and home indicator.
```swift
class MainViewController: UIViewController {
    private let contentStack = UIStackView()

    override func viewDidLoad() {
        super.viewDidLoad()
        view.backgroundColor = .systemBackground
        contentStack.axis = .vertical
        contentStack.spacing = 16
        view.addSubview(contentStack)
        contentStack.snp.makeConstraints { make in
            make.top.bottom.equalTo(view.safeAreaLayoutGuide)
            make.leading.trailing.equalTo(view.safeAreaLayoutGuide).inset(16)
        }
    }
}
```
## UICollectionView with Diffable Data Source
```swift
class ItemsViewController: UIViewController {
    enum Section { case main }

    private var collectionView: UICollectionView!
    private var dataSource: UICollectionViewDiffableDataSource<Section, Item>!

    override func viewDidLoad() {
        super.viewDidLoad()
        setupCollectionView()
        configureDataSource()
    }

    private func setupCollectionView() {
        var config = UICollectionLayoutListConfiguration(appearance: .insetGrouped)
        config.trailingSwipeActionsConfigurationProvider = { [weak self] indexPath in
            self?.makeSwipeActions(for: indexPath)  // returns UISwipeActionsConfiguration? (implementation omitted)
        }
        let layout = UICollectionViewCompositionalLayout.list(using: config)
        collectionView = UICollectionView(frame: .zero, collectionViewLayout: layout)
        view.addSubview(collectionView)
        collectionView.snp.makeConstraints { make in
            make.edges.equalToSuperview()
        }
    }

    private func configureDataSource() {
        let cellRegistration = UICollectionView.CellRegistration<UICollectionViewListCell, Item> { cell, indexPath, item in
            var content = cell.defaultContentConfiguration()
            content.text = item.title
            content.secondaryText = item.subtitle
            cell.contentConfiguration = content
        }
        dataSource = UICollectionViewDiffableDataSource(collectionView: collectionView) { collectionView, indexPath, item in
            collectionView.dequeueConfiguredReusableCell(
                using: cellRegistration, for: indexPath, item: item
            )
        }
    }

    func updateItems(_ items: [Item]) {
        var snapshot = NSDiffableDataSourceSnapshot<Section, Item>()
        snapshot.appendSections([.main])
        snapshot.appendItems(items)
        dataSource.apply(snapshot)
    }
}
```
## Grid Layout
```swift
private func createGridLayout() -> UICollectionViewLayout {
    let itemSize = NSCollectionLayoutSize(
        widthDimension: .fractionalWidth(1/3),
        heightDimension: .fractionalHeight(1.0)
    )
    let item = NSCollectionLayoutItem(layoutSize: itemSize)
    item.contentInsets = NSDirectionalEdgeInsets(top: 2, leading: 2, bottom: 2, trailing: 2)

    let groupSize = NSCollectionLayoutSize(
        widthDimension: .fractionalWidth(1.0),
        heightDimension: .fractionalWidth(1/3)
    )
    let group = NSCollectionLayoutGroup.horizontal(layoutSize: groupSize, subitems: [item])
    let section = NSCollectionLayoutSection(group: group)
    return UICollectionViewCompositionalLayout(section: section)
}
```
## Sectioned List with Headers
```swift
class CategorizedListVC: UIViewController {
    enum Section: Hashable {
        case favorites, recent, all
    }

    private var collectionView: UICollectionView!
    private var dataSource: UICollectionViewDiffableDataSource<Section, Item>!

    private func setupCollectionView() {
        var config = UICollectionLayoutListConfiguration(appearance: .insetGrouped)
        config.headerMode = .supplementary
        let layout = UICollectionViewCompositionalLayout.list(using: config)
        collectionView = UICollectionView(frame: .zero, collectionViewLayout: layout)
    }

    private func configureDataSource() {
        let cellRegistration = UICollectionView.CellRegistration<UICollectionViewListCell, Item> { cell, indexPath, item in
            var content = cell.defaultContentConfiguration()
            content.text = item.title
            cell.contentConfiguration = content
        }
        let headerRegistration = UICollectionView.SupplementaryRegistration<UICollectionViewListCell>(
            elementKind: UICollectionView.elementKindSectionHeader
        ) { [weak self] header, elementKind, indexPath in
            guard let section = self?.dataSource.sectionIdentifier(for: indexPath.section) else { return }
            var content = header.defaultContentConfiguration()
            content.text = self?.title(for: section)  // title(for:) maps a Section to its display name (omitted)
            header.contentConfiguration = content
        }
        dataSource = UICollectionViewDiffableDataSource(collectionView: collectionView) { collectionView, indexPath, item in
            collectionView.dequeueConfiguredReusableCell(using: cellRegistration, for: indexPath, item: item)
        }
        dataSource.supplementaryViewProvider = { collectionView, kind, indexPath in
            collectionView.dequeueConfiguredReusableSupplementary(using: headerRegistration, for: indexPath)
        }
    }

    func applySnapshot(favorites: [Item], recent: [Item], all: [Item]) {
        var snapshot = NSDiffableDataSourceSnapshot<Section, Item>()
        if !favorites.isEmpty {
            snapshot.appendSections([.favorites])
            snapshot.appendItems(favorites, toSection: .favorites)
        }
        if !recent.isEmpty {
            snapshot.appendSections([.recent])
            snapshot.appendItems(recent, toSection: .recent)
        }
        snapshot.appendSections([.all])
        snapshot.appendItems(all, toSection: .all)
        dataSource.apply(snapshot)
    }
}
```
## Spacing Guidelines
| Spacing | Usage |
|---------|-------|
| 8pt | Compact element spacing |
| 16pt | Standard padding |
| 24pt | Section spacing |
| 32pt | Large section separation |
| 48pt | Screen margins (large screens) |
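These values can be centralized in a small constants type so ad-hoc numbers don't creep into constraints (an illustrative sketch; the `Spacing` name is not from the guide):

```swift
import Foundation

/// 8pt-grid spacing constants for use in SnapKit constraints and stack views.
enum Spacing {
    static let compact: CGFloat = 8    // spacing between related elements
    static let standard: CGFloat = 16  // default padding
    static let section: CGFloat = 24   // between sections
    static let large: CGFloat = 32     // large separation
    static let margin: CGFloat = 48    // screen margins on large screens
}
```

Used as, e.g., `make.leading.trailing.equalToSuperview().inset(Spacing.standard)`.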
---
*UIKit and Apple are trademarks of Apple Inc. SnapKit is a trademark of its respective owners.*


@@ -0,0 +1,178 @@
# Metal Shader Reference
Expert reference for Metal shaders, real-time rendering, and Apple's Tile-Based Deferred Rendering (TBDR) architecture.
## Core Principles
**Half precision first → Leverage TBDR → Function constant specialization → Use Intersector API**
### When to Use
- Metal Shading Language (MSL) development
- Apple GPU optimization (TBDR architecture)
- PBR rendering pipelines
- Compute shaders and parallel processing
- Apple Silicon ray tracing
- GPU profiling and debugging
### When NOT to Use
- WebGL/GLSL (different architecture)
- CUDA (NVIDIA only)
- OpenGL (deprecated on Apple)
- CPU-side optimization
## Expert vs Novice
| Topic | Novice | Expert |
|-------|--------|--------|
| Data types | `float` everywhere | Default `half`, `float` only for position/depth |
| Branching | Runtime conditionals | Function constants for compile-time elimination |
| Memory | Everything in device | Know constant/device/threadgroup tradeoffs |
| Architecture | Treat as desktop GPU | Understand TBDR: tile memory is free, bandwidth is expensive |
| Ray tracing | intersection queries | intersector API (hardware-aligned) |
| Debugging | print debugging | GPU capture, shader profiler, occupancy analysis |
## Common Anti-Patterns
| Anti-Pattern | Problem | Solution |
|--------------|---------|----------|
| 32-bit floats | Wastes registers, reduces occupancy, doubles bandwidth | Default `half`, `float` only for position/depth |
| Ignoring TBDR | Not using free tile memory | Use `[[color(n)]]`, memoryless targets |
| Runtime constant branches | Warp divergence, wastes ALU | Function constants + pipeline specialization |
| intersection queries | Not hardware-aligned | Use intersector API |
## Metal Evolution
| Era | Key Development |
|-----|-----------------|
| Metal 2.x | OpenGL migration, basic compute |
| Apple Silicon | Unified memory, tile shaders critical |
| Metal 3 | Mesh shaders, hardware-accelerated ray tracing |
| Latest | Neural Engine + GPU cooperation, Vision Pro foveated rendering |
**Apple Family 9 Note**: threadgroup memory is less advantageous compared with direct device-memory access.
## Shader Types
| Type | Purpose | Key Attributes |
|------|---------|----------------|
| Vertex | Vertex transformation | `[[stage_in]]`, `[[buffer(n)]]` |
| Fragment | Pixel shading | `[[color(n)]]`, `[[texture(n)]]` |
| Compute/Kernel | General computation | `[[thread_position_in_grid]]` |
| Tile | TBDR-specific | `[[imageblock]]` |
| Mesh | Metal 3 geometry | `[[mesh_id]]` |
## Rendering Techniques
| Technique | Description |
|-----------|-------------|
| Fullscreen quad | 4-vertex triangle strip, no MVP, post-processing basis |
| PBR Cook-Torrance | Fresnel Schlick + GGX Distribution + Smith Geometry |
| Blinn-Phong | Simple specular, half-vector calculation |
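The Fresnel term in the Cook-Torrance row can be sketched CPU-side for reference (plain Swift with stdlib SIMD; in a real shader this would be MSL, typically in `half` precision):

```swift
/// Schlick's approximation of the Fresnel reflectance term:
/// F = F0 + (1 - F0) * (1 - cosTheta)^5
func fresnelSchlick(cosTheta: Float, f0: SIMD3<Float>) -> SIMD3<Float> {
    let t = max(0, min(1, 1 - cosTheta))  // clamp to avoid negative bases
    let t5 = t * t * t * t * t
    return f0 + (SIMD3<Float>(repeating: 1) - f0) * t5
}
```

At head-on incidence the term collapses to the base reflectance `F0`; at grazing angles it approaches 1, which is what gives PBR materials their edge highlights.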
## Procedural Generation
| Technique | Use Case |
|-----------|----------|
| Hash functions | Pseudo-random basis for noise, random sampling |
| Voronoi | Cell textures, stones, cracks |
| Value/Perlin Noise | Continuous random fields |
| FBM | Multi-octave layering, fractal terrain, clouds |
| Domain Warping | Coordinate distortion, organic shapes |
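The hash-function row can be illustrated with an integer hash (this particular bit mixer is illustrative, not from the source; shader code would use the MSL equivalent with `uint`):

```swift
/// Deterministic integer hash mapped to [0, 1] (Float rounding can reach 1.0
/// at the top end) -- the building block for value noise and random sampling.
func hash01(_ x: UInt32) -> Float {
    var h = x
    h ^= h >> 16
    h = h &* 0x7feb352d          // &* is Swift's wrapping multiply
    h ^= h >> 15
    h = h &* 0x846ca68b
    h ^= h >> 16
    return Float(h) / 4294967296.0   // divide by 2^32
}
```

The same input always yields the same output, which is what makes hash-based noise stable across frames.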
## Numerical Techniques
| Technique | Formula |
|-----------|---------|
| Central difference gradient | `(f(x+h) - f(x-h)) / (2h)` |
| Smoothstep | `x * x * (3 - 2 * x)` |
| SDF operations | `min/max/smooth_min` boolean ops |
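The first two rows, sketched in plain Swift (the shader version is identical modulo MSL types):

```swift
/// Hermite smoothstep on a normalized input: x * x * (3 - 2 * x).
func smoothstep(_ x: Float) -> Float {
    let t = max(0, min(1, x))
    return t * t * (3 - 2 * t)
}

/// Central-difference approximation of f'(x): (f(x+h) - f(x-h)) / (2h).
func centralDifference(_ f: (Float) -> Float, at x: Float, h: Float = 1e-3) -> Float {
    (f(x + h) - f(x - h)) / (2 * h)
}
```

Central differencing is how shaders estimate normals from an SDF or a heightfield without an analytic gradient.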
## SwiftUI + MTKView Integration
### Architecture Pattern
```
MetalView (UIViewRepresentable)
└── Coordinator = Renderer (MTKViewDelegate)
├── MTLDevice
├── MTLCommandQueue
├── MTLRenderPipelineState
└── MTLBuffer (vertices, uniforms)
```
### Uniform Alignment Rules
| Swift Type | Metal Type | Alignment |
|------------|------------|-----------|
| `Float` | `float` | 4 bytes |
| `SIMD2<Float>` | `float2` | 8 bytes |
| `SIMD3<Float>` | `float3` | **16 bytes** |
| `SIMD4<Float>` | `float4` | 16 bytes |
**Key**: `float3` is 16-byte aligned and padded to 16 bytes (MSL's `packed_float3` is the tightly packed 12-byte form). Verify layouts with `MemoryLayout<T>.size` / `.stride` / `.alignment`.
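The table can be checked directly from Swift with `MemoryLayout` (stdlib SIMD types mirror Metal's `float2`/`float3`/`float4` layout; the `Uniforms` struct below is a hypothetical example):

```swift
// SIMD3<Float> is backed by four lanes: three floats plus 4 bytes of padding.
struct Uniforms {
    var time: Float               // offset 0, 4 bytes
    var resolution: SIMD2<Float>  // offset 8 (8-byte aligned)
    var lightDir: SIMD3<Float>    // offset 16, occupies 16 bytes
}

let float3Size = MemoryLayout<SIMD3<Float>>.size            // 16, not 12
let float3Alignment = MemoryLayout<SIMD3<Float>>.alignment  // 16
let uniformsStride = MemoryLayout<Uniforms>.stride          // length to pass when copying the struct to the GPU
```

A mismatch between this Swift layout and the MSL struct is a classic source of silently wrong uniforms.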
## Command Line Tools
| Command | Purpose |
|---------|---------|
| `xcrun metal -c shader.metal -o shader.air` | Compile to AIR |
| `xcrun metallib shader.air -o shader.metallib` | Link to metallib |
| `xcrun metal shader.metal -o shader.metallib` | One-step compile & link |
| `xcrun metal -Weverything -c shader.metal` | Syntax check |
| `xcrun metal-objdump --disassemble shader.metallib` | Disassemble |
## GPU Debugging
### Xcode Workflow
1. **GPU Capture**: ⌘⇧⌥G
2. **Shader Profiler**: Select draw call → View Shader
3. **Memory Viewer**: Inspect buffer/texture
4. **Performance HUD**: Enable in device options
### Key Metrics
| Metric | Healthy Value | Typical Cause When Unhealthy |
|--------|---------------|------------------------------|
| GPU Occupancy | > 80% | Memory bandwidth bottleneck |
| ALU Utilization | > 60% | Waiting on memory |
| Bandwidth | As low as possible | Unnecessary render target stores (TBDR should minimize them) |
### Debug Utility Functions
| Function | Purpose |
|----------|---------|
| heatmap | Value visualization (blue→green→red) |
| debugNaN | NaN/Inf detection (magenta marker) |
| visualizeDepth | Linearized depth visualization |
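A CPU-side Swift sketch of two of these utilities, using one common blue→green→red ramp (the exact mapping is an assumption; the shader-side versions would return `half3`/`float3`):

```swift
// Heatmap: linear mix blue→green on [0, 0.5], green→red on [0.5, 1].
func heatmap(_ t: Float) -> (r: Float, g: Float, b: Float) {
    let x = max(0, min(1, t))
    if x < 0.5 {
        let u = x / 0.5
        return (0, u, 1 - u)   // blue → green
    } else {
        let u = (x - 0.5) / 0.5
        return (u, 1 - u, 0)   // green → red
    }
}

// NaN/Inf guard: replace non-finite values with a loud magenta marker.
func debugNaN(_ v: Float) -> (r: Float, g: Float, b: Float) {
    v.isFinite ? (v, v, v) : (1, 0, 1)
}
```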
## Performance Optimization Checklist
### Data Types
- [ ] Default to `half`; use `float` only for positions/depth
### Memory Management
- [ ] Constants in constant address space
- [ ] Use `.storageModeShared`
- [ ] Leverage tile memory (TBDR free reads)
- [ ] Avoid unnecessary render target stores
### Branch Optimization
- [ ] Function constants to eliminate branches
- [ ] Fixed loop bounds (GPU unrolling)
### Rendering Tips
- [ ] Fullscreen quad with a 4-vertex triangle strip
- [ ] Procedural textures to avoid sampling bandwidth
- [ ] `[[early_fragment_tests]]` for early depth test
- [ ] `setFragmentBytes` for small data
### Compute Optimization
- [ ] Vectorize (SIMD)
- [ ] Reduce register pressure
---
*Metal, Apple Silicon, and Xcode are trademarks of Apple Inc.*
# Navigation Patterns
iOS navigation patterns guide covering Tab navigation, Navigation Controller, and modal presentation.
## Tab-Based Navigation
For apps with 3-5 main sections:
```swift
class AppTabBarController: UITabBarController {
override func viewDidLoad() {
super.viewDidLoad()
let homeNav = UINavigationController(rootViewController: HomeVC())
homeNav.tabBarItem = UITabBarItem(
title: "Home",
image: UIImage(systemName: "house"),
selectedImage: UIImage(systemName: "house.fill")
)
let searchNav = UINavigationController(rootViewController: SearchVC())
searchNav.tabBarItem = UITabBarItem(
title: "Search",
image: UIImage(systemName: "magnifyingglass"),
tag: 1
)
let profileNav = UINavigationController(rootViewController: ProfileVC())
profileNav.tabBarItem = UITabBarItem(
title: "Profile",
image: UIImage(systemName: "person"),
selectedImage: UIImage(systemName: "person.fill")
)
viewControllers = [homeNav, searchNav, profileNav]
}
}
```
### Tab Bar Best Practices
| Principle | Description |
|-----------|-------------|
| Limit count | Maximum 5 tabs, use More for additional |
| Always visible | Tab bar stays visible at all navigation levels |
| State preservation | Preserve navigation state when switching tabs |
| Icon choice | Use SF Symbols, provide selected/unselected states |
## Navigation Controller
Use large titles for root views:
```swift
class ListViewController: UIViewController {
override func viewDidLoad() {
super.viewDidLoad()
title = "Items"
navigationController?.navigationBar.prefersLargeTitles = true
navigationItem.largeTitleDisplayMode = .always
}
func pushDetail(_ item: Item) {
let detail = DetailViewController(item: item)
detail.navigationItem.largeTitleDisplayMode = .never
navigationController?.pushViewController(detail, animated: true)
}
}
```
### Navigation Bar Configuration
```swift
class CustomNavigationController: UINavigationController {
override func viewDidLoad() {
super.viewDidLoad()
let appearance = UINavigationBarAppearance()
appearance.configureWithDefaultBackground()
navigationBar.standardAppearance = appearance
navigationBar.scrollEdgeAppearance = appearance
navigationBar.compactAppearance = appearance
}
}
```
### Navigation Bar Buttons
```swift
override func viewDidLoad() {
super.viewDidLoad()
navigationItem.rightBarButtonItem = UIBarButtonItem(
image: UIImage(systemName: "plus"),
style: .plain,
target: self,
action: #selector(addItem)
)
// Alternatively, supply multiple buttons (this replaces the single item set above)
navigationItem.rightBarButtonItems = [
UIBarButtonItem(systemItem: .add, primaryAction: UIAction { _ in }),
UIBarButtonItem(systemItem: .edit, primaryAction: UIAction { _ in })
]
}
```
## Modal Presentation
### Sheet Presentation
```swift
func presentEditor() {
let editorVC = EditorViewController()
let nav = UINavigationController(rootViewController: editorVC)
editorVC.navigationItem.leftBarButtonItem = UIBarButtonItem(
systemItem: .cancel, target: self, action: #selector(dismissEditor)
)
editorVC.navigationItem.rightBarButtonItem = UIBarButtonItem(
systemItem: .done, target: self, action: #selector(saveAndDismiss)
)
if let sheet = nav.sheetPresentationController {
sheet.detents = [.medium(), .large()]
sheet.prefersGrabberVisible = true
sheet.prefersScrollingExpandsWhenScrolledToEdge = false
}
present(nav, animated: true)
}
```
### Custom Detent (iOS 16+)
```swift
if let sheet = nav.sheetPresentationController {
let customDetent = UISheetPresentationController.Detent.custom { context in
return context.maximumDetentValue * 0.4
}
sheet.detents = [customDetent, .large()]
}
```
### Full Screen Presentation
```swift
func presentFullScreen() {
let vc = FullScreenViewController()
vc.modalPresentationStyle = .fullScreen
vc.modalTransitionStyle = .coverVertical
present(vc, animated: true)
}
```
## Presentation Styles
| Style | Usage |
|-------|-------|
| `.automatic` | System default (usually sheet) |
| `.pageSheet` | Card-style, parent view visible |
| `.fullScreen` | Full screen cover |
| `.overFullScreen` | Full screen with transparent background |
| `.popover` | iPad popover |
## Navigation Best Practices
1. **Back gesture** - Ensure edge swipe back always works
2. **State restoration** - Use `UIStateRestoring` to save navigation stack
3. **Depth limit** - Avoid more than 4-5 navigation levels
4. **Cancel button** - Modal views must provide a cancel option
5. **Save confirmation** - Show confirmation dialog for unsaved changes
---
*UIKit, SF Symbols, and Apple are trademarks of Apple Inc.*
# Swift Coding Standards
Best practices for writing clean, safe, and idiomatic Swift code following Apple's guidelines and modern Swift conventions.
---
## 1. Optionals and Safety
**Impact:** CRITICAL
Swift's optional system eliminates null pointer exceptions through compile-time safety.
### 1.1 Safe Unwrapping with if let
```swift
if let name = optionalName {
print("Hello, \(name)")
}
// Multiple bindings
if let name = userName, let age = userAge, age >= 18 {
print("\(name) is an adult")
}
```
### 1.2 Guard for Early Exit
Use `guard` to exit early when preconditions aren't met:
```swift
func processUser(_ user: User?) {
guard let user = user else { return }
guard !user.name.isEmpty else { return }
print(user.name)
}
```
### 1.3 Nil Coalescing for Defaults
```swift
let displayName = name ?? "Anonymous"
let count = items?.count ?? 0
```
### 1.4 Optional Chaining
```swift
let count = user?.profile?.posts?.count
let uppercased = optionalString?.uppercased()
```
### 1.5 Optional map/flatMap
```swift
let uppercasedName = userName.map { $0.uppercased() }
let userID = userIDString.flatMap { Int($0) }
```
### 1.6 Never Force Unwrap
Avoid `!` force unwrapping. Use safe alternatives:
| Instead of | Use |
|------------|-----|
| `value!` | `if let value = value { }` |
| `array[0]` (unsafe) | `array.first` |
| `dictionary["key"]!` | `dictionary["key", default: defaultValue]` |
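One widely used convenience (not part of the standard library) packages the bounds check from the table into a subscript:

```swift
extension Collection {
    /// The element at `index`, or nil when the index is out of bounds.
    subscript(safe index: Index) -> Element? {
        indices.contains(index) ? self[index] : nil
    }
}

let numbers = [10, 20, 30]
let second = numbers[safe: 1]   // Optional(20)
let missing = numbers[safe: 5]  // nil instead of a crash
```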
---
## 2. Naming Conventions
**Impact:** HIGH
### 2.1 Types: PascalCase
```swift
class UserProfileViewController { }
struct NetworkRequest { }
protocol DataSource { }
enum LoadingState { }
```
### 2.2 Variables and Functions: camelCase
```swift
var userName: String
let maximumRetryCount = 3
func fetchUserProfile() { }
```
### 2.3 Boolean Naming
Use `is`, `has`, `should`, `can` prefixes:
```swift
var isLoading: Bool
var hasCompletedOnboarding: Bool
var shouldShowAlert: Bool
var canEditProfile: Bool
```
### 2.4 Function Naming
Use verb phrases, read like natural English:
```swift
// Good - clear actions
func fetchUsers() async throws -> [User]
func remove(_ item: Item, at index: Int)
func makeIterator() -> Iterator
// Avoid - unclear or redundant
func getUsersData() // "get" is redundant
func doRemove() // vague
```
### 2.5 Parameter Labels
First parameter label can be omitted when obvious:
```swift
func insert(_ element: Element, at index: Int)
func move(from source: Int, to destination: Int)
```
---
## 3. Protocol-Oriented Design
**Impact:** HIGH
Swift favors composition over inheritance through protocols.
### 3.1 Define Capabilities Through Protocols
```swift
protocol DataStore {
func save<T: Codable>(_ item: T, key: String) throws
func load<T: Codable>(key: String) throws -> T?
}
protocol Drawable {
var color: Color { get set }
func draw()
}
```
### 3.2 Protocol Extensions for Default Behavior
```swift
extension Drawable {
func draw() {
print("Drawing with \(color)")
}
}
extension Collection {
func chunked(into size: Int) -> [[Element]] {
stride(from: 0, to: count, by: size).map {
Array(self[$0..<Swift.min($0 + size, count)])
}
}
}
```
### 3.3 Associated Types for Flexibility
```swift
protocol Repository {
associatedtype Item
func fetchAll() async throws -> [Item]
func save(_ item: Item) async throws
}
class UserRepository: Repository {
typealias Item = User
func fetchAll() async throws -> [User] { /* ... */ }
func save(_ item: User) async throws { /* ... */ }
}
```
### 3.4 Protocol Composition
```swift
protocol Named { var name: String { get } }
protocol Aged { var age: Int { get } }
func greet(_ person: Named & Aged) {
print("Hello, \(person.name), age \(person.age)")
}
```
---
## 4. Value Types vs Reference Types
**Impact:** HIGH
### 4.1 Prefer Structs (Value Types)
Use structs for simple data models, independent copies:
```swift
struct User {
var name: String
var email: String
}
struct Point {
var x: Double
var y: Double
}
```
### 4.2 Use Classes When Needed
Use classes for shared mutable state, identity matters:
```swift
class NetworkManager {
static let shared = NetworkManager()
private init() { }
}
class FileHandle {
// Wrapping system resource
}
```
### 4.3 Enums for Finite States
```swift
enum LoadingState {
case idle
case loading
case success(Data)
case failure(Error)
}
enum Result<Success, Failure: Error> {
case success(Success)
case failure(Failure)
}
```
| Type | Use When |
|------|----------|
| `struct` | Data models, coordinates, independent values |
| `class` | Shared state, identity matters, inheritance needed |
| `enum` | Finite set of options, state machines |
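The copy-versus-share distinction in the table can be demonstrated in a few lines (type names here are illustrative):

```swift
struct PointValue { var x: Int }
final class PointRef {
    var x: Int
    init(x: Int) { self.x = x }
}

var a = PointValue(x: 1)
var b = a      // value type: `b` is an independent copy
b.x = 99       // `a.x` is unchanged

let c = PointRef(x: 1)
let d = c      // reference type: `c` and `d` share one instance
d.x = 99       // `c.x` is now 99 too
```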
---
## 5. Memory Management with ARC
**Impact:** CRITICAL
### 5.1 Breaking Retain Cycles with weak
```swift
class Apartment {
weak var tenant: Person?
}
class Person {
var apartment: Apartment?
}
```
### 5.2 Closure Capture Lists
```swift
// Weak capture for optional self
onComplete = { [weak self] in
self?.processResult()
}
// Capture specific values
let id = user.id
fetchData { [id] result in
print("Fetched for \(id)")
}
```
### 5.3 unowned for Guaranteed Lifetime
Use when reference should never be nil during object lifetime:
```swift
class CreditCard {
unowned let customer: Customer
init(customer: Customer) {
self.customer = customer
}
}
```
| Keyword | Use When |
|---------|----------|
| `weak` | Reference may become nil |
| `unowned` | Reference guaranteed to outlive |
| None | Strong ownership needed |
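A short sketch (with a hypothetical `Session` type) showing a weak reference being zeroed once the last strong reference goes away:

```swift
final class Session { }

func weakIsZeroedAfterRelease() -> Bool {
    weak var weakRef: Session?
    do {
        let strong = Session()
        weakRef = strong
        precondition(weakRef != nil)  // alive while `strong` is in scope
    }
    return weakRef == nil             // ARC zeroed the weak reference
}
```

This is exactly why a `weak` property must be optional: the runtime may set it to nil at any point, while `unowned` trades that safety for non-optional access.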
---
## 6. Error Handling
**Impact:** HIGH
### 6.1 Define Typed Errors
```swift
enum NetworkError: Error {
case invalidURL
case noConnection
case serverError(statusCode: Int)
case decodingFailed(underlying: Error)
}
enum ValidationError: LocalizedError {
case emptyField(name: String)
case invalidFormat(field: String, expected: String)
var errorDescription: String? {
switch self {
case .emptyField(let name):
return "\(name) cannot be empty"
case .invalidFormat(let field, let expected):
return "\(field) must be \(expected)"
}
}
}
```
### 6.2 Throwing Functions
```swift
func fetchUser(id: Int) throws -> User {
guard let url = URL(string: "https://api.example.com/users/\(id)") else {
throw NetworkError.invalidURL
}
// ... implementation
}
```
### 6.3 Do-Catch Handling
```swift
do {
let user = try fetchUser(id: 123)
print(user.name)
} catch NetworkError.serverError(let code) {
print("Server error: \(code)")
} catch NetworkError.noConnection {
print("Check your internet connection")
} catch {
print("Unknown error: \(error)")
}
```
### 6.4 try? and try!
```swift
// try? returns optional (nil on error)
let user = try? fetchUser(id: 123)
// try! crashes on error - use only when failure is programmer error
let config = try! loadBundledConfig()
```
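The difference is easy to observe with a small throwing helper (`parseInt` here is a hypothetical example, not a standard API):

```swift
enum ParseError: Error { case notANumber }

func parseInt(_ s: String) throws -> Int {
    guard let n = Int(s) else { throw ParseError.notANumber }
    return n
}

let good = try? parseInt("42")    // Optional(42)
let bad = try? parseInt("oops")   // nil; the thrown error is discarded
```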
### 6.5 Rethrows
```swift
func perform<T>(_ operation: () throws -> T) rethrows -> T {
return try operation()
}
```
---
## 7. Modern Concurrency (async/await)
**Impact:** CRITICAL
### 7.1 Async Functions
```swift
func fetchUser(id: Int) async throws -> User {
guard let url = URL(string: "https://api.example.com/users/\(id)") else {
throw NetworkError.invalidURL
}
let (data, _) = try await URLSession.shared.data(from: url)
return try JSONDecoder().decode(User.self, from: data)
}
// Calling async functions
Task {
do {
let user = try await fetchUser(id: 123)
print(user.name)
} catch {
print("Failed: \(error)")
}
}
```
### 7.2 Parallel Execution with TaskGroup
```swift
func fetchAllUsers(ids: [Int]) async throws -> [User] {
try await withThrowingTaskGroup(of: User.self) { group in
for id in ids {
group.addTask {
try await fetchUser(id: id)
}
}
return try await group.reduce(into: []) { $0.append($1) }
}
}
```
### 7.3 async let for Concurrent Bindings
```swift
async let user = fetchUser(id: 1)
async let posts = fetchPosts(userId: 1)
async let followers = fetchFollowers(userId: 1)
let profile = try await ProfileData(
user: user,
posts: posts,
followers: followers
)
```
### 7.4 Actors for Thread-Safe State
```swift
actor BankAccount {
private var balance: Double = 0
func deposit(_ amount: Double) {
balance += amount
}
func withdraw(_ amount: Double) throws {
guard balance >= amount else {
throw BankError.insufficientFunds
}
balance -= amount
}
func getBalance() -> Double {
balance
}
}
// Usage
let account = BankAccount()
await account.deposit(100)
let balance = await account.getBalance()
```
### 7.5 MainActor for UI Updates
```swift
@MainActor
class ViewModel: ObservableObject {
@Published var isLoading = false
@Published var users: [User] = []
func loadUsers() async {
isLoading = true
defer { isLoading = false }
do {
users = try await fetchUsers()
} catch {
// Handle error
}
}
}
```
### 7.6 Task Cancellation
```swift
func fetchWithTimeout() async throws -> Data {
try await withThrowingTaskGroup(of: Data.self) { group in
group.addTask {
try await fetchData()
}
group.addTask {
try await Task.sleep(for: .seconds(10))
throw TimeoutError()
}
let result = try await group.next()!
group.cancelAll()
return result
}
}
// Check for cancellation
func longOperation() async throws {
for item in items {
try Task.checkCancellation()
await process(item)
}
}
```
---
## 8. Access Control
**Impact:** MEDIUM
### 8.1 Access Levels
| Level | Scope |
|-------|-------|
| `private` | Enclosing declaration only |
| `fileprivate` | Entire source file |
| `internal` | Module (default) |
| `public` | Other modules can access |
| `open` | Other modules can subclass/override |
### 8.2 Best Practices
```swift
public class UserService {
// Public API
public func fetchUser(id: Int) async throws -> User { }
// Internal helper
func buildRequest(for id: Int) -> URLRequest { }
// Private implementation detail
private let session: URLSession
private var cache: [Int: User] = [:]
}
```
### 8.3 Private Setters
```swift
public struct Counter {
public private(set) var count = 0
public mutating func increment() {
count += 1
}
}
```
---
## 9. Generics and Type Constraints
**Impact:** MEDIUM
### 9.1 Generic Functions
```swift
func swapValues<T>(_ a: inout T, _ b: inout T) {
let temp = a
a = b
b = temp
}
```
### 9.2 Type Constraints
```swift
func findIndex<T: Equatable>(of value: T, in array: [T]) -> Int? {
array.firstIndex(of: value)
}
func decode<T: Decodable>(_ type: T.Type, from data: Data) throws -> T {
try JSONDecoder().decode(type, from: data)
}
```
### 9.3 Where Clauses
```swift
func allEqual<C: Collection>(_ collection: C) -> Bool
where C.Element: Equatable {
    guard let first = collection.first else { return true }
    return collection.allSatisfy { $0 == first }
}
extension Array where Element: Numeric {
func sum() -> Element {
reduce(0, +)
}
}
```
### 9.4 Opaque Types (some)
```swift
func makeCollection() -> some Collection {
[1, 2, 3]
}
var body: some View {
Text("Hello")
}
```
---
## 10. Property Wrappers
**Impact:** MEDIUM
### 10.1 Common SwiftUI Property Wrappers
| Wrapper | Use Case |
|---------|----------|
| `@State` | View-local mutable state |
| `@Binding` | Two-way connection to parent state |
| `@StateObject` | View-owned observable object |
| `@ObservedObject` | Passed-in observable object |
| `@EnvironmentObject` | Shared object from ancestor |
| `@Environment` | System environment values |
| `@Published` | Observable property in class |
### 10.2 Custom Property Wrappers
```swift
@propertyWrapper
struct Clamped<Value: Comparable> {
private var value: Value
let range: ClosedRange<Value>
var wrappedValue: Value {
get { value }
set { value = min(max(newValue, range.lowerBound), range.upperBound) }
}
init(wrappedValue: Value, _ range: ClosedRange<Value>) {
self.range = range
self.value = min(max(wrappedValue, range.lowerBound), range.upperBound)
}
}
struct Settings {
@Clamped(0...100) var volume: Int = 50
}
```
---
## Quick Reference
### Optionals
```swift
if let x = optional { } // Safe unwrap
guard let x = optional else { return } // Early exit
let x = optional ?? default // Default value
optional?.method() // Optional chaining
optional.map { transform($0) } // Transform if present
```
### Common Patterns
```swift
// Defer for cleanup
func process() {
let file = openFile()
defer { closeFile(file) }
// ... work with file
}
// Lazy initialization
lazy var expensive: ExpensiveObject = {
ExpensiveObject()
}()
// Type inference
let numbers = [1, 2, 3] // [Int]
let doubled = numbers.map { $0 * 2 } // [Int]
```
### Closure Syntax
```swift
// Full syntax
let sorted = names.sorted(by: { (s1: String, s2: String) -> Bool in
return s1 < s2
})
// Shortened
let sorted = names.sorted { $0 < $1 }
// Trailing closure
UIView.animate(withDuration: 0.3) {
view.alpha = 0
}
```
---
## Checklist
### Safety
- [ ] No force unwrapping (`!`) except for IB outlets and known-safe cases
- [ ] All optionals handled with `if let`, `guard let`, or `??`
- [ ] No implicitly unwrapped optionals (`!`) in data models
### Memory
- [ ] Closures use `[weak self]` when capturing self in escaping closures
- [ ] Delegate properties are `weak`
- [ ] No retain cycles between objects
### Concurrency
- [ ] Async functions used instead of completion handlers
- [ ] Actors protect shared mutable state
- [ ] UI updates on `@MainActor`
- [ ] Task cancellation checked in long operations
### Access Control
- [ ] `private` used for implementation details
- [ ] `public` API is minimal and intentional
- [ ] No unnecessary `internal` exposure
### Naming
- [ ] Types use PascalCase
- [ ] Functions and variables use camelCase
- [ ] Booleans have `is`/`has`/`should` prefix
- [ ] Functions read like natural English
---
*Swift and Apple are trademarks of Apple Inc.*
# System Integration
iOS system integration guide covering permissions, location, sharing, app lifecycle, and haptic feedback.
## Permission Requests
Request permissions contextually, not at launch:
```swift
import AVFoundation
@objc func openCamera() {
AVCaptureDevice.requestAccess(for: .video) { [weak self] granted in
DispatchQueue.main.async {
if granted {
self?.showCameraInterface()
} else {
self?.showPermissionDeniedAlert()
}
}
}
}
```
### Photo Library
```swift
import Photos
func requestPhotoAccess() {
PHPhotoLibrary.requestAuthorization(for: .readWrite) { status in
DispatchQueue.main.async {
switch status {
case .authorized, .limited:
self.showPhotoPicker()
case .denied, .restricted:
self.showSettingsAlert()
default:
break
}
}
}
}
```
### Microphone
```swift
func requestMicrophoneAccess() {
AVAudioSession.sharedInstance().requestRecordPermission { granted in
DispatchQueue.main.async {
if granted {
self.startRecording()
}
}
}
}
```
### Notifications
```swift
import UserNotifications
func requestNotificationPermission() {
UNUserNotificationCenter.current().requestAuthorization(
options: [.alert, .badge, .sound]
) { granted, error in
DispatchQueue.main.async {
if granted {
self.registerForRemoteNotifications()
}
}
}
}
```
## Location Button
For one-time location access without persistent permission:
```swift
import CoreLocationUI
class StoreFinderVC: UIViewController {
override func viewDidLoad() {
super.viewDidLoad()
let locationBtn = CLLocationButton()
locationBtn.icon = .arrowFilled
locationBtn.label = .currentLocation
locationBtn.cornerRadius = 20
locationBtn.addTarget(self, action: #selector(findNearby), for: .touchUpInside)
view.addSubview(locationBtn)
locationBtn.snp.makeConstraints { make in
make.centerX.equalToSuperview()
make.bottom.equalTo(view.safeAreaLayoutGuide).offset(-24)
}
}
}
```
### Core Location
```swift
import CoreLocation
class LocationManager: NSObject, CLLocationManagerDelegate {
private let manager = CLLocationManager()
func requestLocation() {
manager.delegate = self
manager.desiredAccuracy = kCLLocationAccuracyBest
manager.requestWhenInUseAuthorization()
}
func locationManagerDidChangeAuthorization(_ manager: CLLocationManager) {
switch manager.authorizationStatus {
case .authorizedWhenInUse, .authorizedAlways:
manager.requestLocation()
case .denied:
showLocationDeniedAlert()
default:
break
}
}
func locationManager(_ manager: CLLocationManager, didUpdateLocations locations: [CLLocation]) {
guard let location = locations.last else { return }
handleLocation(location)
}
}
```
## Share Sheet
```swift
@objc func shareContent() {
let items = ([contentURL, contentImage] as [Any?]).compactMap { $0 }
let activityVC = UIActivityViewController(activityItems: items, applicationActivities: nil)
if let popover = activityVC.popoverPresentationController {
popover.sourceView = shareButton
popover.sourceRect = shareButton.bounds
}
present(activityVC, animated: true)
}
```
### Custom Share Items
```swift
class ShareItem: NSObject, UIActivityItemSource {
let title: String
let url: URL
init(title: String, url: URL) {
self.title = title
self.url = url
}
func activityViewControllerPlaceholderItem(_ activityViewController: UIActivityViewController) -> Any {
return url
}
func activityViewController(_ activityViewController: UIActivityViewController, itemForActivityType activityType: UIActivity.ActivityType?) -> Any? {
return url
}
func activityViewController(_ activityViewController: UIActivityViewController, subjectForActivityType activityType: UIActivity.ActivityType?) -> String {
return title
}
}
```
### Excluding Activities
```swift
let activityVC = UIActivityViewController(activityItems: items, applicationActivities: nil)
activityVC.excludedActivityTypes = [
.addToReadingList,
.assignToContact,
.print
]
```
## App Lifecycle
```swift
class PlayerViewController: UIViewController {
override func viewDidLoad() {
super.viewDidLoad()
NotificationCenter.default.addObserver(
self, selector: #selector(onBackground),
name: UIApplication.didEnterBackgroundNotification, object: nil
)
NotificationCenter.default.addObserver(
self, selector: #selector(onForeground),
name: UIApplication.willEnterForegroundNotification, object: nil
)
NotificationCenter.default.addObserver(
self, selector: #selector(onTerminate),
name: UIApplication.willTerminateNotification, object: nil
)
}
@objc private func onBackground() {
saveState()
pausePlayback()
}
@objc private func onForeground() {
restoreState()
resumePlayback()
}
@objc private func onTerminate() {
saveState()
}
}
```
### Scene Lifecycle (iOS 13+)
```swift
class SceneDelegate: UIResponder, UIWindowSceneDelegate {
func sceneDidBecomeActive(_ scene: UIScene) {
// Resume tasks
}
func sceneWillResignActive(_ scene: UIScene) {
// Pause tasks
}
func sceneDidEnterBackground(_ scene: UIScene) {
// Save state
}
func sceneWillEnterForeground(_ scene: UIScene) {
// Prepare UI
}
}
```
### State Preservation
```swift
class ViewController: UIViewController {
override func encodeRestorableState(with coder: NSCoder) {
super.encodeRestorableState(with: coder)
coder.encode(currentItemID, forKey: "currentItemID")
}
override func decodeRestorableState(with coder: NSCoder) {
super.decodeRestorableState(with: coder)
if let itemID = coder.decodeObject(forKey: "currentItemID") as? String {
loadItem(itemID)
}
}
}
```
## Haptic Feedback
```swift
func onTaskComplete() {
UINotificationFeedbackGenerator().notificationOccurred(.success)
}
func onError() {
UINotificationFeedbackGenerator().notificationOccurred(.error)
}
func onWarning() {
UINotificationFeedbackGenerator().notificationOccurred(.warning)
}
func onSelection() {
UISelectionFeedbackGenerator().selectionChanged()
}
func onImpact() {
UIImpactFeedbackGenerator(style: .medium).impactOccurred()
}
```
### Impact Styles
| Style | Usage |
|-------|-------|
| `.light` | Subtle feedback, small UI changes |
| `.medium` | Standard feedback, button presses |
| `.heavy` | Strong feedback, significant actions |
| `.soft` | Gentle feedback, background changes |
| `.rigid` | Sharp feedback, collisions |
### Prepared Feedback
For time-critical haptics, prepare the generator in advance:
```swift
class DraggableView: UIView {
private let impactGenerator = UIImpactFeedbackGenerator(style: .medium)
override func touchesBegan(_ touches: Set<UITouch>, with event: UIEvent?) {
super.touchesBegan(touches, with: event)
impactGenerator.prepare()
}
func didSnapToPosition() {
impactGenerator.impactOccurred()
}
}
```
## Deep Linking
### URL Schemes
```swift
// In AppDelegate or SceneDelegate
func application(_ app: UIApplication, open url: URL, options: [UIApplication.OpenURLOptionsKey: Any] = [:]) -> Bool {
guard let components = URLComponents(url: url, resolvingAgainstBaseURL: true) else {
return false
}
switch components.host {
case "item":
if let itemID = components.queryItems?.first(where: { $0.name == "id" })?.value {
navigateToItem(itemID)
return true
}
default:
break
}
return false
}
```
### Universal Links
```swift
func application(_ application: UIApplication, continue userActivity: NSUserActivity, restorationHandler: @escaping ([UIUserActivityRestoring]?) -> Void) -> Bool {
guard userActivity.activityType == NSUserActivityTypeBrowsingWeb,
let url = userActivity.webpageURL else {
return false
}
return handleUniversalLink(url)
}
```
## Background Tasks
```swift
import BackgroundTasks
func registerBackgroundTasks() {
BGTaskScheduler.shared.register(
forTaskWithIdentifier: "com.app.refresh",
using: nil
) { task in
self.handleAppRefresh(task: task as! BGAppRefreshTask)
}
}
func scheduleAppRefresh() {
let request = BGAppRefreshTaskRequest(identifier: "com.app.refresh")
request.earliestBeginDate = Date(timeIntervalSinceNow: 15 * 60)
do {
try BGTaskScheduler.shared.submit(request)
} catch {
print("Could not schedule app refresh: \(error)")
}
}
func handleAppRefresh(task: BGAppRefreshTask) {
scheduleAppRefresh()
let operation = RefreshOperation()
task.expirationHandler = {
operation.cancel()
}
operation.completionBlock = {
task.setTaskCompleted(success: !operation.isCancelled)
}
OperationQueue.main.addOperation(operation)
}
```
---
*UIKit, Core Location, and Apple are trademarks of Apple Inc.*
# UIKit Components
Common UIKit components guide covering UIStackView, buttons, alerts, search, and context menus.
## UIStackView
Stack views simplify auto layout for linear arrangements:
```swift
class FormViewController: UIViewController {
private let mainStack = UIStackView()
override func viewDidLoad() {
super.viewDidLoad()
mainStack.axis = .vertical
mainStack.spacing = 16
mainStack.alignment = .fill
mainStack.distribution = .fill
view.addSubview(mainStack)
mainStack.snp.makeConstraints { make in
make.top.equalTo(view.safeAreaLayoutGuide).offset(20)
make.leading.trailing.equalToSuperview().inset(16)
}
let headerStack = UIStackView()
headerStack.axis = .horizontal
headerStack.spacing = 12
headerStack.alignment = .center
let avatarView = UIImageView()
avatarView.snp.makeConstraints { make in
make.size.equalTo(48)
}
let labelStack = UIStackView()
labelStack.axis = .vertical
labelStack.spacing = 4
labelStack.addArrangedSubview(titleLabel)
labelStack.addArrangedSubview(subtitleLabel)
headerStack.addArrangedSubview(avatarView)
headerStack.addArrangedSubview(labelStack)
mainStack.addArrangedSubview(headerStack)
mainStack.addArrangedSubview(contentView)
mainStack.addArrangedSubview(actionButton)
mainStack.setCustomSpacing(24, after: headerStack)
}
}
```
### StackView Properties
| Property | Options | Usage |
|----------|---------|-------|
| `axis` | `.horizontal`, `.vertical` | Layout direction |
| `distribution` | `.fill`, `.fillEqually`, `.fillProportionally`, `.equalSpacing`, `.equalCentering` | Space distribution |
| `alignment` | `.fill`, `.leading`, `.center`, `.trailing` | Cross-axis alignment |
| `spacing` | CGFloat | Uniform spacing |
| `setCustomSpacing(_:after:)` | - | Variable spacing |
## UIButton.Configuration (iOS 15+)
```swift
let primaryButton = UIButton(type: .system)
primaryButton.configuration = .filled()
primaryButton.setTitle("Continue", for: .normal)
let secondaryButton = UIButton(type: .system)
secondaryButton.configuration = .tinted()
secondaryButton.setTitle("Save for Later", for: .normal)
let destructiveButton = UIButton(type: .system)
destructiveButton.configuration = .plain()
destructiveButton.setTitle("Remove", for: .normal)
destructiveButton.tintColor = .systemRed
```
### Custom Button Configuration
```swift
var config = UIButton.Configuration.filled()
config.title = "Add to Cart"
config.image = UIImage(systemName: "cart.badge.plus")
config.imagePadding = 8
config.cornerStyle = .capsule
config.baseBackgroundColor = .systemBlue
config.baseForegroundColor = .white
let cartButton = UIButton(configuration: config)
```
### Button State Handling
```swift
var config = UIButton.Configuration.filled()
config.titleTextAttributesTransformer = UIConfigurationTextAttributesTransformer { incoming in
var outgoing = incoming
outgoing.font = .boldSystemFont(ofSize: 16)
return outgoing
}
config.configurationUpdateHandler = { button in
var config = button.configuration
config?.showsActivityIndicator = button.isSelected
button.configuration = config
}
```
## UIAlertController
### Alert
```swift
func confirmDeletion() {
let alert = UIAlertController(
title: "Remove Item?",
message: "This cannot be undone.",
preferredStyle: .alert
)
alert.addAction(UIAlertAction(title: "Remove", style: .destructive) { _ in
self.performDeletion()
})
alert.addAction(UIAlertAction(title: "Cancel", style: .cancel))
present(alert, animated: true)
}
```
### Action Sheet
```swift
func showOptions() {
let sheet = UIAlertController(title: nil, message: nil, preferredStyle: .actionSheet)
sheet.addAction(UIAlertAction(title: "Share", style: .default) { _ in })
sheet.addAction(UIAlertAction(title: "Edit", style: .default) { _ in })
sheet.addAction(UIAlertAction(title: "Delete", style: .destructive) { _ in })
sheet.addAction(UIAlertAction(title: "Cancel", style: .cancel))
if let popover = sheet.popoverPresentationController {
popover.sourceView = optionsButton
popover.sourceRect = optionsButton.bounds
}
present(sheet, animated: true)
}
```
### Alert with Text Field
```swift
func showInputAlert() {
let alert = UIAlertController(
title: "Rename",
message: "Enter a new name",
preferredStyle: .alert
)
alert.addTextField { textField in
textField.placeholder = "Name"
textField.autocapitalizationType = .words
}
alert.addAction(UIAlertAction(title: "Save", style: .default) { _ in
if let name = alert.textFields?.first?.text {
self.rename(to: name)
}
})
alert.addAction(UIAlertAction(title: "Cancel", style: .cancel))
present(alert, animated: true)
}
```
## UISearchController
```swift
class SearchableListVC: UIViewController, UISearchResultsUpdating {
private let searchController = UISearchController(searchResultsController: nil)
private var allItems: [Item] = []
override func viewDidLoad() {
super.viewDidLoad()
setupSearch()
}
private func setupSearch() {
searchController.searchResultsUpdater = self
searchController.obscuresBackgroundDuringPresentation = false
searchController.searchBar.placeholder = "Search"
navigationItem.searchController = searchController
definesPresentationContext = true
}
func updateSearchResults(for searchController: UISearchController) {
let query = searchController.searchBar.text ?? ""
let filtered = query.isEmpty ? allItems : allItems.filter {
$0.title.localizedCaseInsensitiveContains(query)
}
updateItems(filtered)
}
}
```
### Search Bar Configuration
```swift
searchController.searchBar.scopeButtonTitles = ["All", "Recent", "Favorites"]
searchController.searchBar.showsScopeBar = true
searchController.searchBar.delegate = self
extension SearchableListVC: UISearchBarDelegate {
func searchBar(_ searchBar: UISearchBar, selectedScopeButtonIndexDidChange selectedScope: Int) {
filterContent(scope: selectedScope)
}
}
```
## UIContextMenuInteraction
```swift
extension PhotoCell: UIContextMenuInteractionDelegate {
func contextMenuInteraction(
_ interaction: UIContextMenuInteraction,
configurationForMenuAtLocation location: CGPoint
) -> UIContextMenuConfiguration? {
UIContextMenuConfiguration(identifier: nil, previewProvider: nil) { _ in
let share = UIAction(
title: "Share",
image: UIImage(systemName: "square.and.arrow.up")
) { _ in }
let favorite = UIAction(
title: "Favorite",
image: UIImage(systemName: "heart")
) { _ in }
let delete = UIAction(
title: "Delete",
image: UIImage(systemName: "trash"),
attributes: .destructive
) { _ in }
return UIMenu(children: [share, favorite, delete])
}
}
}
```
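Conforming to the delegate has no effect until the interaction is attached to the view. A minimal sketch (the initializer shown here is illustrative, assuming `PhotoCell` owns its own interaction):

```swift
class PhotoCell: UICollectionViewCell {
    override init(frame: CGRect) {
        super.init(frame: frame)
        // Without this, contextMenuInteraction(_:configurationForMenuAtLocation:)
        // is never called
        addInteraction(UIContextMenuInteraction(delegate: self))
    }

    required init?(coder: NSCoder) {
        fatalError("init(coder:) has not been implemented")
    }
}
```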
### Context Menu with Preview
```swift
func contextMenuInteraction(
_ interaction: UIContextMenuInteraction,
configurationForMenuAtLocation location: CGPoint
) -> UIContextMenuConfiguration? {
UIContextMenuConfiguration(
identifier: itemID as NSCopying,
previewProvider: { [weak self] in
return self?.makePreviewController()
},
        actionProvider: { [weak self] _ in
            self?.makeMenu()
        }
)
}
func contextMenuInteraction(
_ interaction: UIContextMenuInteraction,
willPerformPreviewActionForMenuWith configuration: UIContextMenuConfiguration,
animator: UIContextMenuInteractionCommitAnimating
) {
animator.addCompletion {
self.showDetail()
}
}
```
### CollectionView Context Menu
```swift
func collectionView(
_ collectionView: UICollectionView,
contextMenuConfigurationForItemAt indexPath: IndexPath,
point: CGPoint
) -> UIContextMenuConfiguration? {
let item = dataSource.itemIdentifier(for: indexPath)
return UIContextMenuConfiguration(identifier: indexPath as NSCopying, previewProvider: nil) { _ in
return self.makeMenu(for: item)
}
}
```
---
*UIKit and Apple are trademarks of Apple Inc.*

skills/shader-dev/SKILL.md

@@ -0,0 +1,299 @@
---
name: shader-dev
description: Comprehensive GLSL shader techniques for creating stunning visual effects — ray marching, SDF modeling, fluid simulation, particle systems, procedural generation, lighting, post-processing, and more.
license: MIT
metadata:
version: "1.0"
category: graphics
---
# Shader Craft
A unified skill covering 36 GLSL shader techniques (ShaderToy-compatible) for real-time visual effects.
## Invocation
```
/shader-dev <request>
```
`$ARGUMENTS` contains the user's request (e.g. "create a raymarched SDF scene with soft shadows").
## Skill Structure
```
shader-dev/
├── SKILL.md # Core skill (this file)
├── techniques/ # Implementation guides (read per routing table)
│ ├── ray-marching.md # Sphere tracing with SDF
│ ├── sdf-3d.md # 3D signed distance functions
│ ├── lighting-model.md # PBR, Phong, toon shading
│ ├── procedural-noise.md # Perlin, Simplex, FBM
│   └── ...                   # 32 more technique files
└── reference/ # Detailed guides (read as needed)
├── ray-marching.md # Math derivations & advanced patterns
├── sdf-3d.md # Extended SDF theory
├── lighting-model.md # Lighting math deep-dive
├── procedural-noise.md # Noise function theory
    └── ...                   # 32 more reference files
```
## How to Use
1. Read the **Technique Routing Table** below to identify which technique(s) match the user's request
2. Read the relevant file(s) from `techniques/` — each file contains core principles, implementation steps, and complete code templates
3. If you need deeper understanding (math derivations, advanced patterns), follow the reference link at the bottom of each technique file to `reference/`
4. Apply the **WebGL2 Adaptation Rules** below when generating standalone HTML pages
## Technique Routing Table
| User wants to create... | Primary technique | Combine with |
|---|---|---|
| 3D objects / scenes from math | [ray-marching](techniques/ray-marching.md) + [sdf-3d](techniques/sdf-3d.md) | lighting-model, shadow-techniques |
| Complex 3D shapes (booleans, blends) | [csg-boolean-operations](techniques/csg-boolean-operations.md) | sdf-3d, ray-marching |
| Infinite repeating patterns in 3D | [domain-repetition](techniques/domain-repetition.md) | sdf-3d, ray-marching |
| Organic / warped shapes | [domain-warping](techniques/domain-warping.md) | procedural-noise |
| Fluid / smoke / ink effects | [fluid-simulation](techniques/fluid-simulation.md) | multipass-buffer |
| Particle effects (fire, sparks, snow) | [particle-system](techniques/particle-system.md) | procedural-noise, color-palette |
| Physically-based simulations | [simulation-physics](techniques/simulation-physics.md) | multipass-buffer |
| Game of Life / reaction-diffusion | [cellular-automata](techniques/cellular-automata.md) | multipass-buffer, color-palette |
| Ocean / water surface | [water-ocean](techniques/water-ocean.md) | atmospheric-scattering, lighting-model |
| Terrain / landscape | [terrain-rendering](techniques/terrain-rendering.md) | atmospheric-scattering, procedural-noise |
| Clouds / fog / volumetric fire | [volumetric-rendering](techniques/volumetric-rendering.md) | procedural-noise, atmospheric-scattering |
| Sky / sunset / atmosphere | [atmospheric-scattering](techniques/atmospheric-scattering.md) | volumetric-rendering |
| Realistic lighting (PBR, Phong) | [lighting-model](techniques/lighting-model.md) | shadow-techniques, ambient-occlusion |
| Shadows (soft / hard) | [shadow-techniques](techniques/shadow-techniques.md) | lighting-model |
| Ambient occlusion | [ambient-occlusion](techniques/ambient-occlusion.md) | lighting-model, normal-estimation |
| Path tracing / global illumination | [path-tracing-gi](techniques/path-tracing-gi.md) | analytic-ray-tracing, multipass-buffer |
| Precise ray-geometry intersections | [analytic-ray-tracing](techniques/analytic-ray-tracing.md) | lighting-model |
| Voxel worlds (Minecraft-style) | [voxel-rendering](techniques/voxel-rendering.md) | lighting-model, shadow-techniques |
| Noise / FBM textures | [procedural-noise](techniques/procedural-noise.md) | domain-warping |
| Tiled 2D patterns | [procedural-2d-pattern](techniques/procedural-2d-pattern.md) | polar-uv-manipulation |
| Voronoi / cell patterns | [voronoi-cellular-noise](techniques/voronoi-cellular-noise.md) | color-palette |
| Fractals (Mandelbrot, Julia, 3D) | [fractal-rendering](techniques/fractal-rendering.md) | color-palette, polar-uv-manipulation |
| Color grading / palettes | [color-palette](techniques/color-palette.md) | — |
| Bloom / tone mapping / glitch | [post-processing](techniques/post-processing.md) | multipass-buffer |
| Multi-pass ping-pong buffers | [multipass-buffer](techniques/multipass-buffer.md) | — |
| Texture / sampling techniques | [texture-sampling](techniques/texture-sampling.md) | — |
| Camera / matrix transforms | [matrix-transform](techniques/matrix-transform.md) | — |
| Surface normals | [normal-estimation](techniques/normal-estimation.md) | — |
| Polar coords / kaleidoscope | [polar-uv-manipulation](techniques/polar-uv-manipulation.md) | procedural-2d-pattern |
| 2D shapes / UI from SDF | [sdf-2d](techniques/sdf-2d.md) | color-palette |
| Procedural audio / music | [sound-synthesis](techniques/sound-synthesis.md) | — |
| SDF tricks / optimization | [sdf-tricks](techniques/sdf-tricks.md) | sdf-3d, ray-marching |
| Anti-aliased rendering | [anti-aliasing](techniques/anti-aliasing.md) | sdf-2d, post-processing |
| Depth of field / motion blur / lens effects | [camera-effects](techniques/camera-effects.md) | post-processing, multipass-buffer |
| Advanced texture mapping / no-tile textures | [texture-mapping-advanced](techniques/texture-mapping-advanced.md) | terrain-rendering, texture-sampling |
| WebGL2 shader errors / debugging | [webgl-pitfalls](techniques/webgl-pitfalls.md) | — |
## Technique Index
### Geometry & SDF
- **sdf-2d** — 2D signed distance functions for shapes, UI, anti-aliased rendering
- **sdf-3d** — 3D signed distance functions for real-time implicit surface modeling
- **csg-boolean-operations** — Constructive solid geometry: union, subtraction, intersection with smooth blending
- **domain-repetition** — Infinite space repetition, folding, and limited tiling
- **domain-warping** — Distort domains with noise for organic, flowing shapes
- **sdf-tricks** — SDF optimization, bounding volumes, binary search refinement, hollowing, layered edges, debug visualization
### Ray Casting & Lighting
- **ray-marching** — Sphere tracing with SDF for 3D scene rendering
- **analytic-ray-tracing** — Closed-form ray-primitive intersections (sphere, plane, box, torus)
- **path-tracing-gi** — Monte Carlo path tracing for photorealistic global illumination
- **lighting-model** — Phong, Blinn-Phong, PBR (Cook-Torrance), and toon shading
- **shadow-techniques** — Hard shadows, soft shadows (penumbra estimation), cascade shadows
- **ambient-occlusion** — SDF-based AO, screen-space AO approximation
- **normal-estimation** — Finite-difference normals, tetrahedron technique
### Simulation & Physics
- **fluid-simulation** — Navier-Stokes fluid solver with advection, diffusion, pressure projection
- **simulation-physics** — GPU-based physics: springs, cloth, N-body gravity, collision
- **particle-system** — Stateless and stateful particle systems (fire, rain, sparks, galaxies)
- **cellular-automata** — Game of Life, reaction-diffusion (Turing patterns), sand simulation
### Natural Phenomena
- **water-ocean** — Gerstner waves, FFT ocean, caustics, underwater fog
- **terrain-rendering** — Heightfield ray marching, FBM terrain, erosion
- **atmospheric-scattering** — Rayleigh/Mie scattering, god rays, SSS approximation
- **volumetric-rendering** — Volume ray marching for clouds, fog, fire, explosions
### Procedural Generation
- **procedural-noise** — Value noise, Perlin, Simplex, Worley, FBM, ridged noise
- **procedural-2d-pattern** — Brick, hexagon, truchet, Islamic geometric patterns
- **voronoi-cellular-noise** — Voronoi diagrams, Worley noise, cracked earth, crystal
- **fractal-rendering** — Mandelbrot, Julia sets, 3D fractals (Mandelbox, Mandelbulb)
- **color-palette** — Cosine palettes, HSL/HSV/Oklab, dynamic color mapping
### Post-Processing & Infrastructure
- **post-processing** — Bloom, tone mapping (ACES, Reinhard), vignette, chromatic aberration, glitch
- **multipass-buffer** — Ping-pong FBO setup, state persistence across frames
- **texture-sampling** — Bilinear, bicubic, mipmap, procedural texture lookup
- **matrix-transform** — Camera look-at, projection, rotation, orbit controls
- **polar-uv-manipulation** — Polar/log-polar coordinates, kaleidoscope, spiral mapping
- **anti-aliasing** — SSAA, SDF analytical AA, temporal anti-aliasing (TAA), FXAA post-process
- **camera-effects** — Depth of field (thin lens), motion blur, lens distortion, film grain, vignette
- **texture-mapping-advanced** — Biplanar mapping, texture repetition avoidance, ray differential filtering
### Audio
- **sound-synthesis** — Procedural audio in GLSL: oscillators, envelopes, filters, FM synthesis
### Debugging & Validation
- **webgl-pitfalls** — Common WebGL2/GLSL errors: `fragCoord`, `main()` wrapper, function order, macro limitations, uniform null
## WebGL2 Adaptation Rules
All technique files use ShaderToy GLSL style. When generating standalone HTML pages, apply these adaptations:
### Shader Version & Output
- Use `canvas.getContext("webgl2")`
- Shader first line: `#version 300 es`, fragment shader adds `precision highp float;`
- Fragment shader must declare: `out vec4 fragColor;`
- Vertex shader: `attribute` → `in`, `varying` → `out`
- Fragment shader: `varying` → `in`, `gl_FragColor` → `fragColor`, `texture2D()` → `texture()`
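Applying these rules, a minimal WebGL2 fragment shader skeleton (uniform names follow the ShaderToy convention used throughout this file) looks like:

```glsl
#version 300 es
precision highp float;

uniform float iTime;
uniform vec3  iResolution;

out vec4 fragColor;   // replaces gl_FragColor

void main() {
    vec2 uv = gl_FragCoord.xy / iResolution.xy;
    fragColor = vec4(uv, 0.5 + 0.5 * sin(iTime), 1.0);
}
```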
### Fragment Coordinate
- **Use `gl_FragCoord.xy`** instead of `fragCoord` (WebGL2 does not have `fragCoord` built-in)
```glsl
// WRONG
vec2 uv = (2.0 * fragCoord - iResolution.xy) / iResolution.y;
// CORRECT
vec2 uv = (2.0 * gl_FragCoord.xy - iResolution.xy) / iResolution.y;
```
### main() Wrapper for ShaderToy Templates
- ShaderToy uses `void mainImage(out vec4 fragColor, in vec2 fragCoord)`
- WebGL2 requires standard `void main()` entry point — always wrap mainImage:
```glsl
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
// shader code...
fragColor = vec4(col, 1.0);
}
void main() {
mainImage(fragColor, gl_FragCoord.xy);
}
```
### Function Declaration Order
- GLSL requires functions to be declared before use — add a forward declaration or reorder definitions so callees come first:
```glsl
// WRONG — getAtmosphere() calls getSunDirection() before it's defined
vec3 getAtmosphere(vec3 dir) { return getSunDirection(); } // Error!
vec3 getSunDirection() { return normalize(vec3(1.0)); }
// CORRECT — define callee first
vec3 getSunDirection() { return normalize(vec3(1.0)); }
vec3 getAtmosphere(vec3 dir) { return getSunDirection(); } // Works
```
### Macro Limitations
- `#define` macros that expand to function calls are unreliable in GLSL ES — precompute the value into a `const` instead:
```glsl
// WRONG
#define SUN_DIR normalize(vec3(0.8, 0.4, -0.6))
// CORRECT
const vec3 SUN_DIR = vec3(0.743, 0.371, -0.557); // Pre-computed normalized value
```
### Script Tag Extraction
- When extracting shader source from `<script>` tags, ensure `#version` is the **first character** — use `.trim()`:
```javascript
const fs = document.getElementById('fs').text.trim();
```
### Common Pitfalls
- **Unused uniforms**: Compiler may optimize away unused uniforms, causing `gl.getUniformLocation()` to return `null` — always use uniforms in a way the compiler cannot optimize out
- **Loop bounds**: Use literal or `const` loop bounds rather than `#define` macros, which fail in some ES versions
- **Terrain functions**: Functions like `terrainM(vec2)` need XZ components — use `terrainM(pos.xz + offset)` not `terrainM(pos + offset)`
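A defensive helper for the unused-uniform pitfall, sketched below with an assumed name: it skips the upload when the compiler has stripped the uniform, instead of passing `null` into `gl.uniform1f`.

```javascript
// Returns true if the uniform existed and was set, false if it was
// optimized away (gl.getUniformLocation returned null) or misspelled.
function setFloatUniform(gl, program, name, value) {
  const loc = gl.getUniformLocation(program, name);
  if (loc === null) return false;
  gl.uniform1f(loc, value);
  return true;
}
```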
## HTML Page Setup
When generating a standalone HTML page:
- Canvas fills the entire viewport, auto-resizes on window resize
- Page background black, no scrollbars: `body { margin: 0; overflow: hidden; background: #000; }`
- Implement ShaderToy-compatible uniforms: `iTime`, `iResolution`, `iMouse`, `iFrame`
- For multi-pass effects (Buffer A/B), use WebGL2 framebuffer + ping-pong (see multipass-buffer technique)
## Common Pitfalls
### JS Variable Declaration Order (TDZ — causes white screen crash)
`let`/`const` variables must be declared at the **top** of the `<script>` block, before any function that references them:
```javascript
// 1. State variables FIRST
let frameCount = 0;
let startTime = Date.now();
// 2. Canvas/GL init, shader compile, FBO creation
const canvas = document.getElementById('canvas');
const gl = canvas.getContext('webgl2');
// ...
// 3. Functions and event bindings LAST
function resize() { /* can now safely reference frameCount */ }
function render() { /* ... */ }
window.addEventListener('resize', resize);
```
Reason: `let`/`const` have a Temporal Dead Zone — referencing them before declaration throws `ReferenceError`, causing a white screen.
### GLSL Compilation Errors (self-check after writing shaders)
- **Function signature mismatch**: Call must exactly match definition in parameter count and types. If defined as `float fbm(vec3 p)`, cannot call `fbm(uv)` with a `vec2`
- **Reserved words as variable names**: Do not use: `patch`, `cast`, `sample`, `filter`, `input`, `output`, `common`, `partition`, `active`
- **Strict type matching**: `vec3 x = 1.0` is illegal — use `vec3 x = vec3(1.0)`; cannot use `.z` to access a `vec2`
- **No ternary on structs**: ESSL does not allow ternary operator on struct types — use `if`/`else` instead
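For the struct-ternary rule, a sketch of the `if`/`else` replacement (the `Hit` struct and `closest` name are illustrative):

```glsl
struct Hit { float t; vec3 col; };

// ESSL rejects `a.t < b.t ? a : b` on struct types — branch instead
Hit closest(Hit a, Hit b) {
    if (a.t < b.t) return a;
    return b;
}
```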
### Performance Budget
Deployment environments may use headless software rendering with limited GPU power. Stay within these limits:
- Ray marching main loop: ≤ 128 steps
- Volume sampling / lighting inner loops: ≤ 32 steps
- FBM octaves: ≤ 6 layers
- Total nested loop iterations per pixel: ≤ 1000 (exceeding this freezes the browser)
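A quick way to audit the last limit is to multiply the nested loop counts per pixel; this sketch stays under the ceiling:

```glsl
// 64 march steps × 8 light samples = 512 iterations/pixel (≤ 1000)
for (int i = 0; i < 64; i++) {
    // ... advance the primary ray ...
    for (int j = 0; j < 8; j++) {
        // ... sample lighting along a short secondary ray ...
    }
}
```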
## Quick Recipes
Common effect combinations — complete rendering pipelines assembled from technique modules.
### Photorealistic SDF Scene
1. **Geometry**: sdf-3d (extended primitives) + csg-boolean-operations (cubic/quartic smin)
2. **Rendering**: ray-marching + normal-estimation (tetrahedron method)
3. **Lighting**: lighting-model (outdoor three-light model) + shadow-techniques (improved soft shadow) + ambient-occlusion
4. **Atmosphere**: atmospheric-scattering (height-based fog with sun tint)
5. **Post**: post-processing (ACES tone mapping) + anti-aliasing (2x SSAA) + camera-effects (vignette)
### Organic / Biological Forms
1. **Geometry**: sdf-3d (extended primitives + deformation operators: twist, bend) + csg-boolean (gradient-aware smin for material blending)
2. **Detail**: procedural-noise (FBM with derivatives) + domain-warping
3. **Surface**: lighting-model (subsurface scattering approximation via half-Lambert)
### Procedural Landscape
1. **Terrain**: terrain-rendering + procedural-noise (erosion FBM with derivatives)
2. **Texturing**: texture-mapping-advanced (biplanar mapping + no-tile)
3. **Sky**: atmospheric-scattering (Rayleigh/Mie + height fog)
4. **Water**: water-ocean (Gerstner waves) + lighting-model (Fresnel reflections)
### Stylized 2D Art
1. **Shapes**: sdf-2d (extended library) + sdf-tricks (layered edges, hollowing)
2. **Color**: color-palette (cosine palettes) + polar-uv-manipulation (kaleidoscope)
3. **Polish**: anti-aliasing (SDF analytical AA) + post-processing (bloom, chromatic aberration)
## Shader Debugging Techniques
Visual debugging methods — temporarily replace your output to diagnose issues.
| What to check | Code | What to look for |
|---|---|---|
| Surface normals | `col = nor * 0.5 + 0.5;` | Smooth gradients = correct normals; banding = epsilon too large |
| Ray march step count | `col = vec3(float(steps) / float(MAX_STEPS));` | Red hotspots = performance bottleneck; uniform = wasted iterations |
| Depth / distance | `col = vec3(t / MAX_DIST);` | Verify correct hit distances |
| UV coordinates | `col = vec3(uv, 0.0);` | Check coordinate mapping |
| SDF distance field | `col = (d > 0.0 ? vec3(0.9,0.6,0.3) : vec3(0.4,0.7,0.85)) * (0.8 + 0.2*cos(150.0*d));` | Visualize SDF bands and zero-crossing |
| Checker pattern (UV) | `col = vec3(mod(floor(uv.x*10.)+floor(uv.y*10.), 2.0));` | Verify UV distortion, seams |
| Lighting only | `col = vec3(shadow);` or `col = vec3(ao);` | Isolate shadow/AO contributions |
| Material ID | `col = palette(matId / maxMatId);` | Verify material assignment |


@@ -0,0 +1,382 @@
# SDF Ambient Occlusion — Detailed Reference
This document is a detailed supplement to [SKILL.md](SKILL.md), containing a complete step-by-step tutorial, mathematical derivations, variant analysis, and advanced usage.
## Prerequisites
- GLSL basic syntax (uniform, varying, function definitions)
- **Signed Distance Field (SDF)** concept: `map(p)` returns the distance from point p to the nearest surface
- **Raymarching** basic loop: marching along a ray to find surface intersections
- **Surface normal computation**: Obtaining the normal direction via SDF gradient (finite differences)
- Vector math fundamentals: dot product, normalization, vector addition/subtraction
## Core Principles in Detail
The core idea of SDF ambient occlusion: **Sample the SDF at multiple distances along the surface normal and compare the "expected distance" with the "actual distance" to estimate the degree of occlusion.**
For a point P on the surface with normal N, at distance h:
- **Expected distance** = h (if the surroundings are completely open, the SDF value should equal the distance to the surface)
- **Actual distance** = map(P + N × h) (real SDF value)
- **Occlusion contribution** = h - map(P + N × h) (the larger the difference, the more nearby geometry is occluding)
The final result is a weighted sum of occlusion contributions from multiple sample points, yielding a [0, 1] occlusion factor:
- 1.0 = no occlusion (bright)
- 0.0 = fully occluded (dark corner)
Key mathematical formula (additive accumulation form):
```
AO = 1 - k × Σ(weight_i × max(0, h_i - map(P + N × h_i)))
```
Where `weight_i` typically decays exponentially or geometrically (closer samples have higher weight), and `k` is a global intensity coefficient.
## Implementation Steps in Detail
### Step 1: Build the Base SDF Scene
**What**: Define a `map()` function that returns the signed distance value for any point in space.
**Why**: AO computation relies entirely on SDF queries, so a working distance field is needed first.
```glsl
float map(vec3 p) {
float d = p.y; // Ground plane
d = min(d, length(p - vec3(0.0, 1.0, 0.0)) - 1.0); // Sphere
d = min(d, length(vec2(length(p.xz) - 1.5, p.y - 0.5)) - 0.4); // Torus
return d;
}
```
### Step 2: Compute Surface Normal
**What**: Compute the normal direction via finite difference approximation of the SDF gradient.
**Why**: AO sampling probes outward along the normal direction; the normal determines the sampling direction.
```glsl
vec3 calcNormal(vec3 p) {
vec2 e = vec2(0.001, 0.0);
return normalize(vec3(
map(p + e.xyy) - map(p - e.xyy),
map(p + e.yxy) - map(p - e.yxy),
map(p + e.yyx) - map(p - e.yyx)
));
}
```
### Step 3: Implement Classic Normal-Direction AO (Additive Accumulation)
**What**: Sample the SDF at 5 distances along the normal direction, accumulating occlusion.
**Why**: This classic method is the most concise and efficient SDF-AO implementation. Five samples strike a good balance between quality and performance. The weight decays geometrically by a factor of 0.95 per sample, giving closer samples more influence (near-surface occlusion is more perceptually important).
```glsl
// Classic AO
float calcAO(vec3 pos, vec3 nor) {
float occ = 0.0;
float sca = 1.0; // Initial weight
for (int i = 0; i < 5; i++) {
float h = 0.01 + 0.12 * float(i) / 4.0; // Sample distance: 0.01 ~ 0.13
float d = map(pos + h * nor); // Actual SDF distance
occ += (h - d) * sca; // Accumulate (expected - actual) × weight
sca *= 0.95; // Weight decay
}
return clamp(1.0 - 3.0 * occ, 0.0, 1.0);
}
```
### Step 4: Apply AO to Lighting
**What**: Multiply the AO factor into ambient and indirect light components.
**Why**: AO simulates the degree to which indirect light is occluded. Physically, it should only affect ambient/indirect light, not the direct light source's diffuse and specular (direct light occlusion is handled by shadows). However, in practice AO is often multiplied into all lighting for a stronger visual effect.
```glsl
float ao = calcAO(pos, nor);
// Method A: Affect only ambient light (physically correct)
vec3 ambient = vec3(0.2, 0.3, 0.5) * ao;
vec3 color = diffuse * shadow + ambient;
// Method B: Affect all lighting (stronger visual effect)
vec3 color = (diffuse * shadow + ambient) * ao;
// Method C: Combined with sky visibility bias
float skyVis = 0.5 + 0.5 * nor.y; // Upward-facing surfaces are brighter
vec3 color = diffuse * shadow + ambient * ao * skyVis;
```
### Step 5: Raymarching Main Loop Integration
**What**: Integrate AO into the complete raymarching pipeline.
**Why**: AO is part of the lighting computation and needs to be calculated after hitting a surface but before final output.
```glsl
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
// ... camera setup, ray generation ...
// Raymarching loop
float t = 0.0;
for (int i = 0; i < 128; i++) {
vec3 p = ro + rd * t;
float d = map(p);
if (d < 0.001) break;
t += d;
if (t > 100.0) break;
}
// Compute lighting on hit
vec3 col = vec3(0.0);
if (t < 100.0) {
vec3 pos = ro + rd * t;
vec3 nor = calcNormal(pos);
float ao = calcAO(pos, nor);
// Lighting
vec3 lig = normalize(vec3(1.0, 0.8, -0.6));
float dif = clamp(dot(nor, lig), 0.0, 1.0);
float sky = 0.5 + 0.5 * nor.y;
col = vec3(1.0) * dif + vec3(0.2, 0.3, 0.5) * sky * ao;
}
fragColor = vec4(col, 1.0);
}
```
## Variant Details
### Variant 1: Multiplicative AO
**Difference from base version**: Starts at 1.0 and progressively multiplies down, rather than using additive accumulation then inverting. The multiplicative form naturally guarantees the result stays in [0, 1], avoids the need for clamping, and provides more natural falloff for multiple overlapping occlusions.
**Source**: Multiplicative accumulation approach
```glsl
// Multiplicative AO
float calcAO_multiplicative(vec3 pos, vec3 nor) {
float ao = 1.0;
float dist = 0.0;
for (int i = 0; i <= 5; i++) {
dist += 0.1; // Uniform step of 0.1
float d = map(pos + nor * dist);
ao *= 1.0 - max(0.0, (dist - d) * 0.2 / dist);
}
return ao;
}
```
### Variant 2: Multi-Scale AO
**Difference from base version**: Exponentially increases sampling distances (0.1, 0.2, 0.4, 0.8, 1.6, 3.2, 6.4), computing short-range and long-range occlusion separately. Short-range AO reveals contact shadows and surface detail; long-range AO reveals large-scale environmental occlusion. Fully unrolled with no loops, making it GPU-efficient.
**Source**: Multi-scale sampling approach
```glsl
// Multi-scale AO
float calcAO_multiscale(vec3 pos, vec3 nor) {
// Short-range AO (contact shadows)
float aoS = 1.0;
aoS *= clamp(map(pos + nor * 0.1) * 10.0, 0.0, 1.0); // Adjustable: distance 0.1, weight 10.0
aoS *= clamp(map(pos + nor * 0.2) * 5.0, 0.0, 1.0); // Adjustable: distance 0.2, weight 5.0
aoS *= clamp(map(pos + nor * 0.4) * 2.5, 0.0, 1.0); // Adjustable: distance 0.4, weight 2.5
aoS *= clamp(map(pos + nor * 0.8) * 1.25, 0.0, 1.0); // Adjustable: distance 0.8, weight 1.25
// Long-range AO (large-scale occlusion)
float ao = aoS;
ao *= clamp(map(pos + nor * 1.6) * 0.625, 0.0, 1.0); // Adjustable: distance 1.6
ao *= clamp(map(pos + nor * 3.2) * 0.3125, 0.0, 1.0); // Adjustable: distance 3.2
    ao *= clamp(map(pos + nor * 6.4) * 0.15625, 0.0, 1.0); // Adjustable: distance 6.4
return max(0.035, pow(ao, 0.3)); // pow compresses dynamic range, min prevents total black
}
```
### Variant 3: Jittered Sampling AO
**Difference from base version**: Adds hash-based jitter on top of uniform sample positions, breaking the banding artifacts caused by fixed sample spacing. Also uses a `1/(1+l)` distance-decay weight so farther samples have less influence.
**Source**: Jittered sampling approach
```glsl
// Jittered sampling AO
float hash(float n) { return fract(sin(n) * 43758.5453); }
float calcAO_jittered(vec3 pos, vec3 nor, float maxDist) {
float ao = 0.0;
const float nbIte = 6.0; // Adjustable: number of samples
for (float i = 1.0; i < nbIte + 0.5; i++) {
float l = (i + hash(i)) * 0.5 / nbIte * maxDist; // Jittered sample position
ao += (l - map(pos + nor * l)) / (1.0 + l); // Distance-decay weight
}
return clamp(1.0 - ao / nbIte, 0.0, 1.0);
}
// Usage example: calcAO_jittered(pos, nor, 4.0)
```
### Variant 4: Hemispherical Random Direction AO
**Difference from base version**: Instead of sampling only along the normal direction, generates multiple random directions within the normal hemisphere. Closer to the true physical model of ambient occlusion (light arriving from all directions in the hemisphere), but requires more samples (typically 32) for smooth results.
**Source**: Hemispherical random direction approach
```glsl
// Hemispherical random direction AO
vec2 hash2(float n) {
return fract(sin(vec2(n, n + 1.0)) * vec2(43758.5453, 22578.1459));
}
float calcAO_hemisphere(vec3 pos, vec3 nor, float seed) {
float occ = 0.0;
for (int i = 0; i < 32; i++) { // Adjustable: sample count (16~64)
float h = 0.01 + 4.0 * pow(float(i) / 31.0, 2.0); // Quadratic distribution biased toward near-field
vec2 an = hash2(seed + float(i) * 13.1) * vec2(3.14159, 6.2831); // Random spherical coordinates
vec3 dir = vec3(sin(an.x) * sin(an.y), sin(an.x) * cos(an.y), cos(an.x));
dir *= sign(dot(dir, nor)); // Flip to normal hemisphere
occ += clamp(5.0 * map(pos + h * dir) / h, -1.0, 1.0); // Signed occlusion
}
return clamp(occ / 32.0, 0.0, 1.0);
}
```
### Variant 5: Fibonacci Sphere Uniform Hemisphere AO
**Difference from base version**: Uses Fibonacci sphere points instead of random directions, achieving quasi-uniform hemisphere sampling distribution. Avoids the clustering problem of pure random sampling, yielding higher quality at the same sample count. Can also be paired with a separate directional occlusion function (e.g., SSS/soft shadow) for multi-level occlusion.
**Source**: Fibonacci sphere sampling approach
```glsl
// Fibonacci sphere sampling AO
vec3 forwardSF(float i, float n) {
const float PI = 3.141592653589793;
const float PHI = 1.618033988749895;
float phi = 2.0 * PI * fract(i / PHI);
float zi = 1.0 - (2.0 * i + 1.0) / n;
float sinTheta = sqrt(1.0 - zi * zi);
return vec3(cos(phi) * sinTheta, sin(phi) * sinTheta, zi);
}
float hash1(float n) { return fract(sin(n) * 43758.5453); }
float calcAO_fibonacci(vec3 pos, vec3 nor) {
float ao = 0.0;
for (int i = 0; i < 32; i++) { // Adjustable: sample count
vec3 ap = forwardSF(float(i), 32.0);
float h = hash1(float(i));
ap *= sign(dot(ap, nor)) * h * 0.1; // Flip to hemisphere + random scale
ao += clamp(map(pos + nor * 0.01 + ap) * 3.0, 0.0, 1.0);
}
ao /= 32.0;
return clamp(ao * 6.0, 0.0, 1.0);
}
```
## Performance Optimization Details
### Bottleneck Analysis
The performance bottleneck of SDF-AO lies almost entirely in **SDF sample count** — each `map()` call is a full scene distance computation. For complex scenes, this can be very expensive.
### Optimization Techniques
#### 1. Reduce Sample Count
Classic normal-direction AO only needs 3~5 samples for acceptable quality. Hemispherical sampling is more physically correct but requires 16~32 samples; use it when the performance budget allows.
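For example, a trimmed 3-sample variant of the Step 3 loop (a sketch; `calcAO_cheap` is an illustrative name and it reuses `map` from Step 1):

```glsl
// 3-sample AO — cheaper variant of calcAO for tight budgets
float calcAO_cheap(vec3 pos, vec3 nor) {
    float occ = 0.0;
    float sca = 1.0;
    for (int i = 0; i < 3; i++) {
        float h = 0.02 + 0.10 * float(i);        // Sample distances: 0.02, 0.12, 0.22
        occ += (h - map(pos + h * nor)) * sca;   // Same expected-vs-actual test
        sca *= 0.9;                              // Weight decay
    }
    return clamp(1.0 - 3.0 * occ, 0.0, 1.0);
}
```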
#### 2. Early Exit Optimization
Exit the loop early when accumulated occlusion is already large enough, avoiding unnecessary SDF computations.
```glsl
if (occ > 0.35) break; // Early exit when heavily occluded
```
#### 3. Unroll Loops
For fixed sample counts (especially 4~7), manually unrolling loops avoids branch overhead and is GPU-friendly. The multi-scale AO variant fully unrolls 7 samples.
#### 4. Simplify AO for Distant Objects
Objects far from the camera can use fewer AO samples or skip AO entirely.
```glsl
float aoSteps = mix(5.0, 2.0, clamp(t / 50.0, 0.0, 1.0));
```
#### 5. Precompilation Switches
Use `#ifdef` to disable AO in debug or low-performance modes.
```glsl
#ifdef ENABLE_AMBIENT_OCCLUSION
float ao = calcAO(pos, nor);
#else
float ao = 1.0;
#endif
```
#### 6. Hand-Painted Pseudo-AO Blending
For static or semi-static scenes, pseudo-AO values (based on material ID or position) can be precomputed and blended with real-time AO to reduce runtime computation.
```glsl
float focc = /* preset occlusion based on material */;
float finalAO = calcAO(pos, nor) * focc;
```
#### 7. SDF Simplification
A simplified version of `map()` (ignoring small details) can be used for AO sampling, since AO is inherently low-frequency information.
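For example, a coarse field used only for AO queries — a sketch mirroring the Step 1 scene with the torus detail dropped (`mapAO` is an illustrative name):

```glsl
// Coarse field for AO only — small features contribute little to AO
float mapAO(vec3 p) {
    float d = p.y;                                      // Ground plane
    d = min(d, length(p - vec3(0.0, 1.0, 0.0)) - 1.0);  // Sphere
    return d;                                           // Torus omitted
}
// Inside calcAO, call mapAO() instead of map()
```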
## Combination Suggestions in Detail
### 1. AO + Soft Shadow
The most common combination. AO handles indirect light occlusion (corners, crevices); soft shadows handle direct light occlusion. Simply multiply the two:
```glsl
float sha = calcShadow(pos, lightDir, 0.02, 20.0, 8.0);
float ao = calcAO(pos, nor);
col = diffuse * sha + ambient * ao; // Each handles its own domain
// Or more simply:
col = lighting * sha * ao;
```
### 2. AO + Sky Visibility
Use the normal's y component to estimate how much the surface faces upward, then multiply with AO to simulate sky-light occlusion:
```glsl
float skyVis = 0.5 + 0.5 * nor.y;
col += skyColor * ao * skyVis;
```
### 3. AO + Subsurface Scattering / Bounce Light
AO can modulate bounce light and SSS intensity (occluded areas also don't receive bounce light):
```glsl
float bou = clamp(-nor.y, 0.0, 1.0); // Downward-facing surfaces receive ground bounce
col += bounceColor * bou * ao;
col += sssColor * sss * (0.05 + 0.95 * ao); // SSS also modulated by AO
```
### 4. AO + Convexity / Corner Detection
The same SDF probing loop can sample both outward (+N) and inward (-N), yielding AO and convexity information respectively, useful for edge highlights or wear effects:
```glsl
vec2 aoAndCorner = getOcclusion(pos, nor); // .x = AO, .y = convexity
col *= aoAndCorner.x; // AO darkening
col = mix(col, edgeColor, aoAndCorner.y); // Convexity coloring
```
### 5. AO + Fresnel Environment Reflection
AO should also modulate the environment reflection term; otherwise concave areas will show unnatural bright environment reflections:
```glsl
float fre = pow(clamp(1.0 + dot(rd, nor), 0.0, 1.0), 5.0); // rd points toward the surface, so 1 + dot ≈ 0 at normal incidence
col += envColor * fre * ao; // Reduce environment reflection in occluded areas
```

# Analytic Ray Tracing - Detailed Reference
This document is a detailed supplement to [SKILL.md](SKILL.md), covering prerequisite knowledge, step-by-step tutorial, mathematical derivations, and advanced usage.
## Prerequisites
- **Vector math fundamentals**: Dot product `dot()`, cross product `cross()`, vector normalization `normalize()`
- **Quadratic equation solving**: Discriminant `b²-4ac`, meaning of the two roots
- **Ray parametric representation**: `P(t) = ro + t * rd`, where `ro` is the ray origin, `rd` is the direction, `t` is the distance
- **GLSL fundamentals**: `struct`, `inout` parameters, `vec3`/`vec4` operations
- **ShaderToy framework**: `mainImage()` function, `iResolution`, `iTime`, and other uniforms
## Use Cases (Complete List)
- When rendering scenes composed of geometric primitives (spheres, planes, boxes, cylinders, tori, etc.)
- When precise surface intersection points, normals, and distances are needed (no iterative approximation required)
- When efficient ray intersection is needed in real-time rendering (several times faster than ray marching)
- Building the underlying geometric engine for ray tracers and path tracers
- Creating visualization effects for hard-surface modeling (jewelry, mechanical parts, chess scenes, etc.)
- Scenes requiring precise shadows, reflections, and refractions (analytic solutions have no sampling error)
## Core Principles in Detail
The core idea of analytic ray tracing is: substitute the ray equation `P(t) = O + tD` into the implicit equation of the geometric body, obtaining an algebraic equation in `t`, then solve it using closed-form formulas.
### Unified Framework
All analytic intersection functions follow the same pattern:
1. **Set up equation**: Substitute the ray parametric form into the geometry's implicit equation
2. **Simplify and solve**: Use algebraic identities to reduce to a standard form (quadratic/quartic equation)
3. **Discriminant check**: Discriminant < 0 indicates no intersection
4. **Select nearest intersection**: Take the smallest positive root satisfying distance constraints
5. **Compute normal**: Evaluate the gradient of the implicit equation at the intersection point
### Key Mathematical Formulas
**Sphere** `|P-C|² = r²` → quadratic equation: `t² + 2bt + c = 0`
**Plane** `N·P + d = 0` → linear equation: `t = -(N·O + d) / (N·D)`
**Box** Intersection of three pairs of parallel planes → Slab Method: `tN = max(t1.x, t1.y, t1.z), tF = min(t2.x, t2.y, t2.z)`
**Ellipsoid** `|P/R|² = 1` → sphere intersection in scaled space
**Torus** `(|P_xy| - R)² + P_z² = r²` → quartic equation, solved via resolvent cubic
## Implementation Steps in Detail
### Step 1: Ray Generation
**What**: Generate a ray from the camera position through each pixel.
**Why**: This is the starting point of ray tracing. Each pixel corresponds to a ray from the camera through the near plane. The standard approach is to construct a camera coordinate system (right, up, forward) and map normalized screen coordinates to world-space directions.
```glsl
// Construct camera ray
vec3 generateRay(vec2 fragCoord, vec2 resolution, vec3 ro, vec3 ta) {
vec2 p = (2.0 * fragCoord - resolution) / resolution.y;
// Build camera coordinate system
vec3 cw = normalize(ta - ro); // forward
vec3 cu = normalize(cross(cw, vec3(0, 1, 0))); // right
vec3 cv = cross(cu, cw); // up
float fov = 1.5; // Adjustable: field of view control (larger = narrower angle)
vec3 rd = normalize(p.x * cu + p.y * cv + fov * cw);
return rd;
}
```
### Step 2: Ray-Sphere Intersection
**What**: Compute the exact intersection of a ray with a sphere. This is the most fundamental and commonly used intersection function.
**Why**: Substituting the ray `P = O + tD` into the sphere equation `|P - C|² = r²` and expanding yields a quadratic equation in `t`. The discriminant `h = b² - c` determines the number of intersections (0, 1, or 2); the smallest positive root is the nearest intersection.
This is a ubiquitous technique, with two common variants:
**Code (optimized version, assumes sphere centered at origin)**:
```glsl
// Ray-sphere intersection (optimized version for sphere at origin)
// ro: ray origin (sphere center offset already subtracted)
// rd: ray direction (must be normalized)
// r: sphere radius
// Returns: intersection distance, MAX_DIST if no intersection
float iSphere(vec3 ro, vec3 rd, vec2 distBound, inout vec3 normal, float r) {
float b = dot(ro, rd);
float c = dot(ro, ro) - r * r;
float h = b * b - c; // Discriminant/4 (half-b form; a = 1 since rd is normalized)
if (h < 0.0) return MAX_DIST; // No intersection
h = sqrt(h);
float d1 = -b - h; // Near intersection
float d2 = -b + h; // Far intersection
// Select the nearest intersection within valid range
if (d1 >= distBound.x && d1 <= distBound.y) {
normal = normalize(ro + rd * d1);
return d1;
} else if (d2 >= distBound.x && d2 <= distBound.y) {
normal = normalize(ro + rd * d2);
return d2;
}
return MAX_DIST;
}
```
**Code (general version, arbitrary sphere center)**:
```glsl
// Ray-sphere intersection (general version, supports arbitrary sphere center)
// sph: vec4(center.xyz, radius)
float sphIntersect(vec3 ro, vec3 rd, vec4 sph) {
vec3 oc = ro - sph.xyz;
float b = dot(oc, rd);
float c = dot(oc, oc) - sph.w * sph.w;
float h = b * b - c;
if (h < 0.0) return -1.0;
return -b - sqrt(h); // Returns only the near intersection
}
```
### Step 3: Ray-Plane Intersection
**What**: Compute the intersection of a ray with an infinite plane.
**Why**: The plane equation `N·P + d = 0` substituted with the ray yields a linear equation, solved directly by division. This is the simplest intersection primitive, commonly used for floors, walls, Cornell Boxes, etc. Note: when `N·D ≈ 0`, the ray is parallel to the plane.
```glsl
// Ray-plane intersection
// planeNormal: plane normal (must be normalized)
// planeDist: distance from plane to origin (N·P + planeDist = 0)
float iPlane(vec3 ro, vec3 rd, vec2 distBound, inout vec3 normal,
vec3 planeNormal, float planeDist) {
float denom = dot(rd, planeNormal);
// Only intersects when ray hits the front face of the plane
if (denom > 0.0) return MAX_DIST;
float d = -(dot(ro, planeNormal) + planeDist) / denom;
if (d < distBound.x || d > distBound.y) return MAX_DIST;
normal = planeNormal;
return d;
}
// Quick version: horizontal ground plane (y-axis aligned)
float iGroundPlane(vec3 ro, vec3 rd, float height) {
return -(ro.y - height) / rd.y;
}
```
### Step 4: Ray-Box Intersection (Slab Method)
**What**: Compute the intersection of a ray with an axis-aligned bounding box (AABB).
**Why**: The Slab Method treats the box as the intersection of three pairs of parallel planes. It computes the ray's intersection with each pair of planes `(tmin, tmax)`, then takes the maximum of all `tmin` values and the minimum of all `tmax` values. If `tN > tF` or `tF < 0`, there is no intersection. The normal is determined by which face was hit first.
```glsl
// Ray-box intersection (Slab Method, optimized version)
// boxSize: box half-size vec3(halfW, halfH, halfD)
float iBox(vec3 ro, vec3 rd, vec2 distBound, inout vec3 normal, vec3 boxSize) {
vec3 m = sign(rd) / max(abs(rd), 1e-8); // Avoid division by zero
vec3 n = m * ro;
vec3 k = abs(m) * boxSize;
vec3 t1 = -n - k; // Near plane intersections
vec3 t2 = -n + k; // Far plane intersections
float tN = max(max(t1.x, t1.y), t1.z); // Entry distance into the box
float tF = min(min(t2.x, t2.y), t2.z); // Exit distance from the box
if (tN > tF || tF <= 0.0) return MAX_DIST; // No intersection
if (tN >= distBound.x && tN <= distBound.y) {
// Normal: determine which face was hit
normal = -sign(rd) * step(t1.yzx, t1.xyz) * step(t1.zxy, t1.xyz);
return tN;
} else if (tF >= distBound.x && tF <= distBound.y) {
normal = -sign(rd) * step(t2, vec3(tF)); // Exit face: compare against t2, not t1
return tF;
}
return MAX_DIST;
}
```
### Step 5: Ray-Ellipsoid Intersection
**What**: Compute the intersection of a ray with an ellipsoid.
**Why**: An ellipsoid can be viewed as a sphere scaled differently along each axis. By dividing both the ray origin and direction by the ellipsoid radii `R`, a sphere intersection is performed in scaled space, then the normal is transformed back to the original space. This "space transformation" technique is one of the core ideas of analytic intersection.
```glsl
// Ray-ellipsoid intersection
// rad: vec3(rx, ry, rz) three-axis radii
float iEllipsoid(vec3 ro, vec3 rd, vec2 distBound, inout vec3 normal, vec3 rad) {
// Transform to unit sphere space
vec3 ocn = ro / rad;
vec3 rdn = rd / rad;
float a = dot(rdn, rdn);
float b = dot(ocn, rdn);
float c = dot(ocn, ocn);
float h = b * b - a * (c - 1.0);
if (h < 0.0) return MAX_DIST;
float d = (-b - sqrt(h)) / a;
if (d < distBound.x || d > distBound.y) return MAX_DIST;
// Normal in original space: gradient of implicit equation |P/R|²=1 → P/(R²)
normal = normalize((ro + d * rd) / (rad * rad)); // Gradient of |P/R|² = 1 is P/R², then normalized
return d;
}
```
### Step 6: Ray-Cylinder Intersection
**What**: Compute the intersection of a ray with a finite cylinder (with end caps).
**Why**: Cylinder intersection has two parts: (1) project the problem onto a plane perpendicular to the axis, solving a quadratic equation for side surface intersections; (2) check if the intersection is within the finite length, and if not, test the end cap planes.
```glsl
// Ray-capped cylinder intersection
// pa, pb: two endpoints of the cylinder axis
// ra: cylinder radius
float iCylinder(vec3 ro, vec3 rd, vec2 distBound, inout vec3 normal,
vec3 pa, vec3 pb, float ra) {
vec3 ca = pb - pa; // Cylinder axis vector
vec3 oc = ro - pa;
float caca = dot(ca, ca);
float card = dot(ca, rd);
float caoc = dot(ca, oc);
// Project onto plane perpendicular to axis, build quadratic equation
float a = caca - card * card;
float b = caca * dot(oc, rd) - caoc * card;
float c = caca * dot(oc, oc) - caoc * caoc - ra * ra * caca;
float h = b * b - a * c;
if (h < 0.0) return MAX_DIST;
h = sqrt(h);
float d = (-b - h) / a;
// Check if side intersection is within finite length
float y = caoc + d * card;
if (y > 0.0 && y < caca && d >= distBound.x && d <= distBound.y) {
normal = (oc + d * rd - ca * y / caca) / ra;
return d;
}
// Test end caps
d = ((y < 0.0 ? 0.0 : caca) - caoc) / card;
if (abs(b + a * d) < h && d >= distBound.x && d <= distBound.y) {
normal = normalize(ca * sign(y) / caca);
return d;
}
return MAX_DIST;
}
```
### Step 7: Scene Intersection & Shading
**What**: Traverse all objects in the scene, find the nearest intersection, and compute lighting.
**Why**: Scene traversal in analytic ray tracing is linear — each ray tests all objects sequentially. Through the unified intersection API (`distBound` parameter), each time a nearer intersection is found, the search range is automatically shortened, achieving implicit culling.
```glsl
#define MAX_DIST 1e10
// Unified scene intersection function
// Returns vec3(lower distance bound, nearest hit distance, material ID)
vec3 worldHit(vec3 ro, vec3 rd, vec2 dist, out vec3 normal) {
vec3 d = vec3(dist, 0.0); // (distBound.x, distBound.y, matID)
vec3 tmpNormal;
// Ground plane
float t = iPlane(ro, rd, d.xy, normal, vec3(0, 1, 0), 0.0);
if (t < d.y) { d.y = t; d.z = 1.0; }
// Sphere
t = iSphere(ro - vec3(0, 0.5, 0), rd, d.xy, tmpNormal, 0.5);
if (t < d.y) { d.y = t; d.z = 2.0; normal = tmpNormal; }
// Box
t = iBox(ro - vec3(2, 0.5, 0), rd, d.xy, tmpNormal, vec3(0.5));
if (t < d.y) { d.y = t; d.z = 3.0; normal = tmpNormal; }
return d;
}
// Basic shading (Lambertian diffuse + hemisphere ambient)
vec3 shade(vec3 pos, vec3 normal, vec3 rd, vec3 albedo) {
vec3 lightDir = normalize(vec3(-1.0, 0.75, 1.0));
// Diffuse
float diff = max(dot(normal, lightDir), 0.0);
// Ambient
float amb = 0.5 + 0.5 * normal.y;
return albedo * (amb * 0.2 + diff * 0.8);
}
```
### Step 8: Reflection & Refraction
**What**: Implement multiple reflection/refraction bounces iteratively rather than recursively.
**Why**: GLSL does not support recursion, so loops are used to simulate multiple bounces. At each bounce, the intersection point plus offset (epsilon) serves as the new ray origin, with the reflected/refracted direction as the new direction. The Fresnel term determines the energy distribution between reflection and refraction.
```glsl
#define MAX_BOUNCES 4 // Adjustable: number of reflection bounces (more = more realistic but slower)
#define EPSILON 0.001 // Adjustable: self-intersection offset
// Schlick Fresnel approximation
float schlickFresnel(float cosTheta, float F0) {
return F0 + (1.0 - F0) * pow(1.0 - cosTheta, 5.0);
}
vec3 radiance(vec3 ro, vec3 rd) {
vec3 color = vec3(0.0);
vec3 mask = vec3(1.0);
vec3 normal;
for (int i = 0; i < MAX_BOUNCES; i++) {
vec3 res = worldHit(ro, rd, vec2(EPSILON, MAX_DIST), normal);
if (res.z < 0.5) {
// No object hit → sky color
color += mask * vec3(0.6, 0.8, 1.0);
break;
}
vec3 hitPos = ro + rd * res.y;
vec3 albedo = getAlbedo(res.z); // getAlbedo: user-defined material-ID-to-color lookup
// Fresnel reflection coefficient
float F = schlickFresnel(max(0.0, dot(normal, -rd)), 0.04);
// Add diffuse contribution
color += mask * (1.0 - F) * shade(hitPos, normal, rd, albedo);
// Update mask and ray (reflection)
mask *= F * albedo;
rd = reflect(rd, normal);
ro = hitPos + EPSILON * rd;
}
return color;
}
```
## Complete Code Template
For a complete runnable ShaderToy template, see the "Complete Code Template" section in [SKILL.md](SKILL.md), which includes sphere, plane, and box primitives with support for reflections and Blinn-Phong shading.
The following table describes the adjustable parameters in the template:
| Parameter | Default | Description |
|-----------|---------|-------------|
| `MAX_DIST` | `1e10` | Maximum trace distance |
| `EPSILON` | `0.001` | Self-intersection offset |
| `MAX_BOUNCES` | `4` | Maximum number of reflections |
| `NUM_SPHERES` | `3` | Number of spheres |
| `FOV` | `1.5` | Field of view (larger = narrower angle) |
| `GAMMA` | `2.2` | Gamma correction value |
| `SHADOW_ENABLED` | `true` | Whether shadows are enabled |
## Variant Details
### Variant 1: Path Tracing
Difference from base version: Replaces deterministic reflection with random hemisphere sampling to achieve global illumination. Requires multi-frame accumulation and random number generation.
Key code:
```glsl
// Cosine-weighted random hemisphere direction
vec3 cosWeightedRandomHemisphereDirection(vec3 n, inout float seed) {
vec2 r = hash2(seed);
vec3 uu = normalize(cross(n, abs(n.y) > 0.5 ? vec3(1,0,0) : vec3(0,1,0)));
vec3 vv = cross(uu, n);
float ra = sqrt(r.y);
float rx = ra * cos(6.2831 * r.x);
float ry = ra * sin(6.2831 * r.x);
float rz = sqrt(1.0 - r.y);
return normalize(rx * uu + ry * vv + rz * n);
}
// Replace reflect in the bounce loop:
rd = cosWeightedRandomHemisphereDirection(normal, seed);
ro = hitPos + EPSILON * rd;
mask *= mat.albedo; // No Fresnel weighting
```
### Variant 2: Analytical Soft Shadow
Difference from base version: Uses the analytical distance from a sphere to the ray to compute soft shadow gradients, without additional sampling.
Key code:
```glsl
// Sphere soft shadow
float sphSoftShadow(vec3 ro, vec3 rd, vec4 sph) {
vec3 oc = ro - sph.xyz;
float b = dot(oc, rd);
float c = dot(oc, oc) - sph.w * sph.w;
float h = b * b - c;
// d: closest distance from ray to sphere surface, t: distance along ray
float d = sqrt(max(0.0, sph.w * sph.w - h)) - sph.w;
float t = -b - sqrt(max(h, 0.0));
return (t > 0.0) ? max(d, 0.0) / t : 1.0;
}
```
### Variant 3: Analytical Antialiasing
Difference from base version: Uses the analytical distance from a sphere to the ray to compute pixel coverage, achieving edge smoothing without multi-sampling.
Key code:
```glsl
// Sphere distance information (for antialiasing)
vec2 sphDistances(vec3 ro, vec3 rd, vec4 sph) {
vec3 oc = ro - sph.xyz;
float b = dot(oc, rd);
float c = dot(oc, oc) - sph.w * sph.w;
float h = b * b - c;
float d = sqrt(max(0.0, sph.w * sph.w - h)) - sph.w; // Closest distance
return vec2(d, -b - sqrt(max(h, 0.0))); // (distance, depth)
}
// In rendering, use coverage instead of hard boundary:
float px = 2.0 / iResolution.y; // Pixel size
vec2 dt = sphDistances(ro, rd, sph);
float coverage = 1.0 - clamp(dt.x / (dt.y * px), 0.0, 1.0);
col = mix(bgColor, sphereColor, coverage);
```
### Variant 4: Refraction (with Snell's Law)
Difference from base version: Adds refracted rays; requires detecting whether the ray hits the surface from outside or inside, and flipping the normal accordingly.
Key code:
```glsl
float refrIndex = 1.5; // Adjustable: index of refraction (glass≈1.5, water≈1.33)
// Add refraction branch in the bounce loop:
bool inside = dot(rd, normal) > 0.0;
vec3 n = inside ? -normal : normal;
float eta = inside ? refrIndex : 1.0 / refrIndex;
vec3 refracted = refract(rd, n, eta);
// Fresnel determines reflection/refraction ratio
float cosI = abs(dot(rd, n));
float F = schlickFresnel(cosI, pow((1.0 - eta) / (1.0 + eta), 2.0));
if (refracted != vec3(0.0) && hash1(seed) > F) {
rd = refracted;
} else {
rd = reflect(rd, n);
}
ro = hitPos + rd * EPSILON;
```
### Variant 5: Higher-Order Algebraic Surfaces (Quartic Surfaces - Sphere4, Goursat, Torus)
Difference from base version: Substitutes the ray into quartic equations, solving via the resolvent cubic method. Suitable for tori, super-ellipsoids, and similar shapes.
Key code:
```glsl
// Ray-Sphere4 intersection (|x|⁴+|y|⁴+|z|⁴ = r⁴)
float iSphere4(vec3 ro, vec3 rd, vec2 distBound, inout vec3 normal, float ra) {
float r2 = ra * ra;
vec3 d2 = rd*rd, d3 = d2*rd;
vec3 o2 = ro*ro, o3 = o2*ro;
float ka = 1.0 / dot(d2, d2);
float k0 = ka * dot(ro, d3);
float k1 = ka * dot(o2, d2);
float k2 = ka * dot(o3, rd);
float k3 = ka * (dot(o2, o2) - r2 * r2);
// Reduce to depressed quartic, solve via resolvent cubic
float c0 = k1 - k0 * k0;
float c1 = k2 + 2.0 * k0 * (k0 * k0 - 1.5 * k1);
float c2 = k3 - 3.0 * k0 * (k0 * (k0 * k0 - 2.0 * k1) + 4.0/3.0 * k2);
float p = c0 * c0 * 3.0 + c2;
float q = c0 * c0 * c0 - c0 * c2 + c1 * c1;
float h = q * q - p * p * p * (1.0/27.0);
if (h < 0.0) return MAX_DIST; // Convex body: only need to handle 2 real roots case
h = sqrt(h);
float s = sign(q+h) * pow(abs(q+h), 1.0/3.0);
float t = sign(q-h) * pow(abs(q-h), 1.0/3.0);
vec2 v = vec2((s+t) + c0*4.0, (s-t) * sqrt(3.0)) * 0.5;
float r = length(v);
float d = -abs(v.y) / sqrt(r + v.x) - c1/r - k0;
if (d >= distBound.x && d <= distBound.y) {
vec3 pos = ro + rd * d;
normal = normalize(pos * pos * pos); // Gradient: 4x³
return d;
}
return MAX_DIST;
}
```
## Performance Optimization Details
### 1. Distance Bound Pruning
The most important optimization. Each time a nearer intersection is found, `distBound.y` is shortened, and subsequent objects are automatically skipped:
```glsl
// distBound.y continuously shrinks with opU
d = opU(d, iSphere(..., d.xy, ...), matId);
d = opU(d, iBox(..., d.xy, ...), matId); // Automatically skips objects farther than current hit
```
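`opU` is not defined in this excerpt; a minimal version consistent with the `(lower bound, nearest hit, material ID)` packing used by `worldHit` might be:

```glsl
// Union: keep the nearer hit and record its material ID
// d = vec3(distBound.x, current nearest distance, material ID)
vec3 opU(vec3 d, float t, float matId) {
    return (t < d.y) ? vec3(d.x, t, matId) : d;
}
```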
### 2. Bounding Sphere / Bounding Box Pre-Test
For complex geometry (tori, Goursat surfaces, etc.), test a simple bounding sphere first to check for possible intersection:
```glsl
// Test bounding sphere before torus intersection
if (iSphere(ro, rd, distBound, tmpNormal, torus.x + torus.y) > distBound.y) {
return MAX_DIST; // Bounding sphere missed, skip expensive quartic equation
}
```
### 3. Shadow Ray Early Exit
Shadow detection only needs to know "whether there is an occluder," not the nearest intersection, so a simplified intersection function can be used:
```glsl
// Fast sphere occlusion test (only checks for intersection, no normal computation)
float fastSphIntersect(vec3 ro, vec3 rd, vec3 center, float r) {
vec3 v = ro - center;
float b = dot(v, rd);
float c = dot(v, v) - r * r;
float d = b * b - c;
if (d > 0.0) {
float t = -b - sqrt(d);
if (t > 0.0) return t;
t = -b + sqrt(d);
if (t > 0.0) return t;
}
return -1.0;
}
```
### 4. Grid Acceleration Structure
For large numbers of identical primitives (e.g., hundreds of spheres), use a spatial grid to accelerate ray traversal:
```glsl
// 3D DDA grid traversal (for scenes with many spheres)
vec3 pos = floor(ro / GRIDSIZE) * GRIDSIZE;
vec3 ri = 1.0 / rd;
vec3 rs = sign(rd) * GRIDSIZE;
vec3 dis = (pos - ro + 0.5 * GRIDSIZE + rs * 0.5) * ri;
for (int i = 0; i < MAX_STEPS; i++) {
// Test spheres in current cell
testSphereInGrid(pos.xz, ro, rd, ...);
// DDA step to next cell
vec3 mm = step(dis.xyz, dis.yzx) * step(dis.xyz, dis.zxy); // Select the axis with the smallest dis
dis += mm * rs * ri;
pos += mm * rs;
}
```
### 5. Avoiding Unnecessary sqrt
Return early when the discriminant is negative, avoiding `sqrt()` on negative numbers. In some scenarios, the discriminant's sign can be used for coarse pre-filtering:
```glsl
// Check if ray is heading toward sphere and not inside it
if (c > 0.0 && b > 0.0) return MAX_DIST; // Fast cull
```
## Combination Suggestions in Detail
### 1. Analytic Intersection + Raymarching SDF
Use analytic primitives for large simple geometry (ground, bounding boxes), and SDF raymarching for complex details (fractals, smooth boolean operations). Analytic intersection provides precise start/end distances, accelerating marching convergence:
```glsl
float d = iBox(ro, rd, distBound, normal, boxSize); // Analytic box
if (d < MAX_DIST) {
// Refine with SDF inside the box
float t = d;
for (int i = 0; i < 64; i++) {
float h = sdfScene(ro + t * rd);
if (h < 0.001) break;
t += h;
}
}
```
### 2. Analytic Intersection + Volumetric Effects
Use analytic intersection to obtain precise entry/exit distances, then perform volumetric sampling (clouds, fog, subsurface scattering) within that range:
```glsl
// Use analytic ellipsoid intersection to obtain volume bounds
float tEnter = (-b - sqrt(h)) / a;
float tExit = (-b + sqrt(h)) / a;
float thickness = tExit - tEnter; // Analytic thickness
// Sample volume within [tEnter, tExit]
vec3 volumeColor = vec3(0.0);
float dt = (tExit - tEnter) / float(VOLUME_STEPS);
for (int i = 0; i < VOLUME_STEPS; i++) {
vec3 p = ro + rd * (tEnter + float(i) * dt);
volumeColor += sampleVolume(p) * dt;
}
```
### 3. Analytic Intersection + PBR Material System
Analytic intersection provides precise normals and intersection positions, feeding directly into Cook-Torrance and other PBR shading models:
```glsl
// Cook-Torrance BRDF (requires precise normals)
float D = beckmannDistribution(NdotH, roughness);
float G = geometricAttenuation(NdotV, NdotL, VdotH, NdotH);
float F = fresnelSchlick(VdotH, F0);
vec3 specular = vec3(D * G * F) / (4.0 * NdotV * NdotL);
```
### 4. Analytic Intersection + Spatial Transforms
Reuse the same intersection function for transformed geometry by rotating/translating/scaling the ray:
```glsl
// Rotate object: rotate the ray instead of the object
vec3 localRo = rotateY(ro - objectPos, angle);
vec3 localRd = rotateY(rd, angle);
float t = iBox(localRo, localRd, distBound, localNormal, boxSize);
// Transform normal back to world space
normal = rotateY(localNormal, -angle);
```
### 5. Analytic Intersection + Analytical AO / Soft Shadow / Antialiasing
A fully analytic rendering pipeline: intersection, shadows, occlusion, and edge smoothing all use closed-form formulas, producing zero noise:
```glsl
// Fully analytic pipeline (no random sampling, no noise)
float t = sphIntersect(ro, rd, sph); // Analytic intersection
float shadow = sphSoftShadow(hitPos, ld, sph); // Analytic soft shadow
float ao = sphOcclusion(hitPos, normal, sph); // Analytic ambient occlusion
float coverage = sphAntiAlias(ro, rd, sph, px); // Analytic antialiasing
```

# Anti-Aliasing Detailed Reference
## Prerequisites
- Understanding of screen-space derivatives (`dFdx`, `dFdy`, `fwidth`)
- Multipass buffer setup (for TAA)
- Basic signal processing concepts
## Sampling Theory (Nyquist)
The **Nyquist-Shannon theorem** states: to accurately represent a signal, sampling rate must be ≥ 2× the highest frequency present. In shader terms:
- Pixel grid = sampling rate
- Procedural detail / edge sharpness = signal frequency
- When detail frequency > pixel frequency → aliasing (moiré, crawling edges)
**Solutions**: either increase sampling rate (SSAA) or reduce signal frequency (analytical AA, filtering).
## SSAA Implementation Details
### Jitter Patterns
- **Grid**: `offset = vec2(m, n) / AA - 0.5` — simple, uniform coverage
- **Rotated grid (RGSS)**: 4 samples at rotated positions — better edge coverage for near-horizontal/vertical lines
- **Halton sequence**: quasi-random low-discrepancy — best coverage for high sample counts
### Performance
AA=2 (4 samples) is the practical limit for real-time SDF scenes. AA=3 (9 samples) for offline/screenshot quality only.
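A sketch of the grid-jitter SSAA loop (`render()` stands in for the per-sample shading function):

```glsl
#define AA 2  // 2x2 = 4 samples per pixel
vec3 col = vec3(0.0);
for (int m = 0; m < AA; m++)
for (int n = 0; n < AA; n++) {
    vec2 o = vec2(m, n) / float(AA) - 0.5;  // Sub-pixel offset
    vec2 p = (2.0 * (fragCoord + o) - iResolution.xy) / iResolution.y;
    col += render(p);
}
col /= float(AA * AA);
```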
## SDF Analytical AA Deep Dive
### Why `fwidth` Works
`fwidth(d) = abs(dFdx(d)) + abs(dFdy(d))` approximates how much the SDF value changes across one pixel. Using this as the smoothstep width:
- Edge transition spans exactly ~1 pixel regardless of zoom level
- No texture sampling needed — purely analytical
- Works for any SDF shape
### Signed Distance to Coverage
For a 2D SDF with value `d` at a pixel center:
```
coverage ≈ clamp(0.5 - d / fwidth(d), 0.0, 1.0)
```
This maps the signed distance to an approximate pixel coverage, equivalent to a box filter over the pixel footprint.
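In GLSL this becomes a few lines; `sdCircle` is just an example shape:

```glsl
float sdCircle(vec2 p, float r) { return length(p) - r; }

float d = sdCircle(uv, 0.3);
float w = fwidth(d);                              // SDF change over one pixel
float coverage = clamp(0.5 - d / w, 0.0, 1.0);    // ~1px transition at any zoom
vec3 col = mix(background, shapeColor, coverage);
```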
## TAA with Neighborhood Clamping
Full TAA pipeline:
1. **Jitter**: offset pixel center by Halton(2,3) sequence each frame
2. **Render**: full scene at jittered position → Buffer A
3. **Reproject**: use motion vectors to find previous frame's pixel for current position
4. **Clamp**: restrict history color to the min/max of current frame's 3×3 neighborhood (prevents ghosting)
5. **Blend**: `output = mix(current, clampedHistory, 0.9)`
### Neighborhood Clamping
```glsl
vec3 minCol = vec3(1e10), maxCol = vec3(-1e10);
for (int x = -1; x <= 1; x++)
for (int y = -1; y <= 1; y++) {
vec3 s = texelFetch(currentBuffer, ivec2(fragCoord) + ivec2(x,y), 0).rgb;
minCol = min(minCol, s);
maxCol = max(maxCol, s);
}
vec3 clampedHistory = clamp(history, minCol, maxCol);
```
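The reproject-and-blend stage (steps 3 to 5) then resolves to something like:

```glsl
// history is sampled from the previous frame via motion vectors (step 3)
vec3 current = texelFetch(currentBuffer, ivec2(fragCoord), 0).rgb;
vec3 resolved = mix(current, clamp(history, minCol, maxCol), 0.9); // Steps 4 + 5
```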
## FXAA Algorithm Walkthrough
1. **Luma computation**: Convert 5 samples (center + NSEW) to luminance
2. **Edge detection**: `lumaRange = lumaMax - lumaMin` — skip if below threshold
3. **Edge orientation**: Compare horizontal vs vertical luma gradients to determine edge direction
4. **Sub-pixel blending**: Sample along the edge direction at 1/3 and 2/3 offsets
5. **Quality**: The simplified version uses 2 taps; full FXAA 3.11 uses up to 12 taps along the edge for better endpoint detection
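A compact sketch of the simplified 2-tap variant (texture/uv names are illustrative; thresholds follow common FXAA defaults):

```glsl
float luma(vec3 c) { return dot(c, vec3(0.299, 0.587, 0.114)); }

vec3 fxaa(sampler2D tex, vec2 uv, vec2 px) {
    vec3 cC = texture(tex, uv).rgb;
    float lC = luma(cC);
    float lN = luma(texture(tex, uv + vec2(0.0,  px.y)).rgb);
    float lS = luma(texture(tex, uv + vec2(0.0, -px.y)).rgb);
    float lE = luma(texture(tex, uv + vec2( px.x, 0.0)).rgb);
    float lW = luma(texture(tex, uv + vec2(-px.x, 0.0)).rgb);
    float lumaMin = min(lC, min(min(lN, lS), min(lE, lW)));
    float lumaMax = max(lC, max(max(lN, lS), max(lE, lW)));
    // Step 2: skip pixels with no significant contrast
    if (lumaMax - lumaMin < max(0.0312, lumaMax * 0.125)) return cC;
    // Step 3: edge orientation from luma gradients
    bool horiz = abs(lN + lS - 2.0 * lC) >= abs(lE + lW - 2.0 * lC);
    vec2 dir = horiz ? vec2(px.x, 0.0) : vec2(0.0, px.y);
    // Step 4: blend two taps along the edge at 1/3 and 2/3 offsets
    return 0.5 * (texture(tex, uv + dir * (1.0 / 3.0 - 0.5)).rgb
                + texture(tex, uv + dir * (2.0 / 3.0 - 0.5)).rgb);
}
```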

# Atmospheric & Subsurface Scattering — Detailed Reference
This document is a detailed supplement to [SKILL.md](SKILL.md), covering prerequisites, step-by-step explanations, mathematical derivations, variant details, and complete combination code examples.
## Prerequisites
Foundational concepts required before using this Skill:
- **GLSL Fundamentals**: uniforms, varyings, built-in functions
- **Vector Math**: dot product, cross product, vector normalization
- **Ray-Sphere Intersection**: given a ray origin and direction, find the intersection distances with a sphere surface
- **Physical Meaning of Exponential Functions** (Beer-Lambert Law): light attenuates exponentially through a medium, `I = I₀ × e^(-σ×d)`, where σ is the extinction coefficient and d is the distance
- **Basic Ray Marching Concepts**: advancing step by step along a ray direction, accumulating information at each sample point
## Core Principles
Atmospheric scattering simulates the process of photons passing through the atmosphere and colliding with gas molecules/aerosol particles, changing direction. There are three core physical mechanisms:
### 1. Rayleigh Scattering (Molecular Scattering)
Caused by particles much smaller than the wavelength of light (nitrogen, oxygen molecules). **Short wavelengths (blue light) scatter much more strongly than long wavelengths (red light)** — this is why the sky is blue and sunsets are red.
The scattering coefficient is inversely proportional to the fourth power of wavelength:
```
β_R(λ) ∝ 1/λ⁴
```
Typical sea-level values for Earth: `β_R = vec3(5.5e-6, 13.0e-6, 22.4e-6)` (RGB channels, in m⁻¹)
**Rayleigh Phase Function** (describes the angular distribution of light scattering, symmetric front-to-back):
```
P_R(θ) = 3/(16π) × (1 + cos²θ)
```
### 2. Mie Scattering (Aerosol Scattering)
Caused by particles roughly the same size as the wavelength of light (water droplets, dust). **Wavelength-independent (all colors scatter equally)**, but with strong forward scattering characteristics, forming the halo around the sun.
Typical sea-level values for Earth: `β_M = vec3(21e-6)` (same for all channels)
**Henyey-Greenstein Phase Function** (describes the strong forward scattering of Mie scattering):
```
P_HG(θ, g) = (1 - g²) / (4π × (1 + g² - 2g·cosθ)^(3/2))
```
Where `g ∈ (-1, 1)` controls forward scattering strength; typical Earth atmosphere value `g ≈ 0.76 ~ 0.88`.
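Both phase functions translate directly to GLSL (`PI` assumed defined):

```glsl
// Rayleigh phase: symmetric front/back lobes
float phaseRayleigh(float mu) {   // mu = cos(theta) between view and light
    return 3.0 / (16.0 * PI) * (1.0 + mu * mu);
}
// Henyey-Greenstein phase: strong forward lobe for g > 0
float phaseHG(float mu, float g) {
    float g2 = g * g;
    return (1.0 - g2) / (4.0 * PI * pow(1.0 + g2 - 2.0 * g * mu, 1.5));
}
```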
### 3. Beer-Lambert Attenuation
Exponential attenuation of light through a medium:
```
T(A→B) = exp(-∫ σ_e(s) ds) // Transmittance from A to B
```
Where `σ_e` is the extinction coefficient (extinction = scattering + absorption).
### Overall Algorithm Flow
March along the view direction (ray march), at each sample point:
1. Compute the atmospheric density at that point (decreases exponentially with altitude)
2. Perform a second march toward the light source to compute the optical depth from the sun to that point
3. Use Beer-Lambert to calculate the sun light intensity reaching that point
4. Use the phase function to compute the amount of light scattered toward the camera
5. Accumulate contributions from all sample points
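The five steps above can be sketched as a nested march. This is a single-scattering sketch only; helper names such as `opticalDepthToSun`, `phaseHG`, and `SUN_INTENSITY` are illustrative, and the constants are the ones defined in Step 2 below:

```glsl
vec3 scatter(vec3 ro, vec3 rd, vec3 sunDir) {
    vec2 t = raySphereIntersect(ro - PLANET_CENTER, rd, ATMOS_RADIUS);
    float t0 = max(t.x, 0.0);
    float stepLen = (t.y - t0) / float(PRIMARY_STEPS);
    vec3 sumRay = vec3(0.0), sumMie = vec3(0.0);
    vec2 depthView = vec2(0.0);                                // (Rayleigh, Mie) optical depth
    for (int i = 0; i < PRIMARY_STEPS; i++) {
        vec3 p = ro + rd * (t0 + stepLen * (float(i) + 0.5));
        float h = length(p - PLANET_CENTER) - PLANET_RADIUS;
        vec2 density = exp(-h / vec2(H_RAY, H_MIE)) * stepLen; // Step 1: density at sample
        depthView += density;
        vec2 depthSun = opticalDepthToSun(p, sunDir);          // Step 2: secondary march
        vec3 T = exp(-BETA_RAY * (depthView.x + depthSun.x)    // Step 3: Beer-Lambert
                     - BETA_MIE * (depthView.y + depthSun.y));
        sumRay += density.x * T;
        sumMie += density.y * T;
    }
    float mu = dot(rd, sunDir);                                // Step 4: phase functions
    float phaseR = 3.0 / (16.0 * PI) * (1.0 + mu * mu);
    float phaseM = phaseHG(mu, MIE_G);
    return (BETA_RAY * phaseR * sumRay + BETA_MIE * phaseM * sumMie) * SUN_INTENSITY; // Step 5
}
```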
## Implementation Steps
### Step 1: Ray-Sphere Intersection
**What**: Compute the intersection points of the view ray with the atmospheric shell to determine the ray march start/end range.
**Why**: The atmosphere is a spherical shell around the planet; we only integrate within the shell.
```glsl
// Ray-sphere intersection, returns distances to two intersection points (t_near, t_far)
// p: ray origin (relative to sphere center), dir: ray direction, r: sphere radius
vec2 raySphereIntersect(vec3 p, vec3 dir, float r) {
float b = dot(p, dir);
float c = dot(p, p) - r * r;
float d = b * b - c;
if (d < 0.0) return vec2(1e5, -1e5); // No intersection
d = sqrt(d);
return vec2(-b - d, -b + d);
}
```
Derivation: sphere equation `|p + t·dir|² = r²` expands to `t² + 2t·dot(p,dir) + dot(p,p) - r² = 0`. Since `dir` is normalized, `a=1` can be omitted, and the two t values are solved directly with the quadratic formula.
### Step 2: Define Atmospheric Physical Constants
**What**: Set the scale parameters and scattering coefficients for the planet and atmosphere.
**Why**: These physical constants determine the sky's color characteristics. The different RGB values in Rayleigh produce the blue sky (blue channel has the largest scattering coefficient); Mie's uniform values produce white halos (all wavelengths scatter equally).
```glsl
#define PLANET_RADIUS 6371e3 // Earth radius (m)
#define ATMOS_RADIUS 6471e3 // Atmosphere outer radius (m), about 100km above Earth's radius
#define PLANET_CENTER vec3(0.0) // Planet center position
// Scattering coefficients (m⁻¹), sea-level values
#define BETA_RAY vec3(5.5e-6, 13.0e-6, 22.4e-6) // Tunable: Rayleigh scattering, changes sky base color
#define BETA_MIE vec3(21e-6) // Tunable: Mie scattering, changes halo intensity
#define BETA_OZONE vec3(2.04e-5, 4.97e-5, 1.95e-6) // Tunable: ozone absorption, affects zenith deep blue
// Mie phase function anisotropy parameter
#define MIE_G 0.76 // Tunable: 0.76~0.88, larger = more concentrated sun halo
// Scale heights (m): altitude at which density drops to 1/e
#define H_RAY 8000.0 // Tunable: Rayleigh scale height, larger = thicker atmosphere
#define H_MIE 1200.0 // Tunable: Mie scale height, larger = higher haze layer
// Ozone parameters (optional)
#define H_OZONE 30e3 // Ozone peak altitude
#define OZONE_FALLOFF 4e3 // Ozone falloff width
// Sample step counts
#define PRIMARY_STEPS 32 // Tunable: primary ray steps, more = higher quality
#define LIGHT_STEPS 8 // Tunable: light direction steps
```
Parameter tuning guide:
- Increase overall `BETA_RAY` → more vivid sky color
- Modify `BETA_RAY` RGB ratios → change sky base hue (e.g., increasing the red component produces a more purple sky)
- Increase `BETA_MIE` → brighter halo around the sun, more haze
- Increase `MIE_G` → halo more concentrated toward the sun direction (narrower disk)
- Increase `H_RAY` → effective atmosphere thickness increases, sky color more uniform
- Increase `H_MIE` → haze layer higher, low-altitude fog effect weakened
### Step 3: Implement Phase Functions
**What**: Compute the probability distribution of light being scattered at different angles.
**Why**: The Rayleigh phase is symmetrically distributed (scatters both forward and backward); the Mie phase is strongly biased forward. This determines the brightness distribution across the sky — brighter facing the sun (Mie dominant), with some brightness away from the sun (Rayleigh dominant).
```glsl
// Rayleigh phase function: symmetric front-to-back
float phaseRayleigh(float cosTheta) {
return 3.0 / (16.0 * 3.14159265) * (1.0 + cosTheta * cosTheta);
}
// Cornette-Shanks phase function (improved Henyey-Greenstein): forward scattering
// g: anisotropy parameter, 0 = isotropic, close to 1 = strong forward scattering
float phaseMie(float cosTheta, float g) {
float gg = g * g;
float num = (1.0 - gg) * (1.0 + cosTheta * cosTheta);
float denom = (2.0 + gg) * pow(1.0 + gg - 2.0 * g * cosTheta, 1.5);
return 3.0 / (8.0 * 3.14159265) * num / denom;
}
```
Note: the Mie phase function here uses the Cornette-Shanks improved version (with an additional `(1 + cos²θ)` term in the numerator and `(2 + g²)` normalization correction in the denominator), which is more physically accurate than the original HG.
### Step 4: Atmospheric Density Sampling
**What**: Compute the atmospheric particle density at a given point based on altitude.
**Why**: Atmospheric density decreases exponentially with altitude, and different components (Rayleigh, Mie, ozone) have different decay rates. Rayleigh particles (gas molecules) have a scale height of about 8km, Mie particles (aerosols) are concentrated in the lower layer with a scale height of about 1.2km, and ozone peaks at approximately 30km altitude.
```glsl
// Returns vec3(rayleigh_density, mie_density, ozone_density)
vec3 atmosphereDensity(vec3 pos, float planetRadius) {
float height = length(pos) - planetRadius;
float densityRay = exp(-height / H_RAY);
float densityMie = exp(-height / H_MIE);
// Ozone: peaks at ~30km altitude, approximated with Lorentzian distribution
float denom = (H_OZONE - height) / OZONE_FALLOFF;
float densityOzone = (1.0 / (denom * denom + 1.0)) * densityRay;
return vec3(densityRay, densityMie, densityOzone);
}
```
Mathematical explanation of ozone distribution: `1/(x² + 1)` is the form of a Lorentzian/Cauchy distribution, reaching its maximum value of 1 at `x=0` (i.e., `height = H_OZONE`), then symmetrically decaying on both sides. Multiplying by `densityRay` accounts for ozone also being affected by the overall atmospheric density decrease.
### Step 5: Light Direction Optical Depth
**What**: From a sample point on the primary ray, march toward the sun to the atmosphere edge, accumulating optical depth.
**Why**: This determines how much the sunlight is attenuated before reaching that point. At sunset the light path traverses far more atmosphere, and blue light is scattered out along the way (the blue component of the Rayleigh scattering coefficient is largest), leaving mostly red light; this is the physical reason sunsets are red.
```glsl
// Compute optical depth from pos along sunDir to the atmosphere edge
vec3 lightOpticalDepth(vec3 pos, vec3 sunDir) {
float atmoDist = raySphereIntersect(pos - PLANET_CENTER, sunDir, ATMOS_RADIUS).y;
float stepSize = atmoDist / float(LIGHT_STEPS);
float rayPos = stepSize * 0.5;
vec3 optDepth = vec3(0.0); // (ray, mie, ozone)
for (int i = 0; i < LIGHT_STEPS; i++) {
vec3 samplePos = pos + sunDir * rayPos;
float height = length(samplePos - PLANET_CENTER) - PLANET_RADIUS;
// If sample point is below the surface, it's occluded by the planet
if (height < 0.0) return vec3(1e10); // Fully occluded
vec3 density = atmosphereDensity(samplePos, PLANET_RADIUS);
optDepth += density * stepSize;
rayPos += stepSize;
}
return optDepth;
}
```
`stepSize * 0.5` as the starting offset is the midpoint sampling rule, which approximates the integral more accurately than endpoint sampling.
### Step 6: Primary Scattering Integral (Core Loop)
**What**: Ray march along the view direction, computing the in-scattering contribution at each sample point and accumulating.
**Why**: This is the core of the entire algorithm — integrating all scattered light along the view direction that reaches the eye. Each point's contribution = sunlight reaching that point × density at that point × attenuation from that point to the camera.
Mathematical expression:
```
L(camera) = ∫[tStart→tEnd] sunIntensity × T(sun→s) × σ_s(s) × P(θ) × T(s→camera) ds
```
Where T is transmittance, σ_s is the scattering coefficient, and P is the phase function.
```glsl
vec3 calculateScattering(
vec3 rayOrigin, // Camera position
vec3 rayDir, // View direction
float maxDist, // Maximum distance (scene occlusion)
vec3 sunDir, // Sun direction
vec3 sunIntensity // Sun intensity
) {
// Compute ray-atmosphere intersection
vec2 atmoHit = raySphereIntersect(rayOrigin - PLANET_CENTER, rayDir, ATMOS_RADIUS);
if (atmoHit.x > atmoHit.y) return vec3(0.0); // Missed atmosphere
// Compute ray-planet intersection (ground occlusion)
vec2 planetHit = raySphereIntersect(rayOrigin - PLANET_CENTER, rayDir, PLANET_RADIUS);
// Determine march range
float tStart = max(atmoHit.x, 0.0);
float tEnd = atmoHit.y;
if (planetHit.x > 0.0) tEnd = min(tEnd, planetHit.x); // Ground occlusion
tEnd = min(tEnd, maxDist); // Scene object occlusion
float stepSize = (tEnd - tStart) / float(PRIMARY_STEPS);
// Precompute phase functions (view-sun angle is constant along the entire ray)
float cosTheta = dot(rayDir, sunDir);
float phaseR = phaseRayleigh(cosTheta);
float phaseM = phaseMie(cosTheta, MIE_G);
// Accumulators
vec3 totalRay = vec3(0.0); // Rayleigh in-scatter
vec3 totalMie = vec3(0.0); // Mie in-scatter
vec3 optDepthI = vec3(0.0); // View direction optical depth (ray, mie, ozone)
float rayPos = tStart + stepSize * 0.5;
for (int i = 0; i < PRIMARY_STEPS; i++) {
vec3 samplePos = rayOrigin + rayDir * rayPos;
// 1. Sample density
vec3 density = atmosphereDensity(samplePos, PLANET_RADIUS) * stepSize;
optDepthI += density;
// 2. Compute light direction optical depth
vec3 optDepthL = lightOpticalDepth(samplePos, sunDir);
// 3. Beer-Lambert attenuation: total attenuation from sun through this point to camera
vec3 tau = BETA_RAY * (optDepthI.x + optDepthL.x)
+ BETA_MIE * 1.1 * (optDepthI.y + optDepthL.y) // 1.1 is Mie extinction/scattering ratio
+ BETA_OZONE * (optDepthI.z + optDepthL.z);
vec3 attenuation = exp(-tau);
// 4. Accumulate in-scattering
totalRay += density.x * attenuation;
totalMie += density.y * attenuation;
rayPos += stepSize;
}
// 5. Final color = scattering coefficient × phase function × accumulated scattering
return sunIntensity * (
totalRay * BETA_RAY * phaseR +
totalMie * BETA_MIE * phaseM
);
}
```
Key detail explanations:
- `1.1` is the Mie extinction/scattering ratio: Mie particles not only scatter light but also absorb a small amount, so the extinction coefficient ≈ 1.1 × scattering coefficient
- `optDepthI` records all three components simultaneously for correctly compositing all extinction contributions in the attenuation calculation
- Phase functions are precomputed outside the loop because the angle between view and sun directions is constant along the entire ray
### Step 7: Tone Mapping and Output
**What**: Apply tone mapping and gamma correction to the HDR scattering results.
**Why**: The scattering calculation outputs HDR linear values (potentially much greater than 1.0), which must be mapped to [0,1] for display. Different tonemapping methods affect the final look:
- **Exposure mapping `1 - exp(-x)`**: simplest, naturally saturates and never overexposes, but limited highlight detail
- **Reinhard**: preserves more highlight detail, suitable for high dynamic range scenes
- **ACES**: cinematic tone mapping, richer colors but more complex implementation
```glsl
// Method 1: Simple exposure mapping (most common)
vec3 tonemapExposure(vec3 color) {
return 1.0 - exp(-color); // Natural saturation, never overexposes
}
// Method 2: Reinhard (preserves more highlight detail)
vec3 tonemapReinhard(vec3 color) {
float l = dot(color, vec3(0.2126, 0.7152, 0.0722));
vec3 tc = color / (color + 1.0);
return mix(color / (l + 1.0), tc, tc);
}
// Gamma correction
vec3 gammaCorrect(vec3 color) {
return pow(color, vec3(1.0 / 2.2));
}
```
Reinhard implementation detail: uses a blend of luminance `l` (perceptually weighted) and per-channel mapping `tc`, balancing color fidelity and highlight detail.
## Variant Details
### Variant 1: Non-Physical Analytical Approximation (No Ray March)
**Difference from the base version**: No ray marching at all — uses analytical functions to simulate sky color with extremely high performance. Not based on physical scattering equations, but uses empirical formulas to simulate visual effects.
**Use cases**: Mobile platforms, backgrounds, scenes with low physical accuracy requirements.
**How it works**:
- `zenithDensity` simulates atmospheric density variation with viewing angle (denser looking toward the horizon)
- `getSkyAbsorption` uses `exp2` to simulate atmospheric absorption (similar to Beer-Lambert)
- `getMie` uses distance falloff + smoothstep to simulate the sun halo
- The final blend considers the sun altitude's effect on the overall sky color tone
**Performance comparison**: No loops, no ray march — only a small amount of math per pixel, 10-50x faster than the base version.
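A minimal sketch of the analytical approach; the function names follow the description above, but the bodies and constants here are illustrative, not tuned production values:

```glsl
// Non-physical analytical sky: no loops, no ray march.
float zenithDensity(float y) {
    // Denser toward the horizon (y -> 0); 0.12 is an illustrative floor
    return 1.0 / pow(max(y, 0.12), 0.75);
}
vec3 getSkyAbsorption(vec3 beta, float density) {
    // exp2-based pseudo Beer-Lambert absorption
    return exp2(-beta * density);
}
float getMie(vec3 dir, vec3 sunDir) {
    // Distance falloff + smoothstep to fake the sun halo
    return smoothstep(0.5, 0.0, distance(dir, sunDir)) * 0.3;
}
vec3 analyticalSky(vec3 dir, vec3 sunDir, vec3 skyColor) {
    float density = zenithDensity(dir.y);
    vec3 absorbed = getSkyAbsorption(1.0 / skyColor, density);
    return skyColor * density * 0.1 * absorbed + getMie(dir, sunDir) * absorbed;
}
```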
### Variant 2: With Ozone Absorption Layer
**Difference from the base version**: Adds ozone absorption as a third component, making the zenith deeper blue and introducing subtle purple tones at sunset.
**Use cases**: Pursuing more physically accurate sky colors.
**Physical principle**: Ozone primarily absorbs in the Chappuis band (500-700nm, i.e., green and red), which makes the zenith direction (short light path, remaining light after Rayleigh scattering is filtered by ozone) appear deeper blue. At sunset, the long light path makes ozone absorption more significant — after red is Rayleigh-scattered and green is ozone-absorbed, only blue-purple tones remain.
**Key modification**: Set `BETA_OZONE` to a non-zero value in the complete template to enable — already built-in.
### Variant 3: Subsurface Scattering (SSS)
**Difference from the base version**: Scatters inside a semi-transparent object rather than in the atmosphere. Estimates object thickness via SDF and controls light transmission with thickness.
**Use cases**: Candles, skin, jelly, leaves, and other translucent materials.
**How it works**:
1. Use Snell's law (`refract`) to calculate the refracted direction after light enters the object
2. March along the refracted direction in the SDF, accumulating negative distance values (SDF is negative inside the object)
3. Greater accumulated negative value means a thicker object, less light transmission
4. Use a power function to control the attenuation curve (`pow` parameter is tunable)
**Tunable parameters**:
- IOR (index of refraction): 1.3 (water) ~ 1.5 (glass) ~ 2.0 (gemstone), affects refraction angle
- `MAX_SCATTER`: maximum scatter march distance, affects SSS penetration depth
- `SCATTER_STRENGTH`: scattering intensity multiplier
- Step size 0.2: smaller = more accurate but slower
**Usage**:
```glsl
float ss = max(0.0, subsurface(hitPos, viewDir, normal));
vec3 sssColor = albedo * smoothstep(0.0, 2.0, pow(ss, 0.6));
finalColor = mix(lambertian, sssColor, 0.7) + specular;
```
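The `subsurface()` function used above is not shown in this section; a minimal sketch under the stated assumptions (an `sdfScene()` distance function exists, and the tunable parameters listed above are defined):

```glsl
#define MAX_SCATTER 2.0
#define SCATTER_STRENGTH 1.0
// March along the refracted ray inside the object, accumulating thickness
// from negative SDF values (SDF is negative inside the object).
float subsurface(vec3 hitPos, vec3 viewDir, vec3 normal) {
    vec3 refr = refract(viewDir, normal, 1.0 / 1.4); // IOR 1.4, tunable
    float thickness = 0.0;
    vec3 p = hitPos + refr * 0.05;                   // step just inside the surface
    for (float t = 0.0; t < MAX_SCATTER; t += 0.2) { // step size 0.2, tunable
        float d = sdfScene(p);
        if (d > 0.0) break;                          // exited the object
        thickness += -d * 0.2;                       // accumulate interior thickness
        p += refr * 0.2;
    }
    // Thicker object -> less light transmitted
    return SCATTER_STRENGTH * exp(-thickness);
}
```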
### Variant 4: LUT Precomputation Pipeline (Production-Grade)
**Difference from the base version**: Precomputes Transmittance, Multiple Scattering, and Sky-View into separate LUT textures; at runtime only performs lookups, with extremely high frame rates.
**Use cases**: Production-grade sky rendering in game engines and real-time applications requiring high frame rates.
**Architecture details**:
- **Buffer A (Transmittance LUT)**: 256x64 texture, parameterized by (sunCosZenith, height), storing transmittance from a certain height along a direction to the atmosphere edge. This is the most fundamental LUT; all other LUTs depend on it.
- **Buffer B (Multiple Scattering LUT)**: 32x32 texture, precomputing multiple scattering contributions. Single scattering is not accurate enough — in the real atmosphere, light is scattered multiple times. This LUT uses an iterative method to approximate the cumulative effect of multiple scattering.
- **Buffer C (Sky-View LUT)**: 200x200 texture, storing sky colors for all directions. Uses nonlinear height mapping to allocate more precision to the horizon region (where color changes are most dramatic).
- **Image Pass**: Only looks up the Sky-View LUT + overlays the sun disk; each pixel requires only one texture query.
```glsl
// Transmittance LUT query (from Hillaire 2020 implementation)
vec3 getValFromTLUT(sampler2D tex, vec2 bufferRes, vec3 pos, vec3 sunDir) {
float height = length(pos);
vec3 up = pos / height;
float sunCosZenithAngle = dot(sunDir, up);
vec2 uv = vec2(
256.0 * clamp(0.5 + 0.5 * sunCosZenithAngle, 0.0, 1.0),
64.0 * max(0.0, min(1.0, (height - groundRadiusMM) / (atmosphereRadiusMM - groundRadiusMM)))
);
uv /= bufferRes;
return texture(tex, uv).rgb;
}
```
**Performance**: The Image Pass is nearly O(1); all heavy computation is done in low-resolution LUTs. LUTs can be incrementally updated as the sun angle changes.
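The nonlinear Sky-View parameterization mentioned above can be sketched as follows; this is one common mapping (a square-root warp on the vertical axis), and the exact form varies by implementation:

```glsl
// Map a view direction to Sky-View LUT UV.
// sqrt on the altitude axis concentrates texels near the horizon,
// where sky color changes fastest.
vec2 skyViewUV(vec3 rayDir, float horizonAngle) {
    float azimuth = atan(rayDir.z, rayDir.x) / 6.2831853 + 0.5;
    float altitude = asin(clamp(rayDir.y, -1.0, 1.0)) - horizonAngle;
    float v = 0.5 + 0.5 * sign(altitude) * sqrt(abs(altitude) / 1.5707963);
    return vec2(azimuth, v);
}
```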
### Variant 5: Analytical Fast Atmosphere (No Ray March but Supports Aerial Perspective)
**Difference from the base version**: Uses analytical exponential approximations instead of ray marching, while supporting distance-attenuated aerial perspective effects.
**Use cases**: Game scenes requiring atmospheric perspective without per-pixel ray marching.
**How it works**:
- `getRayleighMie` uses `1 - exp(-x)` form to approximate the scattering integral (analytical solution based on Beer-Lambert)
- `getLightTransmittance` uses multiple exponential term superposition to approximate optical depth at different sun altitudes
- No loops required — only a fixed number of math operations per pixel
```glsl
// Based on Felix Westin's Fast Atmosphere
void getRayleighMie(float opticalDepth, float densityR, float densityM, out vec3 R, out vec3 M) {
vec3 C_RAYLEIGH = vec3(5.802, 13.558, 33.100) * 1e-6;
vec3 C_MIE = vec3(3.996e-6);
R = (1.0 - exp(-opticalDepth * densityR * C_RAYLEIGH / 2.5)) * 2.5;
M = (1.0 - exp(-opticalDepth * densityM * C_MIE / 0.5)) * 0.5;
}
// Analytical approximation of light transmittance (replaces ray march)
vec3 getLightTransmittance(vec3 lightDir) {
vec3 C_RAYLEIGH = vec3(5.802, 13.558, 33.100) * 1e-6;
vec3 C_MIE = vec3(3.996e-6);
vec3 C_OZONE = vec3(0.650, 1.881, 0.085) * 1e-6;
float extinction = exp(-clamp(lightDir.y + 0.05, 0.0, 1.0) * 40.0)
+ exp(-clamp(lightDir.y + 0.5, 0.0, 1.0) * 5.0) * 0.4
+ pow(clamp(1.0 - lightDir.y, 0.0, 1.0), 2.0) * 0.02
+ 0.002;
return exp(-(C_RAYLEIGH + C_MIE + C_OZONE) * extinction * 1e6);
}
```
**Mathematical basis of the analytical approximation**: Treating the atmosphere as a single uniform layer, the scattering integral `∫ e^(-σx) dx` has the analytical solution `(1 - e^(-σL)) / σ`. The `2.5` and `0.5` in the code are empirical scaling factors to make the analytical result visually approximate a full ray march.
## Performance Optimization Details
### Bottleneck 1: Nested Ray March (O(N×M) Samples)
N primary ray steps × M light direction steps per step = N×M density calculations.
**Optimization approaches**:
- **Reduce step counts**: Use `PRIMARY_STEPS=12, LIGHT_STEPS=4` on mobile; visual difference is small but performance improvement is significant
- **Analytical approximation**: Replace the light direction ray march with the Fast Atmosphere approach, reducing complexity from O(N×M) to O(N)
- **Transmittance LUT**: After precomputation, runtime only performs lookups, reducing complexity to O(N) or even O(1)
### Bottleneck 2: Dense exp() and pow() Calls
Multiple exponential function calls at each sample point — these are relatively expensive operations on GPUs.
**Optimization approaches**:
- Replace Henyey-Greenstein phase function with Schlick approximation:
```glsl
// Schlick approximation: 1 division, no pow
float k = 1.55 * g - 0.55 * g * g * g;
float d = 1.0 + k * cosTheta;
float phaseSchlick = (1.0 - k * k) / (4.0 * PI * d * d);
```
- Combine multiple exp calls: `exp(a) * exp(b) = exp(a+b)`, reducing exp call count
- Use `exp2` instead of `exp` in scenarios with lower precision requirements (exp2 is faster on some GPUs)
### Bottleneck 3: Full-Screen Per-Pixel Computation
Each pixel independently computes the full scattering.
**Optimization approaches**:
- **Sky-View LUT**: Render the sky to a low-resolution LUT (e.g., 200x200), then look up at full resolution. Allocate more resolution near the horizon (nonlinear mapping)
- **Half-resolution rendering**: Compute scattering at half resolution, then bilinearly upsample. For sky — a low-frequency signal — quality loss is minimal
### Bottleneck 4: High Sample Count Needed to Avoid Banding
Low step counts lead to visible banding artifacts.
**Optimization approaches**:
- **Non-uniform stepping**: `newT = ((i + 0.3) / numSteps) * tMax`, offset by 0.3 instead of 0.5 to reduce visual artifacts
- **Jittered start offset**: `startOffset += hash(fragCoord) * stepSize`, randomly offsetting the march start per pixel
- **Temporal blue noise dithering**: Use temporal blue noise to jitter sample positions across frames; combined with TAA, banding is nearly eliminated
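The jitter techniques above can be sketched as follows, assuming a hypothetical `hash()` function returning values in [0, 1):

```glsl
// Per-pixel jittered start offset turns banding into less objectionable noise.
float jitter = hash(fragCoord.xy + fract(iTime)); // time term adds temporal variation
float rayPos = tStart + jitter * stepSize;        // replaces the fixed 0.5 offset
// Non-uniform stepping alternative (inside the march loop):
// float t = ((float(i) + 0.3) / float(PRIMARY_STEPS)) * (tEnd - tStart);
```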
## Combination Suggestions
### 1. Atmospheric Scattering + Volumetric Clouds
Atmospheric scattering provides sky background color and light source color; volumetric cloud lighting uses the atmospheric transmittance to determine the sun light color reaching the cloud layer.
Key integration points:
- Setting the `maxDist` parameter of the atmospheric scattering function to the cloud layer distance achieves correct pre-cloud atmospheric effects
- During cloud layer rendering, use the transmittance LUT to get the sun light color upon reaching the cloud layer
- Sky color behind clouds should be the full atmospheric scattering result
```glsl
// Pseudo-code example
float cloudDist = rayMarchClouds(rayOrigin, rayDir);
vec3 cloudColor = calculateCloudLighting(cloudPos, sunDir, transmittance);
vec3 skyBehind = calculateScattering(rayOrigin, rayDir, 1e12, sunDir, sunIntensity);
vec3 skyBeforeCloud = calculateScattering(rayOrigin, rayDir, cloudDist, sunDir, sunIntensity);
// Compositing: pre-cloud atmosphere + cloud × cloud opacity + post-cloud sky × transmittance
vec3 final = skyBeforeCloud + cloudColor * cloudAlpha + skyBehind * (1.0 - cloudAlpha) * atmosphereTransmittance;
```
### 2. Atmospheric Scattering + SDF Scene
Pass the SDF ray march hit distance as the `maxDist` parameter to `calculateScattering()` so the integration is clipped at the surface. Note that the 6-argument call below assumes an extended version of the function that also takes the shaded `sceneColor` and composites it internally (`sceneColor × transmittance + inscattering`); the 5-argument version from Step 6 returns only the in-scattering, in which case you composite manually.
```glsl
// SDF ray march yields hit information
float hitDist = sdfRayMarch(rayOrigin, rayDir);
vec3 sceneColor = shadeSurface(hitPos, normal, lightDir);
// Atmospheric scattering automatically handles aerial perspective
// (extended signature that accepts and composites sceneColor)
vec3 final = calculateScattering(
    rayOrigin, rayDir, hitDist,
    sceneColor, sunDir, SUN_INTENSITY
);
```
### 3. Atmospheric Scattering + God Rays
Adding an occlusion parameter in the scattering integral (via shadow map or additional ray march for occlusion detection) can produce volumetric light beam effects.
```glsl
// Add occlusion detection in the main loop
for (int i = 0; i < PRIMARY_STEPS; i++) {
// ... density sampling ...
// God rays: check if sample point is occluded
float occlusion = 1.0;
if (sdfScene(samplePos + sunDir * 0.1) < 0.0) {
occlusion = 0.0; // Occluded by scene object, no in-scattering
}
totalRay += density.x * attenuation * occlusion;
totalMie += density.y * attenuation * occlusion;
}
```
The Fast Atmosphere example implements this functionality through the `occlusion` parameter.
### 4. Atmospheric Scattering + Terrain Rendering
Use aerial perspective: distant terrain colors blend into atmospheric scattering color based on distance.
Key formula:
```glsl
// Basic aerial perspective
vec3 finalColor = terrainColor * transmittance + inscattering;
// transmittance: atmospheric transmittance from camera to terrain point
// inscattering: scattered light between camera and terrain point
// Distant objects: transmittance → 0, inscattering dominates → appears blue/gray
```
### 5. SSS + PBR Materials
Combine subsurface scattering with GGX microsurface specular and Fresnel reflection. SSS contribution replaces part of the diffuse (via mix), with the specular layer added on top:
```glsl
// Complete PBR + SSS shading
float fresnel = pow(max(0.0, 1.0 + dot(normal, viewDir)), 5.0); // viewDir = ray direction (camera → surface), so the dot is negative head-on
vec3 diffuse = mix(lambert, sssContribution, 0.7); // SSS replaces part of diffuse
vec3 final = ambient + albedo * diffuse + specular + fresnel * envColor;
```
Layering logic:
1. Bottom layer: ambient light
2. Diffuse layer: blend of Lambert and SSS (SSS allows light to pass through dark sides)
3. Specular layer: GGX microsurface reflection
4. Fresnel layer: enhanced environment reflection at grazing angles
# Camera Effects Detailed Reference
## Prerequisites
- Ray marching fundamentals (ray origin, ray direction)
- Multipass buffers (for accumulation-based DoF)
- Hash functions for stochastic sampling
## Thin Lens Model Derivation
A real camera lens focuses light from a focal plane onto the sensor. Points not on the focal plane project to a **circle of confusion (CoC)** on the sensor.
### Circle of Confusion Formula
```
CoC = A × f × |S2 - S1| / (S2 × (S1 - f))
```
Where:
- `S1` = focal distance (distance to in-focus plane)
- `S2` = object distance
- `A` = aperture diameter
- `f` = focal length
### Simplified for Shaders
```
CoC ≈ apertureSize × |depth - focalDistance| / depth
```
### Ray-Based Implementation
Instead of computing CoC per pixel, we model the physical process:
1. Choose a random point on the aperture disk → new ray origin
2. The focal point (where the original ray hits the focal plane) stays fixed
3. New ray direction = `normalize(focalPoint - newOrigin)`
4. Average many such samples → natural bokeh with correct occlusion
### Aperture Shape
- Circular: `vec2 p = sqrt(r) * vec2(cos(a), sin(a))` — uniform disk
- Polygonal: reject samples outside polygon for hexagonal/octagonal bokeh
- The `sqrt(r)` is critical for uniform distribution (area-preserving)
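The four steps above can be sketched as follows, assuming a hypothetical `hash2()` returning two independent uniform values in [0, 1) and scene parameters `focalDistance`/`apertureSize`:

```glsl
// Thin-lens ray generation: jitter the origin on the aperture disk,
// keep the focal point fixed, and re-aim the ray at it.
void thinLensRay(inout vec3 ro, inout vec3 rd,
                 vec3 camRight, vec3 camUp,
                 float focalDistance, float apertureSize, vec2 seed) {
    vec2 h = hash2(seed);
    float r = sqrt(h.x);                       // sqrt(r) for uniform disk sampling
    float a = 6.2831853 * h.y;
    vec2 disk = r * vec2(cos(a), sin(a)) * apertureSize;
    vec3 focalPoint = ro + rd * focalDistance; // where the original ray hits the focal plane
    ro += camRight * disk.x + camUp * disk.y;  // new origin on the aperture
    rd = normalize(focalPoint - ro);           // new direction through the fixed focal point
}
```

Averaging many such rays (one per sample, with a fresh `seed`) converges to bokeh with physically correct occlusion.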
## Poisson Disk Sampling
Pre-computed 16-point Poisson disk for blur kernels:
```glsl
const vec2 poissonDisk[16] = vec2[](
vec2(-0.94201624, -0.39906216), vec2(0.94558609, -0.76890725),
vec2(-0.09418410, -0.92938870), vec2(0.34495938, 0.29387760),
vec2(-0.91588581, 0.45771432), vec2(-0.81544232, -0.87912464),
vec2(-0.38277543, 0.27676845), vec2(0.97484398, 0.75648379),
vec2(0.44323325, -0.97511554), vec2(0.53742981, -0.47373420),
vec2(-0.26496911, -0.41893023), vec2(0.79197514, 0.19090188),
vec2(-0.24188840, 0.99706507), vec2(-0.81409955, 0.91437590),
vec2(0.19984126, 0.78641367), vec2(0.14383161, -0.14100790)
);
```
Advantages over regular grid: no structured aliasing patterns, better coverage per sample count.
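A typical use, scaling the disk by a per-pixel CoC radius (a sketch; `cocRadius` in pixels would come from the simplified formula above):

```glsl
// 16-tap Poisson disk blur, radius driven by the circle of confusion
vec3 dofBlur(sampler2D tex, vec2 uv, float cocRadius) {
    vec3 acc = vec3(0.0);
    vec2 px = 1.0 / iResolution.xy;
    for (int i = 0; i < 16; i++) {
        acc += texture(tex, uv + poissonDisk[i] * cocRadius * px).rgb;
    }
    return acc / 16.0;
}
```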
## Motion Blur Approaches
### Stochastic Time Sampling (Ray Marching)
For each pixel, pick a random time within the shutter interval:
```
t_sample = iTime + (rand - 0.5) * shutterDuration
```
Use `t_sample` for all scene animation. Accumulate multiple frames for convergence.
### Velocity Buffer (Post-Process)
1. Render scene + store per-pixel velocity vectors
2. For each pixel, sample along the velocity direction
3. Weight samples by distance from center (triangle filter)
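Steps 2-3 can be sketched as follows, assuming the velocity buffer stores per-pixel screen-space motion in `iChannel1.xy` and the scene color is in `iChannel0`:

```glsl
// Post-process motion blur: sample along the velocity vector with a triangle filter.
vec3 velocityBlur(vec2 uv) {
    vec2 vel = texture(iChannel1, uv).xy;           // per-pixel velocity (UV units)
    vec3 acc = vec3(0.0);
    float wSum = 0.0;
    const int N = 8;
    for (int i = 0; i < N; i++) {
        float t = float(i) / float(N - 1) - 0.5;    // [-0.5, 0.5] along the velocity
        float w = 1.0 - abs(t) * 2.0;               // triangle filter weight
        acc += texture(iChannel0, uv + vel * t).rgb * w;
        wSum += w;
    }
    return acc / wSum;
}
```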
### Hybrid
Use temporal accumulation (TAA-style) with per-frame time jitter — converges over frames with no per-frame cost increase.
## Film Grain Characteristics
Real film grain properties:
- **Luminance-dependent**: More visible in shadows, less in highlights
- **Temporally varying**: Different pattern each frame (use `fract(iTime)` in hash seed)
- **Spatially uncorrelated**: Use pixel coordinates in hash, not UV (grain should be screen-resolution)
- **Intensity**: 0.02-0.05 for subtle, 0.1+ for stylized/vintage look
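The four properties combine into a few lines (a sketch; `hash12()` is a hypothetical hash mapping a vec2 to one uniform value in [0, 1)):

```glsl
vec3 filmGrain(vec3 color, vec2 fragCoord, float intensity) {
    // Pixel coords + per-frame time in the seed:
    // spatially uncorrelated, different pattern each frame
    float n = hash12(fragCoord + fract(iTime) * 1024.0) - 0.5;
    // Luminance-dependent: stronger in shadows, weaker in highlights
    float lum = dot(color, vec3(0.2126, 0.7152, 0.0722));
    return color + n * intensity * (1.0 - lum);
}
```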
# Cellular Automata & Reaction-Diffusion — Detailed Reference
This document is a detailed supplement to [SKILL.md](SKILL.md), containing prerequisites, step-by-step explanations, variant details, performance analysis, and complete code examples for combination suggestions.
---
## Prerequisites
### GLSL Basics
- **Uniform variables**: `iResolution` (viewport resolution), `iFrame` (current frame number), `iTime` (elapsed time), `iMouse` (mouse position)
- **Texture sampling**: `texture(iChannel0, uv)` samples using UV coordinates (with filtering), `texelFetch(iChannel0, ivec2(px), 0)` samples at exact integer pixel coordinates
- **Multi-buffer feedback architecture**: ShaderToy supports Buffer A~D, each buffer can bind itself or other buffers as iChannel input
### ShaderToy Multi-Pass Mechanism
Data written by Buffer A → next frame Buffer A reads via iChannel0 self-feedback. This is the core mechanism for inter-frame state persistence. The Image pass handles final visual output.
### 2D Grid Sampling
- Pixel coordinates `fragCoord` are floating point, range `[0.5, resolution - 0.5]`
- UV coordinates = `fragCoord / iResolution.xy`, range `[0, 1]`
- `texelFetch(iChannel0, ivec2(px), 0)` reads the specified pixel exactly (no filtering), suitable for discrete CA
- `texture(iChannel0, uv)` uses hardware bilinear interpolation, suitable for continuous RD
### Basic Vector Math
- `normalize(v)`: normalize a vector
- `dot(a, b)`: dot product
- `cross(a, b)`: cross product
- `length(v)`: vector length
### Convolution Kernel Concepts
A 3x3 stencil performs a weighted sum of the center pixel and its 8 neighbors. Different weights produce different effects:
- **Laplacian kernel**: Detects deviation of the current value from the neighborhood mean (diffusion)
- **Gaussian kernel**: Blur/smoothing
- **Sobel kernel**: Edge detection/gradient computation
---
## Implementation Steps in Detail
### Step 1: Grid State Storage and Self-Feedback
**What**: Use ShaderToy's Buffer self-read mechanism to persistently store simulation state in a buffer texture. Each frame reads the previous frame's state, computes new state, and writes it back.
**Why**: GPU shaders are inherently stateless; buffer inter-frame feedback is required for time-step iteration. State is stored in RGBA channels — CA can use a single channel for alive/dead, while RD uses two channels for u and v respectively.
**Code**:
```glsl
// Buffer A: read previous frame's own state
// iChannel0 is bound to Buffer A itself (self-feedback)
vec4 prevState = texelFetch(iChannel0, ivec2(fragCoord), 0);
// Can also sample with UV coordinates (supports texture filtering)
vec2 uv = fragCoord / iResolution.xy;
vec4 prevSmooth = texture(iChannel0, uv);
```
**Key points**:
- `texelFetch` performs no filtering, reads a single pixel exactly, suitable for discrete CA
- `texture` uses hardware bilinear interpolation, blending adjacent pixel values near pixel boundaries, suitable for continuous RD
- The four RGBA channels can store different state variables (e.g., u, v, velocity field components, etc.)
### Step 2: Initialization (Noise Seeding)
**What**: Initialize the grid with pseudo-random noise on the first frame (or first few frames) to provide seeds for the simulation.
**Why**: Both CA and RD need initial perturbation to start evolution. Different initial conditions produce different final patterns. In practice, seeding is often repeated for the first 2~10 frames, since ShaderToy occasionally skips the first frame.
**Code**:
```glsl
// Simple hash noise function
float hash1(float n) {
return fract(sin(n) * 138.5453123);
}
vec3 hash33(in vec2 p) {
float n = sin(dot(p, vec2(41, 289)));
return fract(vec3(2097152, 262144, 32768) * n);
}
// Initialization branch in mainImage
if (iFrame < 2) {
// CA: random binary initialization
float f = step(0.9, hash1(fragCoord.x * 13.0 + hash1(fragCoord.y * 71.1)));
fragColor = vec4(f, 0.0, 0.0, 0.0);
} else if (iFrame < 10) {
// RD: random continuous value initialization
vec3 noise = hash33(fragCoord / iResolution.xy + vec2(53, 43) * float(iFrame));
fragColor = vec4(noise, 1.0);
}
```
**Key points**:
- `hash1` is a simple pseudo-random number generator based on `sin`, producing values in [0, 1)
- `hash33` generates a 3D random vector from 2D coordinates, used for multi-channel RD initialization
- CA initialization uses `step(0.9, ...)` to produce approximately 10% density of living cells
- RD initialization uses continuous random values, with `iFrame` added so each frame seeds differently
- Multi-frame seeding (`iFrame < 10`) ensures sufficiently rich initial perturbation
### Step 3: Neighbor Sampling and Laplacian Computation
**What**: Perform weighted sampling of the current pixel's 8 (or 4) neighbors, computing the Laplacian or neighbor count.
**Why**: This is the core of CA/RD — local rules drive state updates through neighbor information. The Laplacian describes how much a point's value deviates from the surrounding average, physically corresponding to diffusion. The nine-point stencil is more accurate and isotropic than a simple cross stencil.
**Three Sampling Methods Compared**:
| Method | Use Case | Advantages | Disadvantages |
|------|----------|------|------|
| Method A: Discrete neighbor counting | CA | Exact integer coordinates, no filtering error | Can only handle discrete states |
| Method B: Nine-point Laplacian | RD | Good isotropy, high accuracy | 9 texture samples |
| Method C: 3x3 Gaussian blur | Simplified RD | Good smoothing effect | Not a true Laplacian |
**Method A Code Details**:
```glsl
// Discrete CA neighbor counting using texelFetch for exact reads
int cell(in ivec2 p) {
ivec2 r = ivec2(textureSize(iChannel0, 0));
p = (p + r) % r; // Wrap-around boundary (toroidal topology), left overflow appears on right
return (texelFetch(iChannel0, p, 0).x > 0.5) ? 1 : 0;
}
ivec2 px = ivec2(fragCoord);
// Moore neighborhood: sum of 8 neighbors
int k = cell(px + ivec2(-1,-1)) + cell(px + ivec2(0,-1)) + cell(px + ivec2(1,-1))
+ cell(px + ivec2(-1, 0)) + cell(px + ivec2(1, 0))
+ cell(px + ivec2(-1, 1)) + cell(px + ivec2(0, 1)) + cell(px + ivec2(1, 1));
```
**Method B Code Details**:
```glsl
// Nine-point Laplacian stencil (for RD)
// Weights: diagonal 0.5, cross 1.0, center -6.0 (sum = 0, ensuring Laplacian of a constant field is zero)
vec2 laplacian(vec2 uv) {
vec2 px = 1.0 / iResolution.xy;
vec4 P = vec4(px, 0.0, -px.x);
return
0.5 * texture(iChannel0, uv - P.xy).xy // bottom-left
+ texture(iChannel0, uv - P.zy).xy // bottom
+ 0.5 * texture(iChannel0, uv - P.wy).xy // bottom-right
+ texture(iChannel0, uv - P.xz).xy // left
- 6.0 * texture(iChannel0, uv).xy // center
+ texture(iChannel0, uv + P.xz).xy // right
+ 0.5 * texture(iChannel0, uv + P.wy).xy // top-left
+ texture(iChannel0, uv + P.zy).xy // top
+ 0.5 * texture(iChannel0, uv + P.xy).xy; // top-right
}
```
**Method C Code Details**:
```glsl
// 3x3 weighted blur (Gaussian approximation)
// Weights: diagonal 1, cross 2, center 4, total 16
// Uses vec3 swizzle to cleverly encode 9 offset directions
float blur3x3(vec2 uv) {
vec3 e = vec3(1, 0, -1); // e.x=1, e.y=0, e.z=-1
vec2 px = 1.0 / iResolution.xy;
float res = 0.0;
// e.xx=(1,1), e.xz=(1,-1), e.zx=(-1,1), e.zz=(-1,-1) → four diagonals
res += texture(iChannel0, uv + e.xx * px).x + texture(iChannel0, uv + e.xz * px).x
+ texture(iChannel0, uv + e.zx * px).x + texture(iChannel0, uv + e.zz * px).x; // ×1
// e.xy=(1,0), e.yx=(0,1), e.yz=(0,-1), e.zy=(-1,0) → four edges
res += (texture(iChannel0, uv + e.xy * px).x + texture(iChannel0, uv + e.yx * px).x
+ texture(iChannel0, uv + e.yz * px).x + texture(iChannel0, uv + e.zy * px).x) * 2.; // ×2
// e.yy=(0,0) → center
res += texture(iChannel0, uv + e.yy * px).x * 4.; // ×4
return res / 16.0;
}
```
### Step 4: State Update Rules
**What**: Apply CA rules or RD differential equations based on neighbor information to compute new state values.
**Why**: This is the core simulation logic. CA uses discrete decisions (birth/survival/death), RD uses continuous differential equations with Euler integration.
**CA Rule Details**:
Conway's Game of Life B3/S23 means:
- B3 = Birth when 3 neighbors
- S23 = Survive when 2 or 3 neighbors
```glsl
int e = cell(px); // current state (0 or 1)
// Equivalent to: if (k==3) born/survive; else if (k==2 && alive) survive; else die
float f = (((k == 2) && (e == 1)) || (k == 3)) ? 1.0 : 0.0;
```
**Generic Bitmask Rules**: Bitmasks can encode arbitrary CA rule sets without modifying logic code. For example:
- B3/S23 → bornset=8 (binary 1000, bit 3), stayset=12 (binary 1100, bits 2,3)
- B36/S23 → bornset=72 (binary 1001000, bits 3,6), stayset=12
```glsl
// stayset/bornset are bitmasks; bit n set means the rule fires when the neighbor count is n
// (so the test is `1 << k`, matching bornset=8 for B3 and stayset=12 for S23)
float ff = 0.0;
if (currentAlive) {
    ff = ((stayset & (1 << k)) > 0) ? 1.0 : 0.0; // survive
} else {
    ff = ((bornset & (1 << k)) > 0) ? 1.0 : 0.0; // birth
}
```
**RD Gray-Scott Update Details**:
Physical meaning of the Gray-Scott equations:
- `Du·∇²u`: diffusion of u (spatial smoothing)
- `-u·v²`: reaction consumption (u decreases when u and v meet)
- `F·(1-u)`: replenishment of u (feed, pulling u back toward 1.0)
- `Dv·∇²v`: diffusion of v
- `+u·v²`: reaction production (v increases when u and v meet)
- `-(F+k)·v`: removal of v (combined decay from kill + feed)
```glsl
float u = prevState.x;
float v = prevState.y;
vec2 Duv = laplacian(uv) * DIFFUSION; // DIFFUSION = vec2(Du, Dv)
float du = Duv.x - u * v * v + F * (1.0 - u);
float dv = Duv.y + u * v * v - (F + k) * v;
// Forward Euler integration, clamp to prevent numerical instability
fragColor.xy = clamp(vec2(u + du * DT, v + dv * DT), 0.0, 1.0);
```
**Simplified RD Details**:
This approach doesn't use the standard Gray-Scott equations, but instead uses gradient-driven displacement and random decay to approximate reaction-diffusion behavior. The results are more organic but less controllable.
```glsl
float avgRD = blur3x3(uv);
vec2 pwr = (1.0 / iResolution.xy) * 1.5;
// Compute gradient (similar to Sobel)
vec2 lap = vec2(
texture(iChannel0, uv + vec2(pwr.x, 0)).y - texture(iChannel0, uv - vec2(pwr.x, 0)).y,
texture(iChannel0, uv + vec2(0, pwr.y)).y - texture(iChannel0, uv - vec2(0, pwr.y)).y
);
uv = uv + lap * (1.0 / iResolution.xy) * 3.0; // Displace sampling point along gradient (diffusion)
float newRD = texture(iChannel0, uv).x + (noise.z - 0.5) * 0.0025 - 0.002; // Random decay
newRD += dot(texture(iChannel0, uv + (noise.xy - 0.5) / iResolution.xy).xy, vec2(1, -1)) * 0.145; // Reaction term
```
### Step 5: Visualization and Coloring
**What**: Map simulation buffer data to visual effects — color mapping, gradient lighting, bump mapping, etc.
**Why**: Raw simulation data consists of scalar/vector values in 0~1 range, requiring artistic processing to produce appealing visuals. The most common technique is computing the gradient of buffer values to obtain normal information for bump lighting.
**Color mapping techniques**:
```glsl
// Basic: nonlinear color separation
// c is a [0,1] value; different pow exponents make RGB channels respond at different rates
float c = 1.0 - texture(iChannel0, uv).y;
vec3 col = pow(vec3(1.5, 1, 1) * c, vec3(1, 4, 12));
// R channel responds linearly, G channel with 4th power (rapid decay in dark areas), B channel with 12th power (blue only at brightest spots)
```
**Gradient normal computation**:
```glsl
// Compute surface normals from scalar field (for bump map lighting)
vec3 normal(vec2 uv) {
vec3 delta = vec3(1.0 / iResolution.xy, 0.0);
// Central difference for x and y gradients
float du = texture(iChannel0, uv + delta.xz).x - texture(iChannel0, uv - delta.xz).x;
float dv = texture(iChannel0, uv + delta.zy).x - texture(iChannel0, uv - delta.zy).x;
// z component controls bump intensity (smaller = stronger bumps)
return normalize(vec3(du, dv, 1.0));
}
```
**Specular highlight effect**:
```glsl
// Produce specular edges via sampling offset
float c2 = 1.0 - texture(iChannel0, uv + 0.5 / iResolution.xy).y;
// c2*c2 - c*c is positive at gradient changes, producing edge highlights
col += vec3(0.36, 0.73, 1.0) * max(c2 * c2 - c * c, 0.0) * 12.0;
```
**Vignette + gamma correction**:
```glsl
// Vignette: darken edges
col *= pow(16.0 * uv.x * uv.y * (1.0 - uv.x) * (1.0 - uv.y), 0.125) * 1.15;
// Fade-in effect
col *= smoothstep(0.0, 1.0, iTime / 2.0);
// Gamma correction (approximately 2.0)
fragColor = vec4(sqrt(min(col, 1.0)), 1.0);
```
---
## Variant Details
### Variant 1: Conway's Game of Life (Discrete CA)
**Difference from base version**: Uses discrete binary state and neighbor counting rules instead of continuous RD equations. This is the most classic cellular automaton, with simple rules that can give rise to extremely complex behavior (gliders, oscillators, still lifes, etc.).
**Complete Buffer A code**:
```glsl
int cell(in ivec2 p) {
ivec2 r = ivec2(textureSize(iChannel0, 0));
p = (p + r) % r; // wrap-around boundary
return (texelFetch(iChannel0, p, 0).x > 0.5) ? 1 : 0;
}
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
ivec2 px = ivec2(fragCoord);
// Moore neighborhood counting
int k = cell(px+ivec2(-1,-1)) + cell(px+ivec2(0,-1)) + cell(px+ivec2(1,-1))
+ cell(px+ivec2(-1, 0)) + cell(px+ivec2(1, 0))
+ cell(px+ivec2(-1, 1)) + cell(px+ivec2(0, 1)) + cell(px+ivec2(1, 1));
int e = cell(px);
// B3/S23 rule
float f = (((k == 2) && (e == 1)) || (k == 3)) ? 1.0 : 0.0;
// Initialization: approximately 10% random living cells
if (iFrame < 2) {
f = step(0.9, fract(sin(fragCoord.x * 13.0 + sin(fragCoord.y * 71.1)) * 138.5));
}
fragColor = vec4(f, 0.0, 0.0, 1.0);
}
```
**Adjustment directions**:
- Modifying B/S rule numbers can produce completely different behavior
- Increasing initial density (changing the 0.9 in `step(0.9, ...)`) alters the evolution result
- The .y channel can store "age" for color mapping during visualization
### Variant 2: Configurable Rule Set CA (Birth/Survival Bitmask)
**Difference from base version**: Uses bitmasks to encode arbitrary CA rules, supporting Moore/von Neumann/extended neighborhoods, capable of producing worms, sponges, explosions, and other patterns.
**Bitmask encoding explanation**:
- `BORN_SET = 8` is binary `0b1000`, meaning bit 3 is set → B3 (birth when 3 neighbors)
- `STAY_SET = 12` is binary `0b1100`, meaning bits 2,3 are set → S23 (survive when 2 or 3 neighbors)
- `LIVEVAL` controls the living cell's state value; when greater than 1, combined with `DECIMATE` it can produce gradient decay effects
- `DECIMATE` is the per-frame decay amount, producing a "trailing" effect
**Key code**:
```glsl
#define BORN_SET 8 // birth bitmask, 8 = B3 (bit 3 set)
#define STAY_SET 12 // survival bitmask, 12 = S23 (bits 2,3 set)
#define LIVEVAL 2.0 // living cell state value
#define DECIMATE 1.0 // decay value (0=no decay)
// Rule evaluation
float ff = 0.0;
float ev = texelFetch(iChannel0, px, 0).w;
if (ev > 0.5) {
// Living cell: decay first, then check if survival rule is met
if (DECIMATE > 0.0) ff = ev - DECIMATE;
    if ((STAY_SET & (1 << k)) > 0) ff = LIVEVAL; // bit k of the mask ↔ neighbor count k
} else {
// Dead cell: check if birth rule is met
    ff = ((BORN_SET & (1 << k)) > 0) ? LIVEVAL : 0.0;
}
```
**Notable rule sets**:
- B3/S23 (Conway Life): BORN=8, STAY=12
- B36/S23 (HighLife): BORN=72, STAY=12 — has self-replicators
- B1/S1 (Gnarl): BORN=2, STAY=2 — fractal growth
- B3/S012345678 (Life without death): BORN=8, STAY=511 — only grows, never dies
### Variant 3: Separable Gaussian Blur RD (Multi-Buffer Architecture)
**Difference from base version**: Replaces the single 3x3 Laplacian with separable horizontal/vertical Gaussian blur for the diffusion step, achieving a larger effective diffusion radius with smoother patterns.
**Architecture**:
- Buffer A: Reaction step (reads Buffer C's blur result as diffusion term)
- Buffer B: Horizontal Gaussian blur (reads Buffer A)
- Buffer C: Vertical Gaussian blur (reads Buffer B)
**Why separate**:
- A direct NxN kernel requires N² samples
- Separating into horizontal + vertical passes requires N samples each, 2N total
- A 9-tap separable blur = 18 samples ≈ equivalent to an 81-point 9x9 kernel
**Buffer B complete code (horizontal blur)**:
```glsl
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
vec2 uv = fragCoord / iResolution.xy;
float h = 1.0 / iResolution.x;
vec4 sum = vec4(0.0);
// 9-tap Gaussian weights (approximate normal distribution)
sum += texture(iChannel0, fract(vec2(uv.x - 4.0*h, uv.y))) * 0.05;
sum += texture(iChannel0, fract(vec2(uv.x - 3.0*h, uv.y))) * 0.09;
sum += texture(iChannel0, fract(vec2(uv.x - 2.0*h, uv.y))) * 0.12;
sum += texture(iChannel0, fract(vec2(uv.x - 1.0*h, uv.y))) * 0.15;
sum += texture(iChannel0, fract(vec2(uv.x, uv.y))) * 0.16;
sum += texture(iChannel0, fract(vec2(uv.x + 1.0*h, uv.y))) * 0.15;
sum += texture(iChannel0, fract(vec2(uv.x + 2.0*h, uv.y))) * 0.12;
sum += texture(iChannel0, fract(vec2(uv.x + 3.0*h, uv.y))) * 0.09;
sum += texture(iChannel0, fract(vec2(uv.x + 4.0*h, uv.y))) * 0.05;
fragColor = vec4(sum.xyz / 0.98, 1.0); // 0.98 = weight sum, normalized
}
```
Buffer C has identical structure but blurs along the y-axis (replace `uv.x ± n*h` with `uv.y ± n*v`, where `v = 1.0/iResolution.y`).
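For completeness, Buffer C could be sketched as follows (mirroring Buffer B with the axis swapped):

```glsl
// Buffer C: vertical 9-tap Gaussian blur, reading Buffer B via iChannel0
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
    vec2 uv = fragCoord / iResolution.xy;
    float v = 1.0 / iResolution.y;
    vec4 sum = vec4(0.0);
    sum += texture(iChannel0, fract(vec2(uv.x, uv.y - 4.0*v))) * 0.05;
    sum += texture(iChannel0, fract(vec2(uv.x, uv.y - 3.0*v))) * 0.09;
    sum += texture(iChannel0, fract(vec2(uv.x, uv.y - 2.0*v))) * 0.12;
    sum += texture(iChannel0, fract(vec2(uv.x, uv.y - 1.0*v))) * 0.15;
    sum += texture(iChannel0, fract(vec2(uv.x, uv.y         ))) * 0.16;
    sum += texture(iChannel0, fract(vec2(uv.x, uv.y + 1.0*v))) * 0.15;
    sum += texture(iChannel0, fract(vec2(uv.x, uv.y + 2.0*v))) * 0.12;
    sum += texture(iChannel0, fract(vec2(uv.x, uv.y + 3.0*v))) * 0.09;
    sum += texture(iChannel0, fract(vec2(uv.x, uv.y + 4.0*v))) * 0.05;
    fragColor = vec4(sum.xyz / 0.98, 1.0); // 0.98 = weight sum, normalized
}
```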
### Variant 4: Continuous Differential Operator CA (Vein/Fluid Style)
**Difference from base version**: Computes curl, divergence, and Laplacian on the grid, combined with multi-step advection loops, producing vein/fluid-like organic patterns that sit between CA and PDE fluid simulation.
**Core concepts**:
- **Curl**: Describes the rotational tendency of a field, used to produce vortex effects
- **Divergence**: Describes the spreading/converging tendency of a field
- **Advection**: Propagates field values along the velocity field direction
**Parameter tuning guide**:
- `STEPS (10~60)`: Advection steps; more = smoother but slower
- `ts (0.1~0.5)`: Advection rotation strength, controls vortex intensity
- `cs (-3~-1)`: Curl scaling; negative values produce counter-clockwise rotation
- `ls (0.01~0.1)`: Laplacian scaling, controls diffusion strength
- `amp (0.5~2.0)`: Self-amplification coefficient
- `upd (0.2~0.6)`: Update smoothing coefficient, controls old/new state blend ratio
**Key code**:
```glsl
#define STEPS 40
#define ts 0.2
#define cs -2.0
#define ls 0.05
#define amp 1.0
#define upd 0.4
// Discrete curl and divergence on a 3x3 stencil
// Standard weights: _K0=-20/6 (center), _K1=4/6 (edge), _K2=1/6 (corner)
curl = uv_n.x - uv_s.x - uv_e.y + uv_w.y
+ _D * (uv_nw.x + uv_nw.y + uv_ne.x - uv_ne.y
+ uv_sw.y - uv_sw.x - uv_se.y - uv_se.x);
div = uv_s.y - uv_n.y - uv_e.x + uv_w.x
+ _D * (uv_nw.x - uv_nw.y - uv_ne.x - uv_ne.y
+ uv_sw.x + uv_sw.y + uv_se.y - uv_se.x);
// Multi-step advection loop
for (int i = 0; i < STEPS; i++) {
advect(off, vUv, texel, curl, div, lapl, blur);
offd = rot(offd, ts * curl); // rotate offset direction
off += offd; // accumulate offset
ab += blur / float(STEPS); // accumulate blurred value
}
```
### Variant 5: RD-Driven 3D Surface (Raymarched RD)
**Difference from base version**: 2D RD results serve as a texture mapped onto a 3D sphere, driving surface displacement and color; the Image pass becomes a full raymarcher.
**Implementation points**:
1. Buffer A maintains the standard RD simulation unchanged
2. Image pass becomes a raymarching renderer
3. The SDF function maps 3D points to spherical UV, then samples the RD buffer
4. RD values drive surface displacement
**Key code**:
```glsl
// Image pass: use RD texture for displacement in the SDF
vec2 map(in vec3 pos) {
vec3 p = normalize(pos);
vec2 uv;
// Spherical parameterization: 3D point → 2D UV
uv.x = 0.5 + atan(p.z, p.x) / (2.0 * 3.14159); // longitude [0, 1]
uv.y = 0.5 - asin(p.y) / 3.14159; // latitude [0, 1]
float y = texture(iChannel0, uv).y; // read v component from RD buffer
float displacement = 0.1 * y; // displacement amount (adjustable scale factor)
float sd = length(pos) - (2.0 + displacement); // base sphere SDF + displacement
return vec2(sd, y); // return distance and material parameter
}
```
**Extension directions**:
- Replace the sphere with a torus, plane, or other base shapes
- Use the two RD channels to separately drive displacement and color
- Add normal perturbation for finer surface detail
- Combine with environment maps for reflection/refraction
---
## Performance Optimization In-Depth Analysis
### 1. texelFetch vs texture
**Discrete CA** should use `texelFetch(iChannel0, ivec2(px), 0)` instead of `texture()`:
- Avoids unnecessary texture filtering overhead
- Guarantees pixel-precise reads without floating-point precision causing sampling of adjacent pixels
- For binary states (0/1), any interpolation introduces errors
**Continuous RD** can use `texture()` with linear filtering:
- Hardware automatically performs bilinear interpolation
- The interpolation effect is equivalent to additional smoothing/diffusion, which can be advantageous in some cases
- Hardware-accelerated, faster than manual interpolation
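The two read styles side by side:

```glsl
// Discrete CA: exact integer-coordinate read, no filtering, no precision drift
float sCA = texelFetch(iChannel0, ivec2(fragCoord), 0).x;

// Continuous RD: normalized-UV read, hardware bilinear filtering applies
float sRD = texture(iChannel0, fragCoord / iResolution.xy).x;
```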
### 2. Separable Blur Instead of Large-Kernel Laplacian
If a large diffusion radius is needed:
- **Don't** use a larger NxN Laplacian kernel → O(N²) samples
- **Do** use separable two-pass Gaussian blur (horizontal + vertical) → O(2N) samples
- Implemented through additional buffer passes
**Numerical comparison**:
| Method | Equivalent Kernel Size | Sample Count |
|------|-----------|---------|
| 3x3 Laplacian | 3×3 | 9 |
| 5x5 Laplacian | 5×5 | 25 |
| 9x9 Laplacian | 9×9 | 81 |
| Separable 9-tap Gaussian | ≈9×9 | 18 |
| Separable 13-tap Gaussian | ≈13×13 | 26 |
### 3. Multi-Step Sub-Iteration
For RD, you can loop multiple sub-iterations within a single frame using smaller DT, improving convergence speed while maintaining stability:
```glsl
#define SUBSTEPS 4 // sub-iteration count
#define SUB_DT 0.25 // = DT / SUBSTEPS
for (int i = 0; i < SUBSTEPS; i++) {
vec2 lap = laplacian9(uv);
float uvv = u * v * v;
u += (DU * lap.x - uvv + F * (1.0 - u)) * SUB_DT;
v += (DV * lap.y + uvv - (F + K) * v) * SUB_DT;
}
```
**Note**: The texture-read Laplacian is only exact on the first sub-step; strictly, later sub-steps should recompute it from the updated values (which would require extra buffer passes). In practice, the approximation of single-read multi-step integration is often good enough.
### 4. Reduced-Resolution Simulation
If the target display resolution is high but the pattern's spatial frequency doesn't require 1:1 pixel precision:
- Run the simulation at lower resolution in the buffer (not directly configurable in ShaderToy, but possible in custom engines)
- Use bilinear interpolation upsampling in the Image pass
- Can save 4x~16x computation
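One way to approximate this on ShaderToy is to simulate in a sub-rectangle of the buffer and stretch it in the Image pass. `SIM_SCALE` is a hypothetical constant for this sketch:

```glsl
// Image pass: the simulation only occupies the lower-left SIM_SCALE fraction
// of the buffer; reading it with scaled UVs gives free bilinear upsampling.
const float SIM_SCALE = 0.5;                  // assumption: quarter-area simulation
vec2 uv = fragCoord / iResolution.xy;
vec2 simUV = uv * SIM_SCALE;                  // map display UV into the sim region
vec3 state = texture(iChannel0, simUV).xyz;   // hardware bilinear interpolation
```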
### 5. Avoiding Branches and Conditionals
Use `step()`, `mix()`, `clamp()` instead of `if/else` for CA rule evaluation to reduce GPU warp divergence:
```glsl
// Original if/else version:
// if (k==3) f=1.0; else if (k==2 && e==1) f=1.0; else f=0.0;
// Branch-free version:
float f = max(step(abs(float(k) - 3.0), 0.5),
step(abs(float(k) - 2.0), 0.5) * step(0.5, float(e)));
```
**Explanation**:
- `step(abs(float(k) - 3.0), 0.5)` is 1.0 when k=3, otherwise 0.0
- `step(abs(float(k) - 2.0), 0.5) * step(0.5, float(e))` is 1.0 when k=2 and e=1
- `max()` combines the two conditions
---
## Combination Suggestions — Full Details
### 1. RD + Raymarching (3D Displacement/Shaping)
Map RD results as a heightmap onto 3D surfaces (sphere, plane, torus) and create organic bumpy surfaces through SDF displacement. Suitable for biological organisms, alien terrain, and similar effects.
**Complete Image pass example** (sphere + RD displacement):
```glsl
vec2 map(in vec3 pos) {
vec3 p = normalize(pos);
vec2 uv;
uv.x = 0.5 + atan(p.z, p.x) / (2.0 * 3.14159);
uv.y = 0.5 - asin(p.y) / 3.14159;
float y = texture(iChannel0, uv).y;
float displacement = 0.1 * y;
float sd = length(pos) - (2.0 + displacement);
return vec2(sd, y);
}
// Use map() in the raymarch loop
// Normals computed via central difference of map()
// Material color based on y value returned by map() for color mapping
```
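The surrounding loop and normal computation can be sketched minimally as follows; the step count, epsilon, and understep factor are illustrative assumptions:

```glsl
// Central-difference normal of the displaced SDF
vec3 calcNormal(vec3 p) {
    vec2 e = vec2(0.002, 0.0); // finite-difference epsilon (illustrative)
    return normalize(vec3(map(p + e.xyy).x - map(p - e.xyy).x,
                          map(p + e.yxy).x - map(p - e.yxy).x,
                          map(p + e.yyx).x - map(p - e.yyx).x));
}

// Sphere-tracing loop; returns (distance, material) or (-1, 0) on miss
vec2 raymarch(vec3 ro, vec3 rd) {
    float t = 0.0;
    for (int i = 0; i < 128; i++) {
        vec2 h = map(ro + rd * t);
        if (h.x < 0.001) return vec2(t, h.y);
        if (t > 20.0) break;
        t += h.x * 0.8; // slight understep: displacement perturbs the true SDF
    }
    return vec2(-1.0, 0.0);
}
```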
### 2. CA/RD + Particle Systems
Use CA/RD fields as velocity fields or spawn probability fields for particles:
- Particles flow along RD gradients
- New particles spawn at living CA cells
- Produces "living" particle effects
**Implementation approach**:
- Buffer A: RD/CA simulation
- Buffer B: Particle position storage (each pixel stores one particle's position)
- Image: Visualize particles and/or fields
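A hedged sketch of the Buffer B update under this approach (one particle per pixel, advected along the RD gradient; constants and early-frame initialization are assumptions):

```glsl
// Buffer B: .xy of each pixel stores that particle's UV position.
// iChannel0 = Buffer B (self), iChannel1 = Buffer A (RD field).
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
    vec2 px = 1.0 / iResolution.xy;
    vec2 p = texelFetch(iChannel0, ivec2(fragCoord), 0).xy; // this particle
    // Central-difference gradient of the RD v channel = velocity field
    vec2 grad = vec2(
        texture(iChannel1, p + vec2(px.x, 0.0)).y - texture(iChannel1, p - vec2(px.x, 0.0)).y,
        texture(iChannel1, p + vec2(0.0, px.y)).y - texture(iChannel1, p - vec2(0.0, px.y)).y);
    p = fract(p + grad * 0.5); // advect along the gradient, wrap at edges
    fragColor = vec4(p, 0.0, 1.0);
}
```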
### 3. RD + Post-Processing Lighting
In the Image pass, compute normals from RD values → bump mapping → lighting/reflection/refraction. Combined with environment maps (cubemaps), this can produce etched metal surfaces, liquid ripples, and similar effects.
**Key techniques**:
- Compute gradients from RD scalar field to get normals
- Use Phong/Blinn-Phong lighting model
- Normals used to sample cubemaps for environment reflections
- Multiple color mapping schemes increase visual richness
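These techniques compose into a few lines in the Image pass. This sketch assumes the `normal()` helper from Step 5; `baseColor` stands for whatever the color-mapping step produced, and the light/view directions are illustrative:

```glsl
// Blinn-Phong bump lighting driven by the RD scalar field
vec3 n = normal(uv);                             // gradient-derived normal (Step 5)
vec3 lightDir = normalize(vec3(0.5, 0.5, 1.0));  // assumed light direction
vec3 viewDir  = vec3(0.0, 0.0, 1.0);             // orthographic viewer
vec3 halfVec  = normalize(lightDir + viewDir);
float diff = max(dot(n, lightDir), 0.0);
float spec = pow(max(dot(n, halfVec), 0.0), 32.0);
vec3 col = baseColor * (0.2 + 0.8 * diff) + vec3(1.0) * spec * 0.5;
// Optional: sample a cubemap with reflect(-viewDir, n) for environment reflection
```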
### 4. CA + Color Decay Trails
Living cells use high values; after death, values decay each frame (instead of immediately dropping to zero), with different decay rates in RGB channels producing colorful trailing effects. This is the core technique of the Automata X Showcase.
**Implementation code example**:
```glsl
// Add decay logic after CA update
vec4 prev = texelFetch(iChannel0, px, 0);
if (f > 0.5) {
// Living cell: set to high value
fragColor = vec4(1.0, 1.0, 1.0, 1.0);
} else {
// Dead cell: different decay rates per channel
fragColor = vec4(
prev.x * 0.99, // R decays slowly → longest red trail
prev.y * 0.95, // G decays moderately
prev.z * 0.90, // B decays fast → shortest blue trail
1.0
);
}
```
### 5. RD + Domain Warping
Apply vortex warp or spiral zoom domain transforms to the RD sampling UV before computing, causing the diffusion field itself to be distorted, producing spiral and vortex-like organic patterns. Flexi's Expansive RD uses this technique.
**Implementation code example**:
```glsl
// Apply domain transform to UV before RD update
vec2 warpedUV = uv;
// Vortex warp
float angle = length(uv - 0.5) * 3.14159 * 2.0;
float s = sin(angle * 0.1);
float c = cos(angle * 0.1);
warpedUV = (warpedUV - 0.5) * mat2(c, -s, s, c) + 0.5;
// Sample state using transformed UV
vec2 state = texture(iChannel0, warpedUV).xy;
// Then proceed with normal RD computation...
```

# Color Palette & Color Space Techniques - Detailed Reference
This document is a detailed supplement to [SKILL.md](SKILL.md), containing step-by-step tutorials, mathematical derivations, and advanced usage.
## Prerequisites
- GLSL basic syntax: `vec3`, `mix`, `clamp`, `smoothstep`, `fract`, `mod`
- Basic properties of trigonometric functions `cos`/`sin` (periodicity, range [-1, 1])
- Color space fundamentals: RGB is a cube, HSV/HSL is cylindrical coordinates, Lab/Lch is a perceptually uniform space
- Gamma correction concept: monitors store sRGB (nonlinear), shading computations should be performed in linear space
## Step-by-Step Tutorial
### Step 1: Cosine Palette Function
**What**: Implement the most fundamental and commonly used procedural palette function
**Why**: Only 4 vec3 parameters are needed to generate infinite smooth color ramps, with extremely low computational cost (a single cos operation). This function is widely used in the ShaderToy community and is the cornerstone of procedural coloring.
**Mathematical Derivation**:
```
color(t) = a + b * cos(2pi * (c * t + d))
```
- **a** = brightness offset (center luminance of the color ramp), typically ~0.5
- **b** = amplitude (color contrast), typically ~0.5
- **c** = frequency (how many times each channel oscillates), vec3(1,1,1) means R/G/B each oscillate once
- **d** = phase offset (hue starting position per channel), this is the key parameter controlling color style
When a=b=0.5, c=(1,1,1), changing d alone generates completely different color ramps like rainbow, warm tones, cool tones, etc.
**Code**:
```glsl
// Cosine Palette
// a: offset/center color, b: amplitude, c: frequency, d: phase
// t: input scalar, typically [0,1] but can exceed this range
vec3 palette(float t, vec3 a, vec3 b, vec3 c, vec3 d) {
return a + b * cos(6.28318 * (c * t + d));
}
```
### Step 2: Classic Parameter Presets
**What**: Provide ready-to-use palette parameters
**Why**: The original demo showcases 7 classic parameter combinations, covering common needs like rainbow, warm, cool, and duotone schemes. Memorizing a few parameter sets enables rapid color adjustment.
**Code**:
```glsl
// Rainbow color ramp (classic)
// a=(.5,.5,.5) b=(.5,.5,.5) c=(1,1,1) d=(0.0, 0.33, 0.67)
// Warm gradient
// a=(.5,.5,.5) b=(.5,.5,.5) c=(1,1,1) d=(0.0, 0.10, 0.20)
// Blue-purple to orange tones
// a=(.5,.5,.5) b=(.5,.5,.5) c=(1,0.7,0.4) d=(0.0, 0.15, 0.20)
// Custom warm-cool mix
// a=(.8,.5,.4) b=(.2,.4,.2) c=(2,1,1) d=(0.0, 0.25, 0.25)
// Simplified version: fix a/b/c, just adjust d
vec3 palette(float t) {
vec3 a = vec3(0.5, 0.5, 0.5);
vec3 b = vec3(0.5, 0.5, 0.5);
vec3 c = vec3(1.0, 1.0, 1.0);
vec3 d = vec3(0.263, 0.416, 0.557);
return a + b * cos(6.28318 * (c * t + d));
}
```
### Step 3: HSV to RGB Conversion (Standard + Smooth)
**What**: Implement branchless HSV to RGB conversion and its cubic smooth variant
**Why**: HSV space is ideal for rotating by hue, scaling by saturation/value. The standard implementation is piecewise linear, hence only C0 continuous (its derivative jumps at segment boundaries); the smooth version achieves C1 continuity through Hermite interpolation, producing smoother hue animation.
**Principle**: Using vectorized `mod` + `abs` + `clamp` operations avoids if/else branching:
```
rgb = clamp(abs(mod(H*6 + vec3(0,4,2), 6) - 3) - 1, 0, 1)
```
This essentially models each R/G/B channel's variation with hue H as a piecewise linear function. The derivative discontinuities (C0 but not C1) can be smoothed away with the cubic `rgb*rgb*(3-2*rgb)`.
**Code**:
```glsl
// Standard HSV -> RGB (branchless)
// c.x = Hue [0,1], c.y = Saturation [0,1], c.z = Value [0,1]
vec3 hsv2rgb(vec3 c) {
vec3 rgb = clamp(abs(mod(c.x * 6.0 + vec3(0.0, 4.0, 2.0), 6.0) - 3.0) - 1.0, 0.0, 1.0);
return c.z * mix(vec3(1.0), rgb, c.y);
}
// Smooth HSV -> RGB (C1 continuous)
vec3 hsv2rgb_smooth(vec3 c) {
vec3 rgb = clamp(abs(mod(c.x * 6.0 + vec3(0.0, 4.0, 2.0), 6.0) - 3.0) - 1.0, 0.0, 1.0);
rgb = rgb * rgb * (3.0 - 2.0 * rgb); // Cubic Hermite smoothing
return c.z * mix(vec3(1.0), rgb, c.y);
}
```
### Step 4: HSL to RGB Conversion
**What**: Implement HSL color space conversion
**Why**: HSL is more intuitive than HSV — L=0 is black, L=1 is white, L=0.5 is pure color. Suitable for scenarios requiring control over "lightness" rather than "value" (e.g., mapping iteration counts to hue in data visualization).
**Code**:
```glsl
// Hue -> RGB base color (branchless)
vec3 hue2rgb(float h) {
return clamp(abs(mod(h * 6.0 + vec3(0.0, 4.0, 2.0), 6.0) - 3.0) - 1.0, 0.0, 1.0);
}
// HSL -> RGB
// h: Hue [0,1], s: Saturation [0,1], l: Lightness [0,1]
vec3 hsl2rgb(float h, float s, float l) {
vec3 rgb = hue2rgb(h);
return l + s * (rgb - 0.5) * (1.0 - abs(2.0 * l - 1.0));
}
```
### Step 5: Bidirectional RGB <-> HSV Conversion
**What**: Implement the reverse conversion from RGB back to HSV
**Why**: When blending colors in HSV space, you need to first convert both endpoint colors from RGB to HSV, interpolate, then convert back. RGB to HSV uses a classic branchless implementation.
**Code**:
```glsl
// RGB -> HSV (branchless method)
vec3 rgb2hsv(vec3 c) {
vec4 K = vec4(0.0, -1.0 / 3.0, 2.0 / 3.0, -1.0);
vec4 p = mix(vec4(c.bg, K.wz), vec4(c.gb, K.xy), step(c.b, c.g));
vec4 q = mix(vec4(p.xyw, c.r), vec4(c.r, p.yzx), step(p.x, c.r));
float d = q.x - min(q.w, q.y);
float e = 1.0e-10;
return vec3(abs(q.z + (q.w - q.y) / (6.0 * d + e)), d / (q.x + e), q.x);
}
```
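With both directions available, blending two RGB colors through HSV space becomes a round trip. This sketch assumes the `hsv2rgb` from Step 3; note that a plain `mix` on hue can take the long way around the color wheel, which the circular interpolation of Variant 4 avoids:

```glsl
// Blend two RGB colors in HSV space (hypothetical helper name)
vec3 blendHSV(vec3 rgbA, vec3 rgbB, float x) {
    vec3 ha = rgb2hsv(rgbA);
    vec3 hb = rgb2hsv(rgbB);
    // mix() shown for brevity; substitute lerpHSV() for correct hue wraparound
    return hsv2rgb(mix(ha, hb, x));
}
```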
### Step 6: CIE Lab/Lch Perceptually Uniform Interpolation
**What**: Implement the complete RGB <-> Lab <-> Lch conversion pipeline
**Why**: Linear interpolation in RGB and HSV spaces is not perceptually uniform — the human eye is more sensitive to green than red. Interpolation in Lch (Lightness-Chroma-Hue) space produces the most visually natural gradients, especially suitable for UI color schemes and artistic gradients.
**Mathematical Derivation**: The conversion pipeline is RGB -> XYZ (via sRGB D65 matrix) -> Lab (via nonlinear mapping) -> Lch (via converting a,b to polar coordinates: Chroma, Hue). The inverse process reverses each step.
**Code**:
```glsl
// Helper function: XYZ nonlinear mapping
float xyzF(float t) { return mix(pow(t, 1.0/3.0), 7.787037 * t + 0.139731, step(t, 0.00885645)); }
float xyzR(float t) { return mix(t * t * t, 0.1284185 * (t - 0.139731), step(t, 0.20689655)); }
// RGB -> Lch (via XYZ -> Lab -> polar coordinates)
vec3 rgb2lch(vec3 c) {
// RGB -> XYZ (sRGB D65 matrix)
c *= mat3(0.4124, 0.3576, 0.1805,
0.2126, 0.7152, 0.0722,
0.0193, 0.1192, 0.9505);
// XYZ -> Lab
c = vec3(xyzF(c.x), xyzF(c.y), xyzF(c.z));
vec3 lab = vec3(max(0.0, 116.0 * c.y - 16.0),
500.0 * (c.x - c.y),
200.0 * (c.y - c.z));
// Lab -> Lch (convert a,b to polar: Chroma, Hue)
return vec3(lab.x, length(lab.yz), atan(lab.z, lab.y));
}
// Lch -> RGB (inverse process)
vec3 lch2rgb(vec3 c) {
// Lch -> Lab
c = vec3(c.x, cos(c.z) * c.y, sin(c.z) * c.y);
// Lab -> XYZ
float lg = (1.0 / 116.0) * (c.x + 16.0);
vec3 xyz = vec3(xyzR(lg + 0.002 * c.y),
xyzR(lg),
xyzR(lg - 0.005 * c.z));
// XYZ -> RGB (inverse matrix)
return xyz * mat3( 3.2406, -1.5372, -0.4986,
-0.9689, 1.8758, 0.0415,
0.0557, -0.2040, 1.0570);
}
// Circular hue interpolation (avoids 0/360 degree wraparound jump)
float lerpAngle(float a, float b, float x) {
float ang = mod(mod((a - b), 6.28318) + 9.42477, 6.28318) - 3.14159;
return ang * x + b;
}
// Lch space linear interpolation
vec3 lerpLch(vec3 a, vec3 b, float x) {
return vec3(mix(b.xy, a.xy, x), lerpAngle(a.z, b.z, x));
}
```
### Step 7: sRGB Gamma and Linear Space Workflow
**What**: Implement correct sRGB encode/decode functions and a complete linear-space pipeline
**Why**: All lighting/blending computations must be performed in linear space. sRGB textures need to be decoded first (pow 2.2 or exact piecewise function), then encoded back to sRGB after computation. Ignoring this step causes colors to appear too dark and unnatural blending.
**Complete Pipeline**: sRGB texture decode -> linear space shading/blending -> Reinhard tonemap -> sRGB encode
**Code**:
```glsl
// Exact sRGB encode (linear -> sRGB)
float sRGB_encode(float t) {
return mix(1.055 * pow(t, 1.0/2.4) - 0.055, 12.92 * t, step(t, 0.0031308));
}
vec3 sRGB_encode(vec3 c) {
return vec3(sRGB_encode(c.x), sRGB_encode(c.y), sRGB_encode(c.z));
}
// Fast approximation (sufficient for most scenarios)
// Decode: pow(color, vec3(2.2))
// Encode: pow(color, vec3(1.0/2.2))
// Reinhard tone mapping (maps HDR values to [0,1])
vec3 tonemap_reinhard(vec3 col) {
return col / (1.0 + col);
}
```
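Putting the pipeline together might look like the sketch below; `lightColor` and `diff` stand in for whatever lighting model is in use, and the texture is assumed to be sRGB-encoded:

```glsl
// sRGB decode -> linear shading -> tonemap -> sRGB encode
vec3 texel  = pow(texture(iChannel0, uv).rgb, vec3(2.2)); // fast sRGB decode
vec3 lit    = texel * lightColor * diff;                  // shade in linear space
vec3 mapped = tonemap_reinhard(lit);                      // HDR -> [0,1]
fragColor   = vec4(sRGB_encode(mapped), 1.0);             // exact sRGB encode
```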
### Step 8: Blackbody Radiation Palette
**What**: Implement a physics-based temperature-to-color mapping
**Why**: Used for fire, lava, stars, hot metal, and other scenarios requiring physically realistic emission colors. More believable than manual color tuning, with intuitive parameterization (input is just temperature).
**Mathematical Derivation**: Maps temperature T to CIE chromaticity coordinates (cx, cy) via Planck locus approximation, then converts to XYZ -> RGB, combined with Stefan-Boltzmann law (T^4) brightness scaling to produce physically realistic emission colors.
**Code**:
```glsl
// Blackbody radiation palette
// t: normalized temperature [0,1], internally mapped to [0, TEMP_MAX] Kelvin
#define TEMP_MAX 4000.0 // Tunable: maximum temperature (K), affects color gamut width
vec3 blackbodyPalette(float t) {
t *= TEMP_MAX;
// Planck locus approximation on CIE chromaticity diagram
float cx = (0.860117757 + 1.54118254e-4 * t + 1.28641212e-7 * t * t)
/ (1.0 + 8.42420235e-4 * t + 7.08145163e-7 * t * t);
float cy = (0.317398726 + 4.22806245e-5 * t + 4.20481691e-8 * t * t)
/ (1.0 - 2.89741816e-5 * t + 1.61456053e-7 * t * t);
// CIE chromaticity coordinates -> XYZ tristimulus values
float d = 2.0 * cx - 8.0 * cy + 4.0;
vec3 XYZ = vec3(3.0 * cx / d, 2.0 * cy / d, 1.0 - (3.0 * cx + 2.0 * cy) / d);
// XYZ -> sRGB matrix
vec3 RGB = mat3(3.240479, -0.969256, 0.055648,
-1.537150, 1.875992, -0.204043,
-0.498535, 0.041556, 1.057311) * vec3(XYZ.x / XYZ.y, 1.0, XYZ.z / XYZ.y);
// Stefan-Boltzmann brightness scaling (T^4)
return max(RGB, 0.0) * pow(t * 0.0004, 4.0);
}
```
## Variant Detailed Descriptions
### Variant 1: Multi-Harmonic Cosine Palette (Anti-Aliased)
**Difference from base version**: Extends the single cos to 9 layers of different frequencies for richer color detail; uses `fwidth()` for band-limited filtering to prevent high-frequency aliasing.
**Principle**: `fwidth()` returns the variation across adjacent pixels. When oscillation frequency exceeds pixel resolution (i.e., w approaches or exceeds one full TAU period), `smoothstep` attenuates the cos contribution to 0, achieving approximate sinc filtering.
**Complete code**:
```glsl
// Band-limited cos: automatically attenuates when oscillation frequency exceeds pixel resolution
vec3 fcos(vec3 x) {
vec3 w = fwidth(x);
return cos(x) * smoothstep(TAU, 0.0, w); // Approximate sinc filtering
}
// 9-layer stacked palette
vec3 getColor(float t) {
vec3 col = vec3(0.4);
col += 0.12 * fcos(TAU * t * 1.0 + vec3(0.0, 0.8, 1.1));
col += 0.11 * fcos(TAU * t * 3.1 + vec3(0.3, 0.4, 0.1));
col += 0.10 * fcos(TAU * t * 5.1 + vec3(0.1, 0.7, 1.1));
col += 0.09 * fcos(TAU * t * 9.1 + vec3(0.2, 0.8, 1.4));
col += 0.08 * fcos(TAU * t * 17.1 + vec3(0.2, 0.6, 0.7));
col += 0.07 * fcos(TAU * t * 31.1 + vec3(0.1, 0.6, 0.7));
col += 0.06 * fcos(TAU * t * 65.1 + vec3(0.0, 0.5, 0.8));
col += 0.06 * fcos(TAU * t * 115.1 + vec3(0.1, 0.4, 0.7));
col += 0.09 * fcos(TAU * t * 265.1 + vec3(1.1, 1.4, 2.7));
return col;
}
```
### Variant 2: Hash-Driven Per-Tile Color Variation
**Difference from base version**: Uses a hash function to generate a unique ID for each grid/tile, feeding the ID as the palette's t value to achieve "same palette but different color per tile".
**Use cases**: Procedural tiles/brickwork/mosaics, Voronoi cell coloring, building facades.
**Complete code**:
```glsl
// Hash function (sin-free version, avoids precision issues)
float hash12(vec2 p) {
vec3 p3 = fract(vec3(p.xyx) * 0.1031);
p3 += dot(p3, p3.yzx + 33.33);
return fract((p3.x + p3.y) * p3.z);
}
// Usage in tile coloring
vec2 tileId = floor(uv);
vec3 tileColor = palette(hash12(tileId)); // Different color per tile
```
### Variant 3: Saturation-Preserving Improved RGB Interpolation
**Difference from base version**: Detects saturation decay during RGB space interpolation and displaces colors away from the gray diagonal, achieving approximate perceptually uniform interpolation at very low cost (~15 instructions).
**Principle**:
1. Compute RGB linear interpolation result `ic`
2. Compute the difference between expected saturation `mix(getsat(a), getsat(b), x)` and actual saturation `getsat(ic)`
3. Find the direction away from the gray diagonal `dir`
4. Compensate saturation loss along that direction
**Complete code**:
```glsl
float getsat(vec3 c) {
float mi = min(min(c.x, c.y), c.z);
float ma = max(max(c.x, c.y), c.z);
return (ma - mi) / (ma + 1e-7);
}
vec3 iLerp(vec3 a, vec3 b, float x) {
vec3 ic = mix(a, b, x) + vec3(1e-6, 0.0, 0.0);
float sd = abs(getsat(ic) - mix(getsat(a), getsat(b), x));
vec3 dir = normalize(vec3(2.0*ic.x - ic.y - ic.z,
2.0*ic.y - ic.x - ic.z,
2.0*ic.z - ic.y - ic.x));
float lgt = dot(vec3(1.0), ic);
float ff = dot(dir, normalize(ic));
ic += 1.5 * dir * sd * ff * lgt; // 1.5 = DSP_STR, tunable
return clamp(ic, 0.0, 1.0);
}
```
### Variant 4: Circular Hue Interpolation (HSV/Lch Space)
**Difference from base version**: When interpolating in color spaces with a circular hue dimension, the wraparound across 1.0/0.0 (e.g., from hue 0.9 to hue 0.1) must be handled, otherwise interpolation takes the "long way" around the color wheel — e.g., from red (hue near 0.95) to yellow (hue near 0.15), the long way passes through magenta, blue, cyan, and green instead of going directly through orange.
**Complete code**:
```glsl
// HSV space circular hue interpolation (hue range [0,1])
vec3 lerpHSV(vec3 a, vec3 b, float x) {
float hue = (mod(mod((b.x - a.x), 1.0) + 1.5, 1.0) - 0.5) * x + a.x;
return vec3(hue, mix(a.yz, b.yz, x));
}
// Lch space circular hue interpolation (hue range [0, 2pi])
float lerpAngle(float a, float b, float x) {
    float ang = mod(mod(b - a, TAU) + PI * 3.0, TAU) - PI; // Shortest signed delta from a to b
    return a + ang * x; // a at x=0, b at x=1, matching mix() convention
}
}
```
### Variant 5: Additive Color Stacking (Glow/HDR Effects)
**Difference from base version**: Instead of selecting a single color, additively stack palette colors from multiple iterations, producing natural HDR glow effects. Requires tone mapping.
**Use cases**: Fractal glow, halos, laser effects, particle systems, volumetric light.
**Complete code**:
```glsl
vec3 finalColor = vec3(0.0);
for (int i = 0; i < 4; i++) {
vec3 col = palette(length(uv) + float(i) * 0.4 + iTime * 0.4);
float glow = pow(0.01 / abs(sdfValue), 1.2); // Inverse-distance glow
finalColor += col * glow; // Additive stacking, naturally produces HDR
}
finalColor = finalColor / (1.0 + finalColor); // Reinhard tonemap
```
## Performance Optimization Details
### 1. Branchless HSV/HSL Conversion
Use vectorized `mod`/`abs`/`clamp` operations instead of if-else. All implementations above are already branchless. Branching is expensive on GPUs (especially divergent branches within a warp/wavefront); branchless versions ensure all threads follow the same execution path.
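As a concrete illustration, here is a sketch of the widely used branchless HSV-to-RGB conversion (the coefficient layout shown is one common formulation and may differ from the `hsl2rgb` used elsewhere in this document):

```glsl
// Branchless HSV -> RGB: no if-else, only vectorized mod/abs/clamp
// c = vec3(hue, saturation, value), hue in [0,1]
vec3 hsv2rgb(vec3 c) {
    vec3 rgb = clamp(abs(mod(c.x * 6.0 + vec3(0.0, 4.0, 2.0), 6.0) - 3.0) - 1.0, 0.0, 1.0);
    return c.z * mix(vec3(1.0), rgb, c.y); // Desaturate toward white, scale by value
}
```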
### 2. Band-Limited Filtering for Multi-Harmonic Palettes
High-frequency cos layers produce moiré patterns at a distance or at grazing angles. Using `fwidth()` + `smoothstep` for automatic attenuation costs only ~2 extra instructions to eliminate aliasing. `fwidth()` leverages hardware partial derivative computation at nearly zero cost.
### 3. Lch Pipeline Cost Analysis
The complete RGB -> XYZ -> Lab -> Lch pipeline requires ~57 instructions, including matrix multiplication, pow, atan, etc. If you only need "slightly better than RGB" interpolation, use `iLerp` (improved RGB, ~15 instructions) instead of the full Lch pipeline for an excellent quality/performance ratio.
### 4. sRGB Gamma Approximation
The exact piecewise linear sRGB conversion requires branching. In most visual scenarios, `pow(c, 2.2)` / `pow(c, 1.0/2.2)` is sufficiently accurate (error < 0.4%) and allows better compiler optimization. The exact version uses `mix` + `step` for branchless implementation but costs a few extra instructions.
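A sketch of the exact piecewise conversion in branchless form, using the standard sRGB thresholds (0.04045 encoded / 0.0031308 linear):

```glsl
// Exact sRGB <-> linear, branchless via step + mix
vec3 srgb2linear(vec3 c) {
    vec3 lo = c / 12.92;
    vec3 hi = pow((c + 0.055) / 1.055, vec3(2.4));
    return mix(lo, hi, step(0.04045, c)); // Per-channel piecewise selection
}
vec3 linear2srgb(vec3 c) {
    vec3 lo = c * 12.92;
    vec3 hi = 1.055 * pow(c, vec3(1.0 / 2.4)) - 0.055;
    return mix(lo, hi, step(0.0031308, c));
}
```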
### 5. Cosine Palette Vectorization
`a + b * cos(TAU*(c*t+d))` compiles to 1 MAD + 1 COS + 1 MAD on the GPU, approximately 3-4 clock cycles, extremely efficient. All three channels (R/G/B) execute in parallel via SIMD.
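For reference, the canonical single-layer form (the a/b/c/d coefficients below are example values; tune them per palette):

```glsl
// Cosine palette: per channel compiles to roughly one MAD, one COS, one MAD
vec3 palette(float t) {
    vec3 a = vec3(0.5), b = vec3(0.5), c = vec3(1.0);
    vec3 d = vec3(0.0, 0.33, 0.67); // Per-channel phase offsets (example values)
    return a + b * cos(6.2831853 * (c * t + d));
}
```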
### 6. Texture sRGB Decoding
If texture data is already stored as sRGB, use `pow(texture(...).rgb, vec3(2.2))` to decode to linear space before computation, avoiding color distortion from lighting in nonlinear space. In OpenGL/Vulkan, you can also use the `GL_SRGB8_ALPHA8` format for automatic hardware decoding.
## Combination Suggestions in Detail
### 1. Cosine Palette + SDF Raymarching
The most classic combination. Use the normal direction, distance, or surface attributes of ray march hit points as palette t input, producing rich surface coloring.
**Example**:
```glsl
// After SDF raymarching hit
vec3 nor = calcNormal(pos);
float t_palette = dot(nor, vec3(0.0, 1.0, 0.0)) * 0.5 + 0.5; // Normal y-component mapped to [0,1]
vec3 col = palette(t_palette + iTime * 0.1);
```
### 2. HSL/HSV + Data Visualization
Map iteration counts, distance values, or gradient directions to hue (H), encoding other dimensions via saturation/lightness. E.g., using different hues to mark each step in SDF trace visualization.
**Example**:
```glsl
// Mandelbrot iteration coloring
float h = float(iterations) / float(maxIterations);
vec3 col = hsl2rgb(h, 0.8, 0.5);
```
### 3. Cosine Palette + Fractals/Noise
Use `length(uv)` or `fbm(p)` output plus `iTime` as t, combined with additive stacking and inverse-distance glow, producing psychedelic dynamic color effects.
**Example**:
```glsl
float n = fbm(uv * 3.0 + iTime * 0.2);
vec3 col = palette(n + length(uv) * 0.5);
```
### 4. Blackbody Palette + Volume Rendering/Fire
Map a temperature field (noise-driven or physically simulated) through `blackbodyPalette()` to color, producing physically plausible fire, lava, and stellar effects.
**Example**:
```glsl
// In fire volume rendering
float temperature = fbm(pos * 2.0 - vec3(0, iTime, 0)); // Noise-driven temperature field
vec3 fireColor = blackbodyPalette(temperature);
fireColor = tonemap_reinhard(fireColor); // HDR -> LDR
```
### 5. Linear Space Workflow + Any Palette Technique
Regardless of which palette method is used, always follow: sRGB texture decode -> linear space shading/blending -> Reinhard tonemap -> sRGB encode as the complete pipeline, ensuring physically correct color computation.
**Complete pipeline example**:
```glsl
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
// 1. Decode sRGB texture to linear space
vec3 texColor = pow(texture(iChannel0, uv).rgb, vec3(2.2));
// 2. Perform all shading computations in linear space
vec3 col = texColor * lighting;
col += palette(t) * emission;
// 3. Tone mapping (HDR -> LDR)
col = col / (1.0 + col);
// 4. sRGB encode
col = pow(col, vec3(1.0/2.2));
fragColor = vec4(col, 1.0);
}
```
### 6. Hash + Palette + Tiling System
In procedural tiles/brickwork/mosaics, use `hash(tileID)` as palette input so each tile has a different color while maintaining an overall coordinated color scheme.
**Complete example**:
```glsl
vec2 tileUV = fract(uv * 10.0);
vec2 tileID = floor(uv * 10.0);
// Base color per tile
float h = hash12(tileID);
vec3 tileColor = palette(h);
// Internal tile pattern (e.g., circle)
float d = length(tileUV - 0.5);
float mask = smoothstep(0.4, 0.38, d);
vec3 col = mix(vec3(0.05), tileColor, mask);
```

# CSG Boolean Operations — Detailed Reference
This document is a complete reference manual for [SKILL.md](SKILL.md), including step-by-step tutorials, mathematical derivations, variant details, and advanced usage.
## Use Cases
- **Geometric Modeling**: Build complex shapes from simple primitives (spheres, boxes, cylinders) through boolean combinations — nuts, buildings, mechanical parts, organic characters, etc.
- **Ray Marching Scenes**: All SDF-based ray marching rendering relies on CSG to compose scenes
- **Organic Forms**: Use smooth variants (smin/smax) to create natural transitions between shapes, suitable for character modeling (snails, elephants), clouds, terrain, etc.
- **Architectural / Industrial Design**: Use subtraction to carve windows and doorways, intersection to cut shapes
- **2D SDF Compositing**: Equally applicable to 2D scenes (cyberpunk clouds, UI shape compositing, etc.)
## Prerequisites
- GLSL basic syntax (`vec3`, `float`, `mix`, `clamp`, `min`, `max`)
- SDF (Signed Distance Field) concept: the signed distance from each point in space to the nearest surface, with negative values indicating the interior
- Basic SDF primitives: sphere `length(p) - r`, box (exterior distance only) `length(max(abs(p)-b, 0.0))` — the full signed box appears in Step 5
- Ray Marching basics: stepping from the camera along the view direction, using SDF values to determine step size
## Core Principles in Detail
The essence of CSG boolean operations is **per-point value operations on two distance fields**:
| Operation | Math Expression | Meaning |
|-----------|----------------|---------|
| Union | `min(d1, d2)` | Take the nearest surface, keeping both shapes |
| Intersection | `max(d1, d2)` | Take the farthest surface, keeping only the overlap |
| Subtraction | `max(d1, -d2)` | Use d2's interior (negated) to cut d1 |
**Hard booleans** produce sharp edges at the junction. **Smooth booleans** (smooth min/max) introduce a blend band in the transition region, "fusing" the two shapes together. The key parameter `k` controls the blend band width:
- Larger `k` means wider, smoother transitions
- Smaller `k` means closer to hard boolean sharp edges
- `k = 0` degenerates to hard boolean
Three mainstream smooth formulas, each with distinct characteristics:
1. **Polynomial**: Most commonly used, fast to compute, natural transitions
2. **Quadratic optimized**: More compact and mathematically elegant
3. **Exponential**: Smoothest transitions but more expensive to compute
## Implementation Steps in Detail
### Step 1: Hard Boolean Operations
**What**: Implement the three basic boolean operations — union, intersection, subtraction.
**Why**: These are the foundation of all CSG operations. `min` selects the nearest surface to achieve union; `max` selects the farthest surface for intersection; negating the second operand and taking `max` with the first achieves subtraction (keeping the region of d1 that is not inside d2).
```glsl
// Union: keep both shapes
float opUnion(float d1, float d2) {
return min(d1, d2);
}
// Intersection: keep only the overlapping region
float opIntersection(float d1, float d2) {
return max(d1, d2);
}
// Subtraction: carve d2 out of d1
float opSubtraction(float d1, float d2) {
return max(d1, -d2);
}
```
### Step 2: Smooth Union — Polynomial Version
**What**: Implement a union operation with a blend transition, producing rounded junctions between two shapes.
**Why**: Hard `min` produces C0 continuity (sharp creases) at the SDF junction. Polynomial smooth min interpolates within the transition band where `|d1-d2| < k`, producing C1 continuity (smooth transitions). In the formula, `h` is the normalized blend factor, and the `k*h*(1-h)` term ensures the distance field correctly dips in the transition region (producing more accurate distance values than plain `mix`).
```glsl
// Polynomial smooth union
// k: blend radius, typical values 0.05~0.5
float opSmoothUnion(float d1, float d2, float k) {
float h = clamp(0.5 + 0.5 * (d2 - d1) / k, 0.0, 1.0);
return mix(d2, d1, h) - k * h * (1.0 - h);
}
```
### Step 3: Smooth Subtraction and Smooth Intersection — Polynomial Version
**What**: Extend the smooth union approach to subtraction and intersection.
**Why**: Subtraction = intersection with the negated second operand; intersection = negated union of negated inputs. The sign flips in the formulas reflect this duality. Note that the subtraction blend factor uses `d1+d2` (not `d2-d1`), because the subtrahend `d2` enters the operation negated.
```glsl
// Smooth subtraction: smoothly carve d2 out of d1 (smooth version of max(d1, -d2))
float opSmoothSubtraction(float d1, float d2, float k) {
    float h = clamp(0.5 - 0.5 * (d1 + d2) / k, 0.0, 1.0);
    return mix(d1, -d2, h) + k * h * (1.0 - h);
}
// Smooth intersection: smoothly keep the overlapping region
float opSmoothIntersection(float d1, float d2, float k) {
    float h = clamp(0.5 - 0.5 * (d2 - d1) / k, 0.0, 1.0);
    return mix(d2, d1, h) + k * h * (1.0 - h);
}
```
### Step 4: Quadratic Optimized Smooth Operations
**What**: Implement smin/smax using a more compact quadratic polynomial formula.
**Why**: This version is mathematically equivalent but more concise with fewer branches. `h = max(k - abs(a-b), 0.0)` directly computes the influence within the transition band, being non-zero only when `|a-b| < k`. `h*h*0.25/k` is the quadratic correction term. smax can be derived directly through smin's duality: `smax(a,b,k) = -smin(-a,-b,k)`.
```glsl
// Quadratic optimized smooth union
float smin(float a, float b, float k) {
float h = max(k - abs(a - b), 0.0);
return min(a, b) - h * h * 0.25 / k;
}
// Quadratic optimized smooth intersection / smooth max
float smax(float a, float b, float k) {
float h = max(k - abs(a - b), 0.0);
return max(a, b) + h * h * 0.25 / k;
}
// Subtraction via smax
float sSub(float d1, float d2, float k) {
return smax(d1, -d2, k);
}
```
### Step 5: Basic SDF Primitives
**What**: Define the basic shape SDFs used for combination.
**Why**: CSG needs operands. Spheres and boxes are the most common primitives; cylinders are often used for drilling holes.
```glsl
float sdSphere(vec3 p, float r) {
return length(p) - r;
}
float sdBox(vec3 p, vec3 b) {
vec3 d = abs(p) - b;
return length(max(d, 0.0)) + min(max(d.x, max(d.y, d.z)), 0.0);
}
float sdCylinder(vec3 p, float h, float r) {
vec2 d = abs(vec2(length(p.xz), p.y)) - vec2(r, h);
return min(max(d.x, d.y), 0.0) + length(max(d, 0.0));
}
```
### Step 6: CSG Combination for Scene Construction
**What**: Combine primitives with boolean operations to build complex geometry.
**Why**: The power of CSG lies in combination. Classic example: intersecting a sphere with a cube yields a rounded cube, then subtracting three cylinders produces a nut shape.
```glsl
float mapScene(vec3 p) {
// Primitives
float cube = sdBox(p, vec3(1.0));
float sphere = sdSphere(p, 1.2);
float cylX = sdCylinder(p.yxz, 2.0, 0.4); // Along X axis (height mapped to p.x)
float cylY = sdCylinder(p.xyz, 2.0, 0.4); // Along Y axis
float cylZ = sdCylinder(p.xzy, 2.0, 0.4); // Along Z axis (height mapped to p.z)
// CSG combination: (Cube ∩ Sphere) - three cylinders
float shape = opIntersection(cube, sphere);
float holes = opUnion(cylX, opUnion(cylY, cylZ));
return opSubtraction(shape, holes);
}
```
### Step 7: Organic Body Modeling with Smooth CSG
**What**: Use smin/smax with different k values to blend multiple ellipsoids/capsules into organic characters.
**Why**: Different body parts need different blend amounts — large k values for broad connections (torso-legs), small k values for fine details (eyes-head). This is the core technique for organic character modeling with smooth CSG.
```glsl
float mapCreature(vec3 p) {
// Torso
float body = sdSphere(p, 0.5);
// Head — larger blend radius
float head = sdSphere(p - vec3(0.0, 0.6, 0.3), 0.25);
float d = smin(body, head, 0.15);
// Limbs — medium blend radius
float leg = sdCylinder(p - vec3(0.2, -0.5, 0.0), 0.3, 0.08);
d = smin(d, leg, 0.08);
// Eye sockets — small blend radius for smooth subtraction
float eye = sdSphere(p - vec3(0.05, 0.75, 0.4), 0.05);
d = smax(d, -eye, 0.02);
return d;
}
```
### Step 8: Ray Marching Main Loop
**What**: Render the SDF scene using the sphere tracing algorithm.
**Why**: SDF scenes cannot be rendered with traditional rasterization. Ray Marching is needed: cast a ray from each pixel, advance by the current point's distance to the nearest surface (i.e., the SDF value) at each step, until close enough to a surface or out of range.
```glsl
float rayMarch(vec3 ro, vec3 rd, float maxDist) {
float t = 0.0;
for (int i = 0; i < MAX_STEPS; i++) {
vec3 p = ro + rd * t;
float d = mapScene(p);
if (d < SURF_DIST) return t;
t += d;
if (t > maxDist) break;
}
return -1.0; // No hit
}
```
### Step 9: Normal Computation and Lighting
**What**: Compute the surface normal by taking the finite-difference gradient of the SDF, then apply lighting.
**Why**: The gradient direction of the SDF is the surface normal direction. Using tetrahedral sampling only requires 4 SDF samples, which is more efficient than the 6 needed for central differences.
```glsl
vec3 calcNormal(vec3 pos) {
vec2 e = vec2(0.001, -0.001);
return normalize(
e.xyy * mapScene(pos + e.xyy) +
e.yyx * mapScene(pos + e.yyx) +
e.yxy * mapScene(pos + e.yxy) +
e.xxx * mapScene(pos + e.xxx)
);
}
```
## Common Variants in Detail
### Variant 1: Polynomial Smooth Union (Most Universal Version)
Differs from the basic (quadratic optimized) version by using the `clamp + mix` form, which makes the code intent more intuitive and exposes the blend factor `h` directly. Mathematically it is exactly equivalent to the quadratic version — substitution reduces both forms to the same expression.
```glsl
float opSmoothUnion(float d1, float d2, float k) {
float h = clamp(0.5 + 0.5 * (d2 - d1) / k, 0.0, 1.0);
return mix(d2, d1, h) - k * h * (1.0 - h);
}
float opSmoothSubtraction(float d1, float d2, float k) {
    float h = clamp(0.5 - 0.5 * (d1 + d2) / k, 0.0, 1.0);
    return mix(d1, -d2, h) + k * h * (1.0 - h); // Carves d2 out of d1, matching max(d1, -d2)
}
float opSmoothIntersection(float d1, float d2, float k) {
float h = clamp(0.5 - 0.5 * (d2 - d1) / k, 0.0, 1.0);
return mix(d2, d1, h) + k * h * (1.0 - h);
}
```
### Variant 2: Exponential Smooth Union
**Difference from the basic version**: Uses `exp` for implementation, with smoother transitions (C-infinity continuity vs polynomial's C1). However, `exp` is more expensive. Suitable for terrain modeling (e.g., craters). The parameter `k` has a different meaning — in the exponential version, larger `k` produces sharper transitions (opposite to polynomial). Used in RME4-Crater for volcano terrain blending.
```glsl
float sminExp(float a, float b, float k) {
float res = exp(-k * a) + exp(-k * b);
return -log(res) / k;
}
```
### Variant 3: Smooth Operations with Color Blending
**Difference from the basic version**: Blends material colors using the same blend factor during geometric fusion. This way, the material at the junction transitions naturally rather than showing an abrupt color boundary. Useful for color gradients between organic shape junctions (e.g., shell and body).
```glsl
// vec3 overloaded smax, blending colors simultaneously
vec3 smax(vec3 a, vec3 b, float k) {
vec3 h = max(k - abs(a - b), 0.0);
return max(a, b) + h * h * 0.25 / k;
}
// Alternatively, a separated version: returns the blend factor to the caller
float sminWithFactor(float a, float b, float k, out float blend) {
float h = clamp(0.5 + 0.5 * (b - a) / k, 0.0, 1.0);
blend = h;
return mix(b, a, h) - k * h * (1.0 - h);
}
// Usage example:
// float blend;
// float d = sminWithFactor(d1, d2, 0.1, blend);
// vec3 color = mix(color2, color1, blend);
```
### Variant 4: Layered CSG Modeling (Architectural / Industrial Scenes)
**Difference from the basic version**: Does not use smooth variants; instead uses multi-level nested hard boolean operations to build precise geometric structures. An additive-then-subtractive pattern — first build the overall form with union, then carve details (windows, doorways) with subtraction. Commonly used for architectural modeling.
```glsl
float sdBuilding(vec3 p) {
// Step 1: Additive phase — build walls
float walls = sdBox(p, vec3(1.0, 0.8, 1.0));
// Step 2: Additive — roof
vec3 roofP = p;
roofP.y -= 0.8;
float roof = sdBox(roofP, vec3(1.2, 0.3, 1.2));
float d = opUnion(walls, roof);
// Step 3: Subtractive phase — carve windows
vec3 winP = abs(p); // Exploit symmetry
winP -= vec3(1.01, 0.3, 0.4);
float window = sdBox(winP, vec3(0.1, 0.15, 0.12));
d = opSubtraction(d, window);
// Step 4: Hollow out the interior
float hollow = sdBox(p, vec3(0.95, 0.75, 0.95));
d = opSubtraction(d, hollow);
return d;
}
```
### Variant 5: Large-Scale Organic Character Modeling
**Difference from the basic version**: Extensively uses smin/smax (100+ calls), with different k values for different body parts to control blend amounts. Large k (0.1~0.3) for torso connections, small k (0.01~0.05) for detail areas. Complex organic characters can use over 100 smooth operations to sculpt a complete form.
```glsl
float mapCharacter(vec3 p) {
// Torso — main ellipsoid
float body = sdEllipsoid(p, vec3(0.5, 0.4, 0.6));
// Head — large blend, natural transition to neck
float head = sdEllipsoid(p - vec3(0.0, 0.5, 0.5), vec3(0.25));
float d = smin(body, head, 0.2); // Large k: wide blend band
// Ears — medium blend
float ear = sdEllipsoid(p - vec3(0.3, 0.6, 0.3), vec3(0.15, 0.2, 0.05));
d = smin(d, ear, 0.08);
// Nostrils — small blend for smooth subtraction
float nostril = sdSphere(p - vec3(0.0, 0.4, 0.7), 0.03);
d = smax(d, -nostril, 0.02); // Small k: fine carving
return d;
}
```
## Performance Optimization in Detail
### 1. Bounding Volume Acceleration
The biggest performance bottleneck in CSG scenes is `mapScene()` being called too many times (up to MAX_STEPS times per pixel per frame). Use bounding volumes (spheres or AABBs) to skip distant sub-scenes:
```glsl
float mapScene(vec3 p) {
float d = MAX_DIST;
// Only evaluate the expensive sub-scene at or inside its bounding sphere
float bound = length(p - vec3(2.0, 0.0, 0.0)) - 1.5;
if (bound < 0.001) {
d = min(d, complexSubScene(p));
} else {
d = min(d, bound); // Outside: the bound itself is a safe, conservative step distance
}
return d;
}
```
Using `intersectAABB` to pre-test rays against AABBs can skip regions that cannot be hit.
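A minimal slab-method sketch of such an `intersectAABB` (the exact signature assumed here is illustrative):

```glsl
// Ray/AABB slab test: returns (tNear, tFar); the ray misses when tNear > tFar or tFar < 0
vec2 intersectAABB(vec3 ro, vec3 rd, vec3 boxMin, vec3 boxMax) {
    vec3 tMin = (boxMin - ro) / rd;
    vec3 tMax = (boxMax - ro) / rd;
    vec3 t1 = min(tMin, tMax); // Entry times per axis
    vec3 t2 = max(tMin, tMax); // Exit times per axis
    float tNear = max(max(t1.x, t1.y), t1.z);
    float tFar  = min(min(t2.x, t2.y), t2.z);
    return vec2(tNear, tFar);
}
```

When the test reports a miss, the marching loop can skip that sub-scene's SDF entirely for the whole ray.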
### 2. Reducing SDF Sample Count
- Use tetrahedral sampling for normal computation (4 calls) instead of central differences (6 calls)
- Use `t += d * 0.9` to slightly reduce step size, preventing overshoot-induced penetration
### 3. smin/smax Selection
| Method | Performance | Accuracy | Recommended Use |
|--------|-------------|----------|----------------|
| Quadratic optimized | Fastest | Good | General first choice |
| Polynomial clamp | Fast | Good | When a separate blend factor is needed |
| Exponential | Slower | Best | Terrain, when extremely smooth transitions are needed |
### 4. Avoiding k=0 with smin
When `k` is zero, the quadratic optimized version causes a division-by-zero error. Always ensure `k > 0`, or fall back to hard boolean when k approaches zero:
```glsl
float safeSmin(float a, float b, float k) {
if (k < 0.0001) return min(a, b);
float h = max(k - abs(a - b), 0.0);
return min(a, b) - h * h * 0.25 / k;
}
```
### 5. Symmetry Exploitation
For symmetric shapes, use `abs()` to fold coordinates and only define one side. Useful for symmetric windows, limbs, and other mirrored features:
```glsl
vec3 q = vec3(p.xy, abs(p.z)); // Mirror along Z axis
```
## Combination Suggestions in Detail
### 1. CSG + Domain Repetition
CSG shapes can be infinitely repeated in space via `mod()` or `fract()`, suitable for mechanical arrays, architectural railings, etc.:
```glsl
float mapRepeated(vec3 p) {
vec3 q = p;
q.x = mod(q.x + 1.0, 2.0) - 1.0; // Repeat every 2 units along X axis
return mapSinglePiston(q);
}
```
### 2. CSG + Procedural Displacement
Add noise displacement on top of SDF results to give smooth CSG shapes surface detail textures, adding a flowing or organic appearance:
```glsl
float mapWithDisplacement(vec3 p) {
float base = smin(body, limb, 0.1);
float noise = 0.02 * sin(10.0 * p.x) * sin(10.0 * p.y) * sin(10.0 * p.z);
return base + noise;
}
```
### 3. CSG + Procedural Texturing
Use smin's blend factor to blend not just geometry but also material IDs or colors, achieving cross-shape material gradients:
```glsl
vec2 mapWithMaterial(vec3 p) {
float d1 = sdSphere(p, 0.5);
float d2 = sdBox(p - vec3(0.3), vec3(0.3));
float blend;
float d = sminWithFactor(d1, d2, 0.1, blend);
float matId = mix(1.0, 2.0, blend); // Blend material ID
return vec2(d, matId);
}
```
### 4. CSG + 2D SDF
CSG is not limited to 3D. In 2D scenes, smooth union can similarly create organic shapes, like stylized cloud effects:
```glsl
float sdCloud2D(vec2 p) {
float d = sdBox(p, vec2(0.5, 0.1));
d = opSmoothUnion(d, length(p - vec2(-0.3, 0.1)) - 0.15, 0.1);
d = opSmoothUnion(d, length(p - vec2(0.1, 0.15)) - 0.12, 0.1);
d = opSmoothUnion(d, length(p - vec2(0.3, 0.08)) - 0.1, 0.1);
return d;
}
```
### 5. CSG + Animation
By binding CSG parameters (k values, primitive positions, primitive radii) to `iTime`, you can achieve dynamic shape deformation and blend animations:
```glsl
float mapAnimated(vec3 p) {
float k = 0.1 + 0.15 * sin(iTime); // Dynamic blend radius
float r = 0.3 + 0.1 * sin(iTime * 2.0); // Dynamic radius
float d1 = sdSphere(p, 0.5);
float d2 = sdSphere(p - vec3(0.8 * sin(iTime), 0.0, 0.0), r);
return smin(d1, d2, k);
}
```

# Domain Repetition and Spatial Folding — Detailed Reference
This document is a detailed supplement to [SKILL.md](SKILL.md), covering prerequisites, step-by-step explanations, mathematical derivations, and advanced usage.
## Prerequisites
- GLSL basic syntax, `vec2/vec3/mat2` operations
- Behavior of built-in functions like `mod()`, `fract()`, `abs()`, `atan()`
- Signed Distance Field (SDF) concept — a function returning the distance from a point to the nearest surface
- Basic principles of Ray Marching
- 2D rotation matrix `mat2(cos(a), sin(a), -sin(a), cos(a))`
## Core Principles in Detail
The essence of domain repetition is **coordinate transformation**: before computing the SDF, the point `p`'s coordinates are folded/mapped into a finite "fundamental domain," so that every point in infinite space maps to the same cell. The SDF function only needs to evaluate coordinates within this single cell, and the result automatically repeats across all of space.
**Three fundamental operations:**
| Operation | Formula | Effect |
|-----------|---------|--------|
| **mod repetition** | `p = mod(p + period/2, period) - period/2` | Infinite translational repetition along an axis |
| **abs mirroring** | `p = abs(p)` | Mirror symmetry across an axis plane |
| **Rotational folding** | `angle = mod(atan(p.y, p.x), TAU/N); p = rotate(p, -angle)` | N-fold rotational symmetry |
**Key mathematics:**
- `mod(x, c)` maps x to the `[0, c)` range, providing periodicity
- `abs(x)` folds the negative half-space onto the positive half-space, providing reflective symmetry
- `fract(x) = x - floor(x)` is equivalent to `mod(x, 1.0)`, providing normalized periodicity
## Step-by-Step Details
### Step 1: Basic Cartesian Domain Repetition (mod Repetition)
**What**: Infinitely repeat 3D space along one or more axes via translation.
**Why**: `mod(p, c) - c/2` constrains coordinates to the `[-c/2, c/2)` range, dividing space into an infinite number of cells of size `c`, where each cell has identical coordinates. The SDF only needs to be defined within a single cell.
**Code**:
```glsl
// Standard 3D domain repetition (centered version)
// period is the size of each cell
vec3 domainRepeat(vec3 p, vec3 period) {
return mod(p + period * 0.5, period) - period * 0.5;
}
// Usage example: infinitely repeat a box
float map(vec3 p) {
vec3 q = domainRepeat(p, vec3(4.0)); // Repeat every 4 units
return sdBox(q, vec3(0.5)); // One box per cell
}
```
> The line `pos = mod(pos-2., 4.) -2.;` is exactly this pattern — period = 4, offset 2, perfectly centered. `p1.x = mod(p1.x-5., 10.) - 5.;` follows the same logic (period = 10, centered at the origin).
### Step 2: Symmetric Fold Repetition (abs-mod Hybrid)
**What**: On top of mod repetition, use `abs()` to give each cell mirror symmetry, eliminating seams at cell boundaries.
**Why**: Plain `mod` repetition has coordinate discontinuity at cell boundaries (jumping from `+c/2` to `-c/2`), which can cause visible seams. `abs(tile - mod(p, tile*2))` makes coordinates fold back and forth within each tile from 0 to tile to 0, ensuring continuity at boundaries (equivalent to a "triangle wave").
**Code**:
```glsl
// Symmetric fold (triangle wave mapping)
// tile is the half-period length, full period is tile*2
vec3 symmetricFold(vec3 p, float tile) {
return abs(vec3(tile) - mod(p, vec3(tile * 2.0)));
}
// Usage: classic tiling fold
vec3 p = from + s * dir * 0.5;
p = abs(vec3(tile) - mod(p, vec3(tile * 2.0)));
```
> The core line `p = abs(vec3(tile)-mod(p,vec3(tile*2.)));` is this pattern. `tpos.xz=abs(.5-mod(tpos.xz,1.));` is the 2D version of the same pattern (tile=0.5, period=1).
### Step 3: Angular Domain Repetition (Polar Coordinate Folding)
**What**: Divide space into N equal rotational sectors around an axis, achieving a kaleidoscope effect.
**Why**: After converting coordinates to polar form, applying `mod(angle, TAU/N)` folds the full 360 degrees into a single `TAU/N` sector. Rotating the coordinates back makes all sectors share the same SDF.
**Code**:
```glsl
// Angular domain repetition
// p: xz plane coordinates, count: repetition count
// Returns rotated coordinates (folded into the first sector)
vec2 pmod(vec2 p, float count) {
float angle = atan(p.x, p.y) + PI / count;
float sector = TAU / count;
angle = floor(angle / sector) * sector;
return p * rot(-angle); // rot is a 2D rotation matrix
}
// Usage: 5-fold rotational symmetry
vec3 p1 = p;
p1.xy = pmod(p1.xy, 5.0); // 5-fold symmetry in the xy plane
```
> The `pmod()` function implements this pattern. An alternative `amod()` function follows the same idea but uses `inout` parameters to directly modify coordinates and returns the sector index (for coloring variants).
### Step 4: fract Domain Folding (For Fractal Iteration)
**What**: Use `fract()` in fractal iteration loops to repeatedly fold coordinates back into the `[0,1)` range, combined with scaling to achieve self-similar structures.
**Why**: `-1.0 + 2.0*fract(0.5*p+0.5)` maps p to the `[-1, 1)` range (centered fract). Each iteration divides space into 8 sub-cells (in 3D), each recursively undergoing the same operation. Combined with the scaling factor `k = s/dot(p,p)` (spherical inversion), this produces fractal hierarchical structure.
**Code**:
```glsl
// Core loop of an Apollonian fractal
float map(vec3 p, float s) {
float scale = 1.0;
vec4 orb = vec4(1000.0); // Orbit trap for coloring
for (int i = 0; i < 8; i++) {
p = -1.0 + 2.0 * fract(0.5 * p + 0.5); // Centered fract fold
float r2 = dot(p, p);
orb = min(orb, vec4(abs(p), r2)); // Orbit capture
float k = s / r2; // Spherical inversion scaling
p *= k;
scale *= k;
}
return 0.25 * abs(p.y) / scale; // Distance must be divided by accumulated scale
}
```
> `-1.0 + 2.0*fract(0.5*p+0.5)` is equivalent to `mod(p+1, 2) - 1`, mapping p to [-1,1).
### Step 5: Iterative abs Folding (IFS / Kali-set)
**What**: Repeatedly execute `p = abs(p) - offset` inside a loop, combined with rotation and scaling, to generate fractal symmetric structures.
**Why**: `abs(p)` folds space into the positive octant, `-offset` translates the origin, then `abs()` folds again... each iteration adds another layer of symmetry. This is one implementation of an Iterated Function System (IFS). Combined with rotation, it produces extremely rich fractal structures.
**Code**:
```glsl
// IFS abs folding fractal
float ifsBox(vec3 p) {
for (int i = 0; i < 5; i++) {
p = abs(p) - 1.0; // Fold + offset
p.xy *= rot(iTime * 0.3); // Rotation adds complexity
p.xz *= rot(iTime * 0.1);
}
return sdBox(p, vec3(0.4, 0.8, 0.3));
}
// Kali-set variant: uses dot(p,p) scaling
vec2 de(vec3 pos) {
vec3 tpos = pos;
tpos.xz = abs(0.5 - mod(tpos.xz, 1.0)); // mod repetition first, then IFS
vec4 p = vec4(tpos, 1.0); // w component tracks scaling
for (int i = 0; i < 7; i++) {
p.xyz = abs(p.xyz) - vec3(-0.02, 1.98, -0.02);
p = p * (2.0) / clamp(dot(p.xyz, p.xyz), 0.4, 1.0)
- vec4(0.5, 1.0, 0.4, 0.0);
p.xz *= rot(0.416); // Intra-iteration rotation
}
return vec2(length(max(abs(p.xyz)-vec3(0.1,5.0,0.1), 0.0)) / p.w, 0.0);
}
```
> Note that the `de()` variant uses the `vec4`'s w component to accumulate the scaling factor (`p.w`), and the final distance is divided by `p.w` to maintain SDF validity.
### Step 6: Reflection Folding (Polyhedral Symmetry)
**What**: Fold space into the fundamental domain of a polyhedron (such as an icosahedron) through a set of reflection planes.
**Why**: Regular polyhedra have multiple symmetry planes. Reflecting along each symmetry plane via `p = p - 2*dot(p,n)*n` folds all of space into a "fundamental domain" (1/60th of the entire polyhedron for an icosahedron). Geometry only needs to be defined within this fundamental domain.
**Code**:
```glsl
// Plane reflection
float pReflect(inout vec3 p, vec3 planeNormal, float offset) {
float t = dot(p, planeNormal) + offset;
if (t < 0.0) {
p = p - (2.0 * t) * planeNormal;
}
return sign(t);
}
// Icosahedral folding
void pModIcosahedron(inout vec3 p) {
// nc is the extra fold-plane normal beyond the coordinate planes
vec3 nc = vec3(-0.5, -cos(PI/5.0), sqrt(0.75 - cos(PI/5.0)*cos(PI/5.0)));
p = abs(p); // Reflect across the three coordinate planes
pReflect(p, nc, 0.0);
p.xy = abs(p.xy);
pReflect(p, nc, 0.0);
p.xy = abs(p.xy);
pReflect(p, nc, 0.0);
}
```
> Full icosahedral symmetry group is achieved through alternating `abs()` and `pReflect()`.
### Step 7: Toroidal / Cylindrical Domain Warping (displaceLoop)
**What**: Bend planar space into cylindrical or toroidal topology.
**Why**: `displaceLoop` converts Cartesian coordinates `(x, z)` into `(distance_to_center - R, angle)`, "rolling" a plane into a cylinder/torus of radius R. The angular dimension can then undergo `amod` for angular repetition.
**Code**:
```glsl
// Toroidal domain warp: bend the xz plane into a torus
vec2 displaceLoop(vec2 p, float radius) {
return vec2(length(p) - radius, atan(p.y, p.x));
}
// Usage example: architectural ring corridor
vec3 pDonut = p;
pDonut.x += donutRadius;
pDonut.xz = displaceLoop(pDonut.xz, donutRadius);
pDonut.z *= donutRadius; // Unwrap angle to linear length
// Now pDonut is "flattened" ring coordinates, ready for linear repetition
```
> The `displaceLoop` function bends an architectural scene into a ring structure.
### Step 8: 1D Centered Domain Repetition (with Cell ID)
**What**: Perform centered mod repetition along one axis and return the current cell number.
**Why**: Cell IDs can be used to assign different random properties (color, size, rotation, etc.) to each cell's geometry, breaking the uniformity of perfect repetition.
**Code**:
```glsl
// 1D centered domain repetition, returns cell index
float pMod1(inout float p, float size) {
float halfsize = size * 0.5;
float c = floor((p + halfsize) / size); // Cell index
p = mod(p + halfsize, size) - halfsize; // Centered local coordinate
return c;
}
// Usage: repeat along x axis and get cell ID
float cellID = pMod1(p.x, 2.0);
float salt = fract(sin(cellID * 127.1) * 43758.5453); // Random seed
```
> This is a standard domain repetition library function. A simpler `repeat()` function follows the same pattern (version without returning the index).
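The `repeat()` used in later examples can be sketched as a minimal version consistent with `pMod1` (the file's own definition may differ):

```glsl
// Centered 1D repetition, no cell index returned
float repeat(float p, float size) {
    float halfsize = size * 0.5;
    return mod(p + halfsize, size) - halfsize;
}
```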
## Common Variants in Detail
### 1. Volumetric Glow Rendering
Unlike standard ray marching, this does not check for surface hits. Instead, it accumulates a "distance-to-brightness" contribution at each step.
**Difference from the basic version**: No normal computation or traditional shading needed. Each step accumulates glow via `exp(-dist * k)`.
**Key modified code**:
```glsl
// Replace hit detection in raymarch with glow accumulation
float acc = 0.0;
float t = 0.0;
for (int i = 0; i < 99; i++) {
vec3 pos = ro + rd * t;
float dist = map(pos);
dist = max(abs(dist), 0.02); // Prevent division by zero, abs allows passing through surfaces
acc += exp(-dist * 3.0); // Adjustable: decay coefficient controls glow sharpness
t += dist * 0.5; // Adjustable: step scale (<1 means denser sampling)
}
vec3 col = vec3(acc * 0.01, acc * 0.011, acc * 0.012);
```
> This volumetric glow rendering strategy is commonly used in fractal domain repetition shaders.
### 2. Single-Axis / Dual-Axis Selective Repetition
Repeat along only certain axes while keeping others unchanged. Suitable for corridors, columns, and other directional scenes.
**Difference from the basic version**: Does not use `vec3` full-axis repetition; only applies mod to the needed components.
**Key modified code**:
```glsl
// Repeat only along x and z axes, y axis unrepeated
float map(vec3 pos) {
vec3 q = pos;
q.xz = mod(q.xz + 2.0, 4.0) - 2.0; // Only xz repeated
// q.y retains original value, providing finite height
return sdBox(q, vec3(0.3, 0.5, 0.3));
}
```
### 3. Fractal fract Domain Folding (Apollonian Type)
Uses `fract()` instead of `mod()` for iterative folding, combined with scaling and orbit trapping to create fractals.
**Difference from the basic version**: Repeatedly applies fract+scaling in a loop rather than a one-time mod; uses orbit trap coloring.
**Key modified code**:
```glsl
float scale = 1.0;
for (int i = 0; i < 8; i++) {
p = -1.0 + 2.0 * fract(0.5 * p + 0.5); // fract fold
float r2 = dot(p, p);
float k = 1.2 / r2; // Adjustable: scaling parameter
p *= k;
scale *= k;
}
return 0.25 * abs(p.y) / scale;
```
### 4. Multi-Level Nested Repetition
Apply angular repetition within a sector, then linear repetition within each sector, or vice versa.
**Difference from the basic version**: Domain repetition operations are nested across multiple levels, each providing a different spatial organization.
**Key modified code**:
```glsl
// Outer level: angular repetition
float indexX = amod(p.xz, segments); // Divide into N sectors
p.x -= radius;
// Inner level: linear repetition
p.y = repeat(p.y, cellSize); // Repeat along y axis
// Random seed for each cell
float salt = rng(vec2(indexX, floor(p.y / cellSize)));
```
> This kind of nesting is commonly used in architectural scene shaders.
### 5. Bounded Domain Repetition (Finite Repetition)
Use `clamp` to limit the mod cell index, achieving a finite number of repetitions.
**Difference from the basic version**: Uses `clamp` to restrict the cell index to `[-N, N]`, repeating only `2N+1` times.
**Key modified code**:
```glsl
// Finite domain repetition: repeat at most N times along each axis
vec3 domainRepeatLimited(vec3 p, float size, vec3 limit) {
return p - size * clamp(floor(p / size + 0.5), -limit, limit);
}
// Usage: repeat 5 times along x, 3 times each along y and z
vec3 q = domainRepeatLimited(p, 2.0, vec3(2.0, 1.0, 1.0));
```
## Performance Optimization Deep Dive
### Bottleneck 1: High Iteration Count in Fractal Loops
**Problem**: When IFS or fract folding loops iterate too many times, the `map()` function slows down, and `map()` is called at every step during ray marching.
**Optimization**:
- Reduce fractal iteration count (5-8 iterations are usually sufficient)
- Use the `vec4`'s w component to track the scaling factor, avoiding extra scaling variables
- Set upper and lower bounds in `clamp(dot(p,p), min, max)` to prevent numerical blowup
### Bottleneck 2: mod Repetition Causing Inaccurate Distance Fields
**Problem**: The SDF after domain repetition may be inaccurate at cell boundaries (geometry in adjacent cells may be closer), causing ray marching overshoot or extra steps.
**Optimization**:
- Ensure geometry fits entirely within the cell (radius < period/2)
- Use a smaller step factor (`t += d * 0.5` instead of `t += d`)
- For volumetric glow rendering, use `max(abs(d), minDist)` to prevent excessively small step sizes
### Bottleneck 3: Compilation Time from Nested Repetition
**Problem**: Multi-level nested domain repetition and fractal loops can cause very long shader compilation times.
**Optimization**:
- Pre-compute constant expressions in `map()`
- Avoid `normalize()` inside loops (manually divide by length instead)
- Use the loop version for normal computation instead of unrolled version to reduce compiler inlining
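To illustrate the last point, the common four-sample tetrahedron normal can be written as a loop so the compiler does not inline four full `map()` evaluations. A sketch, assuming a scalar `map(vec3)` SDF and a GLSL version with integer bitwise operators (e.g. ShaderToy / WebGL2):

```glsl
vec3 calcNormal(vec3 p) {
    vec3 n = vec3(0.0);
    for (int i = 0; i < 4; i++) {
        // Generate the four tetrahedron-vertex directions
        // (1,-1,-1), (-1,-1,1), (-1,1,-1), (1,1,1) from the loop index
        vec3 e = 0.5773 * (2.0 * vec3(float(((i + 3) >> 1) & 1),
                                      float((i >> 1) & 1),
                                      float(i & 1)) - 1.0);
        n += e * map(p + 0.0005 * e);
    }
    return normalize(n);
}
```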
### Bottleneck 4: Sampling Rate for Volumetric Glow Rendering
**Problem**: Volumetric glow rendering requires dense sampling along the ray.
**Optimization**:
- Increase step size with distance: `t += dist * (0.3 + t * 0.02)`
- Reduce sampling density for distant regions; the distance decay `exp(-totdist)` naturally hides precision loss
- Use a `distfading` multiplier to gradually attenuate distant contributions (e.g., `fade *= distfading`)
## Combination Suggestions with Complete Code
### 1. Domain Repetition + Ray Marching
**The most basic and most common combination.** Domain repetition defines the geometric spatial structure; ray marching handles rendering. This is the most fundamental combination in SDF rendering.
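A minimal end-to-end sketch of this combination, assuming the ShaderToy entry point and the `sdBox` used earlier:

```glsl
float map(vec3 p) {
    vec3 q = mod(p + 2.0, 4.0) - 2.0;  // Infinite repetition, period 4
    return sdBox(q, vec3(0.5));
}
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
    vec2 uv = (2.0 * fragCoord - iResolution.xy) / iResolution.y;
    vec3 ro = vec3(0.0, 0.0, iTime);          // Camera flying forward
    vec3 rd = normalize(vec3(uv, 1.5));
    float t = 0.0;
    for (int i = 0; i < 100; i++) {
        float d = map(ro + rd * t);
        if (d < 0.001 || t > 60.0) break;
        t += d * 0.8;                          // Conservative step near cell boundaries
    }
    fragColor = vec4(vec3(1.0 - t / 60.0), 1.0); // Simple depth-fog shading
}
```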
### 2. Domain Repetition + Orbit Trap Coloring
Record intermediate values during the fractal iteration loop (e.g., `min(orb, abs(p))`), used to color fractal structures. Avoids the high cost of normal computation + lighting on fractal surfaces.
**Combination approach**:
```glsl
vec4 orb = vec4(1000.0);
for (...) {
p = fold(p);
orb = min(orb, vec4(abs(p), dot(p,p)));
}
// Use orb values for color mapping
vec3 color = mix(vec3(1,0.8,0.2), vec3(1,0.55,0), clamp(orb.y * 6.0, 0.0, 1.0));
```
### 3. Domain Repetition + Toroidal / Polar Coordinate Warping
First use `displaceLoop` to bend space into a toroidal topology, then perform linear and angular repetition in the flattened coordinates. Suitable for creating ring corridors, donut buildings, etc.
**Combination approach**:
```glsl
p.xz = displaceLoop(p.xz, R); // Bend into ring
p.z *= R; // Angle to length
amod(p.xz, N); // Angular repetition
p.y = repeat(p.y, cellSize); // Linear repetition
```
### 4. Domain Repetition + Noise / Random Variants
Generate pseudo-random numbers from cell IDs to inject variation into each repeated cell (size, rotation, color offset), breaking the uniformity.
**Combination approach**:
```glsl
float cellID = pMod1(p.x, size);
float salt = fract(sin(cellID * 127.1) * 43758.5453);
// Use salt to modulate geometric parameters
float boxSize = 0.3 + 0.2 * salt;
```
### 5. Domain Repetition + Polar Coordinate Spiral Transform
Use `cartToPolar` / `polarToCart` coordinate transforms combined with `pMod1` for repetition along spiral paths. Suitable for DNA double helices, springs, threads, etc.
**Combination approach**:
```glsl
p = cartToPolar(p); // Convert to polar coordinates
p.y *= radius; // Unwrap angle to length
// Repeat along spiral line
vec2 closest = closestPointOnRepeatedLine(vec2(lead, radius*TAU), p.xy);
p.xy -= closest; // Local coordinates
```
```
# Domain Warping — Detailed Reference
This document contains the complete step-by-step tutorial, mathematical derivations, and advanced usage for domain warping techniques. See [SKILL.md](SKILL.md) for the condensed version.
## Prerequisites
- **GLSL Basics**: uniform variables, built-in functions (`mix`, `smoothstep`, `fract`, `floor`, `sin`, `dot`)
- **Vector Math**: dot product, matrix multiplication, 2D rotation matrix
- **Noise Function Concepts**: understanding the basic principle of value noise (lattice interpolation)
- **fBM (Fractal Brownian Motion)**: superposition of multiple noise layers at different frequencies/amplitudes
- **ShaderToy Environment**: meaning of `iTime`, `iResolution`, `fragCoord`
## Implementation Steps in Detail
### Step 1: Hash Function
**What**: Implement a hash function that maps 2D integer coordinates to a pseudo-random float.
**Why**: This is the foundation of noise functions — producing deterministic "random" values at each lattice point. The `sin-dot` trick compresses 2D input to 1D then takes the fractional part, using sin's high-frequency oscillation to produce a chaotic distribution.
**Code**:
```glsl
float hash(vec2 p) {
p = fract(p * 0.6180339887); // Golden ratio pre-perturbation
p *= 25.0;
return fract(p.x * p.y * (p.x + p.y));
}
```
> Note: The classic `fract(sin(dot(p, vec2(127.1, 311.7))) * 43758.5453)` version can also be used, but the sin-free version above is more stable in precision on some GPUs.
### Step 2: Value Noise
**What**: Implement 2D value noise — take hash values at integer lattice points and interpolate between them with Hermite smoothing.
**Why**: Value noise is the simplest continuous noise, producing smooth, jump-free output suitable as the foundation for fBM. Hermite interpolation `f*f*(3.0-2.0*f)` ensures the derivative is zero at lattice points, avoiding the angular appearance of linear interpolation.
**Code**:
```glsl
float noise(vec2 p) {
vec2 i = floor(p);
vec2 f = fract(p);
f = f * f * (3.0 - 2.0 * f); // Hermite smooth interpolation
return mix(
mix(hash(i + vec2(0.0, 0.0)), hash(i + vec2(1.0, 0.0)), f.x),
mix(hash(i + vec2(0.0, 1.0)), hash(i + vec2(1.0, 1.0)), f.x),
f.y
);
}
```
### Step 3: fBM (Fractal Brownian Motion)
**What**: Superpose multiple noise layers at different frequencies/amplitudes to create fractal noise with self-similar properties.
**Why**: A single noise layer is too uniform. fBM superimposes multiple "octaves" to simulate nature's fractal structures. Each layer doubles in frequency (lacunarity ~ 2.0), halves in amplitude (persistence = 0.5), and uses a rotation matrix to break lattice alignment.
**Code**:
```glsl
const mat2 mtx = mat2(0.80, 0.60, -0.60, 0.80); // Rotation ~36.87°, for decorrelation
float fbm(vec2 p) {
float f = 0.0;
f += 0.500000 * noise(p); p = mtx * p * 2.02;
f += 0.250000 * noise(p); p = mtx * p * 2.03;
f += 0.125000 * noise(p); p = mtx * p * 2.01;
f += 0.062500 * noise(p); p = mtx * p * 2.04;
f += 0.031250 * noise(p); p = mtx * p * 2.01;
f += 0.015625 * noise(p);
return f / 0.984375; // Normalize: sum of all amplitudes
}
```
> Using lacunarity values of 2.01~2.04 rather than exact 2.0 is to **avoid visual artifacts caused by lattice regularity**. This is a widely adopted trick in classic implementations.
### Step 4: Domain Warping (Core)
**What**: Use fBM output as a coordinate offset, recursively nesting to form multi-level warping.
**Why**: This is the core of the entire technique. `fbm(p)` generates a scalar field; adding it to the coordinate `p` is equivalent to "pulling and stretching space according to the noise field's shape." Multi-level nesting makes the deformation more complex and organic — each warping level operates in space already deformed by the previous level.
**Code**:
```glsl
float pattern(vec2 p) {
return fbm(p + fbm(p + fbm(p)));
}
```
This single line is the classic three-level domain warping. It can be decomposed for understanding:
```glsl
float pattern(vec2 p) {
float warp1 = fbm(p); // Level 1: noise in original space
float warp2 = fbm(p + warp1); // Level 2: noise in first-level warped space
float result = fbm(p + warp2); // Level 3: final value in second-level warped space
return result;
}
```
### Step 5: Time Animation
**What**: Inject `iTime` into specific fBM octaves so the warp field evolves over time.
**Why**: Directly offsetting all octaves causes uniform translation, lacking organic feel. The classic approach is to inject time only in the lowest frequency (first layer) and highest frequency (last layer) — low frequency drives overall flow, high frequency adds detail variation.
**Code**:
```glsl
float fbm(vec2 p) {
float f = 0.0;
f += 0.500000 * noise(p + iTime); // Lowest frequency with time: slow overall flow
p = mtx * p * 2.02;
f += 0.250000 * noise(p); p = mtx * p * 2.03;
f += 0.125000 * noise(p); p = mtx * p * 2.01;
f += 0.062500 * noise(p); p = mtx * p * 2.04;
f += 0.031250 * noise(p); p = mtx * p * 2.01;
f += 0.015625 * noise(p + sin(iTime)); // Highest frequency with time: subtle detail motion
return f / 0.984375;
}
```
### Step 6: Coloring
**What**: Map the scalar output of the warp field to colors.
**Why**: Domain warping outputs a scalar field (0~1 range) that needs to be mapped to visually meaningful colors. The classic method uses a `mix` chain — interpolating between multiple preset colors using the warp value.
**Code**:
```glsl
vec3 palette(float t) {
vec3 col = vec3(0.2, 0.1, 0.4); // Deep purple base
col = mix(col, vec3(0.3, 0.05, 0.05), t); // Dark red
col = mix(col, vec3(0.9, 0.9, 0.9), t * t); // White at high values
col = mix(col, vec3(0.0, 0.2, 0.4), smoothstep(0.6, 0.8, t));// Blue highlights
return col * t * 2.0; // Overall brightness modulation
}
```
## Common Variants in Detail
### Variant 1: Multi-Resolution Layered Warping
**Difference from the basic version**: Uses different octave counts for different warping layers — coarse layers use 4 octaves (fast, low frequency), detail layers use 6 octaves (fine, high frequency). Outputs `vec2` for two-dimensional displacement rather than scalar offset. Intermediate variables participate in coloring, producing richer color gradients.
**Key modified code**:
```glsl
// 4-octave fBM (coarse layer)
float fbm4(vec2 p) {
float f = 0.0;
f += 0.5000 * (-1.0 + 2.0 * noise(p)); p = mtx * p * 2.02;
f += 0.2500 * (-1.0 + 2.0 * noise(p)); p = mtx * p * 2.03;
f += 0.1250 * (-1.0 + 2.0 * noise(p)); p = mtx * p * 2.01;
f += 0.0625 * (-1.0 + 2.0 * noise(p));
return f / 0.9375;
}
// 6-octave fBM (fine layer)
float fbm6(vec2 p) {
float f = 0.0;
f += 0.500000 * noise(p); p = mtx * p * 2.02;
f += 0.250000 * noise(p); p = mtx * p * 2.03;
f += 0.125000 * noise(p); p = mtx * p * 2.01;
f += 0.062500 * noise(p); p = mtx * p * 2.04;
f += 0.031250 * noise(p); p = mtx * p * 2.01;
f += 0.015625 * noise(p);
return f / 0.984375;
}
// vec2 output version (independent displacement per axis)
vec2 fbm4_2(vec2 p) {
return vec2(fbm4(p + vec2(1.0)), fbm4(p + vec2(6.2)));
}
vec2 fbm6_2(vec2 p) {
return vec2(fbm6(p + vec2(9.2)), fbm6(p + vec2(5.7)));
}
// Layered warping chain
float func(vec2 q, out vec2 o, out vec2 n) {
q += 0.05 * sin(vec2(0.11, 0.13) * iTime + length(q) * 4.0);
o = 0.5 + 0.5 * fbm4_2(q); // Level 1: coarse displacement
o += 0.02 * sin(vec2(0.13, 0.11) * iTime * length(o));
n = fbm6_2(4.0 * o); // Level 2: fine displacement
vec2 p = q + 2.0 * n + 1.0;
float f = 0.5 + 0.5 * fbm4(2.0 * p); // Level 3: final scalar field
f = mix(f, f * f * f * 3.5, f * abs(n.x)); // Contrast enhancement
return f;
}
// Coloring uses intermediate variables o, n
vec3 col = vec3(0.2, 0.1, 0.4);
col = mix(col, vec3(0.3, 0.05, 0.05), f);
col = mix(col, vec3(0.9, 0.9, 0.9), dot(n, n)); // n magnitude drives white
col = mix(col, vec3(0.5, 0.2, 0.2), 0.5 * o.y * o.y); // o.y drives brown
col = mix(col, vec3(0.0, 0.2, 0.4), 0.5 * smoothstep(1.2, 1.3, abs(n.y) + abs(n.x)));
col *= f * 2.0;
```
### Variant 2: Turbulence / Ridge Warping (Electric Arc / Plasma Effect)
**Difference from the basic version**: Takes the absolute value of noise `abs(noise - 0.5)` inside fBM, producing sharp ridge textures instead of smooth waves. Dual-axis independent fBM displacement (separate x/y offsets) combined with reverse time drift creates turbulence.
**Key modified code**:
```glsl
// Turbulence / ridged fBM
float fbm_ridged(vec2 p) {
float z = 2.0;
float rz = 0.0;
for (float i = 1.0; i < 6.0; i++) {
rz += abs((noise(p) - 0.5) * 2.0) / z; // abs() produces ridge folding
z *= 2.0;
p *= 2.0;
}
return rz;
}
// Dual-axis independent displacement
float dualfbm(vec2 p) {
vec2 p2 = p * 0.7;
// Opposite time drift in two directions creates turbulence
vec2 basis = vec2(
fbm_ridged(p2 - iTime * 0.24), // x axis drifts left
fbm_ridged(p2 + iTime * 0.26) // y axis drifts right
);
basis = (basis - 0.5) * 0.2; // Scale to small displacement
p += basis;
return fbm_ridged(p * makem2(iTime * 0.03)); // Slow overall rotation
}
// Electric arc coloring (division creates high-contrast light/dark)
vec3 col = vec3(0.2, 0.1, 0.4) / rz;
```
### Variant 3: Domain Warping with Pseudo-3D Lighting
**Difference from the basic version**: Estimates screen-space normals from the warp field using finite differences, then applies directional lighting, giving the 2D warp field a 3D relief appearance. Combined with color inversion and square compression to produce a characteristic dark tone.
**Key modified code**:
```glsl
// Screen-space normal estimation (finite differences)
float e = 2.0 / iResolution.y; // Sample spacing = 1 pixel
vec3 nor = normalize(vec3(
pattern(p + vec2(e, 0.0)) - shade, // df/dx
2.0 * e, // Constant y (controls normal tilt)
pattern(p + vec2(0.0, e)) - shade // df/dy
));
// Dual-component lighting
vec3 lig = normalize(vec3(0.9, 0.2, -0.4));
float dif = clamp(0.3 + 0.7 * dot(nor, lig), 0.0, 1.0);
vec3 lin = vec3(0.70, 0.90, 0.95) * (nor.y * 0.5 + 0.5); // Hemisphere ambient light
lin += vec3(0.15, 0.10, 0.05) * dif; // Warm diffuse
col *= 1.2 * lin;
col = 1.0 - col; // Color inversion
col = 1.1 * col * col; // Square compression, increases dark contrast
```
### Variant 4: Flow Field Iterative Warping (Gas Giant Planet Effect)
**Difference from the basic version**: Instead of directly nesting fBM, computes the fBM gradient field and iteratively advances coordinates via Euler integration. Simulates fluid advection, producing vortex-like planetary atmospheric banding.
**Key modified code**:
```glsl
#define ADVECT_ITERATIONS 5 // Adjustable: iteration count, more = more pronounced vortices
// Compute fBM gradient (finite differences)
vec2 field(vec2 p) {
float t = 0.2 * iTime;
p.x += t;
float n = fbm(p, t);
float e = 0.25;
float nx = fbm(p + vec2(e, 0.0), t);
float ny = fbm(p + vec2(0.0, e), t);
return vec2(n - ny, nx - n) / e; // 90° rotated gradient = streamline direction
}
// Iterative flow field advection
vec3 distort(vec2 p) {
for (float i = 0.0; i < float(ADVECT_ITERATIONS); i++) {
p += field(p) / float(ADVECT_ITERATIONS);
}
return vec3(fbm(p, 0.0)); // Sample at the advected coordinates
}
```
### Variant 5: 3D Volumetric Domain Warping (Explosion / Fireball Effect)
**Difference from the basic version**: Extends domain warping from 2D to 3D, using 3D fBM to displace a sphere's distance field, then rendering via sphere tracing or volumetric ray marching. Produces volcanic eruptions, solar surface, and other volumetric effects.
**Key modified code**:
```glsl
#define NOISE_FREQ 4.0 // Adjustable: noise frequency
#define NOISE_AMP -0.5 // Adjustable: displacement amplitude (negative = inward bulging feel)
// 3D rotation matrix (for decorrelation)
mat3 m3 = mat3(0.00, 0.80, 0.60,
-0.80, 0.36,-0.48,
-0.60,-0.48, 0.64);
// 1D hash for the 3D lattice index below (the 2D hash above takes vec2 and does not apply)
float hash(float n) { return fract(sin(n) * 43758.5453); }
// 3D value noise
float noise3D(vec3 p) {
vec3 fl = floor(p);
vec3 fr = fract(p);
fr = fr * fr * (3.0 - 2.0 * fr);
float n = fl.x + fl.y * 157.0 + 113.0 * fl.z;
return mix(mix(mix(hash(n+0.0), hash(n+1.0), fr.x),
mix(hash(n+157.0), hash(n+158.0), fr.x), fr.y),
mix(mix(hash(n+113.0), hash(n+114.0), fr.x),
mix(hash(n+270.0), hash(n+271.0), fr.x), fr.y), fr.z);
}
// 3D fBM
float fbm3D(vec3 p) {
float f = 0.0;
f += 0.5000 * noise3D(p); p = m3 * p * 2.02;
f += 0.2500 * noise3D(p); p = m3 * p * 2.03;
f += 0.1250 * noise3D(p); p = m3 * p * 2.01;
f += 0.0625 * noise3D(p); p = m3 * p * 2.02;
f += 0.03125 * abs(noise3D(p)); // Last layer uses abs for added detail
return f / 0.96875; // Amplitude sum: 0.5+0.25+0.125+0.0625+0.03125
}
// Sphere distance field + domain warping displacement
float distanceFunc(vec3 p, out float displace) {
float d = length(p) - 0.5; // Sphere SDF
displace = fbm3D(p * NOISE_FREQ + vec3(0, -1, 0) * iTime);
d += displace * NOISE_AMP; // fBM displaces the surface
return d;
}
```
## Performance Optimization Deep Dive
### Bottleneck Analysis
The main performance bottleneck of domain warping is **repeated noise sampling**. Three warping levels times 6 octaves = 18 noise samples per pixel, plus finite differences for lighting (2 additional full warping computations), totaling up to **54 noise samples/pixel**.
### Optimization Techniques
1. **Reduce octave count**: Using 4 octaves instead of 6 shows little visual difference but improves performance by ~33%
```glsl
// Use 4 octaves for coarse layers, only 6 octaves for fine layers
```
2. **Reduce warping depth**: Two-level warping `fbm(p + fbm(p))` already produces organic results, saving ~33% performance over three levels
3. **Use sin-product noise instead of value noise**: `sin(p.x)*sin(p.y)` is completely branch-free with no memory access, suitable for mobile
```glsl
float noise(vec2 p) {
return sin(p.x) * sin(p.y); // Minimal version, no hash needed
}
```
4. **GPU built-in derivatives instead of finite differences**: Saves 2 extra full warping computations
```glsl
// Use dFdx/dFdy instead of manual finite differences (slightly lower quality but 3x faster)
vec3 nor = normalize(vec3(dFdx(shade) * iResolution.x, 6.0, dFdy(shade) * iResolution.y));
```
5. **Texture noise**: Pre-bake noise textures and use `texture()` instead of procedural noise, converting computation to memory reads
```glsl
float noise(vec2 x) {
return texture(iChannel0, x * 0.01).x;
}
```
6. **LOD adaptation**: Reduce octave count for distant pixels
```glsl
int octaves = int(mix(float(NUM_OCTAVES), 2.0, length(uv) / 5.0));
```
7. **Supersampling strategy**: Only use 2x2 supersampling when anti-aliasing is needed (4x performance cost)
```glsl
#if HW_PERFORMANCE == 0
#define AA 1
#else
#define AA 2
#endif
```
## Combination Suggestions with Complete Code Examples
### Combining with Ray Marching
The scalar field generated by domain warping can serve directly as an SDF displacement function, deforming smooth geometry into organic forms. Used for flames, explosions, alien creatures, etc.
```glsl
float sdf(vec3 p) {
return length(p) - 1.0 + fbm3D(p * 4.0) * 0.3;
}
```
### Combining with Polar Coordinate Transform
Perform domain warping in polar coordinate space to produce vortices, nebulae, spirals, and other effects.
```glsl
vec2 polar = vec2(length(uv), atan(uv.y, uv.x));
float shade = pattern(polar);
```
### Combining with Cosine Color Palette
The cosine palette `a + b*cos(2*pi*(c*t+d))` is more flexible than a fixed mix chain. By adjusting four vec3 parameters, you can quickly switch color schemes.
```glsl
vec3 palette(float t) {
vec3 a = vec3(0.5); vec3 b = vec3(0.5);
vec3 c = vec3(1.0); vec3 d = vec3(0.0, 0.33, 0.67);
return a + b * cos(6.28318 * (c * t + d));
}
```
### Combining with Post-Processing Effects
- **Bloom/Glow**: Blur and overlay high-brightness areas to enhance glow effects
- **Tone Mapping**: `col = col / (1.0 + col)` to compress HDR range
- **Chromatic Aberration**: Sample the warp field at offset positions for R/G/B channels separately
```glsl
float r = pattern(uv + vec2(0.003, 0.0));
float g = pattern(uv);
float b = pattern(uv - vec2(0.003, 0.0));
```
### Combining with Particle Systems / Geometry
The domain warping scalar field can drive particle velocity fields, mesh vertex displacement, or UV animation deformation — not limited to pure fragment shader usage.
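As a sketch of the vertex-displacement case, assuming the `fbm3D` from Variant 5 is available in the vertex stage; the matrix uniform names follow the three.js convention and are illustrative:

```glsl
// Vertex-stage sketch: displace each vertex along its normal by the warp field
uniform float uTime;
uniform mat4 projectionMatrix;  // illustrative names (three.js convention)
uniform mat4 modelViewMatrix;
attribute vec3 position;
attribute vec3 normal;
varying float vShade;           // pass the field value on for coloring
void main() {
    float shade = fbm3D(position * 2.0 + vec3(0.0, uTime, 0.0));
    vec3 displaced = position + normal * 0.2 * shade; // 0.2 = displacement amplitude
    vShade = shade;
    gl_Position = projectionMatrix * modelViewMatrix * vec4(displaced, 1.0);
}
```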
# Fluid Simulation — Detailed Reference
This document is a detailed supplement to [SKILL.md](SKILL.md), containing prerequisite knowledge, step-by-step tutorials, mathematical derivations, and advanced usage.
## Prerequisites
### GLSL Basics
- `texture`/`texelFetch` sampling, `iChannel0` buffer feedback, multi-pass rendering
- ShaderToy multi-buffer architecture: data flow between Buffer A/B/C/D
### Vector Calculus Basics
- Gradient: the spatial rate of change of a scalar field, pointing in the direction of greatest increase
- Divergence: the "source/sink" strength of a vector field
- Curl: the local rotational strength of a vector field
- Laplacian: the second derivative of a scalar field, measuring deviation from the neighborhood mean
### Data Encoding Paradigm
Understanding the paradigm of "encoding physical quantities into texture RGBA channels":
- `.xy` = velocity
- `.z` = pressure / density
- `.w` = passive scalar, e.g., ink concentration
## Implementation Steps in Detail
### Step 1: Data Encoding and Buffer Layout
**What**: Encode fluid physical quantities into the RGBA channels of a texture.
**Why**: GPU textures serve as the storage medium for fluid state. Each pixel is a grid cell, with channels storing different physical quantities, enabling full fluid state persistence.
**Code**:
```glsl
// Data layout convention:
// .xy = velocity field
// .z = pressure / density
// .w = passive scalar, e.g., ink concentration
// Sampling macro — simplify neighborhood access
#define T(p) texture(iChannel0, (p) / iResolution.xy)
// Get current pixel and its four neighbors
vec4 c = T(p); // center
vec4 n = T(p + vec2(0, 1)); // north
vec4 e = T(p + vec2(1, 0)); // east
vec4 s = T(p - vec2(0, 1)); // south
vec4 w = T(p - vec2(1, 0)); // west
```
### Step 2: Discrete Differential Operators
**What**: Compute gradient, Laplacian, divergence, and curl over a 3x3 pixel neighborhood.
**Why**: These operators are the foundation for discretizing the Navier-Stokes equations. A 3x3 stencil is more isotropic than a simple cross stencil, reducing grid-direction artifacts.
**Code**:
```glsl
// ===== Laplacian =====
// Weighted 3x3 stencil: center weight _K0, edge weight _K1, corner weight _K2
const float _K0 = -20.0 / 6.0; // adjustable: center weight
const float _K1 = 4.0 / 6.0; // adjustable: edge weight
const float _K2 = 1.0 / 6.0; // adjustable: corner weight
vec4 laplacian = _K0 * c
+ _K1 * (n + e + s + w)
+ _K2 * (T(p+vec2(1,1)) + T(p+vec2(-1,1)) + T(p+vec2(1,-1)) + T(p+vec2(-1,-1)));
// ===== Gradient =====
// Central difference
vec4 dx = (e - w) / 2.0;
vec4 dy = (n - s) / 2.0;
// ===== Divergence =====
float div = dx.x + dy.y; // ∂vx/∂x + ∂vy/∂y
// ===== Curl / Vorticity =====
float curl = dx.y - dy.x; // ∂vy/∂x - ∂vx/∂y
```
### Step 3: Initial Frame and Noise
**What**: Initialize the fluid state and inject a small amount of noise to avoid symmetry lock.
**Why**: If the initial state is entirely zero (zero velocity), the fluid equations will maintain this symmetric state and never move. Adding a small amount of random noise breaks the symmetry, allowing turbulence to develop naturally.
**Code**:
```glsl
if (iFrame < 10) {
vec2 uv = p / iResolution.xy;
// Position-based pseudo-random noise
float noise = fract(sin(dot(uv, vec2(12.9898, 78.233))) * 43758.5453);
// velocity.xy = small noise, pressure.z = 1.0, ink.w = small amount
fragColor = vec4(noise * 1e-4, noise * 1e-4, 1.0, noise * 0.1);
return;
}
```
### Step 4: Semi-Lagrangian Advection
**What**: Trace backward along the velocity field and sample from the upstream position to update the current pixel.
**Why**: This is the standard method for handling the `-(v·∇)v` advection term. Direct forward advection on an Eulerian grid leads to instability, while the semi-Lagrangian method is unconditionally stable — it won't blow up regardless of time step size.
**Code**:
```glsl
#define DT 0.15 // adjustable: time step, larger = faster fluid motion but may reduce accuracy
// Core: backward tracing — find the "upstream" position by tracing backward along velocity
// Then sample from the upstream position, effectively "transporting" the upstream state here
vec4 advected = T(p - DT * c.xy);
// Only advect velocity and passive scalar (ink), preserve local pressure
c.xyw = advected.xyw;
```
### Step 5: Viscous Diffusion
**What**: Apply Laplacian diffusion to the velocity field to simulate viscosity.
**Why**: Corresponds to the `ν∇²v` term. Viscosity smooths the velocity field, dissipating small-scale vortices. The parameter `ν` controls whether the fluid behaves like "water" (low viscosity) or "honey" (high viscosity).
**Code**:
```glsl
#define NU 0.5 // adjustable: kinematic viscosity coefficient. 0.01=water, 1.0=syrup
#define KAPPA 0.1 // adjustable: passive scalar (ink) diffusion coefficient
c.xy += DT * NU * laplacian.xy; // velocity diffusion
c.w += DT * KAPPA * laplacian.w; // ink diffusion
```
### Step 6: Pressure Projection
**What**: Compute the gradient of the pressure field and subtract it from the velocity field to enforce the incompressibility constraint.
**Why**: This is the core of Helmholtz-Hodge decomposition — decomposing the velocity field into a divergence-free part (what we want) and a curl-free part. By projecting out the divergence component via `v = v - K·∇p`, we ensure `∇·v ≈ 0`. In ShaderToy, the per-frame buffer feedback itself constitutes an implicit Jacobi iteration.
**Code**:
```glsl
#define K 0.2 // adjustable: pressure correction strength. Too large causes oscillation, too small yields poor incompressibility
// Pressure is stored in the .z channel
// Use pressure gradient to correct velocity, eliminating divergence
c.xy -= K * vec2(dx.z, dy.z);
// Mass conservation: update density/pressure based on divergence (Euler method)
c.z -= DT * (dx.z * c.x + dy.z * c.y + div * c.z);
```
### Step 7: External Forces and Mouse Interaction
**What**: Inject velocity and ink into the fluid based on mouse input.
**Why**: The external force term `f` is the entry point for user interaction. The typical approach is to apply a Gaussian-decaying velocity impulse and ink injection near the mouse position.
**Code**:
```glsl
// Mouse interaction — drag to inject velocity and ink
if (iMouse.z > 0.0) {
vec2 mousePos = iMouse.xy;
vec2 mouseDelta = iMouse.xy - iMouse.zw; // drag direction
float dist = length(p - mousePos);
float influence = exp(-dist * dist / 50.0); // adjustable: 50.0 controls influence radius
c.xy += DT * influence * mouseDelta; // inject velocity
c.w += DT * influence; // inject ink
}
```
### Step 8: Boundary Conditions and Numerical Stability
**What**: Handle boundary pixels, clamp numerical ranges, and apply dissipation.
**Why**: Without boundary conditions, the fluid "leaks" off-screen; without dissipation, fluid energy accumulates indefinitely, causing numerical explosion.
**Code**:
```glsl
// Boundary condition: zero velocity at edge pixels (no-slip)
if (p.x < 1.0 || p.y < 1.0 ||
iResolution.x - p.x < 1.0 || iResolution.y - p.y < 1.0) {
c.xyw *= 0.0;
}
// IMPORTANT: Ink decay: must use multiplicative decay; subtractive decay causes saturation in high-concentration areas and overly fast decay in low-concentration areas
c.w *= 0.995; // 0.5% decay per frame, adjustable [0.99=fast dissipation, 0.999=persistent]
// Numerical clamping (prevent explosion)
c = clamp(c, vec4(-5, -5, 0.5, 0), vec4(5, 5, 3, 5));
```
### Step 9: Visualization Rendering (Image Pass)
**What**: Map physical quantities from the buffer to visible colors.
**Why**: Raw physical data (velocity, pressure) needs artistic color mapping to produce visual effects. Common techniques include: mapping velocity direction to hue, pressure to brightness, and overlaying ink concentration.
**Code**:
```glsl
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
vec2 uv = fragCoord / iResolution.xy;
vec4 c = texture(iChannel0, uv);
// IMPORTANT: Color base must be bright enough! 0.5+0.5*cos produces bright colors in [0,1] range
// Never use extremely dark base colors like vec3(0.02, 0.01, 0.08) — multiplied by ink, they become invisible
vec3 col = 0.5 + 0.5 * cos(atan(c.y, c.x) + vec3(0.0, 2.1, 4.2));
// IMPORTANT: Use smoothstep instead of linear division to preserve gradient variation
float ink = smoothstep(0.0, 2.0, c.w);
col *= ink;
// IMPORTANT: Background color must be visible to the eye (RGB at least > 5/255 ≈ 0.02), otherwise users think the page is all black
col = max(col, vec3(0.02, 0.012, 0.035));
fragColor = vec4(col, 1.0);
}
```
## Variant Details
### Variant 1: Rotational Self-Advection
**Difference from base version**: Instead of pressure projection, uses multi-scale rotational sampling to achieve natural divergence-free advection. Simpler computation, suitable for purely decorative fluid effects.
**Core idea**: Compute local rotation (curl) at different scales, then use rotationally offset sampling positions for advection.
**Key code**:
```glsl
#define RotNum 3 // adjustable: rotational sample count [3-7], more = more precise
#define angRnd 1.0 // adjustable: rotational randomness [0-1]
const float ang = 2.0 * 3.14159 / float(RotNum);
mat2 m = mat2(cos(ang), sin(ang), -sin(ang), cos(ang));
// Compute rotation amount at a given scale
float getRot(vec2 uv, float sc) {
float ang2 = angRnd * randS(uv).x * ang;
vec2 p = vec2(cos(ang2), sin(ang2));
float rot = 0.0;
for (int i = 0; i < RotNum; i++) {
vec2 p2 = p * sc;
vec2 v = texture(iChannel0, fract(uv + p2)).xy - vec2(0.5);
rot += cross(vec3(v, 0.0), vec3(p2, 0.0)).z / dot(p2, p2);
p = m * p;
}
return rot / float(RotNum);
}
// Main loop: multi-scale advection accumulation
vec2 v = vec2(0);
float sc = 1.0 / max(iResolution.x, iResolution.y);
for (int level = 0; level < 20; level++) {
if (sc > 0.7) break;
    float ang2 = angRnd * randS(uv).x * ang; // same random phase as in getRot
    vec2 p = vec2(cos(ang2), sin(ang2));
for (int i = 0; i < RotNum; i++) {
vec2 p2 = p * sc;
float rot = getRot(uv + p2, sc);
v += p2.yx * rot * vec2(-1, 1);
p = m * p;
}
sc *= 2.0; // next scale
}
fragColor = texture(iChannel0, fract(uv + v * 3.0 / iResolution.x));
```
### Variant 2: Vorticity Confinement
**Difference from base version**: Adds vorticity confinement force on top of the base solver to prevent small vortices from dissipating too quickly due to numerical diffusion. Suitable for smoke, fire, and other scenes that need rich detail.
**Core idea**: Compute the gradient direction of the vorticity field (the direction where vorticity concentrates), then apply a restoring force along that direction.
**Key code**:
```glsl
#define VORT_STRENGTH 0.01 // adjustable: vorticity confinement strength [0.001 - 0.1]
// Compute gradient of vorticity magnitude (points toward increasing vorticity)
float curl_c = curl_at(uv); // current vorticity
float curl_n = abs(curl_at(uv + vec2(0, texel.y)));
float curl_s = abs(curl_at(uv - vec2(0, texel.y)));
float curl_e = abs(curl_at(uv + vec2(texel.x, 0)));
float curl_w = abs(curl_at(uv - vec2(texel.x, 0)));
vec2 eta = normalize(vec2(curl_e - curl_w, curl_n - curl_s) + 1e-5);
// Vorticity confinement force = ε * (η × ω)
vec2 conf = VORT_STRENGTH * vec2(eta.y, -eta.x) * curl_c;
c.xy += DT * conf;
```
### Variant 3: Viscous Fingering / Reaction-Diffusion Style
**Difference from base version**: No advection; instead uses rotation-driven self-amplification and Laplacian diffusion to produce organic patterns resembling reaction-diffusion. Suitable for abstract art generation.
**Core idea**: Compute a rotation angle from curl, apply 2D rotation to velocity components, and combine with Laplacian diffusion and divergence feedback.
**Key code**:
```glsl
const float cs = 0.25; // adjustable: curl → rotation angle scaling
const float ls = 0.24; // adjustable: Laplacian diffusion strength
const float ps = -0.06; // adjustable: divergence-pressure feedback strength
const float amp = 1.0; // adjustable: self-amplification coefficient (>1 enhances patterns)
const float pwr = 0.2; // adjustable: curl exponent (controls rotation sensitivity)
// Compute rotation angle from curl
float sc = cs * sign(curl) * pow(abs(curl), pwr);
// Temporary velocity with diffusion and divergence feedback
// (excerpt: here uv holds the velocity sample; sp and sd are precomputed pressure/divergence feedback terms whose derivation uses ps)
float ta = amp * uv.x + ls * lapl.x + norm.x * sp + uv.x * sd;
float tb = amp * uv.y + ls * lapl.y + norm.y * sp + uv.y * sd;
// Rotate velocity components
float a = ta * cos(sc) - tb * sin(sc);
float b = ta * sin(sc) + tb * cos(sc);
fragColor = clamp(vec4(a, b, div, 1), -1.0, 1.0);
```
### Variant 4: Gaussian Kernel SPH Particle Fluid
**Difference from base version**: Completely abandons grid advection, instead using Gaussian kernel functions to estimate density and velocity at each grid point. Minimal (about 20 lines of core code), suitable for rapid prototyping and teaching.
**Core idea**: For all pixels in the neighborhood, perform mass-weighted velocity blending using Gaussian weights based on velocity + displacement. This is essentially a grid-based approximation of SPH.
**Key code**:
```glsl
#define RADIUS 7 // adjustable: search radius [3-10], larger = slower but smoother
vec4 r = vec4(0);
for (vec2 i = vec2(-RADIUS); ++i.x < float(RADIUS);)          // x sweeps (-RADIUS, RADIUS)
for (i.y = -float(RADIUS); ++i.y < float(RADIUS);) {          // y sweeps (-RADIUS, RADIUS)
    vec4 n = texelFetch(iChannel0, ivec2(i + fragCoord), 0);  // one fetch per neighbor
    vec2 v = n.xy;                                            // neighbor velocity
    float mass = n.z;                                         // neighbor mass
    float w = exp(-dot(v + i, v + i)) / 3.14;                 // Gaussian kernel weight
    r += mass * w * vec4(mix(v + v + i, v, mass), 1, 1);
}
r.xy /= r.z + 1e-6; // mass-weighted average velocity
```
### Variant 5: Lagrangian Vortex Particle Method
**Difference from base version**: Instead of solving on a grid, tracks discrete vortex particles with their positions and vorticities. Uses the Biot-Savart law to compute the velocity field directly from the vorticity distribution. Suitable for precise simulation of a small number of vortices.
**Core idea**: Each particle carries a position and vorticity. Induced velocity is computed through N-body summation. Uses Heun (semi-implicit) time integration for improved accuracy.
**Key code**:
```glsl
#define N 20 // adjustable: N×N particles
#define STRENGTH 1e3 * 0.25 // adjustable: vorticity strength scaling
// Biot-Savart velocity computation (similar to 2D vortex 1/r decay)
vec2 F = vec2(0);
for (int j = 0; j < N; j++)
for (int i = 0; i < N; i++) {
float w = vorticity(i, j); // particle vorticity
vec2 d = particle_pos(i, j) - my_pos;
float l = dot(d, d);
if (l > 1e-5)
F += vec2(-d.y, d.x) * w / l; // Biot-Savart: v = ω × r / |r|²
}
velocity = STRENGTH * F;
position += velocity * dt;
```
## Performance Optimization Details
### Bottleneck 1: Neighborhood Sample Count
- The basic 5-point stencil (cross) is fastest but has poor isotropy
- A 3x3 stencil (9 samples) is the best balance between accuracy and performance
- The neighborhood search in the SPH variant costs roughly `(2·RADIUS)²` samples per pixel and is extremely expensive; a `RADIUS` above 7 becomes slow
- **Optimization**: Use `texelFetch` instead of `texture` (skips filtering), or use `textureLod` to lock the mip level
### Bottleneck 2: Multi-Pass Overhead
- Classic solvers need 2-4 buffer passes (velocity, pressure, vorticity, visualization)
- **Optimization**: Merge multiple steps into a single pass. Pressure projection can leverage inter-frame feedback as implicit Jacobi iteration, eliminating the need for dedicated iteration passes
- For decorative effects that don't require strict incompressibility, rotational self-advection (Variant 1) can completely eliminate pressure projection
### Bottleneck 3: Advection Accuracy vs. Performance
- Single-step advection loses detail in high-velocity regions
- **Optimization**: Multi-step advection (`ADVECTION_STEPS = 3`) uses 3 small steps instead of 1 large step, at the cost of 3x the sampling
- Compromise: pre-compute offsets then uniformly subdivide sampling (avoid recalculating offsets at each step)
### Bottleneck 4: Mipmap as Alternative to Multi-Scale Traversal
- Multi-scale fluid requires computation at different spatial scales. The brute-force approach is multiple large-radius samples
- **Optimization**: Leverage GPU-generated mipmaps for O(1) multi-scale reads, using `textureLod(channel, uv, mip)` to directly read at different scales
### General Tips
- Add tiny noise on the initial frame (`1e-6 * noise`) to avoid symmetry lock caused by numerical precision issues
- Use `fract(uv + offset)` for periodic boundaries (torus topology), eliminating boundary check branches
- Multiply the pressure field by a near-1 decay factor (e.g., `0.9999`) to prevent pressure drift
## Combination Suggestions
### 1. Fluid + Normal Map Lighting
Treat the fluid velocity/density field as a height map, compute normals, and apply Phong/GGX lighting to produce a liquid metal visual effect.
```glsl
// Compute normals from the density field
vec2 dxy = vec2(
texture(buf, uv + vec2(tx, 0)).z - texture(buf, uv - vec2(tx, 0)).z,
texture(buf, uv + vec2(0, ty)).z - texture(buf, uv - vec2(0, ty)).z
);
vec3 normal = normalize(vec3(-BUMP * dxy, 1.0));
// Then plug into Phong/GGX lighting calculation
```
### 2. Fluid + Particle Tracing
Scatter passive particles in the fluid velocity field, updating particle positions each frame according to the flow velocity. Suitable for visualizing streamlines and creating ink diffusion effects.
```glsl
// Particle position update (in a separate buffer)
vec2 pos = texture(particleBuf, id).xy;
vec2 vel = texture(fluidBuf, pos / iResolution.xy).xy;
pos += vel * dt;
pos = mod(pos, iResolution.xy); // periodic boundary
```
### 3. Fluid + Color Advection
Store RGB colors in extra channels or buffers and perform semi-Lagrangian advection synchronized with the velocity field, producing colorful ink mixing effects.
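A minimal sketch of the color pass, assuming the velocity field lives in `iChannel0.xy` and the RGB ink in a second buffer bound to `iChannel1` (the channel assignments and `DT` are illustrative, matching the base solver):

```glsl
// Color-advection pass: backward-trace through the shared velocity field, as in Step 4
vec2 uv = fragCoord / iResolution.xy;
vec2 vel = texture(iChannel0, uv).xy;                               // shared velocity field
vec3 ink = texture(iChannel1, uv - DT * vel / iResolution.xy).rgb;  // semi-Lagrangian advection
ink *= 0.995;                                                       // multiplicative decay, as in Step 8
fragColor = vec4(ink, 1.0);
```

The Image pass then reads `iChannel1` directly, so the three color channels mix wherever the flow folds them together.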
### 4. Fluid + Audio Response
Map audio spectrum low-frequency energy to force intensity and high frequencies to vorticity injection, creating music-driven fluid visualization.
```glsl
float bass = texture(iChannel1, vec2(0.05, 0.0)).x; // low frequency
float treble = texture(iChannel1, vec2(0.8, 0.0)).x; // high frequency
// Low frequency → thrust, high frequency → vortex disturbance
c.xy += bass * radialForce + treble * randomVortex;
```
### 5. Fluid + 3D Volume Rendering
Extend 2D fluid to 3D (using 2D texture slice packing to store 3D voxels) and render semi-transparent volumes via ray marching. Suitable for clouds and explosion effects.
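A sketch of the ray-marching side, assuming a hypothetical `sampleDensity(p)` helper that unpacks a voxel from the 2D slice atlas, plus camera ray `ro`/`rd` from a standard camera setup:

```glsl
#define VOL_STEPS 64    // adjustable: more steps = smoother but slower
#define STEP 0.05       // adjustable: march step length
// Front-to-back alpha compositing along the view ray
vec4 acc = vec4(0.0);   // rgb = accumulated color, a = accumulated opacity
for (int i = 0; i < VOL_STEPS; i++) {
    vec3 pos = ro + rd * (float(i) * STEP);
    float d = sampleDensity(pos);             // hypothetical: fluid density at this voxel
    float a = 1.0 - exp(-d * STEP);           // opacity of this slab (Beer's law)
    acc.rgb += (1.0 - acc.a) * a * vec3(0.9); // adjustable: smoke albedo
    acc.a   += (1.0 - acc.a) * a;
    if (acc.a > 0.99) break;                  // early exit once nearly opaque
}
fragColor = vec4(acc.rgb, 1.0);
```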
@@ -0,0 +1,525 @@
# Fractal Rendering — Detailed Reference
This document is a detailed supplement to [SKILL.md](SKILL.md), containing prerequisites, step-by-step explanations, mathematical derivations, variant descriptions, in-depth performance analysis, and complete combination example code.
## Prerequisites
- **GLSL Basics**: uniform, varying, built-in functions (`dot`, `length`, `normalize`, `abs`, `fract`)
- **Complex Number Arithmetic**: representing complex numbers as `vec2`, multiplication `(a+bi)(c+di) = (ac-bd, ad+bc)`
- **Vector Math**: dot product, cross product, matrix transforms
- **Ray Marching Basics** (required for 3D fractals): stepping along a ray, using distance fields for collision detection
- **Coordinate Normalization**: mapping pixel coordinates to the `[-1, 1]` range
## Core Principles in Detail
The essence of fractal rendering is **visualization of iterative systems**. Core algorithm patterns fall into three categories:
### 1. Escape-Time Algorithm
For each point `c` on the complex plane, repeatedly iterate `Z <- Z^2 + c`, counting the number of steps needed for Z to escape (`|Z| > R`). More steps means closer to the fractal boundary.
**Distance Estimation** computes the precise distance from a point to the fractal by simultaneously tracking the derivative `Z'`:
```
Z <- Z^2 + c (value iteration)
Z' <- 2*Z*Z' + 1 (derivative iteration)
d(c) = |Z|*log|Z| / |Z'| (Hubbard-Douady potential function)
```
Distance estimation produces smoother coloring than pure escape-time step counting, and is a prerequisite for ray marching in 3D fractals.
### 2. Iterated Function System (IFS)
Apply a set of transforms (folding `abs()`, scaling `Scale`, offset `Offset`) to points in space, iterating repeatedly to produce self-similar structures. Core steps of KIFS (Kaleidoscopic IFS) commonly used in 3D:
```
p = abs(p) // Fold (symmetrize)
sort p.xyz descending // Sort (select symmetry axis)
p = Scale * p - Offset * (Scale-1) // Scale and offset
```
### 3. Spherical Inversion Fractal
Apollonian-type fractals use `fract()` for space folding + spherical inversion `p *= s/dot(p,p)`:
```
p = -1.0 + 2.0 * fract(0.5*p + 0.5) // Fold space to [-1,1]
r^2 = dot(p, p)
k = s / r^2 // Inversion factor
p *= k; scale *= k // Spherical inversion
```
All 3D fractals are rendered using **Sphere Tracing (Ray Marching)**: stepping along the view ray by the distance field value at each step, until close enough to the surface.
## Implementation Steps in Detail
### Step 1: Coordinate Normalization
**What**: Map pixel coordinates to standard coordinates centered on the screen with aspect ratio correction.
**Why**: All fractal calculations must be performed in mathematical space, independent of pixel resolution.
```glsl
vec2 p = (2.0 * fragCoord - iResolution.xy) / iResolution.y;
// p now has y range [-1,1], x scaled by aspect ratio
```
### Step 2: 2D Fractal — Mandelbrot Escape-Time Iteration
**What**: For each pixel point as complex number `c`, iterate `Z <- Z^2 + c` while tracking the derivative.
**Why**: Escape time produces fractal structure; derivative tracking enables distance estimation coloring.
```glsl
float distanceToMandelbrot(in vec2 c) {
vec2 z = vec2(0.0);
vec2 dz = vec2(0.0); // Derivative
float m2 = 0.0;
for (int i = 0; i < MAX_ITER; i++) {
if (m2 > BAILOUT * BAILOUT) break;
// Z' -> 2*Z*Z' + 1 (complex derivative chain rule)
dz = 2.0 * vec2(z.x*dz.x - z.y*dz.y,
z.x*dz.y + z.y*dz.x) + vec2(1.0, 0.0);
// Z -> Z^2 + c (complex squaring)
z = vec2(z.x*z.x - z.y*z.y, 2.0*z.x*z.y) + c;
m2 = dot(z, z);
}
// Distance estimation: d(c) = |Z|*log|Z| / |Z'|
return 0.5 * sqrt(dot(z,z) / dot(dz,dz)) * log(dot(z,z));
}
```
### Step 3: 3D Fractal — Distance Field Function (Mandelbulb Example)
**What**: Implement the Mandelbulb power-N iteration using spherical coordinates, returning a distance estimate.
**Why**: 3D fractals cannot be directly colored via escape-time on pixels; they require distance fields for ray marching.
```glsl
float mandelbulb(vec3 p) {
vec3 z = p;
float dr = 1.0; // Derivative (distance scaling factor)
float r;
for (int i = 0; i < FRACTAL_ITER; i++) {
r = length(z);
if (r > BAILOUT) break;
// Convert to spherical coordinates
float theta = atan(z.y, z.x);
float phi = asin(z.z / r);
// Derivative: dr -> power * r^(power-1) * dr + 1
dr = pow(r, POWER - 1.0) * dr * POWER + 1.0;
// z -> z^power + p (spherical coordinate exponentiation)
r = pow(r, POWER);
theta *= POWER;
phi *= POWER;
z = r * vec3(cos(theta)*cos(phi),
sin(theta)*cos(phi),
sin(phi)) + p;
}
// Distance estimation
return 0.5 * log(r) * r / dr;
}
```
### Step 4: 3D Fractal — IFS Distance Field (Menger Sponge Example)
**What**: Construct a KIFS fractal distance field through fold-sort-scale-offset iteration.
**Why**: IFS fractals produce self-similar structures through spatial transforms rather than numerical iteration; distance is tracked via `Scale^(-n)` scaling.
```glsl
float mengerDE(vec3 z) {
z = abs(1.0 - mod(z, 2.0)); // Infinite tiling
float d = 1000.0;
for (int n = 0; n < IFS_ITER; n++) {
z = abs(z); // Fold
if (z.x < z.y) z.xy = z.yx; // Sort
if (z.x < z.z) z.xz = z.zx;
if (z.y < z.z) z.yz = z.zy;
z = SCALE * z - OFFSET * (SCALE - 1.0); // Scale + offset
if (z.z < -0.5 * OFFSET.z * (SCALE - 1.0))
z.z += OFFSET.z * (SCALE - 1.0);
d = min(d, length(z) * pow(SCALE, float(-n) - 1.0));
}
return d - 0.001;
}
```
### Step 5: 3D Fractal — Spherical Inversion Distance Field (Apollonian Type)
**What**: Construct an Apollonian fractal using fract folding + spherical inversion iteration, while recording orbit traps.
**Why**: Spherical inversion `p *= s/dot(p,p)` produces sphere packing structures; orbit traps provide color and AO information.
```glsl
vec4 orb; // Global orbit trap
float apollonianDE(vec3 p, float s) {
float scale = 1.0;
orb = vec4(1000.0);
for (int i = 0; i < INVERSION_ITER; i++) {
p = -1.0 + 2.0 * fract(0.5 * p + 0.5); // Fold space to [-1,1]
float r2 = dot(p, p);
orb = min(orb, vec4(abs(p), r2)); // Record orbit trap
float k = s / r2; // Inversion factor
p *= k;
scale *= k;
}
return 0.25 * abs(p.y) / scale;
}
```
### Step 6: Ray Marching (Sphere Tracing)
**What**: Step along the ray direction, advancing by the distance field value at each step, until hitting the surface.
**Why**: The distance field guarantees safe stepping (won't pass through the surface), and is the standard method for rendering implicit 3D fractals.
```glsl
float rayMarch(vec3 ro, vec3 rd) {
float t = 0.01;
for (int i = 0; i < MAX_STEPS; i++) {
float precis = PRECISION * t; // Relax precision with distance
float h = map(ro + rd * t);
if (h < precis || t > MAX_DIST) break;
t += h * FUDGE_FACTOR; // fudge < 1.0 improves safety
}
return (t > MAX_DIST) ? -1.0 : t;
}
```
### Step 7: Normal Calculation (Finite Differences)
**What**: Sample the distance field gradient around the hit point as the surface normal.
**Why**: Implicit surfaces have no analytical normals and require numerical approximation. Tetrahedral sampling (4-tap) saves 1/3 of the cost compared to central differences (6-tap).
```glsl
// 6-tap central difference method (more intuitive)
vec3 calcNormal_6tap(vec3 pos) {
vec2 e = vec2(0.001, 0.0);
return normalize(vec3(
map(pos + e.xyy) - map(pos - e.xyy),
map(pos + e.yxy) - map(pos - e.yxy),
map(pos + e.yyx) - map(pos - e.yyx)));
}
// 4-tap tetrahedral method (more efficient, recommended)
vec3 calcNormal_4tap(vec3 pos, float t) {
float precis = 0.001 * t;
vec2 e = vec2(1.0, -1.0) * precis;
return normalize(
e.xyy * map(pos + e.xyy) +
e.yyx * map(pos + e.yyx) +
e.yxy * map(pos + e.yxy) +
e.xxx * map(pos + e.xxx));
}
```
### Step 8: Shading and Lighting
**What**: Compute Lambertian diffuse + ambient + AO for hit surfaces.
**Why**: Lighting gives 3D fractals depth and material quality. Orbit trap values (`orb`) can serve both as color mapping and as simple AO.
```glsl
vec3 shade(vec3 pos, vec3 nor, vec3 rd, vec4 trap) {
vec3 light1 = normalize(LIGHT_DIR);
float diff = clamp(dot(light1, nor), 0.0, 1.0);
float amb = 0.7 + 0.3 * nor.y;
float ao = pow(clamp(trap.w * 2.0, 0.0, 1.0), 1.2); // Orbit trap AO
vec3 brdf = vec3(0.4) * amb * ao // Ambient
+ vec3(1.0) * diff * ao; // Diffuse
// Map material color from orbit trap
vec3 rgb = vec3(1.0);
rgb = mix(rgb, vec3(1.0, 0.8, 0.2), clamp(6.0*trap.y, 0.0, 1.0));
rgb = mix(rgb, vec3(1.0, 0.55, 0.0), pow(clamp(1.0-2.0*trap.z, 0.0, 1.0), 8.0));
return rgb * brdf;
}
```
### Step 9: Camera Setup
**What**: Build a look-at camera matrix, converting pixel coordinates to 3D ray directions.
**Why**: All 3D fractal ray marching requires a unified camera framework to generate rays.
```glsl
void setupCamera(vec2 uv, vec3 ro, vec3 ta, float cr,
out vec3 rd) {
vec3 cw = normalize(ta - ro); // forward
vec3 cp = vec3(sin(cr), cos(cr), 0.0); // roll
vec3 cu = normalize(cross(cw, cp)); // right
vec3 cv = normalize(cross(cu, cw)); // up
rd = normalize(uv.x * cu + uv.y * cv + 2.0 * cw); // FOV ~ 2.0
}
```
## Common Variants in Detail
### 1. 2D Mandelbrot (Distance Estimation Coloring)
Difference from base version (3D Apollonian): pure 2D computation, no ray marching needed, uses complex iteration + distance coloring.
```glsl
// Replace entire mainImage
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
vec2 p = (2.0*fragCoord - iResolution.xy) / iResolution.y;
// Animated zoom
float tz = 0.5 - 0.5*cos(0.225*iTime);
float zoo = pow(0.5, 13.0*tz);
vec2 c = vec2(-0.05, 0.6805) + p * zoo; // Tunable: zoom center point
// Iteration
vec2 z = vec2(0.0), dz = vec2(0.0);
for (int i = 0; i < 300; i++) { // Tunable: iteration count
if (dot(z,z) > 1024.0) break;
dz = 2.0*vec2(z.x*dz.x-z.y*dz.y, z.x*dz.y+z.y*dz.x) + vec2(1.0,0.0);
z = vec2(z.x*z.x-z.y*z.y, 2.0*z.x*z.y) + c;
}
float d = 0.5*sqrt(dot(z,z)/dot(dz,dz))*log(dot(z,z));
d = clamp(pow(4.0*d/zoo, 0.2), 0.0, 1.0); // Tunable: 0.2 controls contrast
fragColor = vec4(vec3(d), 1.0);
}
```
### 2. Mandelbulb Power-N (3D Spherical Coordinate Fractal)
Difference from base version: uses spherical coordinate trigonometric functions instead of spherical inversion, with a tunable `POWER` parameter controlling the fractal shape.
```glsl
#define POWER 8.0 // Tunable: 2-16, higher = more complex structure
#define FRACTAL_ITER 4 // Tunable: 2-8, more = more detail
float mandelbulbDE(vec3 p) {
vec3 z = p;
float dr = 1.0;
float r;
for (int i = 0; i < FRACTAL_ITER; i++) {
r = length(z);
if (r > 2.0) break;
float theta = atan(z.y, z.x);
float phi = asin(z.z / r);
dr = pow(r, POWER - 1.0) * dr * POWER + 1.0;
r = pow(r, POWER);
theta *= POWER;
phi *= POWER;
z = r * vec3(cos(theta)*cos(phi), sin(theta)*cos(phi), sin(phi)) + p;
}
return 0.5 * log(r) * r / dr;
}
```
### 3. Menger Sponge (KIFS Folding Type)
Difference from base version: uses abs() folding + conditional sorting instead of spherical inversion, producing regular geometric fractals.
```glsl
#define SCALE 3.0 // Tunable: scaling factor, 2.0-4.0
#define OFFSET vec3(0.92858,0.92858,0.32858) // Tunable: offset vector, changes shape
#define IFS_ITER 7 // Tunable: iteration count
float mengerDE(vec3 z) {
z = abs(1.0 - mod(z, 2.0)); // Infinite tiling
float d = 1000.0;
for (int n = 0; n < IFS_ITER; n++) {
z = abs(z);
if (z.x < z.y) z.xy = z.yx; // Conditional sorting
if (z.x < z.z) z.xz = z.zx;
if (z.y < z.z) z.yz = z.zy;
z = SCALE * z - OFFSET * (SCALE - 1.0);
if (z.z < -0.5*OFFSET.z*(SCALE-1.0))
z.z += OFFSET.z*(SCALE-1.0);
d = min(d, length(z) * pow(SCALE, float(-n)-1.0));
}
return d - 0.001;
}
```
### 4. Quaternion Julia Set
Difference from base version: uses quaternion algebra `Z <- Z^2 + c` (4D), Julia sets use a fixed `c` parameter instead of per-point `c`, visualized by taking 3D cross-sections.
```glsl
// Quaternion squaring
vec4 qsqr(vec4 a) {
return vec4(a.x*a.x - a.y*a.y - a.z*a.z - a.w*a.w,
2.0*a.x*a.y, 2.0*a.x*a.z, 2.0*a.x*a.w);
}
float juliaDE(vec3 p, vec4 c) {
vec4 z = vec4(p, 0.0);
float md2 = 1.0;
float mz2 = dot(z, z);
for (int i = 0; i < 11; i++) { // Tunable: iteration count
        md2 *= 4.0 * mz2;          // |dz|^2 -> 4*|z|^2*|dz|^2  (from dz -> 2*z*dz)
z = qsqr(z) + c; // z -> z^2 + c
mz2 = dot(z, z);
if (mz2 > 4.0) break;
}
return 0.25 * sqrt(mz2 / md2) * log(mz2);
}
// Animated Julia parameter c:
// vec4 c = 0.45*cos(vec4(0.5,3.9,1.4,1.1) + time*vec4(1.2,1.7,1.3,2.5)) - vec4(0.3,0,0,0);
```
### 5. Minimal IFS Field (2D, No Ray Marching)
Difference from base version: pure 2D implementation, only ~20 lines of code, using `abs(p)/dot(p,p) + offset` for iteration, producing a density field through weighted accumulation.
```glsl
float field(vec3 p) {
float strength = 7.0 + 0.03 * log(1.e-6 + fract(sin(iTime) * 4373.11));
float accum = 0.0, prev = 0.0, tw = 0.0;
for (int i = 0; i < 32; ++i) { // Tunable: iteration count
float mag = dot(p, p);
p = abs(p) / mag + vec3(-0.5, -0.4, -1.5); // Tunable: offset values change shape
float w = exp(-float(i) / 7.0); // Tunable: 7.0 controls decay
accum += w * exp(-strength * pow(abs(mag - prev), 2.3));
tw += w;
prev = mag;
}
return max(0.0, 5.0 * accum / tw - 0.7);
}
// Sample field() directly on fragCoord as brightness/color
```
## Performance Optimization Details
### Bottleneck Analysis
The core bottleneck in fractal rendering is **nested loops**: outer ray marching steps x inner fractal iterations. A single pixel may execute `200 steps x 8 iterations = 1600` distance field evaluations.
### Optimization Techniques
#### 1. Reduce Ray Marching Steps
Lower `MAX_STEPS` from 200 to 60-100, and compensate with a fudge factor (0.7-0.9) on each step.
```glsl
t += h * 0.7; // Fudge factor < 1.0: shorter, safer steps that reduce the risk of stepping through thin structures
```
#### 2. Adaptive Precision
Relax the collision threshold as distance increases; far objects don't need pixel-level precision.
```glsl
float precis = 0.001 * t; // Precision grows linearly with distance
```
#### 3. Early Exit
In fractal iteration, break immediately once `|z|^2 > bailout`.
```glsl
if (m2 > 4.0) break; // Don't continue useless iterations
```
#### 4. Reduce Iteration Count
Reducing the fractal iteration counts (`INVERSION_ITER`, `IFS_ITER`) from 8 to 4-5 has minimal visual impact but brings significant performance gains.
#### 5. Use 4-Tap Instead of 6-Tap for Normals
The tetrahedral method requires only 4 `map()` calls instead of 6, saving a third of the normal-computation cost.
#### 6. AA Downgrade
Use `#define AA 1` during development and switch to `AA 2` for release. `AA 3` has a massive performance impact (9x overhead).
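The `#define AA` pattern is a simple supersampling loop wrapped around the per-pixel render. A sketch, assuming a hypothetical `render(p)` helper containing the raymarch and shading path:

```glsl
#define AA 1   // 1 = development, 2 = release; cost grows as AA*AA
vec3 tot = vec3(0.0);
for (int m = 0; m < AA; m++)
for (int n = 0; n < AA; n++) {
    // sub-pixel offset in [-0.5, 0.5)
    vec2 o = (vec2(m, n) + 0.5) / float(AA) - 0.5;
    vec2 p = (2.0 * (fragCoord + o) - iResolution.xy) / iResolution.y;
    tot += render(p);   // hypothetical: one full raymarch + shade per sample
}
fragColor = vec4(tot / float(AA * AA), 1.0);
```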
#### 7. Distance Field Scaling
For non-unit-sized fractals, scale the space first then scale the distance value to avoid precision issues.
```glsl
float z1 = 2.0;
return mandelbulb(p / z1) * z1;
```
#### 8. Avoid `pow()` Inside Loops
`pow(r, power)` in Mandelbulb is expensive; low powers (e.g., 2, 3) can be manually expanded instead.
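For the common fixed case `POWER = 8.0`, both powers in the Mandelbulb loop can be built from repeated squaring. A sketch:

```glsl
// Replaces pow(r, 7.0) and pow(r, 8.0) when POWER is fixed at 8
float r2 = r * r;
float r4 = r2 * r2;
float r7 = r4 * r2 * r;          // r^7, used in the derivative term
dr = 8.0 * r7 * dr + 1.0;        // dr -> power * r^(power-1) * dr + 1
r  = r7 * r;                     // r^8
```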
## Combination Suggestions
### 1. Fractal + Volumetric Lighting
Accumulate scattered light passing through fractal gaps during ray marching, producing "god rays" effects.
```glsl
// Accumulate additionally in ray march loop
float glow = 0.0;
for (...) {
float h = map(ro + rd*t);
glow += exp(-10.0 * h); // Closer to surface = larger contribution
...
}
col += glowColor * glow * 0.01;
```
### 2. Fractal + Post-Processing (Tone Mapping / FXAA)
3D fractals have rich high-frequency detail, prone to aliasing. Use ACES Tone Mapping + sRGB correction + FXAA post-processing.
```glsl
// ACES tone mapping
vec3 aces_approx(vec3 v) {
v = max(v, 0.0) * 0.6;
float a=2.51, b=0.03, c=2.43, d=0.59, e=0.14;
return clamp((v*(a*v+b))/(v*(c*v+d)+e), 0.0, 1.0);
}
col = aces_approx(col);
col = pow(col, vec3(1.0/2.4)); // sRGB gamma
```
### 3. Fractal + Transparent Refraction (Multi-Bounce Refraction)
Used for "crystal ball" effects on volumetric fractals like Mandelbulb. Uses negative distance fields for reverse ray marching inside, combined with Beer's law absorption.
```glsl
// Invert distance field for interior stepping
float dfactor = isInside ? -1.0 : 1.0;
float d = dfactor * map(ro + rd * t);
// Beer's law light absorption
ragg *= exp(-st * beer); // Beer's law: beer = per-channel absorption coefficient, st = path length inside the medium
// Refraction direction
vec3 refr = refract(rd, sn, isInside ? 1.0/ior : ior);
```
### 4. Fractal + Orbit Trap Texture Mapping
Orbit trap values can be mapped to HSV color space for rich coloring, or mapped as self-emission for glowing fractal effects.
```glsl
vec3 hsv2rgb(vec3 c) {
vec4 K = vec4(1.0, 2.0/3.0, 1.0/3.0, 3.0);
vec3 p = abs(fract(c.xxx + K.xyz) * 6.0 - K.www);
return c.z * mix(K.xxx, clamp(p - K.xxx, 0.0, 1.0), c.y);
}
// Map orbit trap to HSV
vec3 col = hsv2rgb(vec3(trap.x * 0.5, 0.9, 0.8));
```
### 5. Fractal + Soft Shadow
Perform an additional ray march from the fractal surface toward the light source, accumulating the minimum `h/t` ratio to generate soft shadows.
```glsl
float softshadow(vec3 ro, vec3 rd, float mint, float k) {
float res = 1.0;
float t = mint;
for (int i = 0; i < 64; i++) {
float h = map(ro + rd*t);
res = min(res, k * h / t); // Larger k = harder shadows
if (res < 0.001) break;
t += clamp(h, 0.01, 0.5);
}
return clamp(res, 0.0, 1.0);
}
```
@@ -0,0 +1,639 @@
# Lighting Models Detailed Reference
This document is a detailed supplementary reference to [SKILL.md](SKILL.md), covering prerequisite knowledge, in-depth explanations for each step, complete descriptions of variants, performance optimization analysis, and full code examples for combination suggestions.
---
## Prerequisites
### Vector Math Fundamentals
- **Dot product**: `dot(A, B) = |A||B|cos(θ)`, used to compute the angular relationship between two vectors. Lighting models heavily use dot products such as N·L, N·V, N·H, V·H
- **Cross product**: `cross(A, B)` returns a vector perpendicular to both A and B, used to build camera coordinate systems and tangent spaces
- **normalize**: Scales a vector to unit length; lighting calculations require all direction vectors to be normalized
- **reflect**: `reflect(I, N) = I - 2.0 * dot(N, I) * N`, computes the reflection of incident vector I about normal N
### GLSL Fundamentals
- **uniform / varying**: uniforms are global constants (e.g., iTime, iResolution); varyings are interpolated from vertex to fragment
- **Key built-in functions**:
- `clamp(x, min, max)` — clamp to range
- `mix(a, b, t)` — linear interpolation `a*(1-t) + b*t`
- `pow(base, exp)` — exponentiation, used for specular falloff
- `exp(x)` / `exp2(x)` — exponential functions, used for attenuation and Beer's Law
- `smoothstep(edge0, edge1, x)` — Hermite smooth interpolation
### Basic Computer Graphics Concepts
- **Normal (N)**: Unit vector pointing outward from the surface, determines lighting intensity
- **View Direction (V)**: Unit vector from the surface point toward the camera
- **Light Direction (L)**: Unit vector from the surface point toward the light source
- **Half Vector (H)**: `normalize(V + L)`, the core of the Blinn-Phong model
- **Reflect Vector (R)**: `reflect(-L, N)`, used in the classic Phong model
### Raymarching Basics (Recommended)
- **SDF (Signed Distance Function)**: Returns the signed distance from a point to the nearest surface
- **Normal computation (finite differences)**: Approximates the gradient (i.e., normal direction) by computing small-offset differences of the SDF along the x, y, and z axes
- **March**: Advances along the ray direction by the distance returned by the SDF until hitting a surface or exceeding the range
---
## Implementation Steps in Detail
### Step 1: Scene Foundation (UV, Camera, Raymarching)
**What**: Establish the standard ShaderToy framework — UV coordinates, camera ray, SDF scene, normal computation.
**Why**: Lighting calculations require normal N, view direction V, and light direction L as inputs, all of which depend on scene geometry. Without correct normals and direction vectors, no lighting model can work.
**Details**:
- UV coordinates are typically normalized as `(2.0 * fragCoord - iResolution.xy) / iResolution.y` to ensure correct aspect ratio
- The camera uses a look-at matrix: forward direction `ww`, right direction `uu`, up direction `vv`
- SDF normals use six-point central difference, which is more accurate than forward difference
- The epsilon value in `e = vec2(0.001, 0.0)` affects normal accuracy: too large blurs details, too small introduces noise
**Code**:
```glsl
// Compute normal from SDF scene (finite differences) — standard technique
vec3 calcNormal(vec3 p) {
vec2 e = vec2(0.001, 0.0);
return normalize(vec3(
map(p + e.xyy) - map(p - e.xyy),
map(p + e.yxy) - map(p - e.yxy),
map(p + e.yyx) - map(p - e.yyx)
));
}
// Prepare basic vectors needed for lighting
vec3 N = calcNormal(pos); // Surface normal
vec3 V = -rd; // View direction (reverse of ray)
vec3 L = normalize(lightPos - pos); // Light direction (point light)
// Or directional light: vec3 L = normalize(vec3(0.6, 0.8, -0.5));
```
### Step 2: Lambert Diffuse
**What**: Compute basic diffuse lighting — the foundation of all lighting models.
**Why**: Lambert's law describes the ideal diffuse behavior of rough surfaces — brightness is proportional to cos(angle of incidence). This is the most fundamental physically-based lighting model, assuming light enters the surface and is scattered uniformly.
**Details**:
- `max(0.0, dot(N, L))` uses `max(0,...)` to avoid negative values (backface lighting)
- Energy-conserving Lambertian diffuse requires dividing by PI, since Lambert BRDF = albedo/PI and the integrated irradiance = PI * L_incoming
- Half-Lambert (`NdotL * 0.5 + 0.5`) is a technique invented by Valve that maps [-1,1] to [0,1], giving backlit areas some brightness; commonly used for character rendering and SSS approximation
- Many ocean shaders use a similar wrapped diffuse pattern
**Code**:
```glsl
// Basic Lambert diffuse
float NdotL = max(0.0, dot(N, L));
vec3 diffuse = albedo * lightColor * NdotL;
// Energy-conserving version (albedo/PI)
vec3 diffuse_conserved = albedo / PI * lightColor * NdotL;
// Half-Lambert variant (wrapped dot product)
// Reduces over-darkening on backlit faces, commonly used for SSS approximation
float halfLambert = NdotL * 0.5 + 0.5;
vec3 diffuse_wrapped = albedo * lightColor * halfLambert;
```
### Step 3: Blinn-Phong Specular
**What**: Add specular highlights based on the half vector.
**Why**: Blinn-Phong is more computationally efficient and physically plausible than classic Phong. The half vector H is the average direction of V and L; the highlight is brightest when H aligns with N. Blinn-Phong also behaves more realistically at grazing angles compared to Phong.
**Details**:
- Half vector H = normalize(V + L), which avoids the reflect computation needed by Phong's reflect(-L, N)
- Shininess controls highlight concentration: 4.0 gives a very rough surface feel, 256.0 approaches a mirror
- The normalization factor `(shininess + 8.0) / (8.0 * PI)` ensures total reflected energy remains constant when changing shininess (energy conservation)
- Based on the standard half vector method used in many raymarching shaders
**Code**:
```glsl
// Blinn-Phong specular (standard half vector method)
vec3 H = normalize(V + L);
float NdotH = max(0.0, dot(N, H));
// Empirical model: directly use shininess exponent
float SHININESS = 32.0; // Adjustable: 4.0 (rough) ~ 256.0 (mirror-like)
float spec = pow(NdotH, SHININESS);
// With energy-conserving normalization factor
// Normalization factor (s+8)/(8*PI) ensures total energy is preserved when changing shininess
float normFactor = (SHININESS + 8.0) / (8.0 * PI);
float spec_normalized = normFactor * pow(NdotH, SHININESS);
vec3 specular = lightColor * spec_normalized;
```
### Step 4: Fresnel-Schlick Approximation
**What**: Compute reflectance based on viewing angle — reflectance increases at grazing angles ("edge brightening" effect).
**Why**: All real materials approach 100% reflectance at grazing angles. This is a fundamental physical phenomenon (the Fresnel effect). The Schlick approximation simulates it with a fifth-power curve and is a core component of every PBR pipeline.
**Details**:
- F0 is the reflectance at normal incidence (looking straight at the surface)
- Dielectrics (plastic, water, etc.): F0 is approximately 0.02~0.04; most light is scattered (diffuse)
- Metals: F0 uses the material's baseColor, since metals have virtually no diffuse reflection
- `mix(vec3(0.04), baseColor, metallic)` is the unified metallic workflow, interpolating between dielectrics and metals
- Using V·H for the Cook-Torrance BRDF specular term
- Using N·V for environment reflections, rim lighting, etc.
- A widely used approximation in both real-time and offline rendering pipelines.
**Code**:
```glsl
// Fresnel-Schlick approximation (standard formulation)
vec3 fresnelSchlick(vec3 F0, float cosTheta) {
return F0 + (1.0 - F0) * pow(1.0 - cosTheta, 5.0);
}
// Dielectrics (plastic, water, etc.): F0 approximately 0.02~0.04
vec3 F0_dielectric = vec3(0.04);
// Metals: F0 uses the material's baseColor
vec3 F0_metal = baseColor;
// Unified metallic workflow
vec3 F0 = mix(vec3(0.04), baseColor, metallic);
// Compute Fresnel using V·H (for specular BRDF)
float VdotH = max(0.0, dot(V, H));
vec3 F = fresnelSchlick(F0, VdotH);
// Alternatively, compute Fresnel using N·V (for environment reflections, rim light)
// Optional: scale by a gloss factor, e.g. pow(gloss, 20.0), to fade the rim on rough surfaces
float NdotV = max(0.0, dot(N, V));
vec3 F_env = F0 + (1.0 - F0) * pow(1.0 - NdotV, 5.0);
```
### Step 5: GGX Normal Distribution Function (D Term)
**What**: Compute the probability distribution of microfacet normals aligning with the half vector.
**Why**: The GGX (Trowbridge-Reitz) distribution has a wider "long tail" highlight, closer to real materials than the Beckmann distribution. It is the core term in PBR pipelines that determines highlight shape and size.
**Details**:
- Roughness must be squared first (`a = roughness * roughness`); this is Disney's mapping from perceptual roughness to alpha
- `a2 = a * a` is the alpha^2 term in the GGX formula
- When roughness = 0.0, D approaches a delta function (perfect mirror); when roughness = 1.0, it approaches a uniform distribution
- The denominator `PI * denom * denom` ensures the distribution function integrates to 1 over the hemisphere
- The standard GGX formula used across PBR implementations
**Code**:
```glsl
// GGX/Trowbridge-Reitz normal distribution function (standard formulation)
float distributionGGX(float NdotH, float roughness) {
float a = roughness * roughness; // Note: roughness must be squared first!
float a2 = a * a;
float denom = NdotH * NdotH * (a2 - 1.0) + 1.0;
return a2 / (PI * denom * denom);
}
// Roughness parameter guide:
// roughness = 0.0 → perfect mirror (D approaches delta function)
// roughness = 0.5 → medium roughness
// roughness = 1.0 → fully rough (D approaches uniform distribution)
```
### Step 6: Geometric Occlusion Function (G Term)
**What**: Compute the mutual shadowing and masking between microfacets.
**Why**: Not all correctly-oriented microfacets can be "seen" by both the light and the view simultaneously — the G term corrects for this occlusion loss. The microfacet model assumes the surface is composed of countless tiny flat surfaces that can occlude each other (shadowing and masking).
**Details**:
- The Smith method decomposes G into two independent terms for the light direction (G1_L) and view direction (G1_V)
- **Schlick-GGX**: `k = (roughness+1)^2 / 8` for direct lighting, `k = roughness^2 / 2` for IBL
- **Height-Correlated Smith**: More physically accurate, accounts for height correlation of microfacets; directly returns the visibility term `G/(4*NdotV*NdotL)`
- **Simplified approximation** (G1V): Most compact implementation, suitable for code golf or extremely performance-constrained scenarios
- Three common implementations with different accuracy/performance tradeoffs
**Code**:
```glsl
// Smith method: decompose G into two independent G1 terms for light and view directions
// Method 1: Schlick-GGX (separated implementation)
// The clearest pedagogical implementation
float geometrySchlickGGX(float NdotV, float roughness) {
float r = roughness + 1.0;
float k = (r * r) / 8.0; // For direct lighting: k = (r+1)^2/8
return NdotV / (NdotV * (1.0 - k) + k);
}
float geometrySmith(float NdotV, float NdotL, float roughness) {
float ggx1 = geometrySchlickGGX(NdotV, roughness);
float ggx2 = geometrySchlickGGX(NdotL, roughness);
return ggx1 * ggx2;
}
// Method 2: Height-Correlated Smith (visibility term form)
// More physically accurate, directly returns G/(4*NdotV*NdotL), i.e., the "visibility term"
float visibilitySmith(float NdotV, float NdotL, float roughness) {
float a2 = roughness * roughness;
float gv = NdotL * sqrt(NdotV * (NdotV - NdotV * a2) + a2);
float gl = NdotV * sqrt(NdotL * (NdotL - NdotL * a2) + a2);
return 0.5 / max(gv + gl, 0.00001);
}
// Method 3: Simplified approximation (compact G1V helper)
// Most compact implementation
float G1V(float dotNV, float k) {
return 1.0 / (dotNV * (1.0 - k) + k);
}
// Usage: float vis = G1V(NdotL, k) * G1V(NdotV, k); where k = roughness/2
```
### Step 7: Assembling the Cook-Torrance BRDF
**What**: Combine the D, F, and G terms into a complete specular reflection BRDF.
**Why**: The Cook-Torrance microfacet model is currently the most widely used physically-based specular reflection model in real-time rendering. It is based on microfacet theory, modeling the surface as countless tiny perfect mirrors.
**Details**:
- Full formula: `f_specular = D * F * G / (4 * NdotV * NdotL)`
- When using `visibilitySmith` (which returns `G/(4*NdotV*NdotL)`), there is no need to manually divide by the denominator
- When using the standard `geometrySmith` (which returns G), you must explicitly divide by `4 * NdotV * NdotL`
- `max(4.0 * NdotV * NdotL, 0.001)` prevents division by zero
- Based on the standard Cook-Torrance BRDF formulation
**Code**:
```glsl
// Complete Cook-Torrance BRDF assembly
vec3 cookTorranceBRDF(vec3 N, vec3 V, vec3 L, float roughness, vec3 F0) {
vec3 H = normalize(V + L);
float NdotL = max(0.0, dot(N, L));
float NdotV = max(0.0, dot(N, V));
float NdotH = max(0.0, dot(N, H));
float VdotH = max(0.0, dot(V, H));
// D: Normal distribution
float D = distributionGGX(NdotH, roughness);
// F: Fresnel
vec3 F = fresnelSchlick(F0, VdotH);
// G: Geometric occlusion (using visibility term form, which includes the 4*NdotV*NdotL denominator)
float Vis = visibilitySmith(NdotV, NdotL, roughness);
// Assembly (Vis version already divides by 4*NdotV*NdotL)
vec3 specular = D * F * Vis;
// Or using the standard G term form:
// float G = geometrySmith(NdotV, NdotL, roughness);
// vec3 specular = (D * F * G) / max(4.0 * NdotV * NdotL, 0.001);
return specular * NdotL;
}
```
### Step 8: Multi-Light Accumulation and Final Compositing
**What**: Blend diffuse and specular reflections with energy conservation, and accumulate contributions from multiple lights.
**Why**: Real scenes contain multiple light sources (sun, sky, ground bounce, etc.). Energy conservation must be maintained between diffuse and specular: energy that has been reflected (F) should not participate in diffuse reflection.
**Details**:
- `kD = (1.0 - F) * (1.0 - metallic)` implements energy conservation:
- `(1.0 - F)` ensures already-reflected light does not participate in diffuse
- `(1.0 - metallic)` ensures metals have no diffuse (metals' free electrons absorb all refracted light)
- Sky light uses `0.5 + 0.5 * N.y` to approximate hemisphere integration — the more upward the normal, the brighter
- Back/rim light uses wrapped diffuse from the opposite direction of the sun to provide fill lighting
- Based on multi-light architecture patterns common in PBR raymarching shaders
**Code**:
```glsl
// Complete multi-light PBR lighting accumulation
vec3 shade(vec3 pos, vec3 N, vec3 V, vec3 albedo, float roughness, float metallic) {
vec3 F0 = mix(vec3(0.04), albedo, metallic);
vec3 diffuseColor = albedo * (1.0 - metallic); // Metals have no diffuse
vec3 color = vec3(0.0);
// --- Main light (sun) ---
vec3 sunDir = normalize(vec3(0.6, 0.8, -0.5));
vec3 sunColor = vec3(1.0, 0.95, 0.85) * 2.0;
vec3 H = normalize(V + sunDir);
float NdotL = max(0.0, dot(N, sunDir));
float NdotV = max(0.0, dot(N, V));
float VdotH = max(0.0, dot(V, H));
vec3 F = fresnelSchlick(F0, VdotH);
vec3 kD = (1.0 - F) * (1.0 - metallic); // Energy conservation
// Diffuse contribution
color += kD * diffuseColor / PI * sunColor * NdotL;
// Specular contribution
color += cookTorranceBRDF(N, V, sunDir, roughness, F0) * sunColor;
// --- Sky light (hemisphere light approximation) ---
vec3 skyColor = vec3(0.2, 0.5, 1.0) * 0.3;
float skyDiffuse = 0.5 + 0.5 * N.y; // Simple hemisphere integration approximation
color += diffuseColor * skyColor * skyDiffuse;
// --- Back light / rim (fill) light ---
vec3 backDir = normalize(vec3(-sunDir.x, 0.0, -sunDir.z));
float backDiffuse = clamp(dot(N, backDir) * 0.5 + 0.5, 0.0, 1.0);
color += diffuseColor * vec3(0.25, 0.15, 0.1) * backDiffuse;
return color;
}
```
### Step 9: Ambient Occlusion (AO)
**What**: Approximate the reduction of indirect lighting in surface crevices due to geometric occlusion.
**Why**: Scenes without AO appear overly "flat" and lack spatial depth. In raymarching scenes, the SDF can be used to efficiently compute AO — sample several points along the normal direction and compare the SDF distance with the ideal distance.
**Details**:
- Principle: Step gradually away from the surface along the normal, querying the SDF value at each sample point. If the SDF value is less than the sample distance h, nearby occluding geometry is present
- `sca *= 0.95` gradually decreases the weight of farther sample points
- The multiplier in `3.0 * occ` controls AO intensity (adjustable)
- AO affects both diffuse and specular, but in different ways:
- Diffuse: multiply directly by the AO value
- Specular: use `pow(NdotV + ao, roughness^2) - 1 + ao` for more subtle attenuation
- Based on the standard SDF ambient occlusion technique
**Code**:
```glsl
// AO computation for raymarching scenes (standard SDF-based technique)
float calcAO(vec3 pos, vec3 nor) {
float occ = 0.0;
float sca = 1.0;
for (int i = 0; i < 5; i++) {
float h = 0.01 + 0.12 * float(i) / 4.0;
float d = map(pos + h * nor);
occ += (h - d) * sca;
sca *= 0.95;
}
return clamp(1.0 - 3.0 * occ, 0.0, 1.0);
}
// Using AO (AO affects both diffuse and specular)
float ao = calcAO(pos, N);
diffuseLight *= ao;
// More subtle specular AO:
specularLight *= clamp(pow(NdotV + ao, roughness * roughness) - 1.0 + ao, 0.0, 1.0);
```
---
## Variant Details
### Variant 1: Classic Phong (Non-PBR)
**Difference from base version**: Uses the reflection vector `R = reflect(-L, N)` instead of the half vector; no D/F/G decomposition.
**Use cases**: Quick prototyping, retro-style rendering, performance-constrained scenarios. The Phong model has the lowest computational cost but does not satisfy energy conservation, and highlights disappear at grazing angles (the opposite of real materials).
**Key code**:
```glsl
// Classic Phong reflection model
vec3 R = reflect(-L, N);
float spec = pow(max(0.0, dot(R, V)), 32.0);
vec3 color = albedo * lightColor * NdotL // diffuse
+ lightColor * spec; // specular
```
### Variant 2: Point Light Attenuation
**Difference from base version**: Adds distance attenuation, suitable for point light / spotlight scenarios. The base version assumes directional light (sun), while point light intensity decreases with distance.
**Use cases**: Indoor scenes, multiple point lights, close-range light effects.
**Details**:
- Physically correct attenuation should be `1/distance²`, but in practice `1/(1 + k1*d + k2*d²)` avoids infinite brightness at close range
- k1 (linear attenuation): 0.01~0.5, k2 (quadratic attenuation): 0.001~0.1
- Alternatively, use physical attenuation with a maximum intensity cap: `min(1.0/(d*d), maxIntensity)`
**Key code**:
```glsl
// Point light attenuation (standard pattern)
float dist = length(lightPos - pos);
float attenuation = 1.0 / (1.0 + dist * 0.1 + dist * dist * 0.01);
// k1: linear attenuation coefficient (adjustable 0.01~0.5)
// k2: quadratic attenuation coefficient (adjustable 0.001~0.1)
color *= attenuation;
```
### Variant 3: IBL (Image-Based Lighting)
**Difference from base version**: Uses environment maps instead of analytic light sources, split into diffuse SH (spherical harmonics) and specular split-sum parts.
**Use cases**: Scenes requiring realistic environmental lighting reflections. IBL can capture complex lighting environments (e.g., HDRI panoramas), producing very natural lighting effects.
**Details**:
- Diffuse IBL uses spherical harmonics (SH) to precompute the low-frequency component of environmental lighting
- Specular IBL uses Epic Games' split-sum approximation: splits the BRDF integral into environment map LOD lookup + precomputed BRDF integration lookup table
- `EnvBRDFApprox` is Unreal Engine 4's approximation, avoiding the need for a precomputed LUT texture
- `textureLod(envMap, R, roughness * 7.0)` uses mipmap levels to simulate blurred reflections on rough surfaces
- Based on the SH + EnvBRDFApprox method common in PBR pipelines
**Key code**:
```glsl
// IBL approximation (SH + EnvBRDFApprox method)
// Specular IBL: Unreal's EnvBRDFApprox approximation
vec3 EnvBRDFApprox(vec3 specColor, float roughness, float NdotV) {
    vec4 c0 = vec4(-1, -0.0275, -0.572, 0.022);
    vec4 c1 = vec4(1, 0.0425, 1.04, -0.04);
    vec4 r = roughness * c0 + c1;
    float a004 = min(r.x * r.x, exp2(-9.28 * NdotV)) * r.x + r.y;
    vec2 AB = vec2(-1.04, 1.04) * a004 + r.zw;
    return specColor * AB.x + AB.y;
}
// Diffuse IBL: spherical harmonics
vec3 diffuseIBL = diffuseColor * SHIrradiance(N);
// Specular IBL: roughness selects the environment mip level
vec3 R = reflect(-V, N);
vec3 envColor = textureLod(envMap, R, roughness * 7.0).rgb;
vec3 specularIBL = EnvBRDFApprox(F0, roughness, NdotV) * envColor;
```
### Variant 4: Subsurface Scattering Approximation (SSS)
**Difference from base version**: Simulates light passing through translucent materials (e.g., skin, wax, water surfaces).
**Use cases**: Water surfaces, skin, candles, leaves, and other translucent materials. SSS makes thin parts appear brighter and more translucent.
**Details**:
- **Method 1 (SDF probing)**: Probes the SDF value along the light direction into the material interior. If the SDF value is much smaller than the probe distance, the material is thicker at that point and transmits less light; otherwise it transmits more
- **Method 2 (Henyey-Greenstein phase function)**: Describes the directional distribution of light scattering in a medium. Parameter g controls forward/backward scattering: g > 0 for forward scattering (e.g., skin), g < 0 for backward scattering
- Combines SDF-based interior probing with Henyey-Greenstein phase function
**Key code**:
```glsl
// SSS approximation: two common methods
// Method 1: SDF-based interior probing
float subsurface(vec3 pos, vec3 L) {
float sss = 0.0;
for (int i = 0; i < 5; i++) {
float h = 0.05 + float(i) * 0.1;
float d = map(pos + L * h); // Probe along light direction into interior
sss += max(0.0, h - d); // Thinner areas transmit more light
}
return clamp(1.0 - sss * 4.0, 0.0, 1.0);
}
// Method 2: Henyey-Greenstein phase function
float HenyeyGreenstein(float cosTheta, float g) {
float g2 = g * g;
return (1.0 - g2) / (pow(1.0 + g2 - 2.0 * g * cosTheta, 1.5) * 4.0 * PI);
}
float sssAmount = HenyeyGreenstein(dot(V, L), 0.5);
color += sssColor * sssAmount * NdotL;
```
### Variant 5: Beer's Law Water Lighting
**Difference from base version**: Simulates the exponential attenuation of light in water/transparent media.
**Use cases**: Water surfaces, underwater scenes, glass, juice, and other transparent/translucent media. The Beer-Lambert law describes the exponential decay of light intensity as it travels through a medium.
**Details**:
- `exp2(-opticalDepth * extinctColor)` implements wavelength-dependent exponential attenuation
- Different color channels have different attenuation coefficients, producing the characteristic color of water (blue/green transmits the most)
- In `extinctColor = 1.0 - vec3(0.5, 0.4, 0.1)`, the vec3 controls the absorption rate per channel
- Inscattering simulates multiple scattering of light inside the water body, giving deep water its inherent color
- `1.0 - exp(-depth * 0.1)` is a simplified inscattering model
- Based on the Beer-Lambert law for wavelength-dependent attenuation
**Key code**:
```glsl
// Beer's Law light attenuation
vec3 waterExtinction(float depth) {
float opticalDepth = depth * 6.0; // Adjustable: controls attenuation rate
vec3 extinctColor = 1.0 - vec3(0.5, 0.4, 0.1); // Adjustable: water absorption color
return exp2(-opticalDepth * extinctColor);
}
// Usage: underwater object color multiplied by attenuation
vec3 underwaterColor = objectColor * waterExtinction(depth);
// Add water inscattering
vec3 inscatter = waterDiffuse * (1.0 - exp(-depth * 0.1));
underwaterColor += inscatter;
```
---
## Performance Optimization In-Depth Analysis
### 1. Avoiding the Cost of pow(x, 5.0)
The `pow` function on some GPUs is implemented as `exp2(5.0 * log2(x))`, involving two transcendental functions. Manually unrolling into a multiplication chain is more efficient:
```glsl
// Efficient implementation of Schlick Fresnel
float x = 1.0 - cosTheta;
float x2 = x * x;
float x5 = x2 * x2 * x; // Faster than pow(x, 5.0)
vec3 F = F0 + (1.0 - F0) * x5;
```
### 2. Merging G and the Denominator (Visibility Term)
Using a visibility-term function such as `visibilitySmith` from Step 6 to directly return `G / (4 * NdotV * NdotL)` avoids computing G separately and then dividing. This not only eliminates one division but also avoids numerical instability when `4 * NdotV * NdotL` is near zero. The Height-Correlated Smith version is also more physically accurate.
### 3. AO Sample Count
- 5 samples are sufficient for most scenes
- Distant objects can use as few as 3 (since details are not visible)
- The upper bound of sample step h (`0.12 * i / 4.0`) controls the AO influence range: increasing it detects larger-scale occlusion but requires more samples
- The decay rate `sca *= 0.95` is also adjustable: smaller values make AO more concentrated near the surface
### 4. Soft Shadow Optimization
- Using `clamp(h, 0.02, 0.2)` to limit step size: minimum step 0.02 prevents getting stuck near the surface, maximum step 0.2 prevents skipping thin geometry
- Shadow ray maxSteps can be lower than the primary ray (14~24 steps is usually enough), since shadows don't need precise hit points
- The 8.0 in `8.0 * h / t` controls shadow softness: higher values produce harder shadows, lower values softer ones. This is an intuitive penumbra size control
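The parameters discussed above fit together as follows. A hedged sketch of the standard SDF soft-shadow loop, assuming the same `map(p)` scene function used elsewhere:

```glsl
// SDF soft shadow: track the narrowest SDF "cone" along the shadow ray
float softShadow(vec3 ro, vec3 rd, float tMin, float tMax) {
    float res = 1.0;
    float t = tMin;
    for (int i = 0; i < 24; i++) {      // fewer steps than the primary ray
        float h = map(ro + rd * t);
        res = min(res, 8.0 * h / t);    // 8.0: higher = harder shadow edge
        t += clamp(h, 0.02, 0.2);       // clamped step: no stalls, no skipped thin geometry
        if (res < 0.001 || t > tMax) break;
    }
    return clamp(res, 0.0, 1.0);
}
```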
### 5. Simplified IBL
- Without a cubemap, use a simple sky color gradient as a substitute for environment mapping
- `mix(groundColor, skyColor, R.y * 0.5 + 0.5)` is the cheapest "environment reflection"
- A `pow(max(0, dot(R, sunDir)), 64.0)` in the sun direction can be added to simulate the sun's specular reflection
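Put together, the cheap environment described above looks like this (a sketch; the specific colors are placeholder values):

```glsl
// Cheapest "environment reflection": vertical gradient plus a sun glint
vec3 cheapEnvReflection(vec3 R, vec3 sunDir) {
    vec3 groundColor = vec3(0.25, 0.2, 0.15);
    vec3 skyColor    = vec3(0.4, 0.6, 1.0);
    // Vertical gradient stands in for a cubemap lookup
    vec3 env = mix(groundColor, skyColor, clamp(R.y * 0.5 + 0.5, 0.0, 1.0));
    // Sharp specular lobe toward the sun simulates its reflection
    env += vec3(1.0, 0.95, 0.85) * pow(max(0.0, dot(R, sunDir)), 64.0);
    return env;
}
```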
### 6. Branch Culling
When NdotL <= 0, the surface faces away from the light source, and all specular calculations (D, F, G) can be skipped:
```glsl
// Skip entire specular computation when NdotL <= 0
if (NdotL > 0.0) {
// ... D, F, G computation ...
}
```
Note: Branch efficiency on GPUs depends on the coherence of pixels within the same warp/wavefront. If large areas face away from the light, this branch is effective; if the branch condition switches frequently between adjacent pixels, it may actually be slower.
---
## Combination Suggestions in Detail
### Lighting + Raymarching
Raymarching scenes are the most common host for lighting models. Normals are obtained via SDF finite differences, and AO and shadows directly leverage SDF queries.
Key integration points:
- `calcNormal` provides normal N
- `calcAO` leverages SDF for ambient occlusion
- `softShadow` leverages SDF for soft shadows
- Material IDs can be passed through the return value of the `map` function
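The last bullet (material IDs through the return value of `map`) is commonly handled by returning a vec2. A minimal sketch; note that with this signature, helpers like `calcNormal` read `map(p).x` instead of `map(p)`:

```glsl
// map returns (distance, materialID); the caller keeps the nearest hit
vec2 map(vec3 p) {
    vec2 sphere = vec2(length(p) - 1.0, 1.0);       // ID 1: sphere
    vec2 ground = vec2(p.y + 1.0, 2.0);             // ID 2: ground plane
    return (sphere.x < ground.x) ? sphere : ground; // nearest surface wins
}
// At the hit point, select material parameters by ID
float matID = map(pos).y;
vec3 albedo = (matID < 1.5) ? vec3(0.9, 0.2, 0.2)   // sphere: red
                            : vec3(0.3);            // ground: gray
```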
### Lighting + Volumetric Rendering
Volumetric effects like clouds, smoke, and fog require Beer's Law attenuation and phase functions (e.g., Henyey-Greenstein). PBR surface lighting integrates naturally with volumetric cloud lighting.
Key integration points:
- Volumetric rendering uses ray marching to step through the volume
- Each step accumulates density and applies Beer's Law attenuation
- Lighting uses the Henyey-Greenstein phase function instead of a BRDF
- The final result is alpha-blended with the surface rendering output
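The four bullets above can be sketched as a single loop. `fogDensity` is a hypothetical density field, and `HenyeyGreenstein` is the phase function defined in Variant 4; the step size and extinction constants are placeholder tunables:

```glsl
// Minimal volumetric march: accumulate in-scattering, attenuate with Beer's Law
vec3 marchVolume(vec3 ro, vec3 rd, vec3 L, vec3 bgColor) {
    vec3 col = vec3(0.0);
    float transmittance = 1.0;
    float t = 0.0;
    for (int i = 0; i < 64; i++) {
        vec3 p = ro + rd * t;
        float density = fogDensity(p);          // hypothetical density field
        if (density > 0.0) {
            float absorb = exp(-density * 0.3); // Beer's Law over this step
            // Phase function replaces the BRDF for in-scattering
            vec3 scatter = vec3(1.0) * HenyeyGreenstein(dot(rd, L), 0.5) * density;
            col += transmittance * scatter * (1.0 - absorb);
            transmittance *= absorb;
        }
        t += 0.1;
        if (transmittance < 0.01) break;        // effectively opaque: early out
    }
    return col + transmittance * bgColor;       // blend with the surface result
}
```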
### Lighting + Normal Maps / Procedural Normals
Normals don't have to come from the SDF. Procedural normals generated by FBM noise (e.g., ocean wave normals, water surface normals) can be passed directly to lighting functions, producing rich surface detail.
Key integration points:
- Procedural normals work by perturbing the base normal: `N = normalize(N + perturbation)`
- FBM noise frequency and amplitude control the coarseness and strength of detail
- SDF normals and procedural normals can be combined for macro shape + micro detail
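A sketch of the perturbation pattern described above. `fbm` is a hypothetical noise helper; its gradient is taken with finite differences, just like SDF normals:

```glsl
// Perturb the geometric normal with the gradient of an FBM height field
vec3 perturbNormal(vec3 N, vec3 p, float strength) {
    vec2 e = vec2(0.01, 0.0);
    // Finite-difference gradient of the noise field (fbm: hypothetical helper)
    vec3 grad = vec3(fbm(p + e.xyy) - fbm(p - e.xyy),
                     fbm(p + e.yxy) - fbm(p - e.yxy),
                     fbm(p + e.yyx) - fbm(p - e.yyx)) / (2.0 * e.x);
    // Keep only the component tangent to the surface, then blend it in
    grad -= N * dot(N, grad);
    return normalize(N - grad * strength);
}
```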
### Lighting + Post-Processing
Tone mapping and gamma correction are essential parts of a PBR pipeline. HDR lighting values must be mapped to the [0,1] LDR range for correct display:
```glsl
// ACES — currently the most popular tone mapping
col = (col * (2.51 * col + 0.03)) / (col * (2.43 * col + 0.59) + 0.14);
// Reinhard — simplest tone mapping
col = col / (col + 1.0);
// Gamma correction — convert from linear space to sRGB
col = pow(col, vec3(1.0 / 2.2));
```
Note: All lighting calculations must be performed in linear space; gamma correction is only applied at final output.
### Lighting + Reflections
Multi-layer reflections or environment reflections query the scene again in the `reflect(rd, N)` direction, blending the reflected color into the final result weighted by Fresnel.
```glsl
// Basic reflection pattern
vec3 R = reflect(rd, N);
vec3 reflColor = traceScene(pos + N * 0.01, R); // Offset to avoid self-intersection
vec3 F = fresnelSchlick(F0, NdotV);
color = mix(color, reflColor, F);
```
A common water surface rendering approach combines refraction + reflection + Fresnel blending:
- Reflection direction `reflect(rd, N)` queries the sky/scene
- Refraction direction `refract(rd, N, 1.0/1.33)` queries the underwater scene
- Fresnel coefficient blends between reflection and refraction
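The three bullets combine into the classic water compositing pattern. A hedged sketch; `traceSky` and `traceUnderwater` are hypothetical scene queries standing in for whatever the shader uses to shade each direction:

```glsl
// Water surface: Fresnel-weighted blend of reflection and refraction
vec3 shadeWater(vec3 rd, vec3 N, vec3 pos) {
    vec3 R = reflect(rd, N);
    vec3 T = refract(rd, N, 1.0 / 1.33);                 // air to water
    vec3 reflColor = traceSky(pos + N * 0.01, R);        // hypothetical query
    vec3 refrColor = traceUnderwater(pos - N * 0.01, T); // hypothetical query
    // Schlick Fresnel with water F0 of about 0.02
    float NdotV = max(0.0, dot(N, -rd));
    float F = 0.02 + 0.98 * pow(1.0 - NdotV, 5.0);
    return mix(refrColor, reflColor, F);
}
```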

# Matrix Transforms & Camera — Detailed Reference
This document is the complete detailed version of [SKILL.md](SKILL.md), covering step-by-step tutorials, mathematical derivations, detailed explanations, and advanced usage.
## Prerequisites
- **Vector Fundamentals**: Meaning of `vec2/vec3/vec4`, dot product `dot()`, cross product `cross()`, `normalize()`
- **Matrix Fundamentals**: Column-major storage of `mat2/mat3/mat4` in GLSL, semantics of matrix multiplication `m * v`
- **Coordinate Systems**: NDC (Normalized Device Coordinates), screen-space to world-space mapping, aspect ratio correction
- **Trigonometry**: Relationship between `sin()`/`cos()` and rotation
- **ShaderToy Built-in Variables**: `iResolution`, `iTime`, `iMouse`, `fragCoord`
## Core Principles
The essence of matrix transforms is **coordinate system transformation**. In ShaderToy's ray marching pipeline, transformation matrices serve two key roles:
1. **Camera Matrix**: Converts screen pixel coordinates to ray directions in world space (view-to-world)
2. **Object Transform Matrix**: Converts sampling points from world space to the object's local space (world-to-local, i.e., "domain transform")
### Key Mathematical Formulas
**2D Rotation Matrix** (rotation by angle θ around the origin):
```
R(θ) = | cos θ -sin θ |
| sin θ cos θ |
```
**3D Single-Axis Rotation** (rotation around Y axis as example):
```
Ry(θ) = | cos θ 0 sin θ |
| 0 1 0 |
| -sin θ 0 cos θ |
```
**Rodrigues' Rotation Formula** (rotation by angle θ around arbitrary axis **k**):
```
R = cos θ · I + (1 - cos θ) · k⊗k + sin θ · K
```
where K is the skew-symmetric matrix of axis vector k.
**LookAt Camera** (looking from eye toward target):
```
forward = normalize(target - eye)
right = normalize(cross(forward, worldUp))
up = cross(right, forward)
viewMatrix = mat3(right, up, forward)
```
**Perspective Ray Generation**:
```
rayDir = normalize(camMatrix * vec3(uv, focalLength))
```
where `uv` is the aspect-ratio-corrected screen coordinate, and `focalLength` controls the field of view (larger values produce smaller FOV).
## Implementation Steps
### Step 1: Screen Coordinate Normalization and Aspect Ratio Correction
**What**: Convert pixel coordinates `fragCoord` to normalized UV coordinates centered at the screen center, with Y-axis pointing up and correct aspect ratio.
**Why**: All subsequent ray generation depends on correctly normalized screen coordinates. Without aspect ratio correction, circles would become ellipses.
**Code**:
```glsl
// Method A: range [-aspect, aspect] x [-1, 1] (most common)
vec2 uv = (2.0 * fragCoord - iResolution.xy) / iResolution.y;
// Method B: step-by-step approach (equivalent)
vec2 uv = fragCoord / iResolution.xy * 2.0 - 1.0;
uv.x *= iResolution.x / iResolution.y;
```
### Step 2: Building Rotation Matrices
**What**: Choose the appropriate rotation matrix construction method based on requirements.
**Why**: Rotation is the core of all 3D transforms. Different scenarios suit different rotation representations.
**Method A: 2D Rotation (mat2)**
The simplest form, commonly used for two-plane rotations in camera orbits:
```glsl
mat2 rot2D(float a) {
float c = cos(a), s = sin(a);
return mat2(c, s, -s, c); // Note GLSL column-major order
}
```
**Method B: 3D Single-Axis Rotation (mat3)**
Separate X/Y/Z axis rotation functions that can be freely combined:
```glsl
mat3 rotX(float a) {
float s = sin(a), c = cos(a);
return mat3(1, 0, 0, 0, c, s, 0, -s, c);
}
mat3 rotY(float a) {
float s = sin(a), c = cos(a);
return mat3(c, 0, s, 0, 1, 0, -s, 0, c);
}
mat3 rotZ(float a) {
float s = sin(a), c = cos(a);
return mat3(c, s, 0, -s, c, 0, 0, 0, 1);
}
```
**Method C: Euler Angles to mat3**
Build a complete rotation matrix from three angles (yaw/pitch/roll) in one step:
```glsl
mat3 fromEuler(vec3 ang) {
vec2 a1 = vec2(sin(ang.x), cos(ang.x));
vec2 a2 = vec2(sin(ang.y), cos(ang.y));
vec2 a3 = vec2(sin(ang.z), cos(ang.z));
mat3 m;
m[0] = vec3( a1.y*a3.y + a1.x*a2.x*a3.x,
a1.y*a2.x*a3.x + a3.y*a1.x,
-a2.y*a3.x);
m[1] = vec3(-a2.y*a1.x, a1.y*a2.y, a2.x);
m[2] = vec3( a3.y*a1.x*a2.x + a1.y*a3.x,
a1.x*a3.x - a1.y*a3.y*a2.x,
a2.y*a3.y);
return m;
}
```
**Method D: Rodrigues Arbitrary-Axis Rotation (mat3)**
Rotation around any normalized axis, based on Rodrigues' formula:
```glsl
mat3 rotationMatrix(vec3 axis, float angle) {
axis = normalize(axis);
float s = sin(angle);
float c = cos(angle);
float oc = 1.0 - c;
return mat3(
oc*axis.x*axis.x + c, oc*axis.x*axis.y - axis.z*s, oc*axis.z*axis.x + axis.y*s,
oc*axis.x*axis.y + axis.z*s, oc*axis.y*axis.y + c, oc*axis.y*axis.z - axis.x*s,
oc*axis.z*axis.x - axis.y*s, oc*axis.y*axis.z + axis.x*s, oc*axis.z*axis.z + c
);
}
```
### Step 3: Building a LookAt Camera
**What**: Construct a view-to-world matrix from the camera position (eye) and look-at target (target).
**Why**: LookAt is the most intuitive camera definition — just specify "where to stand" and "where to look", and the matrix automatically computes three orthogonal basis vectors.
**Classic setCamera (mat3)**:
```glsl
// cr = camera roll, usually pass 0.0
// Returns mat3 that transforms local ray direction to world space
mat3 setCamera(in vec3 ro, in vec3 ta, float cr) {
vec3 cw = normalize(ta - ro); // forward
vec3 cp = vec3(sin(cr), cos(cr), 0.0); // world up with roll
vec3 cu = normalize(cross(cw, cp)); // right
vec3 cv = normalize(cross(cu, cw)); // up
return mat3(cu, cv, cw);
}
```
**Gram-Schmidt Orthogonalization Version (mat3)**:
Projects out the component of camUp along camDir to ensure strict orthogonality:
```glsl
vec3 camDir = normalize(target - camPos);
vec3 camUp = normalize(camUp - dot(camDir, camUp) * camDir); // Gram-Schmidt
vec3 camRight = normalize(cross(camDir, camUp));
```
**mat4 LookAt (with translation)**:
Returns a 4x4 matrix with the camera world position stored in the 4th column. Suitable for scenarios requiring homogeneous coordinates:
```glsl
mat4 LookAt(vec3 pos, vec3 target, vec3 up) {
vec3 dir = normalize(target - pos);
vec3 x = normalize(cross(dir, up));
vec3 y = cross(x, dir);
return mat4(vec4(x, 0), vec4(y, 0), vec4(dir, 0), vec4(pos, 1));
}
```
### Step 4: Generating Perspective Rays
**What**: Transform normalized screen coordinates through the camera matrix into world-space ray directions.
**Why**: Perspective projection produces foreshortening (nearby objects appear larger, distant ones smaller) by appending a fixed Z component, the focal length, to the UV coordinates. A larger focal length means a narrower FOV.
**Method A: mat3 Camera + normalize**:
```glsl
// focalLength controls FOV: 1.0 ≈ 90°, 2.0 ≈ 53°, 4.0 ≈ 28°
#define FOCAL_LENGTH 2.0 // Adjustable: focal length, larger = narrower FOV
mat3 cam = setCamera(ro, ta, 0.0);
vec3 rd = cam * normalize(vec3(uv, FOCAL_LENGTH));
```
**Method B: Manual Basis Vector Combination**:
```glsl
// FieldOfView controls ray divergence
#define FOV 1.0 // Adjustable: field of view scale factor
vec3 rd = normalize(camDir + (uv.x * camRight + uv.y * camUp) * FOV);
```
**Method C: mat4 Camera + Homogeneous Coordinates**:
```glsl
// Direction vectors use w=0, positions use w=1
mat4 viewToWorld = LookAt(camPos, camTarget, camUp);
vec3 rd = (viewToWorld * normalize(vec4(uv, 1.0, 0.0))).xyz;
```
### Step 5: Mouse-Interactive Camera
**What**: Map `iMouse` input to camera orbit angles.
**Why**: An interactive camera is a fundamental need for debugging and showcasing 3D shaders. Mapping mouse X to horizontal rotation and Y to pitch angle is the most universal pattern.
**Spherical Coordinate Orbit Camera**:
```glsl
#define CAM_DIST 5.0 // Adjustable: camera-to-origin distance
#define CAM_HEIGHT 1.0 // Adjustable: default height offset
vec2 mouse = iMouse.xy / iResolution.xy;
float angleH = mouse.x * 6.2832; // Horizontal: 0 ~ 2π
float angleV = mouse.y * 3.1416 - 1.5708; // Vertical: -π/2 ~ π/2
// Use auto-rotation when mouse is not clicked
if (iMouse.z <= 0.0) {
angleH = iTime * 0.5;
angleV = 0.3;
}
vec3 ro = vec3(
CAM_DIST * cos(angleH) * cos(angleV),
CAM_DIST * sin(angleV) + CAM_HEIGHT,
CAM_DIST * sin(angleH) * cos(angleV)
);
vec3 ta = vec3(0.0, 0.0, 0.0); // Look-at target
```
**Euler Angle Driven Camera**:
```glsl
vec3 ang = vec3(0.0, 0.2, iTime * 0.3); // Default animation
if (iMouse.z > 0.0) {
ang = vec3(0.0, clamp(2.0 - iMouse.y * 0.01, 0.0, 3.1416), iMouse.x * 0.01);
}
mat3 rot = fromEuler(ang);
vec3 ori = vec3(0.0, 0.0, 2.8) * rot;
vec3 dir = normalize(vec3(uv, -2.0)) * rot;
```
### Step 6: SDF Object Domain Transforms (Translation, Rotation, Scaling)
**What**: In the ray marching distance function, apply inverse transforms to sampling points to achieve object translation/rotation/scaling.
**Why**: The SDF domain transform principle is "transform the space, not the object" — inversely transforming the sampling point into the object's local coordinate system to evaluate distance is equivalent to transforming the object itself.
**Basic Transforms**:
```glsl
// ===== Translation: offset the sampling point =====
float sdTranslated = sdSphere(p - vec3(2.0, 0.0, 0.0), 1.0);
// ===== Rotation: transform sampling point with rotation matrix =====
// Applying rotY(a) to p rotates the object by -a; since the inverse of an
// orthogonal (rotation) matrix is its transpose, use transpose(rotY(a)) * p
// (or simply rotY(-a) * p) when the object should rotate by +a
float sdRotated = sdBox(rotY(0.5) * p, vec3(1.0));
// ===== Scaling: divide by scale factor, multiply back into distance =====
#define SCALE 2.0 // Adjustable: object scale factor
float sdScaled = sdSphere(p / SCALE, 1.0) * SCALE;
```
**SRT Combination (Scale → Rotate → Translate)**:
mat4 version, using opTx for domain transform:
```glsl
mat4 Loc4(vec3 d) {
d *= -1.0;
return mat4(1,0,0,d.x, 0,1,0,d.y, 0,0,1,d.z, 0,0,0,1);
}
mat4 transposeM4(in mat4 m) {
return mat4(
vec4(m[0].x, m[1].x, m[2].x, m[3].x),
vec4(m[0].y, m[1].y, m[2].y, m[3].y),
vec4(m[0].z, m[1].z, m[2].z, m[3].z),
vec4(m[0].w, m[1].w, m[2].w, m[3].w)
);
}
vec3 opTx(vec3 p, mat4 m) {
return (transposeM4(m) * vec4(p, 1.0)).xyz;
}
// Usage example: translate to (3,0,0), then rotate 45° around Y axis
// (Rot4Y is the mat4 analogue of rotY; its definition appears in Variant 4)
mat4 xform = Rot4Y(0.785) * Loc4(vec3(3.0, 0.0, 0.0));
float d = sdBox(opTx(p, xform), vec3(1.0));
```
### Step 7: Quaternion Rotation (Advanced)
**What**: Use quaternions for rotation around arbitrary axes, suitable for joint animation and other scenarios requiring frequent rotation composition.
**Why**: Quaternions avoid gimbal lock, and interpolation (slerp) is more natural than matrices. The double cross product formula `p + 2·cross(q.xyz, cross(q.xyz, p) + q.w·p)` is the most computationally efficient quaternion rotation implementation.
```glsl
// Axis-angle → quaternion
vec4 axisAngleToQuat(vec3 axis, float angleDeg) {
float half_angle = angleDeg * 3.14159265 / 360.0; // degrees to half-radians
vec2 sc = sin(vec2(half_angle, half_angle + 1.5707963));
return vec4(normalize(axis) * sc.x, sc.y);
}
// Quaternion rotation (double cross product form)
vec3 quatRotate(vec3 pos, vec3 axis, float angleDeg) {
vec4 q = axisAngleToQuat(axis, angleDeg);
return pos + 2.0 * cross(q.xyz, cross(q.xyz, pos) + q.w * pos);
}
// Usage example: hierarchical rotation in joint animation
vec3 limbPos = quatRotate(p - shoulderOffset, vec3(1,0,0), swingAngle);
float d = sdEllipsoid(limbPos, limbSize);
```
## Variant Details
### Variant 1: Orthographic Projection Camera
**Difference from basic version**: Ray direction is fixed (parallel rays); different pixel sampling is achieved by changing the ray origin position. Suitable for 2D-style rendering, engineering drawings, isometric views.
**Key modified code**:
```glsl
// Replace the perspective ray generation section
#define ORTHO_SIZE 5.0 // Adjustable: orthographic view size
mat3 cam = setCamera(ro, ta, 0.0);
// Orthographic: offset origin, fixed direction
vec3 rd = cam * vec3(0.0, 0.0, 1.0); // Fixed direction
ro += cam * vec3(uv * ORTHO_SIZE, 0.0); // Offset origin
```
### Variant 2: Full Euler Angle Rotation Camera
**Difference from basic version**: Does not use LookAt; instead builds the rotation matrix directly from three Euler angles. Suitable for first-person perspective or scenarios requiring roll.
**Key modified code**:
```glsl
mat3 fromEuler(vec3 ang) {
vec2 a1 = vec2(sin(ang.x), cos(ang.x));
vec2 a2 = vec2(sin(ang.y), cos(ang.y));
vec2 a3 = vec2(sin(ang.z), cos(ang.z));
mat3 m;
m[0] = vec3(a1.y*a3.y+a1.x*a2.x*a3.x, a1.y*a2.x*a3.x+a3.y*a1.x, -a2.y*a3.x);
m[1] = vec3(-a2.y*a1.x, a1.y*a2.y, a2.x);
m[2] = vec3(a3.y*a1.x*a2.x+a1.y*a3.x, a1.x*a3.x-a1.y*a3.y*a2.x, a2.y*a3.y);
return m;
}
// In mainImage:
vec3 ang = vec3(pitch, yaw, roll);
mat3 rot = fromEuler(ang);
vec3 ori = vec3(0.0, 0.0, 3.0) * rot;
vec3 rd = normalize(vec3(uv, -2.0)) * rot;
```
### Variant 3: Quaternion Joint Rotation
**Difference from basic version**: Uses quaternions instead of matrices for rotation in domain transforms, suitable for hierarchical joint animation (multi-limbed biological systems).
**Key modified code**:
```glsl
vec4 axisAngleToQuat(vec3 axis, float angleDeg) {
float ha = angleDeg * 3.14159265 / 360.0;
vec2 sc = sin(vec2(ha, ha + 1.5707963));
return vec4(normalize(axis) * sc.x, sc.y);
}
vec3 quatRotate(vec3 p, vec3 axis, float angleDeg) {
vec4 q = axisAngleToQuat(axis, angleDeg);
return p + 2.0 * cross(q.xyz, cross(q.xyz, p) + q.w * p);
}
// Usage in scene:
vec3 legP = quatRotate(p - hipOffset, vec3(1,0,0), legAngle);
float dLeg = sdEllipsoid(legP, vec3(0.2, 0.6, 0.25));
```
### Variant 4: mat4 SRT Pipeline (Full 4x4 Transform)
**Difference from basic version**: Uses `mat4` homogeneous coordinates to combine scale-rotate-translate into a single matrix, applying `opTx()` domain transform to sampling points. Suitable for complex scenes requiring management of many object transforms.
**Key modified code**:
```glsl
mat4 Rot4Y(float a) {
float c = cos(a), s = sin(a);
return mat4(c,0,s,0, 0,1,0,0, -s,0,c,0, 0,0,0,1);
}
mat4 Loc4(vec3 d) {
d *= -1.0;
return mat4(1,0,0,d.x, 0,1,0,d.y, 0,0,1,d.z, 0,0,0,1);
}
mat4 transposeM4(mat4 m) {
return mat4(
vec4(m[0].x,m[1].x,m[2].x,m[3].x),
vec4(m[0].y,m[1].y,m[2].y,m[3].y),
vec4(m[0].z,m[1].z,m[2].z,m[3].z),
vec4(m[0].w,m[1].w,m[2].w,m[3].w));
}
vec3 opTx(vec3 p, mat4 m) {
return (transposeM4(m) * vec4(p, 1.0)).xyz;
}
// Usage: translate then rotate (note matrix multiplication order is right-to-left)
mat4 xform = Rot4Y(angle) * Loc4(vec3(3.0, 0.0, 0.0));
float d = sdBox(opTx(p, xform), boxSize);
```
### Variant 5: Path Camera (Animated Flight)
**Difference from basic version**: The camera moves along a predefined path (e.g., tunnel, racetrack), using `LookAt` to track a forward target point. Common in tunnel-type shaders.
**Key modified code**:
```glsl
// Path function (can be replaced with any curve)
vec2 pathCenter(float z) {
return vec2(sin(z * 0.17) * 3.0, sin(z * 0.1 + 4.0) * 2.0);
}
// In mainImage:
float z_offset = iTime * 10.0; // Speed
vec3 camPos = vec3(pathCenter(z_offset), 0.0);
vec3 camTarget = vec3(pathCenter(z_offset + 5.0), 5.0);
vec3 camUp = vec3(sin(iTime * 0.3), cos(iTime * 0.3), 0.0);
mat4 viewToWorld = LookAt(camPos, camTarget, camUp);
vec3 rd = (viewToWorld * normalize(vec4(uv, 1.0, 0.0))).xyz;
```
## Performance Optimization Details
### 1. Precompute Trigonometric Functions
Compute `sin/cos` of the same angle only once, store in `vec2`:
```glsl
// Bad: cos and sin each evaluated twice
mat2(cos(a), sin(a), -sin(a), cos(a));
// Good: one sin over a vec2 yields both values, since sin(a + π/2) = cos(a)
vec2 sc = sin(vec2(a, a + 1.5707963)); // (sin(a), cos(a))
mat2(sc.y, sc.x, -sc.x, sc.y);
```
### 2. Prefer mat3 Over mat4
If translation is not needed (pure rotation), always use `mat3` instead of `mat4`. `mat3*vec3` requires 7 fewer multiply-add operations than `mat4*vec4`.
### 3. Inverse of Rotation Matrix = Transpose
Orthogonal rotation matrix R satisfies `R⁻¹ = Rᵀ`. When the inverse transform is needed, directly use `transpose(m)` or swap the multiplication order `v * m` (equivalent to `transpose(m) * v`), avoiding general matrix inversion.
### 4. Avoid Rebuilding Matrices Inside the SDF
If the rotation angle does not depend on the sampling point `p`, move matrix construction outside the `map()` function or cache it in a global variable:
```glsl
// Bad: rebuild matrix on every map() call
float map(vec3 p) {
mat3 r = rotY(iTime); // Recomputed per pixel × per step
return sdBox(r * p, vec3(1.0));
}
// Good: precompute in mainImage
mat3 g_rot; // Global
void mainImage(...) {
g_rot = rotY(iTime); // Computed only once
// ... rayMarch ...
}
float map(vec3 p) {
return sdBox(g_rot * p, vec3(1.0));
}
```
### 5. Merge Consecutive Rotations
The product of multiple rotation matrices is still a rotation matrix. Pre-multiply and store as a single matrix:
```glsl
// Bad: two matrix multiplications per sample
p = rotX(a) * (rotY(b) * p);
// Good: pre-multiply
mat3 combined = rotX(a) * rotY(b);
p = combined * p;
```
## Combination Suggestions
### Combining with Ray Marching / SDF (Most Common)
Matrix transforms are almost always used together with SDF ray marching. The camera matrix generates rays, and domain transform matrices place objects. This is the foundational pipeline for all 3D ShaderToy shaders.
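A minimal sketch of that pipeline, assuming the `setCamera` from Step 3 and a user-supplied `map()` distance function (the coloring and constants are illustrative):

```glsl
// Camera matrix → ray generation → sphere tracing skeleton
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
    vec2 uv = (2.0 * fragCoord - iResolution.xy) / iResolution.y;
    vec3 ro = vec3(3.0 * cos(iTime * 0.3), 1.5, 3.0 * sin(iTime * 0.3));
    mat3 cam = setCamera(ro, vec3(0.0), 0.0);   // camera matrix generates rays
    vec3 rd = cam * normalize(vec3(uv, 2.0));
    float t = 0.0;
    for (int i = 0; i < 100; i++) {             // sphere tracing
        float d = map(ro + rd * t);             // map() applies the domain transforms
        if (d < 0.001 || t > 20.0) break;
        t += d;
    }
    fragColor = vec4(vec3(t < 20.0 ? 1.0 - t * 0.05 : 0.0), 1.0);
}
```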
### Combining with Noise / fBm
Use rotation matrices to apply domain warping to noise sampling coordinates, breaking axis-aligned regularity:
```glsl
mat3 rot = rotationMatrix(vec3(0.0, 0.0, 1.0), 0.5 * iTime); // arbitrary-axis rotation from Method D
float n = fbm(rot * p); // Rotate noise sampling direction
```
Using time-varying rotation matrices makes water surface noise look more natural.
### Combining with Fractals / IFS
Add rotation transforms within each iteration of a fractal to create more complex geometric patterns:
```glsl
for (int i = 0; i < Iterations; i++) {
z.xy = rot2D(angle) * z.xy; // Rotate each iteration
z = abs(z);
z = Scale * z - Offset * (Scale - 1.0);
}
```
Embedding `mat2` rotation within IFS iterations produces more complex fractal geometry.
### Combining with Lighting / Materials
After normal computation, transform matrices can be used to convert normals from local space back to world space (for lighting calculations). For pure rotation matrices, the normal transform is identical to the vertex transform.
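A sketch of that identity in use, assuming the `rotY` from Step 2 and a hypothetical `calcNormal` finite-difference helper:

```glsl
// For a pure rotation R, the normal matrix inverse-transpose(R) equals R,
// so the normal uses the same matrix as the sample point
mat3 objRot = rotY(iTime);            // object's rotation
vec3 pLocal = transpose(objRot) * p;  // inverse transform into local space for the SDF
vec3 nLocal = calcNormal(pLocal);     // finite-difference normal in local space
vec3 nWorld = objRot * nLocal;        // same rotation brings the normal to world space
float diff = max(dot(nWorld, lightDir), 0.0);
```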
### Combining with Post-Processing
Camera parameters (such as FOV) can be used for depth of field calculations; `mat2` rotation can be used for screen-space chromatic aberration or motion blur direction.
# Multi-Pass Buffer Techniques — Detailed Reference
This document is a detailed supplement to [SKILL.md](SKILL.md), covering prerequisites, in-depth explanations of each step, complete variant descriptions, performance optimization analysis, and full combination code examples.
## Prerequisites
### GLSL Fundamentals
- GLSL basic syntax: `uniform`, `varying`, `sampler2D`
- ShaderToy execution model: `iChannel0-3` texture inputs, `iResolution`, `iTime`, `iFrame`, `iMouse`
- Difference between `texture()` and `texelFetch()`:
- `texture()` performs interpolated sampling (bilinear filtering), suitable for continuous field sampling
- `texelFetch()` reads a specific texel exactly, without interpolation, suitable for data storage reads
- `textureLod()` is used for explicit MIP level sampling, avoiding the blur caused by automatic MIP selection
- Buffer A/B/C/D concept in ShaderToy: each buffer is an independent render pass that outputs to a corresponding texture, which can be read by other passes or itself via iChannel
### Basic Math
- Basic vector math and matrix transforms
- Finite difference method: using neighboring pixels to approximate gradients and the Laplacian operator
- Iterative mapping: the concept of `x(n+1) = f(x(n))`, the mathematical basis for self-feedback
## Implementation Steps
### Step 1: Establish a Minimal Self-Feedback Loop
**What**: Create a Buffer that reads its own previous frame output, adds new content, and outputs the result. The Image pass simply displays the Buffer result.
**Why**: This is the cornerstone of all multi-pass techniques. Once you understand self-feedback loops, fluid simulation, temporal accumulation, etc. are all extensions of this foundation. An initialization guard (`iFrame == 0` or `iFrame < N`) prevents reading uninitialized data.
**iChannel Binding**: Buffer A's iChannel0 → Buffer A (self-feedback); Image's iChannel0 → Buffer A
**Key Points**:
- `exp(-33.0 / iResolution.y)` is the per-frame retention factor; a larger constant in the exponent pushes the factor toward 0, producing faster decay
- The `fragCoord + vec2(1.0, sin(iTime))` offset creates motion effects
- The `iFrame < 4` guard ensures stable initial values for the first few frames
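A minimal Buffer A sketch combining the three key points above (the injected dot and its constants are illustrative, not canonical):

```glsl
// Buffer A — iChannel0 bound to Buffer A itself (self-feedback)
// Image pass simply displays Buffer A
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
    vec2 uv = fragCoord / iResolution.xy;
    // Read own previous frame, slightly offset to create motion
    vec2 prevCoord = fragCoord + vec2(1.0, sin(iTime));
    vec4 prev = texture(iChannel0, prevCoord / iResolution.xy);
    // Exponential decay of the accumulated trail
    prev *= exp(-33.0 / iResolution.y);
    // Inject new content: a bright dot circling the center
    vec2 dotPos = 0.5 + 0.3 * vec2(cos(iTime), sin(iTime));
    float inject = smoothstep(0.02, 0.0, length(uv - dotPos));
    vec4 col = prev + vec4(inject);
    // Initialization guard: stable values for the first frames
    if (iFrame < 4) col = vec4(0.0);
    fragColor = col;
}
```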
### Step 2: Implement Self-Advection
**What**: Building on self-feedback, interpret the buffer values as a velocity field and implement self-advection — each pixel offsets its sampling position based on the local velocity.
**Why**: Self-advection is the core of all Eulerian grid fluid simulations. By accumulating rotational information across multiple scales through rotational sampling, rich vortex structures can be produced without a complete Navier-Stokes solver.
**Parameter Tuning**:
- `ROT_NUM` (rotation sample count): Affects the sampling accuracy of the rotation field; 5 is a good balance
- `SCALE_NUM` (number of scale levels): Affects the detail level of vortices; 20 levels produce rich multi-scale structures
- `bbMax = 0.7 * iResolution.y`: Adaptive loop termination threshold
**Mathematical Principles**:
- The `getRot` function samples the velocity field at ROT_NUM equally spaced angular directions around a given position
- Computes the rotational component via `dot(velocity - 0.5, perpendicular)`
- The multi-scale loop `b *= 2.0` progressively enlarges the sampling radius, capturing vortices at different scales
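A sketch of `getRot` and the multi-scale loop following the description above (parameter names follow the text; the exact displacement formula is an assumption):

```glsl
#define ROT_NUM 5
#define SCALE_NUM 20
// Average rotational component of the velocity field around pos at radius |b|
float getRot(vec2 pos, vec2 b) {
    float a = 0.0;
    for (int i = 0; i < ROT_NUM; i++) {
        float ang = 6.2832 * float(i) / float(ROT_NUM);
        vec2 dir = vec2(cos(ang), sin(ang));
        vec2 v = texture(iChannel0, fract((pos + dir * length(b)) / iResolution.xy)).xy;
        a += dot(v - 0.5, vec2(-dir.y, dir.x)); // projection onto the perpendicular
    }
    return a / float(ROT_NUM);
}
// In mainImage: accumulate rotation over scales, then advect
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
    vec2 b = vec2(1.0, 0.0);
    vec2 shift = vec2(0.0);
    float bbMax = 0.7 * iResolution.y; bbMax *= bbMax;
    for (int l = 0; l < SCALE_NUM; l++) {
        if (dot(b, b) > bbMax) break;       // beyond screen range, stop
        shift += vec2(-b.y, b.x) * getRot(fragCoord, b);
        b *= 2.0;                           // enlarge sampling radius per scale
    }
    fragColor = texture(iChannel0, fract((fragCoord + shift) / iResolution.xy));
}
```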
### Step 3: Navier-Stokes Fluid Solver
**What**: Implement velocity field solving based on the paper "Simple and fast fluids" (Guay, Colin, Egli, 2011), including advection, viscous forces, and vorticity confinement.
**Why**: More physically accurate than pure rotational self-advection, supporting low-viscosity fluid simulation (e.g., smoke, fire). Vorticity is stored in the alpha channel to avoid extra buffer overhead.
**Complete `solveFluid` Function Breakdown**:
```glsl
vec4 solveFluid(sampler2D smp, vec2 uv, vec2 w, float time, vec3 mouse, vec3 lastMouse) {
const float K = 0.2; // Pressure coefficient: controls the strength of the incompressibility constraint
const float v = 0.55; // Viscosity coefficient: high value = viscous fluid, low value = thin fluid
// Read four neighboring pixels (basis for central differencing)
vec4 data = textureLod(smp, uv, 0.0);
vec4 tr = textureLod(smp, uv + vec2(w.x, 0), 0.0);
vec4 tl = textureLod(smp, uv - vec2(w.x, 0), 0.0);
vec4 tu = textureLod(smp, uv + vec2(0, w.y), 0.0);
vec4 td = textureLod(smp, uv - vec2(0, w.y), 0.0);
// Density and velocity gradients (central differencing)
vec3 dx = (tr.xyz - tl.xyz) * 0.5; // x-direction gradient
vec3 dy = (tu.xyz - td.xyz) * 0.5; // y-direction gradient
vec2 densDif = vec2(dx.z, dy.z); // Density gradient
// Density update: continuity equation ∂ρ/∂t + ∇·(ρv) = 0
data.z -= DT * dot(vec3(densDif, dx.x + dy.y), data.xyz);
// Viscous force (Laplacian operator): μ∇²v
// Discrete Laplacian = up + down + left + right - 4*center
vec2 laplacian = tu.xy + td.xy + tr.xy + tl.xy - 4.0 * data.xy;
vec2 viscForce = vec2(v) * laplacian;
// Advection: Semi-Lagrangian backtrace method
// Trace backward from the current position along the reverse velocity direction, sample previous step's value
data.xyw = textureLod(smp, uv - DT * data.xy * w, 0.0).xyw;
// External forces (mouse interaction)
vec2 newForce = vec2(0);
if (mouse.z > 1.0 && lastMouse.z > 1.0) {
// Mouse movement velocity as force direction
vec2 vv = clamp((mouse.xy * w - lastMouse.xy * w) * 400.0, -6.0, 6.0);
// Force magnitude inversely proportional to distance from mouse (similar to a point charge field)
newForce += 0.001 / (dot(uv - mouse.xy * w, uv - mouse.xy * w) + 0.001) * vv;
}
// Velocity update: v += dt * (viscous force - pressure gradient + external forces)
data.xy += DT * (viscForce - K / DT * densDif + newForce);
// Linear decay: simulates energy dissipation
data.xy = max(vec2(0), abs(data.xy) - 1e-4) * sign(data.xy);
// Vorticity Confinement
// Compute curl = ∂vy/∂x - ∂vx/∂y
data.w = (tr.y - tl.y - tu.x + td.x);
// Vorticity gradient direction
vec2 vort = vec2(abs(tu.w) - abs(td.w), abs(tl.w) - abs(tr.w));
// Normalize then multiply by vorticity value to produce a force that enhances vortices
vort *= VORTICITY_AMOUNT / length(vort + 1e-9) * data.w;
data.xy += vort;
// Top/bottom boundaries: soft decay to avoid hard edges
data.y *= smoothstep(0.5, 0.48, abs(uv.y - 0.5));
// Numerical stability: clamp extreme values
data = clamp(data, vec4(vec2(-10), 0.5, -10.0), vec4(vec2(10), 3.0, 10.0));
return data;
}
```
**RGBA Channel Packing Strategy**:
- `xy` = velocity components (vx, vy)
- `z` = density
- `w` = vorticity (curl)
A single vec4 carries the complete fluid state without needing extra buffers.
### Step 4: Chained Buffers for Accelerated Simulation
**What**: Execute the same simulation code in a chain through Buffer A → B → C, completing multiple simulation sub-steps per frame.
**Why**: Each ShaderToy buffer executes only once per frame. By chaining identical code (A reads itself → B reads A → C reads B), three iterations are completed in a single frame, significantly increasing simulation speed without adding buffer count. Use the Common tab to avoid code duplication.
**iChannel Binding**:
- Buffer A: iChannel0 → Buffer C (reads previous frame's final result)
- Buffer B: iChannel0 → Buffer A (reads current frame's first step result)
- Buffer C: iChannel0 → Buffer B (reads current frame's second step result)
**Mouse State Inter-Frame Transfer**:
- `if (fragCoord.y < 1.0) data = iMouse;` writes the current frame's mouse state into the first row of pixels
- `texelFetch(iChannel0, ivec2(0, 0), 0)` reads the previous frame's mouse state in the next frame
- The delta between two frames' mouse positions gives mouse velocity, used to calculate the direction and magnitude of applied forces
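The two transfer lines in context, as a sketch of one chained buffer's body (it reuses the `solveFluid` signature from Step 3; the `iFrame < 20` guard is explained under Initialization Strategy):

```glsl
// Buffer A/B/C share this code via the Common tab; only the iChannel0
// binding differs (A→C, B→A, C→B as listed above)
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
    vec2 w = 1.0 / iResolution.xy;
    // Previous pass's mouse state, stored in the first row
    vec4 lastMouse = texelFetch(iChannel0, ivec2(0, 0), 0);
    vec4 data = solveFluid(iChannel0, fragCoord * w, w, iTime,
                           iMouse.xyz, lastMouse.xyz);
    if (iFrame < 20) data = vec4(0.5, 0.0, 0.0, 0.0); // init guard
    // Write this frame's mouse into the first row for the next pass
    if (fragCoord.y < 1.0) data = iMouse;
    fragColor = data;
}
```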
### Step 5: Separable Gaussian Blur Pipeline
**What**: Use two Buffers to implement horizontal and vertical separable Gaussian blur.
**Why**: A 2D Gaussian kernel can be separated into the product of two 1D kernels. An NxN kernel drops from N² samples to 2N. This is the standard implementation for Bloom, the diffusion term in reaction-diffusion, and various post-processing blurs.
**iChannel Binding**: Buffer B: iChannel0 → Buffer A (source); Buffer C: iChannel0 → Buffer B (horizontal blur result)
**Vertical blur complete code** (horizontal version in SKILL.md; vertical version symmetrically replaces the y-axis):
```glsl
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
vec2 pixelSize = 1.0 / iResolution.xy;
vec2 uv = fragCoord * pixelSize;
float v = pixelSize.y;
vec4 sum = vec4(0.0);
sum += texture(iChannel0, fract(vec2(uv.x, uv.y - 4.0*v))) * 0.05;
sum += texture(iChannel0, fract(vec2(uv.x, uv.y - 3.0*v))) * 0.09;
sum += texture(iChannel0, fract(vec2(uv.x, uv.y - 2.0*v))) * 0.12;
sum += texture(iChannel0, fract(vec2(uv.x, uv.y - 1.0*v))) * 0.15;
sum += texture(iChannel0, fract(vec2(uv.x, uv.y ))) * 0.16;
sum += texture(iChannel0, fract(vec2(uv.x, uv.y + 1.0*v))) * 0.15;
sum += texture(iChannel0, fract(vec2(uv.x, uv.y + 2.0*v))) * 0.12;
sum += texture(iChannel0, fract(vec2(uv.x, uv.y + 3.0*v))) * 0.09;
sum += texture(iChannel0, fract(vec2(uv.x, uv.y + 4.0*v))) * 0.05;
fragColor = vec4(sum.xyz / 0.98, 1.0);
}
```
**9-tap Weight Explanation**:
- Weights [0.05, 0.09, 0.12, 0.15, 0.16, 0.15, 0.12, 0.09, 0.05] approximate a Gaussian distribution with sigma≈2.0
- Total sum is 0.98, divided by 0.98 for normalization
- `fract()` implements wrap addressing
### Step 6: Structured State Storage (Texel-Addressed Registers)
**What**: Use specific pixels in a Buffer as named registers to store non-image data (positions, velocities, scores, etc.).
**Why**: GPUs have no global variables. By assigning semantic meaning to specific texel positions, arbitrary structured state can be persisted in a buffer. This enables complete game logic, particle system state, etc. to be implemented in shaders.
**Design Pattern Details**:
1. **Address Constants**: Use `const ivec2` to define the texel address for each state variable
2. **Load Function**: `texelFetch(iChannel0, addr, 0)` for exact reads (no interpolation)
3. **Store Function**: Use conditional assignment `fragColor = (px == addr) ? val : fragColor`, ensuring each pixel only writes data belonging to its own address
4. **Region Storage**: `ivec4 rect` defines rectangular regions for grid-like data (e.g., brick matrices)
5. **Discard outside data region**: `if (fragCoord.x > 14.0 || fragCoord.y > 14.0) discard;` skips unnecessary computation
**Notes**:
- `ivec2(fragCoord - 0.5)` ensures correct integer texel coordinates (fragCoord's center offset)
- Initialization must set all state values when `iFrame == 0`
- Default behavior `fragColor = loadValue(px)` keeps unmodified state unchanged
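The pattern can be sketched as follows (the register addresses and ball state are illustrative, not from a specific shader):

```glsl
// Buffer A — iChannel0 bound to Buffer A itself
const ivec2 ADDR_BALL_POS = ivec2(0, 0);
const ivec2 ADDR_BALL_VEL = ivec2(1, 0);

vec4 loadValue(ivec2 addr) {
    return texelFetch(iChannel0, addr, 0);      // exact, uninterpolated read
}
void storeValue(ivec2 addr, vec4 val, inout vec4 fragColor, ivec2 px) {
    fragColor = (px == addr) ? val : fragColor; // each pixel writes only its own slot
}
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
    if (fragCoord.x > 14.0 || fragCoord.y > 14.0) discard; // outside data region
    ivec2 px = ivec2(fragCoord - 0.5);
    fragColor = loadValue(px);                  // default: keep state unchanged
    vec4 pos = loadValue(ADDR_BALL_POS);
    vec4 vel = loadValue(ADDR_BALL_VEL);
    pos.xy += vel.xy * (1.0 / 60.0);            // advance simulation
    if (iFrame == 0) {                          // initialization of all state
        pos = vec4(0.5, 0.5, 0.0, 0.0);
        vel = vec4(0.3, 0.2, 0.0, 0.0);
    }
    storeValue(ADDR_BALL_POS, pos, fragColor, px);
    storeValue(ADDR_BALL_VEL, vel, fragColor, px);
}
```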
### Step 7: Inter-Frame Mouse State Tracking
**What**: Store the mouse position in specific pixels of a Buffer, and compute mouse movement delta by reading the previous frame's value.
**Why**: ShaderToy does not directly provide mouse velocity. Storing the current frame's `iMouse` in a fixed pixel allows calculating the delta in the next frame. This is critical for fluid interaction — mouse velocity is needed to apply forces.
**Comparison of Two Methods**:
| Feature | Method 1 (First Row Pixel) | Method 2 (Fixed UV Region) |
|---------|---------------------------|---------------------------|
| Source | Chimera's Breath | Reaction-Diffusion |
| Storage Location | `fragCoord.y < 1.0` | Fixed UV coordinate |
| Read Method | `texelFetch(ch, ivec2(0,0), 0)` | `texture(ch, vec2(7.5/8, 2.5/8))` |
| Advantage | Simple, suitable for fluids | Resolution-independent |
| Disadvantage | Occupies the first row of pixels | Requires extra buffer channel |
## Variant Details
### Variant 1: Temporal Accumulation Anti-Aliasing (TAA)
**Difference from basic version**: The Buffer does not perform physics simulation, but instead renders a jittered image and blends it with history frames to achieve supersampling. Uses YCoCg color space neighborhood clamping to prevent ghosting.
**How It Works**:
1. Buffer A renders the scene with sub-pixel level random jitter
2. New frames are blended with history frames at a 10:90 ratio, accumulating supersampling over time
3. The TAA buffer performs YCoCg neighborhood clamping: constraining the history frame color to the statistical range of the current frame's 3x3 neighborhood
4. A 0.75 sigma clamping range balances ghost removal and detail preservation
**Complete TAA Flow**:
```
Buffer A (render+jitter) → Buffer B (motion vectors, optional) → Buffer C (TAA blend) → Image
```
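The clamp-and-blend stage can be sketched as below, assuming iChannel0 is the jittered render and iChannel1 is this buffer's own previous frame; the YCoCg conversion is the standard one, the rest is a simplified sketch:

```glsl
vec3 rgb2ycocg(vec3 c) {
    return vec3( 0.25*c.r + 0.5*c.g + 0.25*c.b,
                 0.5 *c.r           - 0.5 *c.b,
                -0.25*c.r + 0.5*c.g - 0.25*c.b);
}
vec3 ycocg2rgb(vec3 c) {
    return vec3(c.x + c.y - c.z, c.x + c.z, c.x - c.y - c.z);
}
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
    vec2 uv = fragCoord / iResolution.xy;
    // Mean and variance of the current frame's 3x3 neighborhood (YCoCg)
    vec3 m1 = vec3(0.0), m2 = vec3(0.0);
    for (int y = -1; y <= 1; y++)
    for (int x = -1; x <= 1; x++) {
        vec3 c = rgb2ycocg(texelFetch(iChannel0, ivec2(fragCoord) + ivec2(x, y), 0).rgb);
        m1 += c; m2 += c * c;
    }
    vec3 mu = m1 / 9.0;
    vec3 sigma = sqrt(max(m2 / 9.0 - mu * mu, vec3(0.0)));
    // Clamp history into mu ± 0.75*sigma, then blend new:history = 10:90
    vec3 hist = rgb2ycocg(texture(iChannel1, uv).rgb);
    hist = clamp(hist, mu - 0.75 * sigma, mu + 0.75 * sigma);
    vec3 cur = rgb2ycocg(texelFetch(iChannel0, ivec2(fragCoord), 0).rgb);
    fragColor = vec4(ycocg2rgb(mix(hist, cur, 0.1)), 1.0);
}
```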
### Variant 2: Deferred Rendering G-Buffer Pipeline
**Difference from basic version**: Buffers do not use self-feedback, but instead process in stages within a single frame: geometry → edge detection → post-processing.
**G-Buffer Encoding Scheme**:
- `col.xy`: View-space normal xy components (multiplied by camMat to convert to screen space)
- `col.z`: Linear depth (normalized to [0,1])
- `col.w`: Diffuse lighting + shadow information
**Edge Detection Principle**:
- The `checkSame` function compares normal and depth differences between adjacent pixels
- `Sensitivity.x` controls normal edge sensitivity
- `Sensitivity.y` controls depth edge sensitivity
- Threshold 0.1 determines the edge detection criterion
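A sketch of `checkSame` following that description (the channel layout matches the G-Buffer encoding above; the sensitivity values are illustrative):

```glsl
#define SENSITIVITY vec2(1.0, 10.0) // x: normal sensitivity, y: depth sensitivity
// Returns 1.0 when two G-Buffer samples belong to the same surface, 0.0 at an edge
float checkSame(vec4 center, vec4 samp) {
    vec2  diffNormal = abs(samp.xy - center.xy) * SENSITIVITY.x; // normal.xy in rg
    float diffDepth  = abs(samp.z  - center.z)  * SENSITIVITY.y; // linear depth in b
    bool isSame = (diffNormal.x + diffNormal.y + diffDepth) < 0.1;
    return isSame ? 1.0 : 0.0;
}
```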
### Variant 3: HDR Bloom Post-Processing Pipeline
**Difference from basic version**: Uses Buffers to build a MIP pyramid, achieving wide-range glow through multiple levels of downsampling and blur.
**MIP Pyramid Packing Strategy**:
- All MIP levels are packed into a single texture
- `CalcOffset` computes the offset position of each level within the texture
- Each level is half the size, with padding to prevent inter-level leakage
**Complete Bloom Pipeline**:
```
Buffer A (scene render) → Buffer B (MIP pyramid) → Buffer C (horizontal blur) → Buffer D (vertical blur) → Image (compositing)
```
**Tone Mapping**:
```glsl
// Reinhard tone mapping
color = pow(color, vec3(1.5)); // Gamma preprocessing
color = color / (1.0 + color); // Reinhard compression
```
### Variant 4: Reaction-Diffusion System
**Difference from basic version**: Simulates chemical reaction-diffusion (e.g., Gray-Scott model). Diffusion is implemented via separable blur, and the reaction term is computed in the main buffer.
**Gray-Scott Equations**:
- `∂u/∂t = Du∇²u - uv² + F(1-u)` — Diffusion and reaction of chemical substance u
- `∂v/∂t = Dv∇²v + uv² - (F+k)v` — Diffusion and reaction of chemical substance v
- `Du`, `Dv` are diffusion coefficients, `F` is the feed rate, `k` is the kill rate
**Implementation Strategy**:
- The diffusion term is implemented via separable blur buffers (reusing the blur pipeline from Step 5)
- The reaction term is computed in the main buffer
- The offset of `uv_red` implements diffusion expansion
- Random noise decay prevents pattern stagnation
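For reference, a single-buffer Gray-Scott update can be sketched directly with a 5-point Laplacian (instead of the blur-buffer diffusion above); the coefficients are typical textbook values, not from the source:

```glsl
// Buffer A — iChannel0 bound to Buffer A itself; x = u, y = v
#define Du 0.16
#define Dv 0.08
#define F  0.035
#define k  0.065
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
    vec2 w = 1.0 / iResolution.xy;
    vec2 uv = fragCoord * w;
    vec2 c  = texture(iChannel0, uv).xy;
    // Discrete Laplacian: up + down + left + right - 4*center
    vec2 lap = texture(iChannel0, uv + vec2(w.x, 0)).xy
             + texture(iChannel0, uv - vec2(w.x, 0)).xy
             + texture(iChannel0, uv + vec2(0, w.y)).xy
             + texture(iChannel0, uv - vec2(0, w.y)).xy
             - 4.0 * c;
    float uvv = c.x * c.y * c.y; // reaction term uv²
    vec2 d = vec2(Du * lap.x - uvv + F * (1.0 - c.x),
                  Dv * lap.y + uvv - (F + k) * c.y);
    c = clamp(c + d, 0.0, 1.0);
    // Seed: u = 1 everywhere, v = 1 in a small central disc
    if (iFrame == 0) c = vec2(1.0, step(length(uv - 0.5), 0.05));
    fragColor = vec4(c, 0.0, 1.0);
}
```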
### Variant 5: Multi-Scale MIP Fluid
**Difference from basic version**: Uses `textureLod` to explicitly sample different MIP levels, achieving O(n) complexity multi-scale computation (turbulence, vorticity confinement, Poisson solving), with each physical quantity in its own buffer.
**Core Advantage**:
- Traditional multi-scale computation requires O(N²) samples (sampling N neighbors at each scale)
- MIP sampling leverages hardware automatic averaging; a single `textureLod` at high MIP levels is equivalent to a large-range mean
- Total complexity drops to O(NUM_SCALES × 9) (3x3 neighborhood per scale)
**Weight Function Choices**:
- `1.0/float(i+1)`: Logarithmic decay, reduces large-scale influence
- `1.0/float(1<<i)`: Exponential decay, rapidly suppresses large scales
- Constant: Equal weight for all scales
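The accumulation loop can be sketched as below; `multiScale` samples only the center texel per level for brevity, where the full version reads a 3x3 neighborhood at each MIP level:

```glsl
#define NUM_SCALES 7
// Accumulate a quantity over MIP levels; one textureLod at level i is
// hardware-averaged over a (2^i)x(2^i) texel area
vec2 multiScale(vec2 uv) {
    vec2 acc = vec2(0.0);
    for (int i = 0; i < NUM_SCALES; i++) {
        vec2 s = textureLod(iChannel0, uv, float(i)).xy;
        acc += s * (1.0 / float(i + 1)); // logarithmic decay weight
        // alternatives: 1.0 / float(1 << i) (exponential), or 1.0 (equal weight)
    }
    return acc;
}
```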
## In-Depth Performance Optimization
### 1. Reduce Texture Samples
**Separable Blur**:
- Principle: The 2D Gaussian function G(x,y) = G(x) × G(y) can be separated into two 1D convolutions
- An NxN kernel drops from N² to 2N samples
- 9-tap example: 81 → 18 samples
**Bilinear Tap Trick**:
```glsl
// Standard 9-tap: requires 9 samples
// Bilinear optimization: achieves equivalent results with 5 samples using hardware interpolation
// Key: place sample points between two texels, GPU hardware automatically computes weighted average
float offset1 = 1.0 + weight2 / (weight1 + weight2); // Offset encodes weight ratio
vec4 s1 = texture(smp, uv + vec2(offset1, 0) * texelSize);
// s1 is automatically the weighted average of texel[1] and texel[2]
```
**MIP Sampling Replaces Large Kernels**:
- `textureLod(smp, uv, 3.0)` samples MIP level 3, equivalent to an 8×8 area mean
- A single sample replaces 64 samples
- Suitable for coarse-scale approximation in multi-scale computation
### 2. Limit Computation Region
**Data Region Discard**:
```glsl
// In a state storage shader, only the first 14×14 pixels store data
// Remaining pixels are discarded, GPU skips subsequent computation
if (fragCoord.x > 14.0 || fragCoord.y > 14.0) discard;
```
**Soft Boundaries**:
```glsl
// Use smoothstep instead of if-statements
// Avoids branch divergence (warp divergence), more efficient on GPU
data.y *= smoothstep(0.5, 0.48, abs(uv.y - 0.5));
// Smoothly decays to 0 in the y=0.48~0.52 range
```
### 3. Reduce Buffer Count
**RGBA Channel Packing**:
| Channel | Fluid Simulation | G-Buffer | Particle System |
|---------|-----------------|----------|----------------|
| R | Velocity x | Normal x | Position x |
| G | Velocity y | Normal y | Position y |
| B | Density | Depth | Lifetime |
| A | Vorticity | Diffuse | Type ID |
**Chained Sub-Steps**:
- 3 buffers running identical code = 3 iterations per frame
- Equivalent to 3x time step, but more stable (each step is still a small step)
- Code is shared via the Common tab, zero maintenance cost
### 4. Reduce Iteration/Sample Count
**Adaptive Loop Termination**:
```glsl
// In multi-scale sampling, exit early when the sampling radius exceeds the effective range
float bbMax = 0.7 * iResolution.y;
bbMax *= bbMax;
for (int l = 0; l < SCALE_NUM; l++) {
if (dot(b, b) > bbMax) break; // Beyond screen range, no need to continue
// ...
b *= 2.0;
}
```
**MIP Level Count Adjustment**:
- `TURBULENCE_SCALES = 11`: Full multi-scale, highest quality
- `TURBULENCE_SCALES = 7`: Removes the largest scales, minimal quality loss
- `TURBULENCE_SCALES = 5`: Noticeable speedup, suitable for mobile
### 5. Initialization Strategy
**Progressive Initialization**:
```glsl
// Output stable initial values for the first 20 frames
if (iFrame < 20) data = vec4(0.5, 0, 0, 0);
```
- Why not `iFrame == 0`? Because some buffers depend on the output of other buffers
- 20 frames ensures all buffers complete initialization propagation
**Tiny Noise Initialization**:
```glsl
if (iFrame == 0) fragColor = 1e-6 * noise; // noise: any per-pixel random value, e.g. a hash of fragCoord
```
- Avoids exact zero values causing `0/0` or `normalize(vec2(0))` problems
- Tiny noise breaks symmetry, allowing vortices to develop naturally
## Combination Examples with Complete Code
### 1. Fluid Simulation + Lighting
```glsl
// Image: Compute gradient from fluid buffer as normal, apply Phong lighting
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
vec2 uv = fragCoord / iResolution.xy;
float delta = 1.0 / iResolution.y;
// Compute fluid surface gradient
float valC = getVal(uv);
vec2 grad = vec2(
getVal(uv + vec2(delta, 0)) - getVal(uv - vec2(delta, 0)),
getVal(uv + vec2(0, delta)) - getVal(uv - vec2(0, delta))
) / delta;
// Build normal (z=150 controls surface flatness)
vec3 normal = normalize(vec3(grad, 150.0));
// Lighting
vec3 lightDir = normalize(vec3(-1.0, -1.0, 2.0));
vec3 viewDir = vec3(0, 0, 1);
float diff = clamp(dot(normal, lightDir), 0.5, 1.0);
float spec = pow(clamp(dot(reflect(lightDir, normal), viewDir), 0.0, 1.0), 36.0);
vec3 baseColor = vec3(0.2, 0.4, 0.8); // Water surface color
fragColor = vec4(baseColor * diff + vec3(1.0) * spec * 0.5, 1.0);
}
```
### 2. Fluid Simulation + Color Advection
```glsl
// Color Buffer: Track a color field, advected by the velocity field
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
vec2 uv = fragCoord / iResolution.xy;
vec2 w = 1.0 / iResolution.xy;
float dt = 0.15;
float scale = 3.0;
// Read velocity field
vec2 velocity = textureLod(iChannel0, uv, 0.0).xy;
// Color advection: sample own previous frame in the reverse velocity direction
vec4 col = textureLod(iChannel1, uv - dt * velocity * w * scale, 0.0);
// Inject color at the emission point
vec2 emitPos = vec2(0.5, 0.5);
float dist = length(uv - emitPos);
float emitterStrength = 0.0025;
float epsilon = 0.0005;
col += emitterStrength / (epsilon + pow(dist, 1.75)) * dt * 0.12 * palette(iTime * 0.05);
// Color decay
float decay = 0.004;
col = max(col - (0.0001 + col * decay) * 0.5, 0.0);
col = clamp(col, 0.0, 5.0);
fragColor = col;
}
```
### 3. Scene Rendering + Bloom + TAA Post-Processing Chain
Four-Buffer pipeline:
- **Buffer A**: Scene rendering (with sub-pixel jitter for TAA)
- **Buffer B**: Brightness extraction + downsampling to build bloom pyramid
- **Buffer C/D**: Separable Gaussian blur
- **Image**: Bloom compositing + tone mapping + chromatic aberration + vignette
```glsl
// Image: Final compositing
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
vec2 uv = fragCoord / iResolution.xy;
    // Original scene, sampled with chromatic aberration (shift R/B here,
    // before grading, so the shifted channels still receive bloom and tone mapping)
    float ca = 0.002;
    vec3 scene;
    scene.r = texture(iChannel0, uv + vec2(ca, 0)).r;
    scene.g = texture(iChannel0, uv).g;
    scene.b = texture(iChannel0, uv - vec2(ca, 0)).b;
    // Multi-level bloom compositing
    vec3 bloom = vec3(0);
    bloom += Grab(uv, 1.0, CalcOffset(0.0)).rgb * 1.0;
    bloom += Grab(uv, 2.0, CalcOffset(1.0)).rgb * 1.5;
    bloom += Grab(uv, 4.0, CalcOffset(2.0)).rgb * 2.0;
    bloom += Grab(uv, 8.0, CalcOffset(3.0)).rgb * 3.0;
    // Compositing
    vec3 color = scene + bloom * 0.08;
    // Filmic tone mapping
    color = pow(color, vec3(1.5));
    color = color / (1.0 + color);
// Vignette
float vignette = 1.0 - dot(uv - 0.5, uv - 0.5) * 0.5;
color *= vignette;
fragColor = vec4(color, 1.0);
}
```
### 4. G-Buffer + Screen-Space Effects
Two-Buffer pipeline, no temporal feedback:
- **Buffer A**: Output normals + depth + diffuse to G-Buffer
- **Buffer B**: Screen-space edge detection / SSAO / SSR
- **Image**: Stylized compositing (e.g., hand-drawn style, noise distortion)
```glsl
// Buffer B: Screen-space edge detection
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
vec2 uv = fragCoord / iResolution.xy;
vec2 offset = 1.0 / iResolution.xy;
vec4 center = texture(iChannel0, uv);
// Roberts Cross edge detection
vec4 tl = texture(iChannel0, uv + vec2(-offset.x, offset.y));
vec4 tr = texture(iChannel0, uv + vec2(offset.x, offset.y));
vec4 bl = texture(iChannel0, uv + vec2(-offset.x, -offset.y));
vec4 br = texture(iChannel0, uv + vec2(offset.x, -offset.y));
float edge = checkSame(center, tl) * checkSame(center, tr) *
checkSame(center, bl) * checkSame(center, br);
fragColor = vec4(edge, center.w, center.z, 1.0);
}
```
### 5. State Storage + Visualization Separation
Standard pattern for games/particle systems. Logic and rendering are fully separated:
- **Buffer A**: Pure logic computation, state stored in fixed texel positions
- **Image**: Pure rendering, reads state via `texelFetch`, draws visuals using distance fields/rasterization
```glsl
// Image: Read game state from Buffer A and render
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
vec2 uv = fragCoord / iResolution.xy;
vec2 aspect = vec2(iResolution.x / iResolution.y, 1.0);
// Read ball state
vec4 ballPV = texelFetch(iChannel0, ivec2(0, 0), 0);
vec2 ballPos = ballPV.xy;
// Read paddle position
float paddleX = texelFetch(iChannel0, ivec2(1, 0), 0).x;
// Draw ball (distance field)
float ballDist = length((uv - ballPos * 0.5 - 0.5) * aspect);
vec3 ballColor = vec3(1.0, 0.8, 0.2) * smoothstep(0.02, 0.015, ballDist);
// Draw paddle
vec2 paddleCenter = vec2(paddleX * 0.5 + 0.5, 0.05);
vec2 paddleSize = vec2(0.08, 0.01);
vec2 d = abs((uv - paddleCenter) * aspect) - paddleSize;
float paddleDist = length(max(d, 0.0));
vec3 paddleColor = vec3(0.2, 0.6, 1.0) * smoothstep(0.005, 0.0, paddleDist);
// Read and draw brick grid
vec3 brickColor = vec3(0);
for (int y = 1; y <= 12; y++) {
for (int x = 0; x <= 13; x++) {
float alive = texelFetch(iChannel0, ivec2(x, y), 0).x;
if (alive > 0.5) {
vec2 brickCenter = vec2(float(x) / 14.0 + 0.036, float(y) / 14.0 + 0.036);
vec2 bd = abs((uv - brickCenter) * aspect) - vec2(0.03, 0.015);
float brickDist = length(max(bd, 0.0));
brickColor += vec3(0.8, 0.3, 0.5) * smoothstep(0.003, 0.0, brickDist);
}
}
}
fragColor = vec4(ballColor + paddleColor + brickColor, 1.0);
}
```

# SDF Normal Estimation — Detailed Reference
This document is a detailed supplement to [SKILL.md](SKILL.md), containing prerequisite knowledge, step-by-step explanations, mathematical derivations, variant analysis, and complete combination code examples.
---
## Prerequisites
### GLSL Fundamentals
- **Vector types**: `vec2`/`vec3` operations, swizzle syntax (`.xyy`, `.yxy`, `.yyx`)
- Swizzle is used in normal estimation to quickly construct three-axis offset vectors from `vec2(h, 0.0)`
### Vector Calculus
- **Gradient concept**: The gradient `∇f` of a scalar field `f(x, y, z)` is a vector pointing in the direction of the fastest increase of the function value
- For an SDF, the gradient direction is the **outward surface normal direction**
- Mathematical definition of gradient: `∇f = (∂f/∂x, ∂f/∂y, ∂f/∂z)`
### SDF Concepts
- `map(p)` returns the signed distance from point `p` to the nearest surface
- Positive = outside the surface, negative = inside, zero = exactly on the surface
- An ideal SDF has a gradient magnitude of exactly 1 (Eikonal equation), but in practice this may deviate after boolean operations or deformations
### Numerical Differentiation
- **Finite differences** to approximate derivatives: `f'(x) ≈ (f(x+h) - f(x-h)) / 2h` (central difference)
- Or `f'(x) ≈ (f(x+h) - f(x)) / h` (forward difference)
- Forward difference accuracy is O(h), central difference accuracy is O(h²)
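These error orders are easy to verify on the CPU. A Python sketch (illustrative, not shader code) comparing both schemes on `sin(x)`, whose derivative is known exactly:

```python
import math

def forward_diff(f, x, h):
    # f'(x) ≈ (f(x+h) - f(x)) / h, error O(h)
    return (f(x + h) - f(x)) / h

def central_diff(f, x, h):
    # f'(x) ≈ (f(x+h) - f(x-h)) / 2h, error O(h²)
    return (f(x + h) - f(x - h)) / (2.0 * h)

x, h = 1.0, 1e-3
exact = math.cos(x)
err_fwd = abs(forward_diff(math.sin, x, h) - exact)
err_cen = abs(central_diff(math.sin, x, h) - exact)
```

At `h = 1e-3` the central difference comes out several orders of magnitude more accurate, matching the O(h) vs O(h²) claim.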
---
## Implementation Steps in Detail
### Step 1: Define the SDF Scene Function
**What**: Create a `map(vec3 p) -> float` function that returns the signed distance from any point in space to the scene surface.
**Why**: All normal estimation methods need to repeatedly call this function to sample the distance field. The normal function itself does not care about the SDF shape — it only needs to query distance values at different positions in space.
```glsl
float map(vec3 p) {
float d = length(p) - 1.0; // Unit sphere
// Can compose more SDF primitives
return d;
}
```
### Step 2: Choose a Difference Method and Implement the Normal Function
#### Method A: Forward Differences — 4 Samples
**What**: Sample the SDF at point `p` and at three axis-aligned offsets, using the differences to build the gradient.
**Why**: The simplest and most intuitive approach. Requires 4 samples (`map(p)` once + three offsets once each), suitable for beginners and performance-sensitive scenarios with lower accuracy requirements.
**Mathematical derivation**:
- `∂f/∂x ≈ (f(x+ε, y, z) - f(x, y, z)) / ε`
- Since we `normalize()` at the end, the constant denominator `ε` can be omitted
- Thus `n = normalize(vec3(map(p+εx̂) - map(p), map(p+εŷ) - map(p), map(p+εẑ) - map(p)))`
```glsl
// Classic forward difference
const float EPSILON = 1e-3;
vec3 getNormal(vec3 p) {
vec3 n;
n.x = map(vec3(p.x + EPSILON, p.y, p.z));
n.y = map(vec3(p.x, p.y + EPSILON, p.z));
n.z = map(vec3(p.x, p.y, p.z + EPSILON));
return normalize(n - map(p));
}
```
#### Method B: Central Differences — 6 Samples
**What**: Sample once in each positive and negative direction per axis, taking the difference.
**Why**: Symmetric sampling eliminates the first-order error term, improving accuracy from O(ε) to O(ε²). The cost is 6 SDF calls.
**Mathematical derivation**:
- Taylor expansion: `f(x+ε) = f(x) + εf'(x) + ε²f''(x)/2 + ...`
- `f(x-ε) = f(x) - εf'(x) + ε²f''(x)/2 - ...`
- Subtraction: `f(x+ε) - f(x-ε) = 2εf'(x) + O(ε³)`
- The first-order error term is eliminated, improving accuracy by one order
```glsl
// Compact swizzle notation
vec3 getNormal(vec3 p) {
vec2 o = vec2(0.001, 0.0);
return normalize(vec3(
map(p + o.xyy) - map(p - o.xyy),
map(p + o.yxy) - map(p - o.yxy),
map(p + o.yyx) - map(p - o.yyx)
));
}
```
#### Method C: Tetrahedron Technique — 4 Samples (Recommended)
**What**: Sample the SDF along the 4 vertices of a regular tetrahedron, computing the weighted sum to obtain the gradient.
**Why**: Requires only 4 samples (2 fewer than central difference), yet is more accurate and symmetric than forward difference.
**Mathematical derivation**:
- The 4 vertices of a regular tetrahedron: `(+,+,+)`, `(+,-,-)`, `(-,+,-)`, `(-,-,+)`
- The coefficient `0.5773 ≈ 1/√3` normalizes the vertices onto the unit sphere
- The weighted sum `Σ eᵢ·map(p + eᵢ·ε)` is equivalent to a gradient estimate in 4 symmetric directions
- Due to the perfect symmetry of the tetrahedron, error distribution is more uniform than forward difference
- Actual accuracy falls between forward and central difference, but only requires 4 samples
```glsl
// Classic tetrahedron technique
vec3 calcNormal(vec3 pos) {
float eps = 0.0005; // Adjustable: sample offset
vec2 e = vec2(1.0, -1.0) * 0.5773;
return normalize(
e.xyy * map(pos + e.xyy * eps) +
e.yyx * map(pos + e.yyx * eps) +
e.yxy * map(pos + e.yxy * eps) +
e.xxx * map(pos + e.xxx * eps)
);
}
```
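The tetrahedron estimate can be sanity-checked against an analytic case: for a unit sphere SDF, the exact normal at a surface point is the point itself. A CPU-side Python port of `calcNormal` (an illustrative sketch, not shader code):

```python
import math

def sdf_sphere(p):
    # Signed distance to a unit sphere centered at the origin
    return math.sqrt(p[0]**2 + p[1]**2 + p[2]**2) - 1.0

def calc_normal_tetrahedron(sdf, p, eps=0.0005):
    k = 0.5773  # ≈ 1/√3, puts the tetrahedron vertices on the unit sphere
    dirs = [(k, -k, -k), (-k, -k, k), (-k, k, -k), (k, k, k)]  # e.xyy, e.yyx, e.yxy, e.xxx
    n = [0.0, 0.0, 0.0]
    for e in dirs:
        d = sdf((p[0] + e[0] * eps, p[1] + e[1] * eps, p[2] + e[2] * eps))
        n = [n[i] + e[i] * d for i in range(3)]
    length = math.sqrt(sum(c * c for c in n))
    return [c / length for c in n]

p = (0.6, 0.8, 0.0)  # point on the sphere surface
n = calc_normal_tetrahedron(sdf_sphere, p)
```

The estimate agrees with the analytic normal `(0.6, 0.8, 0.0)` to well below the epsilon scale.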
### Step 3: Normalize and Apply to Lighting
**What**: Call `normalize()` on the gradient vector to obtain the unit normal for subsequent lighting calculations.
**Why**: The gradient length obtained from finite differences depends on the local gradient magnitude of the SDF. Lighting calculations require unit vectors. For an ideal SDF (gradient magnitude of 1), normalize barely changes the direction, but for SDFs that have undergone boolean operations or deformations, the gradient magnitude may deviate from 1, and normalize ensures correct results.
```glsl
// After a raymarching hit
vec3 pos = ro + rd * t; // Hit point
vec3 nor = calcNormal(pos); // Surface normal
// Basic Lambertian diffuse
vec3 lightDir = normalize(vec3(1.0, 4.0, -4.0));
float diff = max(dot(nor, lightDir), 0.0);
vec3 col = vec3(0.8) * diff;
```
---
## Variant Details
### Variant 1: Reverse Offset Forward Difference
**Difference from base version**: Uses the center point minus three negative-direction offset samples (a backward difference) rather than positive-direction offsets minus the center. Numerically equivalent in accuracy to the forward difference, but with a more compact code structure.
**Principle**: `map(p) - map(p - εx̂)` is equivalent to the mirror version of `map(p + εx̂) - map(p)`. Since we normalize at the end, the direction is unchanged.
```glsl
// Reverse offset variant
vec2 noff = vec2(0.001, 0.0);
vec3 normal = normalize(
map(pos) - vec3(
map(pos - noff.xyy),
map(pos - noff.yxy),
map(pos - noff.yyx)
)
);
```
### Variant 2: Adaptive Epsilon (Distance Scaling)
**Difference from base version**: Epsilon is multiplied by the ray travel distance `t`, using larger offsets for distant surfaces (avoiding floating-point noise) and smaller offsets for nearby surfaces (preserving detail).
**Principle**: The farther the ray distance, the lower the floating-point precision (since absolute error is proportional to the magnitude of the value). Meanwhile, distant pixels cover a larger world-space area and don't need high-precision normals. Adaptive epsilon naturally matches both requirements.
**Typical coefficient**: `0.001 * t`, where `0.001` can be adjusted based on scene complexity.
```glsl
// Adaptive epsilon with tetrahedron technique
vec3 calcNormal(vec3 pos, float t) {
float precis = 0.001 * t; // Adjustable: base coefficient 0.001
vec2 e = vec2(1.0, -1.0) * precis;
return normalize(
e.xyy * map(pos + e.xyy) +
e.yyx * map(pos + e.yyx) +
e.yxy * map(pos + e.yxy) +
e.xxx * map(pos + e.xxx)
);
}
// Usage: vec3 nor = calcNormal(pos, t);
```
### Variant 3: Large Epsilon Rounding / Anti-Aliasing Trick
**Difference from base version**: Intentionally uses a large epsilon (e.g., `0.015`), causing normals to "blur" at geometric edges, producing a visual rounding and anti-aliasing effect.
**Principle**: A large epsilon means the normal sampling spans a larger spatial range. At sharp edges of geometry, the SDF value changes on both sides are averaged out, causing normals to transition smoothly at edges, similar to a chamfer/fillet effect.
**Use cases**: Procedural architecture, mechanical parts, and other scenarios needing visual rounding without modifying the SDF geometry.
```glsl
// Large epsilon rounding technique
vec3 getNormal(vec3 p) {
vec2 e = vec2(0.015, -0.015); // Intentionally enlarged epsilon
return normalize(
e.xyy * map(p + e.xyy) +
e.yyx * map(p + e.yyx) +
e.yxy * map(p + e.yxy) +
e.xxx * map(p + e.xxx)
);
}
```
### Variant 4: Anti-Inlining Loop Trick
**Difference from base version**: Writes the tetrahedron's 4 samples as a `for` loop with bit operations to generate vertex directions, preventing the GLSL compiler from inlining `map()` 4 times, significantly reducing compile times for complex scenes.
**Principle**:
- GLSL compilers typically unroll small loops and inline function calls
- For complex `map()` functions (e.g., hundreds of lines), being inlined 4 times causes code bloat
- `#define ZERO (min(iFrame, 0))` makes the loop bound a runtime value (though it is always 0 in practice), preventing the compiler from unrolling at compile time
- Bit operations `(((i+3)>>1)&1)` etc. generate the 4 tetrahedron vertex directions at runtime, equivalent to hand-written `e.xyy`, `e.yyx`, `e.yxy`, `e.xxx`
**Bit operation correspondence**:
| i | `(((i+3)>>1)&1)` | `((i>>1)&1)` | `(i&1)` | Direction |
|---|---|---|---|---|
| 0 | 1 | 0 | 0 | (+,-,-) |
| 1 | 0 | 0 | 1 | (-,-,+) |
| 2 | 0 | 1 | 0 | (-,+,-) |
| 3 | 1 | 1 | 1 | (+,+,+) |
```glsl
// Anti-inlining loop trick
#define ZERO (min(iFrame, 0)) // Prevent compile-time constant folding
vec3 calcNormal(vec3 p, float t) {
vec3 n = vec3(0.0);
for (int i = ZERO; i < 4; i++) {
vec3 e = 0.5773 * (2.0 * vec3(
(((i + 3) >> 1) & 1),
((i >> 1) & 1),
(i & 1)
) - 1.0);
n += e * map(p + e * 0.001 * t);
}
return normalize(n);
}
```
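The correspondence table above can be verified mechanically. A Python sketch reproducing the bit-operation vertex generation from the loop:

```python
def tetra_dir(i):
    # Maps loop index i to a tetrahedron vertex direction via the same bit ops
    return (2 * (((i + 3) >> 1) & 1) - 1,
            2 * ((i >> 1) & 1) - 1,
            2 * (i & 1) - 1)

dirs = [tetra_dir(i) for i in range(4)]
# Expected: (+,-,-), (-,-,+), (-,+,-), (+,+,+)
```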
### Variant 5: Normal + Edge Detection (Dual-Purpose Sampling)
**Difference from base version**: On top of the 6+1 samples from central difference, additionally computes a Laplacian approximation (deviation of per-axis sample averages from the center value) for detecting surface discontinuities (edges).
**Principle**:
- The Laplacian operator `∇²f = ∂²f/∂x² + ∂²f/∂y² + ∂²f/∂z²` measures local curvature
- Numerical approximation: `∂²f/∂x² ≈ (f(x+h) + f(x-h) - 2f(x)) / h²`
- At surface discontinuities (edges, cracks), the Laplacian value spikes
- In the code, `abs(d - 0.5*(d2+d1))` is the Laplacian approximation on the x axis (omitting constant factors)
- `pow(edge, 0.55) * 15.0` is an empirical contrast adjustment
```glsl
// Normal + edge detection (dual-purpose sampling)
float edge = 0.0;
vec3 normal(vec3 p) {
vec3 e = vec3(0.0, det * 5.0, 0.0); // det = detail level
float d1 = de(p - e.yxx), d2 = de(p + e.yxx);
float d3 = de(p - e.xyx), d4 = de(p + e.xyx);
float d5 = de(p - e.xxy), d6 = de(p + e.xxy);
float d = de(p);
// Laplacian edge detection: deviation of center value from per-axis averages
edge = abs(d - 0.5 * (d2 + d1))
+ abs(d - 0.5 * (d4 + d3))
+ abs(d - 0.5 * (d6 + d5));
edge = min(1.0, pow(edge, 0.55) * 15.0);
return normalize(vec3(d1 - d2, d3 - d4, d5 - d6));
}
```
---
## Performance Optimization In-Depth Analysis
### Bottleneck 1: SDF Sample Count
Normal estimation is the **second-largest SDF call hotspot** in the raymarching pipeline, after the marching loop itself. Every pixel calls the normal function once upon hitting a surface, and the normal function internally calls `map()` 4~7 times.
| Method | Samples | Accuracy | Recommendation |
|--------|---------|----------|----------------|
| Forward difference | 4 | O(ε) | Simple scenes |
| Reverse offset difference | 4 | O(ε) | Same as forward, more compact code |
| Tetrahedron technique | 4 | Between forward and central | **Preferred** |
| Central difference | 6 | O(ε²) | When symmetry is needed |
| Central difference + edge | 7 | O(ε²) + extra info | When edge detection is needed |
**Recommendation**: Default to the tetrahedron technique; only switch to central difference when visual artifacts (e.g., jagged normals) appear.
### Bottleneck 2: Compile Time Explosion
Complex SDFs (e.g., `map()` functions with hundreds of lines) inlined 4~6 times by the normal function can cause compile times to grow from seconds to minutes.
**Root cause**: GLSL compilers attempt to unroll small loops and inline function calls, duplicating the `map()` code 4~6 times.
**Solution**: Use the anti-inlining loop trick (Variant 4), combined with `#define ZERO (min(iFrame, 0))` to prevent the compiler from unrolling at compile time. This keeps only one copy of the `map()` code, called in a runtime loop.
### Bottleneck 3: Epsilon Selection
| Epsilon Range | Effect |
|---------------|--------|
| < 1e-5 | Insufficient floating-point precision, normals show noise spots |
| 0.0005 ~ 0.001 | **Recommended default** |
| 0.01 ~ 0.02 | Slight smoothing / rounding effect |
| > 0.05 | Detail loss, geometric edges overly smoothed |
**Best practice**: Use adaptive epsilon `eps * t`, where `eps ≈ 0.001` and `t` is the ray distance. This preserves detail up close and avoids floating-point noise at distance.
### Bottleneck 4: Avoiding Redundant Sampling
If the same position needs both normals and other information (e.g., edge detection, AO pre-estimation), reuse SDF sampling results whenever possible. Variant 5 is a good example: on top of the 6 samples for normal computation, only 1 additional center sample is needed for edge detection, saving nearly half compared to computing normals and edge detection separately (13 samples total).
---
## Combination Suggestions with Full Code
### 1. Normal + Soft Shadow
After the normal determines surface orientation, a secondary raymarch from the hit point toward the light source computes the soft shadow. The normal is used to offset the starting point to avoid self-intersection:
```glsl
float shadow = calcSoftShadow(pos + nor * 0.01, sunDir, 16.0);
```
A complete soft shadow function typically looks like this:
```glsl
float calcSoftShadow(vec3 ro, vec3 rd, float k) {
float res = 1.0;
float t = 0.01;
for (int i = 0; i < 64; i++) {
float h = map(ro + rd * t);
res = min(res, k * h / t);
if (res < 0.001) break;
t += clamp(h, 0.01, 0.2);
}
return clamp(res, 0.0, 1.0);
}
```
### 2. Normal + Ambient Occlusion (AO)
The normal direction defines the sampling hemisphere for AO. Sampling the SDF along the normal with increasing step sizes — if the actual distance is less than the expected distance (i.e., nearby geometry is occluding), the AO value decreases:
```glsl
float calcAO(vec3 pos, vec3 nor) {
float occ = 0.0;
float sca = 1.0;
for (int i = 0; i < 5; i++) {
float h = 0.01 + 0.12 * float(i) / 4.0;
float d = map(pos + nor * h);
occ += (h - d) * sca;
sca *= 0.95;
}
return clamp(1.0 - 3.0 * occ, 0.0, 1.0);
}
```
**Parameter notes**:
- `0.01 + 0.12 * float(i) / 4.0`: Sample step from 0.01 to 0.13, covering near-distance occlusion
- `sca *= 0.95`: Decreasing weight for farther samples
- `3.0 * occ`: Contrast adjustment coefficient
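A CPU-side Python port of `calcAO` (illustrative sketch, not shader code) makes the behavior easy to test: an open ground plane is fully unoccluded, while a floor with a low ceiling directly above darkens sharply:

```python
def calc_ao(sdf, pos, nor):
    # March along the normal, comparing expected distance h to actual SDF distance
    occ, sca = 0.0, 1.0
    for i in range(5):
        h = 0.01 + 0.12 * i / 4.0
        p = tuple(pos[k] + nor[k] * h for k in range(3))
        occ += (h - sdf(p)) * sca
        sca *= 0.95
    return max(0.0, min(1.0, 1.0 - 3.0 * occ))

plane = lambda p: p[1]                    # open ground plane at y = 0
crack = lambda p: min(p[1], 0.05 - p[1])  # floor with a ceiling 0.05 above
ao_open = calc_ao(plane, (0.0, 0.0, 0.0), (0.0, 1.0, 0.0))
ao_crack = calc_ao(crack, (0.0, 0.0, 0.0), (0.0, 1.0, 0.0))
```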
### 3. Normal + Fresnel Effect
The angle between the normal and view direction controls Fresnel reflection intensity. At grazing angles (normal nearly perpendicular to view), reflection is strongest:
```glsl
float fresnel = pow(clamp(1.0 + dot(nor, rd), 0.0, 1.0), 5.0);
col = mix(col, envColor, fresnel);
```
**Principle**: `dot(nor, rd)` is close to -1 when the surface directly faces the viewer (`rd` points in the view direction, normal points outward) and close to 0 at grazing angles. Adding 1 shifts the range to [0, 1]; taking the 5th power enhances contrast.
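The endpoint behavior follows directly from the formula; a tiny Python sketch of the same term:

```python
def fresnel(n_dot_r):
    # pow(clamp(1 + dot(nor, rd), 0, 1), 5): dot ≈ -1 head-on, ≈ 0 at grazing angles
    t = max(0.0, min(1.0, 1.0 + n_dot_r))
    return t ** 5

head_on = fresnel(-1.0)  # surface facing the viewer: no Fresnel reflection
grazing = fresnel(0.0)   # grazing angle: maximum reflection
```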
### 4. Normal + Bump Mapping
Procedural perturbation layered on top of SDF normals adds surface detail without modifying the geometry:
```glsl
vec3 doBumpMap(vec3 pos, vec3 nor) {
vec2 e = vec2(0.001, 0.0);
float bump = texture(iChannel0, pos.xz * 0.5).x;
float bx = texture(iChannel0, (pos.xz + e.xy) * 0.5).x;
float bz = texture(iChannel0, (pos.xz + e.yx) * 0.5).x;
vec3 grad = vec3(bx - bump, 0.0, bz - bump) / e.x;
return normalize(nor + grad * 0.1); // 0.1 controls bump intensity
}
```
**Principle**: Computes the height map gradient in texture space and adds it to the geometric normal. `0.1` controls the visual bump strength — larger values make the surface appear rougher.
### 5. Normal + Triplanar Mapping
The absolute values of the normal components serve as blending weights for triplanar texturing, achieving UV-free texturing:
```glsl
vec3 triplanar(sampler2D tex, vec3 pos, vec3 nor) {
vec3 w = pow(abs(nor), vec3(4.0));
w /= (w.x + w.y + w.z);
return texture(tex, pos.yz).rgb * w.x
+ texture(tex, pos.zx).rgb * w.y
+ texture(tex, pos.xy).rgb * w.z;
}
```
**Principle**:
- Faces with normals pointing along the X axis use YZ plane projection
- Faces with normals pointing along the Y axis use ZX plane projection
- Faces with normals pointing along the Z axis use XY plane projection
- `pow(abs(nor), vec3(4.0))` makes blending sharper, reducing blurring in transition regions
- Normalized weights `w /= (w.x + w.y + w.z)` ensure total weight sums to 1
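The weight behavior is easy to check in a Python sketch (illustrative, not shader code): an axis-aligned normal collapses to a single projection plane, while a diagonal normal blends all three equally:

```python
import math

def triplanar_weights(nor):
    # pow(abs(n), 4) sharpens the blend; normalization makes weights sum to 1
    w = [abs(c) ** 4 for c in nor]
    s = sum(w)
    return [c / s for c in w]

w_axis = triplanar_weights((0.0, 1.0, 0.0))  # face pointing straight along +Y
d = 1.0 / math.sqrt(3.0)
w_diag = triplanar_weights((d, d, d))        # diagonal normal
```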

# Particle Systems — Detailed Reference
This document is a detailed supplement to SKILL.md, containing prerequisites, in-depth explanations of each step, variant details, performance optimization analysis, and complete code for combination suggestions.
> **NOTE:** Code examples in this document primarily target the ShaderToy environment. **For standalone HTML deployment, refer to the WebGL2 single-file template in SKILL.md**, which includes complete HTML + JS + GLSL code.
## Prerequisites
- GLSL basic syntax (uniforms, varyings, built-in functions)
- 2D/3D vector math (dot product, cross product, normalization, matrix rotation)
- ShaderToy architecture (`mainImage`, `iTime`, `iResolution`, `iChannel`, multi-Buffer passes)
- Basic physics concepts: velocity = derivative of position, acceleration = force/mass
- Usage of `texelFetch` (precise pixel data reading from Buffers)
## Implementation Steps in Detail
### Step 1: Hash Random Functions
**What**: Define a function that generates pseudo-random numbers from a float (particle ID). This is the foundational infrastructure for all particle systems.
**Why**: Each particle needs unique but deterministic attributes (color, size, initial direction, etc.); hash functions provide repeatable "randomness".
Three hash function dimensions are provided:
- `hash11`: 1D → 1D, for scalar randomness (lifetime, brightness, etc.)
- `hash12`: 1D → 2D, for 2D randomness (initial position, etc.)
- `hash33`: 3D → 3D, for 3D velocity perturbation
```glsl
// Standard 1D -> 1D hash, returns [0, 1)
float hash11(float p) {
return fract(sin(p * 127.1) * 43758.5453);
}
// 1D -> 2D hash, for 2D randomness
vec2 hash12(float p) {
vec3 p3 = fract(vec3(p) * vec3(0.1031, 0.1030, 0.0973));
p3 += dot(p3, p3.yzx + 33.33);
return fract((p3.xx + p3.yz) * p3.zy);
}
// 3D -> 3D hash, for 3D velocity perturbation
vec3 hash33(vec3 p) {
p = fract(p * vec3(443.897, 397.297, 491.187));
p += dot(p.zxy, p.yxz + 19.19);
return fract(vec3(p.x * p.y, p.z * p.x, p.y * p.z)) - 0.5;
}
```
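The key properties — determinism and output in [0, 1) — can be checked with a CPU-side Python port of `hash11` (note: double-precision results differ numerically from the GPU's 32-bit floats, but the properties hold):

```python
import math

def hash11(p):
    # Port of the GLSL hash: fract(sin(p * 127.1) * 43758.5453)
    x = math.sin(p * 127.1) * 43758.5453
    return x - math.floor(x)

samples = [hash11(float(i)) for i in range(1000)]
```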
### Step 2: Particle Lifecycle Management
**What**: Compute birth time, lifespan, current age for each particle, and auto-respawn after death.
**Why**: Lifecycle is the core mechanism of particle systems — the cycle of birth, motion, fade-out, death, and respawn. `fract` or `mod` implements infinite cycling without additional state.
Key design:
- `spawnTime`: Each particle's birth time differs, generated by `hash11` from the ID, spread across the `[0, START_TIME]` interval
- `lifetime`: Each particle's lifespan differs, random within the `[LIFETIME_MIN, LIFETIME_MAX]` interval
- `mod(time - spawnTime, lifetime)`: Automatic cycling; the particle respawns immediately after death
- `floor(...)` computes the current life cycle number, used to generate different random attributes each cycle
```glsl
#define NUM_PARTICLES 100 // adjustable: particle count
#define LIFETIME_MIN 1.0 // adjustable: minimum lifespan (seconds)
#define LIFETIME_MAX 3.0 // adjustable: maximum lifespan (seconds)
#define START_TIME 2.0 // adjustable: time for all particles to be born
// Returns: x = current normalized age [0,1], y = current life cycle number
vec2 particleAge(int id, float time) {
float spawnTime = START_TIME * hash11(float(id) * 2.0);
float lifetime = mix(LIFETIME_MIN, LIFETIME_MAX, hash11(float(id) * 3.0 - 35.0));
float age = mod(time - spawnTime, lifetime);
float run = floor((time - spawnTime) / lifetime);
return vec2(age / lifetime, run);
}
```
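The cycling logic can be exercised in a Python port of `particleAge` (illustrative sketch; `%` matches GLSL `mod` for a positive modulus):

```python
import math

def hash11(p):
    x = math.sin(p * 127.1) * 43758.5453
    return x - math.floor(x)

LIFETIME_MIN, LIFETIME_MAX, START_TIME = 1.0, 3.0, 2.0

def particle_age(pid, time):
    # Returns (normalized age in [0, 1), current life-cycle number)
    spawn = START_TIME * hash11(pid * 2.0)
    life = LIFETIME_MIN + (LIFETIME_MAX - LIFETIME_MIN) * hash11(pid * 3.0 - 35.0)
    age = (time - spawn) % life
    run = math.floor((time - spawn) / life)
    return age / life, run

a, run = particle_age(7.0, 10.0)
```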
### Step 3: Stateless Particle Position Computation
**What**: Compute 2D/3D position solely from particle ID and time, without relying on any Buffer.
**Why**: For decorative effects (starfields, fireworks, orbiting light points), the stateless approach is simplest and most efficient. Define the main trajectory via parametric curves (e.g., Lissajous curves), then add random offset and gravity.
Position is composed of three components:
1. **Main trajectory** (harmonic oscillator): Multiple cosine waves superimposed to form smooth Lissajous curves, controlling the overall motion path of the particle group
2. **Random drift**: Each particle linearly diffuses from the main trajectory position over time; `DRIFT_MAX` controls the diffusion range
3. **Gravity**: Parabolic descent via `0.5 * g * t²`, with the normalized `age` standing in for `t`
```glsl
#define GRAVITY vec2(0.0, -4.5) // adjustable: gravity direction and strength
#define DRIFT_MAX vec2(0.28, 0.28) // adjustable: maximum random drift amplitude
// Harmonic superposition for smooth main trajectory
float harmonics(vec3 freq, vec3 amp, vec3 phase, float t) {
float val = 0.0;
for (int h = 0; h < 3; h++)
val += amp[h] * cos(t * freq[h] * 6.2832 + phase[h] / 360.0 * 6.2832);
return (1.0 + val) / 2.0;
}
vec2 particlePosition(int id, float time) {
vec2 ageInfo = particleAge(id, time);
float age = ageInfo.x;
float run = ageInfo.y;
// Main trajectory (harmonic oscillator)
float slowTime = time * 0.1; // time along main trajectory
vec2 mainPos = vec2(
harmonics(vec3(0.4, 0.66, 0.78), vec3(0.8, 0.24, 0.18), vec3(0.0, 45.0, 55.0), slowTime),
harmonics(vec3(0.415, 0.61, 0.82), vec3(0.72, 0.28, 0.15), vec3(90.0, 120.0, 10.0), slowTime)
);
// Random drift (grows linearly with time)
vec2 drift = DRIFT_MAX * (vec2(hash11(float(id) * 3.0 + run * 4.0),
hash11(float(id) * 7.0 - run * 2.5)) - 0.5) * age;
// Gravity effect
vec2 grav = GRAVITY * age * age * 0.5;
return mainPos + drift + grav;
}
```
### Step 4: Buffer-Stored Particle State (Stateful System)
**What**: Use one row of pixels in a Buffer texture to store all particles, with each pixel = one particle's (pos.x, pos.y, vel.x, vel.y).
**Why**: When inter-frame persistent state is needed (physics collisions, force field interactions, N-body simulations), particle data must be written to a Buffer and read back the next frame. In ShaderToy, each pixel is a storage cell.
Design points:
- `fragCoord.y > 0.5`: Only the first row of pixels stores particles; remaining pixels are discarded
- `fragCoord.x` corresponds to particle ID; each pixel's RGBA stores (pos.x, pos.y, vel.x, vel.y)
- `iFrame < 5`: First few frames are initialization, randomly distributing particle positions
- Force accumulation: boundary repulsion + inter-particle attraction/repulsion + friction
- Clamp velocity and acceleration after integration to prevent numerical explosion
```glsl
// === Buffer A: Particle physics update ===
#define NUM_PARTICLES 40 // adjustable: particle count
#define MAX_VEL 0.5 // adjustable: maximum velocity
#define MAX_ACC 3.0 // adjustable: maximum acceleration
#define RESIST 0.2 // adjustable: drag coefficient
#define DT 0.03 // adjustable: time step
// Read the i-th particle's data
vec4 loadParticle(float i) {
return texelFetch(iChannel0, ivec2(i, 0), 0);
}
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
if (fragCoord.y > 0.5 || fragCoord.x > float(NUM_PARTICLES)) discard;
float id = floor(fragCoord.x);
vec2 res = iResolution.xy / iResolution.y;
// Initialization
if (iFrame < 5) {
vec2 rng = hash12(id);
fragColor = vec4(0.1 + 0.8 * rng * res, 0.0, 0.0);
return;
}
// Read current state
vec4 particle = loadParticle(id); // xy = pos, zw = vel
vec2 pos = particle.xy;
vec2 vel = particle.zw;
// === Force accumulation ===
vec2 force = vec2(0.0);
// Boundary repulsion force
force += 0.8 * (1.0 / abs(pos) - 1.0 / abs(res - pos));
// Inter-particle interaction (attraction/repulsion)
for (float i = 0.0; i < float(NUM_PARTICLES); i++) {
if (i == id) continue;
vec4 other = loadParticle(i);
vec2 w = pos - other.xy;
float d = length(w);
if (d > 0.0)
force -= w * (6.3 + log(d * d * 0.02)) / exp(d * d * 2.4) / d;
}
// Friction force
force -= vel * RESIST / DT;
// === Integration ===
vec2 acc = force;
float a = length(acc);
acc *= a > MAX_ACC ? MAX_ACC / a : 1.0; // limit acceleration
vel += acc * DT;
float v = length(vel);
vel *= v > MAX_VEL ? MAX_VEL / v : 1.0; // limit velocity
pos += vel * DT;
fragColor = vec4(pos, vel);
}
```
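The magnitude clamping used above for `acc` and `vel` caps length while preserving direction; the same pattern as a Python sketch:

```python
import math

def limit_magnitude(v, max_len):
    # Scale a 2D vector down to max_len if it is longer, keeping its direction
    l = math.hypot(v[0], v[1])
    if l > max_len:
        s = max_len / l
        return (v[0] * s, v[1] * s)
    return v

vel = limit_magnitude((3.0, 4.0), 0.5)  # speed 5.0 clamped to 0.5
```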
### Step 5: Particle Rendering — Point Light / Metaball Style
**What**: Iterate over all particles in the Image pass and render each as a soft glowing point.
**Why**: `1/dot(p,p)` produces natural inverse-square distance falloff; when multiple particles overlap, the result resembles metaball fusion. This is the most classic particle rendering method.
Rendering principle:
- `dot(p, p)` is `dist²`; using it as the denominator produces inverse-square distance falloff
- `BRIGHTNESS` controls point size — larger values produce bigger glow points
- `totalWeight` accumulates the metaball contribution of all particles
- Color interpolates between `COLOR_START` and `COLOR_END` based on particle velocity
- `mix(col, pcol, mb / totalWeight)` implements contribution-weighted color blending, with nearby particles having higher color weight
- Final normalize + clamp prevents overexposure
```glsl
#define BRIGHTNESS 0.002 // adjustable: particle brightness
#define COLOR_START vec3(0.0, 0.64, 0.2) // adjustable: start color
#define COLOR_END vec3(0.06, 0.35, 0.85) // adjustable: end color
vec3 renderParticles(vec2 uv) {
vec3 col = vec3(0.0);
float totalWeight = 0.0;
for (int i = 0; i < NUM_PARTICLES; i++) {
vec4 particle = loadParticle(float(i));
vec2 p = uv - particle.xy;
// Metaball-style falloff: radius / distance²
float mb = BRIGHTNESS / dot(p, p);
totalWeight += mb;
// Interpolate color based on particle attributes
float ratio = length(particle.zw) / MAX_VEL;
vec3 pcol = mix(COLOR_START, COLOR_END, ratio);
col = mix(col, pcol, mb / totalWeight);
}
totalWeight /= float(NUM_PARTICLES);
col = normalize(col) * clamp(totalWeight, 0.0, 0.4);
return col;
}
```
### Step 6: Frame Feedback Motion Blur
**What**: Blend the current frame with the previous frame's Buffer to produce motion trails.
**Why**: Single-frame particles are just discrete dots; through temporal accumulation (feedback blending), continuous trails/afterimage effects are produced. The blend coefficient controls trail length.
Design points:
- `TRAIL_DECAY` closer to 1 produces longer trails (0.99 = very long trail, 0.9 = short trail)
- Requires an extra Buffer pass: Buffer B handles trail accumulation, Image pass reads from Buffer B
- `prev * TRAIL_DECAY + current`: Decay old frame + overlay new frame
- This method can also simulate high particle density with few particles + long trails
```glsl
#define TRAIL_DECAY 0.95 // adjustable: trail decay rate, closer to 1 = longer trail
// In the rendering Buffer's mainImage:
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
vec2 uv = fragCoord / iResolution.xy;
// Read previous frame
vec3 prev = texture(iChannel0, uv).rgb * TRAIL_DECAY;
// Draw current frame particles
vec3 current = renderParticles(fragCoord / iResolution.y);
// Overlay
fragColor = vec4(prev + current, 1.0);
}
```
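The steady-state brightness of this feedback loop is a geometric series — a constant per-frame contribution `c` accumulates to `c / (1 - decay)` — which is exactly the `1/(1-decay)` amplification factor used in the brightness budget tables later. A quick numeric check (Python sketch; names illustrative):
```python
def steady_state(decay, current, frames=2000):
    """Iterate prev * decay + current, as the feedback buffer does each frame."""
    acc = 0.0
    for _ in range(frames):
        acc = acc * decay + current
    return acc

# A per-frame contribution of 0.05 amplifies to 0.05 / (1 - decay):
for decay in (0.90, 0.95, 0.99):
    print(f"decay={decay}: steady state = {steady_state(decay, 0.05):.3f}")
```
Raising `TRAIL_DECAY` from 0.9 to 0.99 therefore brightens the accumulated trail 10x, so the per-particle brightness must shrink accordingly.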
### Step 7: HSV Coloring and Star Glare Effects
**What**: Color particles using HSV color space and add cross/star diffraction spike lines.
**Why**: HSV makes it easy to rotate hue for rainbow effects; star glare (diffraction spikes) simulates real lens optical effects, giving light points more visual quality.
HSV coloring principle:
- `hsv.x` = Hue, 0-1 maps to one revolution of the color wheel
- `hsv.y` = Saturation, 0 = gray, 1 = pure color
- `hsv.z` = Value, 0 = black, 1 = brightest
- Piecewise triangle waves (the `abs(fract(...) * 6.0 - K.www)` pattern) reproduce the RGB channel hue response curves; cosine palettes are a smoother alternative
Star glare principle:
- Star glare is caused by diffraction from lens aperture blades in real photography
- Implemented by stretching the distance field along specific directions: one horizontal branch, one vertical branch, and two diagonal branches at ±45°
- The `stretch` parameter controls the stretch ratio; larger values produce thinner, longer rays
- `0.707` approximates cos(45°) = sin(45°) = √2/2, used to rotate coordinates onto the diagonals
```glsl
// HSV -> RGB conversion
vec3 hsv2rgb(vec3 c) {
vec4 K = vec4(1.0, 2.0 / 3.0, 1.0 / 3.0, 3.0);
vec3 p = abs(fract(c.xxx + K.xyz) * 6.0 - K.www);
return c.z * mix(K.xxx, clamp(p - K.xxx, 0.0, 1.0), c.y);
}
// Star glare effect: produces elongated light rays in horizontal/vertical/diagonal directions
float starGlare(vec2 relPos, float intensity) {
// Horizontal/vertical branches
vec2 stretch = vec2(9.0, 0.32); // adjustable: stretch ratio
float dh = length(relPos * stretch);
float dv = length(relPos * stretch.yx);
// Diagonal branches
vec2 diagPos = 0.707 * vec2(dot(relPos, vec2(1, 1)), dot(relPos, vec2(1, -1)));
float dd1 = length(diagPos * vec2(13.0, 0.61));
float dd2 = length(diagPos * vec2(0.61, 13.0));
float glare = 0.25 / (dh * 3.0 + 0.01)
+ 0.25 / (dv * 3.0 + 0.01)
+ 0.19 / (dd1 * 3.0 + 0.01)
+ 0.19 / (dd2 * 3.0 + 0.01);
return glare * intensity;
}
```
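The branchless `hsv2rgb` above is an exact HSV-to-RGB conversion, not an approximation, so it can be cross-checked against a reference implementation. A Python port (using the standard-library `colorsys` as the reference; helper names illustrative):
```python
import math
import colorsys

def hsv2rgb(h, s, v):
    # Direct port of the GLSL hsv2rgb above; K = (1, 2/3, 1/3, 3)
    fract = lambda x: x - math.floor(x)
    clamp01 = lambda x: max(0.0, min(1.0, x))
    K = (1.0, 2.0 / 3.0, 1.0 / 3.0, 3.0)
    p = [abs(fract(h + K[i]) * 6.0 - K[3]) for i in range(3)]
    # mix(K.xxx, clamp(p - K.xxx, 0, 1), s), scaled by value v
    return tuple(v * (1.0 + (clamp01(p[i] - 1.0) - 1.0) * s) for i in range(3))

for h in (0.0, 0.25, 1.0 / 3.0, 0.5, 0.75):
    got = hsv2rgb(h, 1.0, 1.0)
    ref = colorsys.hsv_to_rgb(h, 1.0, 1.0)
    assert all(abs(a - b) < 1e-6 for a, b in zip(got, ref))
print("matches colorsys on sampled hues")
```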
## Common Variant Details
### Variant 1: Metaball Polar Coordinate Particles
**Difference from base version**: Particles are uniformly distributed in polar coordinates and expand outward, using `1/dot(p,p)` metaball fusion instead of point lights, producing organic "blob-like" effects.
Design points:
- Particle positions change from Cartesian to polar coordinates: angles uniformly distributed around the circle, distance cycles with `fract` over time
- `fract(time * speed + hash)` produces particles expanding from center outward then respawning
- Metaball rendering: `0.84 / dot(p, p)` values naturally accumulate where particles overlap, forming fused organic shapes
- Color interpolates between start and end colors based on distance `d`
- `mb / totalSum` ensures color blending is weighted by contribution
```glsl
// Particle position changed to polar coordinate expansion
float d = fract(time * 0.51 + 48934.4238 * sin(float(i) * 692.7398));
float angle = TAU * float(i) / float(NUM_PARTICLES);
vec2 particlePos = d * vec2(cos(angle), sin(angle)) * 4.0;
// Metaball rendering replaces point light
vec2 p = uv - particlePos;
float mb = 0.84 / dot(p, p); // adjustable: 0.84 = metaball radius
col = mix(col, mix(startColor, endColor, d), mb / totalSum);
```
### Variant 2: Buffer Storage + Boids Flocking Behavior
**Difference from base version**: Changes from stateless to stateful, with each particle stored in a Buffer pixel, enabling N-body attraction/repulsion interaction and Boids emergent behavior.
Design points:
- Each particle iterates over all other particles, computing the net attraction/repulsion force
- The force formula `(6.3 + log(d² * 0.02)) / exp(d² * 2.4)` produces:
- Short-range repulsion (the log term goes strongly negative)
- Medium-range attraction (the constant 6.3 outweighs the log term)
- Long-range: negligible force (the exponential denominator dominates)
- Friction `vel * 0.2 / dt` prevents infinite acceleration
- Overall effect: particles self-organize into group motion patterns, exhibiting fish-school/bird-flock emergent behavior
```glsl
// Buffer A: force accumulation
vec2 sumForce = vec2(0.0);
for (float j = 0.0; j < NUM_PARTICLES; j++) {
if (j == id) continue;
vec4 other = texelFetch(iChannel0, ivec2(j, 0), 0);
vec2 w = pos - other.xy;
float d = length(w);
// Combined attraction+repulsion: short-range repulsion, long-range attraction
sumForce -= w * (6.3 + log(d * d * 0.02)) / exp(d * d * 2.4) / d;
}
sumForce -= vel * 0.2 / dt; // friction
```
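The claimed short-range-repulsion / medium-range-attraction behavior can be read off from the sign of the radial coefficient (dropping the `1/d` factor, which never changes sign). A quick numeric check in Python:
```python
import math

def radial_coeff(d):
    """Coefficient along -(pos - other) in the Boids force above:
    positive = attraction toward the other particle, negative = repulsion."""
    return (6.3 + math.log(d * d * 0.02)) / math.exp(d * d * 2.4)

# Sign flips where 6.3 + log(0.02 * d^2) = 0  =>  d = sqrt(exp(-6.3) / 0.02)
d_crit = math.sqrt(math.exp(-6.3) / 0.02)
print(f"repulsion below d ~ {d_crit:.3f}, attraction above it")
print(f"coeff(0.1) = {radial_coeff(0.1):+.3f}  (repulsive)")
print(f"coeff(0.6) = {radial_coeff(0.6):+.3f}  (attractive)")
print(f"coeff(3.0) = {radial_coeff(3.0):+.2e} (negligible)")
```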
### Variant 3: Verlet Integration Cloth Simulation
**Difference from base version**: Particles are connected through spring constraints (grid topology), using Verlet integration instead of Euler method — no need to explicitly store velocity.
Design points:
- Verlet integration: `newPos = 2 * current - previous + acc * dt²`
- Velocity is implicit in `current - previous`
- No separate velocity storage needed; RGBA can store (current.xy, previous.xy)
- More stable than Euler in constraint solving (won't blow up from high-frequency oscillation)
- Spring constraints: each pair of adjacent particles has a "rest length"
- Compute the difference between current distance and rest length
- Move particles toward the rest length by a small step (0.1 is the relaxation coefficient)
- Multiple constraint-solving iterations converge to a stable state
- Grid topology: particle IDs arranged in rows and columns, each connected to its up/down/left/right neighbors
```glsl
// Verlet integration: velocity is implicit in (current position - previous position)
// particle.xy = current position, particle.zw = previous position
vec2 newPos = 2.0 * particle.xy - particle.zw + vec2(0.0, -0.6) * dt * dt;
particle.zw = particle.xy;
particle.xy = newPos;
// Spring constraint solving
vec4 neighbor = texelFetch(iChannel0, neighborId, 0);
vec2 delta = neighbor.xy - particle.xy;
float dist = length(delta);
float restLength = 0.1; // adjustable: rest length
particle.xy += 0.1 * (dist - restLength) * (delta / dist);
```
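The stability claim — Verlet tolerates stiff constraints that make explicit Euler blow up — is easy to demonstrate on a single stiff spring (`acc = -k * x`). A minimal Python sketch (parameters illustrative):
```python
k, dt, steps = 100.0, 0.1, 200  # stiff spring; note k * dt^2 = 1.0 < 4 (Verlet-stable)

# Explicit Euler: stores (position, velocity)
x, v = 1.0, 0.0
for _ in range(steps):
    x, v = x + v * dt, v - k * x * dt
euler_amp = abs(x)

# Verlet: stores (current, previous); velocity is implicit
cur, prev = 1.0, 1.0
for _ in range(steps):
    cur, prev = 2.0 * cur - prev - k * cur * dt * dt, cur
verlet_amp = abs(cur)

print(f"Euler  |x| after {steps} steps: {euler_amp:.2e}")   # diverges
print(f"Verlet |x| after {steps} steps: {verlet_amp:.2f}")  # stays bounded
```
The Verlet update's amplification eigenvalues have unit magnitude whenever `k * dt^2 < 4`, while explicit Euler's magnitude exceeds 1 for any positive `dt`, so its energy grows every step.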
### Variant 4: 3D Particles + Ray Rendering
**Difference from base version**: Particles are stored in 3D space; rendering uses rays cast from the camera, computing the closest distance from each ray to each particle for coloring.
Design points:
- Camera at `(0, 0, 2.5)`, ray direction determined by screen UV
- Point-to-line distance formula: `|cross(P-O, D)|`, where O is ray origin, D is ray direction, P is particle position
- `dot(cross(...), cross(...))` computes the squared distance (avoiding sqrt)
- `× 1000.0` is the distance scaling factor controlling visual particle size
- Difference from 2D rendering: 2D uses length of `uv - pos`, 3D uses closest distance from ray to point
```glsl
// 3D rendering: closest distance from ray to particle
vec3 ro = vec3(0.0, 0.0, 2.5);
vec3 rd = normalize(vec3(uv, -0.5));
for (int i = 0; i < numParticles; i++) {
vec3 pos = texture(iChannel0, vec2(i, 100.0) * w).rgb;
// Squared distance from point to line
float d = dot(cross(pos - ro, rd), cross(pos - ro, rd));
d *= 1000.0;
float glow = 0.14 / (pow(d, 1.1) + 0.03);
col += glow * particleColor;
}
```
### Variant 5: Raindrop Particles (3D Scene Integration)
**Difference from base version**: Particles move in 3D world space (gravity + wind + jitter), rendered as screen-space water drops with normal mapping to simulate refraction. Respawned randomly after lifecycle ends.
Design points:
- `speedScale` includes `sin(π/2 * pow(age/lifetime, 2))` to accelerate the fall
- Wind force is projected onto the camera's right/up directions via dot product
- Jitter `randVec2 * jitterSpeed` simulates air turbulence
- Death and respawn: `particle.z` accumulates age; when it exceeds `particle.a` (lifespan), position and lifespan are reset
- Rendering can overlay raindrop SDF + refraction normal mapping to simulate realistic raindrop optics
```glsl
// 3D force accumulation
float speedScale = 0.0015 * (0.1 + 1.9 * sin(PI * 0.5 * pow(age / lifetime, 2.0)));
particle.x += (windShieldOffset.x + windIntensity * dot(rayRight, windDir)) * fallSpeed * speedScale * dt;
particle.y += (windShieldOffset.y + windIntensity * dot(rayUp, windDir)) * fallSpeed * speedScale * dt;
// Jitter
particle.xy += 0.001 * (randVec2(particle.xy + iTime) - 0.5) * jitterSpeed * dt;
// Death and respawn
if (particle.z > particle.a) {
particle.xy = vec2(rand(seedX), rand(seedY)) * iResolution.xy;
particle.a = lifetimeMin + rand(pid) * (lifetimeMax - lifetimeMin);
particle.z = 0.0;
}
```
### Variant 6: Vortex/Storm Particle System
**Difference from base version**: Particles move along spiral trajectories, forming storm/sandstorm/blizzard effects. Stateless single pass.
Design points:
- Differential rotation: inner circles rotate faster than outer circles (`angularSpeed = k / (offset + radius)`), producing natural vortices
- Particle color must be significantly brighter than the background (2-3x), otherwise invisible against similar-colored backgrounds
- Brightness budget: with 150 particles, `numerator=0.002, epsilon=0.008` (peak=0.25) is safe
- Vortex center dark zone implemented with `smoothstep(innerR, outerR, dist)` mask
### Variant 7: Meteor/Trailing Line Rendering
**Difference from base version**: Particles are rendered as elongated glow lines rather than circular light points.
Design points:
- **Must have a clearly visible static starfield background**: Call `starField()` function; stars rendered as sharp Gaussian points using `exp(-dist²*k)`, with peak brightness >= 0.3
- Meteor trail must be bright enough: `core` multiplier >= 0.15; after dividing by sample count, each step still needs >= 0.005 visible contribution
- **Do not use `1/(distPerp² + tiny_epsilon)`** for lines — use `exp(-distPerp / width)` for safe glow
- Meteor head `headGlow = 0.005 / (dist² + 0.0008)` ensures bright visibility
- trailLen range 0.15-0.35 ensures sufficient trail length
### Variant 8: Fountain/Upward Jet Particle System
**Difference from base version**: Particles jet upward from a single point, follow parabolic arcs, then fall back. Stateless single pass.
Design points:
- **Must include three layers**: (1) Main water column particles (upward jet + parabola) (2) Splash particles (flying sideways upon hitting water) (3) Water surface/pool visuals
- **Particles must be sharp, visible individual points**: Use small epsilon (<= 0.002) with small numerator; must not only produce diffuse glow
- Parabolic motion: `pos = base + vel0 * t + 0.5 * gravity * t²`
- Ground clipping: `if (pos.y < waterLevel) continue;`
- Brightness budget: 60 main particles + 40 splash particles, each with epsilon in the 0.001-0.002 range
### Variant 10: Spiral Array/Magic Particle System
**Difference from base version**: Particles arranged in spiral or geometric arrays, producing magic circles, magic dust, and similar effects. Particles feature iridescent color variation. Stateless single pass.
Design points:
- **Must have discrete visible particles**: Each particle must be an individually visible small light point, not just blurry glow blobs. Use small epsilon (0.0004-0.0006) for sufficient sharpness
- Spiral trajectory: `angle = baseAngle + norm * spiralTurns + time * rotSpeed`, `radius` increases with norm
- Magic circles use independent ring particle layers with uniformly distributed angles + time rotation, using elliptical projection to simulate 3D perspective
- Iridescent effect: `hue = fract(particleId / total + time * speed + norm * shift)`, hue varies continuously with ID and time, covering the full color wheel
- Starlight shimmer: `shimmer = 0.7 + 0.3 * sin(time * freq + particleId * phase)` controls each particle's brightness pulsation
- Two-layer structure: (1) Spiral ascending particle stream (2) Horizontally rotating magic circle light point rings
## Brightness Budget Quick Reference
Single pass system: **N × (numerator / epsilon) < 5.0**
| Particle Count | Recommended numerator | Recommended epsilon | Single Particle Peak | Total Peak (no fade) |
|--------|---------------|-------------|-----------|------------------|
| 40 | 0.015 | 0.03 | 0.5 | 20 → Reinhard OK |
| 80 | 0.008 | 0.015 | 0.53 | 42 → Reinhard OK |
| 150 | 0.002 | 0.008 | 0.25 | 37 → Reinhard OK |
| 200 | 0.001 | 0.005 | 0.2 | 40 → Reinhard OK |
Multi-pass ping-pong system: **N × (numerator / epsilon) × 1/(1-decay) < 10.0**
| decay | Amplification Factor | 20 Particle Peak Limit | 50 Particle Peak Limit | 100 Particle Peak Limit |
|-------|---------|---------------|---------------|----------------|
| 0.88 | 8.3x | 0.06 | 0.024 | 0.012 |
| 0.92 | 12.5x | 0.04 | 0.016 | 0.008 |
| 0.95 | 20x | 0.025 | 0.01 | 0.005 |
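The two tables above are generated by the two budget formulas; encoding them as small helpers makes it easy to budget a new particle count. A Python sketch (function names illustrative):
```python
def single_pass_peak(n, numerator, epsilon):
    """Per-particle peak and worst-case all-overlap peak for glow = numerator / (d^2 + epsilon)."""
    per_particle = numerator / epsilon
    return per_particle, n * per_particle

def feedback_peak(n, numerator, epsilon, decay):
    """Worst-case steady-state peak for a ping-pong pass, amplified by 1 / (1 - decay)."""
    return n * (numerator / epsilon) / (1.0 - decay)

per, total = single_pass_peak(150, 0.002, 0.008)
print(f"150 particles: per-particle peak {per:.2f}, all-overlap peak {total:.1f}")
print(f"decay 0.95, 20 particles at peak 0.025: {feedback_peak(20, 0.025, 1.0, 0.95):.1f}")
```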
## Performance Optimization In-Depth Analysis
### 1. Particle Count and Loop Overhead
- **Bottleneck**: Every pixel iterates over all particles (O(W×H×N)); particle count is the biggest performance lever.
- **Optimization**: Cutting particle count from 200 to 80 often looks nearly identical but reduces the per-pixel loop cost to 40% of the original. Early exit optimization can also help:
```glsl
float dist = length(uv - ppos);
if (dist > 0.1) continue; // adjustable: skip particles beyond influence range
```
- Note: The early exit threshold must be tuned based on particle brightness / influence radius; too small causes abrupt particle edge cutoff
### 2. Frame Feedback as Substitute for High Particle Count
- **Technique**: Few particles + frame feedback trails (`prev * 0.95 + current`) visually equals many more particles. Drawing 50 particles per frame + accumulation = visual density far exceeding 50.
- This approach has the additional benefit of producing natural motion blur
- Requires an additional Buffer pass for accumulated frames
### 3. N-body Interaction Complexity
- **Bottleneck**: Each particle interacts with all others = O(N²). Becomes very slow when N > 100.
- **Optimization A**: Only interact with K nearest neighbors (using Voronoi tracking acceleration structure, see "Combining with Voronoi Spatial Acceleration Structure" below).
- **Optimization B**: Divide space into grid cells, only check particles in adjacent cells. Implementing the grid on GPU requires additional Buffer passes to maintain grid data.
### 4. Sub-frame Stepping
- **Problem**: High-speed particles move multiple pixels per frame, leaving discontinuous trajectories.
- **Optimization**: Perform multiple small steps per frame for each particle, accumulating rendering along the way:
```glsl
const int stepsPerFrame = 7; // adjustable
for (int j = 0; j < stepsPerFrame; j++) {
// Render particle contribution at this position
pos += vel * 0.002 * 0.2;
}
col /= float(stepsPerFrame);
```
- More sub-frames produce more continuous trajectories but linearly increase computational cost
- Suitable for firework explosions, high-speed bullet curtains, etc.
### 5. Precision and Numerical Stability
- Velocity and acceleration need clamping to prevent numerical explosion:
```glsl
float v = length(vel);
vel *= v > MAX_VEL ? MAX_VEL / v : 1.0;
```
- Verlet integration is more stable than Euler in constraint solving, especially for cloth and spring networks
- For long-running simulations, be aware of floating-point precision errors accumulating over time
## Combination Suggestions with Complete Code
### Combining with Raymarching Scenes
Particle systems are often embedded in Raymarching scenes (e.g., rain, sparks, dust). Method: During the Raymarching step loop, simultaneously sample particle density/positions and overlay onto scene color. Or render particles to a separate Buffer and blend during final compositing.
### Combining with Noise / Flow Fields
Use Simplex/Perlin noise to generate a velocity field; particles move along the noise gradient:
```glsl
// Use noise to drive particle velocity
vel += hash33(vel + time) * 7.0; // random perturbation
vel = mix(vel, -pos * pow(length(pos), 0.75), 0.5 + 0.5 * sin(time)); // center attraction
```
This combination is suitable for "neural synapse", "smoke flow", and other organic effects.
### Combining with Post-Processing
- **Bloom**: Apply Gaussian blur to particle rendering output and overlay, enhancing the glow.
- **Chromatic Aberration**: Offset-sample R/G/B channels separately, adding a lens effect.
- **Tone Mapping**: Apply Reinhard mapping `col = col / (1.0 + col)` to HDR particle brightness.
### Combining with SDF Shape Rendering
Render particles as specific SDF shapes (fish, water drops, sparks) instead of abstract light points. Method: Rotate local coordinates based on particle velocity direction, then compute SDF distance in that coordinate system:
```glsl
float sdFish(vec2 p, float angle) {
float c = cos(angle), s = sin(angle);
p *= 20.0 * mat2(c, s, -s, c);
return max(min(length(p), length(p - vec2(0.56, 0.0))) - 0.3, -min(length(p - vec2(0.8, 0.0)) - 0.45, length(p + vec2(0.14, 0.0)) - 0.12)) * 0.05;
}
```
### Combining with Voronoi Spatial Acceleration Structure
For large-scale particles (thousands), use Voronoi tracking acceleration structure instead of brute-force traversal. Each pixel maintains the IDs of the 4 nearest particles, updated through neighborhood propagation. This reduces rendering and physics queries from O(N) to O(1) (fixed neighborhood query per pixel). Suitable for fluid simulation and large-scale swarm behavior.

# Path Tracing & Global Illumination - Detailed Reference
This document is a complete reference for [SKILL.md](SKILL.md), covering prerequisite knowledge, step-by-step detailed explanations, mathematical derivations, and advanced usage.
## Prerequisites
- **GLSL basic syntax**: ShaderToy multi-pass (Buffer A/B/Image) architecture
- **Vector math**: Dot product, cross product, reflection/refraction vector computation
- **Probability fundamentals**: PDF (probability density function), Monte Carlo integration, importance sampling
- **Rendering equation** basic form: $L_o = L_e + \int f_r \cdot L_i \cdot \cos\theta \, d\omega$
- **Ray-geometry intersection** methods (spheres, planes, SDF)
## Core Principles in Detail
Path tracing solves the rendering equation via Monte Carlo methods. For each pixel, a ray is emitted from the camera and bounces through the scene. At each bounce:
1. **Intersection**: Find the nearest intersection of the ray with the scene
2. **Shading**: Compute the lighting contribution at the current node based on material type (diffuse/specular/refractive)
3. **Sample next direction**: Generate the next bounce ray according to the BRDF/BSDF
4. **Accumulate**: Add the weighted lighting contributions from all nodes along the path
### Core Mathematics
- **Rendering equation**: $L_o(x, \omega_o) = L_e(x, \omega_o) + \int_\Omega f_r(x, \omega_i, \omega_o) L_i(x, \omega_i) (\omega_i \cdot n) d\omega_i$
- **Monte Carlo estimate**: $L \approx \frac{1}{N} \sum \frac{f_r \cdot L_i \cdot \cos\theta}{p(\omega)}$
- **Schlick Fresnel**: $F = F_0 + (1 - F_0)(1 - \cos\theta)^5$
- **Cosine-weighted sampling PDF**: $p(\omega) = \frac{\cos\theta}{\pi}$
### Key Design
An **iterative loop** replaces recursion, using two variables — `acc` (accumulated radiance) and `mask/throughput` (path attenuation) — to track path contributions. At each bounce, the material color is multiplied into the throughput, and self-emission and direct lighting are added to acc.
## Implementation Steps in Detail
### Step 1: Pseudorandom Number Generator
**What**: Provide a different random number sequence per pixel per frame, driving all Monte Carlo sampling.
**Why**: All random decisions in path tracing (direction sampling, Russian roulette, Fresnel selection) depend on random numbers. The seed must be sufficiently decorrelated between pixels and frames; otherwise structured noise will appear.
**Method 1: sin-hash (simple, good for getting started)**
```glsl
float seed;
float rand() { return fract(sin(seed++) * 43758.5453123); }
// Initialization: seed = iTime + iResolution.y * fragCoord.x / iResolution.x + fragCoord.y / iResolution.y;
```
**Method 2: Integer hash (better quality, recommended)**
```glsl
int iSeed;
int irand() { iSeed = iSeed * 0x343fd + 0x269ec3; return (iSeed >> 16) & 32767; }
float frand() { return float(irand()) / 32767.0; }
void srand(ivec2 p, int frame) {
int n = frame;
n = (n << 13) ^ n; n = n * (n * n * 15731 + 789221) + 1376312589;
n += p.y;
n = (n << 13) ^ n; n = n * (n * n * 15731 + 789221) + 1376312589;
n += p.x;
n = (n << 13) ^ n; n = n * (n * n * 15731 + 789221) + 1376312589;
iSeed = n;
}
```
The sin-hash may produce periodic artifacts on some GPUs (due to inconsistent sin precision across hardware). The integer hash is more reliable and uniform. The Visual Studio LCG (`0x343fd`) is a commonly used linear congruential generator.
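The integer-hash generator ports directly to other languages as long as 32-bit overflow is emulated. A Python sketch (`& 0xFFFFFFFF` stands in for GLSL's wrapping `int` arithmetic; the low 15 bits extracted by `& 32767` are unaffected by the signed-vs-unsigned shift difference):
```python
def make_rng(px, py, frame):
    """Python port of srand()/irand()/frand() above, one generator per pixel."""
    def scramble(n):
        n = ((n << 13) ^ n) & 0xFFFFFFFF
        return (n * (n * n * 15731 + 789221) + 1376312589) & 0xFFFFFFFF
    seed = scramble(frame)
    seed = scramble((seed + py) & 0xFFFFFFFF)
    seed = scramble((seed + px) & 0xFFFFFFFF)
    state = seed
    def frand():
        nonlocal state
        state = (state * 0x343FD + 0x269EC3) & 0xFFFFFFFF  # MSVC-style LCG
        return ((state >> 16) & 32767) / 32767.0
    return frand

f = make_rng(120, 45, 7)
print([round(f(), 3) for _ in range(5)])  # per-pixel, per-frame decorrelated values in [0, 1]
```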
### Step 2: Ray-Scene Intersection
**What**: Given a ray origin and direction, find the nearest intersection along with normal and material information at the intersection point.
**Why**: This is the fundamental operation of path tracing. Either analytic geometry (spheres, planes) or SDF ray marching can be used.
**Analytic sphere intersection (classic smallpt approach)**
```glsl
struct Ray { vec3 o, d; };
struct Sphere { float r; vec3 p, e, c; int refl; };
float intersectSphere(Sphere s, Ray r) {
vec3 op = s.p - r.o;
float b = dot(op, r.d);
float det = b * b - dot(op, op) + s.r * s.r;
if (det < 0.) return 0.;
det = sqrt(det);
float t = b - det;
if (t > 1e-3) return t;
t = b + det;
return t > 1e-3 ? t : 0.;
}
```
Derivation: For ray $r(t) = o + td$ and sphere $|p - c|^2 = R^2$, substituting and writing $op = c - o$, $b = op \cdot d$ gives the quadratic $t^2 - 2bt + (|op|^2 - R^2) = 0$ with discriminant $\Delta = b^2 - |op|^2 + R^2$. The `1e-3` epsilon prevents self-intersection at the surface.
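The derivation maps directly to scalar code; a Python translation of `intersectSphere` (tuple math, names illustrative) confirms the root selection:
```python
import math

def intersect_sphere(o, d, center, radius, eps=1e-3):
    """Nearest positive root of t^2 - 2bt + (|op|^2 - R^2) = 0, with b = op . d."""
    op = tuple(c - oc for c, oc in zip(center, o))
    b = sum(x * y for x, y in zip(op, d))
    det = b * b - sum(x * x for x in op) + radius * radius
    if det < 0.0:
        return 0.0                 # ray misses the sphere
    det = math.sqrt(det)
    if b - det > eps:
        return b - det             # entry point (ray starts outside)
    if b + det > eps:
        return b + det             # exit point (ray starts inside)
    return 0.0

print(intersect_sphere((0, 0, 0), (0, 0, 1), (0, 0, 5), 1.0))  # 4.0: front surface
print(intersect_sphere((0, 0, 5), (0, 0, 1), (0, 0, 5), 1.0))  # 1.0: from inside, exit
```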
**SDF ray marching (for complex geometry)**
```glsl
float map(vec3 p) { /* returns distance to nearest surface */ }
float raymarch(vec3 ro, vec3 rd, float tmax) {
float t = 0.01;
for (int i = 0; i < 256; i++) {
float h = map(ro + rd * t);
if (abs(h) < 0.0001 || t > tmax) break;
t += h;
}
return t;
}
vec3 calcNormal(vec3 p) {
vec2 e = vec2(0.0001, 0.);
return normalize(vec3(
map(p + e.xyy) - map(p - e.xyy),
map(p + e.yxy) - map(p - e.yxy),
map(p + e.yyx) - map(p - e.yyx)));
}
```
The principle of SDF marching: each step safely advances by the "distance to the nearest surface," ensuring no surface is crossed. The step count (128-256) and threshold (0.0001) represent a tradeoff between accuracy and performance.
### Step 3: Cosine-Weighted Hemisphere Sampling
**What**: Generate a random direction distributed according to cosine weighting on the hemisphere above the surface normal, used for diffuse bounces.
**Why**: Cosine-weighted sampling (Malley's method) matches the Lambertian BRDF distribution with PDF $\cos\theta / \pi$, simplifying BRDF/PDF to just the albedo and greatly reducing variance.
With uniform hemisphere sampling (PDF = $1/2\pi$), each bounce would need an extra multiplication by $\cos\theta \cdot 2$, and variance would be higher since many sample directions contribute very little to the integral.
**Method 1: fizzer method (most concise)**
```glsl
vec3 cosineDirection(vec3 nor) {
float u = frand();
float v = frand();
float a = 6.2831853 * v;
float b = 2.0 * u - 1.0;
vec3 dir = vec3(sqrt(1.0 - b * b) * vec2(cos(a), sin(a)), b);
return normalize(nor + dir); // fizzer method
}
```
Principle: Adding a uniformly sampled point on the unit sphere to the normal and normalizing yields cosine-distributed directions. Geometrically, this picks a point on a unit sphere centered at the tip of the normal; directions from the surface point to that offset sphere cover exactly the upper hemisphere, and the area-to-solid-angle mapping introduces precisely the $\cos\theta$ density.
**Method 2: Classic ONB construction (more intuitive)**
```glsl
vec3 cosineDirectionONB(vec3 n) {
vec2 r = vec2(frand(), frand());
vec3 u = normalize(cross(n, vec3(0., 1., 1.)));
vec3 v = cross(u, n);
float ra = sqrt(r.y);
float rx = ra * cos(6.2831853 * r.x);
float ry = ra * sin(6.2831853 * r.x);
float rz = sqrt(1.0 - r.y);
return normalize(rx * u + ry * v + rz * n);
}
```
Principle: First build an orthonormal basis (ONB) with n as the z-axis, then sample in local coordinates using Malley's method: map uniform random numbers onto the unit disk ($r = \sqrt{\xi_2}$, $\phi = 2\pi\xi_1$), with z-component $\sqrt{1 - r^2}$.
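Both constructions should yield the same distribution, and the cosine weighting can be verified by Monte Carlo: under pdf $\cos\theta/\pi$ the expected value of $\cos\theta$ is $2/3$, whereas uniform hemisphere sampling would give $1/2$. A Python sketch of the fizzer method:
```python
import math
import random

def cosine_direction(n):
    """fizzer method: normal + uniform point on the unit sphere, normalized."""
    u, v = random.random(), random.random()
    a = 2.0 * math.pi * v
    b = 2.0 * u - 1.0
    s = math.sqrt(1.0 - b * b)
    d = (s * math.cos(a) + n[0], s * math.sin(a) + n[1], b + n[2])
    inv = 1.0 / math.sqrt(sum(x * x for x in d))
    return tuple(x * inv for x in d)

random.seed(1)
normal = (0.0, 0.0, 1.0)
samples = [cosine_direction(normal) for _ in range(200_000)]
mean_cos = sum(d[2] for d in samples) / len(samples)
print(f"E[cos theta] = {mean_cos:.4f}  (cosine-weighted: 2/3; uniform would be 0.5)")
```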
### Step 4: Material System and BRDF Evaluation
**What**: Based on the material type at the intersection (diffuse, specular, refractive), determine the ray's next direction and energy attenuation.
**Why**: Different materials respond to light completely differently. Diffuse scatters randomly, specular reflects perfectly, and refractive materials follow Snell's law. The Fresnel effect determines the reflection/refraction ratio.
```glsl
#define MAT_DIFFUSE 0
#define MAT_SPECULAR 1
#define MAT_DIELECTRIC 2
```
**Diffuse**:
- New direction = `cosineDirection(normal)`
- `throughput *= albedo`
- Because cosine-weighted sampling is used, BRDF($1/\pi$) * $\cos\theta$ / PDF($\cos\theta/\pi$) = 1, so throughput only needs to be multiplied by albedo
**Specular**:
- New direction = `reflect(rd, normal)`
- `throughput *= albedo`
- A perfect mirror's BRDF is a delta function; only one direction contributes
**Refractive (glass)**:
```glsl
void handleDielectric(inout Ray r, vec3 n, vec3 x, float ior,
vec3 albedo, inout vec3 mask) {
float cosi = dot(n, r.d);
float eta = cosi > 0. ? ior : 1.0 / ior; // Entering/leaving medium
vec3 nl = cosi > 0. ? -n : n; // Outward-facing normal
cosi = abs(cosi);
float cos2t = 1.0 - eta * eta * (1.0 - cosi * cosi);
r = Ray(x, reflect(r.d, n)); // Default to reflection
if (cos2t > 0.) {
vec3 tdir = normalize(r.d / eta + nl * (cosi / eta - sqrt(cos2t)));
// Schlick Fresnel
float R0 = ((ior - 1.) * (ior - 1.)) / ((ior + 1.) * (ior + 1.));
float c = 1.0 - (cosi > 0. ? dot(tdir, n) : cosi);
float Re = R0 + (1.0 - R0) * c * c * c * c * c;
float P = 0.25 + 0.5 * Re;
if (frand() < P) {
mask *= Re / P; // Reflection
} else {
mask *= albedo * (1.0 - Re) / (1.0 - P); // Refraction
r = Ray(x, tdir);
}
}
}
```
Key points:
- **Snell's law**: $n_1 \sin\theta_1 = n_2 \sin\theta_2$; total internal reflection occurs when $\sin\theta_2 > 1$
- **Schlick approximation**: $R(\theta) = R_0 + (1-R_0)(1-\cos\theta)^5$, where $R_0 = ((n_1-n_2)/(n_1+n_2))^2$
- **Russian Roulette selection**: Instead of selecting directly by `Re`, an adjusted probability `P = 0.25 + 0.5 * Re` is used, then compensated through the mask. This avoids the problem of almost always choosing refraction when Re is low
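Plugging glass ($n = 1.5$) into Schlick's approximation gives the familiar numbers — about 4% reflectance head-on, rising to total reflection at grazing incidence. A short Python sketch:
```python
def schlick(cos_theta, n1=1.0, n2=1.5):
    """R(theta) = R0 + (1 - R0) * (1 - cos theta)^5, with R0 = ((n1-n2)/(n1+n2))^2."""
    r0 = ((n1 - n2) / (n1 + n2)) ** 2
    return r0 + (1.0 - r0) * (1.0 - cos_theta) ** 5

print(f"head-on (cos=1):  R = {schlick(1.0):.3f}")   # 0.040
print(f"45 degrees:       R = {schlick(0.7071):.3f}")
print(f"grazing (cos=0):  R = {schlick(0.0):.3f}")   # 1.000
```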
### Step 5: Direct Light Sampling (Next Event Estimation)
**What**: At each diffuse intersection, directly cast a shadow ray toward the light source to compute direct lighting contribution.
**Why**: Purely random paths are unlikely to hit small-area light sources. Directly sampling light sources greatly reduces variance and accelerates convergence.
```glsl
// Solid angle sampling of spherical light source
vec3 directLighting(vec3 x, vec3 n, vec3 albedo,
vec3 lightPos, float lightRadius, vec3 lightEmission,
int selfId, int lightId) {
vec3 l0 = lightPos - x;
float cos_a_max = sqrt(1.0 - clamp(lightRadius * lightRadius / dot(l0, l0), 0., 1.));
float cosa = mix(cos_a_max, 1.0, frand());
float sina = sqrt(1.0 - cosa * cosa);
float phi = 6.2831853 * frand();
// Sample within the cone toward the light source
vec3 w = normalize(l0);
vec3 u = normalize(cross(w.yzx, w));
vec3 v = cross(w, u);
vec3 l = (u * cos(phi) + v * sin(phi)) * sina + w * cosa;
// Shadow test
if (shadowTest(Ray(x, l), selfId, lightId)) {
float omega = 6.2831853 * (1.0 - cos_a_max); // Solid angle
return albedo * lightEmission * clamp(dot(l, n), 0., 1.) * omega / 3.14159265;
}
return vec3(0.);
}
```
Mathematical derivation:
- Solid angle subtended by spherical light at the shading point: $\omega = 2\pi(1 - \cos\alpha_{max})$, where $\cos\alpha_{max} = \sqrt{1 - R^2/d^2}$
- PDF for uniform sampling within the cone: $p = 1/\omega$
- Direct lighting contribution: $L_{direct} = \frac{f_r \cdot L_e \cdot \cos\theta_{light}}{p} = albedo \cdot L_e \cdot \cos\theta \cdot \omega / \pi$
Note: With NEE enabled, indirect bounces that hit the light source should **not** accumulate its emission again (to avoid double-counting). However, in smallpt-style implementations where the light source is large, this double-counting has negligible impact. The strict approach is to skip the indirect hit light emission when NEE is active.
### Step 6: Path Tracing Main Loop
**What**: Combine all the above modules into a complete path tracer.
**Why**: The iterative structure avoids GLSL's lack of recursion support, while the throughput/acc pattern is the standard path tracing implementation paradigm.
```glsl
#define MAX_BOUNCES 8 // Adjustable: max bounce count; more = more accurate indirect lighting
#define ENABLE_NEE true // Adjustable: whether to enable direct light sampling
vec3 pathtrace(Ray r) {
vec3 acc = vec3(0.); // Accumulated radiance
vec3 throughput = vec3(1.); // Path attenuation (throughput)
for (int depth = 0; depth < MAX_BOUNCES; depth++) {
// 1. Intersection
float t;
vec3 n, albedo, emission;
int matType;
if (!intersectScene(r, t, n, albedo, emission, matType))
break; // Shot into the sky
vec3 x = r.o + r.d * t;
vec3 nl = dot(n, r.d) < 0. ? n : -n; // Outward-facing normal
// 2. Accumulate emission
acc += throughput * emission;
// 3. Russian roulette (starting from bounce 3)
if (depth > 2) {
float p = max(throughput.r, max(throughput.g, throughput.b));
if (frand() > p) break;
throughput /= p;
}
// 4. Sample based on material
if (matType == MAT_DIFFUSE) {
// Direct light sampling (NEE)
if (ENABLE_NEE)
acc += throughput * directLighting(x, nl, albedo, ...);
// Indirect bounce
throughput *= albedo;
r = Ray(x + nl * 1e-3, cosineDirection(nl));
} else if (matType == MAT_SPECULAR) {
throughput *= albedo;
r = Ray(x + nl * 1e-3, reflect(r.d, n));
} else if (matType == MAT_DIELECTRIC) {
handleDielectric(r, n, x, 1.5, albedo, throughput);
}
}
return acc;
}
```
Key design points:
- `acc` accumulates the final color, `throughput` records the attenuation from all materials along the path
- Russian roulette maintains **unbiasedness**: termination probability is $1-p$, surviving paths divide throughput by $p$, so the expected value is unchanged
- Normal offset (`x + nl * 1e-3`) prevents self-intersection due to floating-point precision
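Russian roulette's unbiasedness is worth convincing yourself of numerically: terminating each contribution with probability $1-p$ and dividing survivors by $p$ leaves the expected sum unchanged, at the cost of extra variance. A minimal Python sketch:
```python
import random

def rr_sum(values, p_survive):
    """Russian roulette estimator of sum(values)."""
    total = 0.0
    for v in values:
        if random.random() < p_survive:   # survive with probability p
            total += v / p_survive        # compensate: divide by p
    return total

random.seed(3)
values = [0.25] * 400                     # true sum = 100
trials = [rr_sum(values, 0.5) for _ in range(2000)]
mean = sum(trials) / len(trials)
print(f"RR estimate mean = {mean:.2f} (true sum 100.00)")
```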
### Step 7: Progressive Accumulation and Display
**What**: Perform weighted averaging of multi-frame results, progressively converging to a noise-free image. Apply tone mapping and gamma correction for display.
**Why**: A single frame of path tracing is extremely noisy. Through multi-frame accumulation, sample count grows linearly and noise decreases as $1/\sqrt{N}$.
**Buffer A (path tracing + accumulation)**
```glsl
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
srand(ivec2(fragCoord), iFrame);
// ... camera setup, ray generation ...
vec3 color = pathtrace(ray);
// Progressive accumulation
vec4 prev = texelFetch(iChannel0, ivec2(fragCoord), 0);
if (iFrame == 0) prev = vec4(0.);
fragColor = prev + vec4(color, 1.0);
}
```
Accumulation strategy: Store each frame's color and sample count in RGBA (RGB = color accumulation, A = sample count accumulation). Divide by A when displaying to get the average. Clear to zero when `iFrame == 0` to handle ShaderToy's edit reset.
**Image Pass (tone mapping + gamma)**
```glsl
vec3 ACES(vec3 x) {
float a = 2.51, b = 0.03, c = 2.43, d = 0.59, e = 0.14;
return (x * (a * x + b)) / (x * (c * x + d) + e);
}
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
vec4 data = texelFetch(iChannel0, ivec2(fragCoord), 0);
vec3 col = data.rgb / max(data.a, 1.0);
col = ACES(col); // Tone mapping
col = pow(col, vec3(1.0 / 2.2)); // Gamma correction
// Optional: vignette
vec2 uv = fragCoord / iResolution.xy;
col *= 0.5 + 0.5 * pow(16.0 * uv.x * uv.y * (1.0 - uv.x) * (1.0 - uv.y), 0.1);
fragColor = vec4(col, 1.0);
}
```
ACES tone mapping compresses HDR radiance values into the [0,1] LDR range while preserving detail in highlights and shadows. Gamma correction (2.2) converts linear color space to sRGB display space.
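One caveat worth knowing: this rational ACES fit approaches a/c ≈ 1.033 for very large inputs, so it can slightly exceed 1.0; published versions of the fit therefore saturate the result. A scalar C sketch of the same curve (illustrative only, mirroring the GLSL above):

```c
#include <math.h>
#include <assert.h>

/* Scalar version of the rational ACES fit used in the Image pass above. */
static double aces(double x) {
    double a = 2.51, b = 0.03, c = 2.43, d = 0.59, e = 0.14;
    return (x * (a * x + b)) / (x * (c * x + d) + e);
}

/* Saturate to [0, 1], as production versions of the fit do. */
static double clamp01(double x) { return x < 0.0 ? 0.0 : (x > 1.0 ? 1.0 : x); }
```

In GLSL the equivalent is `clamp(ACES(col), 0.0, 1.0)` before gamma correction.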
## Common Variants in Detail
### 1. SDF Scene Path Tracing
**Difference from base version**: Replaces analytic sphere intersection with SDF ray marching, supporting arbitrarily complex geometry (fractals, boolean operations, etc.).
Challenges of SDF path tracing:
- SDF marching is much slower than analytic intersection (each intersection test may take 128+ marching steps)
- Numerical normals (central difference) are needed at each bounce, adding 6 extra `map()` calls
- Self-intersection issues are more severe, requiring larger epsilon offsets
```glsl
float map(vec3 p) {
float d = p.y + 0.5; // Ground
d = min(d, length(p - vec3(0., 0.4, 0.)) - 0.4); // Sphere
return d;
}
float intersectScene(vec3 ro, vec3 rd, float tmax) {
float t = 0.01;
for (int i = 0; i < 128; i++) {
float h = map(ro + rd * t);
if (h < 0.0001 || t > tmax) break;
t += h;
}
return t < tmax ? t : -1.0;
}
// Normal via central difference: calcNormal()
// Materials distinguished by ID returned from map()
```
### 2. Disney BRDF Path Tracing
**Difference from base version**: Replaces simple Lambert + perfect mirror with the Disney principled BRDF, supporting metallic/roughness parameterized PBR materials.
Core components of the Disney BRDF:
- **GGX normal distribution (D)**: Describes the statistical distribution of microsurface normals; higher roughness = wider distribution
- **Smith occlusion function (G)**: Accounts for self-shadowing between microsurfaces
- **Fresnel term (F)**: Schlick approximation; metallic controls F0 (metals: F0 = albedo, dielectrics: F0 = 0.04)
- **VNDF sampling**: Visible Normal Distribution Function sampling, more efficient than traditional GGX sampling
```glsl
struct Material {
vec3 albedo;
float metallic; // 0=dielectric, 1=metal
float roughness; // 0=smooth, 1=rough
};
// GGX normal distribution
float D_GGX(float a2, float NoH) {
float d = NoH * NoH * (a2 - 1.0) + 1.0;
return a2 / (PI * d * d);
}
// Smith occlusion function
float G_Smith(float NoV, float NoL, float a2) {
float g1 = (2.0 * NoV) / (NoV + sqrt(a2 + (1.0 - a2) * NoV * NoV));
float g2 = (2.0 * NoL) / (NoL + sqrt(a2 + (1.0 - a2) * NoL * NoL));
return g1 * g2;
}
// VNDF sampling for importance sampling GGX
vec3 SampleGGXVNDF(vec3 V, float ax, float ay, float r1, float r2) {
vec3 Vh = normalize(vec3(ax * V.x, ay * V.y, V.z));
float lensq = Vh.x * Vh.x + Vh.y * Vh.y;
vec3 T1 = lensq > 0. ? vec3(-Vh.y, Vh.x, 0) * inversesqrt(lensq) : vec3(1, 0, 0);
vec3 T2 = cross(Vh, T1);
float r = sqrt(r1);
float phi = 2.0 * PI * r2;
float t1 = r * cos(phi), t2 = r * sin(phi);
float s = 0.5 * (1.0 + Vh.z);
t2 = (1.0 - s) * sqrt(1.0 - t1 * t1) + s * t2;
vec3 Nh = t1 * T1 + t2 * T2 + sqrt(max(0., 1. - t1*t1 - t2*t2)) * Vh;
return normalize(vec3(ax * Nh.x, ay * Nh.y, max(0., Nh.z)));
}
```
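The Fresnel term (F) is not shown in the block above. A common Schlick-style sketch, here as scalar host-side C for a single channel (the 0.04 dielectric F0 is the usual convention, assumed rather than taken from any particular shader):

```c
#include <math.h>
#include <assert.h>

/* F0 blended by metallic: dielectrics ~0.04, metals use albedo. */
static double f0_from_metallic(double albedo, double metallic) {
    return 0.04 * (1.0 - metallic) + albedo * metallic;
}

/* Schlick approximation: F = F0 + (1 - F0) * (1 - cos(theta))^5 */
static double fresnel_schlick(double cos_theta, double f0) {
    double m = 1.0 - cos_theta;
    return f0 + (1.0 - f0) * m * m * m * m * m;
}
```

At normal incidence F collapses to F0; at grazing angles every material becomes a near-perfect mirror (F → 1), which is why edge highlights appear even on rough dielectrics.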
When using the Disney BRDF in path tracing, the sampling strategy typically is:
- Use metallic as the probability to choose between diffuse and specular
- Diffuse uses cosine-weighted sampling
- Specular uses VNDF sampling for GGX
### 3. Depth of Field
**Difference from base version**: Uses a thin lens model to simulate the bokeh effect of real cameras.
Principle of the thin lens model: All rays passing through the focal point converge to the same point. By randomly offsetting the ray origin within the aperture while keeping the target point on the focal plane unchanged, the depth of field effect can be simulated.
```glsl
#define APERTURE 0.12   // Adjustable: aperture size; larger = stronger bokeh
#define FOCUS_DIST 8.0  // Adjustable: focus distance
vec2 uniformDisk() {    // Uniform sample on the unit disk
    vec2 r = vec2(frand(), frand());
    float a = 6.2831853 * r.x;
    return sqrt(r.y) * vec2(cos(a), sin(a));
}
// In mainImage, after generating the ray (ca = camera basis matrix):
vec3 focalPoint = ro + rd * FOCUS_DIST;
vec3 offset = ca * vec3(uniformDisk() * APERTURE, 0.);
ro += offset;
rd = normalize(focalPoint - ro);
```
Parameter tuning suggestions:
- `APERTURE`: 0.01 (almost no bokeh) to 0.5 (strong bokeh)
- `FOCUS_DIST`: Set to the distance from the camera to the object you want in sharp focus
- Bokeh effects require more samples to converge (since an extra random dimension is added)
### 4. Multiple Importance Sampling (MIS)
**Difference from base version**: Uses both BRDF sampling and light source sampling simultaneously, combining them with the power heuristic, achieving low variance across all scene configurations.
Core idea of MIS: A single sampling strategy may have high variance in certain scene configurations (e.g., NEE performs poorly on glossy surfaces, BRDF sampling performs poorly with small light sources). MIS combines multiple strategies to compensate for each other's weaknesses.
```glsl
// Power heuristic (beta=2)
float misWeight(float pdfA, float pdfB) {
float a2 = pdfA * pdfA;
float b2 = pdfB * pdfB;
return a2 / (a2 + b2);
}
// During shading, compute both:
// 1. BRDF sampled direction -> if it hits a light, weight with misWeight(brdfPdf, lightPdf)
// 2. Light sampled direction -> weight with misWeight(lightPdf, brdfPdf)
// Sum of both replaces the single sampling strategy
```
The power heuristic ($\beta=2$) formula: $w_A = p_A^2 / (p_A^2 + p_B^2)$. Veach proved in his thesis that this is nearly optimal.
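A quick property check of the power heuristic (host-side C mirroring the GLSL `misWeight` above): the weights assigned by the two strategies to the same sample must sum to 1, which is what keeps the combined estimator unbiased.

```c
#include <math.h>
#include <assert.h>

/* Power heuristic (beta = 2), scalar mirror of the GLSL misWeight above. */
static double mis_weight(double pdf_a, double pdf_b) {
    double a2 = pdf_a * pdf_a;
    double b2 = pdf_b * pdf_b;
    return a2 / (a2 + b2);
}
```

The weight smoothly shifts credit toward whichever strategy had the higher pdf for the sampled direction, suppressing the high-variance contributions of the weaker strategy.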
### 5. Volumetric Path Tracing (Participating Media)
**Difference from base version**: Performs random walks inside the medium, simulating translucent/subsurface scattering effects via Beer-Lambert attenuation and scattering events.
Core concepts of volumetric rendering:
- **Extinction coefficient** = absorption + scattering
- **Beer-Lambert law**: Transmittance $T = e^{-\sigma_t \cdot d}$
- **Scattering event**: Scattering occurs with probability $\sigma_s / \sigma_t$ (vs. absorption)
- **Phase function**: Determines the distribution of scattering directions. Uniform sphere sampling = isotropic scattering, Henyey-Greenstein = controllable forward/backward scattering
```glsl
// Beer-Lambert transmittance attenuation
vec3 transmittance = exp(-extinction * distance);
// Random walk scattering
float scatterDist = -log(frand()) / extinctionMajorant;
if (scatterDist < hitDist) {
// Scattering event occurs
pos += ray.d * scatterDist;
// Sample new direction with phase function (e.g., uniform or Henyey-Greenstein)
ray.d = uniformSphereSample();
throughput *= albedo; // scattering / extinction
}
```
Henyey-Greenstein phase function:
- Parameter g in [-1, 1]: g > 0 forward scattering, g < 0 backward scattering, g = 0 isotropic
- $p(\cos\theta) = \frac{1-g^2}{4\pi(1+g^2-2g\cos\theta)^{3/2}}$
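Henyey-Greenstein can be sampled exactly by inverting its CDF in cosθ. A host-side C sketch (the isotropic fallback for |g| ≈ 0 is a standard guard, assumed here); a useful sanity check is that the mean cosine of the sampled distribution equals g:

```c
#include <math.h>
#include <stdint.h>
#include <assert.h>

static uint32_t hg_state = 7u;
static double hg_rand(void) {
    hg_state = hg_state * 1664525u + 1013904223u;
    return hg_state * (1.0 / 4294967296.0);
}

/* Inverse-CDF sample of cos(theta) for the Henyey-Greenstein phase function. */
static double sample_hg_cos(double g, double u) {
    if (fabs(g) < 1e-3) return 1.0 - 2.0 * u;  /* isotropic limit */
    double s = (1.0 - g * g) / (1.0 - g + 2.0 * g * u);
    return (1.0 + g * g - s * s) / (2.0 * g);
}
```

In a shader, the sampled cosθ is combined with a uniform azimuth φ in [0, 2π) to build the new scattering direction around the incoming ray.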
## Performance Optimization Details
### 1. Sampling Strategy
1-4 samples per pixel per frame, relying on inter-frame accumulation for convergence. This maintains real-time frame rates while eventually reaching high quality. For ShaderToy, `SAMPLES_PER_FRAME = 1` or `2` is usually the best choice: more samples per frame lower the frame rate proportionally, so convergence per unit of wall-clock time does not improve.
### 2. Russian Roulette
Starting from bounce 3-4, use the maximum throughput component as the survival probability. This terminates low-energy paths early while maintaining unbiasedness.
```glsl
float p = max(throughput.r, max(throughput.g, throughput.b));
if (frand() > p) break;
throughput /= p;
```
Mathematical guarantee: Termination probability $q = 1 - p$, surviving path throughput multiplied by $1/p$, so the expected value $E[L] = p \cdot L/p + (1-p) \cdot 0 = L$, unbiased.
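This expectation argument is easy to verify numerically. The host-side C sketch below applies one roulette step to a path of throughput 0.35 and confirms the estimator's mean matches the true value (names illustrative):

```c
#include <math.h>
#include <stdint.h>
#include <assert.h>

static uint32_t rr_state = 42u;
static double rr_rand(void) {
    rr_state = rr_state * 1664525u + 1013904223u;
    return rr_state * (1.0 / 4294967296.0);
}

/* One Russian-roulette step on a path of throughput t:
   survive with probability p = t, then compensate by dividing by p. */
static double rr_estimate(double t, int n) {
    double sum = 0.0;
    for (int i = 0; i < n; i++) {
        double p = t;                 /* survival probability */
        if (rr_rand() > p) continue;  /* terminated path contributes 0 */
        sum += t / p;                 /* surviving path, compensated */
    }
    return sum / n;
}
```

The estimator is noisier than never terminating, but its mean is exactly the true throughput, which is the definition of unbiasedness.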
### 3. Direct Light Sampling (NEE)
Always explicitly sample the light source on diffuse surfaces, avoiding dependence on random paths hitting the light. Particularly significant for small-area light sources. When the light source subtends a very small fraction of the hemisphere's solid angle, pure BRDF sampling can almost never hit the light; NEE is essential.
### 4. Avoiding Self-Intersection
Offset the intersection point along the normal direction (epsilon = 1e-3 ~ 1e-4), or record the last-hit object ID and skip self-intersection. Both approaches have tradeoffs:
- Normal offset: Simple and universal, but may penetrate thin objects
- ID skipping: Precise, but not suitable for concave objects (which may need self-intersection)
### 5. Firefly Suppression
Clamp extreme brightness with `min(color, 10.)` to prevent firefly noise spots. ACES tone mapping also helps compress high dynamic range. The root cause of fireflies is that certain paths find high-energy but low-probability light transport paths, resulting in extremely large Monte Carlo estimate values.
### 6. SDF Scene Optimization
- Limit the maximum marching steps (128-256); treat exceeding the limit as a miss
- Set a reasonable maximum trace distance (tmax) to cull distant objects
- Use larger epsilon during bounces (SDF numerical precision is typically worse than analytic geometry)
- "Relaxed sphere tracing" can be used to increase step size when safe
### 7. High-Quality PRNG
Use integer hashes (such as Visual Studio LCG or Wang hash) instead of sin-hash to avoid periodic artifacts on some GPUs. The problem with sin-hash is that sin precision differs across GPUs (some use only mediump), which can produce visible structured noise.
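A minimal Wang-hash sketch in C (the constants are the widely published ones; mapping to a float via the low 24 bits is one common convention, assumed here):

```c
#include <stdint.h>
#include <assert.h>

/* Wang hash: pure integer scrambling, so results are bit-exact across GPUs,
   unlike sin-based hashes whose precision varies by hardware. */
static uint32_t wang_hash(uint32_t seed) {
    seed = (seed ^ 61u) ^ (seed >> 16);
    seed *= 9u;
    seed ^= seed >> 4;
    seed *= 0x27d4eb2du;
    seed ^= seed >> 15;
    return seed;
}

/* Map the hash to a float in [0, 1) using the low 24 bits. */
static float wang_to_float(uint32_t h) {
    return (h & 0x00FFFFFFu) / 16777216.0f;
}
```

In a shader the seed is typically built from pixel coordinates and `iFrame`, e.g. `wang_hash(px + py * 4096u + uint(iFrame) * 65536u)`, then re-hashed for each new random number.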
## Combination Suggestions in Detail
### 1. Path Tracing + SDF Modeling
Use SDF to define complex scene geometry (fractals, smooth boolean operations) while path tracing handles lighting computation. This is the most common combination on ShaderToy. SDF's advantage is the ability to easily create shapes difficult to express with traditional meshes (Mandelbulb, Menger sponge, etc.), while path tracing provides physically accurate lighting for these complex geometries.
### 2. Path Tracing + Environment Maps
Use an HDR cubemap as an infinitely distant environment light source. When a path shoots into the sky, sample the environment map for incident radiance. Can be combined with atmospheric scattering models for a more physically accurate sky.
```glsl
// When path misses the scene:
if (!hit) {
acc += throughput * texture(iChannel1, rd).rgb; // HDR environment map
break;
}
```
### 3. Path Tracing + PBR Materials
The Disney BRDF/BSDF provides metallic/roughness parameterized material models, combined with GGX microsurface distribution and VNDF importance sampling for production-quality results. In ShaderToy, material parameters can be generated procedurally (based on position, noise, etc.).
### 4. Path Tracing + Volumetric Rendering
Add participating media to the path tracing framework, using Beer-Lambert law for transmittance and random walks for scattering, to achieve clouds, smoke, subsurface scattering, and other effects.
```glsl
// Add volume check in the path tracing loop:
if (insideVolume) {
float scatterDist = -log(frand()) / sigma_t;
if (scatterDist < surfaceDist) {
// Volume scattering event
x = r.o + r.d * scatterDist;
r.d = samplePhaseFunction(r.d, g);
throughput *= sigma_s / sigma_t; // albedo
continue;
}
}
```
### 5. Path Tracing + Spectral Rendering
Each path samples a single wavelength instead of RGB, using Sellmeier/Cauchy equations to compute wavelength-dependent index of refraction, and finally converts to sRGB through CIE XYZ color matching functions. This correctly simulates dispersion and rainbow caustics.
Basic spectral rendering workflow:
1. Each path randomly selects a wavelength λ in [380, 780] nm
2. Compute the index of refraction for that wavelength using the Sellmeier equation: $n^2 = 1 + \sum B_i \lambda^2 / (\lambda^2 - C_i)$
3. All color computations in path tracing become single-channel (spectral power at that wavelength)
4. Finally convert spectral radiance to XYZ via CIE XYZ color matching functions, then to sRGB
### 6. Path Tracing + Temporal Accumulation / TAA
Leverage ShaderToy's inter-frame buffer feedback mechanism for progressive rendering. Can be further extended to temporal reprojection, reusing historical frame data during camera movement to accelerate convergence.
Basic temporal reprojection:
1. Store the previous frame's camera matrix
2. Reproject the current pixel into the previous frame's screen space
3. If the position is valid and geometrically consistent, blend the historical frame with the current frame
4. Otherwise discard historical data and restart accumulation

# Polar Coordinates & UV Manipulation — Detailed Reference
> This document is a detailed supplement to [SKILL.md](SKILL.md), covering prerequisites, step-by-step explanations, variant details, in-depth performance analysis, and complete combination code examples.
## Prerequisites
### GLSL Fundamentals
- **uniform / varying**: Global variable passing mechanisms
- **Built-in functions**: `sin`, `cos`, `atan`, `length`, `fract`, `mod`, `smoothstep`, `mix`, `clamp`, `pow`, `exp`, `log`, `abs`, `max`, `min`, `floor`, `ceil`, `dot`
- **Vector types**: `vec2`, `vec3`, `vec4`, with swizzle support (e.g., `.xy`, `.rgb`)
- **Matrix types**: `mat2` for 2D rotation
### Vector Math
- 2D vector operations: addition, subtraction, multiplication, division, length (`length`), normalization (`normalize`)
- Dot product (`dot`): projection and angle relationships
- 2D rotation matrix:
```glsl
mat2 rotate(float a) {
float c = cos(a), s = sin(a);
return mat2(c, s, -s, c);
}
```
### Coordinate Systems
- Cartesian coordinates (x, y): standard rectangular coordinate system
- Screen coordinates: bottom-left (0,0), top-right (iResolution.x, iResolution.y)
- Normalized coordinates: typically mapped to [-1, 1] or [0, 1] range
### ShaderToy Framework
- `mainImage(out vec4 fragColor, in vec2 fragCoord)`: entry function
- `fragCoord`: current pixel's screen coordinates
- `iResolution`: viewport resolution (pixels)
- `iTime`: time since launch (seconds)
- `iMouse`: mouse position
## Implementation Steps
### Step 1: UV Normalization and Centering
**What**: Convert screen pixel coordinates to normalized coordinates centered at the screen center with uniform scaling.
**Why**: All subsequent polar coordinate operations depend on a correct center point and uniform scale. Without this step, effects would be offset or stretched.
**Three approaches compared**:
| Approach | Range | Use Case |
|----------|-------|----------|
| `/ min(iResolution.x, iResolution.y)` | [-1, 1] square region | Most universal, ensures circles stay circular |
| `/ iResolution.y` | [-aspect, aspect] × [-1, 1] | When full screen width is needed |
| Pixel quantization | Depends on PIXEL_FILTER | Pixelated/retro style |
```glsl
// Approach 1: range [-1, 1], most common
vec2 uv = (2.0 * fragCoord - iResolution.xy) / min(iResolution.x, iResolution.y);
// Approach 2: range [-aspect, aspect] x [-1, 1]
vec2 uv = (2.0 * fragCoord - iResolution.xy) / iResolution.y;
// Approach 3: precise pixel size control (pixelated/retro style)
float pixel_size = length(iResolution.xy) / PIXEL_FILTER; // PIXEL_FILTER adjustable: pixelation level
vec2 uv = (floor(fragCoord * (1.0/pixel_size)) * pixel_size - 0.5*iResolution.xy) / length(iResolution.xy);
```
### Step 2: Cartesian to Polar Coordinate Transform
**What**: Convert (x, y) coordinates to (r, θ) polar coordinates.
**Why**: This is the fundamental transform of the entire paradigm, mapping the linear xy space to a radial space centered at the origin. In polar coordinates:
- A circle is simply r = constant
- A ray is simply θ = constant
- This makes creating ring/spiral/radial effects very straightforward
**About the `atan` function**:
- `atan(y, x)` (two-argument version) is equivalent to atan2 in math, returning [-π, π]
- `atan(y/x)` (single-argument version) only returns [-π/2, π/2], losing quadrant information
- Always use the two-argument version
```glsl
// Basic transform
float r = length(uv); // Radius
float theta = atan(uv.y, uv.x); // Angle, range [-PI, PI]
// Wrapped as a reusable function
vec2 toPolar(vec2 p) {
return vec2(length(p), atan(p.y, p.x));
}
// Normalize angle to [0, 1] range
vec2 polar = vec2(atan(uv.y, uv.x) / 6.283 + 0.5, length(uv));
// polar.x in [0,1], polar.y is radius
```
### Step 3: Operations in Polar Coordinate Space
**What**: Perform various transforms in (r, θ) space to create effects.
**Why**: The unique property of polar coordinate space is that rotation, spirals, radial repetition, and other effects that are extremely difficult in Cartesian coordinates become simple addition, subtraction, and multiplication operations here.
#### 3a. Radial Distortion (Swirl) — Angle Offset by Radius
**Principle**: `θ_new = θ - k × r` causes points farther from the center to rotate more, naturally forming a vortex. `k` controls how "tight" the vortex is.
```glsl
// Greater radius = more rotation → vortex effect
float spin_amount = 0.25; // Adjustable: vortex strength, 0=no rotation, 1=maximum rotation
float new_theta = theta - spin_amount * 20.0 * r;
```
#### 3b. Angular Twist — Angle Plus Time Offset
**Principle**: Adding functions of time and the angle itself to the angle produces distorted rings that change over time. The `sin(theta)` term makes the distortion non-uniform, creating an organic feel.
```glsl
// Angle varies with time and position → twisted rings
float twist_angle = theta + 2.0 * iTime + sin(theta) * sin(iTime) * 3.14159;
```
#### 3c. Archimedean Spiral — Radius Minus Angle
**Principle**: The Archimedean spiral r = a + bθ has the property of equal spacing. In UV space, `y -= x` (i.e., r -= θ) "unfolds" concentric rings into equally-spaced spiral bands.
```glsl
// Unfold into spiral bands
vec2 spiral_uv = vec2(theta_normalized, r);
spiral_uv.y -= spiral_uv.x; // Key: "unfold" radial space into spirals
```
#### 3d. Logarithmic Spiral — Angle Plus log(r) Shear
**Principle**: The logarithmic spiral (equiangular spiral) r = ae^(bθ) has the property of self-similarity — it looks exactly the same when magnified. The `log(r)` shear makes rotation amount grow logarithmically at different radii, commonly seen in nature (nautilus shells, galaxy arms).
```glsl
// Logarithmic spiral stretch
float shear = 2.0 * log(r); // Adjustable: coefficient controls spiral tightness
float c = cos(shear), s = sin(shear);
mat2 spiral_mat = mat2(c, -s, s, c); // Rotation matrix implements shear
```
#### 3e. Kaleidoscope — Angle Modulo and Mirroring
**Principle**: Divides the 2π angular range into N equal sectors, then maps all pixels to a single sector. Mirroring makes adjacent sectors symmetric, avoiding seams.
**Mathematical Derivation**:
1. `sector = 2π / N`: Angular width of each sector
2. `c_idx = floor((θ + sector/2) / sector)`: Current sector index
3. `θ' = mod(θ + sector/2, sector) - sector/2`: Fold to [-sector/2, sector/2]
4. `θ' *= (2 × (c_idx mod 2) - 1)`: Flip odd sectors
```glsl
// Angular subdivision + mirroring for kaleidoscope
float rep = 12.0; // Adjustable: number of symmetry axes
float sector = TAU / rep; // Angle per sector
float a = polar.y; // Angle component
// Modulo to single sector
float c_idx = floor((a + sector * 0.5) / sector);
a = mod(a + sector * 0.5, sector) - sector * 0.5;
// Mirror: flip adjacent sectors
a *= mod(c_idx, 2.0) * 2.0 - 1.0;
```
#### 3f. Spiral Arm Compression — Periodic Modulation in Angular Domain
**Principle**: Galaxy spiral arms are not simple lines but regions of higher matter density. `cos(N × (θ - shear))` creates periodic compression in the angular domain, causing matter (color/brightness) to accumulate along N arms. The `COMPR` parameter controls arm "sharpness".
**Density Compensation**: Compression changes local density (like an accordion effect); `arm_density` compensates for this non-uniformity, preventing the arms from being too bright or too dark.
```glsl
// Galaxy spiral arm effect
float NB_ARMS = 5.0; // Adjustable: number of spiral arms
float COMPR = 0.1; // Adjustable: intra-arm compression strength
float phase = NB_ARMS * (theta - shear);
theta = theta - COMPR * cos(phase); // Compress angular domain to form arm structures
float arm_density = 1.0 + NB_ARMS * COMPR * sin(phase); // Density compensation
```
### Step 4: Polar to Cartesian Reconstruction (Round Trip)
**What**: Convert modified polar coordinates back to Cartesian coordinates.
**Why**: Some effects need to transform in polar space and then return to xy space for further processing (e.g., overlaying texture noise, Truchet patterns, etc.). This forms the complete Cartesian→Polar→Cartesian "round trip".
**Notes**:
- After inverse transform, the coordinate origin may need adjustment (e.g., a `mid` offset to screen center)
- If you only need to color in polar space (e.g., ring gradients), no inverse transform is needed
```glsl
// Basic inverse transform
vec2 new_uv = vec2(r * cos(new_theta), r * sin(new_theta));
// Wrapped as a reusable function
vec2 toRect(vec2 p) {
return vec2(p.x * cos(p.y), p.x * sin(p.y));
}
// Complete round trip: offset to screen center after transform
vec2 mid = (iResolution.xy / length(iResolution.xy)) / 2.0;
vec2 warped_uv = vec2(
r * cos(new_theta) + mid.x,
r * sin(new_theta) + mid.y
) - mid;
```
### Step 5: Polar Coordinate Shape Definition (SDF)
**What**: Define signed distance fields of shapes via r(θ) functions in polar coordinates.
**Why**: Many classic curves (cardioid, rose curves, star shapes) have elegant analytical expressions in polar coordinates that would be extremely complex in Cartesian coordinates.
**Advantages of SDF**:
- Negative value = inside, positive value = outside, zero = boundary
- Convenient boolean operations (`max` = intersection, `min` = union)
- `smoothstep` directly produces anti-aliased edges
- `abs(d)` produces outlines, `1/abs(d)` produces glow
```glsl
// Cardioid
float a = atan(p.x, p.y) / 3.141593; // Note: atan(x,y) not atan(y,x), so heart points up
float h = abs(a);
float heart_r = (13.0*h - 22.0*h*h + 10.0*h*h*h) / (6.0 - 5.0*h);
float dist = r - heart_r; // Negative = inside, positive = outside
// Rose curve / petals
float PETAL_FREQ = 3.0; // Adjustable: petal frequency (K.x/K.y controls integer/fractional petals)
float A_coeff = 0.2; // Adjustable: petal amplitude
float rose_dist = abs(r - A_coeff * sin(PETAL_FREQ * theta) - 0.5); // Distance to curve
// Render SDF as visible shape
float shape = smoothstep(0.01, -0.01, dist); // Anti-aliased edge
```
### Step 6: Coloring and Anti-Aliasing
**What**: Color based on polar coordinate information and handle edge anti-aliasing.
**Why**: Polar coordinate coloring naturally produces radial gradients and ring patterns. Anti-aliasing is especially important in polar coordinates because pixel density varies significantly away from the center due to angular subdivision.
**Anti-aliasing method comparison**:
| Method | Pros | Cons |
|--------|------|------|
| `fwidth` | Adaptive, precise | Requires GPU derivative support |
| Fixed resolution width | Simple, reliable | Not adaptive to scaling |
| `smoothstep` + fixed offset | Simplest | Average results |
```glsl
// Adaptive anti-aliasing based on fwidth
float aa = smoothstep(-1.0, 1.0, value / fwidth(value));
// Resolution-based anti-aliasing
float aa_size = 2.0 / iResolution.y;
float edge = smoothstep(0.5 - aa_size, 0.5 + aa_size, value);
// General SDF anti-aliasing using smoothstep
float d = some_sdf_value;
float col = smoothstep(aa_size, -aa_size, d); // aa_size ≈ 1~3 pixels
// Radial gradient coloring
vec3 color = vec3(1.0, 0.4 * r, 0.3); // Color varies with radius
color *= 1.0 - 0.4 * r; // Darken at edges
// Inter-spiral-band anti-aliasing
float inter_spiral_aa = 1.0 - pow(abs(2.0 * fract(spiral_uv.y) - 1.0), 10.0);
```
## Variant Details
### Variant 1: Dynamic Vortex/Swirl Background
**Difference from basic version**: Complete Cartesian→Polar→Cartesian round trip + iterative domain warping to generate complex textures.
**Technical Points**:
1. First apply vortex distortion in polar coordinates
2. Convert back to Cartesian coordinates
3. Perform 5 iterations of domain warping in the transformed space, each iteration nonlinearly offsetting coordinates
4. The iterative sin/cos combination produces complex organic textures
**Parameter Descriptions**:
- `SPIN_AMOUNT`: Vortex strength, controls polar distortion magnitude
- `SPIN_EASE`: Vortex easing, makes rotation speed differ between center and edges
- `speed`: Animation speed, driven by `iTime`
```glsl
// Polar coordinate vortex transform
float new_angle = atan(uv.y, uv.x) + speed
- SPIN_EASE * 20.0 * (SPIN_AMOUNT * uv_len + (1.0 - SPIN_AMOUNT));
vec2 mid = (screenSize.xy / length(screenSize.xy)) / 2.0;
uv = vec2(uv_len * cos(new_angle) + mid.x,
uv_len * sin(new_angle) + mid.y) - mid;
// Iterative domain warping for organic textures
uv *= 30.0;
vec2 uv2 = uv; // Auxiliary accumulator; declare before the loop (exact init varies by shader)
for (int i = 0; i < 5; i++) {
uv2 += sin(max(uv.x, uv.y)) + uv;
uv += 0.5 * vec2(cos(5.1123 + 0.353*uv2.y + speed*0.131),
sin(uv2.x - 0.113*speed));
uv -= cos(uv.x + uv.y) - sin(uv.x*0.711 - uv.y);
}
```
### Variant 2: Polar Torus Twist
**Difference from basic version**: Renders geometry directly in polar coordinate space (without returning to Cartesian), simulating a 3D torus through angular slicing.
**Technical Points**:
1. Offset the r dimension to the ring's centerline (`r -= OUT_RADIUS`) to center the ring region
2. "Slice" along the ring in the angular dimension, with each slice being one edge of a regular polygon
3. The `twist` variable makes the polygon twist along the ring, producing a Möbius strip-like effect
4. The `sin(uvr.y)*sin(iTime)` term varies the twist speed with angle, creating organic squeezing/stretching
```glsl
// Geometric slicing in polar coordinates
vec2 uvr = vec2(length(uv), atan(uv.y, uv.x) + PI);
uvr.x -= OUT_RADIUS; // Offset to ring centerline
float twist = uvr.y + 2.0*iTime + sin(uvr.y)*sin(iTime)*PI;
for (int i = 0; i < NUM_FACES; i++) {
float x0 = IN_RADIUS * sin(twist + TAU * float(i) / float(NUM_FACES));
float x1 = IN_RADIUS * sin(twist + TAU * float(i+1) / float(NUM_FACES));
// Define face start/end positions in the polar r direction
vec4 face = slice(x0, x1, uvr);
col = mix(col, face.rgb, face.a);
}
```
### Variant 3: Galaxy / Logarithmic Spiral (Galaxy Style)
**Difference from basic version**: Uses `log(r)` for equiangular spirals, combined with FBM noise and spiral arm compression.
**Technical Points**:
1. The `log(r)` shear is the core — it maps concentric circles to logarithmic spirals
2. Rotation matrix R rotates the noise sampling coordinates by the shear angle, aligning noise along the spiral arms
3. `NB_ARMS` and `COMPR` control the number and sharpness of arms
4. FBM noise is sampled in the rotated space, producing galactic dust texture
```glsl
float rho = length(uv);
float ang = atan(uv.y, uv.x);
float shear = 2.0 * log(rho); // Logarithmic spiral core
mat2 R = mat2(cos(shear), -sin(shear), sin(shear), cos(shear));
// Spiral arms
float phase = NB_ARMS * (ang - shear);
ang = ang - COMPR * cos(phase) + SPEED * t; // Inter-arm compression
uv = rho * vec2(cos(ang), sin(ang)); // Reconstruct Cartesian
float gaz = fbm_noise(0.09 * R * uv); // Sample noise in spiral space
```
### Variant 4: Archimedean Spiral Band + Vortices
**Difference from basic version**: Unfolds polar coordinates into spiral bands, creates independent vortex animations within bands, with arc-length parameterization.
**Technical Points**:
1. `U.y -= U.x` is the core of Archimedean unfolding — converts concentric rings to equally-spaced spiral bands
2. Arc-length parameterization `arc_length()` ensures uniform cell area within the spiral band
3. Each cell uses `dot` + `cos` to create a small vortex, strong at center, weak at edges
4. `cell_id.x` gives different cells different vortex phases, avoiding monotonous repetition
```glsl
U = vec2(atan(U.y, U.x)/TAU + 0.5, length(U)); // Reuse U in place (a self-referencing declaration is invalid GLSL)
U.y -= U.x; // Archimedean unfolding
U.x = arc_length(ceil(U.y) + U.x) - iTime; // Arc-length parameterization
// Vortex within each cell of the spiral band
vec2 cell_uv = fract(U) - 0.5;
float vortex = dot(cell_uv,
cos(vec2(-33.0, 0.0) // Rotation matrix angle offset
+ 0.3 * (iTime + cell_id.x) // Time + spatial rotation amount
* max(0.0, 0.5 - length(cell_uv)))); // Strong at center, weak at edges
```
### Variant 5: Complex Number / Polar Duality (Jeweled Vortex Style)
**Difference from basic version**: Uses complex number operations (multiplication = rotation + scaling, power = spiral mapping) instead of explicit trigonometric functions to implement conformal mappings.
**Technical Points**:
1. Complex power `z^(1/e)` is equivalent to `(r^(1/e), θ/e)` in polar coordinates — simultaneously scaling radius and compressing angle
2. `exp(log(length(u)) / e)` implements `r^(1/e)` without explicitly computing the power
3. `ceil(r - a/TAU)` produces spiral contour lines — corresponding to different sheets of the Riemann surface in the complex plane
4. Multi-layered `sin`/`cos` combinations produce jewel-like interference colors
```glsl
float e = n * 2.0; // Complex power exponent, controls spiral curvature
float a = atan(u.y, u.x) - PI/2.0; // Angle
float r = exp(log(length(u)) / e); // r^(1/e) — complex root
float sc = ceil(r - a/TAU); // Spiral contour lines
float s = pow(sc + a/TAU, 2.0); // Spiral gradient
// Multi-layer spiral compositing
col += sin(cr + s/n * TAU / 2.0); // Spiral color layer 1
col *= cos(cr + s/n * TAU); // Spiral color layer 2
col *= pow(abs(sin((r - a/TAU) * PI)), abs(e) + 5.0); // Smooth edges
```
## In-Depth Performance Analysis
### 1. Avoiding Numerical Issues at the Pole
`atan(0,0)` is undefined, and quantities derived from `r` (such as `1.0/r`, `log(r)`, or `normalize(uv)`) become numerically unstable near the origin. While GLSL's `atan` won't crash at the origin, its return value there is undefined and may cause flickering.
```glsl
// Safe polar coordinate conversion
float r = max(length(uv), 1e-6); // Avoid division by zero
float theta = atan(uv.y, uv.x); // atan2 is not well-defined at origin but won't crash
```
**When needed**: Protection is required when subsequent calculations include `1.0/r`, `log(r)`, or `normalize(uv)`. If only `r * something`, r=0 at the origin is naturally safe.
### 2. Trigonometric Function Optimization
Frequent sin/cos calls are the main cost of polar coordinate shaders. Although GPU sin/cos is hardware-accelerated, heavy use in loops can still become a bottleneck.
```glsl
// If both sin and cos are needed, replace with a single matrix multiplication
mat2 ROT(float a) { float c=cos(a), s=sin(a); return mat2(c,s,-s,c); }
vec2 rotated = ROT(angle) * uv; // Cleaner than computing sin, cos separately and manually constructing
// Use vector dot product instead of explicit trig
// Instead of U.y = cos(rot)*U.x + sin(rot)*U.y
// Use U.y = dot(U, cos(vec2(-33,0) + angle))
```
**Principle**: `cos(vec2(a, b))` in GLSL is a single SIMD instruction that computes two cos values simultaneously. Combined with `dot`, rotation can be achieved with only one `cos` call (leveraging the identity `cos(x - π/2) = sin(x)`).
### 3. Leveraging Kaleidoscope Symmetry
A kaleidoscope inherently reduces computation by a factor of N (N = number of symmetry segments), serving as a natural optimization. All expensive pattern calculations are done in just one sector:
```glsl
// Do kaleidoscope folding first, then expensive pattern computation
vec2 kp = kaleidoscope(polar, segments); // Cheap
vec2 rect = toRect(kp);
// All subsequent computation only applies to one sector
float expensive_pattern = some_costly_function(rect); // Same cost but N× visual complexity
```
**Note**: The cost of kaleidoscope folding itself (a few `floor`, `mod`, and multiplication operations) is far less than the visual complexity it "saves". A 12-segment kaleidoscope means you get 12x visual richness for 1/12 the pattern computation cost.
### 4. Loop Optimization in Spiral Bands
For effects like rose curves that require multi-loop computation, keep loop counts reasonable:
```glsl
// Rose curves only need ceil(K.y) iterations (for r = cos(p/q * θ), q turns close the curve)
for (int i = 0; i < 7; i++) { // 7 iterations cover most fractional frequencies
    v = max(v, ribbon_value);
    a += 6.28; // Advance one full turn (2π)
}
// Don't use excessively large iteration counts; 4~8 suffice for most cases
```
**Why 4~8 loops**: The rose curve r = cos(p/q × θ) has a period of q loops (when p/q is fractional). For most practical petal frequencies, 7 loops provide full coverage. Excessive loops not only waste computation but may also produce artifacts from floating-point accumulation errors.
### 5. Pixel Filter Downsampling
For stylized effects, downsampling can dramatically reduce computation:
```glsl
float pixel_size = length(iResolution.xy) / 745.0; // Adjustable: smaller = more pixelated
vec2 uv = floor(fragCoord / pixel_size) * pixel_size; // Quantize coordinates
// All subsequent computation uses quantized uv, adjacent pixels share results
```
**Performance caveat**: If pixel_size makes each "virtual pixel" cover 4×4 actual pixels, only 1/16 of the computed values are unique — but every pixel still executes the shader, so the saving comes from cache-friendly repeated texture fetches rather than skipped work. For real savings, render the pattern to a 1/16-size buffer and upsample in a final pass.
## Complete Combination Code Examples
### Polar Coordinates + FBM Noise
Sample FBM noise in polar coordinate space to produce organic spiral textures (galactic dust, flame vortices):
```glsl
vec2 polar_uv = rho * vec2(cos(modified_ang), sin(modified_ang));
float organic = fbm(polar_uv * frequency); // Sample in transformed space
```
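The snippet assumes an `fbm` function is already defined. A minimal value-noise FBM, sufficient for the effect above (the hash constants are conventional Shadertoy values, not anything prescribed by this guide):

```glsl
float hash21(vec2 p) { return fract(sin(dot(p, vec2(127.1, 311.7))) * 43758.5453); }

float vnoise(vec2 p) {
    vec2 i = floor(p), f = fract(p);
    vec2 u = f * f * (3.0 - 2.0 * f);                   // Hermite interpolation
    return mix(mix(hash21(i),              hash21(i + vec2(1, 0)), u.x),
               mix(hash21(i + vec2(0, 1)), hash21(i + vec2(1, 1)), u.x), u.y);
}

float fbm(vec2 p) {
    float v = 0.0, amp = 0.5;
    for (int i = 0; i < 5; i++) {                       // 5 octaves
        v += amp * vnoise(p);
        p *= 2.0; amp *= 0.5;                           // lacunarity 2, gain 0.5
    }
    return v;
}
```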
### Polar Coordinates + Truchet Patterns
Lay Truchet tiles in kaleidoscope-folded space to produce kaleidoscopic geometric tunnel effects. The kaleidoscope provides symmetry; Truchet provides detail patterns.
```glsl
// Kaleidoscope folding
vec2 kp = kaleidoscope(polar, segments);
vec2 rect = toRect(kp);
// Truchet grid
rect *= 4.0;
vec2 cell_id = floor(rect + 0.5);
vec2 cell_uv = fract(rect + 0.5) - 0.5;
float cell_hash = fract(sin(dot(cell_id, vec2(127.1, 311.7))) * 43758.5453);
// Arc Truchet
float d = length(cell_uv);
float truchet = abs(d - 0.35);
if (cell_hash > 0.5) {
truchet = min(truchet, abs(length(cell_uv - 0.5) - 0.5));
} else {
truchet = min(truchet, abs(length(cell_uv + 0.5) - 0.5));
}
```
### Polar Coordinates + SDF Shapes
Define shape contours with polar equations r(θ), combined with SDF techniques for boolean operations, rounded corners, and glow:
```glsl
float heart_sdf = r - heart_r_theta;
float glow = 0.02 / abs(heart_sdf); // Glow effect
float solid = smoothstep(0.01, -0.01, heart_sdf); // Solid fill
```
### Polar Coordinates + Checkerboard/Grid
Lay a checkerboard pattern in polar coordinate space, naturally forming ring/spiral checkerboards:
```glsl
// Create checkerboard in polar UV (polar = vec2(radius, angle))
float checker = sign(sin(polar.x * PI * 4.0) * cos(polar.y * 16.0));
col *= checker * (1.0/16.0) + 0.7; // Low-contrast checkerboard texture
```
### Polar Coordinates + Post-Processing
Polar coordinate effects combined with gamma correction, vignette, and color mapping can greatly enhance visual quality:
```glsl
col = pow(col, vec3(1.0/2.2)); // Gamma
col = col*0.6 + 0.4*col*col*(3.0-2.0*col); // Contrast enhancement
col *= 0.5 + 0.5*pow(19.0*q.x*q.y*(1.0-q.x)*(1.0-q.y), 0.7); // Vignette
```

# Post-Processing Effects Detailed Reference
This file is a complete supplement to [SKILL.md](SKILL.md), covering prerequisites, detailed explanations of each step (what and why), variant details, in-depth performance optimization analysis, and complete combination suggestions.
## Prerequisites
- GLSL fundamentals and the ShaderToy environment (iResolution, iTime, iChannel, textureLod, etc.)
- Basic vector and matrix operations
- Difference between linear color space and gamma correction
- Texture sampling and UV coordinate systems
- Basic concepts of convolution (kernel, weights, normalization)
- Multi-pass rendering concepts (Buffer A/B/C/D and Image pass in ShaderToy)
## Applicable Scenarios
Use this technique when you have completed the primary rendering of a scene and need screen-space image enhancement on the result. Typical applications include:
- **HDR to LDR Conversion**: After using linear HDR lighting in a scene, tone mapping is needed to compress values into the displayable range
- **Atmosphere Enhancement**: Effects like vignette, color grading, and film grain to enhance a cinematic look
- **Glow and Bloom**: Simulating lens bloom to produce soft light diffusion around bright areas
- **Motion and Defocus Blur**: Simulating physical camera characteristics through motion blur and depth of field
- **Anti-Aliasing**: Post-processing AA solutions such as FXAA and TAA
- **Chromatic Aberration and Lens Effects**: Optical simulations like chromatic aberration and lens flare
## Core Principles
### Tone Mapping
Maps HDR linear color values from [0, ∞) to the LDR display range [0, 1]. Core mathematical models:
- **Reinhard**: `color = color / (1.0 + color)`, a simple S-curve compression
- **Extended (Filmic) Reinhard**: `x·(1 + x/W²) / (1 + x)`, where the white point W is the input value that maps to 1.0; filmic variants add a shoulder parameter T to soften highlight roll-off
- **ACES**: Industry standard, converts colors to the ACES color space via a 3×3 matrix, then applies a rational polynomial `(ax+b)/(cx+d)+e` for nonlinear mapping
- **General Rational Polynomial**: `(a·x²+b·x) / (c·x²+d·x+e)`, can fit various tone curves
### Gaussian Blur
2D Gaussian kernel `G(x,y) = exp(-(x²+y²)/(2σ²))`. Due to separability, it can be split into two 1D passes (horizontal + vertical), reducing per-pixel samples from n² to 2n.
### Bloom
Extracts bright pixels (bright-pass threshold), then applies multi-level Gaussian blur and adds the result back to the original image. Multi-octave approach: progressively downsample + blur, then progressively composite, producing bloom layers from narrow to wide.
### Vignette
Attenuates brightness based on the pixel's distance to the screen center. Common formulas:
- **Multiplicative**: Power of `16·u·v·(1-u)·(1-v)`
- **Radial**: `1 - pow(dist * scale, exponent)` mixed with strength
### Chromatic Aberration
Simulates the difference in lens refraction for different wavelengths. Samples the same texture with different scale factors for R/G/B channels, with offset increasing from center to edges.
## Implementation Steps
### Step 1: Tone Mapping — Map HDR to Displayable Range
**What**: Compress HDR linear color values from the render output into the [0,1] range.
**Why**: Physically correct lighting calculations produce brightness values far exceeding the display range. Direct clamping would lose highlight detail. Tone mapping uses a nonlinear curve to preserve shadow detail and highlight transitions.
Comparison of four approaches:
- **Reinhard**: Simplest, good for beginners. A single line `color / (1.0 + color)` achieves S-curve compression, but the highlight region is compressed too aggressively, lacking a smooth "shoulder" transition.
- **Filmic Reinhard**: The white point (W) parameter controls the mapping position of the brightest value, and the shoulder parameter (T2) controls how gently highlights are compressed. Higher T2 values produce softer highlight transitions.
- **ACES**: Industry standard approach. First converts linear sRGB to the ACES AP1 color space via an input matrix, applies a rational polynomial nonlinear mapping, then converts back to sRGB via an output matrix. Most accurate color representation, but slightly more computationally expensive.
- **General Rational Polynomial**: A general curve with 5 adjustable parameters that can manually fit any tone curve. Maximum flexibility, but requires manual parameter tuning.
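As a sketch of the two most common choices: the Reinhard one-liner, and Narkowicz's widely used single-curve ACES fit, which folds the matrix pipeline into one fitted rational polynomial (constants from the published fit; the full RRT+ODT with 3×3 matrices is more accurate):

```glsl
// Simple Reinhard: S-curve compression of HDR linear color
vec3 reinhard(vec3 x) { return x / (1.0 + x); }

// ACES filmic approximation (Narkowicz fit), input in linear space
vec3 acesFit(vec3 x) {
    return clamp(x * (2.51 * x + 0.03) / (x * (2.43 * x + 0.59) + 0.14), 0.0, 1.0);
}
```

Unlike the full ACES pipeline, the fit does not include gamma encoding, so `pow(col, vec3(1.0/2.2))` still follows it.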
### Step 2: Gamma Correction — Linear Space to Display Space
**What**: Convert linear color values to sRGB gamma space for correct display on monitors.
**Why**: Monitor brightness response is nonlinear (approximately γ=2.2). Directly outputting linear values would appear too dark. Gamma correction compensates with `pow(1/2.2)`.
Notes:
- The ACES approach already includes gamma correction, so no additional step is needed
- Some pipelines use 0.4545 (≈1/2.2) as the gamma value
- Gamma correction must be performed after tone mapping
### Step 3: Contrast Enhancement — Hermite S-Curve
**What**: Apply an S-curve to the tone-mapped colors to enhance midtone contrast.
**Why**: After tone mapping, the image may appear flat. An S-curve makes darks darker and brights brighter, increasing visual impact. The cubic Hermite basis function `3x² - 2x³` of `smoothstep` is a natural S-curve.
Implementation details:
- Must be performed after gamma correction, when the value range is [0,1]
- Use `clamp` to ensure input is within valid range
- The `contrast_strength` parameter controls effect intensity via `mix`, 0 for no effect, 1 for full effect
- The `smoothstep(-0.025, 1.0, color)` version provides a slight toe lift in the darks, avoiding pure black
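The details above amount to a few lines (the `-0.025` toe value is taken from the variant described in the last bullet):

```glsl
// Hermite S-curve contrast; apply after tone mapping + gamma, when color is in [0,1]
vec3 applyContrast(vec3 color, float contrast_strength) {
    vec3 s = smoothstep(-0.025, 1.0, clamp(color, 0.0, 1.0)); // slight toe lift avoids pure black
    return mix(color, s, contrast_strength);                  // 0 = no effect, 1 = full effect
}
```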
### Step 4: Color Grading
**What**: Apply channel-level adjustments to shift the overall color tone.
**Why**: Different color temperatures and tones convey different moods. Warm tones (yellow/orange bias) give a cozy feeling, while cool tones (blue/cyan bias) give a sense of detachment.
Four approaches in detail:
- **Per-Channel Multiplication**: Simplest and most direct. `vec3(1.11, 0.89, 0.79)` boosts the red channel while reducing blue/green, producing warm tones. Swap the coefficients for cool tones.
- **Power Color Grading**: Adjusts color by changing each channel's gamma curve. Values <1 brighten that channel, >1 darken it. Gentler than multiplication, with greater impact on midtones.
- **HSV Hue Shift**: After converting to HSV, you can directly rotate the hue and adjust saturation. Suitable for scenarios requiring precise hue control.
- **Desaturation Blend**: Mixes the original color with its luminance value (grayscale). Higher blend ratios produce a more washed-out look, creating a "cinematic" or "faded" effect.
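Three of the four approaches side by side, with illustrative (not prescribed) parameter values — the warm-tone multipliers match the ones quoted above:

```glsl
vec3 warm    = color * vec3(1.11, 0.89, 0.79);            // per-channel multiply: warm tones
vec3 powered = pow(color, vec3(0.9, 1.0, 1.1));           // per-channel gamma: <1 brightens, >1 darkens
float luma   = dot(color, vec3(0.2126, 0.7152, 0.0722));  // Rec.709 luminance
vec3 faded   = mix(color, vec3(luma), 0.25);              // desaturation blend: "cinematic" fade
```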
### Step 5: Vignette
**What**: Darken the edges of the image to guide the viewer's focus toward the center.
**Why**: Simulates the optical vignetting of real lenses and is a classic film composition technique.
Comparison of three approaches:
- **Approach A (Multiplicative, classic)**: `16·u·v·(1-u)·(1-v)` constructs a parabolic surface in UV space that equals 1 at the center and falls to 0 along all four screen edges. The power parameter controls falloff speed, 0.25 is commonly used. Advantage: minimal computation. Disadvantage: fixed rectangular gradient shape.
- **Approach B (Radial distance)**: Based on the Euclidean distance from pixel to screen center. Accounts for aspect ratio correction, producing an elliptical vignette. Three parameters control intensity, starting radius, and falloff steepness.
- **Approach C (Inverse quadratic falloff)**: `1/(1 + dot(p,p))` produces very natural optical vignetting. Squaring twice makes the falloff more pronounced. Smoothstep blending controls effect intensity.
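A sketch of all three approaches (assuming `q = fragCoord / iResolution.xy`; the radius, scale, and strength constants are illustrative starting points):

```glsl
// A: multiplicative — cheapest, rectangular falloff
col *= pow(16.0 * q.x * q.y * (1.0 - q.x) * (1.0 - q.y), 0.25);

// B: radial distance with aspect correction — elliptical vignette
vec2 p = (q - 0.5) * vec2(iResolution.x / iResolution.y, 1.0);
col *= mix(1.0, 1.0 - pow(max(length(p) - 0.3, 0.0) * 1.2, 2.0), 0.8);

// C: inverse quadratic falloff, squared twice for a more pronounced edge
float v = 1.0 / (1.0 + dot(p, p));
v *= v; v *= v;
col *= mix(1.0, v, 0.7);
```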
### Step 6: Gaussian Blur — Basic Blur
**What**: Apply Gaussian convolution blur to the image. This is the fundamental building block for Bloom.
**Why**: The Gaussian kernel is the only smoothing kernel that is both isotropic and separable, producing a naturally soft blur.
Implementation details:
- `normpdf` computes the Gaussian probability density, where 0.39894 ≈ 1/√(2π)
- KERNEL_SIZE must be odd to ensure center symmetry
- First build a 1D kernel and exploit symmetry (`kernel[HALF+j] = kernel[HALF-j]`)
- Z is the normalization factor, ensuring all weights sum to 1
- 2D convolution is implemented via two nested loops, with the outer product `kernel[j] * kernel[i]` constructing 2D weights
- In production, use a separable approach (two 1D passes) instead for better performance
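The kernel construction described above, as a single-pass 2D reference implementation (separate it into two 1D passes for production use):

```glsl
#define KERNEL_SIZE 11                 // must be odd for center symmetry
#define HALF (KERNEL_SIZE / 2)

float normpdf(float x, float sigma) {
    return 0.39894 * exp(-0.5 * x * x / (sigma * sigma)) / sigma; // 0.39894 ≈ 1/sqrt(2π)
}

// Build the 1D kernel once, exploiting symmetry
float kernel[KERNEL_SIZE];
float Z = 0.0;
for (int j = 0; j <= HALF; j++)
    kernel[HALF + j] = kernel[HALF - j] = normpdf(float(j), 3.0);
for (int i = 0; i < KERNEL_SIZE; i++) Z += kernel[i];

// 2D convolution: outer product kernel[j] * kernel[i] forms the 2D weights
vec3 acc = vec3(0.0);
for (int i = -HALF; i <= HALF; i++)
for (int j = -HALF; j <= HALF; j++)
    acc += kernel[HALF + j] * kernel[HALF + i]
         * texture(iChannel0, (fragCoord + vec2(i, j)) / iResolution.xy).rgb;
vec3 blurred = acc / (Z * Z);          // normalize so weights sum to 1
```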
### Step 7: Bloom — HDR Glow
**What**: Extract bright areas from the image, apply multi-level blur, and add the result back to create a glow diffusion effect.
**Why**: Both the human eye and camera lenses see glow around strong light sources. Bloom is the most impactful post-processing effect for enhancing the "HDR feel" of an image.
Implementation details:
- Uses `textureLod` to sample from high LOD levels of the mipmap; the GPU hardware automatically handles downsampled blur
- Sampling from LOD 5/6/7 corresponds to approximately 32x/64x/128x downsampling, producing different blur radii from narrow to wide
- 3×3 neighborhood supersampling (looping from -1 to 1 in both axes) reduces blockiness
- `maxBloom` cap prevents extremely bright pixels from producing excessive bloom
- `pow(bloom, vec3(1.5))` applies gamma adjustment to concentrate bloom in bright areas
- Note: ShaderToy Buffers do not generate mipmaps by default; this must be enabled in the channel settings
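Putting those details together (assumes mipmaps are enabled on iChannel0; `maxBloom` is a hypothetical cap parameter you choose):

```glsl
vec3 bloom = vec3(0.0);
for (int x = -1; x <= 1; x++)
for (int y = -1; y <= 1; y++) {                 // neighborhood supersampling vs. blockiness
    vec2 off = vec2(x, y) / iResolution.xy;
    bloom += textureLod(iChannel0, uv + off, 5.0).rgb;  // ~32x downsample: narrow halo
    bloom += textureLod(iChannel0, uv + off, 6.0).rgb;  // ~64x: medium
    bloom += textureLod(iChannel0, uv + off, 7.0).rgb;  // ~128x: wide
}
bloom = min(bloom / 27.0, vec3(maxBloom));      // average 9 taps x 3 LODs, cap extremes
color += pow(bloom, vec3(1.5));                 // gamma-shape bloom toward bright areas
```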
### Step 8: Chromatic Aberration
**What**: Sample R/G/B channels with different UV scales to simulate lens dispersion.
**Why**: Real lenses cannot focus all wavelengths of light onto the same focal plane. This "imperfection" actually adds realism and visual interest to the image.
Implementation details:
- Offset direction is calculated from the screen center
- In each iteration, R/G/B channels are sampled with different scale factors
- The red channel contracts (rf decreasing), blue channel expands (bf increasing), green channel remains nearly unchanged
- The difference in contraction/expansion rates produces the dispersion effect, increasing from center to edges
- The iterative implementation accumulates samples at different scale factors to simulate a continuous spectrum
- CA_SAMPLES: more samples produce smoother results; 4-8 is usually sufficient
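A minimal iterative sketch of the scale-factor scheme (the 0.01 dispersion amount is an illustrative value to tune):

```glsl
#define CA_SAMPLES 6                    // 4-8 is usually sufficient
vec3 ca = vec3(0.0);
vec2 c = uv - 0.5;                      // offset direction from screen center
for (int i = 0; i < CA_SAMPLES; i++) {
    float t = float(i) / float(CA_SAMPLES - 1);
    float rf = 1.0 - 0.01 * t;          // red channel contracts toward center
    float bf = 1.0 + 0.01 * t;          // blue channel expands outward
    ca.r += texture(iChannel0, 0.5 + c * rf).r;
    ca.g += texture(iChannel0, uv).g;   // green remains nearly unchanged
    ca.b += texture(iChannel0, 0.5 + c * bf).b;
}
ca /= float(CA_SAMPLES);                // accumulate scales to simulate a continuous spectrum
```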
### Step 9: Film Grain
**What**: Overlay pseudo-random noise to simulate film grain texture.
**Why**: Subtle random noise breaks the "perfect" feel of digital images, adds organic texture, and helps reduce color banding.
Two implementation approaches:
- **Hash Noise**: A simple `fract(sin(...) * 43758.5453)` pseudo-random function. Multiplied by iTime to ensure different noise each frame. An intensity of around 0.012 looks natural.
- **Bayer Matrix Ordered Dithering**: A 4x4 Bayer matrix provides 17 levels of ordered dithering. More uniform than random noise, particularly suitable for eliminating 8-bit color banding. `(dither - 0.5) * 4.0 / 255.0` limits the dither amount to approximately ±2 color levels.
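The hash-noise variant in full (constants are the conventional Shadertoy hash values):

```glsl
// Different noise every frame via iTime; ~0.012 intensity looks natural
float grain = fract(sin(dot(fragCoord.xy + iTime, vec2(12.9898, 78.233))) * 43758.5453);
col += (grain - 0.5) * 0.012;
```

For the Bayer variant, substitute the 4x4 matrix lookup for `grain` and scale by `4.0 / 255.0` as described above.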
### Step 10: Motion Blur
**What**: Apply directional blur along each pixel's motion direction.
**Why**: Static frames lack a sense of motion. Motion blur simulates the effect of object movement during shutter exposure, making animation smoother and more natural.
Implementation details:
- Motion direction is determined from a velocity buffer
- Samples uniformly along the motion direction with linearly decreasing weights (lower weight at greater distances)
- MB_STRENGTH controls the blur radius (in UV space); 0.25 means sampling up to 25% screen distance from the pixel
- 32 samples are usually sufficient; random jittering can achieve similar results with fewer samples
Camera reprojection approach:
- Requires a depth buffer and previous frame's camera matrix
- Projects the current pixel's world coordinate to the previous frame's UV to obtain the motion vector
- The shutterAngle parameter (0~1) controls the blur amount
- Randomized sample positions avoid regular stripe artifacts
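A sketch of the velocity-buffer approach (assumes iChannel1 stores per-pixel motion vectors in UV units — a convention you must set up yourself):

```glsl
#define MB_SAMPLES 32
#define MB_STRENGTH 0.25                 // max blur distance as fraction of UV space
vec2 vel = texture(iChannel1, uv).xy;    // motion vector from the velocity buffer
vec3 acc = vec3(0.0);
float wsum = 0.0;
for (int i = 0; i < MB_SAMPLES; i++) {
    float t = float(i) / float(MB_SAMPLES - 1);
    float w = 1.0 - t;                   // linearly decreasing weight with distance
    acc += w * texture(iChannel0, uv - vel * MB_STRENGTH * t).rgb;
    wsum += w;
}
vec3 blurred = acc / wsum;
```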
### Step 11: Depth of Field
**What**: Calculate the Circle of Confusion (CoC) based on pixel depth and focal plane distance, and use disk sampling with defocus to simulate out-of-focus blur.
**Why**: Simulates a real thin lens model, producing soft bokeh for objects outside the focal plane, enhancing depth perception.
Implementation details:
- **CoC Model**: Based on the thin lens formula, CoC size is proportional to how much the pixel depth deviates from the focal plane. The aperture parameter controls the aperture size, affecting the depth of field range.
- **Fibonacci Spiral Sampling**: The golden angle (≈ 2.3998 radians) ensures sampling points are uniformly distributed on the disk. `sqrt(i)` radius increment produces uniform area density.
- **Weight Strategy**: Uses each sample point's own CoC as weight, ensuring in-focus sharp regions are not "contaminated" by out-of-focus blur.
- 64 samples produce high-quality bokeh; 32 are sufficient for most needs.
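A sketch combining the three bullets above (`aperture`, `depth`, and `focusDist` are hypothetical inputs from your scene; the per-sample CoC is assumed to be stored in iChannel1's alpha channel):

```glsl
#define DOF_SAMPLES 64
const float GOLDEN_ANGLE = 2.3998;                       // golden angle in radians
float coc = aperture * abs(depth - focusDist) / depth;   // thin-lens-style CoC (assumed model)
vec3 acc = vec3(0.0);
float wsum = 0.0;
for (int i = 0; i < DOF_SAMPLES; i++) {
    float r = sqrt(float(i) / float(DOF_SAMPLES));       // sqrt radius: uniform area density
    float a = float(i) * GOLDEN_ANGLE;                   // Fibonacci spiral angle
    vec2 off = r * coc * vec2(cos(a), sin(a));
    float w = texture(iChannel1, uv + off).a;            // sample's own CoC as weight:
    acc += w * texture(iChannel0, uv + off).rgb;         // in-focus pixels aren't contaminated
    wsum += w;
}
vec3 dof = acc / max(wsum, 1e-5);
```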
### Step 12: FXAA — Fast Approximate Anti-Aliasing
**What**: Detect aliased edges in the image and apply directional blur along edges to eliminate aliasing.
**Why**: Post-processing AA does not require modifying the rendering pipeline and has extremely low cost. FXAA detects edge direction through luminance gradients and uses a small number of texture samples for directional blurring.
Implementation details:
- Sample luminance from 4 diagonal neighbors (NW, NE, SW, SE) and the center
- Calculate the luminance range (lumaMin/lumaMax) for final quality assessment
- Edge direction is computed from horizontal/vertical luminance differences
- `dirReduce` and `rcpDirMin` control the scaling of the direction vector to prevent excessive blurring
- Two-level sampling strategy: rgbA samples at 1/3 and 2/3 positions, rgbB adds samples at -0.5 and 0.5 positions on top of that
- Final decision: if rgbB's luminance exceeds the neighborhood range (indicating an edge crossing), fall back to rgbA
## Variant Details
### Variant 1: Multi-Pass Separable Bloom
Differs from the basic single-pass mipmap bloom: uses independent Buffers for separable Gaussian blur (horizontal pass + vertical pass), providing higher bloom quality and greater control.
**Buffer A Details (Horizontal Blur + Downsampling)**:
- `BLOOM_THRESHOLD`: Brightness threshold; only pixels exceeding this value enter bloom. Lower values mean more pixels participate.
- `BLOOM_DOWNSAMPLE`: Downsampling factor; 3 means computing at 1/3 resolution. Reduces computation while expanding the effective blur radius.
- `BLUR_RADIUS`: Blur radius (in pixels); 16 means sampling 16 pixels in each direction.
- The `-8.0` in the Gaussian weight `exp(-8.0 * d * d)` controls the falloff speed; adjust to change the "softness" of the blur.
- Boundary check `xy.x >= int(iResolution.x) / BLOOM_DOWNSAMPLE` ensures computation only within the downsampled region.
**Buffer B Details (Vertical Blur)**:
- Identical structure to Buffer A, except the sampling direction changes from horizontal `ivec2(k, 0)` to vertical `ivec2(0, k)`
- Input is Buffer A's output (iChannel0 bound to Buffer A)
- The combination of two separable blur passes is equivalent to a full 2D Gaussian blur
### Variant 2: ACES + Complete Color Pipeline
Differs from the basic version: uses the complete ACES RRT+ODT pipeline, including color space matrix conversion and built-in sRGB gamma, suitable for projects pursuing cinema-grade color.
Key differences:
- Input matrix m1 converts linear sRGB to the ACES AP1 color space
- Rational polynomial `(v*(v+a)-b) / (v*(c*v+d)+e)` simulates the ACES RRT (Reference Rendering Transform)
- Output matrix m2 converts ACES AP1 back to linear sRGB
- The final `pow(..., 1/2.2)` performs sRGB gamma encoding, so a separate gamma correction step is not needed when using this approach
### Variant 3: Physical DoF + Motion Blur Combination
Differs from the basic version: uses depth buffer and previous frame camera matrix for physically correct depth of field + motion blur, sharing the same sampling loop.
Key design:
- DoF and motion blur are processed in the same for loop, avoiding two independent sampling passes
- `randomT` hash randomizes each sample point's time position, reducing regular stripe artifacts
- Motion blur: interpolates between current and previous frame UV by `shutterAngle`
- DoF: Fibonacci spiral offset, with offset amount controlled by CoC
- Both effects share the same `textureLod` sample after stacking, saving half the bandwidth
### Variant 4: TAA Temporal Anti-Aliasing
Differs from basic FXAA: leverages multi-frame history for temporal domain supersampling. Each frame uses sub-pixel jittering, blends with the previous frame, and uses neighborhood color clamping to prevent ghosting.
Key steps explained:
1. **De-jittered Sampling**: The current frame is rendered with sub-pixel jitter; during sampling, the jitter offset must be subtracted to restore the correct UV
2. **Neighborhood Clamping**: The min/max of colors in the 3x3 neighborhood defines the "reasonable color range". History frame colors outside this range indicate scene changes (occlusion/reveal)
3. **Reprojection**: Uses the current pixel's world coordinates and the previous frame's view-projection matrix to calculate the corresponding UV position in the previous frame
4. **Blend Strategy**: When the history frame is within the reasonable range, use a high weight (0.9) for temporal stability; when outside the range, use 0 weight to fully use the current frame and avoid ghosting
5. `blend = 0.9` is adjustable: higher values are smoother but more prone to trailing artifacts
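Steps 2 and 4 above can be sketched as follows (assuming iChannel0 holds the current frame, iChannel1 the history, and `prevUV` comes from the reprojection step):

```glsl
// Neighborhood clamping: 3x3 min/max defines the "reasonable color range"
vec3 cmin = vec3(1e9), cmax = vec3(-1e9);
for (int x = -1; x <= 1; x++)
for (int y = -1; y <= 1; y++) {
    vec3 c = texture(iChannel0, uv + vec2(x, y) / iResolution.xy).rgb;
    cmin = min(cmin, c);
    cmax = max(cmax, c);
}
vec3 current = texture(iChannel0, uv).rgb;
vec3 history = texture(iChannel1, prevUV).rgb;
// History inside the range: blend at 0.9; outside (occlusion/reveal): drop it entirely
float blend = (all(greaterThanEqual(history, cmin)) && all(lessThanEqual(history, cmax)))
            ? 0.9 : 0.0;
vec3 resolved = mix(current, history, blend);
```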
### Variant 5: Lens Flare + Starburst
Differs from the basic version: overlays lens flare simulation on top of bloom, including starburst and chromatic ghosts.
Key techniques explained:
- **Starburst Pattern**: `cos(angle * NUM_APERTURE_BLADES)` creates a periodic pattern in the angular domain, simulating diffraction from aperture blades. `NUM_APERTURE_BLADES` controls the number of starburst points. `pow` controls the sharpness of the starburst, becoming less pronounced farther from the light source.
- **Octagonal Ghosts**: Multiple ghosts placed at reflected positions along the optical axis (the line from the sun to the screen center). `smoothstep` produces soft-edged disk shapes.
- **Spectral Color**: `wavelengthToRGB` converts wavelength (nm) to RGB; `fract(ghostDist * 5.0)` produces rainbow bands within the ghost, simulating the dispersion effect of real lens ghosts.
## In-Depth Performance Optimization
### 1. Separable Blur Instead of 2D Convolution
An 11×11 2D Gaussian convolution requires 121 samples; splitting into two 1D passes requires only 22. This is the primary optimization for all blur operations.
Mathematical basis: The separability of the Gaussian kernel `G(x,y) = G(x) · G(y)`, meaning a 2D Gaussian kernel equals the outer product of two 1D Gaussian kernels. This means performing horizontal blur followed by vertical blur (or vice versa) produces identical results to direct 2D convolution.
### 2. Hardware Mipmap Instead of Manual Downsampling
`textureLod(tex, uv, lod)` leverages the GPU hardware's mipmap chain for free downsampled blur, suitable for fast bloom. Note that ShaderToy Buffers do not generate mipmaps by default (you need to enable `mipmap` in the channel settings).
Each mipmap level halves the resolution, equivalent to a 2x2 box filter. LOD 5 corresponds to 32x downsampling, LOD 6 to 64x, LOD 7 to 128x. Although a box filter is not a Gaussian kernel, the results approach Gaussian when multiple levels are combined.
### 3. Downsample Before Blurring
Bloom does not need to be computed at full resolution. Downsample the image by 2-4x first, blur at low resolution, then bilinearly upsample back.
Advantages:
- Computation reduced by 4-16x (area ratio)
- The same blur kernel size covers a larger screen area at lower resolution
- Bilinear interpolation during upsampling automatically smooths the result
### 4. Reduce Sample Count
Recommended sample counts:
- Motion blur: 16-32 samples are usually sufficient; use random jittering (temporal jitter) instead of regular intervals to hide stripe artifacts from insufficient sampling
- DoF: 32-64 Fibonacci spiral samples produce high-quality bokeh. Fibonacci spirals are more uniform than random distribution, avoiding clustering
- Chromatic Aberration: 4-8 samples produce good results, since chromatic aberration is inherently a low-frequency variation
### 5. Leverage Bilinear Interpolation for Free Blur
Sampling between two texels causes the GPU hardware to automatically perform bilinear blending, equivalent to a 2-tap average. A single sample effectively obtains weighted information from 4 texels.
Application: Optimize a 5-tap Gaussian blur to 3 texture samples (1 at center + 1 on each side, with the side sample points placed between two texels).
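For a binomial [1 4 6 4 1]/16 kernel, the merged side taps have weight 5/16 and land at offset (4·1 + 1·2)/5 = 1.2 texels, so the hardware's bilinear blend reproduces the exact 5-tap result with 3 fetches (horizontal direction shown):

```glsl
vec2 texel = 1.0 / iResolution.xy;
vec3 blur = texture(iChannel0, uv).rgb * 0.375;                        // center tap: 6/16
blur += texture(iChannel0, uv + vec2(1.2, 0.0) * texel).rgb * 0.3125;  // merged taps 1+2: 5/16
blur += texture(iChannel0, uv - vec2(1.2, 0.0) * texel).rgb * 0.3125;  // merged taps -1,-2
```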
### 6. Conditional Compilation
Use `#define` switches to control each post-processing module. Disabling unneeded effects has zero cost — the preprocessor completely removes the code, generating no instructions.
```glsl
#define ENABLE_BLOOM 0 // Disable bloom; the branch code is completely absent after compilation
```
### 7. Avoid Branching
if/else statements in post-processing should be converted to mathematical forms like `mix`/`step`/`smoothstep` whenever possible, avoiding GPU warp divergence.
Example:
```glsl
// Bad: if/else branching
if (brightness > threshold) color = bright_path; else color = dark_path;
// Good: mathematical form
float t = step(threshold, brightness);
color = mix(dark_path, bright_path, t);
```
## Combination Suggestions
### 1. Bloom + Tone Mapping (Most Basic Combination)
Bloom is computed in linear HDR space, added to the scene, then tone mapping is applied. **The order must not be reversed** — doing bloom in LDR space means highlights have already been clamped, and bloom cannot correctly extract super-bright pixels.
```glsl
// Correct order
color += bloom; // Add bloom in HDR space
color = tonemap(color); // Then tone map
color = pow(color, vec3(1.0/2.2)); // Finally gamma
```
### 2. TAA + Motion Blur + DoF (Physical Camera Simulation)
TAA removes aliasing first, then DoF and motion blur can share a sampling loop. TAA's sub-pixel jitter can also complement motion blur's temporal jitter.
Suggested pipeline order:
1. TAA (Buffer D): Blend current frame + history frame
2. DoF + Motion Blur (Image pass): Shared sampling loop
3. Other subsequent effects
### 3. Chromatic Aberration + Vignette + Film Grain (Lens Simulation Trio)
These three effects all simulate physical lens imperfections; when combined, the image has a strong "real footage" feel.
Execution order:
1. Chromatic Aberration (CA) is done during the sampling stage — directly replaces the normal `texture()` call
2. Vignette is applied after all color processing, multiplicatively
3. Grain is applied last, additively
### 4. Color Grading + Tone Mapping + Contrast (Color Pipeline)
Color grading (multiplication/power adjustments) is done in linear space, tone mapping handles HDR compression, and the S-curve contrast is applied in gamma space. The order of these three steps determines the final color style.
Key point: Color grading in linear space produces the most natural results, because the perceived brightness relationships are correct in linear space.
### 5. Bloom + Lens Flare (Cinematic Light Effects)
Bloom provides soft highlight diffusion; lens flare provides starburst and ghosts. Both share the same bright-pass extraction result, but flare computes directional patterns while bloom is isotropic blur.
### 6. Multi-Pass Complete Pipeline (Production-Grade)
Recommended production-grade pipeline:
- **Buffer A**: Scene rendering + velocity/depth encoding (pack motion vectors and depth into alpha channel or additional textures)
- **Buffer B**: Bloom downsampling + horizontal blur (horizontal Gaussian on Buffer A's bright-pass output)
- **Buffer C**: Bloom vertical blur (vertical Gaussian on Buffer B, completing separable bloom)
- **Buffer D**: TAA (current frame + history frame blending, needs to read Buffer D's own historical output)
- **Image**: Final compositing — DoF + Motion Blur + Bloom compositing + Tone Mapping + Color Grading + Vignette + Grain + Dithering

# 2D Procedural Patterns — Detailed Reference
This document is a complete supplement to [SKILL.md](SKILL.md), containing prerequisites, detailed explanations for each step, variant descriptions, in-depth performance analysis, and combination example code.
---
## Prerequisites
- **GLSL Basic Syntax**: uniform, varying, built-in functions
- **Vector Math**: `dot`, `length`, `normalize`, `atan`
- **Coordinate Space Concepts**: UV normalization, aspect ratio correction
- **Basic Math Functions**: `sin`/`cos`, `fract`/`floor`/`mod`, `smoothstep`, `pow`
- **Polar Coordinates**: `atan(y,x)` returns angle, `length` returns radial distance
---
## Core Principles in Detail
The essence of 2D procedural patterns is the combination of **domain transforms + distance fields + color mapping**:
1. **Domain Repetition**: use `fract()`/`mod()` to fold an infinite plane into finite cells, each cell independently rendering the same (or variant) pattern
2. **Cell Identification**: use `floor()` to extract the integer coordinates of the current cell as a hash seed to generate pseudo-random numbers, driving independent variations per cell
3. **Distance Fields (SDF)**: use mathematical functions to compute the distance from a pixel to geometric shapes (circles, hexagons, line segments, arcs), converting to crisp or soft edges via `smoothstep`
4. **Color Mapping**: Cosine palette `a + b·cos(2π(c·t + d))` or HSV mapping, converting scalar values to rich colors
5. **Layered Compositing**: results from multiple loops or multi-layer passes are combined through addition, multiplication, or `mix` to build visual complexity
---
## Implementation Steps in Detail
### Step 1: UV Coordinate Normalization and Aspect Ratio Correction
**What**: Convert pixel coordinates to normalized coordinates centered on the screen with Y-axis range [-1, 1]
**Why**: A unified coordinate system ensures patterns don't distort with resolution changes; using Y-axis as reference maintains square pixels
```glsl
vec2 uv = (fragCoord * 2.0 - iResolution.xy) / iResolution.y;
```
### Step 2: Domain Repetition — Dividing Space into Repeating Cells
**What**: Scale UV coordinates and take the fractional part to generate repeating local coordinates; simultaneously extract cell IDs using `floor`
**Why**: `fract()` folds an infinite plane into a repeating [0,1) space, `floor()` provides a unique cell identifier for subsequent randomization. Subtracting 0.5 centers the origin
```glsl
#define SCALE 4.0 // Tunable: repetition density, higher = more cells
vec2 cell_uv = fract(uv * SCALE) - 0.5;
vec2 cell_id = floor(uv * SCALE);
```
For hexagonal grids, domain repetition requires special handling (two offset rectangular grids, taking the nearest):
```glsl
const vec2 s = vec2(1, 1.7320508); // 1 and sqrt(3)
vec4 hC = floor(vec4(p, p - vec2(0.5, 1.0)) / s.xyxy) + 0.5;
vec4 h = vec4(p - hC.xy * s, p - (hC.zw + 0.5) * s);
// Take the nearest hexagonal center
vec4 hex_data = dot(h.xy, h.xy) < dot(h.zw, h.zw)
? vec4(h.xy, hC.xy)
: vec4(h.zw, hC.zw + vec2(0.5, 1.0));
```
### Step 3: Cell Randomization
**What**: Use cell IDs to generate pseudo-random numbers, giving each cell different attributes (size, position, color offset)
**Why**: Pure repetition looks mechanical; randomization gives patterns a "procedural yet lively" quality
```glsl
float hash21(vec2 p) {
return fract(sin(dot(p, vec2(141.173, 289.927))) * 43758.5453);
}
float rnd = hash21(cell_id);
float radius = 0.15 + 0.1 * rnd; // Tunable: base radius and random range
```
### Step 4: Distance Field Shape Rendering
**What**: Compute the distance from the pixel to the target shape, then convert to visualization using `smoothstep`
**Why**: SDF is the cornerstone of procedural graphics — a single scalar value simultaneously encodes shape, edges, and glow effects
```glsl
// Circle SDF
float d = length(cell_uv) - radius;
// Hexagon SDF
float hex_sdf(vec2 p) {
p = abs(p);
return max(dot(p, vec2(0.5, 0.866025)), p.x);
}
// Line segment SDF (for networks/grid lines)
float line_sdf(vec2 a, vec2 b, vec2 p) {
vec2 pa = p - a, ba = b - a;
float h = clamp(dot(pa, ba) / dot(ba, ba), 0.0, 1.0);
return length(pa - ba * h);
}
// Anti-aliased rendering with smoothstep
float shape = 1.0 - smoothstep(radius - 0.008, radius + 0.008, length(cell_uv));
```
### Step 5: Polar Coordinate Conversion and Ring/Arc Patterns
**What**: Convert Cartesian coordinates to polar coordinates, using radial distance to draw concentric rings and angle to draw sectors/arc segments
**Why**: Polar coordinates are naturally suited for radar sweeps, concentric circles, spirals, and other radially symmetric patterns
```glsl
vec2 polar = vec2(length(uv), atan(uv.y, uv.x));
float ring_id = floor(polar.x * NUM_RINGS + 0.5) / NUM_RINGS; // Tunable: NUM_RINGS ring count
// Concentric rings
float ring = 1.0 - pow(abs(sin(polar.x * 3.14159 * NUM_RINGS)) * 1.25, 2.5);
// Arc segment clipping
float arc_end = polar.y + sin(iTime + ring_id * 5.5) * 1.52 - 1.5;
ring *= smoothstep(0.0, 0.05, arc_end);
```
### Step 6: Cosine Palette
**What**: Generate a continuous rainbow color mapping function using four vec3 parameters
**Why**: A single line of code generates infinite smooth color schemes, more flexible and GPU-friendly than lookup tables
```glsl
vec3 palette(float t) {
// Tunable: modify a/b/c/d to change color scheme
vec3 a = vec3(0.5, 0.5, 0.5); // Brightness offset
vec3 b = vec3(0.5, 0.5, 0.5); // Amplitude
vec3 c = vec3(1.0, 1.0, 1.0); // Frequency
vec3 d = vec3(0.263, 0.416, 0.557); // Phase offset
return a + b * cos(6.28318 * (c * t + d));
}
```
### Step 7: Iterative Stacking and Glow Effects
**What**: Repeatedly perform domain repetition + distance field calculation in a loop, accumulating color; use `pow(1/d)` to produce glow
**Why**: A single layer pattern is too simple; multi-layer iterative stacking produces fractal-like visual complexity with minimal code. Exponentially decaying glow gives patterns a neon light feel
```glsl
#define NUM_LAYERS 4.0 // Tunable: number of iteration layers, more = more complex
vec3 finalColor = vec3(0.0);
vec2 uv0 = uv; // Preserve original UV for global coloring
for (float i = 0.0; i < NUM_LAYERS; i++) {
uv = fract(uv * 1.5) - 0.5; // Tunable: 1.5 is the scale factor
float d = length(uv) * exp(-length(uv0));
vec3 col = palette(length(uv0) + i * 0.4 + iTime * 0.4);
d = sin(d * 8.0 + iTime) / 8.0; // Tunable: 8.0 is the ripple frequency
d = abs(d);
d = pow(0.01 / d, 1.2); // Tunable: 0.01 is glow width, 1.2 is decay exponent
finalColor += col * d;
}
```
### Step 8: Trigonometric Interference Patterns
**What**: Use `sin`/`cos` to mutually perturb coordinates in iterations, generating water caustic-like interference patterns
**Why**: Superposition of trigonometric functions produces complex Moiré-like interference patterns; a few iterations yield highly organic visual effects
```glsl
#define MAX_ITER 5 // Tunable: iteration count, more = richer detail
vec2 p = mod(uv * TAU, TAU) - 250.0; // TAU period ensures tileability
vec2 i = p;
float c = 1.0;
float inten = 0.005; // Tunable: intensity coefficient
for (int n = 0; n < MAX_ITER; n++) {
float t = iTime * (1.0 - 3.5 / float(n + 1));
i = p + vec2(cos(t - i.x) + sin(t + i.y),
sin(t - i.y) + cos(t + i.x));
c += 1.0 / length(vec2(p.x / (sin(i.x + t) / inten),
p.y / (cos(i.y + t) / inten)));
}
c /= float(MAX_ITER);
c = 1.17 - pow(c, 1.4); // Tunable: 1.4 is the contrast exponent
vec3 colour = vec3(pow(abs(c), 8.0));
```
### Step 9: Multi-Layer Depth Compositing
**What**: Render the same pattern at different zoom levels, using depth fade-in/out to simulate parallax
**Why**: Multi-scale stacking breaks the mechanical feel of a single scale, producing a pseudo-3D depth effect
```glsl
#define NUM_DEPTH_LAYERS 4.0 // Tunable: number of depth layers
float m = 0.0;
for (float i = 0.0; i < 1.0; i += 1.0 / NUM_DEPTH_LAYERS) {
float z = fract(iTime * 0.1 + i);
float size = mix(15.0, 1.0, z); // Dense far away, sparse up close
float fade = smoothstep(0.0, 0.6, z) * smoothstep(1.0, 0.8, z); // Fade at both ends
    m += fade * patternLayer(uv * size, i, iTime); // patternLayer: user-supplied single-layer pattern function
}
```
### Step 10: Post-Processing Pipeline
**What**: Apply gamma correction, contrast enhancement, saturation adjustment, and vignette in sequence
**Why**: Post-processing transforms "technically correct" output into "visually pleasing" final results
```glsl
// Gamma correction
col = pow(clamp(col, 0.0, 1.0), vec3(1.0 / 2.2));
// Contrast enhancement (S-curve)
col = col * 0.6 + 0.4 * col * col * (3.0 - 2.0 * col);
// Saturation adjustment
col = mix(col, vec3(dot(col, vec3(0.33))), -0.4); // Tunable: -0.4 increases saturation, positive reduces it
// Vignette
vec2 q = fragCoord / iResolution.xy;
col *= 0.5 + 0.5 * pow(16.0 * q.x * q.y * (1.0 - q.x) * (1.0 - q.y), 0.7);
```
---
## Common Variants in Detail
### Variant 1: Hexagonal Grid + Truchet Arcs
**Difference from base version**: Replaces the square grid with a hexagonal grid coordinate system, drawing three randomly oriented arcs within each hexagonal cell; arcs form maze-like continuous paths between cells
**Key modified code**:
```glsl
// Hexagon distance field
float hex(vec2 p) {
p = abs(p);
return max(dot(p, vec2(0.5, 0.866025)), p.x);
}
// Hexagonal grid coordinates (returns xy=cell-local coords, zw=cell ID)
const vec2 s = vec2(1.0, 1.7320508);
vec4 getHex(vec2 p) {
vec4 hC = floor(vec4(p, p - vec2(0.5, 1.0)) / s.xyxy) + 0.5;
vec4 h = vec4(p - hC.xy * s, p - (hC.zw + 0.5) * s);
return dot(h.xy, h.xy) < dot(h.zw, h.zw)
? vec4(h.xy, hC.xy)
: vec4(h.zw, hC.zw + vec2(0.5, 1.0));
}
// Truchet three-arc: one arc for each of three directions
float r = 1.0;
vec2 q1 = p - vec2(0.0, r) / s;
vec2 q2 = rot2(6.28318 / 3.0) * p - vec2(0.0, r) / s;
vec2 q3 = rot2(6.28318 * 2.0 / 3.0) * p - vec2(0.0, r) / s;
// Take nearest arc
float d = min(min(length(q1), length(q2)), length(q3));
d = abs(d - 0.288675) - 0.1; // 0.288675 = sqrt(3)/6, arc radius
```
### Variant 2: Water Caustic Interference Pattern
**Difference from base version**: Does not use domain repetition grids; instead generates full-screen interference textures through trigonometric iteration, seamlessly tileable
**Key modified code**:
```glsl
#define TAU 6.28318530718
#define MAX_ITER 5 // Tunable: iteration count
vec2 p = mod(uv * TAU, TAU) - 250.0;
vec2 i = p;
float c = 1.0;
float inten = 0.005;
for (int n = 0; n < MAX_ITER; n++) {
float t = iTime * (1.0 - 3.5 / float(n + 1));
i = p + vec2(cos(t - i.x) + sin(t + i.y),
sin(t - i.y) + cos(t + i.x));
c += 1.0 / length(vec2(p.x / (sin(i.x + t) / inten),
p.y / (cos(i.y + t) / inten)));
}
c /= float(MAX_ITER);
c = 1.17 - pow(c, 1.4);
vec3 colour = vec3(pow(abs(c), 8.0));
colour = clamp(colour + vec3(0.0, 0.35, 0.5), 0.0, 1.0); // Aquatic color shift
```
### Variant 3: Polar Concentric Rings + Animated Arc Segments
**Difference from base version**: Uses polar coordinates instead of Cartesian grids, drawing concentric ring arc segments with independent animation, suitable for radar/HUD style
**Key modified code**:
```glsl
#define NUM_RINGS 20.0 // Tunable: ring count
#define PALETTE vec3(0.0, 1.4, 2.0) + 1.5
vec2 plr = vec2(length(p), atan(p.y, p.x));
float id = floor(plr.x * NUM_RINGS + 0.5) / NUM_RINGS;
// Each ring rotates independently
p *= rot2(id * 11.0);
p.y = abs(p.y); // Mirror symmetry
// Concentric ring SDF
float rz = 1.0 - pow(abs(sin(plr.x * 3.14159 * NUM_RINGS)) * 1.25, 2.5);
// Arc segment animation
float arc = plr.y + sin(iTime + id * 5.5) * 1.52 - 1.5;
rz *= smoothstep(0.0, 0.05, arc);
// Per-ring coloring
vec3 col = (sin(PALETTE + id * 5.0 + iTime) * 0.5 + 0.5) * rz;
```
### Variant 4: Multi-Layer Depth Parallax Network
**Difference from base version**: Renders grid nodes and connections at multiple zoom levels, using depth fade-in/out to produce a pseudo-3D effect
**Key modified code**:
```glsl
#define NUM_DEPTH_LAYERS 4.0 // Tunable: number of depth layers
// Random vertex position within each cell
vec2 GetPos(vec2 id, vec2 offs, float t) {
float n = hash21(id + offs);
return offs + vec2(sin(t + n * 6.28), cos(t + fract(n * 100.0) * 6.28)) * 0.4;
}
// Line segment SDF
float df_line(vec2 a, vec2 b, vec2 p) {
vec2 pa = p - a, ba = b - a;
float h = clamp(dot(pa, ba) / dot(ba, ba), 0.0, 1.0);
return length(pa - ba * h);
}
// Multi-layer compositing
float m = 0.0;
for (float i = 0.0; i < 1.0; i += 1.0 / NUM_DEPTH_LAYERS) {
float z = fract(iTime * 0.1 + i);
float size = mix(15.0, 1.0, z);
float fade = smoothstep(0.0, 0.6, z) * smoothstep(1.0, 0.8, z);
    m += fade * NetLayer(uv * size, i, iTime); // NetLayer: draws one layer's vertices/edges using GetPos + df_line
}
```
### Variant 5: Fractal Apollian Pattern
**Difference from base version**: Uses iterative fold-and-invert transforms to generate infinitely detailed aperiodic fractal patterns, combined with HSV coloring
**Key modified code**:
```glsl
float apollian(vec4 p, float s) {
float scale = 1.0;
for (int i = 0; i < 7; ++i) { // Tunable: iteration count (5~12)
p = -1.0 + 2.0 * fract(0.5 * p + 0.5); // Space folding
float r2 = dot(p, p);
float k = s / r2; // Tunable: s is scaling factor (1.0~1.5)
p *= k; // Inversion mapping
scale *= k;
}
return abs(p.y) / scale;
}
// 4D slice animation for smooth morphing
vec4 pp = vec4(p.x, p.y, 0.0, 0.0) + offset;
pp.w = 0.125 * (1.0 - tanh(length(pp.xyz)));
float d = apollian(pp / 4.0, 1.2) * 4.0;
// HSV coloring
float hue = fract(0.75 * length(p) - 0.3 * iTime) + 0.3;
float sat = 0.75 * tanh(2.0 * length(p));
vec3 col = hsv2rgb(vec3(hue, sat, 1.0));
```
---
## In-Depth Performance Optimization
### 1. Control Iteration Count
The iteration loop is the biggest performance bottleneck. Increasing `NUM_LAYERS` from 4 to 8 halves performance. On mobile, keep it at 3 or fewer layers.
### 2. Avoid Branching
Replace `if/else` with branchless `step()`/`smoothstep()`/`mix()` alternatives:
```glsl
// Bad: if(rnd > 0.5) p.y = -p.y;
// Good: p.y *= sign(rnd - 0.5); // or use mix
```
### 3. Merge Distance Field Calculations
Combine multiple shape SDFs using `min()`/`max()` and apply a single `smoothstep`, rather than rendering each shape separately.
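A minimal sketch of the idea, assuming `d_circle`, `d_hex`, and `d_line` are SDF values the shader already computes:

```glsl
// Combine the shapes first, then shade once
float d = min(min(d_circle, d_hex), d_line);   // Union of three SDFs
float shape = 1.0 - smoothstep(0.0, 0.008, d); // Single anti-aliased edge pass
```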
### 4. Precompute Constants
Compute `sin`/`cos` pairs (e.g., rotation matrices) once outside the loop; write irrational numbers like `1.7320508` (sqrt(3)) as direct constants.
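For example, a per-frame rotation can be hoisted out of the layer loop (a sketch, assuming the angle `a` does not change between iterations):

```glsl
// Bad: rebuilds the matrix (two transcendentals) every iteration
// for (...) { p = mat2(cos(a), -sin(a), sin(a), cos(a)) * p; }
// Good: build it once, reuse it in the loop
float c = cos(a), s = sin(a);
mat2 rot = mat2(c, -s, s, c);
for (float i = 0.0; i < NUM_LAYERS; i++) {
    p = rot * p;
    // ... per-layer pattern code ...
}
```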
### 5. Minimize `atan` Calls
`atan` is an expensive transcendental function. When you only need the cosine/sine of the polar angle (or a periodic angular variation), derive them directly from the normalized position vector instead of computing the angle itself.
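For the common case where only `cos`/`sin` of the polar angle are needed, they can be read directly off the normalized vector; this is exact, not an approximation (`phi` here is a hypothetical phase offset):

```glsl
// Instead of: float a = atan(p.y, p.x); float c = cos(a), s = sin(a);
vec2 dir = p / max(length(p), 1e-6); // dir.x = cos(a), dir.y = sin(a)
// cos(a + phi) via the angle-addition identity, still without atan:
float cosShift = dir.x * cos(phi) - dir.y * sin(phi);
```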
### 6. LOD Strategy
Reduce iteration count at distance/when zoomed out:
```glsl
int iters = int(mix(3.0, float(MAX_ITER), smoothstep(0.0, 1.0, 1.0 / scale)));
```
### 7. Use `smoothstep` Instead of `pow`
`pow(x, n)` is slower than `smoothstep` on some GPUs, and `smoothstep` naturally clamps to [0,1].
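A sketch of the substitution for a typical edge falloff; the two curves are similar in shape but not bit-identical:

```glsl
// pow-based falloff: needs an explicit clamp plus a transcendental call
float m1 = pow(clamp(1.0 - d / w, 0.0, 1.0), 2.0);
// smoothstep gives a comparable S-curve, clamped to [0,1] for free
float m2 = 1.0 - smoothstep(0.0, w, d);
```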
---
## Complete Combination Suggestion Examples
### 1. + Noise Texture
Overlay Perlin/Simplex noise perturbation on distance fields to give geometric patterns an organic/eroded feel. Triangle noise (as used in "Overly Satisfying") is an efficient low-cost alternative:
```glsl
d += triangleNoise(uv * 10.0) * 0.05; // Noise perturbation amount is tunable
```
### 2. + Post-Processing Cross-Hatch
Overlay cross-hatching effects on patterns to simulate hand-drawn/printmaking style (as used in "Hexagonal Maze Flow"):
```glsl
float gr = dot(col, vec3(0.299, 0.587, 0.114)); // Grayscale
float hatch = (gr < 0.45) ? clamp(sin((uv.x - uv.y) * 125.6) * 2.0 + 1.5, 0.0, 1.0) : 1.0;
col *= hatch * 0.5 + 0.5;
```
### 3. + SDF Boolean Operations
Combine multiple base patterns through `min` (union), `max` (intersection), and subtraction into complex geometry:
```glsl
float d = max(hexSDF, -circleSDF); // Hexagon minus circle = hexagonal ring
```
### 4. + Domain Warping
Apply sin/cos distortion to UVs before domain repetition, producing flowing/swirling effects:
```glsl
uv += 0.05 * vec2(sin(uv.y * 5.0 + iTime), sin(uv.x * 3.0 + iTime));
```
### 5. + Radial Blur / Motion Blur
Average multiple samples in the polar coordinate direction on the final color, producing rotational motion blur to enhance dynamism.
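One possible sketch, assuming a hypothetical `pattern(vec2) -> vec3` color function for the scene:

```glsl
#define BLUR_SAMPLES 8 // Tunable: more samples = smoother blur, higher cost
vec3 blurred = vec3(0.0);
for (int i = 0; i < BLUR_SAMPLES; i++) {
    // Small rotation about the center per sample
    float a = (float(i) / float(BLUR_SAMPLES) - 0.5) * 0.05; // Tunable: blur arc
    float c = cos(a), s = sin(a);
    blurred += pattern(mat2(c, -s, s, c) * uv);
}
blurred /= float(BLUR_SAMPLES);
```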
### 6. + Pseudo-3D Lighting
Use SDF gradients as normals and add simple diffuse/specular lighting to give 2D patterns a relief/embossed appearance (as in "Apollian with a twist" shadow casting method).
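A minimal sketch of the gradient-as-normal idea, assuming a hypothetical scene distance function `sdf(vec2)` and an existing color `col`:

```glsl
vec2 e = vec2(0.002, 0.0); // Central-difference step
vec2 g = vec2(sdf(uv + e.xy) - sdf(uv - e.xy),
              sdf(uv + e.yx) - sdf(uv - e.yx)) / (2.0 * e.x);
vec3 n = normalize(vec3(-g, 1.0));              // Treat the SDF as a height field
vec3 lightDir = normalize(vec3(0.5, 0.5, 1.0)); // Tunable: light direction
float diff = max(dot(n, lightDir), 0.0);
col *= 0.4 + 0.6 * diff;                        // Ambient + diffuse
```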

# Procedural Noise — Detailed Reference
This document is a detailed supplement to [SKILL.md](SKILL.md), containing step-by-step tutorials, mathematical derivations, and advanced usage.
## Prerequisites
- **GLSL Basics**: uniform, varying, built-in functions (`fract`, `floor`, `mix`, `smoothstep`, `dot`, `sin`/`cos`)
- **Vector Math**: dot product, cross product, matrix multiplication (`mat2` rotation matrix)
- **Coordinate Spaces**: UV coordinate normalization, screen aspect ratio correction
- **Interpolation Theory**: linear interpolation, Hermite interpolation `3t^2-2t^3` (smoothstep)
- **ShaderToy Environment**: `iTime`, `iResolution`, `fragCoord`, `mainImage` signature
## Use Cases in Detail
Procedural noise is the most fundamental and versatile technique in real-time GPU graphics, applicable to:
- **Natural phenomena simulation**: fire, clouds, water surfaces, lava, lightning, smoke, etc.
- **Terrain generation**: mountains, canyons, erosion landscapes, snowline distribution
- **Texture synthesis**: marble textures, wood grain, organic patterns, abstract art
- **Volume rendering**: volumetric clouds, volumetric fog, light scattering
- **Motion effects**: fluid simulation approximation, particle trajectory perturbation, domain warping animation
Core idea: instead of using pre-made textures, generate pseudo-random, spatially continuous signals in real-time on the GPU through mathematical functions, then produce rich multi-scale detail through fractal summation (FBM) and Domain Warping.
## Core Principles in Detail
### 1. Noise Functions — Building Continuous Pseudo-Random Signals
The essence of a noise function is: **generate random values at integer lattice points, then smoothly interpolate between them**.
Two mainstream implementations:
**Value Noise**: each lattice point stores a random scalar, bilinear interpolation yields a continuous field.
- Formula: `N(p) = mix(mix(h00, h10, u), mix(h01, h11, u), v)`, where `u,v` are the fractional parts after Hermite smoothing
**Simplex Noise**: uses gradient dot products + radial falloff kernels on a triangular lattice (2D) or tetrahedral lattice (3D).
- Advantages: fewer lattice lookups (2D: 3 vs 4), no axis-aligned artifacts, lower computational cost
- Core: skew transform maps square grid to triangular grid, using `K1=(sqrt(3)-1)/2` for skewing, `K2=(3-sqrt(3))/6` for unskewing
### 2. Hash Functions — Source of Lattice Random Values
Hash functions map integer coordinates to pseudo-random values in [0,1] or [-1,1]:
- **sin-based hash** (classic but has precision risks): `fract(sin(dot(p, vec2(127.1, 311.7))) * 43758.5453)`
- **sin-free hash** (cross-platform stable): pure arithmetic `fract(p * 0.1031)` + `dot` mixing + `fract` output
### 3. FBM (Fractional Brownian Motion) — Multi-Scale Detail Summation
Sum multiple noise "octaves" at different frequencies and amplitudes:
```
FBM(p) = sum( amplitude_i * noise(frequency_i * p) )
```
Standard parameters:
- **Lacunarity (frequency multiplier)**: each octave's frequency multiplied by ~2.0
- **Persistence/Gain (amplitude decay)**: each octave's amplitude multiplied by ~0.5
- **Inter-octave rotation**: use a rotation matrix to eliminate axis-aligned artifacts
### 4. Domain Warping — Organic Distortion
Feed the output of noise back as coordinate offsets, producing distorted organic patterns:
- **Single-layer warping**: `fbm(p + fbm(p))`
- **Multi-layer cascade**: `fbm(p + fbm(p + fbm(p)))` — classic three-layer domain warping
### 5. FBM Variants — Different Visual Characteristics
| Variant | Formula | Visual Effect |
|---------|---------|---------------|
| Standard FBM | `sum( a*noise(p) )` | Smooth, soft (cloud interiors) |
| Ridged FBM | `sum( a*abs(noise(p)) )` | Sharp creases (ridges, lightning) |
| Sinusoidal ridged | `sum( a*sin(noise(p)*k) )` | Periodic ridges (lava) |
| Erosion FBM | `sum( a*noise(p)/(1+dot(d,d)) )` | Smooth ridges, fine valleys (terrain) |
| Sea wave FBM | `sum( a*octave_fn(p) )` | Sharp wave crests (ocean surface) |
## Step-by-Step Implementation Details
### Step 1: Hash Function
**What**: Implement a hash function that maps 2D integer coordinates to pseudo-random values.
**Why**: Hashing is the fundamental building block of all noise. The sin-free version is stable across GPUs; the sin version is more concise.
**Code (sin-free version)**:
```glsl
// 2D -> 1D hash, sin-free, cross-platform stable
float hash12(vec2 p) {
vec3 p3 = fract(vec3(p.xyx) * .1031);
p3 += dot(p3, p3.yzx + 33.33);
return fract((p3.x + p3.y) * p3.z);
}
// 2D -> 2D hash (for gradient noise)
vec2 hash22(vec2 p) {
vec3 p3 = fract(vec3(p.xyx) * vec3(.1031, .1030, .0973));
p3 += dot(p3, p3.yzx + 33.33);
return fract((p3.xx + p3.yz) * p3.zy);
}
```
**Code (classic sin version)**:
```glsl
float hash(vec2 p) {
float h = dot(p, vec2(127.1, 311.7));
return fract(sin(h) * 43758.5453123);
}
// Gradient version, output [-1, 1]
vec2 hash2(vec2 p) {
p = vec2(dot(p, vec2(127.1, 311.7)),
dot(p, vec2(269.5, 183.3)));
return -1.0 + 2.0 * fract(sin(p) * 43758.5453123);
}
```
### Step 2: Value Noise
**What**: Perform Hermite-smoothed interpolation between hashed values at integer lattice points to obtain a continuous 2D noise field.
**Why**: Value noise is the simplest noise implementation with minimal code, suitable as a foundation for FBM and domain warping. Using the `smoothstep` polynomial `3t^2-2t^3` directly guarantees C1 continuity (no seam discontinuities).
**Code**:
```glsl
float noise(in vec2 x) {
vec2 p = floor(x); // Integer lattice point
vec2 f = fract(x); // Fractional part within cell
f = f * f * (3.0 - 2.0 * f); // Hermite smoothing (can substitute quintic: 6t^5-15t^4+10t^3)
float a = hash(p + vec2(0.0, 0.0));
float b = hash(p + vec2(1.0, 0.0));
float c = hash(p + vec2(0.0, 1.0));
float d = hash(p + vec2(1.0, 1.0));
return mix(mix(a, b, f.x), mix(c, d, f.x), f.y); // Bilinear interpolation
}
```
### Step 3: Simplex Noise
**What**: Use gradient dot products and radial falloff kernels on a triangular grid to generate isotropic 2D noise.
**Why**: Compared to value noise, Simplex Noise has no axis-aligned artifacts, lower computational cost (2D requires only 3 lattice points instead of 4), and higher visual quality. Suitable for scenarios requiring high-quality noise (fire, clouds).
**Code**:
```glsl
float noise(in vec2 p) {
const float K1 = 0.366025404; // (sqrt(3)-1)/2 — skew factor
const float K2 = 0.211324865; // (3-sqrt(3))/6 — unskew factor
vec2 i = floor(p + (p.x + p.y) * K1); // Skew to triangular grid
vec2 a = p - i + (i.x + i.y) * K2; // Vertex 0 offset
vec2 o = (a.x > a.y) ? vec2(1.0, 0.0) : vec2(0.0, 1.0); // Determine which triangle
vec2 b = a - o + K2; // Vertex 1 offset
vec2 c = a - 1.0 + 2.0 * K2; // Vertex 2 offset
vec3 h = max(0.5 - vec3(dot(a, a), dot(b, b), dot(c, c)), 0.0); // Radial falloff
vec3 n = h * h * h * h * vec3( // h^4 kernel * gradient dot product
dot(a, hash2(i + 0.0)),
dot(b, hash2(i + o)),
dot(c, hash2(i + 1.0))
);
return dot(n, vec3(70.0)); // Normalize to ~[-1, 1]
}
```
### Step 4: Standard FBM (Fractional Brownian Motion)
**What**: Sum multiple octaves of noise with decreasing amplitudes to obtain a multi-scale fractal signal.
**Why**: A single noise octave has a single frequency and cannot produce the multi-scale detail found in nature. FBM simulates fractal self-similarity by summing noise at different frequencies. **The inter-octave rotation matrix is a key technique** that breaks axis-aligned artifacts.
**Code (4-octave loop version)**:
```glsl
#define OCTAVES 4 // Tunable: number of octaves (1-8), more = richer detail but more expensive
#define GAIN 0.5 // Tunable: amplitude decay (0.3-0.7), higher = more prominent high frequencies
#define LACUNARITY 2.0 // Tunable: frequency multiplier (1.5-3.0), higher = larger gap between octaves
float fbm(vec2 p) {
// Encodes both rotation and scaling, eliminates axis-aligned artifacts
// |m| = sqrt(1.6^2+1.2^2) = 2.0, rotation angle ~ 36.87 degrees
mat2 m = mat2(1.6, 1.2, -1.2, 1.6);
float f = 0.0;
float a = 0.5; // Initial amplitude
for (int i = 0; i < OCTAVES; i++) {
f += a * noise(p);
p = m * p; // Rotation + frequency scaling
a *= GAIN; // Amplitude decay
}
return f;
}
```
**Manually unrolled version (with slightly varying lacunarity)**:
```glsl
// Slightly varying lacunarity (2.01, 2.02, 2.03...) breaks exact self-similarity
const mat2 mtx = mat2(0.80, 0.60, -0.60, 0.80); // Pure rotation ~36.87 degrees
float fbm4(vec2 p) {
float f = 0.0;
f += 0.5000 * (-1.0 + 2.0 * noise(p)); p = mtx * p * 2.02;
f += 0.2500 * (-1.0 + 2.0 * noise(p)); p = mtx * p * 2.03;
f += 0.1250 * (-1.0 + 2.0 * noise(p)); p = mtx * p * 2.01;
f += 0.0625 * (-1.0 + 2.0 * noise(p));
return f / 0.9375; // Normalization
}
```
### Step 5: Ridged FBM
**What**: Take the absolute value of noise before summation, producing sharp "ridges" at zero crossings.
**Why**: Standard FBM produces overly smooth patterns and cannot represent sharp structures like lightning, mountain ridges, or cracks. The `abs()` operation folds the noise's zero crossings into sharp V-shaped ridge lines.
**Code**:
```glsl
float fbm_ridged(in vec2 p) {
float z = 2.0;
float rz = 0.0;
for (float i = 1.0; i < 6.0; i++) {
// abs((noise-0.5)*2) maps [0,1] to a V-shape in [0,1]
rz += abs((noise(p) - 0.5) * 2.0) / z;
z *= 2.0; // Amplitude decay (1/z)
p *= 2.0; // Frequency scaling
}
return rz;
}
```
**Sinusoidal ridged variant**:
```glsl
// sin(noise*7) produces smoother periodic ridges, suitable for lava textures
rz += (sin(noise(p) * 7.0) * 0.5 + 0.5) / z;
```
### Step 6: Domain Warping
**What**: Use the output of noise/FBM to distort the input coordinates of subsequent noise, producing organic distortion patterns.
**Why**: Domain warping is the core technique for producing "painterly", "ink wash", "geological" and other organic patterns. The number of nested warping layers controls complexity.
**Basic domain warping**:
```glsl
// Low-frequency FBM as offset to distort subsequent sampling
float q = fbm(uv * 0.5); // Low-frequency domain warping field
uv -= q - time; // Use q to offset sampling coordinates
float f = fbm(uv); // Sample at warped coordinates
```
**Classic three-layer cascaded domain warping**:
```glsl
// Two independent FBMs produce decorrelated vec2 offsets
vec2 fbm4_2(vec2 p) {
return vec2(fbm4(p + vec2(1.0)), fbm4(p + vec2(6.2))); // Different offsets for decorrelation
}
float func(vec2 q, out vec2 o, out vec2 n) {
// Layer 1: q -> 4-octave FBM -> 2D offset field o
o = 0.5 + 0.5 * fbm4_2(q);
// Layer 2: o -> 6-octave FBM -> 2D offset field n (higher frequency)
n = fbm6_2(4.0 * o);
// Layer 3: original coordinates + offsets -> final FBM sampling
vec2 p = q + 2.0 * n + 1.0;
float f = 0.5 + 0.5 * fbm4(2.0 * p);
// Contrast enhancement: boost contrast in heavily warped areas
f = mix(f, f * f * f * 3.5, f * abs(n.x));
return f;
}
```
**Dual-axis FBM domain warping**:
```glsl
mat2 makem2(float a) { float c = cos(a), s = sin(a); return mat2(c, -s, s, c); } // Rotation helper used below
float dualfbm(in vec2 p) {
vec2 p2 = p * 0.7;
// Two independent FBMs offset X/Y axes separately, different time offsets avoid symmetry
vec2 basis = vec2(fbm(p2 - time * 1.6), fbm(p2 + time * 1.7));
basis = (basis - 0.5) * 0.2; // Center + scale
p += basis;
return fbm(p * makem2(time * 0.2)); // Final sampling after rotation
}
```
### Step 7: Flow Noise
**What**: Apply independent gradient field displacement within each FBM octave, simulating fluid transport effects.
**Why**: Ordinary domain warping is "global" (distorting before or after FBM), while flow noise is "per-octave" — each frequency layer has its own flow direction and speed, producing extremely realistic lava and fluid effects.
**Code**:
```glsl
#define FLOW_SPEED 0.6 // Tunable: main flow speed
#define BASE_SPEED 1.9 // Tunable: base point flow speed
#define ADVECTION 0.77 // Tunable: advection factor (0.5=stable, 0.95=turbulent)
#define GRAD_SCALE 0.5 // Tunable: gradient displacement strength
// Noise gradient (central differences)
vec2 gradn(vec2 p) {
float ep = 0.09;
float gradx = noise(vec2(p.x + ep, p.y)) - noise(vec2(p.x - ep, p.y));
float grady = noise(vec2(p.x, p.y + ep)) - noise(vec2(p.x, p.y - ep));
return vec2(gradx, grady);
}
float flow(in vec2 p) {
float z = 2.0;
float rz = 0.0;
vec2 bp = p; // Base point (prevents advection divergence)
for (float i = 1.0; i < 7.0; i++) {
p += time * FLOW_SPEED; // Main flow displacement
bp += time * BASE_SPEED; // Base flow displacement
vec2 gr = gradn(i * p * 0.34 + time * 1.0); // Noise gradient field
gr *= makem2(time * 6.0 - (0.05 * p.x + 0.03 * p.y) * 40.0); // Spatially varying rotation
p += gr * GRAD_SCALE; // Gradient displacement
rz += (sin(noise(p) * 7.0) * 0.5 + 0.5) / z; // Sinusoidal ridged accumulation
p = mix(bp, p, ADVECTION); // Mix back to base (prevent divergence)
z *= 1.4; // Amplitude decay
p *= 2.0; // Frequency scaling
bp *= 1.9; // Base frequency scaling (slightly different)
}
return rz;
}
```
### Step 8: Derivative FBM
**What**: Track the analytical gradient of noise during FBM accumulation, using the accumulated gradient magnitude to suppress high-frequency detail in steep areas.
**Why**: This is a signature technique for terrain rendering. Standard FBM adds detail uniformly across all areas, but natural terrain has smooth ridges due to hydraulic erosion while valleys retain fine detail. Derivative FBM automatically simulates this erosion effect through the `1/(1+|gradient|^2)` factor.
**Code**:
```glsl
// Value noise with analytical derivative: returns vec3(value, d/dx, d/dy)
vec3 noised(in vec2 x) {
vec2 p = floor(x);
vec2 f = fract(x);
vec2 u = f * f * (3.0 - 2.0 * f); // Hermite interpolation
vec2 du = 6.0 * f * (1.0 - f); // Hermite derivative (analytical)
float a = hash(p + vec2(0, 0));
float b = hash(p + vec2(1, 0));
float c = hash(p + vec2(0, 1));
float d = hash(p + vec2(1, 1));
return vec3(
a + (b - a) * u.x + (c - a) * u.y + (a - b - c + d) * u.x * u.y, // Value
du * (vec2(b - a, c - a) + (a - b - c + d) * u.yx) // Gradient
);
}
#define TERRAIN_OCTAVES 16 // Tunable: terrain octave count (5-16), more = finer detail
#define TERRAIN_GAIN 0.5 // Tunable: amplitude decay
float terrainFBM(in vec2 x) {
const mat2 m2 = mat2(0.8, -0.6, 0.6, 0.8); // Pure rotation ~36.87 degrees
float a = 0.0; // Accumulated value
float b = 1.0; // Current amplitude
vec2 d = vec2(0.0); // Accumulated gradient
for (int i = 0; i < TERRAIN_OCTAVES; i++) {
vec3 n = noised(x); // (value, dx, dy)
d += n.yz; // Accumulate gradient
a += b * n.x / (1.0 + dot(d, d)); // Key: larger gradient = smaller contribution (erosion effect)
b *= TERRAIN_GAIN;
x = m2 * x * 2.0; // Rotation + frequency scaling
}
return a;
}
```
## Common Variants in Detail
### Variant 1: Ridged FBM (Ridged/Turbulent FBM)
- **Difference from base version**: applies `abs()` to noise values, producing sharp ridge lines at zero crossings
- **Use cases**: lightning, mountain ridges, cracks, veins, electric arcs
- **Key modified code**:
```glsl
// Standard FBM line:
f += a * noise(p);
// Changed to ridged:
f += a * abs(noise(p));
// Or sinusoidal ridged (smoother periodic ridges, suitable for lava):
f += a * (sin(noise(p) * 7.0) * 0.5 + 0.5);
```
### Variant 2: Domain Warped FBM
- **Difference from base version**: FBM output is fed back as coordinate offsets, producing organic distortion
- **Use cases**: cloud deformation, geological textures, ink wash style, abstract art
- **Key modified code**:
```glsl
// Classic three-layer domain warping
vec2 o = 0.5 + 0.5 * vec2(fbm(q + vec2(1.0)), fbm(q + vec2(6.2)));
vec2 n = vec2(fbm(4.0 * o + vec2(9.2)), fbm(4.0 * o + vec2(5.7)));
float f = 0.5 + 0.5 * fbm(q + 2.0 * n + 1.0);
```
### Variant 3: Derivative Erosion FBM
- **Difference from base version**: tracks analytical gradient, suppresses high frequencies in steep areas (simulates hydraulic erosion)
- **Use cases**: realistic terrain, mountains, canyons
- **Key modified code**:
```glsl
vec2 d = vec2(0.0); // Accumulated gradient
for (int i = 0; i < N; i++) {
vec3 n = noised(p); // (value, dx, dy)
d += n.yz; // Accumulate gradient
a += b * n.x / (1.0 + dot(d, d)); // Key: divide by gradient magnitude
b *= 0.5;
p = m2 * p * 2.0;
}
```
### Variant 4: Flow Noise
- **Difference from base version**: applies independent gradient field displacement within each octave, simulating fluid transport
- **Use cases**: lava, liquid metal, flowing magma
- **Key modified code**:
```glsl
for (float i = 1.0; i < 7.0; i++) {
vec2 gr = gradn(i * p * 0.34 + time); // Gradient field
gr *= makem2(time * 6.0 - (0.05 * p.x + 0.03 * p.y) * 40.0); // Spatially varying rotation
p += gr * 0.5; // Displacement
rz += (sin(noise(p) * 7.0) * 0.5 + 0.5) / z; // Accumulation
p = mix(bp, p, 0.77); // Mix back to base
}
```
### Variant 5: Custom Sea Octave FBM
- **Difference from base version**: uses `1-abs(sin(uv))` to construct peaked waveforms, combined with bidirectional propagation and choppy decay
- **Use cases**: ocean water surface, waves
- **Key modified code**:
```glsl
float sea_octave(vec2 uv, float choppy) {
uv += noise(uv); // Noise domain perturbation
vec2 wv = 1.0 - abs(sin(uv)); // Peaked waveform
vec2 swv = abs(cos(uv)); // Smooth waveform
wv = mix(wv, swv, wv); // Adaptive blending
return pow(1.0 - pow(wv.x * wv.y, 0.65), choppy);
}
// Bidirectional propagation in FBM loop:
d = sea_octave((uv + SEA_TIME) * freq, choppy);
d += sea_octave((uv - SEA_TIME) * freq, choppy);
choppy = mix(choppy, 1.0, 0.2); // Higher octaves are smoother
```
## Performance Optimization Details
### 1. Reduce Octave Count (Most Direct)
Each additional octave doubles the noise sampling cost. Distant objects can use fewer octaves:
```glsl
// LOD-aware octave count
int oct = 5 - int(log2(1.0 + t * 0.5)); // Fewer octaves at greater distances
```
### 2. Multi-Level LOD Strategy
Provide functions at different precision levels for different purposes:
```glsl
float terrainL(vec2 x) { /* 3 octaves — for camera height */ }
float terrainM(vec2 x) { /* 9 octaves — for ray marching */ }
float terrainH(vec2 x) { /* 16 octaves — for normal calculation */ }
```
### 3. Use Texture Sampling Instead of Math
Store precomputed noise in textures, using hardware texture filtering instead of arithmetic hashing:
```glsl
float noise(in vec2 x) { return texture(iChannel0, x * 0.01).x; }
// Or use texelFetch for an exact lookup (p is an ivec2 lattice coordinate):
float a = texelFetch(iChannel0, (p + ivec2(0, 0)) & 255, 0).x;
```
### 4. Manually Unroll Loops
GLSL compilers typically optimize manually unrolled small loops (4-6 iterations) better than `for` loops, and allow slightly varying lacunarity per octave.
### 5. Adaptive Step Size (Volume Rendering)
```glsl
// Step size grows linearly with distance
float dt = max(0.05, 0.02 * t);
```
### 6. Directional Derivative Instead of Full Gradient (Volumetric Lighting)
```glsl
// 1 extra sample vs 3
float dif = clamp((den - map(pos + 0.3 * sundir)) / 0.25, 0.0, 1.0);
```
### 7. Early Termination
```glsl
if (sum.a > 0.99) break; // Volume is already opaque, stop marching
```
## Combination Suggestions in Detail
### 1. FBM + Ray Marching
Noise drives a height field or density field, ray marching finds intersections. This is the standard combination for terrain and ocean surface rendering:
- Height field: `height = terrainFBM(pos.xz)`, ray march to find the intersection where `pos.y == height`
- Volume field: `density = fbm(pos)`, forward-accumulate transmittance and color
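The height-field case can be sketched as a fixed-budget march (assuming the `terrainFBM` from Step 8 above):

```glsl
float raymarchTerrain(vec3 ro, vec3 rd) {
    float t = 0.1;
    for (int i = 0; i < 128; i++) {            // Tunable: step budget
        vec3 pos = ro + rd * t;
        float h = pos.y - terrainFBM(pos.xz);  // Signed height above the terrain
        if (h < 0.002 * t) return t;           // Hit: tolerance grows with distance
        t += 0.5 * h;                          // Under-relaxed step avoids skipping ridges
    }
    return -1.0;                               // Miss
}
```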
### 2. FBM + Finite Difference Normals + Lighting
Use finite differences on a 2D noise field to estimate normals, adding pseudo-3D lighting effects:
```glsl
// f = height field, ex/ey = small offsets along x/y, epsilon = offset length
// Differences are negated so the normal tilts away from rising terrain
vec3 nor = normalize(vec3(f(p) - f(p + ex), epsilon, f(p) - f(p + ey)));
float dif = max(dot(nor, lightDir), 0.0);
```
### 3. FBM + Color Mapping
Map the same scalar at different power exponents to RGB channels, producing natural color gradients:
```glsl
vec3 col = vec3(1.5*c, 1.5*c*c*c, c*c*c*c*c*c); // Fire: red -> orange -> yellow -> white
```
Or inverse color mapping:
```glsl
vec3 col = vec3(0.2, 0.07, 0.01) / rz; // Areas with small ridge values are brightest
```
### 4. FBM + Fresnel Water Surface Coloring
Noise drives water surface waveforms, Fresnel equations blend reflected sky and refracted water color:
```glsl
float fresnel = pow(1.0 - dot(n, -eye), 3.0);
vec3 color = mix(refracted, reflected, fresnel);
```
### 5. Multi-Layer FBM Compositing
Different FBM layers with different parameters control different properties:
- **Shape layer**: low-frequency standard FBM controls cloud shape
- **Ridged layer**: mid-frequency ridged FBM adds edge detail
- **Color layer**: high-frequency FBM controls cloud interior color variation
- **Combination**: `f *= r + f;` — modulating the shape field by the ridged field produces sharp edges
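One way to sketch the layered composite (`fbm` is an assumed helper; all frequencies and the coverage threshold are tunable):

```glsl
float cloudDensity(vec3 p) {
float shape = fbm(p * 0.5); // Low-frequency silhouette
float ridge = 1.0 - abs(2.0 * fbm(p * 2.0) - 1.0); // Mid-frequency ridged detail
float f = shape;
f *= ridge + f; // Sharp edges from the shape/ridged interaction
return clamp(f - 0.4, 0.0, 1.0); // Coverage threshold (adjustable)
}
```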
### 6. FBM + Volumetric Lighting (Directional Derivative)
In volume rendering, the density difference along the light direction approximates lighting:
```glsl
float shadow = clamp((density_here - density_toward_sun) / scale, 0.0, 1.0);
vec3 lit_color = mix(shadow_color, light_color, shadow);
```

# Ray Marching Detailed Reference
This document serves as a detailed reference for the Ray Marching Skill, covering prerequisites, step-by-step tutorials, mathematical derivations, and advanced usage.
## Prerequisites
- **GLSL Basics**: uniforms, varyings, built-in functions (`mix`, `clamp`, `smoothstep`, `normalize`, `dot`, `cross`, `reflect`, `refract`)
- **Vector Math**: dot product, cross product, vector normalization, matrix multiplication
- **Coordinate Systems**: transformations from screen space to NDC to view space to world space
- **Basic Lighting Models**: diffuse (Lambertian), specular (Phong/Blinn-Phong)
## Implementation Steps in Detail
### Step 1: UV Coordinate Normalization and Ray Direction Computation
**What**: Convert pixel coordinates to normalized coordinates in the [-1,1] range, and compute the ray direction from the camera.
**Why**: This establishes the mapping from screen pixels to the 3D world. Dividing by `iResolution.y` preserves the aspect ratio; the z component controls the field of view.
```glsl
// Method A: Concise version (common for quick prototyping)
vec2 uv = (2.0 * fragCoord - iResolution.xy) / iResolution.y;
vec3 ro = vec3(0.0, 0.0, -3.0); // Ray origin (camera position)
vec3 rd = normalize(vec3(uv, 1.0)); // Ray direction, z=1.0 gives ~90° FOV
// Method B: Precise FOV control
vec2 xy = fragCoord - iResolution.xy / 2.0;
float z = 0.5 * iResolution.y / tan(radians(FOV) * 0.5); // FOV adjustable: vertical field of view in degrees
vec3 rd = normalize(vec3(xy, -z));
```
### Step 2: Building the Camera Matrix (Look-At)
**What**: Construct a view matrix from the camera position, target point, and up direction, then transform the view-space ray direction into world space.
**Why**: Without a camera matrix, the ray direction is fixed along -Z. With a Look-At matrix, the camera can be freely positioned and rotated.
```glsl
mat3 setCamera(vec3 ro, vec3 ta, float cr) {
vec3 cw = normalize(ta - ro); // Forward direction
vec3 cp = vec3(sin(cr), cos(cr), 0.0); // Up reference (cr controls roll)
vec3 cu = normalize(cross(cw, cp)); // Right direction
vec3 cv = cross(cu, cw); // Up direction
return mat3(cu, cv, cw);
}
// Usage:
mat3 ca = setCamera(ro, ta, 0.0);
vec3 rd = ca * normalize(vec3(uv, FOCAL_LENGTH)); // FOCAL_LENGTH adjustable: 1.0~3.0, larger = narrower FOV
```
### Step 3: Defining the Scene SDF
**What**: Write a function that returns the signed distance from any point in space to the nearest surface.
**Why**: The SDF is the core of Ray Marching — it simultaneously defines geometry and step distance.
```glsl
// --- Basic SDF Primitives ---
float sdSphere(vec3 p, float r) {
return length(p) - r;
}
float sdBox(vec3 p, vec3 b) {
vec3 d = abs(p) - b;
return min(max(d.x, max(d.y, d.z)), 0.0) + length(max(d, 0.0));
}
float sdTorus(vec3 p, vec2 t) {
return length(vec2(length(p.xz) - t.x, p.y)) - t.y;
}
// --- CSG Boolean Operations ---
float opUnion(float a, float b) { return min(a, b); }
float opSubtraction(float a, float b) { return max(a, -b); }
float opIntersection(float a, float b) { return max(a, b); }
// --- Smooth Boolean Operations (organic blending) ---
float smin(float a, float b, float k) {
float h = max(k - abs(a - b), 0.0);
return min(a, b) - h * h * 0.25 / k; // k adjustable: blend radius, 0.1~0.5
}
// --- Spatial Transforms ---
// Translation: apply inverse translation to the sample point
// Rotation: multiply the sample point by a rotation matrix
// Scaling: p /= s, result *= s
// --- Scene Composition Example ---
float map(vec3 p) {
float d = sdSphere(p - vec3(0.0, 0.5, 0.0), 0.5); // Sphere
d = opUnion(d, p.y); // Add ground plane
d = smin(d, sdBox(p - vec3(1.0, 0.3, 0.0), vec3(0.3)), 0.2); // Smooth blend with box
return d;
}
```
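The transform rules in the comments above can be made concrete with small helpers (a sketch; `rot2` and the offsets are illustrative):

```glsl
mat2 rot2(float a) { float c = cos(a), s = sin(a); return mat2(c, -s, s, c); }
float sdMovedBox(vec3 p, vec3 offset, float angle, vec3 b) {
p -= offset; // Translation: inverse-translate the sample point
p.xz = rot2(-angle) * p.xz; // Rotation: inverse-rotate the sample point
return sdBox(p, b);
}
float sdScaledSphere(vec3 p, float s) {
return sdSphere(p / s, 1.0) * s; // Scaling: p /= s, result *= s (uniform scale only)
}
```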
### Step 4: Core Ray Marching Loop
**What**: Iteratively step along the ray direction, using the SDF value at each step to determine the advance distance, and check whether the ray has hit a surface or exceeded the maximum range.
**Why**: Sphere Tracing guarantees that each step advances the maximum safe distance (without penetrating surfaces), taking large steps in open areas and automatically slowing down near surfaces.
```glsl
#define MAX_STEPS 128 // Adjustable: max step count, 64~256, more = more precise but slower
#define MAX_DIST 100.0 // Adjustable: max travel distance
#define SURF_DIST 0.001 // Adjustable: surface hit threshold, 0.0001~0.01
float rayMarch(vec3 ro, vec3 rd) {
float t = 0.0;
for (int i = 0; i < MAX_STEPS; i++) {
vec3 p = ro + t * rd;
float d = map(p);
if (d < SURF_DIST) return t; // Surface hit
t += d;
if (t > MAX_DIST) break; // Out of range
}
return -1.0; // No hit
}
```
### Step 5: Normal Estimation
**What**: Compute the surface normal at the hit point using the numerical gradient of the SDF.
**Why**: Normals are the foundation of lighting calculations. The gradient direction of the SDF is the surface normal direction.
```glsl
// Method A: Central differences (6 SDF calls, straightforward)
vec3 calcNormal(vec3 p) {
vec2 e = vec2(0.001, 0.0); // e.x adjustable: differentiation step size
return normalize(vec3(
map(p + e.xyy) - map(p - e.xyy),
map(p + e.yxy) - map(p - e.yxy),
map(p + e.yyx) - map(p - e.yyx)
));
}
// Method B: Tetrahedron trick (4 SDF calls, prevents compiler inline bloat, recommended)
vec3 calcNormal(vec3 pos) {
vec3 n = vec3(0.0);
for (int i = 0; i < 4; i++) {
vec3 e = 0.5773 * (2.0 * vec3((((i+3)>>1)&1), ((i>>1)&1), (i&1)) - 1.0);
n += e * map(pos + 0.001 * e);
}
return normalize(n);
}
```
### Step 6: Lighting and Shading
**What**: Compute Phong lighting (ambient + diffuse + specular) at the hit point.
**Why**: Give SDF surfaces realistic shading with highlights and shadow gradients.
```glsl
vec3 shade(vec3 p, vec3 rd) {
vec3 nor = calcNormal(p);
vec3 lightDir = normalize(vec3(0.6, 0.35, 0.5)); // Light direction (adjustable)
vec3 viewDir = -rd;
vec3 halfDir = normalize(lightDir + viewDir);
// Diffuse
float diff = clamp(dot(nor, lightDir), 0.0, 1.0);
// Specular
float spec = pow(clamp(dot(nor, halfDir), 0.0, 1.0), SHININESS); // SHININESS adjustable: 8~64
// Ambient + sky light
float sky = sqrt(clamp(0.5 + 0.5 * nor.y, 0.0, 1.0));
vec3 col = vec3(0.2, 0.2, 0.25); // Material base color (adjustable)
vec3 lin = vec3(0.0);
lin += diff * vec3(1.3, 1.0, 0.7) * 2.2; // Main light
lin += sky * vec3(0.4, 0.6, 1.15) * 0.6; // Sky light
lin += vec3(0.25) * 0.55; // Fill light
col *= lin;
col += spec * vec3(1.3, 1.0, 0.7) * 5.0; // Specular highlight
return col;
}
```
### Step 7: Post-Processing (Gamma Correction and Tone Mapping)
**What**: Convert linear lighting results to sRGB space and apply tone mapping to prevent overexposure.
**Why**: GPU computations are done in linear space, but displays require gamma-corrected values. Tone mapping compresses HDR values into the [0,1] range.
```glsl
// Optional: Reinhard tone mapping (apply before gamma)
col = col / (1.0 + col);
// Gamma correction
col = pow(col, vec3(0.4545)); // i.e., 1/2.2
// Optional: Vignette
vec2 q = fragCoord / iResolution.xy;
col *= 0.5 + 0.5 * pow(16.0 * q.x * q.y * (1.0 - q.x) * (1.0 - q.y), 0.25);
```
## Common Variants in Detail
### 1. Volumetric Ray Marching
**Difference from the basic version**: Instead of finding a surface intersection, the ray advances in **fixed steps**, accumulating density/color at each step. Used for flames, smoke, and clouds.
**Key modified code**:
```glsl
#define VOL_STEPS 150 // Adjustable: volume sample count
#define VOL_STEP_SIZE 0.05 // Adjustable: step size
// Density field (built with FBM noise)
float fbmDensity(vec3 p) {
float den = 0.2 - p.y; // Base height falloff
vec3 q = p - vec3(0.0, 1.0, 0.0) * iTime;
float f = 0.5000 * noise(q); q = q * 2.02 - vec3(0.0, 1.0, 0.0) * iTime;
f += 0.2500 * noise(q); q = q * 2.03 - vec3(0.0, 1.0, 0.0) * iTime;
f += 0.1250 * noise(q); q = q * 2.01 - vec3(0.0, 1.0, 0.0) * iTime;
f += 0.0625 * noise(q);
return den + 4.0 * f;
}
// Volumetric marching main function
vec3 volumetricMarch(vec3 ro, vec3 rd) {
vec4 sum = vec4(0.0);
float t = 0.05;
for (int i = 0; i < VOL_STEPS; i++) {
vec3 pos = ro + t * rd;
float den = fbmDensity(pos);
if (den > 0.0) {
den = min(den, 1.0);
vec4 col = vec4(mix(vec3(1.0, 0.5, 0.05), vec3(0.48, 0.53, 0.5),
clamp(pos.y * 0.5, 0.0, 1.0)), den * 0.6); // Fire-to-smoke color gradient, alpha from density
col.rgb *= den;
col.rgb *= col.a; // Premultiply alpha
sum += col * (1.0 - sum.a); // Front-to-back compositing
if (sum.a > 0.99) break; // Early exit
}
t += VOL_STEP_SIZE;
}
return clamp(sum.rgb, 0.0, 1.0);
}
```
### 2. CSG Scene Construction (Constructive Solid Geometry)
**Difference from the basic version**: Combines multiple SDF primitives using `min` (union), `max` (intersection), and `max(a,-b)` (subtraction), along with rotation/translation transforms to create complex mechanical parts.
**Key modified code**:
```glsl
float sceneSDF(vec3 p) {
p = rotateY(iTime * 0.5) * p; // Rotate entire scene
float sphere = sdSphere(p, 1.2);
float cube = sdBox(p, vec3(0.9));
float cyl = sdCylinder(p, vec2(0.4, 2.0)); // Vertical cylinder
float cylX = sdCylinder(p.yzx, vec2(0.4, 2.0)); // X-axis cylinder (swizzled)
float cylZ = sdCylinder(p.xzy, vec2(0.4, 2.0)); // Z-axis cylinder
// Sphere ∩ Cube - three-axis cylinders = nut shape
return opSubtraction(
opIntersection(sphere, cube),
opUnion(cyl, opUnion(cylX, cylZ))
);
}
```
### 3. Physically-Based Volumetric Scattering
**Difference from the basic version**: Uses physically correct extinction coefficients, scattering coefficients, and transmittance formulas, with volumetric shadows (marching toward the light source to compute transmittance). Based on Frostbite engine's energy-conserving integration formula.
**Key modified code**:
```glsl
void getParticipatingMedia(out float sigmaS, out float sigmaE, vec3 pos) {
float heightFog = 0.3 * clamp((7.0 - pos.y), 0.0, 1.0); // Height fog
sigmaS = 0.02 + heightFog; // Scattering coefficient
sigmaE = max(0.000001, sigmaS); // Extinction coefficient (includes absorption)
}
// Energy-conserving scattering integral (Frostbite improved version)
vec3 S = lightColor * sigmaS * phaseFunction() * volShadow; // Incoming light
vec3 Sint = (S - S * exp(-sigmaE * stepLen)) / sigmaE; // Integrate current step
scatteredLight += transmittance * Sint; // Accumulate
transmittance *= exp(-sigmaE * stepLen); // Update transmittance
```
### 4. Glow Accumulation
**Difference from the basic version**: During the Ray March loop, additionally tracks the closest distance `dMin` between the ray and any surface. Even without a hit, this produces a glow effect. Commonly used for glowing spheres and plasma.
**Key modified code**:
```glsl
vec2 rayMarchWithGlow(vec3 ro, vec3 rd) {
float t = 0.0;
float dMin = MAX_DIST; // Track minimum distance
for (int i = 0; i < MAX_STEPS; i++) {
vec3 p = ro + t * rd;
float d = map(p);
if (d < dMin) dMin = d; // Update closest distance
if (d < SURF_DIST) break;
t += d;
if (t > MAX_DIST) break;
}
return vec2(t, dMin);
}
// Add glow based on dMin during shading
float glow = 0.02 / max(dMin, 0.001); // Closer = brighter
col += glow * vec3(1.0, 0.8, 0.9);
```
### 5. Refraction and Bidirectional Marching (Interior Marching)
**Difference from the basic version**: After hitting a surface, computes the refraction direction and marches **inside the object in reverse** (negating the SDF) to find the exit point. Can achieve glass, water, and liquid metal effects.
**Key modified code**:
```glsl
// Bidirectional marching: determine SDF sign based on whether the origin is inside or outside
float castRay(vec3 ro, vec3 rd) {
float sgn = (map(ro) < 0.0) ? -1.0 : 1.0; // Negate distance if starting inside (avoids shadowing the built-in sign())
float t = 0.0;
for (int i = 0; i < 120; i++) {
float h = sgn * map(ro + rd * t);
if (abs(h) < 0.0001 || t > 12.0) break;
t += h;
}
return t;
}
// Refraction: after hitting the outer surface, march inside along the refracted direction
vec3 refDir = refract(rd, nor, IOR); // IOR adjustable: index of refraction, e.g., 0.9
float t2 = 2.0; // Initial guess beyond the far surface
for (int i = 0; i < 50; i++) {
float h = map(hitPos + refDir * t2);
t2 -= h; // Converge onto the exit surface (steps back when outside, forward when inside)
if (abs(h) > 3.0) break; // Safety bail-out if the ray escapes
}
vec3 nor2 = calcNormal(hitPos + refDir * t2); // Exit point normal
```
## Performance Optimization in Detail
### 1. Reducing SDF Call Count
- Use the tetrahedron trick for normal computation (4 calls instead of 6 with central differences)
- Use `min(iFrame,0)` as the loop start value to prevent the compiler from unrolling and inlining map() multiple times
### 2. Bounding Box Acceleration
Perform AABB ray intersection before marching to skip empty regions:
```glsl
vec2 tb = iBox(ro - center, rd, halfSize);
if (tb.x < tb.y && tb.y > 0.0) { /* Only march inside the box */ }
```
### 3. Adaptive Precision
- Scale the hit threshold with distance: `SURF_DIST * (1.0 + t * 0.1)` — distant surfaces don't need high precision
- Clamp step size: `t += clamp(h, 0.01, 0.2)` — prevent individual steps from being too large or too small
### 4. Early Exit
- In volume rendering: `if (sum.a > 0.99) break;` — stop immediately when opaque
- In shadow computation: `if (res < 0.004) break;` — stop when fully occluded
### 5. Reducing map() Complexity
- Use simplified SDFs for distant objects
- First test with a cheap bounding SDF; only compute the expensive precise SDF when `sdBox(p, bound) < currentMin`
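A sketch of the bounding-test idea (`sdDetailed` stands in for an expensive SDF fully contained in the box; skipping it is exact when the bound cannot beat the current minimum):

```glsl
float map(vec3 p) {
float d = sdSphere(p - vec3(-1.0, 0.5, 0.0), 0.5); // Cheap primitives first
float bound = sdBox(p - vec3(1.0, 0.5, 0.0), vec3(0.6)); // Cheap bounding volume
if (bound < d) {
d = min(d, sdDetailed(p)); // Expensive SDF only when its bound could win
}
return d;
}
```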
### 6. Anti-Aliasing
- Supersampling (AA=2 means 2x2 sampling, 4 rays per pixel), but at 4x performance cost
- In volume rendering, use dithering instead of supersampling to reduce banding artifacts
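A supersampled main entry point, for reference (`render` is assumed to wrap the full march-and-shade pipeline for one ray):

```glsl
#define AA 2 // Adjustable: 1 = off, 2 = 4 rays per pixel
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
vec3 tot = vec3(0.0);
for (int m = 0; m < AA; m++)
for (int n = 0; n < AA; n++) {
vec2 off = (vec2(m, n) + 0.5) / float(AA) - 0.5; // Sub-pixel offset
vec2 uv = (2.0 * (fragCoord + off) - iResolution.xy) / iResolution.y;
tot += render(uv); // Full march + shade for this ray
}
fragColor = vec4(tot / float(AA * AA), 1.0);
}
```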
## Combination Suggestions in Detail
### 1. Ray Marching + FBM Noise
Use fractal noise to perturb SDF surfaces for terrain and rock textures, or build volumetric density fields to render clouds/smoke.
### 2. Ray Marching + Domain Warping
Apply spatial distortions (twist, bend, repeat) to sample points to create infinitely repeating corridors or twisted surreal geometry.
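A sketch of two common domain operations applied to the sample point before evaluating the SDF (period and twist rate are tunable):

```glsl
float warpedMap(vec3 p) {
// Infinite repetition: tile space with period 3 in x/z
vec3 q = p;
q.xz = mod(q.xz + 1.5, 3.0) - 1.5;
// Twist: rotate the xz plane by an angle proportional to height
float k = 1.0; // Twist rate (adjustable)
float c = cos(k * q.y), s = sin(k * q.y);
q.xz = mat2(c, -s, s, c) * q.xz;
return sdBox(q, vec3(0.4, 1.0, 0.4));
}
```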
### 3. Ray Marching + PBR Materials
SDF provides geometry; combine with Cook-Torrance BRDF, environment map reflections, and Fresnel terms for realistic metal/dielectric materials.
### 4. Ray Marching + Post-Processing
Multi-pass architecture: the first Buffer performs Ray Marching and outputs color + depth (stored in the alpha channel); the second pass applies depth of field (DOF), motion blur, and tone mapping.
### 5. Ray Marching + Procedural Animation
Drive SDF primitive positions/sizes/blend coefficients with time parameters, combined with easing functions (smoothstep, parabolic) to create character animations without a skeletal system.
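For example, easing the blend coefficient of `smin` over time animates two spheres merging and splitting (all constants are placeholders):

```glsl
float map(vec3 p) {
float phase = smoothstep(0.0, 1.0, 0.5 + 0.5 * sin(iTime)); // Eased 0..1 cycle
float a = sdSphere(p - vec3(-0.6, 0.0, 0.0), 0.5);
float b = sdSphere(p - vec3(mix(0.9, 0.2, phase), 0.0, 0.0), 0.4);
return smin(a, b, mix(0.05, 0.4, phase)); // Animated blend radius
}
```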

# 2D SDF Detailed Reference
This file contains the complete step-by-step tutorial, mathematical derivations, detailed explanations, and advanced usage for [SKILL.md](SKILL.md).
## Prerequisites
- **GLSL Basics**: uniforms, varyings, built-in functions (length, dot, clamp, mix, smoothstep, step, sign, abs, max, min)
- **Vector Math**: 2D vector operations, geometric meaning of dot and cross products
- **Coordinate Systems**: conversion from screen coordinates to normalized device coordinates (NDC), aspect ratio correction
- **Signed Distance Field Concept**: the function returns the signed distance to the shape boundary — negative inside, zero on the boundary, positive outside
## Core Principles in Detail
The core idea of 2D SDF: **for each pixel on screen, compute its shortest signed distance `d` to the target shape boundary**.
- `d < 0`: pixel is inside the shape
- `d = 0`: pixel is exactly on the boundary
- `d > 0`: pixel is outside the shape
Once you have the distance value `d`, use functions like `smoothstep` and `clamp` to map it to color/opacity, enabling:
- **Fill**: color when `d < 0`
- **Anti-aliased edges**: `smoothstep(-aa, aa, d)` for sub-pixel smoothing at the boundary
- **Stroke**: apply smoothstep again on `abs(d) - strokeWidth`
- **Boolean operations**: `min(d1, d2)` = union, `max(d1, d2)` = intersection, `max(-d1, d2)` = subtraction
Key mathematical formulas:
```
Circle: d = length(p - center) - radius
Rectangle: d = length(max(abs(p) - halfSize, 0.0)) + min(max(abs(p).x - halfSize.x, abs(p).y - halfSize.y), 0.0)
Line segment: d = length(p - a - clamp(dot(p-a, b-a)/dot(b-a, b-a), 0, 1) * (b-a)) - width/2
Union: d = min(d1, d2)
Intersection: d = max(d1, d2)
Subtraction: d = max(-d1, d2)
Smooth union: d = mix(d2, d1, h) - k*h*(1-h), h = clamp(0.5 + 0.5*(d2-d1)/k, 0, 1)
```
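The whole pipeline fits in a few lines — a minimal example rendering one anti-aliased circle:

```glsl
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
vec2 p = (2.0 * fragCoord - iResolution.xy) / iResolution.y; // Normalized coordinates
float d = length(p) - 0.5; // Circle SDF
float px = 2.0 / iResolution.y; // One-pixel AA width
float mask = smoothstep(px, -px, d); // 1 inside, 0 outside
vec3 col = mix(vec3(0.95), vec3(0.2, 0.5, 0.8), mask);
fragColor = vec4(col, 1.0);
}
```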
## Implementation Steps in Detail
### Step 1: Coordinate Normalization and Aspect Ratio Correction
**What**: Convert screen pixel coordinates to normalized coordinates centered at the screen center, with the y range of [-1, 1].
**Why**: Pixel coordinates depend on resolution. After normalization, SDF parameters (such as radius) have resolution-independent physical meaning. Dividing by `iResolution.y` (not `.x`) ensures correct aspect ratio so circles don't become ellipses.
**Code**:
```glsl
// Method 1: Origin at center, y range [-1, 1] (most common, standard practice)
vec2 p = (2.0 * fragCoord - iResolution.xy) / iResolution.y;
// Method 2: If you need to work in pixel space (suitable for fixed pixel-size UI)
vec2 p = fragCoord.xy;
vec2 center = iResolution.xy * 0.5;
// Method 3: [0, 1] range normalization (requires manual aspect ratio handling)
vec2 uv = fragCoord.xy / iResolution.xy;
```
### Step 2: Defining SDF Primitive Functions
**What**: Write basic primitive functions that return signed distances. Each function takes the current point `p` and shape parameters, and returns a `float` distance value.
**Why**: These are the atomic building blocks for all 2D SDF graphics. Encapsulating them as independent functions allows free combination, transformation, and reuse.
**Code**:
```glsl
// ---- Circle ----
// The most basic SDF: distance from point to center minus radius
float sdCircle(vec2 p, float radius) {
return length(p) - radius;
}
// ---- Rectangle (optional rounded corners) ----
// halfSize is half-width and half-height, radius is the corner radius
float sdBox(vec2 p, vec2 halfSize, float radius) {
halfSize -= vec2(radius);
vec2 d = abs(p) - halfSize;
return min(max(d.x, d.y), 0.0) + length(max(d, 0.0)) - radius;
}
// ---- Line Segment ----
// Line segment from start to end, with width
float sdLine(vec2 p, vec2 start, vec2 end, float width) {
vec2 dir = end - start;
float h = clamp(dot(p - start, dir) / dot(dir, dir), 0.0, 1.0);
return length(p - start - dir * h) - width * 0.5;
}
// ---- Triangle (exact signed distance) ----
// Three vertices p0, p1, p2, only one sqrt needed
float sdTriangle(vec2 p, vec2 p0, vec2 p1, vec2 p2) {
vec2 e0 = p1 - p0, v0 = p - p0;
vec2 e1 = p2 - p1, v1 = p - p1;
vec2 e2 = p0 - p2, v2 = p - p2;
// Squared distance to each edge (projection + clamp; compare squares, single sqrt at the end)
vec2 pq0 = v0 - e0 * clamp(dot(v0, e0) / dot(e0, e0), 0.0, 1.0);
vec2 pq1 = v1 - e1 * clamp(dot(v1, e1) / dot(e1, e1), 0.0, 1.0);
vec2 pq2 = v2 - e2 * clamp(dot(v2, e2) / dot(e2, e2), 0.0, 1.0);
float d0 = dot(pq0, pq0);
float d1 = dot(pq1, pq1);
float d2 = dot(pq2, pq2);
// Determine inside/outside using cross product sign
float o = e0.x * e2.y - e0.y * e2.x;
vec2 d = min(min(vec2(d0, o * (v0.x * e0.y - v0.y * e0.x)),
vec2(d1, o * (v1.x * e1.y - v1.y * e1.x))),
vec2(d2, o * (v2.x * e2.y - v2.y * e2.x)));
return -sqrt(d.x) * sign(d.y);
}
// ---- Ellipse (approximate) ----
// Implicit-equation field, not a true Euclidean distance (gradient magnitude != 1); adequate for fills and CSG
float sdEllipse(vec2 p, vec2 center, float a, float b) {
float a2 = a * a, b2 = b * b;
vec2 d = p - center;
return (b2 * d.x * d.x + a2 * d.y * d.y - a2 * b2) / (a2 * b2);
}
```
### Step 3: CSG Boolean Operations
**What**: Combine two SDF distance values using min/max operations to achieve union, subtraction, and intersection of shapes.
**Why**: This is the most powerful capability of SDFs — building arbitrarily complex shapes from simple primitives. `min` takes the smaller of the two field values to produce a union (since smaller distance means "closer" to the shape interior); `max` takes the larger value for intersection; `max(a, -b)` inverts b's inside/outside and intersects for subtraction.
**Code**:
```glsl
// Union: take the nearest shape
float opUnion(float d1, float d2) {
return min(d1, d2);
}
// Intersection: overlapping region of both shapes
float opIntersect(float d1, float d2) {
return max(d1, d2);
}
// Subtraction: carve d1 out of d2
float opSubtract(float d1, float d2) {
return max(-d1, d2);
}
// Smooth union: produces a rounded transition at the junction, k controls transition width
float opSmoothUnion(float d1, float d2, float k) {
float h = clamp(0.5 + 0.5 * (d2 - d1) / k, 0.0, 1.0);
return mix(d2, d1, h) - k * h * (1.0 - h);
}
// XOR: non-overlapping region of both shapes
float opXor(float d1, float d2) {
return min(max(-d1, d2), max(-d2, d1));
}
```
### Step 4: Coordinate Transforms
**What**: Transform coordinates before computing the SDF so that shapes appear at desired positions and angles.
**Why**: SDF functions define shapes centered at the origin by default. By transforming the input coordinates (rather than the shape itself), you can freely place and rotate multiple primitives in the scene without affecting the mathematical properties of the distance field.
**Code**:
```glsl
// Translation: move the coordinate origin to position t
vec2 translate(vec2 p, vec2 t) {
return p - t;
}
// Counter-clockwise rotation
vec2 rotateCCW(vec2 p, float angle) {
mat2 m = mat2(cos(angle), sin(angle), -sin(angle), cos(angle));
return p * m;
}
// Usage example: translate then rotate
float d = sdBox(rotateCCW(translate(p, vec2(0.5, 0.3)), iTime), vec2(0.2), 0.05);
```
### Step 5: Distance Field Visualization and Rendering
**What**: Convert the SDF distance value to final color output. Includes fill, anti-aliasing, stroke, contour lines, and other visualization methods.
**Why**: The distance value itself is just a scalar that needs a mapping strategy to become a visual effect. `smoothstep` creates sub-pixel smooth transitions at the boundary, avoiding aliasing from hard edges. The `fwidth` function uses screen-space derivatives to automatically calculate pixel width, achieving resolution-independent anti-aliasing.
**Code**:
```glsl
// ---- Method 1: clamp for simple alpha (most basic) ----
float t = clamp(d, 0.0, 1.0);
vec4 shapeColor = vec4(color, 1.0 - t);
// ---- Method 2: smoothstep anti-aliasing (recommended general approach) ----
// aa controls edge softness, typical value is pixel size px = 2.0/iResolution.y
float px = 2.0 / iResolution.y; // Adjustable: anti-aliasing width
float mask = smoothstep(px, -px, d); // 1.0 inside, 0.0 outside
vec3 col = mix(backgroundColor, shapeColor, mask);
// ---- Method 3: fwidth adaptive anti-aliasing (suitable for zooming scenes) ----
float anti = fwidth(d) * 1.0; // Adjustable: multiplier, larger = softer edges
float mask = 1.0 - smoothstep(-anti, anti, d);
// ---- Method 4: Classic distance field debug visualization ----
vec3 col = (d > 0.0) ? vec3(0.9, 0.6, 0.3) // Outside: orange
: vec3(0.65, 0.85, 1.0); // Inside: blue
col *= 1.0 - exp(-12.0 * abs(d)); // Distance falloff
col *= 0.8 + 0.2 * cos(120.0 * d); // Contour lines, 120.0 adjustable: line density
col = mix(col, vec3(1.0), smoothstep(1.5*px, 0.0, abs(d) - 0.002)); // Zero contour highlight
```
### Step 6: Stroke and Border Rendering
**What**: Use the absolute value of the distance field to extract the shape's outline, or render inner/outer borders separately.
**Why**: Strokes are a natural byproduct of SDFs — `abs(d)` gives unsigned distance, and subtracting the stroke width yields the "stroke shape" SDF. Unlike rasterized strokes that require geometry expansion, SDF strokes need only one line of math.
**Code**:
```glsl
// ---- Fill mask ----
float fillMask(float d) {
return clamp(-d, 0.0, 1.0);
}
// ---- Stroke rendering (fwidth adaptive) ----
// stroke is the stroke width (in distance field units)
vec4 renderShape(float d, vec3 color, float stroke) {
float anti = fwidth(d) * 1.0;
vec4 strokeLayer = vec4(vec3(0.05), 1.0 - smoothstep(-anti, anti, d - stroke));
vec4 colorLayer = vec4(color, 1.0 - smoothstep(-anti, anti, d));
if (stroke < 0.0001) return colorLayer;
return vec4(mix(strokeLayer.rgb, colorLayer.rgb, colorLayer.a), strokeLayer.a);
}
// ---- Inner border mask ----
float innerBorderMask(float d, float width) {
return clamp(d + width, 0.0, 1.0) - clamp(d, 0.0, 1.0);
}
// ---- Outer border mask ----
float outerBorderMask(float d, float width) {
return clamp(d, 0.0, 1.0) - clamp(d - width, 0.0, 1.0);
}
```
### Step 7: Multi-Layer Compositing
**What**: Render multiple SDF shapes as layers with alpha channels, then blend them back-to-front using `mix`.
**Why**: Complex 2D scenes typically contain backgrounds, multiple shapes, strokes, and other visual layers. Rendering each SDF as an independent RGBA layer and compositing them layer by layer with standard alpha blending (`mix(bottom, top, top.a)`) is both intuitive and gives precise control over stacking order.
**Code**:
```glsl
// Background layer
vec3 bgColor = vec3(1.0, 0.8, 0.7 - 0.07 * p.y) * (1.0 - 0.25 * length(p));
// Shape layer 1
float d1 = sdCircle(translate(p, pos1), 0.3);
vec4 layer1 = renderShape(d1, vec3(0.9, 0.3, 0.2), 0.02);
// Shape layer 2
float d2 = sdBox(translate(p, pos2), vec2(0.2), 0.05);
vec4 layer2 = renderShape(d2, vec3(0.2, 0.5, 0.8), 0.0);
// Composite back-to-front
vec3 col = bgColor;
col = mix(col, layer1.rgb, layer1.a); // Overlay shape 1
col = mix(col, layer2.rgb, layer2.a); // Overlay shape 2
fragColor = vec4(col, 1.0);
```
## Variant Detailed Descriptions
### Variant 1: Solid Fill + Stroke Mode
**Difference from the basic version**: Instead of showing distance field debug colors, renders solid shapes with clean strokes, suitable for UI and icons.
**Key modified code**:
```glsl
// Replace the distance field visualization section
vec3 shapeColor = vec3(0.32, 0.56, 0.53);
float strokeW = 0.015; // Adjustable: stroke width
vec4 shape = render(d, shapeColor, strokeW);
vec3 col = bgCol;
col = mix(col, shape.rgb, shape.a);
```
### Variant 2: Multi-Layer CSG Illustration
**Difference from the basic version**: Combines multiple SDF primitives through boolean operations into complex patterns (e.g., an umbrella, a logo), with each layer independently colored and composited layer by layer. Suitable for 2D illustrations and icon construction.
**Key modified code**:
```glsl
// Build the body (ellipse intersection)
float a = sdEllipse(p, vec2(0.0, 0.16), 0.25, 0.25);
float b = sdEllipse(p, vec2(0.0, -0.03), 0.8, 0.35);
float body = opIntersect(a, b);
vec4 layer1 = render(body, vec3(0.32, 0.56, 0.53), fwidth(body) * 2.0);
// Build the handle (line segment + arc subtraction)
float handle = sdLine(p, vec2(0.0, 0.05), vec2(0.0, -0.42), 0.01);
float arc = sdCircle(translate(p, vec2(-0.04, -0.42)), 0.04);
float arcInner = sdCircle(translate(p, vec2(-0.04, -0.42)), 0.03);
handle = opUnion(handle, opSubtract(arcInner, arc));
vec4 layer0 = render(handle, vec3(0.4, 0.3, 0.28), STROKE_WIDTH);
// Composite
vec3 col = bgCol;
col = mix(col, layer0.rgb, layer0.a);
col = mix(col, layer1.rgb, layer1.a);
```
### Variant 3: Hexagonal Grid Tiling
**Difference from the basic version**: Uses non-orthogonal coordinate system domain repetition to tile SDFs across the screen, with each cell having an independent ID for differentiated coloring. Suitable for background textures and geometric patterns.
**Key modified code**:
```glsl
// Hexagonal grid function: returns (cellID.xy, edge distance, center distance)
vec4 hexagon(vec2 p) {
vec2 q = vec2(p.x * 2.0 * 0.5773503, p.y + p.x * 0.5773503);
vec2 pi = floor(q);
vec2 pf = fract(q);
float v = mod(pi.x + pi.y, 3.0);
float ca = step(1.0, v);
float cb = step(2.0, v);
vec2 ma = step(pf.xy, pf.yx);
float e = dot(ma, 1.0 - pf.yx + ca*(pf.x+pf.y-1.0) + cb*(pf.yx-2.0*pf.xy));
p = vec2(q.x + floor(0.5 + p.y / 1.5), 4.0 * p.y / 3.0) * 0.5 + 0.5;
float f = length((fract(p) - 0.5) * vec2(1.0, 0.85));
return vec4(pi + ca - cb * ma, e, f);
}
// Usage
#define HEX_SCALE 8.0 // Adjustable: grid density
vec4 h = hexagon(HEX_SCALE * p + 0.5 * iTime);
vec3 col = 0.15 + 0.15 * hash1(h.xy + 1.2); // Different gray per cell
col *= smoothstep(0.10, 0.11, h.z); // Edge lines
col *= smoothstep(0.10, 0.11, h.w); // Center falloff
```
### Variant 4: Organic Shapes (Polar Coordinate SDF)
**Difference from the basic version**: Uses polar coordinates `(atan, length)` to define shape boundary functions, enabling creation of hearts, petals, stars, and other non-polygonal organic shapes. Supports pulsing animations.
**Key modified code**:
```glsl
// Heart SDF (polar coordinate algebraic curve)
p.y -= 0.25;
float a = atan(p.x, p.y) / 3.141593;
float r = length(p);
float h = abs(a);
float d = (13.0*h - 22.0*h*h + 10.0*h*h*h) / (6.0 - 5.0*h);
// Pulse animation
float tt = mod(iTime, 1.5) / 1.5;
float ss = pow(tt, 0.2) * 0.5 + 0.5;
ss = 1.0 + ss * 0.5 * sin(tt * 6.2831 * 3.0) * exp(-tt * 4.0); // Adjustable: sin frequency controls pulse count
// Rendering
vec3 col = mix(bgCol, heartCol, smoothstep(-0.01, 0.01, d - r));
```
### Variant 5: Bezier Curve SDF
**Difference from the basic version**: Computes the exact signed distance from a point to a quadratic Bezier curve by solving a cubic equation (Cardano's formula). Suitable for curved text, path rendering, and similar scenarios.
**Key modified code**:
```glsl
// Cubic equation solver (Cardano's formula)
vec3 solveCubic(float a, float b, float c) {
float p = b - a*a/3.0, p3 = p*p*p;
float q = a*(2.0*a*a - 9.0*b)/27.0 + c;
float d = q*q + 4.0*p3/27.0;
float offset = -a/3.0;
if (d >= 0.0) {
float z = sqrt(d);
vec2 x = (vec2(z,-z) - q) / 2.0;
vec2 uv = sign(x) * pow(abs(x), vec2(1.0/3.0));
return vec3(offset + uv.x + uv.y);
}
float v = acos(-sqrt(-27.0/p3)*q/2.0) / 3.0;
float m = cos(v), n = sin(v) * 1.732050808;
return vec3(m+m, -n-m, n-m) * sqrt(-p/3.0) + offset;
}
// Bezier SDF (three control points A, B, C)
float sdBezier(vec2 A, vec2 B, vec2 C, vec2 p) {
B = mix(B + vec2(1e-4), B, step(1e-6, abs(B*2.0-A-C)));
vec2 a = B-A, b = A-B*2.0+C, c = a*2.0, d = A-p;
vec3 k = vec3(3.*dot(a,b), 2.*dot(a,a)+dot(d,b), dot(d,a)) / dot(b,b);
vec3 t = clamp(solveCubic(k.x, k.y, k.z), 0.0, 1.0);
vec2 pos = A+(c+b*t.x)*t.x; float dis = length(pos-p);
pos = A+(c+b*t.y)*t.y; dis = min(dis, length(pos-p));
pos = A+(c+b*t.z)*t.z; dis = min(dis, length(pos-p));
return dis * signBezier(A, B, C, p); // signBezier uses barycentric coordinates to determine sign
}
```
## Performance Optimization in Detail
### 1. Reducing sqrt Calls
In polygon SDFs (such as triangles), by comparing squared distance values first and only taking `sqrt` on the minimum distance at the end, multiple `sqrt` calls are reduced to one. This is the core optimization idea behind the triangle SDF implementation.
```glsl
// Bad: sqrt on every edge
float d0 = length(v0 - e0 * h0);
float d1 = length(v1 - e1 * h1);
// Good: compare dot(v,v) squares, one sqrt at the end
float d0 = dot(proj0, proj0);
float d1 = dot(proj1, proj1);
return -sqrt(min(d0, d1)) * sign(...);
```
### 2. fwidth vs Fixed Pixel Width
`fwidth(d)` invokes screen-space partial derivatives. In simple scenes, a fixed `px = 2.0/iResolution.y` can replace it to reduce GPU derivative computation overhead. However, in scenes with coordinate scaling/distortion (such as the hexagonal grid's `pos *= 1.2 + 0.15*length(pos)`), `fwidth` must be used to ensure correct anti-aliasing width.
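Side by side, assuming `d` has already been computed in the current fragment:

```glsl
// Fixed width: cheap, correct when p is an unscaled mapping of fragCoord
float px = 2.0 / iResolution.y;
float maskFixed = smoothstep(px, -px, d);
// fwidth: tracks local scaling/warping of the coordinate space
float aa = fwidth(d);
float maskAdaptive = smoothstep(aa, -aa, d);
```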
### 3. Avoiding Excessive Boolean Operation Nesting
Large amounts of `min`/`max` nesting are correct but computing distances for all primitives per pixel per frame can be expensive. You can skip distant primitives by checking rough bounding boxes:
```glsl
// Only compute precisely when near the shape
if (length(p - shapeCenter) < shapeRadius + margin) {
d = opUnion(d, sdComplexShape(p));
}
```
### 4. Supersampling AA Trade-off
Multiple samples (e.g., 2x2 supersampling) yield higher quality anti-aliasing but multiply the fragment shader computation by 4:
```glsl
#define AA 2 // Adjustable: 1 = no supersampling, 2 = 4x, 3 = 9x
for (int m = 0; m < AA; m++)
for (int n = 0; n < AA; n++) {
vec2 off = vec2(m, n) / float(AA);
// ... computation ...
tot += col;
}
tot /= float(AA * AA);
```
For most real-time scenes, single-pixel AA with `smoothstep` or `fwidth` is sufficient. Supersampling is mainly for offline rendering or showcase scenes.
### 5. Step Size Optimization for 2D Soft Shadows
In cone marching 2D soft shadows, use `max(1.0, abs(sd))` instead of a fixed step size to take large leaps in open areas and small precise steps near shapes. Typically 64 steps can cover a large scene:
```glsl
dt += max(1.0, abs(sd)); // Adaptive step size
if (dt > dl) break; // Early exit after reaching the light source
```
## Combination Suggestions in Detail
### 1. SDF + Noise Textures
Adding noise values to the distance field creates dissolve, erosion, and organic edge effects:
```glsl
float d = sdCircle(p, 0.4);
d += noise(p * 10.0 + iTime) * 0.05; // Organic jittery edges
```
### 2. SDF + 2D Lighting and Shadows
Cone marching based on the distance field implements real-time soft shadows and multi-light lighting for 2D scenes. The distance field provides "scene query" capability, using `sceneDist()` during ray marching to check occlusion:
```glsl
// 2D soft shadow (see 4dfXDn for full implementation)
float shadow(vec2 p, vec2 lightPos, float radius) {
vec2 dir = normalize(lightPos - p);
float dl = length(p - lightPos);
float lf = radius * dl;
float dt = 0.01;
for (int i = 0; i < 64; i++) {
float sd = sceneDist(p + dir * dt);
if (sd < -radius) return 0.0;
lf = min(lf, sd / dt);
dt += max(1.0, abs(sd));
if (dt > dl) break;
}
lf = clamp((lf*dl + radius) / (2.0*radius), 0.0, 1.0);
return smoothstep(0.0, 1.0, lf);
}
```
### 3. SDF + Normal Mapping / Bump Mapping
By computing normals via finite differences on the distance field, then applying standard lighting models, you can simulate 3D bump/highlight effects on 2D SDFs (as done in the DVD Bounce shader):
```glsl
vec2 e = vec2(0.8, 0.0) / iResolution.y;
float fx = sceneDist(p) - sceneDist(p + e);
float fy = sceneDist(p) - sceneDist(p + e.yx);
vec3 nor = normalize(vec3(fx, fy, e.x / 0.1)); // 0.1 = bump factor, adjustable
// Standard Blinn-Phong lighting
vec3 lig = normalize(vec3(1.0, 2.0, 2.0));
float dif = clamp(dot(lig, nor), 0.0, 1.0);
```
### 4. SDF + Domain Repetition (Spatial Tiling)
Use `fract` or `mod` on coordinates for infinite repetition; use `floor` to get cell IDs for differentiated coloring. Suitable for background patterns, particle arrays, etc.:
```glsl
vec2 cellSize = vec2(0.5);
vec2 cellID = floor(p / cellSize);
vec2 cellP = fract(p / cellSize) - 0.5; // Local coordinate within cell
float d = sdCircle(cellP, 0.15 + 0.05 * sin(iTime + cellID.x * 3.0));
```
### 5. SDF + Animation
Distance field parameters (position, radius, rotation angle) naturally support continuous animation. Combine with `sin/cos` periodic motion, `exp` decay, `mod` looping, and other time functions:
```glsl
// Bouncing
float y = abs(sin(iTime * 3.0)) * 0.5;
float d = sdCircle(translate(p, vec2(0.0, y)), 0.2);
// Pulse scaling
float pulse = 1.0 + 0.1 * sin(iTime * 6.28 * 2.0) * exp(-mod(iTime, 1.0) * 4.0);
float d = sdCircle(p / pulse, 0.3) * pulse;
// Rotation
float d = sdBox(rotateCCW(p, iTime), vec2(0.2), 0.03);
```
## Extended 2D SDF Primitives Reference
### sdRoundedBox — Rounded Box with Independent Corner Radii
**Signature**: `float sdRoundedBox(vec2 p, vec2 b, vec4 r)`
- `p`: query point
- `b`: half-size of the box
- `r`: corner radii as `vec4(top-right, bottom-right, top-left, bottom-left)`
Selects the appropriate corner radius based on the quadrant of `p`, then computes a standard rounded box distance. Useful for UI elements where each corner needs a different rounding.
### sdOrientedBox — Oriented Box
**Signature**: `float sdOrientedBox(vec2 p, vec2 a, vec2 b, float th)`
- `p`: query point
- `a`, `b`: endpoints defining the box's center axis
- `th`: thickness (full width perpendicular to the axis)
Constructs a local coordinate frame aligned with segment `a`-to-`b`, then evaluates a standard box SDF. Useful for drawing thick line-like rectangles at arbitrary angles without manual rotation.
### sdArc — Arc
**Signature**: `float sdArc(vec2 p, vec2 sc, float ra, float rb)`
- `p`: query point
- `sc`: `vec2(sin, cos)` of the half-aperture angle
- `ra`: arc radius
- `rb`: arc thickness
Computes distance to an arc segment. The aperture is symmetric about the y-axis. Combines angular clamping with radial distance.
### sdPie — Pie / Sector
**Signature**: `float sdPie(vec2 p, vec2 c, float r)`
- `p`: query point
- `c`: `vec2(sin, cos)` of the half-aperture angle
- `r`: radius
Returns the signed distance to a filled pie-slice (sector) shape. The sector is symmetric about the y-axis.
### sdRing — Ring
**Signature**: `float sdRing(vec2 p, vec2 n, float r, float th)`
- `p`: query point
- `n`: `vec2(sin, cos)` of the half-aperture angle
- `r`: ring radius
- `th`: ring thickness
Similar to `sdArc` but with capped endpoints and full ring behavior within the aperture.
### sdMoon — Moon Shape
**Signature**: `float sdMoon(vec2 p, float d, float ra, float rb)`
- `p`: query point
- `d`: distance between circle centers
- `ra`: radius of outer circle
- `rb`: radius of inner (subtracted) circle
Creates a crescent/moon shape by subtracting one circle from another. The two circles are offset by distance `d` along the x-axis.
### sdHeart — Heart (Approximate)
**Signature**: `float sdHeart(vec2 p)`
- `p`: query point (centered at origin, roughly unit scale)
An approximate heart SDF composed of two geometric regions stitched together. The shape extends roughly from (0,0) to (0,1) vertically.
### sdVesica — Vesica / Lens Shape
**Signature**: `float sdVesica(vec2 p, float w, float h)`
- `p`: query point
- `w`: width of the vesica
- `h`: height of the vesica
A lens-shaped figure (vesica piscis) formed by the intersection of two circles. Symmetric about both axes.
### sdEgg — Egg Shape
**Signature**: `float sdEgg(vec2 p, float he, float ra, float rb)`
- `p`: query point
- `he`: half-height of the straight section
- `ra`: radius at bottom
- `rb`: radius at top
Produces an egg-like shape with different radii at top and bottom, connected by a straight vertical section.
### sdEquilateralTriangle — Equilateral Triangle
**Signature**: `float sdEquilateralTriangle(vec2 p, float r)`
- `p`: query point
- `r`: side length / scale
An exact SDF for an equilateral triangle centered at the origin using symmetry folding.
### sdPentagon — Pentagon
**Signature**: `float sdPentagon(vec2 p, float r)`
- `p`: query point
- `r`: circumscribed radius
Regular pentagon SDF using mirror-fold operations along pentagon edge normals. The constants encode cos/sin of 72-degree angles.
### sdHexagon — Hexagon
**Signature**: `float sdHexagon(vec2 p, float r)`
- `p`: query point
- `r`: circumscribed radius
Regular hexagon SDF. Constants encode cos(30), sin(30), and tan(30). Uses a single mirror fold.
### sdOctagon — Octagon
**Signature**: `float sdOctagon(vec2 p, float r)`
- `p`: query point
- `r`: circumscribed radius
Regular octagon SDF. Uses two mirror folds at 22.5-degree and 67.5-degree angles.
### sdStar — N-Pointed Star
**Signature**: `float sdStar(vec2 p, float r, int n, float m)`
- `p`: query point
- `r`: outer radius
- `n`: number of points
- `m`: inner radius ratio (controls pointiness; typical range 2.0-6.0)
A general n-pointed star using angular repetition (`mod(atan(...))`) and edge projection. Higher `m` values produce sharper, thinner points.
### sdBezier (Extended) — Quadratic Bezier Curve SDF
**Signature**: `float sdBezier(vec2 pos, vec2 A, vec2 B, vec2 C)`
- `pos`: query point
- `A`, `B`, `C`: control points of the quadratic Bezier
An alternative Bezier SDF formulation that solves for the closest point on the curve using the cubic formula. Returns unsigned distance (no sign). Note the different parameter order from the Variant 5 version.
### sdParabola — Parabola
**Signature**: `float sdParabola(vec2 pos, float k)`
- `pos`: query point
- `k`: curvature coefficient (y = k * x^2)
Signed distance to a parabola. Uses a cubic root solution to find the closest point on the curve.
### sdCross — Cross Shape
**Signature**: `float sdCross(vec2 p, vec2 b, float r)`
- `p`: query point
- `b`: half-extents of each arm (b.x = length, b.y = width)
- `r`: corner rounding offset
A plus/cross shape formed by the union of two perpendicular rectangles, with an optional rounding parameter.
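Most entries above list signatures only. As a concrete reference point, here is the widely used formulation of the equilateral-triangle entry, as published in Inigo Quilez's 2D distance functions article:

```glsl
// Exact equilateral triangle centered at the origin, via symmetry folding
float sdEquilateralTriangle(vec2 p, float r) {
    const float k = sqrt(3.0);
    p.x = abs(p.x) - r;           // fold left half onto right half
    p.y = p.y + r / k;
    if (p.x + k * p.y > 0.0) p = vec2(p.x - k * p.y, -k * p.x - p.y) / 2.0;
    p.x -= clamp(p.x, -2.0 * r, 0.0);
    return -length(p) * sign(p.y);
}
```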
## 2D SDF Modifiers Reference
### opRound2D — Rounding Modifier
**Signature**: `float opRound2D(float d, float r)`
Subtracts `r` from any SDF, effectively expanding the shape boundary outward by `r` and rounding all corners/edges. Apply to any existing SDF to add uniform rounding.
### opAnnular2D — Annular (Hollowing) Modifier
**Signature**: `float opAnnular2D(float d, float r)`
Takes the absolute value of the distance and subtracts thickness `r`, converting any filled shape into a ring/outline version with wall thickness `2*r`. Stackable: applying twice creates concentric rings.
### opRepeat2D — Grid Repetition
**Signature**: `vec2 opRepeat2D(vec2 p, float s)`
Applies `mod` to fold coordinates into a repeating grid cell of size `s`. Apply to `p` before passing to any SDF to create infinite tiling. Use `floor(p / s)` to obtain cell IDs for per-cell variation.
### opMirror2D — Arbitrary Mirror
**Signature**: `vec2 opMirror2D(vec2 p, vec2 dir)`
Mirrors coordinates across a line through the origin with direction `dir` (should be normalized). Any point on the negative side of the line is reflected to the positive side, effectively creating bilateral symmetry along any arbitrary axis.
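A minimal implementation sketch of the four modifiers above. The cell-centering convention in `opRepeat2D` and the normal convention in `opMirror2D` are assumptions; adjust to taste:

```glsl
float opRound2D(float d, float r)   { return d - r; }        // expand + round by r
float opAnnular2D(float d, float r) { return abs(d) - r; }   // shell of thickness 2r

// Fold p into a cell of size s, centered; cell ID = floor(p / s)
vec2 opRepeat2D(vec2 p, float s)    { return mod(p, s) - 0.5 * s; }

// dir = normalized direction of the mirror line through the origin;
// points on the negative side of its left-hand normal are reflected across
vec2 opMirror2D(vec2 p, vec2 dir) {
    vec2 n = vec2(-dir.y, dir.x);
    return p - 2.0 * min(dot(p, n), 0.0) * n;
}
```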

# 3D Signed Distance Fields (3D SDF) — Detailed Reference
This document is a detailed supplement to [SKILL.md](SKILL.md), covering prerequisites, step-by-step explanations, mathematical derivations, and advanced usage.
## Prerequisites
- **GLSL Basics**: uniform variables (`iTime`, `iResolution`, `iMouse`), `fragCoord` coordinate system
- **Vector Math**: built-in functions like `dot`, `cross`, `normalize`, `length`, `reflect`
- **Rays and Cameras**: understanding how to generate rays from screen pixels (ray origin + ray direction)
- **Implicit Surface Concept**: f(p) = 0 defines the surface, f(p) > 0 is outside, f(p) < 0 is inside
## Step-by-Step Detailed Explanation
### Step 1: SDF Primitive Library
**What**: Define basic geometric distance functions.
**Why**: All SDF scenes are composed of basic primitives. Each primitive is a pure function that takes a point in space and returns the shortest distance to that primitive's surface. The accuracy of these primitives directly determines the efficiency of sphere tracing — accurate SDFs allow larger step sizes.
**Code**:
```glsl
// Sphere: p=sample point, r=radius
float sdSphere(vec3 p, float r) {
return length(p) - r;
}
// Box: p=sample point, b=half-size (xyz dimensions)
float sdBox(vec3 p, vec3 b) {
vec3 d = abs(p) - b;
return min(max(d.x, max(d.y, d.z)), 0.0) + length(max(d, 0.0));
}
// Ellipsoid (approximate): p=sample point, r=three-axis radii
float sdEllipsoid(vec3 p, vec3 r) {
float k0 = length(p / r);
float k1 = length(p / (r * r));
return k0 * (k0 - 1.0) / k1;
}
// Torus: p=sample point, t.x=major radius, t.y=tube radius
float sdTorus(vec3 p, vec2 t) {
return length(vec2(length(p.xz) - t.x, p.y)) - t.y;
}
// Capsule (two endpoints + radius): useful for skeleton/limb modeling
float sdCapsule(vec3 p, vec3 a, vec3 b, float r) {
vec3 pa = p - a, ba = b - a;
float h = clamp(dot(pa, ba) / dot(ba, ba), 0.0, 1.0);
return length(pa - ba * h) - r;
}
// Cylinder (vertical): h.x=radius, h.y=half-height
float sdCylinder(vec3 p, vec2 h) {
vec2 d = abs(vec2(length(p.xz), p.y)) - h;
return min(max(d.x, d.y), 0.0) + length(max(d, 0.0));
}
// Plane (y=0)
float sdPlane(vec3 p) {
return p.y;
}
```
### Step 2: Boolean Operations and Smooth Blending
**What**: Define combination operations between primitives — union, subtraction, intersection, and their smooth variants.
**Why**: Union merges multiple primitives into one scene; subtraction carves one object out of another; intersection keeps the overlapping region. Smooth variants (`smin`/`smax`) use a control parameter `k` to produce smooth blend transitions — one of SDF's most powerful capabilities over traditional modeling, achieving organic forms without additional geometry.
**Code**:
```glsl
// === Hard Boolean Operations ===
// Union: take the nearer surface
float opUnion(float d1, float d2) { return min(d1, d2); }
// Subtraction: subtract d2 from d1
float opSubtraction(float d1, float d2) { return max(d1, -d2); }
// Intersection: keep the overlapping region
float opIntersection(float d1, float d2) { return max(d1, d2); }
// Union with material ID (vec2.x stores distance, vec2.y stores material ID)
vec2 opU(vec2 d1, vec2 d2) { return (d1.x < d2.x) ? d1 : d2; }
// === Smooth Boolean Operations ===
// Smooth union: k=blend radius (larger = smoother, typical values 0.1~0.5)
float smin(float a, float b, float k) {
float h = max(k - abs(a - b), 0.0);
return min(a, b) - h * h * 0.25 / k;
}
// vec2 version of smin: for smooth blending of vec2(distance, materialID)
vec2 smin(vec2 a, vec2 b, float k) {
float h = max(k - abs(a.x - b.x), 0.0);
float d = min(a.x, b.x) - h * h * 0.25 / k;
float m = (a.x < b.x) ? a.y : b.y;
return vec2(d, m);
}
// Smooth subtraction / smooth max
float smax(float a, float b, float k) {
float h = max(k - abs(a - b), 0.0);
return max(a, b) + h * h * 0.25 / k;
}
```
### Step 3: Scene Definition (map Function)
**What**: Write the `map()` function that combines the above primitives and operations into a complete 3D scene.
**Why**: `map(p)` is the core of the SDF rendering pipeline — it returns the distance from any point p in space to the nearest scene surface (plus optional material information). Ray marching, normal computation, shadows, and AO all depend on this function. All geometric complexity of the scene is encapsulated here.
**Code**:
```glsl
// Returns vec2(distance, materialID)
vec2 map(vec3 p) {
// Ground
vec2 res = vec2(p.y, 0.0); // Material 0: ground
// Sphere (displaced to y=0.5)
float d1 = sdSphere(p - vec3(0.0, 0.5, 0.0), 0.4);
res = opU(res, vec2(d1, 1.0)); // Material 1: sphere
// Box
float d2 = sdBox(p - vec3(1.5, 0.4, 0.0), vec3(0.3, 0.4, 0.3));
res = opU(res, vec2(d2, 2.0)); // Material 2: box
// Blend two spheres with smin for organic blob effect
float d3 = sdSphere(p - vec3(-1.2, 0.5, 0.0), 0.3);
float d4 = sdSphere(p - vec3(-1.5, 0.8, 0.2), 0.25);
float dBlob = smin(d3, d4, 0.3);
res = opU(res, vec2(dBlob, 3.0)); // Material 3: blob
return res;
}
```
### Step 4: Raymarching
**What**: Implement the sphere tracing loop — cast a ray from the camera and step along the ray direction until hitting a surface or exceeding the maximum distance.
**Why**: Sphere tracing exploits the "safe distance" property of SDFs — the current SDF value tells us there is absolutely no surface within that radius, so we can safely advance that far. This is much more efficient than fixed-step volumetric ray marching, typically achieving precise results in 64-128 steps.
**Code**:
```glsl
#define MAX_STEPS 128 // Adjustable: step count, 64=fast/coarse, 256=precise/slow
#define MAX_DIST 40.0 // Adjustable: max trace distance
#define SURF_DIST 0.0001 // Adjustable: surface detection threshold
vec2 raycast(vec3 ro, vec3 rd) {
vec2 res = vec2(-1.0, -1.0);
float t = 0.01;
for (int i = 0; i < MAX_STEPS && t < MAX_DIST; i++) {
vec2 h = map(ro + rd * t);
if (abs(h.x) < SURF_DIST * t) {
res = vec2(t, h.y);
break;
}
t += h.x; // Key: step distance = SDF value
}
return res; // .x=hit distance, .y=materialID; -1 means no hit
}
```
### Step 5: Normal Computation
**What**: Compute the surface normal at the hit point by taking the finite-difference gradient of the SDF.
**Why**: The gradient of the SDF points along the surface normal. The tetrahedron trick needs only 4 `map` calls instead of the 6 required by central differences, saving work and reducing how many times the compiler inlines `map()`.
**Code**:
```glsl
// Tetrahedron normal computation (recommended, only 4 map calls)
vec3 calcNormal(vec3 pos) {
vec2 e = vec2(1.0, -1.0) * 0.5773 * 0.0005; // Adjustable: epsilon
return normalize(
e.xyy * map(pos + e.xyy).x +
e.yyx * map(pos + e.yyx).x +
e.yxy * map(pos + e.yxy).x +
e.xxx * map(pos + e.xxx).x
);
}
// Anti-compiler-inline version (suitable for complex map functions)
// A loop stops the compiler from unrolling and inlining map() four times
#define ZERO (min(iFrame, 0))
vec3 calcNormalLoop(vec3 pos) {
vec3 n = vec3(0.0);
for (int i = ZERO; i < 4; i++) {
vec3 e = 0.5773 * (2.0 * vec3((((i+3)>>1)&1), ((i>>1)&1), (i&1)) - 1.0);
n += e * map(pos + 0.0005 * e).x;
}
return normalize(n);
}
```
### Step 6: Soft Shadows
**What**: Cast a secondary ray from the surface point toward the light source, and estimate shadow softness based on the minimum distance encountered along the way.
**Why**: Hard shadows only determine "occluded or not" (0/1), while SDF soft shadows use intermediate distance information to estimate "how close to being occluded." In the formula `k*h/t`, `k` controls shadow softness — larger `k` produces sharper shadows, smaller `k` produces softer shadows. This is one of SDF rendering's killer features.
**Code**:
```glsl
// k=shadow sharpness (2=very soft, 32=near hard), mint=start offset, tmax=max distance
float calcSoftshadow(vec3 ro, vec3 rd, float mint, float tmax, float k) {
float res = 1.0;
float t = mint;
for (int i = 0; i < 24; i++) { // Adjustable: shadow step count
float h = map(ro + rd * t).x;
float s = clamp(k * h / t, 0.0, 1.0);
res = min(res, s);
t += clamp(h, 0.01, 0.2);
if (res < 0.004 || t > tmax) break;
}
res = clamp(res, 0.0, 1.0);
return res * res * (3.0 - 2.0 * res); // Smooth Hermite interpolation
}
```
### Step 7: Ambient Occlusion (AO)
**What**: Sample several points along the normal direction and compare actual SDF values with expected distances to estimate occlusion.
**Why**: SDFs naturally provide distance information for cheap AO approximation: if the SDF value at a sample point along the normal is much smaller than its distance to the surface, nearby occluding geometry exists. This method is more physically accurate than traditional SSAO and requires only 5 `map` calls.
**Code**:
```glsl
float calcAO(vec3 pos, vec3 nor) {
float occ = 0.0;
float sca = 1.0;
for (int i = 0; i < 5; i++) { // Adjustable: number of sample layers
float h = 0.01 + 0.12 * float(i) / 4.0; // Adjustable: sample spacing
float d = map(pos + h * nor).x;
occ += (h - d) * sca;
sca *= 0.95;
}
return clamp(1.0 - 3.0 * occ, 0.0, 1.0);
}
```
### Step 8: Camera and Rendering Pipeline
**What**: Build a look-at camera matrix, generate screen rays, and chain together the entire rendering pipeline.
**Why**: Mapping screen pixels to 3D rays is the starting point of raymarching. The look-at matrix builds an orthonormal basis from the camera position, target point, and up direction, making camera control intuitive. The final pipeline chains all steps: ray generation, ray marching, normals, lighting/shadows/AO, and post-processing.
**Code**:
```glsl
// Camera look-at matrix
mat3 setCamera(vec3 ro, vec3 ta, float cr) {
vec3 cw = normalize(ta - ro);
vec3 cp = vec3(sin(cr), cos(cr), 0.0);
vec3 cu = normalize(cross(cw, cp));
vec3 cv = cross(cu, cw);
return mat3(cu, cv, cw);
}
// Render: input ray, output color
vec3 render(vec3 ro, vec3 rd) {
// Background color (sky gradient)
vec3 col = vec3(0.7, 0.7, 0.9) - max(rd.y, 0.0) * 0.3;
// Raycast intersection
vec2 res = raycast(ro, rd);
float t = res.x;
float m = res.y; // Material ID
if (m > -0.5) {
vec3 pos = ro + t * rd;
vec3 nor = calcNormal(pos);
// Material color (varies by ID)
vec3 mate = 0.2 + 0.2 * sin(m * 2.0 + vec3(0.0, 1.0, 2.0));
// Lighting
vec3 lig = normalize(vec3(-0.5, 0.4, -0.6));
float dif = clamp(dot(nor, lig), 0.0, 1.0);
dif *= calcSoftshadow(pos, lig, 0.02, 2.5, 8.0);
float amb = 0.5 + 0.5 * nor.y;
float occ = calcAO(pos, nor);
col = mate * (dif * vec3(1.3, 1.0, 0.7) + amb * occ * vec3(0.4, 0.6, 1.0) * 0.6);
// Fog (exponential decay)
col = mix(col, vec3(0.7, 0.7, 0.9), 1.0 - exp(-0.0001 * t * t * t));
}
return clamp(col, 0.0, 1.0);
}
```
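A minimal `mainImage` that ties the pipeline together. The orbit speed, camera height, and focal length are illustrative choices:

```glsl
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
    // Normalized pixel coordinates, aspect-corrected, y in [-1, 1]
    vec2 p = (2.0 * fragCoord - iResolution.xy) / iResolution.y;
    // Orbiting camera looking at the scene center
    vec3 ta = vec3(0.0, 0.4, 0.0);
    vec3 ro = ta + vec3(3.0 * cos(0.3 * iTime), 1.0, 3.0 * sin(0.3 * iTime));
    mat3 ca = setCamera(ro, ta, 0.0);
    vec3 rd = ca * normalize(vec3(p, 1.8)); // 1.8 = focal length (larger = narrower FOV)
    vec3 col = render(ro, rd);
    col = pow(col, vec3(0.4545));           // gamma correction
    fragColor = vec4(col, 1.0);
}
```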
## Variant Detailed Descriptions
### Variant 1: Dynamic Organic Body (Smooth Blob Animation)
**Difference from the basic version**: Replaces static primitives with multiple animated spheres blended via `smin`, a common technique for lava-lamp-like organic fluid effects.
**Key modified code**:
```glsl
// Replace scene definition in map()
vec2 map(vec3 p) {
float d = 2.0;
for (int i = 0; i < 16; i++) { // Adjustable: number of spheres
float fi = float(i);
float t = iTime * (fract(fi * 412.531 + 0.513) - 0.5) * 2.0;
d = smin(
sdSphere(p + sin(t + fi * vec3(52.5126, 64.627, 632.25)) * vec3(2.0, 2.0, 0.8),
mix(0.5, 1.0, fract(fi * 412.531 + 0.5124))),
d,
0.4 // Adjustable: blend radius
);
}
return vec2(d, 1.0);
}
```
### Variant 2: Infinite Repeating Corridor (Domain Repetition)
**Difference from the basic version**: Uses `mod()` to fold spatial coordinates into an infinitely repeating cell, the standard domain-repetition technique. Layer `hash()` on the cell index to introduce random per-cell variation.
**Key modified code**:
```glsl
// Linear domain repetition
float repeat(float v, float c) {
return mod(v, c) - c * 0.5;
}
// Angular domain repetition (repeat count times in polar coordinate direction)
float amod(inout vec2 p, float count) {
float an = 6.283185 / count;
float a = atan(p.y, p.x) + an * 0.5;
float c = floor(a / an);
a = mod(a, an) - an * 0.5;
p = vec2(cos(a), sin(a)) * length(p);
return c; // Returns sector index
}
vec2 map(vec3 p) {
// Repeat every 4 units along the z axis
p.z = repeat(p.z, 4.0);
// Add bending offset along x axis
p.x += 2.0 * sin(p.z * 0.1);
float d = -sdBox(p, vec3(2.0, 2.0, 20.0)); // Invert = corridor interior
d = max(d, -sdBox(p, vec3(1.8, 1.8, 1.9))); // Subtract interior space
d = min(d, sdCylinder(p - vec3(1.5, -2.0, 0.0), vec2(0.1, 2.0))); // Add pillars
return vec2(d, 1.0);
}
```
### Variant 3: Character/Creature Modeling (Organic Character Modeling)
**Difference from the basic version**: Uses `sdEllipsoid` + `sdCapsule` (sdStick) to compose body parts, `smin` to connect with smooth transitions, and `smax` to carve indentations (mouth). Combined with procedural animation to drive joints. A standard approach for character SDF modeling.
**Key modified code**:
```glsl
// Stick primitive (different radii at each end, suitable for limbs)
vec2 sdStick(vec3 p, vec3 a, vec3 b, float r1, float r2) {
vec3 pa = p - a, ba = b - a;
float h = clamp(dot(pa, ba) / dot(ba, ba), 0.0, 1.0);
return vec2(length(pa - ba * h) - mix(r1, r2, h * h * (3.0 - 2.0 * h)), h);
}
vec2 map(vec3 pos) {
// Body (ellipsoid)
float d = sdEllipsoid(pos, vec3(0.25, 0.3, 0.25));
// Head (sphere, connected with smin)
float dHead = sdEllipsoid(pos - vec3(0.0, 0.35, 0.02), vec3(0.12, 0.15, 0.13));
d = smin(d, dHead, 0.1);
// Arms (sdStick)
    vec2 arm = sdStick(vec3(abs(pos.x), pos.y, pos.z), // mirror in x: one evaluation covers both arms
                       vec3(0.18, 0.2, -0.05),
                       vec3(0.35, -0.1, -0.15), 0.03, 0.05);
d = smin(d, arm.x, 0.04);
// Mouth (carved with smax)
float dMouth = sdEllipsoid(pos - vec3(0.0, 0.3, 0.15), vec3(0.08, 0.03, 0.1));
d = smax(d, -dMouth, 0.03);
return vec2(d, 1.0);
}
```
### Variant 4: Symmetry Exploitation
**Difference from the basic version**: Leverages geometric symmetry (mirror/rotational invariance) to reduce N repeated elements' SDF evaluations to N/k. For example, octahedral symmetry can reduce 18 elements to 4 evaluations. The key is mapping the input point to the symmetry's fundamental domain.
**Key modified code**:
```glsl
// Fold a point into the octahedral fundamental domain
vec2 rot45(vec2 v) {
return vec2(v.x - v.y, v.y + v.x) * 0.707107;
}
vec2 map(vec3 p) {
float d = sdSphere(p, 0.12); // Center sphere
// Exploit symmetry: original 18 gears reduced to 4 evaluations
vec3 qx = vec3(rot45(p.zy), p.x);
if (abs(qx.x) > abs(qx.y)) qx = qx.zxy;
vec3 qy = vec3(rot45(p.xz), p.y);
if (abs(qy.x) > abs(qy.y)) qy = qy.zxy;
vec3 qz = vec3(rot45(p.yx), p.z);
if (abs(qz.x) > abs(qz.y)) qz = qz.zxy;
vec3 qa = abs(p);
qa = (qa.x > qa.y && qa.x > qa.z) ? p.zxy :
(qa.z > qa.y) ? p.yzx : p.xyz;
// Only 4 gear() evaluations needed instead of 18
d = min(d, gear(qa, 0.0));
d = min(d, gear(qx, 1.0));
d = min(d, gear(qy, 1.0));
d = min(d, gear(qz, 1.0));
return vec2(d, 1.0);
}
```
### Variant 5: PBR Material Rendering Pipeline
**Difference from the basic version**: Replaces simplified Blinn-Phong with GGX microfacet BRDF, combined with a material ID system to assign different roughness/metalness to each primitive. A standard approach for PBR raymarching.
**Key modified code**:
```glsl
// GGX/Trowbridge-Reitz NDF
float D_GGX(float NoH, float roughness) {
float a = roughness * roughness;
float a2 = a * a;
float d = NoH * NoH * (a2 - 1.0) + 1.0;
return a2 / (3.14159 * d * d);
}
// Schlick Fresnel approximation
vec3 F_Schlick(float VoH, vec3 f0) {
return f0 + (1.0 - f0) * pow(1.0 - VoH, 5.0);
}
// Replace lighting section in render()
vec3 pbrLighting(vec3 pos, vec3 nor, vec3 rd, vec3 albedo, float roughness, float metallic) {
vec3 lig = normalize(vec3(-0.5, 0.4, -0.6));
vec3 hal = normalize(lig - rd);
vec3 f0 = mix(vec3(0.04), albedo, metallic);
float NoL = max(dot(nor, lig), 0.0);
float NoH = max(dot(nor, hal), 0.0);
float VoH = max(dot(-rd, hal), 0.0);
float D = D_GGX(NoH, roughness);
vec3 F = F_Schlick(VoH, f0);
vec3 spec = D * F * 0.25; // Simplified specular term
vec3 diff = albedo * (1.0 - metallic) / 3.14159;
    float shadow = calcSoftshadow(pos, lig, 0.02, 2.5, 8.0); // k=8.0: shadow sharpness (see Step 6)
return (diff + spec) * NoL * shadow * vec3(1.3, 1.0, 0.7) * 3.0;
}
```
## Performance Optimization in Detail
### 1. Bounding Volume Acceleration
Use an overall AABB or bounding sphere to constrain the search range. Perform analytical ray intersection first to narrow the `tmin`/`tmax` range, avoiding wasted steps in empty regions. A common optimization in advanced raymarching shaders.
```glsl
// Ray-AABB intersection (call before raycast)
vec2 iBox(vec3 ro, vec3 rd, vec3 rad) {
vec3 m = 1.0 / rd;
vec3 n = m * ro;
vec3 k = abs(m) * rad;
vec3 t1 = -n - k;
vec3 t2 = -n + k;
return vec2(max(max(t1.x, t1.y), t1.z),
min(min(t2.x, t2.y), t2.z));
}
```
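A sketch of how `iBox` plugs into the raycast loop. The `vec3(2.5)` bound is an assumed scene extent:

```glsl
vec2 raycastBounded(vec3 ro, vec3 rd) {
    vec2 tb = iBox(ro, rd, vec3(2.5));                 // analytic bounds of the whole scene
    if (tb.x > tb.y || tb.y < 0.0) return vec2(-1.0);  // ray misses the bounds: skip marching
    float t = max(tb.x, 0.01);                         // start at the box entry point
    for (int i = 0; i < 128 && t < tb.y; i++) {        // stop at the box exit point
        vec2 h = map(ro + rd * t);
        if (abs(h.x) < 0.0001 * t) return vec2(t, h.y);
        t += h.x;
    }
    return vec2(-1.0);
}
```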
### 2. Per-Object Bounding
In `map()`, first check with a cheap sdBox whether the current point is near a primitive. Only compute the precise SDF when close. A standard per-object culling technique.
```glsl
// Inside map():
if (sdBox(pos - objectCenter, boundingSize) < res.x) {
// Only compute precise SDF when bounding box distance is closer than current nearest
res = opU(res, vec2(sdComplexShape(pos), matID));
}
```
### 3. Adaptive Step Size
Allow larger precision tolerance at distance, stricter up close. Based on the `abs(h.x) < (0.0001 * t)` check found in nearly all advanced shaders.
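The idea in isolation, as it would appear inside the marching loop. The half-pixel factor is an assumed tolerance:

```glsl
// After h = map(ro + rd * t):
float pixelSize = t * (2.0 / iResolution.y);  // world-space footprint of one pixel at depth t
if (abs(h.x) < 0.5 * pixelSize) {
    // treat as a hit: any remaining error is sub-pixel at this depth
}
```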
### 4. Preventing Compiler Inlining
Complex `map()` functions get inlined four times inside `calcNormal`, which can make shader compilation time explode. Use a loop with the `ZERO` macro so the compiler cannot prove the trip count and unroll it.
```glsl
#define ZERO (min(iFrame, 0)) // Compiler cannot prove this is 0 at compile time, so it won't unroll the loop
```
### 5. Symmetry Exploitation
If the scene has rotational/mirror symmetry, fold the point into the fundamental domain and evaluate only once. Achieves significant speedup (e.g., 18-to-4 reduction) or infinite repetition.
## Combination Suggestions in Detail
### 1. SDF + Noise Displacement
Add noise on top of the `map()` return value to add organic details to smooth surfaces (terrain, skin textures).
```glsl
float d = sdSphere(p, 1.0);
d += 0.05 * (sin(p.x * 10.0) * sin(p.y * 10.0) * sin(p.z * 10.0)); // Simple displacement
// Or use fbm noise: d += 0.1 * fbm(p * 4.0);
```
**Note**: Noise displacement breaks the SDF's Lipschitz condition (|grad f| <= 1). You need to multiply the step size by a safety factor (e.g., 0.5~0.7) to avoid penetration.
### 2. SDF + Bump Mapping
Instead of modifying the SDF itself, add detail perturbation only in the normal computation. Better performance than noise displacement since it doesn't affect ray marching. A common technique in SDF rendering.
```glsl
vec3 calcNormalBumped(vec3 pos) {
vec3 n = calcNormal(pos);
// Add high-frequency detail to the normal
n += 0.1 * vec3(fbm(pos.yz * 20.0) - 0.5, 0.0, fbm(pos.xy * 20.0) - 0.5);
return normalize(n);
}
```
### 3. SDF + Domain Warping
Warp spatial coordinates before entering `map()` to achieve bending, twisting, polar coordinate transforms, and other effects. A common spatial warping technique.
```glsl
// Cartesian to polar ring space: straight corridor becomes a ring structure
vec2 displaceLoop(vec2 p, float r) {
return vec2(length(p) - r, atan(p.y, p.x));
}
```
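A sketch of using `displaceLoop` inside `map()`. The ring radius, box size, and safety factor are illustrative assumptions:

```glsl
vec2 map(vec3 p) {
    // Warp xz into (radial offset, angle): geometry laid out along +z becomes a ring
    vec2 w = displaceLoop(p.xz, 3.0);  // 3.0 = ring radius (assumed)
    w.y *= 3.0;                        // angle -> arc length, keeps units roughly metric
    vec3 q = vec3(w.x, p.y, mod(w.y, 1.0) - 0.5);  // repeat a box along the ring
    float d = sdBox(q, vec3(0.2, 0.5, 0.3));
    return vec2(d * 0.5, 1.0);         // 0.5 safety factor: warping distorts distances
}
```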
### 4. SDF + Procedural Animation
Bone/joint angles vary with time, driving SDF primitive positions. `smin` ensures smooth transitions at joints. Common techniques for procedural character animation (squash & stretch, bone chain IK).
```glsl
// Squash and stretch deformation
float p = 4.0 * t1 * (1.0 - t1); // Parabolic bounce
float sy = 0.5 + 0.5 * p; // Stretch in y direction
float sz = 1.0 / sy; // Compress in z direction (preserve volume)
vec3 q = pos - center;
float d = sdEllipsoid(q, vec3(0.25, 0.25 * sy, 0.25 * sz));
```
### 5. SDF + Motion Blur
Average multiple frames sampled across the time dimension. A standard temporal supersampling technique.
```glsl
// Randomly offset time in mainImage
float time = iTime;
#if AA > 1
time += 0.5 * float(m * AA + n) / float(AA * AA) / 24.0; // Intra-frame time jitter
#endif
```
## Extended SDF Primitives Reference
### Rounded Box — `sdRoundBox(vec3 p, vec3 b, float r)`
- `p`: sample point
- `b`: half-size dimensions (before rounding)
- `r`: rounding radius — edges and corners are rounded by this amount
### Box Frame — `sdBoxFrame(vec3 p, vec3 b, float e)`
- `p`: sample point
- `b`: outer half-size dimensions
- `e`: edge thickness — the wireframe thickness of the box edges
### Cone — `sdCone(vec3 p, vec2 c, float h)`
- `p`: sample point
- `c`: vec2(sin, cos) of the cone's opening angle
- `h`: height of the cone
### Capped Cone — `sdCappedCone(vec3 p, float h, float r1, float r2)`
- `p`: sample point
- `h`: half-height
- `r1`: bottom radius
- `r2`: top radius
### Round Cone — `sdRoundCone(vec3 p, float r1, float r2, float h)`
- `p`: sample point
- `r1`: bottom sphere radius
- `r2`: top sphere radius
- `h`: height between sphere centers
### Solid Angle — `sdSolidAngle(vec3 p, vec2 c, float ra)`
- `p`: sample point
- `c`: vec2(sin, cos) of the solid angle
- `ra`: radius
### Octahedron — `sdOctahedron(vec3 p, float s)`
- `p`: sample point
- `s`: size (distance from center to vertex)
### Pyramid — `sdPyramid(vec3 p, float h)`
- `p`: sample point
- `h`: height of the pyramid (base is a unit square centered at origin)
### Hex Prism — `sdHexPrism(vec3 p, vec2 h)`
- `p`: sample point
- `h.x`: hexagonal radius (circumradius)
- `h.y`: half-height along z axis
### Cut Sphere — `sdCutSphere(vec3 p, float r, float h)`
- `p`: sample point
- `r`: sphere radius
- `h`: cut plane height (cuts sphere at y=h)
### Capped Torus — `sdCappedTorus(vec3 p, vec2 sc, float ra, float rb)`
- `p`: sample point
- `sc`: vec2(sin, cos) of the cap angle
- `ra`: major radius
- `rb`: tube radius
### Link — `sdLink(vec3 p, float le, float r1, float r2)`
- `p`: sample point
- `le`: half-length of the elongation
- `r1`: major radius of the torus cross-section
- `r2`: tube radius
### Plane (arbitrary) — `sdPlane(vec3 p, vec3 n, float h)`
- `p`: sample point
- `n`: plane normal (must be normalized)
- `h`: offset from origin along the normal
### Rhombus — `sdRhombus(vec3 p, float la, float lb, float h, float ra)`
- `p`: sample point
- `la`, `lb`: half-diagonals of the rhombus in XZ plane
- `h`: half-height (extrusion in Y)
- `ra`: rounding radius
### Triangle (unsigned) — `udTriangle(vec3 p, vec3 a, vec3 b, vec3 c)`
- `p`: sample point
- `a`, `b`, `c`: triangle vertex positions
- Returns unsigned (non-negative) distance
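To ground the parameter listings above, here is a minimal `map()` sketch combining two of the primitives. The bodies shown are the widely circulated iq-style formulas; `sdOctahedron` is given in its cheap bound form (scaled L1 norm), which is not exact near edges and vertices:

```glsl
float sdRoundBox(vec3 p, vec3 b, float r) {
    vec3 q = abs(p) - b;
    return length(max(q, 0.0)) + min(max(q.x, max(q.y, q.z)), 0.0) - r;
}

float sdOctahedron(vec3 p, float s) {
    // Bound variant: fast, but only approximate near edges and vertices
    return (abs(p.x) + abs(p.y) + abs(p.z) - s) * 0.57735027; // 1/sqrt(3)
}

float map(vec3 p) {
    float d = sdRoundBox(p - vec3(-0.8, 0.4, 0.0), vec3(0.3), 0.05);
    d = min(d, sdOctahedron(p - vec3(0.8, 0.5, 0.0), 0.5));
    return d;
}
```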
## Deformation Operators Reference
### Round — `opRound(float d, float r)`
Softens edges of any SDF by subtracting a radius. Apply to the result of any SDF.
```glsl
// Round a box with radius 0.1
float d = opRound(sdBox(p, vec3(1.0)), 0.1);
```
### Onion — `opOnion(float d, float t)`
Hollows out any SDF into a shell of thickness `t`. Can be stacked for concentric shells.
```glsl
// Hollow sphere shell, 0.1 thick
float d = opOnion(sdSphere(p, 1.0), 0.1);
// Double shell
float d = opOnion(opOnion(sdSphere(p, 1.0), 0.1), 0.05);
```
### Elongate — `opElongate(vec3 p, vec3 h, vec3 center, vec3 size)`
Stretches a shape along one or more axes by `h`. Instead of scaling (which would distort distances), a straight segment is inserted into the middle of the shape.
```glsl
// Elongate a box along Y (exact elongation, written inline)
vec3 q = abs(p) - vec3(0.0, 0.5, 0.0);
float d = sdBox(max(q, 0.0), vec3(0.3)) + min(max(q.x, max(q.y, q.z)), 0.0);
```
### Twist — `opTwist(vec3 p, float k)`
Rotates the XZ cross-section around the Y axis proportionally to height. Returns transformed coordinates to pass into any SDF.
```glsl
// Twisted box: k controls twist rate (radians per unit height)
vec3 q = opTwist(p, 3.0);
float d = sdBox(q, vec3(0.5));
```
### Cheap Bend — `opCheapBend(vec3 p, float k)`
Bends geometry along the X axis. Returns transformed coordinates.
```glsl
// Bent box
vec3 q = opCheapBend(p, 2.0);
float d = sdBox(q, vec3(0.5, 0.3, 0.5));
```
### Displacement — `opDisplace(float d, vec3 p)`
Adds procedural sinusoidal surface detail. This breaks the Lipschitz bound, so scale the ray march step size by 0.5-0.7 to compensate.
```glsl
float d = sdSphere(p, 1.0);
d = opDisplace(d, p); // Adds bumpy surface detail
```
## 2D-to-3D Constructors Reference
### Revolution — `opRevolution(vec3 p, float sdf2d_result, float o)`
Creates a 3D solid of revolution by rotating a 2D SDF around the Y axis. Compute the 2D SDF at `vec2(length(p.xz) - o, p.y)` and pass the result.
```glsl
// Create a torus-like shape by revolving a 2D circle
vec2 q = vec2(length(p.xz) - 1.0, p.y); // offset=1.0
float d2d = length(q) - 0.3; // 2D circle radius=0.3
float d3d = opRevolution(p, d2d, 1.0); // revolve around Y
```
### Extrusion — `opExtrusion(vec3 p, float d2d, float h)`
Extends any 2D SDF along the Z axis with finite height `h`. The 2D SDF is evaluated in the XY plane and capped at `+/- h` along Z.
```glsl
// Extrude a 2D shape 0.2 units in both directions
float d2d = sdCircle2D(p.xy, 0.5); // any 2D SDF
float d3d = opExtrusion(p, d2d, 0.2); // finite extrusion
```
## Symmetry Operators Reference
### Mirror X — `opSymX(vec3 p)`
Mirrors across the x = 0 plane using `abs(p.x)`. Model only one half and get bilateral symmetry for free. Place at the start of `map()`.
```glsl
vec2 map(vec3 p) {
p = opSymX(p); // Mirror: only model x >= 0 side
float d = sdSphere(p - vec3(1.0, 0.5, 0.0), 0.3);
// Automatically appears at both x=+1 and x=-1
return vec2(d, 1.0);
}
```
### Mirror XZ — `opSymXZ(vec3 p)`
Four-fold symmetry across the x = 0 and z = 0 planes. Model one quadrant, get four copies.
```glsl
vec2 map(vec3 p) {
p = opSymXZ(p); // Four-fold symmetry
float d = sdBox(p - vec3(2.0, 0.5, 2.0), vec3(0.3));
// Appears in all four quadrants
return vec2(d, 1.0);
}
```
### Arbitrary Mirror — `opMirror(vec3 p, vec3 dir)`
Mirrors across an arbitrary plane defined by its normal `dir` (must be normalized). Reflects any point on the negative side to the positive side.
```glsl
// Mirror across a 45-degree plane
vec3 q = opMirror(p, normalize(vec3(1.0, 0.0, 1.0)));
float d = sdSphere(q - vec3(1.0, 0.5, 0.0), 0.3);
```


@@ -0,0 +1,63 @@
# SDF Tricks Detailed Reference
## Prerequisites
- Understanding of signed distance fields and ray marching
- Basic SDF primitives and boolean operations
- FBM / procedural noise fundamentals
## Lipschitz Condition and FBM Detail
An SDF must satisfy the **Lipschitz condition**: `|f(a) - f(b)| ≤ |a - b|` (gradient magnitude ≤ 1). This guarantees that stepping by the SDF value is always safe — no surface exists within that radius.
When adding FBM noise to an SDF, the noise derivatives can violate Lipschitz:
- Raw noise amplitude of 0.1 with frequency 20 has gradient ~2.0, breaking the condition
- This causes ray marching to overshoot, creating holes and artifacts
**Solutions**:
1. **Amplitude limiting**: Keep `amplitude × frequency < 1.0` across all octaves
2. **Distance fade**: `d += amp * fbm(p * freq) * smoothstep(fadeStart, 0.0, d)` — detail only appears near the surface where overshoot distance is small
3. **Step size reduction**: Multiply ray step by 0.5-0.7, trading speed for stability
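The distance-fade solution (2) can be sketched as follows, assuming `sdSphere` and an `fbm(vec3)` noise function are defined elsewhere; note the amplitude-frequency product (0.05 × 8 = 0.4) stays below 1:

```glsl
float mapDetailed(vec3 p) {
    float d = sdSphere(p, 1.0);            // base shape
    float fade = smoothstep(0.3, 0.0, d);  // 1 at the surface, 0 beyond d = 0.3
    d += 0.05 * fbm(p * 8.0) * fade;       // detail only where overshoot is small
    return d;
}
```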
## Bounding Volume Strategies
### Hierarchical Bounding
For scenes with N objects, test bounding volumes in order of increasing cost:
```
Level 1: Scene bounding sphere (1 evaluation)
Level 2: Object group bounds (few evaluations)
Level 3: Individual object SDF (full cost)
```
### Spatial Partitioning
For repeating structures, combine domain repetition with bounds:
```glsl
float map(vec3 p) {
vec3 q = mod(p + 2.0, 4.0) - 2.0; // repeat every 4 units
// Only evaluate detail if within local bounding sphere
float bound = length(q) - 1.5;
if (bound > 0.2) return bound;
return detailedSDF(q);
}
```
## Binary Search Convergence
After N iterations of binary search, the position error is `initialStep / 2^N`:
- 4 iterations: 1/16 of initial step size
- 6 iterations: 1/64 of initial step size (sub-pixel at typical resolutions)
- 8 iterations: 1/256 (overkill for most uses)
6 iterations is the practical sweet spot — gives sub-pixel precision without wasting GPU cycles.
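A hypothetical `refine()` helper illustrating the scheme, assuming the coarse march has bracketed the surface between `tOut` (last point outside, `map > 0`) and `tIn` (first point inside):

```glsl
float refine(vec3 ro, vec3 rd, float tOut, float tIn) {
    for (int i = 0; i < 6; i++) {          // error = (tIn - tOut) / 64 after 6 steps
        float tm = 0.5 * (tOut + tIn);
        if (map(ro + rd * tm) > 0.0) tOut = tm;
        else                         tIn  = tm;
    }
    return 0.5 * (tOut + tIn);
}
```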
## XOR Operation Mathematics
`opXor(a, b) = max(min(a, b), -max(a, b))`
This is equivalent to: `union(a, b) AND NOT intersection(a, b)` — the symmetric difference. Geometry exists where exactly one shape is present but not both. Useful for creating lattice structures and interlocking patterns.
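As a sketch (assuming an `sdSphere` primitive is defined), the symmetric difference appears wherever the two shapes overlap:

```glsl
float opXor(float a, float b) { return max(min(a, b), -max(a, b)); }

// Two overlapping spheres: material survives where exactly one shape is present
float d = opXor(sdSphere(p - vec3(0.3, 0.0, 0.0), 0.5),
                sdSphere(p + vec3(0.3, 0.0, 0.0), 0.5));
```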
## Interior SDF Pattern Techniques
When the camera is inside an SDF (d < 0), the negative distance still gives useful information:
- `abs(d)` gives distance to nearest surface from inside
- Combine with repeating patterns using `fract()` to create infinite interior structures
- Use `max(outerSDF, innerSDF)` to confine interior patterns within the outer shell
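A minimal interior-pattern sketch along these lines, assuming `sdSphere` is defined (the radial repetition here preserves the distance bound, since the shells are equidistant in `r`):

```glsl
float mapInterior(vec3 p) {
    float outer = sdSphere(p, 4.0);        // camera sits inside this sphere
    float r = length(p);
    // concentric shells every 0.5 units of radius, each ~0.04 thick
    float shells = abs(fract(r / 0.5) - 0.5) * 0.5 - 0.04;
    return max(outer, shells);             // confine shells to the outer sphere
}
```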


@@ -0,0 +1,476 @@
# SDF Soft Shadow Techniques - Detailed Reference
This document is a complete supplement to [SKILL.md](SKILL.md), covering prerequisite knowledge, step-by-step detailed explanations, mathematical derivations, variant descriptions, and full code examples for combinations.
## Use Cases
- **Shadow computation in SDF raymarching scenes**: When using signed distance fields (SDF) for ray marching rendering and you need to add soft shadow effects to the scene
- **Real-time soft shadow / penumbra effects**: Simulating the penumbra gradient produced by real light source area, rather than simple hard shadow binary results
- **Terrain / heightfield shadows**: Shadow computation for procedural terrain and height maps
- **Multi-layer shadow compositing**: Combining ground shadows, vegetation shadows, cloud shadows, and other shadow sources into a final result
- **Volumetric light / God Ray effects**: Reusing the shadow function to sample along the view ray to generate volumetric light scattering effects
- **Analytical shadows**: Using O(1) analytical shadows for simple geometry like spheres instead of ray marching
## Prerequisites
- **GLSL fundamentals**: uniforms, varyings, built-in functions (`clamp`, `mix`, `smoothstep`, `normalize`, `dot`, `reflect`)
- **Raymarching**: Understanding SDF scene representation and the basic sphere tracing workflow
- **SDF basics**: Understanding signed distance fields — `map(p)` returns the distance from point p to the nearest surface
- **Basic lighting models**: Diffuse (N·L), specular (Blinn-Phong), ambient light
- **Vector math**: Dot product, cross product, vector normalization, ray parametric equation `ro + rd * t`
## Core Principles in Detail
The core idea of SDF soft shadows is: **march from a surface point toward the light source, using the ratio of "nearest distance to march distance" to estimate penumbra width**.
### Classic Formula (2013)
```
shadow = min(shadow, k * h / t)
```
Where:
- `h` = SDF value at the current march position (distance to nearest surface)
- `t` = distance already traveled along the shadow ray
- `k` = constant controlling penumbra softness (larger = harder, smaller = softer)
**Geometric intuition**: The ratio `h/t` approximates "the angular width of the nearest occluder as seen from the current point on the shadow ray." When the ray grazes an object's surface, `h` is small while `t` is large, making `h/t` small and producing a penumbra region; when the ray is far from all objects, `h/t` is large and the area is fully lit.
Taking the minimum `min(res, k*h/t)` across all sample points along the ray yields "the darkest point," which is the final shadow factor.
### Improved Formula (2018)
The classic formula produces overly dark artifacts near sharp edges. The improved version uses SDF values from adjacent steps to perform geometric triangulation, estimating a more accurate nearest point:
```
y = h² / (2 * ph) // ph = SDF value from previous step
d = sqrt(h² - y²) // true nearest distance perpendicular to ray direction
shadow = min(shadow, d / (w * max(0, t - y)))
```
**Mathematical derivation**: Assume the previous step at ray position `t-h_step` had SDF value `ph`, and the current step at position `t` has SDF value `h`. The intersection region of these two SDF spheres (with radii `ph` and `h` respectively) provides a more accurate estimate of the nearest surface point. Through simple triangle geometry:
- `y` is the distance to step back along the ray from the current sample point to the nearest point projection
- `d` is the perpendicular distance from the nearest surface point to the ray
- The corrected effective distance is `t - y` rather than `t`
### Negative Extension (2020)
Allows `res` to drop to negative values (minimum -1), then remaps to [0,1] with a custom smooth mapping:
```
res = max(res, -1.0)
shadow = 0.25 * (1 + res)² * (2 - res)
```
This eliminates the hard crease produced by the classic `clamp(0,1)`, achieving a smoother penumbra transition.
**Why it works**: The classic method produces a C0 continuous (non-smooth) crease at `res=0` due to clamping. By allowing `res` to enter the negative domain [-1, 0], then remapping with the C1 continuous function `0.25*(1+res)²*(2-res)`, a completely smooth penumbra gradient is obtained. This function evaluates to 0 at `res=-1` and 1 at `res=1`, with smooth derivative transitions at both ends.
## Implementation Steps in Detail
### Step 1: Scene SDF Definition
**What**: Define the scene's signed distance function, returning the distance from any point in space to the nearest surface.
**Why**: Shadow ray marching needs `map(p)` queries to determine step size and penumbra estimation.
```glsl
float sdSphere(vec3 p, float r) {
return length(p) - r;
}
float sdPlane(vec3 p) {
return p.y;
}
float sdRoundBox(vec3 p, vec3 b, float r) {
vec3 q = abs(p) - b;
return length(max(q, 0.0)) + min(max(q.x, max(q.y, q.z)), 0.0) - r;
}
float map(vec3 p) {
float d = sdPlane(p);
d = min(d, sdSphere(p - vec3(0.0, 0.5, 0.0), 0.5));
d = min(d, sdRoundBox(p - vec3(-1.2, 0.3, 0.5), vec3(0.3), 0.05));
return d;
}
```
### Step 2: Classic Soft Shadow Function
**What**: March from a surface point toward the light source, progressively accumulating the minimum `k*h/t` ratio as the shadow factor.
**Why**: This is the foundational framework for all SDF soft shadows. At each step, `h/t` approximates the angular width of occlusion at that point; the minimum across the entire ray serves as the final penumbra estimate. The k value controls penumbra softness.
```glsl
// Classic SDF soft shadow
// ro: shadow ray origin (surface position)
// rd: light direction (normalized)
// mint: starting offset (to avoid self-shadowing)
// tmax: maximum march distance
float calcSoftShadow(vec3 ro, vec3 rd, float mint, float tmax) {
float res = 1.0;
float t = mint;
for (int i = 0; i < MAX_SHADOW_STEPS; i++) {
float h = map(ro + rd * t);
float s = clamp(SHADOW_K * h / t, 0.0, 1.0);
res = min(res, s);
t += clamp(h, MIN_STEP, MAX_STEP); // Step size clamping
if (res < 0.004 || t > tmax) break; // Early exit
}
res = clamp(res, 0.0, 1.0);
return res * res * (3.0 - 2.0 * res); // Smoothstep smoothing
}
```
### Step 3: Improved Soft Shadow (Geometric Triangulation)
**What**: Use SDF values from the current and previous steps to estimate a more accurate nearest point position via geometric triangulation, eliminating penumbra artifacts near sharp edges.
**Why**: The classic `h/t` formula assumes the nearest surface point is directly below the current sample position, but the actual nearest point may lie between two steps. Using the intersection relationship of SDF spheres from two adjacent steps provides a more accurate estimate of perpendicular distance `d` and corrected depth `t-y` along the ray.
```glsl
// Improved SDF soft shadow
float calcSoftShadowImproved(vec3 ro, vec3 rd, float mint, float tmax, float w) {
float res = 1.0;
float t = mint;
float ph = 1e10; // Previous step SDF value, initialized large so first step y≈0
for (int i = 0; i < MAX_SHADOW_STEPS; i++) {
float h = map(ro + rd * t);
// Geometric triangulation: estimate corrected nearest distance
float y = h * h / (2.0 * ph); // Step-back distance along ray
float d = sqrt(h * h - y * y); // True nearest distance perpendicular to ray
res = min(res, d / (w * max(0.0, t - y)));
ph = h; // Save current h for next step
t += h;
if (res < 0.0001 || t > tmax) break;
}
res = clamp(res, 0.0, 1.0);
return res * res * (3.0 - 2.0 * res);
}
```
### Step 4: Negative Extension Version (Smoothest Penumbra)
**What**: Allow the shadow factor to drop into the negative range [-1, 0], then remap to [0, 1] with a custom quadratic smooth function, eliminating hard creases.
**Why**: The classic method produces a C0 continuous (non-smooth) crease at `clamp(0,1)`. By allowing `res` to enter the negative domain and remapping with the C1 continuous function `0.25*(1+res)²*(2-res)`, a completely smooth penumbra gradient is achieved.
```glsl
// Negative extension soft shadow
float calcSoftShadowSmooth(vec3 ro, vec3 rd, float mint, float tmax, float w) {
float res = 1.0;
float t = mint;
for (int i = 0; i < MAX_SHADOW_STEPS; i++) {
float h = map(ro + rd * t);
res = min(res, h / (w * t));
t += clamp(h, MIN_STEP, MAX_STEP);
if (res < -1.0 || t > tmax) break; // Allow res to drop to -1
}
res = max(res, -1.0); // Clamp to [-1, 1]
return 0.25 * (1.0 + res) * (1.0 + res) * (2.0 - res); // Smooth remapping
}
```
### Step 5: Bounding Volume Optimization
**What**: Before starting the march, use simple geometric tests (plane clipping or AABB ray intersection) to narrow the shadow ray's effective range.
**Why**: If the shadow ray cannot possibly hit any object outside a bounded region (e.g., above the scene is empty), `tmax` can be shortened early or 1.0 returned immediately, saving many march iterations.
```glsl
// Method A: Plane clipping — clip ray to scene upper bound plane
float tp = (SCENE_Y_MAX - ro.y) / rd.y;
if (tp > 0.0) tmax = min(tmax, tp);
// Method B: AABB bounding box clipping
vec2 iBox(vec3 ro, vec3 rd, vec3 rad) {
vec3 m = 1.0 / rd;
vec3 n = m * ro;
vec3 k = abs(m) * rad;
vec3 t1 = -n - k;
vec3 t2 = -n + k;
float tN = max(max(t1.x, t1.y), t1.z);
float tF = min(min(t2.x, t2.y), t2.z);
if (tN > tF || tF < 0.0) return vec2(-1.0);
return vec2(tN, tF);
}
// Usage in shadow function
vec2 dis = iBox(ro, rd, BOUND_SIZE);
if (dis.y < 0.0) return 1.0; // Ray completely misses bounding box
tmin = max(tmin, dis.x);
tmax = min(tmax, dis.y);
```
### Step 6: Shadow Color Rendering (Color Bleeding)
**What**: Instead of using a uniform scalar shadow value, apply different shadow attenuation curves to the RGB channels.
**Why**: In the real world, penumbra regions exhibit a warm color shift due to subsurface scattering and atmospheric effects — red light penetrates the most while blue light is blocked first. By applying per-channel power operations on the shadow value, this physical phenomenon can be approximated at low cost.
```glsl
// Method A: Classic color shadow
// sha is a [0,1] shadow factor
vec3 shadowColor = vec3(sha, sha * sha * 0.5 + 0.5 * sha, sha * sha);
// R = sha (linear), G = softer quadratic blend, B = sha² (darkest)
// Method B: Per-channel power operation (Woods style)
vec3 shadowColor = pow(vec3(sha), vec3(1.0, 1.2, 1.5));
// R = sha^1.0, G = sha^1.2, B = sha^1.5 → penumbra region shifts warm
```
### Step 7: Integration into the Lighting Model
**What**: Multiply the shadow value into the diffuse and specular lighting contributions.
**Why**: Shadows are essentially an estimate of "light source visibility" and should act as a multiplicative factor on all lighting terms that depend on that light source. Shadows are typically only computed when N·L > 0 (surface faces the light) to avoid wasting GPU cycles on backlit faces.
```glsl
// Lighting integration
vec3 sunDir = normalize(vec3(-0.5, 0.4, -0.6));
vec3 hal = normalize(sunDir - rd);
// Diffuse × shadow
float dif = clamp(dot(nor, sunDir), 0.0, 1.0);
if (dif > 0.0001)
dif *= calcSoftShadow(pos + nor * 0.01, sunDir, 0.02, 8.0);
// Specular is also modulated by shadow
float spe = pow(clamp(dot(nor, hal), 0.0, 1.0), 16.0);
spe *= dif; // dif already includes shadow
// Final color compositing
vec3 col = vec3(0.0);
col += albedo * 2.0 * dif * vec3(1.0, 0.9, 0.8); // Sun diffuse
col += 5.0 * spe * vec3(1.0, 0.9, 0.8); // Sun specular
col += albedo * 0.5 * clamp(0.5 + 0.5 * nor.y, 0.0, 1.0)
* vec3(0.4, 0.6, 1.0); // Sky ambient (no shadow)
```
## Variant Details
### Variant 1: Analytical Sphere Shadow
**Difference from base version**: Does not use ray marching; instead performs an O(1) analytical closest-distance computation for spheres. Suitable for scenes containing only spheres or objects that can be approximated by spheres.
**Principle**: For a ray and a sphere, the closest distance from the ray to the sphere surface and the parameter `t` at that closest point along the ray can be computed analytically. These two values directly form the `d/t` ratio without iterative marching.
```glsl
// Sphere analytical soft shadow
vec2 sphDistances(vec3 ro, vec3 rd, vec4 sph) {
vec3 oc = ro - sph.xyz;
float b = dot(oc, rd);
float c = dot(oc, oc) - sph.w * sph.w;
float h = b * b - c;
float d = sqrt(max(0.0, sph.w * sph.w - h)) - sph.w;
return vec2(d, -b - sqrt(max(h, 0.0)));
}
float sphSoftShadow(vec3 ro, vec3 rd, vec4 sph, float k) {
vec2 r = sphDistances(ro, rd, sph);
if (r.y > 0.0)
return clamp(k * max(r.x, 0.0) / r.y, 0.0, 1.0);
return 1.0;
}
// Multi-sphere aggregation: res = min(res, sphSoftShadow(ro, rd, sphere[i], k))
```
### Variant 2: Terrain Heightfield Shadow
**Difference from base version**: `h` is not obtained from a generic SDF `map()`, but computed as `p.y - terrain(p.xz)`, the height difference between the ray and the terrain. Step size adapts to camera distance.
**Use cases**: Procedural terrain rendering (using FBM noise-generated height maps). Terrain SDF is difficult to define precisely, but height difference serves as an approximate distance estimate.
```glsl
float terrainShadow(vec3 ro, vec3 rd, float dis) {
float minStep = clamp(dis * 0.01, 0.5, 50.0); // Distance-adaptive minimum step
float res = 1.0;
float t = 0.01;
for (int i = 0; i < 80; i++) { // Terrain needs more iterations
vec3 p = ro + t * rd;
float h = p.y - terrainMap(p.xz); // Height difference replaces SDF
res = min(res, 16.0 * h / t); // k=16
t += max(minStep, h);
if (res < 0.001 || p.y > MAX_TERRAIN_HEIGHT) break;
}
return clamp(res, 0.0, 1.0);
}
```
### Variant 3: Per-Material Hard/Soft Blend
**Difference from base version**: Uses a global variable or extra parameter to control each object's shadow hardness, blending via `mix(1.0, k*h/t, hardness)`. When `hardness=0`, it produces hard shadows; when `hardness=1`, fully soft shadows.
**Use cases**: Characters need sharp hard shadows (to enhance silhouette), while environment objects use softer shadows.
```glsl
float hsha = 1.0; // Global variable, set per material in map()
float mapWithShadowHardness(vec3 p) {
float d = sdPlane(p);
hsha = 1.0; // Ground: fully soft shadow
float dChar = sdCharacter(p);
if (dChar < d) { d = dChar; hsha = 0.0; } // Character: hard shadow
return d;
}
// Inside shadow loop:
res = min(res, mix(1.0, SHADOW_K * h / t, hsha));
```
### Variant 4: Multi-Layer Shadow Composition
**Difference from base version**: Different types of occlusion sources are computed separately, then composed multiplicatively. Typical scenario: ground shadow × vegetation shadow × cloud shadow.
**Design rationale**: Different shadow sources have very different characteristics — terrain shadows need high-precision marching, vegetation shadows can use probability/density field approximation, cloud shadows are large-scale planar projections. Layered computation allows using the optimal algorithm for each type.
```glsl
// Layered computation
float sha_terrain = terrainShadow(pos, sunDir, 0.02);
float sha_trees = treesShadow(pos, sunDir);
float sha_clouds = cloudShadow(pos, sunDir); // Single planar projection + FBM sample
// Multiplicative composition
float sha = sha_terrain * sha_trees;
sha *= smoothstep(-0.3, -0.1, sha_clouds); // Cloud shadow softened with smoothstep
// Apply to lighting
dif *= sha;
```
### Variant 5: Volumetric Light / God Ray Reusing Shadow Function
**Difference from base version**: Marches uniformly along the view ray direction, calling the shadow function toward the light at each step, accumulating light energy. Essentially a secondary sampling of the shadow function to produce volumetric scattering effects.
**Principle**: Volumetric light effects come from the scattering of light by airborne particles. At each point along the view ray, if that point is illuminated by the sun (high shadow value), it contributes some scattered light to the final color. Summing the lighting contributions from all sample points along the view ray produces the volumetric light effect.
```glsl
// Volumetric light (God Rays)
float godRays(vec3 ro, vec3 rd, float tmax, vec3 sunDir) {
float v = 0.0;
float dt = 0.15; // View ray step size
float t = dt * fract(texelFetch(iChannel0, ivec2(fragCoord) & 255, 0).x); // Jittering
for (int i = 0; i < 32; i++) { // Number of samples
if (t > tmax) break;
vec3 p = ro + rd * t;
float sha = calcSoftShadow(p, sunDir, 0.02, 8.0); // Reuse shadow function
v += sha * exp(-0.2 * t); // Exponential distance falloff
t += dt;
}
v /= 32.0;
return v * v; // Square to enhance contrast
}
// Usage: col += godRayIntensity * godRays(...) * vec3(1.0, 0.75, 0.4);
```
## Performance Optimization Details
### Bottleneck Analysis
The main cost of SDF soft shadows is the **shadow ray marching per pixel**, which involves multiple `map()` calls. For complex scenes, a single `map()` call may contain dozens of SDF combination operations.
### Optimization Techniques
#### 1. Bounding Volume Culling (Most Significant)
- Plane clipping: `tmax = min(tmax, (yMax - ro.y) / rd.y)` restricts the ray within the scene height range
- AABB clipping: Use `iBox()` to restrict `tmin`/`tmax` within the bounding box; return 1.0 immediately when the ray completely misses
- Can reduce 30-70% of wasted iterations
#### 2. Step Size Clamping
- `t += clamp(h, minStep, maxStep)` prevents extremely small steps (getting stuck near surface) and extremely large steps (skipping thin objects)
- Typical `minStep` values: 0.005~0.05, `maxStep`: 0.2~0.5
- Distance-adaptive: `minStep = clamp(dis * 0.01, 0.5, 50.0)` uses larger steps for distant shadows
#### 3. Early Exit
- Classic version: `res < 0.004` is already dark enough, no need to continue
- Negative extension: `res < -1.0` is saturated
- Height upper bound: `pos.y > yMax` means the ray has left the scene
#### 4. Reduced Shadow SDF Precision
- Use a simplified `map2()` that omits material computation and only returns distance
- For terrain scenes, use a low-resolution `terrainM()` (fewer FBM octaves) instead of full-precision `terrainH()`
#### 5. Conditional Computation
- `if (dif > 0.0001) dif *= shadow(...)` only computes shadow when facing the light
- Backlit faces are directly 0, no shadow needed
#### 6. Iteration Count Adjustment
- Simple scenes (a few primitives): 16~32 iterations suffice
- Complex FBM surfaces: Need 64~128 iterations
- Terrain scenes: With distance-adaptive step sizes, around 80 iterations
#### 7. Loop Unrolling Control
- `#define ZERO (min(iFrame,0))` prevents the compiler from unrolling loops at compile time, reducing instruction cache pressure
## Combination Suggestions with Full Code
### With Ambient Occlusion (AO)
Shadows handle direct light occlusion; AO handles indirect light occlusion. They complement each other:
```glsl
float sha = calcSoftShadow(pos, sunDir, 0.02, 8.0);
float occ = calcAO(pos, nor);
col += albedo * dif * sha * sunColor; // Direct light × shadow
col += albedo * sky * occ * skyColor; // Ambient light × AO
```
### With Subsurface Scattering (SSS)
Shadow values can modulate SSS intensity, simulating the translucent light-through effect at shadow edges:
```glsl
float sss = pow(clamp(dot(rd, sunDir), 0.0, 1.0), 4.0);
sss *= 0.25 + 0.75 * sha; // SSS reduced but not eliminated in shadow
col += albedo * sss * vec3(1.0, 0.4, 0.2);
```
### With Fog / Atmospheric Scattering
Shadows should be "washed out" by fog at distance. The common approach is to complete shadow lighting before applying fog, which naturally blends:
```glsl
// First complete lighting with shadows
vec3 col = albedo * lighting_with_shadow;
// Then apply fog (distance fog naturally weakens shadow contrast)
col = mix(col, fogColor, 1.0 - exp(-0.001 * t * t));
```
### With Normal Maps / Bump Mapping
Shadows use the geometric normal (not the perturbed normal) to compute N·L for determining light-facing, but shadow rays are still cast from the actual surface point. Normal maps only affect lighting calculations, not shadows:
```glsl
vec3 geoNor = calcNormal(pos); // Geometric normal
vec3 nor = perturbNormal(geoNor, ...); // Perturbed normal
float dif = clamp(dot(nor, sunDir), 0.0, 1.0); // Use perturbed normal for diffuse
if (dot(geoNor, sunDir) > 0.0) // Use geometric normal to decide shadow
dif *= calcSoftShadow(pos + geoNor * 0.01, sunDir, 0.02, 8.0);
```
### With Reflections
The shadow function can be reused for the reflection direction, occluding specular highlights that should not be visible:
```glsl
vec3 ref = reflect(rd, nor);
float refSha = calcSoftShadow(pos + nor * 0.01, ref, 0.02, 8.0);
col += specular * envColor * refSha * occ;
```


@@ -0,0 +1,644 @@
# GPU Physics Simulation — Detailed Reference
This document is the complete reference material for [SKILL.md](SKILL.md), containing step-by-step tutorials, mathematical derivations, and advanced usage.
## Prerequisites
- **GLSL Basics**: uniforms, texture sampling (`texture`/`texelFetch`), `fragCoord`/`iResolution` coordinate system
- **ShaderToy Multi-Pass Mechanism**: Buffer A/B/C/D read/write between each other, `iChannel0~3` binding, Common pass for shared code
- **Vector Calculus Basics**: gradient, divergence, curl, Laplacian
- **Numerical Integration**: Forward Euler, semi-implicit methods (Semi-implicit / Verlet)
- **Textures as Data Storage**: Encoding physical quantities such as position/velocity/density into RGBA channels of texture pixels
## Core Principles in Detail
The core paradigm of GPU physics simulation is **Buffer Feedback**: leveraging ShaderToy's multi-pass architecture to store physical state (position, velocity, density, pressure, etc.) in texture buffers. Each frame reads the previous frame's state, computes new state, and writes it back. Each pixel computes independently in parallel, achieving GPU-level massively parallel physics solving.
### Key Mathematical Tools in Detail
**1. Discrete Laplacian Operator** (used for wave equation, viscous force, diffusion):
```
∇²f ≈ f(x+1,y) + f(x-1,y) + f(x,y+1) + f(x,y-1) - 4·f(x,y)
```
The Laplacian measures the difference between a point's value and the average of its neighbors. In the wave equation, it drives wave propagation; in fluid simulation, it provides viscous force (velocity diffusion); in the heat equation, it drives temperature equalization.
**2. Semi-Lagrangian Advection** (used for fluid solving):
```
f_new(x) = f_old(x - v·dt) // backward tracing along the velocity field
```
Advection is the most critical step in fluid simulation. The semi-Lagrangian method achieves unconditionally stable advection through "backward tracing" — starting from the target position, tracing backward along the velocity field to find the source position, then sampling the value at the source. This avoids the CFL condition limitation of forward Euler advection.
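A fragment-shader sketch of this backward trace in ShaderToy terms, assuming the buffer feeds back into `iChannel0` with velocity stored in the RG channels (in uv units per second):

```glsl
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
    vec2 uv = fragCoord / iResolution.xy;
    vec2 vel = texture(iChannel0, uv).xy;   // velocity at the target position
    vec2 src = uv - vel * iTimeDelta;       // backward trace along the field
    fragColor = texture(iChannel0, src);    // bilinear sample = built-in smoothing
}
```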
**3. Spring-Damper Force** (used for cloth, soft bodies):
```
F_spring = k · (|Δx| - L₀) · normalize(Δx)
F_damper = c · dot(normalize(Δx), Δv) · normalize(Δx)
```
Spring force pulls two mass points back to the rest length L₀; stiffness k determines the restoring force strength. Damper force attenuates relative velocity along the connection direction; coefficient c determines the energy dissipation rate. Combined, they produce stable elastic motion.
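The two forces can be sketched as a helper computing the force a spring link exerts on particle i (positions and velocities would be fetched from the state texture; the names here are illustrative):

```glsl
vec3 springDamperForce(vec3 xi, vec3 xj, vec3 vi, vec3 vj,
                       float k, float c, float L0) {
    vec3 dx = xj - xi;
    float len = max(length(dx), 1e-6);      // avoid division by zero
    vec3 dir = dx / len;
    vec3 Fs = k * (len - L0) * dir;         // spring: restore rest length
    vec3 Fd = c * dot(dir, vj - vi) * dir;  // damper: attenuate relative velocity along the link
    return Fs + Fd;
}
```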
**4. Vorticity Confinement** (used for preserving fluid detail):
```
curl = ∂v_y/∂x - ∂v_x/∂y
vorticity_force = ε · (∇|curl| × curl) / |∇|curl||
```
Numerical viscosity over-smooths small-scale vortices. Vorticity confinement compensates for this artificial dissipation by applying an additional force in high-vorticity regions, pushing small vortices into more concentrated rotational structures and preserving the visual richness of the fluid.
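A 2D sketch of the confinement force, assuming velocity lives in `iChannel0.xy` (the curl uses the ∂v_y/∂x - ∂v_x/∂y convention and is left unscaled by texel size, since only the direction of the |curl| gradient matters):

```glsl
float curlAt(vec2 uv, vec2 texel) {
    float vyR = texture(iChannel0, uv + vec2(texel.x, 0.0)).y;
    float vyL = texture(iChannel0, uv - vec2(texel.x, 0.0)).y;
    float vxU = texture(iChannel0, uv + vec2(0.0, texel.y)).x;
    float vxD = texture(iChannel0, uv - vec2(0.0, texel.y)).x;
    return (vyR - vyL) - (vxU - vxD);
}

vec2 vorticityForce(vec2 uv, vec2 texel, float eps) {
    vec2 grad = vec2(
        abs(curlAt(uv + vec2(texel.x, 0.0), texel)) - abs(curlAt(uv - vec2(texel.x, 0.0), texel)),
        abs(curlAt(uv + vec2(0.0, texel.y), texel)) - abs(curlAt(uv - vec2(0.0, texel.y), texel)));
    vec2 N = grad / max(length(grad), 1e-5); // normalized gradient of |curl|
    float w = curlAt(uv, texel);
    return eps * vec2(N.y, -N.x) * w;        // 2D cross product N x (0, 0, w)
}
```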
## Implementation Steps in Detail
### Step 1: Ping-Pong Double Buffer Structure
**What**: Create two Buffers (A and B) that alternate read/write to achieve state persistence.
**Why**: GPU shaders cannot simultaneously read and write the same buffer. The ping-pong strategy reads from one buffer (previous frame's data) and writes to the other each frame, then swaps on the next frame.
**IMPORTANT: Key Difference Between ShaderToy and WebGL2**: In ShaderToy, Buffer A/B are two independent passes with separate write targets, so `iChannel0=self, iChannel1=other` doesn't conflict. However, in WebGL2 there's only one shader program doing ping-pong, and the write target texture cannot be simultaneously read. The solution is **dual-channel encoding** (R=current height, G=previous frame height).
**Code** (WebGL2-safe version, reads only from iChannel0, with RGBA8-compatible encoding):
```glsl
// IMPORTANT: Only use iChannel0 (read currentBuf), write to nextBuf (must be different!)
// IMPORTANT: encode/decode ensure signed values aren't clipped on RGBA8 (no float textures/SwiftShader)
uniform int useFloatTex;
float decode(float v) { return useFloatTex == 1 ? v : v * 2.0 - 1.0; }
float encode(float v) { return useFloatTex == 1 ? v : v * 0.5 + 0.5; }
void mainImage(out vec4 fragColor, in vec2 fragCoord)
{
vec2 uv = fragCoord / iResolution.xy;
vec2 texel = 1.0 / iResolution.xy;
float current = decode(texture(iChannel0, uv).x);
float previous = decode(texture(iChannel0, uv).y);
float left = decode(texture(iChannel0, uv - vec2(texel.x, 0.0)).x);
float right = decode(texture(iChannel0, uv + vec2(texel.x, 0.0)).x);
float down = decode(texture(iChannel0, uv - vec2(0.0, texel.y)).x);
float up = decode(texture(iChannel0, uv + vec2(0.0, texel.y)).x);
float laplacian = left + right + down + up - 4.0 * current;
float next = 2.0 * current - previous + 0.25 * laplacian;
next *= 0.995; // damping decay
next *= min(1.0, float(iFrame)); // zero on frame 0
fragColor = vec4(encode(next), encode(current), 0.0, 0.0);
}
```
### Step 2: Interaction-Driven (External Force Injection)
**What**: Inject energy into the simulation through mouse clicks or programmatic generation.
**Why**: Physics simulations need external excitation to start and sustain. Mouse interaction is the most intuitive driving method; programmatic methods can simulate raindrops, explosions, etc.
**Code** (insert before wave equation computation):
```glsl
float d = 0.0;
if (iMouse.z > 0.0)
{
// Mouse click: create ripple at mouse position
d = smoothstep(4.5, 0.5, length(iMouse.xy - fragCoord));
}
else
{
// Programmatic raindrop: pseudo-random position + impulse
float t = iTime * 2.0;
vec2 pos = fract(floor(t) * vec2(0.456665, 0.708618)) * iResolution.xy;
float amp = 1.0 - step(0.05, fract(t));
d = -amp * smoothstep(2.5, 0.5, length(pos - fragCoord));
}
```
### Step 3: Rendering Layer (Height Field Visualization)
**What**: Read simulation results in the Image Pass, compute normals via gradient calculation, and render lighting effects.
**Why**: The simulation result is a height field texture that needs to be transformed into a visible surface effect. Computing gradients via finite differences as normals enables refraction, diffuse reflection, specular highlights, and other water surface effects.
**Code** (Image Pass):
```glsl
void mainImage(out vec4 fragColor, in vec2 fragCoord)
{
vec2 uv = fragCoord / iResolution.xy;
vec3 e = vec3(vec2(1.0) / iResolution.xy, 0.0);
// Read four-neighbor height values from Buffer A
float left = texture(iChannel0, uv - e.xz).x;
float right = texture(iChannel0, uv + e.xz).x;
float down = texture(iChannel0, uv - e.zy).x;
float up = texture(iChannel0, uv + e.zy).x;
// Construct normal from gradient
vec3 normal = normalize(vec3(right - left, up - down, 1.0));
// Lighting computation
vec3 light = normalize(vec3(0.2, -0.5, 0.7));
float diffuse = max(dot(normal, light), 0.0);
float spec = pow(max(-reflect(light, normal).z, 0.0), 32.0);
// Refraction-offset background texture sampling
vec4 bg = texture(iChannel1, uv + normal.xy * 0.35);
vec3 waterTint = vec3(0.7, 0.8, 1.0);
fragColor = mix(bg, vec4(waterTint, 1.0), 0.25) * diffuse + spec;
}
```
### Step 4: Chained Multi-Buffer Iteration (Improving Accuracy)
**What**: Chain multiple Buffers together to execute the same solver multiple times per frame.
**Why**: Many physics solvers (fluid pressure projection, constraint solving) require multiple iterations to converge. In ShaderToy, you can chain Buffer A → B → C to execute the same code, equivalent to 3 iterations per frame. This is critical for Eulerian fluid (pressure-divergence elimination) and rigid bodies (impulse constraint solving).
**Full Euler fluid solver code** (Buffer A/B/C share Common pass):
```glsl
// === Common Pass ===
#define dt 0.15 // adjustable: time step
#define viscosityThreshold 0.64 // adjustable: viscosity coefficient (larger = thinner)
#define vorticityThreshold 0.25 // adjustable: vorticity confinement strength
vec4 fluidSolver(sampler2D field, vec2 uv, vec2 step,
vec4 mouse, vec4 prevMouse)
{
float k = 0.2, s = k / dt;
// Sample center and four neighbors
vec4 c = textureLod(field, uv, 0.0);
vec4 fr = textureLod(field, uv + vec2(step.x, 0.0), 0.0);
vec4 fl = textureLod(field, uv - vec2(step.x, 0.0), 0.0);
vec4 ft = textureLod(field, uv + vec2(0.0, step.y), 0.0);
vec4 fd = textureLod(field, uv - vec2(0.0, step.y), 0.0);
// Divergence and density gradient
vec3 ddx = (fr - fl).xyz * 0.5;
vec3 ddy = (ft - fd).xyz * 0.5;
float divergence = ddx.x + ddy.y;
vec2 densityDiff = vec2(ddx.z, ddy.z);
// Density solve
c.z -= dt * dot(vec3(densityDiff, divergence), c.xyz);
// Viscous force (Laplacian)
vec2 laplacian = fr.xy + fl.xy + ft.xy + fd.xy - 4.0 * c.xy;
vec2 viscosity = viscosityThreshold * laplacian;
// Semi-Lagrangian advection
vec2 densityInv = s * densityDiff;
vec2 uvHistory = uv - dt * c.xy * step;
c.xyw = textureLod(field, uvHistory, 0.0).xyw;
// Mouse external force
vec2 extForce = vec2(0.0);
if (mouse.z > 1.0 && prevMouse.z > 1.0)
{
vec2 drag = clamp((mouse.xy - prevMouse.xy) * step * 600.0,
-10.0, 10.0);
vec2 p = uv - mouse.xy * step;
extForce += 0.001 / dot(p, p) * drag;
}
c.xy += dt * (viscosity - densityInv + extForce);
// Velocity decay
c.xy = max(vec2(0.0), abs(c.xy) - 5e-6) * sign(c.xy);
// Vorticity confinement
c.w = (fd.x - ft.x + fr.y - fl.y); // curl
vec2 vorticity = vec2(abs(ft.w) - abs(fd.w),
abs(fl.w) - abs(fr.w));
vorticity *= vorticityThreshold / (length(vorticity) + 1e-5) * c.w;
c.xy += vorticity;
// Boundary conditions
c.y *= smoothstep(0.5, 0.48, abs(uv.y - 0.5));
c.x *= smoothstep(0.5, 0.49, abs(uv.x - 0.5));
// Stability clamping
c = clamp(c, vec4(-24.0, -24.0, 0.5, -0.25),
vec4( 24.0, 24.0, 3.0, 0.25));
return c;
}
// === Buffer A / B / C (identical code) ===
void mainImage(out vec4 fragColor, in vec2 fragCoord)
{
vec2 uv = fragCoord / iResolution.xy;
vec2 stepSize = 1.0 / iResolution.xy;
vec4 prevMouse = textureLod(iChannel0, vec2(0.0), 0.0);
fragColor = fluidSolver(iChannel0, uv, stepSize, iMouse, prevMouse);
// Bottom row stores mouse state
if (fragCoord.y < 1.0) fragColor = iMouse;
}
```
### Step 5: Texture Data Layout for Particle/Mass-Point Systems
**What**: Encode particle positions, velocities, and other attributes at specific pixel locations in a texture.
**Why**: In GPU physics simulation, each particle/mass point needs to store multiple attributes (position, velocity, force, etc.). By partitioning the texture into regions (e.g., left half for positions, right half for velocities), or encoding different attributes into different RGBA channels, a compact data layout is achieved.
**Code** (cloth simulation data layout example):
```glsl
#define SIZX 128.0 // adjustable: cloth width (particle count)
#define SIZY 64.0 // adjustable: cloth height (particle count)
// Left half [0, SIZX) stores positions, right half [SIZX, 2*SIZX) stores velocities
// IMPORTANT: In WebGL2, getpos/getvel both read from iChannel0 (currentBuf, read-only),
// write target is nextBuf (separate buffer), avoiding read-write conflict
vec3 getpos(vec2 id)
{
return texture(iChannel0, (id + 0.5) / iResolution.xy).xyz;
}
vec3 getvel(vec2 id)
{
return texture(iChannel0, (id + 0.5 + vec2(SIZX, 0.0)) / iResolution.xy).xyz;
}
// In mainImage, decide whether to output position or velocity based on fragCoord
void mainImage(out vec4 fragColor, in vec2 fragCoord)
{
vec2 fc = floor(fragCoord);
vec2 c = fc;
c.x = fract(c.x / SIZX) * SIZX; // mass point ID
vec3 pos = getpos(c);
vec3 vel = getvel(c);
// ... physics computation ...
// Output: left half stores position, right half stores velocity
fragColor = vec4(fc.x >= SIZX ? vel : pos, 0.0);
}
```
### Step 6: Spring-Damper Constraint System
**What**: Implement spring forces and damping forces between mass points.
**Why**: Spring-dampers are the core of cloth and soft body simulation. Each mass point is connected to neighbors via springs — spring force maintains structural shape, damping force dissipates oscillation energy. Using near-neighbors (structural springs) + diagonals (shear springs) + skip-connections (bending springs) provides complete constraints.
**Full code**:
```glsl
const float SPRING_K = 0.15; // adjustable: spring stiffness
const float DAMPER_C = 0.10; // adjustable: damping coefficient
const float GRAVITY = 0.0022; // adjustable: gravitational acceleration
vec3 pos, vel, ovel;
vec2 c; // current mass point ID
void edge(vec2 dif)
{
// Boundary check
if ((dif + c).x < 0.0 || (dif + c).x >= SIZX ||
(dif + c).y < 0.0 || (dif + c).y >= SIZY) return;
float restLen = length(dif); // rest length = initial distance
vec3 posdif = getpos(dif + c) - pos;
vec3 veldif = getvel(dif + c) - ovel;
// IMPORTANT: Must check for zero length, otherwise normalize(vec3(0)) produces NaN
float plen = length(posdif);
if (plen < 0.0001) return;
vec3 dir = posdif / plen;
// Spring force: restore to rest length
vel += dir
* clamp(plen - restLen, -1.0, 1.0)
* SPRING_K;
// Damping force: attenuate relative velocity along connection direction
vel += dir
* dot(dir, veldif)
* DAMPER_C;
}
// In mainImage, call 12 edges (near-neighbors + diagonals + skip-connections)
void mainImage(out vec4 fragColor, in vec2 fragCoord)
{
    // ... initialize fc = floor(fragCoord), then pos, vel, c ...
ovel = vel;
// Structural springs (4 near-neighbors)
edge(vec2( 0.0, 1.0));
edge(vec2( 0.0,-1.0));
edge(vec2( 1.0, 0.0));
edge(vec2(-1.0, 0.0));
// Shear/bending springs (diagonals + skip-connections)
edge(vec2( 1.0, 1.0));
edge(vec2(-1.0,-1.0));
edge(vec2( 0.0, 2.0));
edge(vec2( 0.0,-2.0));
edge(vec2( 2.0, 0.0));
edge(vec2(-2.0, 0.0));
edge(vec2( 2.0,-2.0));
edge(vec2(-2.0, 2.0));
// Collision detection (sphere)
// ... ballcollis() ...
// Integration
pos += vel;
vel.y += GRAVITY;
// Air resistance (normal wind force)
vec3 norm = findnormal(c);
vec3 windvel = vec3(0.01, 0.0, -0.005); // adjustable: wind direction and speed
vel -= norm * (dot(norm, vel - windvel) * 0.05);
// Fixed boundary (top row pinned as curtain rod)
if (c.y == 0.0)
{
pos = vec3(fc.x * 0.85, fc.y, fc.y * 0.01);
vel = vec3(0.0);
}
fragColor = vec4(fc.x >= SIZX ? vel : pos, 0.0);
}
```
### Step 7: N-Body Particle Interaction (Biot-Savart Vortex Method)
**What**: Implement all-pairs interaction forces between all particles.
**Why**: Certain physical systems (such as vortex dynamics, gravitational N-body problems) require each particle to interact with all other particles. The Biot-Savart law gives the velocity field generated by vorticity, which is the core of 2D vortex simulation. Uses semi-Newton (Verlet-type) two-step integration for improved accuracy.
**Full code**:
```glsl
#define N 20 // adjustable: N×N total particles
#define Nf float(N)
#define MARKERS 0.90 // adjustable: passive marker particle ratio
// STRENGTH automatically scales with particle count and marker ratio
float STRENGTH = 1e3 * 0.25 / (1.0 - MARKERS) * sqrt(30.0 / Nf);
#define tex(i,j) texture(iChannel1, (vec2(i,j) + 0.5) / iResolution.xy)
#define W(i,j) tex(i, j + N).z // vorticity stored in tile(0,1) z channel
void mainImage(out vec4 O, vec2 U)
{
    vec2 T = floor(U / Nf);                       // tile index
    O = texture(iChannel1, U / iResolution.xy);   // this pixel's previous state (out params start undefined)
    U = mod(U, Nf);                               // particle ID within tile
// Pass 1 (Buffer A): half-step integration dt*0.5
// Pass 2 (Buffer B): full-step integration using Pass 1 velocity
vec2 F = vec2(0.0);
// N×N all-pairs Biot-Savart summation
for (int j = 0; j < N; j++)
for (int i = 0; i < N; i++)
{
float w = W(i, j);
vec2 d = tex(i, j).xy - O.xy;
// Periodic boundary: take nearest image
d = (fract(0.5 + d / iResolution.xy) - 0.5) * iResolution.xy;
float l = dot(d, d);
if (l > 1e-5)
F += vec2(-d.y, d.x) * w / l; // Biot-Savart kernel
}
O.zw = STRENGTH * F; // velocity
O.xy += O.zw * dt; // integrate position
O.xy = mod(O.xy, iResolution.xy); // periodic boundary
}
```
### Step 8: State Storage in Specific Pixels (Global Variable Trick)
**What**: Store global state (current position, time, mouse history) at fixed pixel locations in the texture.
**Why**: GPU shaders have no global variables. By storing state at agreed-upon pixel coordinates (usually `(0,0)` or the bottom row), the next frame can read these "global variables". This is indispensable for ODE integration (e.g., Lorenz attractor) and interactions that need to track mouse history.
**Full code**:
```glsl
void mainImage(out vec4 fragColor, in vec2 fragCoord)
{
// Pixel (0,0) stores global state (e.g., Lorenz attractor's current 3D position)
if (floor(fragCoord) == vec2(0, 0))
{
if (iFrame == 0)
{
fragColor = vec4(0.1, 0.001, 0.0, 0.0); // initial conditions
}
else
{
vec3 state = texture(iChannel0, vec2(0.0)).xyz;
// Execute multi-step ODE integration
for (float i = 0.0; i < 96.0; i++)
{
// Lorenz system: dx/dt = σ(y-x), dy/dt = x(ρ-z)-y, dz/dt = xy-βz
vec3 deriv;
deriv.x = 10.0 * (state.y - state.x); // σ = 10
deriv.y = state.x * (28.0 - state.z) - state.y; // ρ = 28
deriv.z = state.x * state.y - 8.0/3.0 * state.z; // β = 8/3
state += deriv * 0.016 * 0.2;
}
fragColor = vec4(state, 0.0);
}
return;
}
    // Other pixels: accumulate trajectory distance field
    // (Integrate and dfLine are helper functions assumed to be defined in Common)
    vec2 uv = fragCoord / iResolution.xy;
    vec3 last = texture(iChannel0, vec2(0.0)).xyz;
float d = 1e6;
for (float i = 0.0; i < 96.0; i++)
{
vec3 next = Integrate(last, 0.016 * 0.2);
d = min(d, dfLine(last.xz * 0.015, next.xz * 0.015, uv));
last = next;
}
float c = 0.5 * smoothstep(1.0 / iResolution.y, 0.0, d);
vec3 prev = texture(iChannel0, fragCoord / iResolution.xy).rgb;
fragColor = vec4(vec3(c) + prev * 0.99, 0.0); // decaying accumulation
}
```
## Common Variant Details
### Variant 1: Eulerian Fluid Simulation (Smoke / Ink)
**Difference from base version**: Extends from scalar wave equation to full 2D velocity field solving — including advection, viscosity, vorticity confinement, and density tracking. Requires 3+ chained buffer iterations for enhanced convergence.
**Key code**:
```glsl
// Buffer storage: xy = velocity, z = density, w = curl
// Key difference: semi-Lagrangian advection replaces simple neighborhood update
vec2 uvHistory = uv - dt * velocity.xy * stepSize;
vec4 advected = textureLod(field, uvHistory, 0.0);
// Vorticity confinement (preserve fluid detail)
float curl = (fd.x - ft.x + fr.y - fl.y);
vec2 vortGrad = vec2(abs(ft.w) - abs(fd.w), abs(fl.w) - abs(fr.w));
vec2 vortForce = vorticityThreshold / (length(vortGrad) + 1e-5) * curl * vortGrad;
velocity.xy += vortForce;
```
### Variant 2: Cloth Simulation (Mass-Spring-Damper)
**Difference from base version**: Changes from grid-based field equations to a discrete particle system. Each pixel represents a mass point storing 3D position and velocity. Connected to neighbors via spring-dampers, plus gravity, wind force, and collision. Multi-buffer chained iteration (4 passes) implements multiple sub-steps.
**Key code**:
```glsl
// Data layout: left half of texture = position, right half = velocity
// Spring force core
vec3 posdif = getpos(neighbor) - pos;
vec3 veldif = getvel(neighbor) - vel;
float restLen = length(neighborOffset);
force += normalize(posdif) * clamp(length(posdif) - restLen, -1.0, 1.0) * 0.15;
force += normalize(posdif) * dot(normalize(posdif), veldif) * 0.10;
// Sphere collision response
if (length(pos - ballPos) < ballRadius) {
vel -= normalize(pos - ballPos) * dot(normalize(pos - ballPos), vel);
pos = ballPos + normalize(pos - ballPos) * ballRadius;
}
```
> **IMPORTANT: Common Pitfalls**:
> - **Cloth Image Pass must project world coordinates to screen**: You cannot use `uv * vec2(SIZX, SIZY)` to map screen UV to grid ID, because particles have moved from their initial positions, producing scattered fragments. You must iterate over mesh faces, projecting vertex world coordinates to screen space for triangle rasterization
> - GLSL will not implicitly convert between scalar and vector results: `float r = length(dif) / vec2(SIZX, SIZY);` fails to compile because `float / vec2` yields a `vec2`, which cannot be assigned to a `float`. Divide by a scalar instead: `length(dif) / SIZX`
> - `normalize(vec3(0))` produces NaN; all `normalize()` calls must include a length check beforehand
> - In the Image Pass, `getpos`/`getvel` must use the simulation resolution (`iSimResolution`) for UV calculation, not the screen resolution `iResolution`
> - Texel center sampling should use `+0.5` offset (not `+0.01`)
### Variant 3: Rigid Body Physics Engine (Box2D-lite on GPU)
**Difference from base version**: The most complex variant. Uses structured pixel addressing (ECS data layout) to serialize rigid body attributes, joints, contact points, etc., into textures. Buffer A handles integration + collision detection, Buffer B/C/D handle impulse constraint iteration. Requires Common pass to encapsulate a complete physics library.
**Key code**:
```glsl
// Structured memory addressing: map structs to consecutive pixels
int bodyAddress(int b_id) {
return pixel_count_of_Globals + pixel_count_of_Body * b_id;
}
Body loadBody(sampler2D buff, int b_id) {
    Body b;
    int addr = bodyAddress(b_id);
    vec4 d0 = texelFetch(buff, address2D(res, addr), 0);
    vec4 d1 = texelFetch(buff, address2D(res, addr + 1), 0);
    b.pos = d0.xy; b.vel = d0.zw;
    b.ang = d1.x;  b.ang_vel = d1.y; // ...
    return b;
}
// Contact impulse solving
float v_n = dot(dv, contact.normal);
float dp_n = contact.mass_n * (-v_n + contact.bias);
dp_n = max(0.0, dp_n);
body.vel += body.inv_mass * dp_n * contact.normal;
```
### Variant 4: N-Body Vortex Particle Simulation
**Difference from base version**: Changes from field (Eulerian) method to particle (Lagrangian) method. Each particle carries vorticity, and the Biot-Savart law computes the full-field velocity. Uses semi-Newton two-step integration (Buffer A half-step → Buffer B full-step). O(N²) all-pairs interaction.
**Key code**:
```glsl
// Biot-Savart kernel: velocity induced by vorticity w at distance d
// v = w * (-dy, dx) / |d|²
for (int j = 0; j < N; j++)
for (int i = 0; i < N; i++) {
float w = W(i, j);
vec2 d = tex(i, j).xy - pos;
d = (fract(0.5 + d / res) - 0.5) * res; // periodic boundary
float l = dot(d, d);
if (l > 1e-5) F += vec2(-d.y, d.x) * w / l;
}
```
### Variant 5: 3D SPH Particle Fluid
**Difference from base version**: Extends to 3D. Uses Particle Cluster Grid (PCG) for spatial neighborhood management, custom bit packing (5-bit exponent + 9-bit component) to compress particle data into 4 floats. Buffer A handles advection + clustering, Buffer B computes density, Buffer C computes forces + integration, Buffer D computes shadows.
**Key code**:
```glsl
// Map 3D grid to 2D texture
vec2 dim2from3(vec3 p3d) {
float ny = floor(p3d.z / SCALE.x);
float nx = floor(p3d.z) - ny * SCALE.x;
return vec2(nx, ny) * size3d.xy + p3d.xy;
}
// SPH pressure force
float pressure = max(rho / rest_density - 1.0, 0.0);
float SPH_F = force_coef_a * GD(d, 1.5) * pressure;
// Friction + surface tension
float Friction = 0.45 * dot(dir, dvel) * GD(d, 1.5);
float F = surface_tension * GD(d, surface_tension_rad);
p.force += force_k * dir * (F + SPH_F + Friction) * irho / rest_density;
```
## Performance Optimization Details
### 1. Neighborhood Sampling Optimization
- **Bottleneck**: Each pixel samples 4~12 neighbors; texture bandwidth is the main bottleneck
- **Optimization**: Use `texelFetch` instead of `texture` (skips filtering), pre-compute `1.0/iResolution.xy` to avoid repeated division
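A minimal sketch of the `texelFetch` variant of the four-neighbor stencil, written as it would appear inside `mainImage` (the clamp keeps the stencil inside the texture; coordinates are exact integer texels, so no per-sample UV division is needed):

```glsl
// Integer-coordinate neighbor fetch, skipping the filtering path
ivec2 p  = ivec2(fragCoord);
ivec2 mx = ivec2(iResolution.xy) - 1;
float c = texelFetch(iChannel0, p, 0).x;
float l = texelFetch(iChannel0, clamp(p + ivec2(-1, 0), ivec2(0), mx), 0).x;
float r = texelFetch(iChannel0, clamp(p + ivec2( 1, 0), ivec2(0), mx), 0).x;
float d = texelFetch(iChannel0, clamp(p + ivec2( 0,-1), ivec2(0), mx), 0).x;
float u = texelFetch(iChannel0, clamp(p + ivec2( 0, 1), ivec2(0), mx), 0).x;
float lap = l + r + d + u - 4.0 * c;
```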
### 2. N-Body O(N²) Loop Optimization
- **Bottleneck**: All-pairs interaction has O(N²) complexity; N=20 means 400 iterations per frame, N=50 means 2500
- **Optimization**:
- Limit N value (20~30 is enough for good visual results)
- Use "cheap" periodic boundary mode (`fract` instead of 3×3 loop traversal)
- Passive marker particles (90%) don't participate in force computation, only flow passively
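A hedged sketch of the marker-skip idea, reusing the `tex`/`W` macros and `O` state from Step 7. It assumes markers were seeded with zero vorticity at frame 0; skipping them saves the division and cross-product work for 90% of the inner-loop iterations:

```glsl
// Skip passive markers inside the Biot-Savart sum
vec2 F = vec2(0.0);
for (int j = 0; j < N; j++)
for (int i = 0; i < N; i++)
{
    float w = W(i, j);
    if (abs(w) < 1e-8) continue;  // marker particle: induces no velocity
    vec2 d = tex(i, j).xy - O.xy;
    d = (fract(0.5 + d / iResolution.xy) - 0.5) * iResolution.xy;
    float l = dot(d, d);
    if (l > 1e-5) F += vec2(-d.y, d.x) * w / l;
}
```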
### 3. Iteration Count vs. Accuracy Balance
- **Bottleneck**: Fluid/rigid body solvers need multiple iterations, but each buffer can only execute once
- **Optimization**:
- Use 3 chained buffers (A→B→C) for 3 iterations/frame
- 4 chained buffers for cloth (4 sub-steps/frame, each with time step (1/60)/4 s)
- More buffers consume more GPU memory; balance accuracy against resources
### 4. Adaptive Precision
- **Optimization**: Use larger step sizes for screen edges or distant regions
```glsl
// Kelvin wave example: distant pixels use 8× step size
if (abs(U.y * R.y) > 100.0) dx *= 8.0 * abs(U.y);
```
### 5. Data Packing Compression
- **Optimization**: When each particle has more than 4 float attributes, use bit operations for packing
```glsl
// 3D SPH example: 3 floats compressed into 1 uint (5-bit exponent + 3×9-bit components)
uint packvec3(vec3 v) {
int exp = clamp(int(ceil(log2(max(...)))), -15, 15);
float scale = exp2(-float(exp));
uvec3 sv = uvec3(round(clamp(v*scale, -1.0, 1.0) * 255.0) + 255.0);
return uint(exp + 15) | (sv.x << 5) | (sv.y << 14) | (sv.z << 23);
}
```
### 6. Stability Safeguards
- Apply `clamp` to velocity/density to prevent numerical explosion
- Use `smoothstep` for soft boundary decay instead of hard cutoff
- Keep damping coefficients in the 0.95~0.999 range
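The three safeguards together, as a sketch in the context of the Step 4 solver (`c` holds velocity in `.xy` and density in `.z`; the clamp bounds are the same ones used there):

```glsl
// Applied after each integration step
vec2  velocity = c.xy;
float density  = c.z;
velocity  = clamp(velocity, vec2(-24.0), vec2(24.0));   // hard bound against numerical blow-up
density   = clamp(density, 0.5, 3.0);
velocity *= 0.995;                                      // damping in the 0.95~0.999 range
// Soft boundary: fade velocity near the edges instead of a hard cutoff
velocity *= smoothstep(0.5, 0.48, abs(uv - vec2(0.5))); // per-axis fade
```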
## Combination Suggestions in Detail
### 1. Physics Simulation + Post-Processing Rendering
The most common combination. Buffer passes handle physics computation, Image pass handles visualization:
- **Waves + Refraction/Caustics**: Height field gradient drives refraction-offset sampling
- **Fluid + Ink Coloring**: Velocity field advects colored ink particles (Buffer D), with HSV random coloring
- **Cloth + Ray Tracing**: Voxelized spatial tree accelerates cloth surface ray intersection
### 2. Physics Simulation + SDF Rendering
Rigid body/particle position data is passed to the Image pass, rendered as geometry using SDF functions:
- `sdBox(p - bodyPos, bodySize)` renders rigid bodies
- `length(p - particlePos) - radius` renders particles
- Suitable for Box2D-lite rigid body engine visualization
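A sketch of the Image-pass side of this combination. `Body`, `loadBody`, and `sdBox` are assumed from the rigid-body Common pass described in Variant 3; `BODY_COUNT` and the `half_size` field are illustrative names:

```glsl
// Scene SDF over all rigid bodies read from the state texture
float sceneSDF(vec2 p)
{
    float d = 1e6;
    for (int i = 0; i < BODY_COUNT; i++)
    {
        Body b = loadBody(iChannel0, i);
        // Rotate into the body's local frame before evaluating the box SDF
        float cs = cos(-b.ang), sn = sin(-b.ang);
        vec2 q = mat2(cs, -sn, sn, cs) * (p - b.pos);
        d = min(d, sdBox(q, b.half_size));
    }
    return d;
}
```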
### 3. Physics Simulation + Volume Rendering
3D simulations (e.g., SPH) require a volume rendering pipeline:
- Density field trilinear interpolation → ray marching → normal computation → lighting
- Shadows via a separate buffer accumulating optical density along light rays
- Environment map reflections + Fresnel blending
### 4. Multiple Physics System Coupling
- **Fluid + Rigid Bodies**: Fluid velocity field drives rigid body motion; rigid body occupancy modifies fluid boundaries
- **Cloth + Colliders**: Sphere/box shapes for collision detection, cloth elastic response
- **Particles + Fields**: Particles generate fields (density/vorticity), fields in turn drive particles (SPH / Biot-Savart)
### 5. Physics Simulation + Audio Visualization
- Bind audio texture via `iChannel`, mapping spectrum energy to external forces or parameters
- Low frequencies drive large-scale motion, high frequencies drive small-scale vortices/ripples
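A hedged sketch of the mapping, assuming the audio input is bound to `iChannel2`. ShaderToy's 512×2 audio texture stores the FFT spectrum in its bottom row and the raw waveform in its top row, so sampling at y ≈ 0.25 reads the spectrum; `bulkDir`, `extForce`, and `vorticityEps` are illustrative names for the simulation's force direction and confinement strength:

```glsl
// Map spectrum energy to simulation parameters
float bass   = texture(iChannel2, vec2(0.05, 0.25)).x; // low-frequency energy
float treble = texture(iChannel2, vec2(0.70, 0.25)).x; // high-frequency energy
vec2  bulkDir      = vec2(0.0, 1.0);                   // illustrative force direction
vec2  extForce     = bass * bulkDir * 0.5;             // bass drives large-scale motion
float vorticityEps = mix(0.15, 0.45, treble);          // treble sharpens small vortices
```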

# Sound Synthesis — Detailed Reference
This document is a complete reference supplement to [SKILL.md](SKILL.md), covering prerequisites, detailed explanations of each step, in-depth variant descriptions, performance optimization analysis, and complete combination code examples.
## Prerequisites
- **GLSL Fundamentals**: Functions, vector operations, `float`/`vec2` types, math functions like `sin()`/`exp()`/`fract()`
- **Audio Fundamentals**: Sample rate (typically 44100Hz), frequency-to-pitch relationship, waveform concepts (sine, sawtooth, square)
- **Music Theory Basics**: MIDI note numbers, equal temperament, octave relationship (frequency doubles), chord construction
- **ShaderToy Sound Mode**: `vec2 mainSound(int samp, float time)` returns a `vec2` stereo sample value in the range `[-1, 1]`
## Implementation Steps
### Step 1: mainSound Entry Point and Basic Framework
**What**: Establish the standard entry function for a sound shader, outputting a stereo signal.
**Why**: ShaderToy requires the fixed signature `vec2 mainSound(int samp, float time)`, where the return value's `.x` and `.y` are the left and right channels respectively, with a range of `[-1, 1]`. `samp` is the sample index, and `time` is the corresponding time (in seconds).
```glsl
// ShaderToy sound shader basic framework
#define TAU 6.28318530718
#define BPM 120.0 // Adjustable: tempo
#define SPB (60.0 / BPM) // Seconds per beat
vec2 mainSound(int samp, float time) {
vec2 audio = vec2(0.0);
// Layer instruments/tracks here
// audio += instrument(time);
// Master volume control + anti-click fade-in
audio *= 0.5 * smoothstep(0.0, 0.5, time);
return clamp(audio, -1.0, 1.0);
}
```
### Step 2: MIDI Note to Frequency Conversion
**What**: Convert a MIDI note number to its corresponding frequency value.
**Why**: In equal temperament, each semitone up multiplies the frequency by `2^(1/12)`. MIDI 69 = A4 = 440Hz is the standard reference point. This is the foundation of all melodic synthesis.
```glsl
// MIDI note number to frequency
// 69 = A4 = 440Hz, every +12 is one octave (frequency doubles)
float noteFreq(float note) {
return 440.0 * pow(2.0, (note - 69.0) / 12.0);
}
```
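A few spot-checks confirm the conversion and make octave relationships concrete:

```glsl
// Spot-checks (equal temperament):
//   noteFreq(69.0) = 440.00 Hz (A4, the reference)
//   noteFreq(81.0) = 880.00 Hz (A5, +12 semitones doubles the frequency)
//   noteFreq(60.0) ≈ 261.63 Hz (C4)
// Example: derive a bass line one octave below a melody note
float melodyNote = 72.0;                        // C5
float bassFreq   = noteFreq(melodyNote - 12.0); // C4 ≈ 261.63 Hz
```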
### Step 3: Basic Oscillators
**What**: Implement four standard waveform generators — sine, sawtooth, square, and triangle waves.
**Why**: Different waveforms have different harmonic characteristics. Sine waves are pure (fundamental only), sawtooth waves are rich in all harmonics (bright), square waves contain only odd harmonics (hollow), and triangle waves have faster harmonic decay (soft). These four are the building blocks of all timbre synthesis.
```glsl
// Sine wave - pure tone, fundamental only
float osc_sin(float t) {
return sin(TAU * t);
}
// Sawtooth wave - contains all harmonics, bright and sharp
float osc_saw(float t) {
return fract(t) * 2.0 - 1.0;
}
// Square wave - odd harmonics only, hollow texture
float osc_sqr(float t) {
return step(fract(t), 0.5) * 2.0 - 1.0;
}
// Triangle wave - fast harmonic decay, soft and warm
float osc_tri(float t) {
return abs(fract(t) - 0.5) * 4.0 - 1.0;
}
```
### Step 4: Additive Synthesis Instrument
**What**: Build a timbre by layering multiple harmonics (integer multiples of the fundamental), each with independent amplitude and decay rate.
**Why**: The timbre of real instruments is determined by their harmonic content (spectrum). Layering 3-8 harmonics with faster decay for higher harmonics can simulate piano, bell, and other timbres. This is the core technique for additive timbre synthesis.
```glsl
// Additive synthesis instrument with harmonic layering
// freq: fundamental frequency, t: time within note
float instrument_additive(float freq, float t) {
float y = 0.0;
// Layer harmonics: fundamental × 1, 2, 4
// Decreasing amplitude + frequency-dependent decay (higher harmonics decay faster)
y += 0.50 * sin(TAU * 1.00 * freq * t) * exp(-0.0015 * 1.0 * freq * t);
y += 0.30 * sin(TAU * 2.01 * freq * t) * exp(-0.0015 * 2.0 * freq * t);
y += 0.20 * sin(TAU * 4.01 * freq * t) * exp(-0.0015 * 4.0 * freq * t);
// Nonlinear waveshaping to enrich harmonics
y += 0.1 * y * y * y; // Adjustable: 0.0-0.35, higher = more distortion
// Tremolo
y *= 0.9 + 0.1 * cos(40.0 * t); // Adjustable: 40.0 = tremolo frequency
// Smooth attack to avoid clicks
y *= smoothstep(0.0, 0.01, t); // Adjustable: 0.01 = attack time
return y;
}
```
### Step 5: FM Synthesis Instrument
**What**: Use one oscillator's (modulator) output as the phase offset of another oscillator (carrier) to produce rich harmonics.
**Why**: FM synthesis can generate extremely rich timbres with very few oscillators. Varying modulation depth over time can simulate the "bright→dark" decay characteristic of instruments. Electric pianos and sitar-like timbres are both based on this principle.
```glsl
// FM electric piano synthesis
// decayScale: envelope decay multiplier (>1.0 = shorter notes;
// the chord voicings in Step 8 pass 1.25~2.0 here)
vec2 fm_epiano(float freq, float t, float decayScale) {
    // Stereo micro-detuning for chorus effect
    vec2 f0 = vec2(freq * 0.998, freq * 1.002); // Adjustable: detune amount
    // "Glass" layer - high-frequency FM, fast decay → metallic attack quality
    vec2 glass = sin(TAU * (f0 + 3.0) * t
                 + sin(TAU * 14.0 * f0 * t) * exp(-30.0 * t) // Adjustable: 14.0=mod ratio, -30.0=mod decay
                 ) * exp(-4.0 * decayScale * t);             // Adjustable: -4.0 = glass layer decay
    glass = sin(glass); // Second-order nonlinearity
    // "Body" layer - low-frequency FM, slow decay → sustained warm tone
    vec2 body = sin(TAU * f0 * t
               + sin(TAU * f0 * t) * exp(-0.5 * t) * pow(440.0 / f0.x, 0.5) // Low-frequency compensation
               ) * exp(-decayScale * t);                     // Adjustable: body decay
    return (glass + body) * smoothstep(0.0, 0.001, t) * 0.1;
}
// FM synthesis generic instrument (struct-parameterized)
struct Instr {
float att; // Attack speed (higher = faster)
float fo; // Decay rate
float vibe; // Vibrato speed
float vphas; // Vibrato phase
float phas; // FM modulation depth
float dtun; // Detune amount
};
float fm_instrument(float freq, float t, float beatTime, Instr ins) {
float f = freq - beatTime * ins.dtun;
float phase = f * t * TAU;
float vibrato = cos(beatTime * ins.vibe * 3.14159 / 8.0 + ins.vphas * 1.5708);
float fm = sin(phase + vibrato * sin(phase * ins.phas));
float env = exp(-beatTime * ins.fo) * (1.0 - exp(-beatTime * ins.att));
return fm * env * (1.0 - beatTime * 0.125);
}
```
### Step 6: Percussion Synthesis
**What**: Synthesize kick drum, snare/clap, and hi-hat percussion instruments.
**Why**: Percussion is typically composed of pitch sweeps (kick) or noise pulses (hi-hat/clap) with fast envelopes. The kick's core is a sine sweep from high to low frequency; hi-hats are noise with exponential decay. Nearly all complete music shaders require these.
```glsl
// Pseudo-random hash (replaces noise texture)
float hash(float p) {
p = fract(p * 0.1031);
p *= p + 33.33;
p *= p + p;
return fract(p);
}
// 909-style kick drum synthesis
float kick(float t) {
float df = 512.0; // Adjustable: frequency sweep depth
float dftime = 0.01; // Adjustable: sweep time constant
float freq = 60.0; // Adjustable: base frequency
// Exponential frequency sweep: rapidly slides from high to base frequency
float phase = TAU * (freq * t - df * dftime * exp(-t / dftime));
float body = sin(phase) * smoothstep(0.3, 0.0, t) * 1.5;
// Transient noise click
float click = sin(TAU * 8000.0 * fract(t)) * hash(t * 2000.0)
* smoothstep(0.007, 0.0, t);
return body + click;
}
// Hi-hat synthesis (open / closed)
float hihat(float t, float decay) {
// decay: 5.0 = open hat (long decay), 15.0 = closed hat (short decay)
float noise = hash(floor(t * 44100.0)) * 2.0 - 1.0;
return noise * exp(-decay * t) * smoothstep(0.0, 0.02, t);
}
// Clap / snare
float clap(float t) {
float noise = hash(floor(t * 44100.0)) * 2.0 - 1.0;
return noise * smoothstep(0.1, 0.0, t);
}
```
### Step 7: Note Sequence Arrangement
**What**: Implement melody/chord temporal arrangement, determining which note should play at each moment.
**Why**: Music = timbre × timing. ShaderToy has three mainstream arrangement approaches: (A) D() macro accumulation for handwritten melodies, (B) array lookup for complex arrangements, (C) hash pseudo-random for algorithmic composition.
```glsl
// === Approach A: D() Macro Accumulation ===
// Usage: D(duration, MIDI note number) arranged sequentially
// b = accumulated time, x = current note start time, n = current note
#define D(duration, note) b += float(duration); if(t > b) { x = b; n = float(note); }
float melody_macro(float time) {
float t = time / 0.18; // Adjustable: 0.18 = seconds per unit duration
float n = 0.0, b = 0.0, x = 0.0;
D(10,71) D(2,76) D(3,79) D(1,78) D(2,76) D(4,83) D(2,81) D(6,78)
// ... continue arranging notes ...
float freq = noteFreq(n);
float noteTime = 0.18 * (t - x);
return instrument_additive(freq, noteTime);
}
// === Approach B: Array Lookup ===
const float NOTES[16] = float[16](
60., 62., 64., 65., 67., 69., 71., 72., // Adjustable: note sequence
60., 64., 67., 72., 65., 69., 64., 60.
);
float melody_array(float time, float bpm) {
float beat = time * bpm / 60.0;
int idx = int(mod(beat, 16.0));
float noteTime = fract(beat);
float freq = noteFreq(NOTES[idx]);
return instrument_additive(freq, noteTime * 60.0 / bpm);
}
// === Approach C: Hash Pseudo-Random ===
float nse(float x) {
return fract(sin(x * 110.082) * 19871.8972);
}
// Scale quantization: filter out dissonant notes
float scale_filter(float note) {
float n2 = mod(note, 12.0);
// Major scale: filter out semitones 1,3,6,8,10
if (n2==1.||n2==3.||n2==6.||n2==8.||n2==10.) return -100.0;
return note;
}
float melody_random(float time, float bpm) {
float beat = time * bpm / 60.0;
float seqn = nse(floor(beat));
float note = 48.0 + floor(seqn * 24.0); // Adjustable: 48.0=lowest note, 24.0=range
note = scale_filter(note);
float freq = noteFreq(note);
float noteTime = fract(beat) * 60.0 / bpm;
return instrument_additive(freq, noteTime);
}
```
### Step 8: Chord Construction
**What**: Layer multiple notes according to chord relationships to form harmony.
**Why**: A chord is a combination of multiple pitches sounding simultaneously. The common structure is root + third + fifth (a triad); adding the seventh and ninth degrees produces the extended voicings used in jazz chord progressions.
```glsl
// Chord construction
vec2 chord(float time, float root, float isMinor) {
vec2 result = vec2(0.0);
float bass = root - 24.0; // Root two octaves lower
// Root (bass)
result += fm_epiano(noteFreq(bass), time, 2.0);
// Root
result += fm_epiano(noteFreq(root), time - SPB * 0.5, 1.25);
// Third (major third = 4 semitones, minor third = 3 semitones)
result += fm_epiano(noteFreq(root + 4.0 - isMinor), time - SPB, 1.5);
// Fifth
result += fm_epiano(noteFreq(root + 7.0), time - SPB * 0.5, 1.25);
// Seventh
result += fm_epiano(noteFreq(root + 11.0 - isMinor), time - SPB, 1.5);
// Ninth
result += fm_epiano(noteFreq(root + 14.0), time - SPB, 1.5);
return result;
}
```
### Step 9: Delay and Reverb Effects
**What**: Simulate spatial echo and reverb effects by layering time-offset copies of the audio signal.
**Why**: Dry audio sounds "flat". Multi-tap delay creates spatial depth by layering signal copies at different delays and decay amounts. Ping-pong delay bounces alternately between left and right channels, enhancing stereo width.
```glsl
// Multi-tap echo/reverb
// NOTE: in GLSL ES 3.00, "sample" is a reserved word — use "samp" instead
vec2 echo_reverb(float time) {
vec2 tot = vec2(0.0);
float hh = 1.0;
for (int i = 0; i < 6; i++) { // Adjustable: 6 = echo count
float h = float(i) / 5.0;
float delayedTime = time - 0.7 * h; // Adjustable: 0.7 = echo interval
// Call your instrument function to get audio at that time point
float samp = get_instrument_sample(delayedTime);
// Stereo spread: each echo has different L/R ratio
tot += samp * vec2(0.5 + 0.1 * h, 0.5 - 0.1 * h) * hh;
hh *= 0.5; // Adjustable: 0.5 = decay per echo
}
return tot;
}
// Ping-pong stereo delay
vec2 pingpong_delay(float time) {
vec2 mx = get_stereo_sample(time) * 0.5;
float ec = 0.4; // Adjustable: initial echo volume
float fb = 0.6; // Adjustable: feedback decay coefficient
float delay_time = 0.222; // Adjustable: delay time (seconds)
float et = delay_time;
// 4 alternating left/right ping-pong taps
mx += get_stereo_sample(time - et) * ec * vec2(1.0, 0.5); ec *= fb; et += delay_time;
mx += get_stereo_sample(time - et) * ec * vec2(0.5, 1.0); ec *= fb; et += delay_time;
mx += get_stereo_sample(time - et) * ec * vec2(1.0, 0.5); ec *= fb; et += delay_time;
mx += get_stereo_sample(time - et) * ec * vec2(0.5, 1.0); ec *= fb; et += delay_time;
return mx;
}
```
### Step 10: Beat and Arrangement Structure
**What**: Define a time grid using BPM, arrange different instruments at different beat positions, and control the overall song structure (intro, verse, interlude, etc.).
**Why**: The rhythmic skeleton of music is built on a uniform beat grid. Using `floor(time * BPM / 60)` gets the current beat number, and `fract()` gets the position within the beat. `smoothstep` gating controls instrument entry and exit at specific sections.
```glsl
vec2 mainSound(int samp, float time) {
vec2 audio = vec2(0.0);
float beat = time * BPM / 60.0; // Current beat count
float bar = beat / 4.0; // Current bar (4/4 time)
float beatInBar = mod(beat, 4.0); // Beat position within bar
// --- Rhythm layer ---
// Kick: trigger every beat
float kickTime = mod(time, SPB);
audio += vec2(kick(kickTime) * 0.5);
// Hi-hat: trigger every half beat
float hatTime = mod(time, SPB * 0.5);
audio += vec2(hihat(hatTime, 15.0) * 0.15);
// --- Melody layer ---
audio += vec2(melody_array(time, BPM)) * 0.3;
// --- Arrangement automation ---
// Use smoothstep to control instrument entry/exit
float introFade = smoothstep(0.0, 4.0, bar); // Fade in over first 4 bars
float dropGate = smoothstep(16.0, 16.1, bar); // Drop gate at bar 16 — multiply into whichever layer should enter at the drop
audio *= introFade;
// Master volume + anti-click
audio *= 0.35 * smoothstep(0.0, 0.5, time);
return clamp(audio, -1.0, 1.0);
}
```
## Variant Details
### Variant 1: Subtractive Synthesis / TB-303 Acid Synthesizer
**Difference from basic version**: Instead of building timbre by layering harmonics, generates a harmonic-rich waveform (sawtooth) and then sculpts it with a resonant low-pass filter to remove high frequencies. The filter cutoff frequency is modulated by an envelope, producing the classic "wah" sound.
**Key modified code**:
```glsl
#define NSPC 128 // Adjustable: synthesis harmonic count (higher = better quality)
// Resonant low-pass frequency response
float lpf_response(float h, float cutoff, float reso) {
cutoff -= 20.0;
float df = max(h - cutoff, 0.0);
float df2 = abs(h - cutoff);
return exp(-0.005 * df * df) * 0.5 // Adjustable: -0.005 = rolloff slope
+ exp(df2 * df2 * -0.1) * reso; // Adjustable: resonance peak
}
// TB-303 acid synthesizer
vec2 acid_synth(float freq, float noteTime) {
vec2 v = vec2(0.0);
// Envelope-driven filter cutoff frequency
float cutoff = exp(noteTime * -1.5) * 50.0 // Adjustable: -1.5=envelope speed, 50.0=sweep range
+ 10.0; // Adjustable: minimum cutoff
float sqr = step(0.5, fract(noteTime * 4.5)); // Sawtooth/square switching
for (int i = 0; i < NSPC; i++) {
float h = float(i + 1);
float inten = 1.0 / h; // Sawtooth spectrum
inten = mix(inten, inten * mod(h, 2.0), sqr); // Square wave variant
inten *= lpf_response(h, cutoff, 2.2);
v.x += inten * sin((TAU + 0.01) * noteTime * freq * h);
v.y += inten * sin(TAU * noteTime * freq * h);
}
float amp = smoothstep(0.05, 0.0, abs(noteTime - 0.31) - 0.26)
* exp(noteTime * -1.0);
return clamp(v * amp * 2.0, -1.0, 1.0);
}
```
### Variant 2: IIR Biquad Filter
**Difference from basic version**: Uses a time-domain IIR filter based on the Audio EQ Cookbook instead of frequency-domain methods. The cookbook covers the full family of filter types — low-pass, high-pass, band-pass, notch, all-pass, peaking, and shelving — and behaves much closer to real hardware. Requires maintaining past sample state.
**Key modified code**:
```glsl
// Sawtooth oscillator (sample-domain, anti-aliasing friendly)
float waveSaw(float freq, int samp) {
return fract(freq * float(samp) / iSampleRate) * 2.0 - 1.0;
}
// Stereo widening
vec2 widerSaw(float freq, int samp) {
int offset = int(freq) * 64; // Adjustable: 64 = width factor
return vec2(waveSaw(freq, samp - offset), waveSaw(freq, samp + offset));
}
// Biquad low-pass filter coefficient calculation
void biquadLPF(float freq, float Q, float sr,
out float b0, out float b1, out float b2,
out float a0, out float a1, out float a2) {
float omega = TAU * freq / sr;
float sn = sin(omega), cs = cos(omega);
float alpha = sn / (2.0 * Q); // Adjustable: Q = resonance (0.5-20)
b0 = (1.0 - cs) * 0.5;
b1 = 1.0 - cs;
b2 = (1.0 - cs) * 0.5;
a0 = 1.0 + alpha;
a1 = -2.0 * cs;
a2 = 1.0 - alpha;
}
```
### Variant 3: Vocal / Formant Synthesis
**Difference from basic version**: Uses a sinusoidal tract model to simulate the human voice. By setting formants at different frequencies with their bandwidths, vowels can be synthesized. Consonants are implemented through fricative noise.
**Key modified code**:
```glsl
// Vocal tract formant model
float tract(float x, float formantFreq, float bandwidth) {
return sin(TAU * formantFreq * x)
* exp(-bandwidth * 3.14159 * x);
}
// "Ah" vowel synthesis
float vowel_aah(float t, float pitch) {
float period = 1.0 / pitch;
float x = mod(t, period);
// Formant frequencies and bandwidths (Hz) — adjustable to simulate different vowels
float aud = tract(x, 710.0, 70.0) * 0.5 // F1: 710Hz ('a' vowel)
+ tract(x, 1000.0, 90.0) * 0.6 // F2: 1000Hz
+ tract(x, 2450.0, 140.0) * 0.4; // F3: 2450Hz
return aud;
}
// Fricative consonant noise
float fricative(float t, float formantFreq) {
return (hash11(floor(formantFreq * t) * 20.0) - 0.5) * 3.0;
}
```
### Variant 4: Algorithmic Composition (Generative Music)
**Difference from basic version**: Does not use handwritten note sequences; instead uses hash functions to generate pseudo-random melodies, with scale quantization to ensure harmonic consistency. Multi-level rhythmic subdivision (1-beat/2-beat/4-beat) produces fractal-like musical structure.
**Key modified code**:
```glsl
// 8-note pseudo-random loop
vec2 noteRing(float n) {
float r = 0.5 + 0.5 * fract(sin(mod(floor(n), 32.123) * 32.123) * 41.123);
n = mod(n, 8.0);
// Adjustable: modify these intervals to change the melodic character
float note = n<1.?0. : n<2.?5. : n<3.?-2. : n<4.?4. : n<5.?7. : n<6.?4. : n<7.?2. : 0.;
return vec2(note, r); // (interval, volume)
}
// FBM-style layered note generation
vec2 generativeNote(float beat) {
    float b = floor(beat * 0.25);    // Quarter-speed base clock
    // Large-scale + medium-scale + small-scale layering
    vec2 note = noteRing(b * 0.0625)
              + noteRing(b * 0.25)
              + noteRing(b);
    return note;
}
```
### Variant 5: Chord Progression System (Circle of Fifths)
**Difference from basic version**: Automatically generates harmonic progressions based on the circle of fifths interval. Every 4 beats advances one fifth (+7 semitones), automatically alternating major/minor chords with jazz chord extensions (seventh, ninth).
**Key modified code**:
```glsl
vec2 mainSound(int samp, float time) {
float id = floor(time / SPB / 4.0); // Current chord number
float offset = id * 7.0; // Circle of fifths: +7 semitones per step
float minor = mod(id, 4.0) >= 3.0 ? 1.0 : 0.0; // Every 4th chord is minor
float t = mod(time, SPB * 4.0);
float root = 57.0 + mod(offset, 12.0); // Adjustable: 57.0 = starting root (A3)
vec2 result = chord(t, root, minor);
// Two-tap ping-pong delay
result += vec2(0.5, 0.2) * chord(t - SPB * 0.5, root, minor);
result += vec2(0.05, 0.1) * chord(t - SPB, root, minor);
return result;
}
```
## Performance Optimization Details
1. **Reduce Harmonic Count**: In additive synthesis and frequency-domain filters, the harmonic count (`NUM_HARMONICS` / `NSPC`) is the biggest performance bottleneck. Start with 4-8 harmonics and don't add more once the sound is satisfactory. Using 256 harmonics is an extreme case.
2. **Avoid Sample History in Loops**: IIR filters need to process 128 historical samples, meaning each output sample requires 128 loop iterations. Prefer frequency-domain methods or reduce `PAST_SAMPLES`.
3. **Simplify Echo/Delay**: Each delay tap requires recomputing the complete signal chain. 4 taps means 5x computation. Consider reducing the complexity (fewer harmonics) for delayed signals.
4. **Use `fract()` Instead of `mod()`**: When the divisor is 1.0, `fract(x)` is faster than `mod(x, 1.0)`.
5. **Precompute Constants**: Move loop-invariant expressions like `TAU * freq` outside the loop.
6. **Use the Common Pass**: Place constant definitions and shared functions in ShaderToy's Common tab, accessible by both Sound and Image, avoiding redundant computation of BPM/SPB, etc.
## Combination Suggestions
### 1. Combining with Audio Visualization
Sound shader output can be read in the Image shader via `iChannel0` (set to this shader's Sound output). Use `texture(iChannel0, vec2(freq, 0.0))` to get spectrum data to drive visual effects (waveforms, spectrum bar charts, etc.).
### 2. Combining with Raymarching Scenes
Sound-visual synchronization can be achieved by sharing timeline/cue events. Define shared timeline/cue events in the Common Pass, referenced by both Sound and Image shaders simultaneously, ensuring visual-audio synchronization.
### 3. Combining with Particle Systems
Use beat events (kick trigger moments) to drive particle emission. In the Image shader, use the same BPM/SPB to calculate the current beat position, and increase particle count or velocity at the kick trigger moment.
### 4. Combining with Post-Processing Effects
Share Sound shader envelope values (e.g., sidechain compression coefficient) with the Image shader via the Common Pass, driving bloom intensity, color shifting, screen shake, and other effects.
### 5. Combining with Text/Graphic Overlays
Use a text-rendering helper (such as a user-defined `message()` function) in the Image shader to render text hints, parameter displays, or interaction instructions to help users understand what is being played.
# Heightfield Ray Marching Terrain Rendering — Detailed Reference
> This document is a detailed supplement to [SKILL.md](SKILL.md), covering prerequisites, complete explanations for each step (what/why), variant details, in-depth performance optimization analysis, and complete code examples for combination suggestions.
## Prerequisites
- **GLSL Fundamentals**: uniforms, varyings, built-in functions (mix, smoothstep, clamp, fract, floor)
- **Vector Math**: dot product, cross product, matrix transforms, normal calculation
- **Basic Ray Marching Concepts**: casting rays from the camera, advancing along rays, detecting intersections
- **Noise Functions**: basic principles of Value Noise / Gradient Noise (grid sampling + interpolation)
- **FBM (Fractal Brownian Motion)**: layering multiple noise octaves to build fractal detail
## Implementation Steps
### Step 1: Noise and Hash Functions
**What**: Implement 2D Value Noise, providing the fundamental sampling capability for FBM.
**Why**: Terrain shaders build terrain from noise. Value Noise generates a continuous pseudo-random field through grid point hashing + bilinear interpolation. A sinless fract-dot hash avoids precision issues with `sin()` on some GPUs. Interpolation uses Hermite smoothstep `3t²-2t³` to ensure C¹ continuity.
**Code**:
```glsl
// === Hash Function ===
// High-quality hash without sin
// Uses fract-dot pattern, avoiding sin() precision issues
float hash(vec2 p) {
vec3 p3 = fract(vec3(p.xyx) * 0.1031);
p3 += dot(p3, p3.yzx + 19.19);
return fract((p3.x + p3.y) * p3.z);
}
// === 2D Value Noise ===
// Grid sampling + Hermite interpolation, returns [0,1]
float noise(in vec2 p) {
vec2 i = floor(p);
vec2 f = fract(p);
vec2 u = f * f * (3.0 - 2.0 * f); // Hermite smoothstep
float a = hash(i + vec2(0.0, 0.0));
float b = hash(i + vec2(1.0, 0.0));
float c = hash(i + vec2(0.0, 1.0));
float d = hash(i + vec2(1.0, 1.0));
return mix(mix(a, b, u.x), mix(c, d, u.x), u.y);
}
```
### Step 2: Noise with Analytical Derivatives (Advanced)
**What**: Return the noise value along with its analytical partial derivatives `∂n/∂x` and `∂n/∂y`.
**Why**: Analytical derivatives are key to implementing "eroded terrain" — accumulating derivatives in FBM can suppress detail layering on steep slopes (used in Step 3). This technique is widely used in terrain shaders. The derivative formula comes from chain rule differentiation of Hermite interpolation: `du = 6f(1-f)`.
**Code**:
```glsl
// === 2D Value Noise with Analytical Derivatives ===
// Returns vec3: .x = noise value, .yz = partial derivatives (dn/dx, dn/dy)
vec3 noised(in vec2 p) {
vec2 i = floor(p);
vec2 f = fract(p);
// Hermite interpolation and its derivative
vec2 u = f * f * (3.0 - 2.0 * f);
vec2 du = 6.0 * f * (1.0 - f);
float a = hash(i + vec2(0.0, 0.0));
float b = hash(i + vec2(1.0, 0.0));
float c = hash(i + vec2(0.0, 1.0));
float d = hash(i + vec2(1.0, 1.0));
float value = a + (b - a) * u.x + (c - a) * u.y + (a - b - c + d) * u.x * u.y;
vec2 deriv = du * (vec2(b - a, c - a) + (a - b - c + d) * u.yx);
return vec3(value, deriv);
}
```
### Step 3: FBM Terrain Heightfield (with Derivative Erosion)
**What**: Layer multiple noise octaves to build a terrain heightfield, using derivative accumulation to simulate erosion effects.
**Why**: FBM is the terrain generation core. The key difference is **whether derivative suppression is used**:
- **Without derivatives**: simple layering, terrain appears more "rough"
- **With derivative suppression**: the `1/(1+dot(d,d))` term suppresses high-frequency detail on steep slopes, producing realistic ridge/valley structures
The rotation matrix `m2` rotates sampling coordinates between each layer, breaking axis-aligned visual banding. `mat2(0.8,-0.6, 0.6,0.8)` rotates approximately 37° with unit determinant (pure rotation, no scaling) — a standard choice for terrain FBM.
**Code**:
```glsl
#define TERRAIN_OCTAVES 9 // Tunable: 3=rough outline, 9=medium detail, 16=highest precision (for normals)
#define TERRAIN_SCALE 0.003 // Tunable: controls terrain spatial frequency, smaller = "wider" terrain
#define TERRAIN_HEIGHT 120.0 // Tunable: terrain elevation scale
// Per-layer rotation matrix: ~37° pure rotation, eliminates axis-aligned banding
const mat2 m2 = mat2(0.8, -0.6, 0.6, 0.8);
// === FBM Terrain Heightfield (Derivative Erosion Version) ===
// Input: 2D world coordinates (xz plane)
// Output: scalar height value
float terrain(in vec2 p) {
p *= TERRAIN_SCALE;
float a = 0.0; // Accumulated height
float b = 1.0; // Current amplitude
vec2 d = vec2(0.0); // Accumulated derivatives
for (int i = 0; i < TERRAIN_OCTAVES; i++) {
vec3 n = noised(p); // .x=value, .yz=derivatives
d += n.yz; // Accumulate gradient
a += b * n.x / (1.0 + dot(d, d)); // Derivative suppression: contribution reduced on steep slopes
b *= 0.5; // Amplitude halved per layer
p = m2 * p * 2.0; // Rotate + double frequency
}
return a * TERRAIN_HEIGHT;
}
```
### Step 4: LOD Multi-Resolution Terrain Functions
**What**: Create terrain functions at different precision levels for different purposes.
**Why**: This is a classic optimization — ray marching only needs rough height (fewer FBM layers), normal calculation needs detail (more FBM layers), and camera placement only needs the coarsest estimate. A dual-function scheme (coarse for marching, fine for normals) is standard practice in terrain shaders.
**Code**:
```glsl
#define OCTAVES_LOW 3 // Tunable: for camera placement, fastest
#define OCTAVES_MED 9 // Tunable: for ray marching
#define OCTAVES_HIGH 16 // Tunable: for normal calculation, finest detail
// Low precision (camera height, far distance)
float terrainL(in vec2 p) {
p *= TERRAIN_SCALE;
float a = 0.0, b = 1.0;
vec2 d = vec2(0.0);
for (int i = 0; i < OCTAVES_LOW; i++) {
vec3 n = noised(p);
d += n.yz;
a += b * n.x / (1.0 + dot(d, d));
b *= 0.5;
p = m2 * p * 2.0;
}
return a * TERRAIN_HEIGHT;
}
// Medium precision (ray marching)
float terrainM(in vec2 p) {
p *= TERRAIN_SCALE;
float a = 0.0, b = 1.0;
vec2 d = vec2(0.0);
for (int i = 0; i < OCTAVES_MED; i++) {
vec3 n = noised(p);
d += n.yz;
a += b * n.x / (1.0 + dot(d, d));
b *= 0.5;
p = m2 * p * 2.0;
}
return a * TERRAIN_HEIGHT;
}
// High precision (normal calculation)
float terrainH(in vec2 p) {
p *= TERRAIN_SCALE;
float a = 0.0, b = 1.0;
vec2 d = vec2(0.0);
for (int i = 0; i < OCTAVES_HIGH; i++) {
vec3 n = noised(p);
d += n.yz;
a += b * n.x / (1.0 + dot(d, d));
b *= 0.5;
p = m2 * p * 2.0;
}
return a * TERRAIN_HEIGHT;
}
```
### Step 5: Adaptive Step Size Ray Marching
**What**: Cast rays from the camera and advance along the ray with adaptive steps, finding the intersection with the terrain heightfield.
**Why**: Terrain is a heightfield (not an arbitrary SDF), so `ray.y - terrain(ray.xz)` can be used as a conservative step size estimate. Common terrain shaders employ three strategies:
- **Conservative factor approach**: `step = 0.4 × h` (conservative factor 0.4, prevents overshooting sharp ridges, 300 steps)
- **Relaxation marching**: `step = h × max(t×0.02, 1.0)`, step size automatically increases with distance (90 steps covering greater range)
- **Adaptive marching + binary refinement**: adaptive marching + 5 binary refinement steps (150 steps + precise intersection)
This template uses the conservative factor approach + distance-adaptive precision threshold, balancing accuracy and efficiency.
**Code**:
```glsl
#define MAX_STEPS 300 // Tunable: march steps, 80=fast, 300=high quality
#define MAX_DIST 5000.0 // Tunable: maximum render distance
#define STEP_FACTOR 0.4 // Tunable: march conservative factor, 0.3=safe, 0.8=aggressive
// === Ray Marching ===
// ro: ray origin, rd: ray direction (normalized)
// Returns: intersection distance t (-1.0 means miss)
float raymarch(in vec3 ro, in vec3 rd) {
float t = 0.0;
// Upper bound clipping: skip if ray cannot possibly hit terrain
// Assumes terrain max height is TERRAIN_HEIGHT
if (ro.y > TERRAIN_HEIGHT && rd.y >= 0.0) return -1.0;
if (ro.y > TERRAIN_HEIGHT) {
t = (ro.y - TERRAIN_HEIGHT) / (-rd.y); // Fast jump to terrain height upper bound
}
for (int i = 0; i < MAX_STEPS; i++) {
vec3 pos = ro + t * rd;
float h = pos.y - terrainM(pos.xz); // Height difference = ray y - terrain height
// Adaptive precision: tolerate larger error at distance (screen-space equivalent)
if (abs(h) < 0.0015 * t) break;
if (t > MAX_DIST) return -1.0;
t += STEP_FACTOR * h; // Advance proportionally to height difference
}
return t;
}
```
### Step 6: Binary Refinement (Optional)
**What**: Perform binary search near the rough intersection found by ray marching to precisely locate the terrain surface.
**Why**: Ray marching only guarantees the intersection is within some interval; binary search converges the error by 2^5=32x. This is especially important for sharp ridge silhouettes. A similar "step-back-and-halve" strategy is common in terrain shaders.
**Code**:
```glsl
#define BISECT_STEPS 5 // Tunable: binary search steps, 5 steps = 32x precision improvement
// === Binary Refinement ===
// ro: ray origin, rd: ray direction
// tNear: last t above terrain, tFar: first t below terrain
float bisect(in vec3 ro, in vec3 rd, float tNear, float tFar) {
for (int i = 0; i < BISECT_STEPS; i++) {
float tMid = 0.5 * (tNear + tFar);
vec3 pos = ro + tMid * rd;
float h = pos.y - terrainM(pos.xz);
if (h > 0.0) {
tNear = tMid; // Still above terrain, advance forward
} else {
tFar = tMid; // Below terrain, pull back
}
}
return 0.5 * (tNear + tFar);
}
```
### Step 7: Normal Calculation
**What**: Compute terrain surface normals at the intersection point using finite differences.
**Why**: Normals are the foundation of all lighting calculations. A key optimization is **epsilon increasing with distance** — using coarser epsilon at distance avoids aliasing from high-frequency noise. The high-precision terrain function `terrainH` is used here for normal detail.
**Code**:
```glsl
// === Normal Calculation (Finite Differences) ===
// pos: surface intersection point, t: distance (for adaptive epsilon)
vec3 calcNormal(in vec3 pos, float t) {
// Adaptive epsilon: fine up close, coarse at distance (avoids aliasing)
float eps = 0.02 + 0.00005 * t * t;
float hC = terrainH(pos.xz);
float hR = terrainH(pos.xz + vec2(eps, 0.0));
float hU = terrainH(pos.xz + vec2(0.0, eps));
// Finite difference normal
return normalize(vec3(hC - hR, eps, hC - hU));
}
```
### Step 8: Material and Color Assignment
**What**: Blend different material colors based on height, slope, noise, and other conditions.
**Why**: Natural terrain color layering is key to visual convincingness. Nearly all terrain shaders follow this layering logic:
- **Rock**: steep surfaces (small normal y component) → gray rock
- **Grass**: flat low-altitude surfaces → green
- **Snow**: high-altitude flat surfaces → white
- **Sand**: near water level → sand color
Use `smoothstep` for smooth transitions between layers and FBM noise to break up transition line regularity.
**Code**:
```glsl
#define SNOW_HEIGHT 80.0 // Tunable: snow line altitude
#define TREE_HEIGHT 45.0 // Tunable: tree line altitude
#define BEACH_HEIGHT 1.5 // Tunable: beach height
// === Material Color ===
// pos: world coordinates, nor: normal
vec3 getMaterial(in vec3 pos, in vec3 nor) {
// Slope factor: nor.y=1 means horizontal, nor.y=0 means vertical
float slope = nor.y;
float h = pos.y;
// Noise to break up transition lines
float nz = noise(pos.xz * 0.04) * noise(pos.xz * 0.005);
// Base rock color
vec3 rock = vec3(0.10, 0.09, 0.08);
// Dirt/grass color (flat surfaces)
vec3 grass = mix(vec3(0.10, 0.08, 0.04), vec3(0.05, 0.09, 0.02), nz);
// Snow color
vec3 snow = vec3(0.62, 0.65, 0.70);
// Sand color
vec3 sand = vec3(0.50, 0.45, 0.35);
// --- Layered blending ---
vec3 col = rock;
// Flat areas: rock → grass
col = mix(col, grass, smoothstep(0.5, 0.8, slope));
// High altitude: → snow (slope + height + noise)
float snowMask = smoothstep(SNOW_HEIGHT - 20.0 * nz, SNOW_HEIGHT + 10.0, h)
* smoothstep(0.3, 0.7, slope);
col = mix(col, snow, snowMask);
// Low altitude: → sand
float beachMask = smoothstep(BEACH_HEIGHT + 1.0, BEACH_HEIGHT - 0.5, h)
* smoothstep(0.5, 0.9, slope);
col = mix(col, sand, beachMask);
return col;
}
```
### Step 9: Lighting Model
**What**: Implement multi-component lighting: sun diffuse + hemisphere ambient light + backlight fill + specular.
**Why**: Terrain lighting models share consistent core components:
- **Lambert Diffuse**: `dot(N, L)` — fundamental component
- **Hemisphere Ambient**: `0.5 + 0.5 * N.y` — standard terrain ambient lighting
- **Backlight**: fill light from the horizontal direction opposite the sun
- **Fresnel Rim Light**: `pow(1+dot(rd,N), 2~5)` — edge glow effect
- **Specular**: Phong/Blinn-Phong, power ranging from 3 to 500
**Code**:
```glsl
#define SUN_DIR normalize(vec3(0.8, 0.4, -0.6)) // Tunable: sun direction
#define SUN_COL vec3(8.0, 5.0, 3.0) // Tunable: sun color temperature (warm light)
#define SKY_COL vec3(0.5, 0.7, 1.0) // Tunable: sky color
// === Lighting Calculation ===
vec3 calcLighting(in vec3 pos, in vec3 nor, in vec3 rd, float shadow) {
vec3 sunDir = SUN_DIR;
// Diffuse (Lambert)
float dif = clamp(dot(nor, sunDir), 0.0, 1.0);
// Hemisphere ambient: facing up=full brightness, facing down=half brightness
float amb = 0.5 + 0.5 * nor.y;
// Backlight fill (horizontal direction opposite the sun)
vec3 backDir = normalize(vec3(-sunDir.x, 0.0, -sunDir.z));
float bac = clamp(0.2 + 0.8 * dot(nor, backDir), 0.0, 1.0);
// Fresnel rim light
float fre = pow(clamp(1.0 + dot(rd, nor), 0.0, 1.0), 2.0);
// Specular (Blinn-Phong)
vec3 hal = normalize(sunDir - rd);
float spe = pow(clamp(dot(nor, hal), 0.0, 1.0), 16.0)
* (0.04 + 0.96 * pow(1.0 + dot(hal, rd), 5.0)); // Fresnel term
// Combine
vec3 lin = vec3(0.0);
lin += dif * shadow * SUN_COL * 0.1; // Sun diffuse
lin += amb * SKY_COL * 0.2; // Sky ambient
lin += bac * vec3(0.15, 0.05, 0.04); // Backlight (warm tone)
lin += fre * SKY_COL * 0.3; // Rim light
lin += spe * shadow * SUN_COL * 0.05; // Specular
return lin;
}
```
### Step 10: Soft Shadows
**What**: Cast a shadow ray from the surface intersection point toward the sun, computing soft shadows with penumbra.
**Why**: Soft shadows greatly enhance terrain spatial depth. The classic technique — during shadow ray marching, track `min(k*h/t)`, where h is the height distance from the terrain and t is the march distance. A smaller ratio = the ray grazes the terrain surface = penumbra region. The k parameter controls penumbra softness (k=16 for soft, k=64 for hard).
**Code**:
```glsl
#define SHADOW_STEPS 80 // Tunable: shadow ray steps, 32=fast, 80=high quality
#define SHADOW_K 16.0 // Tunable: penumbra softness, 8=very soft, 64=very hard
// === Soft Shadows ===
// pos: surface point, sunDir: sun direction
float calcShadow(in vec3 pos, in vec3 sunDir) {
float res = 1.0;
float t = 1.0; // Start slightly above the surface to avoid self-intersection
for (int i = 0; i < SHADOW_STEPS; i++) {
vec3 p = pos + t * sunDir;
float h = p.y - terrainM(p.xz);
if (h < 0.001) return 0.0; // Full shadow
// Penumbra estimate: smaller h/t = ray closer to occlusion
res = min(res, SHADOW_K * h / t);
t += clamp(h, 2.0, 100.0); // Adaptive step size
}
return clamp(res, 0.0, 1.0);
}
```
### Step 11: Aerial Perspective and Fog
**What**: Blend terrain color toward fog color with increasing distance, achieving an aerial perspective effect.
**Why**: Atmospheric effects are the key visual cue for "pushing" pixels into the distance. Common approaches range from simple to complex:
- **Exponential fog**: `exp(-0.00005 * t^2)` — simplest
- **Nonlinear exponential fog**: `exp(-pow(k*t, 1.5))` — falls off faster than plain exponential fog at distance; a height term can be added on top to make fog denser at low altitude and thinner at high altitude
- **Wavelength-dependent fog**: `exp(-t * vec3(1,1.5,4) * k)` — blue light attenuates faster, red light travels further, realistic atmospheric dispersion
- **Full Rayleigh+Mie scattering**: physically accurate but expensive
**Code**:
```glsl
#define FOG_DENSITY 0.00025 // Tunable: fog density
#define FOG_HEIGHT 0.001 // Tunable: height decay coefficient (reserved for a height-dependent variant; not used by applyFog below)
// === Atmospheric Fog ===
// col: original color, t: distance, rd: ray direction
vec3 applyFog(in vec3 col, float t, in vec3 rd) {
// Wavelength-dependent attenuation: blue attenuates 4x faster than red
vec3 extinction = exp(-t * FOG_DENSITY * vec3(1.0, 1.5, 4.0));
// Fog color: base blue-gray + sun direction scattering (warm tones)
float sundot = clamp(dot(rd, SUN_DIR), 0.0, 1.0);
vec3 fogCol = mix(vec3(0.55, 0.55, 0.58), // Base fog color
vec3(1.0, 0.7, 0.3), // Sun scatter color
0.3 * pow(sundot, 8.0));
return col * extinction + fogCol * (1.0 - extinction);
}
```
### Step 12: Sky Rendering
**What**: Draw the background sky, including gradients, sun disk, and horizon glow.
**Why**: The sky is an important component of atmospheric mood. All terrain shaders with 3D viewpoints include sky rendering. Key components:
- Zenith-to-horizon blue→white gradient
- Horizon glow band (`pow(1-rd.y, n)` family)
- Sun disk and halo (`pow(sundot, high power)` family)
**Code**:
```glsl
// === Sky Color ===
vec3 getSky(in vec3 rd) {
// Base sky gradient: zenith blue → horizon white
vec3 col = vec3(0.3, 0.5, 0.85) - rd.y * vec3(0.2, 0.15, 0.0);
// Horizon glow
float horizon = pow(1.0 - max(rd.y, 0.0), 4.0);
col = mix(col, vec3(0.8, 0.75, 0.7), 0.5 * horizon);
// Sun
float sundot = clamp(dot(rd, SUN_DIR), 0.0, 1.0);
col += vec3(1.0, 0.7, 0.3) * 0.3 * pow(sundot, 8.0); // Large halo
col += vec3(1.0, 0.9, 0.7) * 0.5 * pow(sundot, 64.0); // Small halo
col += vec3(1.0, 1.0, 0.9) * min(pow(sundot, 1150.0), 0.3); // Sun disk
return col;
}
```
### Step 13: Camera Setup
**What**: Build a Look-At camera matrix and define a flight path.
**Why**: Terrain flythrough cameras typically follow Lissajous curves or arc paths, with altitude following the terrain. The Look-At matrix maps screen coordinates to world-space ray directions.
**Code**:
```glsl
#define CAM_ALTITUDE 20.0 // Tunable: camera height above ground
#define CAM_SPEED 0.5 // Tunable: flight speed
// === Camera Path ===
vec3 cameraPath(float t) {
return vec3(
100.0 * sin(0.2 * t), // x: sine curve
0.0, // y: determined by terrain height
-100.0 * t // z: forward direction
);
}
// === Camera Matrix ===
mat3 setCamera(in vec3 ro, in vec3 ta) {
vec3 cw = normalize(ta - ro);
vec3 cu = normalize(cross(cw, vec3(0.0, 1.0, 0.0)));
vec3 cv = cross(cu, cw);
return mat3(cu, cv, cw);
}
```
## Common Variants
### Variant 1: Relaxation Marching
**Difference from the base version**: Step size automatically increases with distance, covering greater range but with slightly reduced precision. The conservative factor is replaced with a distance-adaptive relaxation factor, while the height estimate is scaled down to prevent penetration.
**Key code**:
```glsl
#define RELAX_MAX_STEPS 90 // Fewer steps needed to cover greater distance
#define RELAX_FAR 400.0
float raymarchRelax(in vec3 ro, in vec3 rd) {
float t = 0.0;
float d = (ro + rd * t).y - terrainM((ro + rd * t).xz);
for (int i = 0; i < RELAX_MAX_STEPS; i++) {
if (abs(d) < t * 0.0001 || t > RELAX_FAR) break;
float rl = max(t * 0.02, 1.0); // Relaxation factor: larger steps at distance
t += d * rl;
vec3 pos = ro + rd * t;
d = (pos.y - terrainM(pos.xz)) * 0.7; // 0.7 attenuation prevents penetration
}
return t;
}
```
### Variant 2: Sign-Alternating FBM
**Difference from the base version**: Flips the amplitude sign each layer (`w = -w * 0.4`), producing unique alternating ridge/valley patterns. Does not use derivative suppression — the style is distinctly different from the erosion version, producing a more "jagged and twisted" appearance.
**Key code**:
```glsl
float terrainSignFlip(in vec2 p) {
p *= TERRAIN_SCALE;
float a = 0.0;
float w = 1.0; // Initial weight
for (int i = 0; i < TERRAIN_OCTAVES; i++) {
a += w * noise(p);
w = -w * 0.4; // Sign flip + decay: alternating addition and subtraction
p = m2 * p * 2.0;
}
return a * TERRAIN_HEIGHT;
}
```
### Variant 3: Texture-Driven Heightfield + 3D Displacement
**Difference from the base version**: Uses texture sampling as the base heightfield, with 3D FBM displacement layered on top to produce cliffs, caves, and other non-heightfield formations. Requires additional texture channel inputs but can create far more terrain diversity than pure FBM. Marching becomes true SDF sphere tracing.
**Key code**:
```glsl
// 3D Value Noise
float noise3D(in vec3 x) {
vec3 p = floor(x);
vec3 f = fract(x);
f = f * f * (3.0 - 2.0 * f);
// 3D→2D flattening: offset UV by p.z, sample two texture layers and interpolate
vec2 uv = (p.xy + vec2(37.0, 17.0) * p.z) + f.xy;
vec2 rg = textureLod(iChannel0, (uv + 0.5) / 256.0, 0.0).yx;
return mix(rg.x, rg.y, f.z);
}
// 3D FBM Displacement
const mat3 m3 = mat3(0.00, 0.80, 0.60,
-0.80, 0.36,-0.48,
-0.60,-0.48, 0.64);
float displacement(vec3 p) {
float f = 0.5 * noise3D(p); p = m3 * p * 2.02;
f += 0.25 * noise3D(p); p = m3 * p * 2.03;
f += 0.125 * noise3D(p); p = m3 * p * 2.01;
f += 0.0625 * noise3D(p);
return f;
}
// SDF: heightfield + 3D displacement (supports cliffs/caves)
float mapCanyon(vec3 p) {
float h = terrainM(p.xz);
float dis = displacement(0.25 * p * vec3(1.0, 4.0, 1.0)) * 3.0;
return (dis + p.y - h) * 0.25;
}
```
### Variant 4: Directional Erosion Noise
**Difference from the base version**: Uses slope direction as the projection direction for Gabor noise. Each erosion layer adjusts the "water flow direction" based on the previous layer's derivatives, producing realistic dendritic drainage patterns. Requires multi-pass height map precomputation.
**Key code**:
```glsl
#define EROSION_OCTAVES 5
#define EROSION_BRANCH 1.5 // Tunable: branching strength, 0=parallel, 2=strong branching
// Directional Gabor noise
vec3 erosionNoise(vec2 p, vec2 dir) {
vec2 ip = floor(p); vec2 fp = fract(p) - 0.5;
float va = 0.0; float wt = 0.0;
vec2 dva = vec2(0.0);
for (int i = -2; i <= 1; i++)
for (int j = -2; j <= 1; j++) {
vec2 o = vec2(float(i), float(j));
vec2 h = hash2(ip - o) * 0.5; // Grid point random offset
vec2 pp = fp + o + h;
float d = dot(pp, pp);
float w = exp(-d * 2.0); // Gaussian weight
float mag = dot(pp, dir); // Directional projection
va += cos(mag * 6.283) * w; // Directional ripple
dva += -sin(mag * 6.283) * dir * w;
wt += w;
}
return vec3(va, dva) / wt;
}
// Erosion FBM: direction evolves with slope
float terrainErosion(vec2 p, vec2 baseSlope) {
float e = 0.0, a = 0.5;
vec2 dir = normalize(baseSlope + vec2(0.001));
for (int i = 0; i < EROSION_OCTAVES; i++) {
vec3 n = erosionNoise(p * 4.0, dir);
e += a * n.x;
// Branching: curl of previous layer's derivative modifies water flow direction
dir = normalize(dir + n.zy * vec2(1.0, -1.0) * EROSION_BRANCH);
a *= 0.5;
p *= 2.0;
}
return e;
}
```
### Variant 5: Volumetric Clouds + God Rays
**Difference from the base version**: Adds a volumetric cloud layer above the terrain using front-to-back alpha compositing, with god ray factor accumulated during marching. Requires 3D noise and more steps, significantly increasing cost but with excellent visual results.
**Key code**:
```glsl
#define CLOUD_STEPS 64 // Tunable: cloud march steps
#define CLOUD_BASE 200.0 // Tunable: cloud layer base height
#define CLOUD_TOP 300.0 // Tunable: cloud layer top height
vec4 raymarchClouds(vec3 ro, vec3 rd) {
// Calculate intersections with cloud slab
float tmin = (CLOUD_BASE - ro.y) / rd.y;
float tmax = (CLOUD_TOP - ro.y) / rd.y;
    if (tmin > tmax) { float tmp = tmin; tmin = tmax; tmax = tmp; } // swap
if (tmin < 0.0) tmin = 0.0;
float t = tmin;
vec4 sum = vec4(0.0); // rgb=color, a=opacity
float rays = 0.0; // God ray accumulation
for (int i = 0; i < CLOUD_STEPS; i++) {
if (sum.a > 0.99 || t > tmax) break;
vec3 pos = ro + t * rd;
// Cloud density: slab shape × FBM carving
float hFrac = (pos.y - CLOUD_BASE) / (CLOUD_TOP - CLOUD_BASE);
float shape = 1.0 - 2.0 * abs(hFrac - 0.5); // Densest in the middle
float den = shape - 1.6 * (1.0 - noise(pos.xz * 0.01)); // Simplified FBM
if (den > 0.0) {
// Cloud lighting: offset sample toward sun direction (self-shadowing)
float shadowDen = shape - 1.6 * (1.0 - noise((pos.xz + SUN_DIR.xz * 30.0) * 0.01));
float shadow = clamp(1.0 - shadowDen * 2.0, 0.0, 1.0);
vec3 cloudCol = mix(vec3(0.4, 0.4, 0.45), vec3(1.0, 0.95, 0.8), shadow);
float alpha = clamp(den * 0.4, 0.0, 1.0);
// God rays: brightness of sunlight passing through thin areas
rays += 0.02 * shadow * (1.0 - sum.a);
// Front-to-back compositing
cloudCol *= alpha;
sum += vec4(cloudCol, alpha) * (1.0 - sum.a);
}
float dt = max(0.5, 0.05 * t);
t += dt;
}
// Add god rays to color
sum.rgb += pow(rays, 3.0) * 0.4 * vec3(1.0, 0.8, 0.7);
return sum;
}
```
## In-Depth Performance Optimization
### 1. LOD Layering (Most Important Optimization)
**Bottleneck**: Each FBM layer requires an independent noise sample; octave count is a direct performance multiplier.
**Optimization**: Use low octaves for ray marching (3-9 layers), high octaves for normal calculation (16 layers), and lowest for camera placement (3 layers). This is standard practice in terrain shaders.
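One way to organize this is a single FBM core shared by all tiers; the `terrainL`/`terrainM`/`terrainH` names follow the convention used in this document, and the octave counts below are the illustrative 3/9/16 split described above:

```glsl
// Shared FBM core; octave count is the LOD knob
float terrainFBM(in vec2 p, int octaves) {
    p *= TERRAIN_SCALE;
    float a = 0.0, w = 1.0;
    for (int i = 0; i < octaves; i++) {
        a += w * noise(p);
        w *= 0.5;
        p = m2 * p * 2.0;
    }
    return a * TERRAIN_HEIGHT;
}
float terrainL(in vec2 p) { return terrainFBM(p, 3); }  // camera placement, object seeding
float terrainM(in vec2 p) { return terrainFBM(p, 9); }  // ray marching
float terrainH(in vec2 p) { return terrainFBM(p, 16); } // normals and shading only
```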
### 2. Upper Bound Clipping (Bounding Plane)
**Bottleneck**: Rays waste iterations stepping through open air.
**Optimization**: Precompute the maximum terrain height and intersect the ray with that plane before starting to march.
```glsl
if (ro.y > maxHeight) {
    if (rd.y >= 0.0) return -1.0;           // Looking up in open air: skip entirely
    t = (ro.y - maxHeight) / (-rd.y);       // Jump down to the upper bound
}
```
### 3. Adaptive Precision Threshold
**Bottleneck**: Distant pixels still use near-field precision, wasting iterations.
**Optimization**: Hit threshold grows with distance: `abs(h) < 0.001 * t`. This is common practice, with the coefficient typically ranging from 0.0001 to 0.002.
### 4. Texture Instead of Procedural Noise
**Bottleneck**: Procedural noise requires multiple hash and interpolation operations.
**Optimization**: Pre-bake a 256x256 noise texture and sample with `textureLod`. Provides approximately 2-3x speedup over procedural noise.
### 5. Early Exit
**Bottleneck**: Rays continue iterating after exceeding range.
**Optimization**:
- `t > MAX_DIST` break out
- `alpha > 0.99` break out in volumetric rendering
- `h < 0` immediately return 0 in shadow rays
### 6. Jittered Start
**Bottleneck**: Uniform stepping produces visible banding artifacts.
**Optimization**: Add per-pixel random offset to the starting t: `t += hash(fragCoord) * step_size`. Adds no computational cost but significantly improves visual quality.
## Complete Combination Code Examples
### 1. Terrain + Water Surface
The most common terrain rendering combination. The water surface serves as a fixed y-plane — march the terrain first, and if the ray intersects terrain below the water surface, render underwater effects; otherwise render water surface reflection/refraction.
- Key: Water surface normals use multi-frequency noise perturbation to simulate waves; Fresnel controls reflection/refraction mixing
```glsl
#define WATER_LEVEL 5.0
// Water surface normal (multi-frequency noise perturbation)
vec3 waterNormal(vec2 p, float t) {
float eps = 0.1;
float h0 = noise(p * 0.5 + iTime * 0.3) * 0.5
+ noise(p * 1.5 - iTime * 0.2) * 0.25;
float hx = noise((p + vec2(eps, 0.0)) * 0.5 + iTime * 0.3) * 0.5
+ noise((p + vec2(eps, 0.0)) * 1.5 - iTime * 0.2) * 0.25;
float hz = noise((p + vec2(0.0, eps)) * 0.5 + iTime * 0.3) * 0.5
+ noise((p + vec2(0.0, eps)) * 1.5 - iTime * 0.2) * 0.25;
return normalize(vec3(h0 - hx, eps, h0 - hz));
}
// In the main function:
// 1. Check water surface intersection first
float tWater = (ro.y - WATER_LEVEL) / (-rd.y);
// 2. Compare with terrain intersection
float tTerrain = raymarch(ro, rd);
vec3 col;
if (tWater > 0.0 && (tTerrain < 0.0 || tWater < tTerrain)) {
// Hit water surface
vec3 wpos = ro + tWater * rd;
vec3 wnor = waterNormal(wpos.xz, tWater);
// Fresnel
float fresnel = pow(1.0 - max(dot(-rd, wnor), 0.0), 5.0);
fresnel = 0.02 + 0.98 * fresnel;
// Reflection
vec3 refl = reflect(rd, wnor);
vec3 reflCol = getSky(refl);
// Underwater color
vec3 waterCol = vec3(0.0, 0.04, 0.04);
col = mix(waterCol, reflCol, fresnel);
col = applyFog(col, tWater, rd);
} else if (tTerrain > 0.0) {
// Hit terrain (same as original code)
// ...
}
```
### 2. Terrain + Volumetric Clouds
Render the terrain first to get color and depth, then march the cloud slab along the ray, compositing onto the terrain using front-to-back alpha blending.
- Key: Cloud self-shadowing (offset sampling toward light direction), god ray accumulation
```glsl
// In the main function:
vec3 col;
float t = raymarch(ro, rd);
if (t > 0.0) {
// Render terrain...
vec3 pos = ro + t * rd;
vec3 nor = calcNormal(pos, t);
vec3 mate = getMaterial(pos, nor);
float sha = calcShadow(pos + nor * 0.5, SUN_DIR);
vec3 lin = calcLighting(pos, nor, rd, sha);
col = mate * lin;
col = applyFog(col, t, rd);
} else {
col = getSky(rd);
}
// Overlay volumetric clouds
vec4 clouds = raymarchClouds(ro, rd);
col = col * (1.0 - clouds.a) + clouds.rgb;
```
### 3. Terrain + Volumetric Fog/Dust
Volumetric dust fog can be added after the main marching completes by additionally sampling a 3D FBM density field along the ray with distance-based attenuation. Suitable for desert, volcanic, and similar scenes.
- Key: Step size adapts to density — smaller steps in dense regions
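A minimal sketch of this pattern, assuming a 3D FBM helper `fbm3` and the terrain hit distance `tMax` from the main march:

```glsl
#define DUST_STEPS 48   // Tunable: dust march steps
vec4 raymarchDust(vec3 ro, vec3 rd, float tMax) {
    vec4 sum = vec4(0.0); // rgb=color, a=opacity
    float t = 0.5;
    for (int i = 0; i < DUST_STEPS; i++) {
        if (sum.a > 0.99 || t > tMax) break;
        vec3 pos = ro + t * rd;
        // Density: 3D FBM field with distance-based attenuation
        float den = clamp(fbm3(pos * 0.1) - 0.4, 0.0, 1.0) * exp(-0.01 * t);
        if (den > 0.001) {
            vec3 dustCol = mix(vec3(0.75, 0.6, 0.45), vec3(0.95, 0.9, 0.85), den);
            float alpha = clamp(den * 0.3, 0.0, 1.0);
            sum += vec4(dustCol * alpha, alpha) * (1.0 - sum.a); // front-to-back
        }
        t += max(0.3, 0.06 * t) / (0.2 + 2.0 * den); // smaller steps in dense regions
    }
    return sum;
}
```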
### 4. Terrain + SDF Object Placement
SDF ellipsoids can be placed as trees on the terrain. Terrain marching and object marching can be separated or combined. Objects are placed on a 2D grid with hash-based jitter.
- Key: `floor(p.xz/gridSize)` determines the grid cell, `hash(cell)` determines tree position/size
```glsl
#define TREE_GRID 30.0
// Place tree SDFs in a grid
float mapTrees(vec3 p) {
vec2 cell = floor(p.xz / TREE_GRID);
vec2 cellCenter = (cell + 0.5) * TREE_GRID;
// Hash to randomize position
vec2 jitter = (hash2(cell) - 0.5) * TREE_GRID * 0.6;
vec2 treePos = cellCenter + jitter;
    // Terrain height at the tree base
float groundH = terrainL(treePos);
// SDF: ellipsoid tree canopy
vec3 treeCenter = vec3(treePos.x, groundH + 8.0, treePos.y);
float treeSize = 4.0 + hash(cell) * 3.0;
vec3 q = (p - treeCenter) / vec3(treeSize, treeSize * 1.5, treeSize);
return (length(q) - 1.0) * treeSize * 0.8;
}
```
### 5. Terrain + Temporal Anti-Aliasing (TAA)
Inter-frame reprojection blending can be used for temporal anti-aliasing. The current frame's camera matrix is stored in buffer pixels, and the next frame uses it to reproject 3D points back to the previous frame's screen coordinates, blending historical colors.
- Key: blend ratio ~10% new frame + 90% history frame, with increased new frame weight in motion areas
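The reprojection step can be sketched as follows; this assumes the previous camera origin and (orthonormal) camera matrix were stored in buffer pixels and read back with `texelFetch`, and that `FOCAL_LEN` matches the focal length used when generating rays:

```glsl
// Reproject a world-space hit point into the previous frame's UV
vec2 reprojectUV(vec3 worldPos, vec3 prevRo, mat3 prevCam) {
    vec3 v = (worldPos - prevRo) * prevCam;  // world -> prev camera space (transpose mul)
    vec2 p = FOCAL_LEN * v.xy / v.z;         // perspective divide
    return p * vec2(iResolution.y / iResolution.x, 1.0) * 0.5 + 0.5;
}
// In the image pass: blend ~10% current color with ~90% history sampled at the
// reprojected UV, falling back to the current color when the UV leaves [0,1].
```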
# Advanced Texture Mapping Detailed Reference
## Prerequisites
- Screen-space derivatives (`dFdx`, `dFdy`)
- `textureGrad()` function usage
- Basic ray marching
## Triplanar vs Biplanar Cost Analysis
| Aspect | Triplanar | Biplanar |
|--------|-----------|----------|
| Texture fetches | 3 | 2 |
| ALU operations | Lower | Higher (axis selection) |
| Bandwidth | Higher | Lower |
| Visual quality | Baseline | Equivalent (k≥8) |
| Best for | Bandwidth-rich GPUs | Mobile, bandwidth-limited |
Modern GPUs are typically bandwidth-limited rather than ALU-limited, making biplanar the better default choice.
### Weight Remapping Mathematics
The biplanar weight formula `clamp((w - 0.5773) / (1.0 - 0.5773), 0, 1)` ensures:
- At normals aligned with one axis: weight = 1.0 (clean projection)
- At 45° diagonals where 2 axes are equal: smooth transition
- At the cube diagonal (1/√3 ≈ 0.5773): weight = 0.0, but this is the point where the third (discarded) projection would be needed — biplanar's approximation error is maximal here but visually acceptable
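Putting the axis selection and weight remapping together, a compact biplanar sampler (after iq's published version) looks roughly like this, where `k` is the blend sharpness from the table above:

```glsl
vec4 biplanar(sampler2D tex, vec3 p, vec3 n, float k) {
    vec3 dpdx = dFdx(p), dpdy = dFdy(p);
    n = abs(n);
    // ma: dominant axis, mi: smallest axis (dropped), me: median axis
    ivec3 ma = (n.x > n.y && n.x > n.z) ? ivec3(0, 1, 2) :
               (n.y > n.z)              ? ivec3(1, 2, 0) : ivec3(2, 0, 1);
    ivec3 mi = (n.x < n.y && n.x < n.z) ? ivec3(0, 1, 2) :
               (n.y < n.z)              ? ivec3(1, 2, 0) : ivec3(2, 0, 1);
    ivec3 me = ivec3(3) - mi - ma;
    // textureGrad with manually propagated derivatives avoids seam artifacts
    vec4 x = textureGrad(tex, vec2(p[ma.y], p[ma.z]),
                         vec2(dpdx[ma.y], dpdx[ma.z]), vec2(dpdy[ma.y], dpdy[ma.z]));
    vec4 y = textureGrad(tex, vec2(p[me.y], p[me.z]),
                         vec2(dpdx[me.y], dpdx[me.z]), vec2(dpdy[me.y], dpdy[me.z]));
    // Weight remapping past the cube diagonal, then sharpening
    vec2 w = vec2(n[ma.x], n[me.x]);
    w = clamp((w - 0.5773) / (1.0 - 0.5773), 0.0, 1.0);
    w = pow(w, vec2(k / 8.0));
    return (x * w.x + y * w.y) / (w.x + w.y);
}
```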
### Gradient Propagation
Using `textureGrad()` instead of `texture()` is essential because:
1. Axis selection (`ma`, `me`) creates UV discontinuities at projection boundaries
2. Hardware `texture()` computes mip from implicit derivatives, which spike at discontinuities → visible seams
3. `textureGrad()` with manually propagated `dFdx(p)`, `dFdy(p)` bypasses this, keeping gradients smooth across boundaries
## Ray Differential Mathematics
### Problem Statement
In rasterization, `dFdx`/`dFdy` of texture coordinates work naturally because adjacent pixels map to nearby surface points. In ray marching, adjacent pixels may hit completely different objects → broken mip selection.
### Solution: Tangent Plane Intersection
Given:
- Primary ray hits surface at `pos` with normal `nor`
- Neighbor pixel ray `rd_neighbor` originates from `ro_neighbor`
The neighbor ray's intersection with the tangent plane at `pos`:
```
t_neighbor = dot(pos - ro_neighbor, nor) / dot(rd_neighbor, nor)
pos_neighbor = ro_neighbor + rd_neighbor * t_neighbor
```
The difference `pos_neighbor - pos` gives the world-space footprint of one pixel at the hit point.
### For Perspective Cameras (Common Case)
```
ro is the same for all pixels, only rd varies:
dposdx = t * (rdx * dot(rd, nor) / dot(rdx, nor) - rd)
dposdy = t * (rdy * dot(rd, nor) / dot(rdy, nor) - rd)
```
Where `rdx = rd + dFdx(rd)` and `rdy = rd + dFdy(rd)`.
### Chain Rule for Texture Coordinates
If texture mapping function is `uv = f(pos)`:
```
duvdx = Jacobian(f) × dposdx
duvdy = Jacobian(f) × dposdy
```
For simple planar mapping `uv = pos.xz`:
```
duvdx = dposdx.xz
duvdy = dposdy.xz
```
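In shader code, the perspective-camera case above condenses to a few lines; `rd`, `rdx`, `rdy` are the center and neighbor ray directions, and `t`/`pos`/`nor` are the hit distance, position, and normal:

```glsl
// Footprint of one pixel on the tangent plane at the hit point
vec3 dposdx = t * (rdx * dot(rd, nor) / dot(rdx, nor) - rd);
vec3 dposdy = t * (rdy * dot(rd, nor) / dot(rdy, nor) - rd);
// Planar mapping uv = pos.xz: the Jacobian just selects the xz components
vec4 col = textureGrad(iChannel0, pos.xz, dposdx.xz, dposdy.xz);
```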
## Texture Repetition Theory
### Why Tiling is Visible
Human vision excels at detecting:
1. **Periodic patterns**: Regular grid alignment
2. **Unique features**: Distinctive spots/marks that repeat identically
3. **Phase alignment**: All tiles start at the same phase
### Breaking Repetition
Each method targets different cues:
- **Random offset** (Method A): Breaks phase alignment, 4 fetches
- **Voronoi blend**: Breaks grid structure entirely, 9 fetches (expensive)
- **Virtual pattern** (Method B): Breaks unique features cheaply, 2 fetches
Method B is preferred for real-time use — the low-frequency index variation is cache-friendly and the two texture fetches share locality.
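A sketch of the virtual-pattern approach (after iq's texture-repetition technique): a low-frequency noise lookup picks a per-region tile index, and two candidate fetches with index-derived offsets are blended. The noise channel binding here is an assumption:

```glsl
vec4 textureNoTile(sampler2D tex, vec2 uv) {
    float k = texture(iChannel1, 0.005 * uv).x;         // assumed low-freq noise channel
    float l = k * 8.0;
    float f = fract(l);
    vec2 offa = sin(vec2(3.0, 7.0) * floor(l));         // offset for index i
    vec2 offb = sin(vec2(3.0, 7.0) * (floor(l) + 1.0)); // offset for index i+1
    // textureGrad: the per-region offsets would break implicit derivatives
    vec2 dx = dFdx(uv), dy = dFdy(uv);
    vec4 cola = textureGrad(tex, uv + offa, dx, dy);
    vec4 colb = textureGrad(tex, uv + offb, dx, dy);
    return mix(cola, colb, smoothstep(0.2, 0.8, f));
}
```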
# Texture Sampling Detailed Reference
This document is a detailed supplement to [SKILL.md](SKILL.md), covering prerequisites, step-by-step explanations, mathematical derivations, variant details, and complete combination code examples.
## Prerequisites
- **GLSL Basic Syntax**: `vec2`/`vec3`/`vec4`, `uniform sampler2D`, and other types and declarations
- **UV Coordinate System**: `fragCoord / iResolution.xy` normalizes to `[0,1]`, with origin at the bottom-left corner
- **Mipmap Concept**: A multi-resolution pyramid of the texture, with each level at half the resolution. The GPU automatically selects the appropriate level based on screen-space derivatives to avoid aliasing
- **ShaderToy Multi-Pass Architecture**: Image pass is the final output, Buffer A/B/C/D are intermediate computation passes, bound to textures or buffers via `iChannel0~3`
## Implementation Steps
### Step 1: Basic Texture Sampling and UV Normalization
**What**: Convert screen pixel coordinates to UV coordinates and read texture data.
**Why**: `texture()` accepts UV coordinates in the `[0,1]` range. ShaderToy provides pixel coordinates `fragCoord`, which need to be normalized by dividing by the resolution.
```glsl
// Normalize UV
vec2 uv = fragCoord / iResolution.xy;
// Basic texture sampling (hardware bilinear filtering)
vec4 col = texture(iChannel0, uv);
```
Hardware bilinear filtering automatically performs linear interpolation between the nearest 4 texels. When the UV lands exactly at a texel center, the exact value is returned; when it falls between texels, a weighted average of the surrounding four points is returned.
### Step 2: Using textureLod to Control Mipmap Level
**What**: Explicitly specify the LOD level to control sampling resolution, achieving blur or avoiding automatic mip selection in ray marching.
**Why**: In ray marching, the GPU cannot correctly estimate screen-space derivatives, which leads to incorrect mip level selection and artifacts. Using `textureLod(..., 0.0)` forces sampling at the highest resolution level; using higher LOD values produces blur effects (e.g., depth of field, bloom).
Physical meaning of LOD values:
- `lod = 0.0`: Original resolution (mip 0)
- `lod = 1.0`: Half resolution (mip 1), equivalent to a 2x2 area average
- `lod = N`: Resolution is 1/2^N of the original
```glsl
// In ray marching: force LOD 0 to avoid artifacts (from Campfire at night)
vec3 groundCol = textureLod(iChannel2, groundUv * 0.05, 0.0).rgb;
// Depth of field blur: LOD varies with distance (from Heartfelt)
float focus = mix(maxBlur - coverage, minBlur, smoothstep(.1, .2, coverage));
vec3 col = textureLod(iChannel0, uv + normal, focus).rgb;
// Bloom: explicitly sample high mip levels (from Campfire at night)
#define BLOOM_LOD_A 4.0 // Adjustable: bloom first layer mip level
#define BLOOM_LOD_B 5.0 // Adjustable: bloom second layer mip level
#define BLOOM_LOD_C 6.0 // Adjustable: bloom third layer mip level
vec3 bloom = vec3(0.0);
bloom += textureLod(iChannel0, uv + off * exp2(BLOOM_LOD_A), BLOOM_LOD_A).rgb;
bloom += textureLod(iChannel0, uv + off * exp2(BLOOM_LOD_B), BLOOM_LOD_B).rgb;
bloom += textureLod(iChannel0, uv + off * exp2(BLOOM_LOD_C), BLOOM_LOD_C).rgb;
bloom /= 3.0;
```
### Step 3: Using texelFetch for Exact Pixel Data Access
**What**: Read the value of a specific texel using integer coordinates, bypassing all filtering.
**Why**: When textures are used as data storage (game state, precomputed LUTs, keyboard input), exact values of specific pixels must be read — hardware filtering would corrupt data integrity. `texelFetch` uses `ivec2` integer coordinates instead of `vec2` float UVs, accessing pixels directly by address, similar to array indexing.
```glsl
// Define data storage addresses (from Bricks Game)
const ivec2 txBallPosVel = ivec2(0, 0);
const ivec2 txPaddlePos = ivec2(1, 0);
const ivec2 txPoints = ivec2(2, 0);
const ivec2 txState = ivec2(3, 0);
// Read stored data
vec4 loadValue(in ivec2 addr) {
return texelFetch(iChannel0, addr, 0);
}
// Write data (in buffer pass)
void storeValue(in ivec2 addr, in vec4 val, inout vec4 fragColor, in ivec2 fragPos) {
fragColor = (fragPos == addr) ? val : fragColor;
}
// Read keyboard input (ShaderToy keyboard texture)
float key = texelFetch(iChannel1, ivec2(KEY_SPACE, 0), 0).x;
```
### Step 4: Manual Bilinear Interpolation + Quintic Hermite Smoothing
**What**: Bypass hardware bilinear filtering by manually sampling 4 texels and interpolating with a quintic Hermite polynomial for C² continuity.
**Why**: Hardware bilinear interpolation is linear (C⁰ continuous), which produces visible grid-like seams when layering noise FBM. Quintic Hermite interpolation has zero first and second derivatives at sample points, eliminating these artifacts.
**Mathematical Derivation**:
Standard bilinear interpolation uses linear weight `u = f` (where `f = fract(x)`), which causes derivative discontinuity at boundaries.
Quintic Hermite polynomial: `u = f³(6f² - 15f + 10)`
Verifying C² continuity:
- `u(0) = 0`, `u(1) = 1` — Correct interpolation boundaries
- `u'(f) = 30f²(f-1)²` → `u'(0) = 0`, `u'(1) = 0` — First derivative is zero at boundaries
- `u''(f) = 60f(f-1)(2f-1)` → `u''(0) = 0`, `u''(1) = 0` — Second derivative is zero at boundaries
```glsl
// Manual four-point sampling + quintic Hermite interpolation (from up in the cloud sea)
float noise(vec2 x) {
vec2 p = floor(x);
vec2 f = fract(x);
// Quintic Hermite smoothing (C2 continuous)
vec2 u = f * f * f * (f * (f * 6.0 - 15.0) + 10.0);
// Manual sampling of four corner points (divided by texture resolution for normalization)
#define TEX_RES 1024.0 // Adjustable: noise texture resolution
float a = texture(iChannel0, (p + vec2(0.0, 0.0)) / TEX_RES).x;
float b = texture(iChannel0, (p + vec2(1.0, 0.0)) / TEX_RES).x;
float c = texture(iChannel0, (p + vec2(0.0, 1.0)) / TEX_RES).x;
float d = texture(iChannel0, (p + vec2(1.0, 1.0)) / TEX_RES).x;
// Bilinear blending
return a + (b - a) * u.x + (c - a) * u.y + (a - b - c + d) * u.x * u.y;
}
```
### Step 5: FBM (Fractional Brownian Motion) Noise from Textures
**What**: Build multi-scale procedural noise by layering multiple texture samples at different frequencies.
**Why**: A single noise sample lacks the multi-scale detail found in nature. FBM simulates the 1/f spectral characteristics of natural textures by layering at doubling frequencies with halving amplitudes. Most natural textures (terrain, clouds, rocks) exhibit 1/f noise characteristics — low frequencies contain most of the energy, high frequencies add detail.
FBM formula: `fbm(x) = Σ (persistence^i × noise(2^i × x))` for i = 0..N-1
Parameter effects:
- **OCTAVES (number of layers)**: More layers add more detail, but each additional layer adds one complete noise call
- **PERSISTENCE**: Controls the amplitude decay rate at higher frequencies. 0.5 is the classic value; higher values (0.6-0.7) produce rougher textures; lower values (0.3-0.4) produce smoother textures
```glsl
#define FBM_OCTAVES 5 // Adjustable: number of layers, more = richer detail
#define FBM_PERSISTENCE 0.5 // Adjustable: amplitude decay rate, higher = stronger high-frequency detail
float fbm(vec2 x) {
float v = 0.0;
float a = 0.5; // Initial amplitude
float totalWeight = 0.0;
for (int i = 0; i < FBM_OCTAVES; i++) {
v += a * noise(x);
totalWeight += a;
x *= 2.0; // Double frequency
a *= FBM_PERSISTENCE;
}
return v / totalWeight;
}
```
### Step 6: Separable Gaussian Blur (Multi-Pass Convolution)
**What**: Decompose a 2D Gaussian blur into horizontal and vertical passes, each performing a 1D convolution.
**Why**: A direct NxN 2D convolution requires N² samples; after separation, only 2N are needed. This leverages the separability of the Gaussian kernel — a 2D Gaussian function can be decomposed into the product of two 1D Gaussian functions: `G(x,y) = G(x) × G(y)`. `fract()` wraps coordinates to implement torus boundary conditions, avoiding edge artifacts.
Optimization trick: Leveraging the "free" interpolation of hardware bilinear filtering — sampling between two texels gives a single `texture()` call the weighted average of both texels, achieving an N-tap effect with `(N+1)/2` samples.
```glsl
// Horizontal blur pass (from expansive reaction-diffusion)
#define BLUR_RADIUS 4 // Adjustable: blur radius (kernel width = 2*BLUR_RADIUS+1)
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
vec2 uv = fragCoord / iResolution.xy;
vec2 d = vec2(1.0 / iResolution.x, 0.0); // Horizontal step
// 9-tap Gaussian weights (sigma ≈ 2.0)
float w[9] = float[9](0.05, 0.09, 0.12, 0.15, 0.16, 0.15, 0.12, 0.09, 0.05);
vec4 col = vec4(0.0);
for (int i = -4; i <= 4; i++) {
col += w[i + 4] * texture(iChannel0, fract(uv + float(i) * d));
}
col /= 0.98; // Weight normalization correction
fragColor = col;
}
// Vertical blur pass: change d to vec2(0.0, 1.0/iResolution.y)
```
### Step 7: Dispersion Sampling (Wavelength-Dependent Displacement)
**What**: Sample a texture multiple times along a displacement vector with different offsets, weighted by spectral response curves, to simulate prismatic dispersion.
**Why**: Different wavelengths of real light have different refractive indices, causing spatial color separation. By progressively offsetting UV along the displacement direction and accumulating with different weights per RGB channel, this physical phenomenon can be simulated.
Design principles of spectral response weights:
- **Red channel** `t²`: Enhanced toward the long-wavelength end of the spectrum
- **Green channel** `46.6666 × ((1-t) × t)³`: Peaks at middle wavelengths, matching the human eye's greatest sensitivity to green
- **Blue channel** `(1-t)²`: Enhanced toward the short-wavelength end
```glsl
#define DISP_SAMPLES 64 // Adjustable: dispersion sample count, more = smoother
// Spectral response weights (simulating human eye cone response)
vec3 sampleWeights(float i) {
return vec3(
i * i, // Red: long wavelength enhancement
46.6666 * pow((1.0 - i) * i, 3.0), // Green: middle wavelength peak
(1.0 - i) * (1.0 - i) // Blue: short wavelength enhancement
);
}
// Dispersion sampling
vec3 sampleDisp(sampler2D tex, vec2 uv, vec2 disp) {
vec3 col = vec3(0.0);
vec3 totalWeight = vec3(0.0);
for (int i = 0; i < DISP_SAMPLES; i++) {
float t = float(i) / float(DISP_SAMPLES);
vec3 w = sampleWeights(t);
col += w * texture(tex, fract(uv + disp * t)).rgb;
totalWeight += w;
}
return col / totalWeight;
}
```
### Step 8: IBL Environment Sampling (textureLod + Roughness Mapping)
**What**: Select the cubemap mipmap level based on surface roughness for image-based lighting.
**Why**: In PBR, rough surfaces need to gather lighting from a wider range of the environment (equivalent to a blurred environment map). High mipmap levels naturally correspond to blurred versions of the environment map, so roughness can be directly mapped to LOD level. This is the split-sum approximation method popularized by Epic Games in UE4.
Complete split-sum IBL workflow:
1. Pre-filter environment map: different roughness values correspond to different mip levels
2. Pre-compute BRDF LUT: `vec2(NdotV, roughness)` -> `vec2(scale, bias)`
3. Final compositing: `specular = envColor * (F * brdf.x + brdf.y)`
```glsl
#define MAX_LOD 7.0 // Adjustable: cubemap maximum mip level
#define DIFFUSE_LOD 6.5 // Adjustable: diffuse sampling LOD (near the blurriest level)
// Specular IBL (from Old watch)
vec3 getSpecularLightColor(vec3 N, float roughness) {
vec3 raw = textureLod(iChannel0, N, roughness * MAX_LOD).rgb;
return pow(raw, vec3(4.5)) * 6.5; // HDR approximation boost
}
// Diffuse irradiance IBL
vec3 getDiffuseLightColor(vec3 N) {
return textureLod(iChannel0, N, DIFFUSE_LOD).rgb;
}
// BRDF LUT query (precomputed split-sum approximation)
vec2 brdf = texture(iChannel3, vec2(NdotV, roughness)).rg;
vec3 specular = envColor * (F * brdf.x + brdf.y);
```
## Variant Details
### Variant 1: Anisotropic Flow Field Blur
**Difference from basic version**: Instead of uniform Gaussian blur, performs directional blur along a noise-driven direction field, producing a flowing brushstroke effect. The direction field can come from a noise texture, velocity field, or user-defined vector field. The parabolic weight `4h(1-h)` makes the blur strongest at the path center and weakest at both ends, producing a more natural trailing effect.
```glsl
#define BLUR_ITERATIONS 32 // Adjustable: number of samples along flow field
#define BLUR_STEP 0.008 // Adjustable: UV offset per step
vec3 flowBlur(vec2 uv) {
vec3 col = vec3(0.0);
float acc = 0.0;
for (int i = 0; i < BLUR_ITERATIONS; i++) {
float h = float(i) / float(BLUR_ITERATIONS);
float w = 4.0 * h * (1.0 - h); // Parabolic weight
col += w * texture(iChannel0, uv).rgb;
acc += w;
// Direction from noise texture (or other vector field)
vec2 dir = texture(iChannel1, uv).xy * 2.0 - 1.0;
uv += BLUR_STEP * dir;
}
return col / acc;
}
```
### Variant 2: Texture as Data Storage (Buffer-as-Data)
**Difference from basic version**: Textures store structured data (positions, velocities, state) instead of colors, using `texelFetch` for exact reads to achieve inter-frame persistent state.
The key to this pattern is the "address-value" mapping: each pixel coordinate is an "address", and the `vec4` is the stored "value". In a buffer pass, the shader executes for every pixel, but only writes a new value when `fragPos == addr`; all other pixels retain their old values. This implements selective writing.
Applicable scenarios: Game state (health, score, position), particle system parameters, physics simulation global variables.
```glsl
// Address definitions
const ivec2 txPosition = ivec2(0, 0);
const ivec2 txVelocity = ivec2(1, 0);
const ivec2 txState = ivec2(2, 0);
// Data read/write interface
vec4 load(ivec2 addr) { return texelFetch(iChannel0, addr, 0); }
void store(ivec2 addr, vec4 val, inout vec4 fragColor, ivec2 fragPos) {
fragColor = (fragPos == addr) ? val : fragColor;
}
// Usage in mainImage
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
ivec2 p = ivec2(fragCoord);
fragColor = texelFetch(iChannel0, p, 0); // Default: keep old value
vec4 pos = load(txPosition);
vec4 vel = load(txVelocity);
// ... update logic ...
store(txPosition, pos + vel * 0.016, fragColor, p);
store(txVelocity, vel, fragColor, p);
}
```
### Variant 3: Chromatic Dispersion
**Difference from basic version**: Samples multiple times along a displacement vector, each at a different offset with wavelength-dependent weighted RGB accumulation, producing a prismatic dispersion effect. `DISP_STRENGTH` controls the spatial range of dispersion — larger values produce more pronounced RGB separation.
```glsl
#define DISP_SAMPLES 64 // Adjustable: sample count
#define DISP_STRENGTH 0.05 // Adjustable: dispersion strength
vec3 dispersion(vec2 uv, vec2 displacement) {
vec3 col = vec3(0.0);
vec3 w_total = vec3(0.0);
for (int i = 0; i < DISP_SAMPLES; i++) {
float t = float(i) / float(DISP_SAMPLES);
vec3 w = vec3(t * t, 46.666 * pow((1.0 - t) * t, 3.0), (1.0 - t) * (1.0 - t));
col += w * texture(iChannel0, fract(uv + displacement * t * DISP_STRENGTH)).rgb;
w_total += w;
}
return col / w_total;
}
```
### Variant 4: Triplanar Texture Mapping
**Difference from basic version**: For 3D surfaces, samples textures using three projection directions (X/Y/Z axes) and blends by normal weights, avoiding seam issues with traditional UV mapping.
`TRIPLANAR_SHARPNESS` controls the blend transition sharpness: higher values produce sharper transitions between projection faces; a value of 1.0 provides the smoothest but potentially blurry transitions. Typical values are 2.0-4.0.
Applicable scenarios: Procedural terrain (where UV unwrapping cannot be done in advance), geometry generated by SDF ray marching.
```glsl
#define TRIPLANAR_SHARPNESS 2.0 // Adjustable: blend sharpness
vec3 triplanarSample(sampler2D tex, vec3 pos, vec3 normal, float scale) {
vec3 w = pow(abs(normal), vec3(TRIPLANAR_SHARPNESS));
w /= (w.x + w.y + w.z); // Normalize weights
vec3 xSample = texture(tex, pos.yz * scale).rgb;
vec3 ySample = texture(tex, pos.xz * scale).rgb;
vec3 zSample = texture(tex, pos.xy * scale).rgb;
return xSample * w.x + ySample * w.y + zSample * w.z;
}
```
### Variant 5: Temporal Reprojection (TAA)
**Difference from basic version**: Calculates the current frame pixel's UV position in the previous frame, samples the previous frame data from the buffer, and blends to achieve temporal anti-aliasing or accumulation effects.
`TAA_BLEND` controls the history frame weight: higher values (e.g., 0.95) provide better temporal stability but more motion trailing; lower values (e.g., 0.8) provide faster response but more flickering. The clamp operation prevents ghosting — when the history color exceeds the current frame's neighborhood range, it indicates a large scene change, and history weight should be reduced.
```glsl
#define TAA_BLEND 0.9 // Adjustable: history frame blend ratio (higher = smoother but more trailing)
vec3 temporalBlend(vec2 currUv, vec2 prevUv, vec3 currColor) {
vec3 history = textureLod(iChannel0, prevUv, 0.0).rgb;
// Simple clamp to prevent ghosting
vec3 minCol = currColor - 0.1;
vec3 maxCol = currColor + 0.1;
history = clamp(history, minCol, maxCol);
return mix(currColor, history, TAA_BLEND);
}
```
## Performance Optimization Details
### Bottleneck 1: Texture Sampling Bandwidth
- **Problem**: A large number of `texture()` calls (e.g., 64 dispersion samples) is a GPU bandwidth-intensive operation
- **Optimization**: Reduce sample count and compensate with smarter weight functions; use mipmap (`textureLod` at high LOD) to reduce cache misses
- **Details**: GPU texture cache works in cache lines; cache hit rates are high when adjacent pixels access similar texture regions. Higher LOD level textures are smaller and more likely to fit entirely in cache. For dispersion sampling, consider performing dispersion in a low-resolution buffer first, then bilinearly upsampling
### Bottleneck 2: Separable Blur
- **Problem**: A 2D Gaussian blur requires N² samples
- **Optimization**: Always use a separable two-pass approach (horizontal + vertical), reducing complexity from O(N²) to O(2N)
- **Advanced trick**: Leverage hardware bilinear filtering's "free" interpolation — sampling between two texels causes the hardware to automatically return the weighted average, achieving an N-tap effect with `(N+1)/2` samples. For example, a 9-tap Gaussian requires only 5 texture samples
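The bilinear trick can be sketched as one pass of a separable blur. The offsets and weights below are the standard linear-sampling values for a 9-tap Gaussian; `direction` selects the pass axis.

```glsl
// One pass of a 9-tap Gaussian using 5 fetches. The non-integer
// offsets fall between texels, so hardware bilinear filtering
// returns a pre-weighted average of two taps per fetch.
vec3 gaussian9(sampler2D tex, vec2 uv, vec2 direction, vec2 resolution) {
    float offsets[3] = float[](0.0, 1.3846153846, 3.2307692308);
    float weights[3] = float[](0.2270270270, 0.3162162162, 0.0702702703);
    vec2 texel = direction / resolution;
    vec3 sum = texture(tex, uv).rgb * weights[0];
    for (int i = 1; i < 3; i++) {
        sum += texture(tex, uv + texel * offsets[i]).rgb * weights[i];
        sum += texture(tex, uv - texel * offsets[i]).rgb * weights[i];
    }
    return sum;
}
```

Run once with `direction = vec2(1.0, 0.0)` into an intermediate buffer, then again with `direction = vec2(0.0, 1.0)` for the full 2D blur.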
### Bottleneck 3: Mip Selection in Ray Marching
- **Problem**: The GPU's screen-space derivatives (`dFdx`/`dFdy`) are incorrect inside ray march loops, because adjacent pixels may be at completely different ray march steps, causing incorrect automatic mip level selection
- **Optimization**: Use `textureLod(..., 0.0)` in all texture queries within ray march loops to force the base level
- **Alternative**: If mipmap anti-aliasing is needed, manually compute the LOD: estimate screen-space coverage based on ray length and surface tilt angle, then convert to LOD with `log2()`
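The manual-LOD alternative can be sketched like this. The helper name `rayLod`, the roughly-90° FOV assumption behind `1.0 / iResolution.y`, and the 256-texel texture size in the usage line are illustrative, not fixed values.

```glsl
// Manual LOD inside a ray march loop: estimate the screen-space
// footprint of the sample from the ray distance, then convert the
// footprint (in texels) to a mip level with log2.
float rayLod(float t, float texSize) {
    float pixelConeWidth = 1.0 / iResolution.y; // ~90° FOV assumption
    float footprint = t * pixelConeWidth * texSize; // texels covered
    return max(0.0, log2(footprint));
}
// Usage inside the loop:
// vec3 c = textureLod(tex, uv, rayLod(t, 256.0)).rgb;
```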
### Bottleneck 4: Manual Interpolation for High-Frequency Noise
- **Problem**: Manual four-point sampling + Hermite interpolation is approximately 4x slower than hardware bilinear (4 `texture()` calls + math vs. 1 hardware-filtered `texture()` call)
- **Optimization**: Only use it when the visual difference is noticeable (first 1-2 octaves of FBM); higher-frequency octaves can fall back to `texture()` since the difference is no longer visible
- **Tradeoff**: For a 6-octave FBM, using Hermite for the first 2 octaves (8 samples) and hardware bilinear for the last 4 (4 samples) totals 12 samples — half of the 24 samples needed for full Hermite
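The mixed scheme can be sketched as a 4-octave variant; `noiseHermite` (manual four-point Hermite sampling) and `noiseHW` (plain hardware-filtered `texture()` lookup) are assumed helpers with the signatures shown.

```glsl
// Mixed-precision FBM: Hermite interpolation for the first two
// octaves, where filtering quality is visible, hardware bilinear
// for the rest, where the difference is no longer noticeable.
float fbmMixed(vec2 p) {
    float f = 0.0;
    f += 0.5000 * noiseHermite(p); p *= 2.02;
    f += 0.2500 * noiseHermite(p); p *= 2.03;
    f += 0.1250 * noiseHW(p);      p *= 2.01;
    f += 0.0625 * noiseHW(p);
    return f;
}
```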
### Bottleneck 5: Multi-Buffer Feedback Latency
- **Problem**: Each buffer in a multi-pass feedback loop adds one frame of latency (because a buffer's output is only readable in the next frame)
- **Optimization**: Combine mergeable operations into a single pass whenever possible; use `texelFetch` instead of `texture` to read buffer data to avoid unnecessary filtering overhead
- **Architecture suggestion**: When designing buffer topology, minimize feedback chain length. If A→B→C→A forms a three-frame delay loop, consider whether B and C can be merged into a single pass
## Complete Combination Code Examples
### Combining with SDF Ray Marching
Texture sampling provides surface detail for SDF scenes: sampling noise textures for displacement mapping, material lookup. Key: `textureLod(..., 0.0)` must be used inside ray march loops.
```glsl
// Using texture noise for detail displacement in an SDF scene
float map(vec3 p) {
float d = length(p) - 1.0; // Base sphere SDF
// Texture noise displacement (must use textureLod inside ray march)
float n = textureLod(iChannel0, p.xz * 0.5, 0.0).x;
d += n * 0.1; // Surface detail
return d;
}
// Material query also uses textureLod
vec3 getMaterial(vec3 p, vec3 n) {
// Triplanar mapping for material color
vec3 w = pow(abs(n), vec3(2.0));
w /= (w.x + w.y + w.z);
vec3 col = textureLod(iChannel1, p.yz * 0.5, 0.0).rgb * w.x
+ textureLod(iChannel1, p.xz * 0.5, 0.0).rgb * w.y
+ textureLod(iChannel1, p.xy * 0.5, 0.0).rgb * w.z;
return col;
}
```
### Combining with Procedural Noise (Domain Warping)
Texture-based noise (manual Hermite + FBM) serves as the driver for domain warping, used to generate terrain, clouds, flames, and other natural effects. Texture noise is faster than pure mathematical noise (one texture sample vs. multiple hash calculations).
```glsl
// Domain warping: use FBM to warp FBM's input coordinates
float domainWarp(vec2 p) {
// First warping layer
vec2 q = vec2(fbm(p + vec2(0.0, 0.0)),
fbm(p + vec2(5.2, 1.3)));
// Second warping layer (more complex effect)
vec2 r = vec2(fbm(p + 4.0 * q + vec2(1.7, 9.2)),
fbm(p + 4.0 * q + vec2(8.3, 2.8)));
return fbm(p + 4.0 * r);
}
```
### Combining with Post-Processing Pipeline
Multi-LOD sampling for bloom, separable Gaussian blur for depth of field, dispersion sampling for chromatic aberration. These techniques can be chained into a complete post-processing pipeline.
```glsl
// Complete post-processing chain (single-pass simplified version)
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
vec2 uv = fragCoord / iResolution.xy;
// 1. Read scene color (from Buffer A) with chromatic aberration
//    (simplified 3-tap; sampling per channel at read time means the
//    bloom added below is not overwritten)
vec2 dir = uv - 0.5;
float strength = length(dir) * 0.02;
vec3 col;
col.r = texture(iChannel0, uv + dir * strength).r;
col.g = texture(iChannel0, uv).g;
col.b = texture(iChannel0, uv - dir * strength).b;
// 2. Bloom (multi-LOD sampling)
vec3 bloom = vec3(0.0);
bloom += textureLod(iChannel0, uv, 4.0).rgb * 0.5;
bloom += textureLod(iChannel0, uv, 5.0).rgb * 0.3;
bloom += textureLod(iChannel0, uv, 6.0).rgb * 0.2;
col += bloom * 0.3;
// 3. Tone mapping (Filmic)
col = (col * (6.2 * col + 0.5)) / (col * (6.2 * col + 1.7) + 0.06);
// 4. Vignette
col *= 0.5 + 0.5 * pow(16.0 * uv.x * uv.y * (1.0 - uv.x) * (1.0 - uv.y), 0.2);
fragColor = vec4(col, 1.0);
}
```
### Combining with PBR/IBL Lighting
`textureLod` samples the cubemap by roughness for image-based lighting, combined with a precomputed BRDF LUT (queried via `texelFetch` or `texture`), forming a complete split-sum IBL pipeline.
```glsl
// Complete IBL lighting computation
vec3 computeIBL(vec3 N, vec3 V, vec3 albedo, float roughness, float metallic) {
float NdotV = max(dot(N, V), 0.0);
vec3 R = reflect(-V, N);
// Fresnel (Schlick approximation)
vec3 F0 = mix(vec3(0.04), albedo, metallic);
vec3 F = F0 + (1.0 - F0) * pow(1.0 - NdotV, 5.0);
// Specular: sample pre-filtered environment map by roughness
vec3 specEnv = textureLod(iChannel0, R, roughness * 7.0).rgb;
specEnv = pow(specEnv, vec3(4.5)) * 6.5; // HDR approximation
// BRDF LUT query
vec2 brdf = texture(iChannel3, vec2(NdotV, roughness)).rg;
vec3 specular = specEnv * (F * brdf.x + brdf.y);
// Diffuse irradiance
vec3 diffEnv = textureLod(iChannel0, N, 6.5).rgb;
vec3 kD = (1.0 - F) * (1.0 - metallic);
vec3 diffuse = kD * albedo * diffEnv;
return diffuse + specular;
}
```
### Combining with Simulation/Feedback Systems
Multi-buffer texture sampling for reaction-diffusion, fluid simulation, and other iterative systems. Buffer A stores state, Buffer B/C perform separable blur diffusion, and the Image pass handles final visualization. `fract()` wraps coordinates for torus boundaries.
```glsl
// Buffer A: Reaction-diffusion state update
// iChannel0: Buffer A itself (feedback)
// iChannel1: Buffer B (result after horizontal blur)
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
vec2 uv = fragCoord / iResolution.xy;
vec2 px = 1.0 / iResolution.xy;
// Read current state and diffused state
vec2 state = texelFetch(iChannel0, ivec2(fragCoord), 0).xy;
vec2 diffused = texture(iChannel1, uv).xy; // After separable blur
// Gray-Scott reaction-diffusion
float a = diffused.x;
float b = diffused.y;
float feed = 0.037;
float kill = 0.06;
float da = 1.0 * (diffused.x - state.x) - a * b * b + feed * (1.0 - a);
float db = 0.5 * (diffused.y - state.y) + a * b * b - (kill + feed) * b;
state += vec2(da, db) * 0.9;
state = clamp(state, 0.0, 1.0);
fragColor = vec4(state, 0.0, 1.0);
}
```

# Volumetric Rendering — Detailed Reference
This document is a detailed supplement to [SKILL.md](SKILL.md), covering prerequisites, step-by-step explanations, mathematical derivations, and advanced usage.
## Prerequisites
- **GLSL Fundamentals**: uniforms, varyings, built-in functions
- **Vector Math**: dot product, cross product, normalize
- **Ray Representation**: `P = ro + t * rd` (ray origin + t × ray direction)
- **Noise Function Basics**: value noise, Perlin noise, fBM (Fractal Brownian Motion)
- **Basic Optical Concepts**:
- Transmittance: the fraction of light remaining after passing through a medium
- Scattering: light changing direction within a medium
- Absorption: light energy being converted to heat by the medium
## Core Principles
The core of volumetric rendering is **Ray Marching**: along each view ray, advancing with fixed or adaptive step sizes, querying medium density at each sample point, and accumulating color and opacity.
### Key Mathematical Formulas
#### 1. Beer-Lambert Transmittance Law
Transmittance of light passing through a medium of thickness `d` with extinction coefficient `σe`:
```
T = exp(-σe × d)
```
Where `σe = σs + σa` (scattering coefficient + absorption coefficient).
**Physical meaning**: the larger the extinction coefficient or thicker the medium, the less light passes through. This is the fundamental law of all volumetric rendering.
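Applied step by step along a ray, the law turns into a running product of per-step factors. A minimal sketch, assuming a density function `mediumDensity` exists (it is not defined until Step 3):

```glsl
// Beer-Lambert applied per step: transmittance is multiplicative,
// so the loop accumulates exp(-σe·dt) factors along the ray.
float transmittanceAlongRay(vec3 ro, vec3 rd, float dt, int steps) {
    float T = 1.0;
    for (int i = 0; i < steps; i++) {
        // mediumDensity: assumed extinction-coefficient field
        float sigmaE = mediumDensity(ro + rd * (float(i) + 0.5) * dt);
        T *= exp(-sigmaE * dt);
    }
    return T;
}
```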
#### 2. Front-to-Back Alpha Compositing
Standard form:
```
color_acc += sample_color × sample_alpha × (1.0 - alpha_acc)
alpha_acc += sample_alpha × (1.0 - alpha_acc)
```
Equivalent premultiplied alpha form (most commonly used in actual code):
```glsl
col.rgb *= col.a; // Premultiply
sum += col * (1.0 - sum.a); // Front-to-back compositing
```
**Why front-to-back?** Because it allows early exit (early ray termination) when accumulated opacity approaches 1.0, saving significant computation.
#### 3. Henyey-Greenstein Phase Function
Describes the directional distribution of light scattering in a medium:
```
HG(cosθ, g) = (1 - g²) / (1 + g² - 2g·cosθ)^(3/2)
```
(The physically normalized form carries an extra 1/(4π) factor; shader code usually drops it and folds the constant into the lighting terms, as this document's code does.)
- `g > 0`: forward scattering (e.g., the silver lining effect in clouds) — light primarily continues along its original direction
- `g < 0`: backward scattering — light primarily reflects back
- `g = 0`: isotropic scattering — light scatters uniformly in all directions
**Practical application**: Clouds typically use a dual-lobe HG function, mixing a forward scattering lobe (g≈0.8) and a backward scattering lobe (g≈-0.2) to simulate the real light scattering characteristics of cloud layers. Forward scattering produces the silver lining, while backward scattering provides volume definition.
#### 4. Frostbite Improved Integration Formula
In each step, the scattered light is not simply `S × dt`, but a more precise integral:
```
Sint = (S - S × exp(-σe × dt)) / σe
```
**Why is improvement needed?** The naive `S × dt` integration overestimates scattered light at larger step sizes or stronger scattering, leading to energy non-conservation (image too bright or too dark). The Frostbite formula ensures energy conservation at any step size through precise integration of the Beer-Lambert law.
## Implementation Steps
### Step 1: Camera and Ray Construction
**What**: Generate a ray from the camera for each pixel.
**Why**: This is the starting point for all ray marching techniques. Camera position determines the viewing angle; ray direction determines the sampling path.
```glsl
// Normalize screen coordinates to [-1,1], correcting for aspect ratio
vec2 uv = (2.0 * fragCoord - iResolution.xy) / iResolution.y;
// Camera parameters
vec3 ro = vec3(0.0, 1.0, -5.0); // Tunable: camera position
vec3 ta = vec3(0.0, 0.0, 0.0); // Tunable: look-at target
// Build camera matrix
vec3 ww = normalize(ta - ro);
vec3 uu = normalize(cross(ww, vec3(0.0, 1.0, 0.0)));
vec3 vv = cross(uu, ww);
// Generate ray direction
float fl = 1.5; // Tunable: focal length, larger = narrower FOV
vec3 rd = normalize(uv.x * uu + uv.y * vv + fl * ww);
```
**Key parameter notes**:
- `ro`: camera position — changing it orbits around the volume
- `ta`: look-at target — the camera points toward this position
- `fl`: focal length — 1.0 ≈ 90° FOV, 1.5 ≈ 67° FOV, 2.0 ≈ 53° FOV
- Normalizing with `iResolution.y` ensures circles don't distort
### Step 2: Volume Boundary Intersection
**What**: Compute distances `tmin`/`tmax` where the ray enters and exits the volume, limiting the marching range.
**Why**: Avoids wasting samples in empty regions. Different volume shapes use different intersection methods.
```glsl
// --- Method A: Horizontal plane boundaries (cloud layers) ---
float yBottom = -1.0; // Tunable: volume bottom Y coordinate
float yTop = 2.0; // Tunable: volume top Y coordinate
float tmin = (yBottom - ro.y) / rd.y;
float tmax = (yTop - ro.y) / rd.y;
if (tmin > tmax) { float tmp = tmin; tmin = tmax; tmax = tmp; } // Swap so tmin <= tmax
// In practice, handle edge cases like ray direction parallel to plane
// --- Method B: Sphere boundary (explosions, fur balls, atmospheres) ---
// Returns intersection distances of ray with sphere centered at origin with radius r
vec2 intersectSphere(vec3 ro, vec3 rd, float r) {
float b = dot(ro, rd);
float c = dot(ro, ro) - r * r;
float d = b * b - c;
if (d < 0.0) return vec2(1e5, -1e5); // No hit
d = sqrt(d);
return vec2(-b - d, -b + d);
}
```
**Selection guide**:
- Use plane boundaries (Method A) for horizontally distributed volumes like cloud layers
- Use sphere intersection (Method B) for spherical volumes like explosions or planetary atmospheres
- AABB (axis-aligned bounding box) intersection can also be used for cuboid-shaped volumes
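The AABB case mentioned above can be handled with the standard slab method. A sketch: `boxMin`/`boxMax` are the volume's corners, there is no hit when the returned `tmin > tmax`, and `tmin` should be clamped to 0 when the camera is inside the box.

```glsl
// Slab-method AABB intersection: intersect the ray with the three
// pairs of axis-aligned planes and take the tightest interval.
vec2 intersectAABB(vec3 ro, vec3 rd, vec3 boxMin, vec3 boxMax) {
    vec3 invD = 1.0 / rd; // IEEE inf handles axis-parallel rays
    vec3 t0 = (boxMin - ro) * invD;
    vec3 t1 = (boxMax - ro) * invD;
    vec3 tsmall = min(t0, t1);
    vec3 tbig   = max(t0, t1);
    float tmin = max(max(tsmall.x, tsmall.y), tsmall.z);
    float tmax = min(min(tbig.x, tbig.y), tbig.z);
    return vec2(tmin, tmax);
}
```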
### Step 3: Density Field Definition
**What**: Define the medium density at each point in space. This is the most core and flexible part of volumetric rendering.
**Why**: The density field determines the volume's shape, texture, and dynamic characteristics. Different density functions produce completely different visual effects.
```glsl
// 3D Value Noise (classic texture-lookup-based implementation)
float noise(vec3 x) {
vec3 p = floor(x);
vec3 f = fract(x);
f = f * f * (3.0 - 2.0 * f); // smoothstep interpolation
vec2 uv = (p.xy + vec2(37.0, 239.0) * p.z) + f.xy;
vec2 rg = textureLod(iChannel0, (uv + 0.5) / 256.0, 0.0).yx;
return mix(rg.x, rg.y, f.z);
}
// fBM (Fractal Brownian Motion) — layering multiple frequency noises
float fbm(vec3 p) {
float f = 0.0;
f += 0.50000 * noise(p); p *= 2.02;
f += 0.25000 * noise(p); p *= 2.03;
f += 0.12500 * noise(p); p *= 2.01;
f += 0.06250 * noise(p); p *= 2.02;
f += 0.03125 * noise(p);
return f;
}
// Cloud density function example
float cloudDensity(vec3 p) {
vec3 q = p - vec3(0.0, 0.1, 1.0) * iTime; // Wind direction animation
float f = fbm(q);
// Use Y coordinate to limit cloud height range
return clamp(1.5 - p.y - 2.0 + 1.75 * f, 0.0, 1.0);
}
```
**Density field design points**:
- The `noise` function uses texture lookup (`iChannel0`) to implement 3D value noise, faster than pure arithmetic implementations
- `fbm` layers 5 octaves of noise to produce natural fractal detail
- Non-integer frequency multipliers (2.02, 2.03) break repetitiveness
- In `cloudDensity`, `1.5 - p.y - 2.0` establishes a base density field that decreases with height
- Time offset `iTime` produces a wind-blown effect
### Step 4: Ray Marching Main Loop
**What**: March along the ray from `tmin` to `tmax`, sampling density at each step and accumulating color and opacity.
**Why**: This is the core loop of volumetric rendering. Step count and step size directly affect quality and performance.
```glsl
#define NUM_STEPS 64 // Tunable: march steps, more = finer
#define STEP_SIZE 0.05 // Tunable: fixed step size (or use adaptive)
vec4 raymarch(vec3 ro, vec3 rd, float tmin, float tmax, vec3 bgCol) {
vec4 sum = vec4(0.0); // rgb = accumulated color (premultiplied alpha), a = accumulated opacity
// Jitter starting position to eliminate banding artifacts
float t = tmin + STEP_SIZE * fract(sin(dot(gl_FragCoord.xy, vec2(12.9898, 78.233))) * 43758.5453); // gl_FragCoord: per-pixel jitter seed
for (int i = 0; i < NUM_STEPS; i++) {
if (t > tmax || sum.a > 0.99) break; // Early exit: out of range or fully opaque
vec3 pos = ro + t * rd;
float den = cloudDensity(pos);
if (den > 0.01) {
// --- Color and lighting (see Step 5) ---
vec4 col = vec4(1.0, 0.95, 0.8, den); // Placeholder color
// Opacity scaling
col.a *= 0.4; // Tunable: density scale factor
// Can also multiply by step size: col.a = min(col.a * 8.0 * dt, 1.0);
// Premultiply alpha and front-to-back compositing
col.rgb *= col.a;
sum += col * (1.0 - sum.a);
}
t += STEP_SIZE;
// Adaptive step variant: t += max(0.05, 0.02 * t);
}
return clamp(sum, 0.0, 1.0);
}
```
**Key design decisions**:
- **Steps vs step size**: fixed step count suits known volume sizes; fixed step size suits uncertain volume sizes
- **Jittering**: without jittering, visible banding artifacts appear; adding pixel-dependent random offset converts banding into invisible noise
- **Early exit condition**: `sum.a > 0.99` is one of the most important performance optimizations
- **Density threshold**: `den > 0.01` skips empty regions, avoiding unnecessary lighting calculations
- **Adaptive step size**: `max(0.05, 0.02 * t)` gives small steps up close (good detail) and large steps at distance (fast)
### Step 5: Lighting Calculation
**What**: Compute lighting color for each sample point within the volume.
**Why**: Lighting is the determining factor for visual quality in volumetric rendering. Different lighting models suit different scenarios.
```glsl
// === Method A: Directional derivative lighting (simplest, single extra sample) ===
// Classic directional derivative method, requires only 1 extra noise sample
vec3 sundir = normalize(vec3(1.0, 0.0, -1.0)); // Tunable: sun direction
float dif = clamp((den - cloudDensity(pos + 0.3 * sundir)) / 0.6, 0.0, 1.0);
vec3 lin = vec3(1.0, 0.6, 0.3) * dif + vec3(0.91, 0.98, 1.05); // Sunlight color + sky light
```
**Method A details**: Estimates lighting by comparing density at the current point with an offset position along the light direction. The direction where density decreases indicates the light source. This is an approximate method — extremely fast but not very physically accurate. Suitable for stylized clouds or performance-critical scenarios.
```glsl
// === Method B: Volumetric shadow (secondary ray march) ===
// Volumetric shadow (Frostbite-style)
float volumetricShadow(vec3 from, vec3 lightDir) {
float shadow = 1.0;
float dt = 0.5; // Tunable: shadow step size
float d = dt * 0.5;
for (int s = 0; s < 6; s++) { // Tunable: shadow steps (6-16)
vec3 pos = from + lightDir * d;
float muE = cloudDensity(pos);
shadow *= exp(-muE * dt); // Beer-Lambert
dt *= 1.3; // Tunable: step size increase factor
d += dt;
}
return shadow;
}
```
**Method B details**: For each sample point, performs a second ray march toward the light source, accumulating transmittance. This is the more physically accurate method but computationally expensive (each primary step requires an additional 6-16 shadow steps). The increasing step size (`dt *= 1.3`) is because distant regions contribute less to shadowing.
```glsl
// === Method C: Henyey-Greenstein phase function scattering ===
float HenyeyGreenstein(float cosTheta, float g) {
float gg = g * g;
return (1.0 - gg) / pow(1.0 + gg - 2.0 * g * cosTheta, 1.5);
}
// Mix forward and backward scattering
float sundotrd = dot(rd, -sundir);
float scattering = mix(
HenyeyGreenstein(sundotrd, 0.8), // Tunable: forward scattering g value
HenyeyGreenstein(sundotrd, -0.2), // Tunable: backward scattering g value
0.5 // Tunable: blend ratio
);
```
**Method C details**: The phase function describes the probability distribution of light scattering in different directions. The dual-lobe HG function mixes forward and backward scattering, simulating the cloud silver lining effect (forward scattering lobe) and dark-side volume definition (backward scattering lobe). Forward scattering with `g=0.8` makes the lit side very bright — an important visual characteristic of real clouds.
### Step 6: Color Mapping
**What**: Map density values to colors.
**Why**: Different media (clouds, flames, explosions) require different coloring strategies.
```glsl
// === Method A: Density interpolation coloring (clouds) ===
vec3 cloudColor = mix(vec3(1.0, 0.95, 0.8), // Lit side color (tunable)
vec3(0.25, 0.3, 0.35), // Dark side color (tunable)
den);
```
**Method A details**: Low density areas show bright color (near white, simulating thin cloud translucency), high density areas show dark color (gray-blue, simulating thick cloud light blocking). Simple and efficient.
```glsl
// === Method B: Radial gradient coloring (explosions, flames) ===
vec3 computeColor(float density, float radius) {
vec3 result = mix(vec3(1.0, 0.9, 0.8),
vec3(0.4, 0.15, 0.1), density);
vec3 colCenter = 7.0 * vec3(0.8, 1.0, 1.0); // Tunable: core highlight color
vec3 colEdge = 1.5 * vec3(0.48, 0.53, 0.5); // Tunable: edge color
result *= mix(colCenter, colEdge, min(radius / 0.9, 1.15));
return result;
}
```
**Method B details**: Explosion/flame cores are extremely bright (HDR values > 1.0, multiplied by 7.0), while edges are darker. Both density and distance from center determine the color. The core color multiplied by 7.0 creates an overexposure effect that, combined with post-processing tone mapping, produces a searing heat look.
```glsl
// === Method C: Height-based ambient gradient (production-grade clouds) ===
vec3 ambientLight = mix(
vec3(39., 67., 87.) * (1.5 / 255.), // Bottom ambient color (tunable)
vec3(149., 167., 200.) * (1.5 / 255.), // Top ambient color (tunable)
normalizedHeight
);
```
**Method C details**: Real cloud bottoms are darker blue (receiving ground reflection and sky scattering), while tops are brighter gray-blue (receiving more sky light). Using normalized height for interpolation produces a natural vertical gradient.
### Step 7: Final Compositing and Post-Processing
**What**: Blend volumetric rendering results with the background, applying tone mapping and post-processing.
**Why**: Post-processing significantly affects final visual quality.
```glsl
// Background sky
vec3 bgCol = vec3(0.6, 0.71, 0.75) - rd.y * 0.2 * vec3(1.0, 0.5, 1.0);
float sun = clamp(dot(sundir, rd), 0.0, 1.0);
bgCol += 0.2 * vec3(1.0, 0.6, 0.1) * pow(sun, 8.0); // Sun halo
// Composite volume with background
vec4 vol = raymarch(ro, rd, tmin, tmax, bgCol);
vec3 col = bgCol * (1.0 - vol.a) + vol.rgb;
// Sun flare
col += vec3(0.2, 0.08, 0.04) * pow(sun, 3.0);
// Tone mapping (simple smoothstep version)
col = smoothstep(0.15, 1.1, col);
// Optional: distance fog (inside the marching loop)
// col.xyz = mix(col.xyz, bgCol, 1.0 - exp(-0.003 * t * t));
// Optional: vignette
float vignette = 0.25 + 0.75 * pow(16.0 * uv.x * uv.y * (1.0 - uv.x) * (1.0 - uv.y), 0.1);
col *= vignette;
```
**Post-processing details**:
- **Sky gradient**: `rd.y` controls sky color variation from horizon to zenith
- **Sun halo**: `pow(sun, 8.0)` produces a narrow, bright halo; higher exponent = narrower halo
- **Sun flare**: `pow(sun, 3.0)` produces a wider warm-colored flare
- **Distance fog**: `exp(-0.003 * t * t)` gradually blends distant volumes into the background
- **Tone mapping**: `smoothstep(0.15, 1.1, col)` lifts shadows, compresses highlights, and increases contrast
- **Vignette**: simulates lens vignette effect, guiding visual focus to the center of the frame
## Variant Details
### Variant 1: Emissive Volume (Flames/Explosions)
**Difference from the base version**: No external light source; color is entirely determined by density and position. Density maps to emissive color.
**Design concept**: Flames and explosions are self-luminous — no external lighting calculation needed. The core region is extremely bright (HDR), while edges are dim. Color is mapped through a combination of density and distance from center. Bloom effects are achieved by adding distance-attenuated light source contributions in the accumulation loop.
**Key code**:
```glsl
// Replace lighting calculation with emissive color mapping
vec3 emissionColor(float density, float radius) {
vec3 result = mix(vec3(1.0, 0.9, 0.8), vec3(0.4, 0.15, 0.1), density);
vec3 colCenter = 7.0 * vec3(0.8, 1.0, 1.0);
vec3 colEdge = 1.5 * vec3(0.48, 0.53, 0.5);
result *= mix(colCenter, colEdge, min(radius / 0.9, 1.15));
return result;
}
// Use bloom effect in the accumulation loop
vec3 lightColor = vec3(1.0, 0.5, 0.25);
sum.rgb += lightColor / exp(lDist * lDist * lDist * 0.08) / 30.0; // lDist: sample-to-light distance
```
### Variant 2: Physical Scattering Atmosphere (Rayleigh + Mie)
**Difference from the base version**: Uses nested ray marching to compute optical depth; separates Rayleigh and Mie scattering channels; uses precise Beer-Lambert transmittance.
**Design concept**: Atmospheric scattering requires handling two scattering mechanisms separately:
- **Rayleigh scattering**: wavelength-dependent (shorter wavelengths scatter more), producing the blue sky effect. Scattering coefficient proportional to λ⁻⁴.
- **Mie scattering**: wavelength-independent, primarily caused by aerosols/large particles, producing the orange-red of sunsets and white halos around the sun.
Density decreases exponentially with altitude, using different scale height parameters to control the altitude distribution of both scattering types. Nested ray marching (marching toward the sun for each sample point) computes optical depth for precise Beer-Lambert transmittance.
**Key code**:
```glsl
// Atmospheric density decreases exponentially with altitude
float density(vec3 p, float scaleHeight) {
return exp(-max(length(p) - R_INNER, 0.0) / scaleHeight);
}
// Nested ray march to compute optical depth
float opticDepth(vec3 from, vec3 to, float scaleHeight) {
vec3 s = (to - from) / float(NUM_STEPS_LIGHT);
vec3 v = from + s * 0.5;
float sum = 0.0;
for (int i = 0; i < NUM_STEPS_LIGHT; i++) {
sum += density(v, scaleHeight);
v += s;
}
return sum * length(s);
}
// Rayleigh phase function
float phaseRayleigh(float cc) { return (3.0 / 16.0 / PI) * (1.0 + cc); }
// Combined Rayleigh + Mie
vec3 scatter = sumRay * kRay * phaseRayleigh(cc) + sumMie * kMie * phaseMie(-0.78, c, cc);
```
### Variant 3: Frostbite Energy-Conserving Integration
**Difference from the base version**: Uses an improved scattering integration formula that maintains energy conservation in strongly scattering media.
**Design concept**: Naive Euler integration `S × dt` is inaccurate at large step sizes or in dense media. The Frostbite formula performs precise exponential integration for each step's scattering, ensuring that the sum of accumulated scattering and transmittance never exceeds the incident light regardless of step size. This is especially important for dense fog, volumetric lighting, and similar scenarios.
**Key code**:
```glsl
// Replace naive integration with Frostbite formula
vec3 S = evaluateLight(p) * sigmaS * phaseFunction() * volumetricShadow(p, lightPos);
vec3 Sint = (S - S * exp(-sigmaE * dt)) / sigmaE; // Improved integration
scatteredLight += transmittance * Sint;
transmittance *= exp(-sigmaE * dt);
```
### Variant 4: Production-Grade Clouds (Horizon Zero Dawn Style)
**Difference from the base version**: Uses Perlin-Worley noise textures instead of procedural noise; layered density modeling (base shape + detail erosion); dual-lobe HG phase function; temporal reprojection anti-aliasing.
**Design concept**: Production-grade cloud rendering uses a layered approach:
1. **Low-frequency shape layer** (`cloudMapBase`): uses Perlin-Worley 3D texture to define the rough cloud shape
2. **Height gradient** (`cloudGradient`): controls density distribution with altitude based on cloud type (cumulus, stratus, etc.)
3. **High-frequency detail layer** (`cloudMapDetail`): higher frequency noise erodes edges, adding detail
4. **Coverage control** (`COVERAGE`): global parameter controlling the proportion of cloud coverage in the sky
Temporal reprojection is key to the production-grade approach: each frame renders only 1/16 of pixels (checkerboard pattern), then reprojects results to the current frame. Combined with 95% historical frame blending, it achieves high-quality results with minimal marching steps.
**Key code**:
```glsl
// Layered noise modeling
float m = cloudMapBase(pos, norY); // Low-frequency shape
m *= cloudGradient(norY); // Height gradient
m -= cloudMapDetail(pos) * dstrength * 0.225; // High-frequency detail erosion
m = smoothstep(0.0, 0.1, m + (COVERAGE - 1.0));
// Dual-lobe HG scattering
float scattering = mix(
HenyeyGreenstein(sundotrd, 0.8), // Forward
HenyeyGreenstein(sundotrd, -0.2), // Backward
0.5
);
// Temporal reprojection (between Buffers)
vec2 spos = reprojectPos(ro + rd * dist, iResolution.xy, iChannel1);
vec4 ocol = texture(iChannel1, spos, 0.0);
col = mix(ocol, col, 0.05); // 5% new frame + 95% history frame
```
### Variant 5: Gradient Normal Surface Lighting (Fur Ball / Volume Surface)
**Difference from the base version**: Uses central differencing to compute gradient normals within the volume, then applies diffuse + specular lighting as if it were a surface. Suitable for volume objects with a clear "surface" feel (fur, translucent spheres).
**Design concept**: Some volume objects (fur balls, fuzzy surfaces) are volumetric data but visually resemble surfaced objects. In this case, central differencing in the density field computes the gradient (the direction of fastest density change), which serves as the normal for traditional surface lighting models.
- **Half-Lambert**: `dot(N, L) * 0.5 + 0.5` compresses the dark side range, simulating subsurface scattering
- **Blinn-Phong**: provides specular reflection, adding material definition
**Key code**:
```glsl
// Central differencing for normals
vec3 furNormal(vec3 pos, float density) {
float eps = 0.01;
vec3 n;
n.x = sampleDensity(pos + vec3(eps, 0, 0)) - density;
n.y = sampleDensity(pos + vec3(0, eps, 0)) - density;
n.z = sampleDensity(pos + vec3(0, 0, eps)) - density;
return normalize(n);
}
// Half-Lambert diffuse + Blinn-Phong specular
vec3 N = -furNormal(pos, density);
float diff = max(0.0, dot(N, L) * 0.5 + 0.5); // Half-Lambert
float spec = pow(max(0.0, dot(N, H)), 50.0); // Tunable: specular sharpness
```
## In-Depth Performance Optimization
### 1. Early Ray Termination
Immediately break from the loop when accumulated opacity exceeds a threshold (e.g., 0.99). This is the most important optimization — used by all analyzed shaders.
**Effect**: For dense volumes (such as thick cloud layers), many rays can exit within 20-30 steps instead of completing all 80+ steps, achieving 2-4x performance improvement.
### 2. LOD Noise
Reduce the fBM octave count based on ray distance. Distant areas don't need high-frequency detail:
```glsl
int lod = 5 - int(log2(1.0 + t * 0.5));
```
**Effect**: Distant areas use only 2-3 fBM octaves (vs 5 up close), reducing noise sampling by 40-60%. Since distant pixels cover a larger spatial range, high-frequency detail wouldn't be visible anyway.
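Combined with a runtime octave count, the fBM from the implementation steps can be sketched as follows (the fixed loop bound keeps older GLSL ES compilers happy; `noise` is the texture-based 3D value noise defined in Step 3):

```glsl
// FBM with a runtime octave count, so distant samples do less work.
float fbmLod(vec3 p, int octaves) {
    float f = 0.0;
    float amp = 0.5;
    for (int i = 0; i < 5; i++) { // fixed bound, early break below
        if (i >= octaves) break;
        f += amp * noise(p);
        p *= 2.02;
        amp *= 0.5;
    }
    return f;
}
// Usage inside the march:
// float den = fbmLod(pos, 5 - int(log2(1.0 + t * 0.5)));
```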
### 3. Adaptive Step Size
Small steps up close (fine detail), large steps at distance (speed):
```glsl
float dt = max(0.05, 0.02 * t);
```
**Effect**: Significantly reduces the number of distant steps without noticeably degrading near-field quality. However, abrupt step size changes may cause visual discontinuities.
### 4. Dithering
Add pixel-dependent random offset at the ray starting position to eliminate stepping banding artifacts:
```glsl
t += STEP_SIZE * hash(fragCoord);
```
**Note**: Dithering doesn't improve performance but significantly improves visual quality — converting visible banding artifacts into imperceptible high-frequency noise.
### 5. Bounding Volume Clipping
Only march within the interval where the ray intersects the volume (plane clipping, sphere intersection, AABB clipping).
**Effect**: For volumes that occupy a small portion of the screen, many rays can skip marching entirely. Performance improvement depends on the volume's screen coverage area.
### 6. Density Threshold Skip
Skip lighting calculations when density is below a threshold (lighting is often the most expensive part):
```glsl
if (den > 0.01) { /* compute lighting and compositing */ }
```
**Effect**: Lighting calculations (especially secondary volumetric shadow marching) are the most time-consuming part. Skipping lighting for low-density regions saves significant computation.
### 7. Minimal Shadow Step Count
Volumetric self-shadow step counts can be far fewer than the main loop (6-16 steps suffice), with increasing step sizes to cover greater distances.
**Reason**: Human eyes are less sensitive to shadow detail than to shape detail. 6 steps with 1.3x increasing step size can cover approximately 20 units of distance.
### 8. Temporal Reprojection
Reproject the previous frame's results to the current frame for blending, dramatically reducing the required marching steps per frame.
**Typical configuration**: Using only 12 steps + 95% historical frame blending (`mix(oldColor, newColor, 0.05)`) can produce quality far exceeding 12-step single-frame rendering.
**Caveats**:
- Requires an additional Buffer for storing the historical frame
- Fast motion may cause ghosting
- Requires correct reprojection matrix handling for camera movement
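A minimal sketch of the blend step in a self-referencing Buffer pass, assuming a static camera and a hypothetical `marchClouds` helper (a moving camera additionally needs the previous frame's view-projection matrix to reproject `uv` before the history fetch):

```glsl
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
    vec2 uv = fragCoord / iResolution.xy;
    vec3 newColor = marchClouds(uv, 12);          // low-step march (assumed helper)
    vec3 oldColor = texture(iChannel0, uv).rgb;   // this buffer's previous frame
    fragColor = vec4(mix(oldColor, newColor, 0.05), 1.0); // keep 95% history
}
```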
## Combination Suggestions
### 1. SDF Terrain + Volumetric Clouds
Render ground/mountains with SDF ray marching, then render cloud layers above using volumetric marching. The two mutually occlude through depth values.
**Implementation points**:
- Render SDF terrain first, recording hit depth
- During volumetric marching, stop at the depth value (ground occludes clouds)
- If the ray passes through the cloud layer before hitting the ground, march within the cloud interval and terminate at the ground
### 2. Volumetric Fog + Scene Lighting
Overlay volumetric fog on an existing SDF/polygon scene by applying `color = color * transmittance + scatteredLight` to the already-rendered result.
**Implementation points**:
- After rendering the scene, march fog along the ray for each pixel
- Accumulate fog scattering and transmittance
- Final color = scene color × transmittance + fog scattered light
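A minimal sketch of the three points above; `fogDensity` and `fogColor` are assumed scene parameters, and `sceneT` is the hit depth returned by the scene's own march:

```glsl
// Post-scene fog overlay: march 32 fog steps from the camera to the surface.
vec3 applyFog(vec3 sceneCol, vec3 ro, vec3 rd, float sceneT) {
    float transmittance = 1.0;
    vec3 scattered = vec3(0.0);
    float dt = sceneT / 32.0;
    for (int i = 0; i < 32; i++) {
        vec3 p = ro + rd * (float(i) + 0.5) * dt;    // mid-step sample point
        float absorb = exp(-fogDensity(p) * dt);      // per-step transmittance
        scattered += transmittance * (1.0 - absorb) * fogColor;
        transmittance *= absorb;
    }
    return sceneCol * transmittance + scattered;       // composite over the scene
}
```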
### 3. Multi-Layer Volumes
Different heights or regions use different density functions (e.g., high-altitude cumulus + low-altitude fog layer), each marched independently then composited.
**Implementation points**:
- Each layer has its own boundaries and density function
- Can be processed in the same marching loop (checking which layer the current point is in), or marched separately then composited
- Separate marching is more flexible but requires correct inter-layer occlusion handling
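A sketch of the single-loop approach, with hypothetical band limits and per-layer density helpers (`cloudDensity`, `fogDensity`):

```glsl
// One density function that switches by height band inside the shared march loop.
float layeredDensity(vec3 p) {
    if (p.y > 40.0 && p.y < 80.0) return cloudDensity(p); // high cumulus band
    if (p.y > 0.0  && p.y < 10.0) return fogDensity(p);   // ground fog band
    return 0.0;                                           // empty between layers
}
```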
### 4. Particle System + Volume
Particles provide macro-scale motion and shape; volumetric rendering adds internal detail and lighting to particles.
### 5. Post-Process Light Shafts (God Rays)
After volumetric rendering, add light shaft effects using radial blur or screen-space ray marching to enhance volume definition.
**Implementation points**:
- In screen space, sample radially outward from the sun position, accumulating brightness
- Or for each pixel, march a short distance along the light source direction, sampling occluder depth
- Light shaft intensity is multiplied by the dot product of light direction and view direction to control visible angles
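A sketch of the radial-accumulation variant; `sunUV` is the sun's projected screen position and `iChannel0` is assumed to hold the already-rendered scene:

```glsl
// Screen-space god rays: accumulate 32 samples stepping toward the sun.
vec3 godRays(vec2 uv, vec2 sunUV) {
    vec2 delta = (sunUV - uv) / 32.0;
    vec3 acc = vec3(0.0);
    float decay = 1.0;
    for (int i = 0; i < 32; i++) {
        uv += delta;
        acc += texture(iChannel0, uv).rgb * decay;
        decay *= 0.95;                    // falloff along the shaft
    }
    return acc / 32.0;
}
```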
### 6. Procedural Sky + Volumetric Clouds
First render a procedural sky/atmospheric scattering as background, then overlay volumetric clouds on top. The transition between the two is achieved through distance fog for natural blending.
**Implementation points**:
- Use an atmospheric scattering model (Variant 2) or a simplified gradient model for the sky
- Apply distance fog within the volumetric marching loop: `mix(litCol, bgCol, 1.0 - exp(-0.003 * t * t))`
- Distant clouds naturally blend into the sky color, avoiding abrupt boundaries

# Voronoi & Cellular Noise — Detailed Reference
This document is a detailed supplement to [SKILL.md](SKILL.md), containing prerequisites, step-by-step explanations, variant descriptions, performance analysis, and complete combination code.
## Prerequisites
- **GLSL Basic Syntax**: `vec2/vec3`, `floor/fract`, `dot`, `smoothstep` and other built-in functions
- **Vector Math**: dot product, distance calculation, vector normalization
- **Pseudo-Random Hash Function Concepts**: input coordinates -> pseudo-random values, deterministic but appearing random
- **fBm (Fractional Brownian Motion) Basics**: multi-layer noise summation, used for advanced variants
## Core Principles in Detail
The essence of Voronoi noise is **spatial partitioning**: scatter a set of feature points across 2D/3D space, and each pixel belongs to the "cell" defined by its nearest feature point.
**Core Algorithm Flow:**
1. Divide space into an integer grid (`floor`), placing one randomly offset feature point in each grid cell
2. For the current pixel, search all feature points in the surrounding 3x3 (2D) or 3x3x3 (3D) neighborhood
3. Calculate the distance to each feature point, recording the nearest distance F1 (and optionally the second-nearest distance F2)
4. Use F1, F2, or their combination (e.g., F2-F1) as the output value, mapping to color/height/shape
**Key Mathematics:**
- Distance metrics: Euclidean `length(r)` or `dot(r,r)` (squared distance, faster), Manhattan `abs(r.x)+abs(r.y)`, Chebyshev `max(abs(r.x), abs(r.y))`
- Exact border distance (two-pass algorithm): `dot(0.5*(mr+r), normalize(r-mr))` (perpendicular bisector projection)
- Rounded borders (harmonic mean): `1/(1/(d2-d1) + 1/(d3-d1))`
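As a side-by-side of the distance metrics listed above, for a cell-relative vector `r` inside the search loop (swapping the metric changes the cell shape):

```glsl
float dEuclid  = length(r);               // round cells (needs sqrt)
float dSquared = dot(r, r);               // same ordering as Euclidean, no sqrt
float dManhat  = abs(r.x) + abs(r.y);     // diamond-shaped cells
float dCheby   = max(abs(r.x), abs(r.y)); // square, blocky cells
```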
## Implementation Steps — Detailed Explanation
### Step 1: Hash Function — Generating Pseudo-Random Feature Points
**What**: Define a hash function that maps 2D integer coordinates to a pseudo-random `vec2` in the [0,1] range.
**Why**: Feature point positions within each grid cell need to be deterministic but appear random. Hash functions provide this "reproducible randomness". Different hash functions affect distribution uniformity and visual quality.
**Code**:
```glsl
// Classic sin-dot hash (concise and efficient, suitable for most scenarios)
vec2 hash2(vec2 p) {
p = vec2(dot(p, vec2(127.1, 311.7)),
dot(p, vec2(269.5, 183.3)));
return fract(sin(p) * 43758.5453);
}
// 3D version (for 3D Voronoi)
vec3 hash3(vec3 p) {
float n = sin(dot(p, vec3(7.0, 157.0, 113.0)));
return fract(vec3(2097152.0, 262144.0, 32768.0) * n);
}
// High-quality integer hash (more uniform distribution, for production-grade noise)
vec3 hash3_uint(vec3 p) {
uvec3 q = uvec3(ivec3(p)) * uvec3(1597334673U, 3812015801U, 2798796415U);
q = (q.x ^ q.y ^ q.z) * uvec3(1597334673U, 3812015801U, 2798796415U);
return vec3(q) / float(0xffffffffU);
}
```
### Step 2: Grid Partitioning and Neighborhood Search — F1 Distance
**What**: Split input coordinates into integer part (grid ID) and fractional part (position within cell), iterate over the 3x3 neighborhood to compute distances to all feature points, and find the nearest distance F1.
**Why**: `floor/fract` discretizes continuous space into a grid. Since feature points are offset within the [0,1] range, the nearest point can only be in the current cell or its 8 neighbors, so a 3x3 search covers all cases.
**Code**:
```glsl
// Basic 2D Voronoi — returns (F1 distance, cell ID)
vec2 voronoi(vec2 x) {
vec2 n = floor(x); // Current grid coordinate
vec2 f = fract(x); // Offset within cell [0,1)
vec3 m = vec3(8.0); // (min distance, corresponding hash value) — initialized to large value
for (int j = -1; j <= 1; j++)
for (int i = -1; i <= 1; i++) {
vec2 g = vec2(float(i), float(j)); // Neighbor offset
vec2 o = hash2(n + g); // Feature point position in that cell [0,1)
vec2 r = g - f + o; // Vector from current pixel to that feature point
float d = dot(r, r); // Squared distance (avoids sqrt)
if (d < m.x) {
m = vec3(d, o); // Update nearest distance and cell ID
}
}
return vec2(sqrt(m.x), m.y + m.z); // (distance, ID)
}
```
### Step 3: F1 + F2 Tracking — Edge Detection
**What**: Simultaneously record the nearest distance F1 and second-nearest distance F2 during the search, using F2-F1 to extract cell boundaries.
**Why**: The value of F2-F1 is large inside cells (far from boundaries) and approaches 0 at cell junctions (two feature points equidistant). This is the most common Voronoi edge detection method.
**Code**:
```glsl
// F1 + F2 Voronoi — returns vec2(F1, F2)
vec2 voronoi_f1f2(vec2 x) {
vec2 p = floor(x);
vec2 f = fract(x);
vec2 res = vec2(8.0); // res.x = F1, res.y = F2
for (int j = -1; j <= 1; j++)
for (int i = -1; i <= 1; i++) {
vec2 b = vec2(i, j);
vec2 r = b - f + hash2(p + b);
float d = dot(r, r); // Can substitute other distance metrics
if (d < res.x) {
res.y = res.x; // Previous F1 becomes F2
res.x = d; // Update F1
} else if (d < res.y) {
res.y = d; // Update F2
}
}
res = sqrt(res);
return res;
// Edge value = res.y - res.x (F2 - F1)
}
```
### Step 4: Exact Border Distance — Two-Pass Algorithm
**What**: First pass finds the nearest feature point; second pass calculates the exact distance to all neighboring cell boundaries.
**Why**: Simple F2-F1 is only an approximation of the boundary. For geometrically exact equidistant lines and smooth boundary rendering, the distance to the perpendicular bisector must be computed. The second pass requires a 5x5 search range to ensure geometric correctness.
**Code**:
```glsl
// Exact border distance Voronoi — returns vec3(border distance, nearest point offset)
vec3 voronoi_border(vec2 x) {
vec2 ip = floor(x);
vec2 fp = fract(x);
// === Pass 1: Find nearest feature point ===
vec2 mg, mr;
float md = 8.0;
for (int j = -1; j <= 1; j++)
for (int i = -1; i <= 1; i++) {
vec2 g = vec2(float(i), float(j));
vec2 o = hash2(ip + g);
vec2 r = g + o - fp;
float d = dot(r, r);
if (d < md) {
md = d;
mr = r; // Vector to nearest point
mg = g; // Grid offset of nearest point
}
}
// === Pass 2: Calculate shortest distance to border ===
md = 8.0;
for (int j = -2; j <= 2; j++)
for (int i = -2; i <= 2; i++) {
vec2 g = mg + vec2(float(i), float(j));
vec2 o = hash2(ip + g);
vec2 r = g + o - fp;
// Skip self
if (dot(mr - r, mr - r) > 0.00001)
// Distance to perpendicular bisector = midpoint projected onto direction vector
md = min(md, dot(0.5 * (mr + r), normalize(r - mr)));
}
return vec3(md, mr);
}
```
### Step 5: Feature Point Animation
**What**: Make feature points move smoothly over time, producing organic dynamic effects.
**Why**: Static Voronoi is suitable for texture maps, but real-time effects usually require animation. Using `sin(iTime + 6.2831*hash)` makes each point oscillate at a different phase while staying within the [0,1] range.
**Code**:
```glsl
// Within the neighborhood search loop, replace static hash with animated version:
vec2 o = hash2(n + g);
o = 0.5 + 0.5 * sin(iTime + 6.2831 * o); // Animation: each point has a different phase
vec2 r = g - f + o;
```
### Step 6: Coloring and Visualization
**What**: Map Voronoi distance values to colors, rendering cell fills, border lines, and feature point markers.
**Why**: Different mapping methods produce dramatically different visual effects. Distance values can be used directly as grayscale, or transformed into rich colors through palette functions.
**Code**:
```glsl
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
// Must use iTime, otherwise the compiler optimizes away this uniform
float time = iTime * 1.0;
vec2 p = fragCoord.xy / iResolution.xy;
vec2 uv = p * SCALE; // SCALE controls cell density
// Compute Voronoi
vec2 c = voronoi(uv);
float dist = c.x; // F1 distance
float id = c.y; // Cell ID
// --- Cell coloring (ID-driven palette) ---
vec3 col = 0.5 + 0.5 * cos(id * 6.2831 + vec3(0.0, 1.0, 2.0));
// --- Distance falloff (cell center bright, edges dark) ---
col *= clamp(1.0 - 0.4 * dist * dist, 0.0, 1.0);
// --- Border lines (draw black line when distance below threshold) ---
col -= (1.0 - smoothstep(0.08, 0.09, dist));
fragColor = vec4(col, 1.0);
}
```
## Variant Detailed Descriptions
### Variant 1: 3D Voronoi + fBm Fire
Difference from base version: extends 2D Voronoi to 3D space, multi-layer fBm summation produces volumetric feel, combined with blackbody radiation palette for rendering fire/nebula.
Key modified code:
```glsl
#define NUM_OCTAVES 5 // Tunable: fBm layer count
vec3 hash3(vec3 p) {
float n = sin(dot(p, vec3(7.0, 157.0, 113.0)));
return fract(vec3(2097152.0, 262144.0, 32768.0) * n);
}
float voronoi3D(vec3 p) {
vec3 g = floor(p);
p = fract(p);
float d = 1.0;
for (int j = -1; j <= 1; j++)
for (int i = -1; i <= 1; i++)
for (int k = -1; k <= 1; k++) {
vec3 b = vec3(i, j, k);
vec3 r = b - p + hash3(g + b);
d = min(d, dot(r, r));
}
return d;
}
float fbmVoronoi(vec3 p) {
vec3 t = vec3(0.0, 0.0, p.z + iTime * 1.5);
float tot = 0.0, sum = 0.0, amp = 1.0;
for (int i = 0; i < NUM_OCTAVES; i++) {
tot += voronoi3D(p + t) * amp;
p *= 2.0;
t *= 1.5; // Time frequency differs from spatial frequency -> parallax effect
sum += amp;
amp *= 0.5;
}
return tot / sum;
}
// Blackbody radiation palette
vec3 firePalette(float i) {
float T = 1400.0 + 1300.0 * i;
vec3 L = vec3(7.4, 5.6, 4.4);
L = pow(L, vec3(5.0)) * (exp(1.43876719683e5 / (T * L)) - 1.0);
return 1.0 - exp(-5e8 / L);
}
```
### Variant 2: Rounded Borders (3rd-Order Voronoi)
Difference from base version: simultaneously tracks F1, F2, and F3 (three nearest distances), using a harmonic mean formula to produce smoother, more uniform cell boundaries instead of standard Voronoi's sharp intersections.
Key modified code:
```glsl
float voronoiRounded(vec2 p) {
vec2 g = floor(p);
p -= g;
vec3 d = vec3(1.0); // d.x=F1, d.y=F2, d.z=F3
for (int y = -1; y <= 1; y++)
for (int x = -1; x <= 1; x++) {
vec2 o = vec2(x, y);
o += hash2(g + o) - p;
float r = dot(o, o);
// Maintain top 3 nearest distances simultaneously
d.z = max(d.x, max(d.y, min(d.z, r))); // F3
d.y = max(d.x, min(d.y, r)); // F2
d.x = min(d.x, r); // F1
}
d = sqrt(d);
// Harmonic mean formula -> rounded borders
return min(2.0 / (1.0 / max(d.y - d.x, 0.001)
+ 1.0 / max(d.z - d.x, 0.001)), 1.0);
}
```
### Variant 3: Voronoise (Unified Noise-Voronoi Framework)
Difference from base version: through two parameters `u` (jitter amount) and `v` (smoothness), continuously interpolates between Cell Noise, Perlin Noise, and Voronoi. Uses weighted accumulation instead of `min()` operation, requiring a 5x5 search range.
Key modified code:
```glsl
#define JITTER 1.0 // Tunable: 0=regular grid, 1=fully random
#define SMOOTH 0.0 // Tunable: 0=sharp Voronoi, 1=smooth noise
float voronoise(vec2 p, float u, float v) {
float k = 1.0 + 63.0 * pow(1.0 - v, 6.0); // Smoothness kernel
vec2 i = floor(p);
vec2 f = fract(p);
vec2 a = vec2(0.0);
for (int y = -2; y <= 2; y++)
for (int x = -2; x <= 2; x++) {
vec2 g = vec2(x, y);
vec3 o = hash3(i + g) * vec3(u, u, 1.0); // u controls jitter
vec2 d = g - f + o.xy;
float w = pow(1.0 - smoothstep(0.0, 1.414, length(d)), k);
a += vec2(o.z * w, w); // Weighted accumulation
}
return a.x / a.y;
}
// hash3 needs to return vec3
vec3 hash3(vec2 p) {
vec3 q = vec3(dot(p, vec2(127.1, 311.7)),
dot(p, vec2(269.5, 183.3)),
dot(p, vec2(419.2, 371.9)));
return fract(sin(q) * 43758.5453);
}
```
### Variant 4: Crack Textures (Multi-Layer Recursive Voronoi)
Difference from base version: uses extended jitter range to generate irregular cells, two-pass algorithm for exact boundaries, then overlays Perlin fBm perturbation on crack paths. Multi-layer recursion (rotation + scaling) produces fractal crack networks.
Key modified code:
```glsl
#define CRACK_DEPTH 3.0 // Tunable: recursion depth
#define CRACK_WIDTH 0.0 // Tunable: crack width
#define CRACK_SLOPE 50.0 // Tunable: crack sharpness
// Extended jitter range makes cell shapes more irregular
float ofs = 0.5;
#define disp(p) (-ofs + (1.0 + 2.0 * ofs) * hash2(p))
// Main loop: multi-layer crack overlay
vec4 O = vec4(0.0);
vec2 U = uv;
for (float i = 0.0; i < CRACK_DEPTH; i++) {
vec2 D = fbm22(U) * 0.67; // fBm perturbation of crack paths
vec3 H = voronoiBorder(U + D); // Exact border distance
float d = H.x;
d = min(1.0, CRACK_SLOPE * pow(max(0.0, d - CRACK_WIDTH), 1.0));
O += vec4(1.0 - d) / exp2(i); // Layer weight decay
U *= 1.5 * rot(0.37); // Rotate + scale into next layer
}
```
### Variant 5: Tileable 3D Worley (Cloud Noise)
Difference from base version: implements domain wrapping via `mod()` to generate seamlessly tileable 3D Worley noise. Combined with Perlin-Worley remapping for volumetric cloud rendering. Uses high-quality integer hash.
Key modified code:
```glsl
#define TILE_FREQ 4.0 // Tunable: tiling frequency
float worleyTileable(vec3 uv, float freq) {
vec3 id = floor(uv);
vec3 p = fract(uv);
float minDist = 1e4;
for (float x = -1.0; x <= 1.0; x++)
for (float y = -1.0; y <= 1.0; y++)
for (float z = -1.0; z <= 1.0; z++) {
vec3 offset = vec3(x, y, z);
// mod() implements domain wrapping -> seamless tiling
vec3 h = hash3_uint(mod(id + offset, vec3(freq))) * 0.5 + 0.5;
h += offset;
vec3 d = p - h;
minDist = min(minDist, dot(d, d));
}
return 1.0 - minDist; // Inverted Worley
}
// Worley fBm (GPU Pro 7 cloud approach)
float worleyFbm(vec3 p, float freq) {
return worleyTileable(p * freq, freq) * 0.625
+ worleyTileable(p * freq * 2.0, freq * 2.0) * 0.25
+ worleyTileable(p * freq * 4.0, freq * 4.0) * 0.125;
}
// Perlin-Worley remapping
float remap(float x, float a, float b, float c, float d) {
return (((x - a) / (b - a)) * (d - c)) + c;
}
// cloud = remap(perlinNoise, worleyFbm - 1.0, 1.0, 0.0, 1.0);
```
## Performance Optimization Details
### 1. Avoid sqrt in Distance Comparisons
Use `dot(r,r)` (squared distance) during the comparison phase, only taking `sqrt` for the final output. Saves 9 `sqrt` calls per pixel.
### 2. Unroll 3D Voronoi Loops
GPUs are not efficient with deeply nested loops. The 3x3x3 loop for 3D can be manually unrolled along the z-axis:
```glsl
// Instead of 3-level nesting, manually unroll z=-1, 0, 1
for (int j = -1; j <= 1; j++)
for (int i = -1; i <= 1; i++) {
b = vec3(i, j, -1); r = b - p + hash3(g+b); d = min(d, dot(r,r));
b.z = 0.0; r = b - p + hash3(g+b); d = min(d, dot(r,r));
b.z = 1.0; r = b - p + hash3(g+b); d = min(d, dot(r,r));
}
```
### 3. Minimize Search Range
- Basic F1: 3x3 is sufficient
- Exact border / rounded border: second pass needs 5x5
- Voronoise (smooth blending): needs 5x5 to cover kernel radius
- Extended jitter (`ofs>0`): must use 5x5
- Don't blindly use 5x5; searching 16 extra cells means 16 extra hash computations
### 4. Hash Function Selection
- `sin(dot(...))` hash: fastest, but insufficient precision on some GPUs
- Texture lookup hash (`textureLod(iChannel0, ...)`): high quality but requires texture resources
- Integer hash (`uvec3`): high quality without textures, but requires ES 3.0+
### 5. Layer Count Control for Multi-Layer fBm
Each additional fBm layer adds a complete Voronoi search. 3 layers usually provide sufficient detail, 5 layers is the visual upper limit, and beyond 5 layers is rarely worth the performance cost.
## Combination Suggestions in Detail
### 1. Voronoi + fBm Perturbation
Use fBm noise to perturb Voronoi input coordinates, producing organic, irregular cell shapes (like stone textures, magma):
```glsl
vec2 distorted_uv = uv + 0.5 * fbm22(uv * 2.0);
vec2 v = voronoi(distorted_uv * SCALE);
```
### 2. Voronoi + Bump Mapping
Use Voronoi distance values as a height map, compute normals via finite differences for pseudo-3D bump effects:
```glsl
float h0 = voronoiRounded(uv);
float hx = voronoiRounded(uv + vec2(0.004, 0.0));
float hy = voronoiRounded(uv + vec2(0.0, 0.004));
float bump = max(hx - h0, 0.0) * 16.0; // Simple bump value
```
### 3. Voronoi + Palette Mapping
Use cell ID or distance values to drive the cosine palette, quickly producing rich procedural colors:
```glsl
vec3 palette(float t) {
return 0.5 + 0.5 * cos(6.2831 * (t + vec3(0.0, 0.33, 0.67)));
}
col = palette(cellId * 0.1 + iTime * 0.1);
```
### 4. Voronoi + Raymarching
Use Voronoi distance as part of an SDF in raymarching scenes to sculpt cellular surface textures or crack effects.
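A minimal sketch of the idea, reusing `sdBox` and `voronoi_f1f2` from earlier in this document (the projection onto `p.xy` and the `0.02` depth are illustrative choices):

```glsl
// SDF map with Voronoi surface displacement.
float map(vec3 p) {
    float d = sdBox(p, vec3(1.0));
    vec2 f = voronoi_f1f2(p.xy * 8.0);  // 2D Voronoi projected onto one face
    d += (f.y - f.x) * 0.02;            // F2-F1 ~ 0 at edges: interiors are carved,
    return d;                           // leaving a ridge network along cell borders
}
```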
### 5. Multi-Scale Voronoi Stacking
Compute multiple Voronoi layers at different frequencies and stack them for rich detail. Low-frequency layers control large structures, high-frequency layers add fine detail:
```glsl
float detail = voronoiRounded(uv * 6.0); // Main structure
float fine = voronoiRounded(uv * 16.0) * 0.5; // Fine detail
float result = detail + fine * detail; // Stacking (detail modulated by main structure)
```

# Voxel Rendering — Detailed Reference
> This document is a detailed supplement to [SKILL.md](SKILL.md), covering prerequisites, step-by-step tutorials, mathematical derivations, and advanced usage.
## Prerequisites
### GLSL Fundamentals
- GLSL basic syntax (uniforms, varyings, built-in functions)
- Vector math: dot product, cross product, normalize, reflect
- Understanding of step functions like `floor()`, `sign()`, `step()`
### Ray-AABB Intersection (Ray-Box Intersection)
The foundation of voxel rendering is ray tracing. You need to understand how a ray `P(t) = O + t * D` intersects with an axis-aligned bounding box (AABB). The DDA algorithm is essentially an extension of this test to the entire grid space.
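The standard slab method, as a reference implementation (assumes no component of `rd` is exactly zero):

```glsl
// Slab-method ray-AABB test: returns (tNear, tFar); tNear > tFar means a miss.
vec2 intersectAABB(vec3 ro, vec3 rd, vec3 boxMin, vec3 boxMax) {
    vec3 t1 = (boxMin - ro) / rd;     // entry/exit t per axis pair of slabs
    vec3 t2 = (boxMax - ro) / rd;
    vec3 tMin = min(t1, t2);
    vec3 tMax = max(t1, t2);
    float tNear = max(max(tMin.x, tMin.y), tMin.z); // latest entry
    float tFar  = min(min(tMax.x, tMax.y), tMax.z); // earliest exit
    return vec2(tNear, tFar);
}
```

DDA traversal over a bounded grid typically starts from `max(tNear, 0.0)` of this interval.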
### Basic Lighting Models
- Lambert diffuse: `diffuse = max(dot(normal, lightDir), 0.0)`
- Phong specular: `specular = pow(max(dot(reflect(-lightDir, normal), viewDir), 0.0), shininess)`
### SDF (Signed Distance Field) Basics
An SDF function returns the signed distance from a point to the nearest surface (negative inside, positive outside). In voxel rendering, SDF is commonly used to define voxel occupancy: `d < 0.0` means occupied.
Common SDF primitives:
```glsl
float sdSphere(vec3 p, float r) { return length(p) - r; }
float sdBox(vec3 p, vec3 b) {
vec3 d = abs(p) - b;
return min(max(d.x, max(d.y, d.z)), 0.0) + length(max(d, 0.0));
}
```
SDF boolean operations:
- Union: `min(d1, d2)`
- Intersection: `max(d1, d2)`
- Subtraction: `max(d1, -d2)`
## Implementation Steps
### Step 1: Camera Ray Construction
**What**: Convert each pixel coordinate into a world-space ray origin and direction.
**Why**: Voxel rendering follows the ray tracing paradigm, with each pixel independently casting a ray. Screen coordinates must first be normalized to the [-1, 1] range, then transformed through camera parameters (focal length, plane vectors) to construct world-space ray directions.
**Mathematical derivation**:
1. `screenPos = (fragCoord.xy / iResolution.xy) * 2.0 - 1.0` normalizes pixel coordinates to [-1, 1]
2. The z component of `cameraDir` controls focal length: larger values = smaller FOV (more "telephoto")
3. `cameraPlaneV` is multiplied by aspect ratio correction to ensure square voxels aren't stretched
4. Final ray direction = camera forward + screen offset, no normalization needed (the DDA algorithm handles it naturally)
**Code**:
```glsl
vec2 screenPos = (fragCoord.xy / iResolution.xy) * 2.0 - 1.0;
vec3 cameraDir = vec3(0.0, 0.0, 0.8); // Tunable: focal length, larger = smaller FOV
vec3 cameraPlaneU = vec3(1.0, 0.0, 0.0);
vec3 cameraPlaneV = vec3(0.0, 1.0, 0.0) * iResolution.y / iResolution.x;
vec3 rayDir = cameraDir + screenPos.x * cameraPlaneU + screenPos.y * cameraPlaneV;
vec3 rayPos = vec3(0.0, 2.0, -12.0); // Tunable: camera position
```
### Step 2: DDA Initialization
**What**: Compute the initial parameters needed for grid traversal by the ray.
**Why**: The DDA algorithm requires precomputing the step direction, step cost, and distance to the first boundary for each axis. These values are incrementally updated throughout traversal, avoiding per-step division.
**Key variable details**:
- **`mapPos = floor(rayPos)`**: grid coordinate of the cell containing the ray origin. `floor()` discretizes continuous coordinates to the integer grid.
- **`rayStep = sign(rayDir)`**: step direction for each axis. `sign()` returns +1 or -1, determining whether the ray advances in the positive or negative direction on that axis.
- **`deltaDist = abs(1.0 / rayDir)`**: the t cost for the ray to traverse one full grid cell on each axis. If the ray is normalized (length=1), use `1.0/rayDir` directly; when unnormalized, it's equivalent to `abs(vec3(length(rayDir)) / rayDir)`.
- **`sideDist`**: the t distance from the ray origin to the next grid boundary on each axis. The formula `(sign(rayDir) * (mapPos - rayPos) + sign(rayDir) * 0.5 + 0.5) * deltaDist` computes the distance ratio from the ray origin to the next boundary on that axis, then multiplies by deltaDist to get the actual t value.
**Code**:
```glsl
ivec3 mapPos = ivec3(floor(rayPos)); // Current grid coordinate
vec3 rayStep = sign(rayDir); // Step direction per axis (+1/-1)
vec3 deltaDist = abs(1.0 / rayDir); // t cost to traverse one cell (ray already normalized)
// Initial t distance to next boundary
vec3 sideDist = (sign(rayDir) * (vec3(mapPos) - rayPos) + (sign(rayDir) * 0.5) + 0.5) * deltaDist;
```
### Step 3: DDA Traversal Loop (Branchless Version)
**What**: Traverse the grid cell by cell, checking for hits.
**Why**: The branchless version uses `lessThanEqual` + `min` vector comparisons to determine the minimum axis in one pass, avoiding nested if-else statements and improving GPU efficiency (reduces warp divergence).
**Algorithm logic**:
1. Each iteration first checks if the current cell is occupied
2. If no hit, find the axis corresponding to the smallest component in `sideDist`
3. `lessThanEqual(sideDist.xyz, min(sideDist.yzx, sideDist.zxy))` generates a bvec3 where the minimum axis is true
4. Add `deltaDist` to that axis's `sideDist`, and add `rayStep` to `mapPos`
5. `mask` records the axis of the last step, used later for normal calculation
**Code**:
```glsl
#define MAX_RAY_STEPS 64 // Tunable: maximum traversal steps, affects maximum view distance
bvec3 mask;
for (int i = 0; i < MAX_RAY_STEPS; i++) {
if (getVoxel(mapPos)) break; // Hit detection
// Branchless axis selection: choose the axis with smallest sideDist
mask = lessThanEqual(sideDist.xyz, min(sideDist.yzx, sideDist.zxy));
sideDist += vec3(mask) * deltaDist;
mapPos += ivec3(vec3(mask)) * ivec3(rayStep);
}
```
**Alternative form (step version, common in compact demos; here `mapPos` and `rayStep` are kept as `vec3` rather than `ivec3`)**:
```glsl
vec3 mask = step(sideDist.xyz, sideDist.yzx) * step(sideDist.xyz, sideDist.zxy);
sideDist += mask * deltaDist;
mapPos += mask * rayStep;
```
`step(a, b)` returns `a <= b ? 1.0 : 0.0`; multiplying two steps is equivalent to "this axis is simultaneously <= both other axes," i.e., it is the minimum axis.
### Step 4: Voxel Occupancy Function
**What**: Determine whether a given grid coordinate is occupied.
**Why**: This is the sole "scene definition" interface. By replacing this function, you can generate voxel worlds from any data source — procedural SDF, heightmaps, noise, etc. This design completely decouples scene content from the rendering algorithm.
**Design points**:
- Input is integer grid coordinates; add 0.5 to get the voxel center point
- Returns a boolean (simple version) or material ID (advanced version)
- Can use any combination of SDFs, noise functions, or texture sampling internally
- Performance-critical: this function is called once per DDA step, so keep it concise
**Code**:
```glsl
// Basic version: solid cube (use this when user requests a "voxel cube")
// NOTE: getVoxel receives ivec3, but internal calculations must all use float!
bool getVoxel(ivec3 c) {
vec3 p = vec3(c) + vec3(0.5); // ivec3 → vec3 conversion (required!)
float d = sdBox(p, vec3(6.0)); // Solid 12x12x12 block
return d < 0.0;
}
// SDF boolean version: sphere carving out a block (keeping only edges)
bool getVoxelCarved(ivec3 c) {
vec3 p = vec3(c) + vec3(0.5);
float d = max(-sdSphere(p, 7.5), sdBox(p, vec3(6.0))); // box ∩ ¬sphere
return d < 0.0;
}
// Advanced version: heightmap terrain with material IDs
// NOTE: Two correct approaches:
// Approach 1: Use vec3 parameter (recommended)
int getVoxelMaterial(vec3 c) {
float height = getTerrainHeight(c.xz);
if (c.y < height) return 1; // Ground (c.y is float)
if (c.y < height + 4.0) return 7; // Tree trunk
return 0; // Air
}
// Approach 2: Use ivec3 parameter (requires explicit conversion)
int getVoxelMaterial(ivec3 c) {
vec3 p = vec3(c); // ivec3 → vec3 conversion (required!)
float height = getTerrainHeight(p.xz);
if (float(c.y) < height) return 1; // int → float comparison
if (float(c.y) < height + 4.0) return 7; // int → float comparison
return 0;
}
```
### Step 5: Face Shading (Normal + Base Color)
**What**: Assign different brightness levels to different faces based on the hit face's normal direction.
**Why**: This is the simplest voxel shading approach — three distinct face brightnesses produce the classic "Minecraft-style" visual effect. No additional lighting calculations needed; face orientation alone provides differentiation.
**Principle**:
- `mask` records the axis of the last DDA step
- Normal = reverse direction of the step axis: `-mask * rayStep`
- X-axis faces (sides) are darkest, Y-axis faces (top/bottom) brightest, Z-axis faces (front/back) medium brightness
- This fixed three-value shading simulates basic lighting under overhead illumination
**Code**:
```glsl
// Face normal derived directly from mask
vec3 normal = -vec3(mask) * rayStep;
// Three faces with different brightness
vec3 color;
if (mask.x) color = vec3(0.5); // Side face (X axis) darkest
if (mask.y) color = vec3(1.0); // Top face (Y axis) brightest
if (mask.z) color = vec3(0.75); // Front/back face (Z axis) medium
fragColor = vec4(color, 1.0);
```
### Step 6: Precise Hit Position and Face UV
**What**: Compute the precise intersection point of the ray with the voxel surface, and the UV coordinates within that face.
**Why**: The precise intersection point is used for texture mapping and AO interpolation, rather than just grid coordinates. Face UV provides continuous coordinates (0 to 1) within a single voxel face — the basis for texture mapping and smooth AO.
**Mathematical derivation**:
1. `sideDist - deltaDist` steps back to get the t value of the hit face
2. `dot(sideDist - deltaDist, mask)` selects the hit axis's t
3. `hitPos = rayPos + rayDir * t` gives the precise intersection point
4. `uvw = hitPos - mapPos` gives voxel-local coordinates [0,1]^3
5. UV is obtained by projecting uvw onto the two tangent axes of the hit face:
- If X face is hit, UV = (uvw.y, uvw.z)
- If Y face is hit, UV = (uvw.z, uvw.x)
- If Z face is hit, UV = (uvw.x, uvw.y)
- `dot(mask * uvw.yzx, vec3(1.0))` cleverly uses mask to select the correct components
**Code**:
```glsl
// Precise t value: step back one step using sideDist
float t = dot(sideDist - deltaDist, vec3(mask));
vec3 hitPos = rayPos + rayDir * t;
// Face UV (for texturing, AO interpolation)
vec3 uvw = hitPos - vec3(mapPos); // Voxel-local coordinates [0,1]^3
vec2 uv = vec2(dot(vec3(mask) * uvw.yzx, vec3(1.0)),
dot(vec3(mask) * uvw.zxy, vec3(1.0)));
```
### Step 7: Neighbor Voxel Ambient Occlusion (AO)
**What**: Sample the 8 neighboring voxels around the hit face (4 edges + 4 corners), compute an occlusion value for each vertex, then bilinearly interpolate.
**Why**: This is the core technique for Minecraft-style smooth lighting. When neighboring voxels are present at edges or corners, those vertex areas should appear darker. This AO requires no additional ray tracing — it's entirely based on neighbor queries, with low computational cost and good results.
**Algorithm details**:
1. For each vertex of the hit face, check the adjacent 2 edges and 1 corner
2. `vertexAo(side, corner)` formula: `(side.x + side.y + max(corner, side.x * side.y)) / 3.0`
- `side.x * side.y`: when both edges are occupied, even if the corner is empty, there should be full occlusion (prevents light leaking)
- `max(corner, side.x * side.y)`: takes the larger of the corner and edge product
3. Store the 4 vertex AO values in a vec4
4. Bilinearly interpolate using the face UV for a continuous AO value
5. `pow(ao, gamma)` controls AO contrast
**Code**:
```glsl
// Per-vertex AO: two edges + one corner
float vertexAo(vec2 side, float corner) {
return (side.x + side.y + max(corner, side.x * side.y)) / 3.0;
}
// Sample AO for 4 vertices of a face
vec4 voxelAo(vec3 pos, vec3 d1, vec3 d2) {
vec4 side = vec4(
    getVoxel(ivec3(pos + d1)), getVoxel(ivec3(pos + d2)),
    getVoxel(ivec3(pos - d1)), getVoxel(ivec3(pos - d2)));
vec4 corner = vec4(
    getVoxel(ivec3(pos + d1 + d2)), getVoxel(ivec3(pos - d1 + d2)),
    getVoxel(ivec3(pos - d1 - d2)), getVoxel(ivec3(pos + d1 - d2)));
vec4 ao;
ao.x = vertexAo(side.xy, corner.x);
ao.y = vertexAo(side.yz, corner.y);
ao.z = vertexAo(side.zw, corner.z);
ao.w = vertexAo(side.wx, corner.w);
return 1.0 - ao;
}
// Bilinear interpolation using face UV
vec4 ambient = voxelAo(vec3(mapPos) - rayStep * vec3(mask), vec3(mask.zxy), vec3(mask.yzx));
float ao = mix(mix(ambient.z, ambient.w, uv.x), mix(ambient.y, ambient.x, uv.x), uv.y);
ao = pow(ao, 1.0 / 3.0); // Tunable: gamma correction controls AO intensity
```
### Step 8: DDA Shadow Ray
**What**: Cast a second DDA ray from the hit point toward the light source to detect occlusion.
**Why**: Reusing the same DDA algorithm achieves hard shadows without requiring additional ray tracing infrastructure. Shadow rays typically use fewer steps (e.g., 16-32) to save performance.
**Implementation details**:
- The origin must be offset by `normal * 0.01` to avoid self-intersection
- Shadow rays only need to determine 0/1 occlusion (hard shadows), no precise intersection needed
- Returns 0.0 (occluded) or 1.0 (unoccluded)
- Step count can be lower than the primary ray since only occlusion detection is needed
**Code**:
```glsl
#define MAX_SHADOW_STEPS 32 // Tunable: shadow ray steps
float castShadow(vec3 ro, vec3 rd) {
vec3 pos = floor(ro);
vec3 ri = 1.0 / rd;
vec3 rs = sign(rd);
vec3 dis = (pos - ro + 0.5 + rs * 0.5) * ri;
for (int i = 0; i < MAX_SHADOW_STEPS; i++) {
if (getVoxel(ivec3(pos))) return 0.0; // Occluded
vec3 mm = step(dis.xyz, dis.yzx) * step(dis.xyz, dis.zxy);
dis += mm * rs * ri;
pos += mm * rs;
}
return 1.0; // Unoccluded
}
// Usage during shading
vec3 sundir = normalize(vec3(-0.5, 0.6, 0.7));
float shadow = castShadow(hitPos + normal * 0.01, sundir);
float diffuse = max(dot(normal, sundir), 0.0) * shadow;
```
## Variant Details
### Variant 1: Glowing Voxels (Glow Accumulation)
**Difference from the base version**: During DDA traversal, accumulates a distance-based glow value at each step, producing a semi-transparent glow effect even without a hit.
**Use cases**: Neon light effects, energy fields, particle clouds, sci-fi style
**Principle**: Using the SDF distance field, glow contribution is large near the voxel surface (small distance → large 1/d²) and small far away. Accumulating contributions from all steps produces a continuous glow field.
**Key parameters**:
- `0.015`: glow intensity coefficient — larger = brighter
- `0.01`: minimum distance threshold — prevents division by zero and controls glow "sharpness"
- Glow color `vec3(0.4, 0.6, 1.0)`: can vary based on distance or material
**Code**:
```glsl
float glow = 0.0;
for (int i = 0; i < MAX_RAY_STEPS; i++) {
float d = sdSomeShape(vec3(mapPos)); // Distance to nearest surface
glow += 0.015 / (0.01 + d * d); // Tunable: glow falloff
if (d < 0.0) break;
// ... normal DDA stepping ...
}
vec3 col = baseColor + glow * vec3(0.4, 0.6, 1.0); // Overlay glow color
```
### Variant 2: Rounded Voxels (Intra-Voxel SDF Refinement)
**Difference from the base version**: After DDA hit, performs a few SDF ray march steps inside the voxel, rendering rounded blocks instead of perfect cubes.
**Use cases**: Organic-style voxels, building block/LEGO effects, chibi characters
**Principle**: After DDA hit, we know which voxel the ray entered, but the precise shape inside is defined by the SDF. Starting SDF ray marching from the voxel entry point, using `sdRoundedBox` to define a rounded cube, marching to the surface yields the precise rounded intersection and normal.
**Key parameters**:
- `w` (corner radius): 0.0 = perfect cube, 0.5 = sphere
- 6 internal march steps are typically sufficient for convergence
- `hash31(mapPos)` randomizes the corner radius per voxel, adding variety
**Code**:
```glsl
// GLSL has no nested functions — define the SDF at file scope first
float sdRoundedBox(vec3 p, float w) {
    return length(max(abs(p) - 0.5 + w, 0.0)) - w;
}
// Refine inside the voxel after DDA hit
float id = hash31(mapPos);
float w = 0.05 + 0.35 * id; // Tunable: corner radius
// Start 6-step SDF march from voxel entry
vec3 localP = hitPos - mapPos - 0.5;
for (int j = 0; j < 6; j++) {
    float h = sdRoundedBox(localP, w);
    if (h < 0.025) break; // Hit rounded surface
    localP += rd * max(0.0, h);
}
```
### Variant 3: Hybrid SDF-Voxel Traversal
**Difference from the base version**: Uses SDF sphere-tracing (large steps) when far from surfaces, switching to precise DDA voxel traversal when close. Greatly improves traversal efficiency in open areas.
**Use cases**: Large open worlds, long-distance voxel terrain, scenes requiring high view distance
**Principle**:
1. In open areas far from any voxel surface, SDF values are large, allowing sphere-tracing to skip large distances in one step
2. When the SDF value approaches `sqrt(3) * voxelSize` (voxel diagonal length), we may be about to enter a voxel region
3. Switch to DDA to ensure no voxels are skipped
4. If DDA finds the ray has left the dense region (SDF value increases again), switch back to sphere-tracing
**Key parameters**:
- `VOXEL_SIZE`: voxel dimensions
- `SWITCH_DIST = VOXEL_SIZE * 1.732`: switching threshold, sqrt(3) is the voxel diagonal safety factor
**Code**:
```glsl
#define VOXEL_SIZE 0.0625 // Tunable: voxel size
#define SWITCH_DIST (VOXEL_SIZE * 1.732) // sqrt(3) * voxelSize
bool useVoxel = false;
float t = 0.0;      // Accumulated ray distance
vec3 voxelPos;      // Current DDA cell origin
vec3 ird = 1.0 / rd;
for (int i = 0; i < MAX_STEPS; i++) {
vec3 pos = ro + rd * t;
float d = mapSDF(useVoxel ? voxelPos + VOXEL_SIZE * 0.5 : pos); // Sample at the cell center while in DDA mode
if (!useVoxel) {
t += d;
if (d < SWITCH_DIST) {
useVoxel = true; // Switch to DDA
voxelPos = getVoxelPos(pos);
}
} else {
if (d < 0.0) { /* hit */ break; }
if (d > SWITCH_DIST) {
useVoxel = false; // Switch back to SDF
t += d;
continue;
}
// DDA step one cell
vec3 exitT = (voxelPos + VOXEL_SIZE * (0.5 + 0.5 * sign(rd)) - ro) * ird; // Per-axis distance to the cell's exit planes
// ... select minimum axis and advance ...
}
}
```
### Variant 4: Voxel Cone Tracing
**Difference from the base version**: Builds a multi-level mipmap hierarchy of voxels (e.g., 64→32→16→8→4→2), casts cone-shaped rays from hit points, samples coarser LOD levels as distance increases, achieving diffuse/specular global illumination.
**Use cases**: High-quality global illumination, colored indirect lighting, real-time GI for dynamic scenes
**Principle**:
1. Precompute mipmap levels of voxel data (resolution halved per level)
2. Cast multiple cone-shaped rays from the hit point across the normal hemisphere (typically 5-7 cones)
3. Each cone's diameter increases linearly with distance during traversal
4. Diameter maps to mipmap level: `lod = log2(diameter)`
5. Sample the corresponding mipmap level
6. Front-to-back compositing accumulates lighting and occlusion
**Key parameters**:
- `coneRatio`: cone angle — diffuse uses wide cones (~1.0), specular uses narrow cones (~0.1)
- 58 steps is a common balance value
- `voxelFetch(sp, lod)` requires a custom mipmap query function
**Code**:
```glsl
// Cone tracing: cast a cone-shaped ray along direction d
vec4 traceCone(vec3 origin, vec3 dir, float coneRatio) {
vec4 light = vec4(0.0);
float t = 1.0;
for (int i = 0; i < 58; i++) {
vec3 sp = origin + dir * t;
float diameter = max(1.0, t * coneRatio); // Cone diameter
float lod = log2(diameter); // Corresponding mipmap level
vec4 s = voxelFetch(sp, lod); // LOD sample ("sample" is a reserved word in GLSL)
light += s * (1.0 - light.w); // Front-to-back compositing
t += diameter;
}
return light;
}
```
### Variant 5: PBR Lighting + Multi-Bounce Reflections
**Difference from the base version**: Uses GGX BRDF instead of Lambert, supports metallic/roughness material parameters, and casts a second DDA ray for reflections.
**Use cases**: Realistic voxel rendering, metallic/glass materials, architectural visualization
**Principle**:
1. GGX (Trowbridge-Reitz) microfacet model provides physically correct light distribution
2. Roughness parameter controls specular sharpness: 0.0 = perfect mirror, 1.0 = fully diffuse
3. Schlick Fresnel approximation: `F = F0 + (1 - F0) * (1 - cos(theta))^5`
4. Reflection ray reuses the `castRay` function with reduced step count (64 steps typically sufficient)
5. Multi-bounce reflections are chained by repeating the reflect-and-trace step in a loop (GLSL does not allow recursion); 1-2 bounces usually suffice
**Key parameters**:
- `roughness`: roughness [0, 1]
- `F0 = 0.04`: base reflectance for non-metals
- 64 steps for reflection ray (fewer than primary ray to save performance)
**Code**:
```glsl
// Burley (Disney) diffuse term (FD90 form), typically paired with a GGX specular lobe
float ggxDiffuse(float NoL, float NoV, float LoH, float roughness) {
float FD90 = 0.5 + 2.0 * roughness * LoH * LoH;
float a = 1.0 + (FD90 - 1.0) * pow(1.0 - NoL, 5.0);
float b = 1.0 + (FD90 - 1.0) * pow(1.0 - NoV, 5.0);
return a * b / 3.14159;
}
// Reflection ray - needs a separate shading function to handle HitInfo
vec3 shadeHit(HitInfo h, vec3 rd, vec3 sunDir, vec3 skyColor) {
if (!h.hit) return skyColor;
vec3 matCol = getMaterialColor(h.mat, h.uv);
float diff = max(dot(h.normal, sunDir), 0.0);
return matCol * diff;
}
vec3 rd2 = reflect(rd, normal);
HitInfo reflHit = castRay(hitPos + normal * 0.001, rd2, 64);
vec3 reflColor = shadeHit(reflHit, rd2, sunDir, skyColor);
// Schlick Fresnel blending
float fresnel = 0.04 + 0.96 * pow(1.0 - max(dot(normal, -rd), 0.0), 5.0);
col += fresnel * reflColor;
```
## In-Depth Performance Optimization
### Main Bottlenecks
1. **DDA Loop Step Count**: Each pixel needs to traverse tens to hundreds of cells — the largest performance cost. Step count is proportional to scene size and openness.
2. **Voxel Query Function**: `getVoxel()` is called once per step; if using noise/textures, texture fetch overhead is significant. The complexity of procedural SDF functions directly impacts frame rate.
3. **AO Neighbor Sampling**: Each hit point requires 8 additional `getVoxel()` queries. Manageable for simple scenes, but with a complex `getVoxel`, these 8 queries may exceed the main traversal cost.
4. **Shadow Rays**: Equivalent to a second full DDA traversal. Dual traversal doubles the pixel shader burden.
### Optimization Techniques
#### Early Exit
Break immediately when `mapPos` exceeds scene boundaries, avoiding continued traversal in meaningless space:
```glsl
if (any(lessThan(mapPos, vec3(-GRID_SIZE))) || any(greaterThan(mapPos, vec3(GRID_SIZE)))) break;
```
#### Reduce Shadow Steps
Shadow rays only need to determine occlusion — 16-32 steps usually suffice. No need for the same step count as the primary ray:
```glsl
#define MAX_SHADOW_STEPS 32 // Instead of MAX_RAY_STEPS of 128
```
#### Distance-Based Quality Scaling
Use high step counts for precise traversal up close, low step counts or LOD at distance. Dynamically adjust the step limit based on screen pixel size.
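One way to wire this up, sketched here as an assumption rather than a fixed recipe (the function name and constants are illustrative), is to derive a per-ray step budget from the distance already traveled:

```glsl
// Hypothetical sketch: shrink the step budget as the ray gets farther away
int maxStepsFor(float t) {
    // Full quality near the camera, clamped to a floor of 25% at distance
    float q = clamp(1.0 - t / 128.0, 0.25, 1.0); // Tunable: falloff range and floor
    return int(float(MAX_RAY_STEPS) * q);
}
// Inside the traversal loop:
// if (i >= maxStepsFor(distance(vec3(mapPos), ro))) break;
```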
#### Hybrid Traversal
Use SDF sphere-tracing for large steps in open areas, switching to DDA near surfaces (see Variant 3). Can reduce traversal steps by 80%+ in large scenes.
#### Avoid Complex Computation Inside the Loop
Material queries, AO, normals, etc. are all done only after a hit. The traversal loop should only perform the simplest occupancy detection.
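Structurally, the hot loop then contains nothing but the occupancy test and the step, with all shading deferred. A minimal sketch, assuming an int-returning `getVoxel` and the usual DDA state (`sideDist`, `deltaDist`, `rayStep`):

```glsl
// Hot loop: occupancy test + branchless step only
bool hit = false;
vec3 mask;
for (int i = 0; i < MAX_RAY_STEPS; i++) {
    if (getVoxel(mapPos) > 0) { hit = true; break; }
    mask = vec3(lessThanEqual(sideDist.xyz, min(sideDist.yzx, sideDist.zxy)));
    sideDist += mask * deltaDist;
    mapPos += ivec3(mask) * rayStep;
}
// Only after the loop: normal, material, AO, shadow ray
if (hit) {
    vec3 normal = -mask * vec3(rayStep);
    // ... shading ...
}
```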
#### Leverage GPU Texture Hardware
Replace procedural voxel queries with texture sampling (`texelFetch`). 3D textures can store precomputed voxel data and are cache-friendly on hardware.
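On Shadertoy-style setups this might look like baking occupancy into a buffer pass and reading it back with `texelFetch`; the channel binding, grid size, and tile layout below are assumptions:

```glsl
// Assumes iChannel0 holds baked voxel data. Shadertoy buffers are 2D,
// so a 32x32x32 grid is flattened into 32 tiles of 32x32 texels (8x4 layout).
int getVoxel(ivec3 c) {
    if (any(lessThan(c, ivec3(0))) || any(greaterThanEqual(c, ivec3(32)))) return 0;
    ivec2 texel = ivec2(c.x + (c.z % 8) * 32, c.y + (c.z / 8) * 32);
    return int(texelFetch(iChannel0, texel, 0).r * 255.0 + 0.5); // Material id in R channel
}
```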
#### Temporal Accumulation
Multi-frame accumulation — each frame only needs a small number of samples, combined with reprojection for low-noise results. Suitable for scenarios requiring many rays (GI, soft shadows).
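A minimal sketch of the accumulation pass, assuming the previous frame is available in `iChannel1` and the camera is static (a moving camera additionally needs the previous frame's UV reprojected first); `renderOneSample` is a hypothetical per-frame estimator:

```glsl
// Blend this frame's noisy estimate with the history buffer
vec3 current = renderOneSample(fragCoord);              // e.g. 1-4 GI samples
vec3 history = texture(iChannel1, fragCoord / iResolution.xy).rgb;
float blend = (iFrame == 0) ? 1.0 : 0.05;               // Tunable: lower = smoother but laggier
fragColor = vec4(mix(history, current, blend), 1.0);
```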
## Complete Combination Code Examples
### Procedural Noise Terrain
Use FBM/Perlin noise inside `getVoxel()` to generate heightmaps, producing Minecraft-style infinite terrain:
```glsl
// Recommended approach: use vec3 parameter (simple, no type conversion issues)
int getVoxel(vec3 c) {
// FBM noise heightmap
float height = 0.0;
float amp = 8.0;
float freq = 0.05;
vec2 xz = c.xz;
for (int i = 0; i < 4; i++) {
height += amp * noise(xz * freq);
amp *= 0.5;
freq *= 2.0;
}
if (c.y > height) return 0; // Air
if (c.y > height - 1.0) return 1; // Grass
if (c.y > height - 4.0) return 2; // Dirt
return 3; // Stone
}
// ivec3 parameter version (requires type conversion)
int getVoxel(ivec3 c) {
vec3 p = vec3(c); // ivec3 → vec3 conversion
float height = 0.0;
float amp = 8.0;
float freq = 0.05;
// NOTE: p.xz returns vec2, must pass vec2 version of noise!
// If noise only has vec3 version, use noise(vec3(p.xz * freq, 0.0))
vec2 xz = p.xz;
for (int i = 0; i < 4; i++) {
height += amp * noise(xz * freq);
amp *= 0.5;
freq *= 2.0;
}
if (float(c.y) > height) return 0; // int → float comparison
if (float(c.y) > height - 1.0) return 1; // int → float comparison
if (float(c.y) > height - 4.0) return 2; // int → float comparison
return 3;
}
```
### Texture Mapping
Sample textures using face UV after hit, achieving a retro pixel art style:
```glsl
// During the shading stage
vec2 texUV = hit.uv;
// 4x4 tile atlas (16 tiles total; each tile can be e.g. 16x16 pixels)
int tileX = mat % 4;
int tileY = mat / 4;
vec2 atlasUV = (vec2(tileX, tileY) + texUV) / 4.0;
vec3 texCol = texture(iChannel0, atlasUV).rgb;
col *= texCol;
```
### Atmospheric Scattering / Volumetric Fog
Accumulate medium density during DDA traversal, achieving volumetric lighting and fog effects:
```glsl
float fogAccum = 0.0;
vec3 fogColor = vec3(0.0);
for (int i = 0; i < MAX_RAY_STEPS; i++) {
// ... DDA stepping ...
float density = getDensity(mapPos); // Atmospheric density
if (density > 0.0) {
float dt = length(vec3(mask) * deltaDist); // Current step size
fogAccum += density * dt;
// Volumetric light: compute lighting within fog
float shadowInFog = castShadow(vec3(mapPos) + 0.5, sunDir);
fogColor += density * dt * shadowInFog * sunColor * exp(-fogAccum);
}
if (getVoxel(mapPos) > 0) break;
}
// Apply fog effect
col = col * exp(-fogAccum) + fogColor;
```
### Water Surface Rendering (Voxel Water Scene)
A complete voxel water scene with surface wave reflections, underwater refraction, sand, and seaweed:
```glsl
float waterY = 0.0;
// Underwater voxel scene (sand + seaweed)
// IMPORTANT: c.xz returns vec2, which only has .x/.y components — never use .z!
int getVoxel(vec3 c) {
float sandHeight = -3.0 + 0.5 * sin(c.x * 0.3) * cos(c.z * 0.4);
if (c.y < sandHeight) return 1; // Sand interior
if (c.y < sandHeight + 1.0) return 2; // Sand surface
// Seaweed
float grassHash = fract(sin(dot(floor(c.xz), vec2(12.9898, 78.233))) * 43758.5453);
if (grassHash > 0.85 && c.y >= sandHeight + 1.0 && c.y < sandHeight + 1.0 + 3.0 * grassHash) {
return 3;
}
return 0;
}
// Check if ray intersects water surface
float tWater = (waterY - ro.y) / rd.y;
bool hitWater = tWater > 0.0 && (tWater < hit.t || !hit.hit);
if (hitWater) {
vec3 waterPos = ro + rd * tWater;
vec3 waterNormal = vec3(0.0, 1.0, 0.0);
// NOTE: waterPos.xz is vec2, access with .x/.y (not .x/.z)
vec2 waveXZ = waterPos.xz; // vec2: waveXZ.x = worldX, waveXZ.y = worldZ
waterNormal.x += 0.05 * sin(waveXZ.x * 3.0 + iTime);
waterNormal.z += 0.05 * cos(waveXZ.y * 2.0 + iTime * 0.7);
waterNormal = normalize(waterNormal);
// Fresnel
float fresnel = 0.04 + 0.96 * pow(1.0 - max(dot(waterNormal, -rd), 0.0), 5.0);
// Reflection
vec3 reflDir = reflect(rd, waterNormal);
HitInfo reflHit = castRay(waterPos + waterNormal * 0.01, reflDir, 64);
vec3 reflCol = reflHit.hit ? getMaterialColor(reflHit.mat, reflHit.uv) : skyColor;
// Refraction (underwater voxels: sand, seaweed)
vec3 refrDir = refract(rd, waterNormal, 1.0 / 1.33);
HitInfo refrHit = castRay(waterPos - waterNormal * 0.01, refrDir, 64);
vec3 refrCol;
if (refrHit.hit) {
vec3 matCol = getMaterialColor(refrHit.mat, refrHit.uv);
// Underwater color attenuation (bluer with distance)
float underwaterDist = length(refrHit.pos - waterPos);
refrCol = mix(matCol, vec3(0.0, 0.15, 0.3), 1.0 - exp(-0.1 * underwaterDist));
} else {
refrCol = vec3(0.0, 0.1, 0.3); // Deep water color
}
col = mix(refrCol, reflCol, fresnel);
col = mix(col, vec3(0.0, 0.3, 0.5), 0.2);
}
```
### Global Illumination (Monte Carlo Hemisphere Sampling)
Use random hemisphere direction sampling for diffuse indirect lighting:
```glsl
vec3 indirectLight = vec3(0.0);
int numSamples = 4; // Few samples per frame, accumulate across frames
for (int s = 0; s < numSamples; s++) {
// Cosine-weighted hemisphere sampling
vec2 xi = hash22(vec2(fragCoord) + float(iFrame) * 0.618 + float(s));
float cosTheta = sqrt(xi.x);
float sinTheta = sqrt(1.0 - xi.x);
float phi = 6.28318 * xi.y;
vec3 sampleDir = cosTheta * normal
+ sinTheta * cos(phi) * tangent
+ sinTheta * sin(phi) * bitangent;
HitInfo giHit = castRay(hitPos + normal * 0.01, sampleDir, 32);
if (giHit.hit) {
vec3 giColor = getMaterialColor(giHit.mat, giHit.uv);
float giDiff = max(dot(giHit.normal, sunDir), 0.0);
indirectLight += giColor * giDiff;
} else {
indirectLight += skyColor;
}
}
indirectLight /= float(numSamples);
col += matCol * indirectLight * 0.5; // Indirect light contribution
```

# Water & Ocean Rendering — Detailed Reference
This document is the complete reference for [SKILL.md](SKILL.md), covering prerequisites, detailed explanations for each step, variant descriptions, in-depth performance optimization analysis, and complete code examples for combination suggestions.
## Prerequisites
- **GLSL Fundamentals**: uniforms, varyings, built-in functions
- **Vector Math**: dot product, cross product, reflection/refraction vectors
- **Basic Raymarching Concepts**
- **FBM (Fractal Brownian Motion) / Multi-octave Noise Layering Basics**
- **Physical Intuition of the Fresnel Effect**: strong reflection at grazing angles, strong transmission at normal incidence
## Core Principles
The essence of water rendering is solving three core problems: **water surface shape generation**, **light-water surface interaction**, and **water body color compositing**.
### 1. Wave Generation: Exponential Sine Layering + Derivative Domain Warping
Traditional sum-of-sines uses `sin(x)` to produce symmetric waveforms, but real ocean waves have **sharp crests and broad troughs**. The core formula:
```
wave(x) = exp(sin(x) - 1)
```
- When `sin(x) = 1` (crest): `exp(0) = 1.0`, sharp peak
- When `sin(x) = -1` (trough): `exp(-2) ≈ 0.135`, broad flat valley
This naturally produces a **trochoidal profile** similar to Gerstner waves, but at much lower computational cost.
When layering multiple waves, the key innovation is **derivative domain warping (Drag)**:
```
position += direction * derivative * weight * DRAG_MULT
```
Each wave layer's sampling position is offset by the previous layer's derivative, causing small ripples to naturally cluster on the crests of larger waves — simulating the real-ocean phenomenon of capillary waves riding on gravity waves.
### 2. Lighting Model: Schlick Fresnel + Subsurface Scattering Approximation
**Schlick Fresnel Approximation**:
```
F = F0 + (1 - F0) * (1 - dot(N, V))^5
```
Where water's F0 ≈ 0.04 (only 4% reflection at normal incidence).
**Subsurface Scattering (SSS)** is approximated through water thickness: troughs have thicker water layers with stronger blue-green scattering; crests have thinner layers with weaker scattering — naturally producing the visual effect of transparent crests and deep blue troughs.
### 3. Water Surface Intersection: Bounded Heightfield Marching
The water surface is constrained within a bounding box of `[0, -WATER_DEPTH]`, and rays only march between the intersection points of two planes. Step size is adaptive: `step = ray_y - wave_height` — large steps when far from the surface, small precise steps when close.
## Implementation Steps
### Step 1: Exponential Sine Wave Function
**What**: Define a single directional wave's value and derivative calculation function.
**Why**: `exp(sin(x) - 1)` transforms the symmetric sine into a realistic waveform with sharp crests and broad troughs. It also returns the analytical derivative, used for subsequent domain warping and normal calculation.
**Code**:
```glsl
vec2 wavedx(vec2 position, vec2 direction, float frequency, float timeshift) {
float x = dot(direction, position) * frequency + timeshift;
float wave = exp(sin(x) - 1.0); // Sharp crest, broad trough waveform
float dx = wave * cos(x); // Analytical derivative = exp(sin(x)-1) * cos(x)
return vec2(wave, -dx); // Return (value, negative derivative)
}
```
### Step 2: Multi-Octave Wave Layering with Domain Warping
**What**: Layer multiple waves with different directions, frequencies, and speeds, applying derivative-driven position offset (drag) between each layer.
**Why**: A single wave is too regular. Multi-octave layering produces natural complex waveforms. Domain warping is the key — it causes small waves to cluster on top of large waves, which is the core technique distinguishing "good-looking ocean" from "ordinary noise." The frequency growth rate of 1.18 (instead of the traditional FBM 2.0) creates smoother transitions between wave layers.
**Code**:
```glsl
#define DRAG_MULT 0.38 // Tunable: domain warp strength, 0=none, 0.5=strong clustering
float getwaves(vec2 position, int iterations) {
float wavePhaseShift = length(position) * 0.1; // Break long-distance phase synchronization
float iter = 0.0;
float frequency = 1.0;
float timeMultiplier = 2.0;
float weight = 1.0;
float sumOfValues = 0.0;
float sumOfWeights = 0.0;
for (int i = 0; i < iterations; i++) {
vec2 p = vec2(sin(iter), cos(iter)); // Pseudo-random wave direction
vec2 res = wavedx(position, p, frequency, iTime * timeMultiplier + wavePhaseShift);
// Core: offset sampling position based on derivative (small waves ride big waves)
position += p * res.y * weight * DRAG_MULT;
sumOfValues += res.x * weight;
sumOfWeights += weight;
weight = mix(weight, 0.0, 0.2); // Tunable: weight decay, 0.2 = 80% retained per layer
frequency *= 1.18; // Tunable: frequency growth rate
timeMultiplier *= 1.07; // Tunable: higher frequency waves animate faster (dispersion)
iter += 1232.399963; // Large arbitrary increment decorrelates successive wave directions
}
return sumOfValues / sumOfWeights;
}
```
### Step 3: Bounded Bounding Box Ray Marching
**What**: Constrain the water surface between two horizontal planes and only march between the entry and exit points.
**Why**: Much faster than unbounded SDF marching. The step size `pos.y - height` automatically adapts — large jumps when far from the surface, fine convergence when close. Precomputing bounding box intersections avoids wasting steps in open air.
**Code**:
```glsl
#define WATER_DEPTH 1.0 // Tunable: water body thickness, affects SSS and wave amplitude
float intersectPlane(vec3 origin, vec3 direction, vec3 point, vec3 normal) {
return clamp(dot(point - origin, normal) / dot(direction, normal), -1.0, 9991999.0);
}
float raymarchwater(vec3 camera, vec3 start, vec3 end, float depth) {
vec3 pos = start;
vec3 dir = normalize(end - start);
for (int i = 0; i < 64; i++) { // Tunable: march steps, 64 is usually sufficient
float height = getwaves(pos.xz, ITERATIONS_RAYMARCH) * depth - depth;
if (height + 0.01 > pos.y) {
return distance(pos, camera);
}
pos += dir * (pos.y - height); // Adaptive step size
}
return distance(start, camera); // If missed, assume hit at top surface
}
```
### Step 4: Normal Calculation with Distance Smoothing
**What**: Compute water surface normals using finite differences, and interpolate toward the up direction based on distance to eliminate distant aliasing.
**Why**: Normals determine all lighting details. Using more wave iterations for normals than for ray marching (36 vs 12) is a core performance technique — marching only needs coarse shape, normals need fine detail. The farther away, the more high-frequency normals cause flickering; smoothing toward `(0,1,0)` is equivalent to implicit LOD.
**Code**:
```glsl
#define ITERATIONS_RAYMARCH 12 // Tunable: wave iterations for marching (fewer = faster)
#define ITERATIONS_NORMAL 36 // Tunable: wave iterations for normals (more = finer detail)
vec3 normal(vec2 pos, float e, float depth) {
vec2 ex = vec2(e, 0);
float H = getwaves(pos.xy, ITERATIONS_NORMAL) * depth;
vec3 a = vec3(pos.x, H, pos.y);
return normalize(
cross(
a - vec3(pos.x - e, getwaves(pos.xy - ex.xy, ITERATIONS_NORMAL) * depth, pos.y),
a - vec3(pos.x, getwaves(pos.xy + ex.yx, ITERATIONS_NORMAL) * depth, pos.y + e)
)
);
}
// Distance smoothing: distant normals approach (0,1,0)
// N = mix(N, vec3(0.0, 1.0, 0.0), 0.8 * min(1.0, sqrt(dist * 0.01) * 1.1));
```
### Step 5: Fresnel Reflection and Subsurface Scattering
**What**: Use Schlick Fresnel approximation to calculate reflection/scattering weights, combining sky reflection with depth-dependent blue-green scattering color.
**Why**: The Fresnel effect is key to water surface realism — nearly fully transparent up close, nearly fully reflective at a distance. The SSS color `(0.0293, 0.0698, 0.1717)` comes from empirical values of deep-sea scattering spectra. Troughs have thicker water layers with stronger SSS; crests have thinner layers with weaker SSS, naturally producing light-dark variation.
**Code**:
```glsl
// Schlick Fresnel, F0 = 0.04 (water's normal incidence reflectance)
float fresnel = 0.04 + 0.96 * pow(1.0 - max(0.0, dot(-N, ray)), 5.0);
// Reflection direction, force upward to avoid self-intersection
vec3 R = normalize(reflect(ray, N));
R.y = abs(R.y);
// Sky reflection + sun specular
vec3 reflection = getAtmosphere(R) + getSun(R);
// Subsurface scattering: deeper (trough) = bluer color
vec3 scattering = vec3(0.0293, 0.0698, 0.1717) * 0.1
* (0.2 + (waterHitPos.y + WATER_DEPTH) / WATER_DEPTH);
// Final compositing
vec3 C = fresnel * reflection + scattering;
```
### Step 6: Atmosphere and Tone Mapping
**What**: Add a cheap atmospheric scattering model and ACES tone mapping.
**Why**: The water surface reflects the sky, so sky quality directly affects the water's appearance. `1/(ray.y + 0.1)` approximates optical path length, `vec3(5.5, 13.0, 22.4)/22.4` represents Rayleigh scattering coefficient ratios. ACES tone mapping maps HDR values to display range, preserving highlight detail while compressing shadows.
**Code**:
```glsl
vec3 extra_cheap_atmosphere(vec3 raydir, vec3 sundir) {
float special_trick = 1.0 / (raydir.y * 1.0 + 0.1);
float special_trick2 = 1.0 / (sundir.y * 11.0 + 1.0);
float raysundt = pow(abs(dot(sundir, raydir)), 2.0);
float sundt = pow(max(0.0, dot(sundir, raydir)), 8.0);
float mymie = sundt * special_trick * 0.2;
vec3 suncolor = mix(vec3(1.0), max(vec3(0.0), vec3(1.0) - vec3(5.5, 13.0, 22.4) / 22.4),
special_trick2);
vec3 bluesky = vec3(5.5, 13.0, 22.4) / 22.4 * suncolor;
vec3 bluesky2 = max(vec3(0.0), bluesky - vec3(5.5, 13.0, 22.4) * 0.002
* (special_trick + -6.0 * sundir.y * sundir.y));
bluesky2 *= special_trick * (0.24 + raysundt * 0.24);
return bluesky2 * (1.0 + 1.0 * pow(1.0 - raydir.y, 3.0));
}
vec3 aces_tonemap(vec3 color) {
mat3 m1 = mat3(
0.59719, 0.07600, 0.02840,
0.35458, 0.90834, 0.13383,
0.04823, 0.01566, 0.83777);
mat3 m2 = mat3(
1.60475, -0.10208, -0.00327,
-0.53108, 1.10813, -0.07276,
-0.07367, -0.00605, 1.07602);
vec3 v = m1 * color;
vec3 a = v * (v + 0.0245786) - 0.000090537;
vec3 b = v * (0.983729 * v + 0.4329510) + 0.238081;
return pow(clamp(m2 * (a / b), 0.0, 1.0), vec3(1.0 / 2.2));
}
```
## Common Variants
### Variant 1: 2D Underwater Caustic Texture
Difference from the base version: No 3D ray marching — purely a 2D screen-space effect. Uses an iterative triangular feedback loop to generate caustic light patterns, suitable as a ground projection texture for underwater scenes or as an overlay layer.
Key code:
```glsl
#define TAU 6.28318530718
#define MAX_ITER 5 // Tunable: iteration count, more = finer caustics
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
float time = iTime * 0.5 + 23.0;
vec2 uv = fragCoord.xy / iResolution.xy;
vec2 p = mod(uv * TAU, TAU) - 250.0; // mod TAU ensures tileability
vec2 i = vec2(p);
float c = 1.0;
float inten = 0.005; // Tunable: caustic line width (smaller = thinner)
for (int n = 0; n < MAX_ITER; n++) {
float t = time * (1.0 - (3.5 / float(n + 1)));
i = p + vec2(cos(t - i.x) + sin(t + i.y), sin(t - i.y) + cos(t + i.x));
c += 1.0 / length(vec2(p.x / (sin(i.x + t) / inten), p.y / (cos(i.y + t) / inten)));
}
c /= float(MAX_ITER);
c = 1.17 - pow(c, 1.4);
vec3 colour = vec3(pow(abs(c), 8.0));
colour = clamp(colour + vec3(0.0, 0.35, 0.5), 0.0, 1.0); // Aqua blue tint
fragColor = vec4(colour, 1.0);
}
```
### Variant 2: FBM Bump-Mapped Lake Surface (Plane Intersection + Bump Mapping)
Difference from the base version: No per-pixel ray marching — uses analytical plane intersection + FBM bump mapping instead. Extremely fast, suitable for distant lake surfaces or situations where water must be embedded in complex scenes (e.g., with volumetric cloud reflections).
Key code:
```glsl
// Water surface heightmap (FBM + abs folding produces ridge-like ripples)
float waterMap(vec2 pos) {
mat2 m2 = mat2(0.60, -0.80, 0.80, 0.60); // Rotation matrix to avoid axis alignment
vec2 posm = pos * m2;
return abs(fbm(vec3(8.0 * posm, iTime)) - 0.5) * 0.1;
}
// Analytical plane intersection replaces ray marching
float t = -ro.y / rd.y; // Water surface at y=0
vec3 hitPos = ro + rd * t;
// Bump strength fades with distance (LOD) — must be declared before use
float bumpfactor = 0.1 * (1.0 - smoothstep(0.0, 60.0, distance(ro, hitPos)));
// Finite difference normals (central differencing)
float eps = 0.1;
vec3 normal = vec3(0.0, 1.0, 0.0);
normal.x = -bumpfactor * (waterMap(hitPos.xz + vec2(eps, 0.0)) - waterMap(hitPos.xz - vec2(eps, 0.0))) / (2.0 * eps);
normal.z = -bumpfactor * (waterMap(hitPos.xz + vec2(0.0, eps)) - waterMap(hitPos.xz - vec2(0.0, eps))) / (2.0 * eps);
normal = normalize(normal);
// Refraction uses the built-in refract() function
vec3 refracted = refract(rd, normal, 1.0 / 1.333);
```
### Variant 3: Ridged Noise Coastal Waves
Difference from the base version: Uses `1 - abs(noise)` instead of `exp(sin)` to generate waveforms, combined with in-loop domain warping. Suitable for coastal scenes with sharper, more impactful waves that naturally connect to shore foam.
Key code:
```glsl
float sea(vec2 p) {
float f = 1.0;
float r = 0.0;
float time = -iTime;
for (int i = 0; i < 8; i++) { // Tunable: 8 octaves
r += (1.0 - abs(noise(p * f + 0.9 * time))) / f; // Ridged noise
f *= 2.0;
p -= vec2(-0.01, 0.04) * (r - 0.2 * time / (0.1 - f)); // In-loop domain warping
}
return r / 4.0 + 0.5;
}
// Shore foam: based on distance between water surface and terrain
float dh = seaDist - rockDist; // Water-terrain SDF difference
float foam = 0.0;
if (dh < 0.0 && dh > -0.02) {
foam = 0.5 * exp(20.0 * dh); // Exponentially decaying shoreline glow
}
```
### Variant 4: Flow Map Water Animation (Rivers/Streams)
Difference from the base version: Adds flow-field-driven FBM animation. Uses a two-phase time cycle to eliminate texture stretching, with water flow direction procedurally generated from terrain gradients. Suitable for rivers, streams, and other water bodies with a clear flow direction.
Key code:
```glsl
// FBM with analytical derivatives + flow field offset
vec3 FBM_DXY(vec2 p, vec2 flow, float persistence, float domainWarp) {
vec3 f = vec3(0.0);
float tot = 0.0;
float a = 1.0;
for (int i = 0; i < 4; i++) {
p += flow;
flow *= -0.75; // Negate + shrink each layer to prevent uniform sliding
vec3 v = SmoothNoise_DXY(p);
f += v * a;
p += v.xy * domainWarp; // Gradient domain warping
p *= 2.0;
tot += a;
a *= persistence;
}
return f / tot;
}
// Two-phase flow cycle (eliminates stretching)
float t0 = fract(time);
float t1 = fract(time + 0.5);
vec4 sample0 = SampleWaterNormal(uv + Hash2(floor(time)), flowRate * (t0 - 0.5));
vec4 sample1 = SampleWaterNormal(uv + Hash2(floor(time+0.5)), flowRate * (t1 - 0.5));
float weight = abs(t0 - 0.5) * 2.0;
vec4 result = mix(sample0, sample1, weight);
```
### Variant 5: Beer's Law Water Absorption + Volumetric Scattering
Difference from the base version: Replaces the simple SSS approximation with physically correct Beer-Lambert exponential decay for underwater color absorption, plus a forward scattering term. Suitable for realistic scenes requiring tunable clear/turbid water.
Key code:
```glsl
// Beer-Lambert attenuation: red light absorbed fastest, blue light slowest
vec3 GetWaterExtinction(float dist) {
float fOpticalDepth = dist * 6.0; // Tunable: larger = more turbid water
vec3 vExtinctCol = vec3(0.5, 0.6, 0.9); // Tunable: absorption spectrum (R decays fast, B slow)
return exp2(-fOpticalDepth * vExtinctCol);
}
// Volumetric in-scattering
vec3 vInscatter = vSurfaceDiffuse * (1.0 - exp(-refractDist * 0.1))
* (1.0 + dot(sunDir, viewDir)); // Forward scattering enhancement
// Final underwater color
vec3 underwaterColor = terrainColor * GetWaterExtinction(waterDepth) + vInscatter;
// Fresnel compositing
vec3 finalColor = mix(underwaterColor, reflectionColor, fresnel);
```
## In-Depth Performance Optimization
### 1. Dual Iteration Count Strategy (Most Critical Optimization)
Ray marching uses few iterations (12), normal calculation uses many (36). Marching only needs a rough intersection point; normals need fine wave detail. This single technique can halve render time with virtually no visual quality loss.
### 2. Distance-Adaptive Normal Smoothing
```glsl
N = mix(N, vec3(0.0, 1.0, 0.0), 0.8 * min(1.0, sqrt(dist * 0.01) * 1.1));
```
Distant normals approach `(0,1,0)`, eliminating high-frequency flickering at distance (equivalent to implicit normal mipmapping), while saving expensive normal calculations at long range.
### 3. Bounding Box Clipping
Precompute ray intersections with the top and bottom horizontal planes, and only march between the two intersection points. Rays pointing skyward (`ray.y >= 0`) skip water surface calculations entirely — the simplest and most effective early-out.
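A CPU sketch of that clip (Python; `water_slab` is a hypothetical helper, and the slab spanning y in [-1, 1] is an assumption). It returns the `t` interval worth marching, or `None` for the early-out:

```python
def water_slab(ro_y, rd_y, y_top=1.0, y_bot=-1.0):
    # Intersect the ray with the two horizontal planes bounding all
    # wave displacement; only this t-interval needs marching.
    if rd_y >= 0.0 and ro_y > y_top:
        return None                      # above the water, pointing skyward
    if rd_y == 0.0:
        return (0.0, float("inf")) if y_bot <= ro_y <= y_top else None
    t0 = (y_top - ro_y) / rd_y
    t1 = (y_bot - ro_y) / rd_y
    t_near, t_far = min(t0, t1), max(t0, t1)
    if t_far < 0.0:
        return None                      # slab entirely behind the ray
    return (max(t_near, 0.0), t_far)
```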
### 4. Adaptive Step Size
`pos += dir * (pos.y - height)` uses the current height difference as step size — potentially jumping large distances when far from the surface, automatically shrinking when close. 3-5x faster than fixed step size.
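The claimed speedup is easy to reproduce on the CPU for the simplest case, a flat surface at y = 0 (Python sketch; function names are hypothetical). The adaptive step contracts the height gap geometrically, while a fixed step walks it linearly:

```python
def adaptive_steps(y0=10.0, rd_y=-0.8, tol=1e-2):
    # pos += dir * (pos.y - height): the gap shrinks by |1 + rd_y| per step.
    y, n = y0, 0
    while y > tol:
        y += rd_y * y
        n += 1
    return n

def fixed_steps(y0=10.0, rd_y=-0.8, dt=0.05, tol=1e-2):
    # Fixed-length marching for comparison.
    y, n = y0, 0
    while y > tol:
        y += rd_y * dt
        n += 1
    return n
```

Here the adaptive walk converges in 5 steps versus 250 fixed steps. On a bumpy surface the gap is measured against the local wave height, which is why marching still needs a handful of iterations rather than one.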
### 5. Filter Width-Aware Normal Attenuation (Advanced)
For scenes requiring more precise LOD:
```glsl
vec2 vFilterWidth = max(abs(dFdx(uv)), abs(dFdy(uv)));
float fScale = 1.0 / (1.0 + max(vFilterWidth.x, vFilterWidth.y) * max(vFilterWidth.x, vFilterWidth.y) * 2000.0);
normalStrength *= fScale;
```
Uses screen-space derivatives to automatically detect pixel coverage area — the larger the area, the flatter the normal. This is a precise implementation of manual mipmapping.
### 6. LOD Conditional Detail
```glsl
if (distanceToSurface < threshold) {
// Only compute high-frequency detail when close to the water surface
for (int i = 0; i < detailOctaves; i++) { ... }
}
```
High-frequency displacement of the water surface SDF is only calculated when close to the surface; at distance, the base plane is used directly, avoiding unnecessary noise sampling.
## Combination Suggestions
### 1. Combining with Volumetric Clouds
Including cloud reflections in the water surface is key to enhancing realism. Steps: first perform volumetric cloud raymarching along the reflection direction `R`, then mix the cloud color as part of `reflection` in the Fresnel compositing. This is a common technique in water rendering shaders.
### 2. Combining with Terrain Systems
Shoreline rendering requires interaction between the water surface SDF and terrain SDF. Key technique: maintain `dh = waterSDF - terrainSDF`, and render foam when `dh ≈ 0` (`exp(k * dh)` produces exponentially decaying coastal glow). A standard technique in shoreline rendering.
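One way to read that falloff as code (Python sketch; `k` is a hypothetical tuning constant, and taking `abs(dh)` makes the glow peak exactly at the waterline and decay on both sides):

```python
import math

def foam_intensity(water_sdf, terrain_sdf, k=8.0):
    # dh ~ 0 at the shoreline; intensity decays exponentially with
    # distance from the waterline on either side.
    dh = abs(water_sdf - terrain_sdf)
    return math.exp(-k * dh)
```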
### 3. Combining with Caustics
In underwater scenes, project the caustic texture from Variant 1 onto the underwater terrain surface. Modulate caustic intensity as `caustic * exp(-waterDepth * absorption)` for depth-based attenuation.
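The depth modulation is a one-liner; a CPU sketch (Python, with a hypothetical `absorption` default) shows the intended monotone falloff:

```python
import math

def attenuated_caustic(caustic, water_depth, absorption=0.6):
    # Caustic brightness dims exponentially with water depth.
    return caustic * math.exp(-water_depth * absorption)
```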
### 4. Combining with Fog/Atmospheric Scattering
Distant water surfaces must blend into atmospheric fog. Use an independent extinction + in-scatter fog model (not a simple lerp), with each RGB channel attenuating independently:
```glsl
vec3 fogExtinction = exp2(fogExtCoeffs * -distance);
vec3 fogInscatter = fogColor * (1.0 - exp2(fogInCoeffs * -distance));
finalColor = finalColor * fogExtinction + fogInscatter;
```
### 5. Combining with Post-Processing
- **Bloom**: Sun specular highlights on the water surface need bloom to look natural; Fibonacci spiral blur works better than simple Gaussian
- **Tone Mapping**: ACES is the standard choice for ocean scenes, preserving sun highlights while compressing shadows
- **Depth of Field (DOF)**: Focusing on mid-ground waves with near and far blur greatly enhances cinematic quality (post-process bokeh DOF)
@@ -0,0 +1,41 @@
# WebGL2 Pitfalls Reference
This is a reference document for the [webgl-pitfalls](../techniques/webgl-pitfalls.md) technique.
## Complete Error Message Reference
| Error Message | Likely Cause | Solution |
|---|---|---|
| `'fragCoord' : undeclared identifier` | Using `fragCoord` instead of `gl_FragCoord.xy` in WebGL2 | Replace with `gl_FragCoord.xy` |
| `'' : Missing main()` | Fragment shader has no `main()` function | Add `void main() { mainImage(fragColor, gl_FragCoord.xy); }` wrapper |
| `'functionName' : no matching overloaded function found` | Wrong argument types OR function declared after use | Check types; reorder or forward-declare functions |
| `'return' : function return is not matching type:` | Return expression type doesn't match declared return type | Verify `vec3 foo()` returns `vec3`, not `float` |
| `#version` must be first | Leading whitespace when extracting from script tag | Use `.trim()` on shader source string |
| Uniform returns `null` from `getUniformLocation` | Uniform optimized away for being unused | Ensure uniform is actually referenced in shader code |
## Type Mismatch Examples
```glsl
// ERROR: terrainM expects vec2, passing vec3
float calcAO(vec3 pos, vec3 nor) {
float h = 0.1;
float d = terrainM(pos + h * nor); // Wrong: pos + h*nor is vec3
return d;
}
// FIX: Extract xz components
float calcAO(vec3 pos, vec3 nor) {
float h = 0.1;
float d = terrainM(pos.xz + h * nor.xz); // Correct: vec2
return d;
}
```
```glsl
// ERROR: can't access .z on vec2
vec2 uv = vec2(1.0, 2.0);
float z = uv.z; // Wrong: vec2 has no .z
// FIX: use proper swizzle or conversion
float z = uv.y; // Or if you need third component, use vec3
```
## GLSL ES 3.0 Specific Notes
- All declared `uniform` variables must be used in shader code, otherwise compiler may optimize them away
- When `gl.getUniformLocation()` returns `null`, setting that uniform triggers `INVALID_OPERATION`
- Prefer compile-time constant loop bounds: GLSL ES 3.0 nominally allows dynamic loop conditions, but some drivers reject or miscompile them — loop to a constant maximum and `break` early instead
@@ -0,0 +1,364 @@
## WebGL2 Adaptation Requirements
**IMPORTANT: GLSL Type Strictness**: there is no implicit scalar-to-vector conversion in declarations or assignments. `vec3 v = 1.0;` is illegal; construct the vector explicitly (e.g., `vec3(1.0)`), though binary operators such as `vec3(1.0) * x` do accept a scalar operand.
The code templates in this document use ShaderToy GLSL style. When generating standalone HTML pages, you must adapt for WebGL2:
- Use `canvas.getContext("webgl2")`
- Shader first line: `#version 300 es`, add `precision highp float;` in fragment shader
- Vertex shader: `attribute` -> `in`, `varying` -> `out`
- Fragment shader: `varying` -> `in`, `gl_FragColor` -> custom `out vec4 fragColor`, `texture2D()` -> `texture()`
- ShaderToy's `void mainImage(out vec4 fragColor, in vec2 fragCoord)` must be adapted to the standard `void main()` entry point
# SDF Ambient Occlusion
## Use Cases
- Simulating indirect light occlusion in raymarching / SDF scenes
- Adding spatial depth and contact shadows (darkening in concavities and crevices)
- From 5 samples (performance priority) to 32 hemisphere samples (quality priority)
## Core Principles
Sample the SDF along the surface normal direction at multiple distances, comparing the "expected distance" with the "actual distance" to estimate occlusion.
For surface point P, normal N, and sampling distance h:
- Expected distance = h (SDF should equal h when surroundings are open)
- Actual distance = map(P + N * h)
- Occlusion contribution = h - map(P + N * h) (larger difference = stronger occlusion)
```
AO = 1 - k * sum(weight_i * max(0, h_i - map(P + N * h_i)))
```
Result: 1.0 = no occlusion, 0.0 = fully occluded. Weights decay exponentially (closer samples have higher weight).
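The formula can be exercised on the CPU with a two-primitive scene (Python sketch mirroring the classic 5-sample loop; `sdf` and `calc_ao` are hypothetical names). A ground point far from the sphere reads AO = 1.0; a point near the sphere's contact reads noticeably darker:

```python
import math

def sdf(p):
    # Ground plane plus a unit sphere centered at (0, 1, 0).
    ground = p[1]
    sphere = math.dist(p, (0.0, 1.0, 0.0)) - 1.0
    return min(ground, sphere)

def calc_ao(pos, nor, steps=5, strength=3.0):
    occ, sca = 0.0, 1.0
    for i in range(steps):
        h = 0.01 + 0.12 * i / (steps - 1)          # sample distances 0.01..0.13
        p = tuple(pos[k] + h * nor[k] for k in range(3))
        occ += (h - sdf(p)) * sca                  # expected minus actual
        sca *= 0.95                                # exponential weight decay
    return max(0.0, min(1.0, 1.0 - strength * occ))
```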
## Implementation Steps
### Step 1: SDF Scene
```glsl
float map(vec3 p) {
float d = p.y; // ground
d = min(d, length(p - vec3(0.0, 1.0, 0.0)) - 1.0); // sphere
d = min(d, length(vec2(length(p.xz) - 1.5, p.y - 0.5)) - 0.4); // torus
return d;
}
```
### Step 2: Normal Calculation
```glsl
vec3 calcNormal(vec3 p) {
vec2 e = vec2(0.001, 0.0);
return normalize(vec3(
map(p + e.xyy) - map(p - e.xyy),
map(p + e.yxy) - map(p - e.yxy),
map(p + e.yyx) - map(p - e.yyx)
));
}
```
### Step 3: Classic Normal-Direction AO (5 Samples)
```glsl
float calcAO(vec3 pos, vec3 nor) {
float occ = 0.0;
float sca = 1.0;
for (int i = 0; i < 5; i++) {
float h = 0.01 + 0.12 * float(i) / 4.0; // sampling distance 0.01~0.13
float d = map(pos + h * nor);
occ += (h - d) * sca; // (expected - actual) * weight
sca *= 0.95;
}
return clamp(1.0 - 3.0 * occ, 0.0, 1.0);
}
```
### Step 4: Applying AO to Lighting
```glsl
float ao = calcAO(pos, nor);
// affect ambient light only (physically correct)
vec3 ambient = vec3(0.2, 0.3, 0.5) * ao;
vec3 color = diffuse * shadow + ambient;
// affect all lighting (visually stronger)
vec3 color = (diffuse * shadow + ambient) * ao;
// combined with sky visibility
float skyVis = 0.5 + 0.5 * nor.y;
vec3 color = diffuse * shadow + ambient * ao * skyVis;
```
### Step 5: Raymarching Integration
```glsl
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
// ... camera setup, ray generation ...
float t = 0.0;
for (int i = 0; i < 128; i++) {
vec3 p = ro + rd * t;
float d = map(p);
if (d < 0.001) break;
t += d;
if (t > 100.0) break;
}
vec3 col = vec3(0.0);
if (t < 100.0) {
vec3 pos = ro + rd * t;
vec3 nor = calcNormal(pos);
float ao = calcAO(pos, nor);
vec3 lig = normalize(vec3(1.0, 0.8, -0.6));
float dif = clamp(dot(nor, lig), 0.0, 1.0);
float sky = 0.5 + 0.5 * nor.y;
col = vec3(1.0) * dif + vec3(0.2, 0.3, 0.5) * sky * ao;
}
fragColor = vec4(col, 1.0);
}
```
## Complete Code Template
Runs directly in ShaderToy:
```glsl
// SDF Ambient Occlusion — ShaderToy Template
// Synthesized from classic raymarching implementations
#define AO_STEPS 5
#define AO_MAX_DIST 0.12
#define AO_MIN_DIST 0.01
#define AO_DECAY 0.95
#define AO_STRENGTH 3.0
#define MARCH_STEPS 128
#define MAX_DIST 100.0
#define SURF_DIST 0.001
float map(vec3 p) {
float ground = p.y;
float sphere = length(p - vec3(0.0, 1.0, 0.0)) - 1.0;
float torus = length(vec2(length(p.xz) - 1.5, p.y - 0.5)) - 0.4;
float box = length(max(abs(p - vec3(-2.5, 0.75, 0.0)) - vec3(0.75), 0.0)) - 0.05;
float d = min(ground, sphere);
d = min(d, torus);
d = min(d, box);
return d;
}
vec3 calcNormal(vec3 p) {
vec2 e = vec2(0.001, 0.0);
return normalize(vec3(
map(p + e.xyy) - map(p - e.xyy),
map(p + e.yxy) - map(p - e.yxy),
map(p + e.yyx) - map(p - e.yyx)
));
}
float calcAO(vec3 pos, vec3 nor) {
float occ = 0.0;
float sca = 1.0;
for (int i = 0; i < AO_STEPS; i++) {
float h = AO_MIN_DIST + AO_MAX_DIST * float(i) / float(AO_STEPS - 1);
float d = map(pos + h * nor);
occ += (h - d) * sca;
sca *= AO_DECAY;
}
return clamp(1.0 - AO_STRENGTH * occ, 0.0, 1.0);
}
float calcShadow(vec3 ro, vec3 rd, float mint, float maxt, float k) {
float res = 1.0;
float t = mint;
for (int i = 0; i < 64; i++) {
float h = map(ro + rd * t);
res = min(res, k * h / t);
t += clamp(h, 0.01, 0.2);
if (res < 0.001 || t > maxt) break;
}
return clamp(res, 0.0, 1.0);
}
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
vec2 uv = (2.0 * fragCoord - iResolution.xy) / iResolution.y;
float an = 0.3 * iTime;
vec3 ro = vec3(4.0 * cos(an), 2.5, 4.0 * sin(an));
vec3 ta = vec3(0.0, 0.5, 0.0);
vec3 ww = normalize(ta - ro);
vec3 uu = normalize(cross(ww, vec3(0.0, 1.0, 0.0)));
vec3 vv = cross(uu, ww);
vec3 rd = normalize(uv.x * uu + uv.y * vv + 1.8 * ww);
float t = 0.0;
for (int i = 0; i < MARCH_STEPS; i++) {
vec3 p = ro + rd * t;
float d = map(p);
if (d < SURF_DIST) break;
t += d;
if (t > MAX_DIST) break;
}
vec3 col = vec3(0.4, 0.5, 0.7) - 0.3 * rd.y;
if (t < MAX_DIST) {
vec3 pos = ro + rd * t;
vec3 nor = calcNormal(pos);
float ao = calcAO(pos, nor);
vec3 lig = normalize(vec3(0.8, 0.6, -0.5));
float dif = clamp(dot(nor, lig), 0.0, 1.0);
float sha = calcShadow(pos + nor * 0.01, lig, 0.02, 20.0, 8.0);
float sky = 0.5 + 0.5 * nor.y;
vec3 mate = vec3(0.18);
if (pos.y < 0.01) {
float f = mod(floor(pos.x) + floor(pos.z), 2.0);
mate = 0.1 + 0.08 * f * vec3(1.0);
}
col = vec3(0.0);
col += mate * vec3(1.0, 0.9, 0.7) * dif * sha;
col += mate * vec3(0.2, 0.3, 0.5) * sky * ao;
col += mate * vec3(0.3, 0.2, 0.1) * clamp(-nor.y, 0.0, 1.0) * ao;
}
col = pow(col, vec3(0.4545));
fragColor = vec4(col, 1.0);
}
```
## Common Variants
### Multiplicative AO (Spout / P_Malin)
```glsl
float calcAO_multiplicative(vec3 pos, vec3 nor) {
float ao = 1.0;
float dist = 0.0;
for (int i = 0; i <= 5; i++) {
dist += 0.1;
float d = map(pos + nor * dist);
ao *= 1.0 - max(0.0, (dist - d) * 0.2 / dist);
}
return ao;
}
```
### Multi-Scale Separated AO (Protophore / Eric Heitz)
Exponentially increasing sampling distances, separating short-range contact shadows from long-range ambient occlusion, fully unrolled without loops.
```glsl
float calcAO_multiscale(vec3 pos, vec3 nor) {
float aoS = 1.0;
aoS *= clamp(map(pos + nor * 0.1) * 10.0, 0.0, 1.0);
aoS *= clamp(map(pos + nor * 0.2) * 5.0, 0.0, 1.0);
aoS *= clamp(map(pos + nor * 0.4) * 2.5, 0.0, 1.0);
aoS *= clamp(map(pos + nor * 0.8) * 1.25, 0.0, 1.0);
float ao = aoS;
ao *= clamp(map(pos + nor * 1.6) * 0.625, 0.0, 1.0);
ao *= clamp(map(pos + nor * 3.2) * 0.3125, 0.0, 1.0);
ao *= clamp(map(pos + nor * 6.4) * 0.15625, 0.0, 1.0);
return max(0.035, pow(ao, 0.3));
}
```
### Jittered Sampling AO
Hash jittering breaks banding artifacts, `1/(1+l)` distance falloff.
```glsl
float hash(float n) { return fract(sin(n) * 43758.5453); }
float calcAO_jittered(vec3 pos, vec3 nor, float maxDist) {
float ao = 0.0;
const float nbIte = 6.0;
for (float i = 1.0; i < nbIte + 0.5; i++) {
float l = (i + hash(i)) * 0.5 / nbIte * maxDist;
ao += (l - map(pos + nor * l)) / (1.0 + l);
}
return clamp(1.0 - ao / nbIte, 0.0, 1.0);
}
// call: calcAO_jittered(pos, nor, 4.0)
```
### Hemisphere Random Direction AO
Random direction sampling within the normal hemisphere, closer to physically accurate, requires 32 samples.
```glsl
vec2 hash2(float n) {
return fract(sin(vec2(n, n + 1.0)) * vec2(43758.5453, 22578.1459));
}
float calcAO_hemisphere(vec3 pos, vec3 nor, float seed) {
float occ = 0.0;
for (int i = 0; i < 32; i++) {
float h = 0.01 + 4.0 * pow(float(i) / 31.0, 2.0);
vec2 an = hash2(seed + float(i) * 13.1) * vec2(3.14159, 6.2831);
vec3 dir = vec3(sin(an.x) * sin(an.y), sin(an.x) * cos(an.y), cos(an.x));
dir *= sign(dot(dir, nor));
occ += clamp(5.0 * map(pos + h * dir) / h, -1.0, 1.0);
}
return clamp(occ / 32.0, 0.0, 1.0);
}
```
### Fibonacci Sphere Uniform Hemisphere AO
Fibonacci sphere points for quasi-uniform hemisphere sampling, avoiding random clustering.
```glsl
vec3 forwardSF(float i, float n) {
const float PI = 3.141592653589793;
const float PHI = 1.618033988749895;
float phi = 2.0 * PI * fract(i / PHI);
float zi = 1.0 - (2.0 * i + 1.0) / n;
float sinTheta = sqrt(1.0 - zi * zi);
return vec3(cos(phi) * sinTheta, sin(phi) * sinTheta, zi);
}
float hash1(float n) { return fract(sin(n) * 43758.5453); }
float calcAO_fibonacci(vec3 pos, vec3 nor) {
float ao = 0.0;
for (int i = 0; i < 32; i++) {
vec3 ap = forwardSF(float(i), 32.0);
float h = hash1(float(i));
ap *= sign(dot(ap, nor)) * h * 0.1;
ao += clamp(map(pos + nor * 0.01 + ap) * 3.0, 0.0, 1.0);
}
ao /= 32.0;
return clamp(ao * 6.0, 0.0, 1.0);
}
```
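The uniformity claim is easy to verify off-GPU. A Python transcription of `forwardSF` (a CPU check, not part of the shader) confirms every point is unit length and the z coordinates sweep the sphere monotonically:

```python
import math

PHI = (1.0 + math.sqrt(5.0)) / 2.0

def forward_sf(i, n):
    # CPU transcription of forwardSF(): i-th of n Fibonacci sphere points.
    phi = 2.0 * math.pi * ((i / PHI) % 1.0)
    zi = 1.0 - (2.0 * i + 1.0) / n
    sin_theta = math.sqrt(1.0 - zi * zi)
    return (math.cos(phi) * sin_theta, math.sin(phi) * sin_theta, zi)
```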
## Performance & Composition
### Performance Tips
- **Bottleneck**: Number of `map()` calls. Each AO sample = one full SDF evaluation
- **Sample count selection**: Classic normal-direction 3~5 samples is sufficient; hemisphere sampling needs 16~32
- **Early exit**: `if (occ > 0.35) break;` stops sampling once occlusion has saturated (the final clamp lands at 0.0 anyway)
- **Unroll loops**: Fixed iteration count (4~7) manually unrolled is more GPU-friendly
- **Distance degradation**: `float aoSteps = mix(5.0, 2.0, clamp(t / 50.0, 0.0, 1.0));`
- **Preprocessor toggle**: `#ifdef ENABLE_AMBIENT_OCCLUSION` for on/off control
- **SDF simplification**: AO sampling can use a simplified `map()`, ignoring fine details
### Composition Tips
- **AO + Soft Shadow**: `col = diffuse * sha + ambient * ao;`
- **AO + Sky Visibility**: `col += skyColor * ao * (0.5 + 0.5 * nor.y);`
- **AO + Bounce Light/SSS**: `col += bounceColor * bou * ao;`
- **AO + Convexity Detection**: Sample along both +N/-N to get both AO and convexity
- **AO + Fresnel Reflection**: `col += envColor * fre * ao;` reduces environment reflection in occluded areas
## Further Reading
For complete step-by-step tutorials, mathematical derivations, and advanced usage, see [reference](../reference/ambient-occlusion.md)
@@ -0,0 +1,542 @@
# Analytic Ray Tracing
## Use Cases
- Rendering scenes composed of geometric primitives (spheres, planes, boxes, cylinders, ellipsoids, etc.)
- Requiring precise surface intersection points, normals, and distance calculations (no iterative approximation)
- Building the underlying geometry engine for ray tracers / path tracers
- Scenes requiring accurate shadows, reflections, and refractions
## Core Principles
Substitute the ray equation `P(t) = O + tD` into the geometric body's implicit equation to obtain an algebraic equation in `t`, then solve it in closed form.
**Unified intersection workflow**: Build equation -> Simplify to standard form -> Discriminant test -> Take smallest positive root -> Compute gradient at intersection for normal
**Key formulas**:
- **Sphere** `|P-C|^2 = r^2` -> Quadratic equation
- **Plane** `N·P + d = 0` -> Linear equation
- **Box** Intersection of three pairs of parallel planes -> Slab Method
- **Ellipsoid** `|P/R|^2 = 1` -> Sphere intersection in scaled space
- **Torus** `(|P_xy| - R)^2 + P_z^2 = r^2` -> Quartic equation
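The sphere row of this workflow can be sketched and checked on the CPU (Python; `ray_sphere` is a hypothetical helper, assuming a unit-length direction):

```python
import math

def ray_sphere(ro, rd, center, r):
    # Substitute P(t) = O + t*D into |P - C|^2 = r^2 and solve the
    # quadratic (half-b form, valid for unit-length rd).
    oc = [ro[i] - center[i] for i in range(3)]
    b = sum(oc[i] * rd[i] for i in range(3))
    c = sum(o * o for o in oc) - r * r
    h = b * b - c                       # quarter discriminant
    if h < 0.0:
        return None                     # ray misses the sphere
    h = math.sqrt(h)
    t = -b - h                          # near root
    if t < 0.0:
        t = -b + h                      # ray origin inside the sphere
    return t if t >= 0.0 else None
```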
## Implementation Steps
### Step 1: Ray Generation
```glsl
vec3 generateRay(vec2 fragCoord, vec2 resolution, vec3 ro, vec3 ta) {
vec2 p = (2.0 * fragCoord - resolution) / resolution.y;
vec3 cw = normalize(ta - ro);
vec3 cu = normalize(cross(cw, vec3(0, 1, 0)));
vec3 cv = cross(cu, cw);
float fov = 1.5;
return normalize(p.x * cu + p.y * cv + fov * cw);
}
```
### Step 2: Ray-Sphere Intersection
```glsl
// Optimized version with sphere center at origin
float iSphere(vec3 ro, vec3 rd, vec2 distBound, inout vec3 normal, float r) {
float b = dot(ro, rd);
float c = dot(ro, ro) - r * r;
float h = b * b - c;
if (h < 0.0) return MAX_DIST;
h = sqrt(h);
float d1 = -b - h;
float d2 = -b + h;
if (d1 >= distBound.x && d1 <= distBound.y) {
normal = normalize(ro + rd * d1);
return d1;
} else if (d2 >= distBound.x && d2 <= distBound.y) {
normal = normalize(ro + rd * d2);
return d2;
}
return MAX_DIST;
}
```
```glsl
// General version, supports arbitrary sphere center (sph = vec4(center.xyz, radius))
float sphIntersect(vec3 ro, vec3 rd, vec4 sph) {
vec3 oc = ro - sph.xyz;
float b = dot(oc, rd);
float c = dot(oc, oc) - sph.w * sph.w;
float h = b * b - c;
if (h < 0.0) return -1.0;
return -b - sqrt(h);
}
```
### Step 3: Ray-Plane Intersection
```glsl
float iPlane(vec3 ro, vec3 rd, vec2 distBound, inout vec3 normal,
vec3 planeNormal, float planeDist) {
float denom = dot(rd, planeNormal);
if (denom > 0.0) return MAX_DIST;
float d = -(dot(ro, planeNormal) + planeDist) / denom;
if (d < distBound.x || d > distBound.y) return MAX_DIST;
normal = planeNormal;
return d;
}
// fast horizontal ground plane
float iGroundPlane(vec3 ro, vec3 rd, float height) {
return -(ro.y - height) / rd.y;
}
```
### Step 4: Ray-Box Intersection (Slab Method)
```glsl
float iBox(vec3 ro, vec3 rd, vec2 distBound, inout vec3 normal, vec3 boxSize) {
vec3 m = sign(rd) / max(abs(rd), 1e-8);
vec3 n = m * ro;
vec3 k = abs(m) * boxSize;
vec3 t1 = -n - k;
vec3 t2 = -n + k;
float tN = max(max(t1.x, t1.y), t1.z);
float tF = min(min(t2.x, t2.y), t2.z);
if (tN > tF || tF <= 0.0) return MAX_DIST;
if (tN >= distBound.x && tN <= distBound.y) {
normal = -sign(rd) * step(t1.yzx, t1.xyz) * step(t1.zxy, t1.xyz);
return tN;
} else if (tF >= distBound.x && tF <= distBound.y) {
normal = -sign(rd) * step(t1.yzx, t1.xyz) * step(t1.zxy, t1.xyz);
return tF;
}
return MAX_DIST;
}
```
### Step 5: Ray-Ellipsoid Intersection
```glsl
// Transform to unit sphere space for intersection, transform normal back to original space
float iEllipsoid(vec3 ro, vec3 rd, vec2 distBound, inout vec3 normal, vec3 rad) {
vec3 ocn = ro / rad;
vec3 rdn = rd / rad;
float a = dot(rdn, rdn);
float b = dot(ocn, rdn);
float c = dot(ocn, ocn);
float h = b * b - a * (c - 1.0);
if (h < 0.0) return MAX_DIST;
float d = (-b - sqrt(h)) / a;
if (d < distBound.x || d > distBound.y) return MAX_DIST;
normal = normalize((ro + d * rd) / rad);
return d;
}
```
### Step 6: Ray-Cylinder Intersection (With End Caps)
```glsl
// pa, pb: cylinder axis endpoints, ra: radius
float iCylinder(vec3 ro, vec3 rd, vec2 distBound, inout vec3 normal,
vec3 pa, vec3 pb, float ra) {
vec3 ca = pb - pa;
vec3 oc = ro - pa;
float caca = dot(ca, ca);
float card = dot(ca, rd);
float caoc = dot(ca, oc);
float a = caca - card * card;
float b = caca * dot(oc, rd) - caoc * card;
float c = caca * dot(oc, oc) - caoc * caoc - ra * ra * caca;
float h = b * b - a * c;
if (h < 0.0) return MAX_DIST;
h = sqrt(h);
float d = (-b - h) / a;
float y = caoc + d * card;
if (y > 0.0 && y < caca && d >= distBound.x && d <= distBound.y) {
normal = (oc + d * rd - ca * y / caca) / ra;
return d;
}
d = ((y < 0.0 ? 0.0 : caca) - caoc) / card;
if (abs(b + a * d) < h && d >= distBound.x && d <= distBound.y) {
normal = normalize(ca * sign(y) / caca);
return d;
}
return MAX_DIST;
}
```
### Step 7: Scene Intersection and Shading
```glsl
#define MAX_DIST 1e10
vec3 worldHit(vec3 ro, vec3 rd, vec2 dist, out vec3 normal) {
vec3 d = vec3(dist, 0.0);
vec3 tmpNormal;
float t;
t = iPlane(ro, rd, d.xy, normal, vec3(0, 1, 0), 0.0);
if (t < d.y) { d.y = t; d.z = 1.0; }
t = iSphere(ro - vec3(0, 0.5, 0), rd, d.xy, tmpNormal, 0.5);
if (t < d.y) { d.y = t; d.z = 2.0; normal = tmpNormal; }
t = iBox(ro - vec3(2, 0.5, 0), rd, d.xy, tmpNormal, vec3(0.5));
if (t < d.y) { d.y = t; d.z = 3.0; normal = tmpNormal; }
return d;
}
vec3 shade(vec3 pos, vec3 normal, vec3 rd, vec3 albedo) {
vec3 lightDir = normalize(vec3(-1.0, 0.75, 1.0));
float diff = max(dot(normal, lightDir), 0.0);
float amb = 0.5 + 0.5 * normal.y;
return albedo * (amb * 0.2 + diff * 0.8);
}
```
> **IMPORTANT: Critical pitfall**: `d.xy` must be passed as distBound, and `d.y` must be updated each time a closer intersection is found! If the deployed code passes the original `dist` directly without updating, the intersection logic will fail (all object distance tests become invalid), resulting in a completely black screen.
```glsl
#define MAX_BOUNCES 4
#define EPSILON 0.001
float schlickFresnel(float cosTheta, float F0) {
return F0 + (1.0 - F0) * pow(1.0 - cosTheta, 5.0);
}
vec3 radiance(vec3 ro, vec3 rd) {
vec3 color = vec3(0.0);
vec3 mask = vec3(1.0);
vec3 normal;
for (int i = 0; i < MAX_BOUNCES; i++) {
vec3 res = worldHit(ro, rd, vec2(EPSILON, MAX_DIST), normal);
if (res.z < 0.5) {
color += mask * vec3(0.6, 0.8, 1.0);
break;
}
vec3 hitPos = ro + rd * res.y;
vec3 albedo = getAlbedo(res.z); // getAlbedo(): per-material color lookup, supplied by the host shader
float F = schlickFresnel(max(0.0, dot(normal, -rd)), 0.04);
color += mask * (1.0 - F) * shade(hitPos, normal, rd, albedo);
mask *= F * albedo;
rd = reflect(rd, normal);
ro = hitPos + EPSILON * rd;
}
return color;
}
```
## Complete Code Template
Runs directly on ShaderToy, includes sphere, plane, and box primitives with reflection and Blinn-Phong shading.
> **IMPORTANT: Must follow**: All intersection function calls must use `d.xy` as the `distBound` parameter, and update `d.y` after each closer intersection is found. Incorrect usage: `iSphere(ro, rd, dist, ...)` (always using the original dist). Correct usage: `iSphere(ro, rd, d.xy, ...)` followed by `if (t < d.y) { d.y = t; ... }` to update.
```glsl
// Analytic Ray Tracing - Complete ShaderToy Template
#define MAX_DIST 1e10
#define EPSILON 0.001
#define MAX_BOUNCES 3
#define FOV 1.5
#define GAMMA 2.2
#define SHADOW_ENABLED true
float iSphere(vec3 ro, vec3 rd, vec2 distBound, inout vec3 normal, float r) {
float b = dot(ro, rd);
float c = dot(ro, ro) - r * r;
float h = b * b - c;
if (h < 0.0) return MAX_DIST;
h = sqrt(h);
float d1 = -b - h, d2 = -b + h;
if (d1 >= distBound.x && d1 <= distBound.y) { normal = normalize(ro + rd * d1); return d1; }
if (d2 >= distBound.x && d2 <= distBound.y) { normal = normalize(ro + rd * d2); return d2; }
return MAX_DIST;
}
float iPlane(vec3 ro, vec3 rd, vec2 distBound, inout vec3 normal,
vec3 planeNormal, float planeDist) {
float denom = dot(rd, planeNormal);
if (denom > 0.0) return MAX_DIST;
float d = -(dot(ro, planeNormal) + planeDist) / denom;
if (d < distBound.x || d > distBound.y) return MAX_DIST;
normal = planeNormal;
return d;
}
float iBox(vec3 ro, vec3 rd, vec2 distBound, inout vec3 normal, vec3 boxSize) {
vec3 m = sign(rd) / max(abs(rd), 1e-8);
vec3 n = m * ro;
vec3 k = abs(m) * boxSize;
vec3 t1 = -n - k, t2 = -n + k;
float tN = max(max(t1.x, t1.y), t1.z);
float tF = min(min(t2.x, t2.y), t2.z);
if (tN > tF || tF <= 0.0) return MAX_DIST;
if (tN >= distBound.x && tN <= distBound.y) {
normal = -sign(rd) * step(t1.yzx, t1.xyz) * step(t1.zxy, t1.xyz); return tN;
}
if (tF >= distBound.x && tF <= distBound.y) {
normal = -sign(rd) * step(t1.yzx, t1.xyz) * step(t1.zxy, t1.xyz); return tF;
}
return MAX_DIST;
}
struct Material { vec3 albedo; float specular; float roughness; };
Material getMaterial(float matId, vec3 pos) {
if (matId < 1.5) {
float checker = mod(floor(pos.x) + floor(pos.z), 2.0);
return Material(vec3(0.4 + 0.4 * checker), 0.02, 0.8);
} else if (matId < 2.5) { return Material(vec3(1.0, 0.2, 0.2), 0.5, 0.3); }
else if (matId < 3.5) { return Material(vec3(0.2, 0.4, 1.0), 0.1, 0.6); }
else if (matId < 4.5) { return Material(vec3(1.0, 1.0, 1.0), 0.8, 0.05); }
else { return Material(vec3(0.8, 0.6, 0.2), 0.3, 0.4); }
}
vec3 worldHit(vec3 ro, vec3 rd, vec2 dist, out vec3 normal) {
vec3 d = vec3(dist, 0.0); vec3 tmp; float t;
t = iPlane(ro, rd, d.xy, tmp, vec3(0, 1, 0), 0.0);
if (t < d.y) { d.y = t; d.z = 1.0; normal = tmp; }
t = iSphere(ro - vec3(-2.0, 1.0, 0.0), rd, d.xy, tmp, 1.0);
if (t < d.y) { d.y = t; d.z = 2.0; normal = tmp; }
t = iSphere(ro - vec3(0.0, 0.6, 2.0), rd, d.xy, tmp, 0.6);
if (t < d.y) { d.y = t; d.z = 3.0; normal = tmp; }
t = iSphere(ro - vec3(2.0, 0.8, -1.0), rd, d.xy, tmp, 0.8);
if (t < d.y) { d.y = t; d.z = 4.0; normal = tmp; }
t = iBox(ro - vec3(0.0, 0.5, -2.0), rd, d.xy, tmp, vec3(0.5));
if (t < d.y) { d.y = t; d.z = 5.0; normal = tmp; }
return d;
}
float shadow(vec3 ro, vec3 rd, float maxDist) {
vec3 normal;
vec3 res = worldHit(ro, rd, vec2(EPSILON, maxDist), normal);
return res.z > 0.5 ? 0.3 : 1.0;
}
float schlick(float cosTheta, float F0) {
return F0 + (1.0 - F0) * pow(1.0 - cosTheta, 5.0);
}
vec3 skyColor(vec3 rd) {
vec3 col = mix(vec3(1.0), vec3(0.5, 0.7, 1.0), 0.5 + 0.5 * rd.y);
vec3 sunDir = normalize(vec3(-0.4, 0.7, -0.6));
float sun = clamp(dot(sunDir, rd), 0.0, 1.0);
col += vec3(1.0, 0.6, 0.1) * (pow(sun, 4.0) + 10.0 * pow(sun, 32.0));
return col;
}
vec3 render(vec3 ro, vec3 rd) {
vec3 color = vec3(0.0), mask = vec3(1.0), normal;
for (int bounce = 0; bounce < MAX_BOUNCES; bounce++) {
vec3 res = worldHit(ro, rd, vec2(EPSILON, 100.0), normal);
if (res.z < 0.5) { color += mask * skyColor(rd); break; }
vec3 hitPos = ro + rd * res.y;
Material mat = getMaterial(res.z, hitPos);
vec3 lightDir = normalize(vec3(-0.4, 0.7, -0.6));
float diff = max(dot(normal, lightDir), 0.0);
float amb = 0.5 + 0.5 * normal.y;
float sha = SHADOW_ENABLED ? shadow(hitPos + normal * EPSILON, lightDir, 50.0) : 1.0;
vec3 halfVec = normalize(lightDir - rd);
float spec = pow(max(dot(normal, halfVec), 0.0), 1.0 / max(mat.roughness, 0.001));
float F = schlick(max(0.0, dot(normal, -rd)), 0.04 + 0.96 * mat.specular);
vec3 diffCol = mat.albedo * (amb * 0.15 + diff * sha * 0.85);
vec3 specCol = vec3(spec * sha);
color += mask * mix(diffCol, specCol, F * mat.specular);
mask *= F * mat.albedo;
if (length(mask) < 0.01) break;
rd = reflect(rd, normal);
ro = hitPos + normal * EPSILON;
}
return color;
}
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
vec2 p = (2.0 * fragCoord - iResolution.xy) / iResolution.y;
float angle = 0.3 * iTime;
vec3 ro = vec3(4.0 * cos(angle), 2.5, 4.0 * sin(angle));
vec3 ta = vec3(0.0, 0.5, 0.0);
vec3 cw = normalize(ta - ro);
vec3 cu = normalize(cross(cw, vec3(0, 1, 0)));
vec3 cv = cross(cu, cw);
vec3 rd = normalize(p.x * cu + p.y * cv + FOV * cw);
vec3 col = render(ro, rd);
col = col / (1.0 + col);
col = pow(col, vec3(1.0 / GAMMA));
fragColor = vec4(col, 1.0);
}
```
## Common Variants
### Variant 1: Path Tracing
```glsl
vec3 cosWeightedRandomHemisphereDirection(vec3 n, inout uint seed) {
uint ri = seed * 1103515245u + 12345u;
seed = ri;
float r1 = float(ri) / float(0xFFFFFFFFu);
ri = seed * 1103515245u + 12345u;
seed = ri;
float r2 = float(ri) / float(0xFFFFFFFFu);
vec3 uu = normalize(cross(n, abs(n.y) > 0.5 ? vec3(1,0,0) : vec3(0,1,0)));
vec3 vv = cross(uu, n);
float ra = sqrt(r1);
float rx = ra * cos(6.2831 * r2);
float ry = ra * sin(6.2831 * r2);
float rz = sqrt(1.0 - r1);
return normalize(rx * uu + ry * vv + rz * n);
}
// In the bounce loop, replace reflect with:
// rd = cosWeightedRandomHemisphereDirection(normal, seed);
// ro = hitPos + EPSILON * rd;
// mask *= mat.albedo;
```
### Variant 2: Analytic Soft Shadow
```glsl
float sphSoftShadow(vec3 ro, vec3 rd, vec4 sph) {
vec3 oc = ro - sph.xyz;
float b = dot(oc, rd);
float c = dot(oc, oc) - sph.w * sph.w;
float h = b * b - c;
float d = sqrt(max(0.0, sph.w * sph.w - h)) - sph.w;
float t = -b - sqrt(max(h, 0.0));
return (t > 0.0) ? max(d, 0.0) / t : 1.0;
}
```
### Variant 3: Analytic Anti-Aliasing
```glsl
vec2 sphDistances(vec3 ro, vec3 rd, vec4 sph) {
vec3 oc = ro - sph.xyz;
float b = dot(oc, rd);
float c = dot(oc, oc) - sph.w * sph.w;
float h = b * b - c;
float d = sqrt(max(0.0, sph.w * sph.w - h)) - sph.w;
return vec2(d, -b - sqrt(max(h, 0.0)));
}
// float px = 2.0 / iResolution.y;
// vec2 dt = sphDistances(ro, rd, sph);
// float coverage = 1.0 - clamp(dt.x / (dt.y * px), 0.0, 1.0);
// col = mix(bgColor, sphereColor, coverage);
```
### Variant 4: Refraction (Snell's Law)
```glsl
// Requires a random number function defined first
float hash1(float p) {
return fract(sin(p) * 43758.5453);
}
// Add refraction branch in the render loop:
float refrIndex = 1.5; // glass ~ 1.5, water ~ 1.33
bool inside = dot(rd, normal) > 0.0;
vec3 n = inside ? -normal : normal;
float eta = inside ? refrIndex : 1.0 / refrIndex;
vec3 refracted = refract(rd, n, eta);
float cosI = abs(dot(rd, n));
float F = schlick(cosI, pow((1.0 - eta) / (1.0 + eta), 2.0));
// Use bounce count as random seed
float randSeed = float(bounce) + 1.0;
if (refracted != vec3(0.0) && hash1(randSeed * 12.9898) > F) {
rd = refracted;
} else {
rd = reflect(rd, n);
}
ro = hitPos + rd * EPSILON;
```
### Variant 5: Higher-Order Algebraic Surface (Sphere4)
```glsl
float iSphere4(vec3 ro, vec3 rd, vec2 distBound, inout vec3 normal, float ra) {
float r2 = ra * ra;
vec3 d2 = rd*rd, d3 = d2*rd;
vec3 o2 = ro*ro, o3 = o2*ro;
float ka = 1.0 / dot(d2, d2);
float k0 = ka * dot(ro, d3);
float k1 = ka * dot(o2, d2);
float k2 = ka * dot(o3, rd);
float k3 = ka * (dot(o2, o2) - r2 * r2);
float c0 = k1 - k0 * k0;
float c1 = k2 + 2.0 * k0 * (k0 * k0 - 1.5 * k1);
float c2 = k3 - 3.0 * k0 * (k0 * (k0 * k0 - 2.0 * k1) + 4.0/3.0 * k2);
float p = c0 * c0 * 3.0 + c2;
float q = c0 * c0 * c0 - c0 * c2 + c1 * c1;
float h = q * q - p * p * p * (1.0/27.0);
if (h < 0.0) return MAX_DIST;
h = sqrt(h);
float s = sign(q+h) * pow(abs(q+h), 1.0/3.0);
float t = sign(q-h) * pow(abs(q-h), 1.0/3.0);
vec2 v = vec2((s+t) + c0*4.0, (s-t) * sqrt(3.0)) * 0.5;
float r = length(v);
float d = -abs(v.y) / sqrt(r + v.x) - c1/r - k0;
if (d >= distBound.x && d <= distBound.y) {
vec3 pos = ro + rd * d;
normal = normalize(pos * pos * pos);
return d;
}
return MAX_DIST;
}
```
## Common Errors and Safeguards
### Error 1: Distance Bound Not Updated
**Symptom**: Screen is completely black or shows only background
**Cause**: `distBound.y` not updated after each intersection
**Fix**:
```glsl
// WRONG:
t = iSphere(ro, rd, dist, tmpNormal, 1.0);
// CORRECT:
t = iSphere(ro, rd, d.xy, tmpNormal, 1.0);
if (t < d.y) { d.y = t; d.z = matId; normal = tmpNormal; }
```
### Error 2: EPSILON Too Small Causing Self-Intersection Artifacts
**Symptom**: Black spots or artifacts on object surfaces
**Cause**: `EPSILON` value too small, ray still intersects with itself
**Fix**: Adjust EPSILON based on scene scale; typical values 1e-3 ~ 1e-2
### Error 3: Variable Used as Loop Upper Bound
**Symptom**: WebGL2 compilation failure or shader crash
**Cause**: GLSL ES 3.0 nominally allows non-constant loop conditions, but some WebGL2 drivers reject or miscompile them
**Fix**: Use `#define` constants for loop upper bounds; if the count must vary, loop to the constant maximum and `break` early
### Error 4: Division by Zero Causing NaN
**Symptom**: Stripe patterns from NaN propagation across the screen
**Cause**: Division not protected when ray direction components are zero
**Fix**: Always use `max(abs(x), 1e-8)` or similar protection
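Applying that rule to a reciprocal of the ray direction (as used in slab/box intersection tests) might look like this sketch — note the substitution flips the sign of a tiny negative component, which is acceptable here since the component is effectively zero:

```glsl
// Per the rule above: never divide by a raw direction component
vec3 invRd = vec3(
    1.0 / (abs(rd.x) < 1e-8 ? 1e-8 : rd.x),
    1.0 / (abs(rd.y) < 1e-8 ? 1e-8 : rd.y),
    1.0 / (abs(rd.z) < 1e-8 ? 1e-8 : rd.z)
);
```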
### Error 5: Missing Hash Function in Refraction Variant
**Symptom**: Compilation error "undefined function 'hash1'"
**Fix**: Add the function definition when using the refraction variant:
```glsl
float hash1(float p) {
return fract(sin(p) * 43758.5453);
}
```
## Performance & Composition
**Performance tips**:
- **Distance bound clipping**: Shorten `distBound.y` after each closer intersection; subsequent objects are automatically skipped
- **Bounding sphere pre-test**: Pre-screen with bounding sphere for complex geometry (torus, etc.)
- **Shadow ray simplification**: Only need to determine occlusion, no normal calculation needed
- **Avoid unnecessary sqrt**: Return early when discriminant is negative; `c > 0.0 && b > 0.0` for fast rejection
- **Grid acceleration**: Use 3D DDA grid traversal for large numbers of similar primitives
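The shadow-ray tip can be sketched as an occlusion-only visibility test that ignores normals and returns on the first hit (assumes the intersector signatures used above; the object list is illustrative):

```glsl
// Occlusion-only test: any hit in (EPSILON, maxDist) means shadowed.
// tmpNormal is required by the intersector signature but its value is unused.
float shadowVisibility(vec3 ro, vec3 rd, float maxDist) {
    vec3 tmpNormal;
    vec2 bound = vec2(EPSILON, maxDist);
    if (iSphere(ro, rd, bound, tmpNormal, 1.0) < maxDist) return 0.0;
    // ... test the remaining occluders the same way ...
    return 1.0;
}
```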
**Composition approaches**:
- **+ Raymarching SDF**: Analytic primitives define major structures, SDF handles complex details
- **+ Volume effects**: Analytic intersection provides precise entry/exit distances for volume sampling within the range
- **+ PBR materials**: Precise normals plug directly into Cook-Torrance and other BRDFs
- **+ Spatial transforms**: Rotate/translate rays to reuse the same intersection functions
- **+ Analytic AA/AO/soft shadows**: Fully analytic pipeline, zero noise
## Further Reading
For complete step-by-step tutorials, mathematical derivations, and advanced usage, see [reference](../reference/analytic-ray-tracing.md)

# Anti-Aliasing Techniques
## Use Cases
- Eliminating jagged edges (staircase artifacts) in ray-marched or SDF-rendered scenes
- Smooth 2D SDF shape rendering
- Post-process edge smoothing for any shader output
- Temporal smoothing for noise reduction
## Core Principles
Anti-aliasing in shaders differs from rasterization pipelines. Without hardware MSAA on procedural geometry, we rely on analytical or post-process approaches.
## Techniques
### 1. Supersampling (SSAA) for Ray Marching
Render multiple sub-pixel samples and average:
```glsl
#define AA 2 // 1=off, 2=4x, 3=9x
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
vec3 totalColor = vec3(0.0);
for (int m = 0; m < AA; m++)
for (int n = 0; n < AA; n++) {
vec2 offset = vec2(float(m), float(n)) / float(AA) - 0.5;
vec2 uv = (2.0 * (fragCoord + offset) - iResolution.xy) / iResolution.y;
vec3 col = render(uv);
totalColor += col;
}
fragColor = vec4(totalColor / float(AA * AA), 1.0);
}
```
Cost: AA^2 × full render. Use AA=2 for quality, AA=1 for development.
### 2. SDF Analytical Anti-Aliasing
For 2D SDF shapes, use pixel width to compute smooth edges:
```glsl
float d = sdShape(uv);
float fw = fwidth(d); // screen-space derivative of SDF
float alpha = smoothstep(fw, -fw, d); // smooth edge over exactly 1 pixel
// Alternative: manual pixel width for more control
float pixelWidth = 2.0 / iResolution.y; // approximate pixel size in UV space
float alpha2 = smoothstep(pixelWidth, -pixelWidth, d);
```
For 3D SDF scenes, apply anti-aliasing at the edge of geometry:
```glsl
// After ray marching, at the surface:
float edgeFade = 1.0 - smoothstep(0.0, 0.01 * t, lastSdfValue);
// t = ray distance — scales threshold with distance for consistent edge width
```
### 3. Temporal Anti-Aliasing (TAA) Basics
Blend current frame with previous frame using a multipass buffer:
```glsl
// Buffer A: render with sub-pixel jitter
vec2 jitter = (hash22(vec2(iFrame)) - 0.5) / iResolution.xy;
vec2 uv = (fragCoord + jitter) / iResolution.xy;
vec3 currentColor = render(uv);
// Buffer A output: store current render
fragColor = vec4(currentColor, 1.0);
// Image shader: blend with history
vec3 current = texture(iChannel0, fragCoord / iResolution.xy).rgb; // this frame
vec3 history = texture(iChannel1, fragCoord / iResolution.xy).rgb; // previous frame
float blend = 0.9; // higher = smoother but more ghosting
fragColor = vec4(mix(current, history, blend), 1.0);
```
Note: Full TAA also needs motion vectors and neighborhood clamping to avoid ghosting.
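A minimal neighborhood-clamping sketch (still without motion vectors) bounds the history sample by the 3x3 color range of the current frame, which rejects most stale colors before blending:

```glsl
// Clamp history into the min/max box of the current frame's 3x3 neighborhood
vec3 clampHistory(sampler2D currentTex, vec2 uv, vec2 texel, vec3 history) {
    vec3 cMin = vec3(1e5), cMax = vec3(-1e5);
    for (int x = -1; x <= 1; x++)
    for (int y = -1; y <= 1; y++) {
        vec3 c = texture(currentTex, uv + vec2(x, y) * texel).rgb;
        cMin = min(cMin, c);
        cMax = max(cMax, c);
    }
    return clamp(history, cMin, cMax);
}
```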
### 4. FXAA (Fast Approximate Anti-Aliasing)
Simplified post-process edge detection and smoothing:
```glsl
vec3 fxaa(sampler2D tex, vec2 uv, vec2 texelSize) {
// Sample center and 4 neighbors
vec3 rgbM = texture(tex, uv).rgb;
vec3 rgbN = texture(tex, uv + vec2(0.0, texelSize.y)).rgb;
vec3 rgbS = texture(tex, uv - vec2(0.0, texelSize.y)).rgb;
vec3 rgbE = texture(tex, uv + vec2(texelSize.x, 0.0)).rgb;
vec3 rgbW = texture(tex, uv - vec2(texelSize.x, 0.0)).rgb;
// Luma for edge detection
vec3 lumaCoeff = vec3(0.299, 0.587, 0.114);
float lumaN = dot(rgbN, lumaCoeff);
float lumaS = dot(rgbS, lumaCoeff);
float lumaE = dot(rgbE, lumaCoeff);
float lumaW = dot(rgbW, lumaCoeff);
float lumaM = dot(rgbM, lumaCoeff);
float lumaMin = min(lumaM, min(min(lumaN, lumaS), min(lumaE, lumaW)));
float lumaMax = max(lumaM, max(max(lumaN, lumaS), max(lumaE, lumaW)));
float lumaRange = lumaMax - lumaMin;
// Skip if edge contrast is low
if (lumaRange < max(0.0312, lumaMax * 0.125)) return rgbM;
// Blend along edge direction
vec2 dir;
dir.x = -((lumaN + lumaS) - 2.0 * lumaM);
dir.y = ((lumaE + lumaW) - 2.0 * lumaM);
float dirReduce = max(lumaRange * 0.25, 1.0 / 128.0);
float rcpDirMin = 1.0 / (min(abs(dir.x), abs(dir.y)) + dirReduce);
dir = clamp(dir * rcpDirMin, -8.0, 8.0) * texelSize;
vec3 rgbA = 0.5 * (texture(tex, uv + dir * (1.0/3.0 - 0.5)).rgb +
texture(tex, uv + dir * (2.0/3.0 - 0.5)).rgb);
return rgbA;
}
```
## Choosing the Right Approach
| Method | Cost | Quality | Best For |
|--------|------|---------|----------|
| SSAA 2x2 | 4× render | Excellent | Final quality renders |
| SDF analytical | Minimal | Great for SDF | 2D shapes, UI elements |
| TAA | 1× + blend | Good + temporal | Animated scenes with multipass |
| FXAA | 1 pass post | Good | Any scene, post-process only |
→ For deeper details, see [reference/anti-aliasing.md](../reference/anti-aliasing.md)

# Atmospheric & Subsurface Scattering
## Use Cases
- Sky rendering (sunrise/sunset/noon/night)
- Aerial perspective
- Sun halo (Mie scattering haze)
- Planetary atmosphere rim glow
- Translucent material SSS (candles, skin, jelly)
- Volumetric light (God rays)
## Core Principles
Three physical mechanisms:
**Rayleigh scattering** — molecular-scale particles, β_R(λ) ∝ 1/λ⁴, shorter wavelengths scatter more strongly (blue sky / red sunset).
Sea-level values: `vec3(5.5e-6, 13.0e-6, 22.4e-6)` m⁻¹.
Phase function: `P_R(θ) = 3/(16π) × (1 + cos²θ)`, symmetric forward-backward.
**Mie scattering** — aerosol particles, wavelength-independent, strong forward scattering (sun halo).
Sea-level value: `vec3(21e-6)` m⁻¹.
Phase function: Henyey-Greenstein, `g ≈ 0.76~0.88`.
**Beer-Lambert attenuation**`T(A→B) = exp(-∫ σ_e(s) ds)`, exponential decay of light through a medium.
**Algorithm flow**: ray march along the view ray; at each sample point: compute density → compute optical depth toward the sun → Beer-Lambert attenuation → phase function weighting → accumulate.
## Implementation Steps
### Step 1: Ray-Sphere Intersection
```glsl
// Returns (t_near, t_far); no intersection when t_near > t_far
vec2 raySphereIntersect(vec3 p, vec3 dir, float r) {
float b = dot(p, dir);
float c = dot(p, p) - r * r;
float d = b * b - c;
if (d < 0.0) return vec2(1e5, -1e5);
d = sqrt(d);
return vec2(-b - d, -b + d);
}
```
### Step 2: Atmospheric Physical Constants
```glsl
#define PLANET_RADIUS 6371e3
#define ATMOS_RADIUS 6471e3
#define PLANET_CENTER vec3(0.0)
#define BETA_RAY vec3(5.5e-6, 13.0e-6, 22.4e-6) // Rayleigh scattering coefficients
#define BETA_MIE vec3(21e-6) // Mie scattering coefficients
#define BETA_OZONE vec3(2.04e-5, 4.97e-5, 1.95e-6) // Ozone absorption
#define MIE_G 0.76 // Anisotropy parameter 0.76~0.88
#define MIE_EXTINCTION 1.1 // Extinction/scattering ratio
#define H_RAY 8000.0 // Rayleigh scale height
#define H_MIE 1200.0 // Mie scale height
#define H_OZONE 30e3 // Ozone peak altitude
#define OZONE_FALLOFF 4e3 // Ozone decay width
#define PRIMARY_STEPS 32 // Primary ray steps 8(mobile)~64(high quality)
#define LIGHT_STEPS 8 // Light direction steps 4~16
```
### Step 3: Phase Functions
```glsl
float phaseRayleigh(float cosTheta) {
return 3.0 / (16.0 * 3.14159265) * (1.0 + cosTheta * cosTheta);
}
// Cornette-Shanks phase function (modified Henyey-Greenstein; note the (1 + cos²θ) term)
float phaseMie(float cosTheta, float g) {
float gg = g * g;
float num = (1.0 - gg) * (1.0 + cosTheta * cosTheta);
float denom = (2.0 + gg) * pow(1.0 + gg - 2.0 * g * cosTheta, 1.5);
return 3.0 / (8.0 * 3.14159265) * num / denom;
}
```
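When `pow()` cost matters, the Schlick approximation mentioned in the performance section below can stand in for the Mie phase function (sketch):

```glsl
// Schlick approximation to Henyey-Greenstein: k = 1.55g - 0.55g^3
float phaseSchlick(float cosTheta, float g) {
    float k = 1.55 * g - 0.55 * g * g * g;
    float denom = 1.0 + k * cosTheta;
    return (1.0 - k * k) / (4.0 * 3.14159265 * denom * denom);
}
```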
### Step 4: Atmospheric Density Sampling
```glsl
// Returns vec3(rayleigh, mie, ozone) density
vec3 atmosphereDensity(vec3 pos, float planetRadius) {
float height = length(pos) - planetRadius;
float densityRay = exp(-height / H_RAY);
float densityMie = exp(-height / H_MIE);
float denom = (H_OZONE - height) / OZONE_FALLOFF;
float densityOzone = (1.0 / (denom * denom + 1.0)) * densityRay;
return vec3(densityRay, densityMie, densityOzone);
}
```
### Step 5: Light Direction Optical Depth
```glsl
vec3 lightOpticalDepth(vec3 pos, vec3 sunDir) {
float atmoDist = raySphereIntersect(pos - PLANET_CENTER, sunDir, ATMOS_RADIUS).y;
float stepSize = atmoDist / float(LIGHT_STEPS);
float rayPos = stepSize * 0.5;
vec3 optDepth = vec3(0.0);
for (int i = 0; i < LIGHT_STEPS; i++) {
vec3 samplePos = pos + sunDir * rayPos;
float height = length(samplePos - PLANET_CENTER) - PLANET_RADIUS;
if (height < 0.0) return vec3(1e10); // Occluded by planet
optDepth += atmosphereDensity(samplePos, PLANET_RADIUS) * stepSize;
rayPos += stepSize;
}
return optDepth;
}
```
### Step 6: Primary Scattering Integration
```glsl
vec3 calculateScattering(
vec3 rayOrigin, vec3 rayDir, float maxDist,
vec3 sunDir, vec3 sunIntensity
) {
vec2 atmoHit = raySphereIntersect(rayOrigin - PLANET_CENTER, rayDir, ATMOS_RADIUS);
if (atmoHit.x > atmoHit.y) return vec3(0.0);
vec2 planetHit = raySphereIntersect(rayOrigin - PLANET_CENTER, rayDir, PLANET_RADIUS);
float tStart = max(atmoHit.x, 0.0);
float tEnd = atmoHit.y;
if (planetHit.x > 0.0) tEnd = min(tEnd, planetHit.x);
tEnd = min(tEnd, maxDist);
float stepSize = (tEnd - tStart) / float(PRIMARY_STEPS);
float cosTheta = dot(rayDir, sunDir);
float phaseR = phaseRayleigh(cosTheta);
float phaseM = phaseMie(cosTheta, MIE_G);
vec3 totalRay = vec3(0.0), totalMie = vec3(0.0), optDepthI = vec3(0.0);
float rayPos = tStart + stepSize * 0.5;
for (int i = 0; i < PRIMARY_STEPS; i++) {
vec3 samplePos = rayOrigin + rayDir * rayPos;
vec3 density = atmosphereDensity(samplePos, PLANET_RADIUS) * stepSize;
optDepthI += density;
vec3 optDepthL = lightOpticalDepth(samplePos, sunDir);
vec3 tau = BETA_RAY * (optDepthI.x + optDepthL.x)
 + BETA_MIE * MIE_EXTINCTION * (optDepthI.y + optDepthL.y)
 + BETA_OZONE * (optDepthI.z + optDepthL.z);
vec3 attenuation = exp(-tau);
totalRay += density.x * attenuation;
totalMie += density.y * attenuation;
rayPos += stepSize;
}
return sunIntensity * (totalRay * BETA_RAY * phaseR + totalMie * BETA_MIE * phaseM);
}
```
### Step 7: Tone Mapping
```glsl
vec3 tonemapExposure(vec3 color) { return 1.0 - exp(-color); }
vec3 tonemapReinhard(vec3 color) {
float l = dot(color, vec3(0.2126, 0.7152, 0.0722));
vec3 tc = color / (color + 1.0);
return mix(color / (l + 1.0), tc, tc);
}
vec3 gammaCorrect(vec3 color) { return pow(color, vec3(1.0 / 2.2)); }
```
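A typical output chain (sketch) applies tone mapping in linear HDR space first and gamma correction last:

```glsl
// Order matters: tone map linear HDR radiance, then convert to display gamma
vec3 finalizeColor(vec3 hdr) {
    vec3 ldr = tonemapExposure(hdr); // or tonemapReinhard(hdr)
    return gammaCorrect(ldr);        // linear -> approximate sRGB
}
```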
## Complete Code Template
Fully runnable Rayleigh + Mie atmospheric scattering for ShaderToy:
```glsl
#define PI 3.14159265359
#define PLANET_RADIUS 6371e3
#define ATMOS_RADIUS 6471e3
#define PLANET_CENTER vec3(0.0)
#define BETA_RAY vec3(5.5e-6, 13.0e-6, 22.4e-6)
#define BETA_MIE vec3(21e-6)
#define BETA_OZONE vec3(2.04e-5, 4.97e-5, 1.95e-6)
#define MIE_G 0.76
#define MIE_EXTINCTION 1.1
#define H_RAY 8e3
#define H_MIE 1.2e3
#define H_OZONE 30e3
#define OZONE_FALLOFF 4e3
#define PRIMARY_STEPS 32
#define LIGHT_STEPS 8
#define SUN_INTENSITY vec3(40.0)
vec2 raySphereIntersect(vec3 p, vec3 dir, float r) {
float b = dot(p, dir);
float c = dot(p, p) - r * r;
float d = b * b - c;
if (d < 0.0) return vec2(1e5, -1e5);
d = sqrt(d);
return vec2(-b - d, -b + d);
}
float phaseRayleigh(float cosTheta) {
return 3.0 / (16.0 * PI) * (1.0 + cosTheta * cosTheta);
}
float phaseMie(float cosTheta, float g) {
float gg = g * g;
float num = (1.0 - gg) * (1.0 + cosTheta * cosTheta);
float denom = (2.0 + gg) * pow(1.0 + gg - 2.0 * g * cosTheta, 1.5);
return 3.0 / (8.0 * PI) * num / denom;
}
vec3 atmosphereDensity(vec3 pos) {
float height = length(pos - PLANET_CENTER) - PLANET_RADIUS;
float dRay = exp(-height / H_RAY);
float dMie = exp(-height / H_MIE);
float dOzone = (1.0 / (pow((H_OZONE - height) / OZONE_FALLOFF, 2.0) + 1.0)) * dRay;
return vec3(dRay, dMie, dOzone);
}
vec3 calculateScattering(
vec3 start, vec3 dir, float maxDist,
vec3 sceneColor, vec3 sunDir, vec3 sunIntensity
) {
start -= PLANET_CENTER;
float a = dot(dir, dir);
float b = 2.0 * dot(dir, start);
float c = dot(start, start) - ATMOS_RADIUS * ATMOS_RADIUS;
float d = b * b - 4.0 * a * c;
if (d < 0.0) return sceneColor;
vec2 rayLen = vec2(
max((-b - sqrt(d)) / (2.0 * a), 0.0),
min((-b + sqrt(d)) / (2.0 * a), maxDist)
);
if (rayLen.x > rayLen.y) return sceneColor;
bool allowMie = maxDist > rayLen.y;
rayLen.y = min(rayLen.y, maxDist);
rayLen.x = max(rayLen.x, 0.0);
float stepSize = (rayLen.y - rayLen.x) / float(PRIMARY_STEPS);
float rayPos = rayLen.x + stepSize * 0.5;
vec3 totalRay = vec3(0.0);
vec3 totalMie = vec3(0.0);
vec3 optI = vec3(0.0);
float mu = dot(dir, sunDir);
float phaseR = phaseRayleigh(mu);
float phaseM = allowMie ? phaseMie(mu, MIE_G) : 0.0;
for (int i = 0; i < PRIMARY_STEPS; i++) {
vec3 pos = start + dir * rayPos;
float height = length(pos) - PLANET_RADIUS;
vec3 density = vec3(exp(-height / H_RAY), exp(-height / H_MIE), 0.0);
float dOzone = (H_OZONE - height) / OZONE_FALLOFF;
density.z = (1.0 / (dOzone * dOzone + 1.0)) * density.x;
density *= stepSize;
optI += density;
float la = dot(sunDir, sunDir);
float lb = 2.0 * dot(sunDir, pos);
float lc = dot(pos, pos) - ATMOS_RADIUS * ATMOS_RADIUS;
float ld = lb * lb - 4.0 * la * lc;
float lightStepSize = (-lb + sqrt(ld)) / (2.0 * la * float(LIGHT_STEPS));
float lightPos = lightStepSize * 0.5;
vec3 optL = vec3(0.0);
for (int j = 0; j < LIGHT_STEPS; j++) {
vec3 posL = pos + sunDir * lightPos;
float heightL = length(posL) - PLANET_RADIUS;
vec3 densityL = vec3(exp(-heightL / H_RAY), exp(-heightL / H_MIE), 0.0);
float dOzoneL = (H_OZONE - heightL) / OZONE_FALLOFF;
densityL.z = (1.0 / (dOzoneL * dOzoneL + 1.0)) * densityL.x;
densityL *= lightStepSize;
optL += densityL;
lightPos += lightStepSize;
}
vec3 attn = exp(
-BETA_RAY * (optI.x + optL.x)
- BETA_MIE * MIE_EXTINCTION * (optI.y + optL.y)
- BETA_OZONE * (optI.z + optL.z)
);
totalRay += density.x * attn;
totalMie += density.y * attn;
rayPos += stepSize;
}
vec3 opacity = exp(-(BETA_MIE * optI.y + BETA_RAY * optI.x + BETA_OZONE * optI.z));
return (
phaseR * BETA_RAY * totalRay +
phaseM * BETA_MIE * totalMie
) * sunIntensity + sceneColor * opacity;
}
vec3 getCameraVector(vec3 resolution, vec2 coord) {
vec2 uv = coord.xy / resolution.xy - vec2(0.5);
uv.x *= resolution.x / resolution.y;
return normalize(vec3(uv.x, uv.y, -1.0));
}
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
vec3 rayDir = getCameraVector(iResolution, fragCoord);
vec3 cameraPos = vec3(0.0, PLANET_RADIUS + 100.0, 0.0);
vec3 sunDir = normalize(vec3(0.0, cos(-iTime / 8.0), sin(-iTime / 8.0)));
vec4 scene = vec4(0.0, 0.0, 0.0, 1e12);
vec3 sunDisk = vec3(dot(rayDir, sunDir) > 0.9998 ? 3.0 : 0.0);
scene.xyz = sunDisk;
vec2 groundHit = raySphereIntersect(cameraPos - PLANET_CENTER, rayDir, PLANET_RADIUS);
if (groundHit.x > 0.0) {
scene.w = groundHit.x;
vec3 hitPos = cameraPos + rayDir * groundHit.x - PLANET_CENTER;
vec3 normal = normalize(hitPos);
float shadow = max(0.0, dot(normal, sunDir));
scene.xyz = vec3(0.1, 0.15, 0.08) * shadow;
}
vec3 col = calculateScattering(
cameraPos, rayDir, scene.w,
scene.xyz, sunDir, SUN_INTENSITY
);
col = 1.0 - exp(-col);
col = pow(col, vec3(1.0 / 2.2));
fragColor = vec4(col, 1.0);
}
```
## Advanced Fog Models
Four progressive fog techniques, from simple to physically motivated. These can be used standalone or combined with the full atmospheric scattering above.
### Level 1: Basic Exponential Fog
```glsl
vec3 applyFog(vec3 col, float t) {
float fogAmount = 1.0 - exp(-t * density);
vec3 fogColor = vec3(0.5, 0.6, 0.7);
return mix(col, fogColor, fogAmount);
}
```
### Level 2: Sun-Aware Fog (Scattering Tint)
Fog color shifts warm when looking toward the sun — creates a very natural light dispersion effect:
```glsl
vec3 applyFogSun(vec3 col, float t, vec3 rd, vec3 sunDir) {
float fogAmount = 1.0 - exp(-t * density);
float sunAmount = max(dot(rd, sunDir), 0.0);
vec3 fogColor = mix(
vec3(0.5, 0.6, 0.7), // base fog (blue-grey)
vec3(1.0, 0.9, 0.7), // sun-facing fog (warm gold)
pow(sunAmount, 8.0)
);
return mix(col, fogColor, fogAmount);
}
```
### Level 3: Height-Based Fog (Analytical Integration)
Density decreases exponentially with altitude: `d(y) = a * exp(-b * y)`. The formula is an exact analytical integral along the ray, not an approximation — fog pools in valleys and clears at altitude:
```glsl
vec3 applyFogHeight(vec3 col, float t, vec3 ro, vec3 rd) {
    float a = 0.5; // density multiplier
    float b = 0.3; // density falloff with height
    // Guard against division by zero on near-horizontal rays
    float dy = rd.y >= 0.0 ? max(rd.y, 1e-4) : min(rd.y, -1e-4);
    float fogAmount = (a / b) * exp(-ro.y * b) * (1.0 - exp(-t * dy * b)) / dy;
    fogAmount = clamp(fogAmount, 0.0, 1.0);
    vec3 fogColor = vec3(0.5, 0.6, 0.7);
    return mix(col, fogColor, fogAmount);
}
```
### Level 4: Extinction + Inscattering Separation
Independent RGB coefficients for absorption and scattering — allows chromatic fog effects where different wavelengths scatter differently:
```glsl
vec3 applyFogPhysical(vec3 col, float t, vec3 fogCol) {
vec3 be = vec3(0.02, 0.025, 0.03); // extinction coefficients (RGB)
vec3 bi = vec3(0.015, 0.02, 0.025); // inscattering coefficients (RGB)
vec3 extinction = exp(-t * be);
vec3 inscatter = (1.0 - exp(-t * bi));
return col * extinction + fogCol * inscatter;
}
```
## Common Variants
### Variant 1: Non-Physical Analytic Approximation (No Ray March)
Extremely low-cost analytic sky, suitable for mobile / backgrounds.
```glsl
#define zenithDensity(x) (0.7 / pow(max((x) - 0.1, 0.0035), 0.75))
vec3 getSkyAbsorption(vec3 skyColor, float zenith) {
return exp2(skyColor * -zenith) * 2.0;
}
float getMie(vec2 p, vec2 lp) {
float disk = clamp(1.0 - pow(distance(p, lp), 0.1), 0.0, 1.0);
return disk * disk * (3.0 - 2.0 * disk) * 2.0 * 3.14159;
}
vec3 getAtmosphericScattering(vec2 screenPos, vec2 lightPos) {
vec3 skyColor = vec3(0.39, 0.57, 1.0);
float zenith = zenithDensity(screenPos.y);
float rayleighMult = 1.0 + pow(1.0 - clamp(distance(screenPos, lightPos), 0.0, 1.0), 2.0) * 1.57;
vec3 absorption = getSkyAbsorption(skyColor, zenith);
vec3 sunAbsorption = getSkyAbsorption(skyColor, zenithDensity(lightPos.y + 0.1));
vec3 sky = skyColor * zenith * rayleighMult;
vec3 mie = getMie(screenPos, lightPos) * sunAbsorption;
float sunDist = clamp(length(max(lightPos.y + 0.1, 0.0)), 0.0, 1.0);
vec3 totalSky = mix(sky * absorption, sky / (sky + 0.5), sunDist);
totalSky += mie;
totalSky *= sunAbsorption * 0.5 + 0.5 * length(sunAbsorption);
return totalSky;
}
```
### Variant 2: Ozone Absorption Layer
Already integrated in the complete template. Set `BETA_OZONE` to a non-zero value to enable, producing a deeper blue zenith and purple tones at sunset.
### Variant 3: Subsurface Scattering (SSS)
For translucent materials (candles/skin/jelly), using SDF-estimated thickness to control light transmission.
```glsl
float subsurface(vec3 p, vec3 viewDir, vec3 normal) {
vec3 scatterDir = refract(viewDir, normal, 1.0 / 1.5); // IOR 1.3~2.0
vec3 samplePos = p;
float accumThickness = 0.0;
float MAX_SCATTER = 2.5;
for (float i = 0.1; i < MAX_SCATTER; i += 0.2) {
samplePos += scatterDir * i;
accumThickness += map(samplePos); // SDF function
}
float thickness = max(0.0, -accumThickness);
float SCATTER_STRENGTH = 16.0;
return SCATTER_STRENGTH * pow(MAX_SCATTER * 0.5, 3.0) / max(thickness, 1e-3); // guard: thickness can reach 0
}
// Usage: float ss = max(0.0, subsurface(hitPos, viewDir, normal));
// vec3 sssColor = albedo * smoothstep(0.0, 2.0, pow(ss, 0.6));
// finalColor = mix(lambertian, sssColor, 0.7) + specular;
```
### Variant 4: LUT Precomputation Pipeline (Production Grade)
Precompute Transmittance/Multiple Scattering/Sky-View into LUTs, only table lookups at runtime.
```glsl
// Transmittance LUT query (Hillaire 2020)
vec3 getValFromTLUT(sampler2D tex, vec2 bufferRes, vec3 pos, vec3 sunDir) {
float height = length(pos);
vec3 up = pos / height;
float sunCosZenithAngle = dot(sunDir, up);
vec2 uv = vec2(
256.0 * clamp(0.5 + 0.5 * sunCosZenithAngle, 0.0, 1.0),
64.0 * max(0.0, min(1.0, (height - groundRadiusMM) / (atmosphereRadiusMM - groundRadiusMM)))
);
uv /= bufferRes;
return texture(tex, uv).rgb;
}
```
### Variant 5: Analytic Fast Atmosphere (with Aerial Perspective)
Analytic exponential approximation replacing ray march, with distance attenuation support.
```glsl
void getRayleighMie(float opticalDepth, float densityR, float densityM, out vec3 R, out vec3 M) {
vec3 C_RAYLEIGH = vec3(5.802, 13.558, 33.100) * 1e-6;
vec3 C_MIE = vec3(3.996e-6);
R = (1.0 - exp(-opticalDepth * densityR * C_RAYLEIGH / 2.5)) * 2.5;
M = (1.0 - exp(-opticalDepth * densityM * C_MIE / 0.5)) * 0.5;
}
vec3 getLightTransmittance(vec3 lightDir) {
vec3 C_RAYLEIGH = vec3(5.802, 13.558, 33.100) * 1e-6;
vec3 C_MIE = vec3(3.996e-6);
vec3 C_OZONE = vec3(0.650, 1.881, 0.085) * 1e-6;
float extinction = exp(-clamp(lightDir.y + 0.05, 0.0, 1.0) * 40.0)
+ exp(-clamp(lightDir.y + 0.5, 0.0, 1.0) * 5.0) * 0.4
+ pow(clamp(1.0 - lightDir.y, 0.0, 1.0), 2.0) * 0.02
+ 0.002;
return exp(-(C_RAYLEIGH + C_MIE + C_OZONE) * extinction * 1e6);
}
```
## Performance & Composition
### Performance Tips
- **Nested ray march (O(N*M))**: reduce step counts (mobile: PRIMARY=12, LIGHT=4), use analytic approximation instead of light march, precompute Transmittance LUT
- **Dense exp()/pow()**: Schlick approximation replacing HG phase function — `k = 1.55*g - 0.55*g³; phase = (1-k²) / (4π*(1+k*cosθ)²)`
- **Full-screen per-pixel**: Sky-View LUT (200x200) table lookup, half-resolution rendering + bilinear upsampling
- **Banding dithering**: non-uniform step offset of 0.3, temporal blue noise dithering
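The banding tip can be sketched as a per-pixel, per-frame start offset (`hash12` is any 2D-to-1D hash, assumed available):

```glsl
// Jitter the first sample position by up to 0.3 of a step to break up banding;
// adding iFrame makes the pattern temporal (approximating blue-noise dithering)
float dither = hash12(fragCoord + float(iFrame)) * 0.3;
float rayPos = tStart + stepSize * (0.5 + dither);
```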
### Composition Tips
- **+ Volumetric clouds**: atmospheric transmittance determines sun color reaching the cloud layer, set `maxDist` to cloud distance
- **+ SDF scene**: SDF hit distance → `maxDist`, scene color → `sceneColor`, automatic aerial perspective
- **+ God Rays**: add occlusion to scattering integration (shadow map or additional ray march)
- **+ Terrain**: `finalColor = terrainColor * transmittance + inscattering`
- **+ PBR/SSS**: `diffuse = mix(lambert, sss, 0.7); final = ambient + albedo*diffuse + specular + fresnel*env`
## Further Reading
For full step-by-step tutorials, mathematical derivations, and advanced usage, see [reference](../reference/atmospheric-scattering.md)

# Camera & Lens Effects
## Use Cases
- Adding cinematic depth of field (bokeh blur)
- Motion blur for dynamic scenes
- Lens distortion and chromatic aberration
- Film grain and photographic realism
## Techniques
### 1. Depth of Field (Thin Lens Model)
Simulate camera aperture by jittering ray origins on a virtual lens disk:
```glsl
// For each sample:
vec2 lens = randomDisk(seed) * apertureSize; // random point on aperture
vec3 focalPoint = ro + rd * focalDistance; // point on focal plane
vec3 newRo = ro + cameraRight * lens.x + cameraUp * lens.y; // offset origin
vec3 newRd = normalize(focalPoint - newRo); // new ray toward focal point
// Accumulate multiple samples (16-64) for smooth bokeh
// Use with AA loop or temporal accumulation
// Disk sampling helper:
vec2 randomDisk(float seed) {
float angle = hash11(seed) * 6.2831853;
float radius = sqrt(hash11(seed + 1.0));
return vec2(cos(angle), sin(angle)) * radius;
}
```
Parameters:
- `apertureSize`: 0.0 = pinhole (sharp), 0.1-0.5 = visible bokeh
- `focalDistance`: distance to the in-focus plane
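The thin-lens snippet assumes an orthonormal camera basis; one way to build it from a look-at target (sketch — `target` and the world-up vector are assumptions):

```glsl
// Build cameraRight/cameraUp from the viewing direction and a world up vector
vec3 forward = normalize(target - ro);
vec3 cameraRight = normalize(cross(forward, vec3(0.0, 1.0, 0.0)));
vec3 cameraUp = cross(cameraRight, forward);
```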
### 2. Post-Process Depth of Field (Single Pass)
Cheaper approximation using depth buffer blur:
```glsl
vec3 dofPostProcess(sampler2D colorTex, sampler2D depthTex, vec2 uv) {
float depth = texture(depthTex, uv).r;
float coc = abs(depth - focalDepth) * apertureSize; // circle of confusion
coc = clamp(coc, 0.0, maxBlur);
vec3 color = vec3(0.0);
float total = 0.0;
// 16-tap Poisson disk sampling
for (int i = 0; i < 16; i++) {
vec2 offset = poissonDisk[i] * coc / iResolution.xy;
color += texture(colorTex, uv + offset).rgb;
total += 1.0;
}
return color / total;
}
```
### 3. Motion Blur (Velocity-Based)
```glsl
// Simple radial motion blur (camera rotation)
vec3 motionBlur(vec2 uv, float amount) {
vec3 color = vec3(0.0);
vec2 center = vec2(0.5);
int samples = 8;
for (int i = 0; i < samples; i++) {
float t = float(i) / float(samples - 1) - 0.5;
vec2 sampleUV = mix(uv, center, t * amount);
color += texture(iChannel0, sampleUV).rgb;
}
return color / float(samples);
}
// Time-based motion blur for ray marching
// Sample multiple time offsets within the frame:
// float t_shutter = iTime + (hash11(seed) - 0.5) * shutterSpeed;
// Use t_shutter instead of iTime for scene animation
```
### 4. Lens Distortion
```glsl
// Barrel/pincushion distortion
vec2 lensDistortion(vec2 uv, float k1, float k2) {
vec2 centered = uv - 0.5;
float r2 = dot(centered, centered);
float distortion = 1.0 + k1 * r2 + k2 * r2 * r2;
return centered * distortion + 0.5;
// k1 > 0: pincushion, k1 < 0: barrel
}
```
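Chromatic aberration (listed in the use cases) pairs naturally with the distortion above: sample each channel with a slightly different distortion strength (sketch, reusing `lensDistortion`):

```glsl
// Slightly stronger distortion for blue than red mimics lens dispersion
vec3 chromaticAberration(sampler2D tex, vec2 uv, float k1, float amount) {
    float r = texture(tex, lensDistortion(uv, k1 - amount, 0.0)).r;
    float g = texture(tex, lensDistortion(uv, k1, 0.0)).g;
    float b = texture(tex, lensDistortion(uv, k1 + amount, 0.0)).b;
    return vec3(r, g, b);
}
```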
### 5. Film Grain
```glsl
vec3 filmGrain(vec3 color, vec2 uv, float time, float intensity) {
float grain = hash12(uv * iResolution.xy + fract(time) * 1000.0) - 0.5;
// Apply more grain in darker areas (realistic film response)
float luminance = dot(color, vec3(0.299, 0.587, 0.114));
float grainAmount = intensity * (1.0 - luminance * 0.5);
return color + grain * grainAmount;
}
```
### 6. Vignette
```glsl
vec3 vignette(vec3 color, vec2 uv, float intensity, float smoothness) {
vec2 centered = uv - 0.5;
float dist = length(centered);
float vig = smoothstep(0.5, 0.5 - smoothness, dist);
return color * mix(1.0 - intensity, 1.0, vig);
}
```
→ For deeper details, see [reference/camera-effects.md](../reference/camera-effects.md)

# Cellular Automata & Reaction-Diffusion
## Use Cases
- GPU grid evolution simulation (cellular automata, reaction-diffusion)
- Organic texture generation: spots, stripes, mazes, coral, vein patterns
- Conway's Game of Life and variants (custom B/S rule sets)
- Gray-Scott reaction-diffusion real-time visualization
- Using simulation results to drive 3D surface displacement, lighting, or coloring
## Core Principles
### Cellular Automata (CA)
Each cell on a discrete grid updates based on **its own state** and **neighbor states** according to fixed rules. Conway B3/S23 rules:
- Dead cell with exactly 3 live neighbors → birth
- Live cell with 2 or 3 live neighbors → survival
- Otherwise → death
Neighbor computation (Moore neighborhood, 8 neighbors): `k = Σ cell(px + offset)`
### Reaction-Diffusion (RD)
Gray-Scott model — two substances u (activator) and v (inhibitor) diffuse and react:
```
∂u/∂t = Du·∇²u - u·v² + F·(1-u)
∂v/∂t = Dv·∇²v + u·v² - (F+k)·v
```
- `Du, Dv`: diffusion coefficients (Du > Dv produces patterns)
- `F`: feed rate, `k`: kill rate
- `∇²`: Laplacian, discretized using a nine-point stencil
Key parameters `(F, k)` determine the pattern:
| F | k | Pattern |
|---|---|---------|
| 0.035 | 0.065 | spots |
| 0.040 | 0.060 | stripes |
| 0.025 | 0.055 | labyrinthine |
| 0.050 | 0.065 | solitons |
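One convenient way to explore this parameter space interactively (sketch) is to map the mouse onto the pattern-forming `(F, k)` window from the table:

```glsl
// Map mouse position onto the interesting region of Gray-Scott space
vec2 m = iMouse.xy / iResolution.xy;
float F = mix(0.020, 0.060, m.x);
float k = mix(0.050, 0.070, m.y);
```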
## Implementation Steps
### Step 1: Grid State Storage & Self-Feedback
```glsl
// Buffer A: iChannel0 bound to Buffer A itself (self-feedback)
vec4 prevState = texelFetch(iChannel0, ivec2(fragCoord), 0);
// UV sampling (supports texture filtering)
vec2 uv = fragCoord / iResolution.xy;
vec4 prevSmooth = texture(iChannel0, uv);
```
### Step 2: Initialization (Noise Seeding)
```glsl
float hash1(float n) {
return fract(sin(n) * 138.5453123);
}
vec3 hash33(in vec2 p) {
float n = sin(dot(p, vec2(41, 289)));
return fract(vec3(2097152, 262144, 32768) * n);
}
if (iFrame < 2) {
// CA: random binary
float f = step(0.9, hash1(fragCoord.x * 13.0 + hash1(fragCoord.y * 71.1)));
fragColor = vec4(f, 0.0, 0.0, 0.0);
} else if (iFrame < 10) {
// RD: random continuous values
vec3 noise = hash33(fragCoord / iResolution.xy + vec2(53, 43) * float(iFrame));
fragColor = vec4(noise, 1.0);
}
```
### Step 3: Neighbor Sampling & Laplacian
```glsl
// --- Method A: Discrete CA neighbor counting ---
int cell(in ivec2 p) {
ivec2 r = ivec2(textureSize(iChannel0, 0));
p = (p + r) % r; // wrap-around boundary
return (texelFetch(iChannel0, p, 0).x > 0.5) ? 1 : 0;
}
ivec2 px = ivec2(fragCoord);
int k = cell(px+ivec2(-1,-1)) + cell(px+ivec2(0,-1)) + cell(px+ivec2(1,-1))
+ cell(px+ivec2(-1, 0)) + cell(px+ivec2(1, 0))
+ cell(px+ivec2(-1, 1)) + cell(px+ivec2(0, 1)) + cell(px+ivec2(1, 1));
// --- Method B: Nine-point Laplacian (for RD) ---
// Weights: diagonal 0.5, cross 1.0, center -6.0
vec2 laplacian(vec2 uv) {
vec2 px = 1.0 / iResolution.xy;
vec4 P = vec4(px, 0.0, -px.x);
return
0.5 * texture(iChannel0, uv - P.xy).xy
+ texture(iChannel0, uv - P.zy).xy
+ 0.5 * texture(iChannel0, uv - P.wy).xy
+ texture(iChannel0, uv - P.xz).xy
- 6.0 * texture(iChannel0, uv).xy
+ texture(iChannel0, uv + P.xz).xy
+ 0.5 * texture(iChannel0, uv + P.wy).xy
+ texture(iChannel0, uv + P.zy).xy
+ 0.5 * texture(iChannel0, uv + P.xy).xy;
}
// --- Method C: 3x3 weighted blur (Gaussian approximation) ---
// Weights: corner 1, edge 2, center 4, total 16
float blur3x3(vec2 uv) {
vec3 e = vec3(1, 0, -1);
vec2 px = 1.0 / iResolution.xy;
float res = 0.0;
res += texture(iChannel0, uv + e.xx*px).x + texture(iChannel0, uv + e.xz*px).x
+ texture(iChannel0, uv + e.zx*px).x + texture(iChannel0, uv + e.zz*px).x;
res += (texture(iChannel0, uv + e.xy*px).x + texture(iChannel0, uv + e.yx*px).x
+ texture(iChannel0, uv + e.yz*px).x + texture(iChannel0, uv + e.zy*px).x) * 2.;
res += texture(iChannel0, uv + e.yy*px).x * 4.;
return res / 16.0;
}
```
### Step 4: State Update Rules
```glsl
// --- CA: Conway B3/S23 ---
int e = cell(px);
float f = (((k == 2) && (e == 1)) || (k == 3)) ? 1.0 : 0.0;
// --- CA: Generic Birth/Survival bitmask ---
// Bit n of bornset/stayset enables birth/survival with n live neighbors,
// e.g. Conway B3/S23: bornset = (1 << 3); stayset = (1 << 2) | (1 << 3);
float ff = 0.0;
if (currentAlive) {
 ff = ((stayset & (1 << k)) != 0) ? 1.0 : 0.0;
} else {
 ff = ((bornset & (1 << k)) != 0) ? 1.0 : 0.0;
}
// --- RD: Gray-Scott update ---
float u = prevState.x;
float v = prevState.y;
vec2 Duv = laplacian(uv) * DIFFUSION;
float du = Duv.x - u * v * v + F * (1.0 - u);
float dv = Duv.y + u * v * v - (F + k) * v;
fragColor.xy = clamp(vec2(u + du * DT, v + dv * DT), 0.0, 1.0);
// --- RD: Simplified version (gradient + random decay) ---
float avgRD = blur3x3(uv);
vec2 pwr = (1.0 / iResolution.xy) * 1.5;
vec2 lap = vec2(
texture(iChannel0, uv + vec2(pwr.x, 0)).y - texture(iChannel0, uv - vec2(pwr.x, 0)).y,
texture(iChannel0, uv + vec2(0, pwr.y)).y - texture(iChannel0, uv - vec2(0, pwr.y)).y
);
uv = uv + lap * (1.0 / iResolution.xy) * 3.0;
float newRD = texture(iChannel0, uv).x + (noise.z - 0.5) * 0.0025 - 0.002;
newRD += dot(texture(iChannel0, uv + (noise.xy - 0.5) / iResolution.xy).xy, vec2(1, -1)) * 0.145;
```
### Step 5: Visualization & Coloring
```glsl
// Color mapping
float c = 1.0 - texture(iChannel0, uv).y;
vec3 col = pow(vec3(1.5, 1, 1) * c, vec3(1, 4, 12));
// Gradient normals + bump lighting
vec3 normal(vec2 uv) {
vec3 delta = vec3(1.0 / iResolution.xy, 0.0);
float du = texture(iChannel0, uv + delta.xz).x - texture(iChannel0, uv - delta.xz).x;
float dv = texture(iChannel0, uv + delta.zy).x - texture(iChannel0, uv - delta.zy).x;
return normalize(vec3(du, dv, 1.0));
}
// Specular highlight
float c2 = 1.0 - texture(iChannel0, uv + 0.5 / iResolution.xy).y;
col += vec3(0.36, 0.73, 1.0) * max(c2 * c2 - c * c, 0.0) * 12.0;
// Vignette + gamma
col *= pow(16.0 * uv.x * uv.y * (1.0 - uv.x) * (1.0 - uv.y), 0.125) * 1.15;
col *= smoothstep(0.0, 1.0, iTime / 2.0);
fragColor = vec4(sqrt(min(col, 1.0)), 1.0);
```
## Complete Code Template
ShaderToy setup: Buffer A's iChannel0 = Buffer A (self-feedback, linear filtering). Image's iChannel0 = Buffer A.
### Standalone HTML JS Skeleton (Ping-Pong Render Pipeline)
CA/RD requires framebuffer self-feedback. The following JS skeleton demonstrates the correct WebGL2 multi-pass ping-pong structure:
```javascript
<script>
let frameCount = 0;
let mouse = [0, 0, 0, 0];
const canvas = document.getElementById('c');
const gl = canvas.getContext('webgl2');
const ext = gl.getExtension('EXT_color_buffer_float');
function createShader(type, src) {
const s = gl.createShader(type);
gl.shaderSource(s, src);
gl.compileShader(s);
if (!gl.getShaderParameter(s, gl.COMPILE_STATUS))
console.error(gl.getShaderInfoLog(s));
return s;
}
function createProgram(vsSrc, fsSrc) {
const p = gl.createProgram();
gl.attachShader(p, createShader(gl.VERTEX_SHADER, vsSrc));
gl.attachShader(p, createShader(gl.FRAGMENT_SHADER, fsSrc));
gl.linkProgram(p);
return p;
}
const vsSource = `#version 300 es
in vec2 pos;
void main(){ gl_Position=vec4(pos,0,1); }`;
// fsBuffer / fsImage: adapt from the Buffer A / Image templates below (uniform declarations + void main entry point)
const progBuf = createProgram(vsSource, fsBuffer);
const progImg = createProgram(vsSource, fsImage);
function createFBO(w, h) {
const tex = gl.createTexture();
gl.bindTexture(gl.TEXTURE_2D, tex);
const fmt = ext ? gl.RGBA16F : gl.RGBA;
const typ = ext ? gl.FLOAT : gl.UNSIGNED_BYTE;
gl.texImage2D(gl.TEXTURE_2D, 0, fmt, w, h, 0, gl.RGBA, typ, null);
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MIN_FILTER, gl.LINEAR);
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MAG_FILTER, gl.LINEAR);
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_S, gl.CLAMP_TO_EDGE);
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_T, gl.CLAMP_TO_EDGE);
const fbo = gl.createFramebuffer();
gl.bindFramebuffer(gl.FRAMEBUFFER, fbo);
gl.framebufferTexture2D(gl.FRAMEBUFFER, gl.COLOR_ATTACHMENT0, gl.TEXTURE_2D, tex, 0);
gl.bindFramebuffer(gl.FRAMEBUFFER, null);
return { fbo, tex };
}
let W, H, bufA, bufB;
const vao = gl.createVertexArray();
gl.bindVertexArray(vao);
const vbo = gl.createBuffer();
gl.bindBuffer(gl.ARRAY_BUFFER, vbo);
gl.bufferData(gl.ARRAY_BUFFER, new Float32Array([-1,-1, 1,-1, -1,1, 1,1]), gl.STATIC_DRAW);
gl.enableVertexAttribArray(0);
gl.vertexAttribPointer(0, 2, gl.FLOAT, false, 0, 0);
function resize() {
canvas.width = W = innerWidth;
canvas.height = H = innerHeight;
bufA = createFBO(W, H);
bufB = createFBO(W, H);
frameCount = 0;
}
addEventListener('resize', resize);
resize();
canvas.addEventListener('mousedown', e => { mouse[2] = e.clientX; mouse[3] = H - e.clientY; });
canvas.addEventListener('mouseup', () => { mouse[2] = 0; mouse[3] = 0; });
canvas.addEventListener('mousemove', e => { mouse[0] = e.clientX; mouse[1] = H - e.clientY; });
function render(t) {
t *= 0.001;
// Buffer pass: read bufA → write bufB
gl.useProgram(progBuf);
gl.bindFramebuffer(gl.FRAMEBUFFER, bufB.fbo);
gl.viewport(0, 0, W, H);
gl.activeTexture(gl.TEXTURE0);
gl.bindTexture(gl.TEXTURE_2D, bufA.tex);
gl.uniform1i(gl.getUniformLocation(progBuf, 'iChannel0'), 0);
gl.uniform2f(gl.getUniformLocation(progBuf, 'iResolution'), W, H);
gl.uniform1f(gl.getUniformLocation(progBuf, 'iTime'), t);
gl.uniform1i(gl.getUniformLocation(progBuf, 'iFrame'), frameCount);
gl.uniform4f(gl.getUniformLocation(progBuf, 'iMouse'), ...mouse);
gl.bindVertexArray(vao);
gl.drawArrays(gl.TRIANGLE_STRIP, 0, 4);
[bufA, bufB] = [bufB, bufA];
// Image pass: read bufA → screen
gl.useProgram(progImg);
gl.bindFramebuffer(gl.FRAMEBUFFER, null);
gl.viewport(0, 0, W, H);
gl.activeTexture(gl.TEXTURE0);
gl.bindTexture(gl.TEXTURE_2D, bufA.tex);
gl.uniform1i(gl.getUniformLocation(progImg, 'iChannel0'), 0);
gl.uniform2f(gl.getUniformLocation(progImg, 'iResolution'), W, H);
gl.uniform1f(gl.getUniformLocation(progImg, 'iTime'), t);
gl.uniform1i(gl.getUniformLocation(progImg, 'iFrame'), frameCount);
gl.drawArrays(gl.TRIANGLE_STRIP, 0, 4);
frameCount++;
requestAnimationFrame(render);
}
requestAnimationFrame(render);
</script>
```
### Buffer A (Simulation Computation)
```glsl
// Gray-Scott Reaction-Diffusion — Buffer A (Simulation)
// iChannel0 = Buffer A (self-feedback, linear filtering)
#define DU 0.210 // u diffusion coefficient (0.1~0.3)
#define DV 0.105 // v diffusion coefficient (0.05~0.15)
#define F 0.040 // feed rate (0.01~0.08)
#define K 0.060 // kill rate (0.04~0.07)
#define DT 1.0 // time step (0.5~2.0)
#define INIT_FRAMES 10
float hash1(float n) {
return fract(sin(n) * 138.5453123);
}
vec3 hash33(vec2 p) {
float n = sin(dot(p, vec2(41.0, 289.0)));
return fract(vec3(2097152.0, 262144.0, 32768.0) * n);
}
// Nine-point Laplacian: diagonal 0.05, cross 0.2, center -1.0
vec2 laplacian9(vec2 uv) {
vec2 px = 1.0 / iResolution.xy;
vec2 c = texture(iChannel0, uv).xy;
vec2 n = texture(iChannel0, uv + vec2( 0, px.y)).xy;
vec2 s = texture(iChannel0, uv + vec2( 0,-px.y)).xy;
vec2 e = texture(iChannel0, uv + vec2( px.x, 0)).xy;
vec2 w = texture(iChannel0, uv + vec2(-px.x, 0)).xy;
vec2 ne = texture(iChannel0, uv + vec2( px.x, px.y)).xy;
vec2 nw = texture(iChannel0, uv + vec2(-px.x, px.y)).xy;
vec2 se = texture(iChannel0, uv + vec2( px.x,-px.y)).xy;
vec2 sw = texture(iChannel0, uv + vec2(-px.x,-px.y)).xy;
return (n + s + e + w) * 0.2 + (ne + nw + se + sw) * 0.05 - c;
}
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
vec2 uv = fragCoord / iResolution.xy;
// Initialization
if (iFrame < INIT_FRAMES) {
float rnd = hash1(fragCoord.x * 13.0 + hash1(fragCoord.y * 71.1 + float(iFrame)));
float u = 1.0;
float v = (rnd > 0.9) ? 1.0 : 0.0;
vec2 center = iResolution.xy * 0.5;
if (abs(fragCoord.x - center.x) < 20.0 && abs(fragCoord.y - center.y) < 20.0) {
v = hash1(fragCoord.x * 7.0 + fragCoord.y * 13.0) > 0.5 ? 1.0 : 0.0;
}
fragColor = vec4(u, v, 0.0, 1.0);
return;
}
// Read current state
vec2 state = texture(iChannel0, uv).xy;
float u = state.x;
float v = state.y;
// Gray-Scott equations
vec2 lap = laplacian9(uv);
float uvv = u * v * v;
float du = DU * lap.x - uvv + F * (1.0 - u);
float dv = DV * lap.y + uvv - (F + K) * v;
u += du * DT;
v += dv * DT;
// Mouse interaction: click to add v
if (iMouse.z > 0.0) {
if (length(fragCoord - iMouse.xy) < 10.0) v = 1.0;
}
fragColor = vec4(clamp(u, 0.0, 1.0), clamp(v, 0.0, 1.0), 0.0, 1.0);
}
```
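Two invariants of the Buffer A update above are easy to check on the CPU: the nine-point Laplacian weights sum to zero (so a constant field produces no diffusion), and the uniform state (u=1, v=0) is therefore a fixed point of the Gray-Scott update. A minimal JavaScript sketch:

```javascript
// Stencil weights from laplacian9: cross 0.2 x4, diagonal 0.05 x4, center -1.0
const weights = [0.2, 0.2, 0.2, 0.2, 0.05, 0.05, 0.05, 0.05, -1.0];
const sum = weights.reduce((a, b) => a + b, 0); // must be ~0

// On a uniform field every sample equals c, so lap = c * sum = 0,
// and one Gray-Scott step leaves (u=1, v=0) unchanged:
const [DU, DV, F, K, DT] = [0.210, 0.105, 0.040, 0.060, 1.0];
let u = 1.0, v = 0.0;
const uvv = u * v * v;
u += (DU * 0 - uvv + F * (1.0 - u)) * DT; // lap.x = 0 on a uniform field
v += (DV * 0 + uvv - (F + K) * v) * DT;   // lap.y = 0 on a uniform field
```

This is why the initialization pass must seed v > 0 somewhere: without a perturbation the simulation stays at the trivial fixed point forever.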
### Image (Visualization Output)
```glsl
// Gray-Scott Reaction-Diffusion — Image (Visualization)
// iChannel0 = Buffer A (linear filtering)
#define LIGHT_STRENGTH 12.0 // specular intensity (5~20)
#define COLOR_MODE 0 // 0=blue-gold, 1=flame, 2=monochrome
#define VIGNETTE 1 // 0=off, 1=vignette on
vec3 getNormal(vec2 uv) {
vec2 d = 1.0 / iResolution.xy;
float du = texture(iChannel0, uv + vec2(d.x, 0)).y - texture(iChannel0, uv - vec2(d.x, 0)).y;
float dv = texture(iChannel0, uv + vec2(0, d.y)).y - texture(iChannel0, uv - vec2(0, d.y)).y;
return normalize(vec3(du, dv, 0.05));
}
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
vec2 uv = fragCoord / iResolution.xy;
float val = texture(iChannel0, uv).y;
float c = 1.0 - val;
vec3 col;
#if COLOR_MODE == 0
float pattern = -cos(uv.x*0.75*3.14159-0.9)*cos(uv.y*1.5*3.14159-0.75)*0.5+0.5;
col = pow(vec3(1.5, 1.0, 1.0) * c, vec3(1.0, 4.0, 12.0));
col = mix(col, col.zyx, clamp(pattern - 0.2, 0.0, 1.0));
#elif COLOR_MODE == 1
col = vec3(c * 1.2, pow(c, 3.0), pow(c, 9.0));
#else
col = vec3(c);
#endif
float c2 = 1.0 - texture(iChannel0, uv + 0.5 / iResolution.xy).y;
col += vec3(0.36, 0.73, 1.0) * max(c2*c2 - c*c, 0.0) * LIGHT_STRENGTH;
#if VIGNETTE == 1
col *= pow(16.0*uv.x*uv.y*(1.0-uv.x)*(1.0-uv.y), 0.125) * 1.15;
#endif
col *= smoothstep(0.0, 1.0, iTime / 2.0);
fragColor = vec4(sqrt(clamp(col, 0.0, 1.0)), 1.0);
}
```
## Common Variants
### Variant 1: Conway's Game of Life (Discrete CA)
```glsl
int cell(in ivec2 p) {
ivec2 r = ivec2(textureSize(iChannel0, 0));
p = (p + r) % r;
return (texelFetch(iChannel0, p, 0).x > 0.5) ? 1 : 0;
}
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
ivec2 px = ivec2(fragCoord);
int k = cell(px+ivec2(-1,-1)) + cell(px+ivec2(0,-1)) + cell(px+ivec2(1,-1))
+ cell(px+ivec2(-1, 0)) + cell(px+ivec2(1, 0))
+ cell(px+ivec2(-1, 1)) + cell(px+ivec2(0, 1)) + cell(px+ivec2(1, 1));
int e = cell(px);
float f = (((k == 2) && (e == 1)) || (k == 3)) ? 1.0 : 0.0;
if (iFrame < 2)
f = step(0.9, fract(sin(fragCoord.x*13.0 + sin(fragCoord.y*71.1)) * 138.5));
fragColor = vec4(f, 0.0, 0.0, 1.0);
}
```
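The B3/S23 rule above can be sanity-checked on the CPU. A minimal JavaScript port of the same update on a wrapped grid should make a blinker oscillate with period 2:

```javascript
// CPU port of the B3/S23 update used above (toroidal wrap, like cell()).
function step(grid) {
  const h = grid.length, w = grid[0].length;
  const next = grid.map(row => row.slice());
  for (let y = 0; y < h; y++) {
    for (let x = 0; x < w; x++) {
      let k = 0; // live-neighbor count
      for (let dy = -1; dy <= 1; dy++)
        for (let dx = -1; dx <= 1; dx++) {
          if (dx === 0 && dy === 0) continue;
          k += grid[(y + dy + h) % h][(x + dx + w) % w];
        }
      const e = grid[y][x];
      next[y][x] = ((k === 2 && e === 1) || k === 3) ? 1 : 0;
    }
  }
  return next;
}
// Horizontal blinker: flips to vertical, then back.
const g = [
  [0,0,0,0,0],
  [0,0,0,0,0],
  [0,1,1,1,0],
  [0,0,0,0,0],
  [0,0,0,0,0],
];
const g1 = step(g);  // vertical
const g2 = step(g1); // back to the original
```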
### Variant 2: Configurable Rule Set CA (B/S Bitmask)
```glsl
#define BORN_SET 8 // birth bitmask, 8 = B3
#define STAY_SET 12 // survival bitmask, 12 = S23
#define LIVEVAL 2.0
#define DECIMATE 1.0 // decay value
float ff = 0.0;
float ev = texelFetch(iChannel0, px, 0).w;
// Bit k of the mask corresponds to neighbor count k (B3 = 1<<3 = 8, S23 = 1<<2 | 1<<3 = 12)
if (ev > 0.5) {
if (DECIMATE > 0.0) ff = ev - DECIMATE;
if ((STAY_SET & (1 << k)) > 0) ff = LIVEVAL;
} else {
ff = ((BORN_SET & (1 << k)) > 0) ? LIVEVAL : 0.0;
}
```
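The bitmask encoding is easy to verify: bit k is set exactly when the rule includes neighbor count k, so B3 and S23 reduce to the defines above. A quick JavaScript check:

```javascript
// Build a rule mask from a list of neighbor counts (bit k = count k).
const maskOf = counts => counts.reduce((m, k) => m | (1 << k), 0);
const BORN = maskOf([3]);    // B3  -> 8
const STAY = maskOf([2, 3]); // S23 -> 12

// Membership test mirrors the GLSL: (MASK & (1 << k)) > 0
const survivesWith2 = (STAY & (1 << 2)) > 0; // true
const bornWith2 = (BORN & (1 << 2)) > 0;     // false
```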
### Variant 3: Separable Gaussian Blur RD (Multi-Buffer)
```glsl
// Buffer B: horizontal blur (reads Buffer A)
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
vec2 uv = fragCoord / iResolution.xy;
float h = 1.0 / iResolution.x;
vec4 sum = vec4(0.0);
sum += texture(iChannel0, fract(vec2(uv.x - 4.0*h, uv.y))) * 0.05;
sum += texture(iChannel0, fract(vec2(uv.x - 3.0*h, uv.y))) * 0.09;
sum += texture(iChannel0, fract(vec2(uv.x - 2.0*h, uv.y))) * 0.12;
sum += texture(iChannel0, fract(vec2(uv.x - 1.0*h, uv.y))) * 0.15;
sum += texture(iChannel0, fract(vec2(uv.x, uv.y))) * 0.16;
sum += texture(iChannel0, fract(vec2(uv.x + 1.0*h, uv.y))) * 0.15;
sum += texture(iChannel0, fract(vec2(uv.x + 2.0*h, uv.y))) * 0.12;
sum += texture(iChannel0, fract(vec2(uv.x + 3.0*h, uv.y))) * 0.09;
sum += texture(iChannel0, fract(vec2(uv.x + 4.0*h, uv.y))) * 0.05;
fragColor = vec4(sum.xyz / 0.98, 1.0);
}
// Buffer C: vertical blur (reads Buffer B), same structure but along y-axis
// Buffer A: reaction step reads Buffer C as the diffusion term
```
### Variant 4: Continuous Differential Operator CA (Vein/Fluid Style)
```glsl
#define STEPS 40 // advection step count (10~60)
#define ts 0.2 // advection rotation strength
#define cs -2.0 // curl scale
#define ls 0.05 // Laplacian scale
#define amp 1.0 // self-amplification coefficient
#define upd 0.4 // update smoothing coefficient
// 3x3 discrete curl and divergence (excerpt: uv_n..uv_se are the eight neighbor samples, _D weights the diagonals)
curl = uv_n.x - uv_s.x - uv_e.y + uv_w.y
+ _D * (uv_nw.x + uv_nw.y + uv_ne.x - uv_ne.y
+ uv_sw.y - uv_sw.x - uv_se.y - uv_se.x);
div = uv_s.y - uv_n.y - uv_e.x + uv_w.x
+ _D * (uv_nw.x - uv_nw.y - uv_ne.x - uv_ne.y
+ uv_sw.x + uv_sw.y + uv_se.y - uv_se.x);
// Multi-step advection loop (advect, rot, and the accumulators come from the full shader)
for (int i = 0; i < STEPS; i++) {
advect(off, vUv, texel, curl, div, lapl, blur);
offd = rot(offd, ts * curl);
off += offd;
ab += blur / float(STEPS);
}
```
### Variant 5: RD-Driven 3D Surface (Raymarched RD)
```glsl
// Image pass: use RD texture for displacement in SDF
vec2 map(in vec3 pos) {
vec3 p = normalize(pos);
vec2 uv;
uv.x = 0.5 + atan(p.z, p.x) / (2.0 * 3.14159);
uv.y = 0.5 - asin(p.y) / 3.14159;
float y = texture(iChannel0, uv).y;
float displacement = 0.1 * y;
float sd = length(pos) - (2.0 + displacement);
return vec2(sd, y);
}
```
## Performance & Composition
### Performance Tips
- **texelFetch vs texture**: Use `texelFetch` for discrete CA (exact pixel reads), `texture` for continuous RD (bilinear interpolation)
- **Separable blur replaces large kernels**: For large diffusion radii, use two-pass separable Gaussian (O(2N)) instead of NxN Laplacian (O(N²))
- **Sub-iterations**: Multiple small DT steps within a single frame improves stability
- **Reduced resolution**: Low-resolution buffer simulation + Image pass upsampling
- **Avoid branching**: Use `step()/mix()/clamp()` instead of `if/else`
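The separable-blur tip rests on the fact that a 2D kernel factoring as K(x,y) = g(x)·g(y) can be applied as a horizontal pass followed by a vertical pass, costing O(2N) taps instead of O(N²). A sketch with the 1D binomial kernel [1, 2, 1]/4:

```javascript
// 1D binomial kernel; two 1D passes are equivalent to its outer product.
const g = [0.25, 0.5, 0.25];
// Outer product reproduces the classic 3x3 Gaussian [1,2,1; 2,4,2; 1,2,1] / 16
const K2 = g.map(a => g.map(b => a * b));
// A normalized 1D kernel stays normalized in 2D: sum = (sum of g)^2 = 1
const total = K2.flat().reduce((s, w) => s + w, 0);
```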
### Composition Directions
- **RD + Raymarching**: RD as heightmap mapped onto 3D surface for displacement modeling
- **CA/RD + Particle Systems**: Field used as velocity field or spawn probability field to drive particles
- **RD + Bump Lighting**: Compute normals from RD values, combine with environment maps for metallic etching/ripple effects
- **CA + Color Decay Trails**: After death, fade per-frame with different RGB decay rates producing colored trails
- **RD + Domain Transforms**: Apply vortex/spiral transforms before sampling, producing spiral swirl patterns
## Further Reading
Full step-by-step tutorial, mathematical derivations, and advanced usage in [reference](../reference/cellular-automata.md)

# Color Palette & Color Space
## Use Cases
- Mapping scalar values (distance, temperature, time, iteration count) to continuous color ramps
- Perceptually uniform color interpolation/gradients
- HDR rendering with linear-space workflow (sRGB decode -> shading -> tone mapping -> sRGB encode)
- Physically realistic glow/flame/blackbody radiation colors
## Core Principles
Core: **map a scalar t in [0,1] to an RGB vec3**.
### Cosine Palette
```
color(t) = a + b * cos(2pi * (c * t + d))
```
- **a** = brightness offset (~0.5), **b** = amplitude (~0.5), **c** = frequency, **d** = phase (the key parameter controlling color style)
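The parameter roles can be checked numerically; per channel the output is bounded by [a-b, a+b], with a setting the center and b the swing:

```javascript
// color(t) = a + b*cos(2pi(c*t + d)), one channel.
const palette = (t, a, b, c, d) => a + b * Math.cos(2 * Math.PI * (c * t + d));
const hi = palette(0.0, 0.5, 0.5, 1.0, 0.0); // cos(0)  =  1 -> a + b = 1.0
const lo = palette(0.5, 0.5, 0.5, 1.0, 0.0); // cos(pi) = -1 -> a - b = 0.0
```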
### HSV/HSL Branchless Conversion
```
rgb = clamp(abs(mod(H*6 + vec3(0,4,2), 6) - 3) - 1, 0, 1)
```
Uses piecewise linear functions to approximate RGB variation with hue. C1 continuity can be achieved via `rgb*rgb*(3-2*rgb)`.
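The ramp can be checked at the primary hues with a direct JavaScript port of the same expression:

```javascript
// Branchless hue ramp: channel offsets 0, 4, 2 for R, G, B.
const clamp01 = x => Math.min(1, Math.max(0, x));
const hue2rgb = h => [0, 4, 2].map(o =>
  clamp01(Math.abs(((h * 6 + o) % 6) - 3) - 1));

const red = hue2rgb(0);       // [1, 0, 0]
const green = hue2rgb(1 / 3); // [0, 1, 0] (up to floating-point noise)
```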
### CIE Lab/Lch Perceptually Uniform Interpolation
RGB -> XYZ -> Lab -> Lch pipeline; interpolate in perceptually uniform space to avoid brightness discontinuities in RGB/HSV.
### Blackbody Radiation Palette
Temperature T -> Planckian locus approximation -> CIE chromaticity -> XYZ -> RGB, with Stefan-Boltzmann (T^4) controlling brightness.
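The rational fits used in the implementation below appear to be Krystek-style approximations of the Planckian locus in CIE 1960 UCS (u, v) chromaticity; evaluating them at 6500 K should land near the daylight-white region (u close to 0.20, v close to 0.31). A JavaScript spot check, assuming that reading of the coefficients:

```javascript
// Rational approximation of the Planckian locus in CIE 1960 (u, v);
// same coefficients as blackbodyPalette below.
function planckUV(t) {
  const u = (0.860117757 + 1.54118254e-4 * t + 1.28641212e-7 * t * t)
          / (1.0 + 8.42420235e-4 * t + 7.08145163e-7 * t * t);
  const v = (0.317398726 + 4.22806245e-5 * t + 4.20481691e-8 * t * t)
          / (1.0 - 2.89741816e-5 * t + 1.61456053e-7 * t * t);
  return [u, v];
}
const [u, v] = planckUV(6500); // near daylight white
```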
## Implementation
### Cosine Palette
```glsl
// a: offset, b: amplitude, c: frequency, d: phase, t: input scalar
vec3 palette(float t, vec3 a, vec3 b, vec3 c, vec3 d) {
return a + b * cos(6.28318 * (c * t + d));
}
```
### Classic Preset Parameters
```glsl
// Rainbow: d=(0.0, 0.33, 0.67)
// Warm: d=(0.0, 0.10, 0.20)
// Blue-purple to orange: c=(1,0.7,0.4) d=(0.0, 0.15, 0.20)
// Warm-cool mix: a=(.8,.5,.4) b=(.2,.4,.2) c=(2,1,1) d=(0.0, 0.25, 0.25)
// Simplified version: fixed a/b/c, only adjust d
vec3 palette(float t) {
vec3 a = vec3(0.5, 0.5, 0.5);
vec3 b = vec3(0.5, 0.5, 0.5);
vec3 c = vec3(1.0, 1.0, 1.0);
vec3 d = vec3(0.263, 0.416, 0.557);
return a + b * cos(6.28318 * (c * t + d));
}
```
### HSV -> RGB (Standard + Smooth)
```glsl
// Standard HSV -> RGB (branchless)
vec3 hsv2rgb(vec3 c) {
vec3 rgb = clamp(abs(mod(c.x * 6.0 + vec3(0.0, 4.0, 2.0), 6.0) - 3.0) - 1.0, 0.0, 1.0);
return c.z * mix(vec3(1.0), rgb, c.y);
}
// Smooth version (C1 continuous)
vec3 hsv2rgb_smooth(vec3 c) {
vec3 rgb = clamp(abs(mod(c.x * 6.0 + vec3(0.0, 4.0, 2.0), 6.0) - 3.0) - 1.0, 0.0, 1.0);
rgb = rgb * rgb * (3.0 - 2.0 * rgb); // Hermite smoothing
return c.z * mix(vec3(1.0), rgb, c.y);
}
```
### HSL -> RGB
```glsl
vec3 hue2rgb(float h) {
return clamp(abs(mod(h * 6.0 + vec3(0.0, 4.0, 2.0), 6.0) - 3.0) - 1.0, 0.0, 1.0);
}
vec3 hsl2rgb(float h, float s, float l) {
vec3 rgb = hue2rgb(h);
return l + s * (rgb - 0.5) * (1.0 - abs(2.0 * l - 1.0));
}
```
### RGB -> HSV
```glsl
// Sam Hocevar branchless method
vec3 rgb2hsv(vec3 c) {
vec4 K = vec4(0.0, -1.0 / 3.0, 2.0 / 3.0, -1.0);
vec4 p = mix(vec4(c.bg, K.wz), vec4(c.gb, K.xy), step(c.b, c.g));
vec4 q = mix(vec4(p.xyw, c.r), vec4(c.r, p.yzx), step(p.x, c.r));
float d = q.x - min(q.w, q.y);
float e = 1.0e-10;
return vec3(abs(q.z + (q.w - q.y) / (6.0 * d + e)), d / (q.x + e), q.x);
}
```
### CIE Lab/Lch Conversion Pipeline
```glsl
float xyzF(float t) { return mix(pow(t, 1.0/3.0), 7.787037 * t + 0.139731, step(t, 0.00885645)); }
float xyzR(float t) { return mix(t * t * t, 0.1284185 * (t - 0.139731), step(t, 0.20689655)); }
vec3 rgb2lch(vec3 c) {
c *= mat3(0.4124, 0.3576, 0.1805,
0.2126, 0.7152, 0.0722,
0.0193, 0.1192, 0.9505);
c = vec3(xyzF(c.x), xyzF(c.y), xyzF(c.z));
vec3 lab = vec3(max(0.0, 116.0 * c.y - 16.0),
500.0 * (c.x - c.y),
200.0 * (c.y - c.z));
return vec3(lab.x, length(lab.yz), atan(lab.z, lab.y));
}
vec3 lch2rgb(vec3 c) {
c = vec3(c.x, cos(c.z) * c.y, sin(c.z) * c.y);
float lg = (1.0 / 116.0) * (c.x + 16.0);
vec3 xyz = vec3(xyzR(lg + 0.002 * c.y),
xyzR(lg),
xyzR(lg - 0.005 * c.z));
return xyz * mat3( 3.2406, -1.5372, -0.4986,
-0.9689, 1.8758, 0.0415,
0.0557, -0.2040, 1.0570);
}
// Circular hue interpolation
float lerpAngle(float a, float b, float x) {
float ang = mod(mod((a - b), 6.28318) + 9.42477, 6.28318) - 3.14159;
return ang * x + b;
}
vec3 lerpLch(vec3 a, vec3 b, float x) {
return vec3(mix(b.xy, a.xy, x), lerpAngle(a.z, b.z, x));
}
```
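As a sanity check on the pipeline above, a JavaScript port (mirroring the GLSL, with the matrices written row-major) should round-trip an RGB color through Lch with only matrix-precision error:

```javascript
// Row-major sRGB<->XYZ matrices (the GLSL uses v*M on column-major mat3,
// which computes the same products).
const M1 = [[0.4124, 0.3576, 0.1805], [0.2126, 0.7152, 0.0722], [0.0193, 0.1192, 0.9505]];
const M2 = [[3.2406, -1.5372, -0.4986], [-0.9689, 1.8758, 0.0415], [0.0557, -0.2040, 1.0570]];
const mul = (M, v) => M.map(r => r[0] * v[0] + r[1] * v[1] + r[2] * v[2]);
const xyzF = t => t > 0.00885645 ? Math.cbrt(t) : 7.787037 * t + 0.139731;
const xyzR = t => t > 0.20689655 ? t * t * t : 0.1284185 * (t - 0.139731);

function rgb2lch(rgb) {
  const [x, y, z] = mul(M1, rgb).map(xyzF);
  const L = Math.max(0, 116 * y - 16), a = 500 * (x - y), b = 200 * (y - z);
  return [L, Math.hypot(a, b), Math.atan2(b, a)];
}
function lch2rgb([L, C, h]) {
  const a = Math.cos(h) * C, b = Math.sin(h) * C;
  const lg = (L + 16) / 116;
  return mul(M2, [xyzR(lg + 0.002 * a), xyzR(lg), xyzR(lg - 0.005 * b)]);
}

const rgb = [0.2, 0.5, 0.8];
const back = lch2rgb(rgb2lch(rgb)); // recovers rgb to ~1e-3
```

Note that, like the GLSL, this skips white-point normalization; the inverse mirrors the forward transform, so the round trip still closes.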
### sRGB Gamma & Tone Mapping
```glsl
// Precise sRGB encoding
float sRGB_encode(float t) {
return mix(1.055 * pow(t, 1.0/2.4) - 0.055, 12.92 * t, step(t, 0.0031308));
}
vec3 sRGB_encode(vec3 c) {
return vec3(sRGB_encode(c.x), sRGB_encode(c.y), sRGB_encode(c.z));
}
// Fast approximation: pow(color, vec3(2.2)) / pow(color, vec3(1.0/2.2))
// Reinhard tone mapping
vec3 tonemap_reinhard(vec3 col) {
return col / (1.0 + col);
}
```
### Blackbody Radiation Palette
```glsl
#define TEMP_MAX 4000.0 // Tunable: maximum temperature (K)
vec3 blackbodyPalette(float t) {
t *= TEMP_MAX;
float cx = (0.860117757 + 1.54118254e-4 * t + 1.28641212e-7 * t * t)
/ (1.0 + 8.42420235e-4 * t + 7.08145163e-7 * t * t);
float cy = (0.317398726 + 4.22806245e-5 * t + 4.20481691e-8 * t * t)
/ (1.0 - 2.89741816e-5 * t + 1.61456053e-7 * t * t);
float d = 2.0 * cx - 8.0 * cy + 4.0;
vec3 XYZ = vec3(3.0 * cx / d, 2.0 * cy / d, 1.0 - (3.0 * cx + 2.0 * cy) / d);
vec3 RGB = mat3(3.240479, -0.969256, 0.055648,
-1.537150, 1.875992, -0.204043,
-0.498535, 0.041556, 1.057311) * vec3(XYZ.x / XYZ.y, 1.0, XYZ.z / XYZ.y);
return max(RGB, 0.0) * pow(t * 0.0004, 4.0);
}
```
## Complete Code Template
A ShaderToy shader demonstrating all core techniques:
```glsl
// === Procedural Color Palette Showcase ===
#define PI 3.14159265
#define TAU 6.28318530
// ============ Tunable Parameters ============
#define PALETTE_A vec3(0.5, 0.5, 0.5) // Offset: increase = brighter overall
#define PALETTE_B vec3(0.5, 0.5, 0.5) // Amplitude: increase = more contrast
#define PALETTE_C vec3(1.0, 1.0, 1.0) // Frequency: increase = denser color variation
#define PALETTE_D vec3(0.0, 0.33, 0.67) // Phase: change = completely different hues
#define TEMP_MAX 4000.0 // Blackbody max temperature (K)
#define NUM_ITER 4 // Fractal iteration count
// ============ Color Functions ============
vec3 cosinePalette(float t, vec3 a, vec3 b, vec3 c, vec3 d) {
return a + b * cos(TAU * (c * t + d));
}
vec3 palette(float t) {
return cosinePalette(t, PALETTE_A, PALETTE_B, PALETTE_C, PALETTE_D);
}
vec3 hsv2rgb(vec3 c) {
vec3 rgb = clamp(abs(mod(c.x * 6.0 + vec3(0.0, 4.0, 2.0), 6.0) - 3.0) - 1.0, 0.0, 1.0);
rgb = rgb * rgb * (3.0 - 2.0 * rgb);
return c.z * mix(vec3(1.0), rgb, c.y);
}
vec3 hsl2rgb(float h, float s, float l) {
vec3 rgb = clamp(abs(mod(h * 6.0 + vec3(0.0, 4.0, 2.0), 6.0) - 3.0) - 1.0, 0.0, 1.0);
return l + s * (rgb - 0.5) * (1.0 - abs(2.0 * l - 1.0));
}
vec3 blackbodyPalette(float t) {
t *= TEMP_MAX;
float cx = (0.860117757 + 1.54118254e-4*t + 1.28641212e-7*t*t)
/ (1.0 + 8.42420235e-4*t + 7.08145163e-7*t*t);
float cy = (0.317398726 + 4.22806245e-5*t + 4.20481691e-8*t*t)
/ (1.0 - 2.89741816e-5*t + 1.61456053e-7*t*t);
float d = 2.0*cx - 8.0*cy + 4.0;
vec3 XYZ = vec3(3.0*cx/d, 2.0*cy/d, 1.0 - (3.0*cx + 2.0*cy)/d);
vec3 RGB = mat3(3.240479, -0.969256, 0.055648,
-1.537150, 1.875992, -0.204043,
-0.498535, 0.041556, 1.057311) * vec3(XYZ.x/XYZ.y, 1.0, XYZ.z/XYZ.y);
return max(RGB, 0.0) * pow(t * 0.0004, 4.0);
}
vec3 sRGB(vec3 c) { return pow(clamp(c, 0.0, 1.0), vec3(1.0/2.2)); }
vec3 tonemap(vec3 c) { return c / (1.0 + c); }
// ============ Main ============
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
vec2 uv = (fragCoord * 2.0 - iResolution.xy) / iResolution.y;
vec2 uv0 = uv;
float band = fragCoord.y / iResolution.y;
vec3 col = vec3(0.0);
if (band < 0.2) {
// Cosine Palette
float t = fragCoord.x / iResolution.x + iTime * 0.1;
col = palette(t);
} else if (band < 0.4) {
// HSV color wheel
float h = fragCoord.x / iResolution.x;
float v = (band - 0.2) / 0.2;
col = hsv2rgb(vec3(h + iTime * 0.05, 1.0, v));
} else if (band < 0.6) {
// HSL color wheel
float h = fragCoord.x / iResolution.x;
float l = (band - 0.4) / 0.2;
col = hsl2rgb(h + iTime * 0.05, 1.0, l);
} else if (band < 0.8) {
// Blackbody radiation
float t = fragCoord.x / iResolution.x;
col = tonemap(blackbodyPalette(t));
} else {
// Cosine Palette fractal glow
vec2 p = uv;
vec3 finalColor = vec3(0.0);
for (int i = 0; i < NUM_ITER; i++) {
p = fract(p * 1.5) - 0.5;
float d = length(p) * exp(-length(uv0));
vec3 c = palette(length(uv0) + float(i) * 0.4 + iTime * 0.4);
d = sin(d * 8.0 + iTime) / 8.0;
d = abs(d);
d = pow(0.01 / d, 1.2);
finalColor += c * d;
}
col = tonemap(finalColor);
}
// Band separator lines
float bandLine = smoothstep(0.003, 0.0, abs(fract(band * 5.0) - 0.5) - 0.49);
col *= 1.0 - bandLine * 0.8;
col = sRGB(col);
fragColor = vec4(col, 1.0);
}
```
## Common Variants
### Multi-Harmonic Cosine Palette (Anti-Aliased)
```glsl
vec3 fcos(vec3 x) {
vec3 w = fwidth(x);
return cos(x) * smoothstep(TAU, 0.0, w);
}
vec3 getColor(float t) {
vec3 col = vec3(0.4);
col += 0.12 * fcos(TAU * t * 1.0 + vec3(0.0, 0.8, 1.1));
col += 0.11 * fcos(TAU * t * 3.1 + vec3(0.3, 0.4, 0.1));
col += 0.10 * fcos(TAU * t * 5.1 + vec3(0.1, 0.7, 1.1));
col += 0.09 * fcos(TAU * t * 9.1 + vec3(0.2, 0.8, 1.4));
col += 0.08 * fcos(TAU * t * 17.1 + vec3(0.2, 0.6, 0.7));
col += 0.07 * fcos(TAU * t * 31.1 + vec3(0.1, 0.6, 0.7));
col += 0.06 * fcos(TAU * t * 65.1 + vec3(0.0, 0.5, 0.8));
col += 0.06 * fcos(TAU * t * 115.1 + vec3(0.1, 0.4, 0.7));
col += 0.09 * fcos(TAU * t * 265.1 + vec3(1.1, 1.4, 2.7));
return col;
}
```
### Hash-Driven Per-Tile Color
```glsl
float hash12(vec2 p) {
vec3 p3 = fract(vec3(p.xyx) * 0.1031);
p3 += dot(p3, p3.yzx + 33.33);
return fract((p3.x + p3.y) * p3.z);
}
vec2 tileId = floor(uv);
vec3 tileColor = palette(hash12(tileId));
```
### Saturation-Preserving Improved RGB Interpolation
```glsl
float getsat(vec3 c) {
float mi = min(min(c.x, c.y), c.z);
float ma = max(max(c.x, c.y), c.z);
return (ma - mi) / (ma + 1e-7);
}
vec3 iLerp(vec3 a, vec3 b, float x) {
vec3 ic = mix(a, b, x) + vec3(1e-6, 0.0, 0.0);
float sd = abs(getsat(ic) - mix(getsat(a), getsat(b), x));
vec3 dir = normalize(vec3(2.0*ic.x - ic.y - ic.z,
2.0*ic.y - ic.x - ic.z,
2.0*ic.z - ic.y - ic.x));
float lgt = dot(vec3(1.0), ic);
float ff = dot(dir, normalize(ic));
ic += 1.5 * dir * sd * ff * lgt;
return clamp(ic, 0.0, 1.0);
}
```
### Circular Hue Interpolation
```glsl
// HSV space (hue [0,1])
vec3 lerpHSV(vec3 a, vec3 b, float x) {
float hue = (mod(mod((b.x - a.x), 1.0) + 1.5, 1.0) - 0.5) * x + a.x;
return vec3(hue, mix(a.yz, b.yz, x));
}
// Lch space (hue [0, 2pi])
float lerpAngle(float a, float b, float x) {
float ang = mod(mod((a - b), TAU) + PI * 3.0, TAU) - PI;
return ang * x + b;
}
```
### Additive Color Stacking (Glow/HDR)
```glsl
vec3 finalColor = vec3(0.0);
for (int i = 0; i < 4; i++) {
vec3 col = palette(length(uv) + float(i) * 0.4 + iTime * 0.4);
float glow = pow(0.01 / abs(sdfValue), 1.2);
finalColor += col * glow;
}
finalColor = finalColor / (1.0 + finalColor); // Reinhard tonemap
```
## Performance & Composition
**Performance tips:**
- Cosine Palette: ~3-4 clock cycles (1 MAD + 1 COS + 1 MAD)
- HSV/HSL conversion: fully branchless using `mod`/`abs`/`clamp` vectorization
- Multi-harmonic band-limited filtering: `fwidth()` + `smoothstep` adds ~2 extra instructions to eliminate aliasing
- Lch pipeline ~57 instructions; if you only need "slightly better than RGB", use `iLerp` (~15 instructions) instead
- sRGB approximation `pow(c, 2.2)` has <0.4% error and optimizes better in the compiler
**Common combinations:**
- **Cosine Palette + SDF Raymarching**: normals/distance/attributes as t input
- **HSL/HSV + Data Visualization**: iteration count -> hue, saturation/brightness encode other dimensions
- **Cosine Palette + Fractals/Noise**: `length(uv)` or `fbm(p)` + `iTime` driving dynamic colors
- **Blackbody + Volume Rendering/Fire**: temperature field -> `blackbodyPalette()` -> physically plausible colors
- **Linear space workflow**: sRGB decode -> linear shading -> tonemap -> sRGB encode
- **Hash + Palette + Tiling**: `hash(tileID)` as palette input for unified color harmony
## Further Reading
For complete step-by-step tutorials, mathematical derivations, and advanced usage, see [reference](../reference/color-palette.md)

## WebGL2 Adaptation Requirements
The code templates in this document use ShaderToy GLSL style. When generating standalone HTML pages, you must adapt for WebGL2:
- Use `canvas.getContext("webgl2")`
- First line of shaders: `#version 300 es`, add `precision highp float;` in fragment shaders
- Vertex shader: `attribute` -> `in`, `varying` -> `out`
- Fragment shader: `varying` -> `in`, `gl_FragColor` -> custom output variable (must be declared before `void main()`, e.g. `out vec4 outColor;`), `texture2D()` -> `texture()`
- ShaderToy's `void mainImage(out vec4 fragColor, in vec2 fragCoord)` must be adapted to the standard `void main()` entry point
# CSG Boolean Operations
## Core Principles
CSG boolean operations are per-point value operations on two distance fields:
| Operation | Expression | Meaning |
|-----------|-----------|---------|
| Union | `min(d1, d2)` | Take nearest surface, keeping both shapes |
| Intersection | `max(d1, d2)` | Take farthest surface, keeping only the overlap |
| Subtraction | `max(d1, -d2)` | Cut d1 using the interior of d2 |
**Smooth booleans** (smooth min/max) introduce a blending band in the transition region. The parameter `k` controls the blend band width (larger = rounder, `k=0` degenerates to hard boolean). Multiple variants exist with different mathematical properties.
## Implementation Steps
### Step 1: Hard Boolean Operations
```glsl
float opUnion(float d1, float d2) { return min(d1, d2); }
float opIntersection(float d1, float d2) { return max(d1, d2); }
float opSubtraction(float d1, float d2) { return max(d1, -d2); }
```
### Step 2: Smooth Union (Polynomial Version)
```glsl
// k: blend radius, typical values 0.05~0.5
float opSmoothUnion(float d1, float d2, float k) {
float h = clamp(0.5 + 0.5 * (d2 - d1) / k, 0.0, 1.0);
return mix(d2, d1, h) - k * h * (1.0 - h);
}
```
### Step 3: Smooth Subtraction and Intersection (Polynomial Version)
```glsl
float opSmoothSubtraction(float d1, float d2, float k) {
float h = clamp(0.5 - 0.5 * (d2 + d1) / k, 0.0, 1.0);
return mix(d2, -d1, h) + k * h * (1.0 - h);
}
float opSmoothIntersection(float d1, float d2, float k) {
float h = clamp(0.5 - 0.5 * (d2 - d1) / k, 0.0, 1.0);
return mix(d2, d1, h) + k * h * (1.0 - h);
}
```
### Step 4: Quadratic Optimized Version (Recommended as Default)
```glsl
float smin(float a, float b, float k) {
float h = max(k - abs(a - b), 0.0);
return min(a, b) - h * h * 0.25 / k;
}
float smax(float a, float b, float k) {
float h = max(k - abs(a - b), 0.0);
return max(a, b) + h * h * 0.25 / k;
}
// Subtraction via smax
float sSub(float d1, float d2, float k) {
return smax(d1, -d2, k);
}
```
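The quadratic `smin` is an algebraic refactoring of the Step 2 polynomial form; for the union case they return the same values, inside and outside the blend band. A JavaScript cross-check on a few sample inputs:

```javascript
// Step 2 polynomial form: mix(d2, d1, h) - k*h*(1-h)
const opSmoothUnion = (d1, d2, k) => {
  const h = Math.min(1, Math.max(0, 0.5 + 0.5 * (d2 - d1) / k));
  return d2 + (d1 - d2) * h - k * h * (1 - h);
};
// Step 4 quadratic form
const smin = (a, b, k) => {
  const h = Math.max(k - Math.abs(a - b), 0);
  return Math.min(a, b) - h * h * 0.25 / k;
};
// Pairs inside and outside the blend band (k = 0.2)
const pairs = [[0.30, 0.32], [0.10, 0.50], [-0.05, 0.08]];
const maxDiff = Math.max(...pairs.map(([a, b]) =>
  Math.abs(opSmoothUnion(a, b, 0.2) - smin(a, b, 0.2))));
```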
### Step 4b: Smooth Minimum Variant Library
Different smin implementations have different mathematical properties. Choose based on your needs:
| Variant | Rigid | Associative | Best For |
|---------|-------|-------------|----------|
| Quadratic (default above) | Yes | No | General use, fastest |
| Cubic | Yes | No | Smoother C2 transitions |
| Quartic | Yes | No | Highest quality blending |
| Exponential | No | Yes | Multi-body blending (order-independent) |
| Circular Geometric | Yes | Yes | Strict local blending |
**Rigid**: preserves original SDF shape outside the blend region (no under-estimation).
**Associative**: `smin(a, smin(b, c))` == `smin(smin(a, b), c)` — important when blending many objects where evaluation order varies.
```glsl
// --- Cubic Polynomial smin (C2 continuous, smoother transitions) ---
float sminCubic(float a, float b, float k) {
k *= 6.0;
float h = max(k - abs(a - b), 0.0) / k;
return min(a, b) - h * h * h * k * (1.0 / 6.0);
}
// --- Quartic Polynomial smin (C3 continuous, highest quality) ---
float sminQuartic(float a, float b, float k) {
k *= 16.0 / 3.0;
float h = max(k - abs(a - b), 0.0) / k;
return min(a, b) - h * h * h * (4.0 - h) * k * (1.0 / 16.0);
}
// --- Exponential smin (associative — order independent for multi-body blending) ---
float sminExp(float a, float b, float k) {
float r = exp2(-a / k) + exp2(-b / k);
return -k * log2(r);
}
// --- Circular Geometric smin (rigid + local + associative) ---
float sminCircle(float a, float b, float k) {
k *= 1.0 / (1.0 - sqrt(0.5));
return max(k, min(a, b)) - length(max(k - vec2(a, b), 0.0));
}
// --- Gradient-aware smin (carries material/color through blending) ---
// x = distance, yzw = material properties or color components
vec4 sminColor(vec4 a, vec4 b, float k) {
k *= 4.0;
float h = max(k - abs(a.x - b.x), 0.0) / (2.0 * k);
return vec4(
min(a.x, b.x) - h * h * k,
mix(a.yzw, b.yzw, (a.x < b.x) ? h : 1.0 - h)
);
}
// --- Smooth maximum from any smin variant ---
// smax(a, b, k) = -smin(-a, -b, k)
// Smooth subtraction: smax(d1, -d2, k)
// Smooth intersection: smax(d1, d2, k)
```
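The associativity column can be demonstrated numerically: the exponential variant gives the same result regardless of grouping, while the quadratic variant does not. A JavaScript sketch with three nearby distances:

```javascript
const sminQuad = (a, b, k) => {
  const h = Math.max(k - Math.abs(a - b), 0);
  return Math.min(a, b) - h * h * 0.25 / k;
};
const sminExp = (a, b, k) => -k * Math.log2(2 ** (-a / k) + 2 ** (-b / k));

const [a, b, c, k] = [0.30, 0.32, 0.35, 0.2];
// Exponential: grouping does not matter
const e1 = sminExp(sminExp(a, b, k), c, k);
const e2 = sminExp(a, sminExp(b, c, k), k);
// Quadratic: grouping changes the result
const q1 = sminQuad(sminQuad(a, b, k), c, k);
const q2 = sminQuad(a, sminQuad(b, c, k), k);
```

This is why the exponential form is the safer choice when many bodies are blended in whatever order the scene graph happens to evaluate them.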
### Step 5: Basic SDF Primitives
```glsl
float sdSphere(vec3 p, float r) {
return length(p) - r;
}
float sdBox(vec3 p, vec3 b) {
vec3 d = abs(p) - b;
return length(max(d, 0.0)) + min(max(d.x, max(d.y, d.z)), 0.0);
}
float sdCylinder(vec3 p, float h, float r) {
vec2 d = abs(vec2(length(p.xz), p.y)) - vec2(r, h);
return min(max(d.x, d.y), 0.0) + length(max(d, 0.0));
}
```
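The primitives can be spot-checked at points with known geometry. JavaScript ports of the two simplest formulas above:

```javascript
const sdSphere = (p, r) => Math.hypot(...p) - r;
function sdBox(p, b) {
  const d = p.map((v, i) => Math.abs(v) - b[i]);               // per-axis distance
  const outside = Math.hypot(...d.map(v => Math.max(v, 0)));   // exterior part
  const inside = Math.min(Math.max(d[0], Math.max(d[1], d[2])), 0); // interior part
  return outside + inside;
}
const s1 = sdSphere([3, 0, 0], 1);      // 2: two units outside the surface
const s2 = sdSphere([0.5, 0, 0], 1);    // -0.5: inside (negative distance)
const b1 = sdBox([2, 0, 0], [1, 1, 1]); // 1: facing a box face
const b2 = sdBox([2, 2, 0], [1, 1, 1]); // sqrt(2): nearest feature is an edge
```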
### Step 6: CSG Composition for Scene Building
```glsl
float mapScene(vec3 p) {
float cube = sdBox(p, vec3(1.0));
float sphere = sdSphere(p, 1.2);
float cylX = sdCylinder(p.yzx, 2.0, 0.4);
float cylY = sdCylinder(p.xyz, 2.0, 0.4);
float cylZ = sdCylinder(p.zxy, 2.0, 0.4);
// (cube intersect sphere) - three cylinders = nut
float shape = opIntersection(cube, sphere);
float holes = opUnion(cylX, opUnion(cylY, cylZ));
return opSubtraction(shape, holes);
}
```
### Step 7: Smooth CSG Modeling for Organic Forms
```glsl
// Use different k values for different body parts: large k for major joints, small k for fine details
float mapCreature(vec3 p) {
float body = sdSphere(p, 0.5);
float head = sdSphere(p - vec3(0.0, 0.6, 0.3), 0.25);
float d = smin(body, head, 0.15); // large blend
float leg = sdCylinder(p - vec3(0.2, -0.5, 0.0), 0.3, 0.08);
d = smin(d, leg, 0.08); // medium blend
float eye = sdSphere(p - vec3(0.05, 0.75, 0.4), 0.05);
d = smax(d, -eye, 0.02); // small blend for subtraction
return d;
}
```
### Step 8: Ray Marching Main Loop
```glsl
float rayMarch(vec3 ro, vec3 rd, float maxDist) {
float t = 0.0;
for (int i = 0; i < MAX_STEPS; i++) {
vec3 p = ro + rd * t;
float d = mapScene(p);
if (d < SURF_DIST) return t;
t += d;
if (t > maxDist) break;
}
return -1.0;
}
```
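The sphere-tracing loop is easy to validate on the CPU against a unit sphere (JavaScript port with MAX_STEPS and SURF_DIST inlined): a ray fired from z = -3 straight at the origin must hit at t = 2, and a ray that misses must return -1.

```javascript
const map = p => Math.hypot(...p) - 1; // unit sphere at the origin
function rayMarch(ro, rd, maxDist) {
  let t = 0;
  for (let i = 0; i < 128; i++) {
    const p = ro.map((v, j) => v + rd[j] * t);
    const d = map(p);
    if (d < 0.001) return t; // close enough to the surface
    t += d;                  // safe step: d is the distance to the nearest surface
    if (t > maxDist) break;
  }
  return -1; // no hit
}
const hit = rayMarch([0, 0, -3], [0, 0, 1], 50);
const miss = rayMarch([0, 0, -3], [0, 1, 0], 50);
```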
### Step 9: Normal Calculation (Tetrahedral Sampling: 4 Map Evaluations vs 6 for Central Differences)
```glsl
vec3 calcNormal(vec3 pos) {
vec2 e = vec2(0.001, -0.001);
return normalize(
e.xyy * mapScene(pos + e.xyy) +
e.yyx * mapScene(pos + e.yyx) +
e.yxy * mapScene(pos + e.yxy) +
e.xxx * mapScene(pos + e.xxx)
);
}
```
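The tetrahedral offsets (e.xyy, e.yyx, e.yxy, e.xxx) can be verified against an analytic case: on a sphere SDF the estimated normal must equal the radial direction. A JavaScript port:

```javascript
const map = p => Math.hypot(...p) - 1; // unit sphere SDF
function calcNormal(pos) {
  const e = 0.001;
  // The four swizzles of (0.001, -0.001): e.xyy, e.yyx, e.yxy, e.xxx
  const offs = [[e, -e, -e], [-e, -e, e], [-e, e, -e], [e, e, e]];
  let n = [0, 0, 0];
  for (const o of offs) {
    const d = map(pos.map((v, i) => v + o[i]));
    n = n.map((v, i) => v + o[i] * d); // sum of offset * sampled distance
  }
  const len = Math.hypot(...n);
  return n.map(v => v / len);
}
const n = calcNormal([0, 0, -1]); // expect the radial direction (0, 0, -1)
```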
## Full Code Template
```glsl
// === CSG Boolean Operations - WebGL2 Full Template ===
// Note: When generating HTML with this template, pass iTime, iResolution, etc. via uniforms
#define MAX_STEPS 128
#define MAX_DIST 50.0
#define SURF_DIST 0.001
#define SMOOTH_K 0.1
// === Hard Boolean Operations ===
float opUnion(float d1, float d2) { return min(d1, d2); }
float opIntersection(float d1, float d2) { return max(d1, d2); }
float opSubtraction(float d1, float d2) { return max(d1, -d2); }
// === Smooth Boolean Operations (Quadratic Optimized) ===
float smin(float a, float b, float k) {
float h = max(k - abs(a - b), 0.0);
return min(a, b) - h * h * 0.25 / k;
}
float smax(float a, float b, float k) {
float h = max(k - abs(a - b), 0.0);
return max(a, b) + h * h * 0.25 / k;
}
// === SDF Primitives ===
float sdSphere(vec3 p, float r) {
return length(p) - r;
}
float sdBox(vec3 p, vec3 b) {
vec3 d = abs(p) - b;
return length(max(d, 0.0)) + min(max(d.x, max(d.y, d.z)), 0.0);
}
float sdCylinder(vec3 p, float h, float r) {
vec2 d = abs(vec2(length(p.xz), p.y)) - vec2(r, h);
return min(max(d.x, d.y), 0.0) + length(max(d, 0.0));
}
float sdEllipsoid(vec3 p, vec3 r) {
float k0 = length(p / r);
float k1 = length(p / (r * r));
return k0 * (k0 - 1.0) / k1;
}
float sdCapsule(vec3 p, vec3 a, vec3 b, float r) {
vec3 pa = p - a, ba = b - a;
float h = clamp(dot(pa, ba) / dot(ba, ba), 0.0, 1.0);
return length(pa - ba * h) - r;
}
// === Scene Definition ===
float mapScene(vec3 p) {
// Rotation animation
float angle = iTime * 0.3;
float c = cos(angle), s = sin(angle);
p.xz = mat2(c, -s, s, c) * p.xz;
// Primitives
float cube = sdBox(p, vec3(1.0));
float sphere = sdSphere(p, 1.25);
float cylR = 0.45;
float cylX = sdCylinder(p.yzx, 2.0, cylR);
float cylY = sdCylinder(p.xyz, 2.0, cylR);
float cylZ = sdCylinder(p.zxy, 2.0, cylR);
// Hard boolean combination: nut = (cube intersect sphere) - three cylinders
float nut = opSubtraction(
opIntersection(cube, sphere),
opUnion(cylX, opUnion(cylY, cylZ))
);
// Organic spheres -- smooth union blending
float blob1 = sdSphere(p - vec3(1.8, 0.0, 0.0), 0.4);
float blob2 = sdSphere(p - vec3(-1.8, 0.0, 0.0), 0.4);
float blob3 = sdSphere(p - vec3(0.0, 1.8, 0.0), 0.4);
float blobs = smin(blob1, smin(blob2, blob3, 0.3), 0.3);
return smin(nut, blobs, 0.15);
}
// === Normal Calculation (Tetrahedral Sampling) ===
vec3 calcNormal(vec3 pos) {
vec2 e = vec2(0.001, -0.001);
return normalize(
e.xyy * mapScene(pos + e.xyy) +
e.yyx * mapScene(pos + e.yyx) +
e.yxy * mapScene(pos + e.yxy) +
e.xxx * mapScene(pos + e.xxx)
);
}
// === Ray Marching ===
float rayMarch(vec3 ro, vec3 rd) {
float t = 0.0;
for (int i = 0; i < MAX_STEPS; i++) {
vec3 p = ro + rd * t;
float d = mapScene(p);
if (d < SURF_DIST) return t;
t += d;
if (t > MAX_DIST) break;
}
return -1.0;
}
// === Soft Shadows ===
float calcSoftShadow(vec3 ro, vec3 rd, float k) {
float res = 1.0;
float t = 0.02;
for (int i = 0; i < 64; i++) {
float h = mapScene(ro + rd * t);
res = min(res, k * h / t);
t += clamp(h, 0.01, 0.2);
if (res < 0.001 || t > 20.0) break;
}
return clamp(res, 0.0, 1.0);
}
// === AO (Ambient Occlusion) ===
float calcAO(vec3 pos, vec3 nor) {
float occ = 0.0;
float sca = 1.0;
for (int i = 0; i < 5; i++) {
float h = 0.01 + 0.12 * float(i);
float d = mapScene(pos + h * nor);
occ += (h - d) * sca;
sca *= 0.95;
}
return clamp(1.0 - 3.0 * occ, 0.0, 1.0);
}
// === Main Function (WebGL2 Adapted) ===
out vec4 outColor;
void main() {
vec2 uv = (gl_FragCoord.xy - 0.5 * iResolution.xy) / iResolution.y;
// Camera
float camDist = 4.0;
float camAngle = 0.3;
vec3 ro = vec3(
camDist * cos(iTime * 0.2),
camDist * sin(camAngle),
camDist * sin(iTime * 0.2)
);
vec3 ta = vec3(0.0, 0.0, 0.0);
// Camera matrix
vec3 ww = normalize(ta - ro);
vec3 uu = normalize(cross(ww, vec3(0.0, 1.0, 0.0)));
vec3 vv = cross(uu, ww);
vec3 rd = normalize(uv.x * uu + uv.y * vv + 2.0 * ww);
// Background color
vec3 col = vec3(0.4, 0.5, 0.6) - 0.3 * rd.y;
// Ray marching
float t = rayMarch(ro, rd);
if (t > 0.0) {
vec3 pos = ro + rd * t;
vec3 nor = calcNormal(pos);
vec3 lightDir = normalize(vec3(0.8, 0.6, -0.3));
float dif = clamp(dot(nor, lightDir), 0.0, 1.0);
float sha = calcSoftShadow(pos + nor * 0.01, lightDir, 16.0);
float ao = calcAO(pos, nor);
float amb = 0.5 + 0.5 * nor.y;
vec3 mate = vec3(0.2, 0.3, 0.4);
col = vec3(0.0);
col += mate * 2.0 * dif * sha;
col += mate * 0.3 * amb * ao;
}
col = pow(col, vec3(0.4545));
outColor = vec4(col, 1.0);
}
```
## Common Variants
### Variant 1: Exponential Smooth Union
```glsl
float sminExp(float a, float b, float k) {
float res = exp(-k * a) + exp(-k * b);
return -log(res) / k;
}
```
### Variant 2: Smooth Operations with Color Blending
```glsl
// Returns blend factor for the caller to blend colors
float sminWithFactor(float a, float b, float k, out float blend) {
float h = clamp(0.5 + 0.5 * (b - a) / k, 0.0, 1.0);
blend = h;
return mix(b, a, h) - k * h * (1.0 - h);
}
// float blend;
// float d = sminWithFactor(d1, d2, 0.1, blend);
// vec3 color = mix(color2, color1, blend);
// vec3 overload of smax
vec3 smax(vec3 a, vec3 b, float k) {
vec3 h = max(k - abs(a - b), 0.0);
return max(a, b) + h * h * 0.25 / k;
}
```
### Variant 3: Stepwise CSG Modeling (Architectural/Industrial)
```glsl
float sdBuilding(vec3 p) {
float walls = sdBox(p, vec3(1.0, 0.8, 1.0));
vec3 roofP = p;
roofP.y -= 0.8;
float roof = sdBox(roofP, vec3(1.2, 0.3, 1.2));
float d = opUnion(walls, roof);
// Cut windows (exploiting symmetry)
vec3 winP = abs(p);
winP -= vec3(1.01, 0.3, 0.4);
float window = sdBox(winP, vec3(0.1, 0.15, 0.12));
d = opSubtraction(d, window);
// Hollow out interior
float hollow = sdBox(p, vec3(0.95, 0.75, 0.95));
d = opSubtraction(d, hollow);
return d;
}
```
### Variant 4: Large-Scale Organic Character Modeling
```glsl
float mapCharacter(vec3 p) {
float body = sdEllipsoid(p, vec3(0.5, 0.4, 0.6));
float head = sdEllipsoid(p - vec3(0.0, 0.5, 0.5), vec3(0.25));
float d = smin(body, head, 0.2); // large k: wide blend
float ear = sdEllipsoid(p - vec3(0.3, 0.6, 0.3), vec3(0.15, 0.2, 0.05));
d = smin(d, ear, 0.08); // medium blend
float nostril = sdSphere(p - vec3(0.0, 0.4, 0.7), 0.03);
d = smax(d, -nostril, 0.02); // small k: fine sculpting
return d;
}
```
## Performance & Composition Tips
**Performance:**
- Bounding volume acceleration: use AABB/bounding spheres to skip distant sub-scenes, reducing `mapScene()` calls
- Tetrahedral sampling normals (4 samples) outperform central differences (6 samples)
- Step scaling `t += d * 0.9` can reduce overshoot penetration
- Prefer quadratic optimized smin/smax (fastest); use exponential version when extreme smoothness is needed
- `k` must not be zero (division by zero error); fall back to hard boolean when near zero
- For symmetric shapes, use `abs()` to fold coordinates and define only one side
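The folding and step-scaling bullets above can be sketched as follows (a minimal sketch assuming the `sdBox`/`sdSphere`/`opSubtraction` helpers from the template):

```glsl
// abs() folding: define one hole, get four via symmetry
float mapSymmetric(vec3 p) {
    p.xz = abs(p.xz);                                     // fold xz into one quadrant
    float hole = sdSphere(p - vec3(0.6, 0.0, 0.6), 0.2);  // appears 4x after folding
    return opSubtraction(sdBox(p, vec3(1.0)), hole);
}
// In the march loop, a conservative step reduces overshoot through thin features:
// t += d * 0.9;
```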
**Composition techniques:**
- **+ Domain Repetition**: `mod()`/`fract()` for infinite repetition of CSG shapes (mechanical arrays, railings)
- **+ Procedural Displacement**: overlay noise displacement on SDF for surface detail
- **+ Procedural Texturing**: use smin blend factor to simultaneously blend material ID / color
- **+ 2D SDF**: equally applicable to 2D scenes (clouds, UI shape compositing)
- **+ Animation**: bind k values, positions, and radii to `iTime` for dynamic deformation
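As a sketch of the domain-repetition combination, the template's `mapScene()` can be tiled on the xz plane with `mod()` (the 6-unit period is illustrative; it must exceed twice the shape's radius):

```glsl
// Infinite xz grid of the CSG shape from the template
float mapGrid(vec3 p) {
    float period = 6.0;
    p.xz = mod(p.xz + 0.5 * period, period) - 0.5 * period; // cell centered at origin
    return mapScene(p); // shape radius (~2.2) stays below period/2, so cells don't clip
}
```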
## Further Reading
Full step-by-step tutorials, mathematical derivations, and advanced usage in [reference](../reference/csg-boolean-operations.md)

# Domain Repetition & Space Folding
## Use Cases
- **Infinite repeating scenes**: render infinitely extending geometry from a single SDF primitive (corridors, cities, star fields)
- **Kaleidoscope/symmetry effects**: N-fold rotational symmetry, mirror symmetry, polyhedral symmetry
- **Fractal geometry**: generate self-similar structures through iterative space folding (Apollonian, Kali-set)
- **Architectural/mechanical structures**: build complex yet regular scenes using repetition + variation
- **Spiral/toroidal topology**: repeat geometry along polar or spiral paths
Core value: **define geometry in a single cell, render infinite space**.
## Core Principles
The essence of domain repetition is **coordinate transformation**: before computing the SDF, fold/map point `p` into a finite "fundamental domain".
**Three fundamental operations:**
| Operation | Formula | Effect |
|-----------|---------|--------|
| **mod repetition** | `p = mod(p + c/2, c) - c/2` | Infinite translational repetition along an axis |
| **abs mirroring** | `p = abs(p)` | Mirror symmetry across an axis plane |
| **Rotational folding** | `angle = mod(atan(p.y,p.x), TAU/N)` | N-fold rotational symmetry |
Key math: `mod(x,c)` -> periodic mapping to `[0,c)`; `abs(x)` -> reflection symmetry; `fract(x)` = `mod(x,1.0)` -> normalized period.
## Implementation Steps
### Step 1: Cartesian Domain Repetition (mod repetition)
```glsl
// Infinite translational repetition along one or more axes
vec3 domainRepeat(vec3 p, vec3 period) {
return mod(p + period * 0.5, period) - period * 0.5;
}
float map(vec3 p) {
vec3 q = domainRepeat(p, vec3(4.0)); // repeat every 4 units
return sdBox(q, vec3(0.5));
}
```
### Step 2: Symmetric Folding (abs-mod triangle wave)
```glsl
// Boundary-continuous symmetric folding, coordinates oscillate 0->tile->0
vec3 symmetricFold(vec3 p, float tile) {
return abs(vec3(tile) - mod(p, vec3(tile * 2.0)));
}
// Star Nest classic usage
p = abs(vec3(tile) - mod(p, vec3(tile * 2.0)));
```
### Step 3: Angular Domain Repetition (Polar Coordinate Folding)
```glsl
// N-way rotational symmetry (kaleidoscope)
vec2 pmod(vec2 p, float count) {
float angle = atan(p.x, p.y) + PI / count;
float sector = TAU / count;
angle = floor(angle / sector) * sector;
return p * rot(-angle);
}
p1.xy = pmod(p1.xy, 5.0); // 5-fold symmetry
```
### Step 4: fract Domain Folding (Fractal Iteration)
```glsl
// Apollonian fractal core loop
float map(vec3 p, float s) {
float scale = 1.0;
vec4 orb = vec4(1000.0);
for (int i = 0; i < 8; i++) {
p = -1.0 + 2.0 * fract(0.5 * p + 0.5); // centered fract folding
float r2 = dot(p, p);
orb = min(orb, vec4(abs(p), r2));
float k = s / r2; // spherical inversion scaling
p *= k;
scale *= k;
}
return 0.25 * abs(p.y) / scale;
}
```
### Step 5: Iterative abs Folding (IFS / Kali-set)
```glsl
// IFS abs folding fractal
float ifsBox(vec3 p) {
for (int i = 0; i < 5; i++) {
p = abs(p) - 1.0;
p.xy *= rot(iTime * 0.3);
p.xz *= rot(iTime * 0.1);
}
return sdBox(p, vec3(0.4, 0.8, 0.3));
}
// Kali-set variant: mod repetition + IFS + dot(p,p) scaling
vec2 de(vec3 pos) {
vec3 tpos = pos;
tpos.xz = abs(0.5 - mod(tpos.xz, 1.0));
vec4 p = vec4(tpos, 1.0); // w tracks scaling
for (int i = 0; i < 7; i++) {
p.xyz = abs(p.xyz) - vec3(-0.02, 1.98, -0.02);
p = p * (2.0) / clamp(dot(p.xyz, p.xyz), 0.4, 1.0)
- vec4(0.5, 1.0, 0.4, 0.0);
p.xz *= rot(0.416);
}
return vec2(length(max(abs(p.xyz)-vec3(0.1,5.0,0.1), 0.0)) / p.w, 0.0);
}
```
### Step 6: Reflection Folding (Polyhedral Symmetry)
```glsl
// Plane reflection
float pReflect(inout vec3 p, vec3 planeNormal, float offset) {
float t = dot(p, planeNormal) + offset;
if (t < 0.0) p = p - (2.0 * t) * planeNormal;
return sign(t);
}
// Icosahedral folding
void pModIcosahedron(inout vec3 p) {
vec3 nc = vec3(-0.5, -cos(PI/5.0), sqrt(0.75 - cos(PI/5.0)*cos(PI/5.0)));
p = abs(p);
pReflect(p, nc, 0.0);
p.xy = abs(p.xy);
pReflect(p, nc, 0.0);
p.xy = abs(p.xy);
pReflect(p, nc, 0.0);
}
```
### Step 7: Toroidal/Cylindrical Domain Warping
```glsl
// Bend the xz plane into a toroidal topology
vec2 displaceLoop(vec2 p, float radius) {
return vec2(length(p) - radius, atan(p.y, p.x));
}
pDonut.xz = displaceLoop(pDonut.xz, donutRadius);
pDonut.z *= donutRadius; // unfold angle to linear length
```
### Step 8: 1D Centered Domain Repetition (with Cell ID)
```glsl
// Returns cell index, usable for random variations
float pMod1(inout float p, float size) {
float halfsize = size * 0.5;
float c = floor((p + halfsize) / size);
p = mod(p + halfsize, size) - halfsize;
return c;
}
float cellID = pMod1(p.x, 2.0);
float salt = fract(sin(cellID * 127.1) * 43758.5453);
```
## Full Code Template
Combined demo: Cartesian repetition + angular repetition + IFS folding. Runs directly in ShaderToy.
```glsl
#define PI 3.14159265359
#define TAU 6.28318530718
#define MAX_STEPS 100
#define MAX_DIST 50.0
#define SURF_DIST 0.001
#define PERIOD 4.0
#define ANGULAR_COUNT 6.0
#define IFS_ITERS 5
#define IFS_OFFSET 1.2
mat2 rot(float a) {
float c = cos(a), s = sin(a);
return mat2(c, s, -s, c);
}
float sdBox(vec3 p, vec3 b) {
vec3 d = abs(p) - b;
return length(max(d, 0.0)) + min(max(d.x, max(d.y, d.z)), 0.0);
}
vec3 domainRepeat(vec3 p, vec3 period) {
return mod(p + period * 0.5, period) - period * 0.5;
}
vec2 pmod(vec2 p, float count) {
float a = atan(p.x, p.y) + PI / count;
float n = TAU / count;
a = floor(a / n) * n;
return p * rot(-a);
}
float map(vec3 p) {
vec3 q = domainRepeat(p, vec3(PERIOD));
q.xz = pmod(q.xz, ANGULAR_COUNT);
for (int i = 0; i < IFS_ITERS; i++) {
q = abs(q) - IFS_OFFSET;
q.xy *= rot(0.785);
q.yz *= rot(0.471);
}
return sdBox(q, vec3(0.15, 0.4, 0.15));
}
vec3 calcNormal(vec3 p) {
vec2 e = vec2(0.001, 0.0);
return normalize(vec3(
map(p + e.xyy) - map(p - e.xyy),
map(p + e.yxy) - map(p - e.yxy),
map(p + e.yyx) - map(p - e.yyx)
));
}
float raymarch(vec3 ro, vec3 rd) {
float t = 0.0;
for (int i = 0; i < MAX_STEPS; i++) {
float d = map(ro + rd * t);
if (d < SURF_DIST || t > MAX_DIST) break;
t += d;
}
return t;
}
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
vec2 uv = (fragCoord * 2.0 - iResolution.xy) / iResolution.y;
float time = iTime * 0.5;
vec3 ro = vec3(sin(time) * 6.0, 3.0 + sin(time * 0.7) * 2.0, cos(time) * 6.0);
vec3 ta = vec3(0.0);
vec3 ww = normalize(ta - ro);
vec3 uu = normalize(cross(ww, vec3(0.0, 1.0, 0.0)));
vec3 vv = cross(uu, ww);
vec3 rd = normalize(uv.x * uu + uv.y * vv + 1.8 * ww);
float t = raymarch(ro, rd);
vec3 col = vec3(0.0);
if (t < MAX_DIST) {
vec3 p = ro + rd * t;
vec3 n = calcNormal(p);
vec3 lightDir = normalize(vec3(0.5, 0.8, -0.6));
float diff = clamp(dot(n, lightDir), 0.0, 1.0);
float amb = 0.5 + 0.5 * n.y;
vec3 baseColor = 0.5 + 0.5 * cos(p * 0.5 + vec3(0.0, 2.0, 4.0));
col = baseColor * (0.2 * amb + 0.8 * diff);
col *= exp(-0.03 * t * t);
}
col = pow(col, vec3(0.4545));
fragColor = vec4(col, 1.0);
}
```
## Common Variants
### 1. Volumetric Light/Glow Rendering
```glsl
float acc = 0.0, t = 0.0;
for (int i = 0; i < 99; i++) {
float dist = map(ro + rd * t);
dist = max(abs(dist), 0.02);
acc += exp(-dist * 3.0); // decay factor controls glow sharpness
t += dist * 0.5; // step scale <1 for denser sampling
}
vec3 col = vec3(acc * 0.01, acc * 0.011, acc * 0.012);
```
### 2. Single-Axis/Dual-Axis Selective Repetition
```glsl
q.xz = mod(q.xz + 2.0, 4.0) - 2.0; // repeat only xz, y stays unchanged
```
### 3. Fractal fract Domain Folding (Apollonian Type)
```glsl
float scale = 1.0;
for (int i = 0; i < 8; i++) {
p = -1.0 + 2.0 * fract(0.5 * p + 0.5);
float k = 1.2 / dot(p, p);
p *= k;
scale *= k;
}
return 0.25 * abs(p.y) / scale;
```
### 4. Multi-Layer Nested Repetition
```glsl
// amod/repeat/rng are assumed helpers: angular repetition returning a sector index,
// 1D centered repetition (cf. pMod1), and a 2D hash, respectively
float indexX = amod(p.xz, segments); // outer layer: angular repetition
p.x -= radius;
p.y = repeat(p.y, cellSize); // inner layer: linear repetition
float salt = rng(vec2(indexX, floor(p.y / cellSize)));
```
### 5. Finite Domain Repetition (Clamp Limited)
```glsl
vec3 domainRepeatLimited(vec3 p, float size, vec3 limit) {
return p - size * clamp(floor(p / size + 0.5), -limit, limit);
}
// Repeat 5 times along x, 3 times along y/z
vec3 q = domainRepeatLimited(p, 2.0, vec3(2.0, 1.0, 1.0));
```
## Performance & Composition Tips
**Performance:**
- 5-8 fractal iterations are typically sufficient; use `vec4.w` to track scaling and avoid extra variables
- Ensure geometry radius < period/2 to prevent inaccurate SDF at cell boundaries
- Volumetric light step size should increase with distance: `t += dist * (0.3 + t * 0.02)`
- Use `clamp(dot(p,p), min, max)` to prevent numerical explosion
- Avoid `normalize()` inside loops; manually divide by length instead
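The adaptive-step bullet above can be applied to the glow loop from Variant 1 below; this sketch assumes `map`, `ro`, and `rd` from the full template:

```glsl
// Volumetric accumulation with a step that grows with distance
float acc = 0.0, t = 0.0;
for (int i = 0; i < 99; i++) {
    float dist = max(abs(map(ro + rd * t)), 0.02);
    acc += exp(-dist * 3.0);
    t += dist * (0.3 + t * 0.02); // dense sampling near camera, sparse far away
}
```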
**Composition:**
- **Domain Repetition + Ray Marching**: the most fundamental combination, used by all reference shaders
- **Domain Repetition + Orbit Trap Coloring**: record `min(orb, abs(p))` during fractal iteration for coloring
- **Domain Repetition + Toroidal Warping**: `displaceLoop` to bend space before applying linear/angular repetition
- **Domain Repetition + Noise Variation**: cell ID -> pseudo-random number -> modulate geometry parameters
- **Domain Repetition + Polar Spiral**: `cartToPolar` combined with `pMod1` for spiral path repetition
## Further Reading
Full step-by-step tutorials, mathematical derivations, and advanced usage in [reference](../reference/domain-repetition.md)

# Domain Warping
## Use Cases
- **Marble/jade textures**: multi-layer warping produces streaked stone textures
- **Fabric/silk appearance**: warping field creases simulate textile surfaces
- **Geological formations**: rock strata, lava flows, surface erosion
- **Gas giant atmospheres**: Jupiter-style banded circulation
- **Smoke/fire/explosions**: fluid effects combined with volumetric rendering
- **Abstract art backgrounds**: procedural organic patterns, suitable for UI backgrounds, music visualization
- **Electric current/plasma effects**: ridged FBM variant produces sharp arc patterns
Core advantage: relies only on math functions (no texture assets needed), outputs seamless tiling, animatable, GPU-friendly.
## Core Principles
Warp input coordinates with noise, then query the main function:
```
f(p) -> f(p + fbm(p))
```
Classic multi-layer recursive nesting:
```
result = fbm(p + fbm(p + fbm(p)))
```
Each FBM layer's output serves as a coordinate offset for the next layer; deeper nesting produces more organic deformation.
**Key mathematical structure**:
1. **Noise** `noise(p)`: pseudo-random values at integer lattice points + Hermite interpolation `f*f*(3.0-2.0*f)`
2. **FBM**: `fbm(p) = sum of (0.5^i) * noise(p * 2^i * R^i)`, where `R` is a rotation matrix for decorrelation
3. **Domain warping chain**: `fbm(p + fbm(p + fbm(p)))`
The rotation matrix `mat2(0.80, 0.60, -0.60, 0.80)` (approx 36.87 deg) is the most widely used decorrelation transform.
## Implementation Steps
### Step 1: Hash Function
```glsl
// Map 2D integer coordinates to a pseudo-random float
float hash(vec2 p) {
p = fract(p * 0.6180339887); // golden ratio pre-perturbation
p *= 25.0;
return fract(p.x * p.y * (p.x + p.y));
}
```
> The classic `fract(sin(dot(p, vec2(127.1, 311.7))) * 43758.5453)` also works; the sin-free version above has more stable precision on some GPUs.
### Step 2: Value Noise
```glsl
// Hash values at integer lattice points, Hermite smooth interpolation
float noise(vec2 p) {
vec2 i = floor(p);
vec2 f = fract(p);
f = f * f * (3.0 - 2.0 * f);
return mix(
mix(hash(i + vec2(0.0, 0.0)), hash(i + vec2(1.0, 0.0)), f.x),
mix(hash(i + vec2(0.0, 1.0)), hash(i + vec2(1.0, 1.0)), f.x),
f.y
);
}
```
### Step 3: FBM
```glsl
const mat2 mtx = mat2(0.80, 0.60, -0.60, 0.80); // rotation approx 36.87 deg
float fbm(vec2 p) {
float f = 0.0;
f += 0.500000 * noise(p); p = mtx * p * 2.02;
f += 0.250000 * noise(p); p = mtx * p * 2.03;
f += 0.125000 * noise(p); p = mtx * p * 2.01;
f += 0.062500 * noise(p); p = mtx * p * 2.04;
f += 0.031250 * noise(p); p = mtx * p * 2.01;
f += 0.015625 * noise(p);
return f / 0.984375; // normalize by the sum of the six amplitudes (63/64)
}
```
> Lacunarity uses 2.01~2.04 rather than exactly 2.0 to avoid visual artifacts caused by lattice regularity.
### Step 4: Domain Warping (Core)
```glsl
// Classic three-layer domain warping
float pattern(vec2 p) {
return fbm(p + fbm(p + fbm(p)));
}
```
### Step 5: Time Animation
```glsl
// Inject time into the first and last octaves: low frequency drives overall flow, high frequency adds detail variation
float fbm(vec2 p) {
float f = 0.0;
f += 0.500000 * noise(p + iTime); // lowest frequency: slow overall flow
p = mtx * p * 2.02;
f += 0.250000 * noise(p); p = mtx * p * 2.03;
f += 0.125000 * noise(p); p = mtx * p * 2.01;
f += 0.062500 * noise(p); p = mtx * p * 2.04;
f += 0.031250 * noise(p); p = mtx * p * 2.01;
f += 0.015625 * noise(p + sin(iTime)); // highest frequency: subtle detail motion
return f / 0.984375; // normalize by the sum of the six amplitudes (63/64)
}
```
### Step 6: Coloring
```glsl
// Map scalar field (0~1) to color using a mix chain
// IMPORTANT: GLSL is strictly typed; initialize declarations fully, e.g. vec3 col = vec3(0.2, 0.1, 0.4)
// IMPORTANT: Write decimals as 0.x, not .x, to avoid parse errors on strict compilers
vec3 palette(float t) {
vec3 col = vec3(0.2, 0.1, 0.4); // deep purple base
col = mix(col, vec3(0.3, 0.05, 0.05), t); // dark red
col = mix(col, vec3(0.9, 0.9, 0.9), t * t); // high values toward white
col = mix(col, vec3(0.0, 0.2, 0.4), smoothstep(0.6, 0.8, t)); // blue highlight
return col * t * 2.0;
}
```
## Full Code Template
```glsl
// Domain Warping — Full Runnable Template (ShaderToy)
#define WARP_DEPTH 3 // Warp nesting depth (1=subtle, 2=moderate, 3=classic)
#define NUM_OCTAVES 6 // FBM octave count (4=coarse fast, 6=fine)
#define TIME_SCALE 1.0 // Animation speed (0.05=very slow, 1.0=fluid, 2.0=fast)
#define WARP_STRENGTH 1.0 // Warp intensity (0.5=subtle, 1.0=standard, 2.0=strong)
#define BASE_SCALE 3.0 // Overall noise scale (larger = denser texture)
const mat2 mtx = mat2(0.80, 0.60, -0.60, 0.80);
float hash(vec2 p) {
p = fract(p * 0.6180339887);
p *= 25.0;
return fract(p.x * p.y * (p.x + p.y));
}
float noise(vec2 p) {
vec2 i = floor(p);
vec2 f = fract(p);
f = f * f * (3.0 - 2.0 * f);
return mix(
mix(hash(i + vec2(0.0, 0.0)), hash(i + vec2(1.0, 0.0)), f.x),
mix(hash(i + vec2(0.0, 1.0)), hash(i + vec2(1.0, 1.0)), f.x),
f.y
);
}
float fbm(vec2 p) {
float f = 0.0;
float amp = 0.5;
float freq = 1.0;
float norm = 0.0;
for (int i = 0; i < NUM_OCTAVES; i++) {
float t = 0.0;
if (i == 0) t = iTime * TIME_SCALE;
if (i == NUM_OCTAVES - 1) t = sin(iTime * TIME_SCALE);
f += amp * noise(p + t);
norm += amp;
p = mtx * p * 2.02;
amp *= 0.5;
}
return f / norm;
}
float pattern(vec2 p) {
float val = fbm(p);
#if WARP_DEPTH >= 2
val = fbm(p + WARP_STRENGTH * val);
#endif
#if WARP_DEPTH >= 3
val = fbm(p + WARP_STRENGTH * val);
#endif
return val;
}
vec3 palette(float t) {
vec3 col = vec3(0.2, 0.1, 0.4);
col = mix(col, vec3(0.3, 0.05, 0.05), t);
col = mix(col, vec3(0.9, 0.9, 0.9), t * t);
col = mix(col, vec3(0.0, 0.2, 0.4), smoothstep(0.6, 0.8, t));
return col * t * 2.0;
}
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
vec2 uv = (2.0 * fragCoord - iResolution.xy) / iResolution.y;
uv *= BASE_SCALE;
float shade = pattern(uv);
vec3 col = palette(shade);
// Vignette effect
vec2 q = fragCoord / iResolution.xy;
col *= 0.5 + 0.5 * sqrt(16.0 * q.x * q.y * (1.0 - q.x) * (1.0 - q.y));
fragColor = vec4(col, 1.0);
}
```
## Common Variants
### Variant 1: Multi-Resolution Layered Warping
Different warp layers use FBM with different octave counts, outputting `vec2` for dual-axis displacement, with intermediate variables used for coloring.
```glsl
float fbm4(vec2 p) {
float f = 0.0;
f += 0.5000 * (-1.0 + 2.0 * noise(p)); p = mtx * p * 2.02;
f += 0.2500 * (-1.0 + 2.0 * noise(p)); p = mtx * p * 2.03;
f += 0.1250 * (-1.0 + 2.0 * noise(p)); p = mtx * p * 2.01;
f += 0.0625 * (-1.0 + 2.0 * noise(p));
return f / 0.9375;
}
float fbm6(vec2 p) {
float f = 0.0;
f += 0.500000 * noise(p); p = mtx * p * 2.02;
f += 0.250000 * noise(p); p = mtx * p * 2.03;
f += 0.125000 * noise(p); p = mtx * p * 2.01;
f += 0.062500 * noise(p); p = mtx * p * 2.04;
f += 0.031250 * noise(p); p = mtx * p * 2.01;
f += 0.015625 * noise(p);
return f / 0.984375; // normalize by the sum of the six amplitudes (63/64)
}
vec2 fbm4_2(vec2 p) {
return vec2(fbm4(p + vec2(1.0)), fbm4(p + vec2(6.2)));
}
vec2 fbm6_2(vec2 p) {
return vec2(fbm6(p + vec2(9.2)), fbm6(p + vec2(5.7)));
}
float func(vec2 q, out vec2 o, out vec2 n) {
q += 0.05 * sin(vec2(0.11, 0.13) * iTime + length(q) * 4.0);
o = 0.5 + 0.5 * fbm4_2(q);
o += 0.02 * sin(vec2(0.13, 0.11) * iTime * length(o));
n = fbm6_2(4.0 * o);
vec2 p = q + 2.0 * n + 1.0;
float f = 0.5 + 0.5 * fbm4(2.0 * p);
f = mix(f, f * f * f * 3.5, f * abs(n.x));
return f;
}
// Coloring uses intermediate variables o, n
vec3 col = vec3(0.2, 0.1, 0.4);
col = mix(col, vec3(0.3, 0.05, 0.05), f);
col = mix(col, vec3(0.9, 0.9, 0.9), dot(n, n));
col = mix(col, vec3(0.5, 0.2, 0.2), 0.5 * o.y * o.y);
col = mix(col, vec3(0.0, 0.2, 0.4), 0.5 * smoothstep(1.2, 1.3, abs(n.y) + abs(n.x)));
col *= f * 2.0;
```
### Variant 2: Turbulence/Ridged Warping (Electric Arc/Plasma Effect)
In FBM, apply `abs(noise - 0.5)` to produce ridged textures, with dual-axis independent displacement + time-reversed drift.
```glsl
float fbm_ridged(vec2 p) {
float z = 2.0;
float rz = 0.0;
for (float i = 1.0; i < 6.0; i++) {
rz += abs((noise(p) - 0.5) * 2.0) / z;
z *= 2.0;
p *= 2.0;
}
return rz;
}
mat2 makem2(float theta) { float c = cos(theta), s = sin(theta); return mat2(c, s, -s, c); }
float dualfbm(vec2 p) {
vec2 p2 = p * 0.7;
vec2 basis = vec2(
fbm_ridged(p2 - iTime * 0.24),
fbm_ridged(p2 + iTime * 0.26)
);
basis = (basis - 0.5) * 0.2;
p += basis;
return fbm_ridged(p * makem2(iTime * 0.03));
}
// Electric arc coloring; rz is the dualfbm() result for this pixel
vec3 col = vec3(0.2, 0.1, 0.4) / rz;
```
### Variant 3: Pseudo-3D Lit Domain Warping
Estimate screen-space normals via finite differences, apply directional lighting for an embossed effect.
```glsl
float e = 2.0 / iResolution.y;
vec3 nor = normalize(vec3(
pattern(p + vec2(e, 0.0)) - shade,
2.0 * e,
pattern(p + vec2(0.0, e)) - shade
));
vec3 lig = normalize(vec3(0.9, 0.2, -0.4));
float dif = clamp(0.3 + 0.7 * dot(nor, lig), 0.0, 1.0);
vec3 lin = vec3(0.70, 0.90, 0.95) * (nor.y * 0.5 + 0.5);
lin += vec3(0.15, 0.10, 0.05) * dif;
col *= 1.2 * lin;
col = 1.0 - col;
col = 1.1 * col * col;
```
### Variant 4: Flow Field Iterative Warping (Gas Giant Effect)
Compute the FBM gradient field, Euler-integrate to iteratively advect coordinates, simulating fluid convection vortices.
```glsl
#define ADVECT_ITERATIONS 5
// fbm(p, t) here is a time-offset FBM variant: the second argument shifts the noise phase
vec2 field(vec2 p) {
float t = 0.2 * iTime;
p.x += t;
float n = fbm(p, t);
float e = 0.25;
float nx = fbm(p + vec2(e, 0.0), t);
float ny = fbm(p + vec2(0.0, e), t);
return vec2(n - ny, nx - n) / e;
}
vec3 distort(vec2 p) {
for (float i = 0.0; i < float(ADVECT_ITERATIONS); i++) {
p += field(p) / float(ADVECT_ITERATIONS);
}
return vec3(fbm(p, 0.0));
}
```
### Variant 5: 3D Volumetric Domain Warping (Explosion/Fireball Effect)
Displace a sphere SDF with 3D FBM, rendered via volumetric ray marching.
```glsl
#define NOISE_FREQ 4.0
#define NOISE_AMP -0.5
mat3 m3 = mat3(0.00, 0.80, 0.60,
-0.80, 0.36,-0.48,
-0.60,-0.48, 0.64);
float hash(float n) { return fract(sin(n) * 43758.5453); } // 1D lattice hash used below
float noise3D(vec3 p) {
vec3 fl = floor(p);
vec3 fr = fract(p);
fr = fr * fr * (3.0 - 2.0 * fr);
float n = fl.x + fl.y * 157.0 + 113.0 * fl.z;
return mix(mix(mix(hash(n+0.0), hash(n+1.0), fr.x),
mix(hash(n+157.0), hash(n+158.0), fr.x), fr.y),
mix(mix(hash(n+113.0), hash(n+114.0), fr.x),
mix(hash(n+270.0), hash(n+271.0), fr.x), fr.y), fr.z);
}
float fbm3D(vec3 p) {
float f = 0.0;
f += 0.5000 * noise3D(p); p = m3 * p * 2.02;
f += 0.2500 * noise3D(p); p = m3 * p * 2.03;
f += 0.1250 * noise3D(p); p = m3 * p * 2.01;
f += 0.0625 * noise3D(p); p = m3 * p * 2.02;
f += 0.03125 * abs(noise3D(p));
return f / 0.96875; // normalize by the sum of the five amplitudes (31/32)
}
float distanceFunc(vec3 p, out float displace) {
float d = length(p) - 0.5;
displace = fbm3D(p * NOISE_FREQ + vec3(0, -1, 0) * iTime);
d += displace * NOISE_AMP;
return d;
}
```
## Performance & Composition
### Performance Tips
- Three warp layers x 6 octaves = 18 noise samples per pixel; adding lit finite differences can reach 54
- **Reduce octaves**: 4 instead of 6, ~33% performance gain with minimal visual difference
- **Reduce warp depth**: two layers `fbm(p + fbm(p))` is already organic enough, saving ~33%
- **sin-product noise**: `sin(p.x)*sin(p.y)` is branchless and memory-free, suitable for mobile
- **GPU built-in derivatives**: `dFdx/dFdy` instead of finite differences, 3x faster
- **Texture noise**: pre-bake noise textures, trading computation for memory reads
- **LOD adaptive**: reduce octave count for distant pixels
- **Supersampling**: only use 2x2 when anti-aliasing is needed, 4x performance cost
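The derivative bullet above can replace the two extra `pattern()` evaluations of Variant 3 with hardware derivatives (fragment shaders only; derivatives are computed per 2x2 pixel quad, so the normals are slightly coarser). The 0.004 height term is an assumed, tunable bump strength:

```glsl
// Screen-space normal via dFdx/dFdy instead of finite differences
float shade = pattern(p);
vec3 nor = normalize(vec3(dFdx(shade), 0.004, dFdy(shade)));
vec3 lig = normalize(vec3(0.9, 0.2, -0.4));
float dif = clamp(0.3 + 0.7 * dot(nor, lig), 0.0, 1.0);
```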
### Composition Suggestions
- **Ray marching**: warped scalar field as SDF displacement function -> fire, explosions, organic forms
- **Polar coordinate transform**: domain warping in polar space -> vortices, nebulae, spirals
- **Cosine palette**: `a + b*cos(2*pi*(c*t+d))` is more flexible than mix chains
- **Post-processing**: bloom glow, tone mapping `col/(1+col)`, chromatic aberration (RGB channel offset sampling)
- **Particles/geometry**: scalar field driving particle velocity fields, vertex displacement, UV animation
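The cosine palette mentioned above can be sketched as follows (the coefficient values are illustrative, not fixed):

```glsl
// a + b*cos(TAU*(c*t + d)): four vec3 knobs give a smooth periodic gradient
vec3 cosPalette(float t) {
    vec3 a = vec3(0.5), b = vec3(0.5);
    vec3 c = vec3(1.0), d = vec3(0.00, 0.33, 0.67);
    return a + b * cos(6.28318530718 * (c * t + d));
}
// Usage: vec3 col = cosPalette(pattern(uv));
```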
## Further Reading
Full step-by-step tutorials, mathematical derivations, and advanced usage in [reference](../reference/domain-warping.md)

# Fractal Rendering Skill
## Use Cases
- Rendering self-similar mathematical structures: Mandelbrot/Julia sets (2D), Mandelbulb (3D), IFS fractals (Menger/Apollonian)
- Procedural textures or backgrounds requiring infinite detail
- Real-time generation of complex geometric visual effects (music visualization, sci-fi scenes, abstract art)
- Suitable for ShaderToy, demo scene, procedural content generation
## Core Principles
Fractal rendering is essentially **visualization of iterative systems**, falling into three categories:
### 1. Escape-Time Algorithm
Iterate `Z <- Z^2 + c`, count escape steps. Distance estimation by simultaneously tracking the derivative `Z'`:
```
Z <- Z^2 + c
Z' <- 2*Z*Z' + 1
d(c) = |Z|*log|Z| / |Z'|
```
### 2. Iterated Function System (IFS / KIFS)
Fold-sort-scale-offset iteration produces self-similar structures:
```
p = abs(p) // fold
sort p.xyz descending // sort
p = Scale * p - Offset * (Scale-1) // scale and offset
```
### 3. Spherical Inversion Fractals
`fract()` space folding + spherical inversion `p *= s/dot(p,p)`:
```
p = -1.0 + 2.0 * fract(0.5*p + 0.5)
k = s / dot(p, p)
p *= k; scale *= k
```
All 3D fractals are rendered via **Sphere Tracing (Ray Marching)**.
## Implementation Steps
### Step 1: Coordinate Normalization
```glsl
vec2 p = (2.0 * fragCoord - iResolution.xy) / iResolution.y;
```
### Step 2: 2D Mandelbrot Escape-Time Iteration
```glsl
float distanceToMandelbrot(in vec2 c) {
vec2 z = vec2(0.0);
vec2 dz = vec2(0.0);
float m2 = 0.0;
for (int i = 0; i < MAX_ITER; i++) {
if (m2 > BAILOUT * BAILOUT) break;
// Z' -> 2*Z*Z' + 1
dz = 2.0 * vec2(z.x*dz.x - z.y*dz.y,
z.x*dz.y + z.y*dz.x) + vec2(1.0, 0.0);
// Z -> Z^2 + c
z = vec2(z.x*z.x - z.y*z.y, 2.0*z.x*z.y) + c;
m2 = dot(z, z);
}
return 0.5 * sqrt(dot(z,z) / dot(dz,dz)) * log(dot(z,z));
}
```
### Step 3: Mandelbulb Distance Field (Spherical Coordinate Power-N)
```glsl
float mandelbulb(vec3 p) {
vec3 z = p;
float dr = 1.0;
float r;
for (int i = 0; i < FRACTAL_ITER; i++) {
r = length(z);
if (r > BAILOUT) break;
float theta = atan(z.y, z.x);
float phi = asin(z.z / r);
dr = pow(r, POWER - 1.0) * dr * POWER + 1.0;
r = pow(r, POWER);
theta *= POWER;
phi *= POWER;
z = r * vec3(cos(theta)*cos(phi),
sin(theta)*cos(phi),
sin(phi)) + p;
}
return 0.5 * log(r) * r / dr;
}
```
### Step 4: Menger Sponge Distance Field (KIFS)
```glsl
float mengerDE(vec3 z) {
z = abs(1.0 - mod(z, 2.0)); // infinite tiling
float d = 1000.0;
for (int n = 0; n < IFS_ITER; n++) {
z = abs(z);
if (z.x < z.y) z.xy = z.yx;
if (z.x < z.z) z.xz = z.zx;
if (z.y < z.z) z.yz = z.zy;
z = SCALE * z - OFFSET * (SCALE - 1.0);
if (z.z < -0.5 * OFFSET.z * (SCALE - 1.0))
z.z += OFFSET.z * (SCALE - 1.0);
d = min(d, length(z) * pow(SCALE, float(-n) - 1.0));
}
return d - 0.001;
}
```
### Step 5: Apollonian Distance Field (Spherical Inversion)
```glsl
vec4 orb; // orbit trap
float apollonianDE(vec3 p, float s) {
float scale = 1.0;
orb = vec4(1000.0);
for (int i = 0; i < INVERSION_ITER; i++) {
p = -1.0 + 2.0 * fract(0.5 * p + 0.5);
float r2 = dot(p, p);
orb = min(orb, vec4(abs(p), r2));
float k = s / r2;
p *= k;
scale *= k;
}
return 0.25 * abs(p.y) / scale;
}
```
### Step 6: Ray Marching
```glsl
float rayMarch(vec3 ro, vec3 rd) {
float t = 0.01;
for (int i = 0; i < MAX_STEPS; i++) {
float precis = PRECISION * t;
float h = map(ro + rd * t);
if (h < precis || t > MAX_DIST) break;
t += h * FUDGE_FACTOR;
}
return (t > MAX_DIST) ? -1.0 : t;
}
```
### Step 7: Normal Calculation
```glsl
// 4-tap tetrahedral method (recommended)
vec3 calcNormal(vec3 pos, float t) {
float precis = 0.001 * t;
vec2 e = vec2(1.0, -1.0) * precis;
return normalize(
e.xyy * map(pos + e.xyy) +
e.yyx * map(pos + e.yyx) +
e.yxy * map(pos + e.yxy) +
e.xxx * map(pos + e.xxx));
}
```
### Step 8: Shading & Lighting
```glsl
vec3 shade(vec3 pos, vec3 nor, vec3 rd, vec4 trap) {
vec3 light1 = normalize(LIGHT_DIR);
float diff = clamp(dot(light1, nor), 0.0, 1.0);
float amb = 0.7 + 0.3 * nor.y;
float ao = pow(clamp(trap.w * 2.0, 0.0, 1.0), 1.2);
vec3 brdf = vec3(0.4) * amb * ao + vec3(1.0) * diff * ao;
vec3 rgb = vec3(1.0);
rgb = mix(rgb, vec3(1.0, 0.8, 0.2), clamp(6.0*trap.y, 0.0, 1.0));
rgb = mix(rgb, vec3(1.0, 0.55, 0.0), pow(clamp(1.0-2.0*trap.z, 0.0, 1.0), 8.0));
return rgb * brdf;
}
```
### Step 9: Camera
```glsl
void setupCamera(vec2 uv, vec3 ro, vec3 ta, float cr, out vec3 rd) {
vec3 cw = normalize(ta - ro);
vec3 cp = vec3(sin(cr), cos(cr), 0.0);
vec3 cu = normalize(cross(cw, cp));
vec3 cv = normalize(cross(cu, cw));
rd = normalize(uv.x * cu + uv.y * cv + 2.0 * cw);
}
```
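The double cross product guarantees an orthonormal basis whenever the view direction is not parallel to the roll vector. A quick CPU check (Python sketch of the same construction):

```python
import math

def normalize(v):
    l = math.sqrt(sum(c * c for c in v))
    return [c / l for c in v]

def cross(a, b):
    return [a[1]*b[2] - a[2]*b[1], a[2]*b[0] - a[0]*b[2], a[0]*b[1] - a[1]*b[0]]

def camera_basis(ro, ta, cr=0.0):
    """Mirrors setupCamera: returns the (right, up, forward) basis."""
    cw = normalize([t - o for t, o in zip(ta, ro)])
    cp = [math.sin(cr), math.cos(cr), 0.0]
    cu = normalize(cross(cw, cp))
    cv = normalize(cross(cu, cw))
    return cu, cv, cw
```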
## Complete Code Template
3D Apollonian fractal (spherical inversion type) with full ray marching pipeline, orbit trap coloring, and AO. Ready to run in ShaderToy.
```glsl
// Fractal Rendering — Apollonian (Spherical Inversion) Template
#define MAX_STEPS 200
#define MAX_DIST 30.0
#define PRECISION 0.001
#define INVERSION_ITER 8 // Tunable: 5-12
#define AA 1 // Tunable: 1=no AA, 2=4xSSAA
vec4 orb;
float map(vec3 p, float s) {
float scale = 1.0;
orb = vec4(1000.0);
for (int i = 0; i < INVERSION_ITER; i++) {
p = -1.0 + 2.0 * fract(0.5 * p + 0.5);
float r2 = dot(p, p);
orb = min(orb, vec4(abs(p), r2));
float k = s / r2;
p *= k;
scale *= k;
}
return 0.25 * abs(p.y) / scale;
}
float trace(vec3 ro, vec3 rd, float s) {
float t = 0.01;
for (int i = 0; i < MAX_STEPS; i++) {
float precis = PRECISION * t;
float h = map(ro + rd * t, s);
if (h < precis || t > MAX_DIST) break;
t += h;
}
return (t > MAX_DIST) ? -1.0 : t;
}
vec3 calcNormal(vec3 pos, float t, float s) {
float precis = PRECISION * t;
vec2 e = vec2(1.0, -1.0) * precis;
return normalize(
e.xyy * map(pos + e.xyy, s) +
e.yyx * map(pos + e.yyx, s) +
e.yxy * map(pos + e.yxy, s) +
e.xxx * map(pos + e.xxx, s));
}
vec3 render(vec3 ro, vec3 rd, float anim) {
vec3 col = vec3(0.0);
float t = trace(ro, rd, anim);
if (t > 0.0) {
vec4 tra = orb;
vec3 pos = ro + t * rd;
vec3 nor = calcNormal(pos, t, anim);
vec3 light1 = normalize(vec3(0.577, 0.577, -0.577));
vec3 light2 = normalize(vec3(-0.707, 0.0, 0.707));
float key = clamp(dot(light1, nor), 0.0, 1.0);
float bac = clamp(0.2 + 0.8 * dot(light2, nor), 0.0, 1.0);
float amb = 0.7 + 0.3 * nor.y;
float ao = pow(clamp(tra.w * 2.0, 0.0, 1.0), 1.2);
vec3 brdf = vec3(0.40) * amb * ao
+ vec3(1.00) * key * ao
+ vec3(0.40) * bac * ao;
vec3 rgb = vec3(1.0);
rgb = mix(rgb, vec3(1.0, 0.80, 0.2), clamp(6.0 * tra.y, 0.0, 1.0));
rgb = mix(rgb, vec3(1.0, 0.55, 0.0), pow(clamp(1.0 - 2.0*tra.z, 0.0, 1.0), 8.0));
col = rgb * brdf * exp(-0.2 * t);
}
return sqrt(col);
}
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
float time = iTime * 0.25;
float anim = 1.1 + 0.5 * smoothstep(-0.3, 0.3, cos(0.1 * iTime));
vec3 tot = vec3(0.0);
#if AA > 1
for (int jj = 0; jj < AA; jj++)
for (int ii = 0; ii < AA; ii++)
#else
    int ii = 0, jj = 0; // no sub-pixel offset when AA == 1
#endif
{
vec2 q = fragCoord.xy + vec2(float(ii), float(jj)) / float(AA);
vec2 p = (2.0 * q - iResolution.xy) / iResolution.y;
vec3 ro = vec3(2.8*cos(0.1 + 0.33*time),
0.4 + 0.3*cos(0.37*time),
2.8*cos(0.5 + 0.35*time));
vec3 ta = vec3(1.9*cos(1.2 + 0.41*time),
0.4 + 0.1*cos(0.27*time),
1.9*cos(2.0 + 0.38*time));
float roll = 0.2 * cos(0.1 * time);
vec3 cw = normalize(ta - ro);
vec3 cp = vec3(sin(roll), cos(roll), 0.0);
vec3 cu = normalize(cross(cw, cp));
vec3 cv = normalize(cross(cu, cw));
vec3 rd = normalize(p.x*cu + p.y*cv + 2.0*cw);
tot += render(ro, rd, anim);
}
tot /= float(AA * AA);
fragColor = vec4(tot, 1.0);
}
```
## Common Variants
### 1. 2D Mandelbrot (Distance Estimation Coloring)
Pure 2D, no ray marching needed. Complex iteration + distance coloring.
```glsl
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
vec2 p = (2.0*fragCoord - iResolution.xy) / iResolution.y;
float tz = 0.5 - 0.5*cos(0.225*iTime);
float zoo = pow(0.5, 13.0*tz);
vec2 c = vec2(-0.05, 0.6805) + p * zoo; // Tunable: zoom center point
vec2 z = vec2(0.0), dz = vec2(0.0);
for (int i = 0; i < 300; i++) {
if (dot(z,z) > 1024.0) break;
dz = 2.0*vec2(z.x*dz.x-z.y*dz.y, z.x*dz.y+z.y*dz.x) + vec2(1.0,0.0);
z = vec2(z.x*z.x-z.y*z.y, 2.0*z.x*z.y) + c;
}
float d = 0.5*sqrt(dot(z,z)/dot(dz,dz))*log(dot(z,z));
d = clamp(pow(4.0*d/zoo, 0.2), 0.0, 1.0);
fragColor = vec4(vec3(d), 1.0);
}
```
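The complex iteration ports directly to the CPU. In the Python sketch below, the test hand-traces `c = 2`, which escapes after three iterations (`z: 0 → 2 → 6 → 38`, `z': 1 → 5 → 61`):

```python
import math

def mandelbrot_de(cx, cy, max_iter=300):
    """Distance estimate 0.5*|z|/|z'|*log|z|^2 for points outside the set."""
    zx = zy = dzx = dzy = 0.0
    for _ in range(max_iter):
        if zx * zx + zy * zy > 1024.0:
            break
        # z' <- 2*z*z' + 1 (complex product), then z <- z^2 + c
        dzx, dzy = 2.0 * (zx * dzx - zy * dzy) + 1.0, 2.0 * (zx * dzy + zy * dzx)
        zx, zy = zx * zx - zy * zy + cx, 2.0 * zx * zy + cy
    z2 = zx * zx + zy * zy
    dz2 = dzx * dzx + dzy * dzy
    return 0.5 * math.sqrt(z2 / dz2) * math.log(z2)
```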
### 2. Mandelbulb Power-N
Built from spherical-coordinate trigonometry; the `POWER` parameter controls the morphology (8 is the classic Mandelbulb).
```glsl
#define POWER 8.0 // Tunable: 2-16
#define FRACTAL_ITER 4 // Tunable: 2-8
float mandelbulbDE(vec3 p) {
vec3 z = p;
float dr = 1.0, r;
for (int i = 0; i < FRACTAL_ITER; i++) {
r = length(z);
if (r > 2.0) break;
float theta = atan(z.y, z.x);
float phi = asin(z.z / r);
dr = pow(r, POWER - 1.0) * dr * POWER + 1.0;
r = pow(r, POWER);
theta *= POWER; phi *= POWER;
z = r * vec3(cos(theta)*cos(phi), sin(theta)*cos(phi), sin(phi)) + p;
}
return 0.5 * log(r) * r / dr;
}
```
### 3. Menger Sponge (KIFS)
`abs()` folding plus conditional axis sorting produces a regular geometric fractal.
```glsl
#define SCALE 3.0
#define OFFSET vec3(0.92858,0.92858,0.32858)
#define IFS_ITER 7
float mengerDE(vec3 z) {
z = abs(1.0 - mod(z, 2.0));
float d = 1000.0;
for (int n = 0; n < IFS_ITER; n++) {
z = abs(z);
if (z.x < z.y) z.xy = z.yx;
if (z.x < z.z) z.xz = z.zx;
if (z.y < z.z) z.yz = z.zy;
z = SCALE * z - OFFSET * (SCALE - 1.0);
if (z.z < -0.5*OFFSET.z*(SCALE-1.0))
z.z += OFFSET.z*(SCALE-1.0);
d = min(d, length(z) * pow(SCALE, float(-n)-1.0));
}
return d - 0.001;
}
```
### 4. Quaternion Julia Set
Quaternion `Z <- Z^2 + c` (4D), with fixed `c` parameter; visualized by taking a 3D slice.
```glsl
vec4 qsqr(vec4 a) {
return vec4(a.x*a.x - a.y*a.y - a.z*a.z - a.w*a.w,
2.0*a.x*a.y, 2.0*a.x*a.z, 2.0*a.x*a.w);
}
float juliaDE(vec3 p, vec4 c) {
vec4 z = vec4(p, 0.0);
float md2 = 1.0, mz2 = dot(z, z);
for (int i = 0; i < 11; i++) {
md2 *= 4.0 * mz2;
z = qsqr(z) + c;
mz2 = dot(z, z);
if (mz2 > 4.0) break;
}
return 0.25 * sqrt(mz2 / md2) * log(mz2);
}
// Animated c: vec4 c = 0.45*cos(vec4(0.5,3.9,1.4,1.1)+time*vec4(1.2,1.7,1.3,2.5))-vec4(0.3,0,0,0);
```
### 5. Minimal IFS Field (2D, No Ray Marching)
`abs(p)/dot(p,p) + offset` iteration, weighted accumulation produces a density field.
```glsl
float field(vec3 p) {
float strength = 7.0 + 0.03 * log(1.e-6 + fract(sin(iTime) * 4373.11));
float accum = 0.0, prev = 0.0, tw = 0.0;
for (int i = 0; i < 32; ++i) {
float mag = dot(p, p);
p = abs(p) / mag + vec3(-0.5, -0.4, -1.5); // Tunable: offset values
float w = exp(-float(i) / 7.0);
accum += w * exp(-strength * pow(abs(mag - prev), 2.3));
tw += w;
prev = mag;
}
return max(0.0, 5.0 * accum / tw - 0.7);
}
```
## Performance & Composition
### Performance Tips
- Core bottleneck: outer ray marching x inner fractal iteration (e.g., `200 x 8 = 1600` map calls per pixel)
- Reduce `MAX_STEPS` to 60-100, compensate with fudge factor 0.7-0.9
- Hit threshold `precis = 0.001 * t` relaxes with distance
- Fractal iteration: break immediately when `|z|^2 > bailout`
- Reducing iterations from 8 to 4-5 has minimal visual impact
- Use 4-tap normals instead of 6-tap to save 33%
- Use AA=1 during development, AA=2 for release (AA=3 = 9x overhead)
- Avoid `pow()` inside loops; manually expand for low powers
### Composition Techniques
- **Volumetric light**: accumulate `exp(-10.0 * h)` during ray march for god rays
- **Tone mapping**: ACES + sRGB gamma for compressing high dynamic range into the displayable range
- **Transparent refraction**: negative distance field reverse ray march + Beer's law absorption
- **Orbit Trap coloring**: map trap values to HSV or emissive colors
- **Soft shadows**: ray march toward the light, keeping the running minimum of `k * h / t` (larger `k` gives harder shadows)
## Further Reading
For complete step-by-step tutorials, mathematical derivations, and advanced usage, see [reference](../reference/fractal-rendering.md)

# Lighting Models Skill
## Use Cases
- Adding realistic lighting to raymarched or rasterized scenes
- Simulating light interaction with various materials (metal, dielectric, water, skin, etc.)
- From simple diffuse/specular to full PBR
- Multi-light compositing (sun, sky, ambient)
- Adding material appearance to SDF scenes in ShaderToy
## Core Principles
Lighting = Diffuse + Specular Reflection:
- **Diffuse**: Lambert's law `I = max(0, N·L)`
- **Specular**: Empirical model uses Blinn-Phong `pow(max(0, N·H), shininess)`; physically-based model uses Cook-Torrance BRDF
### Key Formulas
```
Lambert: L_diffuse = albedo * lightColor * max(0, N·L)
Blinn-Phong: H = normalize(V + L); L_specular = lightColor * pow(max(0, N·H), shininess)
Cook-Torrance: f_specular = D(h) * F(v,h) * G(l,v,h) / (4 * (N·L) * (N·V))
Fresnel: F = F0 + (1 - F0) * (1 - V·H)^5
```
- **D** = GGX/Trowbridge-Reitz normal distribution
- **F** = Schlick Fresnel approximation
- **G** = Smith geometric shadowing
- F0: dielectric ~0.04, metals use baseColor
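The Schlick term is easy to verify at its boundary conditions: head-on reflectance (`cosθ = 1`) equals F0, and grazing incidence (`cosθ = 0`) approaches 1. A scalar Python sketch:

```python
def fresnel_schlick(f0, cos_theta):
    """F = F0 + (1 - F0) * (1 - cos_theta)^5, one channel."""
    return f0 + (1.0 - f0) * (1.0 - cos_theta) ** 5

# dielectric F0 = 0.04: dim reflection head-on, mirror-like at grazing angles
```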
## Implementation Steps
### Step 1: Scene Basics (Normal + Vector Setup)
```glsl
// SDF normal (finite difference method)
vec3 calcNormal(vec3 p) {
vec2 e = vec2(0.001, 0.0);
return normalize(vec3(
map(p + e.xyy) - map(p - e.xyy),
map(p + e.yxy) - map(p - e.yxy),
map(p + e.yyx) - map(p - e.yyx)
));
}
vec3 N = calcNormal(pos); // surface normal
vec3 V = -rd; // view direction
vec3 L = normalize(lightPos - pos); // light direction (point light)
// directional light: vec3 L = normalize(vec3(0.6, 0.8, -0.5));
```
### Step 2: Lambert Diffuse
```glsl
float NdotL = max(0.0, dot(N, L));
vec3 diffuse = albedo * lightColor * NdotL;
// energy-conserving version
vec3 diffuse_conserved = albedo / PI * lightColor * NdotL;
// Half-Lambert (reduces over-darkening on backlit faces, commonly used for SSS approximation)
float halfLambert = NdotL * 0.5 + 0.5;
vec3 diffuse_wrapped = albedo * lightColor * halfLambert;
```
### Step 3: Blinn-Phong Specular
```glsl
vec3 H = normalize(V + L);
float NdotH = max(0.0, dot(N, H));
float SHININESS = 32.0; // 4.0 (rough) ~ 256.0 (smooth)
// with normalization factor for energy conservation
float normFactor = (SHININESS + 8.0) / (8.0 * PI);
float spec = normFactor * pow(NdotH, SHININESS);
vec3 specular = lightColor * spec;
```
### Step 4: Fresnel-Schlick
```glsl
vec3 fresnelSchlick(vec3 F0, float cosTheta) {
return F0 + (1.0 - F0) * pow(1.0 - cosTheta, 5.0);
}
// metallic workflow
vec3 F0 = mix(vec3(0.04), baseColor, metallic);
// computed with V·H (specular reflection BRDF)
float VdotH = max(0.0, dot(V, H));
vec3 F = fresnelSchlick(F0, VdotH);
// computed with N·V (environment reflection, rim light)
float NdotV = max(0.0, dot(N, V));
vec3 F_env = fresnelSchlick(F0, NdotV);
```
### Step 5: GGX Normal Distribution (D Term)
```glsl
float distributionGGX(float NdotH, float roughness) {
float a = roughness * roughness; // roughness must be squared first
float a2 = a * a;
float denom = NdotH * NdotH * (a2 - 1.0) + 1.0;
return a2 / (PI * denom * denom);
}
```
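At `NdotH = 1` the denominator collapses to `a2`, so the peak value is `1/(π·alpha²) = 1/(π·roughness⁴)` — a handy numeric check that the roughness remap is applied. Python sketch:

```python
import math

def distribution_ggx(n_dot_h, roughness):
    a = roughness * roughness      # Disney remap: alpha = roughness^2
    a2 = a * a
    denom = n_dot_h * n_dot_h * (a2 - 1.0) + 1.0
    return a2 / (math.pi * denom * denom)
```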
### Step 6: Geometric Shadowing (G Term)
```glsl
// Method 1: Schlick-GGX
float geometrySchlickGGX(float NdotV, float roughness) {
float r = roughness + 1.0;
float k = (r * r) / 8.0;
return NdotV / (NdotV * (1.0 - k) + k);
}
float geometrySmith(float NdotV, float NdotL, float roughness) {
return geometrySchlickGGX(NdotV, roughness) * geometrySchlickGGX(NdotL, roughness);
}
// Method 2: Height-Correlated Smith (more accurate, directly returns the visibility term)
float visibilitySmith(float NdotV, float NdotL, float roughness) {
    float a2 = roughness * roughness; // note: treats the input as alpha; the template's V_SmithGGX squares roughness twice
float gv = NdotL * sqrt(NdotV * (NdotV - NdotV * a2) + a2);
float gl = NdotV * sqrt(NdotL * (NdotL - NdotL * a2) + a2);
return 0.5 / max(gv + gl, 0.00001);
}
// Method 3: Simplified approximation
float G1V(float dotNV, float k) {
return 1.0 / (dotNV * (1.0 - k) + k);
}
// Usage: float vis = G1V(NdotL, k) * G1V(NdotV, k); where k = roughness/2
```
### Step 7: Assembling Cook-Torrance BRDF
```glsl
vec3 cookTorranceBRDF(vec3 N, vec3 V, vec3 L, float roughness, vec3 F0) {
vec3 H = normalize(V + L);
float NdotL = max(0.0, dot(N, L));
float NdotV = max(0.0, dot(N, V));
float NdotH = max(0.0, dot(N, H));
float VdotH = max(0.0, dot(V, H));
float D = distributionGGX(NdotH, roughness);
vec3 F = fresnelSchlick(F0, VdotH);
float Vis = visibilitySmith(NdotV, NdotL, roughness);
// Vis version already includes the 4*NdotV*NdotL denominator
vec3 specular = D * F * Vis;
// Or with standard G term: specular = (D * F * G) / max(4.0 * NdotV * NdotL, 0.001);
return specular * NdotL;
}
```
### Step 8: Multi-Light Accumulation and Compositing
```glsl
vec3 shade(vec3 pos, vec3 N, vec3 V, vec3 albedo, float roughness, float metallic) {
vec3 F0 = mix(vec3(0.04), albedo, metallic);
vec3 diffuseColor = albedo * (1.0 - metallic); // metals have no diffuse
vec3 color = vec3(0.0);
// primary light (sun)
vec3 sunDir = normalize(vec3(0.6, 0.8, -0.5));
vec3 sunColor = vec3(1.0, 0.95, 0.85) * 2.0;
vec3 H = normalize(V + sunDir);
float NdotL = max(0.0, dot(N, sunDir));
float NdotV = max(0.0, dot(N, V));
float VdotH = max(0.0, dot(V, H));
vec3 F = fresnelSchlick(F0, VdotH);
vec3 kD = (1.0 - F) * (1.0 - metallic); // energy conservation
color += kD * diffuseColor / PI * sunColor * NdotL;
color += cookTorranceBRDF(N, V, sunDir, roughness, F0) * sunColor;
// sky light (hemisphere approximation)
vec3 skyColor = vec3(0.2, 0.5, 1.0) * 0.3;
float skyDiffuse = 0.5 + 0.5 * N.y;
color += diffuseColor * skyColor * skyDiffuse;
// back light / rim light
vec3 backDir = normalize(vec3(-sunDir.x, 0.0, -sunDir.z));
float backDiffuse = clamp(dot(N, backDir) * 0.5 + 0.5, 0.0, 1.0);
color += diffuseColor * vec3(0.25, 0.15, 0.1) * backDiffuse;
return color;
}
```
### Step 9: Ambient Occlusion (AO)
```glsl
// Raymarching AO (using SDF queries)
float calcAO(vec3 pos, vec3 nor) {
float occ = 0.0;
float sca = 1.0;
for (int i = 0; i < 5; i++) {
float h = 0.01 + 0.12 * float(i) / 4.0;
float d = map(pos + h * nor);
occ += (h - d) * sca;
sca *= 0.95;
}
return clamp(1.0 - 3.0 * occ, 0.0, 1.0);
}
float ao = calcAO(pos, N);
diffuseLight *= ao;
// specular AO (more subtle):
specularLight *= clamp(pow(NdotV + ao, roughness * roughness) - 1.0 + ao, 0.0, 1.0);
```
### Outdoor Three-Light Model
The go-to lighting setup for outdoor SDF scenes. Uses three directional sources to approximate full global illumination with minimal cost:
```glsl
// === Outdoor Three-Light Lighting ===
// Compute material, occlusion, and shadow first
vec3 material = getMaterial(pos, nor); // albedo, keep ≤ 0.2 for realism
float occ = calcAO(pos, nor); // ambient occlusion
float sha = calcSoftShadow(pos, sunDir, 0.02, 8.0);
// Three light contributions
float sun = clamp(dot(nor, sunDir), 0.0, 1.0); // direct sunlight
float sky = clamp(0.5 + 0.5 * nor.y, 0.0, 1.0); // hemisphere sky light
float ind = clamp(dot(nor, normalize(sunDir * vec3(-1.0, 0.0, -1.0))), 0.0, 1.0); // indirect bounce
// Combine with colored shadows (key technique: shadow penumbra tints blue)
vec3 lin = vec3(0.0);
lin += sun * vec3(1.64, 1.27, 0.99) * pow(vec3(sha), vec3(1.0, 1.2, 1.5)); // warm sun, colored shadow
lin += sky * vec3(0.16, 0.20, 0.28) * occ; // cool sky fill
lin += ind * vec3(0.40, 0.28, 0.20) * occ; // warm ground bounce
vec3 color = material * lin;
```
Key principles:
- **Colored shadow penumbra**: `pow(vec3(sha), vec3(1.0, 1.2, 1.5))` makes shadow edges slightly blue/cool, mimicking real subsurface scattering in penumbra regions
- **Material albedo rule**: Keep diffuse albedo ≤ 0.2; adjust light intensities for brightness, not material values. Real-world surfaces rarely exceed 0.3 albedo
- **Linear workflow**: All computations in linear space, apply gamma `pow(color, vec3(1.0/2.2))` at the very end
- **Sky light approximation**: `0.5 + 0.5 * nor.y` is a cheap hemisphere integral — surfaces pointing up get full sky, pointing down get none
- Do NOT apply ambient occlusion to the sun/key light — shadows handle that
## Complete Code Template
```glsl
// Lighting Model Complete Template - Runs directly in ShaderToy
// Progressive implementation from Lambert to Cook-Torrance PBR
#define PI 3.14159265359
// ========== Adjustable Parameters ==========
#define ROUGHNESS 0.35
#define METALLIC 0.0
#define ALBEDO vec3(0.8, 0.2, 0.2)
#define SUN_DIR normalize(vec3(0.6, 0.8, -0.5))
#define SUN_COLOR vec3(1.0, 0.95, 0.85) * 2.0
#define SKY_COLOR vec3(0.2, 0.5, 1.0) * 0.4
#define BACKGROUND_TOP vec3(0.5, 0.7, 1.0)
#define BACKGROUND_BOT vec3(0.8, 0.85, 0.9)
// ========== SDF Scene ==========
float map(vec3 p) {
float sphere = length(p - vec3(0.0, 0.0, 0.0)) - 1.0;
float ground = p.y + 1.0;
return min(sphere, ground);
}
vec3 calcNormal(vec3 p) {
vec2 e = vec2(0.001, 0.0);
return normalize(vec3(
map(p + e.xyy) - map(p - e.xyy),
map(p + e.yxy) - map(p - e.yxy),
map(p + e.yyx) - map(p - e.yyx)
));
}
// ========== AO ==========
float calcAO(vec3 pos, vec3 nor) {
float occ = 0.0;
float sca = 1.0;
for (int i = 0; i < 5; i++) {
float h = 0.01 + 0.12 * float(i) / 4.0;
float d = map(pos + h * nor);
occ += (h - d) * sca;
sca *= 0.95;
}
return clamp(1.0 - 3.0 * occ, 0.0, 1.0);
}
// ========== Soft Shadow ==========
float softShadow(vec3 ro, vec3 rd, float mint, float maxt) {
float res = 1.0;
float t = mint;
for (int i = 0; i < 24; i++) {
float h = map(ro + rd * t);
res = min(res, 8.0 * h / t);
t += clamp(h, 0.02, 0.2);
if (res < 0.001 || t > maxt) break;
}
return clamp(res, 0.0, 1.0);
}
// ========== PBR BRDF Components ==========
float D_GGX(float NdotH, float roughness) {
float a = roughness * roughness;
float a2 = a * a;
float d = NdotH * NdotH * (a2 - 1.0) + 1.0;
return a2 / (PI * d * d);
}
vec3 F_Schlick(vec3 F0, float cosTheta) {
return F0 + (1.0 - F0) * pow(1.0 - cosTheta, 5.0);
}
float V_SmithGGX(float NdotV, float NdotL, float roughness) {
float a2 = roughness * roughness;
a2 *= a2;
float gv = NdotL * sqrt(NdotV * NdotV * (1.0 - a2) + a2);
float gl = NdotV * sqrt(NdotL * NdotL * (1.0 - a2) + a2);
return 0.5 / max(gv + gl, 1e-5);
}
// ========== Complete Lighting ==========
vec3 shade(vec3 pos, vec3 N, vec3 V, vec3 albedo, float roughness, float metallic) {
vec3 F0 = mix(vec3(0.04), albedo, metallic);
vec3 diffuseColor = albedo * (1.0 - metallic);
float NdotV = max(dot(N, V), 1e-4);
float ao = calcAO(pos, N);
vec3 color = vec3(0.0);
// sunlight
{
vec3 L = SUN_DIR;
vec3 H = normalize(V + L);
float NdotL = max(dot(N, L), 0.0);
float NdotH = max(dot(N, H), 0.0);
float VdotH = max(dot(V, H), 0.0);
float D = D_GGX(NdotH, roughness);
vec3 F = F_Schlick(F0, VdotH);
float Vis = V_SmithGGX(NdotV, NdotL, roughness);
vec3 kD = (1.0 - F) * (1.0 - metallic);
vec3 diffuse = kD * diffuseColor / PI;
vec3 specular = D * F * Vis;
float shadow = softShadow(pos, L, 0.02, 5.0);
color += (diffuse + specular) * SUN_COLOR * NdotL * shadow;
}
// sky light (hemisphere approximation)
{
float skyDiff = 0.5 + 0.5 * N.y;
color += diffuseColor * SKY_COLOR * skyDiff * ao;
}
// back light / rim light
{
vec3 backDir = normalize(vec3(-SUN_DIR.x, 0.0, -SUN_DIR.z));
float backDiff = clamp(dot(N, backDir) * 0.5 + 0.5, 0.0, 1.0);
color += diffuseColor * vec3(0.15, 0.1, 0.08) * backDiff * ao;
}
// environment reflection (simplified)
{
vec3 R = reflect(-V, N);
vec3 envColor = mix(BACKGROUND_BOT, BACKGROUND_TOP, clamp(R.y * 0.5 + 0.5, 0.0, 1.0));
vec3 F_env = F_Schlick(F0, NdotV);
float envOcc = clamp(pow(NdotV + ao, roughness * roughness) - 1.0 + ao, 0.0, 1.0);
color += F_env * envColor * envOcc * (1.0 - roughness * 0.7);
}
return color;
}
// ========== Raymarching ==========
float raymarch(vec3 ro, vec3 rd) {
float t = 0.0;
for (int i = 0; i < 128; i++) {
float d = map(ro + rd * t);
if (d < 0.001) return t;
t += d;
if (t > 50.0) break;
}
return -1.0;
}
// ========== Background ==========
vec3 background(vec3 rd) {
vec3 col = mix(BACKGROUND_BOT, BACKGROUND_TOP, clamp(rd.y * 0.5 + 0.5, 0.0, 1.0));
float sun = clamp(dot(rd, SUN_DIR), 0.0, 1.0);
col += SUN_COLOR * 0.3 * pow(sun, 8.0);
col += SUN_COLOR * 1.0 * pow(sun, 256.0);
return col;
}
// ========== Main Function ==========
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
vec2 uv = (2.0 * fragCoord - iResolution.xy) / iResolution.y;
float angle = iTime * 0.3;
vec3 ro = vec3(3.0 * cos(angle), 1.5, 3.0 * sin(angle));
vec3 ta = vec3(0.0, 0.0, 0.0);
vec3 ww = normalize(ta - ro);
vec3 uu = normalize(cross(ww, vec3(0.0, 1.0, 0.0)));
vec3 vv = cross(uu, ww);
vec3 rd = normalize(uv.x * uu + uv.y * vv + 1.5 * ww);
vec3 col = background(rd);
float t = raymarch(ro, rd);
if (t > 0.0) {
vec3 pos = ro + t * rd;
vec3 N = calcNormal(pos);
vec3 V = -rd;
vec3 albedo = ALBEDO;
float roughness = ROUGHNESS;
float metallic = METALLIC;
if (pos.y < -0.99) {
roughness = 0.8;
metallic = 0.0;
float checker = mod(floor(pos.x) + floor(pos.z), 2.0);
albedo = mix(vec3(0.3), vec3(0.6), checker);
}
col = shade(pos, N, V, albedo, roughness, metallic);
}
col = col / (col + vec3(1.0)); // Tone mapping (Reinhard)
col = pow(col, vec3(1.0 / 2.2)); // Gamma
fragColor = vec4(col, 1.0);
}
```
## Common Variants
### Variant 1: Classic Phong (Non-PBR)
```glsl
vec3 R = reflect(-L, N);
float spec = pow(max(0.0, dot(R, V)), 32.0);
vec3 color = albedo * lightColor * NdotL + lightColor * spec;
```
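Phong's `R` comes from GLSL `reflect`; a CPU sketch can confirm the mirror property — the reflected ray keeps unit length and makes the same angle with `N` as `L` does:

```python
import math

def reflect(i, n):
    """GLSL reflect(I, N) = I - 2*dot(N, I)*N, with N unit length."""
    d = sum(a * b for a, b in zip(n, i))
    return [a - 2.0 * d * b for a, b in zip(i, n)]
```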
### Variant 2: Point Light Attenuation
```glsl
float dist = length(lightPos - pos);
float attenuation = 1.0 / (1.0 + dist * 0.1 + dist * dist * 0.01);
color *= attenuation;
```
### Variant 3: IBL (Image-Based Lighting)
```glsl
// diffuse IBL: spherical harmonics
vec3 diffuseIBL = diffuseColor * SHIrradiance(N);
// specular IBL: EnvBRDFApprox
vec3 EnvBRDFApprox(vec3 specColor, float roughness, float NdotV) {
vec4 c0 = vec4(-1, -0.0275, -0.572, 0.022);
vec4 c1 = vec4(1, 0.0425, 1.04, -0.04);
vec4 r = roughness * c0 + c1;
float a004 = min(r.x * r.x, exp2(-9.28 * NdotV)) * r.x + r.y;
vec2 AB = vec2(-1.04, 1.04) * a004 + r.zw;
return specColor * AB.x + AB.y;
}
vec3 R = reflect(-V, N);
vec3 envColor = textureLod(envMap, R, roughness * 7.0).rgb;
vec3 specularIBL = EnvBRDFApprox(F0, roughness, NdotV) * envColor;
```
### Variant 4: Subsurface Scattering Approximation (SSS)
```glsl
// SDF-based interior probing
float subsurface(vec3 pos, vec3 L) {
float sss = 0.0;
for (int i = 0; i < 5; i++) {
float h = 0.05 + float(i) * 0.1;
float d = map(pos + L * h);
sss += max(0.0, h - d);
}
return clamp(1.0 - sss * 4.0, 0.0, 1.0);
}
// Henyey-Greenstein phase function
float HenyeyGreenstein(float cosTheta, float g) {
float g2 = g * g;
return (1.0 - g2) / (pow(1.0 + g2 - 2.0 * g * cosTheta, 1.5) * 4.0 * PI);
}
float sssAmount = HenyeyGreenstein(dot(V, L), 0.5);
color += sssColor * sssAmount * NdotL;
```
### Variant 5: Beer's Law Water Lighting
```glsl
vec3 waterExtinction(float depth) {
float opticalDepth = depth * 6.0;
vec3 extinctColor = 1.0 - vec3(0.5, 0.4, 0.1);
return exp2(-opticalDepth * extinctColor);
}
vec3 underwaterColor = objectColor * waterExtinction(depth);
vec3 inscatter = waterDiffuse * (1.0 - exp(-depth * 0.1));
underwaterColor += inscatter;
```
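Beer-Lambert transmittance should be exactly 1 at zero depth and decay monotonically per channel. A Python sketch with the snippet's constants:

```python
def water_extinction(depth):
    """Per-channel transmittance exp2(-depth * 6 * extinction)."""
    extinct = [1.0 - c for c in (0.5, 0.4, 0.1)]
    return [2.0 ** (-depth * 6.0 * e) for e in extinct]
```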
## Performance & Composition
- **Fresnel optimization**: compute `float x2 = x*x;` then use `x2*x2*x` instead of `pow(x, 5.0)`
- **Visibility term**: Use `V_SmithGGX` to directly return `G/(4*NdotV*NdotL)`, avoiding separate division
- **AO sampling**: 5 samples is sufficient; can reduce to 3 at far distances
- **Soft shadow**: `clamp(h, 0.02, 0.2)` limits step size; 14~24 steps usually sufficient; `8.0*h/t` controls softness
- **Simplified IBL**: Without cubemap, approximate with `mix(groundColor, skyColor, R.y*0.5+0.5)`
- **Branch culling**: Skip specular calculation when `NdotL <= 0`
- **Raymarching integration**: Use SDF finite differences for normals, query SDF directly for AO/shadows
- **Volume rendering integration**: Beer's Law attenuation + Henyey-Greenstein phase function; FBM noise procedural normals can be passed directly to lighting functions
- **Post-processing integration**: ACES `(col*(2.51*col+0.03))/(col*(2.43*col+0.59)+0.14)` / Reinhard `col/(col+1)` + Gamma
- **Reflection integration**: `reflect(rd, N)` to query scene again, blend result with Fresnel weighting
## Further Reading
For complete step-by-step tutorials, mathematical derivations, and advanced usage, see [reference](../reference/lighting-model.md)

# Matrix Transforms & Camera
## Use Cases
- Camera systems in 3D scenes (orbit camera, fly camera, path camera)
- SDF object domain transforms via translation, rotation, and scale matrices
- Generating 3D rays from screen pixels (perspective / orthographic projection)
- Hierarchical rotation transforms for joint animation
- Rotation in noise domain warping, IFS fractal iterations
## Core Principles
The essence of matrix transforms is coordinate system transformation. In a ray marching pipeline:
1. **Camera matrix**: Screen pixels → world-space ray direction (view-to-world)
2. **Object transform matrix**: World-space sample point → object local space (world-to-local, domain transform)
### Key Formulas
**2D Rotation** R(θ) = `[[cosθ, -sinθ], [sinθ, cosθ]]`
**3D Rotation Around Y-Axis** Ry(θ) = `[[cosθ, 0, sinθ], [0, 1, 0], [-sinθ, 0, cosθ]]`
**Rodrigues (Arbitrary Axis k, Angle θ)**: `R = cosθ·I + (1-cosθ)·k⊗k + sinθ·K`
**LookAt Camera**:
```
forward = normalize(target - eye)
right = normalize(cross(forward, worldUp))
up = cross(right, forward)
viewMatrix = mat3(right, up, forward)
```
**Perspective Ray**: `rd = normalize(camMatrix * vec3(uv, focalLength))`
## Implementation Steps
### Step 1: Screen Coordinate Normalization
```glsl
// Range [-aspect, aspect] x [-1, 1]
vec2 uv = (2.0 * fragCoord - iResolution.xy) / iResolution.y;
```
### Step 2: Rotation Matrices
```glsl
// 2D rotation (mat2)
mat2 rot2D(float a) {
float c = cos(a), s = sin(a);
return mat2(c, s, -s, c);
}
// 3D single-axis rotation (mat3)
mat3 rotX(float a) {
float s = sin(a), c = cos(a);
return mat3(1, 0, 0, 0, c, s, 0, -s, c);
}
mat3 rotY(float a) {
float s = sin(a), c = cos(a);
return mat3(c, 0, s, 0, 1, 0, -s, 0, c);
}
mat3 rotZ(float a) {
float s = sin(a), c = cos(a);
return mat3(c, s, 0, -s, c, 0, 0, 0, 1);
}
// Euler angles → mat3 (yaw/pitch/roll)
mat3 fromEuler(vec3 ang) {
vec2 a1 = vec2(sin(ang.x), cos(ang.x));
vec2 a2 = vec2(sin(ang.y), cos(ang.y));
vec2 a3 = vec2(sin(ang.z), cos(ang.z));
mat3 m;
m[0] = vec3( a1.y*a3.y + a1.x*a2.x*a3.x,
a1.y*a2.x*a3.x + a3.y*a1.x,
-a2.y*a3.x);
m[1] = vec3(-a2.y*a1.x, a1.y*a2.y, a2.x);
m[2] = vec3( a3.y*a1.x*a2.x + a1.y*a3.x,
a1.x*a3.x - a1.y*a3.y*a2.x,
a2.y*a3.y);
return m;
}
// Rodrigues arbitrary-axis rotation (mat3)
mat3 rotationMatrix(vec3 axis, float angle) {
axis = normalize(axis);
float s = sin(angle), c = cos(angle), oc = 1.0 - c;
return mat3(
oc*axis.x*axis.x + c, oc*axis.x*axis.y - axis.z*s, oc*axis.z*axis.x + axis.y*s,
oc*axis.x*axis.y + axis.z*s, oc*axis.y*axis.y + c, oc*axis.y*axis.z - axis.x*s,
oc*axis.z*axis.x - axis.y*s, oc*axis.y*axis.z + axis.x*s, oc*axis.z*axis.z + c
);
}
```
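The Rodrigues formula should reduce to the single-axis matrices for axis-aligned inputs and always produce an orthogonal matrix. A Python sketch using the row-major math convention from the Key Formulas section:

```python
import math

def rot_y(a):
    s, c = math.sin(a), math.cos(a)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def rodrigues(axis, angle):
    """R = cos(t)*I + (1-cos(t))*k(x)k + sin(t)*K for unit axis k."""
    l = math.sqrt(sum(c * c for c in axis))
    kx, ky, kz = (c / l for c in axis)
    s, c = math.sin(angle), math.cos(angle)
    oc = 1.0 - c
    return [
        [c + oc*kx*kx,    oc*kx*ky - s*kz, oc*kx*kz + s*ky],
        [oc*ky*kx + s*kz, c + oc*ky*ky,    oc*ky*kz - s*kx],
        [oc*kz*kx - s*ky, oc*kz*ky + s*kx, c + oc*kz*kz],
    ]
```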
### Step 3: LookAt Camera
```glsl
// Classic setCamera, cr = camera roll
mat3 setCamera(in vec3 ro, in vec3 ta, float cr) {
vec3 cw = normalize(ta - ro);
vec3 cp = vec3(sin(cr), cos(cr), 0.0);
vec3 cu = normalize(cross(cw, cp));
vec3 cv = normalize(cross(cu, cw));
return mat3(cu, cv, cw);
}
// mat4 LookAt (with translation, for homogeneous coordinate scenes)
mat4 LookAt(vec3 pos, vec3 target, vec3 up) {
vec3 dir = normalize(target - pos);
vec3 x = normalize(cross(dir, up));
vec3 y = cross(x, dir);
return mat4(vec4(x, 0), vec4(y, 0), vec4(dir, 0), vec4(pos, 1));
}
```
### Step 4: Perspective Ray Generation
```glsl
// mat3 camera — focalLength controls FOV: 1.0≈90°, 2.0≈53°, 4.0≈28°
#define FOCAL_LENGTH 2.0
mat3 cam = setCamera(ro, ta, 0.0);
vec3 rd = cam * normalize(vec3(uv, FOCAL_LENGTH));
// Manual basis vector composition
#define FOV 1.0
vec3 rd = normalize(camDir + (uv.x * camRight + uv.y * camUp) * FOV);
// mat4 homogeneous coordinates
mat4 viewToWorld = LookAt(camPos, camTarget, camUp);
vec3 rd = (viewToWorld * normalize(vec4(uv, 1.0, 0.0))).xyz;
```
### Step 5: Mouse-Interactive Camera
```glsl
// Spherical coordinate orbit camera
#define CAM_DIST 5.0
#define CAM_HEIGHT 1.0
vec2 mouse = iMouse.xy / iResolution.xy;
float angleH = mouse.x * 6.2832;
float angleV = mouse.y * 3.1416 - 1.5708;
if (iMouse.z <= 0.0) {
angleH = iTime * 0.5;
angleV = 0.3;
}
vec3 ro = vec3(
CAM_DIST * cos(angleH) * cos(angleV),
CAM_DIST * sin(angleV) + CAM_HEIGHT,
CAM_DIST * sin(angleH) * cos(angleV)
);
vec3 ta = vec3(0.0);
```
### Step 6: SDF Domain Transforms
```glsl
// Translation
float d = sdSphere(p - vec3(2.0, 0.0, 0.0), 1.0);
// Rotation (orthogonal matrix inverse = transpose)
float d = sdBox(rotY(0.5) * p, vec3(1.0));
// Scale (divide by scale factor, multiply back into distance)
#define SCALE 2.0
float d = sdSphere(p / SCALE, 1.0) * SCALE;
// mat4 SRT composition
mat4 Loc4(vec3 d) {
d *= -1.0;
return mat4(1,0,0,d.x, 0,1,0,d.y, 0,0,1,d.z, 0,0,0,1);
}
mat4 transposeM4(in mat4 m) {
return mat4(
vec4(m[0].x, m[1].x, m[2].x, m[3].x),
vec4(m[0].y, m[1].y, m[2].y, m[3].y),
vec4(m[0].z, m[1].z, m[2].z, m[3].z),
vec4(m[0].w, m[1].w, m[2].w, m[3].w));
}
vec3 opTx(vec3 p, mat4 m) {
return (transposeM4(m) * vec4(p, 1.0)).xyz;
}
// First translate to (3,0,0), then rotate 45° around Y-axis
// (Rot4Y: the mat4 Y-rotation companion to Loc4, stored row-major like Loc4
//  so opTx's transpose applies it correctly)
mat4 Rot4Y(float a) {
    float s = sin(a), c = cos(a);
    return mat4(c,0,-s,0, 0,1,0,0, s,0,c,0, 0,0,0,1);
}
mat4 xform = Rot4Y(0.785) * Loc4(vec3(3.0, 0.0, 0.0));
float d = sdBox(opTx(p, xform), vec3(1.0));
```
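The divide-then-multiply rule keeps the result a true distance for uniform scales. A Python check with a sphere: scaling a radius-1 sphere by 2 gives a radius-2 sphere, so a point at distance 5 from the origin reads distance 3.

```python
import math

def sd_sphere(p, r):
    return math.sqrt(sum(c * c for c in p)) - r

def sd_scaled_sphere(p, r, s):
    """Uniform scale: evaluate at p/s, then multiply the distance back by s."""
    return sd_sphere([c / s for c in p], r) * s
```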
### Step 7: Quaternion Rotation
```glsl
vec4 axisAngleToQuat(vec3 axis, float angleDeg) {
float half_angle = angleDeg * 3.14159265 / 360.0;
vec2 sc = sin(vec2(half_angle, half_angle + 1.5707963));
return vec4(normalize(axis) * sc.x, sc.y);
}
vec3 quatRotate(vec3 pos, vec3 axis, float angleDeg) {
vec4 q = axisAngleToQuat(axis, angleDeg);
return pos + 2.0 * cross(q.xyz, cross(q.xyz, pos) + q.w * pos);
}
// Hierarchical rotation in joint animation
vec3 limbPos = quatRotate(p - shoulderOffset, vec3(1,0,0), swingAngle);
float d = sdEllipsoid(limbPos, limbSize);
```
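The vector-only rotation `v' = v + 2*cross(q.xyz, cross(q.xyz, v) + q.w*v)` can be checked against known rotations. A Python sketch taking degrees like the GLSL:

```python
import math

def quat_rotate(pos, axis, angle_deg):
    """Rotate pos around axis by angle_deg using an axis-angle quaternion."""
    half = math.radians(angle_deg) * 0.5
    l = math.sqrt(sum(c * c for c in axis))
    q = [c / l * math.sin(half) for c in axis]
    w = math.cos(half)
    cross = lambda a, b: [a[1]*b[2] - a[2]*b[1], a[2]*b[0] - a[0]*b[2], a[0]*b[1] - a[1]*b[0]]
    t = [ci + w * vi for ci, vi in zip(cross(q, pos), pos)]
    return [vi + 2.0 * ci for vi, ci in zip(pos, cross(q, t))]

# 90 degrees about +z should carry (1,0,0) to (0,1,0)
```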
## Complete Code Template
Can be run directly in ShaderToy, demonstrating LookAt camera + multi-object domain transforms + mouse interaction.
```glsl
// === Matrix Transforms & Camera - Complete Template ===
#define PI 3.14159265
#define MAX_STEPS 128
#define MAX_DIST 50.0
#define SURF_DIST 0.001
#define FOCAL_LENGTH 2.0
#define CAM_DIST 6.0
#define AUTO_SPEED 0.4
// ---------- Rotation Matrix Utilities ----------
mat2 rot2D(float a) {
float c = cos(a), s = sin(a);
return mat2(c, s, -s, c);
}
mat3 rotX(float a) {
float s = sin(a), c = cos(a);
return mat3(1,0,0, 0,c,s, 0,-s,c);
}
mat3 rotY(float a) {
float s = sin(a), c = cos(a);
return mat3(c,0,s, 0,1,0, -s,0,c);
}
mat3 rotZ(float a) {
float s = sin(a), c = cos(a);
return mat3(c,s,0, -s,c,0, 0,0,1);
}
mat3 rotAxis(vec3 axis, float angle) {
axis = normalize(axis);
float s = sin(angle), c = cos(angle), oc = 1.0 - c;
return mat3(
oc*axis.x*axis.x+c, oc*axis.x*axis.y-axis.z*s, oc*axis.z*axis.x+axis.y*s,
oc*axis.x*axis.y+axis.z*s, oc*axis.y*axis.y+c, oc*axis.y*axis.z-axis.x*s,
oc*axis.z*axis.x-axis.y*s, oc*axis.y*axis.z+axis.x*s, oc*axis.z*axis.z+c
);
}
// ---------- LookAt Camera ----------
mat3 setCamera(vec3 ro, vec3 ta, float cr) {
vec3 cw = normalize(ta - ro);
vec3 cp = vec3(sin(cr), cos(cr), 0.0);
vec3 cu = normalize(cross(cw, cp));
vec3 cv = normalize(cross(cu, cw));
return mat3(cu, cv, cw);
}
// ---------- SDF Primitives ----------
float sdSphere(vec3 p, float r) { return length(p) - r; }
float sdBox(vec3 p, vec3 b) {
vec3 q = abs(p) - b;
return length(max(q, 0.0)) + min(max(q.x, max(q.y, q.z)), 0.0);
}
float sdTorus(vec3 p, vec2 t) {
vec2 q = vec2(length(p.xz) - t.x, p.y);
return length(q) - t.y;
}
// ---------- Scene (Domain Transform Demo) ----------
float map(vec3 p) {
float d = p.y + 1.0; // Ground plane
// Static sphere
d = min(d, sdSphere(p, 0.5));
// Rotating box (spinning around Y-axis)
vec3 p2 = p - vec3(2.5, 0.0, 0.0);
p2 = rotY(iTime * 0.8) * p2;
d = min(d, sdBox(p2, vec3(0.6)));
// Arbitrary-axis rotating torus
vec3 p3 = p - vec3(-2.5, 0.5, 0.0);
p3 = rotAxis(vec3(1,1,0), iTime * 0.6) * p3;
d = min(d, sdTorus(p3, vec2(0.6, 0.2)));
// Scaled + rotated sphere
vec3 p4 = p - vec3(0.0, 0.5, 2.5);
p4 = rotZ(iTime * 1.2) * rotX(iTime * 0.7) * p4;
float scale = 1.5;
d = min(d, sdSphere(p4 / scale, 0.4) * scale);
return d;
}
// ---------- Normal ----------
vec3 calcNormal(vec3 p) {
vec2 e = vec2(0.001, 0.0);
return normalize(vec3(
map(p + e.xyy) - map(p - e.xyy),
map(p + e.yxy) - map(p - e.yxy),
map(p + e.yyx) - map(p - e.yyx)
));
}
// ---------- Ray March ----------
float rayMarch(vec3 ro, vec3 rd) {
float t = 0.0;
for (int i = 0; i < MAX_STEPS; i++) {
vec3 p = ro + rd * t;
float d = map(p);
if (d < SURF_DIST) break;
t += d;
if (t > MAX_DIST) break;
}
return t;
}
// ---------- Main Function ----------
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
vec2 uv = (2.0 * fragCoord - iResolution.xy) / iResolution.y;
// Mouse-interactive orbit camera
float angleH, angleV;
if (iMouse.z > 0.0) {
vec2 m = iMouse.xy / iResolution.xy;
angleH = m.x * 2.0 * PI;
angleV = (m.y - 0.5) * PI;
} else {
angleH = iTime * AUTO_SPEED;
angleV = 0.35;
}
vec3 ro = vec3(
CAM_DIST * cos(angleH) * cos(angleV),
CAM_DIST * sin(angleV) + 1.0,
CAM_DIST * sin(angleH) * cos(angleV)
);
vec3 ta = vec3(0.0);
mat3 cam = setCamera(ro, ta, 0.0);
vec3 rd = cam * normalize(vec3(uv, FOCAL_LENGTH));
float t = rayMarch(ro, rd);
vec3 col = vec3(0.0);
if (t < MAX_DIST) {
vec3 p = ro + rd * t;
vec3 n = calcNormal(p);
vec3 lightDir = normalize(vec3(1.0, 2.0, -1.0));
float diff = max(dot(n, lightDir), 0.0);
col = vec3(0.8, 0.85, 0.9) * (diff + 0.15);
if (p.y < -0.99) {
float checker = mod(floor(p.x) + floor(p.z), 2.0);
col *= 0.5 + 0.3 * checker;
}
} else {
col = vec3(0.4, 0.6, 0.9) - rd.y * 0.3;
}
col = pow(col, vec3(0.4545));
fragColor = vec4(col, 1.0);
}
```
## Common Variants
### Orthographic Projection Camera
```glsl
#define ORTHO_SIZE 5.0
mat3 cam = setCamera(ro, ta, 0.0);
vec3 rd = cam * vec3(0.0, 0.0, 1.0); // Fixed direction
ro += cam * vec3(uv * ORTHO_SIZE, 0.0); // Offset origin
```
### Euler Angle Full Rotation Camera
```glsl
vec3 ang = vec3(pitch, yaw, roll);
mat3 rot = fromEuler(ang);
vec3 ori = vec3(0.0, 0.0, 3.0) * rot;
vec3 rd = normalize(vec3(uv, -2.0)) * rot;
```
### Quaternion Joint Rotation
```glsl
vec3 legP = quatRotate(p - hipOffset, vec3(1,0,0), legAngle);
float dLeg = sdEllipsoid(legP, vec3(0.2, 0.6, 0.25));
```
### mat4 SRT Pipeline
```glsl
mat4 Rot4Y(float a) {
float c = cos(a), s = sin(a);
return mat4(c,0,s,0, 0,1,0,0, -s,0,c,0, 0,0,0,1);
}
mat4 xform = Rot4Y(angle) * Loc4(vec3(3.0, 0.0, 0.0));
float d = sdBox(opTx(p, xform), boxSize);
```
### Path Camera (Animated Flight)
```glsl
vec2 pathCenter(float z) {
return vec2(sin(z * 0.17) * 3.0, sin(z * 0.1 + 4.0) * 2.0);
}
float z_offset = iTime * 10.0;
vec3 camPos = vec3(pathCenter(z_offset), 0.0);
vec3 camTarget = vec3(pathCenter(z_offset + 5.0), 5.0);
mat4 viewToWorld = LookAt(camPos, camTarget, camUp);
vec3 rd = (viewToWorld * normalize(vec4(uv, 1.0, 0.0))).xyz;
```
## Performance & Composition
**Performance**:
- Compute `sin/cos` of the same angle only once: `vec2 sc = sin(vec2(a, a + 1.5707963));`
- Use `mat3` instead of `mat4` for pure rotation (saves 7 multiply-adds)
- Inverse of orthogonal rotation matrix = transpose; use `transpose(m)` or `v * m`
- Pre-compute matrices that don't depend on `p` outside `map()`
- Pre-multiply multiple rotations into a single matrix
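The sin/cos phase-shift trick above is easy to verify outside the shader; a minimal JavaScript sketch (the `sinCos` helper name is illustrative):

```javascript
// Verify the single-call trick used in GLSL:
// vec2 sc = sin(vec2(a, a + 1.5707963)) yields (sin a, cos a),
// because sin(a + pi/2) == cos(a).
function sinCos(a) {
  return [Math.sin(a), Math.sin(a + Math.PI / 2)];
}

const angle = 1.234;
const [s, c] = sinCos(angle);
console.log(Math.abs(s - Math.sin(angle)) < 1e-9); // true
console.log(Math.abs(c - Math.cos(angle)) < 1e-9); // true
```

In GLSL this folds the two transcendentals into a single vectorized `sin()` call, which is where the saving comes from.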
**Composition**:
- **SDF / Ray Marching**: Camera generates rays + domain transforms place objects (fundamental pipeline)
- **Noise / fBm**: Rotate sampling coordinates to break axis-aligned regularity `fbm(rot * p)`
- **Fractals / IFS**: Embed rotation in iterations to create complex geometry
- **Lighting**: Normal transform for pure rotation matrices is the same as vertex transform
- **Post-Processing**: FOV for depth of field; `mat2` for chromatic aberration/motion blur direction
## Further Reading
For complete step-by-step tutorials, mathematical derivations, and advanced usage, see [reference](../reference/matrix-transform.md).

### Standalone HTML Complete Shader Template (Must Be Strictly Followed)
**IMPORTANT: The following templates can be copied directly; follow every line exactly**:
**Vertex Shader** (common to all shaders):
```glsl
#version 300 es
in vec4 iPosition;
void main() {
gl_Position = iPosition;
}
```
**Fragment Shader Buffer A Example** (particle physics simulation):
```glsl
#version 300 es
precision highp float;
// IMPORTANT: Critical: uniforms must be declared manually; ShaderToy injects iTime/iResolution etc. automatically
uniform float iTime;
uniform vec2 iResolution;
uniform int iFrame;
uniform vec4 iMouse;
uniform sampler2D iChannel0; // IMPORTANT: required by the texture(iChannel0, ...) call below
// IMPORTANT: Critical: mainImage parameters need manual extraction
// ShaderToy: void mainImage(out vec4 fragColor, in vec2 fragCoord)
// Adapted to:
out vec4 fragColor;
void main() {
vec2 fragCoord = gl_FragCoord.xy;
vec2 uv = fragCoord / iResolution;
// IMPORTANT: Critical: texture2D → texture
vec4 prev = texture(iChannel0, uv);
// ... particle physics logic ...
fragColor = vec4(pos, vel);
}
```
**Fragment Shader Image Example**:
```glsl
#version 300 es
precision highp float;
uniform float iTime;
uniform vec2 iResolution;
uniform int iFrame;
uniform vec4 iMouse;
uniform sampler2D iChannel0;
out vec4 fragColor;
void main() {
vec2 fragCoord = gl_FragCoord.xy;
vec2 uv = fragCoord / iResolution;
// IMPORTANT: Critical: texture2D → texture, mainImage → standard main
vec4 col = texture(iChannel0, uv);
// Rendering logic
col = col / (1.0 + col); // Tone mapping
fragColor = col;
}
```
**IMPORTANT: Common GLSL ES 3.00 Errors** (must be avoided):
1. **#version must be on the first line** - Any comments/blank lines will cause "version directive must occur on the first line" error
2. **in/out qualifiers** - WebGL1's attribute/varying must be changed to in/out in ES3
3. **texture function** - ES3 uses `texture(sampler, uv)`, not `texture2D(sampler, uv)`
4. **Type strictness** - `vec4 = float` is illegal, must use `vec4(v, v, v, v)` or `vec4(v)` or `vec4(vec3(v), 1.0)`
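The four pitfalls above can be avoided mechanically when adapting ShaderToy sources; a minimal sketch of such an adapter (the `portToES3` name and the fixed uniform set are illustrative assumptions, not part of the template above):

```javascript
// Minimal ShaderToy -> GLSL ES 3.00 source adapter (sketch).
// Prepends the mandatory #version line (must be line 1), declares the
// standard uniforms, renames texture2D -> texture, and wraps mainImage.
function portToES3(shaderToySource) {
  const header = [
    "#version 300 es",
    "precision highp float;",
    "uniform float iTime;",
    "uniform vec2 iResolution;",
    "uniform int iFrame;",
    "uniform vec4 iMouse;",
    "uniform sampler2D iChannel0;",
    "out vec4 fragColor;",
  ].join("\n");
  const body = shaderToySource.replace(/\btexture2D\b/g, "texture");
  const main = "void main() { mainImage(fragColor, gl_FragCoord.xy); }";
  return header + "\n" + body + "\n" + main + "\n";
}
```

The adapter only covers a single-channel pass; passes that bind additional iChannels still need their extra samplers declared by hand.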
## Standalone HTML Multi-Channel Framebuffer Implementation
**IMPORTANT: Multi-Channel Rendering Pipeline Core Pitfalls**: Running ShaderToy code standalone requires implementing the framebuffer rendering pipeline by hand. The following template demonstrates the correct approach:
```javascript
// Correct multi-channel Framebuffer creation
const NUM_BUFFERS = 2; // Buffer A, Buffer B
const buffers = [];
const textures = [];
// Check float texture linear filtering extension
const ext = gl.getExtension('EXT_color_buffer_float');
const floatLinear = gl.getExtension('OES_texture_float_linear');
// Each Buffer needs an independent Framebuffer + texture
for (let i = 0; i < NUM_BUFFERS; i++) {
const texture = gl.createTexture();
gl.bindTexture(gl.TEXTURE_2D, texture);
// IMPORTANT: Critical: Must use UNSIGNED_BYTE format without EXT_color_buffer_float extension!
// RGBA16F/RGBA32F require the extension, otherwise GL_INVALID_OPERATION
// Float textures need EXT_color_buffer_float; RGBA16F supports HDR data
if (ext) {
gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA16F, width, height, 0, gl.RGBA, gl.FLOAT, null);
} else {
gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA, width, height, 0, gl.RGBA, gl.UNSIGNED_BYTE, null);
}
// IMPORTANT: Critical: Texture parameters must be set, otherwise GL_INVALID_FRAMEBUFFER
// IMPORTANT: Float textures need NEAREST filtering, or the OES_texture_float_linear extension for LINEAR
// IMPORTANT: Critical: Float textures must use CLAMP_TO_EDGE wrap mode; REPEAT is not supported for float textures
const filterMode = (ext && floatLinear) ? gl.LINEAR : gl.NEAREST;
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MIN_FILTER, filterMode);
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MAG_FILTER, filterMode);
// IMPORTANT: Must use CLAMP_TO_EDGE: float textures do not support REPEAT
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_S, gl.CLAMP_TO_EDGE);
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_T, gl.CLAMP_TO_EDGE);
const fbo = gl.createFramebuffer();
gl.bindFramebuffer(gl.FRAMEBUFFER, fbo);
gl.framebufferTexture2D(gl.FRAMEBUFFER, gl.COLOR_ATTACHMENT0, gl.TEXTURE_2D, texture, 0);
// IMPORTANT: Critical: Check Framebuffer completeness
const status = gl.checkFramebufferStatus(gl.FRAMEBUFFER);
if (status !== gl.FRAMEBUFFER_COMPLETE) {
console.error("Framebuffer incomplete:", status);
}
textures.push(texture);
buffers.push(fbo);
}
gl.bindFramebuffer(gl.FRAMEBUFFER, null);
// Render loop: render to Buffer first, then render to screen
function render() {
// 1. Render to Buffer A (self-feedback reads previous Buffer)
gl.bindFramebuffer(gl.FRAMEBUFFER, buffers[0]);
gl.viewport(0, 0, width, height);
// Bind previous frame texture to iChannel0
gl.activeTexture(gl.TEXTURE0);
gl.bindTexture(gl.TEXTURE_2D, textures[1]); // Read from other Buffer
// Set uniforms etc...
// Execute shader rendering
    // 2. Render to screen using the buffer just written
    gl.bindFramebuffer(gl.FRAMEBUFFER, null);
    // Bind Buffer A's fresh result (the texture attached to buffers[0])
    gl.activeTexture(gl.TEXTURE0);
    gl.bindTexture(gl.TEXTURE_2D, textures[0]);
    // Execute Image pass shader
    // 3. Swap the FBO/texture PAIRS at end of frame (simulate self-feedback)
    // IMPORTANT: Critical: Swap FBOs and textures together so each FBO stays paired with
    // its attached texture; swapping only the texture array makes the next frame sample
    // the very texture attached to the bound FBO ("Feedback loop" error).
    [buffers[0], buffers[1]] = [buffers[1], buffers[0]];
    [textures[0], textures[1]] = [textures[1], textures[0]];
}
```
**IMPORTANT: Common Errors** (JavaScript/WebGL side):
1. **Missing texture parameters** - Must set `TEXTURE_MIN_FILTER`, `TEXTURE_MAG_FILTER`, `TEXTURE_WRAP_S/T`
2. **Missing Framebuffer completeness check** - `gl.checkFramebufferStatus()` must return `FRAMEBUFFER_COMPLETE` before use
3. **Float texture extension** - `gl.RGBA16F` requires `EXT_color_buffer_float` extension, otherwise fall back to `gl.UNSIGNED_BYTE`
4. **Buffer ping-pong error** - Self-feedback must use 2 independent FBOs alternating read/write; a single FBO + texture swap causes "Feedback loop" error
5. **Particle system empty texture initialization** - Textures are empty before the first frame; shaders reading default values cause render failure — must execute initPass() to pre-render
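Error 4 reduces to index bookkeeping that can be tested without a GL context; a sketch (the `makePingPong` helper name is illustrative):

```javascript
// Ping-pong bookkeeping for a self-feedback buffer: two FBO/texture
// pairs; each frame writes the pair NOT being read, then the roles flip.
// The invariant read !== write on every frame is what rules out the
// "Feedback loop" error structurally.
function makePingPong() {
  let readIdx = 0;
  return {
    frame() {
      const writeIdx = 1 - readIdx;
      const pair = { read: readIdx, write: writeIdx };
      readIdx = writeIdx; // next frame reads what was just written
      return pair;
    },
  };
}
```

On each frame, bind `buffers[pair.write]` as the render target and `textures[pair.read]` as iChannel0.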
# Multi-Pass Buffer Techniques
## Use Cases
When single-frame computation cannot achieve the desired effect and cross-frame data persistence or multi-stage processing pipelines are needed, use multi-pass buffers:
- **Temporal accumulation**: Motion blur, TAA, progressive rendering
- **Physics simulation**: Fluids, reaction-diffusion, particle systems
- **Persistent state**: Game state, particle positions/velocities, interaction history
- **Deferred rendering**: G-Buffer → post-processing → compositing
- **Post-processing chains**: HDR Bloom (downsample → blur → composite)
- **Iterative solvers**: Poisson solver, vorticity confinement, multi-scale computation
## Core Principles
Multi-pass buffers split the rendering pipeline into multiple Buffers, each outputting a texture as input for the next stage.
### Self-Feedback
A Buffer reads its own previous frame output, achieving cross-frame state persistence: `x(n+1) = f(x(n))`
```
Buffer A (frame N) reads → Buffer A (frame N-1) output
```
### Pipeline Chaining
Multiple Buffers process in sequence:
```
Buffer A (geometry) → Buffer B (blur H) → Buffer C (blur V) → Image (compositing)
```
### Structured Data Storage
Specific pixels serve as data registers, read precisely via `texelFetch`:
```
texel (0,0) = ball position+velocity (vec4)
texel (1,0) = paddle position
texel (x,1)-(x,12) = brick grid state
```
### Key Mathematical Patterns
- **Fluid self-advection**: `newPos = texture(buf, uv - dt * velocity * texelSize)`
- **Gaussian blur**: `sum += texture(buf, uv + offset_i) * weight_i`
- **Temporal blending**: `result = mix(newFrame, prevFrame, blendWeight)`
- **Vorticity confinement**: `vortForce = curl × normalize(gradient(|curl|))`
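The temporal-blending pattern in the list above is an exponential moving average; a JavaScript sketch of its convergence behavior (function names are illustrative):

```javascript
// GLSL mix(a, b, t) = a*(1-t) + b*t. With result = mix(newFrame, prev, w),
// the history is an exponential moving average with retention w per frame:
// after n frames of a constant input, the remaining error is w^n.
function mix(a, b, t) { return a * (1 - t) + b * t; }

function accumulate(target, frames, w, start) {
  let history = start;
  for (let i = 0; i < frames; i++) history = mix(target, history, w);
  return history;
}
// With w = 0.9, 90 frames shrink the initial error by 0.9^90 (about 8e-5).
```

Larger `w` keeps more history (smoother, but laggier); the TAA variant later in this document clamps the history to a color neighborhood to limit exactly this lag.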
## Implementation Steps
### Step 1: Minimal Self-Feedback Loop
Buffer A (iChannel0 → Buffer A self-feedback):
```glsl
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
vec2 uv = fragCoord / iResolution.xy;
vec4 prev = texture(iChannel0, uv);
// New content: procedural noise contour lines
    float n = noise(vec3(uv * 8.0, 0.1 * iTime)); // noise() assumed defined in the Common tab
float v = sin(6.2832 * 10.0 * n);
v = smoothstep(1.0, 0.0, 0.5 * abs(v) / fwidth(v));
vec4 newContent = 0.5 + 0.5 * sin(12.0 * n + vec4(0, 2.1, -2.1, 0));
// Decay + offset blending
vec4 decayed = exp(-33.0 / iResolution.y) * texture(iChannel0, (fragCoord + vec2(1.0, sin(iTime))) / iResolution.xy);
fragColor = mix(decayed, newContent, v);
// Initialization guard
if (iFrame < 4) fragColor = vec4(0.5);
}
```
Image (iChannel0 → Buffer A):
```glsl
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
fragColor = texture(iChannel0, fragCoord / iResolution.xy);
}
```
### Step 2: Fluid Self-Advection
Buffer A (iChannel0 → Buffer A self-feedback):
```glsl
#define ROT_NUM 5
#define SCALE_NUM 20
const float ang = 6.2832 / float(ROT_NUM);
mat2 m = mat2(cos(ang), sin(ang), -sin(ang), cos(ang));
float getRot(vec2 pos, vec2 b) {
vec2 p = b;
float rot = 0.0;
for (int i = 0; i < ROT_NUM; i++) {
rot += dot(texture(iChannel0, fract((pos + p) / iResolution.xy)).xy - vec2(0.5),
p.yx * vec2(1, -1));
p = m * p;
}
return rot / float(ROT_NUM) / dot(b, b);
}
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
vec2 pos = fragCoord;
float rnd = fract(sin(float(iFrame) * 12.9898) * 43758.5453);
vec2 b = vec2(cos(ang * rnd), sin(ang * rnd));
// Multi-scale rotation sampling
vec2 v = vec2(0);
float bbMax = 0.7 * iResolution.y;
bbMax *= bbMax;
for (int l = 0; l < SCALE_NUM; l++) {
if (dot(b, b) > bbMax) break;
vec2 p = b;
for (int i = 0; i < ROT_NUM; i++) {
v += p.yx * getRot(pos + p, b);
p = m * p;
}
b *= 2.0;
}
// Self-advection
fragColor = texture(iChannel0, fract((pos + v * vec2(-1, 1) * 2.0) / iResolution.xy));
// Center driving force
vec2 scr = (fragCoord / iResolution.xy) * 2.0 - 1.0;
fragColor.xy += 0.01 * scr / (dot(scr, scr) / 0.1 + 0.3);
if (iFrame <= 4) fragColor = texture(iChannel1, fragCoord / iResolution.xy);
}
```
### Step 3-4: Navier-Stokes Solver + Chained Acceleration
Buffer A / B / C use identical code (via Common tab's `solveFluid`):
```glsl
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
vec2 uv = fragCoord / iResolution.xy;
vec2 w = 1.0 / iResolution.xy;
vec4 lastMouse = texelFetch(iChannel0, ivec2(0, 0), 0);
vec4 data = solveFluid(iChannel0, uv, w, iTime, iMouse.xyz, lastMouse.xyz);
if (iFrame < 20) data = vec4(0.5, 0, 0, 0);
if (fragCoord.y < 1.0) data = iMouse; // Mouse state storage
fragColor = data;
}
```
iChannel bindings: A→C(prev frame), B→A, C→B — 3 iterations per frame.
### Step 5: Separable Gaussian Blur
Buffer B (horizontal, iChannel0 → source Buffer) — Buffer C vertical direction is analogous, using y-axis offset:
```glsl
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
vec2 pixelSize = 1.0 / iResolution.xy;
vec2 uv = fragCoord * pixelSize;
float h = pixelSize.x;
vec4 sum = vec4(0.0);
// 9-tap Gaussian (sigma ≈ 2.0)
sum += texture(iChannel0, fract(vec2(uv.x - 4.0*h, uv.y))) * 0.05;
sum += texture(iChannel0, fract(vec2(uv.x - 3.0*h, uv.y))) * 0.09;
sum += texture(iChannel0, fract(vec2(uv.x - 2.0*h, uv.y))) * 0.12;
sum += texture(iChannel0, fract(vec2(uv.x - 1.0*h, uv.y))) * 0.15;
sum += texture(iChannel0, fract(vec2(uv.x, uv.y))) * 0.16;
sum += texture(iChannel0, fract(vec2(uv.x + 1.0*h, uv.y))) * 0.15;
sum += texture(iChannel0, fract(vec2(uv.x + 2.0*h, uv.y))) * 0.12;
sum += texture(iChannel0, fract(vec2(uv.x + 3.0*h, uv.y))) * 0.09;
sum += texture(iChannel0, fract(vec2(uv.x + 4.0*h, uv.y))) * 0.05;
fragColor = vec4(sum.xyz / 0.98, 1.0);
}
```
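The `/ 0.98` divisor in the template above is not arbitrary: it is the sum of the nine tap weights, checked here in plain JavaScript (no GL needed):

```javascript
// The 9 tap weights of the blur above sum to 0.98, which is why the
// template divides the result by 0.98: it renormalizes the kernel so the
// blur preserves overall brightness instead of darkening each pass.
const weights = [0.05, 0.09, 0.12, 0.15, 0.16, 0.15, 0.12, 0.09, 0.05];
const sum = weights.reduce((a, w) => a + w, 0);

function normalized(ws) {
  const s = ws.reduce((a, w) => a + w, 0);
  return ws.map((w) => w / s);
}
```

Without the renormalization, each pass would lose about 2% brightness, and a chained horizontal+vertical blur about 4%.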
### Step 6: Structured State Storage
```glsl
// Register address definitions
const ivec2 txBallPosVel = ivec2(0, 0);
const ivec2 txPaddlePos = ivec2(1, 0);
const ivec2 txPoints = ivec2(2, 0);
const ivec2 txState = ivec2(3, 0);
const ivec4 txBricks = ivec4(0, 1, 13, 12);
vec4 loadValue(ivec2 addr) {
return texelFetch(iChannel0, addr, 0);
}
void storeValue(ivec2 addr, vec4 val, inout vec4 fragColor, ivec2 currentPixel) {
fragColor = (currentPixel == addr) ? val : fragColor;
}
void storeValue(ivec4 rect, vec4 val, inout vec4 fragColor, ivec2 currentPixel) {
fragColor = (currentPixel.x >= rect.x && currentPixel.y >= rect.y &&
currentPixel.x <= rect.z && currentPixel.y <= rect.w) ? val : fragColor;
}
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
ivec2 px = ivec2(fragCoord - 0.5);
if (fragCoord.x > 14.0 || fragCoord.y > 14.0) discard;
vec4 ballPosVel = loadValue(txBallPosVel);
float paddlePos = loadValue(txPaddlePos).x;
float points = loadValue(txPoints).x;
if (iFrame == 0) {
ballPosVel = vec4(0.0, -0.8, 0.6, 1.0);
paddlePos = 0.0;
points = 0.0;
}
// ... game logic update ...
fragColor = loadValue(px);
storeValue(txBallPosVel, ballPosVel, fragColor, px);
storeValue(txPaddlePos, vec4(paddlePos, 0, 0, 0), fragColor, px);
storeValue(txPoints, vec4(points, 0, 0, 0), fragColor, px);
}
```
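The load/store register pattern above can be emulated on the CPU, which is useful for unit-testing game logic before moving it into the shader; a sketch (array layout and names are illustrative):

```javascript
// CPU emulation of the texel-register pattern: a small "state texture"
// stored as a flat array of RGBA cells, addressed by integer pixel
// coordinates. Mirrors loadValue/storeValue in the GLSL above.
const W = 14, H = 14;
const state = new Float32Array(W * H * 4);

function loadValue(x, y) {
  const i = (y * W + x) * 4;
  return state.slice(i, i + 4); // one vec4 register
}

function storeValue(x, y, val) {
  state.set(val, (y * W + x) * 4);
}
```

This mirrors `texelFetch` addressing: one RGBA cell per register, read and written by exact integer coordinates rather than filtered UVs.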
### Step 7: Mouse State Inter-Frame Tracking
```glsl
// Method 1: First-row pixel storage
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
vec2 uv = fragCoord / iResolution.xy;
vec2 w = 1.0 / iResolution.xy;
vec4 lastMouse = texelFetch(iChannel0, ivec2(0, 0), 0);
// ... simulation logic ...
if (fragCoord.y < 1.0) fragColor = iMouse;
}
// Method 2: Fixed UV region storage
vec2 mouseDelta() {
vec2 pixelSize = 1.0 / iResolution.xy;
float eighth = 1.0 / 8.0;
vec4 oldMouse = texture(iChannel2, vec2(7.5 * eighth, 2.5 * eighth));
vec4 nowMouse = vec4(iMouse.xy / iResolution.xy, iMouse.zw / iResolution.xy);
if (oldMouse.z > pixelSize.x && oldMouse.w > pixelSize.y &&
nowMouse.z > pixelSize.x && nowMouse.w > pixelSize.y) {
return nowMouse.xy - oldMouse.xy;
}
return vec2(0.0);
}
```
## Complete Code Template
A fully runnable fluid simulation shader (self-feedback + vorticity confinement + mouse interaction + color advection).
### Common tab
```glsl
#define DT 0.15
#define VORTICITY_AMOUNT 0.11
#define VISCOSITY 0.55
#define PRESSURE_K 0.2
#define FORCE_RADIUS 0.001
#define FORCE_STRENGTH 0.001
#define VELOCITY_DECAY 1e-4
float mag2(vec2 p) { return dot(p, p); }
vec2 emitter1(float t) { t *= 0.62; return vec2(0.12, 0.5 + sin(t) * 0.2); }
vec2 emitter2(float t) { t *= 0.62; return vec2(0.88, 0.5 + cos(t + 1.5708) * 0.2); }
vec4 solveFluid(sampler2D smp, vec2 uv, vec2 w, float time, vec3 mouse, vec3 lastMouse) {
vec4 data = textureLod(smp, uv, 0.0);
vec4 tr = textureLod(smp, uv + vec2(w.x, 0), 0.0);
vec4 tl = textureLod(smp, uv - vec2(w.x, 0), 0.0);
vec4 tu = textureLod(smp, uv + vec2(0, w.y), 0.0);
vec4 td = textureLod(smp, uv - vec2(0, w.y), 0.0);
vec3 dx = (tr.xyz - tl.xyz) * 0.5;
vec3 dy = (tu.xyz - td.xyz) * 0.5;
vec2 densDif = vec2(dx.z, dy.z);
data.z -= DT * dot(vec3(densDif, dx.x + dy.y), data.xyz);
vec2 laplacian = tu.xy + td.xy + tr.xy + tl.xy - 4.0 * data.xy;
vec2 viscForce = vec2(VISCOSITY) * laplacian;
data.xyw = textureLod(smp, uv - DT * data.xy * w, 0.0).xyw;
vec2 newForce = vec2(0);
newForce += 0.75 * vec2(0.0003, 0.00015) / (mag2(uv - emitter1(time)) + 0.0001);
newForce -= 0.75 * vec2(0.0003, 0.00015) / (mag2(uv - emitter2(time)) + 0.0001);
if (mouse.z > 1.0 && lastMouse.z > 1.0) {
vec2 vv = clamp((mouse.xy * w - lastMouse.xy * w) * 400.0, -6.0, 6.0);
newForce += FORCE_STRENGTH / (mag2(uv - mouse.xy * w) + FORCE_RADIUS) * vv;
}
data.xy += DT * (viscForce - PRESSURE_K / DT * densDif + newForce);
data.xy = max(vec2(0), abs(data.xy) - VELOCITY_DECAY) * sign(data.xy);
data.w = (tr.y - tl.y - tu.x + td.x);
vec2 vort = vec2(abs(tu.w) - abs(td.w), abs(tl.w) - abs(tr.w));
vort *= VORTICITY_AMOUNT / length(vort + 1e-9) * data.w;
data.xy += vort;
data.y *= smoothstep(0.5, 0.48, abs(uv.y - 0.5));
data = clamp(data, vec4(vec2(-10), 0.5, -10.0), vec4(vec2(10), 3.0, 10.0));
return data;
}
```
### Buffer A / B / C (Fluid Sub-Steps 1/2/3)
iChannel bindings: A←C(prev frame), B←A, C←B
```glsl
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
vec2 uv = fragCoord / iResolution.xy;
vec2 w = 1.0 / iResolution.xy;
vec4 lastMouse = texelFetch(iChannel0, ivec2(0, 0), 0);
vec4 data = solveFluid(iChannel0, uv, w, iTime, iMouse.xyz, lastMouse.xyz);
if (iFrame < 20) data = vec4(0.5, 0, 0, 0);
if (fragCoord.y < 1.0) data = iMouse;
fragColor = data;
}
```
### Buffer D (Color Advection, iChannel0 → Buffer C, iChannel1 → Buffer D self-feedback)
```glsl
#define COLOR_DECAY 0.004
#define COLOR_ADVECT_SCALE 3.0
vec3 getPalette(float x, vec3 c1, vec3 c2, vec3 p1, vec3 p2) {
float x2 = fract(x / 2.0);
x = fract(x);
mat3 m = mat3(c1, p1, c2);
mat3 m2 = mat3(c2, p2, c1);
float omx = 1.0 - x;
vec3 pws = vec3(omx * omx, 2.0 * omx * x, x * x);
return clamp(mix(m * pws, m2 * pws, step(x2, 0.5)), 0.0, 1.0);
}
vec4 palette1(float x) {
return vec4(getPalette(-x, vec3(0.2, 0.5, 0.7), vec3(0.9, 0.4, 0.1),
vec3(1.0, 1.2, 0.5), vec3(1.0, -0.4, 0.0)), 1.0);
}
vec4 palette2(float x) {
return vec4(getPalette(-x, vec3(0.4, 0.3, 0.5), vec3(0.9, 0.75, 0.4),
vec3(0.1, 0.8, 1.3), vec3(1.25, -0.1, 0.1)), 1.0);
}
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
vec2 uv = fragCoord / iResolution.xy;
vec2 w = 1.0 / iResolution.xy;
vec2 velo = textureLod(iChannel0, uv, 0.0).xy;
vec4 col = textureLod(iChannel1, uv - DT * velo * w * COLOR_ADVECT_SCALE, 0.0);
vec2 mo = iMouse.xy / iResolution.xy;
vec4 lastMouse = texelFetch(iChannel1, ivec2(0, 0), 0);
if (iMouse.z > 1.0 && lastMouse.z > 1.0) {
float str = smoothstep(-0.5, 1.0, length(mo - lastMouse.xy / iResolution.xy));
col += str * 0.0009 / (pow(length(uv - mo), 1.7) + 0.002) * palette2(-iTime * 0.7);
}
col += 0.0025 / (0.0005 + pow(length(uv - emitter1(iTime)), 1.75)) * DT * 0.12 * palette1(iTime * 0.05);
col += 0.0025 / (0.0005 + pow(length(uv - emitter2(iTime)), 1.75)) * DT * 0.12 * palette2(iTime * 0.05 + 0.675);
if (iFrame < 20) col = vec4(0.0);
col = clamp(col, 0.0, 5.0);
col = max(col - (0.0001 + col * COLOR_DECAY) * 0.5, 0.0);
if (fragCoord.y < 1.0 && fragCoord.x < 1.0) col = iMouse;
fragColor = col;
}
```
### Image (iChannel0 → Buffer D)
```glsl
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
vec4 col = textureLod(iChannel0, fragCoord / iResolution.xy, 0.0);
if (fragCoord.y < 1.0 || fragCoord.y >= iResolution.y - 1.0) col = vec4(0);
fragColor = col;
}
```
## Common Variants
### Variant 1: TAA Temporal Accumulation Anti-Aliasing
```glsl
// Buffer A: Sub-pixel jittered rendering
vec2 jitter = vec2(rand(uv + sin(iTime)), rand(uv + 1.0 + sin(iTime))) / iResolution.xy;
vec3 eyevec = normalize(vec3(((uv + jitter) * 2.0 - 1.0) * vec2(aspect, 1.0), fov));
float blendWeight = 0.9;
color = mix(color, texture(iChannel_self, uv).rgb, blendWeight);
// Buffer C (TAA): YCoCg neighborhood clamping to prevent ghosting
vec3 newYCC = RGBToYCoCg(newFrame);
vec3 histYCC = RGBToYCoCg(history);
vec3 colorAvg = ...; vec3 colorVar = ...;
vec3 sigma = sqrt(max(vec3(0), colorVar - colorAvg * colorAvg));
histYCC = clamp(histYCC, colorAvg - 0.75 * sigma, colorAvg + 0.75 * sigma);
result = YCoCgToRGB(mix(newYCC, histYCC, 0.95));
```
### Variant 2: Deferred Rendering G-Buffer
```glsl
// Buffer A: G-Buffer output
col.xy = (normal * camMat * 0.5 + 0.5).xy; // Normal
col.z = 1.0 - abs((t * rd) * camMat).z / DMAX; // Depth
col.w = dot(lightDir, nor) * 0.5 + 0.5; // Diffuse
// Buffer B: Edge detection
float checkSame(vec4 center, vec4 samp) { // renamed: "sample" is a reserved word in GLSL ES 3.00
    vec2 diffNormal = abs(center.xy - samp.xy) * Sensitivity.x;
    float diffDepth = abs(center.z - samp.z) * Sensitivity.y;
return (diffNormal.x + diffNormal.y < 0.1 && diffDepth < 0.1) ? 1.0 : 0.0;
}
```
### Variant 3: HDR Bloom
```glsl
// Buffer B: MIP pyramid (multi-level downsampling packed into one texture)
vec2 CalcOffset(float octave) {
vec2 offset = vec2(0);
vec2 padding = vec2(10.0) / iResolution.xy;
offset.x = -min(1.0, floor(octave / 3.0)) * (0.25 + padding.x);
offset.y = -(1.0 - 1.0 / exp2(octave)) - padding.y * octave;
offset.y += min(1.0, floor(octave / 3.0)) * 0.35;
return offset;
}
// Image: Accumulate multi-level bloom + Reinhard tone mapping
bloom += Grab(coord, 1.0, CalcOffset(0.0)) * 1.0;
bloom += Grab(coord, 2.0, CalcOffset(1.0)) * 1.5;
color = pow(color, vec3(1.5));
color = color / (1.0 + color);
```
### Variant 4: Reaction-Diffusion System
```glsl
// Buffer A: Gray-Scott reaction-diffusion
vec2 uv_red = uv + vec2(dx.x, dy.x) * pixelSize * 8.0;
float new_val = texture(iChannel0, fract(uv_red)).x;
new_val += (noise.x - 0.5) * 0.0025 - 0.002;
new_val -= (texture(iChannel_blur, fract(uv_red)).x -
texture(iChannel_self, fract(uv_red)).x) * 0.047;
```
### Variant 5: Multi-Scale MIP Fluid
```glsl
for (int i = 0; i < NUM_SCALES; i++) {
float mip = float(i);
float stride = float(1 << i);
vec4 t = stride * vec4(texel, -texel.y, 0);
vec2 d = textureLod(sampler, fract(uv + t.ww), mip).xy;
float w = WEIGHT_FUNCTION;
result += w * computation(neighbors);
}
```
### Variant 6: Particle System (Position-Velocity Storage)
**IMPORTANT: Particle System Implementation Key**: Particle state is stored in texture pixels, one particle per pixel. Rendering must iterate over the particle texture for sampling.
**Buffer A (Particle Physics Simulation)**:
```glsl
// Each texture pixel stores one particle: xy=position, zw=velocity
// IMPORTANT: Critical: hash function must return vec2! Returning float causes type mismatch errors
vec2 hash2(vec2 p) {
return fract(sin(mat2(127.1, 311.7, 269.5, 183.3) * p) * 43758.5453);
}
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
vec2 uv = fragCoord / iResolution.xy;
vec4 prev = texture(iChannel0, uv);
vec2 pos = prev.xy;
vec2 vel = prev.zw;
// IMPORTANT: Initialization guard: use integer comparison + pixel-coordinate-based random (avoids particle overlap when time is too small)
if (iFrame < 3) {
// Use fragCoord (pixel coordinates) to ensure each particle has a unique position, independent of time
// IMPORTANT: Critical: hash2 returns vec2, assign directly to pos/vel
pos = hash2(fragCoord * 0.01 + vec2(1.7, 9.3));
vel = (hash2(fragCoord * 0.01 + vec2(5.3, 2.8)) - 0.5) * 0.02;
fragColor = vec4(pos, vel);
return;
}
// Physics update
vel *= 0.98; // Damping
// Mouse interaction
vec2 mouse = iMouse.xy / iResolution.xy;
if (iMouse.z > 0.0) {
vec2 toMouse = mouse - pos;
vel += normalize(toMouse + 0.001) * 0.0005 / (length(toMouse) + 0.1);
}
// Motion
pos += vel * 60.0 * 0.016;
pos = fract(pos); // Boundary wrapping
fragColor = vec4(pos, vel);
}
```
**Image (Render Particles)**:
```glsl
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
vec2 uv = fragCoord / iResolution.xy;
vec2 w = 1.0 / iResolution.xy;
vec3 color = vec3(0.02, 0.02, 0.05); // Dark background
// Iterate over particle texture for sampling (performance-sensitive, balance sample count)
float glow = 0.0;
for (float y = 0.0; y < 1.0; y += 0.02) { // IMPORTANT: Step size determines sampling density
for (float x = 0.0; x < 1.0; x += 0.02) {
vec4 particle = texture(iChannel0, vec2(x, y));
vec2 pPos = particle.xy;
float dist = length(uv - pPos);
float size = 0.01 + length(particle.zw) * 0.3;
glow += exp(-dist * dist / (size * size)) * 0.15;
}
}
// Particle glow
color += vec3(0.3, 0.6, 1.0) * glow;
// Vignette
color *= 1.0 - length(uv - 0.5) * 0.8;
// Tone mapping
color = color / (1.0 + color);
fragColor = vec4(color, 1.0);
}
```
**Key Points**:
- Buffer A self-feedback: iChannel0 → Buffer A
- Image reads: iChannel0 → Buffer A (particle state)
- Step size 0.02 produces 2500 samples; adjust based on performance
- Particle size varies with velocity: `size = 0.01 + length(vel) * 0.3`
**Complete JavaScript Rendering Pipeline (Particle System 3-Pass)**:
```javascript
// Particle system needs 4 Framebuffers (2 each for Buffer A and Buffer B ping-pong) + screen output
// Buffer A: Particle physics (self-feedback) - uses FBO 0/1 ping-pong
// Buffer B: Density accumulation (reads Buffer A) - uses FBO 2/3 ping-pong
// Image: Final rendering (reads Buffer A + Buffer B)
// IMPORTANT: Critical: Must use 2 FBOs for ping-pong! Single FBO + texture swap causes
// "Feedback loop formed between Framebuffer and active Texture" error
const buffers = [null, null, null, null]; // [A_FBO0, A_FBO1, B_FBO0, B_FBO1]
const textures = [null, null, null, null]; // [A_tex0, A_tex1, B_tex0, B_tex1]
function createBuffers() {
// Buffer A: 2 FBOs for ping-pong
for (let i = 0; i < 2; i++) {
const tex = createTexture();
textures[i] = tex;
const fbo = gl.createFramebuffer();
gl.bindFramebuffer(gl.FRAMEBUFFER, fbo);
gl.framebufferTexture2D(gl.FRAMEBUFFER, gl.COLOR_ATTACHMENT0, gl.TEXTURE_2D, tex, 0);
buffers[i] = fbo;
}
// Buffer B: 2 FBOs for ping-pong
for (let i = 0; i < 2; i++) {
const tex = createTexture();
textures[2 + i] = tex;
const fbo = gl.createFramebuffer();
gl.bindFramebuffer(gl.FRAMEBUFFER, fbo);
gl.framebufferTexture2D(gl.FRAMEBUFFER, gl.COLOR_ATTACHMENT0, gl.TEXTURE_2D, tex, 0);
buffers[2 + i] = fbo;
}
gl.bindFramebuffer(gl.FRAMEBUFFER, null);
}
// IMPORTANT: Critical: Initialization pre-rendering - must execute before the first frame!
// Empty textures cause particle initialization failure (reading 0,0,0,0 makes all particles overlap)
let aReadIdx = 0; // Current read FBO index (0 or 1)
let bReadIdx = 0; // Buffer B current read FBO index (0 or 1)
function initPass() {
// ===== Buffer A Initialization =====
// Render first frame using FBO 0
gl.bindFramebuffer(gl.FRAMEBUFFER, buffers[0]);
gl.viewport(0, 0, width, height);
gl.useProgram(programBufferA);
setupAttribute(programBufferA);
// Bind FBO 1's texture as input (not yet rendered, but avoids binding errors)
gl.activeTexture(gl.TEXTURE0);
gl.bindTexture(gl.TEXTURE_2D, textures[1]);
gl.uniform1i(gl.getUniformLocation(programBufferA, 'iChannel0'), 0);
gl.uniform2f(gl.getUniformLocation(programBufferA, 'iResolution'), width, height);
gl.uniform1f(gl.getUniformLocation(programBufferA, 'iTime'), 0);
gl.uniform1i(gl.getUniformLocation(programBufferA, 'iFrame'), 0);
gl.uniform4f(gl.getUniformLocation(programBufferA, 'iMouse'), 0, 0, 0, 0);
gl.drawArrays(gl.TRIANGLES, 0, 6);
// Render second frame using FBO 1 (iFrame=1)
gl.bindFramebuffer(gl.FRAMEBUFFER, buffers[1]);
gl.activeTexture(gl.TEXTURE0);
gl.bindTexture(gl.TEXTURE_2D, textures[0]); // Read FBO 0's result
gl.uniform1i(gl.getUniformLocation(programBufferA, 'iFrame'), 1);
gl.drawArrays(gl.TRIANGLES, 0, 6);
// Render one more frame to ensure initialization is complete
gl.bindFramebuffer(gl.FRAMEBUFFER, buffers[0]);
gl.activeTexture(gl.TEXTURE0);
gl.bindTexture(gl.TEXTURE_2D, textures[1]);
gl.uniform1i(gl.getUniformLocation(programBufferA, 'iFrame'), 2);
gl.drawArrays(gl.TRIANGLES, 0, 6);
// ===== Buffer B Initialization =====
gl.bindFramebuffer(gl.FRAMEBUFFER, buffers[2]); // B_FBO0
gl.viewport(0, 0, width, height);
gl.useProgram(programBufferB);
setupAttribute(programBufferB);
// Bind latest Buffer A result (FBO 0's result)
gl.activeTexture(gl.TEXTURE0);
gl.bindTexture(gl.TEXTURE_2D, textures[0]);
gl.uniform1i(gl.getUniformLocation(programBufferB, 'iChannel0'), 0);
// Bind Buffer B previous frame (FBO 3's texture, not yet rendered)
gl.activeTexture(gl.TEXTURE1);
gl.bindTexture(gl.TEXTURE_2D, textures[3]);
gl.uniform1i(gl.getUniformLocation(programBufferB, 'iChannel1'), 1);
gl.uniform2f(gl.getUniformLocation(programBufferB, 'iResolution'), width, height);
gl.uniform1f(gl.getUniformLocation(programBufferB, 'iTime'), 0);
gl.uniform1i(gl.getUniformLocation(programBufferB, 'iFrame'), 0);
gl.drawArrays(gl.TRIANGLES, 0, 6);
// Buffer B second frame
gl.bindFramebuffer(gl.FRAMEBUFFER, buffers[3]); // B_FBO1
gl.activeTexture(gl.TEXTURE0);
gl.bindTexture(gl.TEXTURE_2D, textures[1]); // Buffer A latest
gl.activeTexture(gl.TEXTURE1);
gl.bindTexture(gl.TEXTURE_2D, textures[2]); // Buffer B FBO0 result
gl.uniform1i(gl.getUniformLocation(programBufferB, 'iFrame'), 1);
gl.drawArrays(gl.TRIANGLES, 0, 6);
// Initialize ping-pong indices
aReadIdx = 0; // Next frame reads FBO 0
bReadIdx = 0; // Next frame reads FBO 2
gl.bindFramebuffer(gl.FRAMEBUFFER, null);
}
function render() {
// ===== Pass 1: Buffer A (Particle Physics Self-Feedback) =====
// aReadIdx = 0: read FBO 0, write FBO 1
// aReadIdx = 1: read FBO 1, write FBO 0
const aWriteIdx = 1 - aReadIdx;
// Write to target FBO (not the current read FBO)
gl.bindFramebuffer(gl.FRAMEBUFFER, buffers[aWriteIdx]);
gl.viewport(0, 0, width, height);
// Activate the program BEFORE setting its uniforms (gl.uniform* targets the currently bound program)
gl.useProgram(programBufferA);
// Read previous frame Buffer A texture (from current read FBO's texture)
gl.activeTexture(gl.TEXTURE0);
gl.bindTexture(gl.TEXTURE_2D, textures[aReadIdx]);
gl.uniform1i(uniformsBufferA.iChannel0, 0);
gl.uniform2f(uniformsBufferA.iResolution, width, height);
gl.uniform1f(uniformsBufferA.iTime, time);
gl.uniform1i(uniformsBufferA.iFrame, frameCount);
gl.uniform4f(uniformsBufferA.iMouse, mouse.x, mouse.y, mouse.z, mouse.w);
// Render particle physics
gl.drawArrays(gl.TRIANGLES, 0, 6);
// Switch read index
aReadIdx = aWriteIdx;
// ===== Pass 2: Buffer B (Density Field) =====
const bWriteIdx = 1 - bReadIdx;
gl.bindFramebuffer(gl.FRAMEBUFFER, buffers[2 + bWriteIdx]); // B_FBO0 or B_FBO1
gl.viewport(0, 0, width, height);
// Activate the program before setting its uniforms
gl.useProgram(programBufferB);
// Bind current Buffer A particle state (use latest Buffer A result)
gl.activeTexture(gl.TEXTURE0);
gl.bindTexture(gl.TEXTURE_2D, textures[aReadIdx]); // A latest result
gl.uniform1i(uniformsBufferB.iChannel0, 0);
// Bind previous frame Buffer B density (for accumulation)
gl.activeTexture(gl.TEXTURE1);
gl.bindTexture(gl.TEXTURE_2D, textures[2 + bReadIdx]); // B_read
gl.uniform1i(uniformsBufferB.iChannel1, 1);
gl.uniform2f(uniformsBufferB.iResolution, width, height);
gl.uniform1f(uniformsBufferB.iTime, time);
gl.uniform1i(uniformsBufferB.iFrame, frameCount);
// Render density accumulation
gl.drawArrays(gl.TRIANGLES, 0, 6);
// Switch Buffer B read index
bReadIdx = bWriteIdx;
// ===== Pass 3: Image (Final Rendering to Screen) =====
gl.bindFramebuffer(gl.FRAMEBUFFER, null);
gl.viewport(0, 0, width, height);
// Activate the program before setting its uniforms
gl.useProgram(programImage);
// Bind Buffer A particles (use latest Buffer A result)
gl.activeTexture(gl.TEXTURE0);
gl.bindTexture(gl.TEXTURE_2D, textures[aReadIdx]);
gl.uniform1i(uniformsImage.iChannel0, 0);
// Bind Buffer B density (use latest Buffer B result)
gl.activeTexture(gl.TEXTURE1);
gl.bindTexture(gl.TEXTURE_2D, textures[2 + bReadIdx]);
gl.uniform1i(uniformsImage.iChannel1, 1);
gl.uniform2f(uniformsImage.iResolution, width, height);
gl.uniform1f(uniformsImage.iTime, time);
gl.uniform1i(uniformsImage.iFrame, frameCount);
gl.uniform4f(uniformsImage.iMouse, mouse.x, mouse.y, mouse.z, mouse.w);
// Render to screen
gl.drawArrays(gl.TRIANGLES, 0, 6);
}
```
**IMPORTANT: Key Points**:
- **Must use 2 FBOs for ping-pong**: Each Buffer needs two independent FBOs (read FBO + write FBO); a single FBO + texture swap causes "Feedback loop" error
- Use FBO index switching (not texture swapping): bind target FBO when writing, bind source texture when reading
- Image pass binds the latest Buffer results (obtained via read index)
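The index bookkeeping behind these key points can be sketched in plain JavaScript (the names here are illustrative, not taken from the WebGL code above): each buffer owns two slots, every frame writes the slot it did not read from, then the roles swap.

```javascript
// Ping-pong bookkeeping sketch: read one slot, write the other, then swap.
function makePingPong() {
  let readIdx = 0; // slot holding last frame's result
  return {
    get read() { return readIdx; },
    get write() { return 1 - readIdx; },
    swap() { readIdx = 1 - readIdx; },
  };
}

const pp = makePingPong();
const trace = [];
for (let frame = 0; frame < 4; frame++) {
  trace.push([pp.read, pp.write]); // never equal: that would be a feedback loop
  pp.swap();                       // next frame reads what was just written
}
```

Because `read` and `write` are never the same slot, the shader never samples the texture it is currently rendering to, which is exactly the feedback-loop error the first key point warns about.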
## Performance & Composition
**Performance Optimization**:
- Separable blur: N² → 2N samples
- Bilinear tap trick: 5 samples replace 9-tap Gaussian
- MIP sampling replaces large kernels: `textureLod` at high MIP levels ≈ large-range average
- `discard` outside data regions to skip unnecessary computation
- RGBA channel packing: velocity(xy) + density(z) + curl(w) in one vec4
- Chained sub-steps: A→B→C same code for 3x simulation speed
- `if (dot(b,b) > bbMax) break;` adaptive early exit
- `iFrame < 20` progressive initialization to prevent explosion
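The "bilinear tap trick" in the list can be made concrete: merging each pair of adjacent texel weights into one linearly filtered tap preserves the total kernel weight while cutting a 9-tap Gaussian to 5 fetches. A sketch using binomial weights as the Gaussian approximation (the specific numbers are illustrative):

```javascript
// "5 samples replace a 9-tap Gaussian": adjacent texel weights are merged
// into one linearly-filtered tap whose offset sits between the two texels.
// Binomial row 8 ([1,8,28,56,70,...]/256) approximates the Gaussian.
const w = [70, 56, 28, 8, 1].map(x => x / 256); // center + one-sided weights

function mergePair(o1, w1, o2, w2) {
  const weight = w1 + w2;
  return { offset: (o1 * w1 + o2 * w2) / weight, weight };
}

const taps = [
  { offset: 0, weight: w[0] },      // center tap
  mergePair(1, w[1], 2, w[2]),      // one linear tap replaces texels 1 and 2
  mergePair(3, w[3], 4, w[4]),      // one linear tap replaces texels 3 and 4
];
// total weight over the full (symmetric) kernel:
const total = taps[0].weight + 2 * (taps[1].weight + taps[2].weight);
```

Sampling at the merged `offset` with bilinear filtering returns the same weighted mix that the two discrete taps would have produced.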
**Typical Composition Patterns**:
- **Fluid + Lighting**: Fluid buffer → Image computes gradient normals → diffuse + specular
- **Fluid + Color Advection**: Separate Buffer tracks color field, advected by velocity field
- **Scene + Bloom + TAA**: 4-Buffer pipeline (render → downsample → blur → composite tone mapping)
- **G-Buffer + Screen-Space Effects**: 2-Buffer without temporal feedback (geometry → edge/SSAO/SSR → stylized compositing)
- **State Storage + Visualization Separation**: Buffer A pure logic + Image pure rendering (`texelFetch` reads state + distance field drawing)
## Further Reading
For complete step-by-step tutorials, mathematical derivations, and advanced usage, see [reference](../reference/multipass-buffer.md)

## WebGL2 Adaptation Requirements
**IMPORTANT: GLSL Type Strictness Warning**:
- GLSL is a strongly typed language with **no `string` type**; string literals and string variables are forbidden
- No implicit type conversions in GLSL ES: an integer literal cannot initialize a `float` (`float x = 1;` is illegal; write `float x = 1.0;`)
- vec2/vec3/vec4 cannot be implicitly converted between each other; explicit construction is required
- Float precision: `highp float` (recommended), `mediump float`, `lowp float`
The code templates in this document use ShaderToy GLSL style. When generating standalone HTML pages, you must adapt for WebGL2:
- Use `canvas.getContext("webgl2")`
- Shader first line: `#version 300 es`, add `precision highp float;` in fragment shader
- Vertex shader: `attribute` -> `in`, `varying` -> `out`
- Fragment shader: `varying` -> `in`, `gl_FragColor` -> custom `out vec4 fragColor`, `texture2D()` -> `texture()`
- ShaderToy's `void mainImage(out vec4 fragColor, in vec2 fragCoord)` must be adapted to the standard `void main()` entry point
# SDF Normal Estimation
## Use Cases
- Lighting calculations in raymarching rendering pipelines (diffuse, specular, Fresnel, etc.)
- Any 3D scene based on SDF distance fields (fractals, parametric surfaces, boolean geometry, procedural terrain)
- Edge detection and contour rendering (Laplacian value as a byproduct of normal sampling)
- Prerequisite for ambient occlusion (AO) computation
## Core Principles
The gradient of an SDF, $\nabla f(p)$, points in the direction of fastest distance increase, which is the outward surface normal. Numerical differentiation approximates the gradient:
$$\vec{n} = \text{normalize}\left(\nabla f(p)\right)$$
Three main strategies:
| Method | Samples | Accuracy | Recommendation |
|--------|---------|----------|----------------|
| Forward difference | 4 | O(epsilon) | Simple scenes |
| Central difference | 6 | O(epsilon^2) | When symmetry is needed |
| **Tetrahedron method** | **4** | **Between the two** | **Preferred** |
Key parameter epsilon: commonly `0.0005 ~ 0.001`; for advanced scenes, multiply by ray distance `t` for adaptive scaling.
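Before moving to GLSL, the tetrahedron estimator is easy to sanity-check on the CPU. A JavaScript stand-in using a unit-sphere SDF, where the exact normal at a surface point `p` is `p` itself:

```javascript
// CPU sketch of the 4-sample tetrahedron normal estimator on a unit sphere.
const map = p => Math.hypot(p[0], p[1], p[2]) - 1.0; // unit-sphere SDF

function calcNormal(p, eps = 0.0005) {
  const k = 0.5773; // ~ 1/sqrt(3)
  // The four tetrahedron directions: e.xyy, e.yyx, e.yxy, e.xxx with e = (1,-1)
  const dirs = [[k, -k, -k], [-k, -k, k], [-k, k, -k], [k, k, k]];
  const n = [0, 0, 0];
  for (const e of dirs) {
    const d = map([p[0] + e[0] * eps, p[1] + e[1] * eps, p[2] + e[2] * eps]);
    n[0] += e[0] * d; n[1] += e[1] * d; n[2] += e[2] * d;
  }
  const len = Math.hypot(n[0], n[1], n[2]);
  return [n[0] / len, n[1] / len, n[2] / len];
}

const p = [0.6, 0.64, 0.48]; // point on the unit sphere (|p| = 1)
const n = calcNormal(p);     // should be very close to p
```

The constant terms of the four samples cancel (the directions sum to zero), leaving a vector proportional to the gradient.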
## Implementation Steps
### Step 1: Define SDF Scene Function
```glsl
float map(vec3 p) {
float d = length(p) - 1.0; // unit sphere
return d;
}
```
### Step 2: Choose Differentiation Method
#### Method A: Forward Difference -- 4 Samples
```glsl
const float EPSILON = 1e-3;
vec3 getNormal(vec3 p) {
vec3 n;
n.x = map(vec3(p.x + EPSILON, p.y, p.z));
n.y = map(vec3(p.x, p.y + EPSILON, p.z));
n.z = map(vec3(p.x, p.y, p.z + EPSILON));
return normalize(n - map(p));
}
```
#### Method B: Central Difference -- 6 Samples
```glsl
vec3 getNormal(vec3 p) {
vec2 o = vec2(0.001, 0.0);
return normalize(vec3(
map(p + o.xyy) - map(p - o.xyy),
map(p + o.yxy) - map(p - o.yxy),
map(p + o.yyx) - map(p - o.yyx)
));
}
```
#### Method C: Tetrahedron Method -- 4 Samples (Recommended)
```glsl
// Classic tetrahedron method, coefficient 0.5773 ≈ 1/sqrt(3)
vec3 calcNormal(vec3 pos) {
float eps = 0.0005;
vec2 e = vec2(1.0, -1.0) * 0.5773;
return normalize(
e.xyy * map(pos + e.xyy * eps) +
e.yyx * map(pos + e.yyx * eps) +
e.yxy * map(pos + e.yxy * eps) +
e.xxx * map(pos + e.xxx * eps)
);
}
```
### Step 3: Apply to Lighting
```glsl
vec3 pos = ro + rd * t; // hit point
vec3 nor = calcNormal(pos); // surface normal
vec3 lightDir = normalize(vec3(1.0, 4.0, -4.0));
float diff = max(dot(nor, lightDir), 0.0);
vec3 col = vec3(0.8) * diff;
```
## Complete Code Template
```glsl
// SDF Normal Estimation — Complete ShaderToy Template
#define MAX_STEPS 128
#define MAX_DIST 100.0
#define SURF_DIST 0.001
#define NORMAL_METHOD 2 // 0=forward diff, 1=central diff, 2=tetrahedron
// ---- SDF Scene Definition ----
float map(vec3 p) {
float sphere = length(p - vec3(0.0, 1.0, 0.0)) - 1.0;
float ground = p.y;
return min(sphere, ground);
}
// ---- Normal Estimation ----
vec3 normalForward(vec3 p) {
float eps = 0.001;
float d = map(p);
return normalize(vec3(
map(p + vec3(eps, 0.0, 0.0)),
map(p + vec3(0.0, eps, 0.0)),
map(p + vec3(0.0, 0.0, eps))
) - d);
}
vec3 normalCentral(vec3 p) {
vec2 e = vec2(0.001, 0.0);
return normalize(vec3(
map(p + e.xyy) - map(p - e.xyy),
map(p + e.yxy) - map(p - e.yxy),
map(p + e.yyx) - map(p - e.yyx)
));
}
vec3 normalTetra(vec3 p) {
float eps = 0.0005;
vec2 e = vec2(1.0, -1.0) * 0.5773;
return normalize(
e.xyy * map(p + e.xyy * eps) +
e.yyx * map(p + e.yyx * eps) +
e.yxy * map(p + e.yxy * eps) +
e.xxx * map(p + e.xxx * eps)
);
}
vec3 calcNormal(vec3 p) {
#if NORMAL_METHOD == 0
return normalForward(p);
#elif NORMAL_METHOD == 1
return normalCentral(p);
#else
return normalTetra(p);
#endif
}
// ---- Raymarching ----
float raymarch(vec3 ro, vec3 rd) {
float t = 0.0;
for (int i = 0; i < MAX_STEPS; i++) {
vec3 p = ro + rd * t;
float d = map(p);
if (d < SURF_DIST || t > MAX_DIST) break;
t += d;
}
return t;
}
// ---- Main Function ----
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
vec2 uv = (2.0 * fragCoord - iResolution.xy) / iResolution.y;
vec3 ro = vec3(0.0, 2.0, -5.0);
vec3 rd = normalize(vec3(uv, 1.5));
float t = raymarch(ro, rd);
vec3 col = vec3(0.0);
if (t < MAX_DIST) {
vec3 pos = ro + rd * t;
vec3 nor = calcNormal(pos);
vec3 sunDir = normalize(vec3(0.8, 0.4, -0.6));
float diff = clamp(dot(nor, sunDir), 0.0, 1.0);
float amb = 0.5 + 0.5 * nor.y;
vec3 ref = reflect(rd, nor);
float spec = pow(clamp(dot(ref, sunDir), 0.0, 1.0), 16.0);
col = vec3(0.18) * amb + vec3(1.0, 0.95, 0.85) * diff + vec3(0.5) * spec;
} else {
col = vec3(0.5, 0.7, 1.0) - 0.5 * rd.y;
}
col = pow(col, vec3(0.4545));
fragColor = vec4(col, 1.0);
}
```
## Common Variants
### Variant 1: NuSan Reverse-Offset Forward Difference
```glsl
// Reverse-offset forward difference
vec2 noff = vec2(0.001, 0.0);
vec3 normal = normalize(
map(pos) - vec3(
map(pos - noff.xyy),
map(pos - noff.yxy),
map(pos - noff.yyx)
)
);
```
### Variant 2: Adaptive Epsilon (Distance Scaling)
```glsl
// Adaptive epsilon based on ray distance
vec3 calcNormal(vec3 pos, float t) {
float precis = 0.001 * t;
vec2 e = vec2(1.0, -1.0) * precis;
return normalize(
e.xyy * map(pos + e.xyy) +
e.yyx * map(pos + e.yyx) +
e.yxy * map(pos + e.yxy) +
e.xxx * map(pos + e.xxx)
);
}
```
### Variant 3: Large Epsilon for Rounding / Anti-Aliasing
```glsl
// Large epsilon for rounding / anti-aliasing
vec3 getNormal(vec3 p) {
vec2 e = vec2(0.015, -0.015); // intentionally large epsilon
return normalize(
e.xyy * map(p + e.xyy) +
e.yyx * map(p + e.yyx) +
e.yxy * map(p + e.yxy) +
e.xxx * map(p + e.xxx)
);
}
```
### Variant 4: Anti-Inlining Loop
```glsl
// Anti-inlining loop — reduces compile time for complex SDFs
#define ZERO (min(iFrame, 0))
vec3 calcNormal(vec3 p, float t) {
vec3 n = vec3(0.0);
for (int i = ZERO; i < 4; i++) {
vec3 e = 0.5773 * (2.0 * vec3(
(((i + 3) >> 1) & 1),
((i >> 1) & 1),
(i & 1)
) - 1.0);
n += e * map(p + e * 0.001 * t);
}
return normalize(n);
}
```
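The bit manipulation in this loop is easy to get wrong, so here is a quick JavaScript decode of the four loop indices, confirming they reproduce the `e.xyy / e.yyx / e.yxy / e.xxx` sign pattern of the non-loop tetrahedron version:

```javascript
// Decode the loop's bit tricks: each index i in 0..3 maps to a corner of
// the tetrahedron, with components in {-1, +1} (the 0.5773 scale omitted).
const decode = i => [
  2 * (((i + 3) >> 1) & 1) - 1,
  2 * ((i >> 1) & 1) - 1,
  2 * (i & 1) - 1,
];
const offsets = [0, 1, 2, 3].map(decode);
```

The four directions also sum to zero per component, which is what makes the constant SDF term cancel out of the estimator.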
### Variant 5: Normal + Edge Detection
```glsl
// Central difference + Laplacian edge detection
// (de() is the scene SDF; det is a pixel-scale step size supplied by the raymarcher)
float edge = 0.0;
vec3 normal(vec3 p) {
vec3 e = vec3(0.0, det * 5.0, 0.0);
float d1 = de(p - e.yxx), d2 = de(p + e.yxx);
float d3 = de(p - e.xyx), d4 = de(p + e.xyx);
float d5 = de(p - e.xxy), d6 = de(p + e.xxy);
float d = de(p);
edge = abs(d - 0.5 * (d2 + d1))
+ abs(d - 0.5 * (d4 + d3))
+ abs(d - 0.5 * (d6 + d5));
edge = min(1.0, pow(edge, 0.55) * 15.0);
return normalize(vec3(d1 - d2, d3 - d4, d5 - d6));
}
```
## Performance & Composition
**Performance**:
- Default to tetrahedron method (4 samples, better accuracy than forward difference)
- Only switch to central difference (6 samples) when jagged normal artifacts appear
- Use anti-inlining loop (Variant 4) for complex SDFs to avoid compile time explosion
- Epsilon recommended `0.0005 ~ 0.001`; best practice is adaptive `eps * t`
- Too small (< 1e-5) produces floating-point noise; too large (> 0.05) loses detail
- Reuse SDF sampling results when multiple types of information are needed at the same position (e.g., Variant 5)
**Common combinations**:
- **Normal + Soft Shadow**: `calcSoftShadow(pos + nor * 0.01, sunDir, 16.0)` -- normal offset at start point to avoid self-intersection
- **Normal + AO**: Multi-step SDF sampling along the normal to estimate occlusion
- **Normal + Fresnel**: `pow(clamp(1.0 + dot(nor, rd), 0.0, 1.0), 5.0)`
- **Normal + Bump Mapping**: Overlay texture gradient perturbation on SDF normals
- **Normal + Triplanar Mapping**: Use `abs(nor)` components as triplanar blend weights
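The Fresnel term in the combination list is Schlick's approximation; a scalar JavaScript version shows its two anchor points (F0 at normal incidence, 1 at grazing). F0 = 0.04 is the usual dielectric default, an assumption for this demo rather than something fixed by the text:

```javascript
// Schlick's approximation: F = F0 + (1 - F0) * (1 - cosTheta)^5
const fresnelSchlick = (cosTheta, F0) =>
  F0 + (1 - F0) * Math.pow(1 - Math.min(Math.max(cosTheta, 0), 1), 5);

const headOn = fresnelSchlick(1.0, 0.04);  // looking straight at the surface: F0
const grazing = fresnelSchlick(0.0, 0.04); // grazing view angle: approaches 1
```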
## Further Reading
For complete step-by-step tutorials, mathematical derivations, and advanced usage, see [reference](../reference/normal-estimation.md)

Path tracing requires multi-pass rendering: Buffer A traces and accumulates samples each frame (iChannel0=self), Image Pass reads accumulated data and applies tone mapping for display. Below is the JS skeleton for standalone HTML:
### Standalone HTML Multi-Pass Template (Ping-Pong Accumulation)
```html
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>Path Tracer</title>
<style>
body { margin: 0; overflow: hidden; background: #000; }
canvas { display: block; width: 100vw; height: 100vh; }
</style>
</head>
<body>
<canvas id="c"></canvas>
<script>
let frameCount = 0;
let mouse = [0, 0, 0, 0];
const canvas = document.getElementById('c');
const gl = canvas.getContext('webgl2');
const ext = gl.getExtension('EXT_color_buffer_float');
function createShader(type, src) {
const s = gl.createShader(type);
gl.shaderSource(s, src);
gl.compileShader(s);
if (!gl.getShaderParameter(s, gl.COMPILE_STATUS))
console.error(gl.getShaderInfoLog(s));
return s;
}
function createProgram(vsSrc, fsSrc) {
const p = gl.createProgram();
gl.attachShader(p, createShader(gl.VERTEX_SHADER, vsSrc));
gl.attachShader(p, createShader(gl.FRAGMENT_SHADER, fsSrc));
gl.linkProgram(p);
return p;
}
const vsSource = `#version 300 es
in vec2 pos;
void main(){ gl_Position=vec4(pos,0,1); }`;
// fsBuffer: path tracing + accumulation (see "Complete Code Template - Buffer A" below)
// fsImage: ACES tone mapping + gamma (see "Complete Code Template - Image Pass" below)
const progBuf = createProgram(vsSource, fsBuffer);
const progImg = createProgram(vsSource, fsImage);
function createFBO(w, h) {
const tex = gl.createTexture();
gl.bindTexture(gl.TEXTURE_2D, tex);
// Key: rendering to float textures requires EXT_color_buffer_float; fall back to RGBA8 otherwise
// Path tracing accumulation needs float precision; RGBA8 clamps the running sum at 1.0, so the fallback is visibly degraded
const fmt = ext ? gl.RGBA16F : gl.RGBA;
const typ = ext ? gl.FLOAT : gl.UNSIGNED_BYTE;
gl.texImage2D(gl.TEXTURE_2D, 0, fmt, w, h, 0, gl.RGBA, typ, null);
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MIN_FILTER, gl.NEAREST);
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MAG_FILTER, gl.NEAREST);
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_S, gl.CLAMP_TO_EDGE);
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_T, gl.CLAMP_TO_EDGE);
const fbo = gl.createFramebuffer();
gl.bindFramebuffer(gl.FRAMEBUFFER, fbo);
gl.framebufferTexture2D(gl.FRAMEBUFFER, gl.COLOR_ATTACHMENT0, gl.TEXTURE_2D, tex, 0);
gl.bindFramebuffer(gl.FRAMEBUFFER, null);
return { fbo, tex };
}
let W, H, bufA, bufB;
const vao = gl.createVertexArray();
gl.bindVertexArray(vao);
const vbo = gl.createBuffer();
gl.bindBuffer(gl.ARRAY_BUFFER, vbo);
gl.bufferData(gl.ARRAY_BUFFER, new Float32Array([-1,-1, 1,-1, -1,1, 1,1]), gl.STATIC_DRAW);
gl.enableVertexAttribArray(0);
gl.vertexAttribPointer(0, 2, gl.FLOAT, false, 0, 0);
function resize() {
canvas.width = W = innerWidth;
canvas.height = H = innerHeight;
bufA = createFBO(W, H);
bufB = createFBO(W, H);
frameCount = 0;
}
addEventListener('resize', resize);
resize();
canvas.addEventListener('mousedown', e => { mouse[2] = e.clientX; mouse[3] = H - e.clientY; });
canvas.addEventListener('mouseup', () => { mouse[2] = 0; mouse[3] = 0; });
canvas.addEventListener('mousemove', e => { mouse[0] = e.clientX; mouse[1] = H - e.clientY; });
function render(t) {
t *= 0.001;
// Buffer pass: read bufA (previous frame accumulation) -> write bufB (current frame accumulation)
gl.useProgram(progBuf);
gl.bindFramebuffer(gl.FRAMEBUFFER, bufB.fbo);
gl.viewport(0, 0, W, H);
gl.activeTexture(gl.TEXTURE0);
gl.bindTexture(gl.TEXTURE_2D, bufA.tex);
gl.uniform1i(gl.getUniformLocation(progBuf, 'iChannel0'), 0);
gl.uniform2f(gl.getUniformLocation(progBuf, 'iResolution'), W, H);
gl.uniform1f(gl.getUniformLocation(progBuf, 'iTime'), t);
gl.uniform1i(gl.getUniformLocation(progBuf, 'iFrame'), frameCount);
gl.uniform4f(gl.getUniformLocation(progBuf, 'iMouse'), ...mouse);
gl.drawArrays(gl.TRIANGLE_STRIP, 0, 4);
[bufA, bufB] = [bufB, bufA];
// Image pass: read bufA (accumulated result) -> screen (tone mapped)
gl.useProgram(progImg);
gl.bindFramebuffer(gl.FRAMEBUFFER, null);
gl.viewport(0, 0, W, H);
gl.activeTexture(gl.TEXTURE0);
gl.bindTexture(gl.TEXTURE_2D, bufA.tex);
gl.uniform1i(gl.getUniformLocation(progImg, 'iChannel0'), 0);
gl.uniform2f(gl.getUniformLocation(progImg, 'iResolution'), W, H);
gl.uniform1f(gl.getUniformLocation(progImg, 'iTime'), t);
gl.uniform1i(gl.getUniformLocation(progImg, 'iFrame'), frameCount);
gl.drawArrays(gl.TRIANGLE_STRIP, 0, 4);
frameCount++;
requestAnimationFrame(render);
}
requestAnimationFrame(render);
</script>
</body>
</html>
```
# Path Tracing & Global Illumination
## Use Cases
- Physically accurate global illumination: indirect lighting, color bleeding, caustics
- Complex light transport with reflection, refraction, and diffuse interreflection
- Progressive high-quality rendering with multi-frame accumulation in ShaderToy
- Scenes requiring precise light interactions such as Cornell Box and glassware
## Core Principles
Path tracing solves the rendering equation via Monte Carlo methods. For each pixel, a ray is cast from the camera and bounced through the scene; at each bounce: intersect -> shade -> sample next direction -> accumulate contribution.
Core formulas:
- **Rendering equation**: $L_o = L_e + \int f_r \cdot L_i \cdot \cos\theta \, d\omega$
- **MC estimate**: $L \approx \frac{1}{N} \sum \frac{f_r \cdot L_i \cdot \cos\theta}{p(\omega)}$
- **Schlick Fresnel**: $F = F_0 + (1 - F_0)(1 - \cos\theta)^5$
- **Cosine-weighted PDF**: $p(\omega) = \cos\theta / \pi$
Use iterative loops instead of recursion: `acc` (accumulated radiance) and `throughput` (path attenuation) track path contributions.
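A scalar JavaScript sketch of the MC estimate for the simplest case: integrating $\cos\theta$ over the hemisphere (exact value $\pi$). With uniform sampling ($p = 1/2\pi$) the estimate is noisy; with cosine-weighted sampling ($p = \cos\theta/\pi$) the ratio $f/p$ equals $\pi$ for every sample, which is why that PDF is preferred for diffuse bounces. The seeded LCG is only for determinism:

```javascript
// For a uniformly distributed hemisphere direction, its z-coordinate
// (= cos(theta)) is itself uniform on [0, 1] (Archimedes' hat-box theorem).
let seed = 1;
const rand = () => (seed = (Math.imul(seed, 1103515245) + 12345) >>> 0) / 2 ** 32;

const N = 100000;
let uniformEst = 0;
for (let i = 0; i < N; i++) {
  const cosTheta = rand();                    // uniform hemisphere sample
  uniformEst += (cosTheta * 2 * Math.PI) / N; // f / p with p = 1 / (2*pi)
}
// Cosine-weighted sampling: f / p = cos / (cos / pi) = pi for every sample,
// so for this integrand the estimator has zero variance.
const cosineEst = Math.PI;
```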
## Implementation Steps
### Step 1: PRNG
```glsl
// Integer hash (recommended, good quality)
int iSeed;
int irand() { iSeed = iSeed * 0x343fd + 0x269ec3; return (iSeed >> 16) & 32767; }
float frand() { return float(irand()) / 32767.0; }
void srand(ivec2 p, int frame) {
int n = frame;
n = (n << 13) ^ n; n = n * (n * n * 15731 + 789221) + 1376312589;
n += p.y;
n = (n << 13) ^ n; n = n * (n * n * 15731 + 789221) + 1376312589;
n += p.x;
n = (n << 13) ^ n; n = n * (n * n * 15731 + 789221) + 1376312589;
iSeed = n;
}
// Alternative: sin-hash (simpler)
float seed;
float rand() { return fract(sin(seed++) * 43758.5453123); }
```
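The integer-hash PRNG ports directly to JavaScript if `Math.imul` is used to keep GLSL's 32-bit overflow semantics; a sketch with the same constants:

```javascript
// JS port of the GLSL integer-hash PRNG above; "| 0" and Math.imul force
// 32-bit signed arithmetic to match GLSL int behavior.
let iSeed = 0;
const irand = () => {
  iSeed = (Math.imul(iSeed, 0x343fd) + 0x269ec3) | 0;
  return (iSeed >> 16) & 32767;
};
const frand = () => irand() / 32767.0;

function srand(px, py, frame) {
  let n = frame | 0;
  const scramble = () => {
    n = ((n << 13) ^ n) | 0;
    n = (Math.imul(n, (Math.imul(Math.imul(n, n), 15731) + 789221) | 0) + 1376312589) | 0;
  };
  scramble(); n = (n + py) | 0;
  scramble(); n = (n + px) | 0;
  scramble();
  iSeed = n;
}

srand(123, 456, 7); // per-pixel, per-frame seed as in the GLSL version
const samples = Array.from({ length: 1000 }, frand);
```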
### Step 2: Ray-Scene Intersection
```glsl
// Analytic sphere intersection
struct Ray { vec3 o, d; };
struct Sphere { float r; vec3 p, e, c; int refl; };
float iSphere(Sphere s, Ray r) {
vec3 op = s.p - r.o;
float b = dot(op, r.d);
float det = b * b - dot(op, op) + s.r * s.r;
if (det < 0.) return 0.;
det = sqrt(det);
float t = b - det;
if (t > 1e-3) return t;
t = b + det;
return t > 1e-3 ? t : 0.;
}
// SDF ray marching (complex geometry)
float map(vec3 p) { return length(p) - 1.0; /* replace with your scene's distance function */ }
float raymarch(vec3 ro, vec3 rd, float tmax) {
float t = 0.01;
for (int i = 0; i < 256; i++) {
float h = map(ro + rd * t);
if (abs(h) < 0.0001 || t > tmax) break;
t += h;
}
return t;
}
vec3 calcNormal(vec3 p) {
vec2 e = vec2(0.0001, 0.);
return normalize(vec3(
map(p + e.xyy) - map(p - e.xyy),
map(p + e.yxy) - map(p - e.yxy),
map(p + e.yyx) - map(p - e.yyx)));
}
```
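The analytic sphere intersection is pure arithmetic and can be checked on the CPU. A JavaScript transcription with a hand-verifiable case (unit sphere at z = 5, ray from the origin along +z hits at t = 4):

```javascript
// Same logic as iSphere above: solve the quadratic, keep the nearest root
// above a small epsilon (to reject self-intersections), return 0 on miss.
function iSphere(center, radius, ro, rd) {
  const op = center.map((c, i) => c - ro[i]);
  const b = op[0] * rd[0] + op[1] * rd[1] + op[2] * rd[2];
  let det = b * b - (op[0] * op[0] + op[1] * op[1] + op[2] * op[2]) + radius * radius;
  if (det < 0) return 0;            // ray misses the sphere
  det = Math.sqrt(det);
  let t = b - det;                  // near root
  if (t > 1e-3) return t;
  t = b + det;                      // far root (ray origin inside sphere)
  return t > 1e-3 ? t : 0;
}

const tHit = iSphere([0, 0, 5], 1, [0, 0, 0], [0, 0, 1]);  // expect t = 4
const tMiss = iSphere([0, 0, 5], 1, [0, 0, 0], [0, 1, 0]); // expect miss
```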
### Step 3: Cosine-Weighted Hemisphere Sampling
```glsl
// fizzer method (most concise)
vec3 cosineDirection(vec3 n) {
float u = frand(), v = frand();
float a = 6.2831853 * v;
float b = 2.0 * u - 1.0;
vec3 dir = vec3(sqrt(1.0 - b * b) * vec2(cos(a), sin(a)), b);
return normalize(n + dir);
}
// ONB construction method (more intuitive)
vec3 cosineDirectionONB(vec3 n) {
vec2 r = vec2(frand(), frand());
vec3 u = normalize(cross(n, vec3(0., 1., 1.)));
vec3 v = cross(u, n);
float ra = sqrt(r.y);
return normalize(ra * cos(6.2831853 * r.x) * u + ra * sin(6.2831853 * r.x) * v + sqrt(1.0 - r.y) * n);
}
```
### Step 4: Materials and BRDF
```glsl
#define MAT_DIFF 0
#define MAT_SPEC 1
#define MAT_REFR 2
// Diffuse: throughput *= albedo; dir = cosineDirection(nl)
// Specular: throughput *= albedo; dir = reflect(rd, n)
// Refraction (glass)
void handleDielectric(inout Ray r, vec3 n, vec3 x, float ior, vec3 albedo, inout vec3 mask) {
float a = dot(n, r.d), ddn = abs(a);
float nnt = mix(1.0 / ior, ior, float(a > 0.));
float cos2t = 1. - nnt * nnt * (1. - ddn * ddn);
r = Ray(x, reflect(r.d, n));
if (cos2t > 0.) {
vec3 tdir = normalize(r.d * nnt + sign(a) * n * (ddn * nnt + sqrt(cos2t)));
float R0 = (ior - 1.) * (ior - 1.) / ((ior + 1.) * (ior + 1.));
float c = 1. - mix(ddn, dot(tdir, n), float(a > 0.));
float Re = R0 + (1. - R0) * c * c * c * c * c;
float P = .25 + .5 * Re;
if (frand() < P) { mask *= Re / P; }
else { mask *= albedo * (1. - Re) / (1. - P); r = Ray(x, tdir); }
}
}
```
### Step 5: Direct Light Sampling (NEE)
```glsl
// Spherical light solid angle sampling
vec3 coneSample(vec3 d, float phi, float sina, float cosa) {
vec3 w = normalize(d);
vec3 u = normalize(cross(w.yzx, w));
vec3 v = cross(w, u);
return (u * cos(phi) + v * sin(phi)) * sina + w * cosa;
}
// Called at diffuse shading points:
vec3 l0 = lightPos - x;
float cos_a_max = sqrt(1. - clamp(lightR * lightR / dot(l0, l0), 0., 1.));
float cosa = mix(cos_a_max, 1., frand());
vec3 l = coneSample(l0, 6.2831853 * frand(), sqrt(1. - cosa * cosa), cosa);
// After shadow test passes:
float omega = 6.2831853 * (1. - cos_a_max);
vec3 directLight = lightEmission * clamp(dot(l, nl), 0., 1.) * omega / PI;
```
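The `omega` factor above is the solid angle subtended by the spherical light. A quick JavaScript check: for a small distant light it approaches the flat-disk approximation `pi * r^2 / d^2`, and for a light nearly touching the shading point it approaches `2 * pi` (a full hemisphere-facing cap):

```javascript
// Solid angle of a sphere light of radius lightR at distance dist,
// as used in the NEE weight: omega = 2*pi*(1 - cos_a_max).
function sphereSolidAngle(lightR, dist) {
  const cosAMax = Math.sqrt(1 - Math.min(1, (lightR * lightR) / (dist * dist)));
  return 2 * Math.PI * (1 - cosAMax);
}

const omega = sphereSolidAngle(10, 100);     // small distant light
const full = sphereSolidAngle(1, 1 + 1e-12); // light nearly touching the point
```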
### Step 6: Path Tracing Main Loop
```glsl
#define MAX_BOUNCES 8
vec3 pathtrace(Ray r) {
vec3 acc = vec3(0.), throughput = vec3(1.);
for (int depth = 0; depth < MAX_BOUNCES; depth++) {
// 1. Intersect
float t; vec3 n, albedo, emission; int matType;
if (!intersectScene(r, t, n, albedo, emission, matType)) break;
vec3 x = r.o + r.d * t;
vec3 nl = dot(n, r.d) < 0. ? n : -n;
// 2. Accumulate self-emission
acc += throughput * emission;
// 3. Russian roulette (starting from bounce 3)
if (depth > 2) {
float p = max(throughput.r, max(throughput.g, throughput.b));
if (frand() > p) break;
throughput /= p;
}
// 4. Material branching
if (matType == MAT_DIFF) {
acc += throughput * directLighting(x, nl, albedo, ...); // NEE
throughput *= albedo;
r = Ray(x + nl * 1e-3, cosineDirection(nl));
} else if (matType == MAT_SPEC) {
throughput *= albedo;
r = Ray(x + nl * 1e-3, reflect(r.d, n));
} else {
handleDielectric(r, n, x, 1.5, albedo, throughput);
}
}
return acc;
}
```
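Step 3's Russian roulette deserves a numeric sanity check: killing a path with probability 1 - p while dividing survivors by p leaves the expected throughput unchanged. A JavaScript sketch with a seeded LCG (p = 0.25 is an arbitrary survival probability for the demo):

```javascript
// Unbiasedness of Russian roulette: E[throughput] is preserved because
// survivors are compensated by 1/p.
let seed = 1;
const rand = () => (seed = (Math.imul(seed, 1103515245) + 12345) >>> 0) / 2 ** 32;

const p = 0.25, N = 200000;
let sum = 0;
for (let i = 0; i < N; i++) {
  let throughput = 1.0;
  if (rand() > p) throughput = 0; // terminated path contributes nothing
  else throughput /= p;           // survivor compensated by 1/p
  sum += throughput;
}
const mean = sum / N; // expected value: 1.0
```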
### Step 7: Progressive Accumulation and Display
```glsl
// Buffer A: path tracing + accumulation
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
srand(ivec2(fragCoord), iFrame);
// ... camera setup, ray generation ...
vec3 color = pathtrace(ray);
vec4 prev = texelFetch(iChannel0, ivec2(fragCoord), 0);
if (iFrame == 0) prev = vec4(0.);
fragColor = prev + vec4(color, 1.0);
}
// Image Pass: ACES tone mapping + Gamma
vec3 ACES(vec3 x) {
float a = 2.51, b = 0.03, c = 2.43, d = 0.59, e = 0.14;
return (x * (a * x + b)) / (x * (c * x + d) + e);
}
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
vec4 data = texelFetch(iChannel0, ivec2(fragCoord), 0);
vec3 col = data.rgb / max(data.a, 1.0);
col = ACES(col);
col = pow(clamp(col, 0., 1.), vec3(1.0 / 2.2));
vec2 uv = fragCoord / iResolution.xy;
col *= 0.5 + 0.5 * pow(16.0 * uv.x * uv.y * (1.0 - uv.x) * (1.0 - uv.y), 0.1);
fragColor = vec4(col, 1.0);
}
```
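The accumulate-then-divide contract between the two passes reduces to a running mean: Buffer A stores (sum of samples, sample count) in rgb/a, and the Image pass divides. A one-channel JavaScript sketch (the sample values are arbitrary):

```javascript
// One channel stands in for the vec4: rgb accumulates radiance, a counts samples.
let accum = { rgb: 0, a: 0 };
const frames = [2.0, 4.0, 6.0, 8.0]; // per-frame radiance estimates

for (let frame = 0; frame < frames.length; frame++) {
  const prev = frame === 0 ? { rgb: 0, a: 0 } : accum; // iFrame == 0 resets
  accum = { rgb: prev.rgb + frames[frame], a: prev.a + 1 };
}
const displayed = accum.rgb / Math.max(accum.a, 1); // the Image pass divide
```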
## Complete Code Template
ShaderToy dual pass: Buffer A (path tracing + accumulation, iChannel0=self), Image (display).
**Buffer A:**
```glsl
#define PI 3.14159265359
#define MAX_BOUNCES 6
#define SAMPLES_PER_FRAME 2
#define NUM_SPHERES 9
#define IOR_GLASS 1.5
#define ENABLE_NEE
#define MAT_DIFF 0
#define MAT_SPEC 1
#define MAT_REFR 2
int iSeed;
int irand() { iSeed = iSeed * 0x343fd + 0x269ec3; return (iSeed >> 16) & 32767; }
float frand() { return float(irand()) / 32767.0; }
void srand(ivec2 p, int frame) {
int n = frame;
n = (n << 13) ^ n; n = n * (n * n * 15731 + 789221) + 1376312589;
n += p.y;
n = (n << 13) ^ n; n = n * (n * n * 15731 + 789221) + 1376312589;
n += p.x;
n = (n << 13) ^ n; n = n * (n * n * 15731 + 789221) + 1376312589;
iSeed = n;
}
struct Ray { vec3 o, d; };
struct Sphere { float r; vec3 p, e, c; int refl; };
Sphere spheres[NUM_SPHERES];
void initScene() {
spheres[0] = Sphere(1e5, vec3(-1e5+1., 40.8, 81.6), vec3(0.), vec3(.75,.25,.25), MAT_DIFF);
spheres[1] = Sphere(1e5, vec3( 1e5+99., 40.8, 81.6), vec3(0.), vec3(.25,.25,.75), MAT_DIFF);
spheres[2] = Sphere(1e5, vec3(50., 40.8, -1e5), vec3(0.), vec3(.75), MAT_DIFF);
spheres[3] = Sphere(1e5, vec3(50., 40.8, 1e5+170.), vec3(0.), vec3(0.), MAT_DIFF);
spheres[4] = Sphere(1e5, vec3(50., -1e5, 81.6), vec3(0.), vec3(.75), MAT_DIFF);
spheres[5] = Sphere(1e5, vec3(50., 1e5+81.6, 81.6), vec3(0.), vec3(.75), MAT_DIFF);
spheres[6] = Sphere(16.5, vec3(27., 16.5, 47.), vec3(0.), vec3(1.), MAT_SPEC);
spheres[7] = Sphere(16.5, vec3(73., 16.5, 78.), vec3(0.), vec3(.7,1.,.9), MAT_REFR);
spheres[8] = Sphere(600., vec3(50., 681.33, 81.6), vec3(12.), vec3(0.), MAT_DIFF);
}
float iSphere(Sphere s, Ray r) {
vec3 op = s.p - r.o;
float b = dot(op, r.d);
float det = b * b - dot(op, op) + s.r * s.r;
if (det < 0.) return 0.;
det = sqrt(det);
float t = b - det;
if (t > 1e-3) return t;
t = b + det;
return t > 1e-3 ? t : 0.;
}
int intersect(Ray r, out float t, out Sphere s, int avoid) {
int id = -1; t = 1e5;
for (int i = 0; i < NUM_SPHERES; ++i) {
float d = iSphere(spheres[i], r);
if (i != avoid && d > 0. && d < t) { t = d; id = i; s = spheres[i]; }
}
return id;
}
vec3 cosineDirection(vec3 n) {
float u = frand(), v = frand();
float a = 6.2831853 * v;
float b = 2.0 * u - 1.0;
vec3 dir = vec3(sqrt(1.0 - b * b) * vec2(cos(a), sin(a)), b);
return normalize(n + dir);
}
vec3 coneSample(vec3 d, float phi, float sina, float cosa) {
vec3 w = normalize(d);
vec3 u = normalize(cross(w.yzx, w));
vec3 v = cross(w, u);
return (u * cos(phi) + v * sin(phi)) * sina + w * cosa;
}
vec3 radiance(Ray r) {
vec3 acc = vec3(0.), mask = vec3(1.);
int id = -1;
for (int depth = 0; depth < MAX_BOUNCES; ++depth) {
float t; Sphere obj;
if ((id = intersect(r, t, obj, id)) < 0) break;
vec3 x = r.o + r.d * t;
vec3 n = normalize(x - obj.p);
vec3 nl = n * sign(-dot(n, r.d));
if (depth > 3) {
float p = max(obj.c.r, max(obj.c.g, obj.c.b));
if (frand() > p) { acc += mask * obj.e; break; }
mask /= p;
}
if (obj.refl == MAT_DIFF) {
vec3 d = cosineDirection(nl);
vec3 e = vec3(0.);
#ifdef ENABLE_NEE
{
Sphere ls = spheres[8];
vec3 l0 = ls.p - x;
float cos_a_max = sqrt(1. - clamp(ls.r * ls.r / dot(l0, l0), 0., 1.));
float cosa = mix(cos_a_max, 1., frand());
vec3 l = coneSample(l0, 6.2831853 * frand(), sqrt(1. - cosa * cosa), cosa);
float st; Sphere dummy;
if (intersect(Ray(x, l), st, dummy, id) == 8) {
float omega = 6.2831853 * (1. - cos_a_max);
e = ls.e * clamp(dot(l, nl), 0., 1.) * omega / PI;
}
}
#endif
acc += mask * obj.e + mask * obj.c * e;
mask *= obj.c;
r = Ray(x + nl * 1e-3, d);
} else if (obj.refl == MAT_SPEC) {
acc += mask * obj.e;
mask *= obj.c;
r = Ray(x + nl * 1e-3, reflect(r.d, n));
} else {
acc += mask * obj.e;
float a = dot(n, r.d), ddn = abs(a);
float nc = 1., nt = IOR_GLASS;
float nnt = mix(nc / nt, nt / nc, float(a > 0.));
float cos2t = 1. - nnt * nnt * (1. - ddn * ddn);
r = Ray(x, reflect(r.d, n));
if (cos2t > 0.) {
vec3 tdir = normalize(r.d * nnt + sign(a) * n * (ddn * nnt + sqrt(cos2t)));
float R0 = (nt - nc) * (nt - nc) / ((nt + nc) * (nt + nc));
float c = 1. - mix(ddn, dot(tdir, n), float(a > 0.));
float Re = R0 + (1. - R0) * c * c * c * c * c;
float P = .25 + .5 * Re;
if (frand() < P) { mask *= Re / P; }
else { mask *= obj.c * (1. - Re) / (1. - P); r = Ray(x, tdir); }
}
}
}
return acc;
}
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
initScene();
srand(ivec2(fragCoord), iFrame);
vec2 uv = 2. * fragCoord / iResolution.xy - 1.;
vec3 camPos = vec3(50., 40.8, 169.);
vec3 cz = normalize(vec3(50., 40., 81.6) - camPos);
vec3 cx = vec3(1., 0., 0.);
vec3 cy = normalize(cross(cx, cz));
cx = cross(cz, cy);
vec3 color = vec3(0.);
for (int i = 0; i < SAMPLES_PER_FRAME; i++) {
vec2 jitter = vec2(frand(), frand()) - 0.5;
vec2 suv = uv + jitter * 2.0 / iResolution.xy;
float fov = 0.53135;
vec3 rd = normalize(fov * (iResolution.x / iResolution.y * suv.x * cx + suv.y * cy) + cz);
color += radiance(Ray(camPos, rd));
}
vec4 prev = texelFetch(iChannel0, ivec2(fragCoord), 0);
if (iFrame == 0) prev = vec4(0.);
fragColor = prev + vec4(color, float(SAMPLES_PER_FRAME));
}
```
**Image Pass** (iChannel0 = Buffer A):
```glsl
vec3 ACES(vec3 x) {
float a = 2.51, b = 0.03, c = 2.43, d = 0.59, e = 0.14;
return (x * (a * x + b)) / (x * (c * x + d) + e);
}
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
vec4 data = texelFetch(iChannel0, ivec2(fragCoord), 0);
vec3 col = data.rgb / max(data.a, 1.0);
col = ACES(col);
col = pow(clamp(col, 0., 1.), vec3(1.0 / 2.2));
vec2 uv = fragCoord / iResolution.xy;
col *= 0.5 + 0.5 * pow(16.0 * uv.x * uv.y * (1.0 - uv.x) * (1.0 - uv.y), 0.1);
fragColor = vec4(col, 1.0);
}
```
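The ACES fit above is a pure scalar curve, so its shape can be checked directly: it maps 0 to 0, lifts midtones, and compresses highlights toward an asymptote slightly above 1 (about 2.51/2.43), which is why the shader clamps before gamma:

```javascript
// The ACES filmic fit used in the Image pass, evaluated per channel.
const ACES = x => (x * (2.51 * x + 0.03)) / (x * (2.43 * x + 0.59) + 0.14);

const black = ACES(0.0);        // 0
const mid = ACES(0.5);          // lifted midtone
const bright = ACES(4.0);       // compressed highlight, still below 1
const veryBright = ACES(100.0); // past the asymptote crossing, above 1
```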
## Common Variants
### 1. SDF Scene Path Tracing
```glsl
float map(vec3 p) {
float d = p.y + 0.5;
d = min(d, length(p - vec3(0., 0.4, 0.)) - 0.4);
return d;
}
float intersectScene(vec3 ro, vec3 rd, float tmax) {
float t = 0.01;
for (int i = 0; i < 128; i++) {
float h = map(ro + rd * t);
if (h < 0.0001 || t > tmax) break;
t += h;
}
return t < tmax ? t : -1.0;
}
```
### 2. Disney BRDF Path Tracing
```glsl
struct Material { vec3 albedo; float metallic, roughness; };
float D_GGX(float a2, float NoH) {
float d = NoH * NoH * (a2 - 1.0) + 1.0;
return a2 / (PI * d * d);
}
float G_Smith(float NoV, float NoL, float a2) {
float g1 = (2.0 * NoV) / (NoV + sqrt(a2 + (1.0 - a2) * NoV * NoV));
float g2 = (2.0 * NoL) / (NoL + sqrt(a2 + (1.0 - a2) * NoL * NoL));
return g1 * g2;
}
vec3 SampleGGXVNDF(vec3 V, float ax, float ay, float r1, float r2) {
vec3 Vh = normalize(vec3(ax * V.x, ay * V.y, V.z));
float lensq = Vh.x * Vh.x + Vh.y * Vh.y;
vec3 T1 = lensq > 0. ? vec3(-Vh.y, Vh.x, 0) * inversesqrt(lensq) : vec3(1, 0, 0);
vec3 T2 = cross(Vh, T1);
float r = sqrt(r1), phi = 2.0 * PI * r2;
float t1 = r * cos(phi), t2 = r * sin(phi);
float s = 0.5 * (1.0 + Vh.z);
t2 = (1.0 - s) * sqrt(1.0 - t1 * t1) + s * t2;
vec3 Nh = t1 * T1 + t2 * T2 + sqrt(max(0., 1. - t1*t1 - t2*t2)) * Vh;
return normalize(vec3(ax * Nh.x, ay * Nh.y, max(0., Nh.z)));
}
```
### 3. Depth of Field
```glsl
#define APERTURE 0.12
#define FOCUS_DIST 8.0
vec2 uniformDisk() {
vec2 r = vec2(frand(), frand());
return sqrt(r.y) * vec2(cos(6.2831853 * r.x), sin(6.2831853 * r.x));
}
// After generating the ray:
vec3 focalPoint = ro + rd * FOCUS_DIST;
vec3 offset = ca * vec3(uniformDisk() * APERTURE, 0.);
ro += offset;
rd = normalize(focalPoint - ro);
```
### 4. MIS (Multiple Importance Sampling)
```glsl
float misWeight(float pdfA, float pdfB) {
float a2 = pdfA * pdfA, b2 = pdfB * pdfB;
return a2 / (a2 + b2);
}
// BRDF sample hits light -> misWeight(brdfPdf, lightPdf)
// Light sample -> misWeight(lightPdf, brdfPdf)
```
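The power heuristic above has a useful invariant: for any (pdfA, pdfB) pair, the two complementary weights sum to 1, so combining BRDF sampling and light sampling stays unbiased. A quick JavaScript check (the pdf values are arbitrary):

```javascript
// Power heuristic (beta = 2): weight a sample by its pdf squared.
const misWeight = (pdfA, pdfB) => {
  const a2 = pdfA * pdfA, b2 = pdfB * pdfB;
  return a2 / (a2 + b2);
};

const wBrdf = misWeight(0.8, 0.2);  // BRDF sample that also hit the light
const wLight = misWeight(0.2, 0.8); // the matching light-sample weight
const total = wBrdf + wLight;       // should be 1
```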
### 5. Volumetric Path Tracing (Participating Media)
```glsl
vec3 transmittance = exp(-extinction * distance);
float scatterDist = -log(frand()) / extinctionMajorant;
if (scatterDist < hitDist) {
pos += ray.d * scatterDist;
ray.d = uniformSphereSample(); // or Henyey-Greenstein
throughput *= albedo;
}
```
## Performance & Composition
- 1-4 spp per frame + inter-frame accumulation for convergence; Russian roulette from bounce 3-4, survival probability = max throughput component
- NEE significantly accelerates small light sources; offset along normal by 1e-3~1e-4 or record hit ID to prevent self-intersection
- `min(color, 10.)` to prevent fireflies; SDF limited to 128-256 steps + reasonable tmax; integer hash preferred over sin-hash
- **Composition**: SDF modeling / HDR environment maps / Disney BRDF (GGX+VNDF) / volume rendering (Beer-Lambert) / spectral rendering (Sellmeier+CIE XYZ) / TAA (temporal reprojection)
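The Russian-roulette rule in the first bullet can be sketched host-side. A JavaScript illustration (the function name and host-side framing are mine, not part of the shader templates): survive with probability p = max throughput component, and divide the surviving throughput by p so the estimator stays unbiased.

```javascript
// Russian roulette from bounce 3-4; survival probability = max throughput component
function russianRoulette(throughput, bounce, rand) {
  if (bounce < 3) return throughput;            // always continue early bounces
  const p = Math.min(1, Math.max(throughput[0], throughput[1], throughput[2]));
  if (rand >= p) return null;                   // terminate the path
  return throughput.map(c => c / p);            // compensate: keeps the estimate unbiased
}

console.log(russianRoulette([0.5, 0.25, 0.1], 4, 0.3)); // survives -> [1, 0.5, 0.2]
console.log(russianRoulette([0.5, 0.25, 0.1], 4, 0.9)); // null (terminated)
```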
## Further Reading
For complete step-by-step tutorials, mathematical derivations, and advanced usage, see [reference](../reference/path-tracing-gi.md)

## WebGL2 Adaptation Requirements
Code templates in this document use ShaderToy GLSL style. When generating standalone HTML pages, you must adapt to WebGL2:
- Use `canvas.getContext("webgl2")`
- **IMPORTANT: Version directive must strictly be on the first line**: When injecting shader code into HTML, ensure nothing precedes `#version 300 es` — no newlines, spaces, comments, or other characters. Common pitfall: accidentally adding `\n` when concatenating template strings, causing the version directive to appear on line 2-3
- First line of shader: `#version 300 es`, add `precision highp float;` for fragment shaders
- Vertex shader: `attribute` → `in`, `varying` → `out`
- Fragment shader: `varying` → `in`, `gl_FragColor` → custom `out vec4 fragColor`, `texture2D()` → `texture()`
- ShaderToy's `void mainImage(out vec4 fragColor, in vec2 fragCoord)` must be adapted to standard `void main()` entry
**IMPORTANT: GLSL Type Strictness Warning**:
- `vec2 = float` is illegal: types must match exactly, e.g., `float r = length(uv)` not `vec2 r = length(uv)`
- Function return types must match: commonly used `fbm()` / `noise()` return `float` and cannot be assigned to `vec2`
- If you need a vec2 type, use `vec2(fbm(...), fbm(...))` or `vec2(value)` constructor
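One defensive host-side pattern for the `#version` pitfall is to normalize the shader string before compiling, so the directive is guaranteed to be on line 1 no matter how the template string was assembled. This is a sketch (`prepareShaderSource` is an illustrative name, not part of the templates below):

```javascript
// Ensure "#version 300 es" ends up on line 1 regardless of how the template
// string was concatenated (leading newlines/spaces are the common bug).
function prepareShaderSource(src) {
  const trimmed = src.replace(/^\s+/, "");      // strip leading whitespace and newlines
  if (!trimmed.startsWith("#version")) {
    throw new Error("shader source must begin with a #version directive");
  }
  return trimmed;
}

const src = "\n  #version 300 es\nprecision highp float;\nvoid main(){}";
console.log(prepareShaderSource(src).split("\n")[0]); // "#version 300 es"
```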
# Polar Coordinates & UV Manipulation
## Use Cases
- Radially symmetric effects: flowers, kaleidoscopes, gears, radial patterns
- Spiral patterns: galaxies, vortices, spiral staircases
- Ring/tunnel effects: tube flying, torus twisting, circular UI elements
- Polar coordinate shapes: cardioid, rose curves, stars, and other shapes defined by r(θ)
- Vortex animations: swirls, rotational warping, card game backgrounds (e.g., Balatro)
- Fractal/repetitive structures: recursive symmetric patterns based on angular subdivision
## Core Principles
Polar coordinates convert (x, y) to (r, θ):
- **r = length(p)** — distance to origin
- **θ = atan(y, x)** — angle from positive x-axis, range [-π, π]
Inverse transform: x = r·cos(θ), y = r·sin(θ)
Manipulation effects:
- Modifying θ → rotation, warping, kaleidoscope
- Modifying r → scaling, radial ripples
- θ += f(r) → spiral effect
| Spiral Type | Equation | Code |
|------------|----------|------|
| Archimedean spiral | r = a + bθ | `theta += radius` |
| Logarithmic spiral | r = ae^(bθ) | `theta += log(radius)` |
| Rose curve | r = cos(nθ) | `r - A*sin(n*theta)` |
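The forward and inverse transforms are exact inverses (up to floating point). A quick JavaScript round-trip check, mirroring the `toPolar` / `toRect` helpers used in the steps below:

```javascript
// (x, y) -> (r, theta) -> (x, y) round trip, mirroring the GLSL helpers
const toPolar = ([x, y]) => [Math.hypot(x, y), Math.atan2(y, x)];
const toRect  = ([r, t]) => [r * Math.cos(t), r * Math.sin(t)];

const p = [0.3, -0.7];
const [r, theta] = toPolar(p);
const q = toRect([r, theta]);
console.log(Math.abs(q[0] - p[0]) < 1e-12 && Math.abs(q[1] - p[1]) < 1e-12); // true
```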
## Implementation Steps
### Step 1: UV Normalization and Centering
```glsl
// Range [-1, 1], most commonly used
vec2 uv = (2.0 * fragCoord - iResolution.xy) / min(iResolution.x, iResolution.y);
// Range [-aspect, aspect] x [-1, 1]
vec2 uv = (2.0 * fragCoord - iResolution.xy) / iResolution.y;
// Pixelated style (Balatro style)
float pixel_size = length(iResolution.xy) / PIXEL_FILTER;
vec2 uv = (floor(fragCoord * (1.0/pixel_size)) * pixel_size - 0.5*iResolution.xy) / length(iResolution.xy);
```
### Step 2: Cartesian → Polar Coordinates
```glsl
float r = length(uv);
float theta = atan(uv.y, uv.x); // [-PI, PI]
// Reusable function
vec2 toPolar(vec2 p) { return vec2(length(p), atan(p.y, p.x)); }
// Normalized angle to [0, 1]
vec2 polar = vec2(atan(uv.y, uv.x) / 6.283 + 0.5, length(uv));
```
### Step 3: Polar Space Operations
**3a. Radial Swirl**
```glsl
float spin_amount = 0.25;
float new_theta = theta - spin_amount * 20.0 * r;
```
**3b. Angular Twist**
```glsl
float twist_angle = theta + 2.0 * iTime + sin(theta) * sin(iTime) * 3.14159;
```
**3c. Archimedean Spiral**
```glsl
vec2 spiral_uv = vec2(theta_normalized, r);
spiral_uv.y -= spiral_uv.x; // Unwrap into spiral band
```
**3d. Logarithmic Spiral**
```glsl
float shear = 2.0 * log(max(r, 1e-3)); // clamp away from r = 0 (pole safety)
float c = cos(shear), s = sin(shear);
mat2 spiral_mat = mat2(c, -s, s, c);
```
**3e. Kaleidoscope**
```glsl
float rep = 12.0; // Number of symmetry axes
float sector = TAU / rep;
float a = polar.y;
float c_idx = floor((a + sector * 0.5) / sector);
a = mod(a + sector * 0.5, sector) - sector * 0.5;
a *= mod(c_idx, 2.0) * 2.0 - 1.0; // Mirror
```
**3f. Spiral Arm Compression**
```glsl
float NB_ARMS = 5.0;
float COMPR = 0.1;
float phase = NB_ARMS * (theta - shear);
theta = theta - COMPR * cos(phase);
float arm_density = 1.0 + NB_ARMS * COMPR * sin(phase);
```
### Step 4: Polar → Cartesian Reconstruction
```glsl
vec2 new_uv = vec2(r * cos(new_theta), r * sin(new_theta));
vec2 toRect(vec2 p) { return vec2(p.x * cos(p.y), p.x * sin(p.y)); }
// Balatro-style round-trip (offset to screen center)
vec2 mid = (iResolution.xy / length(iResolution.xy)) / 2.0;
vec2 warped_uv = vec2(r * cos(new_theta) + mid.x, r * sin(new_theta) + mid.y) - mid;
```
### Step 5: Polar Coordinate Shape SDF
```glsl
// Cardioid
float a = atan(uv.x, uv.y) / 3.141593; // atan(x,y) makes the heart face upward
float h = abs(a);
float heart_r = (13.0*h - 22.0*h*h + 10.0*h*h*h) / (6.0 - 5.0*h);
float dist = r - heart_r;
// Rose curve
float rose_dist = abs(r - A_coeff * sin(PETAL_FREQ * theta) - 0.5);
// Rendering
float shape = smoothstep(0.01, -0.01, dist);
```
### Step 6: Coloring and Anti-Aliasing
```glsl
// fwidth adaptive anti-aliasing
float aa = smoothstep(-1.0, 1.0, value / fwidth(value));
// Resolution-based anti-aliasing
float aa_size = 2.0 / iResolution.y;
float edge = smoothstep(0.5 - aa_size, 0.5 + aa_size, value);
// Radial gradient coloring
vec3 color = vec3(1.0, 0.4 * r, 0.3);
color *= 1.0 - 0.4 * r;
// Inter-spiral-band anti-aliasing
float inter_spiral_aa = 1.0 - pow(abs(2.0 * fract(spiral_uv.y) - 1.0), 10.0);
```
## Complete Code Template
```glsl
// === Polar Coordinates & UV Manipulation Complete Template ===
// Paste directly into ShaderToy to run
#define PI 3.14159265359
#define TAU 6.28318530718
// ===== Adjustable Parameters =====
#define MODE 0 // 0=swirl, 1=spiral, 2=kaleidoscope, 3=rose curve
#define SPIRAL_TYPE 0 // 0=Archimedean, 1=logarithmic (MODE=1)
#define NUM_ARMS 5.0 // Number of spiral arms (MODE=1)
#define KALEID_SEGMENTS 6.0 // Kaleidoscope segments (MODE=2)
#define PETAL_COUNT 5.0 // Number of petals (MODE=3)
#define SWIRL_STRENGTH 3.0 // Swirl intensity (MODE=0)
#define ANIM_SPEED 1.0 // Animation speed
#define COLOR_SCHEME 0 // 0=warm, 1=cool, 2=rainbow
vec2 toPolar(vec2 p) {
return vec2(length(p), atan(p.y, p.x));
}
vec2 toRect(vec2 p) {
return vec2(p.x * cos(p.y), p.x * sin(p.y));
}
vec2 kaleidoscope(vec2 polar, float segments) {
float sector = TAU / segments;
float a = polar.y;
float c = floor((a + sector * 0.5) / sector);
a = mod(a + sector * 0.5, sector) - sector * 0.5;
a *= mod(c, 2.0) * 2.0 - 1.0;
return vec2(polar.x, a);
}
vec3 getColor(float t, int scheme) {
if (scheme == 1) return 0.5 + 0.5 * cos(TAU * (t + vec3(0.0, 0.33, 0.67)));
if (scheme == 2) return 0.5 + 0.5 * cos(TAU * t + vec3(0.0, 2.1, 4.2));
return vec3(1.0, 0.4 + 0.4 * cos(t * TAU), 0.3 + 0.2 * sin(t * TAU));
}
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
vec2 uv = (2.0 * fragCoord - iResolution.xy) / min(iResolution.x, iResolution.y);
vec2 polar = toPolar(uv);
float r = polar.x;
float theta = polar.y;
float t = iTime * ANIM_SPEED;
vec3 col = vec3(0.0);
float aa = 2.0 / iResolution.y;
#if MODE == 0
// --- Swirl mode ---
float swirl_theta = theta - SWIRL_STRENGTH * r + t;
vec2 warped = toRect(vec2(r, swirl_theta));
warped *= 10.0;
float pattern = sin(warped.x) * cos(warped.y);
pattern += 0.5 * sin(2.0 * warped.x + t) * cos(2.0 * warped.y - t);
float val = smoothstep(-0.1, 0.1, pattern);
col = mix(
getColor(r * 0.5, COLOR_SCHEME),
getColor(r * 0.5 + 0.5, COLOR_SCHEME),
val
);
col *= exp(-r * 0.5);
#elif MODE == 1
// --- Spiral mode ---
#if SPIRAL_TYPE == 0
float spiral = theta / TAU + 0.5;
float bands = spiral + r;
bands -= t * 0.1;
float arm = fract(bands * NUM_ARMS);
#else
float shear = 2.0 * log(max(r, 0.001));
float phase = NUM_ARMS * (theta - shear);
float arm = 0.5 + 0.5 * cos(phase);
arm *= 1.0 + NUM_ARMS * 0.1 * sin(phase);
#endif
float brightness = smoothstep(0.0, 0.4, arm) * smoothstep(1.0, 0.6, arm);
col = getColor(theta / TAU + t * 0.05, COLOR_SCHEME) * brightness;
col *= exp(-r * r * 0.5);
col += 0.15 * exp(-r * r * 8.0);
#elif MODE == 2
// --- Kaleidoscope mode ---
vec2 kp = kaleidoscope(polar, KALEID_SEGMENTS);
vec2 rect = toRect(kp);
rect *= 4.0;
rect += vec2(t * 0.3, 0.0);
vec2 cell_id = floor(rect + 0.5);
vec2 cell_uv = fract(rect + 0.5) - 0.5;
float cell_hash = fract(sin(dot(cell_id, vec2(127.1, 311.7))) * 43758.5453);
float d = length(cell_uv);
float truchet = abs(d - 0.35);
if (cell_hash > 0.5) {
truchet = min(truchet, abs(length(cell_uv - 0.5) - 0.5));
} else {
truchet = min(truchet, abs(length(cell_uv + 0.5) - 0.5));
}
col = getColor(cell_hash + r * 0.2, COLOR_SCHEME);
col *= smoothstep(0.05, 0.0, truchet - 0.03);
col *= smoothstep(3.0, 0.0, r);
#elif MODE == 3
// --- Rose curve mode ---
float rose_r = 0.6 * cos(PETAL_COUNT * theta + t);
float dist = abs(r - abs(rose_r));
float ribbon_width = 0.04;
float rose_shape = smoothstep(ribbon_width + aa, ribbon_width - aa, dist);
float depth = 0.5 + 0.5 * cos(PETAL_COUNT * theta + t);
col = getColor(theta / TAU, COLOR_SCHEME) * depth;
col *= rose_shape;
float center = smoothstep(0.08 + aa, 0.08 - aa, r);
col += getColor(0.5, COLOR_SCHEME) * center * 0.5;
#endif
col = pow(col, vec3(1.0 / 2.2));
fragColor = vec4(col, 1.0);
}
```
## Common Variants
### Variant 1: Dynamic Vortex Background (Balatro Style)
Cartesian→Polar→Cartesian round-trip + iterative domain warping
```glsl
float new_angle = atan(uv.y, uv.x) + speed
- SPIN_EASE * 20.0 * (SPIN_AMOUNT * uv_len + (1.0 - SPIN_AMOUNT));
vec2 mid = (screenSize.xy / length(screenSize.xy)) / 2.0;
uv = vec2(uv_len * cos(new_angle) + mid.x,
uv_len * sin(new_angle) + mid.y) - mid;
uv *= 30.0;
vec2 uv2 = uv; // declare the warp accumulator before the loop (undeclared in the original sketch)
for (int i = 0; i < 5; i++) {
uv2 += sin(max(uv.x, uv.y)) + uv;
uv += 0.5 * vec2(cos(5.1123 + 0.353*uv2.y + speed*0.131),
sin(uv2.x - 0.113*speed));
uv -= cos(uv.x + uv.y) - sin(uv.x*0.711 - uv.y);
}
```
### Variant 2: Polar Torus Twist (Ring Twister Style)
Direct rendering in polar space, angular slicing to simulate 3D torus
```glsl
vec2 uvr = vec2(length(uv), atan(uv.y, uv.x) + PI);
uvr.x -= OUT_RADIUS;
float twist = uvr.y + 2.0*iTime + sin(uvr.y)*sin(iTime)*PI;
for (int i = 0; i < NUM_FACES; i++) {
float x0 = IN_RADIUS * sin(twist + TAU * float(i) / float(NUM_FACES));
float x1 = IN_RADIUS * sin(twist + TAU * float(i+1) / float(NUM_FACES));
vec4 face = slice(x0, x1, uvr);
col = mix(col, face.rgb, face.a);
}
```
### Variant 3: Galaxy / Logarithmic Spiral (Galaxy Style)
`log(r)` equiangular spiral + FBM noise + spiral arm compression
```glsl
float rho = length(uv);
float ang = atan(uv.y, uv.x);
float shear = 2.0 * log(rho);
mat2 R = mat2(cos(shear), -sin(shear), sin(shear), cos(shear));
float phase = NB_ARMS * (ang - shear);
ang = ang - COMPR * cos(phase) + SPEED * t;
uv = rho * vec2(cos(ang), sin(ang));
float gaz = fbm_noise(0.09 * R * uv);
```
### Variant 4: Archimedean Spiral Band (Wave Greek Frieze Style)
Polar unwrap into spiral band, creating vortex animation within the band
```glsl
U = vec2(atan(U.y, U.x)/TAU + 0.5, length(U)); // reassign the existing U; a fresh declaration here would use U in its own initializer, which is illegal
U.y -= U.x; // Archimedean unwrap
U.x = arc_length(ceil(U.y) + U.x) - iTime; // Arc length parameterization
vec2 cell_uv = fract(U) - 0.5;
float vortex = dot(cell_uv,
cos(vec2(-33.0, 0.0)
+ 0.3 * (iTime + cell_id.x)
* max(0.0, 0.5 - length(cell_uv))));
```
### Variant 5: Complex / Polar Duality (Jeweled Vortex Style)
Complex arithmetic replaces explicit trigonometric functions for conformal mapping
```glsl
float e = n * 2.0;
float a = atan(u.y, u.x) - PI/2.0;
float r = exp(log(length(u)) / e); // r^(1/e)
float sc = ceil(r - a/TAU);
float s = pow(sc + a/TAU, 2.0);
col += sin(cr + s/n * TAU / 2.0);
col *= cos(cr + s/n * TAU);
col *= pow(abs(sin((r - a/TAU) * PI)), abs(e) + 5.0);
```
## Performance & Composition
### Performance Tips
- **Pole safety**: `float r = max(length(uv), 1e-6);` to avoid division by zero
- **Trigonometric optimization**: When both sin/cos are needed, use a rotation matrix `mat2 ROT(float a) { float c=cos(a),s=sin(a); return mat2(c,s,-s,c); }`
- **Kaleidoscope is naturally optimized**: All expensive computation happens in a single sector, visual complexity ×N
- **Loop control**: Rose curves and other multi-layer effects look good at 4-8 iterations; avoid going much higher
- **Pixel downsampling**: `floor(fragCoord / pixel_size) * pixel_size` quantizes coordinates to reduce computation
### Composition Tips
- **Polar + FBM**: Sample noise in transformed space → organic spiral textures
- **Polar + Truchet**: Lay Truchet tiles after kaleidoscope folding → geometric tunnel effects
- **Polar + SDF**: `r(θ)` defines contour + SDF boolean operations / glow
- **Polar + Checkerboard**: `sign(sin(u*PI*4.0)*cos(uvr.y*16.0))` → circular checkerboard
- **Polar + Post-Processing**: Gamma + vignette + contrast enhancement for improved visual quality
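The pixel-downsampling tip above quantizes fragment coordinates to a block grid so every fragment inside a block computes with the same coordinate. The same arithmetic, lifted into JavaScript for quick verification (`quantize` is an illustrative name):

```javascript
// Snap a fragment coordinate to pixelSize-aligned blocks,
// matching the GLSL floor(fragCoord / pixel_size) * pixel_size idiom.
function quantize(fragCoord, pixelSize) {
  return fragCoord.map(c => Math.floor(c / pixelSize) * pixelSize);
}

console.log(quantize([103.7, 58.2], 8)); // [96, 56]
```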
## Further Reading
For complete step-by-step tutorials, mathematical derivations, and advanced usage, see [reference](../reference/polar-uv-manipulation.md)

## WebGL2 Adaptation Requirements
**IMPORTANT (Standalone HTML Deployment)**: Post-processing effects require an input texture to work. When generating standalone HTML, you must:
1. Set `#define USE_DEMO_SCENE 1` to use the built-in demo scene (recommended), or
2. Pass a valid input texture to the `iChannel0` channel, otherwise the screen will be completely black
3. **Critical**: When USE_DEMO_SCENE=1, ensure the #else branch code does not reference non-existent uniforms (e.g., iChannel0)
**IMPORTANT: GLSL Type Strictness Rules**:
- `vec2 = float` is illegal — must use `vec2(x, x)` or `vec2(x)`
- A variable cannot appear in its own initializer (e.g., `float w = filmicCurve(w, w)` is an error)
- Variables must be declared before use
- **#version must be the very first line of shader code**: No characters (including whitespace or comments) may precede `#version 300 es`
- **Code in preprocessor branches is still compiled**: Even if `#if USE_DEMO_SCENE` is true, the `#else` branch code is still compiled by the GPU — all branches must be valid GLSL code
Code templates in this document use ShaderToy GLSL style. When generating standalone HTML pages, you must adapt to WebGL2:
- Use `canvas.getContext("webgl2")`
- First line of shader: `#version 300 es`, add `precision highp float;` for fragment shaders
- Vertex shader: `attribute` → `in`, `varying` → `out`
- Fragment shader: `varying` → `in`, `gl_FragColor` → custom `out vec4 fragColor`, `texture2D()` → `texture()`
- ShaderToy's `void mainImage(out vec4 fragColor, in vec2 fragCoord)` must be adapted to standard `void main()` entry
- Must create Framebuffers and render to texture before post-processing
### Complete WebGL2 Standalone HTML Template
```html
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>Post-Processing Shader</title>
<style>
body { margin: 0; overflow: hidden; background: #000; }
canvas { display: block; width: 100vw; height: 100vh; }
</style>
</head>
<body>
<canvas id="canvas"></canvas>
<!-- Vertex Shader: #version must be the first line -->
<script id="vs" type="x-shader/x-vertex">
#version 300 es
in vec2 a_position;
out vec2 v_uv;
void main() {
v_uv = a_position * 0.5 + 0.5;
gl_Position = vec4(a_position, 0.0, 1.0);
}
</script>
<!-- Fragment Shader: #version must be the first line, precision follows -->
<script id="fs" type="x-shader/x-fragment">
#version 300 es
precision highp float;
in vec2 v_uv;
out vec4 fragColor;
uniform float iTime;
uniform vec2 iResolution;
// Note: Do not use iChannel0 in standalone HTML unless a valid texture is bound
// Recommended: Use USE_DEMO_SCENE=1 for the built-in demo scene
#define USE_DEMO_SCENE 1 // Recommended: use built-in demo scene
// Demo scene function (replaces iChannel0 sampling)
vec3 demoScene(vec2 uv, float time) {
vec3 col = 0.5 + 0.5 * cos(time + uv.xyx + vec3(0.0, 2.0, 4.0));
float d = length(uv - 0.5) - 0.12;
col += vec3(3.0, 2.5, 1.8) * smoothstep(0.02, 0.0, d);
return col;
}
// Tone mapping and other post-processing functions...
// Note: Do not reference iChannel0 in the #else branch of #if USE_DEMO_SCENE
// If you need iChannel0, use #ifdef or ensure the texture is bound when USE_DEMO_SCENE=0
void main() {
vec2 uv = v_uv;
vec3 color;
#if USE_DEMO_SCENE
color = demoScene(uv, iTime);
#else
// This branch only executes when USE_DEMO_SCENE=0 and a texture is bound
// Requires binding in JavaScript: gl.bindTexture(gl.TEXTURE_2D, texture);
color = vec3(0.0); // fallback
#endif
fragColor = vec4(color, 1.0);
}
</script>
<script>
const canvas = document.getElementById('canvas');
const gl = canvas.getContext('webgl2');
// Compile shader
function createShader(gl, type, source) {
const shader = gl.createShader(type);
gl.shaderSource(shader, source);
gl.compileShader(shader);
if (!gl.getShaderParameter(shader, gl.COMPILE_STATUS)) {
console.error(gl.getShaderInfoLog(shader));
gl.deleteShader(shader);
return null;
}
return shader;
}
// Create program
const vs = createShader(gl, gl.VERTEX_SHADER, document.getElementById('vs').textContent);
const fs = createShader(gl, gl.FRAGMENT_SHADER, document.getElementById('fs').textContent);
const program = gl.createProgram();
gl.attachShader(program, vs);
gl.attachShader(program, fs);
gl.linkProgram(program);
// Fullscreen quad
const positions = new Float32Array([-1,-1, 1,-1, -1,1, 1,1]);
const vao = gl.createVertexArray();
gl.bindVertexArray(vao);
const buffer = gl.createBuffer();
gl.bindBuffer(gl.ARRAY_BUFFER, buffer);
gl.bufferData(gl.ARRAY_BUFFER, positions, gl.STATIC_DRAW);
const posLoc = gl.getAttribLocation(program, 'a_position');
gl.enableVertexAttribArray(posLoc);
gl.vertexAttribPointer(posLoc, 2, gl.FLOAT, false, 0, 0);
// Uniform locations
const timeLoc = gl.getUniformLocation(program, 'iTime');
const resLoc = gl.getUniformLocation(program, 'iResolution');
// Render loop
function render(time) {
canvas.width = window.innerWidth;
canvas.height = window.innerHeight;
gl.viewport(0, 0, canvas.width, canvas.height);
gl.useProgram(program);
gl.uniform1f(timeLoc, time * 0.001);
gl.uniform2f(resLoc, canvas.width, canvas.height);
gl.drawArrays(gl.TRIANGLE_STRIP, 0, 4);
requestAnimationFrame(render);
}
requestAnimationFrame(render);
</script>
</body>
</html>
```
### Multi-Pass Post-Processing HTML Template (with FBO)
Bloom separable blur, TAA, multi-step post-processing pipelines, etc. require rendering to intermediate textures. The following skeleton demonstrates the pattern: render scene to FBO → post-processing reads FBO → output to screen:
```html
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>Multi-Pass Post-Processing</title>
<style>
body { margin: 0; overflow: hidden; background: #000; }
canvas { display: block; width: 100vw; height: 100vh; }
</style>
</head>
<body>
<canvas id="c"></canvas>
<script>
let frameCount = 0;
const canvas = document.getElementById('c');
const gl = canvas.getContext('webgl2');
const ext = gl.getExtension('EXT_color_buffer_float');
function createShader(type, src) {
const s = gl.createShader(type);
gl.shaderSource(s, src);
gl.compileShader(s);
if (!gl.getShaderParameter(s, gl.COMPILE_STATUS))
console.error(gl.getShaderInfoLog(s));
return s;
}
function createProgram(vsSrc, fsSrc) {
const p = gl.createProgram();
gl.attachShader(p, createShader(gl.VERTEX_SHADER, vsSrc));
gl.attachShader(p, createShader(gl.FRAGMENT_SHADER, fsSrc));
gl.linkProgram(p);
return p;
}
const vsSource = `#version 300 es
in vec2 pos;
void main(){ gl_Position=vec4(pos,0,1); }`;
// fsScene: Scene rendering shader (outputs HDR color to FBO)
// fsPost: Post-processing shader (samples scene texture from iChannel0, applies bloom/tonemap/etc)
const progScene = createProgram(vsSource, fsScene);
const progPost = createProgram(vsSource, fsPost);
function createFBO(w, h) {
const tex = gl.createTexture();
gl.bindTexture(gl.TEXTURE_2D, tex);
// IMPORTANT: check for the float-texture extension; fall back to RGBA8 if unsupported
const fmt = ext ? gl.RGBA16F : gl.RGBA;
const typ = ext ? gl.FLOAT : gl.UNSIGNED_BYTE;
gl.texImage2D(gl.TEXTURE_2D, 0, fmt, w, h, 0, gl.RGBA, typ, null);
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MIN_FILTER, gl.LINEAR);
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MAG_FILTER, gl.LINEAR);
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_S, gl.CLAMP_TO_EDGE);
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_T, gl.CLAMP_TO_EDGE);
const fbo = gl.createFramebuffer();
gl.bindFramebuffer(gl.FRAMEBUFFER, fbo);
gl.framebufferTexture2D(gl.FRAMEBUFFER, gl.COLOR_ATTACHMENT0, gl.TEXTURE_2D, tex, 0);
gl.bindFramebuffer(gl.FRAMEBUFFER, null);
return { fbo, tex };
}
let W, H, sceneFBO;
const vao = gl.createVertexArray();
gl.bindVertexArray(vao);
const vbo = gl.createBuffer();
gl.bindBuffer(gl.ARRAY_BUFFER, vbo);
gl.bufferData(gl.ARRAY_BUFFER, new Float32Array([-1,-1, 1,-1, -1,1, 1,1]), gl.STATIC_DRAW);
gl.enableVertexAttribArray(0);
gl.vertexAttribPointer(0, 2, gl.FLOAT, false, 0, 0);
function resize() {
canvas.width = W = innerWidth;
canvas.height = H = innerHeight;
sceneFBO = createFBO(W, H);
}
addEventListener('resize', resize);
resize();
function render(t) {
t *= 0.001;
// Pass 1: Scene rendering → FBO
gl.useProgram(progScene);
gl.bindFramebuffer(gl.FRAMEBUFFER, sceneFBO.fbo);
gl.viewport(0, 0, W, H);
gl.uniform2f(gl.getUniformLocation(progScene, 'iResolution'), W, H);
gl.uniform1f(gl.getUniformLocation(progScene, 'iTime'), t);
gl.drawArrays(gl.TRIANGLE_STRIP, 0, 4);
// Pass 2: Post-processing reads scene texture → screen
gl.useProgram(progPost);
gl.bindFramebuffer(gl.FRAMEBUFFER, null);
gl.viewport(0, 0, W, H);
gl.activeTexture(gl.TEXTURE0);
gl.bindTexture(gl.TEXTURE_2D, sceneFBO.tex);
gl.uniform1i(gl.getUniformLocation(progPost, 'iChannel0'), 0);
gl.uniform2f(gl.getUniformLocation(progPost, 'iResolution'), W, H);
gl.uniform1f(gl.getUniformLocation(progPost, 'iTime'), t);
gl.drawArrays(gl.TRIANGLE_STRIP, 0, 4);
frameCount++;
requestAnimationFrame(render);
}
requestAnimationFrame(render);
</script>
</body>
</html>
```
# Screen-Space Post-Processing Effects
## Use Cases
Screen-space image enhancement on already-rendered scenes: Tone Mapping, Bloom, Vignette, Chromatic Aberration, Motion Blur, DoF, FXAA/TAA, Color Grading, Film Grain, Lens Flare, etc.
Typical pipeline order: Scene Rendering → AA → Bloom → Chromatic Aberration → Motion Blur/DoF → Tone Mapping → Color Grading → Contrast → Vignette → Film Grain → Gamma → Dithering.
## Core Principles
The essence of post-processing is **per-pixel transformation of an already-rendered image** — input is a framebuffer texture, output is the transformed color value.
- **Tone Mapping**: HDR [0, ∞) → LDR [0, 1]. Reinhard `c/(1+c)`, Filmic Reinhard (white point/shoulder parameters), ACES (3×3 matrix + rational polynomial), generic rational polynomial
- **Gaussian Blur**: 2D Gaussian kernel is separable into two 1D passes, O(n²) → O(2n)
- **Bloom**: Bright-pass extraction → multi-level Gaussian blur → additive blend back to original
- **Vignette**: Brightness falloff based on pixel distance to center. Multiplicative or radial
- **Chromatic Aberration**: Sample the same texture at different scales for R/G/B channels
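The separability claim is easy to verify numerically: the 2D Gaussian weight at (i, j) is exactly the product of the two 1D weights, which is why the O(n²) tap count collapses into two O(n) passes. A JavaScript sketch:

```javascript
// Unnormalized 1D Gaussian, same shape as the normpdf() used in Step 6
const g = (x, sigma) => Math.exp(-0.5 * x * x / (sigma * sigma));

// 2D weight factors into the product of two 1D weights
const sigma = 7.0, i = 2, j = -3;
const w2d = Math.exp(-0.5 * (i * i + j * j) / (sigma * sigma));
console.log(Math.abs(w2d - g(i, sigma) * g(j, sigma)) < 1e-12); // true
```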
## Implementation Steps
### Step 1: Tone Mapping
```glsl
// Reinhard
vec3 reinhard(vec3 color) { return color / (1.0 + color); }
// Filmic Reinhard (W=white point, T2=shoulder parameter)
// IMPORTANT: GLSL rule: a variable cannot appear in its own initializer
const float W = 1.2, T2 = 7.5; // adjustable
float filmic_reinhard_curve(float x) {
float q = (T2 * T2 + 1.0) * x * x;
return q / (q + x + T2 * T2);
}
vec3 filmic_reinhard(vec3 x) {
float w = filmic_reinhard_curve(W); // compute w using constant W first
return vec3(filmic_reinhard_curve(x.r), filmic_reinhard_curve(x.g), filmic_reinhard_curve(x.b)) / w;
}
// ACES industry standard
vec3 aces_tonemap(vec3 color) {
mat3 m1 = mat3(0.59719,0.07600,0.02840, 0.35458,0.90834,0.13383, 0.04823,0.01566,0.83777);
mat3 m2 = mat3(1.60475,-0.10208,-0.00327, -0.53108,1.10813,-0.07276, -0.07367,-0.00605,1.07602);
vec3 v = m1 * color;
vec3 a = v * (v + 0.0245786) - 0.000090537;
vec3 b = v * (0.983729 * v + 0.4329510) + 0.238081;
return clamp(m2 * (a / b), 0.0, 1.0);
}
// Generic rational polynomial
vec3 rational_tonemap(vec3 x) {
float a=0.010, b=0.132, c=0.010, d=0.163, e=0.101; // adjustable
return (x * (a * x + b)) / (x * (c * x + d) + e);
}
```
### Step 2: Gamma Correction
```glsl
color = pow(color, vec3(1.0 / 2.2)); // after tone mapping; skip if your tone mapper already outputs display-encoded (gamma) values
```
### Step 3: Contrast Enhancement (Hermite S-Curve)
```glsl
color = clamp(color, 0.0, 1.0);
color = color * color * (3.0 - 2.0 * color);
// Controllable intensity: color = mix(color, color*color*(3.0-2.0*color), strength);
// smoothstep equivalent: color = smoothstep(-0.025, 1.0, color);
```
### Step 4: Color Grading
```glsl
color = color * vec3(1.11, 0.89, 0.79); // per-channel multiply (warm tone), adjustable
color = pow(color, vec3(1.3, 1.2, 1.0)); // pow color grading, adjustable
// HSV hue shift: hsv.x = fract(hsv.x + 0.05); hsv.y *= 1.1;
// Desaturation: color = mix(color, vec3(dot(color, vec3(0.299,0.587,0.114))), 0.2);
```
### Step 5: Vignette
```glsl
// Option A: Multiplicative
vec2 q = fragCoord / iResolution.xy;
float vignette = pow(16.0 * q.x * q.y * (1.0 - q.x) * (1.0 - q.y), 0.25);
color *= 0.5 + 0.5 * vignette;
// Option B: Radial distance
vec2 centered = (uv - 0.5) * vec2(iResolution.x / iResolution.y, 1.0);
float vig = mix(1.0, max(0.0, 1.0 - pow(length(centered)/1.414 * 0.6, 3.0)), 0.5);
color *= vig;
// Option C: Inverse quadratic falloff
vec2 p = 1.0 - 2.0 * fragCoord / iResolution.xy;
p.y *= iResolution.y / iResolution.x;
float vig2 = 1.25 / (1.1 + 1.1 * dot(p, p)); vig2 *= vig2;
color *= mix(1.0, smoothstep(0.1, 1.1, vig2), 0.25);
```
### Step 6: Gaussian Blur
```glsl
float normpdf(float x, float sigma) {
return 0.39894 * exp(-0.5 * x * x / (sigma * sigma)) / sigma;
}
vec3 gaussianBlur(sampler2D tex, vec2 fragCoord, vec2 resolution) {
const int KERNEL_SIZE = 11, HALF = 5; // adjustable: KERNEL_SIZE must be odd
float sigma = 7.0; // adjustable
float kernel[KERNEL_SIZE]; float Z = 0.0;
for (int j = 0; j <= HALF; ++j)
kernel[HALF + j] = kernel[HALF - j] = normpdf(float(j), sigma);
for (int j = 0; j < KERNEL_SIZE; ++j) Z += kernel[j];
vec3 result = vec3(0.0);
for (int i = -HALF; i <= HALF; ++i)
for (int j = -HALF; j <= HALF; ++j)
result += kernel[HALF+j] * kernel[HALF+i]
* texture(tex, (fragCoord + vec2(float(i), float(j))) / resolution).rgb;
return result / (Z * Z);
}
```
### Step 7: Bloom (Single Pass, Hardware Mipmap)
```glsl
vec3 simpleBloom(sampler2D tex, vec2 uv) {
vec3 bloom = vec3(0.0); float tw = 0.0; float maxB = 5.0; // adjustable
for (int x = -1; x <= 1; x++)
for (int y = -1; y <= 1; y++) {
vec2 off = vec2(float(x), float(y)) / iResolution.xy; float w = 1.0;
bloom += w * min(vec3(maxB), textureLod(tex, uv+off*exp2(5.0), 5.0).rgb); tw += w;
bloom += w * min(vec3(maxB), textureLod(tex, uv+off*exp2(6.0), 6.0).rgb); tw += w;
bloom += w * min(vec3(maxB), textureLod(tex, uv+off*exp2(7.0), 7.0).rgb); tw += w;
}
return pow(bloom / tw, vec3(1.5)) * 0.3; // adjustable: gamma and intensity
}
// Usage: color = color * 0.8 + simpleBloom(iChannel0, uv);
```
### Step 8: Chromatic Aberration
```glsl
#define CA_SAMPLES 8 // adjustable
#define CA_STRENGTH 0.003 // adjustable
vec3 chromaticAberration(sampler2D tex, vec2 uv) {
vec2 center = uv - 0.5; vec3 color = vec3(0.0);
float rf = 1.0, gf = 1.0, bf = 1.0, f = 1.0 / float(CA_SAMPLES);
for (int i = 0; i < CA_SAMPLES; ++i) {
    color.r += f * texture(tex, 0.5 + center * rf).r;
    color.g += f * texture(tex, 0.5 + center * gf).g;
    color.b += f * texture(tex, 0.5 + center * bf).b;
rf *= 1.0 - CA_STRENGTH; gf *= 1.0 - CA_STRENGTH*0.3; bf *= 1.0 + CA_STRENGTH*0.4;
}
return clamp(color, 0.0, 1.0);
}
```
### Step 9: Film Grain
```glsl
float hash(float c) { return fract(sin(c * 12.9898 + 78.233) * 43758.5453); } // dot() needs matching types, so hash the float directly
#define GRAIN_STRENGTH 0.012 // adjustable
color += vec3(GRAIN_STRENGTH * hash(length(fragCoord / iResolution.xy) + iTime));
// Bayer matrix ordered dithering (eliminates color banding)
const mat4 bayerMatrix = mat4(
vec4(0.,8.,2.,10.), vec4(12.,4.,14.,6.), vec4(3.,11.,1.,9.), vec4(15.,7.,13.,5.));
float orderedDither(vec2 fc) {
return (bayerMatrix[int(fc.x)&3][int(fc.y)&3] + 1.0) / 17.0;
}
color += (orderedDither(fragCoord) - 0.5) * 4.0 / 255.0;
```
### Demo Scene Fallback (Required for Standalone HTML!)
**IMPORTANT**: Standalone HTML deployment must provide an input texture, otherwise post-processing effects will output solid black.
```glsl
// Demo scene fallback: used when no valid input texture is available
vec3 demoScene(vec2 uv, float time) {
// Dynamic gradient background
vec3 col = 0.5 + 0.5 * cos(time + uv.xyx + vec3(0, 2, 4));
// Center glowing sphere (for testing bloom)
float d = length(uv - 0.5) - 0.15;
col += vec3(2.0) * smoothstep(0.02, 0.0, d); // extremely bright region
// Moving highlight bar (for testing bloom bleed)
float bar = step(0.48, uv.y) * step(uv.y, 0.52);
bar *= step(0.0, sin(uv.x * 10.0 - time * 2.0));
col += vec3(1.5, 0.8, 0.3) * bar;
// Colored blocks (for testing chromatic aberration and tone mapping)
vec2 id = floor(uv * 4.0);
float rand = fract(sin(dot(id, vec2(12.9898, 78.233))) * 43758.5453);
vec2 rect = fract(uv * 4.0);
float box = step(0.1, rect.x) * step(rect.x, 0.9) * step(0.1, rect.y) * step(rect.y, 0.9);
col += vec3(rand, 1.0 - rand, 0.5) * box * 0.5;
return col;
}
```
### Step 10: Motion Blur
```glsl
#define MB_SAMPLES 32 // adjustable
#define MB_STRENGTH 0.25 // adjustable
vec3 motionBlur(sampler2D tex, vec2 uv, vec2 velocity) {
vec2 dir = velocity * MB_STRENGTH; vec3 color = vec3(0.0); float tw = 0.0;
for (int i = 0; i < MB_SAMPLES; i++) {
float t = float(i) / float(MB_SAMPLES - 1), w = 1.0 - t;
color += w * textureLod(tex, uv + dir * t, 0.0).rgb; tw += w;
}
return color / tw;
}
```
### Step 11: Depth of Field
```glsl
#define DOF_SAMPLES 64
#define DOF_FOCAL_LENGTH 0.03
float getCoC(float depth, float focusDist) {
float aperture = min(1.0, focusDist * focusDist * 0.5);
return abs(aperture * (DOF_FOCAL_LENGTH * (depth - focusDist))
/ (depth * (focusDist - DOF_FOCAL_LENGTH)));
}
float goldenAngle = 3.14159265 * (3.0 - sqrt(5.0));
vec3 depthOfField(sampler2D tex, vec2 uv, float depth, float focusDist) {
float coc = getCoC(depth, focusDist);
vec3 result = texture(tex, uv).rgb * max(0.001, coc);
float tw = max(0.001, coc);
for (int i = 1; i < DOF_SAMPLES; i++) {
float fi = float(i);
float theta = fi * goldenAngle * float(DOF_SAMPLES);
float r = coc * sqrt(fi) / sqrt(float(DOF_SAMPLES));
vec2 tapUV = uv + vec2(sin(theta), cos(theta)) * r;
    vec4 s = textureLod(tex, tapUV, 0.0); // assumes scene depth is stored in the alpha channel
    float w = max(0.001, getCoC(s.w, focusDist));
result += s.rgb * w; tw += w;
}
return result / tw;
}
```
### Step 12: FXAA
```glsl
vec3 fxaa(sampler2D tex, vec2 fragCoord, vec2 resolution) {
vec2 pp = 1.0 / resolution;
vec4 color = texture(tex, fragCoord * pp);
vec3 luma = vec3(0.299, 0.587, 0.114);
float lumaNW = dot(texture(tex, (fragCoord+vec2(-1.,-1.))*pp).rgb, luma);
float lumaNE = dot(texture(tex, (fragCoord+vec2( 1.,-1.))*pp).rgb, luma);
float lumaSW = dot(texture(tex, (fragCoord+vec2(-1., 1.))*pp).rgb, luma);
float lumaSE = dot(texture(tex, (fragCoord+vec2( 1., 1.))*pp).rgb, luma);
float lumaM = dot(color.rgb, luma);
float lumaMin = min(lumaM, min(min(lumaNW,lumaNE), min(lumaSW,lumaSE)));
float lumaMax = max(lumaM, max(max(lumaNW,lumaNE), max(lumaSW,lumaSE)));
vec2 dir = vec2(-((lumaNW+lumaNE)-(lumaSW+lumaSE)), ((lumaNW+lumaSW)-(lumaNE+lumaSE)));
float dirReduce = max((lumaNW+lumaNE+lumaSW+lumaSE)*0.03125, 1.0/128.0);
dir = clamp(dir * 2.5/(min(abs(dir.x),abs(dir.y))+dirReduce), vec2(-8.0), vec2(8.0)) * pp;
vec3 rgbA = 0.5 * (texture(tex, fragCoord*pp+dir*(1./3.-0.5)).rgb
+ texture(tex, fragCoord*pp+dir*(2./3.-0.5)).rgb);
vec3 rgbB = rgbA*0.5 + 0.25*(texture(tex, fragCoord*pp+dir*-0.5).rgb
+ texture(tex, fragCoord*pp+dir*0.5).rgb);
float lumaB = dot(rgbB, luma);
return (lumaB < lumaMin || lumaB > lumaMax) ? rgbA : rgbB;
}
```
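The `vec3(0.299, 0.587, 0.114)` constant is the Rec.601 luma weighting; because the weights sum to 1, a gray pixel's luma equals its gray level. A quick numeric check (Python):

```python
LUMA = (0.299, 0.587, 0.114)   # Rec.601 luma weights used by the FXAA pass

def luma(rgb):
    return sum(w * c for w, c in zip(LUMA, rgb))

weight_sum = sum(LUMA)          # 1.0: a proper convex combination
gray_luma = luma((0.5, 0.5, 0.5))
```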
## Complete Code Template
Can be run directly in ShaderToy. `iChannel0` is the scene texture.
**Important**: For standalone HTML deployment, you must either:
1. Pass a valid input texture to iChannel0 (or uChannel0), or
2. Set `#define USE_DEMO_SCENE 1` to use the built-in demo scene
```glsl
// Post-Processing Pipeline — ShaderToy Template
#define ENABLE_TONEMAP 1
#define ENABLE_BLOOM 1
#define ENABLE_CA 1
#define ENABLE_VIGNETTE 1
#define ENABLE_GRAIN 1
#define ENABLE_CONTRAST 1
#define USE_DEMO_SCENE 1 // set to 1 to use built-in demo scene (required for standalone HTML)
#define TONEMAP_MODE 2 // 0=Reinhard, 1=Filmic, 2=ACES
#define BRIGHTNESS 1.0
#define WHITE_POINT 1.2
#define SHOULDER 7.5
#define BLOOM_STRENGTH 0.08
#define BLOOM_LOD_START 4.0
#define COLOR_TINT vec3(1.11, 0.89, 0.79)
#define CA_SAMPLES 8
#define CA_INTENSITY 0.003
#define VIG_POWER 0.25
#define GRAIN_AMOUNT 0.012
float hash11(float p) { return fract(sin(p * 12.9898) * 43758.5453); }
// Demo scene fallback: used when no input texture is available
vec3 demoScene(vec2 uv, float time) {
// Dynamic gradient background
vec3 col = 0.5 + 0.5 * cos(time + uv.xyx + vec3(0, 2, 4));
// Center glowing sphere (for testing bloom)
float d = length(uv - 0.5) - 0.15;
col += vec3(2.0) * smoothstep(0.02, 0.0, d);
// Moving highlight bar (for testing bloom bleed)
float bar = step(0.48, uv.y) * step(uv.y, 0.52);
bar *= step(0.0, sin(uv.x * 10.0 - time * 2.0));
col += vec3(1.5, 0.8, 0.3) * bar;
// Colored blocks (for testing chromatic aberration and tone mapping)
vec2 id = floor(uv * 4.0);
float rand = fract(sin(dot(id, vec2(12.9898, 78.233))) * 43758.5453);
vec2 rect = fract(uv * 4.0);
float box = step(0.1, rect.x) * step(rect.x, 0.9) * step(0.1, rect.y) * step(rect.y, 0.9);
col += vec3(rand, 1.0 - rand, 0.5) * box * 0.5;
return col;
}
vec3 tonemapReinhard(vec3 c) { return c / (1.0 + c); }
// Note: filmicCurve takes a single parameter x; the white-point divisor w is computed from WHITE_POINT in tonemapFilmic
float filmicCurve(float x) {
float q = (SHOULDER*SHOULDER+1.0)*x*x; return q/(q+x+SHOULDER*SHOULDER);
}
vec3 tonemapFilmic(vec3 c) {
float w = filmicCurve(WHITE_POINT); // compute w using WHITE_POINT constant first
return vec3(filmicCurve(c.r), filmicCurve(c.g), filmicCurve(c.b)) / w;
}
vec3 tonemapACES(vec3 color) {
mat3 m1 = mat3(0.59719,0.07600,0.02840, 0.35458,0.90834,0.13383, 0.04823,0.01566,0.83777);
mat3 m2 = mat3(1.60475,-0.10208,-0.00327, -0.53108,1.10813,-0.07276, -0.07367,-0.00605,1.07602);
vec3 v = m1*color;
vec3 a = v*(v+0.0245786)-0.000090537;
vec3 b = v*(0.983729*v+0.4329510)+0.238081;
return clamp(m2*(a/b), 0.0, 1.0);
}
vec3 applyTonemap(vec3 c) {
c *= BRIGHTNESS;
#if TONEMAP_MODE == 0
return tonemapReinhard(c);
#elif TONEMAP_MODE == 1
return tonemapFilmic(c);
#else
return tonemapACES(c);
#endif
}
vec3 sampleBloom(sampler2D tex, vec2 uv) {
vec3 bloom = vec3(0.0); float tw = 0.0;
for (int x = -1; x <= 1; x++)
for (int y = -1; y <= 1; y++) {
vec2 off = vec2(float(x),float(y))/iResolution.xy; float w = 1.0;
bloom += w*textureLod(tex, uv+off*exp2(BLOOM_LOD_START), BLOOM_LOD_START).rgb;
bloom += w*textureLod(tex, uv+off*exp2(BLOOM_LOD_START+1.0), BLOOM_LOD_START+1.0).rgb;
bloom += w*textureLod(tex, uv+off*exp2(BLOOM_LOD_START+2.0), BLOOM_LOD_START+2.0).rgb;
tw += w*3.0;
}
return bloom / tw;
}
vec3 applyChromaticAberration(sampler2D tex, vec2 uv) {
vec2 center = 1.0 - 2.0*uv; vec3 color = vec3(0.0);
float rf=1.0, gf=1.0, bf=1.0, f=1.0/float(CA_SAMPLES);
for (int i = 0; i < CA_SAMPLES; ++i) {
color.r += f*texture(tex, 0.5-0.5*(center*rf)).r;
color.g += f*texture(tex, 0.5-0.5*(center*gf)).g;
color.b += f*texture(tex, 0.5-0.5*(center*bf)).b;
rf *= 1.0-CA_INTENSITY; gf *= 1.0-CA_INTENSITY*0.3; bf *= 1.0+CA_INTENSITY*0.4;
}
return clamp(color, 0.0, 1.0);
}
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
vec2 uv = fragCoord / iResolution.xy;
// Get input color: demo scene or input texture
#if USE_DEMO_SCENE
vec3 color = demoScene(uv, iTime);
#else
#if ENABLE_CA
vec3 color = applyChromaticAberration(iChannel0, uv);
#else
vec3 color = texture(iChannel0, uv).rgb;
#endif
#endif
#if ENABLE_BLOOM && !USE_DEMO_SCENE
color += sampleBloom(iChannel0, uv) * BLOOM_STRENGTH;
#else
// In demo scene mode, use simplified bloom sampling from itself
#if ENABLE_BLOOM
vec3 bloom = vec3(0.0); float tw = 0.0;
for (int x = -1; x <= 1; x++)
for (int y = -1; y <= 1; y++) {
vec2 off = vec2(float(x),float(y))/iResolution.xy * 0.02;
vec3 s = demoScene(uv + off, iTime);
float w = 1.0;
bloom += w * min(vec3(5.0), s); tw += w;
}
color += bloom / tw * BLOOM_STRENGTH;
#endif
#endif
color *= COLOR_TINT;
#if ENABLE_TONEMAP
#if TONEMAP_MODE == 2
color = applyTonemap(color);
#else
color = applyTonemap(color);
color = pow(color, vec3(1.0/2.2));
#endif
#else
color = pow(color, vec3(1.0/2.2));
#endif
#if ENABLE_CONTRAST
color = clamp(color, 0.0, 1.0);
color = color*color*(3.0-2.0*color);
#endif
#if ENABLE_VIGNETTE
vec2 q = fragCoord/iResolution.xy;
color *= 0.5 + 0.5*pow(16.0*q.x*q.y*(1.0-q.x)*(1.0-q.y), VIG_POWER);
#endif
#if ENABLE_GRAIN
color += GRAIN_AMOUNT * hash11(dot(uv, vec2(12.9898,78.233)) + iTime);
#endif
fragColor = vec4(clamp(color, 0.0, 1.0), 1.0);
}
```
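The white-point normalization in `tonemapFilmic` can be verified on the CPU: dividing by `filmicCurve(WHITE_POINT)` maps an input of exactly `WHITE_POINT` to 1.0, while midtones stay below it (a Python check of the template's math):

```python
SHOULDER = 7.5
WHITE_POINT = 1.2

def filmic_curve(x):
    # Single-channel filmic curve from the template
    q = (SHOULDER * SHOULDER + 1.0) * x * x
    return q / (q + x + SHOULDER * SHOULDER)

w = filmic_curve(WHITE_POINT)
mapped_white = filmic_curve(WHITE_POINT) / w   # normalized white -> 1.0
mapped_mid = filmic_curve(0.5) / w             # midtones stay below 1.0
```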
## Common Variants
### Variant 1: Multi-Pass Separable Bloom
```glsl
// Buffer A: Horizontal Gaussian blur + bright-pass
#define BLOOM_THRESHOLD vec3(0.2)
#define BLOOM_DOWNSAMPLE 3
#define BLUR_RADIUS 16
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
ivec2 xy = ivec2(fragCoord);
if (xy.x >= int(iResolution.x)/BLOOM_DOWNSAMPLE) { fragColor = vec4(0); return; }
vec3 sum = vec3(0.0); float tw = 0.0;
for (int k = -BLUR_RADIUS; k <= BLUR_RADIUS; ++k) {
vec3 texel = max(vec3(0.0), texelFetch(iChannel0, (xy+ivec2(k,0))*BLOOM_DOWNSAMPLE, 0).rgb - BLOOM_THRESHOLD);
float w = exp(-8.0 * pow(abs(float(k))/float(BLUR_RADIUS), 2.0));
sum += texel*w; tw += w;
}
fragColor = vec4(sum/tw, 1.0);
}
// Buffer B: Vertical blur, same as above but with direction changed to ivec2(0, k)
```
### Variant 2: ACES + Full Color Pipeline (with Built-in Gamma)
```glsl
vec3 aces_tonemap(vec3 color) {
mat3 m1 = mat3(0.59719,0.07600,0.02840, 0.35458,0.90834,0.13383, 0.04823,0.01566,0.83777);
mat3 m2 = mat3(1.60475,-0.10208,-0.00327, -0.53108,1.10813,-0.07276, -0.07367,-0.00605,1.07602);
vec3 v = m1*color;
vec3 a = v*(v+0.0245786)-0.000090537;
vec3 b = v*(0.983729*v+0.4329510)+0.238081;
return pow(clamp(m2*(a/b), 0.0, 1.0), vec3(1.0/2.2));
}
```
### Variant 3: DoF + Motion Blur Combination
```glsl
for (int i = 1; i < BLUR_TAPS; i++) {
float t = float(i)/float(BLUR_TAPS);
float randomT = hash(iTime + t + uv.x + uv.y*12.345);
vec2 tapUV = mix(currentUV, prevFrameUV, (randomT-0.5)*shutterAngle); // motion blur
float theta = t*goldenAngle*float(BLUR_TAPS);
float r = coc*sqrt(t*float(BLUR_TAPS))/sqrt(float(BLUR_TAPS));
tapUV += vec2(sin(theta), cos(theta))*r; // DoF
vec4 tap = textureLod(sceneTex, tapUV, 0.0);
float w = max(0.001, getCoC(decodeDepth(tap.w), focusDistance));
result += tap.rgb*w; totalWeight += w;
}
```
### Variant 4: TAA Temporal Anti-Aliasing
```glsl
vec4 current = textureLod(currentFrame, uv - jitterOffset/iResolution.xy, 0.0);
vec3 vMin = vec3(1e5), vMax = vec3(-1e5);
for (int iy = -1; iy <= 1; iy++)
for (int ix = -1; ix <= 1; ix++) {
vec3 s = texelFetch(currentFrame, ivec2(fragCoord)+ivec2(ix,iy), 0).rgb;
vMin = min(vMin, s); vMax = max(vMax, s);
}
vec4 history = textureLod(historyBuffer, reprojectToPrevFrame(worldPos, prevViewProjMatrix), 0.0);
float blend = (all(greaterThanEqual(history.rgb, vMin)) && all(lessThanEqual(history.rgb, vMax))) ? 0.9 : 0.0;
color = mix(current.rgb, history.rgb, blend);
```
### Variant 5: Lens Flare + Starburst
```glsl
#define NUM_APERTURE_BLADES 8.0
vec2 toSun = normalize(sunScreenPos - uv);
float angle = atan(toSun.y, toSun.x);
float starburst = pow(0.5+0.5*cos(1.5*3.14159+angle*NUM_APERTURE_BLADES),
max(1.0, 500.0-sunDist*sunDist*501.0));
float ghost = smoothstep(0.015, 0.0, length(ghostCenter-uv)-ghostRadius);
totalFlare += wavelengthToRGB(300.0+fract((length(ghostCenter-uv)-ghostRadius)*5.0)*500.0) * ghost * 0.25;
```
## Performance & Composition
**Performance:**
- Separable blur: 121 -> 22 samples (an 11x11 kernel done as two 1D passes)
- `textureLod` reads hardware mipmaps for free downsampling
- Downsample 2-4x before blurring
- Sample counts: motion blur 16-32, DoF 32-64, CA 4-8
- Sampling between texels gives free bilinear filtering
- `#define` feature switches have zero runtime cost
- Use `mix`/`step`/`smoothstep` instead of branches
**Composition:**
- Bloom + ToneMap: compute bloom in HDR space, then tonemap (the order is not reversible)
- TAA + MB + DoF: share a single sampling loop
- CA + Vignette + Grain: the classic lens trio
- ColorGrading + ToneMap + Contrast: grade in linear space -> HDR compression -> gamma-space S-curve
- Bloom + LensFlare: share the bright-pass
- Multi-pass pipeline: BufA scene -> BufB/C bloom H/V -> BufD TAA -> Image compositing
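The separable-blur figure is simple tap arithmetic: an 11x11 Gaussian costs 121 texture reads per pixel in one pass, but only 11 + 11 = 22 across a horizontal and a vertical pass:

```python
kernel_width = 11
one_pass_taps = kernel_width * kernel_width    # 121 reads per pixel
separable_taps = kernel_width + kernel_width   # 22 reads across H + V passes
speedup = one_pass_taps / separable_taps       # ~5.5x fewer texture reads
```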
## Further Reading
For complete step-by-step tutorials, mathematical derivations, and advanced usage, see [reference](../reference/post-processing.md)

# 2D Procedural Patterns
## Use Cases
- Repeating/aperiodic 2D patterns: grids, hexagons, Truchet, interference patterns, kaleidoscopes, spirals, Lissajous
- Procedural backgrounds, UI textures, sci-fi HUD/radar
- Fractals, water caustics, and other natural phenomena
- Infinite detail, seamless tiling, parameter-driven visual effects
## Core Principles
2D procedural patterns = **domain transforms + distance fields + color mapping**:
1. **Domain repetition**: `fract()`/`mod()` folds the infinite plane into repeating cells
2. **Cell identification**: `floor()` extracts integer coordinates as hash seeds, driving per-cell random variations
3. **Distance field (SDF)**: mathematical functions compute pixel-to-shape distance, `smoothstep` renders edges
4. **Color mapping**: cosine palette `a + b*cos(2pi(c*t+d))` or HSV
5. **Layer compositing**: multi-layer loop results blended via addition/multiplication/`mix`
Key formulas:
```glsl
// UV normalization
uv = (fragCoord * 2.0 - iResolution.xy) / iResolution.y;
// Domain repetition
cell_uv = fract(uv * SCALE) - 0.5;
cell_id = floor(uv * SCALE);
// Cosine palette
col = a + b * cos(6.28318 * (c * t + d));
// Hexagon SDF
hex(p) = max(dot(abs(p), vec2(0.5, 0.866025)), abs(p).x);
// 2D rotation
mat2(cos(a), -sin(a), sin(a), cos(a));
```
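A CPU-side check (Python) that the `mat2` rotation formula preserves vector length, which is what makes it safe to apply per layer or per octave without changing feature scale (GLSL `mat2` is column-major, so only the angle's sign convention differs between readings):

```python
import math

def rot2(a, v):
    # 2D rotation built from cos/sin, as in the mat2 formula above
    c, s = math.cos(a), math.sin(a)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])

v = (0.3, -0.7)
r = rot2(1.234, v)
len_before = math.hypot(v[0], v[1])
len_after = math.hypot(r[0], r[1])   # unchanged by rotation
```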
## Implementation Steps
### Step 1: UV Normalization
```glsl
vec2 uv = (fragCoord * 2.0 - iResolution.xy) / iResolution.y;
```
### Step 2: Domain Repetition
```glsl
#define SCALE 4.0
vec2 cell_uv = fract(uv * SCALE) - 0.5;
vec2 cell_id = floor(uv * SCALE);
```
Hexagonal grid domain repetition:
```glsl
const vec2 s = vec2(1, 1.7320508);
vec4 hC = floor(vec4(p, p - vec2(0.5, 1.0)) / s.xyxy) + 0.5;
vec4 h = vec4(p - hC.xy * s, p - (hC.zw + 0.5) * s);
vec4 hex_data = dot(h.xy, h.xy) < dot(h.zw, h.zw)
? vec4(h.xy, hC.xy)
: vec4(h.zw, hC.zw + vec2(0.5, 1.0));
```
### Step 3: Per-Cell Randomization
```glsl
float hash21(vec2 p) {
return fract(sin(dot(p, vec2(141.173, 289.927))) * 43758.5453);
}
float rnd = hash21(cell_id);
float radius = 0.15 + 0.1 * rnd;
```
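A Python port of `hash21` shows the two properties the pattern relies on: output stays inside the unit interval (via `fract`) and the same cell id always yields the same value (illustrative CPU check; GPU `sin` precision can differ across drivers):

```python
import math

def hash21(px, py):
    # fract(sin(dot(p, k)) * 43758.5453), with fract(x) == x - floor(x)
    x = math.sin(px * 141.173 + py * 289.927) * 43758.5453
    return x - math.floor(x)

samples = [hash21(float(x), float(y)) for x in range(8) for y in range(8)]
in_unit = all(0.0 <= s <= 1.0 for s in samples)
stable = hash21(3.0, 5.0) == hash21(3.0, 5.0)   # deterministic per cell
```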
### Step 4: SDF Shape Drawing
```glsl
// Circle
float d = length(cell_uv) - radius;
// Hexagon
float hex_sdf(vec2 p) {
p = abs(p);
return max(dot(p, vec2(0.5, 0.866025)), p.x);
}
// Line segment
float line_sdf(vec2 a, vec2 b, vec2 p) {
vec2 pa = p - a, ba = b - a;
float h = clamp(dot(pa, ba) / dot(ba, ba), 0.0, 1.0);
return length(pa - ba * h);
}
// Anti-aliased rendering
float shape = 1.0 - smoothstep(radius - 0.008, radius + 0.008, length(cell_uv));
```
### Step 5: Polar Coordinate Rings/Arcs
```glsl
vec2 polar = vec2(length(uv), atan(uv.y, uv.x));
float ring_id = floor(polar.x * NUM_RINGS + 0.5) / NUM_RINGS;
float ring = 1.0 - pow(abs(sin(polar.x * 3.14159 * NUM_RINGS)) * 1.25, 2.5);
float arc_end = polar.y + sin(iTime + ring_id * 5.5) * 1.52 - 1.5;
ring *= smoothstep(0.0, 0.05, arc_end);
```
### Step 6: Cosine Palette
```glsl
vec3 palette(float t) {
vec3 a = vec3(0.5, 0.5, 0.5);
vec3 b = vec3(0.5, 0.5, 0.5);
vec3 c = vec3(1.0, 1.0, 1.0);
vec3 d = vec3(0.263, 0.416, 0.557);
return a + b * cos(6.28318 * (c * t + d));
}
```
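With `a = b = vec3(0.5)` the cosine palette is guaranteed to stay inside [0, 1], since `cos` ranges over [-1, 1]. A CPU check over a sweep of t (Python):

```python
import math

def palette(t):
    # a + b*cos(2*pi*(c*t + d)) with the constants used above
    a = (0.5, 0.5, 0.5)
    b = (0.5, 0.5, 0.5)
    c = (1.0, 1.0, 1.0)
    d = (0.263, 0.416, 0.557)
    return tuple(ai + bi * math.cos(6.28318 * (ci * t + di))
                 for ai, bi, ci, di in zip(a, b, c, d))

cols = [palette(i / 100.0) for i in range(101)]
in_unit = all(0.0 <= ch <= 1.0 for col in cols for ch in col)
```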
### Step 7: Iterative Stacking & Glow
```glsl
#define NUM_LAYERS 4.0
vec3 finalColor = vec3(0.0);
vec2 uv0 = uv;
for (float i = 0.0; i < NUM_LAYERS; i++) {
uv = fract(uv * 1.5) - 0.5;
float d = length(uv) * exp(-length(uv0));
vec3 col = palette(length(uv0) + i * 0.4 + iTime * 0.4);
d = sin(d * 8.0 + iTime) / 8.0;
d = abs(d);
d = pow(0.01 / d, 1.2);
finalColor += col * d;
}
```
### Step 8: Trigonometric Interference
```glsl
#define MAX_ITER 5
vec2 p = mod(uv * TAU, TAU) - 250.0;
vec2 i = p;
float c = 1.0;
float inten = 0.005;
for (int n = 0; n < MAX_ITER; n++) {
float t = iTime * (1.0 - 3.5 / float(n + 1));
i = p + vec2(cos(t - i.x) + sin(t + i.y),
sin(t - i.y) + cos(t + i.x));
c += 1.0 / length(vec2(p.x / (sin(i.x + t) / inten),
p.y / (cos(i.y + t) / inten)));
}
c /= float(MAX_ITER);
c = 1.17 - pow(c, 1.4);
vec3 colour = vec3(pow(abs(c), 8.0));
```
### Step 9: Multi-Layer Depth Compositing
```glsl
#define NUM_DEPTH_LAYERS 4.0
float m = 0.0;
for (float i = 0.0; i < 1.0; i += 1.0 / NUM_DEPTH_LAYERS) {
float z = fract(iTime * 0.1 + i);
float size = mix(15.0, 1.0, z);
float fade = smoothstep(0.0, 0.6, z) * smoothstep(1.0, 0.8, z);
m += fade * patternLayer(uv * size, i, iTime);
}
```
### Step 10: Post-Processing
```glsl
col = pow(clamp(col, 0.0, 1.0), vec3(1.0 / 2.2)); // Gamma
col = col * 0.6 + 0.4 * col * col * (3.0 - 2.0 * col); // Contrast S-curve
col = mix(col, vec3(dot(col, vec3(0.33))), -0.4); // Saturation
vec2 q = fragCoord / iResolution.xy;
col *= 0.5 + 0.5 * pow(16.0 * q.x * q.y * (1.0 - q.x) * (1.0 - q.y), 0.7); // Vignette
```
## Complete Code Template
```glsl
// ====== 2D Procedural Pattern Template ======
// Ready to run in ShaderToy
#define SCALE 3.0
#define NUM_LAYERS 4.0
#define ZOOM_FACTOR 1.5
#define GLOW_WIDTH 0.01
#define GLOW_POWER 1.2
#define WAVE_FREQ 8.0
#define ANIM_SPEED 0.4
#define RING_COUNT 10.0
vec3 palette(float t) {
vec3 a = vec3(0.5, 0.5, 0.5);
vec3 b = vec3(0.5, 0.5, 0.5);
vec3 c = vec3(1.0, 1.0, 1.0);
vec3 d = vec3(0.263, 0.416, 0.557);
return a + b * cos(6.28318 * (c * t + d));
}
float hash21(vec2 p) {
return fract(sin(dot(p, vec2(141.173, 289.927))) * 43758.5453);
}
mat2 rot2(float a) {
float c = cos(a), s = sin(a);
return mat2(c, -s, s, c);
}
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
vec2 uv = (fragCoord * 2.0 - iResolution.xy) / iResolution.y;
vec2 uv0 = uv;
vec3 finalColor = vec3(0.0);
for (float i = 0.0; i < NUM_LAYERS; i++) {
uv = fract(uv * ZOOM_FACTOR) - 0.5;
float d = length(uv) * exp(-length(uv0));
vec3 col = palette(length(uv0) + i * 0.4 + iTime * ANIM_SPEED);
d = sin(d * WAVE_FREQ + iTime) / WAVE_FREQ;
d = abs(d);
d = pow(GLOW_WIDTH / d, GLOW_POWER);
finalColor += col * d;
}
finalColor = pow(clamp(finalColor, 0.0, 1.0), vec3(1.0 / 2.2));
finalColor = finalColor * 0.6 + 0.4 * finalColor * finalColor * (3.0 - 2.0 * finalColor);
vec2 q = fragCoord / iResolution.xy;
finalColor *= 0.5 + 0.5 * pow(16.0 * q.x * q.y * (1.0 - q.x) * (1.0 - q.y), 0.7);
fragColor = vec4(finalColor, 1.0);
}
```
## Common Variants
### Variant 1: Hexagonal Truchet Arcs
```glsl
float hex(vec2 p) {
p = abs(p);
return max(dot(p, vec2(0.5, 0.866025)), p.x);
}
const vec2 s = vec2(1.0, 1.7320508);
vec4 getHex(vec2 p) {
vec4 hC = floor(vec4(p, p - vec2(0.5, 1.0)) / s.xyxy) + 0.5;
vec4 h = vec4(p - hC.xy * s, p - (hC.zw + 0.5) * s);
return dot(h.xy, h.xy) < dot(h.zw, h.zw)
? vec4(h.xy, hC.xy)
: vec4(h.zw, hC.zw + vec2(0.5, 1.0));
}
// Truchet triple arcs
float r = 1.0;
vec2 q1 = p - vec2(0.0, r) / s;
vec2 q2 = rot2(6.28318 / 3.0) * p - vec2(0.0, r) / s;
vec2 q3 = rot2(6.28318 * 2.0 / 3.0) * p - vec2(0.0, r) / s;
float d = min(min(length(q1), length(q2)), length(q3));
d = abs(d - 0.288675) - 0.1;
```
### Variant 2: Water Caustic Interference
```glsl
#define TAU 6.28318530718
#define MAX_ITER 5
vec2 p = mod(uv * TAU, TAU) - 250.0;
vec2 i = p;
float c = 1.0;
float inten = 0.005;
for (int n = 0; n < MAX_ITER; n++) {
float t = iTime * (1.0 - 3.5 / float(n + 1));
i = p + vec2(cos(t - i.x) + sin(t + i.y),
sin(t - i.y) + cos(t + i.x));
c += 1.0 / length(vec2(p.x / (sin(i.x + t) / inten),
p.y / (cos(i.y + t) / inten)));
}
c /= float(MAX_ITER);
c = 1.17 - pow(c, 1.4);
vec3 colour = vec3(pow(abs(c), 8.0));
colour = clamp(colour + vec3(0.0, 0.35, 0.5), 0.0, 1.0);
```
### Variant 3: Polar Concentric Ring Arc Segments
```glsl
#define NUM_RINGS 20.0
#define PALETTE vec3(0.0, 1.4, 2.0) + 1.5
vec2 plr = vec2(length(p), atan(p.y, p.x));
float id = floor(plr.x * NUM_RINGS + 0.5) / NUM_RINGS;
p *= rot2(id * 11.0);
p.y = abs(p.y);
float rz = 1.0 - pow(abs(sin(plr.x * 3.14159 * NUM_RINGS)) * 1.25, 2.5);
float arc = plr.y + sin(iTime + id * 5.5) * 1.52 - 1.5;
rz *= smoothstep(0.0, 0.05, arc);
vec3 col = (sin(PALETTE + id * 5.0 + iTime) * 0.5 + 0.5) * rz;
```
### Variant 4: Multi-Layer Depth Parallax Network
```glsl
#define NUM_DEPTH_LAYERS 4.0
vec2 GetPos(vec2 id, vec2 offs, float t) {
float n = hash21(id + offs);
return offs + vec2(sin(t + n * 6.28), cos(t + fract(n * 100.0) * 6.28)) * 0.4;
}
float df_line(vec2 a, vec2 b, vec2 p) {
vec2 pa = p - a, ba = b - a;
float h = clamp(dot(pa, ba) / dot(ba, ba), 0.0, 1.0);
return length(pa - ba * h);
}
float m = 0.0;
for (float i = 0.0; i < 1.0; i += 1.0 / NUM_DEPTH_LAYERS) {
float z = fract(iTime * 0.1 + i);
float size = mix(15.0, 1.0, z);
float fade = smoothstep(0.0, 0.6, z) * smoothstep(1.0, 0.8, z);
m += fade * NetLayer(uv * size, i, iTime);
}
```
### Variant 5: Fractal Apollonian
```glsl
float apollian(vec4 p, float s) {
float scale = 1.0;
for (int i = 0; i < 7; ++i) {
p = -1.0 + 2.0 * fract(0.5 * p + 0.5);
float r2 = dot(p, p);
float k = s / r2;
p *= k;
scale *= k;
}
return abs(p.y) / scale;
}
vec4 pp = vec4(p.x, p.y, 0.0, 0.0) + offset;
pp.w = 0.125 * (1.0 - tanh(length(pp.xyz)));
float d = apollian(pp / 4.0, 1.2) * 4.0;
float hue = fract(0.75 * length(p) - 0.3 * iTime) + 0.3;
float sat = 0.75 * tanh(2.0 * length(p));
vec3 col = hsv2rgb(vec3(hue, sat, 1.0));
```
## Performance & Composition
**Performance:**
- Iteration loops are the biggest bottleneck; `NUM_LAYERS` 4->8 halves performance; mobile should use 3 layers or fewer
- Use `step()`/`smoothstep()`/`mix()` instead of `if/else`
- Merge multiple SDFs with `min()`/`max()`, then apply a single `smoothstep`
- Precompute `sin`/`cos` pairs outside loops; write irrational constants as literal values
- `atan` is expensive; use `dot` approximation when only periodicity is needed
- LOD: reduce iterations for distant objects `int iters = int(mix(3.0, float(MAX_ITER), smoothstep(...)));`
- `smoothstep` is often better than `pow` and inherently clamps to [0,1]
**Combinations:**
- **+ Noise**: `d += triangleNoise(uv * 10.0) * 0.05;` for organic erosion feel
- **+ Cross-hatch**: grayscale thresholds + `sin` lines to simulate hand-drawn style
- **+ SDF Boolean**: `min` (union) / `max` (intersection) / subtraction for complex geometry
- **+ Domain distortion**: `uv += 0.05 * vec2(sin(uv.y*5.+iTime), sin(uv.x*3.+iTime));`
- **+ Radial blur**: multi-sample average along polar coordinate direction
- **+ Pseudo-3D lighting**: SDF gradient as normal, add diffuse/specular for embossed look
## Further Reading
For complete step-by-step tutorials, mathematical derivations, and advanced usage, see [reference](../reference/procedural-2d-pattern.md)

# Procedural Noise Skill
## Use Cases
Procedural noise is the most fundamental technique in real-time GPU graphics. It applies to natural phenomena (fire, clouds, water, lava), terrain generation, texture synthesis, volume rendering, motion effects, and more.
Core idea: use mathematical functions to generate pseudo-random, spatially continuous signals on the GPU in real time, then produce multi-scale detail through FBM and domain warping.
## Core Principles
### Noise Functions
Generate random values at integer lattice points, then smoothly interpolate between them.
- **Value Noise**: random scalars at lattice points + bilinear Hermite interpolation. `N(p) = mix(mix(h00,h10,u), mix(h01,h11,u), v)`
- **Simplex Noise**: triangular lattice gradient dot products + radial falloff kernel. Skew `K1=(sqrt(3)-1)/2`, unskew `K2=(3-sqrt(3))/6`. Fewer lattice lookups, no axis-aligned artifacts.
### Hash Functions
Map integer coordinates to pseudo-random values:
- **sin-based** (short but precision-sensitive): `fract(sin(dot(p, vec2(127.1,311.7))) * 43758.5453)`
- **sin-free** (cross-platform stable): `fract(p * 0.1031)` + dot mixing + fract
### FBM (Fractal Brownian Motion)
Multi-octave noise summation: `FBM(p) = sum of amplitude_i * noise(frequency_i * p)`
- Lacunarity ~2.0, Gain ~0.5, inter-octave rotation to eliminate artifacts
### Domain Warping
Feed noise output back as coordinate offset: `fbm(p + fbm(p))` or cascaded `fbm(p + fbm(p + fbm(p)))`
### FBM Variant Quick Reference
| Variant | Formula | Effect |
|---------|---------|--------|
| Standard | `sum a*noise(p)` | Soft clouds |
| Ridged | `sum a*abs(noise(p))` | Sharp ridges/lightning |
| Sinusoidal ridged | `sum a*sin(noise(p)*k)` | Periodic ridges/lava |
| Erosion | `sum a*noise(p)/(1+dot(d,d))` | Realistic terrain |
| Ocean waves | `sum a*sea_octave(p)` | Peaked wave crests |
## Implementation Code
### Hash Functions
```glsl
// Sin-free hash (Dave Hoskins) — cross-platform stable
float hash12(vec2 p) {
vec3 p3 = fract(vec3(p.xyx) * .1031);
p3 += dot(p3, p3.yzx + 33.33);
return fract((p3.x + p3.y) * p3.z);
}
vec2 hash22(vec2 p) {
vec3 p3 = fract(vec3(p.xyx) * vec3(.1031, .1030, .0973));
p3 += dot(p3, p3.yzx + 33.33);
return fract((p3.xx + p3.yz) * p3.zy);
}
// Sin hash — shorter code, precision-sensitive on some GPUs
float hash(vec2 p) {
float h = dot(p, vec2(127.1, 311.7));
return fract(sin(h) * 43758.5453123);
}
vec2 hash2(vec2 p) {
p = vec2(dot(p, vec2(127.1, 311.7)),
dot(p, vec2(269.5, 183.3)));
return -1.0 + 2.0 * fract(sin(p) * 43758.5453123);
}
```
### Value Noise
```glsl
// Hermite smooth bilinear interpolation
float noise(in vec2 x) {
vec2 p = floor(x);
vec2 f = fract(x);
f = f * f * (3.0 - 2.0 * f);
float a = hash(p + vec2(0.0, 0.0));
float b = hash(p + vec2(1.0, 0.0));
float c = hash(p + vec2(0.0, 1.0));
float d = hash(p + vec2(1.0, 1.0));
return mix(mix(a, b, f.x), mix(c, d, f.x), f.y);
}
```
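Ported to Python, the interpolation collapses to the corner hash at integer lattice points (where f = 0) and stays continuous across cell boundaries; a quick CPU check:

```python
import math

def hash2d(px, py):
    # sin-based hash from the Hash Functions section
    x = math.sin(px * 127.1 + py * 311.7) * 43758.5453123
    return x - math.floor(x)

def value_noise(x, y):
    # Hermite-interpolated value noise, ported from the GLSL above
    px, py = math.floor(x), math.floor(y)
    fx, fy = x - px, y - py
    fx = fx * fx * (3.0 - 2.0 * fx)
    fy = fy * fy * (3.0 - 2.0 * fy)
    a = hash2d(px, py)
    b = hash2d(px + 1.0, py)
    c = hash2d(px, py + 1.0)
    d = hash2d(px + 1.0, py + 1.0)
    ab = a + (b - a) * fx
    cd = c + (d - c) * fx
    return ab + (cd - ab) * fy

on_lattice = value_noise(3.0, 7.0) == hash2d(3.0, 7.0)
seam = abs(value_noise(3.999999, 7.3) - value_noise(4.0, 7.3))
```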
### Simplex Noise
```glsl
// 2D Simplex (skewed triangular grid + h^4 falloff kernel)
float noise(in vec2 p) {
const float K1 = 0.366025404; // (sqrt(3)-1)/2
const float K2 = 0.211324865; // (3-sqrt(3))/6
vec2 i = floor(p + (p.x + p.y) * K1);
vec2 a = p - i + (i.x + i.y) * K2;
vec2 o = (a.x > a.y) ? vec2(1.0, 0.0) : vec2(0.0, 1.0);
vec2 b = a - o + K2;
vec2 c = a - 1.0 + 2.0 * K2;
vec3 h = max(0.5 - vec3(dot(a, a), dot(b, b), dot(c, c)), 0.0);
vec3 n = h * h * h * h * vec3(
dot(a, hash2(i + 0.0)),
dot(b, hash2(i + o)),
dot(c, hash2(i + 1.0))
);
return dot(n, vec3(70.0));
}
```
### Standard FBM
```glsl
#define OCTAVES 4
#define GAIN 0.5
mat2 m = mat2(1.6, 1.2, -1.2, 1.6); // rotation+scale, |m|=2.0, ~36.87 deg
float fbm(vec2 p) {
float f = 0.0, a = 0.5;
for (int i = 0; i < OCTAVES; i++) {
f += a * noise(p);
p = m * p;
a *= GAIN;
}
return f;
}
```
Manually unrolled version (slightly varying lacunarity to break self-similarity):
```glsl
const mat2 mtx = mat2(0.80, 0.60, -0.60, 0.80);
float fbm4(vec2 p) {
float f = 0.0;
f += 0.5000 * (-1.0 + 2.0 * noise(p)); p = mtx * p * 2.02;
f += 0.2500 * (-1.0 + 2.0 * noise(p)); p = mtx * p * 2.03;
f += 0.1250 * (-1.0 + 2.0 * noise(p)); p = mtx * p * 2.01;
f += 0.0625 * (-1.0 + 2.0 * noise(p));
return f / 0.9375;
}
```
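The 0.9375 divisor in `fbm4` is the amplitude budget of the series: with gain 0.5, n octaves sum to at most 1 - 0.5^n, so dividing renormalizes the result to unit range. Checking the arithmetic (Python):

```python
def fbm_max_amplitude(octaves, first=0.5, gain=0.5):
    # Geometric series of per-octave amplitudes: first * (1 + g + g^2 + ...)
    total, a = 0.0, first
    for _ in range(octaves):
        total += a
        a *= gain
    return total

m4 = fbm_max_amplitude(4)   # 0.5 + 0.25 + 0.125 + 0.0625 = 0.9375
m8 = fbm_max_amplitude(8)   # approaches but never reaches 1.0
```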
### Ridged FBM
```glsl
// abs() produces V-shaped ridges at zero crossings
float fbm_ridged(in vec2 p) {
float z = 2.0, rz = 0.0;
for (float i = 1.0; i < 6.0; i++) {
rz += abs((noise(p) - 0.5) * 2.0) / z;
z *= 2.0;
p *= 2.0;
}
return rz;
}
// Sinusoidal ridged variant — lava texture
// rz += (sin(noise(p) * 7.0) * 0.5 + 0.5) / z;
```
### Domain Warping
```glsl
// Basic domain warping ("2D Clouds")
float q = fbm(uv * 0.5);
uv -= q - time;
float f = fbm(uv);
// Classic three-level cascade
vec2 fbm4_2(vec2 p) {
return vec2(fbm4(p + vec2(1.0)), fbm4(p + vec2(6.2)));
}
float func(vec2 q, out vec2 o, out vec2 n) {
o = 0.5 + 0.5 * fbm4_2(q);
n = fbm6_2(4.0 * o);
vec2 p = q + 2.0 * n + 1.0;
float f = 0.5 + 0.5 * fbm4(2.0 * p);
f = mix(f, f * f * f * 3.5, f * abs(n.x));
return f;
}
// Dual-axis domain warping
float dualfbm(in vec2 p) {
vec2 p2 = p * 0.7;
vec2 basis = vec2(fbm(p2 - time * 1.6), fbm(p2 + time * 1.7));
basis = (basis - 0.5) * 0.2;
p += basis;
return fbm(p * makem2(time * 0.2));
}
```
### Fluid Noise
```glsl
// Per-octave gradient displacement simulating fluid transport
#define FLOW_SPEED 0.6
#define BASE_SPEED 1.9
#define ADVECTION 0.77
#define GRAD_SCALE 0.5
vec2 gradn(vec2 p) {
float ep = 0.09;
float gradx = noise(vec2(p.x + ep, p.y)) - noise(vec2(p.x - ep, p.y));
float grady = noise(vec2(p.x, p.y + ep)) - noise(vec2(p.x, p.y - ep));
return vec2(gradx, grady);
}
float flow(in vec2 p) {
float z = 2.0, rz = 0.0;
vec2 bp = p;
for (float i = 1.0; i < 7.0; i++) {
p += time * FLOW_SPEED;
bp += time * BASE_SPEED;
vec2 gr = gradn(i * p * 0.34 + time * 1.0);
gr *= makem2(time * 6.0 - (0.05 * p.x + 0.03 * p.y) * 40.0);
p += gr * GRAD_SCALE;
rz += (sin(noise(p) * 7.0) * 0.5 + 0.5) / z;
p = mix(bp, p, ADVECTION);
z *= 1.4;
p *= 2.0;
bp *= 1.9;
}
return rz;
}
```
### Derivative FBM
```glsl
// Value noise with analytic derivatives
vec3 noised(in vec2 x) {
vec2 p = floor(x);
vec2 f = fract(x);
vec2 u = f * f * (3.0 - 2.0 * f);
vec2 du = 6.0 * f * (1.0 - f);
float a = hash(p + vec2(0, 0));
float b = hash(p + vec2(1, 0));
float c = hash(p + vec2(0, 1));
float d = hash(p + vec2(1, 1));
return vec3(
a + (b - a) * u.x + (c - a) * u.y + (a - b - c + d) * u.x * u.y,
du * (vec2(b - a, c - a) + (a - b - c + d) * u.yx)
);
}
// Erosion FBM: higher gradient = lower contribution
float terrainFBM(in vec2 x) {
const mat2 m2 = mat2(0.8, -0.6, 0.6, 0.8);
float a = 0.0, b = 1.0;
vec2 d = vec2(0.0);
for (int i = 0; i < 16; i++) {
vec3 n = noised(x);
d += n.yz;
a += b * n.x / (1.0 + dot(d, d)); // 1/(1+|grad|^2) erosion factor
b *= 0.5;
x = m2 * x * 2.0;
}
return a;
}
```
### Quintic Noise with Analytical Derivatives
C2-continuous noise using quintic interpolation — eliminates visible grid artifacts in derivatives:
```glsl
// Returns vec3(value, dFdx, dFdy) — derivatives are exact, not finite-differenced
vec3 noisedQ(vec2 p) {
vec2 i = floor(p);
vec2 f = fract(p);
// Quintic interpolation for C2 continuity
vec2 u = f * f * f * (f * (f * 6.0 - 15.0) + 10.0);
vec2 du = 30.0 * f * f * (f * (f - 2.0) + 1.0);
float a = hash12(i + vec2(0.0, 0.0));
float b = hash12(i + vec2(1.0, 0.0));
float c = hash12(i + vec2(0.0, 1.0));
float d = hash12(i + vec2(1.0, 1.0));
float k0 = a, k1 = b - a, k2 = c - a, k3 = a - b - c + d;
return vec3(
k0 + k1 * u.x + k2 * u.y + k3 * u.x * u.y, // value
du * vec2(k1 + k3 * u.y, k2 + k3 * u.x) // derivatives
);
}
```
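The analytic derivative in `noisedQ` can be checked against a central finite difference of the fade curve (Python port of the same polynomials):

```python
def quintic(t):
    # u(t) = 6t^5 - 15t^4 + 10t^3 in Horner form, as in noisedQ()
    return t * t * t * (t * (t * 6.0 - 15.0) + 10.0)

def quintic_deriv(t):
    # du/dt = 30 t^2 (t - 1)^2, the `du` polynomial from noisedQ()
    return 30.0 * t * t * (t * (t - 2.0) + 1.0)

h, t = 1e-6, 0.37
numeric = (quintic(t + h) - quintic(t - h)) / (2.0 * h)
analytic = quintic_deriv(t)
# Endpoints: u(0)=0, u(1)=1, u'(0)=u'(1)=0 -> no derivative seams at cells
```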
### FBM with Derivatives (Erosion Terrain)
Accumulates derivatives across octaves — derivative magnitude dampens amplitude, creating realistic erosion patterns:
```glsl
vec3 fbmDerivative(vec2 p, int octaves) {
float value = 0.0;
vec2 deriv = vec2(0.0);
float amplitude = 0.5;
float frequency = 1.0;
mat2 rot = mat2(0.8, 0.6, -0.6, 0.8); // inter-octave rotation
for (int i = 0; i < octaves; i++) {
vec3 n = noisedQ(p * frequency);
deriv += n.yz;
// Key: divide by (1 + dot(deriv, deriv)) for erosion effect
value += amplitude * n.x / (1.0 + dot(deriv, deriv));
frequency *= 2.0;
amplitude *= 0.5;
p = rot * p; // rotate to break axis-aligned artifacts
}
return vec3(value, deriv);
}
```
Key insights:
- **Quintic interpolation**: `6t^5 - 15t^4 + 10t^3` gives C2 continuous noise (vs Hermite's C1), eliminating visible grid artifacts in derivatives
- **Erosion FBM**: The `1/(1+dot(d,d))` term causes flat areas to accumulate more detail while steep slopes stay smooth — mimicking real erosion
- **Inter-octave rotation**: The 2x2 rotation matrix between octaves prevents the axis-aligned patterns that are especially visible in ridged noise
### Voronoise (Voronoi-Noise Hybrid)
Unified interpolation between value noise and Voronoi patterns:
```glsl
// u=0: Value noise, u=1: Voronoi, v: smoothness (0=sharp cells, 1=smooth)
vec3 hash32(vec2 p) {
vec3 p3 = fract(vec3(p.xyx) * vec3(.1031, .1030, .0973));
p3 += dot(p3, p3.yxz + 33.33);
return fract((p3.xxy + p3.yzz) * p3.zyx);
}
float voronoise(vec2 p, float u, float v) {
float k = 1.0 + 63.0 * pow(1.0 - v, 6.0);
vec2 i = floor(p);
vec2 f = fract(p);
vec2 a = vec2(0.0);
for (int y = -2; y <= 2; y++)
for (int x = -2; x <= 2; x++) {
vec2 g = vec2(float(x), float(y));
vec3 o = hash32(i + g) * vec3(u, u, 1.0);
vec2 d = g - f + o.xy;
float w = pow(1.0 - smoothstep(0.0, 1.414, length(d)), k);
a += vec2(o.z * w, w);
}
return a.x / a.y;
}
```
Extremely versatile — smoothly interpolates between cellular Voronoi and continuous noise.
### Preventing Aliasing in Procedural Textures
For distant surfaces, high-frequency noise octaves create moiré artifacts. Solutions:
1. **LOD-based octave count**: `int octaves = min(MAX_OCTAVES, int(log2(pixelSize)))` — skip octaves finer than pixel size
2. **Analytical filtering**: For simple patterns (checkers, stripes), use smoothstep with pixel width: `smoothstep(-fw, fw, pattern)` where `fw = fwidth(uv)`
3. **Derivative-based mip**: Use `textureGrad()` with manually computed ray differentials for texture lookups in ray-marched scenes (see texture-mapping-advanced technique)
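Option 1 can be sketched on the CPU. The names here (`octaves_for_footprint`, `base_feature_size`) are illustrative, not from a particular engine; the idea is simply to drop octaves whose feature size falls below one pixel:

```python
import math

MAX_OCTAVES = 8

def octaves_for_footprint(pixel_footprint, base_feature_size=1.0):
    # Octave i has feature size base/2^i; keep only those >= one pixel
    if pixel_footprint >= base_feature_size:
        return 1
    usable = int(math.log2(base_feature_size / pixel_footprint)) + 1
    return max(1, min(MAX_OCTAVES, usable))

near = octaves_for_footprint(0.001)  # fine footprint -> full octave budget
far = octaves_for_footprint(0.5)     # coarse footprint -> few octaves
```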
## Complete Code Template
Ready to run in ShaderToy. Switch between standard FBM / ridged FBM / domain warping modes via `#define`:
```glsl
// ============================================================
// Procedural Noise Skill — Complete Template
// ============================================================
// ========== Mode selection (uncomment to switch) ==========
#define MODE_STANDARD_FBM // Standard FBM clouds
//#define MODE_RIDGED_FBM // Ridged FBM lightning texture
//#define MODE_DOMAIN_WARP // Domain warped organic pattern
// ========== Tunable parameters ==========
#define OCTAVES 6
#define GAIN 0.5
#define LACUNARITY 2.0
#define NOISE_SCALE 3.0
#define ANIM_SPEED 0.3
#define WARP_STRENGTH 0.4
// ========== Hash function ==========
float hash(vec2 p) {
vec3 p3 = fract(vec3(p.xyx) * 0.1031);
p3 += dot(p3, p3.yzx + 33.33);
return fract((p3.x + p3.y) * p3.z);
}
// ========== Value noise ==========
float noise(in vec2 x) {
vec2 p = floor(x);
vec2 f = fract(x);
f = f * f * (3.0 - 2.0 * f);
float a = hash(p + vec2(0.0, 0.0));
float b = hash(p + vec2(1.0, 0.0));
float c = hash(p + vec2(0.0, 1.0));
float d = hash(p + vec2(1.0, 1.0));
return mix(mix(a, b, f.x), mix(c, d, f.x), f.y);
}
// ========== Rotation+scale matrix ==========
const mat2 m = mat2(1.6, 1.2, -1.2, 1.6);
// ========== Standard FBM ==========
float fbm(vec2 p) {
float f = 0.0, a = 0.5;
for (int i = 0; i < OCTAVES; i++) {
f += a * (-1.0 + 2.0 * noise(p));
p = m * p;
a *= GAIN;
}
return f;
}
// ========== Ridged FBM ==========
float fbm_ridged(vec2 p) {
float f = 0.0, a = 0.5;
for (int i = 0; i < OCTAVES; i++) {
f += a * abs(-1.0 + 2.0 * noise(p));
p = m * p;
a *= GAIN;
}
return f;
}
// ========== Domain warping vec2 FBM ==========
vec2 fbm2(vec2 p) {
return vec2(fbm(p + vec2(1.7, 9.2)), fbm(p + vec2(8.3, 2.8)));
}
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
vec2 uv = (2.0 * fragCoord - iResolution.xy) / iResolution.y;
uv *= NOISE_SCALE;
float time = iTime * ANIM_SPEED;
float f = 0.0;
vec3 col = vec3(0.0);
#ifdef MODE_STANDARD_FBM
f = 0.5 + 0.5 * fbm(uv + vec2(0.0, -time));
vec3 sky = mix(vec3(0.4, 0.7, 1.0), vec3(0.2, 0.4, 0.6), fragCoord.y / iResolution.y);
vec3 cloud = vec3(1.1, 1.1, 0.9) * f;
col = mix(sky, cloud, smoothstep(0.4, 0.7, f));
#endif
#ifdef MODE_RIDGED_FBM
f = fbm_ridged(uv + vec2(time * 0.5, time * 0.3));
col = vec3(0.2, 0.1, 0.4) / max(f, 0.05);
col = pow(col, vec3(0.99));
#endif
#ifdef MODE_DOMAIN_WARP
vec2 q = fbm2(uv + time * 0.1);
vec2 r = fbm2(uv + WARP_STRENGTH * q + vec2(1.7, 9.2));
f = 0.5 + 0.5 * fbm(uv + WARP_STRENGTH * r);
f = mix(f, f * f * f * 3.5, f * length(r));
col = vec3(0.2, 0.1, 0.4);
col = mix(col, vec3(0.3, 0.05, 0.05), f);
col = mix(col, vec3(0.9, 0.9, 0.9), dot(r, r));
col = mix(col, vec3(0.5, 0.2, 0.2), 0.5 * q.y * q.y);
col *= f * 2.0;
vec2 eps = vec2(1.0 / iResolution.x, 0.0);
float fx = 0.5 + 0.5 * fbm(uv + eps.xy + WARP_STRENGTH * fbm2(uv + eps.xy + time * 0.1));
float fy = 0.5 + 0.5 * fbm(uv + eps.yx + WARP_STRENGTH * fbm2(uv + eps.yx + time * 0.1));
vec3 nor = normalize(vec3(fx - f, eps.x, fy - f));
vec3 lig = normalize(vec3(0.9, -0.2, -0.4));
float dif = clamp(0.3 + 0.7 * dot(nor, lig), 0.0, 1.0);
col *= vec3(0.85, 0.90, 0.95) * (nor.y * 0.5 + 0.5) + vec3(0.15, 0.10, 0.05) * dif;
#endif
vec2 p = fragCoord / iResolution.xy;
col *= 0.5 + 0.5 * sqrt(16.0 * p.x * p.y * (1.0 - p.x) * (1.0 - p.y));
fragColor = vec4(col, 1.0);
}
```
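The hash / value-noise / FBM math in the template is easy to sanity-check off-GPU. A minimal Python port (illustrative only, not part of the shader; note that GLSL's `mat2(1.6, 1.2, -1.2, 1.6)` is column-major, which the tuple update below reproduces):

```python
import math

def fract(v):
    return v - math.floor(v)

def hash2(x, y):
    # Port of the GLSL hash above: p3 = fract(p.xyx * 0.1031); p3 += dot(p3, p3.yzx + 33.33)
    p3 = [fract(v * 0.1031) for v in (x, y, x)]
    d = p3[0] * (p3[1] + 33.33) + p3[1] * (p3[2] + 33.33) + p3[2] * (p3[0] + 33.33)
    p3 = [v + d for v in p3]
    return fract((p3[0] + p3[1]) * p3[2])

def noise(x, y):
    # Value noise: bilinear blend of corner hashes with a smoothstep fade
    px, py = math.floor(x), math.floor(y)
    fx, fy = x - px, y - py
    fx = fx * fx * (3.0 - 2.0 * fx)
    fy = fy * fy * (3.0 - 2.0 * fy)
    a, b = hash2(px, py), hash2(px + 1.0, py)
    c, d = hash2(px, py + 1.0), hash2(px + 1.0, py + 1.0)
    top = a + (b - a) * fx
    bot = c + (d - c) * fx
    return top + (bot - top) * fy

def fbm(x, y, octaves=6, gain=0.5):
    # Signed FBM; rotation+scale per octave matches m = mat2(1.6, 1.2, -1.2, 1.6)
    f, a = 0.0, 0.5
    for _ in range(octaves):
        f += a * (-1.0 + 2.0 * noise(x, y))
        x, y = 1.6 * x - 1.2 * y, 1.2 * x + 1.6 * y
        a *= gain
    return f
```

With GAIN 0.5 and 6 octaves the amplitude sum is 0.984375, so the signed FBM stays strictly inside [-1, 1].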
## Common Variants
### Ridged FBM
```glsl
f += a * abs(noise(p)); // V-shaped ridges
f += a * (sin(noise(p)*7.0)*0.5+0.5); // Sinusoidal ridges (lava)
```
### Domain Warped FBM
```glsl
vec2 o = 0.5 + 0.5 * vec2(fbm(q + vec2(1.0)), fbm(q + vec2(6.2)));
vec2 n = vec2(fbm(4.0 * o + vec2(9.2)), fbm(4.0 * o + vec2(5.7)));
float f = 0.5 + 0.5 * fbm(q + 2.0 * n + 1.0);
```
### Derivative Erosion FBM
```glsl
vec2 d = vec2(0.0);
for (int i = 0; i < N; i++) {
vec3 n = noised(p);
d += n.yz;
a += b * n.x / (1.0 + dot(d, d));
b *= 0.5; p = m2 * p * 2.0;
}
```
### Fluid Noise
```glsl
for (float i = 1.0; i < 7.0; i++) {
vec2 gr = gradn(i * p * 0.34 + time);
gr *= makem2(time * 6.0 - (0.05*p.x+0.03*p.y)*40.0);
p += gr * 0.5;
rz += (sin(noise(p)*7.0)*0.5+0.5) / z;
p = mix(bp, p, 0.77);
}
```
### Ocean Wave Octave Function
```glsl
float sea_octave(vec2 uv, float choppy) {
uv += noise(uv);
vec2 wv = 1.0 - abs(sin(uv));
vec2 swv = abs(cos(uv));
wv = mix(wv, swv, wv);
return pow(1.0 - pow(wv.x * wv.y, 0.65), choppy);
}
// Bidirectional propagation in FBM:
d = sea_octave((uv + SEA_TIME) * freq, choppy);
d += sea_octave((uv - SEA_TIME) * freq, choppy);
choppy = mix(choppy, 1.0, 0.2);
```
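A useful property of `sea_octave` is that its output stays in [0, 1] for any positive `choppy`, so layered octaves cannot blow up the height field. A CPU-side Python sketch of the same math (the noise stub in the test is an assumption, standing in for the shader's value noise):

```python
import math

def sea_octave(ux, uy, choppy, noise2):
    # noise2: any smooth 2D noise returning values in [0, 1]
    n = noise2(ux, uy)
    ux, uy = ux + n, uy + n
    wx, wy = 1.0 - abs(math.sin(ux)), 1.0 - abs(math.sin(uy))
    sx, sy = abs(math.cos(ux)), abs(math.cos(uy))
    wx = wx + (sx - wx) * wx   # mix(wv, swv, wv)
    wy = wy + (sy - wy) * wy
    return (1.0 - (wx * wy) ** 0.65) ** choppy
```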
## Performance & Composition
**Performance optimization:**
- Reducing octave count is the most direct optimization; use fewer octaves for distant objects: `int oct = 5 - int(log2(1.0 + t * 0.5));`
- Multi-level LOD: `terrainL` (3 oct) / `terrainM` (9 oct) / `terrainH` (16 oct)
- Texture sampling instead of math hash: `texture(iChannel0, x * 0.01).x`
- Manually unroll small loops + slightly vary lacunarity
- Adaptive step size: `float dt = max(0.05, 0.02 * t);`
- Directional derivative instead of full gradient (1 sample vs 3)
- Early termination: `if (sum.a > 0.99) break;`
**Common combinations:**
- FBM + Raymarching: noise-driven height/density fields, ray marching for intersection (terrain/ocean)
- FBM + finite-difference normals + lighting: `nor = normalize(vec3(f(p+ex)-f(p), eps, f(p+ey)-f(p)))`
- FBM + color mapping: different power curves mapping to RGB, e.g. flame `vec3(1.5*c, 1.5*c^3, c^6)` or inverse `vec3(k)/rz`
- FBM + Fresnel water surface: `fresnel = pow(1.0 - dot(n, -eye), 3.0)`
- Multi-layer FBM compositing: shape layer (low freq) + ridged layer (mid freq) + color layer (high freq)
- FBM + volumetric lighting: density difference along light direction approximates illumination
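The distance-based octave rule in the optimization list can be sanity-checked off-GPU; a small Python sketch (the `max(1, ...)` clamp is our addition so very distant samples still evaluate one octave):

```python
import math

def octaves_for_distance(t):
    # int oct = 5 - int(log2(1.0 + t * 0.5)) from the list above, clamped to >= 1
    return max(1, 5 - int(math.log2(1.0 + t * 0.5)))
```

Octave count falls off roughly one level each time the distance doubles, which is exactly the behavior you want for FBM LOD.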
## Further Reading
For complete step-by-step tutorials, mathematical derivations, and advanced usage, see the [reference](../reference/procedural-noise.md).
# Ray Marching
## Use Cases
- Rendering implicit surfaces (geometry defined by mathematical functions) without triangle meshes
- Creating fractals, organic forms, liquid metal, and other shapes difficult to express with traditional modeling
- Implementing volumetric effects: fire, smoke, clouds, glow
- Rapid prototyping of procedural scenes: building complex scenes by combining SDF primitives with boolean operations
- Advanced distance-field-based lighting: soft shadows, ambient occlusion, subsurface scattering
## Core Principles
Cast a ray from the camera through each pixel and advance it with **sphere tracing**: each step moves forward by the value of the scene's **Signed Distance Function (SDF)** at the current point, which guarantees the ray never penetrates a surface.
- Ray equation: `P(t) = ro + t * rd`
- Stepping logic: `t += SDF(P(t))`
- Hit test: `SDF(P) < epsilon`
- Normal estimation: `N = normalize(gradient of SDF(P))` (direction of the SDF gradient)
- Volumetric rendering: advance at fixed step size, accumulating density and color per step (front-to-back compositing)
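The stepping loop above is the whole algorithm; a minimal CPU-side Python sketch (illustrative, with hypothetical names):

```python
def sphere_trace(ro, rd, sdf, max_steps=128, max_dist=100.0, eps=0.001):
    # Sphere tracing: step by the SDF value at the current point.
    # Returns the hit distance t along the ray, or None on a miss.
    t = 0.0
    for _ in range(max_steps):
        p = [ro[i] + t * rd[i] for i in range(3)]
        d = sdf(p)
        if d < eps:
            return t
        t += d
        if t > max_dist:
            return None
    return None
```

For a unit sphere at the origin viewed from (0, 0, -3) along +z, the loop converges to t = 2, the exact entry distance.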
## Implementation Steps
### Step 1: UV Normalization and Ray Direction
```glsl
// Concise version
vec2 uv = (2.0 * fragCoord - iResolution.xy) / iResolution.y;
vec3 ro = vec3(0.0, 0.0, -3.0);
vec3 rd = normalize(vec3(uv, 1.0)); // z=1.0 ~ 90 deg FOV
// Precise FOV control
vec2 xy = fragCoord - iResolution.xy / 2.0;
float z = iResolution.y / tan(radians(FOV) / 2.0);
vec3 rd = normalize(vec3(xy, -z));
```
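The `z = 1.0 ~ 90 deg FOV` comment follows from `FOV = 2 * atan(1 / focal)` for normalized uv in [-1, 1]. A quick Python check (illustrative; `ray_dir` is our name):

```python
import math

def ray_dir(uv_x, uv_y, focal=1.0):
    # rd = normalize(vec3(uv, focal)); vertical FOV = 2 * atan(1 / focal)
    l = math.sqrt(uv_x * uv_x + uv_y * uv_y + focal * focal)
    return (uv_x / l, uv_y / l, focal / l)
```

With focal 1.0 the top-of-screen ray makes a 45-degree angle with the forward axis (90-degree vertical FOV); focal 2.5 narrows that to about 43.6 degrees.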
### Step 2: Camera Matrix (Look-At)
```glsl
mat3 setCamera(vec3 ro, vec3 ta, float cr) {
vec3 cw = normalize(ta - ro);
vec3 cp = vec3(sin(cr), cos(cr), 0.0);
vec3 cu = normalize(cross(cw, cp));
vec3 cv = cross(cu, cw);
return mat3(cu, cv, cw);
}
mat3 ca = setCamera(ro, ta, 0.0);
vec3 rd = ca * normalize(vec3(uv, FOCAL_LENGTH)); // 1.0~3.0, larger = narrower FOV
```
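The look-at construction should always yield an orthonormal basis (as long as the view direction is not parallel to the up reference `cp`). A Python port to verify that (illustrative only):

```python
import math

def set_camera(ro, ta, cr=0.0):
    # Port of setCamera: returns the (cu, cv, cw) basis vectors
    def sub(a, b): return [a[i] - b[i] for i in range(3)]
    def norm(v):
        l = math.sqrt(sum(c * c for c in v))
        return [c / l for c in v]
    def cross(a, b):
        return [a[1]*b[2] - a[2]*b[1], a[2]*b[0] - a[0]*b[2], a[0]*b[1] - a[1]*b[0]]
    cw = norm(sub(ta, ro))                  # forward
    cp = [math.sin(cr), math.cos(cr), 0.0]  # roll reference
    cu = norm(cross(cw, cp))                # right
    cv = cross(cu, cw)                      # up (unit, since cu and cw are orthonormal)
    return cu, cv, cw
```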
### Step 3: Scene SDF
```glsl
// SDF primitives
float sdSphere(vec3 p, float r) { return length(p) - r; }
float sdBox(vec3 p, vec3 b) {
vec3 d = abs(p) - b;
return min(max(d.x, max(d.y, d.z)), 0.0) + length(max(d, 0.0));
}
float sdTorus(vec3 p, vec2 t) {
return length(vec2(length(p.xz) - t.x, p.y)) - t.y;
}
// Boolean operations
float opUnion(float a, float b) { return min(a, b); }
float opSubtraction(float a, float b) { return max(a, -b); }
float opIntersection(float a, float b) { return max(a, b); }
// Smooth blending, adjustable k: 0.1~0.5
float smin(float a, float b, float k) {
float h = max(k - abs(a - b), 0.0);
return min(a, b) - h * h * 0.25 / k;
}
// Scene composition
float map(vec3 p) {
float d = sdSphere(p - vec3(0.0, 0.5, 0.0), 0.5);
d = opUnion(d, p.y); // ground
d = smin(d, sdBox(p - vec3(1.0, 0.3, 0.0), vec3(0.3)), 0.2); // smooth blend with box
return d;
}
```
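Two properties worth knowing: the box SDF above is exact (Euclidean distance both inside and out), and `smin` never rises above the hard `min` nor dips below it by more than `k / 4`. A CPU-side Python check (illustrative names):

```python
import math

def smin(a, b, k):
    # Quadratic smooth minimum from the snippet above
    h = max(k - abs(a - b), 0.0)
    return min(a, b) - h * h * 0.25 / k

def sd_box(p, b):
    # Exact box SDF; p is the sample point, b the half-extents (3-tuples)
    d = [abs(p[i]) - b[i] for i in range(3)]
    inside = min(max(d[0], max(d[1], d[2])), 0.0)
    outside = math.sqrt(sum(max(c, 0.0) ** 2 for c in d))
    return inside + outside
```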
### Step 4: Ray Marching Loop
```glsl
#define MAX_STEPS 128
#define MAX_DIST 100.0
#define SURF_DIST 0.001
float rayMarch(vec3 ro, vec3 rd) {
float t = 0.0;
for (int i = 0; i < MAX_STEPS; i++) {
vec3 p = ro + t * rd;
float d = map(p);
if (d < SURF_DIST) return t;
t += d;
if (t > MAX_DIST) break;
}
return -1.0;
}
```
### Step 5: Normal Estimation
```glsl
// Central differences (6 SDF evaluations)
vec3 calcNormal(vec3 p) {
vec2 e = vec2(0.001, 0.0);
return normalize(vec3(
map(p + e.xyy) - map(p - e.xyy),
map(p + e.yxy) - map(p - e.yxy),
map(p + e.yyx) - map(p - e.yyx)
));
}
// Tetrahedral trick (4 SDF evaluations, recommended)
vec3 calcNormal(vec3 pos) {
vec3 n = vec3(0.0);
for (int i = 0; i < 4; i++) {
vec3 e = 0.5773 * (2.0 * vec3((((i+3)>>1)&1), ((i>>1)&1), (i&1)) - 1.0);
n += e * map(pos + 0.001 * e);
}
return normalize(n);
}
```
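The tetrahedral trick works because the four offset directions sum to zero, so the `map(pos)` term cancels and only the gradient survives. A Python sketch that reproduces the same bit-twiddled offsets and checks the result against the analytic sphere normal (illustrative only):

```python
import math

def calc_normal(p, sdf, e=0.001):
    # Tetrahedral four-tap gradient estimate, same offsets as the GLSL version
    n = [0.0, 0.0, 0.0]
    for i in range(4):
        s = [float(((i + 3) >> 1) & 1), float((i >> 1) & 1), float(i & 1)]
        off = [0.5773 * (2.0 * c - 1.0) for c in s]
        d = sdf([p[j] + e * off[j] for j in range(3)])
        for j in range(3):
            n[j] += off[j] * d
    l = math.sqrt(sum(c * c for c in n))
    return [c / l for c in n]
```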
### Step 6: Lighting and Shading
```glsl
vec3 shade(vec3 p, vec3 rd) {
vec3 nor = calcNormal(p);
vec3 lightDir = normalize(vec3(0.6, 0.35, 0.5));
vec3 halfDir = normalize(lightDir - rd);
float diff = clamp(dot(nor, lightDir), 0.0, 1.0);
float spec = pow(clamp(dot(nor, halfDir), 0.0, 1.0), SHININESS); // 8~64
float sky = sqrt(clamp(0.5 + 0.5 * nor.y, 0.0, 1.0));
vec3 col = vec3(0.2, 0.2, 0.25);
vec3 lin = vec3(0.0);
lin += diff * vec3(1.3, 1.0, 0.7) * 2.2;
lin += sky * vec3(0.4, 0.6, 1.15) * 0.6;
lin += vec3(0.25) * 0.55;
col *= lin;
col += spec * vec3(1.3, 1.0, 0.7) * 5.0;
return col;
}
```
### Step 7: Post-Processing
```glsl
col = pow(col, vec3(0.4545)); // Gamma correction (1/2.2)
col = col / (1.0 + col); // Reinhard tone mapping (optional, before gamma)
// Vignette (optional)
vec2 q = fragCoord / iResolution.xy;
col *= 0.5 + 0.5 * pow(16.0 * q.x * q.y * (1.0 - q.x) * (1.0 - q.y), 0.25);
```
## Full Code Template
Can be pasted directly into ShaderToy. Includes an SDF scene, Blinn-Phong lighting, soft shadows, and ambient occlusion:
```glsl
// ============================================================
// Ray Marching Full Template — ShaderToy
// ============================================================
#define MAX_STEPS 128
#define MAX_DIST 100.0
#define SURF_DIST 0.001
#define SHADOW_STEPS 24
#define AO_STEPS 5
#define FOCAL_LENGTH 2.5
#define SHININESS 16.0
// --- SDF Primitives ---
float sdSphere(vec3 p, float r) { return length(p) - r; }
float sdBox(vec3 p, vec3 b) {
vec3 d = abs(p) - b;
return min(max(d.x, max(d.y, d.z)), 0.0) + length(max(d, 0.0));
}
float sdTorus(vec3 p, vec2 t) {
return length(vec2(length(p.xz) - t.x, p.y)) - t.y;
}
// --- Boolean Operations ---
float opUnion(float a, float b) { return min(a, b); }
float opSubtraction(float a, float b) { return max(a, -b); }
float opIntersection(float a, float b) { return max(a, b); }
float smin(float a, float b, float k) {
float h = max(k - abs(a - b), 0.0);
return min(a, b) - h * h * 0.25 / k;
}
mat2 rot2D(float a) {
float c = cos(a), s = sin(a);
return mat2(c, -s, s, c);
}
// --- Scene Definition ---
float map(vec3 p) {
float ground = p.y;
vec3 q = p - vec3(0.0, 0.8, 0.0);
q.xz *= rot2D(iTime * 0.5);
float body = smin(sdSphere(q, 0.5), sdTorus(q, vec2(0.8, 0.15)), 0.3);
return opUnion(ground, body);
}
// --- Normal (Tetrahedral Trick) ---
vec3 calcNormal(vec3 pos) {
vec3 n = vec3(0.0);
for (int i = min(iFrame,0); i < 4; i++) {
vec3 e = 0.5773 * (2.0 * vec3((((i+3)>>1)&1), ((i>>1)&1), (i&1)) - 1.0);
n += e * map(pos + 0.001 * e);
}
return normalize(n);
}
// --- Soft Shadows ---
float calcSoftShadow(vec3 ro, vec3 rd, float tmin, float tmax) {
float res = 1.0, t = tmin;
for (int i = 0; i < SHADOW_STEPS; i++) {
float h = map(ro + rd * t);
float s = clamp(8.0 * h / t, 0.0, 1.0);
res = min(res, s);
t += clamp(h, 0.01, 0.2);
if (res < 0.004 || t > tmax) break;
}
res = clamp(res, 0.0, 1.0);
return res * res * (3.0 - 2.0 * res);
}
// --- Ambient Occlusion ---
float calcAO(vec3 pos, vec3 nor) {
float occ = 0.0, sca = 1.0;
for (int i = 0; i < AO_STEPS; i++) {
float h = 0.01 + 0.12 * float(i) / float(AO_STEPS - 1);
float d = map(pos + h * nor);
occ += (h - d) * sca;
sca *= 0.95;
}
return clamp(1.0 - 3.0 * occ, 0.0, 1.0);
}
// --- Ray March ---
float rayMarch(vec3 ro, vec3 rd) {
float t = 0.0;
for (int i = 0; i < MAX_STEPS; i++) {
vec3 p = ro + t * rd;
float d = map(p);
if (abs(d) < SURF_DIST * (1.0 + t * 0.1)) return t;
t += d;
if (t > MAX_DIST) break;
}
return -1.0;
}
// --- Camera ---
mat3 setCamera(vec3 ro, vec3 ta, float cr) {
vec3 cw = normalize(ta - ro);
vec3 cp = vec3(sin(cr), cos(cr), 0.0);
vec3 cu = normalize(cross(cw, cp));
vec3 cv = cross(cu, cw);
return mat3(cu, cv, cw);
}
// --- Rendering ---
vec3 render(vec3 ro, vec3 rd) {
vec3 col = vec3(0.7, 0.7, 0.9) - max(rd.y, 0.0) * 0.3; // sky
float t = rayMarch(ro, rd);
if (t > 0.0) {
vec3 pos = ro + t * rd;
vec3 nor = calcNormal(pos);
// Material
vec3 mate = vec3(0.18);
if (pos.y < 0.001) {
float f = mod(floor(pos.x) + floor(pos.z), 2.0);
mate = vec3(0.1 + 0.05 * f);
} else {
mate = 0.2 + 0.2 * sin(vec3(0.0, 1.0, 2.0));
}
// Lighting
vec3 lightDir = normalize(vec3(-0.5, 0.4, -0.6));
float occ = calcAO(pos, nor);
float dif = clamp(dot(nor, lightDir), 0.0, 1.0);
dif *= calcSoftShadow(pos + nor * 0.01, lightDir, 0.02, 2.5);
vec3 hal = normalize(lightDir - rd);
float spe = pow(clamp(dot(nor, hal), 0.0, 1.0), SHININESS) * dif;
float sky = sqrt(clamp(0.5 + 0.5 * nor.y, 0.0, 1.0));
vec3 lin = vec3(0.0);
lin += dif * vec3(1.3, 1.0, 0.7) * 2.2;
lin += sky * vec3(0.4, 0.6, 1.15) * 0.6 * occ;
lin += vec3(0.25) * 0.55 * occ;
col = mate * lin;
col += spe * vec3(1.3, 1.0, 0.7) * 5.0;
col = mix(col, vec3(0.7, 0.7, 0.9), 1.0 - exp(-0.0001 * t * t * t)); // distance fog
}
return clamp(col, 0.0, 1.0);
}
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
float time = 32.0 + iTime * 1.5;
vec2 mo = iMouse.xy / iResolution.xy;
vec3 ta = vec3(0.0, 0.5, 0.0);
vec3 ro = ta + vec3(4.0*cos(0.1*time+7.0*mo.x), 1.5, 4.0*sin(0.1*time+7.0*mo.x));
mat3 ca = setCamera(ro, ta, 0.0);
vec2 uv = (2.0 * fragCoord - iResolution.xy) / iResolution.y;
vec3 rd = ca * normalize(vec3(uv, FOCAL_LENGTH));
vec3 col = render(ro, rd);
col = pow(col, vec3(0.4545));
vec2 q = fragCoord / iResolution.xy;
col *= 0.5 + 0.5 * pow(16.0 * q.x * q.y * (1.0 - q.x) * (1.0 - q.y), 0.25);
fragColor = vec4(col, 1.0);
}
```
## Common Variants
### 1. Volumetric Ray Marching
Advance at fixed step size, accumulating density/color per step. Used for fire, smoke, and clouds.
```glsl
#define VOL_STEPS 150
#define VOL_STEP_SIZE 0.05
// noise(): any 3D value noise (e.g. from the procedural-noise skill)
float fbmDensity(vec3 p) {
float den = 0.2 - p.y;
vec3 q = p - vec3(0.0, 1.0, 0.0) * iTime;
float f = 0.5000 * noise(q); q = q * 2.02 - vec3(0.0, 1.0, 0.0) * iTime;
f += 0.2500 * noise(q); q = q * 2.03 - vec3(0.0, 1.0, 0.0) * iTime;
f += 0.1250 * noise(q); q = q * 2.01 - vec3(0.0, 1.0, 0.0) * iTime;
f += 0.0625 * noise(q);
return den + 4.0 * f;
}
vec3 volumetricMarch(vec3 ro, vec3 rd) {
vec4 sum = vec4(0.0);
float t = 0.05;
for (int i = 0; i < VOL_STEPS; i++) {
vec3 pos = ro + t * rd;
float den = fbmDensity(pos);
if (den > 0.0) {
den = min(den, 1.0);
vec4 col = vec4(mix(vec3(1.0,0.5,0.05), vec3(0.48,0.53,0.5), clamp(pos.y*0.5,0.0,1.0)), den * 0.6);
col.rgb *= den * col.a;
sum += col * (1.0 - sum.a);
if (sum.a > 0.99) break;
}
t += VOL_STEP_SIZE;
}
return clamp(sum.rgb, 0.0, 1.0);
}
```
### 2. CSG Scene Construction
```glsl
float sceneSDF(vec3 p) {
p = rotateY(iTime * 0.5) * p;
float sphere = sdSphere(p, 1.2);
float cube = sdBox(p, vec3(0.9));
float cyl = sdCylinder(p, vec2(0.4, 2.0));
float cylX = sdCylinder(p.yzx, vec2(0.4, 2.0));
float cylZ = sdCylinder(p.xzy, vec2(0.4, 2.0));
return opSubtraction(opIntersection(sphere, cube), opUnion(cyl, opUnion(cylX, cylZ)));
}
```
### 3. Physically-Based Volumetric Scattering
```glsl
void getParticipatingMedia(out float sigmaS, out float sigmaE, vec3 pos) {
float heightFog = 0.3 * clamp((7.0 - pos.y), 0.0, 1.0);
sigmaS = 0.02 + heightFog;
sigmaE = max(0.000001, sigmaS);
}
vec3 S = lightColor * sigmaS * phaseFunction() * volShadow;
vec3 Sint = (S - S * exp(-sigmaE * stepLen)) / sigmaE;
scatteredLight += transmittance * Sint;
transmittance *= exp(-sigmaE * stepLen);
```
### 4. Glow Accumulation
```glsl
vec2 rayMarchWithGlow(vec3 ro, vec3 rd) {
float t = 0.0, dMin = MAX_DIST;
for (int i = 0; i < MAX_STEPS; i++) {
vec3 p = ro + t * rd;
float d = map(p);
if (d < dMin) dMin = d;
if (d < SURF_DIST) break;
t += d;
if (t > MAX_DIST) break;
}
return vec2(t, dMin);
}
vec2 res = rayMarchWithGlow(ro, rd);
float glow = 0.02 / max(res.y, 0.001); // res.y = closest approach dMin
col += glow * vec3(1.0, 0.8, 0.9);
```
### 5. Refraction and Bidirectional Marching
```glsl
float castRay(vec3 ro, vec3 rd) {
float sgn = (map(ro) < 0.0) ? -1.0 : 1.0; // flip distances when starting inside
float t = 0.0;
for (int i = 0; i < 120; i++) {
float h = sgn * map(ro + rd * t);
if (abs(h) < 0.0001 || t > 12.0) break;
t += h;
}
return t;
}
vec3 refDir = refract(rd, nor, IOR); // IOR: index of refraction, e.g. 0.9
float t2 = 2.0;
for (int i = 0; i < 50; i++) {
float h = map(hitPos + refDir * t2);
t2 -= h;
if (abs(h) > 3.0) break;
}
vec3 nor2 = calcNormal(hitPos + refDir * t2);
```
## Performance & Composition
**Performance tips:**
- Use tetrahedral trick for normals (4 SDF evaluations instead of 6)
- `min(iFrame,0)` as loop start value to prevent compiler unrolling
- AABB bounding box pre-test to skip empty regions
- Adaptive hit threshold: `SURF_DIST * (1.0 + t * 0.1)`
- Step clamping: `t += clamp(h, 0.01, 0.2)`
- Early exit for volumetric rendering when `sum.a > 0.99`
- Use cheap bounding SDF first, then compute precise SDF
**Composition directions:**
- + FBM noise: terrain/rock texture, cloud/smoke volumetric density fields
- + Domain transforms (twist/bend/repeat): infinite repeating corridors, surreal geometry
- + PBR materials (Cook-Torrance BRDF + Fresnel + environment mapping)
- + Multi-pass post-processing: depth of field, motion blur, tone mapping
- + Procedural animation: time-driven SDF parameters + smoothstep easing
## Further Reading
Full step-by-step tutorials, mathematical derivations, and advanced usage are in the [reference](../reference/ray-marching.md).
# 2D SDF Rendering Skill
## Use Cases
- 2D shape rendering: circles, rectangles, triangles, ellipses, line segments, Bezier curves, etc.
- UI elements and icons: drawn with math functions, naturally resolution-independent
- Anti-aliased graphics, shape boolean operations, outlines and glow
- Motion graphics and animation, 2D soft shadows and lighting
## Core Principles
For each pixel, compute the signed distance `d` to the shape boundary: `d < 0` inside, `d = 0` boundary, `d > 0` outside.
Map to color via `smoothstep`/`clamp`:
- **Fill**: color when `d < 0`
- **Anti-aliasing**: `smoothstep(-aa, aa, d)`
- **Stroke**: apply smoothstep to `abs(d) - strokeWidth`
- **Boolean operations**: `min(d1, d2)` union, `max(d1, d2)` intersection, `max(-d1, d2)` subtraction
Key formulas:
```
Circle: d = length(p - center) - radius
Rectangle: d = length(max(abs(p) - halfSize, 0.0)) + min(max(abs(p).x - halfSize.x, abs(p).y - halfSize.y), 0.0)
Line segment: d = length(p - a - clamp(dot(p-a, b-a)/dot(b-a, b-a), 0, 1) * (b-a)) - width/2
Smooth union: d = mix(d2, d1, h) - k*h*(1-h), h = clamp(0.5 + 0.5*(d2-d1)/k, 0, 1)
```
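These key formulas are pure scalar math and can be checked anywhere; a Python sketch of the circle and rectangle distances (illustrative names):

```python
import math

def sd_circle(px, py, cx, cy, r):
    return math.hypot(px - cx, py - cy) - r

def sd_rect(px, py, hx, hy):
    # Exact rectangle SDF (half-extents hx, hy), from the formula above
    dx, dy = abs(px) - hx, abs(py) - hy
    return math.hypot(max(dx, 0.0), max(dy, 0.0)) + min(max(dx, dy), 0.0)
```

Negative inside, zero on the boundary, positive Euclidean distance outside, including the corner region where both components exceed the half-extents.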
## Implementation Steps
### Step 1: Coordinate Normalization
```glsl
// Origin at center, y range [-1, 1] (standard approach)
vec2 p = (2.0 * fragCoord - iResolution.xy) / iResolution.y;
// Pixel space (suitable for fixed pixel-size UI)
vec2 p = fragCoord.xy;
vec2 center = iResolution.xy * 0.5;
// [0, 1] range (requires manual aspect ratio handling)
vec2 uv = fragCoord.xy / iResolution.xy;
```
### Step 2: SDF Primitive Functions
```glsl
float sdCircle(vec2 p, float radius) {
return length(p) - radius;
}
// halfSize is half-width/half-height, radius is corner rounding
float sdBox(vec2 p, vec2 halfSize, float radius) {
halfSize -= vec2(radius);
vec2 d = abs(p) - halfSize;
return min(max(d.x, d.y), 0.0) + length(max(d, 0.0)) - radius;
}
float sdLine(vec2 p, vec2 start, vec2 end, float width) {
vec2 dir = end - start;
float h = clamp(dot(p - start, dir) / dot(dir, dir), 0.0, 1.0);
return length(p - start - dir * h) - width * 0.5;
}
// Exact signed distance, requires only one sqrt
float dot2(vec2 v) { return dot(v, v); }
float sdTriangle(vec2 p, vec2 p0, vec2 p1, vec2 p2) {
vec2 e0 = p1 - p0, v0 = p - p0;
vec2 e1 = p2 - p1, v1 = p - p1;
vec2 e2 = p0 - p2, v2 = p - p2;
vec2 pq0 = v0 - e0 * clamp(dot(v0, e0) / dot(e0, e0), 0.0, 1.0);
vec2 pq1 = v1 - e1 * clamp(dot(v1, e1) / dot(e1, e1), 0.0, 1.0);
vec2 pq2 = v2 - e2 * clamp(dot(v2, e2) / dot(e2, e2), 0.0, 1.0);
float o = e0.x * e2.y - e0.y * e2.x;
vec2 d = min(min(vec2(dot2(pq0), o * (v0.x * e0.y - v0.y * e0.x)),
vec2(dot2(pq1), o * (v1.x * e1.y - v1.y * e1.x))),
vec2(dot2(pq2), o * (v2.x * e2.y - v2.y * e2.x)));
return -sqrt(d.x) * sign(d.y);
}
// Approximate ellipse SDF
float sdEllipse(vec2 p, vec2 center, float a, float b) {
float a2 = a * a, b2 = b * b;
vec2 d = p - center;
return (b2 * d.x * d.x + a2 * d.y * d.y - a2 * b2) / (a2 * b2);
}
```
### Step 3: CSG Boolean Operations
```glsl
float opUnion(float d1, float d2) { return min(d1, d2); }
float opIntersect(float d1, float d2) { return max(d1, d2); }
float opSubtract(float d1, float d2) { return max(-d1, d2); }
float opXor(float d1, float d2) { return min(max(-d1, d2), max(-d2, d1)); }
// k controls transition width
float opSmoothUnion(float d1, float d2, float k) {
float h = clamp(0.5 + 0.5 * (d2 - d1) / k, 0.0, 1.0);
return mix(d2, d1, h) - k * h * (1.0 - h);
}
```
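The mix-based `opSmoothUnion` here and the abs-based `smin` used in the ray-marching doc are the same polynomial in disguise. A quick Python check of that identity (illustrative):

```python
def smooth_union_mix(d1, d2, k):
    # mix-based form from the snippet above
    h = min(max(0.5 + 0.5 * (d2 - d1) / k, 0.0), 1.0)
    return d2 + (d1 - d2) * h - k * h * (1.0 - h)

def smooth_union_abs(d1, d2, k):
    # abs-based "smin" form
    h = max(k - abs(d1 - d2), 0.0)
    return min(d1, d2) - h * h * 0.25 / k
```

Both reduce to the plain `min` whenever the two distances differ by more than `k`.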
### Step 4: Coordinate Transforms
```glsl
vec2 translate(vec2 p, vec2 t) { return p - t; }
vec2 rotateCCW(vec2 p, float angle) {
mat2 m = mat2(cos(angle), sin(angle), -sin(angle), cos(angle));
return p * m;
}
// Usage: translate first, then rotate
float d = sdBox(rotateCCW(translate(p, vec2(0.5, 0.3)), iTime), vec2(0.2), 0.05);
```
### Step 5: Rendering and Anti-Aliasing
```glsl
// Option A: smoothstep anti-aliasing (recommended)
float px = 2.0 / iResolution.y;
float mask = smoothstep(px, -px, d); // 1.0 inside, 0.0 outside
vec3 col = mix(backgroundColor, shapeColor, mask);
// Option B: fwidth adaptive anti-aliasing (suitable for scaled scenes)
float anti = fwidth(d) * 1.0;
float maskAA = 1.0 - smoothstep(-anti, anti, d);
// Option C: classic distance field debug visualization
vec3 dbg = (d > 0.0) ? vec3(0.9, 0.6, 0.3) : vec3(0.65, 0.85, 1.0);
dbg *= 1.0 - exp(-12.0 * abs(d));
dbg *= 0.8 + 0.2 * cos(120.0 * d);
dbg = mix(dbg, vec3(1.0), smoothstep(1.5*px, 0.0, abs(d) - 0.002));
```
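The inverted-edge call `smoothstep(px, -px, d)` is valid GLSL and simply flips the ramp: 1 deep inside, 0.5 exactly on the boundary, 0 outside. A Python check of that behavior (illustrative):

```python
def smoothstep(e0, e1, x):
    # GLSL smoothstep; also works with e0 > e1, as in the inverted call above
    t = min(max((x - e0) / (e1 - e0), 0.0), 1.0)
    return t * t * (3.0 - 2.0 * t)

def fill_mask(d, px):
    # smoothstep(px, -px, d): 1 inside (d <= -px), 0 outside (d >= px)
    return smoothstep(px, -px, d)
```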
### Step 6: Stroke and Border
```glsl
// Fill + stroke rendering (fwidth adaptive)
vec4 renderShape(float d, vec3 color, float stroke) {
float anti = fwidth(d) * 1.0;
vec4 strokeLayer = vec4(vec3(0.05), 1.0 - smoothstep(-anti, anti, d - stroke));
vec4 colorLayer = vec4(color, 1.0 - smoothstep(-anti, anti, d));
if (stroke < 0.0001) return colorLayer;
return vec4(mix(strokeLayer.rgb, colorLayer.rgb, colorLayer.a), strokeLayer.a);
}
float fillMask(float d) { return clamp(-d, 0.0, 1.0); }
float innerBorderMask(float d, float width) {
return clamp(d + width, 0.0, 1.0) - clamp(d, 0.0, 1.0);
}
float outerBorderMask(float d, float width) {
return clamp(d, 0.0, 1.0) - clamp(d - width, 0.0, 1.0);
}
```
### Step 7: Multi-Layer Compositing
```glsl
vec3 bgColor = vec3(1.0, 0.8, 0.7 - 0.07 * p.y) * (1.0 - 0.25 * length(p));
float d1 = sdCircle(translate(p, pos1), 0.3);
vec4 layer1 = renderShape(d1, vec3(0.9, 0.3, 0.2), 0.02);
float d2 = sdBox(translate(p, pos2), vec2(0.2), 0.05);
vec4 layer2 = renderShape(d2, vec3(0.2, 0.5, 0.8), 0.0);
// Composite back to front
vec3 col = bgColor;
col = mix(col, layer1.rgb, layer1.a);
col = mix(col, layer2.rgb, layer2.a);
fragColor = vec4(col, 1.0);
```
## Full Code Template
```glsl
// ===== 2D SDF Full Template (runs directly in ShaderToy) =====
#define AA_WIDTH 1.0 // Anti-aliasing width factor
#define STROKE_WIDTH 0.015 // Stroke width
#define SMOOTH_K 0.05 // Smooth union transition width
#define CONTOUR_FREQ 80.0 // Contour line frequency (for debugging)
#define ANIM_SPEED 1.0 // Animation speed multiplier
// --- SDF Primitives ---
float sdCircle(vec2 p, float r) { return length(p) - r; }
float sdBox(vec2 p, vec2 b, float r) {
b -= vec2(r);
vec2 d = abs(p) - b;
return min(max(d.x, d.y), 0.0) + length(max(d, 0.0)) - r;
}
float sdLine(vec2 p, vec2 a, vec2 b, float w) {
vec2 d = b - a;
float h = clamp(dot(p - a, d) / dot(d, d), 0.0, 1.0);
return length(p - a - d * h) - w * 0.5;
}
float sdTriangle(vec2 p, vec2 p0, vec2 p1, vec2 p2) {
vec2 e0 = p1 - p0, v0 = p - p0;
vec2 e1 = p2 - p1, v1 = p - p1;
vec2 e2 = p0 - p2, v2 = p - p2;
float d0 = dot(v0 - e0 * clamp(dot(v0,e0)/dot(e0,e0),0.0,1.0),
v0 - e0 * clamp(dot(v0,e0)/dot(e0,e0),0.0,1.0));
float d1 = dot(v1 - e1 * clamp(dot(v1,e1)/dot(e1,e1),0.0,1.0),
v1 - e1 * clamp(dot(v1,e1)/dot(e1,e1),0.0,1.0));
float d2 = dot(v2 - e2 * clamp(dot(v2,e2)/dot(e2,e2),0.0,1.0),
v2 - e2 * clamp(dot(v2,e2)/dot(e2,e2),0.0,1.0));
float o = e0.x*e2.y - e0.y*e2.x;
vec2 dd = min(min(vec2(d0, o*(v0.x*e0.y-v0.y*e0.x)),
vec2(d1, o*(v1.x*e1.y-v1.y*e1.x))),
vec2(d2, o*(v2.x*e2.y-v2.y*e2.x)));
return -sqrt(dd.x) * sign(dd.y);
}
// --- CSG ---
float opUnion(float a, float b) { return min(a, b); }
float opSubtract(float a, float b) { return max(-a, b); }
float opIntersect(float a, float b) { return max(a, b); }
float opSmoothUnion(float a, float b, float k) {
float h = clamp(0.5 + 0.5*(b - a)/k, 0.0, 1.0);
return mix(b, a, h) - k*h*(1.0-h);
}
float opXor(float a, float b) { return min(max(-a, b), max(-b, a)); }
// --- Coordinate Transforms ---
vec2 translate(vec2 p, vec2 t) { return p - t; }
vec2 rotateCCW(vec2 p, float a) {
return p * mat2(cos(a), sin(a), -sin(a), cos(a));
}
// --- Rendering Utilities ---
vec4 render(float d, vec3 color, float stroke) {
float anti = fwidth(d) * AA_WIDTH;
vec4 strokeLayer = vec4(vec3(0.05), 1.0 - smoothstep(-anti, anti, d - stroke));
vec4 colorLayer = vec4(color, 1.0 - smoothstep(-anti, anti, d));
if (stroke < 0.0001) return colorLayer;
return vec4(mix(strokeLayer.rgb, colorLayer.rgb, colorLayer.a), strokeLayer.a);
}
float fillAA(float d, float px) { return smoothstep(px, -px, d); }
// --- Scene ---
float sceneDist(vec2 p) {
float t = iTime * ANIM_SPEED;
float c = sdCircle(translate(p, vec2(-0.6, 0.3)), 0.25);
float b = sdBox(translate(p, vec2(0.0, 0.3)), vec2(0.25, 0.18), 0.05);
vec2 tp = rotateCCW(translate(p, vec2(0.6, 0.3)), t * 0.5);
float tr = sdTriangle(tp, vec2(0.0, 0.25), vec2(-0.22, -0.12), vec2(0.22, -0.12));
float row1 = opUnion(c, opUnion(b, tr));
float c2 = sdCircle(translate(p, vec2(-0.5, -0.35)), 0.2);
float b2 = sdBox(translate(p, vec2(-0.3, -0.35)), vec2(0.15, 0.15), 0.0);
float smooth_demo = opSmoothUnion(c2, b2, SMOOTH_K);
float c3 = sdCircle(translate(p, vec2(0.15, -0.35)), 0.22);
float b3 = sdBox(translate(p, vec2(0.15, -0.35 + sin(t) * 0.15)), vec2(0.3, 0.08), 0.0);
float sub_demo = opSubtract(b3, c3);
float c4 = sdCircle(translate(p, vec2(0.65, -0.35)), 0.2);
float b4 = sdBox(translate(p, vec2(0.65, -0.35 + sin(t + 1.0) * 0.15)), vec2(0.3, 0.08), 0.0);
float xor_demo = opXor(b4, c4);
float row2 = opUnion(smooth_demo, opUnion(sub_demo, xor_demo));
return opUnion(row1, row2);
}
// --- Main Function ---
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
vec2 p = (2.0 * fragCoord - iResolution.xy) / iResolution.y;
float px = 2.0 / iResolution.y;
float d = sceneDist(p);
vec3 bgCol = vec3(0.15, 0.15, 0.18) + 0.05 * p.y;
bgCol *= 1.0 - 0.3 * length(p);
vec3 col = (d > 0.0) ? vec3(0.9, 0.6, 0.3) : vec3(0.4, 0.7, 1.0);
col *= 1.0 - exp(-10.0 * abs(d));
col *= 0.8 + 0.2 * cos(CONTOUR_FREQ * d);
col = mix(col, vec3(1.0), smoothstep(1.5 * px, 0.0, abs(d) - 0.002));
col = mix(bgCol, col, 0.85);
// Uncomment to switch to solid rendering mode:
// vec3 shapeCol = vec3(0.2, 0.8, 0.6);
// float mask = fillAA(d, px);
// col = mix(bgCol, shapeCol, mask);
col = pow(col, vec3(1.0 / 2.2));
fragColor = vec4(col, 1.0);
}
```
## Common Variants
### Variant 1: Solid Fill + Stroke Mode
```glsl
vec3 shapeColor = vec3(0.32, 0.56, 0.53);
float strokeW = 0.015;
vec4 shape = render(d, shapeColor, strokeW);
vec3 col = bgCol;
col = mix(col, shape.rgb, shape.a);
```
### Variant 2: Multi-Layer CSG Illustration
```glsl
float a = sdEllipse(p, vec2(0.0, 0.16), 0.25, 0.25);
float b = sdEllipse(p, vec2(0.0, -0.03), 0.8, 0.35);
float body = opIntersect(a, b);
vec4 layer1 = render(body, vec3(0.32, 0.56, 0.53), fwidth(body) * 2.0);
float handle = sdLine(p, vec2(0.0, 0.05), vec2(0.0, -0.42), 0.01);
float arc = sdCircle(translate(p, vec2(-0.04, -0.42)), 0.04);
float arcInner = sdCircle(translate(p, vec2(-0.04, -0.42)), 0.03);
handle = opUnion(handle, opSubtract(arcInner, arc));
vec4 layer0 = render(handle, vec3(0.4, 0.3, 0.28), STROKE_WIDTH);
vec3 col = bgCol;
col = mix(col, layer0.rgb, layer0.a);
col = mix(col, layer1.rgb, layer1.a);
```
### Variant 3: Hexagonal Grid Tiling
```glsl
vec4 hexagon(vec2 p) {
vec2 q = vec2(p.x * 2.0 * 0.5773503, p.y + p.x * 0.5773503);
vec2 pi = floor(q);
vec2 pf = fract(q);
float v = mod(pi.x + pi.y, 3.0);
float ca = step(1.0, v);
float cb = step(2.0, v);
vec2 ma = step(pf.xy, pf.yx);
float e = dot(ma, 1.0 - pf.yx + ca*(pf.x+pf.y-1.0) + cb*(pf.yx-2.0*pf.xy));
p = vec2(q.x + floor(0.5 + p.y / 1.5), 4.0 * p.y / 3.0) * 0.5 + 0.5;
float f = length((fract(p) - 0.5) * vec2(1.0, 0.85));
return vec4(pi + ca - cb * ma, e, f);
}
#define HEX_SCALE 8.0
vec4 h = hexagon(HEX_SCALE * p + 0.5 * iTime);
vec3 col = 0.15 + 0.15 * hash1(h.xy + 1.2);
col *= smoothstep(0.10, 0.11, h.z);
col *= smoothstep(0.10, 0.11, h.w);
```
### Variant 4: Organic Shapes (Polar SDF)
```glsl
// Heart SDF
p.y -= 0.25;
float a = atan(p.x, p.y) / 3.141593;
float r = length(p);
float h = abs(a);
float d = (13.0*h - 22.0*h*h + 10.0*h*h*h) / (6.0 - 5.0*h);
// Pulse animation
float tt = mod(iTime, 1.5) / 1.5;
float ss = pow(tt, 0.2) * 0.5 + 0.5;
ss = 1.0 + ss * 0.5 * sin(tt * 6.2831 * 3.0) * exp(-tt * 4.0);
vec3 col = mix(bgCol, heartCol, smoothstep(-0.01, 0.01, d - r));
```
### Variant 5: Bezier Curve SDF
```glsl
vec3 solveCubic(float a, float b, float c) {
float p = b - a*a/3.0, p3 = p*p*p;
float q = a*(2.0*a*a - 9.0*b)/27.0 + c;
float d = q*q + 4.0*p3/27.0;
float offset = -a/3.0;
if (d >= 0.0) {
float z = sqrt(d);
vec2 x = (vec2(z,-z) - q) / 2.0;
vec2 uv = sign(x) * pow(abs(x), vec2(1.0/3.0));
return vec3(offset + uv.x + uv.y);
}
float v = acos(-sqrt(-27.0/p3)*q/2.0) / 3.0;
float m = cos(v), n = sin(v) * 1.732050808;
return vec3(m+m, -n-m, n-m) * sqrt(-p/3.0) + offset;
}
float sdBezier(vec2 A, vec2 B, vec2 C, vec2 p) {
B = mix(B + vec2(1e-4), B, step(1e-6, abs(B*2.0-A-C)));
vec2 a = B-A, b = A-B*2.0+C, c = a*2.0, d = A-p;
vec3 k = vec3(3.*dot(a,b), 2.*dot(a,a)+dot(d,b), dot(d,a)) / dot(b,b);
vec3 t = clamp(solveCubic(k.x, k.y, k.z), 0.0, 1.0);
vec2 pos = A+(c+b*t.x)*t.x; float dis = length(pos-p);
pos = A+(c+b*t.y)*t.y; dis = min(dis, length(pos-p));
pos = A+(c+b*t.z)*t.z; dis = min(dis, length(pos-p));
return dis * signBezier(A, B, C, p); // signBezier: inside/outside sign helper (see reference)
}
```
## Extended 2D SDF Library
```glsl
// === Extended 2D SDF Library ===
// Rounded Box with independent corner radii (vec4 r = top-right, bottom-right, top-left, bottom-left)
float sdRoundedBox(vec2 p, vec2 b, vec4 r) {
r.xy = (p.x > 0.0) ? r.xy : r.zw;
r.x = (p.y > 0.0) ? r.x : r.y;
vec2 q = abs(p) - b + r.x;
return min(max(q.x, q.y), 0.0) + length(max(q, 0.0)) - r.x;
}
// Oriented Box (from point a to point b with thickness th)
float sdOrientedBox(vec2 p, vec2 a, vec2 b, float th) {
float l = length(b - a);
vec2 d = (b - a) / l;
vec2 q = (p - (a + b) * 0.5);
q = mat2(d.x, -d.y, d.y, d.x) * q;
q = abs(q) - vec2(l, th) * 0.5;
return length(max(q, 0.0)) + min(max(q.x, q.y), 0.0);
}
// Arc (sc = vec2(sin,cos) of aperture angle, ra = radius, rb = thickness)
float sdArc(vec2 p, vec2 sc, float ra, float rb) {
p.x = abs(p.x);
return ((sc.y * p.x > sc.x * p.y) ? length(p - sc * ra) : abs(length(p) - ra)) - rb;
}
// Pie / Sector (c = vec2(sin,cos) of aperture angle)
float sdPie(vec2 p, vec2 c, float r) {
p.x = abs(p.x);
float l = length(p) - r;
float m = length(p - c * clamp(dot(p, c), 0.0, r));
return max(l, m * sign(c.y * p.x - c.x * p.y));
}
// Ring (n = vec2(sin,cos) of aperture, r = radius, th = thickness)
float sdRing(vec2 p, vec2 n, float r, float th) {
p.x = abs(p.x);
float d = length(p);
// If within aperture angle
if (n.y * p.x > n.x * p.y) {
return abs(d - r) - th;
}
// Cap endpoints
return min(length(p - n * r), length(p + n * r)) - th;
}
// Moon / crescent shape
float sdMoon(vec2 p, float d, float ra, float rb) {
p.y = abs(p.y);
float a = (ra * ra - rb * rb + d * d) / (2.0 * d);
float b = sqrt(max(ra * ra - a * a, 0.0));
if (d * (p.x * b - p.y * a) > d * d * max(b - p.y, 0.0))
return length(p - vec2(a, b));
return max(length(p) - ra, -(length(p - vec2(d, 0.0)) - rb));
}
// Heart (approximate)
float sdHeart(vec2 p) {
p.x = abs(p.x);
if (p.y + p.x > 1.0)
return sqrt(dot(p - vec2(0.25, 0.75), p - vec2(0.25, 0.75))) - sqrt(2.0) / 4.0;
return sqrt(min(dot(p - vec2(0.0, 1.0), p - vec2(0.0, 1.0)),
dot(p - 0.5 * max(p.x + p.y, 0.0), p - 0.5 * max(p.x + p.y, 0.0)))) *
sign(p.x - p.y);
}
// Vesica (lens shape)
float sdVesica(vec2 p, float w, float h) {
    p = abs(p);
    float d = 0.5 * w;              // offset of each generating circle from the axis
    float b = sqrt(h * h + d * d);  // radius of the two generating circles
    return ((p.y - h) * d > p.x * h)
        ? length(p - vec2(0.0, h))
        : length(p - vec2(-d, 0.0)) - b;
}
// Egg shape (he = tip height, ra = bottom radius, rb = tip radius; requires ra > rb)
float sdEgg(vec2 p, float he, float ra, float rb) {
    float ce = 0.5 * (he * he - (ra - rb) * (ra - rb)) / (ra - rb);
    p.x = abs(p.x);
    if (p.y < 0.0) return length(p) - ra;
    if (p.y * ce - p.x * he > he * ce) return length(vec2(p.x, p.y - he)) - rb;
    return length(vec2(p.x + ce, p.y)) - (ce + ra);
}
// Equilateral Triangle
float sdEquilateralTriangle(vec2 p, float r) {
const float k = sqrt(3.0);
p.x = abs(p.x) - r;
p.y = p.y + r / k;
if (p.x + k * p.y > 0.0) p = vec2(p.x - k * p.y, -k * p.x - p.y) / 2.0;
p.x -= clamp(p.x, -2.0 * r, 0.0);
return -length(p) * sign(p.y);
}
// Pentagon
float sdPentagon(vec2 p, float r) {
const vec3 k = vec3(0.809016994, 0.587785252, 0.726542528);
p.x = abs(p.x);
p -= 2.0 * min(dot(vec2(-k.x, k.y), p), 0.0) * vec2(-k.x, k.y);
p -= 2.0 * min(dot(vec2(k.x, k.y), p), 0.0) * vec2(k.x, k.y);
p -= vec2(clamp(p.x, -r * k.z, r * k.z), r);
return length(p) * sign(p.y);
}
// Hexagon
float sdHexagon(vec2 p, float r) {
const vec3 k = vec3(-0.866025404, 0.5, 0.577350269);
p = abs(p);
p -= 2.0 * min(dot(k.xy, p), 0.0) * k.xy;
p -= vec2(clamp(p.x, -k.z * r, k.z * r), r);
return length(p) * sign(p.y);
}
// Octagon
float sdOctagon(vec2 p, float r) {
const vec3 k = vec3(-0.9238795325, 0.3826834323, 0.4142135623);
p = abs(p);
p -= 2.0 * min(dot(vec2(k.x, k.y), p), 0.0) * vec2(k.x, k.y);
p -= 2.0 * min(dot(vec2(-k.x, k.y), p), 0.0) * vec2(-k.x, k.y);
p -= vec2(clamp(p.x, -k.z * r, k.z * r), r);
return length(p) * sign(p.y);
}
// Star (n points; m controls pointiness, typically between 2.0 and float(n))
float sdStar(vec2 p, float r, int n, float m) {
float an = 3.141593 / float(n);
float en = 3.141593 / m;
vec2 acs = vec2(cos(an), sin(an));
vec2 ecs = vec2(cos(en), sin(en));
float bn = mod(atan(p.x, p.y), 2.0 * an) - an;
p = length(p) * vec2(cos(bn), abs(sin(bn)));
p -= r * acs;
p += ecs * clamp(-dot(p, ecs), 0.0, r * acs.y / ecs.y);
return length(p) * sign(p.x);
}
// Quadratic Bezier curve SDF
float sdBezier(vec2 pos, vec2 A, vec2 B, vec2 C) {
vec2 a = B - A;
vec2 b = A - 2.0 * B + C;
vec2 c = a * 2.0;
vec2 d = A - pos;
float kk = 1.0 / dot(b, b);
float kx = kk * dot(a, b);
float ky = kk * (2.0 * dot(a, a) + dot(d, b)) / 3.0;
float kz = kk * dot(d, a);
float res = 0.0;
float p2 = ky - kx * kx;
float q = kx * (2.0 * kx * kx - 3.0 * ky) + kz;
float h = q * q + 4.0 * p2 * p2 * p2;
if (h >= 0.0) {
h = sqrt(h);
vec2 x = (vec2(h, -h) - q) / 2.0;
vec2 uv2 = sign(x) * pow(abs(x), vec2(1.0 / 3.0));
float t = clamp(uv2.x + uv2.y - kx, 0.0, 1.0);
res = dot(d + (c + b * t) * t, d + (c + b * t) * t);
} else {
float z = sqrt(-p2);
float v = acos(q / (p2 * z * 2.0)) / 3.0;
float m2 = cos(v);
float n2 = sin(v) * 1.732050808;
vec3 t = clamp(vec3(m2 + m2, -n2 - m2, n2 - m2) * z - kx, 0.0, 1.0);
res = min(dot(d + (c + b * t.x) * t.x, d + (c + b * t.x) * t.x),
dot(d + (c + b * t.y) * t.y, d + (c + b * t.y) * t.y));
}
return sqrt(res);
}
// Parabola
float sdParabola(vec2 pos, float k) {
pos.x = abs(pos.x);
float ik = 1.0 / k;
float p2 = ik * (pos.y - 0.5 * ik) / 3.0;
float q = 0.25 * ik * ik * pos.x;
float h = q * q - p2 * p2 * p2;
float r = sqrt(abs(h));
float x = (h > 0.0) ?
pow(q + r, 1.0 / 3.0) + pow(abs(q - r), 1.0 / 3.0) * sign(p2) :
2.0 * cos(atan(r, q) / 3.0) * sqrt(p2);
return length(pos - vec2(x, k * x * x)) * sign(pos.x - x);
}
// Cross shape
float sdCross(vec2 p, vec2 b, float r) {
p = abs(p); p = (p.y > p.x) ? p.yx : p.xy;
vec2 q = p - b;
float k = max(q.y, q.x);
vec2 w = (k > 0.0) ? q : vec2(b.y - p.x, -k);
return sign(k) * length(max(w, 0.0)) + r;
}
```
## 2D SDF Modifiers
```glsl
// === 2D SDF Modifiers ===
// Round any 2D SDF
float opRound2D(float d, float r) { return d - r; }
// Create annular (ring) version of any 2D SDF
float opAnnular2D(float d, float r) { return abs(d) - r; }
// Repeat a 2D SDF in a grid
vec2 opRepeat2D(vec2 p, float s) { return mod(p + s * 0.5, s) - s * 0.5; }
// Mirror across arbitrary 2D direction
vec2 opMirror2D(vec2 p, vec2 dir) {
return p - 2.0 * dir * max(dot(p, dir), 0.0);
}
```
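A minimal ShaderToy sketch using the modifiers above: `opRepeat2D` tiles the plane while `opAnnular2D` turns filled circles into rings (cell size, radius, and colors are arbitrary choices):

```glsl
// Tiles ring outlines across the screen; paste below the modifier block above
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
    vec2 p = (2.0 * fragCoord - iResolution.xy) / iResolution.y;
    vec2 q = opRepeat2D(p, 0.6);                   // infinite grid, cell size 0.6
    float d = opAnnular2D(length(q) - 0.2, 0.02);  // circle -> 0.02-thick ring
    float px = 2.0 / iResolution.y;                // one pixel in scene units
    vec3 col = mix(vec3(0.95), vec3(0.1, 0.4, 0.8), smoothstep(px, -px, d));
    fragColor = vec4(col, 1.0);
}
```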
## Performance & Composition Tips
**Performance:**
- In polygon SDFs, compare squared distances first; use a single `sqrt` at the end
- For simple scenes, use fixed `px = 2.0/iResolution.y` instead of `fwidth(d)`; use `fwidth` when coordinate scaling is involved
- For many primitives, spatially partition and skip distant ones early
- Supersampling (2x2/3x3) only for offline rendering; for real-time, single-pixel AA with `smoothstep`/`fwidth` is sufficient
- For 2D soft shadow marching, use an adaptive step: `t += max(1.0, abs(sd))` (march by the SDF value, but at least one pixel per step)
**Composition:**
- **SDF + Noise**: `d += noise(p * 10.0 + iTime) * 0.05` to create organic edges
- **SDF + 2D Lighting**: cone marching for soft shadows, query occlusion via `sceneDist()`
- **SDF + Normal Mapping**: finite differences for normals + Blinn-Phong lighting to simulate bump effects
- **SDF + Domain Repetition**: `fract`/`mod` for infinite repetition, `floor` for cell ID
- **SDF + Animation**: parameters driven by `sin/cos` periodic motion, `exp` decay, `mod` looping
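The SDF + Noise composition can be sketched as a complete ShaderToy pass (the hash constants and perturbation amplitude are arbitrary choices, not from the source):

```glsl
// Organic wobble on a circle edge via value noise
float hash21(vec2 p) { return fract(sin(dot(p, vec2(127.1, 311.7))) * 43758.5453); }
float noise(vec2 p) {
    vec2 i = floor(p), f = fract(p);
    vec2 u = f * f * (3.0 - 2.0 * f);  // Hermite interpolation between lattice values
    return mix(mix(hash21(i), hash21(i + vec2(1, 0)), u.x),
               mix(hash21(i + vec2(0, 1)), hash21(i + vec2(1, 1)), u.x), u.y);
}
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
    vec2 p = (2.0 * fragCoord - iResolution.xy) / iResolution.y;
    float d = length(p) - 0.5;                    // base circle
    d += (noise(p * 10.0 + iTime) - 0.5) * 0.1;   // animated organic edge
    vec3 col = mix(vec3(0.2, 0.6, 0.3), vec3(0.95), smoothstep(-0.005, 0.005, d));
    fragColor = vec4(col, 1.0);
}
```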
## Further Reading
Full step-by-step tutorials, mathematical derivations, and advanced usage in [reference](../reference/sdf-2d.md)

# 3D Signed Distance Fields (3D SDF) Skill
## Use Cases
- Real-time rendering of 3D geometry in ShaderToy / fragment shaders (no traditional meshes needed)
- Complex scenes composed from basic primitives (sphere, box, cylinder, torus, etc.)
- Smooth organic blending (character modeling, fluid blobs, biological forms)
- Infinitely repeating architectural/pattern structures (corridors, gear arrays, grids)
- Precise boolean operations (drilling holes, cutting, intersection) for sculpting geometry
## Core Principles
An SDF returns the **signed distance** from any point in space to the nearest surface: positive = outside, negative = inside, zero = surface.
**Sphere Tracing**: advance along a ray, stepping by the current SDF value (the safe marching distance) at each step. The SDF guarantees no surface exists within that radius. A hit is registered when the distance falls below epsilon.
Key math:
- Sphere: `f(p) = |p| - r`
- Box: `f(p) = |max(q, 0)| + min(max(q.x, q.y, q.z), 0)` with `q = |p| - b`
- Union: `min(d1, d2)` / Subtraction: `max(d1, -d2)`
- Smooth union: `min(d1, d2) - h²/(4k)`, `h = max(k - |d1 - d2|, 0)`
- Normal = SDF gradient: `n = normalize(gradient of f(p))` (finite difference approximation)
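The sphere-tracing loop itself is only a few lines. A minimal sketch against a single sphere (rename `map` if pasting beside the full template below, which uses a `vec2`-returning `map`):

```glsl
// Minimal sphere tracer: march a ray against a unit sphere at the origin
float map(vec3 p) { return length(p) - 1.0; }  // scene SDF

float trace(vec3 ro, vec3 rd) {
    float t = 0.0;
    for (int i = 0; i < 100; i++) {
        float d = map(ro + rd * t);  // safe marching distance at the current point
        if (d < 0.001) return t;     // hit: within epsilon of the surface
        t += d;                      // step by the SDF value
        if (t > 40.0) break;         // ray escaped the scene
    }
    return -1.0;                     // miss
}
```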
## Rendering Pipeline Overview
1. **SDF Primitive Library** -- `sdSphere`, `sdBox`, `sdEllipsoid`, `sdTorus`, `sdCapsule`, `sdCylinder`
2. **Boolean Operations** -- `opUnion`/`opSubtraction`/`opIntersection` + smooth variants `smin`/`smax`
3. **Scene Definition** -- `map(p)` returns `vec2(distance, materialID)`, combining all primitives
4. **Ray Marching** -- `raycast(ro, rd)` sphere tracing loop (128 steps, adaptive threshold `SURF_DIST * t`)
5. **Normal Calculation** -- tetrahedral differencing (4 map calls, ZERO macro to prevent inlining)
6. **Soft Shadows** -- quadratic stepping with `k*h/t` to estimate occlusion softness, Hermite smoothing
7. **Ambient Occlusion** -- 5-layer sampling along the normal, comparing SDF values with expected distances
8. **Camera + Rendering** -- look-at matrix, multiple lights (sun + sky + SSS), gamma correction, fog
## Full Code Template
Runs directly in ShaderToy. Includes multi-primitive scene, smooth blending, soft shadows, AO, and material system.
**IMPORTANT:** When using the `vec2(distance, materialID)` material system, `smin` needs to handle `vec2` types. The template includes a `vec2 smin(vec2 a, vec2 b, float k)` overload that ensures the material ID is correctly passed through during smooth blending (taking the material of the closer distance).
```glsl
// 3D SDF Full Rendering Pipeline Template - Runs in ShaderToy
#define AA 1 // Anti-aliasing (1=off, 2=4xAA, 3=9xAA)
#define MAX_STEPS 128
#define MAX_DIST 40.0
#define SURF_DIST 0.0001
#define SHADOW_STEPS 24
#define SHADOW_SOFTNESS 8.0
#define SMOOTH_K 0.3
#define ZERO (min(iFrame, 0))
// === SDF Primitives ===
float sdSphere(vec3 p, float r) { return length(p) - r; }
float sdBox(vec3 p, vec3 b) {
vec3 d = abs(p) - b;
return min(max(d.x, max(d.y, d.z)), 0.0) + length(max(d, 0.0));
}
float sdEllipsoid(vec3 p, vec3 r) {
float k0 = length(p / r); float k1 = length(p / (r * r));
return k0 * (k0 - 1.0) / k1;
}
float sdTorus(vec3 p, vec2 t) {
return length(vec2(length(p.xz) - t.x, p.y)) - t.y;
}
float sdCapsule(vec3 p, vec3 a, vec3 b, float r) {
vec3 pa = p - a, ba = b - a;
float h = clamp(dot(pa, ba) / dot(ba, ba), 0.0, 1.0);
return length(pa - ba * h) - r;
}
float sdCylinder(vec3 p, vec2 h) {
vec2 d = abs(vec2(length(p.xz), p.y)) - h;
return min(max(d.x, d.y), 0.0) + length(max(d, 0.0));
}
// === Extended SDF Primitives ===
float sdRoundBox(vec3 p, vec3 b, float r) {
vec3 q = abs(p) - b + r;
return length(max(q, 0.0)) + min(max(q.x, max(q.y, q.z)), 0.0) - r;
}
float sdBoxFrame(vec3 p, vec3 b, float e) {
p = abs(p) - b;
vec3 q = abs(p + e) - e;
return min(min(
length(max(vec3(p.x, q.y, q.z), 0.0)) + min(max(p.x, max(q.y, q.z)), 0.0),
length(max(vec3(q.x, p.y, q.z), 0.0)) + min(max(q.x, max(p.y, q.z)), 0.0)),
length(max(vec3(q.x, q.y, p.z), 0.0)) + min(max(q.x, max(q.y, p.z)), 0.0));
}
float sdCone(vec3 p, vec2 c, float h) {
vec2 q = h * vec2(c.x / c.y, -1.0);
vec2 w = vec2(length(p.xz), p.y);
vec2 a = w - q * clamp(dot(w, q) / dot(q, q), 0.0, 1.0);
vec2 b = w - q * vec2(clamp(w.x / q.x, 0.0, 1.0), 1.0);
float k = sign(q.y);
float d = min(dot(a, a), dot(b, b));
float s = max(k * (w.x * q.y - w.y * q.x), k * (w.y - q.y));
return sqrt(d) * sign(s);
}
float sdCappedCone(vec3 p, float h, float r1, float r2) {
vec2 q = vec2(length(p.xz), p.y);
vec2 k1 = vec2(r2, h);
vec2 k2 = vec2(r2 - r1, 2.0 * h);
vec2 ca = vec2(q.x - min(q.x, (q.y < 0.0) ? r1 : r2), abs(q.y) - h);
vec2 cb = q - k1 + k2 * clamp(dot(k1 - q, k2) / dot(k2, k2), 0.0, 1.0);
float s = (cb.x < 0.0 && ca.y < 0.0) ? -1.0 : 1.0;
return s * sqrt(min(dot(ca, ca), dot(cb, cb)));
}
float sdRoundCone(vec3 p, float r1, float r2, float h) {
float b = (r1 - r2) / h;
float a = sqrt(1.0 - b * b);
vec2 q = vec2(length(p.xz), p.y);
float k = dot(q, vec2(-b, a));
if (k < 0.0) return length(q) - r1;
if (k > a * h) return length(q - vec2(0.0, h)) - r2;
return dot(q, vec2(a, b)) - r1;
}
float sdSolidAngle(vec3 p, vec2 c, float ra) {
vec2 q = vec2(length(p.xz), p.y);
float l = length(q) - ra;
float m = length(q - c * clamp(dot(q, c), 0.0, ra));
return max(l, m * sign(c.y * q.x - c.x * q.y));
}
float sdOctahedron(vec3 p, float s) {
p = abs(p);
float m = p.x + p.y + p.z - s;
vec3 q;
if (3.0 * p.x < m) q = p.xyz;
else if (3.0 * p.y < m) q = p.yzx;
else if (3.0 * p.z < m) q = p.zxy;
else return m * 0.57735027;
float k = clamp(0.5 * (q.z - q.y + s), 0.0, s);
return length(vec3(q.x, q.y - s + k, q.z - k));
}
float sdPyramid(vec3 p, float h) {
float m2 = h * h + 0.25;
p.xz = abs(p.xz);
p.xz = (p.z > p.x) ? p.zx : p.xz;
p.xz -= 0.5;
vec3 q = vec3(p.z, h * p.y - 0.5 * p.x, h * p.x + 0.5 * p.y);
float s = max(-q.x, 0.0);
float t = clamp((q.y - 0.5 * p.z) / (m2 + 0.25), 0.0, 1.0);
float a = m2 * (q.x + s) * (q.x + s) + q.y * q.y;
float b = m2 * (q.x + 0.5 * t) * (q.x + 0.5 * t) + (q.y - m2 * t) * (q.y - m2 * t);
float d2 = min(q.y, -q.x * m2 - q.y * 0.5) > 0.0 ? 0.0 : min(a, b);
return sqrt((d2 + q.z * q.z) / m2) * sign(max(q.z, -p.y));
}
float sdHexPrism(vec3 p, vec2 h) {
const vec3 k = vec3(-0.8660254, 0.5, 0.57735);
p = abs(p);
p.xy -= 2.0 * min(dot(k.xy, p.xy), 0.0) * k.xy;
vec2 d = vec2(length(p.xy - vec2(clamp(p.x, -k.z * h.x, k.z * h.x), h.x)) * sign(p.y - h.x), p.z - h.y);
return min(max(d.x, d.y), 0.0) + length(max(d, 0.0));
}
float sdCutSphere(vec3 p, float r, float h) {
float w = sqrt(r * r - h * h);
vec2 q = vec2(length(p.xz), p.y);
float s = max((h - r) * q.x * q.x + w * w * (h + r - 2.0 * q.y), h * q.x - w * q.y);
return (s < 0.0) ? length(q) - r : (q.x < w) ? h - q.y : length(q - vec2(w, h));
}
float sdCappedTorus(vec3 p, vec2 sc, float ra, float rb) {
p.x = abs(p.x);
float k = (sc.y * p.x > sc.x * p.y) ? dot(p.xy, sc) : length(p.xy);
return sqrt(dot(p, p) + ra * ra - 2.0 * ra * k) - rb;
}
float sdLink(vec3 p, float le, float r1, float r2) {
vec3 q = vec3(p.x, max(abs(p.y) - le, 0.0), p.z);
return length(vec2(length(q.xy) - r1, q.z)) - r2;
}
float sdPlane(vec3 p, vec3 n, float h) {
return dot(p, n) + h;
}
float sdRhombus(vec3 p, float la, float lb, float h, float ra) {
    p = abs(p);
    vec2 b = vec2(la, lb);
    // ndot(a, b) = a.x*b.x - a.y*b.y (plain dot() is wrong here), inlined below:
    float f = clamp((b.x * (b.x - 2.0 * p.x) - b.y * (b.y - 2.0 * p.z)) / dot(b, b), -1.0, 1.0);
    vec2 q = vec2(length(p.xz - 0.5 * b * vec2(1.0 - f, 1.0 + f)) * sign(p.x * b.y + p.z * b.x - b.x * b.y) - ra, p.y - h);
    return min(max(q.x, q.y), 0.0) + length(max(q, 0.0));
}
// Unsigned distance (exact)
float udTriangle(vec3 p, vec3 a, vec3 b, vec3 c) {
vec3 ba = b - a; vec3 pa = p - a;
vec3 cb = c - b; vec3 pb = p - b;
vec3 ac = a - c; vec3 pc = p - c;
vec3 nor = cross(ba, ac);
return sqrt(
(sign(dot(cross(ba, nor), pa)) +
sign(dot(cross(cb, nor), pb)) +
sign(dot(cross(ac, nor), pc)) < 2.0)
? min(min(
dot(ba * clamp(dot(ba, pa) / dot(ba, ba), 0.0, 1.0) - pa,
ba * clamp(dot(ba, pa) / dot(ba, ba), 0.0, 1.0) - pa),
dot(cb * clamp(dot(cb, pb) / dot(cb, cb), 0.0, 1.0) - pb,
cb * clamp(dot(cb, pb) / dot(cb, cb), 0.0, 1.0) - pb)),
dot(ac * clamp(dot(ac, pc) / dot(ac, ac), 0.0, 1.0) - pc,
ac * clamp(dot(ac, pc) / dot(ac, ac), 0.0, 1.0) - pc))
: dot(nor, pa) * dot(nor, pa) / dot(nor, nor));
}
// === Boolean Operations ===
vec2 opU(vec2 d1, vec2 d2) { return (d1.x < d2.x) ? d1 : d2; }
float smin(float a, float b, float k) {
float h = max(k - abs(a - b), 0.0);
return min(a, b) - h * h * 0.25 / k;
}
vec2 smin(vec2 a, vec2 b, float k) {
// vec2 smin: x=distance (smooth blend), y=materialID (take material of closer distance)
float h = max(k - abs(a.x - b.x), 0.0);
float d = min(a.x, b.x) - h * h * 0.25 / k;
float m = (a.x < b.x) ? a.y : b.y;
return vec2(d, m);
}
float smax(float a, float b, float k) {
float h = max(k - abs(a - b), 0.0);
return max(a, b) + h * h * 0.25 / k;
}
// === Deformation Operators ===
// Round: soften edges of any SDF
// Usage: sdRound(sdBox(p, vec3(1.0)), 0.1)
float opRound(float d, float r) { return d - r; }
// Onion: hollow out any SDF into a shell
// Usage: opOnion(sdSphere(p, 1.0), 0.1) — sphere shell of thickness 0.1
float opOnion(float d, float t) { return abs(d) - t; }
// Elongate: stretch a shape along its axes (exact version)
// Pattern: q = abs(p) - h, evaluate the base SDF at max(q, 0.0),
// then add the interior correction term. A box is used as the example base.
float opElongate(vec3 p, vec3 h, vec3 size) {
    vec3 q = abs(p) - h;
    return sdBox(max(q, 0.0), size) + min(max(q.x, max(q.y, q.z)), 0.0);
}
// Twist: rotate around Y axis based on height
vec3 opTwist(vec3 p, float k) {
float c = cos(k * p.y);
float s = sin(k * p.y);
mat2 m = mat2(c, -s, s, c);
return vec3(m * p.xz, p.y);
}
// Cheap Bend: bend along X axis based on X position
vec3 opCheapBend(vec3 p, float k) {
float c = cos(k * p.x);
float s = sin(k * p.x);
mat2 m = mat2(c, -s, s, c);
vec2 q = m * p.xy;
return vec3(q, p.z);
}
// Displacement: add procedural detail to surface
float opDisplace(float d, vec3 p) {
float displacement = sin(20.0 * p.x) * sin(20.0 * p.y) * sin(20.0 * p.z);
return d + displacement * 0.02;
}
// === 2D-to-3D Constructors ===
// Revolution: rotate a 2D SDF around the Y axis to create a 3D solid of revolution
// o = offset of the 2D profile from the axis; evaluate any 2D SDF on q
float opRevolution(vec3 p, float o) {
    vec2 q = vec2(length(p.xz) - o, p.y);
    // Example: a 2D circle cross-section revolves into a torus
    return length(q) - 0.3;  // substitute any 2D SDF of q here
}
// Extrusion: extend a 2D SDF along the Z axis with finite height
float opExtrusion(vec3 p, float d2d, float h) {
vec2 w = vec2(d2d, abs(p.z) - h);
return min(max(w.x, w.y), 0.0) + length(max(w, 0.0));
}
// Usage example: extruded 2D star
// float d2d = sdStar2D(p.xy, 0.5, 5, 2.0); // any 2D SDF
// float d3d = opExtrusion(p, d2d, 0.2); // extrude 0.2 units
// === Symmetry Operators ===
// Mirror across X axis (most common — bilateral symmetry)
// Place this at the beginning of map() to model only one half
vec3 opSymX(vec3 p) { p.x = abs(p.x); return p; }
// Mirror across X and Z (four-fold symmetry)
vec3 opSymXZ(vec3 p) { p.xz = abs(p.xz); return p; }
// Mirror across arbitrary direction
vec3 opMirror(vec3 p, vec3 dir) {
return p - 2.0 * dir * max(dot(p, dir), 0.0);
}
// === Scene ===
vec2 map(vec3 pos) {
vec2 res = vec2(pos.y, 0.0);
// Animated blob cluster
float dBlob = 2.0;
for (int i = 0; i < 8; i++) {
float fi = float(i);
float t = iTime * (fract(fi * 412.531 + 0.513) - 0.5) * 2.0;
vec3 offset = sin(t + fi * vec3(52.5126, 64.627, 632.25)) * vec3(2.0, 2.0, 0.8);
float radius = mix(0.3, 0.6, fract(fi * 412.531 + 0.5124));
dBlob = smin(dBlob, sdSphere(pos + offset, radius), SMOOTH_K);
}
res = opU(res, vec2(dBlob, 1.0));
float dBox = sdBox(pos - vec3(3.0, 0.4, 0.0), vec3(0.3, 0.4, 0.3));
res = opU(res, vec2(dBox, 2.0));
float dTorus = sdTorus((pos - vec3(-3.0, 0.5, 0.0)).xzy, vec2(0.4, 0.1));
res = opU(res, vec2(dTorus, 3.0));
// CSG subtraction: sphere minus box
float dCSG = sdSphere(pos - vec3(0.0, 0.5, 3.0), 0.5);
dCSG = max(dCSG, -sdBox(pos - vec3(0.0, 0.5, 3.0), vec3(0.3)));
res = opU(res, vec2(dCSG, 4.0));
return res;
}
// === Normals ===
vec3 calcNormal(vec3 pos) {
vec3 n = vec3(0.0);
for (int i = ZERO; i < 4; i++) {
vec3 e = 0.5773 * (2.0 * vec3((((i+3)>>1)&1), ((i>>1)&1), (i&1)) - 1.0);
n += e * map(pos + 0.0005 * e).x;
}
return normalize(n);
}
// === Shadows ===
float calcSoftshadow(vec3 ro, vec3 rd, float mint, float tmax) {
float res = 1.0, t = mint;
for (int i = ZERO; i < SHADOW_STEPS; i++) {
float h = map(ro + rd * t).x;
float s = clamp(SHADOW_SOFTNESS * h / t, 0.0, 1.0);
res = min(res, s);
t += clamp(h, 0.01, 0.2);
if (res < 0.004 || t > tmax) break;
}
res = clamp(res, 0.0, 1.0);
return res * res * (3.0 - 2.0 * res);
}
// === AO ===
float calcAO(vec3 pos, vec3 nor) {
float occ = 0.0, sca = 1.0;
for (int i = ZERO; i < 5; i++) {
float h = 0.01 + 0.12 * float(i) / 4.0;
float d = map(pos + h * nor).x;
occ += (h - d) * sca;
sca *= 0.95;
if (occ > 0.35) break;
}
return clamp(1.0 - 3.0 * occ, 0.0, 1.0) * (0.5 + 0.5 * nor.y);
}
// === Ray Marching ===
vec2 raycast(vec3 ro, vec3 rd) {
vec2 res = vec2(-1.0);
float t = 0.01;
for (int i = 0; i < MAX_STEPS && t < MAX_DIST; i++) {
vec2 h = map(ro + rd * t);
if (abs(h.x) < SURF_DIST * t) { res = vec2(t, h.y); break; }
t += h.x;
}
return res;
}
// === Camera ===
mat3 setCamera(vec3 ro, vec3 ta, float cr) {
vec3 cw = normalize(ta - ro);
vec3 cp = vec3(sin(cr), cos(cr), 0.0);
vec3 cu = normalize(cross(cw, cp));
vec3 cv = cross(cu, cw);
return mat3(cu, cv, cw);
}
// === Rendering ===
vec3 render(vec3 ro, vec3 rd) {
vec3 col = vec3(0.7, 0.7, 0.9) - max(rd.y, 0.0) * 0.3;
vec2 res = raycast(ro, rd);
float t = res.x, m = res.y;
if (m > -0.5) {
vec3 pos = ro + t * rd;
vec3 nor = (m < 0.5) ? vec3(0.0, 1.0, 0.0) : calcNormal(pos);
vec3 ref = reflect(rd, nor);
vec3 mate = 0.2 + 0.2 * sin(m * 2.0 + vec3(0.0, 1.0, 2.0));
if (m < 0.5) mate = vec3(0.15);
float occ = calcAO(pos, nor);
vec3 lin = vec3(0.0);
// Key light
{
vec3 lig = normalize(vec3(-0.5, 0.4, -0.6));
vec3 hal = normalize(lig - rd);
float dif = clamp(dot(nor, lig), 0.0, 1.0);
dif *= calcSoftshadow(pos, lig, 0.02, 2.5);
float spe = pow(clamp(dot(nor, hal), 0.0, 1.0), 16.0);
spe *= dif * (0.04 + 0.96 * pow(clamp(1.0 - dot(hal, lig), 0.0, 1.0), 5.0));
lin += mate * 2.20 * dif * vec3(1.30, 1.00, 0.70);
lin += 5.00 * spe * vec3(1.30, 1.00, 0.70);
}
// Sky light
{
float dif = sqrt(clamp(0.5 + 0.5 * nor.y, 0.0, 1.0)) * occ;
lin += mate * 0.60 * dif * vec3(0.40, 0.60, 1.15);
}
// Subsurface scattering approximation
{
float dif = pow(clamp(1.0 + dot(nor, rd), 0.0, 1.0), 2.0) * occ;
lin += mate * 0.25 * dif;
}
col = lin;
col = mix(col, vec3(0.7, 0.7, 0.9), 1.0 - exp(-0.0001 * t * t * t));
}
return clamp(col, 0.0, 1.0);
}
// === Main Function ===
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
vec2 mo = iMouse.xy / iResolution.xy;
float time = 32.0 + iTime * 1.5;
vec3 ta = vec3(0.0, 0.0, 0.0);
vec3 ro = ta + vec3(4.5 * cos(0.1 * time + 7.0 * mo.x), 2.2,
4.5 * sin(0.1 * time + 7.0 * mo.x));
mat3 ca = setCamera(ro, ta, 0.0);
vec3 tot = vec3(0.0);
#if AA > 1
for (int m = ZERO; m < AA; m++)
for (int n = ZERO; n < AA; n++) {
vec2 o = vec2(float(m), float(n)) / float(AA) - 0.5;
vec2 p = (2.0 * (fragCoord + o) - iResolution.xy) / iResolution.y;
#else
vec2 p = (2.0 * fragCoord - iResolution.xy) / iResolution.y;
#endif
vec3 rd = ca * normalize(vec3(p, 2.5));
vec3 col = render(ro, rd);
col = pow(col, vec3(0.4545));
tot += col;
#if AA > 1
}
tot /= float(AA * AA);
#endif
fragColor = vec4(tot, 1.0);
}
```
## Common Variants
### Variant 1: Dynamic Organic Body (Smooth Blob Animation)
```glsl
vec2 map(vec3 p) {
float d = 2.0;
for (int i = 0; i < 16; i++) {
float fi = float(i);
float t = iTime * (fract(fi * 412.531 + 0.513) - 0.5) * 2.0;
d = smin(sdSphere(p + sin(t + fi * vec3(52.5126, 64.627, 632.25)) * vec3(2.0, 2.0, 0.8),
mix(0.5, 1.0, fract(fi * 412.531 + 0.5124))), d, 0.4);
}
return vec2(d, 1.0);
}
```
### Variant 2: Infinite Repeating Corridor (Domain Repetition)
```glsl
float repeat(float v, float c) { return mod(v, c) - c * 0.5; }
float amod(inout vec2 p, float count) {
float an = 6.283185 / count;
float a = atan(p.y, p.x) + an * 0.5;
float c = floor(a / an);
a = mod(a, an) - an * 0.5;
p = vec2(cos(a), sin(a)) * length(p);
return c;
}
vec2 map(vec3 p) {
p.z = repeat(p.z, 4.0);
p.x += 2.0 * sin(p.z * 0.1);
float d = -sdBox(p, vec3(2.0, 2.0, 20.0));
d = max(d, -sdBox(p, vec3(1.8, 1.8, 1.9)));
d = min(d, sdCylinder(p - vec3(1.5, -2.0, 0.0), vec2(0.1, 2.0)));
return vec2(d, 1.0);
}
```
### Variant 3: Character/Creature Modeling
```glsl
vec2 sdStick(vec3 p, vec3 a, vec3 b, float r1, float r2) {
vec3 pa = p - a, ba = b - a;
float h = clamp(dot(pa, ba) / dot(ba, ba), 0.0, 1.0);
return vec2(length(pa - ba * h) - mix(r1, r2, h * h * (3.0 - 2.0 * h)), h);
}
vec2 map(vec3 pos) {
float d = sdEllipsoid(pos, vec3(0.25, 0.3, 0.25)); // body
d = smin(d, sdEllipsoid(pos - vec3(0.0, 0.35, 0.02),
vec3(0.12, 0.15, 0.13)), 0.1); // head
    vec2 arm = sdStick(vec3(abs(pos.x), pos.yz),   // mirror so both arms share one SDF
                       vec3(0.18, 0.2, -0.05), vec3(0.35, -0.1, -0.15), 0.03, 0.05);
d = smin(d, arm.x, 0.04); // arms
d = smax(d, -sdEllipsoid(pos - vec3(0.0, 0.3, 0.15),
vec3(0.08, 0.03, 0.1)), 0.03); // mouth carving
return vec2(d, 1.0);
}
```
### Variant 4: Symmetry Optimization
```glsl
vec2 rot45(vec2 v) { return vec2(v.x - v.y, v.y + v.x) * 0.707107; }
vec2 map(vec3 p) {
float d = sdSphere(p, 0.12);
    // Octahedral symmetry: folds 18 gear evaluations down to 4 (gear() is a user-supplied SDF)
vec3 qx = vec3(rot45(p.zy), p.x);
if (abs(qx.x) > abs(qx.y)) qx = qx.zxy;
vec3 qy = vec3(rot45(p.xz), p.y);
if (abs(qy.x) > abs(qy.y)) qy = qy.zxy;
vec3 qz = vec3(rot45(p.yx), p.z);
if (abs(qz.x) > abs(qz.y)) qz = qz.zxy;
vec3 qa = abs(p);
qa = (qa.x > qa.y && qa.x > qa.z) ? p.zxy : (qa.z > qa.y) ? p.yzx : p.xyz;
d = min(d, min(min(gear(qa, 0.0), gear(qx, 1.0)), min(gear(qy, 1.0), gear(qz, 1.0))));
return vec2(d, 1.0);
}
```
### Variant 5: PBR Material Rendering
```glsl
float D_GGX(float NoH, float roughness) {
float a = roughness * roughness; float a2 = a * a;
float d = NoH * NoH * (a2 - 1.0) + 1.0;
return a2 / (3.14159 * d * d);
}
vec3 F_Schlick(float VoH, vec3 f0) {
return f0 + (1.0 - f0) * pow(1.0 - VoH, 5.0);
}
vec3 pbrLighting(vec3 pos, vec3 nor, vec3 rd, vec3 albedo, float roughness, float metallic) {
vec3 lig = normalize(vec3(-0.5, 0.4, -0.6));
vec3 hal = normalize(lig - rd);
vec3 f0 = mix(vec3(0.04), albedo, metallic);
float NoL = max(dot(nor, lig), 0.0);
float NoH = max(dot(nor, hal), 0.0);
float VoH = max(dot(-rd, hal), 0.0);
vec3 spec = D_GGX(NoH, roughness) * F_Schlick(VoH, f0) * 0.25;
vec3 diff = albedo * (1.0 - metallic) / 3.14159;
float shadow = calcSoftshadow(pos, lig, 0.02, 2.5);
return (diff + spec) * NoL * shadow * vec3(1.3, 1.0, 0.7) * 3.0;
}
```
## Performance & Composition
### Performance Optimization Tips
- **Bounding volume acceleration**: test ray against AABB first to narrow `tmin/tmax`, avoiding wasted steps in empty regions
- **Sub-scene bounding**: in `map()`, use a cheap `sdBox` to check proximity before computing the precise SDF
- **Adaptive step size**: `abs(h.x) < SURF_DIST * t` -- looser tolerance at distance, stricter up close
- **Prevent compiler inlining**: `#define ZERO (min(iFrame, 0))` + loop prevents `calcNormal` from inlining map 4 times
- **Exploit symmetry**: fold into the fundamental domain, reducing 18 evaluations to 4
### Common Composition Techniques
- **Noise displacement**: `d += 0.05 * sin(p.x*10.)*sin(p.y*10.)*sin(p.z*10.)` adds organic detail; breaks the Lipschitz condition, so step size should be multiplied by 0.5~0.7
- **Bump mapping**: perturb only during normal calculation, leaving ray marching unaffected for better performance
- **Domain transforms**: warp coordinates before entering map (bending, polar coordinate transforms, etc.)
- **Procedural animation**: bone angles driven by time to position primitives, `smin` ensures smooth joints
- **Motion blur**: multi-frame temporal sampling averaged
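The bump-mapping technique above, perturbing the field only during normal estimation while marching the clean `map()`, can be sketched as follows (assumes the template's `vec2`-returning `map`; the displacement pattern and scale are arbitrary choices):

```glsl
// Bumped field: only queried by the normal, never by the ray marcher
float mapBump(vec3 p) {
    return map(p).x + 0.01 * sin(40.0 * p.x) * sin(40.0 * p.y) * sin(40.0 * p.z);
}
// Central-difference normal on the bumped field
vec3 calcBumpNormal(vec3 pos) {
    vec2 e = vec2(0.0005, 0.0);
    return normalize(vec3(mapBump(pos + e.xyy) - mapBump(pos - e.xyy),
                          mapBump(pos + e.yxy) - mapBump(pos - e.yxy),
                          mapBump(pos + e.yyx) - mapBump(pos - e.yyx)));
}
```

Because ray marching never sees the perturbation, the Lipschitz condition is preserved and no step-size reduction is needed.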
## Further Reading
Full step-by-step tutorials, mathematical derivations, and advanced usage in [reference](../reference/sdf-3d.md)

# SDF Advanced Tricks & Optimization
## Use Cases
- Optimizing complex SDF scenes for real-time performance
- Adding fine detail to SDF surfaces without increasing geometric complexity
- Creating special effects with SDF manipulation (hollowing, layered edges, interior structures)
- Debugging and visualizing SDF fields
## Core Techniques
### Hollowing (Shell Creation)
Convert any solid SDF into a thin shell:
```glsl
float hollowed = abs(sdf) - thickness;
// Example: hollow sphere with 0.02 wall thickness
float d = abs(sdSphere(p, 1.0)) - 0.02;
```
### Layered Edges (Concentric Contour Lines)
Create equidistant contour rings from any SDF:
```glsl
float spacing = 0.2;
float thickness = 0.02;
float layered = abs(mod(d + spacing * 0.5, spacing) - spacing * 0.5) - thickness;
```
Useful for: topographic map effects, neon outlines, energy shields, wireframe-like rendering.
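As a concrete sketch, the contour formula applied to a circle renders directly in ShaderToy (colors and parameters are arbitrary):

```glsl
// Concentric contour rings around a circle SDF
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
    vec2 p = (2.0 * fragCoord - iResolution.xy) / iResolution.y;
    float d = length(p) - 0.4;                    // any 2D SDF works here
    float spacing = 0.2, thickness = 0.02;
    float layered = abs(mod(d + spacing * 0.5, spacing) - spacing * 0.5) - thickness;
    vec3 col = mix(vec3(0.1, 0.9, 0.6), vec3(0.05), smoothstep(-0.003, 0.003, layered));
    fragColor = vec4(col, 1.0);
}
```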
### FBM Detail on SDF (Distance-Based LOD)
Add procedural noise detail only where it's visible — near the camera:
```glsl
float map(vec3 p) {
float d = sdBasicShape(p);
// Only add expensive FBM detail when close to surface
if (d < 1.0) {
d += 0.02 * fbm(p * 8.0) * smoothstep(1.0, 0.0, d);
}
return d;
}
```
**Critical**: The `smoothstep` fade prevents the FBM from disrupting the SDF's Lipschitz continuity far from the surface, which would cause ray marching to overshoot.
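`fbm` is referenced above but not defined in this file; a common hash-based sketch (octave count and hash constants are arbitrary choices, not from the source):

```glsl
// 3D value-noise FBM: sum of octaves with halving amplitude
float hash31(vec3 p) { return fract(sin(dot(p, vec3(127.1, 311.7, 74.7))) * 43758.5453); }
float vnoise(vec3 p) {
    vec3 i = floor(p), f = fract(p);
    vec3 u = f * f * (3.0 - 2.0 * f);  // smooth interpolation weights
    return mix(mix(mix(hash31(i), hash31(i + vec3(1,0,0)), u.x),
                   mix(hash31(i + vec3(0,1,0)), hash31(i + vec3(1,1,0)), u.x), u.y),
               mix(mix(hash31(i + vec3(0,0,1)), hash31(i + vec3(1,0,1)), u.x),
                   mix(hash31(i + vec3(0,1,1)), hash31(i + vec3(1,1,1)), u.x), u.y), u.z);
}
float fbm(vec3 p) {
    float v = 0.0, a = 0.5;
    for (int i = 0; i < 4; i++) { v += a * vnoise(p); p *= 2.0; a *= 0.5; }
    return v;
}
```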
### SDF Bounding Volumes (Performance Optimization)
Skip expensive SDF evaluation when the point is far from the object:
```glsl
float map(vec3 p) {
// Cheap bounding sphere test first
float bound = sdSphere(p - objectCenter, boundingRadius);
if (bound > 0.1) return bound; // far away — return bounding distance
// Expensive detailed SDF only when close
return complexSDF(p);
}
```
For scenes with multiple distant objects, this can provide 5-10x speedup.
### Binary Search Refinement
After ray marching finds an approximate hit, refine with binary search for sub-pixel precision:
```glsl
// After the ray march loop finds t where map(ro + rd * t) < epsilon,
// bisect across the surface. lastStep is the size of the final march step.
float dt = lastStep * 0.5;
for (int i = 0; i < 6; i++) {
    t += (map(ro + rd * t) > 0.0) ? dt : -dt;  // step toward the zero crossing
    dt *= 0.5;
}
```
Especially useful for: sharp edge rendering, precise shadow termination, accurate reflection points.
### XOR Boolean Operation
Create interesting geometric patterns by combining SDFs with XOR:
```glsl
float opXor(float d1, float d2) {
return max(min(d1, d2), -max(d1, d2));
}
// Creates a "difference of unions" — geometry exists where exactly one shape is present
```
### Interior SDF Structures
Use the sign of the SDF to create interior geometry:
```glsl
float interiorPattern(vec3 p) {
float outer = sdSphere(p, 1.0);
float inner = sdBox(fract(p * 4.0) - 0.5, vec3(0.1)); // repeating inner pattern
return (outer < 0.0) ? max(outer, inner) : outer; // inner visible only inside
}
```
## SDF Debugging Visualization
```glsl
// Visualize SDF distance as color bands
vec3 debugSDF(float d) {
vec3 col = (d > 0.0) ? vec3(0.9, 0.6, 0.3) : vec3(0.4, 0.7, 0.85); // outside/inside
col *= 1.0 - exp(-6.0 * abs(d)); // darken near surface
col *= 0.8 + 0.2 * cos(150.0 * d); // distance bands
col = mix(col, vec3(1.0), 1.0 - smoothstep(0.0, 0.01, abs(d))); // white at surface
return col;
}
```
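A one-line driver makes the visualization usable directly in ShaderToy (the circle is a stand-in for any SDF under inspection):

```glsl
// Visualize the field of a circle SDF with debugSDF from above
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
    vec2 p = (2.0 * fragCoord - iResolution.xy) / iResolution.y;
    float d = length(p) - 0.5;  // substitute any 2D SDF
    fragColor = vec4(debugSDF(d), 1.0);
}
```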
→ For deeper details, see [reference/sdf-tricks.md](../reference/sdf-tricks.md)

# SDF Soft Shadow Techniques
## Core Principles
March from the surface point toward the light source, using the **ratio of nearest distance to marching distance** to estimate penumbra width.
### Key Formulas
Classic formula: `shadow = min(shadow, k * h / t)`
- `h` = SDF value at current position, `t` = distance traveled, `k` = penumbra hardness
Improved formula (geometric triangulation) — eliminates sharp edge banding artifacts:
```
y = h² / (2 * ph) // ph = SDF value from previous step
d = sqrt(h² - y²) // true closest distance perpendicular to the ray
shadow = min(shadow, d / (w * max(0, t - y)))
```
Negative extension — allows `res` to drop to -1, remapped with a C1 continuous function to eliminate hard creases:
```
res = max(res, -1.0)
shadow = 0.25 * (1 + res)² * (2 - res)
```
This is equivalent to `smoothstep` over [-1, 1] instead of [0, 1]. The step size is clamped with `clamp(h, 0.005, 0.50)` to ensure the ray penetrates slightly into geometry, capturing both outer and inner penumbra. This produces results close to ground truth for varying light sizes.
## Implementation Steps
### Step 1: Scene SDF
```glsl
float sdSphere(vec3 p, float r) { return length(p) - r; }
float sdPlane(vec3 p) { return p.y; }
float sdRoundBox(vec3 p, vec3 b, float r) {
vec3 q = abs(p) - b;
return length(max(q, 0.0)) + min(max(q.x, max(q.y, q.z)), 0.0) - r;
}
float map(vec3 p) {
float d = sdPlane(p);
d = min(d, sdSphere(p - vec3(0.0, 0.5, 0.0), 0.5));
d = min(d, sdRoundBox(p - vec3(-1.2, 0.3, 0.5), vec3(0.3), 0.05));
return d;
}
```
### Step 2: Classic Soft Shadow
```glsl
// Classic SDF soft shadow
float calcSoftShadow(vec3 ro, vec3 rd, float mint, float tmax) {
float res = 1.0;
float t = mint;
for (int i = 0; i < MAX_SHADOW_STEPS; i++) {
float h = map(ro + rd * t);
float s = clamp(SHADOW_K * h / t, 0.0, 1.0);
res = min(res, s);
t += clamp(h, MIN_STEP, MAX_STEP);
if (res < 0.004 || t > tmax) break;
}
res = clamp(res, 0.0, 1.0);
return res * res * (3.0 - 2.0 * res); // smoothstep smoothing
}
```
### Step 3: Improved Soft Shadow (Geometric Triangulation)
```glsl
// Improved version - geometric triangulation using adjacent SDF values
float calcSoftShadowImproved(vec3 ro, vec3 rd, float mint, float tmax, float w) {
float res = 1.0;
float t = mint;
float ph = 1e10;
for (int i = 0; i < MAX_SHADOW_STEPS; i++) {
float h = map(ro + rd * t);
float y = h * h / (2.0 * ph);
float d = sqrt(h * h - y * y);
res = min(res, d / (w * max(0.0, t - y)));
ph = h;
t += h;
if (res < 0.0001 || t > tmax) break;
}
res = clamp(res, 0.0, 1.0);
return res * res * (3.0 - 2.0 * res);
}
```
### Step 4: Negative Extension (Smoothest Penumbra)
```glsl
// Negative extension - allows res to go negative for C1 continuous penumbra
float calcSoftShadowSmooth(vec3 ro, vec3 rd, float mint, float tmax, float w) {
float res = 1.0;
float t = mint;
for (int i = 0; i < MAX_SHADOW_STEPS; i++) {
float h = map(ro + rd * t);
res = min(res, h / (w * t));
t += clamp(h, MIN_STEP, MAX_STEP);
if (res < -1.0 || t > tmax) break;
}
res = max(res, -1.0);
return 0.25 * (1.0 + res) * (1.0 + res) * (2.0 - res);
}
```
### Step 5: Bounding Volume Optimization
```glsl
// plane clipping -- clip the ray to the scene's upper bound
float tp = (SCENE_Y_MAX - ro.y) / rd.y;
if (tp > 0.0) tmax = min(tmax, tp);
// AABB bounding box clipping
vec2 iBox(vec3 ro, vec3 rd, vec3 rad) {
vec3 m = 1.0 / rd;
vec3 n = m * ro;
vec3 k = abs(m) * rad;
vec3 t1 = -n - k;
vec3 t2 = -n + k;
float tN = max(max(t1.x, t1.y), t1.z);
float tF = min(min(t2.x, t2.y), t2.z);
if (tN > tF || tF < 0.0) return vec2(-1.0);
return vec2(tN, tF);
}
// usage: return 1.0 immediately if the ray misses the bounding box entirely
vec2 dis = iBox(ro, rd, BOUND_SIZE);
if (dis.y < 0.0) return 1.0;
tmin = max(tmin, dis.x);
tmax = min(tmax, dis.y);
```
### Step 6: Shadow Color Rendering
```glsl
// Classic colored shadow
vec3 shadowColor = vec3(sha, sha * sha * 0.5 + 0.5 * sha, sha * sha);
// Alternative: per-channel power (penumbra region shifts warm)
// vec3 shadowColor = pow(vec3(sha), vec3(1.0, 1.2, 1.5));
```
### Step 7: Integration with Lighting Model
```glsl
vec3 sunDir = normalize(vec3(-0.5, 0.4, -0.6));
vec3 hal = normalize(sunDir - rd);
float dif = clamp(dot(nor, sunDir), 0.0, 1.0);
if (dif > 0.0001)
dif *= calcSoftShadow(pos + nor * 0.01, sunDir, 0.02, 8.0);
float spe = pow(clamp(dot(nor, hal), 0.0, 1.0), 16.0);
spe *= dif;
vec3 col = vec3(0.0);
col += albedo * 2.0 * dif * vec3(1.0, 0.9, 0.8);
col += 5.0 * spe * vec3(1.0, 0.9, 0.8);
col += albedo * 0.5 * clamp(0.5 + 0.5 * nor.y, 0.0, 1.0) * vec3(0.4, 0.6, 1.0);
```
## Complete Code Template
This template runs directly in ShaderToy and lets you A/B-compare the three soft-shadow techniques by switching `SHADOW_TECHNIQUE`.
```glsl
#define ZERO (min(iFrame, 0))
// ---- Adjustable Parameters ----
#define MAX_MARCH_STEPS 128
#define MAX_SHADOW_STEPS 64 // 16~128
#define SHADOW_K 8.0 // 4~64, higher = harder
#define SHADOW_MINT 0.02 // 0.01~0.05
#define SHADOW_TMAX 8.0
#define SHADOW_MIN_STEP 0.01
#define SHADOW_MAX_STEP 0.20
#define SHADOW_W 0.10 // improved version penumbra width
// 0=classic, 1=improved(Aaltonen), 2=negative extension
#define SHADOW_TECHNIQUE 0
// ---- SDF Primitives ----
float sdSphere(vec3 p, float r) { return length(p) - r; }
float sdPlane(vec3 p) { return p.y; }
float sdRoundBox(vec3 p, vec3 b, float r) {
vec3 q = abs(p) - b;
return length(max(q, 0.0)) + min(max(q.x, max(q.y, q.z)), 0.0) - r;
}
float sdTorus(vec3 p, vec2 t) {
vec2 q = vec2(length(p.xz) - t.x, p.y);
return length(q) - t.y;
}
// ---- Scene SDF ----
float map(vec3 p) {
float d = sdPlane(p);
d = min(d, sdSphere(p - vec3(0.0, 0.5, 0.0), 0.5));
d = min(d, sdRoundBox(p - vec3(-1.2, 0.30, 0.5), vec3(0.25), 0.05));
d = min(d, sdTorus(p - vec3(1.2, 0.25, -0.3), vec2(0.40, 0.08)));
return d;
}
// ---- Normal ----
vec3 calcNormal(vec3 p) {
vec2 e = vec2(0.0005, 0.0);
return normalize(vec3(
map(p + e.xyy) - map(p - e.xyy),
map(p + e.yxy) - map(p - e.yxy),
map(p + e.yyx) - map(p - e.yyx)));
}
// ---- Raymarching ----
float castRay(vec3 ro, vec3 rd) {
float t = 0.0;
for (int i = ZERO; i < MAX_MARCH_STEPS; i++) {
float h = map(ro + rd * t);
if (h < 0.0002) return t;
t += h;
if (t > 20.0) break;
}
return -1.0;
}
// ---- Bounding Volume Clipping ----
float clipTmax(vec3 ro, vec3 rd, float tmax, float yMax) {
float tp = (yMax - ro.y) / rd.y;
if (tp > 0.0) tmax = min(tmax, tp);
return tmax;
}
// ---- Shadow: Classic ----
float softShadowClassic(vec3 ro, vec3 rd, float mint, float tmax) {
tmax = clipTmax(ro, rd, tmax, 1.5);
float res = 1.0, t = mint;
for (int i = ZERO; i < MAX_SHADOW_STEPS; i++) {
float h = map(ro + rd * t);
float s = clamp(SHADOW_K * h / t, 0.0, 1.0);
res = min(res, s);
t += clamp(h, SHADOW_MIN_STEP, SHADOW_MAX_STEP);
if (res < 0.004 || t > tmax) break;
}
res = clamp(res, 0.0, 1.0);
return res * res * (3.0 - 2.0 * res);
}
// ---- Shadow: Improved ----
float softShadowImproved(vec3 ro, vec3 rd, float mint, float tmax, float w) {
tmax = clipTmax(ro, rd, tmax, 1.5);
float res = 1.0, t = mint, ph = 1e10;
for (int i = ZERO; i < MAX_SHADOW_STEPS; i++) {
float h = map(ro + rd * t);
float y = h * h / (2.0 * ph);
float d = sqrt(h * h - y * y);
res = min(res, d / (w * max(0.0, t - y)));
ph = h;
t += h;
if (res < 0.0001 || t > tmax) break;
}
res = clamp(res, 0.0, 1.0);
return res * res * (3.0 - 2.0 * res);
}
// ---- Shadow: Negative Extension ----
float softShadowSmooth(vec3 ro, vec3 rd, float mint, float tmax, float w) {
tmax = clipTmax(ro, rd, tmax, 1.5);
float res = 1.0, t = mint;
for (int i = ZERO; i < MAX_SHADOW_STEPS; i++) {
float h = map(ro + rd * t);
res = min(res, h / (w * t));
t += clamp(h, SHADOW_MIN_STEP, SHADOW_MAX_STEP);
if (res < -1.0 || t > tmax) break;
}
res = max(res, -1.0);
return 0.25 * (1.0 + res) * (1.0 + res) * (2.0 - res);
}
// ---- Unified Interface ----
float calcSoftShadow(vec3 ro, vec3 rd, float mint, float tmax) {
#if SHADOW_TECHNIQUE == 0
return softShadowClassic(ro, rd, mint, tmax);
#elif SHADOW_TECHNIQUE == 1
return softShadowImproved(ro, rd, mint, tmax, SHADOW_W);
#else
return softShadowSmooth(ro, rd, mint, tmax, SHADOW_W);
#endif
}
// ---- AO ----
float calcAO(vec3 p, vec3 n) {
float occ = 0.0, sca = 1.0;
for (int i = ZERO; i < 5; i++) {
float h = 0.01 + 0.12 * float(i) / 4.0;
float d = map(p + h * n);
occ += (h - d) * sca;
sca *= 0.95;
}
return clamp(1.0 - 3.0 * occ, 0.0, 1.0);
}
// ---- Checkerboard ----
float checkerboard(vec2 p) {
vec2 q = floor(p);
return mix(0.3, 1.0, mod(q.x + q.y, 2.0));
}
// ---- Render ----
vec3 render(vec3 ro, vec3 rd) {
vec3 col = vec3(0.7, 0.75, 0.85) - 0.3 * rd.y;
float t = castRay(ro, rd);
if (t < 0.0) return col;
vec3 pos = ro + rd * t;
vec3 nor = calcNormal(pos);
vec3 albedo = vec3(0.18);
if (pos.y < 0.001)
albedo = vec3(0.08 + 0.15 * checkerboard(pos.xz * 2.0));
vec3 sunDir = normalize(vec3(-0.5, 0.4, -0.6));
vec3 hal = normalize(sunDir - rd);
float dif = clamp(dot(nor, sunDir), 0.0, 1.0);
if (dif > 0.0001)
dif *= calcSoftShadow(pos + nor * 0.001, sunDir, SHADOW_MINT, SHADOW_TMAX);
float spe = pow(clamp(dot(nor, hal), 0.0, 1.0), 16.0);
spe *= dif;
float fre = pow(clamp(1.0 + dot(nor, rd), 0.0, 1.0), 5.0);
spe *= 0.04 + 0.96 * fre;
float sky = clamp(0.5 + 0.5 * nor.y, 0.0, 1.0);
float occ = calcAO(pos, nor);
vec3 lin = vec3(0.0);
lin += 2.5 * dif * vec3(1.30, 1.00, 0.70);
lin += 8.0 * spe * vec3(1.30, 1.00, 0.70);
lin += 0.5 * sky * vec3(0.40, 0.60, 1.00) * occ;
lin += 0.25 * occ * vec3(0.40, 0.50, 0.60);
col = albedo * lin;
col = pow(col, vec3(0.4545));
return col;
}
// ---- Camera ----
mat3 setCamera(vec3 ro, vec3 ta) {
vec3 cw = normalize(ta - ro);
vec3 cu = normalize(cross(cw, vec3(0.0, 1.0, 0.0)));
vec3 cv = cross(cu, cw);
return mat3(cu, cv, cw);
}
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
vec2 p = (2.0 * fragCoord - iResolution.xy) / iResolution.y;
float an = 0.3 * iTime;
vec3 ro = vec3(3.5 * sin(an), 1.8, 3.5 * cos(an));
vec3 ta = vec3(0.0, 0.3, 0.0);
mat3 ca = setCamera(ro, ta);
vec3 rd = ca * normalize(vec3(p, 1.8));
vec3 col = render(ro, rd);
fragColor = vec4(col, 1.0);
}
```
## Standalone HTML + WebGL2 Template
When generating standalone HTML files, use the following complete template. Key points:
- Must use `canvas.getContext('webgl2')`
- Shaders use `#version 300 es`
- Entry function is `void main()`, not `void mainImage()`
- Use `gl_FragCoord.xy` to get pixel coordinates (available in WebGL2)
```html
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>Soft Shadows - SDF Raymarching</title>
<style>
body { margin: 0; overflow: hidden; background: #000; }
canvas { display: block; width: 100vw; height: 100vh; }
</style>
</head>
<body>
<canvas id="canvas"></canvas>
<script>
const canvas = document.getElementById('canvas');
const gl = canvas.getContext('webgl2');
if (!gl) {
document.body.innerHTML = '<p style="color:#fff;">WebGL2 not supported</p>';
throw new Error('WebGL2 not supported');
}
// Vertex shader: fullscreen quad
const vsSource = `#version 300 es
in vec4 aPosition;
void main() {
gl_Position = aPosition;
}
`;
// Fragment shader: SDF soft shadows
const fsSource = `#version 300 es
precision highp float;
uniform float iTime;
uniform vec2 iResolution;
uniform vec4 iMouse;
out vec4 fragColor;
#define ZERO (min(int(iTime), 0))
#define MAX_MARCH_STEPS 128
#define MAX_SHADOW_STEPS 64
#define SHADOW_MINT 0.02
#define SHADOW_TMAX 10.0
#define SHADOW_MIN_STEP 0.01
#define SHADOW_MAX_STEP 0.25
#define SHADOW_W 0.08
#define SHADOW_K 16.0
// SDF primitives
float sdSphere(vec3 p, float r) { return length(p) - r; }
float sdPlane(vec3 p) { return p.y; }
float sdRoundBox(vec3 p, vec3 b, float r) {
vec3 q = abs(p) - b;
return length(max(q, 0.0)) + min(max(q.x, max(q.y, q.z)), 0.0) - r;
}
float sdTorus(vec3 p, vec2 t) {
vec2 q = vec2(length(p.xz) - t.x, p.y);
return length(q) - t.y;
}
// Scene SDF
float map(vec3 p) {
float d = sdPlane(p);
d = min(d, sdSphere(p - vec3(0.0, 0.6, 0.0), 0.6));
d = min(d, sdRoundBox(p - vec3(-1.5, 0.4, 0.8), vec3(0.35), 0.08));
d = min(d, sdTorus(p - vec3(1.6, 0.35, -0.5), vec2(0.45, 0.12)));
return d;
}
// Normal
vec3 calcNormal(vec3 p) {
vec2 e = vec2(0.0005, 0.0);
return normalize(vec3(
map(p + e.xyy) - map(p - e.xyy),
map(p + e.yxy) - map(p - e.yxy),
map(p + e.yyx) - map(p - e.yyx)));
}
// Raymarching
float castRay(vec3 ro, vec3 rd) {
float t = 0.0;
for (int i = ZERO; i < MAX_MARCH_STEPS; i++) {
float h = map(ro + rd * t);
if (h < 0.0002) return t;
t += h;
if (t > 25.0) break;
}
return -1.0;
}
// Plane clipping
float clipTmax(vec3 ro, vec3 rd, float tmax, float yMax) {
float tp = (yMax - ro.y) / rd.y;
if (tp > 0.0) tmax = min(tmax, tp);
return tmax;
}
// Soft shadow (negative extension)
float softShadow(vec3 ro, vec3 rd, float mint, float tmax, float w) {
tmax = clipTmax(ro, rd, tmax, 2.0);
float res = 1.0;
float t = mint;
for (int i = ZERO; i < MAX_SHADOW_STEPS; i++) {
float h = map(ro + rd * t);
res = min(res, h / (w * t));
t += clamp(h, SHADOW_MIN_STEP, SHADOW_MAX_STEP);
if (res < -1.0 || t > tmax) break;
}
res = max(res, -1.0);
return 0.25 * (1.0 + res) * (1.0 + res) * (2.0 - res);
}
// Soft shadow call
float calcSoftShadow(vec3 ro, vec3 rd) {
return softShadow(ro, rd, SHADOW_MINT, SHADOW_TMAX, SHADOW_W);
}
// AO
float calcAO(vec3 p, vec3 n) {
float occ = 0.0, sca = 1.0;
for (int i = ZERO; i < 5; i++) {
float h = 0.01 + 0.12 * float(i) / 4.0;
float d = map(p + h * n);
occ += (h - d) * sca;
sca *= 0.95;
}
return clamp(1.0 - 3.0 * occ, 0.0, 1.0);
}
// Checkerboard
float checkerboard(vec2 p) {
vec2 q = floor(p);
return mix(0.25, 0.35, mod(q.x + q.y, 2.0));
}
// Render
vec3 render(vec3 ro, vec3 rd) {
// sky
vec3 col = vec3(0.65, 0.72, 0.85) - 0.4 * rd.y;
col = mix(col, vec3(0.3, 0.35, 0.45), exp(-0.8 * max(rd.y, 0.0)));
float t = castRay(ro, rd);
if (t < 0.0) return col;
vec3 pos = ro + rd * t;
vec3 nor = calcNormal(pos);
// material color
vec3 albedo = vec3(0.18);
if (pos.y < 0.01) {
albedo = vec3(0.12 + 0.12 * checkerboard(pos.xz * 1.5));
} else if (pos.y > 0.5 && length(pos.xz) < 0.7) {
albedo = vec3(0.85, 0.25, 0.2);
} else if (pos.x < -1.0) {
albedo = vec3(0.2, 0.4, 0.85);
} else if (pos.x > 1.0) {
albedo = vec3(0.25, 0.75, 0.35);
} else {
albedo = vec3(0.9, 0.6, 0.2);
}
// lighting
vec3 sunDir = normalize(vec3(-0.6, 0.45, -0.65));
vec3 hal = normalize(sunDir - rd);
float dif = clamp(dot(nor, sunDir), 0.0, 1.0);
if (dif > 0.0001) {
dif *= calcSoftShadow(pos + nor * 0.01, sunDir);
}
float spe = pow(clamp(dot(nor, hal), 0.0, 1.0), 32.0);
spe *= dif;
float fre = pow(clamp(1.0 + dot(nor, rd), 0.0, 1.0), 5.0);
spe *= 0.04 + 0.96 * fre;
float sky = clamp(0.5 + 0.5 * nor.y, 0.0, 1.0);
float occ = calcAO(pos, nor);
vec3 lin = vec3(0.0);
lin += 2.2 * dif * vec3(1.35, 1.05, 0.75);
lin += 6.0 * spe * vec3(1.35, 1.05, 0.75);
lin += 0.4 * sky * vec3(0.45, 0.6, 0.9) * occ;
lin += 0.25 * occ * vec3(0.5, 0.55, 0.6);
col = albedo * lin;
col = pow(col, vec3(0.4545));
// vignette
vec2 uv = gl_FragCoord.xy / iResolution.xy;
col *= 0.5 + 0.5 * pow(16.0 * uv.x * uv.y * (1.0 - uv.x) * (1.0 - uv.y), 0.2);
return col;
}
// Camera
mat3 setCamera(vec3 ro, vec3 ta) {
vec3 cw = normalize(ta - ro);
vec3 cu = normalize(cross(cw, vec3(0.0, 1.0, 0.0)));
vec3 cv = cross(cu, cw);
return mat3(cu, cv, cw);
}
void main() {
vec2 fragCoord = gl_FragCoord.xy;
vec2 p = (2.0 * fragCoord - iResolution.xy) / iResolution.y;
// slowly rotating camera
float an = 0.15 * iTime;
float dist = 5.5;
vec3 ro = vec3(dist * sin(an), 2.2, dist * cos(an));
vec3 ta = vec3(0.0, 0.3, 0.0);
mat3 ca = setCamera(ro, ta);
vec3 rd = ca * normalize(vec3(p, 2.0));
vec3 col = render(ro, rd);
fragColor = vec4(col, 1.0);
}
`;
// Compile shader
function createShader(gl, type, source) {
const shader = gl.createShader(type);
gl.shaderSource(shader, source);
gl.compileShader(shader);
if (!gl.getShaderParameter(shader, gl.COMPILE_STATUS)) {
console.error('Shader compile error:', gl.getShaderInfoLog(shader));
gl.deleteShader(shader);
return null;
}
return shader;
}
// Create program
function createProgram(gl, vs, fs) {
const program = gl.createProgram();
gl.attachShader(program, vs);
gl.attachShader(program, fs);
gl.linkProgram(program);
if (!gl.getProgramParameter(program, gl.LINK_STATUS)) {
console.error('Program link error:', gl.getProgramInfoLog(program));
return null;
}
return program;
}
const vs = createShader(gl, gl.VERTEX_SHADER, vsSource);
const fs = createShader(gl, gl.FRAGMENT_SHADER, fsSource);
const program = createProgram(gl, vs, fs);
// Fullscreen quad
const positions = new Float32Array([
-1, -1, 1, -1, -1, 1, 1, 1
]);
const posBuffer = gl.createBuffer();
gl.bindBuffer(gl.ARRAY_BUFFER, posBuffer);
gl.bufferData(gl.ARRAY_BUFFER, positions, gl.STATIC_DRAW);
const posLoc = gl.getAttribLocation(program, 'aPosition');
gl.enableVertexAttribArray(posLoc);
gl.vertexAttribPointer(posLoc, 2, gl.FLOAT, false, 0, 0);
// Uniforms
const uTime = gl.getUniformLocation(program, 'iTime');
const uResolution = gl.getUniformLocation(program, 'iResolution');
const uMouse = gl.getUniformLocation(program, 'iMouse');
// Mouse tracking
let mouseX = 0, mouseY = 0;
canvas.addEventListener('mousemove', (e) => {
const scale = canvas.height / window.innerHeight; // clientX/Y are CSS px, canvas is device px
mouseX = e.clientX * scale;
mouseY = canvas.height - e.clientY * scale;
});
// Window resize
function resize() {
const dpr = Math.min(window.devicePixelRatio, 2);
canvas.width = window.innerWidth * dpr;
canvas.height = window.innerHeight * dpr;
gl.viewport(0, 0, canvas.width, canvas.height);
}
window.addEventListener('resize', resize);
resize();
// Render loop
function render(time) {
time *= 0.001;
gl.useProgram(program);
gl.uniform1f(uTime, time);
gl.uniform2f(uResolution, canvas.width, canvas.height);
gl.uniform4f(uMouse, mouseX, mouseY, mouseX, mouseY);
gl.drawArrays(gl.TRIANGLE_STRIP, 0, 4);
requestAnimationFrame(render);
}
requestAnimationFrame(render);
</script>
</body>
</html>
```
## Common Variants
### Analytic Sphere Shadow
```glsl
vec2 sphDistances(vec3 ro, vec3 rd, vec4 sph) {
vec3 oc = ro - sph.xyz;
float b = dot(oc, rd);
float c = dot(oc, oc) - sph.w * sph.w;
float h = b * b - c;
float d = sqrt(max(0.0, sph.w * sph.w - h)) - sph.w;
return vec2(d, -b - sqrt(max(h, 0.0)));
}
float sphSoftShadow(vec3 ro, vec3 rd, vec4 sph, float k) {
vec2 r = sphDistances(ro, rd, sph);
if (r.y > 0.0)
return clamp(k * max(r.x, 0.0) / r.y, 0.0, 1.0);
return 1.0;
}
```
### Terrain Heightfield Shadow
```glsl
float terrainShadow(vec3 ro, vec3 rd, float dis) {
float minStep = clamp(dis * 0.01, 0.5, 50.0);
float res = 1.0, t = 0.01;
for (int i = 0; i < 80; i++) {
vec3 p = ro + t * rd;
float h = p.y - terrainMap(p.xz);
res = min(res, 16.0 * h / t);
t += max(minStep, h);
if (res < 0.001 || p.y > MAX_TERRAIN_HEIGHT) break;
}
return clamp(res, 0.0, 1.0);
}
```
### Per-Material Soft/Hard Blending
```glsl
float hsha = 1.0; // global variable, set per material in map()
float mapWithShadowHardness(vec3 p) {
float d = sdPlane(p); hsha = 1.0;
float dChar = sdCharacter(p);
if (dChar < d) { d = dChar; hsha = 0.0; }
return d;
}
// in shadow loop: res = min(res, mix(1.0, SHADOW_K * h / t, hsha));
```
### Multi-Layer Shadow Compositing
```glsl
float sha_terrain = terrainShadow(pos, sunDir, 0.02);
float sha_trees = treesShadow(pos, sunDir);
float sha_clouds = cloudShadow(pos, sunDir);
float sha = sha_terrain * sha_trees;
sha *= smoothstep(-0.3, -0.1, sha_clouds);
dif *= sha;
```
### Volumetric Light / God Rays
```glsl
float godRays(vec3 ro, vec3 rd, float tmax, vec3 sunDir) {
float v = 0.0, dt = 0.15;
    float t = dt * fract(texelFetch(iChannel0, ivec2(fragCoord) & 255, 0).x); // blue-noise jitter; assumes fragCoord is in scope
for (int i = 0; i < 32; i++) {
if (t > tmax) break;
vec3 p = ro + rd * t;
float sha = calcSoftShadow(p, sunDir, 0.02, 8.0);
v += sha * exp(-0.2 * t);
t += dt;
}
v /= 32.0;
return v * v;
}
// col += intensity * godRays(...) * vec3(1.0, 0.75, 0.4);
```
## Performance & Composition
**Performance optimization:**
- Bounding volume clipping (plane/AABB) can eliminate 30-70% of otherwise wasted iterations
- Step clamping `clamp(h, minStep, maxStep)` prevents stalling / skipping thin objects
- Early exit: `res < 0.004` (classic) or `res < -1.0` (negative extension)
- Simplified `map()` omitting material calculations, returning distance only
- Only compute shadow when `dif > 0.0001`; skip for backlit faces
- Iteration count: simple scenes 16~32, complex FBM 64~128, terrain ~80
- `#define ZERO (min(iFrame,0))` prevents compiler loop unrolling
**Composition tips:**
- AO: shadows control direct light, AO controls indirect light, `col = diffuse * sha + ambient * ao`
- SSS: `sss *= 0.25 + 0.75 * sha` -- SSS weakens but does not vanish in shadow
- Fog: complete lit+shadowed shading first, then `mix(col, fogColor, 1.0 - exp(-0.001*t*t))`
- Normal mapping: perturbed normals for lighting, geometric normals for shadow determination
- Reflection: `refSha = calcSoftShadow(pos + nor*0.01, reflect(rd, nor), 0.02, 8.0)`
## Further Reading
For complete step-by-step tutorials, mathematical derivations, and advanced usage, see [reference](../reference/shadow-techniques.md)

File diff suppressed because it is too large

View File

@@ -0,0 +1,490 @@
**IMPORTANT - GLSL ES 3.00 Critical Rules**:
1. **Type strictness**: `int` and `float` cannot be mixed directly; array indices must be of `int` type
2. **Reserved words**: `sample` is a reserved word in GLSL ES 3.00; it cannot be used as a variable name
3. **Constant arrays**: Must explicitly specify size when declaring, e.g., `const float ARR[4] = float[4](1.,2.,3.,4.);`
4. **Integer division**: In GLSL ES 3.00, `1/2` evaluates to 0 (integer division); must use `1.0/2.0` or `float(1)/float(2)`
# Sound Synthesis (Procedural Audio)
## Use Cases
- Generate procedural audio using `mainSound()` in ShaderToy
- Synthesize melodies, chords, rhythm patterns, and complete music
- Synthesize instrument timbres: piano, bass, acid synth, percussion
- Implement audio effects: delay, reverb, distortion, filters
- Pure mathematical audio generation without external samples
## Core Principles
ShaderToy sound shader four-layer architecture:
1. **Oscillator layer**: `sin(2π·f·t)`, layering harmonics or FM modulation to build timbre
2. **Envelope layer**: `exp(-rate·t)` + `smoothstep` attack, simulating strike→decay
3. **Sequencer layer**: Macro definitions / array lookup / hash pseudo-random for arranging melodies
4. **Effects layer**: Reverb, delay, distortion, filters, and other post-processing
Key formulas:
- MIDI → frequency: `f = 440.0 × 2^((n - 69) / 12)`
- Sine oscillator: `y = sin(2π × freq × time)`
- Exponential decay: `env = exp(-decay_rate × time)`
- FM modulation: `y = sin(2π × f_c × t + depth × sin(2π × f_m × t))`
## Implementation Steps
### Step 1: mainSound Entry Framework
```glsl
#define TAU 6.28318530718
#define BPM 120.0
#define SPB (60.0 / BPM)
vec2 mainSound(int samp, float time) {
vec2 audio = vec2(0.0);
// Layer each instrument/track
audio *= 0.5 * smoothstep(0.0, 0.5, time); // Master volume + pop prevention
return clamp(audio, -1.0, 1.0);
}
```
### Step 2: MIDI Note to Frequency
```glsl
float noteFreq(float note) {
return 440.0 * pow(2.0, (note - 69.0) / 12.0);
}
```
### Step 3: Basic Oscillators
```glsl
float osc_sin(float t) { return sin(TAU * t); }
float osc_saw(float t) { return fract(t) * 2.0 - 1.0; }
float osc_sqr(float t) { return step(fract(t), 0.5) * 2.0 - 1.0; }
float osc_tri(float t) { return abs(fract(t) - 0.5) * 4.0 - 1.0; }
```
### Step 4: Additive Synthesis Instrument
```glsl
// Layer harmonics to build timbre; higher harmonics decay faster
float instrument_additive(float freq, float t) {
float y = 0.0;
y += 0.50 * sin(TAU * 1.00 * freq * t) * exp(-0.0015 * 1.0 * freq * t);
y += 0.30 * sin(TAU * 2.01 * freq * t) * exp(-0.0015 * 2.0 * freq * t);
y += 0.20 * sin(TAU * 4.01 * freq * t) * exp(-0.0015 * 4.0 * freq * t);
y += 0.1 * y * y * y; // Nonlinear waveshaping
y *= 0.9 + 0.1 * cos(40.0 * t); // Tremolo
y *= smoothstep(0.0, 0.01, t); // Smooth attack
return y;
}
```
### Step 5: FM Synthesis Instrument
```glsl
// FM electric piano (stereo)
vec2 fm_epiano(float freq, float t) {
vec2 f0 = vec2(freq * 0.998, freq * 1.002); // Stereo micro-detuning
// "Glass" layer - high-frequency FM, metallic attack
vec2 glass = sin(TAU * (f0 + 3.0) * t
+ sin(TAU * 14.0 * f0 * t) * exp(-30.0 * t)
) * exp(-4.0 * t);
glass = sin(glass);
// "Body" layer - low-frequency FM, warm sustained tone
vec2 body = sin(TAU * f0 * t
+ sin(TAU * f0 * t) * exp(-0.5 * t) * pow(440.0 / f0.x, 0.5)
) * exp(-t);
return (glass + body) * smoothstep(0.0, 0.001, t) * 0.1;
}
// FM generic instrument (struct parameterized)
struct Instr {
float att, fo, vibe, vphas, phas, dtun;
};
float fm_instrument(float freq, float t, float beatTime, Instr ins) {
float f = freq - beatTime * ins.dtun;
float phase = f * t * TAU;
float vibrato = cos(beatTime * ins.vibe * 3.14159 / 8.0 + ins.vphas * 1.5708);
float fm = sin(phase + vibrato * sin(phase * ins.phas));
float env = exp(-beatTime * ins.fo) * (1.0 - exp(-beatTime * ins.att));
return fm * env * (1.0 - beatTime * 0.125);
}
```
### Step 6: Percussion Synthesis
```glsl
float hash(float p) {
p = fract(p * 0.1031); p *= p + 33.33; p *= p + p; return fract(p);
}
// 909 kick drum: frequency sweep + noise click
float kick(float t) {
float phase = TAU * (60.0 * t - 512.0 * 0.01 * exp(-t / 0.01));
float body = sin(phase) * smoothstep(0.3, 0.0, t) * 1.5;
float click = sin(TAU * 8000.0 * fract(t)) * hash(t * 2000.0)
* smoothstep(0.007, 0.0, t);
return body + click;
}
// Hi-hat: noise + exponential decay. decay: 5.0=open, 15.0=closed
float hihat(float t, float decay) {
float noise = hash(floor(t * 44100.0)) * 2.0 - 1.0;
return noise * exp(-decay * t) * smoothstep(0.0, 0.02, t);
}
// Clap/snare
float clap(float t) {
float noise = hash(floor(t * 44100.0)) * 2.0 - 1.0;
return noise * smoothstep(0.1, 0.0, t);
}
```
### Step 7: Note Sequence Arrangement
```glsl
// === Method A: D() macro accumulation (good for handwritten melodies) ===
#define D(duration, note) b += float(duration); if(t > b) { x = b; n = float(note); }
float melody_macro(float time) {
float t = time / 0.18;
float n = 0.0, b = 0.0, x = 0.0;
D(10,71) D(2,76) D(3,79) D(1,78) D(2,76) D(4,83) D(2,81) D(6,78)
float freq = noteFreq(n);
float noteTime = 0.18 * (t - x);
return instrument_additive(freq, noteTime);
}
// === Method B: Array lookup (good for complex arrangements) ===
// NOTE: Array indices must be int type in GLSL ES 3.00
const float NOTES[16] = float[16](
60., 62., 64., 65., 67., 69., 71., 72.,
60., 64., 67., 72., 65., 69., 64., 60.
);
float melody_array(float time, float bpm) {
float beat = time * bpm / 60.0;
int idx = int(mod(beat, 16.0)); // IMPORTANT: Must use int() conversion
float noteTime = fract(beat);
float freq = noteFreq(NOTES[idx]);
return instrument_additive(freq, noteTime * 60.0 / bpm);
}
// === Method C: Hash pseudo-random (good for algorithmic composition) ===
float nse(float x) { return fract(sin(x * 110.082) * 19871.8972); }
float scale_filter(float note) {
float n2 = mod(note, 12.0);
if (n2==1.||n2==3.||n2==6.||n2==8.||n2==10.) return -100.0;
return note;
}
float melody_random(float time, float bpm) {
float beat = time * bpm / 60.0;
float note = 48.0 + floor(nse(floor(beat)) * 24.0);
note = scale_filter(note);
return instrument_additive(noteFreq(note), fract(beat) * 60.0 / bpm);
}
```
### Step 8: Chord Construction
```glsl
vec2 chord(float time, float root, float isMinor) {
vec2 result = vec2(0.0);
float bass = root - 24.0;
    // assumes an fm_epiano variant whose third argument scales the envelope decay
    result += fm_epiano(noteFreq(bass), time, 2.0);
result += fm_epiano(noteFreq(root), time - SPB * 0.5, 1.25);
result += fm_epiano(noteFreq(root + 4.0 - isMinor), time - SPB, 1.5); // Third
result += fm_epiano(noteFreq(root + 7.0), time - SPB * 0.5, 1.25); // Fifth
result += fm_epiano(noteFreq(root + 11.0 - isMinor), time - SPB, 1.5); // Seventh
result += fm_epiano(noteFreq(root + 14.0), time - SPB, 1.5); // Ninth
return result;
}
```
### Step 9: Delay and Reverb
```glsl
// Multi-tap echo
// NOTE: "sample" is a reserved word in GLSL ES 3.00; use "samp" instead
vec2 echo_reverb(float time) {
vec2 tot = vec2(0.0);
float hh = 1.0;
for (int i = 0; i < 6; i++) {
float h = float(i) / 5.0;
float samp = get_instrument_sample(time - 0.7 * h);
tot += samp * vec2(0.5 + 0.1 * h, 0.5 - 0.1 * h) * hh;
hh *= 0.5;
}
return tot;
}
// Ping-pong stereo delay
vec2 pingpong_delay(float time) {
vec2 mx = get_stereo_sample(time) * 0.5;
float ec = 0.4, fb = 0.6, dt = 0.222;
float et = dt;
mx += get_stereo_sample(time - et) * ec * vec2(1.0, 0.5); ec *= fb; et += dt;
mx += get_stereo_sample(time - et) * ec * vec2(0.5, 1.0); ec *= fb; et += dt;
mx += get_stereo_sample(time - et) * ec * vec2(1.0, 0.5); ec *= fb; et += dt;
mx += get_stereo_sample(time - et) * ec * vec2(0.5, 1.0); ec *= fb; et += dt;
return mx;
}
```
### Step 10: Beat and Arrangement Structure
```glsl
vec2 mainSound(int samp, float time) {
vec2 audio = vec2(0.0);
float beat = time * BPM / 60.0;
float bar = beat / 4.0;
// Kick (every beat) + hi-hat (every half beat) + melody
float kickTime = mod(time, SPB);
audio += vec2(kick(kickTime) * 0.5);
float hatTime = mod(time, SPB * 0.5);
audio += vec2(hihat(hatTime, 15.0) * 0.15);
audio += vec2(melody_array(time, BPM)) * 0.3;
// Arrangement: smoothstep controls intro/outro
audio *= smoothstep(0.0, 4.0, bar); // Fade in over first 4 bars
audio *= 0.35 * smoothstep(0.0, 0.5, time);
// IMPORTANT: Array indices must be int type
// float idx = mod(beat, 16.0); // WRONG: float cannot be used as index
int idx = int(mod(beat, 16.0)); // CORRECT: int(mod(...)) conversion
return clamp(audio, -1.0, 1.0);
}
```
## Complete Code Template
This template can be pasted directly into the ShaderToy Sound tab. It includes an additive-synthesis melody, a bass line, kick and hi-hat rhythm with sidechain ducking, and a ping-pong delay; an FM electric piano (`epiano`) is also defined and can be swapped in for the melody.
```glsl
// === Sound Synthesis Complete Template ===
#define TAU 6.28318530718
#define BPM 130.0
#define SPB (60.0 / BPM)
#define NUM_HARMONICS 4
#define ECHO_TAPS 4
#define ECHO_DELAY 0.18
#define ECHO_DECAY 0.45
float noteFreq(float note) {
return 440.0 * pow(2.0, (note - 69.0) / 12.0);
}
float hash11(float p) {
p = fract(p * 0.1031); p *= p + 33.33; p *= p + p; return fract(p);
}
float osc_tri(float t) { return abs(fract(t) - 0.5) * 4.0 - 1.0; }
float instrument(float freq, float t) {
float y = 0.0;
for (int i = 1; i <= NUM_HARMONICS; i++) {
float h = float(i);
float amp = 0.6 / h;
float decay = 0.002 * h * freq;
y += amp * sin(TAU * h * 1.003 * freq * t) * exp(-decay * t);
}
y += 0.15 * y * y * y;
y *= 0.9 + 0.1 * cos(35.0 * t);
y *= smoothstep(0.0, 0.008, t);
return y;
}
vec2 epiano(float freq, float t) {
vec2 f0 = vec2(freq * 0.998, freq * 1.002);
vec2 glass = sin(TAU * (f0 + 3.0) * t
+ sin(TAU * 14.0 * f0 * t) * exp(-30.0 * t)
) * exp(-4.0 * t);
glass = sin(glass);
vec2 body = sin(TAU * f0 * t
+ sin(TAU * f0 * t) * exp(-0.5 * t) * pow(440.0 / max(f0.x, 1.0), 0.5)
) * exp(-t);
return (glass + body) * smoothstep(0.0, 0.001, t) * 0.12;
}
float kick(float t) {
float df = 512.0, dftime = 0.01, freq = 60.0;
float phase = TAU * (freq * t - df * dftime * exp(-t / dftime));
float body = sin(phase) * smoothstep(0.3, 0.0, t) * 1.5;
float click = sin(TAU * 8000.0 * fract(t)) * hash11(t * 2048.0)
* smoothstep(0.007, 0.0, t);
return body + click;
}
float hihat(float t) {
float noise = hash11(floor(t * 44100.0)) * 2.0 - 1.0;
return noise * exp(-15.0 * t) * smoothstep(0.0, 0.002, t);
}
const float MELODY[16] = float[16](
67., 67., 72., 71., 69., 67., 64., 64.,
65., 65., 69., 67., 67., 65., 64., 62.
);
const float BASS[4] = float[4](43., 48., 45., 41.);
vec2 mainSound(int samp, float time) {
time = mod(time, 32.0 * SPB * 4.0);
vec2 audio = vec2(0.0);
float beat = time / SPB;
float bar = beat / 4.0;
// Melody
{ int idx = int(mod(beat, 16.0));
float noteTime = fract(beat) * SPB;
audio += vec2(instrument(noteFreq(MELODY[idx]), noteTime) * 0.25); }
// Bass
{ int idx = int(mod(bar, 4.0));
float noteTime = fract(bar) * SPB * 4.0;
float freq = noteFreq(BASS[idx]);
audio += vec2(osc_tri(freq * noteTime) * exp(-1.5 * noteTime)
* smoothstep(0.0, 0.01, noteTime) * 0.3); }
// Kick (every beat) + sidechain compression
{ float kt = mod(time, SPB);
float k = kick(kt) * 0.4;
audio *= min(1.0, kt * 6.0 / SPB);
audio += vec2(k); }
// Hi-hat (every half beat, panned right)
{ float ht = mod(time, SPB * 0.5);
audio += vec2(0.4, 0.6) * hihat(ht) * 0.12; }
// Ping-pong delay (melody)
{ float ec = 0.3;
for (int i = 1; i <= ECHO_TAPS; i++) {
float dt = ECHO_DELAY * float(i);
int idx = int(mod((time - dt) / SPB, 16.0));
float nt = fract((time - dt) / SPB) * SPB;
float echoed = instrument(noteFreq(MELODY[idx]), nt) * 0.25 * ec;
if (i % 2 == 0) audio += vec2(0.3, 1.0) * echoed;
else audio += vec2(1.0, 0.3) * echoed;
ec *= ECHO_DECAY;
} }
audio *= 0.4 * smoothstep(0.0, 2.0, time);
return clamp(audio, -1.0, 1.0);
}
```
## Common Variants
### Variant 1: Subtractive Synthesis / TB-303 Acid Synth
Sawtooth wave through resonant low-pass filter, cutoff frequency modulated by envelope to produce the "wow" sound.
```glsl
#define NSPC 128
float lpf_response(float h, float cutoff, float reso) {
cutoff -= 20.0;
float df = max(h - cutoff, 0.0);
float df2 = abs(h - cutoff);
return exp(-0.005 * df * df) * 0.5 + exp(df2 * df2 * -0.1) * reso;
}
vec2 acid_synth(float freq, float noteTime) {
vec2 v = vec2(0.0);
float cutoff = exp(noteTime * -1.5) * 50.0 + 10.0;
float sqr = step(0.5, fract(noteTime * 4.5));
for (int i = 0; i < NSPC; i++) {
float h = float(i + 1);
float inten = 1.0 / h;
inten = mix(inten, inten * mod(h, 2.0), sqr);
inten *= lpf_response(h, cutoff, 2.2);
v.x += inten * sin((TAU + 0.01) * noteTime * freq * h);
v.y += inten * sin(TAU * noteTime * freq * h);
}
float amp = smoothstep(0.05, 0.0, abs(noteTime - 0.31) - 0.26) * exp(noteTime * -1.0);
return clamp(v * amp * 2.0, -1.0, 1.0);
}
```
### Variant 2: IIR Biquad Filter
Time-domain IIR filter based on the Audio EQ Cookbook, supporting 7 types including low-pass/high-pass/band-pass.
```glsl
float waveSaw(float freq, int samp) {
return fract(freq * float(samp) / iSampleRate) * 2.0 - 1.0;
}
vec2 widerSaw(float freq, int samp) {
int offset = int(freq) * 64;
return vec2(waveSaw(freq, samp - offset), waveSaw(freq, samp + offset));
}
void biquadLPF(float freq, float Q, float sr,
out float b0, out float b1, out float b2,
out float a0, out float a1, out float a2) {
float omega = TAU * freq / sr;
float sn = sin(omega), cs = cos(omega);
float alpha = sn / (2.0 * Q);
b0 = (1.0 - cs) * 0.5; b1 = 1.0 - cs; b2 = (1.0 - cs) * 0.5;
a0 = 1.0 + alpha; a1 = -2.0 * cs; a2 = 1.0 - alpha;
}
```
### Variant 3: Vocal / Formant Synthesis
Vocal tract model simulating human voice by synthesizing vowels through formant frequencies and bandwidths.
```glsl
float tract(float x, float formantFreq, float bandwidth) {
return sin(TAU * formantFreq * x) * exp(-bandwidth * 3.14159 * x);
}
float vowel_aah(float t, float pitch) {
float x = mod(t, 1.0 / pitch);
float aud = tract(x, 710.0, 70.0) * 0.5 // F1
+ tract(x, 1000.0, 90.0) * 0.6 // F2
+ tract(x, 2450.0, 140.0) * 0.4; // F3
return aud;
}
float fricative(float t, float formantFreq) {
return (hash11(floor(formantFreq * t) * 20.0) - 0.5) * 3.0;
}
```
### Variant 4: Algorithmic Composition
Hash pseudo-random melody + scale quantization, multi-layer rhythmic subdivision producing fractal music structures.
```glsl
vec2 noteRing(float n) {
float r = 0.5 + 0.5 * fract(sin(mod(floor(n), 32.123) * 32.123) * 41.123);
n = mod(n, 8.0);
float note = n<1.?0. : n<2.?5. : n<3.?-2. : n<4.?4. : n<5.?7. : n<6.?4. : n<7.?2. : 0.;
return vec2(note, r);
}
vec2 generativeNote(float beat) {
float b2 = floor(beat * 0.25);
return noteRing(b2 * 0.0625) + noteRing(b2 * 0.25) + noteRing(b2);
}
```
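`generativeNote` yields a semitone value in its `.x` component; turning that into a playable frequency assumes 12-tone equal temperament (the A4 = 440 Hz anchor is a standard choice, not stated in the original):

```glsl
// Equal temperament: each semitone multiplies frequency by 2^(1/12).
float noteToFreq(float semitone) {
    return 440.0 * exp2(semitone / 12.0); // semitone 0 maps to A4
}
```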
### Variant 5: Circle of Fifths Chord Progressions
Automatically generates harmony based on the circle of fifths, advancing +7 semitones every 4 beats, alternating major/minor chords.
```glsl
vec2 mainSound(int samp, float time) {
float id = floor(time / SPB / 4.0);
float offset = id * 7.0;
float minor = mod(id, 4.0) >= 3.0 ? 1.0 : 0.0;
float t = mod(time, SPB * 4.0);
float root = 57.0 + mod(offset, 12.0);
vec2 result = chord(t, root, minor);
result += vec2(0.5, 0.2) * chord(t - SPB * 0.5, root, minor);
result += vec2(0.05, 0.1) * chord(t - SPB, root, minor);
return result;
}
```
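`chord()` is referenced but not defined above. A minimal stand-in that stacks root, third, and fifth with a decaying sine — the MIDI-style pitch mapping and the envelope constant are assumptions:

```glsl
float tone(float t, float midi) {
    if (t < 0.0) return 0.0;                     // note not yet started
    float freq = 440.0 * exp2((midi - 69.0) / 12.0);
    return sin(TAU * freq * t) * exp(-3.0 * t);  // simple exponential decay
}
vec2 chord(float t, float rootMidi, float minor) {
    float third = minor > 0.5 ? 3.0 : 4.0;       // minor lowers the third
    float v = tone(t, rootMidi)
            + tone(t, rootMidi + third)
            + tone(t, rootMidi + 7.0);           // perfect fifth
    return vec2(v / 3.0);
}
```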
## Performance & Composition
**Performance Tips:**
- Harmonic count (`NUM_HARMONICS` / `NSPC`) is the biggest bottleneck; start with 4-8, stop when sufficient
- IIR filters require looping through sample history per output sample; prefer frequency-domain methods
- Each delay tap requires recomputing the full signal chain; 4 taps = 5x computation
- `fract(x)` is faster than `mod(x, 1.0)`; hoist constants out of loops
- Use Common Pass to share constants; avoid redundant computation between Sound and Image
**Composition Tips:**
- **Audio visualization**: Sound output is read via `iChannel0` in the Image shader for spectrum display
- **Raymarching sync**: Common Pass defines shared timeline; Sound/Image reference it synchronously
- **Particle systems**: Use kick triggers to drive particle emission; share BPM/SPB for beat position calculation
- **Post-processing linkage**: Sidechain compression coefficients drive bloom/chromatic aberration/dithering via Common Pass
- **Text overlay**: `message()` in Image shader renders parameter display or interaction instructions
## Further Reading
For complete step-by-step tutorials, mathematical derivations, and advanced usage, see [reference](../reference/sound-synthesis.md)


@@ -0,0 +1,408 @@
# Heightfield Ray Marching Terrain Rendering
## Use Cases
- Procedural generation of natural landscapes (mountains, canyons, dunes, etc.) in ShaderToy / Fragment Shaders
- Complete 3D terrain flythrough scenes in a single pixel shader, without geometry
- Cinematic aerial perspective, soft shadows, and layered material effects
## Core Principles
Rendering pipeline: height field definition → ray marching intersection → normals & materials → lighting → atmospheric effects
- **FBM**: `f(p) = Σ (aⁿ × noise(2ⁿ × R × p))`, a=0.5, R=rotation matrix, 2ⁿ=frequency doubling
- **Derivative erosion**: `f(p) = Σ (aⁿ × noise(p) / (1 + dot(d,d)))`, d=accumulated gradient, suppresses detail on steep slopes
- **Adaptive step size**: `step = factor × (ray.y - terrain_height)`
## Implementation Steps
1. **Noise & hash** — sin-free hash + Value Noise with analytic derivatives (`noised` returns value + partial derivatives)
2. **FBM terrain** — derivative erosion FBM, `mat2(0.8,-0.6,0.6,0.8)` per-layer rotation to eliminate banding; LOD tiers (L=3/M=9/H=16 octaves)
3. **Ray marching** — upper bound clipping + adaptive step `STEP_FACTOR * h` + distance-adaptive precision `abs(h) < 0.0015*t`
4. **Normals** — finite differences, epsilon increases with distance to avoid distant aliasing, using high-precision `terrainH`
5. **Soft shadows** — march toward sun, track `min(k*h/t)` to estimate penumbra
6. **Materials** — blend rock/grass/snow/sand by height + slope + noise
7. **Lighting** — Lambert diffuse + hemisphere ambient + backlight + Fresnel rim light + Blinn-Phong specular
8. **Atmospheric fog** — wavelength-dependent attenuation `exp(-t*k*vec3(1,1.5,4))` + sun scatter fog color
9. **Sky** — zenith-to-horizon gradient + sun disk/halo
10. **Camera** — Look-At matrix + path-following flight, height tracks terrain
## Complete Code Template
```glsl
// =====================================================
// Heightfield Terrain Rendering - Complete Template
// =====================================================
#define TERRAIN_OCTAVES 9 // FBM octave count (3~16)
#define TERRAIN_SCALE 0.003 // Terrain spatial frequency
#define TERRAIN_HEIGHT 120.0 // Terrain elevation scale
#define MAX_STEPS 300 // Ray march step count (80~400)
#define MAX_DIST 5000.0 // Maximum render distance
#define STEP_FACTOR 0.4 // March conservative factor (0.3~0.8)
#define SHADOW_STEPS 80 // Shadow step count (32~128)
#define SHADOW_K 16.0 // Penumbra softness (8~64)
#define FOG_DENSITY 0.00025 // Fog density
#define SNOW_HEIGHT 80.0 // Snow line height
#define CAM_ALTITUDE 20.0 // Camera height above ground
#define SUN_DIR normalize(vec3(0.8, 0.4, -0.6))
#define SUN_COL vec3(8.0, 5.0, 3.0)
#define SKY_COL vec3(0.5, 0.7, 1.0)
// ---- Hash & Noise ----
float hash(vec2 p) {
vec3 p3 = fract(vec3(p.xyx) * 0.1031);
p3 += dot(p3, p3.yzx + 19.19);
return fract((p3.x + p3.y) * p3.z);
}
vec3 noised(in vec2 p) {
vec2 i = floor(p);
vec2 f = fract(p);
vec2 u = f * f * (3.0 - 2.0 * f);
vec2 du = 6.0 * f * (1.0 - f);
float a = hash(i + vec2(0.0, 0.0));
float b = hash(i + vec2(1.0, 0.0));
float c = hash(i + vec2(0.0, 1.0));
float d = hash(i + vec2(1.0, 1.0));
float v = a + (b - a) * u.x + (c - a) * u.y + (a - b - c + d) * u.x * u.y;
vec2 g = du * (vec2(b - a, c - a) + (a - b - c + d) * u.yx);
return vec3(v, g);
}
float noise(in vec2 p) { return noised(p).x; }
// ---- FBM Terrain (derivative erosion) + LOD ----
const mat2 m2 = mat2(0.8, -0.6, 0.6, 0.8);
float terrainFBM(in vec2 p, int octaves) {
p *= TERRAIN_SCALE;
float a = 0.0, b = 1.0;
vec2 d = vec2(0.0);
for (int i = 0; i < 16; i++) {
if (i >= octaves) break;
vec3 n = noised(p);
d += n.yz;
a += b * n.x / (1.0 + dot(d, d));
b *= 0.5;
p = m2 * p * 2.0;
}
return a * TERRAIN_HEIGHT;
}
float terrainL(vec2 p) { return terrainFBM(p, 3); }
float terrainM(vec2 p) { return terrainFBM(p, TERRAIN_OCTAVES); }
float terrainH(vec2 p) { return terrainFBM(p, 16); }
// ---- Ray Marching ----
float raymarch(in vec3 ro, in vec3 rd) {
float t = 0.0;
if (ro.y > TERRAIN_HEIGHT && rd.y >= 0.0) return -1.0;
if (ro.y > TERRAIN_HEIGHT) t = (ro.y - TERRAIN_HEIGHT) / (-rd.y);
for (int i = 0; i < MAX_STEPS; i++) {
vec3 pos = ro + t * rd;
float h = pos.y - terrainM(pos.xz);
if (abs(h) < 0.0015 * t) break;
if (t > MAX_DIST) return -1.0;
t += STEP_FACTOR * h;
}
return t;
}
// ---- Normals ----
vec3 calcNormal(in vec3 pos, float t) {
float eps = 0.02 + 0.00005 * t * t;
float hC = terrainH(pos.xz);
float hR = terrainH(pos.xz + vec2(eps, 0.0));
float hU = terrainH(pos.xz + vec2(0.0, eps));
return normalize(vec3(hC - hR, eps, hC - hU));
}
// ---- Soft Shadows ----
float calcShadow(in vec3 pos, in vec3 sunDir) {
float res = 1.0, t = 1.0;
for (int i = 0; i < SHADOW_STEPS; i++) {
vec3 p = pos + t * sunDir;
float h = p.y - terrainM(p.xz);
if (h < 0.001) return 0.0;
res = min(res, SHADOW_K * h / t);
t += clamp(h, 2.0, 100.0);
}
return clamp(res, 0.0, 1.0);
}
// ---- Materials ----
vec3 getMaterial(in vec3 pos, in vec3 nor) {
float slope = nor.y, h = pos.y;
float nz = noise(pos.xz * 0.04) * noise(pos.xz * 0.005);
vec3 rock = vec3(0.10, 0.09, 0.08);
vec3 grass = mix(vec3(0.10, 0.08, 0.04), vec3(0.05, 0.09, 0.02), nz);
vec3 snow = vec3(0.62, 0.65, 0.70);
vec3 sand = vec3(0.50, 0.45, 0.35);
vec3 col = rock;
col = mix(col, grass, smoothstep(0.5, 0.8, slope));
float snowMask = smoothstep(SNOW_HEIGHT - 20.0 * nz, SNOW_HEIGHT + 10.0, h)
* smoothstep(0.3, 0.7, slope);
col = mix(col, snow, snowMask);
float beachMask = smoothstep(2.5, 0.0, h) * smoothstep(0.5, 0.9, slope);
col = mix(col, sand, beachMask);
return col;
}
// ---- Lighting ----
vec3 calcLighting(in vec3 pos, in vec3 nor, in vec3 rd, float shadow) {
float dif = clamp(dot(nor, SUN_DIR), 0.0, 1.0);
float amb = 0.5 + 0.5 * nor.y;
vec3 backDir = normalize(vec3(-SUN_DIR.x, 0.0, -SUN_DIR.z));
float bac = clamp(0.2 + 0.8 * dot(nor, backDir), 0.0, 1.0);
float fre = pow(clamp(1.0 + dot(rd, nor), 0.0, 1.0), 2.0);
vec3 hal = normalize(SUN_DIR - rd);
float spe = pow(clamp(dot(nor, hal), 0.0, 1.0), 16.0)
* (0.04 + 0.96 * pow(1.0 + dot(hal, rd), 5.0));
vec3 lin = vec3(0.0);
lin += dif * shadow * SUN_COL * 0.1;
lin += amb * SKY_COL * 0.2;
lin += bac * vec3(0.15, 0.05, 0.04);
lin += fre * SKY_COL * 0.3;
lin += spe * shadow * SUN_COL * 0.05;
return lin;
}
// ---- Atmosphere ----
vec3 applyFog(in vec3 col, float t, in vec3 rd) {
vec3 ext = exp(-t * FOG_DENSITY * vec3(1.0, 1.5, 4.0));
float sundot = clamp(dot(rd, SUN_DIR), 0.0, 1.0);
vec3 fogCol = mix(vec3(0.55, 0.55, 0.58), vec3(1.0, 0.7, 0.3), 0.3 * pow(sundot, 8.0));
return col * ext + fogCol * (1.0 - ext);
}
// ---- Sky ----
vec3 getSky(in vec3 rd) {
vec3 col = vec3(0.3, 0.5, 0.85) - rd.y * vec3(0.2, 0.15, 0.0);
float horizon = pow(1.0 - max(rd.y, 0.0), 4.0);
col = mix(col, vec3(0.8, 0.75, 0.7), 0.5 * horizon);
float sundot = clamp(dot(rd, SUN_DIR), 0.0, 1.0);
col += vec3(1.0, 0.7, 0.3) * 0.3 * pow(sundot, 8.0);
col += vec3(1.0, 0.9, 0.7) * 0.5 * pow(sundot, 64.0);
col += vec3(1.0, 1.0, 0.9) * min(pow(sundot, 1150.0), 0.3);
return col;
}
// ---- Camera ----
vec3 cameraPath(float t) {
return vec3(100.0 * sin(0.2 * t), 0.0, -100.0 * t);
}
mat3 setCamera(in vec3 ro, in vec3 ta) {
vec3 cw = normalize(ta - ro);
vec3 cu = normalize(cross(cw, vec3(0.0, 1.0, 0.0)));
vec3 cv = cross(cu, cw);
return mat3(cu, cv, cw);
}
// ======== Main Function ========
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
vec2 uv = (2.0 * fragCoord - iResolution.xy) / iResolution.y;
float time = iTime * 0.5;
vec3 ro = cameraPath(time);
ro.y = terrainL(ro.xz) + CAM_ALTITUDE;
vec3 ta = cameraPath(time + 2.0);
ta.y = terrainL(ta.xz) + CAM_ALTITUDE * 0.5;
mat3 cam = setCamera(ro, ta);
vec3 rd = cam * normalize(vec3(uv, 1.5));
float t = raymarch(ro, rd);
vec3 col;
if (t > 0.0) {
vec3 pos = ro + t * rd;
vec3 nor = calcNormal(pos, t);
vec3 mate = getMaterial(pos, nor);
float sha = calcShadow(pos + nor * 0.5, SUN_DIR);
vec3 lin = calcLighting(pos, nor, rd, sha);
col = mate * lin;
col = applyFog(col, t, rd);
} else {
col = getSky(rd);
}
col = 1.0 - exp(-col * 2.0);
col = pow(col, vec3(1.0 / 2.2));
fragColor = vec4(col, 1.0);
}
```
### Binary Refinement (optional, called after raymarch)
```glsl
float bisect(in vec3 ro, in vec3 rd, float tNear, float tFar) {
for (int i = 0; i < 5; i++) {
float tMid = 0.5 * (tNear + tFar);
vec3 pos = ro + tMid * rd;
float h = pos.y - terrainM(pos.xz);
if (h > 0.0) tNear = tMid; else tFar = tMid;
}
return 0.5 * (tNear + tFar);
}
```
## Common Variants
### Relaxation Marching
Automatically increases step size at far distances, covering greater range in 90 steps.
```glsl
float raymarchRelax(in vec3 ro, in vec3 rd) {
float t = 0.0;
float d = (ro + rd * t).y - terrainM((ro + rd * t).xz);
for (int i = 0; i < 90; i++) {
if (abs(d) < t * 0.0001 || t > 400.0) break;
float rl = max(t * 0.02, 1.0);
t += d * rl;
vec3 pos = ro + t * rd;
d = (pos.y - terrainM(pos.xz)) * 0.7;
}
return t;
}
```
### Sign-Alternating FBM
Amplitude flips sign each layer, producing rugged alternating ridge/valley patterns.
```glsl
float terrainSignFlip(in vec2 p) {
p *= TERRAIN_SCALE;
float a = 0.0, w = 1.0;
for (int i = 0; i < TERRAIN_OCTAVES; i++) {
a += w * noise(p);
w = -w * 0.4;
p = m2 * p * 2.0;
}
return a * TERRAIN_HEIGHT;
}
```
### Canyon Style (Texture-Driven + 3D Displacement)
Texture sampling + 3D FBM displacement, supporting cliffs/caves and other non-heightfield formations.
```glsl
float noise3D(in vec3 x) {
vec3 p = floor(x); vec3 f = fract(x);
f = f * f * (3.0 - 2.0 * f);
vec2 uv = (p.xy + vec2(37.0, 17.0) * p.z) + f.xy;
vec2 rg = textureLod(iChannel0, (uv + 0.5) / 256.0, 0.0).yx;
return mix(rg.x, rg.y, f.z);
}
const mat3 m3 = mat3(0.00, 0.80, 0.60, -0.80, 0.36,-0.48, -0.60,-0.48, 0.64);
float displacement(vec3 p) {
float f = 0.5 * noise3D(p); p = m3 * p * 2.02;
f += 0.25 * noise3D(p); p = m3 * p * 2.03;
f += 0.125 * noise3D(p); p = m3 * p * 2.01;
f += 0.0625 * noise3D(p);
return f;
}
float mapCanyon(vec3 p) {
float h = terrainM(p.xz);
float dis = displacement(0.25 * p * vec3(1.0, 4.0, 1.0)) * 3.0;
return (dis + p.y - h) * 0.25;
}
```
### Directional Erosion Noise
Slope direction drives Gabor noise projection, producing realistic dendritic drainage patterns.
```glsl
#define EROSION_BRANCH 1.5
vec3 erosionNoise(vec2 p, vec2 dir) {
vec2 ip = floor(p); vec2 fp = fract(p) - 0.5;
float va = 0.0, wt = 0.0; vec2 dva = vec2(0.0);
for (int i = -2; i <= 1; i++)
for (int j = -2; j <= 1; j++) {
vec2 o = vec2(float(i), float(j));
vec2 h = hash2(ip - o) * 0.5;
vec2 pp = fp + o + h;
float d = dot(pp, pp);
float w = exp(-d * 2.0);
float mag = dot(pp, dir);
va += cos(mag * 6.283) * w;
dva += -sin(mag * 6.283) * dir * w;
wt += w;
}
return vec3(va, dva) / wt;
}
float terrainErosion(vec2 p, vec2 baseSlope) {
float e = 0.0, a = 0.5;
vec2 dir = normalize(baseSlope + vec2(0.001));
for (int i = 0; i < 5; i++) {
vec3 n = erosionNoise(p * 4.0, dir);
e += a * n.x;
dir = normalize(dir + n.zy * vec2(1.0, -1.0) * EROSION_BRANCH);
a *= 0.5; p *= 2.0;
}
return e;
}
```
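`hash2` is assumed but not defined in the snippet above; a common sin-based stand-in returning two values in [0,1):

```glsl
vec2 hash2(vec2 p) {
    p = vec2(dot(p, vec2(127.1, 311.7)),
             dot(p, vec2(269.5, 183.3)));
    return fract(sin(p) * 43758.5453);
}
```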
### Volumetric Clouds + God Rays
Front-to-back alpha compositing of cloud slabs, accumulating god ray factor.
```glsl
#define CLOUD_BASE 200.0
#define CLOUD_TOP 300.0
vec4 raymarchClouds(vec3 ro, vec3 rd) {
float tmin = (CLOUD_BASE - ro.y) / rd.y;
float tmax = (CLOUD_TOP - ro.y) / rd.y;
    if (tmin > tmax) { float tmp = tmin; tmin = tmax; tmax = tmp; }
if (tmin < 0.0) tmin = 0.0;
float t = tmin;
vec4 sum = vec4(0.0); float rays = 0.0;
for (int i = 0; i < 64; i++) {
if (sum.a > 0.99 || t > tmax) break;
vec3 pos = ro + t * rd;
float hFrac = (pos.y - CLOUD_BASE) / (CLOUD_TOP - CLOUD_BASE);
float shape = 1.0 - 2.0 * abs(hFrac - 0.5);
float den = shape - 1.6 * (1.0 - noise(pos.xz * 0.01));
if (den > 0.0) {
float shadowDen = shape - 1.6 * (1.0 - noise((pos.xz + SUN_DIR.xz * 30.0) * 0.01));
float shadow = clamp(1.0 - shadowDen * 2.0, 0.0, 1.0);
vec3 cloudCol = mix(vec3(0.4, 0.4, 0.45), vec3(1.0, 0.95, 0.8), shadow);
float alpha = clamp(den * 0.4, 0.0, 1.0);
rays += 0.02 * shadow * (1.0 - sum.a);
cloudCol *= alpha;
sum += vec4(cloudCol, alpha) * (1.0 - sum.a);
}
t += max(0.5, 0.05 * t);
}
sum.rgb += pow(rays, 3.0) * 0.4 * vec3(1.0, 0.8, 0.7);
return sum;
}
```
## Performance & Composition
**Performance:**
- LOD tiers: low octaves for marching (3-9), high octaves for normals (16), lowest for camera (3)
- Upper bound clipping: intersect ray with terrain max height plane before marching
- Adaptive precision: hit threshold `abs(h) < k * t`, tolerates larger error at distance
- Texture instead of noise: `textureLod` sampling of pre-baked noise, 2-3x speed
- Early exit: `t > MAX_DIST`, `alpha > 0.99`, shadow `h < 0`
- Dithered start: `t += hash(fragCoord) * step_size` to eliminate banding artifacts
**Composition:**
- Terrain + water: water at a fixed y-plane, multi-frequency noise perturbing normals, Fresnel controlling reflection/refraction
- Terrain + volumetric clouds: render terrain first, then march cloud slab, front-to-back alpha compositing
- Terrain + volumetric fog: additionally sample 3D FBM density field along ray, decay with distance
- Terrain + SDF objects: `floor(p.xz/gridSize)` grid placement, `hash(cell)` randomization
- Terrain + TAA: inter-frame reprojection blending, ~10% new frame + 90% history frame
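The grid-placement idea from the composition list can be sketched as follows; the sphere object and its hashed-size mapping are illustrative, while `hash`/`terrainM` reuse the template's functions:

```glsl
float mapObjects(vec3 p, float gridSize) {
    vec2 cell = floor(p.xz / gridSize);
    float h = hash(cell);                                    // per-cell random
    vec2 jitter = 0.4 * (vec2(h, fract(h * 7.0)) - 0.5);     // stay inside cell
    vec2 center = (cell + 0.5 + jitter) * gridSize;
    float r = 1.0 + 2.0 * h;                                 // hashed radius
    vec3 c = vec3(center.x, terrainM(center) + r, center.y); // rest on terrain
    return length(p - c) - r;                                // sphere SDF
}
```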
## Further Reading
For full step-by-step tutorials, mathematical derivations, and advanced usage, see [reference](../reference/terrain-rendering.md)


@@ -0,0 +1,121 @@
# Advanced Texture Mapping Techniques
## Use Cases
- Texturing 3D surfaces without UV seams (triplanar/biplanar mapping)
- Eliminating visible tiling repetition on large surfaces
- Proper texture filtering in ray-marched scenes (mip-level selection)
- Combining procedural and sampled textures
## Techniques
### 1. Biplanar Mapping (Optimized Triplanar)
Uses only 2 texture fetches instead of 3, selecting the two most relevant projection axes:
```glsl
vec4 biplanar(sampler2D sam, vec3 p, vec3 n, float k) {
vec3 dpdx = dFdx(p);
vec3 dpdy = dFdy(p);
n = abs(n);
// Determine major, minor, median axes
ivec3 ma = (n.x > n.y && n.x > n.z) ? ivec3(0,1,2) :
(n.y > n.z) ? ivec3(1,2,0) : ivec3(2,0,1);
ivec3 mi = (n.x < n.y && n.x < n.z) ? ivec3(0,1,2) :
(n.y < n.z) ? ivec3(1,2,0) : ivec3(2,0,1);
ivec3 me = ivec3(3) - mi - ma;
// Two texture fetches (major and median projections)
vec4 x = textureGrad(sam, vec2(p[ma.y], p[ma.z]),
vec2(dpdx[ma.y], dpdx[ma.z]),
vec2(dpdy[ma.y], dpdy[ma.z]));
vec4 y = textureGrad(sam, vec2(p[me.y], p[me.z]),
vec2(dpdx[me.y], dpdx[me.z]),
vec2(dpdy[me.y], dpdy[me.z]));
// Blend weights with local support
vec2 w = vec2(n[ma.x], n[me.x]);
w = clamp((w - 0.5773) / (1.0 - 0.5773), 0.0, 1.0); // 0.5773 = 1/sqrt(3)
w = pow(w, vec2(k / 8.0));
return (x * w.x + y * w.y) / (w.x + w.y);
}
// Usage: vec4 col = biplanar(tex, worldPos * scale, worldNormal, 8.0);
```
**Why biplanar over triplanar**: Saves one texture fetch (bandwidth-bound advantage), with k=8 visually equivalent to triplanar. The `dFdx/dFdy` gradient propagation prevents mipmap seams at axis-switching boundaries.
### 2. Texture Repetition Avoidance
Three approaches to eliminate visible tiling patterns:
#### Method A: Per-Tile Random Offset (4 fetches)
```glsl
vec4 textureNoTile(sampler2D sam, vec2 uv) {
vec2 iuv = floor(uv);
vec2 fuv = fract(uv);
// Generate 4 random offsets for the 4 surrounding tiles
vec4 ofa = hash42(iuv + vec2(0, 0));
vec4 ofb = hash42(iuv + vec2(1, 0));
vec4 ofc = hash42(iuv + vec2(0, 1));
vec4 ofd = hash42(iuv + vec2(1, 1));
// Transform UVs per tile
vec2 uva = uv + ofa.xy;
vec2 uvb = uv + ofb.xy;
vec2 uvc = uv + ofc.xy;
vec2 uvd = uv + ofd.xy;
// Blend near borders with smooth weights
vec2 b = smoothstep(0.25, 0.75, fuv);
return mix(mix(texture(sam, uva), texture(sam, uvb), b.x),
mix(texture(sam, uvc), texture(sam, uvd), b.x), b.y);
}
```
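`hash42` above is assumed rather than defined; one widely used sin-free stand-in returning four values in [0,1):

```glsl
vec4 hash42(vec2 p) {
    vec4 p4 = fract(vec4(p.xyxy) * vec4(0.1031, 0.1030, 0.0973, 0.1099));
    p4 += dot(p4, p4.wzxy + 33.33);
    return fract((p4.xxyz + p4.yzzw) * p4.zywx);
}
```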
#### Method B: Virtual Pattern (2 fetches, cheapest)
```glsl
vec4 textureNoTileCheap(sampler2D sam, vec2 uv) {
float k = texture(iChannel1, 0.005 * uv).x; // low-freq variation index
float index = k * 8.0;
float i = floor(index);
float f = fract(index);
// Two offset lookups based on index
vec2 offa = sin(vec2(3.0, 7.0) * (i + 0.0));
vec2 offb = sin(vec2(3.0, 7.0) * (i + 1.0));
return mix(texture(sam, uv + offa), texture(sam, uv + offb), smoothstep(0.2, 0.8, f));
}
```
### 3. Ray Differential Texture Filtering
For ray-marched scenes, compute proper mip levels using ray differentials:
```glsl
// After finding hit point pos with normal nor:
// 1. Compute neighbor pixel ray directions
vec3 rdx = normalize(rd + dFdx(rd)); // x-neighbor ray
vec3 rdy = normalize(rd + dFdy(rd)); // y-neighbor ray
// 2. Intersect neighbors with tangent plane at hit point
float dt_dx = dot(pos - ro, nor) / dot(rdx, nor); // plane hit: dot(X - pos, nor) = 0
float dt_dy = dot(pos - ro, nor) / dot(rdy, nor);
vec3 posDx = ro + rdx * dt_dx;
vec3 posDy = ro + rdy * dt_dy;
// 3. World-space position derivatives = pixel footprint
vec3 dposdx = posDx - pos;
vec3 dposdy = posDy - pos;
// 4. Transform to texture derivatives and use textureGrad
// For simple planar mapping (e.g. ground plane):
vec2 duvdx = dposdx.xz * textureScale;
vec2 duvdy = dposdy.xz * textureScale;
vec4 color = textureGrad(tex, pos.xz * textureScale, duvdx, duvdy);
```
This provides correct mip-level selection for procedural and sampled textures on ray-marched surfaces, eliminating shimmer and aliasing at distance.
→ For deeper details, see [reference/texture-mapping-advanced.md](../reference/texture-mapping-advanced.md)


@@ -0,0 +1,382 @@
**IMPORTANT - GLSL Type Strictness**:
- GLSL is a strongly-typed language and does not support the `string` type (you cannot define `string var`)
- `vec2`/`vec3`/`vec4` are vector types and cannot be directly assigned a float (e.g., `vec2 a = 1.0` must be `vec2 a = vec2(1.0)`)
- Array indices must be integer constants or uniform variables; runtime-computed floats cannot be used
- Avoid uninitialized variables — GLSL default values are undefined
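A quick illustration of these rules (the legal and illegal forms shown are minimal examples):

```glsl
vec2 a = vec2(1.0);                    // OK: constructor; `vec2 a = 1.0;` fails
float arr[3] = float[3](0.0, 1.0, 2.0);
int i = 1;
float ok = arr[i];                     // OK: integer index
// float bad = arr[1.0];               // error: float cannot index an array
float z = 0.0;                         // OK: explicitly initialized
```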
# Texture Sampling
## Use Cases
- **Post-processing effects**: Blur, bloom, dispersion, chromatic aberration
- **Procedural noise**: FBM layering from noise textures to generate terrain, clouds, fire
- **PBR/IBL**: Cubemap environment lighting, BRDF LUT lookup
- **Simulation/feedback systems**: Reaction-diffusion, fluid simulation multi-buffer feedback
- **Data storage**: Textures used as structured data (game state, keyboard input)
- **Temporal accumulation**: TAA, motion blur, previous frame reading
## Core Principles
| Function | Coordinate Type | Filtering | Typical Use |
|----------|----------------|-----------|-------------|
| `texture(sampler, uv)` | Float UV `[0,1]` | Hardware bilinear | General texture reading |
| `textureLod(sampler, uv, lod)` | Float UV + LOD | Specified mip level | Control blur level / avoid auto mip |
| `texelFetch(sampler, ivec2, lod)` | Integer pixel coordinates | No filtering | Exact pixel data reading |
Key mathematics:
1. **Hardware bilinear interpolation**: `texture()` automatically linearly blends between 4 adjacent texels
2. **Quintic Hermite smoothing**: `u = f^3(6f^2 - 15f + 10)`, C2 continuous (eliminates hardware linear interpolation seams)
3. **LOD control**: `textureLod` third parameter selects mipmap level, `lod=0` is original resolution, each +1 halves resolution
4. **Coordinate wrapping**: `fract(uv)` implements torus boundary, equivalent to `GL_REPEAT`
## Implementation Steps
### Step 1: Basic Sampling and UV Normalization
```glsl
vec2 uv = fragCoord / iResolution.xy;
vec4 col = texture(iChannel0, uv);
```
### Step 2: textureLod for Mipmap Control
```glsl
// In ray marching: force LOD 0 to avoid artifacts
vec3 groundCol = textureLod(iChannel2, groundUv * 0.05, 0.0).rgb;
// Depth of field blur: LOD varies with distance
float focus = mix(maxBlur - coverage, minBlur, smoothstep(.1, .2, coverage));
vec3 col = textureLod(iChannel0, uv + normal, focus).rgb;
// Bloom: sample high mip levels
#define BLOOM_LOD_A 4.0 // adjustable: bloom first mip level
#define BLOOM_LOD_B 5.0
#define BLOOM_LOD_C 6.0
vec3 bloom = vec3(0.0);
bloom += textureLod(iChannel0, uv + off * exp2(BLOOM_LOD_A), BLOOM_LOD_A).rgb;
bloom += textureLod(iChannel0, uv + off * exp2(BLOOM_LOD_B), BLOOM_LOD_B).rgb;
bloom += textureLod(iChannel0, uv + off * exp2(BLOOM_LOD_C), BLOOM_LOD_C).rgb;
bloom /= 3.0;
```
### Step 3: texelFetch for Exact Pixel Reading
```glsl
// Data storage addresses
const ivec2 txBallPosVel = ivec2(0, 0);
const ivec2 txPaddlePos = ivec2(1, 0);
const ivec2 txPoints = ivec2(2, 0);
const ivec2 txState = ivec2(3, 0);
vec4 loadValue(in ivec2 addr) {
return texelFetch(iChannel0, addr, 0);
}
void storeValue(in ivec2 addr, in vec4 val, inout vec4 fragColor, in ivec2 fragPos) {
fragColor = (fragPos == addr) ? val : fragColor;
}
// Keyboard input
float key = texelFetch(iChannel1, ivec2(KEY_SPACE, 0), 0).x;
```
### Step 4: Manual Bilinear + Quintic Hermite Smoothing
```glsl
float noise(vec2 x) {
vec2 p = floor(x);
vec2 f = fract(x);
vec2 u = f * f * f * (f * (f * 6.0 - 15.0) + 10.0); // C2 continuous
#define TEX_RES 1024.0 // adjustable: noise texture resolution
float a = texture(iChannel0, (p + vec2(0.0, 0.0)) / TEX_RES).x;
float b = texture(iChannel0, (p + vec2(1.0, 0.0)) / TEX_RES).x;
float c = texture(iChannel0, (p + vec2(0.0, 1.0)) / TEX_RES).x;
float d = texture(iChannel0, (p + vec2(1.0, 1.0)) / TEX_RES).x;
return a + (b - a) * u.x + (c - a) * u.y + (a - b - c + d) * u.x * u.y;
}
```
### Step 5: FBM Texture Noise
```glsl
#define FBM_OCTAVES 5 // adjustable: number of layers
#define FBM_PERSISTENCE 0.5 // adjustable: amplitude decay rate
float fbm(vec2 x) {
float v = 0.0;
float a = 0.5;
float totalWeight = 0.0;
for (int i = 0; i < FBM_OCTAVES; i++) {
v += a * noise(x);
totalWeight += a;
x *= 2.0;
a *= FBM_PERSISTENCE;
}
return v / totalWeight;
}
```
### Step 6: Separable Gaussian Blur
```glsl
#define BLUR_RADIUS 4 // adjustable: blur radius
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
vec2 uv = fragCoord / iResolution.xy;
vec2 d = vec2(1.0 / iResolution.x, 0.0); // horizontal pass; for vertical pass change to vec2(0, 1/iResolution.y)
float w[9] = float[9](0.05, 0.09, 0.12, 0.15, 0.16, 0.15, 0.12, 0.09, 0.05);
vec4 col = vec4(0.0);
for (int i = -4; i <= 4; i++) {
col += w[i + 4] * texture(iChannel0, fract(uv + float(i) * d));
}
    col /= 0.98; // normalize: the 9 weights above sum to 0.98
fragColor = col;
}
```
### Step 7: Dispersion Sampling
```glsl
#define DISP_SAMPLES 64 // adjustable: sample count
vec3 sampleWeights(float i) {
return vec3(i * i, 46.6666 * pow((1.0 - i) * i, 3.0), (1.0 - i) * (1.0 - i));
}
vec3 sampleDisp(sampler2D tex, vec2 uv, vec2 disp) {
vec3 col = vec3(0.0);
vec3 totalWeight = vec3(0.0);
for (int i = 0; i < DISP_SAMPLES; i++) {
float t = float(i) / float(DISP_SAMPLES);
vec3 w = sampleWeights(t);
col += w * texture(tex, fract(uv + disp * t)).rgb;
totalWeight += w;
}
return col / totalWeight;
}
```
### Step 8: IBL Environment Sampling
```glsl
#define MAX_LOD 7.0 // adjustable: cubemap max mip level
#define DIFFUSE_LOD 6.5 // adjustable: diffuse sampling LOD
vec3 getSpecularLightColor(vec3 N, float roughness) {
vec3 raw = textureLod(iChannel0, N, roughness * MAX_LOD).rgb;
return pow(raw, vec3(4.5)) * 6.5; // HDR approximation
}
vec3 getDiffuseLightColor(vec3 N) {
return textureLod(iChannel0, N, DIFFUSE_LOD).rgb;
}
// BRDF LUT lookup
vec2 brdf = texture(iChannel3, vec2(NdotV, roughness)).rg;
vec3 specular = envColor * (F * brdf.x + brdf.y);
```
## Complete Code Template
iChannel0 bound to a noise texture (e.g., "Gray Noise Medium"), with mipmap enabled.
```glsl
// === Texture Sampling Comprehensive Demo ===
// iChannel0: noise texture (requires mipmap enabled)
#define TEX_RES 256.0
#define FBM_OCTAVES 6
#define FBM_PERSISTENCE 0.5
#define CLOUD_LAYERS 4
#define CLOUD_SPEED 0.02
#define DOF_MAX_BLUR 5.0
#define DOF_FOCUS_DIST 0.5
#define BLOOM_STRENGTH 0.3
#define BLOOM_LOD 4.0
float noise(vec2 x) {
vec2 p = floor(x);
vec2 f = fract(x);
vec2 u = f * f * f * (f * (f * 6.0 - 15.0) + 10.0);
float a = textureLod(iChannel0, (p + vec2(0.0, 0.0)) / TEX_RES, 0.0).x;
float b = textureLod(iChannel0, (p + vec2(1.0, 0.0)) / TEX_RES, 0.0).x;
float c = textureLod(iChannel0, (p + vec2(0.0, 1.0)) / TEX_RES, 0.0).x;
float d = textureLod(iChannel0, (p + vec2(1.0, 1.0)) / TEX_RES, 0.0).x;
return a + (b - a) * u.x + (c - a) * u.y + (a - b - c + d) * u.x * u.y;
}
float fbm(vec2 x) {
float v = 0.0;
float a = 0.5;
float w = 0.0;
for (int i = 0; i < FBM_OCTAVES; i++) {
v += a * noise(x);
w += a;
x *= 2.0;
a *= FBM_PERSISTENCE;
}
return v / w;
}
float cloudLayer(vec2 uv, float height, float time) {
vec2 offset = vec2(time * CLOUD_SPEED * (1.0 + height), 0.0);
float n = fbm((uv + offset) * (2.0 + height * 3.0));
return smoothstep(0.4, 0.7, n);
}
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
vec2 uv = fragCoord / iResolution.xy;
float aspect = iResolution.x / iResolution.y;
// 1. Procedural sky
vec3 sky = mix(vec3(0.1, 0.15, 0.4), vec3(0.5, 0.7, 1.0), uv.y);
// 2. FBM cloud layers
vec3 col = sky;
for (int i = 0; i < CLOUD_LAYERS; i++) {
float h = float(i) / float(CLOUD_LAYERS);
float density = cloudLayer(vec2(uv.x * aspect, uv.y), h, iTime);
vec3 cloudCol = mix(vec3(0.8, 0.85, 0.9), vec3(1.0), h);
col = mix(col, cloudCol, density * (0.3 + 0.7 * h));
}
// 3. textureLod depth of field blur
float dist = abs(uv.y - DOF_FOCUS_DIST);
float lod = dist * DOF_MAX_BLUR;
vec3 blurred = textureLod(iChannel0, uv, lod).rgb;
col = mix(col, blurred * 0.5 + col * 0.5, 0.3);
// 4. Bloom
vec3 bloom = textureLod(iChannel0, uv, BLOOM_LOD).rgb;
bloom += textureLod(iChannel0, uv, BLOOM_LOD + 1.0).rgb;
bloom += textureLod(iChannel0, uv, BLOOM_LOD + 2.0).rgb;
bloom /= 3.0;
col += bloom * BLOOM_STRENGTH;
// 5. Post-processing
col = (col * (6.2 * col + 0.5)) / (col * (6.2 * col + 1.7) + 0.06);
col *= 0.5 + 0.5 * pow(16.0 * uv.x * uv.y * (1.0 - uv.x) * (1.0 - uv.y), 0.2);
fragColor = vec4(col, 1.0);
}
```
## Common Variants
### Variant 1: Anisotropic Flow-Field Blur
```glsl
#define BLUR_ITERATIONS 32 // adjustable: number of samples along flow field
#define BLUR_STEP 0.008 // adjustable: UV offset per step
vec3 flowBlur(vec2 uv) {
vec3 col = vec3(0.0);
float acc = 0.0;
for (int i = 0; i < BLUR_ITERATIONS; i++) {
float h = float(i) / float(BLUR_ITERATIONS);
float w = 4.0 * h * (1.0 - h);
col += w * texture(iChannel0, uv).rgb;
acc += w;
vec2 dir = texture(iChannel1, uv).xy * 2.0 - 1.0;
uv += BLUR_STEP * dir;
}
return col / acc;
}
```
### Variant 2: Buffer-as-Data Storage
```glsl
const ivec2 txPosition = ivec2(0, 0);
const ivec2 txVelocity = ivec2(1, 0);
const ivec2 txState = ivec2(2, 0);
vec4 load(ivec2 addr) { return texelFetch(iChannel0, addr, 0); }
void store(ivec2 addr, vec4 val, inout vec4 fragColor, ivec2 fragPos) {
fragColor = (fragPos == addr) ? val : fragColor;
}
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
ivec2 p = ivec2(fragCoord);
fragColor = texelFetch(iChannel0, p, 0);
vec4 pos = load(txPosition);
vec4 vel = load(txVelocity);
// ... update logic ...
store(txPosition, pos + vel * 0.016, fragColor, p);
store(txVelocity, vel, fragColor, p);
}
```
### Variant 3: Dispersion Effect
```glsl
#define DISP_SAMPLES 64 // adjustable: sample count
#define DISP_STRENGTH 0.05 // adjustable: dispersion strength
vec3 dispersion(vec2 uv, vec2 displacement) {
vec3 col = vec3(0.0);
vec3 w_total = vec3(0.0);
for (int i = 0; i < DISP_SAMPLES; i++) {
float t = float(i) / float(DISP_SAMPLES);
vec3 w = vec3(t * t, 46.666 * pow((1.0 - t) * t, 3.0), (1.0 - t) * (1.0 - t));
col += w * texture(iChannel0, fract(uv + displacement * t * DISP_STRENGTH)).rgb;
w_total += w;
}
return col / w_total;
}
```
### Variant 4: Triplanar Texture Mapping
```glsl
#define TRIPLANAR_SHARPNESS 2.0 // adjustable: blend sharpness
vec3 triplanarSample(sampler2D tex, vec3 pos, vec3 normal, float scale) {
vec3 w = pow(abs(normal), vec3(TRIPLANAR_SHARPNESS));
w /= (w.x + w.y + w.z);
vec3 xSample = texture(tex, pos.yz * scale).rgb;
vec3 ySample = texture(tex, pos.xz * scale).rgb;
vec3 zSample = texture(tex, pos.xy * scale).rgb;
return xSample * w.x + ySample * w.y + zSample * w.z;
}
```
### Variant 5: Temporal Reprojection (TAA)
```glsl
#define TAA_BLEND 0.9 // adjustable: history frame blend ratio
vec3 temporalBlend(vec2 currUv, vec2 prevUv, vec3 currColor) {
vec3 history = textureLod(iChannel0, prevUv, 0.0).rgb;
vec3 minCol = currColor - 0.1;
vec3 maxCol = currColor + 0.1;
history = clamp(history, minCol, maxCol);
return mix(currColor, history, TAA_BLEND);
}
```
## Performance & Composition
**Performance Tips**:
- Heavy sampling (e.g., 64 dispersion samples) is a bandwidth bottleneck — reduce sample count + use smart weight compensation; use `textureLod` with high LOD to reduce cache misses
- 2D Gaussian blur uses separable two-pass (O(N^2) -> O(2N)), leveraging hardware bilinear for (N+1)/2 samples to achieve N-tap
- Must use `textureLod(..., 0.0)` inside ray marching — the GPU cannot correctly estimate screen-space derivatives
- Manual Hermite interpolation is ~4x slower than hardware — only use for the first two FBM octaves, fall back to `texture()` for higher frequencies
- Each multi-buffer feedback adds one frame of latency — merge operations into the same pass; use `texelFetch` to avoid filtering overhead
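The bilinear-folding trick in the second tip can be sketched with the commonly used 9-tap/5-fetch constants (the offsets and weights are the standard linear-sampling values, not taken from the original):

```glsl
// dir = vec2(1.0/iResolution.x, 0.0) for the horizontal pass,
//       vec2(0.0, 1.0/iResolution.y) for the vertical pass.
vec4 blur9(sampler2D tex, vec2 uv, vec2 dir) {
    float off[3] = float[3](0.0, 1.3846153846, 3.2307692308);
    float w[3]   = float[3](0.2270270270, 0.3162162162, 0.0702702703);
    vec4 col = texture(tex, uv) * w[0];
    for (int i = 1; i < 3; i++) {
        col += texture(tex, uv + dir * off[i]) * w[i]; // each fetch blends 2 texels
        col += texture(tex, uv - dir * off[i]) * w[i];
    }
    return col;
}
```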
**Composition Tips**:
- **+ SDF Ray Marching**: Noise textures for displacement maps/materials; use `textureLod(..., 0.0)` inside ray marching
- **+ Procedural Noise**: Hermite + FBM driving domain warping to generate terrain/clouds/fire; texture noise is faster than pure mathematical noise
- **+ Post-Processing Pipeline**: Multi-LOD bloom → separable DOF → dispersion → tone mapping, chaining a complete post-processing pipeline
- **+ PBR/IBL**: `textureLod` samples cubemap by roughness + BRDF LUT lookup = split-sum IBL
- **+ Simulation/Feedback**: Multi-buffer reaction-diffusion/fluid; Buffer A state, B/C separable blur diffusion, Image visualization; `fract()` torus boundary
## Further Reading
For complete step-by-step tutorials, mathematical derivations, and advanced usage, see [reference](../reference/texture-sampling.md)


@@ -0,0 +1,375 @@
# Volumetric Rendering Skill
## Use Cases
- Rendering participating media: clouds, fog, smoke, fire, explosions, atmospheric scattering
- Visual effects of light passing through and scattering/absorbing within semi-transparent volumes
- Suitable for ShaderToy real-time fragment shaders, also portable to game engines
## Core Principles
Advance along each view ray at fixed or adaptive step sizes (Ray Marching), querying medium density at each sample point, accumulating color and opacity.
### Key Formulas
**Beer-Lambert transmittance**: `T = exp(-σe × d)`, where `σe = σs + σa`
**Front-to-back alpha compositing (premultiplied form)**:
```glsl
col.rgb *= col.a;
sum += col * (1.0 - sum.a);
```
**Henyey-Greenstein phase function**: `HG(cosθ, g) = (1 - g²) / (1 + g² - 2g·cosθ)^(3/2)`
- `g > 0` forward scattering, `g < 0` back scattering, `g = 0` isotropic
**Frostbite improved integration**: `Sint = (S - S × exp(-σe × dt)) / σe`
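The three formulas above combine into a single integration step. A minimal sketch, with placeholder medium coefficients and a constant placeholder in place of a real light evaluation:

```glsl
// One step of the volumetric integral: Beer-Lambert extinction,
// HG phase (g = 0.8), and Frostbite's analytic in-step integration.
// sigmaS/sigmaA and the in-scattered light are placeholder values.
void integrateStep(vec3 rd, vec3 sunDir, float dt,
                   inout vec3 scatteredLight, inout float transmittance) {
    float sigmaS = 0.5, sigmaA = 0.1;          // placeholder medium coefficients
    float sigmaE = max(sigmaS + sigmaA, 1e-4); // extinction, avoid divide-by-zero
    float g = 0.8;
    float phase = (1.0 - g * g) /
                  pow(1.0 + g * g - 2.0 * g * dot(rd, sunDir), 1.5);
    vec3 S = vec3(1.0) * sigmaS * phase;       // in-scattered light (placeholder)
    vec3 Sint = (S - S * exp(-sigmaE * dt)) / sigmaE; // Frostbite energy-conserving step
    scatteredLight += transmittance * Sint;
    transmittance *= exp(-sigmaE * dt);        // Beer-Lambert
}
```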
## Implementation Steps
### Step 1: Camera and Ray Construction
```glsl
vec2 uv = (2.0 * fragCoord - iResolution.xy) / iResolution.y;
vec3 ro = vec3(0.0, 1.0, -5.0); // Camera position
vec3 ta = vec3(0.0, 0.0, 0.0); // Look-at target
vec3 ww = normalize(ta - ro);
vec3 uu = normalize(cross(ww, vec3(0.0, 1.0, 0.0)));
vec3 vv = cross(uu, ww);
float fl = 1.5; // Focal length
vec3 rd = normalize(uv.x * uu + uv.y * vv + fl * ww);
```
### Step 2: Volume Bounds Intersection
```glsl
// Method A: Horizontal plane bounds (cloud layers)
float tmin = (yBottom - ro.y) / rd.y;
float tmax = (yTop - ro.y) / rd.y;
if (tmin > tmax) { float tmp = tmin; tmin = tmax; tmax = tmp; }
// Method B: Sphere bounds (explosions, atmosphere)
vec2 intersectSphere(vec3 ro, vec3 rd, float r) {
float b = dot(ro, rd);
float c = dot(ro, ro) - r * r;
float d = b * b - c;
if (d < 0.0) return vec2(1e5, -1e5);
d = sqrt(d);
return vec2(-b - d, -b + d);
}
```
### Step 3: Density Field Definition
```glsl
// 3D Value Noise (texture-based)
float noise(vec3 x) {
vec3 p = floor(x);
vec3 f = fract(x);
f = f * f * (3.0 - 2.0 * f);
vec2 uv = (p.xy + vec2(37.0, 239.0) * p.z) + f.xy;
vec2 rg = textureLod(iChannel0, (uv + 0.5) / 256.0, 0.0).yx;
return mix(rg.x, rg.y, f.z);
}
// fBM
float fbm(vec3 p) {
float f = 0.0;
f += 0.50000 * noise(p); p *= 2.02;
f += 0.25000 * noise(p); p *= 2.03;
f += 0.12500 * noise(p); p *= 2.01;
f += 0.06250 * noise(p); p *= 2.02;
f += 0.03125 * noise(p);
return f;
}
// Cloud density
float cloudDensity(vec3 p) {
vec3 q = p - vec3(0.0, 0.1, 1.0) * iTime;
float f = fbm(q);
return clamp(1.5 - p.y - 2.0 + 1.75 * f, 0.0, 1.0);
}
```
### Step 4: Ray Marching Main Loop
```glsl
#define NUM_STEPS 64
#define STEP_SIZE 0.05
vec4 raymarch(vec3 ro, vec3 rd, float tmin, float tmax, vec3 bgCol) {
vec4 sum = vec4(0.0);
// Dither start position to eliminate banding artifacts
// (gl_FragCoord is in scope inside helper functions; fragCoord is not)
float t = tmin + STEP_SIZE * fract(sin(dot(gl_FragCoord.xy, vec2(12.9898, 78.233))) * 43758.5453);
for (int i = 0; i < NUM_STEPS; i++) {
if (t > tmax || sum.a > 0.99) break;
vec3 pos = ro + t * rd;
float den = cloudDensity(pos);
if (den > 0.01) {
vec4 col = vec4(1.0, 0.95, 0.8, den);
col.a *= 0.4;
col.rgb *= col.a;
sum += col * (1.0 - sum.a);
}
t += STEP_SIZE;
}
return clamp(sum, 0.0, 1.0);
}
```
### Step 5: Lighting Calculation
```glsl
// Method A: Directional derivative lighting (1 extra sample)
vec3 sundir = normalize(vec3(1.0, 0.0, -1.0));
float dif = clamp((den - cloudDensity(pos + 0.3 * sundir)) / 0.6, 0.0, 1.0);
vec3 lin = vec3(1.0, 0.6, 0.3) * dif + vec3(0.91, 0.98, 1.05);
// Method B: Volumetric shadow (secondary ray march)
float volumetricShadow(vec3 from, vec3 lightDir) {
float shadow = 1.0, dt = 0.5, d = dt * 0.5;
for (int s = 0; s < 6; s++) {
shadow *= exp(-cloudDensity(from + lightDir * d) * dt);
dt *= 1.3; d += dt;
}
return shadow;
}
// Method C: HG phase function mixed scattering
float HenyeyGreenstein(float cosTheta, float g) {
float gg = g * g;
return (1.0 - gg) / pow(1.0 + gg - 2.0 * g * cosTheta, 1.5);
}
float scattering = mix(
HenyeyGreenstein(dot(rd, -sundir), 0.8),
HenyeyGreenstein(dot(rd, -sundir), -0.2),
0.5
);
```
### Step 6: Color Mapping
```glsl
// Method A: Density-interpolated coloring (clouds)
vec3 cloudColor = mix(vec3(1.0, 0.95, 0.8), vec3(0.25, 0.3, 0.35), den);
// Method B: Radial gradient coloring (explosions, fire)
vec3 computeColor(float density, float radius) {
vec3 result = mix(vec3(1.0, 0.9, 0.8), vec3(0.4, 0.15, 0.1), density);
result *= mix(7.0 * vec3(0.8, 1.0, 1.0), 1.5 * vec3(0.48, 0.53, 0.5), min(radius / 0.9, 1.15));
return result;
}
// Method C: Height-based ambient light gradient
vec3 ambientLight = mix(
vec3(39., 67., 87.) * (1.5 / 255.),
vec3(149., 167., 200.) * (1.5 / 255.),
normalizedHeight
);
```
### Step 7: Final Compositing and Post-Processing
```glsl
// Sky background
vec3 bgCol = vec3(0.6, 0.71, 0.75) - rd.y * 0.2 * vec3(1.0, 0.5, 1.0);
float sun = clamp(dot(sundir, rd), 0.0, 1.0);
bgCol += 0.2 * vec3(1.0, 0.6, 0.1) * pow(sun, 8.0);
// Compositing
vec4 vol = raymarch(ro, rd, tmin, tmax, bgCol);
vec3 col = bgCol * (1.0 - vol.a) + vol.rgb;
col += vec3(0.2, 0.08, 0.04) * pow(sun, 3.0); // Sun glare
col = smoothstep(0.15, 1.1, col); // Tone mapping
```
## Complete Code Template
Runnable volumetric cloud renderer for ShaderToy (iChannel0 = Gray Noise Small 256x256):
```glsl
// Volumetric Cloud Renderer — ShaderToy Template
#define NUM_STEPS 80
#define SUN_DIR normalize(vec3(-0.7, 0.0, -0.7))
#define CLOUD_BOTTOM -1.0
#define CLOUD_TOP 2.0
#define WIND_SPEED 0.1
#define DENSITY_SCALE 1.75
#define DENSITY_THRESHOLD 0.01
float noise(vec3 x) {
vec3 p = floor(x);
vec3 f = fract(x);
f = f * f * (3.0 - 2.0 * f);
vec2 uv = (p.xy + vec2(37.0, 239.0) * p.z) + f.xy;
vec2 rg = textureLod(iChannel0, (uv + 0.5) / 256.0, 0.0).yx;
return mix(rg.x, rg.y, f.z) * 2.0 - 1.0;
}
float map(vec3 p, int lod) {
vec3 q = p - vec3(0.0, WIND_SPEED, 1.0) * iTime;
float f = 0.50000 * noise(q); q *= 2.02;
if (lod >= 2) f += 0.25000 * noise(q);
q *= 2.03;
if (lod >= 3) f += 0.12500 * noise(q);
q *= 2.01;
if (lod >= 4) f += 0.06250 * noise(q);
q *= 2.02;
if (lod >= 5) f += 0.03125 * noise(q);
return clamp(1.5 - p.y - 2.0 + DENSITY_SCALE * f, 0.0, 1.0);
}
vec3 lightSample(vec3 pos, float den, int lod) {
float dif = clamp((den - map(pos + 0.3 * SUN_DIR, lod)) / 0.6, 0.0, 1.0);
vec3 lin = vec3(1.0, 0.6, 0.3) * dif + vec3(0.91, 0.98, 1.05);
vec3 col = mix(vec3(1.0, 0.95, 0.8), vec3(0.25, 0.3, 0.35), den);
return col * lin;
}
vec4 raymarch(vec3 ro, vec3 rd, vec3 bgcol, ivec2 px) {
float tmin = (CLOUD_BOTTOM - ro.y) / rd.y;
float tmax = (CLOUD_TOP - ro.y) / rd.y;
if (tmin > tmax) { float tmp = tmin; tmin = tmax; tmax = tmp; }
if (tmax < 0.0) return vec4(0.0);
tmin = max(tmin, 0.0);
tmax = min(tmax, 60.0);
float t = tmin + 0.1 * fract(sin(float(px.x * 73 + px.y * 311)) * 43758.5453);
vec4 sum = vec4(0.0);
for (int i = 0; i < NUM_STEPS; i++) {
float dt = max(0.05, 0.02 * t);
int lod = 5 - int(log2(1.0 + t * 0.5));
vec3 pos = ro + t * rd;
float den = map(pos, lod);
if (den > DENSITY_THRESHOLD) {
vec3 litCol = lightSample(pos, den, lod);
litCol = mix(litCol, bgcol, 1.0 - exp(-0.003 * t * t));
vec4 col = vec4(litCol, den);
col.a *= 0.4;
col.rgb *= col.a;
sum += col * (1.0 - sum.a);
}
t += dt;
if (t > tmax || sum.a > 0.99) break;
}
return clamp(sum, 0.0, 1.0);
}
mat3 setCamera(vec3 ro, vec3 ta, float cr) {
vec3 cw = normalize(ta - ro);
vec3 cp = vec3(sin(cr), cos(cr), 0.0);
vec3 cu = normalize(cross(cw, cp));
vec3 cv = normalize(cross(cu, cw));
return mat3(cu, cv, cw);
}
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
vec2 p = (2.0 * fragCoord - iResolution.xy) / iResolution.y;
vec2 m = iMouse.xy / iResolution.xy;
vec3 ro = 4.0 * normalize(vec3(sin(3.0 * m.x), 0.8 * m.y, cos(3.0 * m.x)));
ro.y += 0.5;
vec3 ta = vec3(0.0, -1.0, 0.0);
mat3 ca = setCamera(ro, ta, 0.07 * cos(0.25 * iTime));
vec3 rd = ca * normalize(vec3(p, 1.5));
float sun = clamp(dot(SUN_DIR, rd), 0.0, 1.0);
vec3 bgcol = vec3(0.6, 0.71, 0.75) - rd.y * 0.2 * vec3(1.0, 0.5, 1.0) + 0.075;
bgcol += 0.2 * vec3(1.0, 0.6, 0.1) * pow(sun, 8.0);
vec4 res = raymarch(ro, rd, bgcol, ivec2(fragCoord - 0.5));
vec3 col = bgcol * (1.0 - res.a) + res.rgb;
col += vec3(0.2, 0.08, 0.04) * pow(sun, 3.0);
col = smoothstep(0.15, 1.1, col);
fragColor = vec4(col, 1.0);
}
```
## Common Variants
### Variant 1: Self-Emissive Volume (Fire/Explosions)
```glsl
vec3 emissionColor(float density, float radius) {
vec3 result = mix(vec3(1.0, 0.9, 0.8), vec3(0.4, 0.15, 0.1), density);
vec3 colCenter = 7.0 * vec3(0.8, 1.0, 1.0);
vec3 colEdge = 1.5 * vec3(0.48, 0.53, 0.5);
result *= mix(colCenter, colEdge, min(radius / 0.9, 1.15));
return result;
}
// Bloom effect
sum.rgb += lightColor / exp(lDist * lDist * lDist * 0.08) / 30.0;
```
### Variant 2: Physical Scattering Atmosphere (Rayleigh + Mie)
```glsl
float density(vec3 p, float scaleHeight) {
return exp(-max(length(p) - R_INNER, 0.0) / scaleHeight);
}
float opticDepth(vec3 from, vec3 to, float scaleHeight) {
vec3 s = (to - from) / float(NUM_STEPS_LIGHT);
vec3 v = from + s * 0.5;
float sum = 0.0;
for (int i = 0; i < NUM_STEPS_LIGHT; i++) { sum += density(v, scaleHeight); v += s; }
return sum * length(s);
}
float phaseRayleigh(float cc) { return (3.0 / 16.0 / PI) * (1.0 + cc); }
vec3 scatter = sumRay * kRay * phaseRayleigh(cc) + sumMie * kMie * phaseMie(-0.78, c, cc);
```
### Variant 3: Frostbite Energy-Conserving Integration
```glsl
vec3 S = evaluateLight(p) * sigmaS * phaseFunction() * volumetricShadow(p, lightPos);
vec3 Sint = (S - S * exp(-sigmaE * dt)) / sigmaE;
scatteredLight += transmittance * Sint;
transmittance *= exp(-sigmaE * dt);
```
### Variant 4: Production-Grade Clouds (Horizon Zero Dawn Style)
```glsl
float m = cloudMapBase(pos, norY);
m *= cloudGradient(norY);
m -= cloudMapDetail(pos) * dstrength * 0.225;
m = smoothstep(0.0, 0.1, m + (COVERAGE - 1.0));
float scattering = mix(HenyeyGreenstein(sundotrd, 0.8), HenyeyGreenstein(sundotrd, -0.2), 0.5);
// Temporal reprojection
vec2 spos = reprojectPos(ro + rd * dist, iResolution.xy, iChannel1);
col = mix(texture(iChannel1, spos, 0.0), col, 0.05);
```
### Variant 5: Gradient Normal Surface Lighting (Fur Ball / Volume Surface)
```glsl
vec3 furNormal(vec3 pos, float density) {
float eps = 0.01;
vec3 n;
n.x = sampleDensity(pos + vec3(eps, 0, 0)) - density;
n.y = sampleDensity(pos + vec3(0, eps, 0)) - density;
n.z = sampleDensity(pos + vec3(0, 0, eps)) - density;
return normalize(n);
}
vec3 N = -furNormal(pos, density);
float diff = max(0.0, dot(N, L) * 0.5 + 0.5); // Half-Lambert
float spec = pow(max(0.0, dot(N, H)), 50.0); // Blinn-Phong
```
## Performance & Composition
### Performance Tips
- **Early exit**: break out of loop when `sum.a > 0.99`
- **LOD noise**: `int lod = 5 - int(log2(1.0 + t * 0.5));` reduce fBM octaves at distance
- **Adaptive step size**: `float dt = max(0.05, 0.02 * t);` fine near, coarse far
- **Dithering**: add pixel-dependent random offset to start position, eliminates banding artifacts
- **Bounds clipping**: only march within the ray-volume intersection interval
- **Density threshold skip**: only compute lighting when `den > 0.01`
- **Minimal shadow steps**: 6-16 steps with increasing step size
- **Temporal reprojection**: blend history frames (e.g., 5% new frame + 95% history frame)
### Composition Tips
- **SDF terrain + volumetric clouds**: mutual depth occlusion (Himalayas style)
- **Volumetric fog + scene lighting**: `color = color * transmittance + scatteredLight`
- **Multi-layer volumes**: different density functions at different heights, march independently then composite
- **Post-process light shafts (God Rays)**: radial blur or screen-space ray marching
- **Procedural sky + volumetric clouds**: distance fogging for natural transitions
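The post-process light-shaft idea above can be sketched as a radial blur toward the sun; this assumes a separate post pass with the rendered scene bound as `sceneTex` and the sun's UV-space position already known:

```glsl
// Screen-space god rays: accumulate samples along the line from the pixel
// toward the sun's projected position, with exponentially decaying weight.
vec3 godRays(sampler2D sceneTex, vec2 uv, vec2 sunScreenPos) {
    const int SAMPLES = 32;
    float decay = 0.95, weight = 0.05;
    vec2 delta = (uv - sunScreenPos) / float(SAMPLES);
    vec3 acc = vec3(0.0);
    float w = weight;
    vec2 p = uv;
    for (int i = 0; i < SAMPLES; i++) {
        p -= delta;                        // step toward the sun
        acc += texture(sceneTex, p).rgb * w;
        w *= decay;                        // farther samples contribute less
    }
    return acc;
}
```

Blend the result additively over the scene color, typically scaled by an exposure factor.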
## Further Reading
For full step-by-step tutorials, mathematical derivations, and advanced usage, see [reference](../reference/volumetric-rendering.md)
# Voronoi & Cellular Noise
- **IMPORTANT:** All declared `uniform` variables must be used in the shader code, otherwise the compiler will optimize them away. After optimization, `gl.getUniformLocation()` returns `null`, and setting that uniform triggers a WebGL `INVALID_OPERATION` error, which may cause rendering failure. Ensure uniforms like `iTime` are actually used in `main()` (e.g., `float t = iTime * 1.0;`)
## Use Cases
- Natural textures: cells, cracked soil, stone, skin pores
- Structured patterns: crystals, honeycombs, shattered glass, mosaics
- Effects: fire/nebula (fBm stacking), crack generation
- Procedural materials: cloud noise, terrain height maps, stylized partitioning
## Core Principles
Voronoi noise = **spatial partitioning**: scatter feature points, assign each pixel to the "cell" of its nearest feature point.
Algorithm flow:
1. `floor` divides into an integer grid; each cell contains a randomly offset feature point
2. Search the 3x3 (2D) or 3x3x3 (3D) neighborhood for all feature points
3. Record the nearest distance F1 (optionally second-nearest F2)
4. Map F1, F2, or F2-F1 to color/height/shape
Distance metrics:
- Euclidean: `dot(r,r)` (squared, fast) -> final `sqrt`
- Manhattan: `abs(r.x)+abs(r.y)`
- Chebyshev: `max(abs(r.x), abs(r.y))`
Exact border distance (two-pass algorithm): `dot(0.5*(mr+r), normalize(r-mr))`
Rounded borders (harmonic mean of the two border gaps): `2/(1/(d2-d1) + 1/(d3-d1))`
## Implementation Steps
### Step 1: Hash Functions
```glsl
// sin-dot hash (suitable for most cases)
vec2 hash2(vec2 p) {
p = vec2(dot(p, vec2(127.1, 311.7)),
dot(p, vec2(269.5, 183.3)));
return fract(sin(p) * 43758.5453);
}
// 3D version
vec3 hash3(vec3 p) {
float n = sin(dot(p, vec3(7.0, 157.0, 113.0)));
return fract(vec3(2097152.0, 262144.0, 32768.0) * n);
}
// High-quality integer hash (ES 3.0+, more uniform)
vec3 hash3_uint(vec3 p) {
uvec3 q = uvec3(ivec3(p)) * uvec3(1597334673U, 3812015801U, 2798796415U);
q = (q.x ^ q.y ^ q.z) * uvec3(1597334673U, 3812015801U, 2798796415U);
return vec3(q) / float(0xffffffffU);
}
```
### Step 2: Basic F1 Voronoi
```glsl
// Returns (F1 distance, cell ID)
vec2 voronoi(vec2 x) {
vec2 n = floor(x);
vec2 f = fract(x);
vec3 m = vec3(8.0);
for (int j = -1; j <= 1; j++)
for (int i = -1; i <= 1; i++) {
vec2 g = vec2(float(i), float(j));
vec2 o = hash2(n + g);
vec2 r = g - f + o;
float d = dot(r, r);
if (d < m.x) {
m = vec3(d, o);
}
}
return vec2(sqrt(m.x), m.y + m.z);
}
```
### Step 3: F1 + F2 (Edge Detection)
```glsl
// Returns vec2(F1, F2), edge value = F2 - F1
vec2 voronoi_f1f2(vec2 x) {
vec2 p = floor(x);
vec2 f = fract(x);
vec2 res = vec2(8.0);
for (int j = -1; j <= 1; j++)
for (int i = -1; i <= 1; i++) {
vec2 b = vec2(i, j);
vec2 r = b - f + hash2(p + b);
float d = dot(r, r);
if (d < res.x) {
res.y = res.x;
res.x = d;
} else if (d < res.y) {
res.y = d;
}
}
return sqrt(res);
}
```
### Step 4: Exact Border Distance (Two-Pass Algorithm)
```glsl
// Returns vec3(border distance, nearest point offset)
vec3 voronoi_border(vec2 x) {
vec2 ip = floor(x);
vec2 fp = fract(x);
// First pass: find nearest feature point
vec2 mg, mr;
float md = 8.0;
for (int j = -1; j <= 1; j++)
for (int i = -1; i <= 1; i++) {
vec2 g = vec2(float(i), float(j));
vec2 o = hash2(ip + g);
vec2 r = g + o - fp;
float d = dot(r, r);
if (d < md) { md = d; mr = r; mg = g; }
}
// Second pass: exact border distance (5x5 range)
md = 8.0;
for (int j = -2; j <= 2; j++)
for (int i = -2; i <= 2; i++) {
vec2 g = mg + vec2(float(i), float(j));
vec2 o = hash2(ip + g);
vec2 r = g + o - fp;
if (dot(mr - r, mr - r) > 0.00001)
md = min(md, dot(0.5 * (mr + r), normalize(r - mr)));
}
return vec3(md, mr);
}
```
### Step 5: Feature Point Animation
```glsl
// Replace static hash inside the neighborhood search loop:
vec2 o = hash2(n + g);
o = 0.5 + 0.5 * sin(iTime + 6.2831 * o); // different phase per point
vec2 r = g - f + o;
```
### Step 6: Coloring & Visualization
```glsl
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
// Must use iTime, otherwise the compiler will optimize away the uniform
float time = iTime * 1.0;
vec2 p = fragCoord.xy / iResolution.xy;
vec2 uv = p * SCALE;
vec2 c = voronoi(uv);
float dist = c.x;
float id = c.y;
// Cell coloring (ID-driven palette)
vec3 col = 0.5 + 0.5 * cos(id * 6.2831 + vec3(0.0, 1.0, 2.0));
// Distance falloff
col *= clamp(1.0 - 0.4 * dist * dist, 0.0, 1.0);
// Border lines
col -= (1.0 - smoothstep(0.08, 0.09, dist));
fragColor = vec4(col, 1.0);
}
```
## Complete Code Template
```glsl
// === Voronoi Cellular Noise — Complete ShaderToy Template ===
// Supports F1/F2/F2-F1 modes, multiple distance metrics, animation, exact borders
#define SCALE 8.0 // Cell density
#define ANIMATE 1 // 0=static, 1=animated
#define MODE 0 // 0=F1 fill, 1=F2-F1 edges, 2=exact borders
#define DIST_METRIC 0 // 0=Euclidean, 1=Manhattan, 2=Chebyshev
vec2 hash2(vec2 p) {
p = vec2(dot(p, vec2(127.1, 311.7)),
dot(p, vec2(269.5, 183.3)));
return fract(sin(p) * 43758.5453);
}
float distFunc(vec2 r) {
#if DIST_METRIC == 0
return dot(r, r);
#elif DIST_METRIC == 1
return abs(r.x) + abs(r.y);
#elif DIST_METRIC == 2
return max(abs(r.x), abs(r.y));
#endif
}
vec2 getPoint(vec2 cellId) {
vec2 o = hash2(cellId);
#if ANIMATE
o = 0.5 + 0.5 * sin(iTime + 6.2831 * o);
#endif
return o;
}
vec4 voronoi(vec2 x) {
vec2 n = floor(x);
vec2 f = fract(x);
float d1 = 8.0, d2 = 8.0;
vec2 nearestCell = vec2(0.0);
for (int j = -1; j <= 1; j++)
for (int i = -1; i <= 1; i++) {
vec2 g = vec2(float(i), float(j));
vec2 o = getPoint(n + g);
vec2 r = g - f + o;
float d = distFunc(r);
if (d < d1) {
d2 = d1; d1 = d;
nearestCell = n + g;
} else if (d < d2) {
d2 = d;
}
}
#if DIST_METRIC == 0
d1 = sqrt(d1); d2 = sqrt(d2);
#endif
return vec4(d1, d2, nearestCell);
}
vec3 voronoiBorder(vec2 x) {
vec2 ip = floor(x);
vec2 fp = fract(x);
vec2 mg, mr;
float md = 8.0;
for (int j = -1; j <= 1; j++)
for (int i = -1; i <= 1; i++) {
vec2 g = vec2(float(i), float(j));
vec2 o = getPoint(ip + g);
vec2 r = g + o - fp;
float d = dot(r, r);
if (d < md) { md = d; mr = r; mg = g; }
}
md = 8.0;
for (int j = -2; j <= 2; j++)
for (int i = -2; i <= 2; i++) {
vec2 g = mg + vec2(float(i), float(j));
vec2 o = getPoint(ip + g);
vec2 r = g + o - fp;
if (dot(mr - r, mr - r) > 0.00001)
md = min(md, dot(0.5 * (mr + r), normalize(r - mr)));
}
return vec3(md, mr);
}
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
// Must use iTime, otherwise the compiler will optimize away the uniform (especially important when ANIMATE=1)
float time = iTime * 1.0;
vec2 p = fragCoord.xy / iResolution.xy;
p.x *= iResolution.x / iResolution.y;
vec2 uv = p * SCALE;
vec3 col = vec3(0.0);
#if MODE == 0
vec4 v = voronoi(uv);
float id = dot(v.zw, vec2(127.1, 311.7));
col = 0.5 + 0.5 * cos(id * 6.2831 + vec3(0.0, 1.0, 2.0));
col *= clamp(1.0 - 0.4 * v.x * v.x, 0.0, 1.0);
col -= (1.0 - smoothstep(0.08, 0.09, v.x));
#elif MODE == 1
vec4 v = voronoi(uv);
float edge = v.y - v.x;
col = vec3(1.0 - smoothstep(0.0, 0.15, edge));
col *= vec3(0.2, 0.6, 1.0);
#elif MODE == 2
vec3 c = voronoiBorder(uv);
col = c.x * (0.5 + 0.5 * sin(64.0 * c.x)) * vec3(1.0);
col = mix(vec3(1.0, 0.6, 0.0), col, smoothstep(0.04, 0.07, c.x));
float dd = length(c.yz);
col = mix(vec3(1.0, 0.6, 0.1), col, smoothstep(0.0, 0.12, dd));
#endif
fragColor = vec4(col, 1.0);
}
```
## Common Variants
### Variant 1: 3D Voronoi + fBm Fire
```glsl
#define NUM_OCTAVES 5
vec3 hash3(vec3 p) {
float n = sin(dot(p, vec3(7.0, 157.0, 113.0)));
return fract(vec3(2097152.0, 262144.0, 32768.0) * n);
}
float voronoi3D(vec3 p) {
vec3 g = floor(p); p = fract(p);
float d = 1.0;
for (int j = -1; j <= 1; j++)
for (int i = -1; i <= 1; i++)
for (int k = -1; k <= 1; k++) {
vec3 b = vec3(i, j, k);
vec3 r = b - p + hash3(g + b);
d = min(d, dot(r, r));
}
return d;
}
float fbmVoronoi(vec3 p) {
vec3 t = vec3(0.0, 0.0, p.z + iTime * 1.5);
float tot = 0.0, sum = 0.0, amp = 1.0;
for (int i = 0; i < NUM_OCTAVES; i++) {
tot += voronoi3D(p + t) * amp;
p *= 2.0; t *= 1.5;
sum += amp; amp *= 0.5;
}
return tot / sum;
}
// Blackbody radiation palette
vec3 firePalette(float i) {
float T = 1400.0 + 1300.0 * i;
vec3 L = vec3(7.4, 5.6, 4.4);
L = pow(L, vec3(5.0)) * (exp(1.43876719683e5 / (T * L)) - 1.0);
return 1.0 - exp(-5e8 / L);
}
```
### Variant 2: Rounded Borders (3rd-Order Voronoi)
```glsl
float voronoiRounded(vec2 p) {
vec2 g = floor(p); p -= g;
vec3 d = vec3(1.0); // F1, F2, F3
for (int y = -1; y <= 1; y++)
for (int x = -1; x <= 1; x++) {
vec2 o = vec2(x, y);
o += hash2(g + o) - p;
float r = dot(o, o);
d.z = max(d.x, max(d.y, min(d.z, r)));
d.y = max(d.x, min(d.y, r));
d.x = min(d.x, r);
}
d = sqrt(d);
return min(2.0 / (1.0 / max(d.y - d.x, 0.001)
+ 1.0 / max(d.z - d.x, 0.001)), 1.0);
}
```
### Variant 3: Voronoise (Unified Noise-Voronoi Framework)
```glsl
#define JITTER 1.0 // 0=regular grid, 1=fully random
#define SMOOTH 0.0 // 0=sharp Voronoi, 1=smooth noise
vec3 hash3(vec2 p) {
vec3 q = vec3(dot(p, vec2(127.1, 311.7)),
dot(p, vec2(269.5, 183.3)),
dot(p, vec2(419.2, 371.9)));
return fract(sin(q) * 43758.5453);
}
float voronoise(vec2 p, float u, float v) {
float k = 1.0 + 63.0 * pow(1.0 - v, 6.0);
vec2 i = floor(p); vec2 f = fract(p);
vec2 a = vec2(0.0);
for (int y = -2; y <= 2; y++)
for (int x = -2; x <= 2; x++) {
vec2 g = vec2(x, y);
vec3 o = hash3(i + g) * vec3(u, u, 1.0);
vec2 d = g - f + o.xy;
float w = pow(1.0 - smoothstep(0.0, 1.414, length(d)), k);
a += vec2(o.z * w, w);
}
return a.x / a.y;
}
```
### Variant 4: Crack Texture (Multi-Layer Recursive Voronoi)
```glsl
#define CRACK_DEPTH 3.0
#define CRACK_WIDTH 0.0
#define CRACK_SLOPE 50.0
float ofs = 0.5;
#define disp(p) (-ofs + (1.0 + 2.0 * ofs) * hash2(p))
// Main loop
vec4 O = vec4(0.0);
vec2 U = uv;
for (float i = 0.0; i < CRACK_DEPTH; i++) {
vec2 D = fbm22(U) * 0.67;
vec3 H = voronoiBorder(U + D);
float d = H.x;
d = min(1.0, CRACK_SLOPE * pow(max(0.0, d - CRACK_WIDTH), 1.0));
O += vec4(1.0 - d) / exp2(i);
U *= 1.5 * rot(0.37);
}
```
### Variant 5: Tileable 3D Worley (Cloud Noise)
```glsl
#define TILE_FREQ 4.0
float worleyTileable(vec3 uv, float freq) {
vec3 id = floor(uv); vec3 p = fract(uv);
float minDist = 1e4;
for (float x = -1.0; x <= 1.0; x++)
for (float y = -1.0; y <= 1.0; y++)
for (float z = -1.0; z <= 1.0; z++) {
vec3 offset = vec3(x, y, z);
vec3 h = hash3_uint(mod(id + offset, vec3(freq))) * 0.5 + 0.5;
h += offset;
vec3 d = p - h;
minDist = min(minDist, dot(d, d));
}
return 1.0 - minDist;
}
float worleyFbm(vec3 p, float freq) {
return worleyTileable(p * freq, freq) * 0.625
+ worleyTileable(p * freq * 2.0, freq * 2.0) * 0.25
+ worleyTileable(p * freq * 4.0, freq * 4.0) * 0.125;
}
float remap(float x, float a, float b, float c, float d) {
return (((x - a) / (b - a)) * (d - c)) + c;
}
// cloud = remap(perlinNoise, worleyFbm - 1.0, 1.0, 0.0, 1.0);
```
## Performance & Composition
**Performance:**
- Use `dot(r,r)` instead of `length` during comparison; only `sqrt` for final output
- 3D loops can be manually unrolled along the z-axis to reduce nesting
- Search range: basic F1 uses 3x3; exact borders/Voronoise/extended jitter uses 5x5
- Hash choice: `sin(dot(...))` is fastest; integer hash is more uniform but requires ES 3.0+
- fBm layers: 3 is sufficient, 5 is the upper limit
**Combinations:**
- **+fBm distortion**: `uv + 0.5*fbm22(uv*2.0)` -> organic cell shapes
- **+Bump Mapping**: finite-difference normal computation -> pseudo-3D bumps
- **+Palette**: `0.5+0.5*cos(6.2831*(t+vec3(0,0.33,0.67)))` -> rich colors
- **+Raymarching**: Voronoi distance as part of the SDF -> cellular surfaces
- **+Multi-scale stacking**: Voronoi at different frequencies stacked -> primary structure + fine detail
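The bump-mapping combination above can be sketched with the F1 field from the `voronoi()` function defined earlier; `eps` and `strength` are tuning parameters:

```glsl
// Finite-difference bump normal from the Voronoi F1 distance field.
// voronoi(p).x is the F1 distance; the gradient of this height field
// tilts a surface normal for pseudo-3D lighting.
vec3 voronoiNormal(vec2 p, float eps, float strength) {
    float h  = voronoi(p).x;
    float hx = voronoi(p + vec2(eps, 0.0)).x;
    float hy = voronoi(p + vec2(0.0, eps)).x;
    return normalize(vec3(-(hx - h) * strength / eps,
                          -(hy - h) * strength / eps,
                          1.0));
}
```

Feed the result into any standard diffuse/specular term, e.g. `max(dot(n, lightDir), 0.0)`.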
## Further Reading
For complete step-by-step tutorials, mathematical derivations, and advanced usage, see [reference](../reference/voronoi-cellular-noise.md)
## WebGL2 Adaptation Requirements
The code templates in this document use ShaderToy GLSL style. When generating standalone HTML pages, you must adapt for WebGL2:
- Use `canvas.getContext("webgl2")` **(required! WebGL1 does not support in/out keywords)**
- Shader first line: `#version 300 es`, add `precision highp float;` to fragment shader
- **IMPORTANT: #version must be the very first line of the shader! No characters before it (including blank lines/comments/Unicode BOM)**
- Vertex shader: `attribute` → `in`, `varying` → `out`
- Fragment shader: `varying` → `in`, `gl_FragColor` → custom `out vec4 fragColor`, `texture2D()` → `texture()`
- ShaderToy's `void mainImage(out vec4 fragColor, in vec2 fragCoord)` needs to be adapted to the standard `void main()` entry point
### WebGL2 Full Adaptation Example
```glsl
// === Vertex Shader ===
const vertexShaderSource = `#version 300 es
in vec2 a_position;
void main() {
gl_Position = vec4(a_position, 0.0, 1.0);
}`;
// === Fragment Shader ===
const fragmentShaderSource = `#version 300 es
precision highp float;
uniform float iTime;
uniform vec2 iResolution;
// IMPORTANT: WebGL2 must declare the output variable!
out vec4 fragColor;
// ... other functions ...
void main() {
// IMPORTANT: Use gl_FragCoord.xy instead of fragCoord
vec2 fragCoord = gl_FragCoord.xy;
vec3 col = vec3(0.0);
// ... rendering logic ...
// IMPORTANT: Write to fragColor, not gl_FragColor!
fragColor = vec4(col, 1.0);
}`;
```
**IMPORTANT: Common GLSL compile errors:**
- `in/out storage qualifier supported in GLSL ES 3.00 only` → Check that you are using `getContext("webgl2")` and `#version 300 es`
- `#version directive must occur on the first line` → Check that the shader string starts with #version, with no characters before it
- **IMPORTANT: GLSL reserved words**: `cast`, `class`, `template`, `namespace`, `union`, `enum`, `typedef`, `sizeof`, `input`, `output`, `filter`, `image`, `sampler`, `fixed`, `volatile`, `public`, `static`, `extern`, `external`, `interface`, `long`, `short`, `double`, `half`, `unsigned`, `superp`, `inline`, `noinline`, etc. are all GLSL reserved words and **must never be used as variable or function names**! Common pitfall: naming a function `cast` for ray casting → compile failure. **Use compound names like `castRay`, `castShadow`, `shootRay` instead**.
- **IMPORTANT: GLSL strict typing**: float/int cannot be mixed. `if (x > 0)` for int, `if (y < 0.0)` for float. Comparing ivec3 members to float requires explicit conversion: `float(c.y) < height`. When getVoxel returns int, compare with `> 0` not `> 0.0`. Function parameter types must match exactly.
- **IMPORTANT: Vector dimension mismatch (vec2 vs vec3)**: `p.xz` returns `vec2` and **must never** be added to `vec3` or passed to functions expecting `vec3` parameters (e.g., `fbm(vec3)`, `noise(vec3)`)! Common error: `fbm(p.xz * 0.08 + vec3(...))``vec2 + vec3` compile failure. **Fix**: either use a `vec2` version of noise/fbm, or construct a full vec3: `fbm(vec3(p.xz * 0.08, p.y * 0.05))`. Similarly, `vec2` only has `.x`/`.y`, cannot access `.z`/`.w`.
- **IMPORTANT: length() / floating-point precision**: `length(ivec2)` must first convert to `vec2`: `length(vec2(d))`. Exact floating-point equality comparison almost never works; use range comparison: `floor(p.y) == floor(height)`
# Voxel Rendering Skill
## Use Cases
- Rendering discrete volumetric data on regular 3D grids (Minecraft-style worlds, medical volume data, architectural voxel models)
- Pixel-accurate block/cube scenes
- "Block art", "3D pixel art", "low-poly voxel" visual styles
- Real-time voxel scenes in pure fragment shader environments like ShaderToy
- Advanced lighting effects including shadows, AO, and global illumination
## Core Principles
The core of voxel rendering is the **DDA (Digital Differential Analyzer) ray traversal algorithm**: cast a ray from the camera through each pixel, stepping through the 3D grid cell by cell along the ray direction until hitting an occupied voxel.
For ray `P(t) = rayPos + t * rayDir`, DDA maintains:
- **`mapPos`** = `floor(rayPos)`: current grid coordinate (integer)
- **`deltaDist`** = `abs(1.0 / rayDir)`: t cost to cross one cell
- **`sideDist`** = `(sign(rayDir) * (mapPos - rayPos) + sign(rayDir) * 0.5 + 0.5) * deltaDist`: t distance to the next boundary on each axis
Each step advances along the axis with the smallest `sideDist`, updating `sideDist += deltaDist` and `mapPos += rayStep`.
Normal on hit: `normal = -mask * rayStep`
Face UV is obtained by projecting the hit point onto the two tangent axes of the hit face.
## Implementation Steps
### Step 1: Camera Ray Construction
```glsl
vec2 screenPos = (fragCoord.xy / iResolution.xy) * 2.0 - 1.0;
vec3 cameraDir = vec3(0.0, 0.0, 0.8); // Focal length; larger = narrower FOV
vec3 cameraPlaneU = vec3(1.0, 0.0, 0.0);
vec3 cameraPlaneV = vec3(0.0, 1.0, 0.0) * iResolution.y / iResolution.x;
vec3 rayDir = cameraDir + screenPos.x * cameraPlaneU + screenPos.y * cameraPlaneV;
vec3 rayPos = vec3(0.0, 2.0, -12.0);
```
### Step 2: DDA Initialization
```glsl
ivec3 mapPos = ivec3(floor(rayPos));
vec3 rayStep = sign(rayDir);
vec3 deltaDist = abs(1.0 / rayDir); // Valid when rayDir is normalized; in general deltaDist = abs(length(rayDir) / rayDir)
vec3 sideDist = (sign(rayDir) * (vec3(mapPos) - rayPos) + (sign(rayDir) * 0.5) + 0.5) * deltaDist;
```
### Step 3: DDA Traversal Loop (Branchless Version)
```glsl
#define MAX_RAY_STEPS 64
bvec3 mask;
for (int i = 0; i < MAX_RAY_STEPS; i++) {
if (getVoxel(mapPos)) break;
// Branchless axis selection
mask = lessThanEqual(sideDist.xyz, min(sideDist.yzx, sideDist.zxy));
sideDist += vec3(mask) * deltaDist;
mapPos += ivec3(vec3(mask)) * ivec3(rayStep);
}
```
Alternative form (step version):
```glsl
vec3 mask = step(sideDist.xyz, sideDist.yzx) * step(sideDist.xyz, sideDist.zxy);
sideDist += mask * deltaDist;
mapPos += mask * rayStep;
```
### Step 4: Voxel Occupancy Function
```glsl
// Basic version: solid block (most common; use this when user asks for "voxel cube")
// IMPORTANT: getVoxel receives ivec3, but all internal calculations must use float!
bool getVoxel(ivec3 c) {
vec3 p = vec3(c) + vec3(0.5); // ivec3 → vec3 conversion (required!)
float d = sdBox(p, vec3(6.0)); // Solid 12x12x12 cube
return d < 0.0;
}
// Advanced version: SDF boolean operations (sphere carved from box = only corners remain)
bool getVoxelCarved(ivec3 c) {
vec3 p = vec3(c) + vec3(0.5);
float d = max(-sdSphere(p, 7.5), sdBox(p, vec3(6.0))); // box ∩ ¬sphere
return d < 0.0;
}
// Advanced version: height map terrain with material IDs
// IMPORTANT: all comparisons must use float! c.y is int and must be converted to float for comparison
// IMPORTANT: use range comparison, not exact equality (floating-point precision issues)
int getVoxelMaterial(ivec3 c) {
vec3 p = vec3(c); // ivec3 → vec3 conversion (required!)
float groundHeight = getTerrainHeight(p.xz); // p.xz is vec2, passes float parameters
if (float(c.y) < groundHeight) return 1; // int → float comparison
if (float(c.y) < groundHeight + 4.0) return 7; // int → float comparison
return 0;
}
// Pure float version (simpler, recommended):
int getVoxelMaterial(vec3 c) {
float groundHeight = getTerrainHeight(c.xz);
// IMPORTANT: Use range comparison, never exact equality!
if (c.y >= groundHeight && c.y < groundHeight + 1.0) return 1; // Grass top layer
if (c.y >= groundHeight - 3.0 && c.y < groundHeight) return 2; // Dirt layer
if (c.y < groundHeight - 3.0) return 3; // Stone layer
return 0;
}
// Advanced version: mountain terrain (height-based coloring: grass green → rock gray → snow white)
// IMPORTANT: Key 1: color thresholds must be based on heightRatio (normalized height 0~1), not absolute height!
// IMPORTANT: Key 2: maxH must match the actual maximum return value of getMountainHeight!
// If getMountainHeight returns at most 15.0, maxH must be 15.0, not arbitrarily 20.0
// IMPORTANT: Key 3: threshold spacing must be large enough (at least 0.2), otherwise color bands are too narrow to see
// IMPORTANT: Key 4: grass area typically covers the largest terrain area (low elevation); set grass threshold high (0.4) to ensure green is clearly visible
float maxH = 15.0; // IMPORTANT: Must equal the actual max value of getMountainHeight!
int getMountainVoxel(vec3 c) {
float height = getMountainHeight(c.xz); // Returns 0 ~ maxH
if (c.y > height) return 0; // Air
float heightRatio = c.y / maxH; // Normalize to 0~1
// IMPORTANT: Thresholds from low to high: grass < 0.4, rock 0.4~0.7, snow > 0.7
if (heightRatio < 0.4) return 1; // Grass (green) — largest area
if (heightRatio < 0.7) return 2; // Rock (gray)
return 3; // Snow cap (white)
}
// IMPORTANT: Corresponding material colors must have sufficient saturation and clear contrast:
// mat==1: vec3(0.25, 0.55, 0.15) Grass green (saturated green, must not be grayish!)
// mat==2: vec3(0.5, 0.45, 0.4) Rock gray-brown
// mat==3: vec3(0.92, 0.93, 0.96) Snow white
// IMPORTANT: Lighting must not be too bright or it washes out colors! Sun intensity ≤ 2.0, sky light ≤ 1.0
// IMPORTANT: Gamma correction pow(col, vec3(0.4545)) brightens dark colors and reduces saturation;
// if colors look grayish-white, make grass green more saturated: vec3(0.2, 0.5, 0.1)
// IMPORTANT: Rotating objects: to rotate a voxel object, apply inverse rotation to the sample point in getVoxel!
// Do not rotate the camera to simulate object rotation (that only changes the viewpoint)
bool getVoxelRotating(ivec3 c) {
vec3 p = vec3(c) + vec3(0.5);
// Rotate around Y axis: apply inverse rotation to sample point
float angle = -iTime; // Negative sign = inverse transform
float s = sin(angle), co = cos(angle);
p.xz = vec2(p.x * co - p.z * s, p.x * s + p.z * co);
float d = sdBox(p, vec3(6.0)); // Rotated solid cube
return d < 0.0;
}
```
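The snippets above call `getTerrainHeight` / `getMountainHeight` without defining them. A minimal value-noise sketch (the noise formula and scaling are illustrative, not the only option) that returns 0 ~ 15.0, matching `maxH = 15.0`:
```glsl
float hash21(vec2 p) { return fract(sin(dot(p, vec2(127.1, 311.7))) * 43758.5453); }
float valueNoise(vec2 p) {
    vec2 i = floor(p), f = fract(p);
    vec2 u = f * f * (3.0 - 2.0 * f); // Smoothstep interpolation between lattice points
    return mix(mix(hash21(i),                  hash21(i + vec2(1.0, 0.0)), u.x),
               mix(hash21(i + vec2(0.0, 1.0)), hash21(i + vec2(1.0, 1.0)), u.x), u.y);
}
float getMountainHeight(vec2 xz) {
    float h = 0.0, amp = 0.5;
    for (int i = 0; i < 4; i++) { // 4-octave FBM; raw sum is in [0, 0.9375)
        h += amp * valueNoise(xz * 0.15);
        xz *= 2.0;
        amp *= 0.5;
    }
    return h / 0.9375 * 15.0; // Normalize so the maximum actually equals maxH = 15.0
}
```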
### Step 5: Face Shading (Normal + Base Color)
```glsl
// mask here is the bvec3 axis mask selected by the DDA loop
vec3 normal = -vec3(mask) * rayStep;
vec3 color;
if (mask.x) color = vec3(0.5); // Side faces darkest
if (mask.y) color = vec3(1.0); // Top face brightest
if (mask.z) color = vec3(0.75); // Front/back faces medium
fragColor = vec4(color, 1.0);
```
### Step 6: Precise Hit Position and Face UV
```glsl
float t = dot(sideDist - deltaDist, vec3(mask));
vec3 hitPos = rayPos + rayDir * t;
vec3 uvw = hitPos - vec3(mapPos);
vec2 uv = vec2(dot(vec3(mask) * uvw.yzx, vec3(1.0)),
dot(vec3(mask) * uvw.zxy, vec3(1.0)));
```
### Step 7: Neighbor Voxel AO
```glsl
float vertexAo(vec2 side, float corner) {
return (side.x + side.y + max(corner, side.x * side.y)) / 3.0;
}
vec4 voxelAo(vec3 pos, vec3 d1, vec3 d2) {
vec4 side = vec4(
getVoxel(pos + d1), getVoxel(pos + d2),
getVoxel(pos - d1), getVoxel(pos - d2));
vec4 corner = vec4(
getVoxel(pos + d1 + d2), getVoxel(pos - d1 + d2),
getVoxel(pos - d1 - d2), getVoxel(pos + d1 - d2));
vec4 ao;
ao.x = vertexAo(side.xy, corner.x);
ao.y = vertexAo(side.yz, corner.y);
ao.z = vertexAo(side.zw, corner.z);
ao.w = vertexAo(side.wx, corner.w);
return 1.0 - ao;
}
// Bilinear interpolation
vec4 ambient = voxelAo(mapPos - rayStep * mask, mask.zxy, mask.yzx);
float ao = mix(mix(ambient.z, ambient.w, uv.x), mix(ambient.y, ambient.x, uv.x), uv.y);
ao = pow(ao, 1.0 / 3.0); // Gamma correction to control AO intensity
```
### Step 8: DDA Shadow Ray
```glsl
// IMPORTANT: Shadow steps must be capped at 16; total main ray + shadow ray steps should not exceed 80
#define MAX_SHADOW_STEPS 16
float castShadow(vec3 ro, vec3 rd) {
vec3 pos = floor(ro);
vec3 ri = 1.0 / rd;
vec3 rs = sign(rd);
vec3 dis = (pos - ro + 0.5 + rs * 0.5) * ri;
for (int i = 0; i < MAX_SHADOW_STEPS; i++) {
if (getVoxel(ivec3(pos))) return 0.0;
vec3 mm = step(dis.xyz, dis.yzx) * step(dis.xyz, dis.zxy);
dis += mm * rs * ri;
pos += mm * rs;
}
return 1.0;
}
vec3 sundir = normalize(vec3(-0.5, 0.6, 0.7));
float shadow = castShadow(hitPos + normal * 0.01, sundir);
float diffuse = max(dot(normal, sundir), 0.0) * shadow;
```
## Complete Code Template
```glsl
// === Voxel Rendering - Complete ShaderToy Template ===
// Includes: DDA traversal, face shading, neighbor AO, hard shadows
// IMPORTANT: Performance critical: SwiftShader software renderer (headless browser evaluation environment) cannot handle too many loop iterations
// Default 64+16=80 steps, suitable for most scenes. Simple scenes (single cube) can increase to 96+24
// Multi-building/character/Minecraft scenes must keep 64+16 or lower!
#define MAX_RAY_STEPS 64
#define MAX_SHADOW_STEPS 16
#define GRID_SIZE 16.0
// ---- Math Utilities ----
float sdSphere(vec3 p, float r) { return length(p) - r; }
float sdBox(vec3 p, vec3 b) {
vec3 d = abs(p) - b;
return min(max(d.x, max(d.y, d.z)), 0.0) + length(max(d, 0.0));
}
float hash31(vec3 n) { return fract(sin(dot(n, vec3(1.0, 113.0, 257.0))) * 43758.5453); }
vec2 rotate2d(vec2 v, float a) {
float s = sin(a), c = cos(a);
return vec2(v.x * c - v.y * s, v.y * c + v.x * s);
}
// ---- Voxel Scene Definition ----
// IMPORTANT: Default solid cube. Use sdBox for "voxel cube"; add SDF boolean ops for carved/sculpted shapes
int getVoxel(vec3 c) {
vec3 p = c + 0.5;
float d = sdBox(p, vec3(6.0)); // Solid 12x12x12 block
if (d < 0.0) {
if (p.y < -3.0) return 2;
return 1;
}
return 0;
}
// ---- Neighbor AO ----
float getOccupancy(vec3 c) { return float(getVoxel(c) > 0); }
float vertexAo(vec2 side, float corner) {
return (side.x + side.y + max(corner, side.x * side.y)) / 3.0;
}
vec4 voxelAo(vec3 pos, vec3 d1, vec3 d2) {
vec4 side = vec4(
getOccupancy(pos + d1), getOccupancy(pos + d2),
getOccupancy(pos - d1), getOccupancy(pos - d2));
vec4 corner = vec4(
getOccupancy(pos + d1 + d2), getOccupancy(pos - d1 + d2),
getOccupancy(pos - d1 - d2), getOccupancy(pos + d1 - d2));
vec4 ao;
ao.x = vertexAo(side.xy, corner.x);
ao.y = vertexAo(side.yz, corner.y);
ao.z = vertexAo(side.zw, corner.z);
ao.w = vertexAo(side.wx, corner.w);
return 1.0 - ao;
}
// ---- DDA Traversal Core ----
struct HitInfo {
bool hit;
float t;
vec3 pos;
vec3 normal;
vec3 mapPos;
vec2 uv;
int mat;
};
HitInfo castRay(vec3 ro, vec3 rd, int maxSteps) {
HitInfo info;
info.hit = false;
info.t = 0.0;
vec3 mapPos = floor(ro);
vec3 rayStep = sign(rd);
vec3 deltaDist = abs(1.0 / rd);
vec3 sideDist = (rayStep * (mapPos - ro) + rayStep * 0.5 + 0.5) * deltaDist;
vec3 mask = vec3(0.0);
for (int i = 0; i < maxSteps; i++) {
int vox = getVoxel(mapPos);
if (vox > 0) {
info.hit = true;
info.mat = vox;
info.normal = -mask * rayStep;
info.mapPos = mapPos;
info.t = dot(sideDist - deltaDist, mask);
info.pos = ro + rd * info.t;
vec3 uvw = info.pos - mapPos;
info.uv = vec2(dot(mask * uvw.yzx, vec3(1.0)),
dot(mask * uvw.zxy, vec3(1.0)));
return info;
}
mask = step(sideDist.xyz, sideDist.yzx) * step(sideDist.xyz, sideDist.zxy);
sideDist += mask * deltaDist;
mapPos += mask * rayStep;
}
return info;
}
// ---- Shadow Ray ----
// IMPORTANT: Shadow steps at 16 (combined with main ray 64 = 80, within SwiftShader safe range)
float castShadow(vec3 ro, vec3 rd) {
vec3 pos = floor(ro);
vec3 ri = 1.0 / rd;
vec3 rs = sign(rd);
vec3 dis = (pos - ro + 0.5 + rs * 0.5) * ri;
for (int i = 0; i < MAX_SHADOW_STEPS; i++) {
// IMPORTANT: getVoxel returns int; comparison must use int constant (0), not float (0.0)
if (getVoxel(pos) > 0) return 0.0;
vec3 mm = step(dis.xyz, dis.yzx) * step(dis.xyz, dis.zxy);
dis += mm * rs * ri;
pos += mm * rs;
}
return 1.0;
}
// ---- Material Colors ----
// IMPORTANT: Texture coloring key: "low saturation" does not mean "near white/gray"!
// Low saturation = colorful but not vivid, must retain clear hue differences (e.g., brick red 0.55,0.35,0.3 not gray-white 0.8,0.8,0.8)
// Brick/stone textures: use UV periodic patterns (mortar lines = dark lines), never use solid colors!
vec3 getMaterialColor(int mat, vec2 uv) {
vec3 col = vec3(0.6);
if (mat == 1) col = vec3(0.7, 0.7, 0.75);
if (mat == 2) col = vec3(0.4, 0.55, 0.3);
float checker = mod(floor(uv.x * 4.0) + floor(uv.y * 4.0), 2.0);
col *= 0.85 + 0.15 * checker;
return col;
}
// ---- Brick/Stone Texture Coloring (use this to replace getMaterialColor when user requests "brick texture") ----
// IMPORTANT: Key: brick texture = UV periodic pattern (staggered rows + mortar dark lines), not solid color!
vec3 getBrickColor(vec2 uv, vec3 baseColor, vec3 mortarColor) {
vec2 brickUV = uv * vec2(4.0, 8.0);
float row = floor(brickUV.y);
brickUV.x += mod(row, 2.0) * 0.5; // Staggered row offset
vec2 f = fract(brickUV);
float mortar = step(f.x, 0.06) + step(f.y, 0.08); // Mortar joints
mortar = clamp(mortar, 0.0, 1.0);
float noise = fract(sin(dot(floor(brickUV), vec2(12.9898, 78.233))) * 43758.5453);
vec3 brickVariation = baseColor * (0.85 + 0.3 * noise); // Slight color variation per brick
return mix(brickVariation, mortarColor, mortar);
}
// Usage example (maze walls):
// if (mat == 1) col = getBrickColor(uv, vec3(0.55, 0.35, 0.3), vec3(0.4, 0.38, 0.35)); // Brick red + mortar
// if (mat == 2) col = getBrickColor(uv, vec3(0.5, 0.48, 0.42), vec3(0.35, 0.33, 0.3)); // Gray stone brick
// ---- Main Function ----
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
vec2 screenPos = (fragCoord.xy / iResolution.xy) * 2.0 - 1.0;
screenPos.x *= iResolution.x / iResolution.y;
vec3 ro = vec3(0.0, 2.0 * sin(iTime * 0.5), -12.0);
vec3 forward = vec3(0.0, 0.0, 0.8);
vec3 rd = normalize(forward + vec3(screenPos, 0.0));
ro.xz = rotate2d(ro.xz, iTime * 0.3);
rd.xz = rotate2d(rd.xz, iTime * 0.3);
vec3 sunDir = normalize(vec3(-0.5, 0.6, 0.7));
vec3 skyColor = vec3(0.6, 0.75, 0.9);
HitInfo hit = castRay(ro, rd, MAX_RAY_STEPS);
vec3 col;
if (hit.hit) {
vec3 matCol = getMaterialColor(hit.mat, hit.uv);
vec3 mask = abs(hit.normal);
vec4 ambient = voxelAo(hit.mapPos + hit.normal, mask.zxy, mask.yzx); // Sample AO in the air cell in front of the hit face, as in Step 7
float ao = mix(
mix(ambient.z, ambient.w, hit.uv.x),
mix(ambient.y, ambient.x, hit.uv.x),
hit.uv.y);
ao = pow(ao, 0.5);
float shadow = castShadow(hit.pos + hit.normal * 0.01, sunDir);
float diff = max(dot(hit.normal, sunDir), 0.0);
float sky = 0.5 + 0.5 * hit.normal.y;
vec3 lighting = vec3(0.0);
// IMPORTANT: Mountain/terrain scenes: sun light ≤ 2.0, sky light ≤ 1.0; too bright washes out material color differences
lighting += 2.0 * diff * vec3(1.0, 0.95, 0.8) * shadow;
lighting += 1.0 * sky * skyColor;
lighting *= ao;
col = matCol * lighting;
// IMPORTANT: Fog: coefficient should not be too large, otherwise nearby objects get swallowed into pure sky color
// 0.0002 suits GRID_SIZE=16 scenes; use smaller coefficients for larger scenes
float fog = 1.0 - exp(-0.0002 * hit.t * hit.t);
col = mix(col, skyColor, clamp(fog, 0.0, 0.7)); // Clamp prevents objects from disappearing entirely
} else {
col = skyColor - rd.y * 0.2;
}
col = pow(clamp(col, 0.0, 1.0), vec3(0.4545));
fragColor = vec4(col, 1.0);
}
```
## Common Variants
### Variant 1: Glowing Voxels (Glow Accumulation)
Accumulate distance-based glow values during DDA traversal; produces semi-transparent glow even on miss.
```glsl
float glow = 0.0;
for (int i = 0; i < MAX_RAY_STEPS; i++) {
float d = sdSomeShape(vec3(mapPos));
glow += 0.015 / (0.01 + d * d);
if (d < 0.0) break;
// ... normal DDA stepping ...
}
vec3 col = baseColor + glow * vec3(0.4, 0.6, 1.0);
```
### Variant 2: Rounded Voxels (Intra-voxel SDF Refinement)
After DDA hit, perform SDF ray march inside the voxel to render rounded blocks.
```glsl
float sdRoundedBox(vec3 p, float w) {
    return length(max(abs(p) - 0.5 + w, 0.0)) - w;
}
// After the DDA hit (function definitions must precede use in GLSL):
float id = hash31(mapPos);
float w = 0.05 + 0.35 * id; // Per-voxel corner radius
vec3 localP = hitPos - mapPos - 0.5;
for (int j = 0; j < 6; j++) {
    float h = sdRoundedBox(localP, w);
    if (h < 0.025) break;
    localP += rd * max(0.0, h);
}
```
### Variant 3: Hybrid SDF-Voxel Traversal
SDF sphere-tracing with large steps at distance, switching to precise DDA near the surface.
```glsl
#define VOXEL_SIZE 0.0625
#define SWITCH_DIST (VOXEL_SIZE * 1.732)
// Sketch: MAX_STEPS, mapSDF, and getVoxelPos are assumed defined elsewhere
float t = 0.0;
vec3 ird = 1.0 / rd;
vec3 voxelPos = vec3(0.0);
bool useVoxel = false;
for (int i = 0; i < MAX_STEPS; i++) {
vec3 pos = ro + rd * t;
float d = mapSDF(useVoxel ? voxelCenter : pos);
if (!useVoxel) {
t += d;
if (d < SWITCH_DIST) { useVoxel = true; voxelPos = getVoxelPos(pos); }
} else {
if (d < 0.0) break;
if (d > SWITCH_DIST) { useVoxel = false; t += d; continue; }
vec3 exitT = (voxelPos - ro + 0.5 * VOXEL_SIZE * (sign(rd) + 1.0)) * ird; // Per-axis distance to the voxel exit plane
// ... select minimum axis of exitT, advance t and voxelPos by one voxel ...
}
}
```
### Variant 4: Voxel Cone Tracing
Build multi-level mipmaps, cast cone-shaped rays from hit points for global illumination.
```glsl
vec4 traceCone(vec3 origin, vec3 dir, float coneRatio) {
vec4 light = vec4(0.0);
float t = 1.0;
for (int i = 0; i < 58; i++) {
vec3 sp = origin + dir * t;
float diameter = max(1.0, t * coneRatio);
float lod = log2(diameter);
vec4 vox = voxelFetch(sp, lod); // 'sample' is a GLSL reserved word — never use it as a variable name
light += vox * (1.0 - light.w);
t += diameter;
}
return light;
}
```
### Variant 5: PBR Lighting + Multi-Bounce Reflection
Burley (Disney) diffuse replacing plain Lambert, with a roughness parameter; cast a second DDA ray and Fresnel-weight the reflection.
```glsl
// Burley (Disney) diffuse term — commonly paired with a GGX specular lobe
float burleyDiffuse(float NoL, float NoV, float LoH, float roughness) {
float FD90 = 0.5 + 2.0 * roughness * LoH * LoH;
float a = 1.0 + (FD90 - 1.0) * pow(1.0 - NoL, 5.0);
float b = 1.0 + (FD90 - 1.0) * pow(1.0 - NoV, 5.0);
return a * b / 3.14159;
}
vec3 rd2 = reflect(rd, normal);
HitInfo reflHit = castRay(hitPos + normal * 0.001, rd2, 64);
vec3 reflColor = reflHit.hit ? shade(reflHit) : skyColor;
float fresnel = 0.04 + 0.96 * pow(1.0 - max(dot(normal, -rd), 0.0), 5.0);
col += fresnel * reflColor;
```
### Variant 6: Voxel Water Scene (Water + Underwater Voxels)
Water surface ripple reflections, underwater refraction, sand and seaweed for a complete water scene.
```glsl
float waterY = 0.0;
// Underwater voxel scene definition (sand + seaweed)
// IMPORTANT: All coordinate operations must use correct vector dimensions!
// c.xz returns vec2, only has .x/.y components, cannot use .z!
int getVoxel(vec3 c) {
float sandHeight = -3.0 + 0.5 * sin(c.x * 0.3) * cos(c.z * 0.4);
if (c.y < sandHeight) return 1; // Sand interior
if (c.y < sandHeight + 1.0) return 2; // Sand surface
// Seaweed: only grows underwater, above sand
float grassHash = fract(sin(dot(floor(c.xz), vec2(12.9898, 78.233))) * 43758.5453);
// IMPORTANT: floor(c.xz) is vec2; the second argument to dot() must also be vec2
if (grassHash > 0.85 && c.y >= sandHeight + 1.0 && c.y < sandHeight + 1.0 + 3.0 * grassHash) {
return 3; // Seaweed
}
return 0;
}
// Handle water surface in main rendering
float tWater = (waterY - ro.y) / rd.y;
bool hitWater = tWater > 0.0 && (tWater < hit.t || !hit.hit);
if (hitWater) {
vec3 waterPos = ro + rd * tWater;
vec3 waterNormal = vec3(0.0, 1.0, 0.0);
// IMPORTANT: waterPos.xz is vec2; access with .x/.y (not .x/.z)
vec2 waveXZ = waterPos.xz; // vec2: waveXZ.x = worldX, waveXZ.y = worldZ
waterNormal.x += 0.05 * sin(waveXZ.x * 3.0 + iTime);
waterNormal.z += 0.05 * cos(waveXZ.y * 2.0 + iTime * 0.7);
waterNormal = normalize(waterNormal);
float fresnel = 0.04 + 0.96 * pow(1.0 - max(dot(waterNormal, -rd), 0.0), 5.0);
// Reflection
vec3 reflDir = reflect(rd, waterNormal);
HitInfo reflHit = castRay(waterPos + waterNormal * 0.01, reflDir, 64);
vec3 reflCol = reflHit.hit ? getMaterialColor(reflHit.mat, reflHit.uv) : skyColor;
// Refraction (underwater voxels: sand, seaweed)
vec3 refrDir = refract(rd, waterNormal, 1.0 / 1.33);
HitInfo refrHit = castRay(waterPos - waterNormal * 0.01, refrDir, 64);
vec3 refrCol;
if (refrHit.hit) {
vec3 matCol = getMaterialColor(refrHit.mat, refrHit.uv);
float underwaterDist = length(refrHit.pos - waterPos);
refrCol = mix(matCol, vec3(0.0, 0.15, 0.3), 1.0 - exp(-0.1 * underwaterDist));
} else {
refrCol = vec3(0.0, 0.1, 0.3);
}
col = mix(refrCol, reflCol, fresnel);
col = mix(col, vec3(0.0, 0.3, 0.5), 0.2);
}
```
### Variant 7: Rotating Voxel Objects
Rotate voxel objects as a whole. Core: apply inverse rotation to sample points in getVoxel.
```glsl
// IMPORTANT: Correct way to rotate objects: apply inverse rotation to sample coordinates in getVoxel
// Wrong approach: only rotate the camera (that just changes the viewpoint, not the object)
int getVoxel(vec3 c) {
vec3 p = c + 0.5;
// Rotate around Y axis
float angle = -iTime * 0.5;
float s = sin(angle), co = cos(angle);
p.xz = vec2(p.x * co - p.z * s, p.x * s + p.z * co);
// Can also rotate around multiple axes:
// p.yz = vec2(p.y * co2 - p.z * s2, p.y * s2 + p.z * co2); // X axis rotation
float d = sdBox(p, vec3(6.0));
if (d < 0.0) return 1;
return 0;
}
```
### Variant 8: Indoor/Cave/Enclosed Scenes (Point Lights + High Ambient Lighting)
Indoor, cave, underground, sci-fi base, and other enclosed or semi-enclosed scenes require point lights and high ambient lighting.
```glsl
// IMPORTANT: Key points for enclosed/semi-enclosed scenes (caves, interiors, sci-fi bases, mazes, etc.):
// 1. Camera must be placed inside the cavity (a position where getVoxel returns 0)
// 2. Must use point lights, not just directional light (directional light blocked by walls/ceiling = total darkness!)
// 3. Ambient light must be high enough (at least 0.2-0.3) to prevent scene from being too dark to see details
// 4. Can use multiple point lights + emissive voxels to simulate torches/fluorescence/holographic displays
// 5. Sci-fi scene metallic walls need bright enough light sources to show reflections
// 6. Emissive elements (holographic screens, indicator lights, magic circles) use emissive materials: add emissive color directly to lighting
// Cave scene: cavity = area where getVoxel returns 0
// IMPORTANT: Cave/terrain noise functions must respect vector dimensions!
// p.xz is vec2; if noise/fbm function takes vec3, construct a full vec3:
// Correct: fbm(vec3(p.xz, p.y * 0.5)) or use vec2 version of noise
// Wrong: fbm(p.xz + vec3(...)) ← vec2 + vec3 compile failure!
int getVoxel(vec3 c) {
float cave = sdSphere(c + 0.5, 12.0);
// IMPORTANT: For noise-carved detail, use c's components directly (all float)
cave += 2.0 * sin(c.x * 0.3) * sin(c.y * 0.4) * sin(c.z * 0.35);
if (cave > 0.0) return 1; // Rock wall
return 0; // Cavity (camera goes here)
}
// Point light attenuation
vec3 pointLightPos = vec3(0.0, 3.0, 0.0);
vec3 toLight = pointLightPos - hit.pos;
float lightDist = length(toLight);
vec3 lightDir = toLight / lightDist;
float attenuation = 1.0 / (1.0 + 0.1 * lightDist + 0.01 * lightDist * lightDist);
float diff = max(dot(hit.normal, lightDir), 0.0);
float shadow = castShadow(hit.pos + hit.normal * 0.01, lightDir);
vec3 lighting = vec3(0.0);
// IMPORTANT: High ambient light to prevent total darkness (required for enclosed scenes! at least 0.2)
lighting += vec3(0.25, 0.22, 0.2); // Warm ambient light
lighting += 3.0 * diff * attenuation * vec3(1.0, 0.8, 0.5) * shadow; // Point light
// Multiple torches/emissive objects (use sin for flicker animation)
vec3 torch1 = vec3(5.0, 2.0, 3.0);
vec3 torch2 = vec3(-4.0, 1.0, -5.0);
float flicker1 = 0.8 + 0.2 * sin(iTime * 5.0 + 1.0);
float flicker2 = 0.8 + 0.2 * sin(iTime * 4.3 + 2.7);
lighting += calcPointLight(hit.pos, hit.normal, torch1, vec3(1.0, 0.6, 0.2)) * flicker1;
lighting += calcPointLight(hit.pos, hit.normal, torch2, vec3(0.2, 1.0, 0.5)) * flicker2;
// Emissive materials (holographic displays, fluorescent moss, indicator lights, magic circles, etc.)
// IMPORTANT: Emissive colors are added directly to lighting, unaffected by shadows
if (hit.mat == 2) {
lighting += vec3(0.1, 0.4, 0.15); // Fluorescent moss (faint green)
}
if (hit.mat == 3) {
float pulse = 0.7 + 0.3 * sin(iTime * 2.0);
lighting += vec3(0.2, 0.6, 1.0) * pulse; // Blue pulse light
}
col = matCol * lighting;
```
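The snippet above calls `calcPointLight` without defining it. A sketch consistent with the inline point-light code earlier in the variant (same attenuation curve; `castShadow` is the DDA shadow ray from Step 8):
```glsl
vec3 calcPointLight(vec3 pos, vec3 normal, vec3 lightPos, vec3 lightColor) {
    vec3 toLight = lightPos - pos;
    float dist = length(toLight);
    vec3 dir = toLight / dist;
    float atten = 1.0 / (1.0 + 0.1 * dist + 0.01 * dist * dist); // Same falloff as the main point light
    float diff = max(dot(normal, dir), 0.0);
    float shadow = castShadow(pos + normal * 0.01, dir); // Occlusion toward this light
    return diff * atten * lightColor * shadow;
}
```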
### Variant 9: Voxel Character Animation
Simple voxel character animation using time-driven offsets and rotations.
```glsl
// IMPORTANT: Voxel character animation core approach:
// 1. Split the character into multiple body parts (head, torso, left arm, right arm, left leg, right leg)
// 2. Each part is an sdBox with independent offset/rotation parameters
// 3. iTime drives limb swinging (sin/cos periodic motion)
// 4. Combine all parts using SDF min()
// IMPORTANT: SwiftShader performance critical: character function is called at every DDA step!
// Must add AABB bounding box check in getVoxel: first check if c is near the character,
// skip sdBox calculations for that character if not nearby. Otherwise frame timeout → black screen
// Reduce MAX_RAY_STEPS to 64, MAX_SHADOW_STEPS to 16
int getCharacter(vec3 p, vec3 charPos, float animPhase) {
vec3 lp = p - charPos;
float limbSwing = sin(iTime * 4.0 + animPhase) * 0.5;
// Torso
float body = sdBox(lp - vec3(0, 3, 0), vec3(1.5, 2.0, 1.0));
// Head
float head = sdBox(lp - vec3(0, 6, 0), vec3(1.2, 1.2, 1.2));
// Arm swing (offset y coordinate around shoulder joint to simulate rotation)
vec3 armOffset = vec3(0, limbSwing * 2.0, limbSwing);
float leftArm = sdBox(lp - vec3(-2.5, 3, 0) - armOffset, vec3(0.5, 2.0, 0.5));
float rightArm = sdBox(lp - vec3(2.5, 3, 0) + armOffset, vec3(0.5, 2.0, 0.5));
// Alternating leg swing
vec3 legOffset = vec3(0, 0, limbSwing * 1.5);
float leftLeg = sdBox(lp - vec3(-0.7, 0, 0) - legOffset, vec3(0.5, 1.5, 0.5));
float rightLeg = sdBox(lp - vec3(0.7, 0, 0) + legOffset, vec3(0.5, 1.5, 0.5));
float d = min(body, min(head, min(leftArm, min(rightArm, min(leftLeg, rightLeg)))));
if (d < 0.0) {
if (head < 0.0) return 10; // Head (skin color)
if (leftArm < 0.0 || rightArm < 0.0) return 11; // Arms
return 12; // Torso/legs
}
return 0;
}
// Combine scene + characters in getVoxel
// IMPORTANT: Must add AABB bounding box early exit! Character sdBox calculations are expensive
int getVoxel(vec3 c) {
// Scene (floor, walls, etc.)
int scene = getSceneVoxel(c);
if (scene > 0) return scene;
// IMPORTANT: AABB check: only call getCharacter near the character
// Character 1: warrior (at position (5,0,0)), bounding box ±5 cells
if (abs(c.x - 5.0) < 5.0 && c.y >= 0.0 && c.y < 10.0 && abs(c.z) < 5.0) {
int char1 = getCharacter(c, vec3(5, 0, 0), 0.0);
if (char1 > 0) return char1;
}
// Character 2: mage (at position (-5,0,3)), bounding box ±5 cells
if (abs(c.x + 5.0) < 5.0 && c.y >= 0.0 && c.y < 10.0 && abs(c.z - 3.0) < 5.0) {
int char2 = getCharacter(c, vec3(-5, 0, 3), 3.14);
if (char2 > 0) return char2;
}
return 0;
}
```
### Variant 10: Waterfall / Flowing Water Particle Effects
Dynamic waterfall, splash particles, water mist effects. Core: time-offset noise simulates water flow, hashed particles simulate splashes, exponential decay simulates mist.
```glsl
// IMPORTANT: Key points for waterfall/flowing water/particle effects:
// 1. Waterfall stream: noise + iTime vertical offset simulates water column flowing down
// 2. Splash particles: hash-distributed voxels at the bottom, positions change with iTime to simulate splashing
// 3. Water mist: semi-transparent accumulation (reduced alpha) or density field at the bottom simulates mist diffusion
// 4. Waterfall must have a clear high point (cliff/rock wall) and low point (pool), drop ≥ 10 cells
// 5. Water stream material uses light blue-white + brightness flicker to simulate flowing water feel
float hash21(vec2 p) { return fract(sin(dot(p, vec2(127.1, 311.7))) * 43758.5453); }
int getVoxel(vec3 c) {
// Cliff rock walls (both sides + back)
if (c.x < -5.0 || c.x > 5.0) {
if (c.y < 15.0 && c.z > -3.0 && c.z < 3.0) return 1; // Rock
}
if (c.z > 2.0 && c.y < 15.0 && abs(c.x) < 6.0) return 1; // Back wall
// Cliff top platform
if (c.y >= 13.0 && c.y < 15.0 && c.z > -1.0 && c.z < 3.0 && abs(c.x) < 5.0) return 1;
// Bottom pool floor
if (c.y < -2.0 && abs(c.x) < 8.0 && c.z > -6.0 && c.z < 3.0) return 2; // Pool bottom
// IMPORTANT: Waterfall stream: narrow band x ∈ [-2, 2], falling from y=13 to y=0
// Use iTime offset on y-coordinate noise to simulate downward water flow
if (abs(c.x) < 2.0 && c.y >= 0.0 && c.y < 13.0 && c.z > -1.0 && c.z < 1.0) {
float flowNoise = hash21(vec2(floor(c.x), floor(c.y - iTime * 8.0)));
if (flowNoise > 0.25) return 3; // Water (gaps simulate translucent water curtain)
}
// IMPORTANT: Splash particles: bottom y ∈ [-1, 3], x ∈ [-4, 4]
// Use hash + iTime to generate randomly bouncing voxel particles
if (c.y >= -1.0 && c.y < 3.0 && abs(c.x) < 4.0 && c.z > -3.0 && c.z < 2.0) {
float t = iTime * 3.0;
float particleHash = hash21(vec2(floor(c.x * 2.0), floor(c.z * 2.0) + floor(t)));
float yOffset = fract(t + particleHash) * 3.0; // Particle upward trajectory
if (abs(c.y - yOffset) < 0.6 && particleHash > 0.7) return 4; // Splash particle
}
// IMPORTANT: Water mist: bottom y ∈ [-1, 2], wider range than splashes
// Density decreases with height and distance from waterfall center
if (c.y >= -1.0 && c.y < 2.0 && abs(c.x) < 6.0 && c.z > -5.0 && c.z < 3.0) {
float distFromCenter = length(vec2(c.x, c.z));
float mistDensity = exp(-0.15 * distFromCenter) * exp(-0.5 * max(c.y, 0.0));
float mistNoise = hash21(vec2(floor(c.x * 0.5 + iTime * 0.5), floor(c.z * 0.5)));
if (mistNoise < mistDensity * 0.8) return 5; // Water mist
}
return 0;
}
// Material colors
vec3 getMaterialColor(int mat, vec2 uv) {
if (mat == 1) return vec3(0.45, 0.4, 0.35); // Rock
if (mat == 2) return vec3(0.35, 0.3, 0.25); // Pool bottom
if (mat == 3) { // Water stream (shimmering blue-white)
float shimmer = 0.8 + 0.2 * sin(uv.y * 20.0 + iTime * 10.0);
return vec3(0.6, 0.8, 1.0) * shimmer;
}
if (mat == 4) return vec3(0.85, 0.92, 1.0); // Splash (bright white)
if (mat == 5) return vec3(0.7, 0.82, 0.9); // Water mist (pale blue-white)
return vec3(0.5);
}
// IMPORTANT: Water mist material needs special lighting: high emissive + translucent feel
// During shading:
if (hit.mat == 5) {
lighting += vec3(0.4, 0.5, 0.6); // Water mist emissive (unaffected by shadows)
}
// Camera: side angle slightly elevated, showing the full waterfall (top to bottom + bottom splashes and mist)
// ro = vec3(12.0, 10.0, -10.0), lookAt = vec3(0.0, 6.0, 0.0)
```
### Variant 11: Multi-Building / Town / Minecraft-Style Scenes (Multi-Structure Town Composition)
Towns, villages, Minecraft-style worlds, and other scenes requiring multiple discrete structures (houses, trees, lampposts, etc.) placed on the ground.
**IMPORTANT: "Minecraft-like voxel scene" = multi-building scene; must follow the performance constraints of this template!**
```glsl
// IMPORTANT: Key points for multi-building scenes:
// 1. Define the ground first (height map or flat plane), ensure ground getVoxel returns correct material
// 2. Each building uses an independent helper function, receiving local coordinates, returning material ID
// 3. In getVoxel, check each building sequentially (using offset coordinates), return on first hit
// 4. Camera must be outside the scene facing the center, far enough to see the full view
// 5. IMPORTANT: Building coordinate ranges must be within DDA traversal range (MAX_RAY_STEPS * cell ≈ reachable distance)
// 6. IMPORTANT: Scene range should not be too large! Concentrate all buildings within -20~20 range, camera 30-50 cells away
// 7. IMPORTANT: SwiftShader performance critical: getVoxel must have AABB bounding box early exit!
// Above ground (c.y > 0), check AABB range first; return 0 immediately if outside building area
// Otherwise every DDA step checks all buildings → frame timeout → black screen / only sky renders
// 8. IMPORTANT: MAX_RAY_STEPS reduced to 64, MAX_SHADOW_STEPS to 16 (complex getVoxel requires lower step counts)
// Single house: width w, depth d, height h, with triangular roof
int makeHouse(vec3 p, float w, float d, float h, int wallMat, int roofMat) {
// Walls
if (p.x >= 0.0 && p.x < w && p.z >= 0.0 && p.z < d && p.y >= 0.0 && p.y < h) {
return wallMat;
}
// Triangular roof: starts from wall top, x range narrows by 1 per level
float roofY = p.y - h;
float roofInset = roofY; // Inset by 1 cell per level
if (roofY >= 0.0 && roofY < w * 0.5
&& p.x >= roofInset && p.x < w - roofInset
&& p.z >= 0.0 && p.z < d) {
return roofMat;
}
return 0;
}
// Tree: trunk + spherical canopy
int makeTree(vec3 p, float trunkH, float crownR, int trunkMat, int leafMat) {
// Trunk (1x1 column)
if (p.x >= -0.5 && p.x < 0.5 && p.z >= -0.5 && p.z < 0.5
&& p.y >= 0.0 && p.y < trunkH) {
return trunkMat;
}
// Spherical canopy
vec3 crownCenter = vec3(0.0, trunkH + crownR * 0.5, 0.0);
if (length(p - crownCenter) < crownR) {
return leafMat;
}
return 0;
}
// Lamppost: thin pole + glowing top block
int makeLamp(vec3 p, float h, int poleMat, int lightMat) {
if (p.x >= -0.3 && p.x < 0.3 && p.z >= -0.3 && p.z < 0.3
&& p.y >= 0.0 && p.y < h) {
return poleMat; // Pole
}
if (p.x >= -0.5 && p.x < 0.5 && p.z >= -0.5 && p.z < 0.5
&& p.y >= h && p.y < h + 1.0) {
return lightMat; // Lamp head (emissive)
}
return 0;
}
int getVoxel(vec3 c) {
// 1. Ground (y < 0 is underground, y == 0 layer is surface)
if (c.y < -1.0) return 0;
if (c.y < 0.0) return 1; // Ground (dirt/grass)
// 2. Road (along z direction, x range -2~2)
if (c.y < 1.0 && abs(c.x) < 2.0) return 2; // Road surface
// IMPORTANT: AABB bounding box early exit (required for SwiftShader!)
// All buildings are within x:-15~15, y:0~12, z:-5~15
// Return 0 immediately outside this range, avoiding per-building checks
if (c.x < -15.0 || c.x > 15.0 || c.y > 12.0 || c.z < -5.0 || c.z > 15.0) return 0;
// 3. Place buildings (each with offset coordinates)
// IMPORTANT: House width/height must be ≥ 5 cells, otherwise they look like dots from far away! Use bright material colors
int m;
// House A: position (5, 0, 3), width 6, depth 5, height 5
m = makeHouse(c - vec3(5.0, 0.0, 3.0), 6.0, 5.0, 5.0, 3, 4);
if (m > 0) return m;
// House B: position (-10, 0, 2), width 7, depth 5, height 5
m = makeHouse(c - vec3(-10.0, 0.0, 2.0), 7.0, 5.0, 5.0, 5, 4);
if (m > 0) return m;
// Tree: position (0, 0, 8)
m = makeTree(c - vec3(0.0, 0.0, 8.0), 4.0, 2.5, 6, 7);
if (m > 0) return m;
// Lamppost: position (3, 0, 0)
m = makeLamp(c - vec3(3.0, 0.0, 0.0), 5.0, 8, 9);
if (m > 0) return m;
return 0;
}
// IMPORTANT: Camera setup: must be far enough to overlook the entire town
// Recommended: ro = vec3(0, 15, -35), looking at scene center vec3(0, 3, 5)
vec3 ro = vec3(0.0, 15.0, -35.0);
vec3 lookAt = vec3(0.0, 3.0, 5.0);
vec3 forward = normalize(lookAt - ro);
vec3 right = normalize(cross(forward, vec3(0, 1, 0)));
vec3 up = cross(right, forward);
vec3 rd = normalize(forward * 0.8 + right * screenPos.x + up * screenPos.y);
// IMPORTANT: Sunset/side-lit scene key: when light comes from the side or at low angle, building fronts may be completely backlit turning into black silhouettes!
// Must satisfy all: (1) ambient light ≥ 0.3 (prevent backlit faces from going black); (2) house walls use bright materials (e.g., light yellow 0.85,0.75,0.55)
// (3) house dimensions must not be too small (width/height ≥ 5 cells), otherwise they look like dots from far away
vec3 sunDir = normalize(vec3(-0.8, 0.3, 0.5)); // Sunset low angle
vec3 sunColor = vec3(1.0, 0.6, 0.3); // Warm orange
vec3 ambientColor = vec3(0.35, 0.3, 0.4); // IMPORTANT: High ambient light (≥0.3) to prevent silhouettes
// lighting = ambientColor + diff * sunColor * shadow;
```
## Performance & Composition
**Performance Tips:**
- Early exit: break immediately when `mapPos` exceeds scene bounds
- Shadow ray steps of 16-24 are sufficient
- Use SDF sphere-tracing with large steps in open areas, switch to DDA near surfaces
- Material queries, AO, normals, etc. are only computed after hit
- Replace procedural voxel queries with `texelFetch` texture sampling
- Multi-frame accumulation + reprojection for low-noise results
- **IMPORTANT: MAX_RAY_STEPS defaults to 64, MAX_SHADOW_STEPS defaults to 16 (total 80)**. Only simple scenes (single cube/sphere) can increase to 96+24. Multi-building/Minecraft/character scenes with complex getVoxel must keep 64+16 or lower, otherwise SwiftShader frame timeout → only sky background renders
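The early-exit tip amounts to one extra check at the top of the DDA loop in `castRay` (the bounds margin here is illustrative):
```glsl
for (int i = 0; i < MAX_RAY_STEPS; i++) {
    // Early exit: once the ray leaves the scene's bounding box, no voxel can be hit
    if (any(greaterThan(abs(mapPos), vec3(GRID_SIZE + 2.0)))) break;
    int vox = getVoxel(mapPos);
    // ... rest of the DDA step as in the template ...
}
```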
**Composition Tips:**
- **Procedural noise terrain**: use FBM/Perlin noise height maps inside `getVoxel()`
- **SDF procedural modeling**: use SDF boolean operations inside `getVoxel()` to define shapes
- **Texture mapping**: after hit, sample 16x16 pixel textures using face UV * 16
- **Atmospheric scattering / volumetric fog**: accumulate medium density during DDA traversal
- **Water surface rendering**: Fresnel reflection/refraction on a specific Y plane (see Variant 6 above)
- **Global illumination**: cone tracing or Monte Carlo hemisphere sampling
- **Temporal reprojection**: multi-frame accumulation + previous frame reprojection for anti-aliasing and denoising
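The `texelFetch` tip, sketched for a 16x16x16 occupancy grid baked into a Buffer pass. The 64x64 slice packing below is a hypothetical layout, not a ShaderToy convention — the writer pass must use the same mapping:
```glsl
// Buffer A stores occupancy: z-slice z is packed at tile (z % 4, z / 4) of a 64x64 texture
int getVoxelTex(vec3 c) {
    ivec3 ic = ivec3(floor(c));
    if (any(lessThan(ic, ivec3(0))) || any(greaterThanEqual(ic, ivec3(16)))) return 0;
    ivec2 texel = ivec2(ic.x + (ic.z % 4) * 16, ic.y + (ic.z / 4) * 16);
    return int(texelFetch(iChannel0, texel, 0).r > 0.5); // One fetch replaces the procedural query
}
```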
## Common Errors
1. **GLSL reserved words causing compile failure**: `cast`, `class`, `template`, `namespace`, `input`, `output`, `filter`, `image`, `sampler`, `half`, `fixed`, etc. are GLSL reserved words and **must never be used as variable or function names**. Use compound names: `castRay`, `castShadow`, `shootRay`, `spellEffect` (not `cast`)
2. **Enclosed/semi-enclosed scene total darkness**: caves, interiors, sci-fi bases, mazes, and other enclosed scenes cannot rely solely on directional light (completely blocked by walls/ceiling); must use point lights + high ambient light (≥0.2) + emissive materials (see Variant 8)
3. **Camera inside voxel causing rendering anomalies**: cave/indoor scene camera origin must be inside the cavity (where getVoxel returns 0), otherwise the first DDA step hits immediately = scene invisible
4. **Complex getVoxel causing SwiftShader black screen (most common with Minecraft-style/town/character/multi-building scenes!)**: getVoxel is called once per DDA step; if it contains multiple buildings/characters/terrain+trees without early exit, frame timeout → only sky background renders. **Must do all of**: (1) AABB bounding box early exit (check coordinate range first, return 0 immediately outside building area); (2) MAX_RAY_STEPS ≤ 64, MAX_SHADOW_STEPS ≤ 16; (3) scene range within ±20 cells. **Minecraft-style scene = multi-building scene**; must follow this rule (see Variant 9, 11 template code)
5. **vec2/vec3 dimension mismatch causing compile failure**: `p.xz` returns `vec2` and cannot be passed directly to noise/fbm functions expecting `vec3` parameters or used in operations with `vec3`. Use `vec3(p.xz, val)` to construct a full vec3, or use vec2 versions of functions
6. **Mountain/terrain height-based coloring invisible**: (1) `maxH` must equal the actual max return value of the terrain noise function (don't arbitrarily use 20.0); (2) grass threshold at 0.4 (largest area ensures green is visible), rock 0.4~0.7, snow >0.7; (3) grass green must be saturated enough `vec3(0.25, 0.55, 0.15)` not grayish; (4) sun intensity ≤2.0, sky light ≤1.0, too bright washes out colors; (5) gamma correction reduces saturation, pre-compensate material colors (see Step 4 mountain terrain template)
7. **Waterfall/flowing water effect lacks recognizability**: waterfall must have a clear cliff drop (≥10 cells), visible water column (noise + iTime offset), bottom splash particles (hash random bouncing), and mist (exponential decay density field). Just a gradient color block is not a waterfall! See Variant 10 complete template
8. **"Low saturation coloring" becomes pure white/gray**: low saturation ≠ near white! Low saturation means colors are not vivid but still have clear hue (e.g., brick red `vec3(0.55, 0.35, 0.3)` not gray-white `vec3(0.8, 0.8, 0.8)`). Brick/stone textures must use UV periodic patterns (staggered rows + mortar dark lines), not solid colors. See the `getBrickColor` function in the complete template
9. **Sunset/side-lit scene buildings become black silhouettes**: when low-angle light (sunset/dawn) illuminates from the side, building fronts are completely backlit → pure black silhouettes with no visible detail. Must: (1) ambient light ≥ 0.3; (2) walls use bright materials (light yellow, off-white) not dark colors; (3) buildings large enough (width/height ≥ 5 cells). See Variant 11 sunset scene code
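For error 8, a staggered brick pattern in the spirit of the template's `getBrickColor` can be sketched as follows; the colors, tile counts, and mortar widths are illustrative, not the template's exact values:

```glsl
// Sketch: low-saturation brick texture from face UV (not the template's exact code)
vec3 brickColor(vec2 uv) {
    vec2 p = uv * vec2(4.0, 8.0);            // 4 bricks wide, 8 rows tall per face
    p.x += 0.5 * floor(p.y);                 // stagger alternate rows by half a brick
    vec2 f = fract(p);
    float mortar = step(f.x, 0.08) + step(f.y, 0.12);
    vec3 brick = vec3(0.55, 0.35, 0.3);      // muted brick red — low saturation, clear hue
    vec3 gap   = vec3(0.35, 0.33, 0.3);      // darker mortar lines
    return mix(brick, gap, clamp(mortar, 0.0, 1.0));
}
```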
## Further Reading
For full step-by-step tutorials, mathematical derivations, and advanced usage, see [reference](../reference/voxel-rendering.md)

# Water & Ocean Rendering Skill
## Use Cases
- Rendering water body surfaces such as oceans, lakes, and rivers
- Water surface reflection/refraction, Fresnel effects
- Underwater caustics lighting effects
- Waves, foam, and water flow animation
## Core Principles
Water rendering solves three problems: **water surface shape generation**, **light-water surface interaction**, and **water body color compositing**.
### Wave Generation: Exponential Sine Stacking + Derivative Domain Warping
`wave(x) = exp(sin(x) - 1)` — sharp wave crests (`exp(0)=1`), broad flat troughs (`exp(-2)≈0.135`), similar to a trochoidal profile but at much lower computational cost than Gerstner waves.
When stacking multiple waves, use **derivative domain warping (Drag)**:
```
position += direction * derivative * weight * DRAG_MULT
```
Small ripples cluster on the crests of large waves, simulating capillary waves riding on gravity waves.
### Lighting: Schlick Fresnel + Subsurface Scattering
- **Schlick Fresnel**: `F = F0 + (1-F0) * (1-dot(N,V))^5`, water F0 ≈ 0.04
- **SSS approximation**: thicker water layer at troughs → stronger blue-green scattering; thinner layer at crests → weaker scattering
### Water Surface Intersection: Bounded Height Field Marching
The water surface is constrained within a `[0, -WATER_DEPTH]` bounding box, with adaptive step size: `step = ray_y - wave_height`.
## Implementation Steps
### Step 1: Exponential Sine Wave Function
```glsl
// Single wave: exp(sin(x)-1) produces sharp peaks and broad troughs, returns (value, negative derivative)
vec2 wavedx(vec2 position, vec2 direction, float frequency, float timeshift) {
float x = dot(direction, position) * frequency + timeshift;
float wave = exp(sin(x) - 1.0);
float dx = wave * cos(x);
return vec2(wave, -dx);
}
```
### Step 2: Multi-Octave Wave Stacking with Domain Warping
```glsl
#define DRAG_MULT 0.38 // Domain warp strength, 0=none, 0.5=strong clustering
float getwaves(vec2 position, int iterations) {
float wavePhaseShift = length(position) * 0.1;
float iter = 0.0;
float frequency = 1.0;
float timeMultiplier = 2.0;
float weight = 1.0;
float sumOfValues = 0.0;
float sumOfWeights = 0.0;
for (int i = 0; i < iterations; i++) {
vec2 p = vec2(sin(iter), cos(iter)); // Pseudo-random wave direction
vec2 res = wavedx(position, p, frequency, iTime * timeMultiplier + wavePhaseShift);
position += p * res.y * weight * DRAG_MULT; // Derivative domain warp
sumOfValues += res.x * weight;
sumOfWeights += weight;
weight = mix(weight, 0.0, 0.2); // Weight decay
frequency *= 1.18; // Frequency growth rate
timeMultiplier *= 1.07; // Dispersion
iter += 1232.399963; // Uniform direction distribution
}
return sumOfValues / sumOfWeights;
}
```
### Step 3: Bounded Bounding Box Ray Marching
```glsl
#define WATER_DEPTH 1.0
float intersectPlane(vec3 origin, vec3 direction, vec3 point, vec3 normal) {
return clamp(dot(point - origin, normal) / dot(direction, normal), -1.0, 9991999.0);
}
float raymarchwater(vec3 camera, vec3 start, vec3 end, float depth) {
vec3 pos = start;
vec3 dir = normalize(end - start);
for (int i = 0; i < 64; i++) {
float height = getwaves(pos.xz, ITERATIONS_RAYMARCH) * depth - depth;
if (height + 0.01 > pos.y) {
return distance(pos, camera);
}
pos += dir * (pos.y - height); // Adaptive step size
}
return distance(start, camera);
}
```
### Step 4: Normal Calculation and Distance Smoothing
```glsl
#define ITERATIONS_RAYMARCH 12 // For marching (fewer = faster)
#define ITERATIONS_NORMAL 36 // For normals (more = finer detail)
vec3 calcNormal(vec2 pos, float e, float depth) {
vec2 ex = vec2(e, 0);
float H = getwaves(pos.xy, ITERATIONS_NORMAL) * depth;
vec3 a = vec3(pos.x, H, pos.y);
return normalize(
cross(
a - vec3(pos.x - e, getwaves(pos.xy - ex.xy, ITERATIONS_NORMAL) * depth, pos.y),
a - vec3(pos.x, getwaves(pos.xy + ex.yx, ITERATIONS_NORMAL) * depth, pos.y + e)
)
);
}
// Distance smoothing: normals approach (0,1,0) at far distances
// N = mix(N, vec3(0.0, 1.0, 0.0), 0.8 * min(1.0, sqrt(dist * 0.01) * 1.1));
```
### Step 5: Fresnel Reflection and Subsurface Scattering
```glsl
float fresnel = 0.04 + 0.96 * pow(1.0 - max(0.0, dot(-N, ray)), 5.0);
vec3 R = normalize(reflect(ray, N));
R.y = abs(R.y); // Force upward to avoid self-intersection
vec3 reflection = getAtmosphere(R) + getSun(R);
vec3 scattering = vec3(0.0293, 0.0698, 0.1717) * 0.1
* (0.2 + (waterHitPos.y + WATER_DEPTH) / WATER_DEPTH);
vec3 C = fresnel * reflection + scattering;
```
### Step 6: Atmosphere and Tone Mapping
```glsl
vec3 extra_cheap_atmosphere(vec3 raydir, vec3 sundir) {
float special_trick = 1.0 / (raydir.y * 1.0 + 0.1);
float special_trick2 = 1.0 / (sundir.y * 11.0 + 1.0);
float raysundt = pow(abs(dot(sundir, raydir)), 2.0);
float sundt = pow(max(0.0, dot(sundir, raydir)), 8.0);
float mymie = sundt * special_trick * 0.2;
vec3 suncolor = mix(vec3(1.0), max(vec3(0.0), vec3(1.0) - vec3(5.5, 13.0, 22.4) / 22.4),
special_trick2);
vec3 bluesky = vec3(5.5, 13.0, 22.4) / 22.4 * suncolor;
vec3 bluesky2 = max(vec3(0.0), bluesky - vec3(5.5, 13.0, 22.4) * 0.002
* (special_trick + -6.0 * sundir.y * sundir.y));
bluesky2 *= special_trick * (0.24 + raysundt * 0.24);
return bluesky2 * (1.0 + 1.0 * pow(1.0 - raydir.y, 3.0));
}
vec3 aces_tonemap(vec3 color) {
mat3 m1 = mat3(
0.59719, 0.07600, 0.02840,
0.35458, 0.90834, 0.13383,
0.04823, 0.01566, 0.83777);
mat3 m2 = mat3(
1.60475, -0.10208, -0.00327,
-0.53108, 1.10813, -0.07276,
-0.07367, -0.00605, 1.07602);
vec3 v = m1 * color;
vec3 a = v * (v + 0.0245786) - 0.000090537;
vec3 b = v * (0.983729 * v + 0.4329510) + 0.238081;
return pow(clamp(m2 * (a / b), 0.0, 1.0), vec3(1.0 / 2.2));
}
```
## Complete Code Template
Can be pasted directly into ShaderToy to run. Distilled from `afl_ext`'s "Very fast procedural ocean".
```glsl
// Water & Ocean Rendering — ShaderToy Template
// exp(sin) wave model + derivative domain warp + Schlick Fresnel + SSS
// ==================== Tunable Parameters ====================
#define DRAG_MULT 0.38
#define WATER_DEPTH 1.0
#define CAMERA_HEIGHT 1.5
#define ITERATIONS_RAYMARCH 12
#define ITERATIONS_NORMAL 36
#define RAYMARCH_STEPS 64
#define NORMAL_EPSILON 0.01
#define FRESNEL_F0 0.04
#define SSS_COLOR vec3(0.0293, 0.0698, 0.1717)
#define SSS_INTENSITY 0.1
#define SUN_POWER 720.0
#define SUN_BRIGHTNESS 210.0
#define EXPOSURE 2.0
// ==================== Wave Functions ====================
vec2 wavedx(vec2 position, vec2 direction, float frequency, float timeshift) {
float x = dot(direction, position) * frequency + timeshift;
float wave = exp(sin(x) - 1.0);
float dx = wave * cos(x);
return vec2(wave, -dx);
}
float getwaves(vec2 position, int iterations) {
float wavePhaseShift = length(position) * 0.1;
float iter = 0.0;
float frequency = 1.0;
float timeMultiplier = 2.0;
float weight = 1.0;
float sumOfValues = 0.0;
float sumOfWeights = 0.0;
for (int i = 0; i < iterations; i++) {
vec2 p = vec2(sin(iter), cos(iter));
vec2 res = wavedx(position, p, frequency, iTime * timeMultiplier + wavePhaseShift);
position += p * res.y * weight * DRAG_MULT;
sumOfValues += res.x * weight;
sumOfWeights += weight;
weight = mix(weight, 0.0, 0.2);
frequency *= 1.18;
timeMultiplier *= 1.07;
iter += 1232.399963;
}
return sumOfValues / sumOfWeights;
}
// ==================== Ray Marching ====================
float intersectPlane(vec3 origin, vec3 direction, vec3 point, vec3 normal) {
return clamp(dot(point - origin, normal) / dot(direction, normal), -1.0, 9991999.0);
}
float raymarchwater(vec3 camera, vec3 start, vec3 end, float depth) {
vec3 pos = start;
vec3 dir = normalize(end - start);
for (int i = 0; i < RAYMARCH_STEPS; i++) {
float height = getwaves(pos.xz, ITERATIONS_RAYMARCH) * depth - depth;
if (height + 0.01 > pos.y) {
return distance(pos, camera);
}
pos += dir * (pos.y - height);
}
return distance(start, camera);
}
// ==================== Normals ====================
vec3 calcNormal(vec2 pos, float e, float depth) {
vec2 ex = vec2(e, 0);
float H = getwaves(pos.xy, ITERATIONS_NORMAL) * depth;
vec3 a = vec3(pos.x, H, pos.y);
return normalize(
cross(
a - vec3(pos.x - e, getwaves(pos.xy - ex.xy, ITERATIONS_NORMAL) * depth, pos.y),
a - vec3(pos.x, getwaves(pos.xy + ex.yx, ITERATIONS_NORMAL) * depth, pos.y + e)
)
);
}
// ==================== Camera ====================
#define NormalizedMouse (iMouse.xy / iResolution.xy)
mat3 createRotationMatrixAxisAngle(vec3 axis, float angle) {
float s = sin(angle);
float c = cos(angle);
float oc = 1.0 - c;
return mat3(
oc * axis.x * axis.x + c, oc * axis.x * axis.y - axis.z * s, oc * axis.z * axis.x + axis.y * s,
oc * axis.x * axis.y + axis.z * s, oc * axis.y * axis.y + c, oc * axis.y * axis.z - axis.x * s,
oc * axis.z * axis.x - axis.y * s, oc * axis.y * axis.z + axis.x * s, oc * axis.z * axis.z + c
);
}
vec3 getRay(vec2 fragCoord) {
vec2 uv = ((fragCoord.xy / iResolution.xy) * 2.0 - 1.0) * vec2(iResolution.x / iResolution.y, 1.0);
vec3 proj = normalize(vec3(uv.x, uv.y, 1.5));
if (iResolution.x < 600.0) return proj;
return createRotationMatrixAxisAngle(vec3(0.0, -1.0, 0.0), 3.0 * ((NormalizedMouse.x + 0.5) * 2.0 - 1.0))
* createRotationMatrixAxisAngle(vec3(1.0, 0.0, 0.0), 0.5 + 1.5 * (((NormalizedMouse.y == 0.0 ? 0.27 : NormalizedMouse.y)) * 2.0 - 1.0))
* proj;
}
// ==================== Atmosphere ====================
vec3 getSunDirection() {
return normalize(vec3(-0.0773502691896258, 0.5 + sin(iTime * 0.2 + 2.6) * 0.45, 0.5773502691896258));
}
vec3 extra_cheap_atmosphere(vec3 raydir, vec3 sundir) {
float special_trick = 1.0 / (raydir.y * 1.0 + 0.1);
float special_trick2 = 1.0 / (sundir.y * 11.0 + 1.0);
float raysundt = pow(abs(dot(sundir, raydir)), 2.0);
float sundt = pow(max(0.0, dot(sundir, raydir)), 8.0);
float mymie = sundt * special_trick * 0.2;
vec3 suncolor = mix(vec3(1.0), max(vec3(0.0), vec3(1.0) - vec3(5.5, 13.0, 22.4) / 22.4), special_trick2);
vec3 bluesky = vec3(5.5, 13.0, 22.4) / 22.4 * suncolor;
vec3 bluesky2 = max(vec3(0.0), bluesky - vec3(5.5, 13.0, 22.4) * 0.002 * (special_trick + -6.0 * sundir.y * sundir.y));
bluesky2 *= special_trick * (0.24 + raysundt * 0.24);
return bluesky2 * (1.0 + 1.0 * pow(1.0 - raydir.y, 3.0));
}
vec3 getAtmosphere(vec3 dir) {
return extra_cheap_atmosphere(dir, getSunDirection()) * 0.5;
}
float getSun(vec3 dir) {
return pow(max(0.0, dot(dir, getSunDirection())), SUN_POWER) * SUN_BRIGHTNESS;
}
// ==================== Tone Mapping ====================
vec3 aces_tonemap(vec3 color) {
mat3 m1 = mat3(
0.59719, 0.07600, 0.02840,
0.35458, 0.90834, 0.13383,
0.04823, 0.01566, 0.83777);
mat3 m2 = mat3(
1.60475, -0.10208, -0.00327,
-0.53108, 1.10813, -0.07276,
-0.07367, -0.00605, 1.07602);
vec3 v = m1 * color;
vec3 a = v * (v + 0.0245786) - 0.000090537;
vec3 b = v * (0.983729 * v + 0.4329510) + 0.238081;
return pow(clamp(m2 * (a / b), 0.0, 1.0), vec3(1.0 / 2.2));
}
// ==================== Main Function ====================
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
vec3 ray = getRay(fragCoord);
if (ray.y >= 0.0) {
vec3 C = getAtmosphere(ray) + getSun(ray);
fragColor = vec4(aces_tonemap(C * EXPOSURE), 1.0);
return;
}
vec3 waterPlaneHigh = vec3(0.0, 0.0, 0.0);
vec3 waterPlaneLow = vec3(0.0, -WATER_DEPTH, 0.0);
vec3 origin = vec3(iTime * 0.2, CAMERA_HEIGHT, 1.0);
float highPlaneHit = intersectPlane(origin, ray, waterPlaneHigh, vec3(0.0, 1.0, 0.0));
float lowPlaneHit = intersectPlane(origin, ray, waterPlaneLow, vec3(0.0, 1.0, 0.0));
vec3 highHitPos = origin + ray * highPlaneHit;
vec3 lowHitPos = origin + ray * lowPlaneHit;
float dist = raymarchwater(origin, highHitPos, lowHitPos, WATER_DEPTH);
vec3 waterHitPos = origin + ray * dist;
vec3 N = calcNormal(waterHitPos.xz, NORMAL_EPSILON, WATER_DEPTH);
N = mix(N, vec3(0.0, 1.0, 0.0), 0.8 * min(1.0, sqrt(dist * 0.01) * 1.1));
float fresnel = FRESNEL_F0 + (1.0 - FRESNEL_F0) * pow(1.0 - max(0.0, dot(-N, ray)), 5.0);
vec3 R = normalize(reflect(ray, N));
R.y = abs(R.y);
vec3 reflection = getAtmosphere(R) + getSun(R);
vec3 scattering = SSS_COLOR * SSS_INTENSITY
* (0.2 + (waterHitPos.y + WATER_DEPTH) / WATER_DEPTH);
vec3 C = fresnel * reflection + scattering;
fragColor = vec4(aces_tonemap(C * EXPOSURE), 1.0);
}
```
## Common Variants
### Variant 1: 2D Underwater Caustic Texture
```glsl
#define TAU 6.28318530718
#define MAX_ITER 5
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
float time = iTime * 0.5 + 23.0;
vec2 uv = fragCoord.xy / iResolution.xy;
vec2 p = mod(uv * TAU, TAU) - 250.0;
vec2 i = vec2(p);
float c = 1.0;
float inten = 0.005;
for (int n = 0; n < MAX_ITER; n++) {
float t = time * (1.0 - (3.5 / float(n + 1)));
i = p + vec2(cos(t - i.x) + sin(t + i.y), sin(t - i.y) + cos(t + i.x));
c += 1.0 / length(vec2(p.x / (sin(i.x + t) / inten), p.y / (cos(i.y + t) / inten)));
}
c /= float(MAX_ITER);
c = 1.17 - pow(c, 1.4);
vec3 colour = vec3(pow(abs(c), 8.0));
colour = clamp(colour + vec3(0.0, 0.35, 0.5), 0.0, 1.0);
fragColor = vec4(colour, 1.0);
}
```
### Variant 2: FBM Bump-Mapped Lake Surface
```glsl
float waterMap(vec2 pos) {
mat2 m2 = mat2(0.60, -0.80, 0.80, 0.60);
vec2 posm = pos * m2;
return abs(fbm(vec3(8.0 * posm, iTime)) - 0.5) * 0.1;
}
// Analytic plane intersection instead of ray marching
float t = -ro.y / rd.y;
vec3 hitPos = ro + rd * t;
// Finite difference normals (central differencing); fade bumps with distance
float eps = 0.1;
float bumpfactor = 0.1 * (1.0 - smoothstep(0.0, 60.0, distance(ro, hitPos)));
vec3 normal = vec3(0.0, 1.0, 0.0);
normal.x = -bumpfactor * (waterMap(hitPos.xz + vec2(eps, 0.0)) - waterMap(hitPos.xz - vec2(eps, 0.0))) / (2.0 * eps);
normal.z = -bumpfactor * (waterMap(hitPos.xz + vec2(0.0, eps)) - waterMap(hitPos.xz - vec2(0.0, eps))) / (2.0 * eps);
normal = normalize(normal);
vec3 refracted = refract(rd, normal, 1.0 / 1.333);
```
### Variant 3: Ridge Noise Coastal Waves
```glsl
float sea(vec2 p) {
float f = 1.0;
float r = 0.0;
float time = -iTime;
for (int i = 0; i < 8; i++) {
r += (1.0 - abs(noise(p * f + 0.9 * time))) / f;
f *= 2.0;
p -= vec2(-0.01, 0.04) * (r - 0.2 * time / (0.1 - f));
}
return r / 4.0 + 0.5;
}
// Shoreline foam
float dh = seaDist - rockDist;
float foam = 0.0;
if (dh < 0.0 && dh > -0.02) {
foam = 0.5 * exp(20.0 * dh);
}
```
### Variant 4: Flow Map Water Animation
```glsl
vec3 FBM_DXY(vec2 p, vec2 flow, float persistence, float domainWarp) {
vec3 f = vec3(0.0);
float tot = 0.0;
float a = 1.0;
for (int i = 0; i < 4; i++) {
p += flow;
flow *= -0.75;
vec3 v = SmoothNoise_DXY(p);
f += v * a;
p += v.xy * domainWarp;
p *= 2.0;
tot += a;
a *= persistence;
}
return f / tot;
}
// Two-phase flow cycle (eliminates stretching)
float t0 = fract(time);
float t1 = fract(time + 0.5);
vec4 sample0 = SampleWaterNormal(uv + Hash2(floor(time)), flowRate * (t0 - 0.5));
vec4 sample1 = SampleWaterNormal(uv + Hash2(floor(time+0.5)), flowRate * (t1 - 0.5));
float weight = abs(t0 - 0.5) * 2.0;
vec4 result = mix(sample0, sample1, weight);
```
### Variant 5: Beer's Law Water Absorption
```glsl
vec3 GetWaterExtinction(float dist) {
float fOpticalDepth = dist * 6.0;
vec3 vExtinctCol = vec3(0.5, 0.6, 0.9);
return exp2(-fOpticalDepth * vExtinctCol);
}
vec3 vInscatter = vSurfaceDiffuse * (1.0 - exp(-refractDist * 0.1))
* (1.0 + dot(sunDir, viewDir));
vec3 underwaterColor = terrainColor * GetWaterExtinction(waterDepth) + vInscatter;
vec3 finalColor = mix(underwaterColor, reflectionColor, fresnel);
```
## Performance & Composition
### Performance Tips
- **Dual iteration count strategy**: 12 iterations for marching, 36 for normals — halves render time with virtually no visual loss
- **Distance-adaptive normal smoothing**: `N = mix(N, up, 0.8 * min(1.0, sqrt(dist*0.01)*1.1))`, eliminates distant flickering
- **Bounding box clipping**: pre-compute upper/lower plane intersections, early-out for sky directions
- **Adaptive step size**: `pos += dir * (pos.y - height)`, 3-5x faster than fixed steps
- **Filter-width-aware decay**: `dFdx/dFdy` driven normal LOD
- **LOD conditional detail**: only compute high-frequency displacement at close range
### Composition Tips
- **Volumetric clouds**: ray march clouds along reflection direction `R`, blend into reflection term
- **Terrain coastline**: `dh = waterSDF - terrainSDF`, render foam when `dh ≈ 0`
- **Caustics overlay**: project Variant 1 onto underwater terrain, `caustic * exp(-depth * absorption)` depth attenuation
- **Fog/atmosphere**: independent extinction + in-scatter, per-channel RGB decay:
```glsl
vec3 fogExtinction = exp2(fogExtCoeffs * -distance);
vec3 fogInscatter = fogColor * (1.0 - exp2(fogInCoeffs * -distance));
finalColor = finalColor * fogExtinction + fogInscatter;
```
- **Post-processing**: Bloom (Fibonacci spiral blur), ACES tone mapping, depth of field (DOF)
## Further Reading
For full step-by-step tutorials, mathematical derivations, and advanced usage, see [reference](../reference/water-ocean.md)

# WebGL2 Pitfalls & Common Errors
## Use Cases
- Avoiding common GLSL compilation errors when generating standalone WebGL2 shader pages
- Debugging shader compilation failures
- Ensuring shader templates from ShaderToy work correctly in WebGL2
## Critical WebGL2 Rules
### 1. Fragment Coordinate — Use `gl_FragCoord.xy`
**ERROR**: `'fragCoord' : undeclared identifier`
In WebGL2 fragment shaders, `fragCoord` is not a built-in variable. Use `gl_FragCoord.xy` instead.
```glsl
// WRONG
void main() {
vec2 uv = (2.0 * fragCoord - iResolution.xy) / iResolution.y;
}
// CORRECT
void main() {
vec2 uv = (2.0 * gl_FragCoord.xy - iResolution.xy) / iResolution.y;
}
```
### 2. Shadertoy mainImage — Must Wrap in `main()`
**ERROR**: `'' : Missing main()`
If your fragment shader uses `void mainImage(out vec4, in vec2)`, you must provide a `main()` wrapper.
```glsl
// WRONG — only defines mainImage but no main()
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
// shader code...
fragColor = vec4(col, 1.0);
}
// CORRECT
void mainImage(out vec4 fragColor, in vec2 fragCoord) {
// shader code...
fragColor = vec4(col, 1.0);
}
void main() {
mainImage(fragColor, gl_FragCoord.xy);
}
```
### 3. Function Declaration Order — Declare Before Use
**ERROR**: `'functionName' : no matching overloaded function found`
GLSL requires functions to be declared before they are used; either forward-declare them or reorder the definitions.
```glsl
// WRONG — getAtmosphere() calls getSunDirection() which is defined after
vec3 getAtmosphere(vec3 dir) {
return extra_cheap_atmosphere(dir, getSunDirection()) * 0.5; // Error!
}
vec3 getSunDirection() {
return normalize(vec3(-0.5, 0.8, -0.6));
}
// CORRECT — reorder functions
vec3 getSunDirection() { // Define first
return normalize(vec3(-0.5, 0.8, -0.6));
}
vec3 getAtmosphere(vec3 dir) { // Now can call getSunDirection()
return extra_cheap_atmosphere(dir, getSunDirection()) * 0.5;
}
```
### 4. Macro Limitations — Avoid Function Calls in `#define`
**ERROR**: Various compilation errors traced back to `#define` macros
Macros are plain text substitution: the expansion is pasted at every use site, so a macro that calls a function fails wherever a constant expression is required (global `const` initializers, array sizes) or where the referenced function has not yet been declared.
```glsl
// WRONG — breaks when expanded where a constant expression is required,
// or before speed() has been declared
#define SUN_DIR normalize(vec3(0.8, 0.4, -0.6))
#define WORLD_TIME (iTime * speed())
// CORRECT — use const with pre-computed values
const vec3 SUN_DIR = vec3(0.756, 0.378, -0.567); // Pre-computed normalized value
const float WORLD_TIME = 1.0;
```
### 5. Vector Component Access — Terrain Functions
**ERROR**: `'terrainM' : no matching overloaded function found`
When passing positions to terrain functions that expect `vec2`, extract the XZ components properly.
```glsl
// WRONG — terrainM expects vec2, but passing vec3
float calcAO(vec3 pos, vec3 nor) {
float d = terrainM(pos + h * nor); // Error: pos + h*nor is vec3
...
}
// CORRECT — extract xz components
float calcAO(vec3 pos, vec3 nor) {
float d = terrainM(pos.xz + h * nor.xz);
...
}
```
### 6. Loop Bounds — Prefer Compile-Time Constants
**ERROR**: `Loop index cannot be compared with non-constant expression`
GLSL ES 1.00 requires loop bounds to be compile-time constant expressions; WebGL2's `#version 300 es` relaxes this, but constant bounds still unroll cleanly and behave consistently across drivers.
```glsl
// WRONG under GLSL ES 1.00 — a uniform is not a constant expression
uniform int sampleCount;
for (int i = 0; i < sampleCount; i++) { ... }
// CORRECT — loop to a literal or #define bound, break early if needed
for (int i = 0; i < 4; i++) {
    if (i >= sampleCount) break;
    // ...
}
```
### 7. Uniform Usage — Avoid Unused Uniforms
**ERROR**: Uniform optimized away, so `gl.getUniformLocation()` returns `null`
If a declared uniform never affects the shader's output, the compiler may optimize it out; host code that assumes a valid location then breaks. Null-check locations on the JavaScript side regardless.
```glsl
// WRONG — iTime only appears in dead code and gets optimized away
uniform float iTime;
if (false) { x = iTime; } // iTime optimized away
// CORRECT — let the uniform contribute to live expressions
uniform float iTime;
float t = iTime * 0.0; // Keeps iTime active without changing the result
if (someCondition) { x = t; }
```
## Complete WebGL2 Adaptation Checklist
When generating standalone HTML pages:
1. **Shader Version**: `#version 300 es` must be the very first line
2. **Fragment Output**: Declare `out vec4 fragColor;`
3. **Entry Point**: Wrap `mainImage()` in `void main()` that calls `mainImage(fragColor, gl_FragCoord.xy)`
4. **Fragment Coord**: Use `gl_FragCoord.xy` not `fragCoord`
5. **Preprocessor**: Don't use functions in `#define` macros
6. **Function Order**: Declare functions before they are used, or use forward declarations
7. **Texture**: Use `texture()` not `texture2D()`
8. **Attributes**: `attribute` → `in`, `varying` → `in`/`out`
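Applied together, the checklist items yield a fragment-shader skeleton like the following; the uniform names mirror the ShaderToy conventions used throughout this guide:

```glsl
#version 300 es
// Item 1: version directive on the very first line, no leading whitespace
precision highp float;
uniform vec3 iResolution;   // ShaderToy-style uniforms
uniform float iTime;
out vec4 fragColor;         // Item 2: explicit fragment output
void mainImage(out vec4 c, in vec2 fc) {   // original ShaderToy-style body
    vec2 uv = fc / iResolution.xy;         // Item 4: coords arrive as a parameter
    c = vec4(uv, 0.5 + 0.5 * sin(iTime), 1.0);
}
void main() {               // Item 3: wrapper entry point
    mainImage(fragColor, gl_FragCoord.xy);
}
```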
## Common Error Messages Reference
| Error Message | Likely Cause | Solution |
|---|---|---|
| `'fragCoord' : undeclared identifier` | Using `fragCoord` instead of `gl_FragCoord.xy` | Replace with `gl_FragCoord.xy` |
| `'' : Missing main()` | No `main()` function defined | Add wrapper `void main() { mainImage(...); }` |
| `'function' : no matching overloaded function` | Wrong argument types or function order | Check parameter types, reorder functions |
| `'return' : function return is not matching` | Return type mismatch | Verify return expression matches declared return type |
| `#version` must be first | Leading whitespace in shader source | Use `.trim()` when extracting from script tags |
| Uniform `null` from `getUniformLocation` | Uniform optimized away | Ensure uniform is actually used in shader code |
## Further Reading
See [reference/webgl-pitfalls.md](../reference/webgl-pitfalls.md) for additional debugging techniques.