Anki Deck Changes

Commit: 1a363331 - add pprog detail cards

Author: obrhubr <obrhubr+noreply@noreply.com>

Date: 2026-03-25T12:25:34+01:00

Changes: 69 note(s) changed (69 added, 0 modified, 0 deleted)

Note 1: ETH::2. Semester::PProg

Deck: ETH::2. Semester::PProg
Note Type: Horvath Cloze
GUID: #G4wpQ?X^2
added

Previous

Note did not exist

New Note

Front

ETH::2._Semester::PProg::07._Concepts::2._Parallel_performance
In practice, \(T_p\) \(> T_1/p\) because of synchronization overhead, scheduling overhead, load imbalance, and memory contention.

Back

ETH::2._Semester::PProg::07._Concepts::2._Parallel_performance
In practice, \(T_p\) \(> T_1/p\) because of synchronization overhead, scheduling overhead, load imbalance, and memory contention.

Perfect efficiency (E = 1, S_p = p) is rarely achievable. Sub-linear speedup is the normal situation; super-linear is exceptional.
Field-by-field Comparison
Field Before After
Text In practice, \(T_p\) {{c1::\(> T_1/p\)}} because of {{c2::synchronization overhead, scheduling overhead, load imbalance, and memory contention}}.
Extra Perfect efficiency (E = 1, S_p = p) is rarely achievable. Sub-linear speedup is the normal situation; super-linear is exceptional.
Tags: ETH::2._Semester::PProg::07._Concepts::2._Parallel_performance

Note 2: ETH::2. Semester::PProg

Deck: ETH::2. Semester::PProg
Note Type: Horvath Cloze
GUID: $lOO8/w@3@
added

Previous

Note did not exist

New Note

Front

ETH::2._Semester::PProg::05._Java_Threads::2._synchronized
A static synchronized method acquires a lock on the Class object (MyClass.class) — a global (class-level) lock, shared across all instances.

Back

ETH::2._Semester::PProg::05._Java_Threads::2._synchronized
A static synchronized method acquires a lock on the Class object (MyClass.class) — a global (class-level) lock, shared across all instances.

A non-static synchronized method locks this (the instance). The two locks are independent — a static and a non-static synchronized method on the same class can execute concurrently.
Field-by-field Comparison
Field Before After
Text A <code>static synchronized</code> method acquires a lock on {{c1::the <code>Class</code> object (<code>MyClass.class</code>)}} — a {{c2::global (class-level) lock}}, shared across all instances.
Extra A non-static <code>synchronized</code> method locks <code>this</code> (the instance). The two locks are independent — a static and a non-static synchronized method on the same class can execute concurrently.
Tags: ETH::2._Semester::PProg::05._Java_Threads::2._synchronized
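
A minimal sketch of the two independent locks from this note (the class name Counter and its fields are hypothetical): the static method locks Counter.class, the instance method locks this, so a thread in one can run concurrently with a thread in the other.

    public class Counter {
        private static int classCount = 0;
        private int instanceCount = 0;

        // Locks the Counter.class object — one global lock shared by all instances.
        public static synchronized void incrementClass() {
            classCount++;
        }

        // Locks `this` — a separate lock per instance, independent of the class lock.
        public synchronized void incrementInstance() {
            instanceCount++;
        }

        // The static form above is equivalent to:
        public static void incrementClassExplicit() {
            synchronized (Counter.class) { classCount++; }
        }
    }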

Note 3: ETH::2. Semester::PProg

Deck: ETH::2. Semester::PProg
Note Type: Horvath Classic
GUID: %2]9g85BmT
added

Previous

Note did not exist

New Note

Front

ETH::2._Semester::PProg::07._Concepts::2._Parallel_performance
What does Amdahl's Law give for \(f=0\) (fully parallel) and \(f=1\) (fully serial)?

Back

ETH::2._Semester::PProg::07._Concepts::2._Parallel_performance
What does Amdahl's Law give for \(f=0\) (fully parallel) and \(f=1\) (fully serial)?

  • \(f = 0\): \(S_P = P\) → linear speedup, efficiency = 1. Adding more cores always helps.
  • \(f = 1\): \(S_P = 1\) → no speedup at all, regardless of P. The program is entirely sequential.
Field-by-field Comparison
Field Before After
Front What does Amdahl's Law give for \(f=0\) (fully parallel) and \(f=1\) (fully serial)?
Back <ul><li>\(f = 0\): \(S_P = P\) → <b>linear speedup</b>, efficiency = 1. Adding more cores always helps.</li><li>\(f = 1\): \(S_P = 1\) → <b>no speedup at all</b>, regardless of P. The program is entirely sequential.</li></ul>
Tags: ETH::2._Semester::PProg::07._Concepts::2._Parallel_performance

Note 4: ETH::2. Semester::PProg

Deck: ETH::2. Semester::PProg
Note Type: Horvath Classic
GUID: )Ln/^}M1}?
added

Previous

Note did not exist

New Note

Front

ETH::2._Semester::PProg::07._Concepts::2._Parallel_performance
Under what conditions does Amdahl's bound \(S_p \leq \frac{1}{f + (1-f)/p}\) become an equality?

Back

ETH::2._Semester::PProg::07._Concepts::2._Parallel_performance
Under what conditions does Amdahl's bound \(S_p \leq \frac{1}{f + (1-f)/p}\) become an equality?

The bound is achieved when: (1) the parallel part scales perfectly (no synchronization overhead, no load imbalance), and (2) \(T_p = f \cdot T_1 + \frac{(1-f) \cdot T_1}{p}\) exactly. In practice the bound is never reached — Amdahl gives an idealized upper bound, not a prediction.
Field-by-field Comparison
Field Before After
Front Under what conditions does Amdahl's bound \(S_p \leq \frac{1}{f + (1-f)/p}\) become an <b>equality</b>?
Back The bound is achieved when: (1) the parallel part scales <b>perfectly</b> (no synchronization overhead, no load imbalance), and (2) \(T_p = f \cdot T_1 + \frac{(1-f) \cdot T_1}{p}\) exactly. In practice the bound is never reached — Amdahl gives an idealized <b>upper bound</b>, not a prediction.
Tags: ETH::2._Semester::PProg::07._Concepts::2._Parallel_performance
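
A worked instance of the bound, with hypothetical values \(f = 0.1\) and \(p = 8\):

\(S_8 \leq \dfrac{1}{0.1 + 0.9/8} = \dfrac{1}{0.2125} \approx 4.7\)

Even the ideal schedule yields less than 5× on 8 processors, and as \(p \to \infty\) the speedup is capped at \(1/f = 10\).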

Note 5: ETH::2. Semester::PProg

Deck: ETH::2. Semester::PProg
Note Type: Horvath Classic
GUID: )|$CP#Zb3O
added

Previous

Note did not exist

New Note

Front

ETH::2._Semester::PProg::05._Java_Threads::3._Wait,_Notify,_NotifyAll
What is a "lost notification" and when does it occur?

Back

ETH::2._Semester::PProg::05._Java_Threads::3._Wait,_Notify,_NotifyAll
What is a "lost notification" and when does it occur?

If notify() is called before any thread has called wait(), the notification is silently discarded — it is not buffered. Any thread that subsequently calls wait() will block indefinitely. Fix: always protect the condition with a flag variable and check it before waiting.
Field-by-field Comparison
Field Before After
Front What is a "lost notification" and when does it occur?
Back If <code>notify()</code> is called <b>before</b> any thread has called <code>wait()</code>, the notification is silently discarded — it is <b>not buffered</b>. Any thread that subsequently calls <code>wait()</code> will block indefinitely. Fix: always protect the condition with a flag variable and check it before waiting.
Tags: ETH::2._Semester::PProg::05._Java_Threads::3._Wait,_Notify,_NotifyAll

Note 6: ETH::2. Semester::PProg

Deck: ETH::2. Semester::PProg
Note Type: Horvath Classic
GUID: //c-)1n-D%
added

Previous

Note did not exist

New Note

Front

ETH::2._Semester::PProg::05._Java_Threads::3._Wait,_Notify,_NotifyAll
Why must wait() always be called inside a while loop, not an if?

Back

ETH::2._Semester::PProg::05._Java_Threads::3._Wait,_Notify,_NotifyAll
Why must wait() always be called inside a while loop, not an if?

Because of spurious wakeups — a thread can wake from wait() without notify() ever being called (allowed by the JVM for performance reasons). If the condition is checked only once with if, the thread proceeds even when the condition is still false. A while loop re-checks the condition after every wakeup.
Field-by-field Comparison
Field Before After
Front Why must <code>wait()</code> always be called inside a <code>while</code> loop, not an <code>if</code>?
Back Because of <b>spurious wakeups</b> — a thread can wake from <code>wait()</code> without <code>notify()</code> ever being called (allowed by the JVM for performance reasons). If the condition is checked only once with <code>if</code>, the thread proceeds even when the condition is still false. A <code>while</code> loop re-checks the condition after every wakeup.
Tags: ETH::2._Semester::PProg::05._Java_Threads::3._Wait,_Notify,_NotifyAll
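
A minimal sketch of the guarded-wait pattern behind the last two notes (the class name Mailbox is hypothetical): the flag prevents lost notifications, and the while loop re-checks the condition after spurious wakeups.

    public class Mailbox {
        private boolean messageAvailable = false; // condition flag — survives a notify sent before any wait

        public synchronized void put() {
            messageAvailable = true;
            notifyAll(); // wake all waiters; each one re-checks the flag
        }

        public synchronized void take() throws InterruptedException {
            // while, not if: re-check after spurious wakeups
            // and after losing the race to another consumer
            while (!messageAvailable) {
                wait(); // releases the lock while waiting, reacquires it before returning
            }
            messageAvailable = false;
        }
    }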

Note 7: ETH::2. Semester::PProg

Deck: ETH::2. Semester::PProg
Note Type: Horvath Cloze
GUID: 1(+0hH>|3h
added

Previous

Note did not exist

New Note

Front

ETH::2._Semester::PProg::10._Parallel_Algorithms::1._Task_graphs_and_performance_measures
A Fork/Join task graph (DAG) is dynamic — it unfolds as execution proceeds, since which tasks are spawned depends on runtime values.

Back

ETH::2._Semester::PProg::10._Parallel_Algorithms::1._Task_graphs_and_performance_measures
A Fork/Join task graph (DAG) is dynamic — it unfolds as execution proceeds, since which tasks are spawned depends on runtime values.

Independent nodes in the DAG *can* but do not *have to* execute in parallel — it depends on the scheduler and available threads.
Field-by-field Comparison
Field Before After
Text A Fork/Join task graph (DAG) is {{c1::dynamic}} — it {{c2::unfolds as execution proceeds}}, since which tasks are spawned depends on runtime values.
Extra Independent nodes in the DAG *can* but do not *have to* execute in parallel — it depends on the scheduler and available threads.
Tags: ETH::2._Semester::PProg::10._Parallel_Algorithms::1._Task_graphs_and_performance_measures

Note 8: ETH::2. Semester::PProg

Deck: ETH::2. Semester::PProg
Note Type: Horvath Classic
GUID: 3fV+]~d+w}
added

Previous

Note did not exist

New Note

Front

ETH::2._Semester::PProg::11._Parallel_Algorithms::3._Parallel_algorithms
What are the work and span of parallel merge sort, and what limits its parallelism?

Back

ETH::2._Semester::PProg::11._Parallel_Algorithms::3._Parallel_algorithms
What are the work and span of parallel merge sort, and what limits its parallelism?

  • Work: \(O(n \log n)\) — same as sequential
  • Span: \(O(\log^2 n)\) — recursive calls parallelized (O(log n) depth) but merge step takes O(log n) span via parallel merge
  • Parallelism: \(O(n/\log n)\)
Bottleneck: the merge step. A sequential merge gives O(n) span, limiting parallelism to O(log n).
Field-by-field Comparison
Field Before After
Front What are the work and span of parallel merge sort, and what limits its parallelism?
Back <ul><li><b>Work</b>: \(O(n \log n)\) — same as sequential</li><li><b>Span</b>: \(O(\log^2 n)\) — recursive calls parallelized (O(log n) depth) but merge step takes O(log n) span via parallel merge</li><li><b>Parallelism</b>: \(O(n/\log n)\)</li></ul>Bottleneck: the merge step. A sequential merge gives O(n) span, limiting parallelism to O(log n).
Tags: ETH::2._Semester::PProg::11._Parallel_Algorithms::3._Parallel_algorithms
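
The recurrences behind these numbers, under the card's assumption of an \(O(\log n)\)-span merge:

Work: \(W(n) = 2\,W(n/2) + O(n) = O(n \log n)\)
Span: \(S(n) = S(n/2) + O(\log n) = O(\log^2 n)\) — the two halves are sorted in parallel, so only one recursive term remains.
With a sequential merge instead: \(S(n) = S(n/2) + O(n) = O(n)\), giving parallelism \(O(n \log n)/O(n) = O(\log n)\).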

Note 9: ETH::2. Semester::PProg

Deck: ETH::2. Semester::PProg
Note Type: Horvath Classic
GUID: 3h>LP|0z)0
added

Previous

Note did not exist

New Note

Front

ETH::2._Semester::PProg::09._Divide_and_conquer::4._Java's_ForkJoin_framework
In a RecursiveTask, why should you fork() one subtask and compute() the other directly in the current thread (rather than forking both)?

Back

ETH::2._Semester::PProg::09._Divide_and_conquer::4._Java's_ForkJoin_framework
In a RecursiveTask, why should you fork() one subtask and compute() the other directly in the current thread (rather than forking both)?

Forking both sides creates two new tasks and leaves the current thread idle waiting for both. By calling compute() on one side directly, the current thread does useful work instead of idling — halving the number of tasks created and reducing scheduling overhead. Always join() the forked task after compute() finishes.
Field-by-field Comparison
Field Before After
Front In a RecursiveTask, why should you <code>fork()</code> one subtask and <code>compute()</code> the other directly in the current thread (rather than forking both)?
Back Forking both sides creates two new tasks and leaves the current thread idle waiting for both. By calling <code>compute()</code> on one side directly, the current thread does useful work instead of idling — halving the number of tasks created and reducing scheduling overhead. Always <code>join()</code> the forked task after <code>compute()</code> finishes.
Tags: ETH::2._Semester::PProg::09._Divide_and_conquer::4._Java's_ForkJoin_framework
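
A minimal sketch of the pattern (an array-sum task; the class name and cutoff are illustrative): fork the left half, compute the right half in the current thread, then join the forked task last.

    import java.util.concurrent.ForkJoinPool;
    import java.util.concurrent.RecursiveTask;

    class SumTask extends RecursiveTask<Long> {
        private static final int CUTOFF = 10_000; // sequential below this (cf. the granularity guideline in Note 20)
        private final long[] arr;
        private final int lo, hi;

        SumTask(long[] arr, int lo, int hi) { this.arr = arr; this.lo = lo; this.hi = hi; }

        @Override
        protected Long compute() {
            if (hi - lo <= CUTOFF) { // base case: enough work to do sequentially
                long sum = 0;
                for (int i = lo; i < hi; i++) sum += arr[i];
                return sum;
            }
            int mid = lo + (hi - lo) / 2;
            SumTask left = new SumTask(arr, lo, mid);
            left.fork();                                      // schedule the left half as a new task
            long right = new SumTask(arr, mid, hi).compute(); // current thread works on the right half
            return right + left.join();                       // join the forked task after computing
        }
    }

Usage: long total = ForkJoinPool.commonPool().invoke(new SumTask(data, 0, data.length));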

Note 10: ETH::2. Semester::PProg

Deck: ETH::2. Semester::PProg
Note Type: Horvath Cloze
GUID: 6X!+L
added

Previous

Note did not exist

New Note

Front

ETH::2._Semester::PProg::06._Parallel_Architecture::3._Pipelining
Pipeline throughput bound (infinite stream, one execution unit per stage) \(= {{c1::\dfrac{1}{\max_i(\text{stage\_time}_i)}}}\)

Back

ETH::2._Semester::PProg::06._Parallel_Architecture::3._Pipelining
Pipeline throughput bound (infinite stream, one execution unit per stage) \(= {{c1::\dfrac{1}{\max_i(\text{stage\_time}_i)}}}\)

Throughput is limited by the slowest (bottleneck) stage. For a balanced pipeline all stage times are equal, so throughput = 1/stage_time.
Field-by-field Comparison
Field Before After
Text Pipeline throughput bound (infinite stream, one execution unit per stage) \(= {{c1::\dfrac{1}{\max_i(\text{stage\_time}_i)}}}\)
Extra Throughput is limited by the slowest (bottleneck) stage. For a balanced pipeline all stage times are equal, so throughput = 1/stage_time.
Tags: ETH::2._Semester::PProg::06._Parallel_Architecture::3._Pipelining
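
A worked instance with hypothetical stage times of 1, 3 and 2 time units:

\(\text{throughput} \leq \dfrac{1}{\max(1, 3, 2)} = \dfrac{1}{3}\) elements per time unit — the 3-unit stage is the bottleneck, and the 1- and 2-unit stages sit partly idle.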

Note 11: ETH::2. Semester::PProg

Deck: ETH::2. Semester::PProg
Note Type: Horvath Classic
GUID: 6wa}o?][4W
added

Previous

Note did not exist

New Note

Front

ETH::2._Semester::PProg::07._Concepts::2._Parallel_performance
What does it mean for a parallel program to have efficiency E = 1 (100%), and when does efficiency approach 0?

Back

ETH::2._Semester::PProg::07._Concepts::2._Parallel_performance
What does it mean for a parallel program to have efficiency E = 1 (100%), and when does efficiency approach 0?

\(E = S_p / p = T_1 / (p \cdot T_p)\). E = 1 means \(T_p = T_1/p\) — every processor is doing useful work 100% of the time, no idle time, no overhead. Efficiency → 0 when: (1) p is very large (Amdahl: speedup saturates while p grows), or (2) overhead (synchronization, scheduling) dominates useful work.
Field-by-field Comparison
Field Before After
Front What does it mean for a parallel program to have efficiency E = 1 (100%), and when does efficiency approach 0?
Back \(E = S_p / p = T_1 / (p \cdot T_p)\). E = 1 means \(T_p = T_1/p\) — every processor is doing useful work 100% of the time, no idle time, no overhead. Efficiency → 0 when: (1) p is very large (Amdahl: speedup saturates while p grows), or (2) overhead (synchronization, scheduling) dominates useful work.
Tags: ETH::2._Semester::PProg::07._Concepts::2._Parallel_performance

Note 12: ETH::2. Semester::PProg

Deck: ETH::2. Semester::PProg
Note Type: Horvath Cloze
GUID:
added

Previous

Note did not exist

New Note

Front

ETH::2._Semester::PProg::11._Parallel_Algorithms::3._Parallel_algorithms
Pack has \(O(n)\) work and \(O(\log n)\) span.

Back

ETH::2._Semester::PProg::11._Parallel_Algorithms::3._Parallel_algorithms
Pack has \(O(n)\) work and \(O(\log n)\) span.

Each of the 3 steps (flag, prefix sum, scatter) is O(n) work and O(log n) span. The steps can be partially combined without changing asymptotic complexity.
Field-by-field Comparison
Field Before After
Text Pack has {{c1::\(O(n)\) work}} and {{c2::\(O(\log n)\) span}}.
Extra Each of the 3 steps (flag, prefix sum, scatter) is O(n) work and O(log n) span. The steps can be partially combined without changing asymptotic complexity.
Tags: ETH::2._Semester::PProg::11._Parallel_Algorithms::3._Parallel_algorithms

Note 13: ETH::2. Semester::PProg

Deck: ETH::2. Semester::PProg
Note Type: Horvath Cloze
GUID: ?aiVa43NDU
added

Previous

Note did not exist

New Note

Front

ETH::2._Semester::PProg::11._Parallel_Algorithms::2._Parallel_patterns
Summing an array of n elements in parallel has work \(O(n)\), span \(O(\log n)\), and parallelism \(O(n / \log n)\).

Back

ETH::2._Semester::PProg::11._Parallel_Algorithms::2._Parallel_patterns
Summing an array of n elements in parallel has work \(O(n)\), span \(O(\log n)\), and parallelism \(O(n / \log n)\).

Work = O(n) (same as sequential). Span = O(log n) (tree depth). Parallelism = work/span = O(n/log n), meaning you benefit from up to O(n/log n) processors before hitting the span bottleneck.
Field-by-field Comparison
Field Before After
Text Summing an array of n elements in parallel has work \({{c1::O(n)}}\), span \({{c2::O(\log n)}}\), and parallelism \({{c3::O(n / \log n)}}\).
Extra Work = O(n) (same as sequential). Span = O(log n) (tree depth). Parallelism = work/span = O(n/log n), meaning you benefit from up to O(n/log n) processors before hitting the span bottleneck.
Tags: ETH::2._Semester::PProg::11._Parallel_Algorithms::2._Parallel_patterns

Note 14: ETH::2. Semester::PProg

Deck: ETH::2. Semester::PProg
Note Type: Horvath Classic
GUID: ?knS$%
added

Previous

Note did not exist

New Note

Front

ETH::2._Semester::PProg::10._Parallel_Algorithms::1._Task_graphs_and_performance_measures
Why is the FJ work-stealing guarantee \(T_p = O(T_1/p + T_\infty)\) considered asymptotically optimal?

Back

ETH::2._Semester::PProg::10._Parallel_Algorithms::1._Task_graphs_and_performance_measures
Why is the FJ work-stealing guarantee \(T_p = O(T_1/p + T_\infty)\) considered asymptotically optimal?

The two terms match the fundamental lower bounds: \(T_p \geq T_1/p\) (work law) and \(T_p \geq T_\infty\) (span law). The FJ scheduler achieves \(O(T_1/p + T_\infty)\), which equals \(O(\max(T_1/p, T_\infty))\) up to a constant factor. No scheduler can do asymptotically better — FJ matches the lower bounds given by the work and span laws.
Field-by-field Comparison
Field Before After
Front Why is the FJ work-stealing guarantee \(T_p = O(T_1/p + T_\infty)\) considered asymptotically optimal?
Back The two terms match the fundamental lower bounds: \(T_p \geq T_1/p\) (work law) and \(T_p \geq T_\infty\) (span law). The FJ scheduler achieves \(O(T_1/p + T_\infty)\), which equals \(O(\max(T_1/p, T_\infty))\) up to a constant factor. <b>No scheduler can do asymptotically better</b> — FJ matches the lower bounds given by the work and span laws.
Tags: ETH::2._Semester::PProg::10._Parallel_Algorithms::1._Task_graphs_and_performance_measures

Note 15: ETH::2. Semester::PProg

Deck: ETH::2. Semester::PProg
Note Type: Horvath Cloze
GUID: @>I0NXoq2&
added

Previous

Note did not exist

New Note

Front

ETH::2._Semester::PProg::09._Divide_and_conquer::3._Executor_Service
Callable<V> differs from Runnable in that it returns a result of type V and can throw checked exceptions.

Back

ETH::2._Semester::PProg::09._Divide_and_conquer::3._Executor_Service
Callable<V> differs from Runnable in that it returns a result of type V and can throw checked exceptions.

exec.submit(callable) returns a Future<V>. exec.execute(runnable) is fire-and-forget (no return value).
Field-by-field Comparison
Field Before After
Text {{c1::<code>Callable&lt;V&gt;</code>}} differs from {{c2::<code>Runnable</code>}} in that it {{c3::returns a result of type V and can throw checked exceptions}}.
Extra <code>exec.submit(callable)</code> returns a <code>Future&lt;V&gt;</code>. <code>exec.execute(runnable)</code> is fire-and-forget (no return value).
Tags: ETH::2._Semester::PProg::09._Divide_and_conquer::3._Executor_Service
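
A minimal sketch contrasting the two interfaces (pool size and task bodies are illustrative):

    import java.util.concurrent.Callable;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    public class CallableDemo {
        public static void main(String[] args) throws Exception {
            ExecutorService exec = Executors.newFixedThreadPool(2);

            // Runnable: no result, no checked exceptions — fire-and-forget
            exec.execute(() -> System.out.println("side effect only"));

            // Callable<V>: returns a V and may throw checked exceptions
            Callable<Integer> task = () -> 6 * 7;
            Future<Integer> future = exec.submit(task);
            System.out.println(future.get()); // blocks until the result is ready: 42

            exec.shutdown();
        }
    }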

Note 16: ETH::2. Semester::PProg

Deck: ETH::2. Semester::PProg
Note Type: Horvath Classic
GUID: @G4j5S8izo
added

Previous

Note did not exist

New Note

Front

ETH::2._Semester::PProg::06._Parallel_Architecture::3._Pipelining
How can an unbalanced pipeline be fixed, and what is the caveat?

Back

ETH::2._Semester::PProg::06._Parallel_Architecture::3._Pipelining
How can an unbalanced pipeline be fixed, and what is the caveat?

Split the bottleneck (slowest) stage into two or more sub-stages of equal duration. Caveat: too much splitting is impractical — it introduces enormous switching/management overhead.
Field-by-field Comparison
Field Before After
Front How can an unbalanced pipeline be fixed, and what is the caveat?
Back Split the bottleneck (slowest) stage into two or more sub-stages of equal duration. Caveat: too much splitting is impractical — it introduces enormous switching/management overhead.
Tags: ETH::2._Semester::PProg::06._Parallel_Architecture::3._Pipelining

Note 17: ETH::2. Semester::PProg

Deck: ETH::2. Semester::PProg
Note Type: Horvath Cloze
GUID: @r1TbI1^$~
added

Previous

Note did not exist

New Note

Front

ETH::2._Semester::PProg::10._Parallel_Algorithms::1._Task_graphs_and_performance_measures
The span \(T_\infty\) in a DAG corresponds to the serial fraction f in Amdahl's Law: the longest chain of sequential dependencies that no amount of additional parallelism can overcome.

Back

ETH::2._Semester::PProg::10._Parallel_Algorithms::1._Task_graphs_and_performance_measures
The span \(T_\infty\) in a DAG corresponds to the serial fraction f in Amdahl's Law: the longest chain of sequential dependencies that no amount of additional parallelism can overcome.

Designing parallel algorithms means decreasing span without increasing work too much — directly equivalent to reducing f in Amdahl's Law.
Field-by-field Comparison
Field Before After
Text The span \(T_\infty\) in a DAG corresponds to {{c1::the serial fraction f in Amdahl's Law}}: the {{c2::longest chain of sequential dependencies}} that no amount of additional parallelism can overcome.
Extra Designing parallel algorithms means <b>decreasing span</b> without increasing work too much — directly equivalent to reducing f in Amdahl's Law.
Tags: ETH::2._Semester::PProg::10._Parallel_Algorithms::1._Task_graphs_and_performance_measures

Note 18: ETH::2. Semester::PProg

Deck: ETH::2. Semester::PProg
Note Type: Horvath Cloze
GUID: AYPiAlScH$
added

Previous

Note did not exist

New Note

Front

ETH::2._Semester::PProg::06._Parallel_Architecture::3._Pipelining
Latency of the first element through a pipeline \(= {{c1::\sum_{i} \text{time}(\text{stage}_i)}}\) (sum of all stage times).

Back

ETH::2._Semester::PProg::06._Parallel_Architecture::3._Pipelining
Latency of the first element through a pipeline \(= {{c1::\sum_{i} \text{time}(\text{stage}_i)}}\) (sum of all stage times).

"Latency" by default means the first element. For a balanced pipeline this equals num_stages × max(stage_time).
Field-by-field Comparison
Field Before After
Text Latency of the <b>first</b> element through a pipeline \(= {{c1::\sum_{i} \text{time}(\text{stage}_i)}}\) (sum of all stage times).
Extra "Latency" by default means the first element. For a balanced pipeline this equals num_stages × max(stage_time).
Tags: ETH::2._Semester::PProg::06._Parallel_Architecture::3._Pipelining

Note 19: ETH::2. Semester::PProg

Deck: ETH::2. Semester::PProg
Note Type: Horvath Classic
GUID: AeyQBG3vzh
added

Previous

Note did not exist

New Note

Front

ETH::2._Semester::PProg::05._Java_Threads
What is the difference between a race condition and a data race?

Back

ETH::2._Semester::PProg::05._Java_Threads
What is the difference between a race condition and a data race?

A data race is a low-level memory event: a shared variable is accessed by two threads, at least one access is a write, and there is no synchronization. A race condition is a higher-level correctness issue: the program's observable behavior depends on execution order. A data race usually causes a race condition, but a race condition can exist even without a data race (e.g. check-then-act with synchronized operations). In Java, data races are not undefined behavior in the C/C++ sense — the JMM still constrains racy programs, but gives them only very weak guarantees.
Field-by-field Comparison
Field Before After
Front What is the difference between a <b>race condition</b> and a <b>data race</b>?
Back A <b>data race</b> is a low-level memory event: a shared variable is accessed by two threads, at least one access is a write, and there is no synchronization. A <b>race condition</b> is a higher-level correctness issue: the program's observable behavior depends on execution order. A data race usually causes a race condition, but a race condition can exist even without a data race (e.g. check-then-act with synchronized operations). In Java, data races are not undefined behavior in the C/C++ sense — the JMM still constrains racy programs, but gives them only very weak guarantees.
Tags: ETH::2._Semester::PProg::05._Java_Threads
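
A minimal sketch of the check-then-act case mentioned above (class and method names are hypothetical): every access is synchronized, so there is no data race, yet the composite operation still has a race condition.

    public class Account {
        private int balance = 100;

        public synchronized int getBalance() { return balance; }
        public synchronized void withdraw(int amount) { balance -= amount; }

        // No data race (each access is synchronized), but a race condition remains:
        // another thread can withdraw between the check and the act,
        // so the balance can still go negative.
        public void withdrawIfCovered(int amount) {
            if (getBalance() >= amount) { // check
                withdraw(amount);         // act — not atomic together with the check
            }
        }
    }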

Note 20: ETH::2. Semester::PProg

Deck: ETH::2. Semester::PProg
Note Type: Horvath Cloze
GUID: E-#>p+?}Ok
added

Previous

Note did not exist

New Note

Front

ETH::2._Semester::PProg::08._Divide_and_conquer::2._Divide_and_conquer
In Fork/Join, each task should ideally perform between 1,000 and 100,000 sequential operations to ensure the task has enough work to offset scheduling overhead.

Back

ETH::2._Semester::PProg::08._Divide_and_conquer::2._Divide_and_conquer
In Fork/Join, each task should ideally perform between 1,000 and 100,000 sequential operations to ensure the task has enough work to offset scheduling overhead.

Too small → overhead dominates. Too large → poor load balancing and under-utilization. The exact threshold is empirical.
Field-by-field Comparison
Field Before After
Text In Fork/Join, each task should ideally perform between {{c1::1,000 and 100,000}} sequential operations to ensure {{c2::the task has enough work to offset scheduling overhead}}.
Extra Too small → overhead dominates. Too large → poor load balancing and under-utilization. The exact threshold is empirical.
Tags: ETH::2._Semester::PProg::08._Divide_and_conquer::2._Divide_and_conquer

Note 21: ETH::2. Semester::PProg

Deck: ETH::2. Semester::PProg
Note Type: Horvath Cloze
GUID: EdVyAbz-qv
added

Previous

Note did not exist

New Note

Front

ETH::2._Semester::PProg::12._Shared_Memory_Concurrency
java.util.concurrent.atomic.AtomicInteger provides atomic compound operations such as getAndIncrement() and compareAndSet(expected, update) without requiring synchronized.

Back

ETH::2._Semester::PProg::12._Shared_Memory_Concurrency
java.util.concurrent.atomic.AtomicInteger provides atomic compound operations such as getAndIncrement() and compareAndSet(expected, update) without requiring synchronized.

Implemented via hardware CAS (compare-and-swap). Faster than synchronized for simple counters. Part of java.util.concurrent.atomic.
Field-by-field Comparison
Field Before After
Text <code>java.util.concurrent.atomic.{{c1::AtomicInteger}}</code> provides {{c2::atomic compound operations}} such as <code>{{c3::getAndIncrement()}}</code> and <code>{{c4::compareAndSet(expected, update)}}</code> without requiring <code>synchronized</code>.
Extra Implemented via hardware CAS (compare-and-swap). Faster than <code>synchronized</code> for simple counters. Part of <code>java.util.concurrent.atomic</code>.
Tags: ETH::2._Semester::PProg::12._Shared_Memory_Concurrency
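
A minimal sketch (the class name is illustrative) of the two operations named in the note, including the CAS retry loop that underlies most compound atomic updates:

    import java.util.concurrent.atomic.AtomicInteger;

    public class AtomicCounter {
        private final AtomicInteger count = new AtomicInteger(0);

        public void increment() {
            count.getAndIncrement(); // atomic read-modify-write, no lock required
        }

        // Bounded add via compareAndSet: retry until no other thread interferes.
        public boolean addIfBelow(int limit, int delta) {
            while (true) {
                int current = count.get();
                if (current >= limit) return false;
                if (count.compareAndSet(current, current + delta)) return true;
            }
        }
    }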

Note 22: ETH::2. Semester::PProg

Deck: ETH::2. Semester::PProg
Note Type: Horvath Classic
GUID: F7x(&s]1kQ
added

Previous

Note did not exist

New Note

Front

ETH::2._Semester::PProg::10._Parallel_Algorithms::1._Task_graphs_and_performance_measures
What does the upper bound \(T_p \leq T_1/p + O(T_\infty)\) tell us about scheduling?

Back

ETH::2._Semester::PProg::10._Parallel_Algorithms::1._Task_graphs_and_performance_measures
What does the upper bound \(T_p \leq T_1/p + O(T_\infty)\) tell us about scheduling?

A good scheduler can always achieve time within a constant factor of the ideal. The two terms represent two different sources of time: \(T_1/p\) is the work-limited regime, and \(O(T_\infty)\) is the unavoidable sequential overhead from dependencies (span-limited regime). Their sum is an upper bound — a good scheduler needs no more than roughly their sum.
Field-by-field Comparison
Field Before After
Front What does the upper bound \(T_p \leq T_1/p + O(T_\infty)\) tell us about scheduling?
Back A <b>good scheduler</b> can always achieve time within a constant factor of the ideal. The two terms represent two different sources of time: \(T_1/p\) is the work-limited regime, and \(O(T_\infty)\) is the unavoidable sequential overhead from dependencies (span-limited regime). Their sum is an upper bound — a good scheduler needs no more than roughly their sum.
Tags: ETH::2._Semester::PProg::10._Parallel_Algorithms::1._Task_graphs_and_performance_measures

Note 23: ETH::2. Semester::PProg

Deck: ETH::2. Semester::PProg
Note Type: Horvath Cloze
GUID: GH&YPQRsTh
added

Previous

Note did not exist

New Note

Front

ETH::2._Semester::PProg::06._Parallel_Architecture::3._Pipelining
Lead-in and lead-out reduce effective throughput below the theoretical bound for a finite number of elements N. Their impact is largest when N is small (short pipelines).

Back

ETH::2._Semester::PProg::06._Parallel_Architecture::3._Pipelining
Lead-in and lead-out reduce effective throughput below the theoretical bound for a finite number of elements N. Their impact is largest when N is small (short pipelines).

Example: for a finite run, effective throughput was 0.7 while the infinite-stream bound was 1. As N → ∞, effective throughput approaches 1/max(stage_time).
Field-by-field Comparison
Field Before After
Text Lead-in and lead-out {{c1::reduce effective throughput below the theoretical bound}} for {{c2::a finite number of elements N}}. Their impact is largest when {{c3::N is small (short pipelines)}}.
Extra Example: for a finite run, effective throughput was 0.7 while the infinite-stream bound was 1. As N → ∞, effective throughput approaches 1/max(stage_time).
Tags: ETH::2._Semester::PProg::06._Parallel_Architecture::3._Pipelining

Note 24: ETH::2. Semester::PProg

Deck: ETH::2. Semester::PProg
Note Type: Horvath Classic
GUID: H*>f
added

Previous

Note did not exist

New Note

Front

ETH::2._Semester::PProg::12._Shared_Memory_Concurrency
What advantages does ReentrantLock offer over synchronized?

Back

ETH::2._Semester::PProg::12._Shared_Memory_Concurrency
What advantages does ReentrantLock offer over synchronized?

  • tryLock() — non-blocking attempt (returns boolean)
  • tryLock(timeout, unit) — timed lock attempt
  • lockInterruptibly() — interruptible while waiting
  • Multiple Condition objects per lock (vs. one monitor per synchronized object)
  • More flexible critical section shapes (lock and unlock in different scopes)
⚠️ Must always call unlock() in a finally block to avoid leaked locks.
Field-by-field Comparison
Field Before After
Front What advantages does <code>ReentrantLock</code> offer over <code>synchronized</code>?
Back <ul><li><code>tryLock()</code> — non-blocking attempt (returns boolean)</li><li><code>tryLock(timeout, unit)</code> — timed lock attempt</li><li><code>lockInterruptibly()</code> — interruptible while waiting</li><li>Multiple <code>Condition</code> objects per lock (vs. one monitor per <code>synchronized</code> object)</li><li>More flexible critical section shapes (lock and unlock in different scopes)</li></ul>⚠️ Must always call <code>unlock()</code> in a <code>finally</code> block to avoid leaked locks.
Tags: ETH::2._Semester::PProg::12._Shared_Memory_Concurrency
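
A minimal sketch of the unlock-in-finally discipline and of tryLock (names are illustrative):

    import java.util.concurrent.locks.ReentrantLock;

    public class LockedCounter {
        private final ReentrantLock lock = new ReentrantLock();
        private int count = 0;

        public void increment() {
            lock.lock();
            try {
                count++;
            } finally {
                lock.unlock(); // always in finally — an exception must not leak the lock
            }
        }

        // Non-blocking attempt: returns immediately if the lock is busy.
        public boolean tryIncrement() {
            if (!lock.tryLock()) return false;
            try {
                count++;
                return true;
            } finally {
                lock.unlock();
            }
        }
    }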

Note 25: ETH::2. Semester::PProg

Deck: ETH::2. Semester::PProg
Note Type: Horvath Classic
GUID: KD1&||&6xW
added

Previous

Note did not exist

New Note

Front

ETH::2._Semester::PProg::10._Parallel_Algorithms::1._Task_graphs_and_performance_measures
Why is \(T_p \geq T_1/p\) a strict lower bound — why can no scheduler beat it?

Back

ETH::2._Semester::PProg::10._Parallel_Algorithms::1._Task_graphs_and_performance_measures
Why is \(T_p \geq T_1/p\) a strict lower bound — why can no scheduler beat it?

\(T_1\) is the total work (sum of all node costs in the DAG). With p processors, at most p units of work can be done per time step. Therefore the minimum time to do \(T_1\) units of work is \(T_1/p\). This is a counting argument — independent of the scheduler's strategy.
Field-by-field Comparison
Field Before After
Front Why is \(T_p \geq T_1/p\) a strict lower bound — why can no scheduler beat it?
Back \(T_1\) is the total work (sum of all node costs in the DAG). With p processors, at most p units of work can be done per time step. Therefore the minimum time to do \(T_1\) units of work is \(T_1/p\). This is a counting argument — independent of the scheduler's strategy.
Tags: ETH::2._Semester::PProg::10._Parallel_Algorithms::1._Task_graphs_and_performance_measures

Note 26: ETH::2. Semester::PProg

Deck: ETH::2. Semester::PProg
Note Type: Horvath Classic
GUID: KqO7gK}Fu/
added

Previous

Note did not exist

New Note

Front

ETH::2._Semester::PProg::05._Java_Threads::3._Wait,_Notify,_NotifyAll
When should you prefer notifyAll() over notify()?

Back

ETH::2._Semester::PProg::05._Java_Threads::3._Wait,_Notify,_NotifyAll
When should you prefer notifyAll() over notify()?

Prefer notifyAll() when multiple threads may be waiting and they have different conditions to wait for. notify() wakes only one (arbitrary) thread — if it wakes a thread whose condition is still false, it goes back to sleep and no progress is made. notifyAll() wakes all waiters; each re-checks its condition in its while loop. Cost: more context switches. Safe default: always use notifyAll() unless you have a single-condition, single-consumer pattern.
Field-by-field Comparison
Field Before After
Front When should you prefer <code>notifyAll()</code> over <code>notify()</code>?
Back Prefer <code>notifyAll()</code> when <b>multiple threads</b> may be waiting and they have <b>different conditions</b> to wait for. <code>notify()</code> wakes only one (arbitrary) thread — if it wakes a thread whose condition is still false, it goes back to sleep and no progress is made. <code>notifyAll()</code> wakes all waiters; each re-checks its condition in its <code>while</code> loop. Cost: more context switches. Safe default: always use <code>notifyAll()</code> unless you have a single-condition, single-consumer pattern.
Tags: ETH::2._Semester::PProg::05._Java_Threads::3._Wait,_Notify,_NotifyAll

Note 27: ETH::2. Semester::PProg

Deck: ETH::2. Semester::PProg
Note Type: Horvath Classic
GUID: Lj=X5W/02X
added

Previous

Note did not exist

New Note

Front

ETH::2._Semester::PProg::11._Parallel_Algorithms::3._Parallel_algorithms
What does each node store after the up pass (reduce pass) of the parallel prefix-sum algorithm?

Back

ETH::2._Semester::PProg::11._Parallel_Algorithms::3._Parallel_algorithms
What does each node store after the up pass (reduce pass) of the parallel prefix-sum algorithm?

Each internal node stores the sum of all leaves in its subtree. Leaves store the original values. This pass is a parallel tree reduction — done bottom-up with O(n) work and O(log n) span.
Field-by-field Comparison
Field Before After
Front What does each node store after the <b>up pass</b> (reduce pass) of the parallel prefix-sum algorithm?
Back Each internal node stores the <b>sum of all leaves in its subtree</b>. Leaves store the original values. This pass is a parallel tree reduction — done bottom-up with O(n) work and O(log n) span.
Tags: ETH::2._Semester::PProg::11._Parallel_Algorithms::3._Parallel_algorithms

Note 28: ETH::2. Semester::PProg

Deck: ETH::2. Semester::PProg
Note Type: Horvath Cloze
GUID: NLVo5v1@Dy
added

Previous

Note did not exist

New Note

Front

ETH::2._Semester::PProg::11._Parallel_Algorithms::3._Parallel_algorithms
Pack (also called filter) takes a collection and a predicate and returns a new array containing only the elements satisfying the predicate, preserving their original relative order.

Back

ETH::2._Semester::PProg::11._Parallel_Algorithms::3._Parallel_algorithms
Pack (also called filter) takes a collection and a predicate and returns a new array containing only the elements satisfying the predicate, preserving their original relative order.

Example: pack([3,1,4,1,5], x > 2) → [3,4,5]. This is the parallel equivalent of a sequential filter loop.
Field-by-field Comparison
Field Before After
Text {{c1::Pack}} (also called filter) takes a collection and a predicate and returns {{c2::a new array containing only the elements satisfying the predicate, preserving their original relative order}}.
Extra Example: pack([3,1,4,1,5], x &gt; 2) → [3,4,5]. This is the parallel equivalent of a sequential filter loop.
Tags: ETH::2._Semester::PProg::11._Parallel_Algorithms::3._Parallel_algorithms

Note 29: ETH::2. Semester::PProg

Deck: ETH::2. Semester::PProg
Note Type: Horvath Cloze
GUID: Nc)fu4)D&q
added

Previous

Note did not exist

New Note

Front

ETH::2._Semester::PProg::05._Java_Threads::3._Wait,_Notify,_NotifyAll
Calling wait(), notify(), or notifyAll() on an object without holding its lock throws IllegalMonitorStateException at runtime.

Back

ETH::2._Semester::PProg::05._Java_Threads::3._Wait,_Notify,_NotifyAll
Calling wait(), notify(), or notifyAll() on an object without holding its lock throws IllegalMonitorStateException at runtime.

The check is runtime-only — the compiler does not catch this. Always ensure these calls are inside a synchronized(obj) block or a synchronized method on the same object.
Field-by-field Comparison
Field Before After
Text Calling <code>wait()</code>, <code>notify()</code>, or <code>notifyAll()</code> on an object without holding its lock throws {{c1::<code>IllegalMonitorStateException</code>}} at runtime.
Extra The check is runtime-only — the compiler does not catch this. Always ensure these calls are inside a <code>synchronized(obj)</code> block or a <code>synchronized</code> method on the same object.
Tags: ETH::2._Semester::PProg::05._Java_Threads::3._Wait,_Notify,_NotifyAll

Note 30: ETH::2. Semester::PProg

Deck: ETH::2. Semester::PProg
Note Type: Horvath Cloze
GUID: O>b+4REo=(
added

Previous

Note did not exist

New Note

Front

ETH::2._Semester::PProg::06._Parallel_Architecture::2._Instruction_Level_Parallelism
Instruction-level parallelism (ILP) uses two techniques: dependency analysis (execute independent instructions in parallel, reorder to keep the CPU busy) and speculative execution (execute a branch before the condition is resolved, roll back if wrong).

Back

ETH::2._Semester::PProg::06._Parallel_Architecture::2._Instruction_Level_Parallelism
Instruction-level parallelism (ILP) uses two techniques: dependency analysis (execute independent instructions in parallel, reorder to keep the CPU busy) and speculative execution (execute a branch before the condition is resolved, roll back if wrong).

ILP happens on a single thread — no concurrency. It is hardware-dependent (requires superscalar execution units).
Field-by-field Comparison
Field Before After
Text Instruction-level parallelism (ILP) uses two techniques: {{c1::dependency analysis}} (execute independent instructions in parallel, reorder to keep the CPU busy) and {{c2::speculative execution}} (execute a branch before the condition is resolved, roll back if wrong).
Extra ILP happens on a <b>single thread</b> — no concurrency. It is hardware-dependent (requires superscalar execution units).
Tags: ETH::2._Semester::PProg::06._Parallel_Architecture::2._Instruction_Level_Parallelism

Note 31: ETH::2. Semester::PProg

Deck: ETH::2. Semester::PProg
Note Type: Horvath Classic
GUID: OG4szVeI~<
added

Previous

Note did not exist

New Note

Front

ETH::2._Semester::PProg::11._Parallel_Algorithms::3._Parallel_algorithms
Why does parallel prefix sum require two passes (up then down) rather than one?

Back

ETH::2._Semester::PProg::11._Parallel_Algorithms::3._Parallel_algorithms
Why does parallel prefix sum require two passes (up then down) rather than one?

The up pass computes subtree sums — it doesn't know what came "before" each element globally. The down pass distributes accumulated sums from the root downward, so each leaf learns the total sum of all elements to its left. No single pass can do both simultaneously because each node needs both its subtree total (from children) and the prefix from its ancestors.
Field-by-field Comparison
Field Before After
Front Why does parallel prefix sum require two passes (up then down) rather than one?
Back The up pass computes subtree sums — it doesn't know what came "before" each element globally. The down pass distributes accumulated sums from the root downward, so each leaf learns the total sum of all elements to its left. No single pass can do both simultaneously because each node needs both its subtree total (from children) and the prefix from its ancestors.
Tags: ETH::2._Semester::PProg::11._Parallel_Algorithms::3._Parallel_algorithms
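
A sketch of the two passes, written sequentially for brevity (in ForkJoin each pair of recursive calls would be a fork/compute/join; all names are illustrative):

    class PrefixNode {
        int lo, hi, sum;       // the range [lo, hi) and the sum of its subtree
        PrefixNode left, right;
    }

    public class PrefixSum {
        // Up pass: build the tree bottom-up; each node stores the sum of its range.
        static PrefixNode up(int[] a, int lo, int hi) {
            PrefixNode n = new PrefixNode();
            n.lo = lo; n.hi = hi;
            if (hi - lo == 1) {
                n.sum = a[lo];
            } else {
                int mid = (lo + hi) / 2;
                n.left = up(a, lo, mid);   // fork in the parallel version
                n.right = up(a, mid, hi);  // compute in the parallel version
                n.sum = n.left.sum + n.right.sum;
            }
            return n;
        }

        // Down pass: fromLeft = sum of all elements to the left of this subtree.
        static void down(PrefixNode n, int fromLeft, int[] a, int[] out) {
            if (n.hi - n.lo == 1) {
                out[n.lo] = fromLeft + a[n.lo]; // inclusive prefix sum at the leaf
            } else {
                down(n.left, fromLeft, a, out);
                down(n.right, fromLeft + n.left.sum, a, out);
            }
        }

        public static int[] prefixSums(int[] a) {
            int[] out = new int[a.length]; // assumes a.length >= 1
            down(up(a, 0, a.length), 0, a, out);
            return out; // e.g. [1, 2, 3, 4] -> [1, 3, 6, 10]
        }
    }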

Note 32: ETH::2. Semester::PProg

Deck: ETH::2. Semester::PProg
Note Type: Horvath Cloze
GUID: Q$hlo>kuR^
added

Previous

Note did not exist

New Note

Front

ETH::2._Semester::PProg::06._Parallel_Architecture::3._Pipelining
Latency of the n-th element in an unbalanced pipeline \(= {{c1::\text{latency}_1 + \bigl(\max(\text{stage\_time}) - \text{time}(\text{stage}_1)\bigr) \cdot (n-1)}}\)

Back

ETH::2._Semester::PProg::06._Parallel_Architecture::3._Pipelining
Latency of the n-th element in an unbalanced pipeline \(= {{c1::\text{latency}_1 + \bigl(\max(\text{stage\_time}) - \text{time}(\text{stage}_1)\bigr) \cdot (n-1)}}\)

Latency grows linearly with n when the pipeline is unbalanced — elements pile up at the bottleneck. If balanced, this simplifies to latency₁ (constant).
Field-by-field Comparison
Field Before After
Text Latency of the <b>n-th</b> element in an unbalanced pipeline \(= {{c1::\text{latency}_1 + \bigl(\max(\text{stage\_time}) - \text{time}(\text{stage}_1)\bigr) \cdot (n-1)}}\)
Extra Latency grows linearly with n when the pipeline is unbalanced — elements pile up at the bottleneck. If balanced, this simplifies to latency₁ (constant).
Tags: ETH::2._Semester::PProg::06._Parallel_Architecture::3._Pipelining

Note 33: ETH::2. Semester::PProg

Deck: ETH::2. Semester::PProg
Note Type: Horvath Classic
GUID: Q({*qU*f^i
added

Previous

Note did not exist

New Note

Front

ETH::2._Semester::PProg::06._Parallel_Architecture::3._Pipelining
What are the lead-in and lead-out phases of a pipeline?

Back

ETH::2._Semester::PProg::06._Parallel_Architecture::3._Pipelining
What are the lead-in and lead-out phases of a pipeline?

Lead-in (warm-up): the pipeline is filling — earlier stages have started but later stages are still idle. Not all stages are doing useful work yet.
Lead-out (cool-down): the pipeline is draining — earlier stages have finished but later stages are still processing. Earlier stages become idle.
Field-by-field Comparison
Field Before After
Front What are the <b>lead-in</b> and <b>lead-out</b> phases of a pipeline?
Back <b>Lead-in</b> (warm-up): the pipeline is filling — earlier stages have started but later stages are still idle. Not all stages are doing useful work yet.<br><b>Lead-out</b> (cool-down): the pipeline is draining — earlier stages have finished but later stages are still processing. Earlier stages become idle.
Tags: ETH::2._Semester::PProg::06._Parallel_Architecture::3._Pipelining

Note 34: ETH::2. Semester::PProg

Deck: ETH::2. Semester::PProg
Note Type: Horvath Cloze
GUID: QiCwzwdx4X
added

Previous

Note did not exist

New Note

Front

ETH::2._Semester::PProg::09._Divide_and_conquer::4._Java's_ForkJoin_framework
In Java ForkJoin, RecursiveTask<V> is used when the task returns a result of type V, while RecursiveAction is used when no result is returned (void).

Back

ETH::2._Semester::PProg::09._Divide_and_conquer::4._Java's_ForkJoin_framework
In Java ForkJoin, RecursiveTask<V> is used when the task returns a result of type V, while RecursiveAction is used when no result is returned (void).

Both extend ForkJoinTask and override compute(). Use RecursiveTask for sum/merge, RecursiveAction for in-place sort.
Field-by-field Comparison
Field Before After
Text In Java ForkJoin, {{c1::<code>RecursiveTask&lt;V&gt;</code>}} is used when the task {{c2::returns a result of type V}}, while {{c3::<code>RecursiveAction</code>}} is used when {{c4::no result is returned (void)}}.
Extra Both extend <code>ForkJoinTask</code> and override <code>compute()</code>. Use <code>RecursiveTask</code> for sum/merge, <code>RecursiveAction</code> for in-place sort.
Tags: ETH::2._Semester::PProg::09._Divide_and_conquer::4._Java's_ForkJoin_framework

Note 35: ETH::2. Semester::PProg

Deck: ETH::2. Semester::PProg
Note Type: Horvath Cloze
GUID: RJC9+1Wgmi
added

Previous

Note did not exist

New Note

Front

ETH::2._Semester::PProg::09._Divide_and_conquer::3._Executor_Service
exec.shutdown() stops accepting new tasks but lets already-submitted tasks finish. exec.shutdownNow() attempts to interrupt all running tasks and returns the list of unstarted tasks.

Back

ETH::2._Semester::PProg::09._Divide_and_conquer::3._Executor_Service
exec.shutdown() stops accepting new tasks but lets already-submitted tasks finish. exec.shutdownNow() attempts to interrupt all running tasks and returns the list of unstarted tasks.

Use awaitTermination(timeout, unit) after shutdown() to block until all tasks complete or the timeout expires.
Field-by-field Comparison
Field Before After
Text <code>exec.{{c1::shutdown()}}</code> {{c2::stops accepting new tasks but lets already-submitted tasks finish}}. <code>exec.{{c3::shutdownNow()}}</code> {{c4::attempts to interrupt all running tasks and returns the list of unstarted tasks}}.
Extra Use <code>awaitTermination(timeout, unit)</code> after <code>shutdown()</code> to block until all tasks complete or the timeout expires.
Tags: ETH::2._Semester::PProg::09._Divide_and_conquer::3._Executor_Service
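
A minimal sketch of the graceful-shutdown idiom (pool size and timeout are illustrative):

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    public class ShutdownDemo {
        public static void main(String[] args) throws InterruptedException {
            ExecutorService exec = Executors.newFixedThreadPool(4);
            for (int i = 0; i < 10; i++) {
                final int id = i;
                exec.submit(() -> System.out.println("task " + id));
            }
            exec.shutdown(); // stop accepting new tasks; queued tasks still run
            if (!exec.awaitTermination(5, TimeUnit.SECONDS)) {
                exec.shutdownNow(); // timed out: interrupt running tasks, drop the queue
            }
        }
    }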

Note 36: ETH::2. Semester::PProg

Deck: ETH::2. Semester::PProg
Note Type: Horvath Classic
GUID: TAzr]%>^o%
added

Previous

Note did not exist

New Note

Front

ETH::2._Semester::PProg::09._Divide_and_conquer::3._Executor_Service
Name the four standard ExecutorService pool types and their key characteristics.

Back

ETH::2._Semester::PProg::09._Divide_and_conquer::3._Executor_Service
Name the four standard ExecutorService pool types and their key characteristics.

  • newFixedThreadPool(n) — fixed n threads; queues excess tasks
  • newSingleThreadExecutor() — exactly 1 thread; tasks execute sequentially
  • newCachedThreadPool() — elastic; creates threads on demand, reuses idle ones
  • newScheduledThreadPool(n) — supports delayed and periodic tasks
Field-by-field Comparison
Field Before After
Front Name the four standard <code>ExecutorService</code> pool types and their key characteristics.
Back <ul><li><code>newFixedThreadPool(n)</code> — fixed n threads; queues excess tasks</li><li><code>newSingleThreadExecutor()</code> — exactly 1 thread; tasks execute sequentially</li><li><code>newCachedThreadPool()</code> — elastic; creates threads on demand, reuses idle ones</li><li><code>newScheduledThreadPool(n)</code> — supports delayed and periodic tasks</li></ul>
Tags: ETH::2._Semester::PProg::09._Divide_and_conquer::3._Executor_Service

Note 37: ETH::2. Semester::PProg

Deck: ETH::2. Semester::PProg
Note Type: Horvath Cloze
GUID: Tc|J+GdHu-
added

Previous

Note did not exist

New Note

Front

ETH::2._Semester::PProg::12._Shared_Memory_Concurrency
The Java volatile keyword guarantees visibility — every read of a volatile field sees the most recent write by any thread — but does not guarantee atomicity of compound operations (e.g. i++).

Back

ETH::2._Semester::PProg::12._Shared_Memory_Concurrency
The Java volatile keyword guarantees visibility — every read of a volatile field sees the most recent write by any thread — but does not guarantee atomicity of compound operations (e.g. i++).

Use volatile for simple flags (e.g. volatile boolean running). For compound operations, use synchronized or AtomicInteger.
Field-by-field Comparison
Field Before After
Text The Java <code>volatile</code> keyword guarantees {{c1::visibility}} — every read of a volatile field sees the {{c2::most recent write by any thread}} — but does <b>not</b> guarantee {{c3::atomicity of compound operations (e.g. i++)}}.
Extra Use <code>volatile</code> for simple flags (e.g. <code>volatile boolean running</code>). For compound operations, use <code>synchronized</code> or <code>AtomicInteger</code>.
Tags: ETH::2._Semester::PProg::12._Shared_Memory_Concurrency
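
A minimal sketch of the stop-flag idiom (the class name is illustrative); without volatile, the worker loop may never observe the writer's update:

    public class Worker implements Runnable {
        // volatile: the write in stop() becomes visible to the reading thread.
        // It does NOT make compound updates like count++ atomic.
        private volatile boolean running = true;

        public void stop() { running = false; } // called from another thread

        @Override
        public void run() {
            while (running) {
                // do work
            }
        }
    }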

Note 38: ETH::2. Semester::PProg

Deck: ETH::2. Semester::PProg
Note Type: Horvath Classic
GUID: U/}jSkoobw
added

Previous

Note did not exist

New Note

Front

ETH::2._Semester::PProg::07._Concepts::2._Parallel_performance
Is it possible for \(T_p < T_1/p\)? What does this mean and what causes it?

Back

ETH::2._Semester::PProg::07._Concepts::2._Parallel_performance
Is it possible for \(T_p < T_1/p\)? What does this mean and what causes it?

Yes — this is super-linear speedup (\(S_p > p\)). Causes: (1) Cache effects — smaller per-thread working sets fit in L1/L2 cache, making memory access faster than in the sequential run which had cache misses. (2) Algorithmic differences — the parallel version may explore the problem space differently. Not considered 'magic' — it has a real explanation.
Field-by-field Comparison
Field Before After
Front Is it possible for \(T_p < T_1/p\)? What does this mean and what causes it?
Back Yes — this is <b>super-linear speedup</b> (\(S_p > p\)). Causes: (1) <b>Cache effects</b> — smaller per-thread working sets fit in L1/L2 cache, making memory access faster than in the sequential run which had cache misses. (2) <b>Algorithmic differences</b> — the parallel version may explore the problem space differently. Not considered 'magic' — it has a real explanation.
Tags: ETH::2._Semester::PProg::07._Concepts::2._Parallel_performance

Note 39: ETH::2. Semester::PProg

Deck: ETH::2. Semester::PProg
Note Type: Horvath Cloze
GUID: VOg3W+qe2?
added

Previous

Note did not exist

New Note

Front

ETH::2._Semester::PProg::05._Java_Threads::2._synchronized
If an exception is thrown inside a synchronized block, the lock is automatically released when the exception propagates out of the block.

Back

ETH::2._Semester::PProg::05._Java_Threads::2._synchronized
If an exception is thrown inside a synchronized block, the lock is automatically released when the exception propagates out of the block.

This is because synchronized is structured (unlike ReentrantLock which requires explicit finally). The JVM guarantees lock release on any exit from the block, including exceptions.
Field-by-field Comparison
Field Before After
Text If an exception is thrown inside a <code>synchronized</code> block, the lock is {{c1::automatically released}} when the exception propagates out of the block.
Extra This is because <code>synchronized</code> is structured (unlike <code>ReentrantLock</code> which requires explicit <code>finally</code>). The JVM guarantees lock release on any exit from the block, including exceptions.
Tags: ETH::2._Semester::PProg::05._Java_Threads::2._synchronized

Note 40: ETH::2. Semester::PProg

Deck: ETH::2. Semester::PProg
Note Type: Horvath Classic
GUID: WSSC!D36#|
added

Previous

Note did not exist

New Note

Front

ETH::2._Semester::PProg::06._Parallel_Architecture::3._Pipelining
What is a pipeline (in CPU instruction execution)?

Back

ETH::2._Semester::PProg::06._Parallel_Architecture::3._Pipelining
What is a pipeline (in CPU instruction execution)?

A pipeline breaks instruction execution into multiple stages (e.g. fetch, decode, execute, write-back) that are overlapped: while stage k processes instruction n+1, stage k+1 processes instruction n. Multiple instructions are in-flight simultaneously — like a factory assembly line.
Field-by-field Comparison
Field Before After
Front What is a pipeline (in CPU instruction execution)?
Back A pipeline breaks instruction execution into multiple stages (e.g. fetch, decode, execute, write-back) that are overlapped: while stage k processes instruction n+1, stage k+1 processes instruction n. Multiple instructions are in-flight simultaneously — like a factory assembly line.
Tags: ETH::2._Semester::PProg::06._Parallel_Architecture::3._Pipelining

Note 41: ETH::2. Semester::PProg

Deck: ETH::2. Semester::PProg
Note Type: Horvath Cloze
GUID: XKz4m$d8-k
added

Previous

Note did not exist

New Note

Front

ETH::2._Semester::PProg::07._Concepts::2._Parallel_performance
Under Gustafson's Law, \(\lim_{P \to \infty} S_P = \infty\), because \(S_P = P - f(P-1)\) grows linearly with P (assuming f < 1).

Back

ETH::2._Semester::PProg::07._Concepts::2._Parallel_performance
Under Gustafson's Law, \(\lim_{P \to \infty} S_P = \infty\), because \(S_P = P - f(P-1)\) grows linearly with P (assuming f < 1).

This does not contradict Amdahl — Gustafson scales the problem size with P (fixed wall time), so the serial fraction f is a fixed fraction of the larger workload, not a fixed amount of work.
Field-by-field Comparison
Field Before After
Text Under Gustafson's Law, \(\lim_{P \to \infty} S_P = {{c1::\infty}}\), because {{c2::\(S_P = P - f(P-1)\) grows linearly with P}} (assuming f &lt; 1).
Extra This does not contradict Amdahl — Gustafson scales the <b>problem size</b> with P (fixed wall time), so the serial fraction f is a fixed fraction of the larger workload, not a fixed amount of work.
Tags: ETH::2._Semester::PProg::07._Concepts::2._Parallel_performance
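
A worked instance with a hypothetical \(f = 0.1\):

\(S_8 = 8 - 0.1 \cdot 7 = 7.3\) and \(S_{100} = 100 - 0.1 \cdot 99 = 90.1\)

The speedup keeps growing linearly with P, whereas Amdahl with the same f caps it at \(1/f = 10\).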

Note 42: ETH::2. Semester::PProg

Deck: ETH::2. Semester::PProg
Note Type: Horvath Cloze
GUID: XZFu@~1-gN
added

Previous

Note did not exist

New Note

Front

ETH::2._Semester::PProg::05._Java_Threads::3._Wait,_Notify,_NotifyAll
After calling notifyAll(), the calling thread does not release the lock immediately — it only releases it when it exits the synchronized block or calls wait().

Back

ETH::2._Semester::PProg::05._Java_Threads::3._Wait,_Notify,_NotifyAll
After calling notifyAll(), the calling thread does not release the lock immediately — it only releases it when it exits the synchronized block or calls wait().

All woken threads move to RUNNABLE state and compete for the lock, but only one can acquire it at a time. The notifying thread still holds the lock until it finishes its synchronized section.
Field-by-field Comparison
Field Before After
Text After calling <code>notifyAll()</code>, the calling thread {{c1::does not release the lock immediately}} — it only releases it when {{c2::it exits the <code>synchronized</code> block or calls <code>wait()</code>}}.
Extra All woken threads move to RUNNABLE state and compete for the lock, but only one can acquire it at a time. The notifying thread still holds the lock until it finishes its synchronized section.
Tags: ETH::2._Semester::PProg::05._Java_Threads::3._Wait,_Notify,_NotifyAll

Note 43: ETH::2. Semester::PProg

Deck: ETH::2. Semester::PProg
Note Type: Horvath Cloze
GUID: Y!RdYt+6Bf
added

Previous

Note did not exist

New Note

Front

ETH::2._Semester::PProg::10._Parallel_Algorithms::1._Task_graphs_and_performance_measures
The Fork/Join work-stealing guarantee \(T_p = O(T_1/p + T_\infty)\) is an expected-time guarantee — it holds in expectation (on average) due to randomized work stealing, not in the worst case.

Back

ETH::2._Semester::PProg::10._Parallel_Algorithms::1._Task_graphs_and_performance_measures
The Fork/Join work-stealing guarantee \(T_p = O(T_1/p + T_\infty)\) is an expected-time guarantee — it holds in expectation (on average) due to randomized work stealing, not in the worst case.

The randomness comes from which victim thread is chosen when stealing. In the worst case a bad sequence of steals is possible, but the expected cost is O(T₁/p + T∞).
Field-by-field Comparison
Field Before After
Text The Fork/Join work-stealing guarantee \(T_p = {{c2::O(T_1/p + T_\infty)}}\) is an {{c1::expected-time}} guarantee — it holds in expectation (on average) due to randomized work stealing, not in the worst case.
Extra The randomness comes from which victim thread is chosen when stealing. In the worst case a bad sequence of steals is possible, but the expected cost is O(T₁/p + T∞).
Tags: ETH::2._Semester::PProg::10._Parallel_Algorithms::1._Task_graphs_and_performance_measures

Note 44: ETH::2. Semester::PProg

Deck: ETH::2. Semester::PProg
Note Type: Horvath Classic
GUID: [+?JtoU/P6
added

Previous

Note did not exist

New Note

Front

ETH::2._Semester::PProg::09._Divide_and_conquer::4._Java's_ForkJoin_framework
How does the Fork/Join work-stealing scheduler work?

Back

ETH::2._Semester::PProg::09._Divide_and_conquer::4._Java's_ForkJoin_framework
How does the Fork/Join work-stealing scheduler work?

Each worker thread has its own deque (double-ended queue) of tasks. It processes its own tasks LIFO from the front. When a thread runs out of work, it steals a task from the back of another thread's deque (the oldest = largest task). This keeps all threads busy and achieves the O(T₁/p + T∞) expected-time guarantee.
Field-by-field Comparison
Field Before After
Front How does the Fork/Join work-stealing scheduler work?
Back Each worker thread has its own <b>deque</b> (double-ended queue) of tasks. It processes its own tasks <b>LIFO</b> from the front. When a thread runs out of work, it <b>steals</b> a task from the <b>back</b> of another thread's deque (the oldest = largest task). This keeps all threads busy and achieves the O(T₁/p + T∞) expected-time guarantee.
Tags: ETH::2._Semester::PProg::09._Divide_and_conquer::4._Java's_ForkJoin_framework

Note 45: ETH::2. Semester::PProg

Deck: ETH::2. Semester::PProg
Note Type: Horvath Cloze
GUID: [SF)]6&aQp
added

Previous

Note did not exist

New Note

Front

ETH::2._Semester::PProg::09._Divide_and_conquer::3._Executor_Service
exec.invokeAll(tasks) submits all tasks and blocks until all complete, returning a list of Futures. exec.invokeAny(tasks) returns the result of the first task to complete successfully.

Back

ETH::2._Semester::PProg::09._Divide_and_conquer::3._Executor_Service
exec.invokeAll(tasks) submits all tasks and blocks until all complete, returning a list of Futures. exec.invokeAny(tasks) returns the result of the first task to complete successfully.

invokeAll → wait for all. invokeAny → wait for any one. Both submit all tasks to the pool regardless.
Field-by-field Comparison
Field Before After
Text <code>exec.{{c1::invokeAll(tasks)}}</code> submits all tasks and {{c2::blocks until all complete}}, returning a list of Futures. <code>exec.{{c3::invokeAny(tasks)}}</code> returns {{c4::the result of the first task to complete successfully}}.
Extra invokeAll → wait for <i>all</i>. invokeAny → wait for <i>any one</i>. Both submit all tasks to the pool regardless.
Tags: ETH::2._Semester::PProg::09._Divide_and_conquer::3._Executor_Service
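
A short usage sketch (task bodies and pool size are illustrative):

    import java.util.List;
    import java.util.concurrent.*;

    public class InvokeDemo {
        public static void main(String[] args) throws Exception {
            ExecutorService exec = Executors.newFixedThreadPool(4);
            List<Callable<Integer>> tasks = List.of(() -> 1, () -> 2, () -> 3);

            // invokeAll: blocks until every task completes; the returned
            // Futures are already done, in the same order as the task list.
            List<Future<Integer>> all = exec.invokeAll(tasks);
            for (Future<Integer> f : all) System.out.println(f.get());

            // invokeAny: blocks until one task completes successfully and
            // returns its result; the remaining tasks are cancelled.
            System.out.println(exec.invokeAny(tasks));

            exec.shutdown();
        }
    }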

Note 46: ETH::2. Semester::PProg

Deck: ETH::2. Semester::PProg
Note Type: Horvath Cloze
GUID: [e+1vU[=c8
added

Previous

Note did not exist

New Note

Front

ETH::2._Semester::PProg::09._Divide_and_conquer::3._Executor_Service
The ExecutorService does not guarantee the order in which submitted tasks are executed.

Back

ETH::2._Semester::PProg::09._Divide_and_conquer::3._Executor_Service
The ExecutorService does not guarantee the order in which submitted tasks are executed.

A FixedThreadPool uses a FIFO queue internally, but execution order depends on thread availability and OS scheduling. Do not rely on execution order; invokeAll only guarantees that the returned list of Futures matches submission order, not the order in which the tasks ran. Use explicit dependencies if ordering matters.
Field-by-field Comparison
Field Before After
Text The ExecutorService does <b>not</b> guarantee {{c1::the order in which submitted tasks are executed}}.
Extra A FixedThreadPool uses a FIFO queue internally, but execution order depends on thread availability and OS scheduling. Do not rely on execution order; invokeAll only guarantees that the returned list of Futures matches submission order, not the order in which the tasks ran. Use explicit dependencies if ordering matters.
Tags: ETH::2._Semester::PProg::09._Divide_and_conquer::3._Executor_Service

Note 47: ETH::2. Semester::PProg

Deck: ETH::2. Semester::PProg
Note Type: Horvath Classic
GUID: [nTh4x4&-6
added

Previous

Note did not exist

New Note

Front

ETH::2._Semester::PProg::11._Parallel_Algorithms::3._Parallel_algorithms
What are the three steps of the parallel Pack algorithm?

Back

ETH::2._Semester::PProg::11._Parallel_Algorithms::3._Parallel_algorithms
What are the three steps of the parallel Pack algorithm?

  1. Flag: Compute a boolean/flag array F where F[i]=1 if element i satisfies the predicate, else 0.
  2. Prefix sum: Compute the exclusive prefix sum of F — gives the output index for each passing element.
  3. Scatter: For each i with F[i]=1, copy A[i] to output[prefix_sum[i]].
Field-by-field Comparison
Field Before After
Front What are the three steps of the parallel Pack algorithm?
Back <ol><li><b>Flag</b>: Compute a boolean/flag array F where F[i]=1 if element i satisfies the predicate, else 0.</li><li><b>Prefix sum</b>: Compute the exclusive prefix sum of F — gives the output index for each passing element.</li><li><b>Scatter</b>: For each i with F[i]=1, copy A[i] to <code>output[prefix_sum[i]]</code>.</li></ol>
Tags: ETH::2._Semester::PProg::11._Parallel_Algorithms::3._Parallel_algorithms
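
A sequential reference sketch of the three steps (the predicate "element > 0" and all names are illustrative; in the parallel version steps 1 and 3 are parallel maps and step 2 is a parallel prefix sum):

    // Pack: keep only the elements of a that satisfy the predicate.
    static int[] pack(int[] a) {
        int n = a.length;

        // 1. Flag: f[i] = 1 iff a[i] passes the predicate.
        int[] f = new int[n];
        for (int i = 0; i < n; i++) f[i] = (a[i] > 0) ? 1 : 0;

        // 2. Exclusive prefix sum of f: the output index of each passing element.
        int[] idx = new int[n];
        int sum = 0;
        for (int i = 0; i < n; i++) { idx[i] = sum; sum += f[i]; }

        // 3. Scatter: copy each passing element to its computed slot.
        int[] out = new int[sum];                 // sum = number of passing elements
        for (int i = 0; i < n; i++)
            if (f[i] == 1) out[idx[i]] = a[i];
        return out;
    }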

Note 48: ETH::2. Semester::PProg

Deck: ETH::2. Semester::PProg
Note Type: Horvath Classic
GUID: ]axf
added

Previous

Note did not exist

New Note

Front

ETH::2._Semester::PProg::07._Concepts::2._Parallel_performance
What is the fundamental difference in assumption between Amdahl's Law and Gustafson's Law?

Back

ETH::2._Semester::PProg::07._Concepts::2._Parallel_performance
What is the fundamental difference in assumption between Amdahl's Law and Gustafson's Law?

  • Amdahl's Law: fixed problem size — asks "how much faster can we solve the same problem with more cores?" Pessimistic: speedup is bounded by 1/f.
  • Gustafson's Law: fixed wall-clock time — asks "how much more work can we do in the same time with more cores?" Optimistic: speedup grows as \(P - f(P-1)\), unbounded as \(P \to \infty\).
Key insight: they answer different questions — not contradictory.
Field-by-field Comparison
Field Before After
Front What is the fundamental difference in assumption between Amdahl's Law and Gustafson's Law?
Back <ul><li><b>Amdahl's Law</b>: fixed problem size — asks "how much faster can we solve the same problem with more cores?" Pessimistic: speedup is bounded by 1/f.</li><li><b>Gustafson's Law</b>: fixed wall-clock time — asks "how much more work can we do in the same time with more cores?" Optimistic: speedup grows as \(P - f(P-1)\), unbounded as \(P \to \infty\).</li></ul>Key insight: they answer different questions — not contradictory.
Tags: ETH::2._Semester::PProg::07._Concepts::2._Parallel_performance
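
A quick worked comparison (the numbers \(f = 0.1\) and \(P = 10\) are chosen for illustration): Amdahl gives \(S_P = \frac{1}{f + (1-f)/P} = \frac{1}{0.1 + 0.9/10} \approx 5.26\), capped at \(1/f = 10\) no matter how large P gets; Gustafson gives \(S_P = P - f(P-1) = 10 - 0.1 \cdot 9 = 9.1\), which keeps growing with P. Same serial fraction, same core count, very different answers, because the two laws measure different things.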

Note 49: ETH::2. Semester::PProg

Deck: ETH::2. Semester::PProg
Note Type: Horvath Cloze
GUID: ^+Co39jxRN
added

Previous

Note did not exist

New Note

Front

ETH::2._Semester::PProg::06._Parallel_Architecture::3._Pipelining
A pipeline is balanced if all stages take the same amount of time.

Back

ETH::2._Semester::PProg::06._Parallel_Architecture::3._Pipelining
A pipeline is balanced if all stages take the same amount of time.

When balanced: throughput is maximised and latency is constant across all elements. When unbalanced: the slowest stage bottlenecks throughput and latency grows with each element.
Field-by-field Comparison
Field Before After
Text A pipeline is {{c1::balanced}} if {{c2::all stages take the same amount of time}}.
Extra When balanced: throughput is maximised and latency is constant across all elements. When unbalanced: the slowest stage bottlenecks throughput and latency grows with each element.
Tags: ETH::2._Semester::PProg::06._Parallel_Architecture::3._Pipelining

Note 50: ETH::2. Semester::PProg

Deck: ETH::2. Semester::PProg
Note Type: Horvath Classic
GUID: c9@eiL*Ug=
added

Previous

Note did not exist

New Note

Front

ETH::2._Semester::PProg::11._Parallel_Algorithms::3._Parallel_algorithms
What does an exclusive parallel prefix sum compute?

Back

ETH::2._Semester::PProg::11._Parallel_Algorithms::3._Parallel_algorithms
What does an exclusive parallel prefix sum compute?

For input array A[0..n-1], it produces output B where \(B[i] = \sum_{j=0}^{i-1} A[j]\) (sum of all elements before index i). So B[0] = 0 always.
Example: A = [3,1,4,1,5] → B = [0,3,4,8,9].
Field-by-field Comparison
Field Before After
Front What does an <b>exclusive parallel prefix sum</b> compute?
Back For input array A[0..n-1], it produces output B where \(B[i] = \sum_{j=0}^{i-1} A[j]\) (sum of all elements <i>before</i> index i). So B[0] = 0 always.<br>Example: A = [3,1,4,1,5] → B = [0,3,4,8,9].
Tags: ETH::2._Semester::PProg::11._Parallel_Algorithms::3._Parallel_algorithms

Note 51: ETH::2. Semester::PProg

Deck: ETH::2. Semester::PProg
Note Type: Horvath Classic
GUID: eib*6wnIWb
added

Previous

Note did not exist

New Note

Front

ETH::2._Semester::PProg::11._Parallel_Algorithms::2._Parallel_patterns
Fill in the work, span, and parallelism for standard parallel patterns on n elements:
Map / Reduction / Prefix sum / Pack

Back

ETH::2._Semester::PProg::11._Parallel_Algorithms::2._Parallel_patterns
Fill in the work, span, and parallelism for standard parallel patterns on n elements:
Map / Reduction / Prefix sum / Pack

All four achieve: Work \(O(n)\), Span \(O(\log n)\), Parallelism \(O(n/\log n)\) via divide-and-conquer trees. Note: these assume arrays. Using linked lists instead degrades span to O(n).
Field-by-field Comparison
Field Before After
Front Fill in the work, span, and parallelism for standard parallel patterns on n elements:<br>Map / Reduction / Prefix sum / Pack
Back All four achieve: <b>Work</b> \(O(n)\), <b>Span</b> \(O(\log n)\), <b>Parallelism</b> \(O(n/\log n)\) via divide-and-conquer trees. Note: these assume arrays. Using linked lists instead degrades span to O(n).
Tags: ETH::2._Semester::PProg::11._Parallel_Algorithms::2._Parallel_patterns

Note 52: ETH::2. Semester::PProg

Deck: ETH::2. Semester::PProg
Note Type: Horvath Classic
GUID: fG2/O&24Z-
added

Previous

Note did not exist

New Note

Front

ETH::2._Semester::PProg::03._Java_Threads::6._Interrupts
What happens to a thread's interrupt flag after it catches an InterruptedException?

Back

ETH::2._Semester::PProg::03._Java_Threads::6._Interrupts
What happens to a thread's interrupt flag after it catches an InterruptedException?

The flag is automatically cleared (set to false). If you catch the exception and do nothing, the thread behaves as if it had never been interrupted. Best practice: either re-interrupt (Thread.currentThread().interrupt()) or propagate the exception upward.
Field-by-field Comparison
Field Before After
Front What happens to a thread's interrupt flag after it catches an <code>InterruptedException</code>?
Back The flag is <b>automatically cleared</b> (set to false). If you catch the exception and do nothing, the thread behaves as if it had never been interrupted. Best practice: either re-interrupt (<code>Thread.currentThread().interrupt()</code>) or propagate the exception upward.
Tags: ETH::2._Semester::PProg::03._Java_Threads::6._Interrupts
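
A minimal sketch of the re-interrupt idiom (the sleeping body is illustrative):

    public class InterruptDemo {
        public static void main(String[] args) throws InterruptedException {
            Thread t = new Thread(() -> {
                try {
                    Thread.sleep(1_000);        // throws InterruptedException and clears the flag
                } catch (InterruptedException e) {
                    // Re-assert the flag so callers further up the stack
                    // can still observe that an interrupt happened.
                    Thread.currentThread().interrupt();
                }
                System.out.println("interrupted? " + Thread.currentThread().isInterrupted());
            });
            t.start();
            t.interrupt();                      // prints: interrupted? true
            t.join();
        }
    }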

Note 53: ETH::2. Semester::PProg

Deck: ETH::2. Semester::PProg
Note Type: Horvath Classic
GUID: i/wHwua}M}
added

Previous

Note did not exist

New Note

Front

ETH::2._Semester::PProg::05._Java_Threads
Why is the following code a race condition even if map is a thread-safe ConcurrentHashMap?
if (!map.containsKey(key)) { map.put(key, value); }

Back

ETH::2._Semester::PProg::05._Java_Threads
Why is the following code a race condition even if map is a thread-safe ConcurrentHashMap?
if (!map.containsKey(key)) { map.put(key, value); }

The check (containsKey) and the act (put) are two separate atomic operations — another thread can put the same key between them. This is the check-then-act anti-pattern. Fix: use map.putIfAbsent(key, value) (single atomic operation) or protect the whole block with synchronized.
Field-by-field Comparison
Field Before After
Front Why is the following code a race condition even if <code>map</code> is a thread-safe <code>ConcurrentHashMap</code>?<br><pre>if (!map.containsKey(key)) { map.put(key, value); }</pre>
Back The <b>check</b> (<code>containsKey</code>) and the <b>act</b> (<code>put</code>) are two separate atomic operations — another thread can <code>put</code> the same key between them. This is the <b>check-then-act</b> anti-pattern. Fix: use <code>map.putIfAbsent(key, value)</code> (single atomic operation) or protect the whole block with <code>synchronized</code>.
Tags: ETH::2._Semester::PProg::05._Java_Threads
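
The broken and fixed versions side by side (names are illustrative):

    import java.util.concurrent.ConcurrentHashMap;

    public class CheckThenAct {
        static final ConcurrentHashMap<String, Integer> map = new ConcurrentHashMap<>();

        static void putBroken(String key, int value) {
            // BROKEN: two separate atomic operations; another thread
            // can insert the same key between the check and the put.
            if (!map.containsKey(key)) {
                map.put(key, value);
            }
        }

        static void putFixed(String key, int value) {
            // FIXED: a single atomic operation on the map.
            map.putIfAbsent(key, value);
        }
    }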

Note 54: ETH::2. Semester::PProg

Deck: ETH::2. Semester::PProg
Note Type: Horvath Classic
GUID: iy@XpW4n&4
added

Previous

Note did not exist

New Note

Front

ETH::2._Semester::PProg::10._Parallel_Algorithms::1._Task_graphs_and_performance_measures
When does the work law (\(T_p \geq T_1/p\)) dominate vs. the span law (\(T_p \geq T_\infty\))?

Back

ETH::2._Semester::PProg::10._Parallel_Algorithms::1._Task_graphs_and_performance_measures
When does the work law (\(T_p \geq T_1/p\)) dominate vs. the span law (\(T_p \geq T_\infty\))?

Work law dominates when p is small — not enough workers to exploit available parallelism. Span law dominates when p is large — adding more processors no longer helps because \(T_\infty\) is the bottleneck. The crossover is at \(p = T_1/T_\infty\) (the parallelism of the DAG).
Field-by-field Comparison
Field Before After
Front When does the work law (\(T_p \geq T_1/p\)) dominate vs. the span law (\(T_p \geq T_\infty\))?
Back <b>Work law dominates</b> when p is small — not enough workers to exploit available parallelism. <b>Span law dominates</b> when p is large — adding more processors no longer helps because \(T_\infty\) is the bottleneck. The crossover is at \(p = T_1/T_\infty\) (the parallelism of the DAG).
Tags: ETH::2._Semester::PProg::10._Parallel_Algorithms::1._Task_graphs_and_performance_measures

Note 55: ETH::2. Semester::PProg

Deck: ETH::2. Semester::PProg
Note Type: Horvath Cloze
GUID: l!G8RlO+dn
added

Previous

Note did not exist

New Note

Front

ETH::2._Semester::PProg::09._Divide_and_conquer::3._Executor_Service
future.get() blocks the calling thread until the submitted task completes and returns its result.

Back

ETH::2._Semester::PProg::09._Divide_and_conquer::3._Executor_Service
future.get() blocks the calling thread until the submitted task completes and returns its result.

future.get(timeout, unit) throws TimeoutException if the task does not finish in time. future.cancel(true) attempts to interrupt the running task.
Field-by-field Comparison
Field Before After
Text {{c1::<code>future.get()</code>}} {{c2::blocks}} the calling thread until the submitted task completes and returns its result.
Extra <code>future.get(timeout, unit)</code> throws <code>TimeoutException</code> if the task does not finish in time. <code>future.cancel(true)</code> attempts to interrupt the running task.
Tags: ETH::2._Semester::PProg::09._Divide_and_conquer::3._Executor_Service
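
A small sketch of the blocking and timeout variants (values are illustrative):

    import java.util.concurrent.*;

    public class FutureDemo {
        public static void main(String[] args) throws Exception {
            ExecutorService exec = Executors.newSingleThreadExecutor();

            Future<Integer> quick = exec.submit(() -> 6 * 7);
            System.out.println(quick.get());          // blocks until done, prints 42

            Future<Integer> slow = exec.submit(() -> { Thread.sleep(5_000); return 0; });
            try {
                slow.get(100, TimeUnit.MILLISECONDS); // give up after the timeout
            } catch (TimeoutException e) {
                slow.cancel(true);                    // attempt to interrupt the task
            }
            exec.shutdown();
        }
    }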

Note 56: ETH::2. Semester::PProg

Deck: ETH::2. Semester::PProg
Note Type: Horvath Cloze
GUID: m7O|KIKqyC
added

Previous

Note did not exist

New Note

Front

ETH::2._Semester::PProg::07._Concepts::2._Parallel_performance
Under Amdahl's Law, as \(P \to \infty\), efficiency \(E = S_P / P\) approaches 0, because speedup is bounded by \(1/f\) while P grows without bound.

Back

ETH::2._Semester::PProg::07._Concepts::2._Parallel_performance
Under Amdahl's Law, as \(P \to \infty\), efficiency \(E = S_P / P\) approaches 0, because speedup is bounded by \(1/f\) while P grows without bound.

Even if you get the maximum speedup \(1/f\), dividing by \(P \to \infty\) drives efficiency to zero. More cores are increasingly wasted.
Field-by-field Comparison
Field Before After
Text Under Amdahl's Law, as \(P \to \infty\), efficiency \(E = S_P / P\) {{c1::approaches 0}}, because {{c2::speedup is bounded by \(1/f\) while P grows without bound}}.
Extra Even if you get the maximum speedup \(1/f\), dividing by \(P \to \infty\) drives efficiency to zero. More cores are increasingly wasted.
Tags: ETH::2._Semester::PProg::07._Concepts::2._Parallel_performance

Note 57: ETH::2. Semester::PProg

Deck: ETH::2. Semester::PProg
Note Type: Horvath Cloze
GUID: m]{8XT!DLG
added

Previous

Note did not exist

New Note

Front

ETH::2._Semester::PProg::06._Parallel_Architecture::3._Pipelining
Pipeline throughput for a finite number N of elements \(= {{c1::\dfrac{N}{\text{total time for } N \text{ elements}}}}\)

Back

ETH::2._Semester::PProg::06._Parallel_Architecture::3._Pipelining
Pipeline throughput for a finite number N of elements \(= {{c1::\dfrac{N}{\text{total time for } N \text{ elements}}}}\)

As \(N \to \infty\), this converges to \(1/\max_i(\text{stage\_time}_i)\). For small N, lead-in/lead-out reduce it below the theoretical bound.
Field-by-field Comparison
Field Before After
Text Pipeline throughput for a finite number N of elements \(= {{c1::\dfrac{N}{\text{total time for } N \text{ elements}}}}\)
Extra As \(N \to \infty\), this converges to \(1/\max_i(\text{stage\_time}_i)\). For small N, lead-in/lead-out reduce it below the theoretical bound.
Tags: ETH::2._Semester::PProg::06._Parallel_Architecture::3._Pipelining

Note 58: ETH::2. Semester::PProg

Deck: ETH::2. Semester::PProg
Note Type: Horvath Classic
GUID: nvfkOuvf3v
added

Previous

Note did not exist

New Note

Front

ETH::2._Semester::PProg::11._Parallel_Algorithms::3._Parallel_algorithms
Compare work/span/parallelism for two parallel quicksort strategies:
1. Parallelize only the recursive calls
2. Also parallelize the partition step (via pack)

Back

ETH::2._Semester::PProg::11._Parallel_Algorithms::3._Parallel_algorithms
Compare work/span/parallelism for two parallel quicksort strategies:
1. Parallelize only the recursive calls
2. Also parallelize the partition step (via pack)

  1. Parallel recursive calls only: Work \(O(n \log n)\), Span \(O(n)\), Parallelism \(O(\log n)\)
  2. + parallel partition (pack): Work \(O(n \log n)\), Span \(O(\log^2 n)\), Parallelism \(O(n/\log n)\)
Parallelizing the partition reduces span from \(O(n)\) to \(O(\log^2 n)\), dramatically increasing parallelism.
Field-by-field Comparison
Field Before After
Front Compare work/span/parallelism for two parallel quicksort strategies:<br>1. Parallelize only the recursive calls<br>2. Also parallelize the partition step (via pack)
Back <ol><li>Parallel recursive calls only: Work \(O(n \log n)\), Span \(O(n)\), Parallelism \(O(\log n)\)</li><li>+ parallel partition (pack): Work \(O(n \log n)\), Span \(O(\log^2 n)\), Parallelism \(O(n/\log n)\)</li></ol>Parallelizing the partition reduces span from \(O(n)\) to \(O(\log^2 n)\), dramatically increasing parallelism.
Tags: ETH::2._Semester::PProg::11._Parallel_Algorithms::3._Parallel_algorithms

Note 59: ETH::2. Semester::PProg

Deck: ETH::2. Semester::PProg
Note Type: Horvath Classic
GUID: n~&W/iS)c^
added

Previous

Note did not exist

New Note

Front

ETH::2._Semester::PProg::10._Parallel_Algorithms::1._Task_graphs_and_performance_measures
Why is \(T_p \geq T_\infty\) a strict lower bound — why can no scheduler beat it?

Back

ETH::2._Semester::PProg::10._Parallel_Algorithms::1._Task_graphs_and_performance_measures
Why is \(T_p \geq T_\infty\) a strict lower bound — why can no scheduler beat it?

\(T_\infty\) is the length of the critical path — a chain of nodes where each depends on the previous. Even with infinite processors, these nodes must execute sequentially. No amount of parallelism can compress a dependency chain.
Field-by-field Comparison
Field Before After
Front Why is \(T_p \geq T_\infty\) a strict lower bound — why can no scheduler beat it?
Back \(T_\infty\) is the length of the <b>critical path</b> — a chain of nodes where each depends on the previous. Even with infinite processors, these nodes must execute <b>sequentially</b>. No amount of parallelism can compress a dependency chain.
Tags: ETH::2._Semester::PProg::10._Parallel_Algorithms::1._Task_graphs_and_performance_measures

Note 60: ETH::2. Semester::PProg

Deck: ETH::2. Semester::PProg
Note Type: Horvath Classic
GUID: o5rCKq0mpv
added

Previous

Note did not exist

New Note

Front

ETH::2._Semester::PProg::07._Concepts::2._Parallel_performance
What causes super-linear speedup (S_p > p), and is it "real"?

Back

ETH::2._Semester::PProg::07._Concepts::2._Parallel_performance
What causes super-linear speedup (S_p > p), and is it "real"?

Yes — it can occur in practice. Most common cause: cache effects — with p processors, each processor's working set is smaller and fits in faster cache (L1/L2). This makes per-processor work genuinely faster than the single-processor baseline. Also possible via better branch prediction or algorithmic differences in the parallel version.
Field-by-field Comparison
Field Before After
Front What causes super-linear speedup (S_p &gt; p), and is it "real"?
Back Yes — it can occur in practice. Most common cause: <b>cache effects</b> — with p processors, each processor's working set is smaller and fits in faster cache (L1/L2). This makes per-processor work genuinely faster than the single-processor baseline. Also possible via better branch prediction or algorithmic differences in the parallel version.
Tags: ETH::2._Semester::PProg::07._Concepts::2._Parallel_performance

Note 61: ETH::2. Semester::PProg

Deck: ETH::2. Semester::PProg
Note Type: Horvath Classic
GUID: pogaT0kdC(
added

Previous

Note did not exist

New Note

Front

ETH::2._Semester::PProg::11._Parallel_Algorithms::3._Parallel_algorithms
What happens during the down pass (prefix pass) of the parallel prefix-sum algorithm?

Back

ETH::2._Semester::PProg::11._Parallel_Algorithms::3._Parallel_algorithms
What happens during the down pass (prefix pass) of the parallel prefix-sum algorithm?

The root receives 0 (for exclusive prefix sum). At each internal node, the value passed down is forwarded to the left child as-is, and to the right child as: value_from_above + left_child's_up-pass_value. Leaf i ends with the exclusive prefix sum B[i]. Done top-down with O(n) work and O(log n) span.
Field-by-field Comparison
Field Before After
Front What happens during the <b>down pass</b> (prefix pass) of the parallel prefix-sum algorithm?
Back The root receives <b>0</b> (for exclusive prefix sum). At each internal node, the value passed down is forwarded to the <b>left child as-is</b>, and to the <b>right child</b> as: <code>value_from_above + left_child's_up-pass_value</code>. Leaf i ends with the exclusive prefix sum B[i]. Done top-down with O(n) work and O(log n) span.
Tags: ETH::2._Semester::PProg::11._Parallel_Algorithms::3._Parallel_algorithms
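
A sequential sketch that fuses the two passes into one recursion (in the parallel version the two recursive calls are forked). Each call returns its subtree's up-pass sum while the fromLeft value travels down exactly as described above; running it on A = [3,1,4,1,5] reproduces B = [0,3,4,8,9] from the earlier card:

    public class ExclusiveScan {
        // Writes exclusive prefix sums of a[lo..hi) into out, given the sum of
        // everything left of lo; returns the up-pass sum of a[lo..hi).
        static long scan(long[] a, long[] out, int lo, int hi, long fromLeft) {
            if (hi - lo == 1) {
                out[lo] = fromLeft;                           // leaf: its exclusive prefix
                return a[lo];
            }
            int mid = (lo + hi) / 2;
            long left = scan(a, out, lo, mid, fromLeft);          // left child: value as-is
            long right = scan(a, out, mid, hi, fromLeft + left);  // right: add left's up-pass sum
            return left + right;
        }

        public static void main(String[] args) {
            long[] a = {3, 1, 4, 1, 5};
            long[] b = new long[a.length];
            scan(a, b, 0, a.length, 0);                       // the root receives 0
            System.out.println(java.util.Arrays.toString(b)); // [0, 3, 4, 8, 9]
        }
    }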

Note 62: ETH::2. Semester::PProg

Deck: ETH::2. Semester::PProg
Note Type: Horvath Classic
GUID: rP!!UaISrY
added

Previous

Note did not exist

New Note

Front

ETH::2._Semester::PProg::10._Parallel_Algorithms::1._Task_graphs_and_performance_measures
Given a DAG with parallelism \(T_1/T_\infty\), what happens when you use \(p > T_1/T_\infty\) cores?

Back

ETH::2._Semester::PProg::10._Parallel_Algorithms::1._Task_graphs_and_performance_measures
Given a DAG with parallelism \(T_1/T_\infty\), what happens when you use \(p > T_1/T_\infty\) cores?

Beyond \(p = T_1/T_\infty\), the span law \(T_p \geq T_\infty\) becomes the binding constraint — extra cores are wasted. In practice: never worth deploying more than \(T_1/T_\infty\) processors for a given task graph — diminishing returns become total waste.
Field-by-field Comparison
Field Before After
Front Given a DAG with parallelism \(T_1/T_\infty\), what happens when you use \(p > T_1/T_\infty\) cores?
Back Beyond \(p = T_1/T_\infty\), the span law \(T_p \geq T_\infty\) becomes the binding constraint — extra cores are wasted. In practice: never worth deploying more than \(T_1/T_\infty\) processors for a given task graph — diminishing returns become total waste.
Tags: ETH::2._Semester::PProg::10._Parallel_Algorithms::1._Task_graphs_and_performance_measures

Note 63: ETH::2. Semester::PProg

Deck: ETH::2. Semester::PProg
Note Type: Horvath Classic
GUID: sRs/ljskpO
added

Previous

Note did not exist

New Note

Front

ETH::2._Semester::PProg::05._Java_Threads::2._synchronized
Why should you never use synchronized(someInteger) or synchronized(someString) as a lock?

Back

ETH::2._Semester::PProg::05._Java_Threads::2._synchronized
Why should you never use synchronized(someInteger) or synchronized(someString) as a lock?

Boxed types (Integer, Long) and String are immutable — operations like integer++ create a new object. The lock object changes, so two threads may lock on different objects and both enter the critical section simultaneously. Use a dedicated private final Object lock = new Object() instead.
Field-by-field Comparison
Field Before After
Front Why should you never use <code>synchronized(someInteger)</code> or <code>synchronized(someString)</code> as a lock?
Back Boxed types (<code>Integer</code>, <code>Long</code>) and <code>String</code> are <b>immutable</b> — operations like <code>integer++</code> create a new object. The lock object changes, so two threads may lock on different objects and both enter the critical section simultaneously. Use a dedicated <code>private final Object lock = new Object()</code> instead.
Tags: ETH::2._Semester::PProg::05._Java_Threads::2._synchronized
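
The safe pattern next to the broken one (field names are illustrative):

    public class Counter {
        // BAD as a lock: count++ unboxes, increments, and re-boxes,
        // replacing the Integer object the threads synchronized on.
        private Integer count = 0;

        // GOOD: a dedicated lock whose identity never changes.
        private final Object lock = new Object();

        void increment() {
            synchronized (lock) {   // never: synchronized (count)
                count++;
            }
        }
    }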

Note 64: ETH::2. Semester::PProg

Deck: ETH::2. Semester::PProg
Note Type: Horvath Classic
GUID: uQ3(lLENx0
added

Previous

Note did not exist

New Note

Front

ETH::2._Semester::PProg::11._Parallel_Algorithms::2._Parallel_patterns
Why does it matter whether you implement a parallel reduction on a linked list vs. a balanced tree/array?

Back

ETH::2._Semester::PProg::11._Parallel_Algorithms::2._Parallel_patterns
Why does it matter whether you implement a parallel reduction on a linked list vs. a balanced tree/array?

On a linked list, you must traverse O(n) nodes — span is O(n), no better than sequential. On a balanced tree or array, divide-and-conquer reaches any element in O(log n) steps, giving O(log n) span. Trees give the same flexibility as lists while preserving O(log n) span — preferred for parallel algorithms.
Field-by-field Comparison
Field Before After
Front Why does it matter whether you implement a parallel reduction on a linked list vs. a balanced tree/array?
Back On a <b>linked list</b>, you must traverse O(n) nodes — span is O(n), no better than sequential. On a <b>balanced tree or array</b>, divide-and-conquer reaches any element in O(log n) steps, giving O(log n) span. Trees give the same flexibility as lists while preserving O(log n) span — preferred for parallel algorithms.
Tags: ETH::2._Semester::PProg::11._Parallel_Algorithms::2._Parallel_patterns

Note 65: ETH::2. Semester::PProg

Deck: ETH::2. Semester::PProg
Note Type: Horvath Cloze
GUID: vIJVWjg2>Y
added

Previous

Note did not exist

New Note

Front

ETH::2._Semester::PProg::12._Shared_Memory_Concurrency
A SpinLock is a lock where a thread trying to acquire it busy-waits (spins) in a loop until it becomes available, rather than sleeping.

Back

ETH::2._Semester::PProg::12._Shared_Memory_Concurrency
A SpinLock is a lock where a thread trying to acquire it busy-waits (spins) in a loop until it becomes available, rather than sleeping.

Trade-off: no context-switch overhead (good for short critical sections), but wastes CPU cycles (bad for long waits). Contrast with blocking locks (synchronized, ReentrantLock) which suspend the thread.
Field-by-field Comparison
Field Before After
Text A {{c1::SpinLock}} is a lock where a thread trying to acquire it {{c2::busy-waits (spins) in a loop}} until it becomes available, rather than sleeping.
Extra Trade-off: no context-switch overhead (good for short critical sections), but wastes CPU cycles (bad for long waits). Contrast with blocking locks (<code>synchronized</code>, <code>ReentrantLock</code>) which suspend the thread.
Tags: ETH::2._Semester::PProg::12._Shared_Memory_Concurrency
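
A minimal test-and-set sketch built on AtomicBoolean (not necessarily the course's exact implementation):

    import java.util.concurrent.atomic.AtomicBoolean;

    public class SpinLock {
        private final AtomicBoolean held = new AtomicBoolean(false);

        void lock() {
            // Busy-wait: atomically flip false -> true, spinning until it succeeds.
            while (!held.compareAndSet(false, true)) {
                Thread.onSpinWait();   // Java 9+ hint that this is a spin loop
            }
        }

        void unlock() {
            held.set(false);
        }
    }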

Note 66: ETH::2. Semester::PProg

Deck: ETH::2. Semester::PProg
Note Type: Horvath Classic
GUID: zzYi%6?711
added

Previous

Note did not exist

New Note

Front

ETH::2._Semester::PProg::Terminology
What are the four necessary conditions for deadlock (Coffman conditions)?

Back

ETH::2._Semester::PProg::Terminology
What are the four necessary conditions for deadlock (Coffman conditions)?

  1. Mutual exclusion — at least one resource is held in non-shareable mode
  2. Hold and wait — a thread holds at least one resource while waiting for another
  3. No preemption — resources cannot be forcibly taken away
  4. Circular wait — a cycle of threads each waiting for a resource held by the next
All four must hold simultaneously. Breaking any one prevents deadlock.
Field-by-field Comparison
Field Before After
Front What are the four necessary conditions for deadlock (Coffman conditions)?
Back <ol><li><b>Mutual exclusion</b> — at least one resource is held in non-shareable mode</li><li><b>Hold and wait</b> — a thread holds at least one resource while waiting for another</li><li><b>No preemption</b> — resources cannot be forcibly taken away</li><li><b>Circular wait</b> — a cycle of threads each waiting for a resource held by the next</li></ol>All four must hold simultaneously. Breaking any one prevents deadlock.
Tags: ETH::2._Semester::PProg::Terminology

Note 67: ETH::2. Semester::PProg

Deck: ETH::2. Semester::PProg
Note Type: Horvath Classic
GUID: |xpGo>FdYs
added

Previous

Note did not exist

New Note

Front

ETH::2._Semester::PProg::06._Parallel_Architecture::3._Pipelining
A pipeline has 4 stages with times [2, 4, 2, 2]. Is it balanced? What is the throughput? What is the latency of the 1st element? Of the 3rd?

Back

ETH::2._Semester::PProg::06._Parallel_Architecture::3._Pipelining
A pipeline has 4 stages with times [2, 4, 2, 2]. Is it balanced? What is the throughput? What is the latency of the 1st element? Of the 3rd?

  • Balanced? No — stage 2 takes 4 units; others take 2. Bottleneck is stage 2.
  • Throughput \(= 1/\max(2,4,2,2) = 1/4\) items per time unit.
  • Latency (1st element) \(= 2+4+2+2 = 10\)
  • Latency (3rd element) \(= 10 + (4-2)\cdot(3-1) = 10+4 = 14\)
Field-by-field Comparison
Field Before After
Front A pipeline has 4 stages with times [2, 4, 2, 2]. Is it balanced? What is the throughput? What is the latency of the 1st element? Of the 3rd?
Back <ul><li><b>Balanced?</b> No — stage 2 takes 4 units; others take 2. Bottleneck is stage 2.</li><li><b>Throughput</b> \(= 1/\max(2,4,2,2) = 1/4\) items per time unit.</li><li><b>Latency (1st element)</b> \(= 2+4+2+2 = 10\)</li><li><b>Latency (3rd element)</b> \(= 10 + (4-2)\cdot(3-1) = 10+4 = 14\)</li></ul>
Tags: ETH::2._Semester::PProg::06._Parallel_Architecture::3._Pipelining

Note 68: ETH::2. Semester::PProg

Deck: ETH::2. Semester::PProg
Note Type: Horvath Cloze
GUID: |~C[soZuiK
added

Previous

Note did not exist

New Note

Front

ETH::2._Semester::PProg::06._Parallel_Architecture::3._Pipelining
Bandwidth of a pipeline is the amount of work being processed in parallel at any given time.

Back

ETH::2._Semester::PProg::06._Parallel_Architecture::3._Pipelining
Bandwidth of a pipeline is the amount of work being processed in parallel at any given time.

Distinct from throughput (items/time) and latency (time/item). Bandwidth captures how many elements are simultaneously in-flight across all stages.
Field-by-field Comparison
Field Before After
Text {{c1::Bandwidth}} of a pipeline is {{c2::the amount of work being processed in parallel at any given time}}.
Extra Distinct from throughput (items/time) and latency (time/item). Bandwidth captures how many elements are simultaneously in-flight across all stages.
Tags: ETH::2._Semester::PProg::06._Parallel_Architecture::3._Pipelining

Note 69: ETH::2. Semester::PProg

Deck: ETH::2. Semester::PProg
Note Type: Horvath Cloze
GUID: ~^81m|RfVI
added

Previous

Note did not exist

New Note

Front

ETH::2._Semester::PProg::Terminology
A standard way to prevent deadlock is to always acquire multiple locks in a globally consistent (total) order, so that circular wait is impossible.

Back

ETH::2._Semester::PProg::Terminology
A standard way to prevent deadlock is to always acquire multiple locks in a globally consistent (total) order, so that circular wait is impossible.

Example: always acquire lockA before lockB. If every thread follows this order, no cycle can form. This breaks the "circular wait" Coffman condition.
Field-by-field Comparison
Field Before After
Text A standard way to prevent deadlock is to always acquire multiple locks in {{c1::a globally consistent (total) order}}, so that {{c2::circular wait is impossible}}.
Extra Example: always acquire lockA before lockB. If every thread follows this order, no cycle can form. This breaks the "circular wait" Coffman condition.
Tags: ETH::2._Semester::PProg::Terminology
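
A minimal sketch of the ordering discipline (lock names are illustrative):

    public class Ordered {
        private final Object lockA = new Object();
        private final Object lockB = new Object();

        // Every thread that needs both locks takes lockA first, then lockB,
        // so no cycle of waiting threads can ever form.
        void useBoth() {
            synchronized (lockA) {
                synchronized (lockB) {
                    // ... critical section using both resources ...
                }
            }
        }
        // A second method acquiring lockB before lockA would reintroduce
        // the possibility of circular wait.
    }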