Below I discuss some ideas on how to optimize processing times in services.
IO Calls:
Services take actions in response to events, and an action often requires gathering data related to the event from external services and data stores. Such IO requests should be executed in parallel wherever possible, and this is best done by running external IO calls on a thread pool. I say a thread pool rather than ad hoc threads because creating a new thread takes time, and allocating too many threads adds memory pressure to the process. Keeping IO-related threads in a bounded thread pool is therefore a good idea.
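As a rough sketch of the idea, the Java snippet below submits several independent IO calls to one bounded pool and then waits for all of them. The class name ParallelIo, the helper slowCall, the pool size of 8, and the use of daemon threads are illustrative assumptions, not recommendations for any particular service.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;
import java.util.stream.Collectors;

final class ParallelIo {
    // Bounded pool shared by all external IO calls. The size (8) is an assumed value;
    // daemon threads are used here only so the sketch never blocks JVM exit.
    private static final ExecutorService IO_POOL = Executors.newFixedThreadPool(8, r -> {
        Thread t = new Thread(r, "io-worker");
        t.setDaemon(true);
        return t;
    });

    // Submits every call to the pool first, then waits for all results in input order.
    static <T> List<T> fetchAll(List<Supplier<T>> calls) {
        List<CompletableFuture<T>> futures = calls.stream()
                .map(call -> CompletableFuture.supplyAsync(call, IO_POOL))
                .collect(Collectors.toList());
        return futures.stream()
                .map(CompletableFuture::join)
                .collect(Collectors.toList());
    }

    // Hypothetical stand-in for a call to an external service or data store.
    static String slowCall(String name) {
        try {
            Thread.sleep(100);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return name;
    }

    public static void main(String[] args) {
        List<Supplier<String>> calls = List.of(
                () -> slowCall("user"),
                () -> slowCall("orders"),
                () -> slowCall("prices"));
        long start = System.nanoTime();
        List<String> results = fetchAll(calls);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        // The three calls overlap, so the elapsed time should be close to one call, not three.
        System.out.println(results + " in about " + elapsedMs + " ms");
    }
}
```

Collecting the futures into a list before joining matters: joining inside the same stream that submits the calls would run them one at a time.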
Handling interdependencies between data and calculations:
Work units often have interdependencies, where one unit of work has to complete before another can start. In this case it's best to design the work with clearly demarcated dependencies, so that we can run a form of topological sort to determine the order in which work must run and which units can be parallelized.
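One way to sketch this in Java is a level-by-level variant of Kahn's topological sort: every task in a level depends only on tasks from earlier levels, so the tasks within a level can run in parallel. The class name DependencyLevels and the task names are hypothetical, and the sketch assumes every task appears as a key in the dependency map.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class DependencyLevels {
    // deps maps each task to the tasks it depends on (assumption: every task is a key).
    // Returns groups of tasks; all tasks within one group can run in parallel.
    static List<List<String>> levels(Map<String, List<String>> deps) {
        Map<String, Integer> remaining = new HashMap<>();       // unmet dependency count per task
        Map<String, List<String>> dependents = new HashMap<>(); // reverse edges
        for (Map.Entry<String, List<String>> e : deps.entrySet()) {
            remaining.put(e.getKey(), e.getValue().size());
            for (String d : e.getValue()) {
                dependents.computeIfAbsent(d, k -> new ArrayList<>()).add(e.getKey());
            }
        }
        List<String> ready = new ArrayList<>();
        for (Map.Entry<String, Integer> e : remaining.entrySet()) {
            if (e.getValue() == 0) ready.add(e.getKey());
        }
        Collections.sort(ready); // sorted only to make the output deterministic

        List<List<String>> result = new ArrayList<>();
        int scheduled = 0;
        while (!ready.isEmpty()) {
            result.add(ready);
            scheduled += ready.size();
            List<String> next = new ArrayList<>();
            for (String done : ready) {
                for (String task : dependents.getOrDefault(done, List.of())) {
                    // A task becomes ready once its last unmet dependency completes.
                    if (remaining.merge(task, -1, Integer::sum) == 0) next.add(task);
                }
            }
            Collections.sort(next);
            ready = next;
        }
        if (scheduled != deps.size()) {
            throw new IllegalStateException("Cycle or unknown dependency in task graph");
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> deps = Map.of(
                "user", List.<String>of(),
                "prices", List.<String>of(),
                "orders", List.of("user"),
                "invoice", List.of("orders", "prices"));
        System.out.println(levels(deps));
    }
}
```

A scheduler could then submit each level to a thread pool and wait for it to finish before starting the next. That is simpler than the best possible schedule, since a task waits for its whole level rather than only for its own dependencies.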
Parallelizing complex tasks:
For complex units of work, it is best to design the work so that it can be decomposed into a series of tasks, each of which can run independently and, where possible, in parallel. The pipes and filters design pattern, which I have discussed in the article on distributed system design patterns, covers this concept.
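A minimal in-process sketch of the decomposition in Java follows. Each filter is a small independent function, and each input flows through the chain of filters on a shared pool, so different inputs can be in different stages at the same time. The class name Pipeline and the three filters (trim, parse, square) are made up for illustration; a distributed pipes and filters setup would normally connect stages with queues or streams instead.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;
import java.util.stream.Collectors;

final class Pipeline {
    // Three hypothetical filters; each one is independent of the others.
    private static final Function<String, String> TRIM = String::trim;
    private static final Function<String, Integer> PARSE = Integer::parseInt;
    private static final Function<Integer, Integer> SQUARE = n -> n * n;

    // Sends every input through the filter chain; stages are scheduled on the given pool.
    static List<Integer> run(List<String> inputs, ExecutorService pool) {
        List<CompletableFuture<Integer>> futures = inputs.stream()
                .map(in -> CompletableFuture.completedFuture(in)
                        .thenApplyAsync(TRIM, pool)
                        .thenApplyAsync(PARSE, pool)
                        .thenApplyAsync(SQUARE, pool))
                .collect(Collectors.toList());
        // Wait for every item to leave the last stage, preserving input order.
        return futures.stream()
                .map(CompletableFuture::join)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(4); // size is an assumed value
        try {
            System.out.println(run(List.of(" 1", "2 ", " 3 "), pool));
        } finally {
            pool.shutdown();
        }
    }
}
```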
Parallelizing singular tasks:
It is often most efficient to design each singular work unit as a series of smaller calculations (data transformations) that can be grouped together by size to form the unit of work. I often code this with a fork-join pool, where small work units run in groups across threads and a consolidator later aggregates the results into a single work unit. In these cases, a thread pool such as a fork-join pool can be constructed once and made available through a pool manager, which tasks can reference via Spring binding, for example.
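The grouping and consolidation steps could look roughly like the following Java sketch, which squares and sums an array with a RecursiveTask. The class name ChunkedSum, the threshold of 1,000 elements, and the sum-of-squares transformation are assumptions chosen for illustration. The common fork-join pool stands in for the managed pool described above; wiring a dedicated pool through Spring is left out.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

final class ChunkedSum extends RecursiveTask<Long> {
    // Group size below which work is done directly on one thread (assumed value).
    private static final int THRESHOLD = 1_000;

    private final int[] data;
    private final int from; // inclusive
    private final int to;   // exclusive

    ChunkedSum(int[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {
            // Small group: apply the transformation (squaring) and sum sequentially.
            long sum = 0;
            for (int i = from; i < to; i++) {
                sum += (long) data[i] * data[i];
            }
            return sum;
        }
        int mid = (from + to) >>> 1;
        ChunkedSum left = new ChunkedSum(data, from, mid);
        ChunkedSum right = new ChunkedSum(data, mid, to);
        left.fork();                        // run the left half asynchronously
        long rightResult = right.compute(); // compute the right half on this thread
        return left.join() + rightResult;   // consolidate the two partial results
    }

    // Entry point; uses the common pool as a stand-in for a managed pool.
    static long sumOfSquares(int[] data) {
        return ForkJoinPool.commonPool().invoke(new ChunkedSum(data, 0, data.length));
    }

    public static void main(String[] args) {
        int[] data = new int[10_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i + 1;
        }
        System.out.println(sumOfSquares(data));
    }
}
```

The threshold plays the role of "grouping by size": too small and the scheduling overhead dominates, too large and some threads sit idle.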