Discover the Best AI Tools & Practical Guides

Aizhi curates the best AI tools, generators and step-by-step guides — AI writing, image, video, chatbots, coding and business, updated for 2026.

Browse by Category

Latest AI Guides

All articles →

United States Tech Force

The U.S. Tech Force (also styled as US Tech Force, Tech Force, or Government Tech Force) is a federal hiring initiative launched by the second Donald Trump administration in December 2025. The program, administered by the Office of Personnel Management (OPM), aims to recruit about 1,000 early-career technology professionals into two-year government jobs to modernize federal IT systems, advance artificial intelligence (AI) capabilities, and address technological gaps in government operations. The initiative is an effort to plug capability gaps created by Trump-administration efforts to shrink the federal government, which led to the departure of some 220,000 federal employees, including many in IT. The initiative seeks early-career workers; officials said it would offer competitive salaries and opportunities to work on high-impact government technology projects. Major technology companies—including Amazon, Apple, Microsoft, Nvidia, Meta, Google, and OpenAI—agreed to help identify and refer candidates. Candidates are allowed to take Tech Force positions on leaves of absence and without divesting their stock, raising conflict-of-interest questions. In January 2026, OPM direction Scott Kupor said the deadline for applying to Tech Force was being extended because of "tremendous interest" without saying how many people had actually applied. Also in December 2025, news broke that the administration is planning another novel use of private-sector workers: hiring cybersecurity firms for offensive cyber operations.

Read guide →

Pointer jumping

Pointer jumping or path doubling is a design technique for parallel algorithms that operate on pointer structures, such as linked lists and directed graphs. Pointer jumping allows an algorithm to follow paths with a time complexity that is logarithmic with respect to the length of the longest path. It does this by "jumping" to the end of the path computed by neighbors. The basic operation of pointer jumping is to replace each neighbor in a pointer structure with its neighbor's neighbor. In each step of the algorithm, this replacement is done for all nodes in the data structure, which can be done independently in parallel. In the next step when a neighbor's neighbor is followed, the neighbor's path already followed in the previous step is added to the node's followed path in a single step. Thus, each step effectively doubles the distance traversed by the explored paths. Pointer jumping is best understood by looking at simple examples such as list ranking and root finding. == List ranking == One of the simpler tasks that can be solved by a pointer jumping algorithm is the list ranking problem. This problem is defined as follows: given a linked list of N nodes, find the distance (measured in the number of nodes) of each node to the end of the list. The distance d(n) is defined as follows, for nodes n that point to their successor by a pointer called next: If n.next is nil, then d(n) = 0. For any other node, d(n) = d(n.next) + 1. This problem can easily be solved in linear time on a sequential machine, but a parallel algorithm can do better: given n processors, the problem can be solved in logarithmic time, O(log N), by the following pointer jumping algorithm: The pointer jumping occurs in the last line of the algorithm, where each node's next pointer is reset to skip the node's direct successor. It is assumed, as in common in the PRAM model of computation, that memory access are performed in lock-step, so that each n.next.next memory fetch is performed before each n.next memory store; otherwise, processors may clobber each other's data, producing inconsistencies. The following diagram follows how the parallel list ranking algorithm uses pointer jumping for a linked list with 11 elements. As the algorithm describes, the first iteration starts initialized with all ranks set to 1 except those with a null pointer for next. The first iteration looks at immediate neighbors. Each subsequent iteration jumps twice as far as the previous. Analyzing the algorithm yields a logarithmic running time. The initialization loop takes constant time, because each of the N processors performs a constant amount of work, all in parallel. The inner loop of the main loop also takes constant time, as does (by assumption) the termination check for the loop, so the running time is determined by how often this inner loop is executed. Since the pointer jumping in each iteration splits the list into two parts, one consisting of the "odd" elements and one of the "even" elements, the length of the list pointed to by each processor's n is halved in each iteration, which can be done at most O(log N) time before each list has a length of at most one. == Root finding == Following a path in a graph is an inherently serial operation, but pointer jumping reduces the total amount of work by following all paths simultaneously and sharing results among dependent operations. Pointer jumping iterates and finds a successor — a vertex closer to the tree root — each time. By following successors computed for other vertices, the traversal down each path can be doubled every iteration, which means that the tree roots can be found in logarithmic time. Pointer doubling operates on an array successor with an entry for every vertex in the graph. Each successor[i] is initialized with the parent index of vertex i if that vertex is not a root or to i itself if that vertex is a root. At each iteration, each successor is updated to its successor's successor. The root is found when the successor's successor points to itself. The following pseudocode demonstrates the algorithm. algorithm Input: An array parent representing a forest of trees. parent[i] is the parent of vertex i or itself for a root Output: An array containing the root ancestor for every vertex for i ← 1 to length(parent) do in parallel successor[i] ← parent[i] while true for i ← 1 to length(successor) do in parallel successor_next[i] ← successor[successor[i]] if successor_next = successor then break for i ← 1 to length(successor) do in parallel successor[i] ← successor_next[i] return successor The following image provides an example of using pointer jumping on a small forest. On each iteration the successor points to the vertex following one more successor. After two iterations, every vertex points to its root node. == History and examples == Although the name pointer jumping would come later, JáJá attributes the first uses of the technique in early parallel graph algorithms and list ranking. The technique has been described with other names such as shortcutting, but by the 1990s textbooks on parallel algorithms consistently used the term pointer jumping. Today, pointer jumping is considered a software design pattern for operating on recursive data types in parallel. As a technique for following linked paths, graph algorithms are a natural fit for pointer jumping. Consequently, several parallel graph algorithms utilizing pointer jumping have been designed. These include algorithms for finding the roots of a forest of rooted trees, connected components, minimum spanning trees, and biconnected components. However, pointer jumping has also shown to be useful in a variety of other problems including computer vision, image compression, and Bayesian inference.

Read guide →

Kunstweg

Bürgi's Kunstweg is a set of algorithms developed by Jost Bürgi in the late 16th century. They are used to calculate sines to arbitrary precision.. Bürgi used these algorithms to calculate a Canon Sinuum, a sine table in increments of 2 arc seconds. It is believed that the table featured values accurate to eight sexagesimal places. Some authors have speculated that the table only covered the range from 0° to 45°, although there is no evidence supporting this claim. Such tables were crucial for maritime navigation. Johannes Kepler described the Canon Sinuum as the most precise sine table known at the time. Bürgi explained his algorithms in his work Fundamentum Astronomiae, which he presented to Emperor Rudolf II in 1592. The Kunstweg algorithm calculates sine values iteratively. In each step, the value of a cell is the sum of the two preceding cells in the same column. The final cell's value is halved before beginning the next iteration. Ultimately, the values in the last column are normalized. Accurate sine approximations are achieved after only a few iterations. In 2015, Menso Folkerts and coworkers demonstrated that this iterative process does indeed converge toward the true sine values. According to them this was the first step towards differential calculus.

Read guide →

Bartels–Stewart algorithm

In numerical linear algebra, the Bartels–Stewart algorithm is used to numerically solve the Sylvester matrix equation A X − X B = C {\displaystyle AX-XB=C} . Developed by R.H. Bartels and G.W. Stewart in 1971, it was the first numerically stable method that could be systematically applied to solve such equations. The algorithm works by using the real Schur decompositions of A {\displaystyle A} and B {\displaystyle B} to transform A X − X B = C {\displaystyle AX-XB=C} into a triangular system that can then be solved using forward or backward substitution. In 1979, G. Golub, C. Van Loan and S. Nash introduced an improved version of the algorithm, known as the Hessenberg–Schur algorithm. It remains a standard approach for solving Sylvester equations when X {\displaystyle X} is of small to moderate size. == The algorithm == Let X , C ∈ R m × n {\displaystyle X,C\in \mathbb {R} ^{m\times n}} , and assume that the eigenvalues of A {\displaystyle A} are distinct from the eigenvalues of B {\displaystyle B} . Then, the matrix equation A X − X B = C {\displaystyle AX-XB=C} has a unique solution. The Bartels–Stewart algorithm computes X {\displaystyle X} by applying the following steps: 1.Compute the real Schur decompositions R = U T A U , {\displaystyle R=U^{T}AU,} S = V T B T V . {\displaystyle S=V^{T}B^{T}V.} The matrices R {\displaystyle R} and S {\displaystyle S} are block-upper triangular matrices, with diagonal blocks of size 1 × 1 {\displaystyle 1\times 1} or 2 × 2 {\displaystyle 2\times 2} . 2. Set F = U T C V . {\displaystyle F=U^{T}CV.} 3. Solve the simplified system R Y − Y S T = F {\displaystyle RY-YS^{T}=F} , where Y = U T X V {\displaystyle Y=U^{T}XV} . This can be done using forward substitution on the blocks. Specifically, if s k − 1 , k = 0 {\displaystyle s_{k-1,k}=0} , then ( R − s k k I ) y k = f k + ∑ j = k + 1 n s k j y j , {\displaystyle (R-s_{kk}I)y_{k}=f_{k}+\sum _{j=k+1}^{n}s_{kj}y_{j},} where y k {\displaystyle y_{k}} is the k {\displaystyle k} th column of Y {\displaystyle Y} . When s k − 1 , k ≠ 0 {\displaystyle s_{k-1,k}\neq 0} , columns [ y k − 1 ∣ y k ] {\displaystyle [y_{k-1}\mid y_{k}]} should be concatenated and solved for simultaneously. 4. Set X = U Y V T . {\displaystyle X=UYV^{T}.} === Computational cost === Using the QR algorithm, the real Schur decompositions in step 1 require approximately 10 ( m 3 + n 3 ) {\displaystyle 10(m^{3}+n^{3})} flops, so that the overall computational cost is 10 ( m 3 + n 3 ) + 2.5 ( m n 2 + n m 2 ) {\displaystyle 10(m^{3}+n^{3})+2.5(mn^{2}+nm^{2})} . === Simplifications and special cases === In the special case where B = − A T {\displaystyle B=-A^{T}} and C {\displaystyle C} is symmetric, the solution X {\displaystyle X} will also be symmetric. This symmetry can be exploited so that Y {\displaystyle Y} is found more efficiently in step 3 of the algorithm. == The Hessenberg–Schur algorithm == The Hessenberg–Schur algorithm replaces the decomposition R = U T A U {\displaystyle R=U^{T}AU} in step 1 with the decomposition H = Q T A Q {\displaystyle H=Q^{T}AQ} , where H {\displaystyle H} is an upper-Hessenberg matrix. This leads to a system of the form H Y − Y S T = F {\displaystyle HY-YS^{T}=F} that can be solved using forward substitution. The advantage of this approach is that H = Q T A Q {\displaystyle H=Q^{T}AQ} can be found using Householder reflections at a cost of ( 5 / 3 ) m 3 {\displaystyle (5/3)m^{3}} flops, compared to the 10 m 3 {\displaystyle 10m^{3}} flops required to compute the real Schur decomposition of A {\displaystyle A} . == Software and implementation == The subroutines required for the Hessenberg-Schur variant of the Bartels–Stewart algorithm are implemented in the SLICOT library. These are used in the MATLAB control system toolbox. == Alternative approaches == For large systems, the O ( m 3 + n 3 ) {\displaystyle {\mathcal {O}}(m^{3}+n^{3})} cost of the Bartels–Stewart algorithm can be prohibitive. When A {\displaystyle A} and B {\displaystyle B} are sparse or structured, so that linear solves and matrix vector multiplies involving them are efficient, iterative algorithms can potentially perform better. These include projection-based methods, which use Krylov subspace iterations, methods based on the alternating direction implicit (ADI) iteration, and hybridizations that involve both projection and ADI. Iterative methods can also be used to directly construct low rank approximations to X {\displaystyle X} when solving A X − X B = C {\displaystyle AX-XB=C} .

Read guide →

Bazaart

Bazaart is an AI-powered design platform with image and video editing capabilities for iOS, Android, MacOS, and the web. == History == Bazaart was founded in 2012 in Israel. In April 2012, Bazaart launched a Facebook app called Pinvolve, which converts Facebook Pages into Pinterest pinboards. From June to August 2012, it participated in the DreamIt startup accelerator in New York and raised $25,000 from the accelerator. In July 2012, it launched its first version as an iPad app connected to Pinterest. In December 2013, it pivoted and launched a major version of its app, a "social" photoshop that allowed users to edit images which could be pulled in from the camera roll, social networks, and other sources. In July 2014, Bazaart reached one million downloads and in December was selected by Apple as Best of 2014. In 2015, Bazaart added Photoshop integration in a partnership with Adobe. In September 2020, Bazaart launched an Android app. In December 2020, Bazaart was selected by Google as Best of 2020. In January 2022, Bazaart added video editing capabilities. In 2023, the platform added AI-powered backgrounds and video background removal features.

Read guide →

Algorithmic management

Algorithmic management is a term used to describe certain labor management practices in the contemporary digital economy. In scholarly uses, the term was initially coined in 2015 by Min Kyung Lee, Daniel Kusbit, Evan Metsky, and Laura Dabbish to describe the managerial role played by algorithms on the Uber and Lyft platforms, but has since been taken up by other scholars to describe more generally the managerial and organisational characteristics of platform economies. However, digital direction of labor was present in manufacturing already since the 1970s and algorithmic management is becoming increasingly widespread across a wide range of industries. The concept of algorithmic management can be broadly defined as the delegation of managerial functions to algorithmic and automated systems. Algorithmic management has been enabled by "recent advances in digital technologies" which allow for the real-time and "large-scale collection of data" which is then used to "improve learning algorithms that carry out learning and control functions traditionally performed by managers". The term does not refer to a specific underlying technology, and encompasses the design choices, organisational policies, and governance that surround the managerial use of algorithms in workplaces. In the contemporary workplace, firms employ an ecology of accounting devices, such as "rankings, lists, classifications, stars and other symbols' in order to effectively manage their operations and create value without the need for traditional forms of hierarchical control." Many of these devices fall under the label of what is called algorithmic management, and were first developed by companies operating in the sharing economy or gig economy, functioning as effective labor and cost cutting measures. The Data&Society explainer of the term, for example, describes algorithmic management as 'a diverse set of technological tools and techniques that structure the conditions of work and remotely manage workforces. Data&Society also provides a list of five typical features of algorithmic management: Prolific data collection and surveillance of workers through technology; Real-time responsiveness to data that informs management decisions; Automated or semi-automated decision-making; Transfer of performance evaluations to rating systems or other metrics; and The use of "nudges" and penalties to indirectly incentivize worker behaviors. Proponents of algorithmic management claim that it "creates new employment opportunities, better and cheaper consumer services, transparency and fairness in parts of the labour market that are characterised by inefficiency, opacity and capricious human bosses." On the other hand, critics of algorithmic management claim that the practice leads to several issues, especially as it impacts the employment status of workers managed by its new array of tools and techniques. == History of the term == "Algorithmic management" was first described by Lee, Kusbit, Metsky, and Dabbish in 2015 in their study of the Uber and Lyft platforms. In their study, Lee et al. termed "software algorithms that assume managerial functions and surrounding institutional devices that support algorithms in practice" algorithmic management. Software algorithms, it was said, are increasingly used to "allocate, optimize, and evaluate work" by platforms in managing their vast workforces. In Lee et al.'s paper on Uber and Lyft this included the use of algorithms to assign work to drivers, as mechanisms to optimise pricing for services, and as systems for evaluating driver performance. In 2016, Alex Rosenblat and Luke Stark sought to extend on this understanding of algorithmic management "to elucidate on the automated implementation of company policies on the behaviours and practices of Uber drivers." Rosenblat and Stark found in their study that algorithmic management practices contributed to a system beset by power asymmetries, where drivers had little control over "critical aspects of their work", whereas Uber had far greater control over the labor of its drivers. Since this time, studies of algorithmic management have extended the use of the term to describe the management practices of various firms, where, for example, algorithms "are taking over scheduling work in fast food restaurants and grocery stores, using various forms of performance metrics ad even mood... to assign the fastest employees to work in peak times." Algorithmic management is seen to be especially prevalent in gig work on platforms, such as on Upwork and Deliveroo, and in the sharing economy, such as in the case of Airbnb. Furthermore, recent research has defined sub-constructs that fall under the umbrella term of algorithmic management, for example, "algorithmic nudging". A Harvard Business Review article published in 2021 explains: "Companies are increasingly using algorithms to manage and control individuals not by force, but rather by nudging them into desirable behavior — in other words, learning from their personalized data and altering their choices in some subtle way." While the concept builds on nudging theory popularized by University of Chicago economist Richard Thaler and Harvard Law School professor Cass Sunstein, "due to recent advances in AI and machine learning, algorithmic nudging is much more powerful than its non-algorithmic counterpart. With so much data about workers' behavioral patterns at their fingertips, companies can now develop personalized strategies for changing individuals' decisions and behaviors at large scale. These algorithms can be adjusted in real-time, making the approach even more effective." == Relationships with other labor management practices == Algorithmic management has been compared and contrasted with other forms of management, such as Scientific management approaches, as pioneered by Frederick Taylor in the early 1900s. Henri Schildt has called algorithmic management "Scientific management 2.0", where management "is no longer a human practice, but a process embedded in technology." Similarly, Kathleen Griesbach, Adam Reich, Luke Elliott-Negri, and Ruth Milkman suggest that, while "algorithmic control over labor may be relatively new, it replicates many features of older mechanisms of labor control." On the other hand, some commentators have argued that algorithmic management is not simply a new form of Scientific management or digital Taylorism, but represents a distinct approach to labor control in platform economies. David Stark and Ivana Pais, for example, state that, "In contrast to Scientific Management at the turn of the twentieth century, in the algorithmic management of the twenty-first century there are rules but these are not bureaucratic, there are rankings but not ranks, and there is monitoring but it is not disciplinary. Algorithmic management does not automate bureaucratic structures and practices to create some new form of algorithmic bureaucracy. Whereas the devices and practices of Taylorism were part of a system of hierarchical supervision, the devices and practices of algorithmic management take place within a different economy of attention and a new regime of visibility. Triangular rather than vertical, and not as a panopticon, the lines of vision in algorithmic management are not lines of supervision." Similarly, Data&Society's explainer for algorithmic management claims that the practice represents a marked departure from earlier management structures that more strongly rely on human supervisors to direct workers. In analyzing the difference and the similarities to previous management styles, David Stark and Pieter Vanden Broeck expand the applicability of algorithmic management beyond the workplace. They develop a theory of algorithmic management in terms of broader changes in the shape and structure of organization in the 21st century, attentive to the erosion of organization's boundaries whereby heterogeneous actors, assets, and activities, are coopted regardless of their place in organizational space. Stark and Vanden Broeck propose the following means of differentiating algorithmic management from other historical managerial paradigms: == Issues == Algorithmic management can provide an effective and efficient means of workforce control and value creation in the contemporary digital economy. However, commentators have highlighted several issues that algorithmic management poses, especially for the workers it manages. Criticisms of the practice often highlight several key issues pertaining to algorithmic management practices, such as the imperfection and scope of its surveillance and control measures, which also threaten to lock workers out of key decision-making processes; its lack of transparency for users and information asymmetries; its potential for bias and discrimination; its dehumanizing tendencies; and its potential to create conditions which sidestep traditional employer-employee accountability. This last point has been especi

Read guide →

Two-phase commit protocol

In transaction processing, databases, and computer networking, the two-phase commit protocol (2PC, tupac) is a type of atomic commitment protocol (ACP). It is a distributed algorithm that coordinates all the processes that participate in a distributed atomic transaction on whether to commit or abort (roll back) the transaction. This protocol (a specialised type of consensus protocol) achieves its goal even in many cases of temporary system failure (involving either process, network node, communication, etc. failures), and is thus widely used. However, it is not resilient to all possible failure configurations, and in rare cases, manual intervention is needed to remedy an outcome. To accommodate recovery from failure (automatic in most cases) the protocol's participants use logging of the protocol's states. Log records, which are typically slow to generate but survive failures, are used by the protocol's recovery procedures. Many protocol variants exist that primarily differ in logging strategies and recovery mechanisms. Though usually intended to be used infrequently, recovery procedures compose a substantial portion of the protocol, due to many possible failure scenarios to be considered and supported by the protocol. In a "normal execution" of any single distributed transaction (i.e., when no failure occurs, which is typically the most frequent situation), the protocol consists of two phases: The commit-request phase (or voting phase), in which a coordinator process attempts to prepare all the transaction's participating processes (named participants, cohorts, or workers) to take the necessary steps for either committing or aborting the transaction and to vote, either "Yes": commit (if the transaction participant's local portion execution has ended properly), or "No": abort (if a problem has been detected with the local portion), and The commit phase, in which, based on voting of the participants, the coordinator decides whether to commit (only if all have voted "Yes") or abort the transaction (otherwise), and notifies the result to all the participants. The participants then follow with the needed actions (commit or abort) with their local transactional resources (also called recoverable resources; e.g., database data) and their respective portions in the transaction's other output (if applicable). The two-phase commit (2PC) protocol should not be confused with the two-phase locking (2PL) protocol, a concurrency control protocol. == Assumptions == The protocol works in the following manner: one node is a designated coordinator, which is the master site, and the rest of the nodes in the network are designated the participants. The protocol assumes that: there is stable storage at each node with a write-ahead log, no node crashes forever, the data in the write-ahead log is never lost or corrupted in a crash, and any two nodes can communicate with each other. The last assumption is not too restrictive, as network communication can typically be rerouted. The first two assumptions are much stronger; if a node is totally destroyed then data can be lost. The protocol is initiated by the coordinator after the last step of the transaction has been reached. The participants then respond with an agreement message or an abort message depending on whether the transaction has been processed successfully at the participant. == Basic algorithm == === Commit request (or voting) phase === The coordinator sends a query to commit message to all participants and waits until it has received a reply from all participants. The participants execute the transaction up to the point where they will be asked to commit. They each write an entry to their undo log and an entry to their redo log. Each participant replies with: either an agreement message (participant votes Yes to commit), if the participant's actions succeeded; or an abort message (participant votes No to commit), if the participant experiences a failure that will make it impossible to commit. === Commit (or completion) phase === ==== Success ==== If the coordinator received an agreement message from all participants during the commit-request phase: The coordinator sends a commit message to all the participants. Each participant completes the operation, and releases all the locks and resources held during the transaction. Each participant sends an acknowledgement to the coordinator. The coordinator completes the transaction when all acknowledgements have been received. ==== Failure ==== If any participant votes No during the commit-request phase (or the coordinator's timeout expires): The coordinator sends a rollback message to all the participants. Each participant undoes the transaction using the undo log, and releases the resources and locks held during the transaction. Each participant sends an acknowledgement to the coordinator. The coordinator undoes the transaction when all acknowledgements have been received. ==== Message flow ==== Coordinator Participant QUERY TO COMMIT --------------------------------> VOTE YES/NO prepare/abort <------------------------------- commit/abort COMMIT/ROLLBACK --------------------------------> ACKNOWLEDGEMENT commit/abort <-------------------------------- end An next to the record type means that the record is forced to stable storage. == Disadvantages == The greatest disadvantage of the two-phase commit protocol is that it is a blocking protocol. If the coordinator fails permanently, some participants will never resolve their transactions: After a participant has sent an agreement message as a response to the commit-request message from the coordinator, it will block until a commit or rollback is received. A two-phase commit protocol cannot dependably recover from a failure of both the coordinator and a cohort member during the commit phase. If only the coordinator had failed, and no cohort members had received a commit message, it could safely be inferred that no commit had happened. If, however, both the coordinator and a cohort member failed, it is possible that the failed cohort member was the first to be notified, and had actually done the commit. Even if a new coordinator is selected, it cannot confidently proceed with the operation until it has received an agreement from all cohort members, and hence must block until all cohort members respond. == Implementing the two-phase commit protocol == === Common architecture === In many cases the 2PC protocol is distributed in a computer network. It is easily distributed by implementing multiple dedicated 2PC components similar to each other, typically named transaction managers (TMs; also referred to as 2PC agents or Transaction Processing Monitors), that carry out the protocol's execution for each transaction (e.g., The Open Group's X/Open XA). The databases involved with a distributed transaction, the participants, both the coordinator and participants, register to close TMs (typically residing on respective same network nodes as the participants) for terminating that transaction using 2PC. Each distributed transaction has an ad hoc set of TMs, the TMs to which the transaction participants register. A leader, the coordinator TM, exists for each transaction to coordinate 2PC for it, typically the TM of the coordinator database. However, the coordinator role can be transferred to another TM for performance or reliability reasons. Rather than exchanging 2PC messages among themselves, the participants exchange the messages with their respective TMs. The relevant TMs communicate among themselves to execute the 2PC protocol schema above, "representing" the respective participants, for terminating that transaction. With this architecture the protocol is fully distributed (does not need any central processing component or data structure), and scales up with number of network nodes (network size) effectively. This common architecture is also effective for the distribution of other atomic commitment protocols besides 2PC, since all such protocols use the same voting mechanism and outcome propagation to protocol participants. === Protocol optimizations === Database research has been done on ways to get most of the benefits of the two-phase commit protocol while reducing costs by protocol optimizations and protocol operations saving under certain system's behavior assumptions. ==== Presumed abort and presumed commit ==== Presumed abort or Presumed commit are common such optimizations. An assumption about the outcome of transactions, either commit, or abort, can save both messages and logging operations by the participants during the 2PC protocol's execution. For example, when presumed abort, if during system recovery from failure no logged evidence for commit of some transaction is found by the recovery procedure, then it assumes that the transaction has been aborted, and acts accordingly. This means that it does not matter if aborts are logged at all, and such logging can be saved under this assumption. Typical

Read guide →

SQL/PSM

SQL/PSM (SQL/Persistent Stored Modules) is an ISO standard mainly defining an extension of SQL with a procedural language for use in stored procedures. Initially published in 1996 as an extension of SQL-92 (ISO/IEC 9075-4:1996, a version sometimes called PSM-96 or even SQL-92/PSM), SQL/PSM was later incorporated into the multi-part SQL:1999 standard, and has been part 4 of that standard since then, most recently in SQL:2023. The SQL:1999 part 4 covered less than the original PSM-96 because the SQL statements for defining, managing, and invoking routines were actually incorporated into part 2 SQL/Foundation, leaving only the procedural language itself as SQL/PSM. The SQL/PSM facilities are still optional as far as the SQL standard is concerned; most of them are grouped in Features P001-P008. SQL/PSM standardizes syntax and semantics for control flow, exception handling (called "condition handling" in SQL/PSM), local variables, assignment of expressions to variables and parameters, and (procedural) use of cursors. It also defines an information schema (metadata) for stored procedures. SQL/PSM is one language in which methods for the SQL:1999 structured types can be defined. The other is Java, via SQL/JRT. SQL/PSM is derived, seemingly directly, from Oracle's PL/SQL. Oracle developed PL/SQL and released it in 1991, basing the language on the US Department of Defense's Ada programming language. However, Oracle has maintained a distance from the standard in its documentation. IBM's SQL PL (used in DB2) and Mimer SQL's PSM were the first two products officially implementing SQL/PSM. It is commonly thought that these two languages, and perhaps also MySQL/MariaDB's procedural language, are closest to the SQL/PSM standard. However, a PostgreSQL addon implements SQL/PSM (alongside its other procedural languages like the PL/SQL-derived plpgsql), although it is not part of the core product. RDF functionality in OpenLink Virtuoso was developed entirely through SQL/PSM, combined with custom datatypes (e.g., ANY for handling URI and Literal relation objects), sophisticated indexing, and flexible physical storage choices (column-wise or row-wise).

Read guide →

Language identification

In natural language processing, language identification or language guessing is the problem of determining which natural language a given content is in. Computational approaches to this problem view it as a special case of text categorization, solved with various statistical methods. == Overview == === Logical approach === A common non-statistical intuitive approach (though highly uncertain) is to look for common letter combinations, or distinctive diacritics or punctuation. === Statistical approach === There are several statistical approaches to language identification. An older statistical method by Grefenstette was based on the frequency of short n-grams, which are often function morphemes. For example, "ing" is more common in English than in French, while the sequence "que" is more common in French. Given a new page found on the Web, one counts the number of occurrences of each such short sequence and picks the language whose frequency table it matches the most. One technique is to compare the compressibility of the text to the compressibility of texts in a set of known languages. This approach is known as mutual information based distance measure. The same technique can also be used to empirically construct family trees of languages which closely correspond to the trees constructed using historical methods. Mutual information based distance measure is essentially equivalent to more conventional model-based methods and is not generally considered to be either novel or better than simpler techniques. Another technique, as described by Cavnar and Trenkle (1994) and Dunning (1994) is to create a language n-gram model from a "training text" for each of the languages. These models can be based on characters (Cavnar and Trenkle) or encoded bytes (Dunning); in the latter, language identification and character encoding detection are integrated. Then, for any piece of text needing to be identified, a similar model is made, and that model is compared to each stored language model. The most likely language is the one with the model that is most similar to the model from the text needing to be identified. This approach can be problematic when the input text is in a language for which there is no model. In that case, the method may return another, "most similar" language as its result. Also problematic for any approach are pieces of input text that are composed of several languages, as is common on the Web. As of 2025, a commonly used baseline method is via the fastText library, which has comparable classification accuracy as deep learning techniques, but much faster. == Identifying similar languages == One of the great bottlenecks of language identification systems is to distinguish between closely related languages. Similar languages like Bulgarian and Macedonian or Indonesian and Malay present significant lexical and structural overlap, making it challenging for systems to discriminate between them. In 2014 the DSL shared task has been organized providing a dataset (Tan et al., 2014) containing 13 different languages (and language varieties) in six language groups: Group A (Bosnian, Croatian, Serbian), Group B (Indonesian, Malaysian), Group C (Czech, Slovak), Group D (Brazilian Portuguese, European Portuguese), Group E (Peninsular Spanish, Argentine Spanish), Group F (American English, British English). The best system reached performance of over 95% results (Goutte et al., 2014). Results of the DSL shared task are described in Zampieri et al. 2014. == Software == Apache OpenNLP includes char n-gram based statistical detector and comes with a model that can distinguish 103 languages Apache Tika contains a language detector for 18 languages

Read guide →

Automatic image annotation

Automatic image annotation (also known as automatic image tagging or linguistic indexing) is the process by which a computer system automatically assigns metadata in the form of captioning or keywords to a digital image. This application of computer vision techniques is used in image retrieval systems to organize and locate images of interest from a database. This method can be regarded as a type of multi-class image classification with a very large number of classes - as large as the vocabulary size. Typically, image analysis in the form of extracted feature vectors and the training annotation words are used by machine learning techniques to attempt to automatically apply annotations to new images. The first methods learned the correlations between image features and training annotations. Subsequently, techniques were developed using machine translation to attempt to translate the textual vocabulary into the 'visual vocabulary,' represented by clustered regions known as blobs. Subsequent work has included classification approaches, relevance models, and other related methods. The advantages of automatic image annotation versus content-based image retrieval (CBIR) are that queries can be more naturally specified by the user. At present, Content-Based Image Retrieval (CBIR) generally requires users to search by image concepts such as color and texture or by finding example queries. However, certain image features in example images may override the concept that the user is truly focusing on. Traditional methods of image retrieval, such as those used by libraries, have relied on manually annotated images, which is expensive and time-consuming, especially given the large and constantly growing image databases in existence.

Read guide →

Skyline operator

The skyline operator is the subject of an optimization problem and computes the Pareto optimum on tuples with multiple dimensions. This operator is an extension to SQL proposed by Börzsönyi et al. to filter results from a database to keep only those objects that are not dominated by any other point on all dimensions. The name skyline comes from the view on Manhattan from the Hudson River, where those buildings can be seen that are not hidden by any other. A building is visible if it is not dominated by a building that is taller or closer to the river (two dimensions, distance to the river minimized, height maximized). Another application of the skyline operator involves selecting a hotel for a holiday. The user wants the hotel to be both cheap and close to the beach. However, hotels that are close to the beach may also be expensive. In this case, the skyline operator would only present those hotels that are not worse than any other hotel in both price and distance to the beach. == Formal specification == The skyline operator returns tuples that are not dominated by any other tuple. A tuple dominates another if it is at least as good in all dimensions and better in at least one dimension. Formally, we can think of each tuple as a vector p , q ∈ R n {\displaystyle p,q\in \mathbb {R} ^{n}} . p {\displaystyle p} dominates q {\displaystyle q} (written: p ≻ q {\displaystyle p\succ q} ) if p {\displaystyle p} is at least as good as q {\displaystyle q} in every dimension, and superior in at least one: p ≻ q ⇔ ∀ i ∈ [ n ] . p [ i ] ⪰ q [ i ] ∧ ∃ j ∈ [ n ] . p [ j ] ≻ q [ j ] . {\displaystyle p\succ q\Leftrightarrow \forall i\in [n].p[i]\succeq q[i]\wedge \exists j\in [n].p[j]\succ q[j].} Dominance ( p ≻ q {\displaystyle p\succ q} ) can be defined as any strict partial ordering, for example greater (with ≻:=> {\displaystyle \succ :=>} and ⪰:=≥ {\displaystyle \succeq :=\geq } ) or less (with ≻:=< {\displaystyle \succ :=<} and ⪰:=≤ {\displaystyle \succeq :=\leq } ). Assuming two dimensions and defining dominance in both dimensions as greater, we can compute the skyline in SQL-92 as follows: == Proposed syntax == As an extension to SQL, Börzsönyi et al. proposed the following syntax for the skyline operator: where d1, ... dm denote the dimensions of the skyline and MIN, MAX and DIFF specify whether the value in that dimension should be minimised, maximised or simply be different. Without an SQL extension, the SQL query requires an antijoin with not exists: == Implementation == The skyline operator can be implemented directly in SQL using current SQL constructs, but this has been shown to be very slow in disk-based database systems. Other algorithms have been proposed that make use of divide and conquer, indices, MapReduce and general-purpose computing on graphics cards. Skyline queries on data streams (i.e. continuous skyline queries) have been studied in the context of parallel query processing on multicores, owing to their wide diffusion in real-time decision making problems and data streaming analytics. Exasol features a native implementation.

Read guide →

Library history

Library history is a subdiscipline within library science and library and information science focusing on the history of libraries and their role in societies and cultures. Some see the field as a subset of information history. Library history is an academic discipline and should not be confused with its object of study (history of libraries): the discipline is much younger than the libraries it studies. Library history begins in ancient societies through contemporary issues facing libraries today. Topics include recording mediums, cataloguing systems, scholars, scribes, library supporters and librarians. == Earliest libraries == The earliest records of a library institution as it is presently understood can be dated back to around 5,000 years ago in the Southwest Asian regions of the world. One of the oldest libraries found is that of the ancient library at Ebla (circa 2500 BCE) in present-day Syria. In the 1970s, the excavation at Ebla's library unearthed over 20,000 clay tablets written in cuneiform script. === Library in Mesopotamia === The Assyrian King Assurbanipal created one of the greatest libraries in Nineveh in the seventh century BCE. The collection consisted of over 30,000 tablets written in a variety of languages. The collection was cataloged both by the shape of the tablet and by the subject of the content. The library had separate rooms for the different topics: government, history, law, astronomy, geography, and so on. The tablets also contained myths, hymns, and even jokes. Assurbanipal would send scribes to visit every corner of his kingdom to copy the content of other libraries. His library contained many of the most important literary works of the day, including the epic of Gilgamesh. Assurbanipal's Royal Library also had one of the first library catalogs. Unfortunately, Nineveh was eventually destroyed, and the library was lost in a fire. === Libraries in Ancient Greece === The Greek government was the first to sponsor public libraries. By 500 BCE both Athens and Samos had begun creating libraries for the public, though as most of the population was illiterate these spaces were serving a small, educated portion of the community. Athens developed a city archive at the Metroon in 405 BCE, where documents were stored in sealed jars. These would have saved the documents, but they would have been difficult to consult regularly. In Paros, around the same time, contracts were placed in the temple for safe keeping, and a book curse was placed for extra protection. === Library of Alexandria === The Library at Alexandria, Egypt, was renowned in the third century BCE while kings Ptolemy I Soter and Ptolemy II Philadelphus reigned. The library included a museum, garden, meeting areas and of course reading rooms. The Great Library, as it is known, was one of many in Alexandria. From its inception around the second century BCE, Alexandria was a well-known center for learning. It earned renown as the intellectual capital of the Western world up through the third century CE. The librarians at Alexandria collected, copied, and organized scrolls from across the known world. According to a primary source, every ship that came to Alexandria was required to hand over their books to be copied, and the copies would be returned to the owner, the library keeping the original. The Library of Alexandria was damaged by various disasters over time, including fire, invasion, and earthquake. Scholars believe the collection slowly diminished over time due to theft and efforts to remove it ahead of invading armies. While there are popular stories about how the library was ultimately destroyed, most of these are more myth than fact. === Libraries in Rome === Julius Caesar and his successor Augustus were the first to establish public libraries in ancient Rome, including the library of Apollo on the Palatine Hill. Several emperors followed suit over the next four centuries, including Hadrian, Tiberius, and Vespasian. Roman aristocrats also had personal libraries, which usually contained works in both Greek and Latin. A valuable example of this has been found at Herculaneum near Pompeii. Papyrus manuscripts in Herculaneum's Villa of the Papyri were encased in ash after the eruption of Vesuvius in 79 CE. Modern archaeology is now able to scan these artifacts and discern their contents, including many writings from Philodemus. The average Roman would not have been familiar with books beyond what they might hear read aloud in the forum. Public figures would pay for particular passages to be read aloud to the public from the steps of a public library. === Libraries in the Middle Ages === In the European Middle Ages, libraries began to become more prevalent, despite a widespread reduction in new writing beyond religious themes. Most libraries were initially connected to monasteries or religious institutions. Scriptoriums copied Christian religious texts to share with other religious centers or to be read aloud to their own parishioners. The Holy Roman Emperor Charlemagne (r. 786-814) had a large impact on the advancement of written culture in the Medieval Christian world, acquiring as many written works as he could, and employing many scribes to copy and recirculate vernacular versions of religious works. Most of the text held in small personal libraries was still religious in nature. == Early modern libraries == === Libraries of the Renaissance === During the Renaissance era the merchant middle class grew, and more people found benefits in education. They relied on libraries as a place to study and gain knowledge. Libraries provided a valuable resource, enriching the culture of those who were educated. Universities that had been started in the Middle Ages, founded their own libraries. Books in these libraries could not be borrowed from these libraries and were generally chained to the shelves to prevent theft. As more of the population became literate, new ideas like Humanism and Natural Law spawned an increase personal libraries, although they remained small. Gutenberg's invention of the printing press in 1456 opened the door to the modern era for libraries. == Oldest working libraries == According to the German librarian Michael Knoche, it is not possible to determine which library is the “oldest”: "Precise year dates are a construct, especially in the case of very old libraries. When a collection of books deserves to be called a library depends very much on the point of view of the observer." Various libraries are referred to as the “oldest”: The library founded in the 6th century of the Saint Catherine's Monastery in Sinai is "reputedly the oldest continuously run library in existence today", according to the Library of Congress. Its collection of religious and secular manuscripts is ranging from Bibles, liturgies and prayer books to legal documents such as deeds, court cases and fatwahs (legal opinions). The Al Qarawiyyin Library was founded in 859 by Fatima al-Fihri and is often regarded as the oldest working library in the world. It is in Fez, Morocco and is part of the oldest continually operating university in the world, the University of al-Qarawiyyin. The library houses approximately 4,000 ancient Islamic manuscripts. These manuscripts include 9th century Qurans and the oldest known accounts of the Islamic prophet Muhammed. The Malatestiana Library (Italian: Biblioteca Malatestiana) is a public library in the city of Cesena in northern Italy. Opened in 1454 it is significant for being the first civic library in Europe open to the general public. == Library history reports and writings of the early 19th and 20th century == In the early 19th and 20th century, representative titles were created reporting library history in the United States and the United Kingdom. American titles include Public Libraries in the United States of America, Their History, Condition, and Management (1876), Memorial History of Boston (1881) by Justin Winsor, Public Libraries in America (1894) by William I. Fletcher, and History of the New York Public Library (1923) by Henry M. Lydenberg. British titles include Old English Libraries (1911) by Earnest A. Savage and The Chained Library: A Survey of Four Centuries in the Evolution of the English Library by Burnett Hillman Streeter. In the beginning of the 20th century, library historians began applying scientific research methodologies to examine the library as a social agency. Two works that demonstrate this argument are Geschichte der Bibliotheken (1925) by Alfred Hessel and the Library Quarterly article from 1931, “The Sociological Beginnings of the Library Movement in America” by Arnold Borden. With the establishment of library schools, master's theses and doctoral dissertations represented the shift in serious research regarding libraries and library history. Two published doctoral dissertations that mark this trend are Foundations of the Public Library: The Origins of the American Public Library Movement in Ne

Read guide →

Popular AI Topics