Final Project Report 6160 Essay

10001250 INSE 6160 Final Project Report

S. No  S. ID     Name                    Role
1.     40079502  Diksha Bhardwaj         Optimizing iceberg queries with complex joins
2.     40075760  Karanpreet Singh Arora  Revisiting reuse in main memory database management system

7 December 2018

Table of Contents
Abstract
Introduction
K-skyband query
Hashstash database system
Component in hashstash model
Cases for hashstash table
Hashstash table component
Three orthogonal techniques
Reuse aware hash joins
Execution and optimization
Garbage collector
Efficiency of queries
Conclusion

ABSTRACT

In these two papers, the main idea is to improve the query processing time of the database management system.

This is done mainly through complex joins and clauses such as GROUP BY and HAVING, together with techniques drawn from data mining and OLAP. The results are computed using SQL, PostgreSQL and hash tables. The main advantage of these methods is that they improve the efficiency and performance of the system. In addition, a garbage collector evicts cached data that is no longer needed from memory.

INTRODUCTION

Intermediate results are reused mainly to speed up query processing, and traditionally they are produced by materializing operators. That approach is poorly suited to modern database systems, which are already highly optimized. The benefit of the reuse techniques discussed here is that they rely mainly on the payloads already held in hash tables and add no extra cost. Three concepts are central: hash tables, the query optimizer and tuples. SQL and a PostgreSQL baseline are used to run the queries in a commercial database system. In practice, iceberg queries are not cheap to evaluate, which is why they are mainly expressed through GROUP BY and HAVING clauses. An iceberg query returns only the groups whose aggregate value satisfies a threshold. In the example below we find popular items and regions where the revenue in the region from the item is at least one million dollars:

    SELECT partKey, region, SUM(numSales * price)
    FROM LineItem
    GROUP BY partKey, region
    HAVING SUM(numSales * price) >= 1000000

K-SKYBAND QUERY

The k-skyband query is a well-known query over skyline objects. It retrieves the objects that are dominated by no more than k other objects. It can involve more complex inequality joins.
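To make the dominance condition concrete, the following Python sketch computes a k-skyband with a naive pairwise scan. It is only an illustration under stated assumptions: larger values are taken to be better in every dimension, an object qualifies when at most k other objects dominate it, and the names dominates, k_skyband and objects are hypothetical rather than taken from either paper.

```python
def dominates(a, b):
    """True if a is at least as good as b in every dimension and strictly
    better in at least one (assumption: larger values are better)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def k_skyband(points, k):
    """Return the ids of points dominated by at most k other points.

    Naive O(n^2) scan: every point is compared with every other point.
    """
    result = []
    for pid, p in points.items():
        dominators = sum(
            1 for qid, q in points.items() if qid != pid and dominates(q, p)
        )
        if dominators <= k:
            result.append(pid)
    return result


# Hypothetical two-dimensional objects: id -> (x, y).
objects = {"a": (5, 5), "b": (4, 4), "c": (3, 6), "d": (1, 1)}

skyline = k_skyband(objects, 0)   # objects that nothing dominates
skyband1 = k_skyband(objects, 1)  # objects dominated by at most one other
```

With k = 0 the result is the ordinary skyline. The pairwise comparison in the inner loop is the kind of inequality join that makes such queries expensive on large inputs.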
For example, the following k-skyband query runs over a table Object(id, x, y), where x and y represent numerical dimensions of interest, such as price, rating or availability:

    SELECT L.id, COUNT(*)
    FROM Object L, Object R
    WHERE L.x <= R.x AND L.y <= R.y AND (L.x < R.x OR L.y < R.y)
    GROUP BY L.id
    HAVING COUNT(*) <= 50;

HASHSTASH DATABASE SYSTEM

HashStash reuses the internal data structures of the database system, namely the hash tables that hash-join and hash-aggregation operators build during query processing. It supports two models:
1) Single query reuse
2) Multiple query reuse

SINGLE QUERY REUSE

In this model the user submits a single query to the HashStash DBMS. Unlike a conventional DBMS, HashStash identifies a reuse-aware plan for the query. The model is divided into three components:
1) A cache of hash tables, together with the information that describes each table.
2) A reuse-aware optimizer with a new cost model, which decides which cached hash table each operator should use so that query processing takes minimal time.
3) A garbage collector, which evicts hash tables from the cache when required.

MULTIPLE QUERY REUSE

In this model multiple queries are submitted at the same time to examine different aspects of a data set. A shared batch plan is used to minimize the time spent in the query optimizer. The payloads are independent of the reuse capability. When the reuse capability is low, reuse has a negative effect: the cached pages hold less data than the base table, which can slow processing down.

COMPONENTS IN THE HASHSTASH MODEL

Reuse query optimizer: The reuse-aware query optimizer optimizes every query. A plain dynamic-programming enumeration quickly becomes infeasible because of its memory requirements, so the main idea is to reduce the size of the query by splitting it into sub-plans, which can then be considered for reuse at no extra cost.

Hash table manager: The hash table manager keeps all the information about the cached hash tables, the operators that produced them and the data flow between those operators.
It also checks whether any cached hash table is free or unused; such tables are removed from the cache and evicted by the garbage collector. Locks, statistics and usage tracking are functions of the garbage collector.

HASHSTASH SUPPORTS FOUR DIFFERENT CASES FOR REUSE-AWARE OPERATORS

EXACT

In exact reuse the cached hash table contains exactly the tuples that the query needs. Only hash-join and hash-aggregation operators are eligible, since their hash tables are what is cached for later queries. Sometimes, however, the hash table of a sub-plan has already been evicted.

SUBSUMING

In the subsuming case the cached hash table contains more tuples than the query needs, so false positives can occur. HashStash therefore filters them out during processing and, for aggregations, uses a corrected aggregate in place of the one affected by false positives.

PARTIAL

In the partial case some of the required tuples are missing from the cached hash table, so HashStash adds the missing tuples before the query is executed. Each reuse case is therefore handled differently.

OVERLAPPING REUSE

The overlapping case resembles the partial case: some required tuples are missing and must be added to the reused hash table.

THE COST ESTIMATION

Cost estimation compares the overall cost and the per-component costs of the plans considered by the optimizer. Costs are grouped so that plans reusing cached hash tables can be compared, and the lowest-cost option in each group is chosen.

HASHSTASH TABLE COMPONENT

THREE ORTHOGONAL TECHNIQUES:

GENERALIZED A-PRIORI

This technique is motivated by the Apriori algorithm and is applied to the HAVING constraint. It lets the original iceberg query run on smaller inputs, which reduces the cost of evaluating the complex joins. It is used only when the HAVING clause makes it applicable.

CACHE-BASED PRUNING

This technique uses properties of k-skyband query processing.
In cache-based pruning, the current computation is combined with the results of previous computations. For this purpose a query operator called NLJP (nested-loop join with pruning) is introduced, which follows the same process as a nested-loop join. In addition, it evaluates a pruning predicate for every new outer input tuple.

MEMOIZATION

Memoization enables a cache inside the NLJP operator and so avoids computation that is not required. Database constraints are used here alongside the database queries. The pruning predicates, which use both arithmetic and non-arithmetic operations, can be obtained directly as SQL queries.

REUSE AWARE HASH JOINS

A hash join first builds a hash table from one of its inputs and then probes that hash table with each tuple of the other input. A reuse-aware hash join differs in two ways:
1) During the build phase, the operator may add missing tuples to a reused hash table.
2) During the probe phase, it filters out false-positive tuples, that is, tuples in the reused hash table that the current query does not need.

THE COST MODEL WITH RESIZING

\[ c_{RHJ} = c_{resize}(HT) + c_{build}(HT) + c_{probe}(HT) \]

where RHJ denotes the reuse-aware hash join, and

\[ c_{insert}(HT) = \underbrace{|NewKeys| \cdot (1 - contr(HT))}_{\#\text{tuples to insert}} \cdot c_i(htSize, tWidth) \]

The first equation accounts for the resize cost. The second gives the insert cost as the number of tuples to insert multiplied by the cost c_i of inserting a single tuple, which depends on the hash table size and the tuple width.

EXECUTION AND OPTIMIZATION

This section explains how iceberg queries can be optimized and executed in a database management system. The NLJP operator is used to implement memoization and pruning. The NLJP operator is specified by the following queries:
1) Binding query
2) Inner query
3) Pruning query

BENEFIT ORIENTED OPTIMIZATION

HashStash additionally implements the following benefit-oriented optimizations. The main intuition behind them is that one plan is favored over another if it creates hash tables that promise better benefits for future reuse.

Additional attributes: Storing additional attributes in the hash table enables post-filtering of false positives without going back to the base tables.
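The post-filtering idea can be sketched in Python as follows. This is a simplified illustration, not HashStash's implementation: the tables customers and orders, the helper names build_hash_table and probe_with_post_filter, and the predicates on age are all hypothetical. A hash table cached for an earlier query with the wider predicate age >= 20 also stores age as an additional attribute, so a later query with age >= 30 can drop false positives while probing, without rescanning the base table.

```python
from collections import defaultdict


def build_hash_table(rows, key, payload_attrs):
    """Build phase: map each join key to the payloads stored for it."""
    table = defaultdict(list)
    for row in rows:
        table[row[key]].append({attr: row[attr] for attr in payload_attrs})
    return table


def probe_with_post_filter(table, probe_rows, key, predicate):
    """Probe phase: join against a cached table, skipping stored entries
    that fail the current query's predicate (the false positives)."""
    joined = []
    for probe in probe_rows:
        for stored in table.get(probe[key], []):
            if predicate(stored):
                joined.append({**stored, **probe})
    return joined


# Hypothetical base tables.
customers = [
    {"cid": 1, "name": "Ann", "age": 25},
    {"cid": 2, "name": "Bob", "age": 35},
    {"cid": 3, "name": "Cy", "age": 41},
]
orders = [
    {"cid": 1, "total": 10},
    {"cid": 2, "total": 20},
    {"cid": 3, "total": 30},
]

# Hash table cached by an earlier query (age >= 20). "age" is kept as an
# additional attribute even though that query did not need it in its output.
cached = build_hash_table(
    [c for c in customers if c["age"] >= 20], "cid", ["name", "age"]
)

# A later query asks for age >= 30: the cached table subsumes what it needs,
# so false positives are filtered during the probe.
result = probe_with_post_filter(cached, orders, "cid", lambda s: s["age"] >= 30)
```

Without the stored age attribute the probe could not tell which cached entries belong to the new query, and the hash table would have to be rebuilt from customers.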
At the moment, a greedy heuristic is used that adds an additional attribute to the cached hash tables.

Aggregate rewrite: Aggregates are rewritten to support partial and overlapping reuse, at the cost of initially creating a slightly larger hash table. The same heuristic as before decides whether to apply this rewrite or not.

REUSE AWARE SHARED PLANS

In this method multiple queries are compiled into a single shared plan rather than being compiled one by one. For the reuse-aware operators to work properly, shared plans are extended so that they can share and reuse hash tables. In the shared plan each operator executes the logic of multiple queries in a single scan. Each tuple is tagged with query IDs, which are used on the outputs of the joins.

SHARED REUSE-AWARE HASH-JOINS:

The shared reuse-aware plan has much in common with the non-shared plan. It differs mainly in the build phase, which avoids starting from scratch so that no recomputation is needed. In addition, query batches are supported, in which many queries execute at the same time. A hash table whose tuples carry no query IDs cannot be reused by a shared operator. A hash table that was tagged carries the now obsolete IDs of previously executed queries.

GARBAGE COLLECTOR

The main function of the garbage collector is to remove hash tables that are not reused further. It starts releasing memory when the memory used by the cached hash tables exceeds a threshold. It does not work at the granularity of pages: a least-recently-used policy removes whole hash tables rather than individual entries. The hash table with the oldest timestamp is chosen and evicted.

COST MODELS

In the cost model, the optimizer estimates the run-time cost of operators that reuse hash tables.
The cost model has three components:
1) The resize cost
2) The cost to insert the first tuple
3) The update cost of each tuple

EFFICIENCY OF SINGLE QUERY

Having defined single-query reuse, we now discuss its efficiency. The efficiency depends directly on the payload size and the reuse potential, measured against a strategy without reuse. Materialization-based strategies behave differently: a larger payload penalizes them, because they add a materialization cost. For the materialization-based strategy, we can examine the memory footprint of the temporary tables as well as the hit ratio per table. For HashStash, we can examine the footprint of each hash table and the hit ratio per table.

MULTI QUERY WITH EFFICIENCY

In the multi-query setting, the queries were executed in three different modes:
1) First mode: all queries were executed in sequence, but no cached hash tables were used.
2) Second mode: the hash table cache was active and each query was executed individually.
3) Third mode: a reuse-aware shared plan was used, with all queries bundled into one set.

The third mode is the fastest compared with executing queries individually, because a shared scan is performed and the whole bundle executes at once, which saves time. Its efficiency is therefore the greatest.

CONCLUSION

Modern databases require careful design to return the best result for a query as quickly as possible. The HashStash main-memory database management system addresses this by avoiding the extra cost of materialization, which improves performance considerably. The other paper focuses on query optimization through different techniques for iceberg queries with complex joins. These techniques are executed through the NLJP operator, which should be helpful for future investigation. Different types of clauses and conditions are also used. These techniques are focused mainly on solving specific problems.
